This is a false dichotomy. You'll run into all sorts of pain and frustration as soon as Spark touches your codebase, no matter what kind of Scala you're writing.
I've used almost every big data processing system out there and Spark causes no more pain than any other. Less than most, actually. It's also a data processing system, not a library, so if you're integrating it into existing code (rather than writing code for it and running that code on top of it), I'd argue you're doing it wrong.
One big reason, maybe the biggest, to write Spark jobs in Scala (and not move to PySpark) is code reuse between different components. My team maintains a fairly sizeable library that is shared between our Spark jobs and several web services. Spark becomes a library dependency as soon as you do anything remotely complex with it, and it can easily creep into everything if you aren't careful.
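A minimal sbt sketch of the kind of layout that keeps the creep in check (module names and version numbers here are made up for illustration, not our actual build): the shared library knows nothing about Spark, and Spark itself is a `Provided` dependency of the job module only.

```scala
// build.sbt (sketch) -- hypothetical module names
lazy val core = project
  .settings(
    scalaVersion := "2.12.18"
    // shared domain code: no Spark dependency at all
  )

lazy val sparkJobs = project
  .dependsOn(core)
  .settings(
    scalaVersion := "2.12.18",
    // Provided: the cluster supplies Spark at runtime, so it never leaks
    // into anything else that depends on core
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1" % Provided
  )

lazy val webService = project
  .dependsOn(core) // reuses the domain code without pulling in Spark
```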
Decoupling modules isn't always obvious in a codebase that's grown organically for 6-7 years now (long before I joined), and the cohabitation with Spark inevitably causes some pain. A couple of examples:
- Any library that depends on Jackson is likely to cause binary compatibility issues because of the ancient versions shipped with Spark. Guava can be a problem too. Soon enough you'll need to shade a bunch of libraries in your Spark assembly (a rough sketch of the shading setup follows this list).
- We have a custom sparse matrix implementation that fits our domain well; it was completely broken by the new collections in Scala 2.13. That makes cross-publishing complicated if I don't want to be stuck on Scala 2.12 because of Spark (see the cross-building sketch below).
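To make the first point concrete, here's roughly what the shading ends up looking like with sbt-assembly. The plugin version and the renamed package prefixes are illustrative; the exact rule set depends on what your job actually pulls in.

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt -- shade our own Jackson/Guava so they don't clash with the
// older versions already on Spark's classpath; only the fat jar produced
// by `assembly` is affected
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.fasterxml.jackson.**" -> "shaded.jackson.@1").inAll,
  ShadeRule.rename("com.google.common.**"     -> "shaded.guava.@1").inAll
)
```

And for the second point, a minimal sketch of how the cross-publishing can be wired up in sbt, assuming the standard version-specific source directories (again, module names and version numbers are made up):

```scala
// build.sbt -- the shared library cross-publishes, the Spark modules stay on 2.12
lazy val matrix = project
  .settings(
    scalaVersion := "2.12.18",
    crossScalaVersions := Seq("2.12.18", "2.13.12"),
    // sbt compiles src/main/scala-2.12 or scala-2.13 per Scala version,
    // which is where the collection-specific parts of the code live;
    // scala-collection-compat smooths over the rest
    libraryDependencies +=
      "org.scala-lang.modules" %% "scala-collection-compat" % "2.11.0"
  )

lazy val sparkJobs = project
  .dependsOn(matrix)
  .settings(scalaVersion := "2.12.18") // pinned by the Spark distribution we run on
```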
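None of this is rocket science, but it's build plumbing you only have to maintain because Spark sits in the dependency graph.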
A lot of the hardcore Scala community hates Spark, which sort of summarizes the situation.