I'm writing a transpiler that uses global information from codebases, so it transpiles potentially hundreds of files at once and builds rather complex data structures. It stays compute-bound for quite a while, so I tried speeding it up with multiprocessing (since multithreading would be useless under the GIL). But with multiprocessing, the time spent serializing/deserializing the complex data structures for each process outweighed the speedup, so I had to give up. Next time I have time for this I'd probably try Jython as a drop-in replacement and see whether I can get it to run with GIL-less multithreading.
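For what it's worth, here's a minimal sketch of the effect I mean (not my actual transpiler; the names and data are made up): with multiprocessing.Pool, whatever state a task needs is pickled in the parent and unpickled in the worker, so a large shared structure gets serialized over and over. Inheriting the state at fork time is one partial workaround, but it only helps for read-only data and only on Unix.

```python
import multiprocessing as mp
import pickle
import time

# Hypothetical stand-in for the transpiler's global analysis state
# (symbol tables, ASTs, ...). The real structures are far more involved.
SHARED_STATE = {i: {"name": f"sym{i}", "refs": list(range(50))} for i in range(100_000)}

def transpile_one(args):
    # Each worker has to unpickle the whole state before doing any work.
    state, filename = args
    return filename, len(state)          # stand-in for real per-file work

def transpile_one_inherited(filename):
    # Reads the module-level SHARED_STATE instead of receiving it per task.
    return filename, len(SHARED_STATE)

if __name__ == "__main__":
    files = [f"file{i}.py" for i in range(8)]

    t0 = time.perf_counter()
    blob = pickle.dumps(SHARED_STATE)
    print(f"pickling the shared state once: {time.perf_counter() - t0:.2f}s, {len(blob) / 1e6:.1f} MB")

    # Naive approach: the shared state rides along with every task, so it is
    # pickled in the parent and unpickled in a worker again and again.
    t0 = time.perf_counter()
    with mp.Pool(4) as pool:
        pool.map(transpile_one, [(SHARED_STATE, f) for f in files])
    print(f"pool.map with state per task:   {time.perf_counter() - t0:.2f}s")

    # Partial workaround on Unix: fork the workers so they inherit the state
    # (copy-on-write) instead of serializing it. Not available on Windows.
    t0 = time.perf_counter()
    with mp.get_context("fork").Pool(4) as pool:
        pool.map(transpile_one_inherited, files)
    print(f"fork-inherited state:           {time.perf_counter() - t0:.2f}s")
```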
It sounds like you have a couple of hot paths and are not optimizing them. I can't tell for sure without seeing any code, but nothing in your post screams "this will be slow" or "I need parallelism/concurrency". Perhaps it's the data structures you're using?
I already did extensive profiling and performance tuning; at this point I'm quite sure that if I could use real multithreading on my lab's 24-core Xeon Haswell machines I'd get a nice speedup.
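For context, this is the kind of profiling I mean (a minimal sketch with hypothetical names, not my real code): cProfile plus pstats sorted by cumulative time, which is what shows the hot paths are pure-Python compute rather than I/O.

```python
import cProfile
import pstats

def transpile_project(paths):
    # Stand-in for the real per-file analysis/codegen work (CPU-bound, pure Python).
    return [sum(hash(p + str(i)) for i in range(200_000)) for p in paths]

paths = [f"src/file{i}.py" for i in range(20)]

# Dump the profile to a file, then print the 15 most expensive call sites.
cProfile.run("transpile_project(paths)", "transpile.prof")
pstats.Stats("transpile.prof").sort_stats("cumulative").print_stats(15)
```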