Memory usage is an interesting problem, because the failure mode is so much more painful (crashes and complete lock-ups) than high CPU usage.
And even if you return memory to the OS, that doesn't actually solve too-high memory usage.
Some things you can do:
The article mentions `__slots__` for reducing object memory use, and other approaches include just having fewer objects: for example, a dict of lists uses far less memory than a list of dicts with repeating fields. And you can also in many cases switch to a dataframe with Pandas, saving even more memory (https://pythonspeed.com/articles/python-object-memory/ covers all of those).
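Both tricks are easy to see in a few lines. A minimal sketch (class and variable names are my own, just for illustration): `__slots__` removes the per-instance `__dict__`, and the "dict of lists" layout stores each field name once instead of once per row.

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p, s = Point(1, 2), SlottedPoint(1, 2)
# Slotted instances have no per-instance __dict__ at all
assert hasattr(p, "__dict__") and not hasattr(s, "__dict__")

# Same idea at the collection level: one dict of lists, instead of
# repeating the key strings in every row of a list of dicts
list_of_dicts = [{"x": i, "y": i * 2} for i in range(3)]
dict_of_lists = {"x": [0, 1, 2], "y": [0, 2, 4]}
```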
There's a great talk from a company that had some NetFlow SaaS product, which I can't find right now.
They were suffering from GC pauses etc. on their ingestion hosts. They spent ages experimenting with various ways of tweaking garbage collection in Java, using different GCs, tweaking settings, etc. They even spent time experimenting with manual memory allocation, but found that to be extremely painful and somewhat fragile.
In the end they found that all they really needed to do was just produce less garbage, which was their ultimate "well duh!" revelation. They spent time looking at what was actually producing the garbage, and how they could avoid it. Got rid of a lot of standard coding patterns from their code in favour of new patterns that reduced object allocation, and away went all their problems.
DataClassFrames also takes the approach of storing lists of dataclasses as a "dataclass of arrays" - known as data-oriented design:
https://github.com/joshlk/dataclassframe
FWIW your comment caught my interest as a longstanding user of data frames and Python [1].
But I found the README quite confusing. I guess it's a new project so that's understandable.
Why use DataClassFrames and not pandas? Because it's statically typed? The comparison table puts pandas and dataclassframes on equal footing, and the rest of the README doesn't make much sense to me.
Disabling swap on Linux has helped me handle memory better in low-memory environments, especially in avoiding lock-ups, even when the code is memory-optimised.
That's nothing. Armin Ronacher was complaining somewhere about the performance hit from type annotations in Python, and I never thought about that before. It'd be best if all of that were stripped from the prod build along with unused imports, kind of like what TypeScript does.
There used to be a performance hit because the annotation objects were all instantiated and attached to the function objects. This is no longer the case - they are only instantiated when inspected.
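You can also sidestep annotation evaluation entirely by writing annotations as strings, which is what PEP 563's `from __future__ import annotations` does module-wide. A minimal sketch (the `scale` function is just an illustrative example):

```python
def scale(x: "int", factor: "float") -> "float":
    # String annotations are never evaluated at definition time,
    # so no annotation objects are built when the module loads.
    return x * factor

# The strings are stored verbatim on the function object
assert scale.__annotations__ == {"x": "int", "factor": "float", "return": "float"}
assert scale(3, 2.0) == 6.0
```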
Very interesting:
Should I still use nohang in addition to prelockd and memavaild?
I mean prelockd could/should trigger the OOM earlier but nohang would still catch it earlier thanks to PSI?
What about zram and zswap advantages?
BTW you're helping make the world a better place, but only nerds download those tools. It would help even more if you could lobby for such tools to be defaults in distros such as Manjaro, Arch, Ubuntu, Fedora, etc.
There's... a lot of hand waving in this article and no numbers.
I'd be pretty surprised if `malloc_trim` had a significant effect on cpython memory usage as most python memory gets allocated in 256KiB "arenas", which, what with fragmentation, are unlikely to _ever_ be reclaimed.
On the other hand, the article dismisses threads with some vague statements about the GIL, suggesting people need to reach straight for processes if they're serious. Really, unless your code has almost no I/O or C-accelerated, GIL-free sections, if you're not using both threads and processes, you're just burning memory unnecessarily.
(edit: oh and then there's async but I'm a bit old-fashioned for that)
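The thread-friendliness point rests on blocked threads releasing the GIL. A small timing sketch, with a hypothetical `blocking_io` (here just `time.sleep`) standing in for a real socket read:

```python
import threading
import time

def blocking_io():
    time.sleep(0.2)  # stand-in for a socket read; releases the GIL

# Run twice sequentially: at least ~0.4s total
start = time.perf_counter()
blocking_io()
blocking_io()
sequential = time.perf_counter() - start

# Run in two threads: the sleeps overlap because a blocked thread
# doesn't hold the GIL, so total time is roughly halved
workers = [threading.Thread(target=blocking_io) for _ in range(2)]
start = time.perf_counter()
for w in workers:
    w.start()
for w in workers:
    w.join()
threaded = time.perf_counter() - start

assert threaded < sequential
```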
One thing that has been known for a while is that there are allocation patterns where LibC malloc doesn't give back a sufficient amount of pages even when the pages are clean.
See for example https://www.joyfulbikeshedding.com/blog/2019-03-14-what-caus... . In the end I think the consensus was that jemalloc was just better than invoking malloc_trim, but invoking malloc_trim now and then can certainly be a lot better than using neither malloc_trim nor jemalloc.
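If you want to experiment with malloc_trim from Python, it's reachable through ctypes on glibc systems. A hedged sketch (`trim_heap` is my own wrapper name; it returns None where malloc_trim doesn't exist, e.g. musl, macOS, Windows):

```python
import ctypes
import ctypes.util

def trim_heap():
    """Ask glibc to return free heap pages to the OS.

    Returns 1 if memory was released, 0 if not, or None when
    malloc_trim is unavailable on this platform's libc.
    """
    path = ctypes.util.find_library("c")
    if path is None:
        return None
    libc = ctypes.CDLL(path)
    if not hasattr(libc, "malloc_trim"):
        return None
    # Argument 0 = keep no extra padding at the top of the heap
    return libc.malloc_trim(0)

result = trim_heap()
```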
The idea is to tweak when `mmap` or `malloc` is used by the Python interpreter: one releases memory back to the OS right away, whereas the other does not.
It is a useful trick if your application is generating lots of small objects.
It doesn't work, that's their point. With modern python versions those env variables do next to nothing. It won't crash your python but it also won't help you.
Previously we used two per-worker thresholds to control respawn: reload-on-rss and evil-reload-on-rss.
...
However, since worker respawn is expensive (i.e. there are warm-up costs, LRU cache repopulation, etc..)
For a smaller scale, I always wondered if FastCGI being the "default" would have saved a lot of headaches. Your workers just get recycled automatically all the time.
If you can make your startup fast enough (and I think most apps can), then you can just let the OS do its job. Although it's true that Python can be really slow to start if you import many modules...
If the app is long running and forks, you may also want to look into gc.freeze() (added in 3.7), which will stop copy-on-write from gradually duplicating the imported modules' memory in each worker, and will make GC runs shorter.
For numeric data, a NumPy array gets rid of the per-integer overhead of Python objects, so a Python list of numbers uses way more memory than an equivalent NumPy array (https://pythonspeed.com/articles/python-integers-memory/).
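The stdlib `array` module shows the same effect without a NumPy dependency. A quick sketch comparing boxed ints against a packed 8-byte-per-element array:

```python
import sys
from array import array

n = 100_000
# Start above 256 so CPython's small-int cache doesn't hide the cost
boxed = list(range(257, 257 + n))
packed = array("q", boxed)  # contiguous 8-byte signed integers

# List cost = the list's pointer slots plus every boxed int object
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)
packed_bytes = sys.getsizeof(packed)

# Each boxed int costs ~28 bytes plus an 8-byte list slot, versus a
# flat 8 bytes per element in the array
assert packed_bytes < boxed_bytes / 3
```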