This is pretty basic PostgreSQL tuning (1). An oft-cited problem with PostgreSQL is that it uses a process-per-connection model (a connection is a "session" in PG lingo, I think). `work_mem` is the amount of memory each sort or hash operation in a query gets to, you know, do work: sorting, DISTINCT, hash joins and the like.
Number of connections * work_mem is usually going to eat up the biggest chunk of your PostgreSQL's memory, and since the limit applies per sort/hash operation rather than per connection, a single complex query can chew through several multiples of it. The knob is less coarse than it looks: you can give connections from user "myapp" 10MB and those from user "reporting" 100MB with `ALTER ROLE ... SET work_mem`, and you can adjust it on the fly, per query, with `SET LOCAL`. What you can't do is hand the server one shared budget and let it dole the memory out as needed.
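For reference, the per-role and per-query overrides look something like this (the role names are the ones above; the table and sizes are made up for illustration):

```sql
-- Per-role defaults, picked up by new sessions for each role
ALTER ROLE myapp     SET work_mem = '10MB';
ALTER ROLE reporting SET work_mem = '100MB';

-- Per-query override, scoped to a single transaction
BEGIN;
SET LOCAL work_mem = '256MB';     -- only this transaction's sorts/hashes see it
SELECT customer_id, sum(amount)   -- hypothetical reporting query
  FROM orders
 GROUP BY customer_id
 ORDER BY sum(amount) DESC;
COMMIT;
```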
Being able to set aside 200GB of memory to be used, as needed, by all connections (maybe with _some_ limits to prevent an accident), would solve a lot of problems (and introduce a bunch of them, I know).
Since they're on RDS, I can't help but point out that I've seen DB queries on bare metal run orders of magnitude faster, 10 minutes to 1 second type of thing. It makes me wonder whether they'd even notice the disk-based sorting on a proper (yes, I said it) setup.
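(For what it's worth, `EXPLAIN ANALYZE` will tell you whether a sort is spilling to disk in the first place; a rough sketch against a made-up table, with illustrative numbers:)

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;   -- hypothetical table

-- In the plan, look at the Sort node:
--   Sort Method: external merge  Disk: 812304kB    <- spilled to disk
-- versus, with enough work_mem for the sort:
--   Sort Method: quicksort  Memory: 524288kB
```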
I worked on a MySQL-on-RDS setup once where the company was splurging (and being charged through the nose, ofc) on a backing EBS volume with 20k IOPS. Meanwhile nobody had bothered to update the default settings, which limited all background work to use no more than 100 IOPS. Needless to say, this DB had severe performance problems.
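If it was `innodb_io_capacity` or one of its siblings (the usual suspects for capping background I/O, though I'm guessing at the exact parameter here), the fix itself is a couple of lines:

```sql
-- What is InnoDB currently allowed to use for background I/O (flushing, purge)?
SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';

-- Raise it toward what the storage is actually provisioned for
-- (numbers are illustrative; on RDS these changes go through the parameter group)
SET GLOBAL innodb_io_capacity     = 10000;
SET GLOBAL innodb_io_capacity_max = 20000;
```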
Managed databases in the cloud are a bit like ORMs: they significantly reduce the amount of knowledge needed 99% of the time. Sadly, when the 1% hits, it also means that nobody on the team has built up the DBA skills required for the big hairy problem. I see this pattern repeated all the time when consulting for startups that just had their first real traction and now the magic database box is no longer working as before.
DB tuning seems like a perfect business opportunity for specialist consulting. Doing a really good job requires deep expertise, but then it only needs to be repeated occasionally (if the operational profile or the data size changes significantly).
But I don't see this model much. I mostly hear of companies struggling to develop internal expertise or going without.
I did this for years. I was a server optimisation consultant. I'd always advise my clients that increasing the specs of the hardware would probably be cheaper in the short term than hiring me to go through the various configurations with a fine-toothed comb. I also promised a fixed-price report on what needed to be done, as well as a fixed-price quote for doing the actual work. I found that this level of transparency and honesty never failed, and the work was always approved. I'd also offer training for the internal team.
I've pretty much completely changed careers now, but I found that this model did work well. It involved a lot of constant research and occasionally a huge amount of stress, but I loved it!
It does exist (I'm one of them), and it's much more common on expensive databases like Microsoft SQL Server and Oracle. With free databases, it's relatively inexpensive to throw more hardware at the problem. When licensing costs $2,000 USD per CPU core or higher, there's much more incentive for companies to bring an expert in quickly to get hardware costs under control.
Quite so. Of course, in most countries all drivers will have had mandatory driving lessons before they got their license and the instructor will have told them about the handbrake. No such license requirement exists for databases and it shows.
> Sadly, when the 1% hits, it also means that nobody on the team has built up the DBA skills required for the big hairy problem. I see this pattern repeated all the time when consulting for startups that just had their first real traction and now the magic database box is no longer working as before.
I call this the Kubernetes effect. 99% of the time, people think it's easy. Then the 1% hits and they have to completely rebuild it from the ground up. For us it was cluster certificate expiration and etcd corruption, and some other problems we couldn't ever pin down.
The other effect (which I'm sure has a name) is when they depend on it for a long time, but then a problem starts to occur, and because they can't figure out a way to solve it they move to a completely different tech ("We had problems with updating our shards so we moved to a different database"), only to hit a whole new set of problems and repeat the cycle.
(1) https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Serv...