Hacker News | ants_a's comments

This is not a big database use case. It just requires not doing silly things like opening a new database session for every query, when it's well documented that this is expensive.
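A minimal sketch of what that looks like from Python with psycopg2 (the DSN and pool sizes here are made up); the point is that a session is borrowed and returned, never opened per query:

    import psycopg2.pool

    # One pool per process, created at startup.
    pool = psycopg2.pool.SimpleConnectionPool(1, 10, "dbname=app user=app")

    def run_query(sql, params=None):
        conn = pool.getconn()            # borrow an existing session
        try:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                rows = cur.fetchall()
            conn.commit()
            return rows
        finally:
            pool.putconn(conn)           # return it to the pool instead of closing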

Ha, .NET does that automatically unless you really want it not to - connection pooling, I mean.

pgbouncer added support for prepared statements a couple years back.

Over here we use a PKI cert for that. A smartcard providing the root of that trust is provided by the government after verifying your identity using the typical stuff used for identity documents (any biometric data on file, birth certificate, etc.). That still doesn't mean that it's impossible to steal an identity, or acquire a made up one, but it does make it a whole lot harder.

The thing about social security is that it was supposed to be used for a fairly narrow system, and the physical cards even have text like "not to be used as identification" on them. And then we used it for that anyway.

The German equivalent to the SSN in its ubiquity, the "federal tax id", is illegal to use for non-tax purposes.

As a German that feels about correct.


The SSN is technically the same. The Social Security Act actually has that point explicitly called out. Did anyone listen? Nope.

Do they have penalties in the 5+ digits for each such offense?

Wouldn't matter. No one is interested in enforcing it, and there is too much value in the datapoint to credit rating agencies to tear the entire edifice down. Hell, back in 2011, I was part of a group migrating away from SSN usage at the Federal level. The biggest delay? Waiting for another semantically compatible ID to manifest. The TIN (taxpayer identification number) could be synthetically combined with a couple of other IDs in the dataset such that they could finally retire the SSNs we weren't supposed to be using in the first place.

What you are describing here does not match how Postgres works. A read on the replica does not generate temporary tables, nor can anything on the replica create locks on the primary. The only two things a replica can do are hold back transaction log removal and the vacuum cleanup horizon. I think you may have misdiagnosed your problem.


There is no backpressure from replication, and streaming replication is asynchronous by default. Replicas can ask the primary to hold back garbage collection (hot_standby_feedback, off by default), which will eventually cause a slowdown, but not blocking. Lagging replicas can also ask the primary to hold onto the WAL needed to catch up (again, off by default), which will eventually cause the disk to fill up, which I guess is blocking if you squint hard enough. Both take a considerable amount of time to become a problem and are easily averted by monitoring and kicking out unhealthy replicas.
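For the monitoring part, a rough sketch (Python + psycopg2, the lag threshold is made up) of watching replica lag from the primary so a stuck replica can be dropped before WAL retention becomes a problem:

    import psycopg2

    LAG_LIMIT_BYTES = 16 * 1024**3   # arbitrary 16 GB budget

    with psycopg2.connect("dbname=postgres") as conn, conn.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
              FROM pg_stat_replication
        """)
        for name, lag_bytes in cur.fetchall():
            if lag_bytes is not None and lag_bytes > LAG_LIMIT_BYTES:
                print(f"replica {name} is {lag_bytes} bytes behind; consider kicking it out")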


These are limitations in the current PostgreSQL implementation. It's quite possible to have consistent commits and snapshots across sharded databases. Hopefully some day in PostgreSQL too.


Postgres setups are typically based on physical replication, which is not an option on MySQL. My testing shows the limit to be about 177k tps with each transaction consisting of 3 updates and 1 insert.


Be careful. During consulting I ran into a similar magnitude of writes for a mostly CRUD workload.

They had huge problems with VACUUM at high tps. Basically the database never had space to breathe and clean up.


For updating a single resource where the order of updates matters, the best throughput one can hope for is the inverse of the lock hold duration. Typical Postgres-using applications follow a pattern where a transaction involves multiple round trips between the application and the database, to make decisions in code running on the application server.

But this pattern is not required by PostgreSQL; it's possible to run arbitrarily complex transactions entirely on the server side using more complex query patterns and/or stored procedures. In that case the locking time will be mainly determined by time-to-durability, which, depending on infrastructure specifics, might be one or two orders of magnitude faster. Or, in the case of fast networks and slow disks, it might not have a huge effect.
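As a sketch of the server-side pattern (Python + psycopg2; table, column and variable names are made up), the read-decide-write round trips collapse into one statement, so the row lock is held for roughly one durability cycle:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    amount, account_id = 100, 42         # placeholder values
    with conn.cursor() as cur:
        # Check and update in a single server-side statement instead of
        # SELECT, application-side decision, then UPDATE.
        cur.execute("""
            UPDATE accounts
               SET balance = balance - %s
             WHERE id = %s AND balance >= %s
            RETURNING balance
        """, (amount, account_id, amount))
        row = cur.fetchone()             # None means the business rule rejected it
    conn.commit()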

One can also use batching in PostgreSQL to update the resource multiple times within each durability cycle. This requires some extra care from the application writer to avoid getting totally bogged down by deadlocks and serializability conflicts.
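A sketch of that batching idea (Python + psycopg2, names made up): the application collects pending updates and applies them in one transaction, so a single WAL flush covers the whole batch:

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    pending = [(5, 42), (3, 42), (7, 42)]    # (delta, counter_id) accumulated by the app
    with conn.cursor() as cur:
        cur.executemany(
            "UPDATE counters SET value = value + %s WHERE id = %s", pending)
    conn.commit()                            # one durability wait for the whole batch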

What will absolutely kill you on PostgreSQL is high contention combined with repeatable read or higher isolation levels. PostgreSQL handles update conflicts with optimistic concurrency control, and high contention totally invalidates all of that optimism. So you need to be clever enough to achieve the necessary correctness guarantees with read committed and the funky semantics it has for update visibility, or use some external locking to get rid of contention in the database. An option for pessimistic locking would be very helpful for these workloads.
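To illustrate why contention hurts here, a hedged sketch (Python + psycopg2, names made up) of the retry loop such a workload ends up needing at repeatable read; under heavy contention most attempts lose the optimistic race and get retried:

    import psycopg2
    from psycopg2 import errors

    def bump_counter(conn, delta, counter_id, max_retries=10):
        for _ in range(max_retries):
            try:
                with conn.cursor() as cur:
                    cur.execute("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ")
                    cur.execute("UPDATE counters SET value = value + %s WHERE id = %s",
                                (delta, counter_id))
                conn.commit()
                return
            except errors.SerializationFailure:
                conn.rollback()   # a concurrent update won; retry the whole transaction
        raise RuntimeError("too much contention, gave up")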

What would also help is a different kind of optimism that removes the durability requirement from the lock hold time, which instead makes readers wait for durability. Postgres can do tens of thousands of contended updates per second with this model. See the Eventual Durability paper for details.


I'm wondering if it would make sense to integrate the rim, motor and wheel bearing into a single assembly to save weight and cost. That, combined with the weight and packaging benefits of not having half shafts and differentials, might make it worth it. Plus there can be additional benefits, like the extra maneuverability that ZF Easy Turn and Hyundai's e-Corner have demonstrated.

30 kW sustained / 60 kW peak per wheel is easily enough even for large passenger vehicles. The sustained rating could take a 3-ton vehicle up a 10% grade at 120 km/h.
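Back-of-envelope check for the grade claim (ignoring rolling and aero drag, and assuming four driven wheels):

    mass_kg, g, grade, speed_ms = 3000, 9.81, 0.10, 120 / 3.6
    grade_power_kw = mass_kg * g * grade * speed_ms / 1000
    print(round(grade_power_kw))   # ~98 kW, within 4 x 30 kW sustained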


Things like that do exist, though. There's an expensive Renault with motors like this, and there's also the MW Motors Luka.

MW Motors eventually made a version where the electric motors were moved from the wheel hubs to a more conventional arrangement, so presumably they felt the hub motors were some sort of problem. But they still make the original version, and I've never been in one, so I can't be sure.


In that snippet are links to Postgres docs and two blog posts, one being the blog post under discussion. None of those contain the information needed to make the presented claims about throughput.

To make those claims it's necessary to know what work is being done while the lock is held. This includes a bunch of resource cleanup, which should be cheap, and RecordTransactionCommit(), which will grab a lock to insert a WAL record, wait for it to get flushed to disk, and potentially also wait for it to get acknowledged by a synchronous replica. So the expected throughput is somewhere between hundreds and tens of thousands of notifies per second. But as far as I can tell this conclusion is only available from the PostgreSQL source code and some assumptions about typical storage and network performance.
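If someone wants actual numbers for their own setup, a crude measurement sketch (Python + psycopg2, connection string and channel name made up); each commit waits for durability, so the loop directly measures the bound described above:

    import time
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()
    start, n = time.monotonic(), 0
    while time.monotonic() - start < 5:
        cur.execute("NOTIFY bench_channel, 'ping'")
        conn.commit()                 # waits for WAL flush (and sync replica, if any)
        n += 1
    print(n / 5, "committed notifies per second")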


> In that snippet are links to Postgres docs and two blog posts

Yes, that's what a snippet generally is. The generated document from my very basic research prompt is over 300k in length. There are also sources from the official mailing lists, graphile, and various community discussions.

I'm not going to post the entire output because it is completely beside the point. In my original post, I explicitly asked "What is the qualitative and quantitative nature of relevant workloads?" exactly because it's not clear from the blog post. If, for example, they only started hitting these issues at 10k simultaneous reads/writes, then it's reasonable to assume that many people who don't have such high workloads won't really care.

The ChatGPT snippet was included to show that that's what ChatGPT research told me. Nothing more. I basically typed a 2-line prompt and asked it to include the original article. Anyone who thinks that what I posted is authoritative in any way shouldn't be considering doing this type of work.

