On this note, what is people's favorite way of scaling out Postgres? I'm told Slony + master/slaves configs are good for scaling out read-heavy situations, but it seems that there are a ton of alternatives on the market.
Is there a comprehensive guide on the various options and tradeoffs for this kind of thing? Good resources one could use to learn more about it?
I recommend looking at streaming replication + hot standby first. Those are built in, officially supported by the project, and come with lower administration effort and fewer pitfalls.
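For reference, a minimal streaming-replication setup from the 9.x era looks roughly like this; the hostname and user are placeholders, and you'd also need a matching `replication` entry in the primary's pg_hba.conf:

```ini
# postgresql.conf on the primary
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 64

# postgresql.conf on the standby
hot_standby = on

# recovery.conf on the standby (pre-12 layout)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
```

With that in place the standby follows the primary's WAL stream and can serve read-only queries, which covers most read-heavy scale-out needs without any trigger-based replication.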
Ugh, Slony is awful. You use Slony when you have to do a live migration from one version to another (say 9.3 to 9.4) without any downtime. WAL replication is the way to go most of the time. I've used Londiste; it's better than Slony if you need partial replication, but still messy to use compared to WAL.
Personally I haven't used any of the others (i.e. Bucardo).
WAL replication is currently the official and best-supported mechanism for replication. If WAL replication does not suit your particular use case, then that is when to start looking at Slony, Bucardo, etc.
1) Where are you physically located? I'd want to move my API servers very close to the database for performance, so knowing which city and datacenter would help.
2) Please fix the logo in the top left of your blog so it links to your main website. It's a UI fail when a logo doesn't go to the main site, even when the blog is on a subdomain.
Had to look around for their DC location as well - it's hidden in the footer. Seems they're available on AWS, DigitalOcean and SoftLayer, although Postgres is only available on AWS (us-east + eu-west) if I'm reading it correctly.
It would be great if you had more locations that favour non-Amazon customers though I understand why you would prioritise that first.
I mostly use Linode https://www.linode.com/speedtest and the locations of AWS are generally good for Amazon but bad for other services. For example, in London nearly all smaller hosting providers, as well as major peering connectivity, are based around the LINX locations such as Telehouse.
Latency to the database is key, but without moving to AWS (which I wouldn't want to do for price and performance reasons) I couldn't achieve a low enough latency here to consider it.
Another thing... "sign-up for free" quickly followed by "enter payment method". Which is it? I wanted to sign up, in part just to receive future notifications and also to get a sense as to the qualitative feel of your dashboard tools. For reasons outlined above, I'm not going to buy a service today.
Great to see, but I'd like some real numbers on the performance. In my experience AWS's poor storage performance has always been a showstopper for large databases, especially when trying to scale them when integrity is crucial to your platform.
We don't use AWS's storage — at least not EBS. We run on our own hardware and on the i2 instances on AWS that have high performance ephemeral SSD arrays. Our benchmarks of the i2s show their IO performance is quite good, as you'd expect of local SSDs.
It's not all ephemeral, the physical servers we run are old-school redundant (RAID-10, two power supplies, etc). Even ephemeral is persistent — it's just something that'll go away if you migrate your instance to new host hardware.
Deployments are on two servers and the write ahead log is streamed to both the slave and offsite secondary storage. Replication is async so there is potentially a small window of data loss if a whole server goes. We are considering letting people opt in to synchronous replication, but don't have it available yet.
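For anyone curious what that opt-in would look like: on a vanilla Postgres setup, switching a streaming standby to synchronous commit is mostly a matter of settings like these on the primary (the standby name is a placeholder matching the standby's `application_name`):

```ini
# postgresql.conf on the primary
synchronous_standby_names = 'standby1'   # which standby must acknowledge
synchronous_commit = on                  # commit waits for the standby's flush
```

The tradeoff is that every commit then pays a round-trip to the standby, which is why it makes sense as an opt-in rather than a default.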
If you aren't going to tell me how the scaling is implemented, how it impacts ACID, and describe your edge cases well, then it's going to be hard to trust the solution.
We don't have enough content about this yet, but we run a very vanilla Postgres setup on big, beefy servers and scale resources vertically, similar to how we start with MongoDB: https://blog.compose.io/how-we-scale-mongodb/
Scaling Postgres horizontally is not something most of our customers need right now. When we do release scale-out, it'll be obvious to customers how we do it — we might just make something from our good buddies at Citus Data available, for instance.
First of all, this is great news - I use Compose for Mongo and have been waiting for a PostgreSQL option. However, is the RAM allotment similar to that given in the Mongo deployments - 1/10th of storage? So, about $125/month for 1GB RAM and 10GB storage. Making the obvious comparison to Heroku (which, granted, doesn't offer the autoscaling feature), Compose looks quite expensive. At a glance, it seems that on Heroku one gets the same amount of RAM and 6 times the storage for less than half the cost of Compose.
These are all on enterprise grade SSDs. At 10GB of storage, you likely have more raw IO capacity than you'd get on an EC2 instance with EBS. Right now, it's built for high performance Postgres users, not databases with a whole bunch of cold data and a minimal hot data set.
Ah I did not realize that it's not on EBS (aws announced SSD EBS last June so I thought perhaps you guys jumped on that opportunity). What's the max # of connections? Does it scale with storage as well?
You might find the HikariCP guidance on connection pools with SSD interesting. Less disk waiting can mean a much smaller pool is optimal, to avoid unnecessary context switching, among other things.
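The HikariCP wiki's rough sizing formula is easy to sketch; the formula is theirs, while the helper function and example numbers here are just illustrative:

```python
def suggested_pool_size(core_count: int, effective_spindles: int) -> int:
    """Rough heuristic from the HikariCP 'About Pool Sizing' wiki:
    connections = (core_count * 2) + effective_spindle_count.
    On SSDs there's almost no seek latency to hide, so the effective
    spindle count stays small and the optimal pool is far smaller
    than most people's default of hundreds of connections."""
    return core_count * 2 + effective_spindles

# e.g. an 8-core server on a single SSD
print(suggested_pool_size(8, 1))  # 17
```

The point being: a pool in the tens, not the hundreds, is usually optimal on SSD-backed Postgres, because extra idle connections mostly buy you context switching.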
We're tuning the connection limits during beta, but it's not something we typically scale. We set that as high as we can without hurting stability, and then let people shoot themselves in the foot if they want. :)
If you use it and run into connection limit issues, let us know and we'll gladly up them for you.
I sense you've never worked in a suit & tie enterprise...
If you have backups, upgrades, access to monitoring tools, etc., $125/month is just 2 hours max of developer time.
That's a bargain and your standard Oracle admin should feel threatened :-) (provided the company is willing to put its data on a server far away from its control room, which, I guess, doesn't happen so often :-))
so a bargain if you actually can make use of it...
It happens more than you would think. Big companies tend to have two classes of DB — the big special DB everyone uses, and then utility DBs (frequently hundreds or thousands) that devs use for smaller scale projects. They're usually pretty happy letting the tangential apps run elsewhere.
It's probably more accurate to say that people who buy Oracle don't read HN. Developers that use Oracle might read HN, and these are the people that are driving the future of databases. There's a reason most new DBs have an open source business model. :)
People who buy Oracle are likely reading management publications, and possibly Gigaom. They're not reading technical discussions of databases.
For the same price you can get 4-8GB RAM and 10x the storage on RDS, AWS's managed relational database solution. However, Compose is going for a different performance profile, not the typical CRUD app.
And DBA services, disaster recovery, redundancy, tools, etc. Sometimes it makes more sense to let developers run their own databases, or hire dedicated DB operations staff, but usually it doesn't. :D
PostgreSQL wants to be web scale but it will never touch MongoDB scale. MongoDB is true web scale with full cloud compatibility and horizontal scaling like how clouds spread out in the real world. MongoDB mimics physics because physics is green technology. MongoDB is truly efficient with zero carbon footprint unlike PostgreSQL which is like diesel exhaust clogging up your network pipes when you try to shard upwards and outwards into the virtual scalable cloud atmosphere. PostgreSQL chokes your environment and doesn't support 10gen's new invention the MAPREDUCE which is the successor to outdated SQL. If you use PostgreSQL with auto-scale you will never support big data but MongoDB can scale up to even 100GB of big rich object data without relations so the data truly represents your client's needs.
Correct. Storage is just the easiest way to define a service that's sold as a usage based utility. The price includes all the hardware resources, our support staff, DBA tools, etc.
I upvoted in the hope that I'll find out that "auto-scaling" means it automatically scales to large numbers of reads and/or writes, which would amount to one awesome service. Since pricing is based on storage, I'm inclined to think it only auto-scales to high storage needs. I'm hoping to be wrong. The part where they say release 9.4 brought them within reach technically of what they wanted to do gives me some hope that I'm wrong.
It automatically scales all resources based on data size. This means increasing IOPs, RAM allotments, and CPU capacity on the fly as the data grows. At 1TB of data, for instance, the DB would have access to 100GB of RAM, about 60,000 random IOPs, and 12 full CPU cores.
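Taking the figures quoted here (100GB RAM, ~60,000 random IOPs, 12 cores at 1TB of data) and the 1GB-RAM-per-10GB-storage ratio mentioned elsewhere in the thread, the allotment appears roughly linear in data size. A purely hypothetical sketch of that curve, not Compose's actual formula:

```python
def allotment(data_gb: float) -> dict:
    """Hypothetical resource allotment, scaled linearly from the
    figures quoted for a 1TB deployment (100GB RAM, ~60,000 random
    IOPs, 12 CPU cores). Illustrative only."""
    scale = data_gb / 1024.0  # fraction of a 1TB deployment
    return {
        "ram_gb": round(100 * scale, 1),
        "random_iops": int(60_000 * scale),
        "cpu_cores": round(12 * scale, 2),
    }

# At the quoted 1TB point:
print(allotment(1024))  # {'ram_gb': 100.0, 'random_iops': 60000, 'cpu_cores': 12.0}
```

Under that model a 10GB deployment would land around 1GB of RAM, matching the Mongo-style 1/10th ratio discussed above.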