Now available: auto-scaling PostgreSQL deployments (compose.io)
139 points by mrkurt on Jan 5, 2015 | hide | past | favorite | 46 comments


On this note, what is people's favorite way of scaling out Postgres? I'm told Slony + master/slave configs are good for scaling out read-heavy situations, but it seems that there are a ton of alternatives on the market.

Is there a comprehensive guide on the various options and tradeoffs for this kind of thing? Good resources one could use to learn more about it?


The official PostgreSQL Documentation gives a good start, as well as explaining the many considerations involved in scaling.

http://www.postgresql.org/docs/9.4/interactive/different-rep...


http://www.postgresql.org/docs/9.4/interactive/high-availabi...

I recommend looking at streaming replication + hot standby first. Those are built in and officially supported by the project, with lower administration effort and fewer pitfalls.
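For the curious, here's a minimal sketch of that built-in setup on 9.4. The host name and the replication role are placeholders, and the primary's pg_hba.conf also needs a replication entry:

```ini
# primary postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 64     # keep enough WAL for the standby to catch up

# standby recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'

# standby postgresql.conf -- allow read-only queries during recovery
hot_standby = on
```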


Ugh, Slony is awful. You use Slony when you have to do a live migration from one version to another (say, 9.3 to 9.4) without any downtime. WAL replication is the way to go most of the time. I've used Londiste; it's better than Slony if you need partial replication, but still messy to use compared to WAL.

Personally I haven't used any of the others (i.e. Bucardo).


Would you be so kind as to provide context on why WAL replication is superior to logical replication? I'm very interested in learning more about it.

Also, what specifically did you hate about Slony?

Any resources you'd recommend on this?


WAL replication is currently the official and best-supported mechanism for replication. If WAL replication does not suit your particular use case, that is when to start looking at Slony, Bucardo, etc.


This is great, but two things:

1) Where are you physically located? As I would want to move my API servers very close to the database for performance. Knowing which city and datacenter would help.

2) Please fix the logo in the top left of your blog so it links to your main website. It's a UI failure when a logo doesn't go to the main website, even when the blog is on a sub-domain.


Had to look around for their DC location as well - it's hidden in the footer. Seems they're available on AWS, DigitalOcean and SoftLayer, although Postgres is only available on AWS (us-east + eu-west) if I'm reading it correctly.


Postgres is currently in Ashburn VA (near AWS us-east-1) and Dublin, Ireland (eu-west-1).

Good catch on the logo, that irks me too.


It would be great if you had more locations that favour non-Amazon customers though I understand why you would prioritise that first.

I mostly use Linode https://www.linode.com/speedtest and AWS's locations are generally good for Amazon but bad for other services. For example, in London nearly all smaller hosting providers, as well as major peering connectivity, are based around the LINX locations such as Telehouse.

Latency to the database is key, but without moving to AWS (which I wouldn't want to do for price and performance reasons) I couldn't achieve a low enough latency here to consider it.

Another thing... "sign-up for free" quickly followed by "enter payment method". Which is it? I wanted to sign up, in part just to receive future notifications and also to get a sense as to the qualitative feel of your dashboard tools. For reasons outlined above, I'm not going to buy a service today.


Great to see but I'd like some real numbers on the performance, in my experience AWS' poor storage performance has always been a show stopper for large database especially when trying to scale them if integrity is crucial to your platform.


We don't use AWS's storage — at least not EBS. We run on our own hardware and on the i2 instances on AWS that have high performance ephemeral SSD arrays. Our benchmarks of the i2s show their IO performance is quite good, as you'd expect of local SSDs.


How do you get persistence if all your storage is ephemeral?


It's not all ephemeral, the physical servers we run are old-school redundant (RAID-10, two power supplies, etc). Even ephemeral is persistent — it's just something that'll go away if you migrate your instance to new host hardware.

Deployments are on two servers and the write ahead log is streamed to both the slave and offsite secondary storage. Replication is async so there is potentially a small window of data loss if a whole server goes. We are considering letting people opt in to synchronous replication, but don't have it available yet.
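For reference, the opt-in in stock Postgres is a single setting on the primary (the standby name below is a placeholder); whether Compose would expose it exactly this way is an assumption:

```ini
# postgresql.conf on the primary -- commits wait for the named standby
# to confirm the WAL write before returning, closing the data-loss window
# at the cost of commit latency
synchronous_standby_names = 'standby1'
synchronous_commit = on
```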


Ephemeral storage is just that - ephemeral. Amazon can decide to retire your instance at any time.

So your persistence plan is RAID 10 it sounds like.


Not many PostgreSQL DBaaS offerings out there. This is great news.


Why should I choose this over Amazon's RDS?


If you aren't going to tell me how the scaling is implemented, how it impacts ACID, and describe your edge cases well, then it's going to be hard to trust the solution.


We don't have enough content about this yet, but we run a very vanilla Postgres setup on big, beefy servers and scale resources vertically, similar to how we start with MongoDB: https://blog.compose.io/how-we-scale-mongodb/

Scaling Postgres horizontally is not something most of our customers need right now. When we do release scale-out, it'll be obvious to customers how we do it — we might just make something from our good buddies at Citus Data available, for instance.


Got it. thanks for the clarification.


First of all, this is great news - I use Compose for Mongo and have been waiting for a PostgreSQL option. However, is the RAM allotment similar to that given in the Mongo deployments - 1/10th storage? So, about $125/month for 1GB RAM and 10GB storage. Making the obvious comparison to Heroku (which, granted, doesn't offer the autoscaling feature), Compose looks quite expensive. At a glance, it seems that on Heroku one gets the same amount of RAM and 6 times the storage for less than half the cost of Compose.


These are all on enterprise grade SSDs. At 10GB of storage, you likely have more raw IO capacity than you'd get on an EC2 instance with EBS. Right now, it's built for high performance Postgres users, not databases with a whole bunch of cold data and a minimal hot data set.


Ah I did not realize that it's not on EBS (aws announced SSD EBS last June so I thought perhaps you guys jumped on that opportunity). What's the max # of connections? Does it scale with storage as well?


You might find the HikariCP guidance on connection pools with SSD interesting. Less disk waiting can mean a much smaller pool is optimal, to avoid unnecessary context switching, among other things.

https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-...
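The heuristic from that wiki page can be sketched in a few lines; the function name is just illustrative:

```python
# Rule-of-thumb from the HikariCP "About Pool Sizing" wiki page:
#   connections = (core_count * 2) + effective_spindle_count
# On SSD-backed storage there is little seek latency to hide, so the
# effective spindle count is low and the optimal pool stays near 2x cores.

def suggested_pool_size(core_count, effective_spindle_count):
    """Starting point for a PostgreSQL connection pool, not a hard rule."""
    return core_count * 2 + effective_spindle_count

# e.g. an 8-core DB server on a single SSD:
print(suggested_pool_size(8, 1))  # 17
```

The surprising takeaway is that on fast storage a pool of a few dozen connections often outperforms one with hundreds.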


We're tuning the connection limits during beta, but it's not something we typically scale. We set that as high as we can without hurting stability, and then let people shoot themselves in the foot if they want. :)

If you use it and run into connection limit issues, let us know and we'll gladly up them for you.


We don't apply any artificial limit on the number of connections, or the number of ops/s.

We scale on storage and memory, and we try to keep the storage-to-memory ratio fixed for optimal performance.


People are paying $125/month for 1GB memory and 10GB space? I see how AWS is making so much money, those prices are just insane.


I sense you've never worked in a suit & tie enterprise...

If you have backups, upgrades, access to monitoring tools, etc., $125/month is at most two hours of developer time.

That's a bargain, and your standard Oracle admin should feel threatened :-) (provided the company is willing to put its data on a server far away from its control room, which, I guess, doesn't happen so often :-))

So, a bargain if you can actually make use of it...


It happens more than you would think. Big companies tend to have two classes of DB — the big special DB everyone uses, and then utility DBs (frequently hundreds or thousands) that devs use for smaller scale projects. They're usually pretty happy letting the tangential apps run elsewhere.


That's a bargain and your standard Oracle admin should just feel threatened

Pretty sure that's not the comparison folks are making. They're comparing it to Heroku, RDS, and self-hosting.


People that use Oracle don't even read HN, it's another league.


Out of curiosity, what do they read?


It's probably more accurate to say that people who buy Oracle don't read HN. Developers that use Oracle might read HN, and these are the people that are driving the future of databases. There's a reason most new DBs have an open source business model. :)

People who buy Oracle are likely reading management publications, and possibly Gigaom. They're not reading technical discussions of databases.


For the same price you can get 4-8GB RAM and 10x the storage on RDS, AWS's managed relational database solution. However, Compose is going for a different performance profile, not the typical CRUD app.


And DBA services, disaster recovery, redundancy, tools, etc. Sometimes it makes more sense to let developers run their own databases, or hire dedicated DB operations staff, but usually it doesn't. :D


PostgreSQL wants to be web scale but it will never touch MongoDB scale. MongoDB is true web scale with full cloud compatibility and horizontal scaling like how clouds spread out in the real world. MongoDB mimics physics because physics is green technology. MongoDB is truly efficient with zero carbon footprint unlike PostgreSQL which is like diesel exhaust clogging up your network pipes when you try to shard upwards and outwards into the virtual scalable cloud atmosphere. PostgreSQL chokes your environment and doesn't support 10gen's new invention the MAPREDUCE which is the successor to outdated SQL. If you use PostgreSQL with auto-scale you will never support big data but MongoDB can scale up to even 100GB of big rich object data without relations so the data truly represents your client's needs.


This joke got old quite a long time ago.

(And no I promise it's not because I'm using MongoDB.)


Cool, but why is it so expensive? $12 per GB per month? It is hosted on AWS so what does "high-performance" mean?


Presumably they don't also charge separately for requests, CPU cycles, or bandwidth, so the storage costs include that.


Correct. Storage is just the easiest way to define a service that's sold as a usage based utility. The price includes all the hardware resources, our support staff, DBA tools, etc.


I'm trying to understand why this is being upvoted so much.


I upvoted in the hope that I'll find out that "auto-scaling" means it automatically scales to large numbers of reads and/or writes, which would amount to one awesome service. Since pricing is based on storage, I'm inclined to think it only auto-scales to high storage needs. I'm hoping to be wrong. The part where they say release 9.4 brought them within reach technically of what they wanted to do gives me some hope that I'm wrong.


It automatically scales all resources based on data size. This means increasing IOPs, RAM allotments, and CPU capacity on the fly as the data grows. At 1TB of data, for instance, the DB would have access to 100GB of RAM, about 60,000 random IOPs, and 12 full CPU cores.
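A back-of-envelope sketch of those ratios. The per-GB constants below are derived from the three figures stated (100GB RAM, ~60,000 IOPS, 12 cores at 1TB) and are assumptions, not a published Compose formula:

```python
# Resource allotments implied by the stated 1 TB data point,
# treating 1 TB as roughly 1,000 GB for simplicity.

def resources_for(data_gb):
    """Hypothetical allotments at a given data size."""
    return {
        "ram_gb": data_gb / 10,            # 1/10 storage-to-RAM ratio
        "iops": int(data_gb * 60),         # ~60,000 IOPS per 1,000 GB
        "cpu_cores": data_gb * 12 / 1000,  # 12 cores per 1,000 GB
    }

# At 1 TB: {'ram_gb': 100.0, 'iops': 60000, 'cpu_cores': 12.0}
print(resources_for(1000))
```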


Is data set size the only autoscaling criterion? E.g., my data set is relatively small but has a very large transaction volume.

Additionally, do you have standard Postgres modules installed? Specifically, at least for my use case, PostGIS?


PostGIS is coming soon; the contrib extensions are all available to be turned on for DBs.

Autoscaling is currently data size only. You can scale deployments up manually, however, for DBs where our 1/10 ratio isn't quite right.


Thanks!



