> DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 126 million requests per second.
Amazon was very, very clear on this. For Google to use that number without the caveat is just completely underhanded and dishonest. Whoever wrote this is absolutely lacking in integrity.
I used DynamoDB on the job a few years ago and never got single-digit millisecond responses - it was 20ms minimum and 70ms+ on a cold start, though I can accept that optimising Dynamo's various indexes is a largely opaque process. We had to add hacks like setting the request timeout to 5ms and keeping the cluster warm by submitting a no-op query every 500ms just to keep it even remotely stable. We couldn't even use DAX because the Ruby client didn't support it. At the start we only had a couple of thousand rows in the table, so it would have legit been faster to scan the entire table and do the rest in memory. Postgres did it in 5ms.
If Amazon said they didn't use DAX that day I would say they were lying.
The average consumer or startup is not going to squeeze the performance out of Dynamo that AWS claims to have achieved.
In fact, it might have been a fairer comparison in Ruby if the SDK didn't hard-code the HTTP client (Net::HTTP). I imagine performance could have been boosted by injecting an alternative.
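For what it's worth, here's roughly what those hacks looked like with the aws-sdk-dynamodb gem - a rough sketch, with the table name and key invented for illustration (the 5ms/500ms values are just the ones we used):

```ruby
require 'aws-sdk-dynamodb'

# Timeouts in the Ruby SDK are given in seconds, so 5ms == 0.005.
# These options are passed down to the (hard-coded) Net::HTTP handler.
client = Aws::DynamoDB::Client.new(
  http_open_timeout: 0.005,
  http_read_timeout: 0.005,
  retry_limit: 0
)

# Keep-warm loop: a cheap read every 500ms so the pooled TLS connection
# never goes cold. Table and key are invented for illustration.
Thread.new do
  loop do
    begin
      client.get_item(table_name: 'my_table', key: { 'pk' => 'warmup' })
    rescue StandardError
      # Result is irrelevant; the call exists only to keep things warm.
    end
    sleep 0.5
  end
end
```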
What a cool lil side project/company! Going to circulate this among friends...
Little bit of well-meaning advice: this needs copy editing - inconsistent use of periods, typos, grammar. Little crap that doesn't matter in the big picture, but it will block some people from opening their wallets. :) ("OpenTeletry", "performances", etc.)
All in all this is quite cool, and I hope you get some customers and gather more data! (A 4KB object size in S3 doesn't make sense to measure, but 1MB might be interesting. Also, check out HdrHistogram - it might be relevant to your interests.)
Nice dash - if you don't mind a drive-by recommendation: I use Grafana for work a lot and it's nice to see a table legend with min, max, mean, and last metrics for these kinds of dashboards. Really makes it easy to grok without hovering over data points and guessing.
What's even more important for me when using Grafana (though a summary helps too) is units: knowing whether a value is in seconds, milliseconds, or microseconds, and whether 0.5 is a quantile or something else.
Numbers without units are dangerous in my opinion.
> We had to add hacks like setting the request timeout to 5ms and keeping the cluster warm by submitting a no-op query every 500ms just to keep it even remotely stable.
This sounds like you're blaming Dynamo for your (or your stack's) inability to handle connections / connection pooling.
Been using DynamoDB for years and haven't had to do any of the hacks you describe. Not using Ruby, though. TCP keep-alive does help with perf (which I think you might be getting at).
I don’t have p99 times in front of me right this second but it’s definitely lower than 20ms for reads and likely lower for writes. (EC2 in VPC).
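If anyone upthread is still on the Ruby SDK, the connection idle timeout is the relevant knob for keep-alive - a minimal sketch (the 60s value is arbitrary):

```ruby
require 'aws-sdk-dynamodb'

# The SDK pools HTTP connections but drops ones idle for more than
# 5 seconds by default; raising http_idle_timeout keeps them open
# between bursts, skipping repeated TCP/TLS handshakes.
client = Aws::DynamoDB::Client.new(http_idle_timeout: 60)
```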
They know very well that people don't read sh* anymore. Just throw numbers up, PowerPoint them, and offer an "unbiased" comparison where Google shines - buy Google.
Worst case scenario, it's Google you're buying, not a random startup etc.
Just as a hand in the air: be careful about what you're comparing here. The number of API calls over a period of time is largely irrelevant in the face of QPS. I can happily write a DDoS script that massively bombards a service, but if that tanks my QPS then the raw call count doesn't matter. So sure, trillions of API calls were made (still impressive in the scope of the overall network of services - I'm not downplaying that), but ultimately, for DynamoDB and Spanner, it's the QPS that matters for comparing DB scaling and performance.
Google calls API calls "queries" because of their history as a search engine. QPS == API calls per second == requests per second.
That said, I can’t imagine these numbers mean much to anyone after a certain point. It’s not like either company is running a single service handling them. The scale is limited by their budget and access to servers because my traffic shouldn’t impact yours. I feel like the better number is RPS/QPS per table or per logical database or whatever.
Yes, but QPS vs. "queries to the API". The difference is the time slice. I should have been more explicit. The key here really is the time function between the numbers. That the AWS blog calls out trillions of API calls isn't relevant because there wasn't a specific time denominator. The 126M QPS is the important stat.
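To put numbers on that (invented purely for illustration): even 1 trillion calls spread evenly over a 48-hour event works out to 10^12 / 172,800s ≈ 5.8M requests per second on average - a different order of magnitude from a 126M/s peak. A total without a time denominator tells you almost nothing about scaling.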