> DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 126 million requests per second.
Amazon was very, very clear on this. For Google to use that number without the caveat is just completely underhanded and dishonest. Whoever wrote this is absolutely lacking in integrity.
I used DynamoDB on the job a few years ago and never got single-digit millisecond responses - it was 20ms minimum and 70ms+ on a cold start, though I can accept that optimising Dynamo's various indexes is a largely opaque process. We had to add hacks like setting the request timeout to 5ms and keeping the cluster warm by submitting a no-op query every 500ms just to keep it even remotely stable. We couldn't even use DAX because the Ruby client didn't support it. At the start we only had a couple of thousand rows in the table, so it would have legit been faster to scan the entire table and do the rest in memory. Postgres did it in 5ms.
If Amazon said they didn't use DAX that day I would say they were lying.
The average consumer or startup is not going to squeeze the performance out of Dynamo that AWS claims to have achieved.
In fact, it might have been a fairer comparison in Ruby if the SDK didn't hard-code the HTTP client (Net::HTTP). I imagine performance could have been boosted by injecting an alternative.
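For what it's worth, here's roughly what those hacks looked like with the aws-sdk-dynamodb gem - a rough sketch, with the table name and key invented for illustration (the 5ms/500ms values are just the ones we used):

```ruby
require 'aws-sdk-dynamodb'

# Timeouts in the Ruby SDK are given in seconds, so 5ms == 0.005.
# These options are passed down to the (hard-coded) Net::HTTP handler.
client = Aws::DynamoDB::Client.new(
  http_open_timeout: 0.005,
  http_read_timeout: 0.005,
  retry_limit: 0
)

# Keep-warm loop: a cheap read every 500ms so the pooled TLS connection
# never goes cold. Table and key are invented for illustration.
Thread.new do
  loop do
    begin
      client.get_item(table_name: 'my_table', key: { 'pk' => 'warmup' })
    rescue StandardError
      # Result is irrelevant; the call exists only to keep things warm.
    end
    sleep 0.5
  end
end
```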
What a cool lil side project/company! Going to circulate this among friends...
Little bit of well-meaning advice: this needs copy editing - inconsistent use of periods, typos, grammar. Little crap that doesn't matter in the big picture, but it will block some people from opening their wallets. :) ("OpenTeletry", "performances", etc.)
All in all this is quite cool, and I hope you get some customers and gather more data! (A 4KB object size in S3 doesn't make sense to measure, but 1MB might be interesting. Also, check out HdrHistogram - it might be relevant to your interests.)
Nice dash - if you don't mind a drive-by recommendation: I use Grafana for work a lot and it's nice to see a table legend with min, max, mean, and last metrics for these kinds of dashboards. Really makes it easy to grok without hovering over data points and guessing.
What's even more important for me when using Grafana (though a summary helps too) is units: knowing whether a value is in seconds, milliseconds, or microseconds, and whether 0.5 is a quantile or something else.
Numbers without units are dangerous in my opinion.
> We had to add hacks like setting the request timeout to 5ms and keeping the cluster warm by submitting a no-op query every 500ms just to keep it even remotely stable.
This sounds like you're blaming Dynamo for your (or your stack's) inability to handle connections / connection pooling.
Been using DynamoDB for years and haven't had to do any of the hacks you describe. Not using Ruby, though. TCP keep-alive does help with perf (which I think you might be getting at).
I don’t have p99 times in front of me right this second but it’s definitely lower than 20ms for reads and likely lower for writes. (EC2 in VPC).
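If anyone upthread is still on the Ruby SDK, the connection idle timeout is the relevant knob for keep-alive - a minimal sketch (the 60s value is arbitrary):

```ruby
require 'aws-sdk-dynamodb'

# The SDK pools HTTP connections but drops ones idle for more than
# 5 seconds by default; raising http_idle_timeout keeps them open
# between bursts, skipping repeated TCP/TLS handshakes.
client = Aws::DynamoDB::Client.new(http_idle_timeout: 60)
```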
They know very well that people don't read sh* anymore. Just throw numbers up, PowerPoint them, and offer an "unbiased" comparison where Google shines - buy Google.
Worst case scenario, it's Google you're buying, not a random startup etc.
Just as a hand in the air: be careful about what you're comparing here. The number of API calls over a period of time is largely irrelevant in the face of QPS. I can happily write a DDoS script that massively bombards a service, but if that tanks my QPS then the raw call count doesn't matter. So sure, trillions of API calls were made (still impressive in the scope of the overall network of services - I'm not downplaying that), but ultimately, for DynamoDB and Spanner, it's the QPS that matters for comparing DB scaling and performance.
Google calls API calls "queries" because of their history as a search engine. QPS == API calls per second == requests per second.
That said, I can’t imagine these numbers mean much to anyone after a certain point. It’s not like either company is running a single service handling them. The scale is limited by their budget and access to servers because my traffic shouldn’t impact yours. I feel like the better number is RPS/QPS per table or per logical database or whatever.
Yes, but QPS vs. "queries to the API". The difference is the time slice. I should have been more explicit. The key here really is the time function between the numbers. That the AWS blog calls out trillions of API calls isn't relevant because there wasn't a specific time denominator. The 126M QPS is the important stat.
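To put numbers on that (invented purely for illustration): even 1 trillion calls spread evenly over a 48-hour event works out to 10^12 / 172,800s ≈ 5.8M requests per second on average - a different order of magnitude from a 126M/s peak. A total without a time denominator tells you almost nothing about scaling.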