I used DynamoDB as part of the job a few years ago and never got the single-digit-millisecond responses - it was 20ms minimum and 70+ on a cold start, but I can accept that optimising Dynamo's various indexes is a largely opaque process. We had to add hacks like setting the request timeout to 5ms and keeping the cluster warm by submitting a no-op query every 500ms to keep it even remotely stable. We couldn't even use DAX because the Ruby client didn't support it. At the start we only had a couple of thousand rows in the table, so it would have legit been faster to scan the entire table and do the rest in memory. Postgres did it in 5ms.
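For illustration, the keep-warm hack looked roughly like this - a background thread firing a cheap query on a timer. The `Warmer` class and the `ping` call are hypothetical stand-ins (in production it was a GetItem on a sentinel key against `Aws::DynamoDB::Client`); any client object with a cheap call works:

```ruby
# keep_warm.rb -- keep connections warm by firing a cheap no-op query
# on an interval, so the first real request doesn't eat a cold start.
class Warmer
  def initialize(client, interval: 0.5)
    @client = client
    @interval = interval
    @running = false
  end

  def start
    @running = true
    @thread = Thread.new do
      while @running
        begin
          @client.ping # hypothetical cheap call, e.g. GetItem on a sentinel key
        rescue StandardError
          # a failed warm-up ping isn't fatal; just try again next tick
        end
        sleep @interval
      end
    end
  end

  def stop
    @running = false
    @thread&.join
  end
end
```

It's a hack, not a fix - you're paying for throwaway reads just to keep the path hot - but it did smooth out the tail.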
If Amazon said they didn't use DAX that day I would say they were lying.
The average consumer or startup is not going to squeeze the performance out of Dynamo that AWS claims to have achieved.
In fact, it might have been fairer in Ruby if they hadn't hard-coded the net client (Net::HTTP). I imagine performance could have been boosted by injecting an alternative.
What a cool lil side project/company! Going to circulate this among friends...
Little bit of well-meaning advice: this needs copy editing -- inconsistent use of periods, typos, grammar. Little crap that doesn't matter in the big picture, but it will block some people from opening their wallets. :) ("OpenTeletry", "performances", etc.)
All in all this is quite cool, and I hope you get some customers and gather more data! (A 4k object size in S3 doesn't make sense to measure, but 1MB might be interesting. Also, check out HdrHistogram -- it might be relevant to your interests.)
Nice dash - if you don't mind a drive-by recommendation: I use Grafana for work a lot and it's nice to see a table legend with min, max, mean, and last metrics for these kinds of dashboards. Really makes it easy to grok without hovering over data points and guessing.
What's even more important for me when using Grafana (though a summary helps too) is units - knowing whether a value is in seconds, milliseconds, or microseconds, and whether 0.5 is a quantile or something else.
Numbers without units are dangerous in my opinion.
> We had to add on hacks like setting the request timeout to 5ms and keeping the cluster warm by submitting a no-op query every 500ms to keep it even remotely stable.
This sounds like you're blaming dynamo for you/your stack's inability to handle connections / connection pooling.
Been using DynamoDB for years and haven't had to resort to any of the hacks you describe. Not using Ruby, though. TCP keep-alive does help with perf (which I think is what you're getting at with the warm-up queries).
I don’t have p99 times in front of me right this second but it’s definitely lower than 20ms for reads and likely lower for writes. (EC2 in VPC).
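On the TCP keep-alive point: it's an OS-level socket option, a one-liner in Ruby. A minimal sketch below - the local `TCPServer` is purely a stand-in so there's a live connection to configure; this isn't how the AWS SDK wires it up, just what the setting itself looks like:

```ruby
require "socket"

# Stand up a throwaway local listener so we have a live TCP connection.
server = TCPServer.new("127.0.0.1", 0)
sock = TCPSocket.new("127.0.0.1", server.addr[1])

# Enable OS-level TCP keep-alive so idle connections get probed
# periodically instead of silently dying behind a NAT or load balancer.
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

# Read the option back to confirm it took effect.
enabled = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool

sock.close
server.close
```

The probe interval and count are kernel-tunable (e.g. `tcp_keepalive_time` on Linux) rather than set per-socket here.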