
Author of the blog post here.

Yeah, this is generally a good practice. The silver lining is that our suffering helped uncover the underlying issue faster. :)

This isn’t part of the blog post, but for future orders we also considered getting the servers and keeping them idle, without actual customer workload, for about a month. This would be more expensive, but it could help identify potential issues without impacting our users. In our case, the crashes started three weeks after we deployed our first AX162 server, so we would need at least a month (or maybe even longer) as a buffer period.



>The silver lining is that our suffering helped uncover the underlying issue faster.

Did you actually uncover the true root cause? Or did they finally uncap the power consumption without telling you, just as they neither confirmed nor denied having limited it?


The root cause was a problem with the motherboard, though the exact issue remains unknown to us. I suspect that a component on the motherboard may have been vulnerable to power limitations or fluctuations and that the newer-generation motherboards included additional protection against this. However, this is purely my speculation.

I don't believe they simply lifted a power cap (if there was one in the first place). I genuinely think the fix came after the motherboard replacements. We had 2 batches of motherboard replacements and after that, the issue disappeared.

If someone from Hetzner is here, maybe they can give extra information.


Hetzner is currently replacing motherboards of their dedicated servers [1], but I don't know if that's the same issue that was mentioned in the article.

[1] https://status.hetzner.com/incident/7fae9cca-b38c-4154-8a27-...


That's the same issue, yes.


Customers are the best QA. And they pay you too, instead of the reverse!


I'm pretty sure they pay for QA. QA cannot always catch every possible bug.


These crashes should have been caught easily.


Were you able to identify the manufacturer and model/revision of the failing motherboards? This would be extremely helpful when shopping for second-hand servers.


I cannot find the link now, but it was mentioned that they were ASRock mobos.


Thanks. This comment above does mention ASRock: https://news.ycombinator.com/item?id=43112594

On the other hand, dmidecode output in the article shows:

Manufacturer: Dell Inc.
Product Name: 0H3K7P
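For anyone who wants to check the board in their own servers, here is a minimal sketch of pulling those two fields out of `dmidecode -t baseboard` text. The function name `parse_baseboard` is made up for illustration, and the sample string simply reuses the values quoted above; real output has more fields, and dmidecode normally has to be run as root.

```python
import re


def parse_baseboard(dmidecode_text: str) -> dict:
    """Extract manufacturer and product name from `dmidecode -t baseboard` text.

    Hypothetical helper: it only looks for the two fields quoted in this thread
    and assumes each field sits on its own line.
    """
    fields = {}
    for key in ("Manufacturer", "Product Name"):
        # Match "Key: value" on a single line, ignoring leading indentation.
        match = re.search(
            rf"^\s*{re.escape(key)}:\s*(.+?)\s*$",
            dmidecode_text,
            re.MULTILINE,
        )
        if match:
            fields[key] = match.group(1)
    return fields


# Sample text built from the values quoted above (not a full dmidecode dump).
sample = """\
Base Board Information
\tManufacturer: Dell Inc.
\tProduct Name: 0H3K7P
"""

print(parse_baseboard(sample))
```

On a live machine you would feed it the captured output of `sudo dmidecode -t baseboard` instead of the sample string.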



