
Well, for example, in our systems every API call only moves from one known state to another known state, and any call failure redirects the client/user to the dashboard through an error handler, so they have to reload the last good state saved in the database.

Not perfect, but having a server crash is not much different from having a connection reset by a wifi status change, an upload timing out due to the mobile network going away, or the user navigating away or closing the browser.
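
To make that concrete, here is a rough sketch of the idea rather than our actual code. The state names, the `Store` class, and the `handle` wrapper are all made up for illustration; the point is only that state is committed after the work succeeds, and any failure falls back to the last good state.

```python
from dataclasses import dataclass, field

# Hypothetical states and transitions, invented for this sketch.
ALLOWED = {
    ("draft", "submitted"),
    ("submitted", "approved"),
}


class TransitionError(Exception):
    """Raised when a call asks for a transition that is not allowed."""


@dataclass
class Store:
    """Stands in for the database: holds the last good state per user."""
    saved: dict = field(default_factory=dict)

    def load(self, user):
        return self.saved.get(user, "draft")

    def commit(self, user, state):
        self.saved[user] = state


def api_call(store, user, target, work):
    """Move the user to `target` only if the whole call succeeds."""
    current = store.load(user)
    if (current, target) not in ALLOWED:
        raise TransitionError(f"{current} -> {target} is not allowed")
    work()                      # may raise, e.g. the server dies mid-request
    store.commit(user, target)  # state changes only after the work succeeded


def handle(store, user, target, work):
    """Error handler: on any failure, send the client back to the
    dashboard with the last good state reloaded from the store."""
    try:
        api_call(store, user, target, work)
        return ("ok", store.load(user))
    except Exception:
        return ("dashboard", store.load(user))
```

A failed call never leaves a half-applied state behind, so a killed server looks the same to the user as a dropped connection.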



To me it sounds like you are saying "the in-flight requests fail."

I really don't like the idea of saying that it's simply okay to give random users a bad user experience like that when you are actually killing servers yourself all the time.


It's a different approach to managing risk -- minimizing impact of failure rather than minimizing the likelihood of failure.

It's nice to know that you can kill a process and the only impact is that in-flight requests fail, rather than having a more significant outage if a process crashes and the failover doesn't work, or the process doesn't automatically restart, etc.

If you accept that requests will fail you can build retries into the system. It's a lot harder to make a system more resilient if you avoid testing the failure scenarios.
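
A minimal sketch of what "build retries into the system" could look like, assuming the request is idempotent (safe to repeat) and that failures from a killed process surface as a retryable error. `TransientError` and `call_with_retries` are names invented here, not from any particular library.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff.

    Only safe when fn is idempotent: a request that died in flight may
    already have been partly processed by the server.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off 1x, 2x, 4x ... base_delay, plus a little jitter
                # so many clients do not all retry at the same moment.
                delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
                sleep(delay)
    raise last_error
```

The `sleep` parameter is there only so the backoff can be swapped out in tests.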


Exactly! Chaos engineering is all about thoughtfully planned out experiments, to observe what the user experience will be when something fails. Doing this on your own terms allows you to improve the experience so that your customers aren't affected.

You can decide what happens when an in-flight request is dropped: either you hold onto the state somehow and retry, or the client fails gracefully with a relevant error message.


Another thing that's not often caught by "normal" testing but that chaos engineering can capture is when multiple things fail together in random ways. It can be surprising how otherwise robust services can fail badly when multiple things go wrong at once.
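
A toy illustration of the point, with everything in it invented for the example: a service that reads from a cache and falls back to a database survives either one failing alone, but not both. This sketch walks every combination of faults exhaustively instead of picking them at random, which only works when the list of faults is small.

```python
from itertools import combinations

# Hypothetical fault names for a toy service.
FAULTS = ("cache_down", "db_down", "queue_down")


def handle_request(faults):
    """Toy service: read from the cache, fall back to the database.

    The queue is only used for best-effort side work, so a queue
    outage never fails the request.
    """
    if "cache_down" not in faults:
        return "from-cache"
    if "db_down" not in faults:
        return "from-db"
    raise RuntimeError("no data source available")


def find_breaking_combinations(max_faults=2):
    """Inject every combination of up to max_faults faults and
    return the combinations that make the request fail."""
    broken = []
    for n in range(1, max_faults + 1):
        for combo in combinations(FAULTS, n):
            try:
                handle_request(set(combo))
            except Exception:
                broken.append(combo)
    return broken
```

Testing one fault at a time reports nothing here; the failure only shows up once two faults are injected together.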


When you have nested service calls, a single downstream failure shouldn’t fail all the way back to the root request, in most cases.
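
A rough sketch of that, assuming a page built from one required call and one optional call. The service names and functions are hypothetical; the only point is that the optional downstream failure is caught and degraded instead of propagating to the root request.

```python
def fetch_profile(user_id):
    """Hypothetical required downstream call."""
    return {"id": user_id, "name": "example"}


def fetch_recommendations(user_id):
    """Hypothetical optional downstream call, currently failing."""
    raise TimeoutError("recommendations service unavailable")


def render_home(user_id, get_profile=fetch_profile,
                get_recs=fetch_recommendations):
    # Required dependency: without a profile there is no page,
    # so a failure here is allowed to propagate to the caller.
    profile = get_profile(user_id)

    # Optional dependency: a failure degrades the page instead of
    # failing the whole request.
    try:
        recs = get_recs(user_id)
        degraded = False
    except Exception:
        recs = []
        degraded = True

    return {"profile": profile, "recommendations": recs, "degraded": degraded}
```

Which calls count as optional is a product decision, hence the "in most cases".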



