
Yes.

Weird that https://www.cloudflarestatus.com/ isn't reporting this properly. It should be full of red blinking lights.



Yeah. I only work for a small company, but you can be certain we will not update the status page if only a small portion of customers are affected, and if we are fully down, rest assured there will be no available hands to keep the status page updated.


>rest assured there will be no available hands to keep the status page updated

That's not how status pages work if implemented correctly. The real reason status pages aren't updated is SLAs. If you agree on a contract to have 99.99% uptime, your status page had better reflect that or it invalidates many contracts. This is why AWS also lies about its uptime and status page.

These services rarely experience outages according to their own figures, but rather 'degraded performance' or some other language that talks around the issue rather than acknowledging it.

It's like when buying a house you need an independent surveyor not the one offered by the developer/seller to check for problems with foundations or rotting timber.


SLAs usually just give you a small credit for the exact period of the incident, which is asymmetric to the impact. We always have to negotiate for termination rights for failing to meet SLA standards but, in reality, we never exercise them.

Reality is that in an incident, everyone is focused on fixing the issue, not updating status pages; automated checks often fail or have false positives too. :/


Yep, every SLA I've ever seen only offers credit. The idea that providers are incentivized to fudge uptime % due to SLAs makes no sense to me. Reputation and marketing maybe, but not SLAs.

The compensation is peanuts. $137 off a $10,000 bill for 10 hours of downtime, or 98.63% uptime in a month, is well within the profit margins.
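
A minimal sketch of that arithmetic, assuming the credit is pro-rated to the outage duration and the month is an average 730 hours (8760 / 12). Neither assumption is stated in the comment, and real SLAs often use tiered credits instead, so the function names and constants here are illustrative only.

```python
# Assumption: an "average" month of 730 hours (8760 hours per year / 12).
HOURS_PER_MONTH = 730.0


def uptime_percent(downtime_hours: float, hours_in_month: float = HOURS_PER_MONTH) -> float:
    """Share of the month the service was up, as a percentage."""
    return 100.0 * (1.0 - downtime_hours / hours_in_month)


def prorated_credit(monthly_bill: float, downtime_hours: float,
                    hours_in_month: float = HOURS_PER_MONTH) -> float:
    """Credit that refunds only the fraction of the bill covering the outage."""
    return monthly_bill * downtime_hours / hours_in_month


if __name__ == "__main__":
    print(f"credit: ${prorated_credit(10_000, 10):.2f}")
    print(f"uptime: {uptime_percent(10):.2f}%")
```

Under these assumptions the credit scales linearly with outage length, which is why it stays small next to the business impact of the same outage.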


This is weird - at this level contracts are supposed to be rock solid, so why wouldn't they require accurate status reporting? That's trivial to implement, and you can even require it to be hosted on a neutral third party like UptimeRobot and be done with it.

I'm sure there are gray areas in such contracts but something being down or not is pretty black and white.


> something being down or not is pretty black and white

This is so obviously not true that I'm not sure if you're even being serious.

Is the control panel being inaccessible for one region "down"? Is their DNS "down" if the edit API doesn't work, but existing records still get resolved? Is their reverse proxy service "down" if it's still proxying fine, just not caching assets?


I understand there are nuances here, and I may be oversimplifying, but if part of the contract effectively says "You must act as a proxy for npmjs.com" yet the site has been returning 500 Cloudflare errors across all regions several times within a few weeks while still reporting a shining 99.99% uptime, something doesn't quite add up. Still, I'm aware I don't know much about these agreements, and I'm assuming the people involved aren't idiots and have already considered all of this.


> I'm sure there are gray areas in such contracts but something being down or not is pretty black and white.

Is it? Say you've got some big geographically distributed service doing some billions of requests per day with a background error rate of 0.0001%, what's your threshold for saying whether the service is up or down? Your error rate might go to 0.0002% because a particular customer has an issue so that customer would say it's down for them, but for all your other customers it would be working as normal.
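
A rough sketch of why a single global threshold is hard to pick, using made-up numbers: one hypothetical customer fails every request, yet the aggregate error rate only moves from roughly 0.0001% to roughly 0.0002%. The names `CustomerStats`, `global_error_rate`, and `customers_over` are illustrative, not from any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class CustomerStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        """Fraction of this customer's requests that failed."""
        return self.errors / self.requests if self.requests else 0.0


def global_error_rate(stats: dict[str, CustomerStats]) -> float:
    """Error rate across all customers combined."""
    total_requests = sum(s.requests for s in stats.values())
    total_errors = sum(s.errors for s in stats.values())
    return total_errors / total_requests if total_requests else 0.0


def customers_over(stats: dict[str, CustomerStats], threshold: float) -> list[str]:
    """Names of customers whose own error rate exceeds the threshold."""
    return sorted(name for name, s in stats.items() if s.error_rate > threshold)


# Hypothetical traffic: background errors at 0.0001%, plus one customer fully failing.
stats = {
    "everyone_else": CustomerStats(requests=1_000_000_000, errors=1_000),
    "customer_x": CustomerStats(requests=1_000, errors=1_000),
}

if __name__ == "__main__":
    print(f"global error rate: {global_error_rate(stats) * 100:.6f}%")
    print("customers above 1% errors:", customers_over(stats, 0.01))
```

The aggregate number looks healthy while `customer_x` is completely down, so "up" or "down" depends on whether you measure per service or per customer.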


> something being down or not is pretty black and white

it really isn't. We often have degraded performance for a portion of customers, or just down for customers of a small part of the service. It has basically never happened that our service is 100% down.


Are the contracts so easy to bypass? Who signs a contract with an SLA knowing the service provider will just lie about the availability? Is the client supposed to sue the provider any time there is an SLA breach?


Anyone who doesn't have any choice financially or gnostically. Same reason why people pay Netflix despite the low quality of most of their shows and the constant termination of TV series after 1 season. Same reason why people put up with Meta not caring about moderation or harmful content. The power dynamics resemble a monopoly.


Why bother to put the SLA in the contract at all, if people have no choice but to sign it?

Netflix doesn't put in the contract that they will have high-quality shows. (I guess, don't have a contract to read right now.)


Most services are not really critical, but customers want to have 99.999% on paper.

Most of the time people will just get by and ignore even a full day of downtime as a minor inconvenience. Loss of revenue for the day - well, you most likely will have to eat that, because going to court and having lawyers fight over it will most likely cost you as much as just forgetting about it.

If your company goes bankrupt because AWS/Cloudflare/GCP/Azure is down for a day or two - guess what - you won't have money to sue them ¯\_(ツ)_/¯ and most likely will have a bunch of more pressing problems on your hands.


The company that is trying to cancel its contract early needs to prove the SLA was violated, which is very easy if the company providing the service also provides a page that says their SLA was violated. Otherwise it's much harder to prove.


The client is supposed to monitor availability themselves, that is how these contracts work.
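
One way a client might do this, as a minimal sketch using only the Python standard library: probe an endpoint on a schedule, record each result, and compute availability from your own samples. The function names are hypothetical, and a real setup would probe from several networks and persist results somewhere that does not depend on the provider being measured.

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError, HTTPError (raised for 4xx/5xx) and socket timeouts
        # all derive from OSError, so each counts as a failed probe.
        return False


def availability_percent(samples: list[bool]) -> float:
    """Percentage of recorded probes that succeeded."""
    if not samples:
        raise ValueError("no samples recorded")
    return 100.0 * sum(samples) / len(samples)
```

Calling `probe` once a minute and feeding the stored results to `availability_percent` gives a figure you can compare against the contract, independent of what the provider's status page says.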


I imagine there will be many levels of "approvals" to get the status page actually showing down, since SLA uptime contracts are involved.


I work for a small company. We have no written SLA agreements.


I have to say that if an incident becomes so overwhelming that nobody can spare even a moment to communicate with customers, that points to a deeper operational problem. A status page is not something you update only when things are calm. It is part of the response itself. It is how you keep users informed and maintain trust when everything else is going wrong.

If communication disappears entirely during an outage, the whole operation suffers. And if that is truly how a company handles incidents, then it is not a practice I would want to rely on. Good operations teams build processes that protect both the system and the people using it. Communication is one of those processes.


> if we are fully down, rest assured there will be no available hands to keep the status page updated

There is no quicker way for customers to lose trust in your service than for it to be down and for them to not know that you're aware and trying to fix it as quickly as possible. One of the things Cloudflare gets right is the frequent public updates when there's a problem.

You should give someone the responsibility for keeping everyone up to date during an incident. It's a good idea to give that task to someone quite junior - they're not much help during the crisis, and they learn a lot about both the tech and communication by managing it.


You won't be able to update the status page due to failures anyway.


Why not? A good status page runs on a different cloud provider in a different region, specifically to not be affected at the same time.


This is just business as usual, status pages are 95% for show now. The data center would have to be under water for the status page to say "some users might be experiencing disruptions".


They just did an update, and it is bad (in the sense that they are not realizing their clients are down?)

> Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

> These issues do not affect the serving of cached files via the Cloudflare CDN or other security features at the Cloudflare Edge.

> Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.


> (in the sense that they are not realizing their clients are down?)

Their own website seems down too https://www.cloudflare.com/

--

500 Internal Server Error

cloudflare


>Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed.

"Might fail"


well it does say that now, so…

which datacenter got flooded?


> In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary. Dec 05, 2025 - 09:00 UTC

It's a scheduled maintenance, so the SLA should not apply, right?


https://updog.ai/status/cloudflare reported the incident 13 minutes ago (at the moment of writing this).


Yeah, their status site reports nothing, but then clicking on some of the links on that site brings you to the 500 error.


Company internal status pages are always like this. When you don't report problems they don't exist!


It’s wild how none of the big corporations can make a functional status page.


They could, but accurate reporting is not good for their SLAs


They can. They don't want to though.


Management is always going to take too long (in an engineer’s opinion) to manually throw the alerts on. They’re pressing people for quick fixes so they can claim their SLAs are intact.


They were intending to start a maintenance window starting 6 minutes ago, but they were already down by then.


There is an update:

"Cloudflare Dashboard and Cloudflare API service issues"

Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

Customers using the Dashboard / Cloudflare APIs are impacted as requests might fail and/or errors may be displayed. Dec 05, 2025 - 08:56 UTC


Not weird, that’s tradition by now.


Interesting, I get a 500 if I try to visit coinbase.com, but my WebSocket connections to advanced-trade-ws.coinbase.com are still live with no issues.


probably these websockets are not going through cloudflare


> In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary. Dec 05, 2025 - 07:00 UTC

Something must have gone really wrong.


It's 1AM in San Francisco right now. I don't envy the person having to call Matthew Prince and wake him up for this one. And I feel really bad for the person that forgot a closing brace in whatever config file did this.


Agreed, I feel bad for them. But mostly because Cloudflare's workflows are so bad that you're seemingly repeatedly set up for really public failures. Like, how does this keep happening without leadership's heads rolling? The culture clearly is not fit for their level of criticality.


> The culture clearly is not fit for their level of criticality

I don't think anyone's is.


How often do you hear of Akamai going down? And they host a LOT more enterprise/high-value sites than Cloudflare.

There's a reason Cloudflare has been really struggling to get into the traditional enterprise space and it isn't price.


A quick google turned up an Akamai outage in July that took Linode down and two in 2021. At that scale nobody's going to come up smelling like roses. I mostly dealt with Amazon crap at megacorp, but nobody that had to deal with our Akamai stuff had anything kind to say about them as a vendor.

At first blush it's getting harder to "defend" use of Cloudflare, but I'll wait until we get some idea of what actually broke. For the time being I'll save my outrage for the AI scrapers that drove everyone into Cloudflare's arms.


Was it a CDN or Linode failure?


The last place I heard of someone deploying anything to Akamai was 15 years ago in FedGov.

Akamai was historically only serving enterprise customers. Cloudflare opened up tons of free plans, new services, and basically swallowed much of that market during that time period.


> I don't envy the person having to call Matthew Prince

They shouldn't need to do that unless they're really disorganised. CEOs are not there for day to day operations.


> And I feel really bad for the person that forgot a closing brace in whatever config file did this.

If a missing closing brace takes your whole infra down, my guess is that we'll see more of this.


Life hack: Announce bug that brings your entire network down as scheduled maintenance.


Yes, the incident report claims this was limited to their client dashboard. It most certainly was not. I have the PagerDuty alerts to prove it...


> Investigating - Cloudflare is investigating issues with Cloudflare Dashboard and related APIs.

They seem to now, a few min after your comment


I'm much more concerned with customer sites being down, which the status page indicates are not impacted. They are. :/


They have enough data to at least automate yellow.


The AI agents can't help out this time.


Maybe we can go back to Stack Overflow :)


Now showing a message, posted at 08:56 UTC.


Yes, it’s really ‘weird’ that they refuse to share any details. Completely unlike AWS, for example. As if being open about issues with their own product wouldn’t be in their best interest. /s



