Hacker News

The NY study is preliminary, not peer reviewed. The sampling strategy used over-represents people going out, and we have no information on which test was used or its accuracy.

Given antibody testing done in places like Vô, the NY results seem a bit of a stretch.

Hopefully Japan gets it together and starts testing properly, though.



There are two NYC studies, each with a likely sampling bias:

- the NYC antibody study you mention, done at shopping centers, which indeed likely over-represents people going out. 20%-21% of that population has antibodies.

- the SARS-CoV-2 testing study of pregnant women [1], tested just before delivery. Among those, 13% tested positive. It is reasonable to expect that this study over-represents subjects who barely go out: pregnant women tend to go out as little as possible to protect themselves and their future baby.

Because the sampling biases run in opposite directions in the two studies, the truth is likely in between. In terms of antibodies, it is likely that the pregnant women who tested positive weeks ago have now developed antibodies or will do so in the next 1-2 weeks. The 13% figure also excludes pregnant women who had already cleared the infection and developed antibodies before delivery (RT-PCR only detects active infection); another phenomenon that may push the true rate above 13%.

[1]: https://www.nejm.org/doi/full/10.1056/NEJMc2009316


"the NYC antibody study one you mention, done at shopping centers, indeed likely over-represents people going out."

The sampling was done at grocery stores. Non-essential businesses are closed in New York, and most people here still have to go out to get groceries.

There is no evidence that this has led to an oversample; these are hypotheses advanced by people on Reddit and HN.


> Vô

Nitpick: it's Vò https://it.wikipedia.org/wiki/Vo%27

I used to ride my bike through there on many a Sunday when I lived in Padova.


The NY study does align fairly well with similar studies done in Santa Clara County and Los Angeles County in California, though. All three indicate a significantly lower IFR than previously expected.


> All three indicate a significantly lower IFR than previously expected.

The NY study, if it holds up, suggests an IFR in the 0.8%-1.0% range for NYC (depending on whether you include the additional excess deaths), which is in the range most experts have been assuming (0.5%-1.0% is commonly cited). For example, the Imperial College model used a 0.9% IFR as an input. Additionally, a 10x ratio of actual cases to confirmed cases is in the range most experts were assuming.
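As a rough back-of-the-envelope check of that 0.8%-1.0% figure (the population and death counts here are my own assumptions for late April 2020, not numbers from the study):

```python
# Assumed figures, order-of-magnitude only:
nyc_population = 8_400_000     # assumed NYC population
seroprevalence = 0.21          # ~21% antibody-positive, per the NY study
confirmed_deaths = 15_000      # assumption; higher if excess deaths are included

# Seroprevalence implies the total number of infections so far.
implied_infections = nyc_population * seroprevalence
ifr = confirmed_deaths / implied_infections

print(f"implied infections: {implied_infections:,.0f}")
print(f"implied IFR: {ifr:.2%}")
```

With these assumptions the implied IFR lands around 0.85%, inside the range discussed above; including excess deaths pushes it toward 1%.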

The two CA studies were outliers (and drew significant, substantive critiques), suggesting an IFR as much as 10x lower than the NY study does. I wouldn't say those two studies align with the NY study.


FYI Geneva did a representative study: https://www.hug-ge.ch/medias/communique-presse/seroprevalenc...

They estimate ~27k infections on April 17th. For comparison, the cantonal authorities currently report 213 deaths (likely an undercount; AFAIK only hospital deaths are included) and 4,726 confirmed cases.

So a lower bound of 0.7% for the IFR seems reasonable (and in line with other studies).
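The arithmetic behind that lower bound, using the figures above:

```python
deaths = 213                    # deaths reported for the canton (likely undercounted)
estimated_infections = 27_000   # seroprevalence-based estimate for April 17th
confirmed_cases = 4_726

# Deaths over estimated infections gives an IFR lower bound
# (lower bound because deaths are likely undercounted and lag infections).
ifr_lower_bound = deaths / estimated_infections

# The same numbers also give an undercounting ratio for cases.
underreporting = estimated_infections / confirmed_cases

print(f"IFR lower bound: {ifr_lower_bound:.2%}")
print(f"actual-to-confirmed case ratio: ~{underreporting:.1f}x")
```

That works out to roughly 0.79% IFR and a ~5.7x case undercount for Geneva.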


Do you have any links handy discussing the issues? I have not come across anything like that in my reading and would like to read more on the critiques.


Sure!

Andrew Gelman (Stats at Columbia) had a commonly shared piece: https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaw...

Also a good dive into the issues: https://medium.com/@balajis/peer-review-of-covid-19-antibody...

Mercury News also had a good article covering a lot of this: https://www.mercurynews.com/2020/04/20/feud-over-stanford-co...

And yes, lots of twitter discussions from folks in the field, e.g. Natalie Dean of University of Florida https://twitter.com/nataliexdean/status/1251309217215942656 and Trevor Bedford (Fred Hutchinson) https://twitter.com/trvrb/status/1251332447691628545 and others.


The most interesting thing (to me) about the Gelman page is that by the PPPS, he's hedging all of his most significant criticisms:

"The data as reported are also consistent with infection rates of 2% or 4%. Indeed, as I wrote above, 3% seems like a plausible number. As I wrote above, “I’m not saying that the claims in the above-linked paper are wrong,” and I’m certainly not saying we should take our skepticism in their specific claims and use that as evidence in favor of a null hypothesis. I think we just need to accept some uncertainty here. The Bendavid et al. study is problematic if it is taken as strong evidence for those particular estimates, but it’s valuable if it’s considered as one piece of information that’s part of a big picture that remains uncertain. When I wrote that the authors of the article owe us all an apology, I didn’t mean they owed us an apology for doing the study, I meant they owed us an apology for avoidable errors in the statistical analysis that led to overconfident claims. But, again, let’s not make the opposite mistake of using uncertainty as a way to affirm a null hypothesis."

The twitterthink reaction to this study has been vicious, mostly based on amateur re-hashes of the Gelman critique, which even Gelman himself doesn't really believe.


The study pre-print is published and some of the numbers are publicly available, so we don't need to relay claims second-hand between one person and another, or bring Twitter users into the mix. (I didn't even realize this was being criticized on Twitter, as I don't really use the service.) Gelman's critique is quite substantive, and commenters on Gelman's post have built Bayesian analyses that incorporate the uncertainty from test sensitivity and specificity.

When I made one in PyMC3 (which lined up with a commenter's approach in PyStan), the 97% CI I got for the prevalence, based on the non-poststratified data, was (-0.3%, 1.7%). What does that mean? The test just isn't accurate enough to let us draw any conclusions; not that the null hypothesis is correct, nor that we can reject it.

There's nothing wrong with performing the study. Indeed, publishing it allows us to have these vigorous debates about methods, and helps future trials be more exact and avoid the same problems as previous studies. But trying to extrapolate a conclusion for something as important as COVID from studies with extremely high uncertainty is highly irresponsible. Sometimes we have to accept that reaching statistically significant conclusions is difficult.
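For anyone who wants to reproduce the shape of this analysis without PyMC3, here is a numpy-only importance-sampling sketch. The sample size and the calibration counts are my assumptions, chosen only so the implied test-error intervals are in the same ballpark as the figures quoted in the critiques; this is not the preprint's actual calibration data.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 200_000

# Survey result: roughly 1.5% raw positive rate (sample size assumed).
n_tests, n_pos = 3330, 50

# Hypothetical calibration counts (Beta posteriors with flat priors):
spec = rng.beta(397 + 1, 2 + 1, size=M)   # ~2 false positives in 399 known negatives
sens = rng.beta(78 + 1, 7 + 1, size=M)    # ~78 of 85 known positives detected

prev = rng.uniform(0.0, 0.10, size=M)     # flat prior over plausible prevalence

# Probability a random test comes back positive: true positives + false positives.
p_pos = prev * sens + (1.0 - prev) * (1.0 - spec)

# Importance weights: binomial likelihood of the observed positive count
# (log-space for numerical stability; the constant term cancels).
logw = n_pos * np.log(p_pos) + (n_tests - n_pos) * np.log1p(-p_pos)
w = np.exp(logw - logw.max())
w /= w.sum()

# Weighted resampling gives approximate draws from P(prevalence | data).
post = prev[rng.choice(M, size=M, p=w)]
lo, hi = np.percentile(post, [1.5, 98.5])
print(f"posterior mean prevalence: {post.mean():.2%}")
print(f"97% interval: ({lo:.2%}, {hi:.2%})")
print(f"P(prevalence >= 1.5%) = {np.mean(post >= 0.015):.1%}")
```

One difference from the normal-approximation CI quoted above: this parameterization constrains prevalence to be non-negative, so the interval's lower end piles up near zero instead of going negative, while the qualitative conclusion (the interval spans near-zero to well above the point estimate) is the same.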


"When I made one in PyMC3 (which lined up with a commenter's approach with PyStan), the 97% CI for the prevalence based on the non-poststratified data I got had the prevalence between (-0.3%, 1.7%). What does that mean? The test just isn't certain enough to allow us to make any conclusions, not that the null hypothesis is correct or that we can reject the null hypothesis."

Yeah, that doesn't sound substantially different from Gelman's frequentist intuition in the blog post. I'm not sure the more complex methods are adding much here, except that you can now examine the posterior, and see what portion of the density lies below zero (i.e. probably not much of it).

IMO the "CI includes zero" argument was weak when Gelman advanced it, because even though zero is formally inside the interval, it was clear from the assay error rates that it sits far out in the tails; even if the 95% interval includes zero, very few repeated samples would actually produce it. So at the end of the day, as you have demonstrated, you get a non-post-stratified posterior that encompasses the point estimate they gave (1.5%), but your confidence interval is different, and perhaps the mean is lower.

Now you're just left with debating the validity of the bias adjustments they made.

That said, it's wrong to frame this in terms of a "rejecting the null hypothesis". There's no hypothesis in an observational study like this.


> So at the end of the day, as you have demonstrated, you get a non-post-stratified posterior that encompasses the point estimate they gave (1.5%), but your confidence interval is different, and perhaps the mean is lower.

You cannot use confidence intervals to argue the validity of a point estimate inside of the CI. When using frequentist methods, we usually have some sort of control group where we can use a paired test to compare sample means in order to reject a hypothesis.

I wanted to use Bayesian methods not because they were more complex, but because I felt that, when a control group is not available, a Bayesian analysis would be much more explicit about surfacing uncertainty. Bayesian methods also let us actually simulate P(prevalence | data). And no, just because 1.5% is within the 95th percentile of the posterior prevalence does not mean you can say that 1.5% is a valid estimate. What the CI shows is that, with 97% confidence, the prevalence is somewhere between -0.3% and 1.7%. Additionally, the mean of this posterior came out to 0.8% prevalence, which to me is as good as saying the result is inconclusive. In fact, if we use the median of P(prevalence | data), we also get very close to 0.8%, so this test is basically showing that the prevalence in this population is negligible.


"You cannot use confidence intervals to argue the validity of a point estimate inside of the CI."

You're using a Bayesian method, so you have a posterior distribution. You can sample from it.

"And no, just because 1.5% is in the 95th percentile of the posterior prevalence, does not mean you can say that 1.5% is a valid estimate."

You told me that was the confidence interval on the parameter. The confidence interval contains the point estimate for the original study. It's as valid as any other point within the confidence interval. As you say: "you cannot use confidence intervals to argue the validity of a point estimate inside the CI".

"What the CI shows is that, with 97% confidence, the prevalence is somewhere between -0.3% and 1.7%."

Which includes 1.5%.


> You told me that was the confidence interval on the parameter. The confidence interval contains the point estimate for the original study. It's as valid as any other point within the confidence interval. As you say: "you cannot use confidence intervals to argue the validity of a point estimate inside the CI".

> Which includes 1.5%.

And everything else in the CI. If we're treating this like a CI, then it's like saying a die will land on 1 just because it's equally likely to land on 6.

The actual P(prevalence >= 1.5% | data) is quite low, at about 3%.


"And everything else in the CI. If we're treating this like a CI, then it's like saying a die will land on 1 just because it's equally likely to land on 6. The actual P(prevalence >= 1.5% | data) is quite low, at about 3%."

You just said that you can't use a CI to estimate the likelihood of any point within the CI (you actually can, for well-behaved problems, but I digress) when I commented that 0% isn't a likely outcome within the interval.

Literally the same argument. If you want to argue that 1.5% is unlikely, then you have to accept that 0% is unlikely for the same reasons.


A lot of the discussion is happening on twitter. One such thread:

https://twitter.com/wfithian/status/1252692357788479488

> I have been corresponding with the authors of the well-known Santa Clara County COVID-19 preprint, and I am alarmed at their sloppy behavior. The confidence interval calculation in their preprint made demonstrable math errors - 'not' just questionable methodological choices.

..

> The errors are not debatable and can be seen in these two screenshots of the supplement: 0.0034, the standard error meant to measure uncertainty about prevalence pi, is not the square root of 0.039, and the variance of a binomial estimate of proportion depends on the sample size.

Another critique:

https://twitter.com/jjcherian/status/1251272333177880576

> Ok, so what's wrong with the confidence intervals in this preprint? Well they publish a confidence interval on the specificity of the test that runs between 98.3% and 99.9%, but only 1.5% of all the tests came back positive!

> That means that if the true specificity of the test lies somewhere close to 98.3%, nearly all of the positive results can be explained away as false positives (and we know next to nothing about the true prevalence of COVID-19 in Santa Clara County)

> They report a 95% confidence interval for the prevalence of COVID-19 in Santa Clara County that runs from 2.01% to 3.49% though! That seems oddly narrow, given that they have already shown that it is within the realm of possibility that the data collected are all false positives!
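The false-positive arithmetic in that thread is easy to reproduce (the sample size here is my assumption; the preprint tested roughly 3,330 people):

```python
n_tests = 3330                  # assumed sample size
raw_positive_rate = 0.015       # ~1.5% of tests came back positive

observed_pos = round(raw_positive_rate * n_tests)

# Low end of the published specificity CI quoted above.
worst_case_specificity = 0.983
expected_false_pos = (1 - worst_case_specificity) * n_tests

print(f"observed positives: {observed_pos}")
print(f"expected false positives at 98.3% specificity: {expected_false_pos:.0f}")
```

At the low end of the specificity interval, false positives alone (~57) would exceed the observed positive count (~50), which is why a prevalence CI that excludes zero looks inconsistent with the test-validation data.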



There are major statistical problems with the Santa Clara and Los Angeles studies though. Namely selection bias, and using a test that is inaccurate enough that all reported positives are plausibly false positives.

That said, IFRs in the 1% or less range have been projected for some time, and everyone who pays attention to the numbers knows that reported cases dramatically understate reality (the only debate is over how much).


That would only be true if their tests work well. In my experience, IgG and IgM suck big time, and I wouldn’t trust results. Would love to see one of those randomized studies with qRT-PCR tests.


The problem with RT-PCR is that it only picks up viral RNA from a currently active infection, while IgG and IgM (antibody) tests will tell you whether the person tested has been infected in the past.

Also, depending on how effective the detected antibodies are at fighting off the infection, we might gain insight into how much immunity people have and how long it lasts.

RT-PCR will not tell you any of that.


You’re making a massive assumption there: that IgG and IgM tests tell you whether the person has been infected in the past. To which my experience says: no, they don’t. Both false positive and false negative rates are very high compared with people previously negative/positive on RT-PCR tests.



