IBM's Watson recommended 'unsafe and incorrect' cancer treatments (gizmodo.com)
107 points by airstrike on July 26, 2018 | 65 comments


That's okay, though, right? We're not talking about fully automating diagnosis and treatment. We're talking about using a tool to help doctors diagnose and treat more effectively.

i.e., it's more like predictive text input than anything else. If it makes diagnosis faster, makes hard-to-diagnose cases easier, and recommends the right treatment most of the time, then it saves the doctor some time and energy.

The only question is whether it does that.


Having systems like autopilot in planes can make pilots worse at flying because they don't spend as much time practicing at the controls and monitoring what's happening. Then when something goes wrong, they risk not paying close enough attention to catch it, or not being practiced well enough to correct it.

If you get complacent and just assume the computer knows what it's doing (because it usually does) this can and will end very badly.

The person whose Tesla drove into a concrete barrier at 65 mph earlier this year was perfectly capable of driving a car, but they mistakenly believed that the computer had it under control.

99% Invisible has a pair of podcast episodes on the subject (2015):

https://99percentinvisible.org/episode/children-of-the-magen...

https://99percentinvisible.org/episode/johnnycab-automation-...


That's the difference between Oracles and Agents.

Oracle = makes suggestions.

Agent = acts on your behalf.

Waze might be starting off as an Oracle, but once you tie it into self-driving cars, it'll become an agent acting on your behalf.

At a certain tipping point, this creates really interesting questions. For example, if you redirect 100% of traffic to the less congested route, you end up creating traffic jams there, so then you have to decide who goes on which route... maybe people who pay more get faster routing, or people the algorithm deems more important or busier...
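Here's a minimal illustration of that tipping point, assuming a made-up linear latency model and invented numbers (nothing here comes from Waze or any real routing system):

    # Toy model of the routing tipping point described above: two roads, where road A's
    # travel time grows with the number of cars on it and road B is a fixed-time detour.
    # The latency function and all numbers are invented purely for illustration.
    def travel_time_a(cars_on_a):
        return 10 + 0.02 * cars_on_a   # minutes; congestion-sensitive main road

    TRAVEL_TIME_B = 30                 # minutes; uncongested but longer detour

    drivers = 2000

    # Naive oracle: everyone is told road A is "less congested" and takes it.
    naive = travel_time_a(drivers)                    # 10 + 0.02 * 2000 = 50 min for everyone

    # An agent that splits traffic until both roads take equally long does better:
    # 10 + 0.02 * x = 30  ->  x = 1000 cars on A, the rest on B.
    balanced_on_a = (TRAVEL_TIME_B - 10) / 0.02
    balanced = travel_time_a(balanced_on_a)           # 30 min for everyone

    print(naive, balanced)   # 50.0 vs 30.0

Once the router is the agent actually choosing, the question of who gets the 30-minute road and who gets the detour becomes unavoidable.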


Instead of autopilot, think of this (or what it should be) as more like the fly-by-wire systems in every modern fighter jet. The computer reads the pilot's inputs and translates them into the most effective possible control adjustments, making many more changes than a pilot could make even with direct control.

Or maybe the tool should be limited to retrospective evaluation of doctors' decisions. Basically an automated peer review.


I would argue against this one, simply because of the Airbus incident from some time ago. To avoid pushing feedback into the controls, the plane was designed to average out the inputs from the two pilots. The older, more experienced pilot was attempting to maneuver the plane to correct their problem (I believe it was a stall), while the younger pilot was panicking and attempting the opposite maneuver. The plane averaged that out to roughly no course adjustment, and they hit a mountainside.

I'm all for automating driving and other such tasks once the computers are ready, but until we know they're at least most likely ready to do it, I want the ability to turn it off and do it myself.

(And frankly, I want that ability anyway because I genuinely love driving and it makes me sad to think someday I won't be able to do it.)


For reference, I believe this is the crash you're referring to: https://en.wikipedia.org/wiki/Air_France_Flight_447

The fly-by-wire control system ordinarily prevents stalls, but it had disengaged due to an iced over sensor and was operating without stall protection. The plane stalled at 38,000 feet and fell into the ocean.

They should have had plenty of time to correct the stall, but one pilot was pulling back his control stick (the opposite of what they needed to do), and since the two sticks aren't physically linked the other pilot didn't know he was doing that.

It's one of the topics discussed in the podcast that I linked above.


Yeah that's the one.


Even if you let the computer run the show wholesale it could work out net positive. Autopilot might cause problems, but it might also prevent twice as many accidents that would have been caused by pilot error.

We're still in pretty early days here, so I don't know if Watson's advice is at a point where that could be true. In the meantime, I do like the idea of having an unaided doctor and the computer program evaluate the case independently, then afterward saying "Ok, let's see what the computer thought" before coming to any final conclusions. Maybe it'd help avoid dangerous recommendations. Or maybe people would say "Well, the computer has a lot more data than me, it's probably right." I think that depends on whether the differences are errors similar to what a human would make, or the "oh my god, how did it even come up with this result" nonsensical errors that are easily spotted by human review.


That's one of the implicit reasons why US-based airlines in the past used to recruit retired pilots from the U.S. Air Force. When instruments fail or give contradictory signals, pilots with physical flying skills (kinesthesia, etc.) can steer away from danger.

Two incidents demonstrate that:

1. https://en.wikipedia.org/wiki/Asiana_Airlines_Flight_214

2. https://en.wikipedia.org/wiki/Air_France_Flight_447

Pilots on both flights came from civilian backgrounds. The difference between civilian training and combat training is that the latter puts more emphasis on operating in conditions involving instrument failure, distress, etc.


Aren't both of those just examples of pilot error, regardless of autopilot?

In the case of the Asiana flight, the captain chose a visual approach when he could have let the autopilot land it. And from there it's simple pilot error.

In the case of the Air France flight, one of the pilots chose to ignore a stall warning and pull up instead of pushing down to gain speed. Sadly, with no pilot input at all, the plane actually would have avoided the stall on its own.

I of course agree that military pilots with real experience will likely perform better than civilian pilots; I just doubt autopilot had any part to play in either of those accidents, as the comment you replied to implies it did.


There's also been a lot of discussion and research about how to best interact with autopilot, and when the best times to turn it off and operate the airplane manually are.

There is a talk[1] about this that is really popular among pilots, and I honestly think it seems increasingly relevant to the rest of the world as more and more things become computer automated in some fashion.

[1] - https://www.youtube.com/watch?v=pN41LvuSz10


If it is poorly trained and recommends incorrect treatments, then no, it is a poor tool. If it recommended treatments only in clear cases and then gave the doctor tools to narrow things down in more obscure cases, that would be good. But current machine learning tech, and more importantly its marketing, does not actually provide that capability, because the ML software does not work like human decision making. It cannot really know when it's making a poor guess, and any insight into its decision-making process would be useless to the doctors involved. This tech is way too immature for the uses it's being put to, but the tech companies have started to believe their own hype.
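To illustrate the "cannot know when it's making a poor guess" point in the simplest possible terms (the numbers below are invented, and this is not Watson's architecture, just a generic softmax classifier):

    # A plain softmax classifier always returns *some* answer with a probability
    # attached, and nothing in that number says "I don't know".
    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    # Garbage logits from an input unlike anything the model was trained on:
    logits = np.array([2.1, 1.9, 2.0])
    probs = softmax(logits)
    print(probs.argmax(), probs.max())   # it still "diagnoses" class 0, at ~37% confidence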


No it's not okay!

It increases the probability of (harmful) medical error. The "unsafe" is a big, big gotcha for regulatory approval. As someone correctly pointed out, this is relative to human error rates (can't view article because of registration).


The other question is how it compares to human doctors, who also sometimes recommend unsafe or wrong treatments.


One way I tend to view ML is that when it's wrong, it's catastrophically wrong. Because it doesn't actually understand anything it's doing, and is simply looking at a probability model and picking based off it, you end up with the issue that a few carefully selected pixel changes in a picture of a cat take the model from "cat" to "ostrich".

The model does not see a significant difference between cat/ostrich, or cancer/cold, whereas we do; this implies that when the model is wrong, it is likely to provide not just an incorrect treatment, but a catastrophically incorrect one.

Where a human sees a cat-like creature and, if not guessing "cat", guesses something similar to a cat (four-legged, furry, etc.), the ML model is willing to jump anywhere in the worst case.

So it's not just the rate of misdiagnosis that matters, but how badly it misses as well.
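For anyone unfamiliar with these adversarial examples, here is a minimal sketch of the kind of attack being described (an FGSM-style perturbation); the pretrained model, the epsilon value, and the helper name are illustrative choices, not anything from the article or from Watson:

    # Minimal FGSM-style sketch of the "cat -> ostrich" failure mode described above.
    import torch
    import torchvision.models as models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

    def fgsm_perturb(image, true_label, epsilon=0.01):
        """Nudge each pixel by at most epsilon in the direction that increases the loss."""
        image = image.clone().requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(image), true_label)
        loss.backward()
        return (image + epsilon * image.grad.sign()).detach()

    # `image` should be a (1, 3, 224, 224) tensor normalized the way the model expects,
    # and `true_label` a tensor like torch.tensor([281]) (an ImageNet "tabby cat" index).
    # A perturbation this small is invisible to a human but can flip the predicted class.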


Some other questions: how much does it cost, compared to training a human? Does it actually save any time or resources if its outputs must always be scrutinized?


That may be a question someone outside of an FDA approval process might ask. But if you have been through one, you understand the importance of "safety and effectiveness", and you will note there is nothing in there about "cost effective" or "marketable".


It depends on how doctors view IBM's magic black box. If they grow to trust and depend on it, that could become a real problem.

People have a tendency to defer, and have a bias towards deferring to machines that seemingly behave as accurately as a calculator would with basic arithmetic.

A true crisis can arise if doctors can shift liability by claiming to have just followed Watson's results.


If you genuinely don't know how to spell something (children, ESL) then auto-correct might cause you to accept a correction to the wrong homophone and fool you into thinking it helped you.

If the doctor doesn't know how to quickly and cheaply verify the machine's guess, they'll just rubber stamp its recommendations.


Even worse, depending on how well the IBM sales team pawns this hokum off as a legitimate diagnostic tool, the next logical step will be malpractice insurance companies forcing physicians to consult Watson before making any decisions.


Not really... I mean, it's great for IBM financially, but not all that great for the hospitals and medical centers duped into this scam:

"University of Texas says the project cost MD Anderson more than $62 million"

source: MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine https://www.forbes.com/sites/matthewherper/2017/02/19/md-and...

As for your analogy: you got the still tragically flawed predictive-text technology for free. Nobody bought iPhones for their crappy predictive text capabilities. Imagine the horror if you actually had to pay for predictive text. As it is, it's a nice-to-have freebie that came along with your phone, and it works... sometimes.


To me the key is how you use the system. I like the idea of using this not to help the doctor but to second-guess them. The doctor has to make their own diagnosis; if Watson differs, an arbiter has to look at it. This obviously optimizes for quality and not cost.
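As a rough sketch of that workflow, with entirely hypothetical names (this is not a real Watson or hospital API):

    # The doctor and the model diagnose independently; a human arbiter is pulled in
    # only when they disagree, which is what keeps the review cost bounded.
    def final_diagnosis(doctor_diagnosis, model_diagnosis, arbiter):
        """Return the diagnosis to act on; escalate only on disagreement."""
        if doctor_diagnosis == model_diagnosis:
            return doctor_diagnosis                         # agreement: no extra review
        return arbiter(doctor_diagnosis, model_diagnosis)   # disagreement: human decides

    # Example: the arbiter is just another clinician reviewing both answers.
    senior_reviewer = lambda doc, model: doc  # placeholder policy: trust the doctor by default
    print(final_diagnosis("chemo regimen A", "chemo regimen A", senior_reviewer))
    print(final_diagnosis("chemo regimen A", "blood thinner", senior_reviewer))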


> We're talking about using a tool to help doctors diagnose and treat more effectively.

A very expensive tool when healthcare systems are strapped for cash. Super unethical of IBM to try and extract profits here.


"healthcare systems are strapped for cash"

I don't know about that. Most of the new construction in Connecticut seems to be building medical offices for Hartford Healthcare or other medical groups.


>A very expensive tool when healthcare systems are strapped for cash. Super unethical of IBM to try and extract profits here.

Do you realize they would have never developed Watson if there weren't profits involved?


I have no objection to them selling a working technology but they knew Watson was pure snake oil and did it anyway. Same as Theranos.


If people require 100% accuracy out of the gate from any software, then we might as well give up now. It's not as if doctors get it right all the time; if you know anyone with medical issues, they've likely heard different opinions from different doctors themselves. Anything more complicated than a broken bone seems to get mixed responses about what should be done, or even what the cause of the complication is in the first place.


A technology at this stage of immaturity is fine to trial, but IBM should be paying the hospitals, not the other way round. They are doing experiments with people's lives, again just like Theranos.


We're putting a lot of faith in tools that can best be described as immature. I don't think it's out of the question to get A.I. (not speaking of Watson, which is A.I.-adjacent) to the point where it can perform perfectly at human intelligence tasks. The point is to have these systems perform better than their human counterparts (at least, I think that should be the aim for something like Watson for Health). The people who built these things are still learning themselves. As we make more progress in technology and AI, I don't doubt that we could break that barrier. There may be an upper limit, because maybe the human brain is incapable of solving some problems, but even there I think it's not difficult to imagine workarounds. Simply put, I don't think the answer is to assume these problems could never be solved.

Additionally, human-assisted A.I. is not the solution; it's a non-answer to the problem of creating systems that can think and perform at human levels of intelligence. It's okay to admit we don't yet have the ability to make these things, but it's disingenuous to believe that human involvement in helping computers get to the right answer is itself the right answer. Though yes, we need this right now to move things along where they would otherwise stand still.


Totally agree. Applying existing ML/"AI" to treating cancer is extremely misguided, and IMO dangerous and unethical.

ML performs very well for specific well defined tasks that have an obvious outcome and are highly narrow in scope.

Cancer is a disease that we can't even treat ourselves in many cases. It requires a great deal of creativity and critical thinking to reach solutions on a case by case basis. Who is so arrogant they thought this should be replaced by a bunch of overhyped software?


There has long been a calculus of using more risky or long shot treatments for the more deadly or hopeless diseases. Certain types of cancer are essentially death sentences.

Maybe it's counterintuitive, but it makes a lot more sense to use ML for cancer than for, say, a broken arm. There are so many systems interacting in cancer that affect its progression that humans really are at their limits in trying to understand them.

And as others have said, no one is letting Baymax loose in the oncology ward and firing all the doctors. This is just one more tool in a doc's tool belt, and far from the only one that will give misleading results.


> There are so many systems interacting in cancer that affect its progression that humans really are at their limits in trying to understand them.

Bacterial infections also have a ridiculous amount of systems involved and yet that was figured out by humans.

I don't see how ML helps with cancer at all right now. The problem isn't the amount of data. It's the quality of it.


It's not dangerous and unethical if humans are there to pass final judgment on the answers. I was saying we can get to a point where we might not even need that. This technology is in development; it has a lot of potential it can, and possibly will, reach. Stories like these are good for caution, but that doesn't mean we shouldn't use these tools here.


"We're putting a lot of faith in..."

OK. Back in the real world, successful AI companies build useful software products that inform human decision making but do not directly, unilaterally, and without intermediation result in real-world kinetic action.


I suggest holding off on the knee-jerk reaction until you read the whole post. I never said these tools will never be worthy of the trust and performance we attribute to humans, but at their current point there are some tasks they're not good at, because either the tool is limited by design or hardware, or we just haven't found the right solution yet.


Honest question: how much top-tier ML research comes out of IBM Research? I feel like they get what they pay for, and they pay shit from what I hear.


I don't think that top-tier research is necessarily equal to top pay. Otherwise universities would not stand any chance.

That said, I don't think IBM Watson has published much top-tier research.


University work tends to pay in different ways. Lots of opportunity for your own projects, cool toys, small teams that report directly to the owner.


Academia is not the same as industry, so total comp is not comparable. Tenure, more work freedom, social prestige are intangibles that would need to be factored into academic pay.


I worked at TJ Watson research center at the end of the 20th century but not in a research role or on ML.

It's weird because IBM spends a lot on research ($6 billion when I was there), but they are always trying to get marketable things out of said research. They still had chip fabs when I was there, so they had people working on chemistry, physics, and math for chip things. They had chess-playing machines (Deep Blue) and a backgammon one using AI.

They hired a lot of PhDs who were for the most part very self-motivated. Very smart people. I'm not sure if they are still getting the best and brightest, but for a lot of people the IBM letters are a draw.


I'd be very interested in others' opinions, but my observation is that high-quality research != good products. There's a major disconnect between the labs and those building/selling the products.


Should have partnered with Google if you actually wanted something that works. IBM had its heyday, but the brain drain of talent and knowledge has already happened and it likely will never recover.


Is that also true of their quantum computing researchers? It does look like they put out some pretty decent research in that area.


The quality of publications coming out of Google far surpasses IBM's, both in terms of novelty and content, in my opinion.

Why not look through the papers each company has released though and decide for yourself? https://ai.google/research/teams/applied-science/quantum-ai/ https://www.ibm.com/blogs/research/category/quantcomp/


The headline, and the conclusion of the article, feel incomplete. A complete headline would read "IBM Watson Reportedly Recommended 'Unsafe and Incorrect' Cancer Treatments At A Rate X% Higher Than Doctors." Doctors make mistakes. Machine-based intelligences will make mistakes too. If you throw out a system that errs in 1 in every 100,000 cases so that you can stick with a system that errs in 1 in every 250 cases, you're not going to get anywhere.
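To put those side by side (the 1-in-100,000 and 1-in-250 figures are this comment's own hypothetical illustration, not data from the article):

    # Hypothetical error rates from the comment above, applied to one million patients.
    patients = 1_000_000
    machine_errors = patients / 100_000   # 10 errors
    doctor_errors = patients / 250        # 4,000 errors
    print(machine_errors, doctor_errors)  # rejecting the 1-in-100,000 system keeps the 4,000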


Sounds like they prepared limited and bad training data for it. As usual, garbage in, garbage out.


I work in this space. The breakthrough will most likely come from computational biologists, not an external ML group. Liquid biopsy is close, and that's the domain that reeeallly needs ML/AI in addition to strong mechanistic models.


What’s liquid biopsy?


Here's a nice review: https://www.ncbi.nlm.nih.gov/m/pubmed/28233803/

A biopsy from your peripheral plasma, as opposed to a solid tumor. A big issue in cancer medicine is that it's super fucking hard to get any kind of noninvasive measurement. Typically it's done with surgery or with CAT scans, which have extremely low precision.

Research in the last few years has been pointing to the idea that we can detect tumor-derived DNA fragments in plasma. The challenge is that healthy DNA makes up 99.9% of it, which means the current methods only work when the tumor burden (metastatic disease, for instance) is high. Not ideal for early detection or treatment monitoring. But if the computational tools improve (right now none assume such a low mixture), you could see sensitivity and precision increase to the point that it's useful for predicting therapeutic outcomes and for early detection.
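Roughly, the estimation problem looks like the toy sketch below. Everything in it (the site counts, the pure binomial model, the heterozygous-variant assumption) is a made-up illustration, not how production ctDNA callers actually work:

    # Toy sketch of estimating circulating tumor DNA fraction from read counts
    # at known somatic variant sites, under a simple binomial mixture model.
    import numpy as np
    from scipy.optimize import minimize_scalar

    # (alt_reads, total_reads) at a few somatic variant sites; a heterozygous tumor
    # variant is expected at roughly half the tumor fraction in plasma.
    sites = [(3, 5000), (5, 5200), (2, 4800), (4, 5100)]

    def neg_log_likelihood(tumor_fraction):
        expected_vaf = max(tumor_fraction / 2, 1e-6)   # heterozygous-variant assumption
        ll = 0.0
        for alt, total in sites:
            ll += alt * np.log(expected_vaf) + (total - alt) * np.log(1 - expected_vaf)
        return -ll

    result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 0.5), method="bounded")
    print(f"estimated tumor fraction ~ {result.x:.4%}")   # well under 1% for these counts

The point of the sketch is just the scale: at these read depths the tumor signal is a handful of reads against thousands of healthy ones, which is why the current methods only work at high tumor burden.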


I remember there was an article about how Watson was recommending treatments that doctors had never considered, and that they were more effective.

I guess not; they were not even relevant, which is why doctors were not recommending them.


Both that story and this one can be true.

It can recommend unconsidered, more effective treatments just as readily as it can recommend unconsidered, dangerous ones, particularly if the system was extensively trained on MSK physicians' preferred treatments instead of (or in addition to) actual clinical data, as The Fine Article suggests.


Watson is medical AI's Theranos. That IBM would actually try and sell something like this doesn't bode well for their future.


Something out of science fiction: would an "issue" like this ever be the type of problem where the AI can "understand" the cancer better than we can, and see where a certain treatment might be more beneficial than anything we've thought of?

Or is this just a plain old failure?


"Watson" is a marketing term for a bunch of slap-dash and half-assed analytics products that don't work.

It was never more than a cool Jeopardy robot and some sweet Bob Dylan ads


The example given in the article was that Watson recommended giving a blood-thinning agent to someone with "severe bleeding", so I think, at least in this example, there's a long way to go until Watson is ready to diagnose without a doctor's consent and review.


Not to mention that even if the AI technology here was decent (which it doesn't appear to be) it suffered greatly from a dearth of samples. Machine learning requires big data, and cancer treatment data is, in many instances, just not big enough. An AI solution would need millions of well-labeled, perfectly formatted cases to learn from; there may be thousands of those for a particular subtype of cancer, but there's not millions.


Not if you understand mechanism. In reality the breakthrough will be biologists applying ML/AI, not the other way around. Domain and mechanistic knowledge are king.


>Out of something from science fiction.. would an "issue" like this ever be the type of problem where the AI can "understand" the cancer better

Ever? I don't see why not. In our lifetime? Probably not and especially not with current technology.


Anyone who has done their research would have known IBM Watson is a scam. IBM became a completely different company the moment it discovered it could make far more money just by providing extended consulting services to many governments.


Really makes me wonder if AI will ever be able to model the heuristics that domain experts use on a daily basis.


To err is human. Thus IBM Watson, with all its random clockwork, has to work on human error.


Watson was probably right but it didn't fall in line with BigPharma's $$$ plans.



The new url is paywalled though...


Oops. Changed back. Thanks!



