That's okay, though, right? We're not talking about fully automating diagnosis and treatment. We're talking about using a tool to help doctors diagnose and treat more effectively.
I.e., it's more like predictive text input than anything else. If it makes diagnosis faster, makes hard-to-diagnose conditions easier to catch, and recommends the right treatment most of the time, then it saves the doctor some time and energy.
Having systems like autopilot in planes can make pilots worse at flying because they don't spend as much time practicing at the controls and monitoring what's happening. Then when something goes wrong, they risk not paying close enough attention to catch it, or not being practiced well enough to correct it.
If you get complacent and just assume the computer knows what it's doing (because it usually does) this can and will end very badly.
The person whose Tesla drove into a concrete barrier at 65 mph earlier this year was perfectly capable of driving a car, but they mistakenly believed that the computer had it under control.
99% Invisible has a pair of podcast episodes on the subject (2015):
Waze might be starting off as an oracle, but once you tie it into self-driving cars, it'll become an agent acting on your behalf.
At a certain tipping point, that creates really interesting questions. For example, if you re-direct 100% of traffic to the less congested route, you end up creating new traffic jams, so then you have to decide who goes on which route... maybe people who pay more get faster routing, or people deemed more important or busier by the algorithm...
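Just to make that tipping point concrete, here's a minimal sketch (hypothetical route names, capacities, and car counts, nothing to do with how Waze actually works): if the router naively sends every re-routed driver to whichever road looks less congested right now, the jam just moves; any smarter scheme has to split the drivers somehow, which is exactly where the "who gets the fast route" question appears.

```python
import copy

# Toy numbers: a nearly full highway and an emptier side street.
routes = {"highway": {"capacity": 100, "cars": 80},
          "side_street": {"capacity": 40, "cars": 10}}

def naive_reroute(routes, new_cars):
    """Send every re-routed driver to the currently least-congested road."""
    least = min(routes, key=lambda r: routes[r]["cars"] / routes[r]["capacity"])
    routes[least]["cars"] += new_cars
    return routes

def proportional_reroute(routes, new_cars):
    """Split re-routed drivers across roads in proportion to spare capacity."""
    spare = {r: max(routes[r]["capacity"] - routes[r]["cars"], 0) for r in routes}
    total = sum(spare.values()) or 1
    for r in routes:
        routes[r]["cars"] += round(new_cars * spare[r] / total)
    return routes

print(naive_reroute(copy.deepcopy(routes), 50))
# side_street ends up with 60 cars on a 40-car road: the jam just moved.
print(proportional_reroute(copy.deepcopy(routes), 50))
# highway 100/100, side_street 40/40: load spread by headroom instead,
# which already means deciding which drivers get which route.
```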
Instead of autopilot, think of this as (or as what it should be) more like the fly-by-wire systems in every modern fighter jet: the computer reads the pilot's inputs and translates them into the most effective possible control adjustments, making many more changes than a pilot would be able to make even if given the controls directly.
Or maybe the tool should be limited to retrospective evaluation of doctors' decisions. Basically an automated peer review.
I would argue against this one, simply because of the Airbus incident from some time ago where, to avoid pushing feedback into the controls, the plane was designed to average out the inputs from the two pilots. The older, more experienced pilot was attempting to maneuver the plane in a way that would correct their problem (I believe it was a stall), while the younger pilot was panicking and attempting the opposite maneuver. The plane averaged the two inputs to roughly no course adjustment, and they hit a mountainside.
I'm all for automating driving and other such tasks once the computers are ready, but until we know they're at least most likely ready to do it, I want the ability to turn it off and do it myself.
(And frankly, I want that ability anyway because I genuinely love driving and it makes me sad to think someday I won't be able to do it.)
The fly-by-wire control system ordinarily prevents stalls, but it had disengaged due to an iced over sensor and was operating without stall protection. The plane stalled at 38,000 feet and fell into the ocean.
They should have had plenty of time to correct the stall, but one pilot was pulling back his control stick (the opposite of what they needed to do), and since the two sticks aren't physically linked the other pilot didn't know he was doing that.
It's one of the topics discussed in the podcast that I linked above.
Even if you let the computer run the show wholesale it could work out net positive. Autopilot might cause problems, but it might also prevent twice as many accidents that would have been caused by pilot error.
We're still in pretty early days here, so I don't know if Watson's advice is at a point where that could be true. In the meantime, I do like the idea of having an unaided doctor and the computer program each evaluate the case independently, then afterward saying "Ok, let's see what the computer thought" before coming to any final conclusions. Maybe it'd help avoid any dangerous recommendations. Or maybe people would say "Well, the computer has a lot more data than me, it's probably right." I think that depends on whether the differences are errors similar to what a human would make, or the "oh my god, how did it even come up with this result" nonsensical errors that are easily spotted by human review.
That's one of the implicit reasons why US-based airlines in the past used to recruit retired pilots from the U.S. Air Force. In case instruments fail or give contradictory signals, pilots with physical flying skills (kinesthesia, etc.) can steer away from danger.
Pilots on both flights came from civilian backgrounds. The difference between civilian training and combat training is that the latter places more emphasis on operating in conditions that involve instrument failure, distress, etc.
Aren't both of those just examples of pilot error, regardless of autopilot?
In the case of the Asiana flight, the captain chose a visual approach when he could have let the autopilot land it. And from there it's simple pilot error.
In the case of the Air France flight, one of the pilots chose to ignore a stall warning and pull up instead of pushing down to gain speed. Sadly, with no pilot input at all the plane actually would have ended up avoiding the stall on its own.
I of course agree that military pilots with real experience will likely perform better than civilian pilots, I just doubt autopilot had any part to play in either of those accidents, like the comment you replied to implies.
There's also been a lot of discussion and research about how to best interact with autopilot, and when the best times to turn it off and operate the airplane manually are.
There is a talk[1] about this that is really popular among pilots, and I honestly think it seems increasingly relevant to the rest of the world as more and more things become computer automated in some fashion.
If it is poorly trained and recommends incorrect treatment, then no, it is a poor tool. If it recommends only in clear cases and then gives the doctor tools to narrow down treatment in more obscure cases, that would be good. But current machine learning tech, and more importantly the marketing around it, does not actually provide that capability, because the ML software does not work like human decision making. So it cannot really know when it's making a poor guess, and any insight into its decision-making process would be useless to the doctors involved. This tech is way too immature for the uses it's being put to, but the tech companies have started to believe their own hype.
It increases the probability of (harmful) medical error. The "unsafe" is a big, big gotcha for regulatory approval. As someone correctly pointed out, this is relative to human error rates (can't view article because of registration).
One way I tend to view ML is that, when it's wrong, it's catastrophically wrong. Because it doesn't actually understand anything it's doing, and is simply looking at a probability model and picking based on it, you end up with the issue that a few carefully selected pixel changes in a picture of a cat take the model from a cat to an ostrich.
The model does not see a significant difference between cat/ostrich, or cancer/cold, whereas we do; this implies that, when the model is wrong, it is likely to provide not just an incorrect treatment, but a catastrophically incorrect treatment.
Where the human sees a cat-like creature and, if not guessing a cat, guesses something similar to a cat (four-legged, furry, etc.), the ML model is willing to jump anywhere in the worst case.
So it's not just the rate of misdiagnosis that matters, but how far off the misdiagnoses are as well.
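For what it's worth, the cat-to-ostrich effect is easy to reproduce even on a toy model. Here's a minimal sketch (a hypothetical toy logistic classifier, nothing to do with Watson or any real image model) of an FGSM-style perturbation: a tiny, targeted change to every input feature flips a confident prediction, even though no single feature moved by more than a sliver.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)                   # weights of a toy logistic classifier
x = rng.normal(size=100)                   # a "normal" input
x += (3.0 - w @ x) * w / (w @ w)           # shift it so the model is confident: logit = 3.0

def score(x):
    return 1.0 / (1.0 + np.exp(-(w @ x)))  # sigmoid of the logit w.x

p = score(x)                               # ~0.95, i.e. "definitely class A"
y = 1.0                                    # the correct label

grad_x = (p - y) * w                       # gradient of the logistic loss w.r.t. the input
eps = 0.05                                 # tiny per-feature perturbation budget
x_adv = x + eps * np.sign(grad_x)          # FGSM-style step: each feature moves by at most eps

print(score(x), score(x_adv))              # ~0.95 vs. well under 0.5: the predicted label flips
print(np.max(np.abs(x_adv - x)))           # yet no single feature changed by more than 0.05
```

The point isn't the specific attack; it's that the model's "confidence" says nothing about how far away the nearest absurd answer is.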
Then some other questions: how much does it cost, compared to training a human? Does it actually save any time / resources if its outputs must always be scrutinized?
That may be a question someone outside of an FDA approval process might ask. But if you have been through one, you understand the importance of "safety and effectiveness", and you will note there is nothing there about "cost effective" or "marketable".
It depends on how doctors view IBM's magic black box. If they grow to trust and depend on it, that could become a real problem.
People have a tendency to defer, and have a bias towards deferring to machines that seemingly behave as accurately as a calculator would with basic arithmetic.
A true crisis can arise if doctors can shift liability by claiming to have just followed Watson's results.
If you genuinely don't know how to spell something (children, ESL) then auto-correct might cause you to accept a correction to the wrong homophone and fool you into thinking it helped you.
If the doctor doesn't know how to quickly and cheaply verify the machine's guess, they'll just rubber stamp its recommendations.
Even worse, depending on how well the IBM sales team pawns this hokum off as a legitimate diagnostic tool, the next logical step will be malpractice insurance companies forcing physicians to consult with Watson before making any decisions.
As for your analogy: you got that still tragically flawed predictive text technology for free. Nobody bought iPhones for their crappy predictive text capabilities. Imagine the horror if you actually had to pay for predictive text. As it is, it's a nice-to-have freebie that came along with your phone, and it works... sometimes.
To me the key is how you use the system. I like the idea of using this not to help the doctor but to second-guess them. The doctor has to make their own diagnosis; if Watson differs, an arbiter has to look at it. This obviously optimizes for quality and not cost.
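A rough sketch of that workflow (hypothetical function and field names, purely to make the idea concrete): the doctor commits to a diagnosis first, the model's suggestion is only compared afterward, and any disagreement gets escalated to an arbiter instead of silently overriding either side.

```python
from dataclasses import dataclass

@dataclass
class CaseReview:
    doctor_dx: str        # doctor's independent diagnosis, recorded first
    model_dx: str         # Watson-style suggestion, revealed only afterward
    needs_arbiter: bool   # flag for independent review when the two disagree

def second_guess(doctor_dx: str, model_dx: str) -> CaseReview:
    """Compare the doctor's diagnosis with the model's; never auto-accept the model."""
    return CaseReview(
        doctor_dx=doctor_dx,
        model_dx=model_dx,
        needs_arbiter=(doctor_dx.strip().lower() != model_dx.strip().lower()),
    )

review = second_guess("type 2 diabetes", "pancreatic cancer")
if review.needs_arbiter:
    print("Disagreement -- escalate to an arbiter:", review)
else:
    print("Doctor and model agree:", review.doctor_dx)
```

The expensive part is obviously the arbiter, which is why this setup optimizes for quality rather than cost.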
I don't know about that. Most of the new construction in Connecticut seems to be building medical offices for Hartford Healthcare or other medical groups.
If people require 100% accuracy out of the gate from any software then we might as well give up now. It's not as if doctors get it right all the time; if you know anyone with medical issues, they've likely heard different opinions from different doctors themselves. Anything more complicated than a broken bone seems to get mixed responses about what should be done, or even what the cause of the complication is in the first place.
A technology in this stage of immaturity is fine to trial, but IBM should be paying the hospitals, not the other way round. They are doing experiments with people's lives, again just like Theranos.
I.e., it's more like predictive text input than anything else. If it makes diagnosis faster, makes hard-to-diagnose conditions easier to catch, and recommends the right treatment most of the time, then it saves the doctor some time and energy.
The only question is whether it does that.