Artificial intelligence holds great promise for healthcare, and it is already being put to use by many forward-looking hospitals and health systems.
One challenge for healthcare CIOs and clinical users of AI-powered health technologies is the biases that may pop up in algorithms. These biases, such as algorithms that improperly skew results because of race, can compromise the ultimate work of AI – and clinicians.
We spoke recently with Dr. Sanjiv M. Narayan, co-director of the Stanford Arrhythmia Center, director of its Atrial Fibrillation Program and professor of medicine at Stanford University School of Medicine. He offered his perspective on how biases arise in AI – and what healthcare organizations can do to prevent them.
Q. How do biases make their way into artificial intelligence?
A. There is an increasing focus on bias in artificial intelligence, and while there is no cause for panic yet, some concern is reasonable. AI is embedded in systems from wall to wall these days, and if these systems are biased, then so are their results. This may benefit us, harm us or benefit someone else.
A major issue is that bias is rarely obvious. Think about your results from a search engine “tuned to your preferences.” We already are conditioned to expect that this will differ from somebody else’s search on the same topic using the same search engine. But are these searches really tuned to our preferences, or to someone else’s preferences, such as a vendor’s? The same applies across all systems.
Bias in AI occurs when results cannot be generalized widely. We often think of bias resulting from preferences or exclusions in training data, but bias can also be introduced by how data is obtained, how algorithms are designed, and how AI outputs are interpreted.
How does bias get into AI? Everybody thinks of bias in training data – the data used to develop an algorithm before it is tested on the wide world. But this is only the tip of the iceberg.
All data is biased. This is not paranoia. This is fact. Bias may not be deliberate. It may be unavoidable because of the way that measurements are made – but it means that we must estimate the error (confidence intervals) around each data point to interpret the results.
Think of heights in the U.S. If you collected them and put them all onto a chart, you’d find overlapping groups (or clusters) of taller and shorter people, broadly indicating adults and children, and those in between. However, who was surveyed to get the heights? Was this done during the weekdays or on weekends, when different groups of people are working?
If heights were measured at medical offices, people without health insurance may be left out. If done in the suburbs, you’ll get a different group of people compared to those in the countryside or those in cities. How large was the sample?
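The height example can be made concrete with a short simulation. The sketch below is illustrative only: the population, its 70/30 split and every height value are invented for the example, and the interval is a simple normal approximation. It shows two separate effects: a larger sample narrows the confidence interval, while a skewed sampling frame (here, adults only) shifts the estimate even though its interval looks reassuringly tight.

```python
import math
import random
import statistics

def mean_with_ci(values, z=1.96):
    """Return (mean, low, high) using a normal-approximation 95% interval."""
    m = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 70% adults, 30% children (heights in cm, made up).
adults = [rng.gauss(170, 9) for _ in range(7000)]
children = [rng.gauss(125, 15) for _ in range(3000)]
population = adults + children

# A small random sample versus a larger one: the interval narrows with size.
small = rng.sample(population, 50)
large = rng.sample(population, 2000)

# A skewed sampling frame: only adults were measured (e.g. a workplace survey).
adults_only = rng.sample(adults, 2000)

for label, sample in [("small", small), ("large", large), ("adults only", adults_only)]:
    m, lo, hi = mean_with_ci(sample)
    print(f"{label:12s} mean={m:6.1f} cm  95% CI=({lo:6.1f}, {hi:6.1f})")
```

The confidence interval describes sampling noise only. It says nothing about who was left out of the sample, which is why the adults-only estimate can look precise and still be wrong for the whole population.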
Bias in training data is the bias that everybody thinks about. AI is trained to learn patterns in data. If a particular dataset has bias, then AI – being a good learner – will learn that too.
A now classic example is Amazon. Some years ago, Amazon introduced a new AI-based algorithm to screen and recruit new employees. The company was disappointed when this new process did nothing to help diversity, equity and inclusion.
When they looked closely, it turned out that the data used for training came from applications submitted to Amazon primarily by white men over a 10-year period. Using this system, new applicant resumes were downgraded if they contained the terms “women’s” or “women’s colleges.” Amazon stopped using this system.
On another front, AI algorithms are designed to learn patterns in data and match them to an output. There are many AI algorithms, and each has strengths and weaknesses. Deep learning is acknowledged as one of the most powerful today, yet it performs best on large data sets that are well labeled for the precise output desired.
Such labeling is not always available, and so other algorithms are often used to do this labeling automatically. Sometimes, labeling is done not by hand, but by using an algorithm trained for a different, but similar, task. This approach, termed transfer learning, is very powerful. However, it can introduce bias that is not always appreciated.
Other algorithms involve steps called auto-encoders, which process large data into reduced sets of features that are easier to learn. This process of feature extraction, for which many techniques exist, can introduce bias by discarding information that could make the AI smarter during wider use. That information is lost even if the original data was not biased.
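How feature reduction can discard useful information is easier to see with a toy case. The sketch below does not use an auto-encoder. As a deliberately simplified stand-in, it keeps only the feature with the highest variance, and the data, feature names and single-threshold classifier are all hypothetical. The feature that gets dropped is the one that actually separates the two classes.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical data: each row is (feature_a, feature_b, label).
# feature_a varies a lot but is unrelated to the label;
# feature_b varies little but separates the two classes.
rows = []
for _ in range(1000):
    label = rng.randint(0, 1)
    feature_a = rng.gauss(0, 10)
    feature_b = label + rng.gauss(0, 0.2)
    rows.append((feature_a, feature_b, label))

def accuracy_using(column):
    """Classify with a midpoint threshold on one column; return accuracy.

    Accuracy is measured on the same rows used to set the threshold,
    which is enough for this illustration.
    """
    class0 = [r[column] for r in rows if r[2] == 0]
    class1 = [r[column] for r in rows if r[2] == 1]
    mean0, mean1 = statistics.fmean(class0), statistics.fmean(class1)
    threshold = (mean0 + mean1) / 2
    # Predict class 1 on whichever side of the threshold class 1's mean lies.
    correct = sum(
        ((r[column] > threshold) == (mean1 > mean0)) == (r[2] == 1) for r in rows
    )
    return correct / len(rows)

# Variance-only selection: keep the single feature that varies the most.
variances = [statistics.pvariance([r[c] for r in rows]) for c in (0, 1)]
kept = max((0, 1), key=lambda c: variances[c])
dropped = 1 - kept

print("kept feature index:", kept)
print("accuracy with kept feature:   ", accuracy_using(kept))
print("accuracy with dropped feature:", accuracy_using(dropped))
```

Nothing in the input data was biased here. The loss comes entirely from the reduction step, which ranked features by a criterion that had nothing to do with the task.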
There are many other examples where choosing one algorithm over another can modify results from the AI.
Then there is bias in reporting results. Despite its name, AI is typically not “intelligent” in the human sense. AI is a fast, efficient way of classifying data – your smartphone recognizing your face, a medical device recognizing an abnormal pattern on a wearable device or a self-driving car recognizing a dog about to run in front of you.
The internal workings of AI involve mathematical pattern recognition, and at some point all of this math has to be put into a bin of Yes or No. (It’s your face or not, it’s an abnormal or normal heart rhythm, and so on.) This process often requires some fine-tuning. The goal may be to reduce bias in data collection, in the training set or in the algorithm, or to broaden the AI’s usefulness.
For instance, you may decide to make your self-driving car very cautious, so that if it senses any disturbance at the side of the road it signals “caution,” even if the internal AI would not have sounded the alarm.
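The step of turning a continuous score into a Yes or No can be sketched in a few lines. The scores and labels below are invented for illustration, and the cutoffs 0.3, 0.5 and 0.75 are arbitrary choices, not values from any real system. Lowering the cutoff makes the system more cautious: it catches more true cases but raises more false alarms.

```python
# Hypothetical model scores (0 to 1) paired with the true label
# (True means the event, e.g. an abnormal rhythm, really occurred).
scored = [
    (0.05, False), (0.10, False), (0.20, False), (0.35, False),
    (0.40, True),  (0.45, False), (0.55, True),  (0.60, False),
    (0.70, True),  (0.80, True),  (0.90, True),  (0.95, True),
]

def rates(threshold):
    """Return (sensitivity, false_alarm_rate) when score >= threshold means 'yes'."""
    positives = [s for s, truth in scored if truth]
    negatives = [s for s, truth in scored if not truth]
    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    false_alarm_rate = sum(s >= threshold for s in negatives) / len(negatives)
    return sensitivity, false_alarm_rate

for name, cutoff in [("cautious", 0.3), ("default", 0.5), ("strict", 0.75)]:
    sens, false_alarms = rates(cutoff)
    print(f"{name:8s} cutoff={cutoff:.2f} "
          f"sensitivity={sens:.2f} false-alarm rate={false_alarms:.2f}")
```

The underlying model is identical in all three rows. Only the reporting rule changes, which is why the choice of cutoff is itself a place where bias can enter.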
Q. What kind of work are you currently doing with AI?
A. I am a professor and physician at Stanford University. I treat patients with heart conditions, and my lab has for a long time done research into improving therapy in individual patients using AI and computer methods to better understand disease processes and health.
In cardiology, we are fortunate in having many ways to measure the heart that increasingly are available as wearable devices and that can directly guide treatment. This is very exciting, but also introduces challenges. One major issue that is emerging in medicine is AI bias.
Bias in medical AI is a major problem, because making a wrong diagnosis or suggesting [the] wrong therapy could be catastrophic. Each of the types of bias I have described can apply to medicine. Bias in data collection is a critical problem. Typically, we only have access to data from patients we see.
However, what about patients without insurance, or those who only choose to seek medical attention when very sick? How will AI work when they ultimately do present to the emergency room? The AI may have been trained on people who were less sick, younger or of different demographics.
Another interesting example involves wearables, which can tell your pulse by measuring light reflectance from your skin [photoplethysmography]. Some of these algorithms are less accurate in people of color. Companies are developing solutions that address this bias by working across all skin tones.
Other challenges in medical AI include ensuring the accuracy of AI systems (validation); ensuring that multiple systems can be compared for accuracy, which ideally would use the same testing data, although this may be proprietary for each specific system; and ensuring that patients have access to their data. The Heart Rhythm Society recently called for this “transparent sharing” of data.
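One way to surface this kind of gap is to report error separately for each subgroup instead of only overall. The sketch below assumes a hypothetical set of paired readings (device versus a reference measurement). The group names and all numbers are made up and do not come from any real device.

```python
from collections import defaultdict

# Hypothetical records: (group, device_reading_bpm, reference_bpm).
records = [
    ("group_a", 71, 70), ("group_a", 64, 65), ("group_a", 88, 90),
    ("group_a", 59, 60), ("group_b", 78, 70), ("group_b", 58, 65),
    ("group_b", 99, 90), ("group_b", 66, 60),
]

def mean_abs_error_by_group(rows):
    """Return the mean absolute error (in bpm) for each group."""
    errors = defaultdict(list)
    for group, measured, reference in rows:
        errors[group].append(abs(measured - reference))
    return {g: sum(e) / len(e) for g, e in errors.items()}

# The single overall figure hides the difference between groups.
overall = sum(abs(m - r) for _, m, r in records) / len(records)
by_group = mean_abs_error_by_group(records)

print(f"overall mean absolute error: {overall:.2f} bpm")
for group, error in sorted(by_group.items()):
    print(f"{group}: {error:.2f} bpm")
```

In this invented data the overall error sits between the two groups and describes neither of them well, which is the reason for breaking the figure down.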
Q. What is one practice for keeping biases out of AI?
A. Understanding the various causes of bias is the first step in the adoption of what is sometimes called effective “algorithmic hygiene.” An essential practice is to ensure as much as possible that training data are representative.
Representative of what? No data set can represent the entire universe of options. Thus, it is important to identify the target application and audience upfront, and then tailor the training data to that target.
A related approach is to train multiple versions of the algorithm, each on one of the available datasets, so that every dataset has its own trained model. If the output from classification is the same between models, then the AI models can be combined.
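A minimal sketch of this idea follows. The interview does not say how agreement between models should be measured, so the approach here (the fraction of shared probe inputs on which two models give the same answer) is an assumption, as are the three site datasets and the one-feature threshold "model" standing in for a real algorithm.

```python
import statistics

def fit_threshold(dataset):
    """Fit a one-feature classifier: midpoint between the two class means.

    Assumes the positive class (label 1) has the higher feature values.
    """
    mean_neg = statistics.fmean(x for x, y in dataset if y == 0)
    mean_pos = statistics.fmean(x for x, y in dataset if y == 1)
    return (mean_neg + mean_pos) / 2

def predict(threshold, x):
    """Return 1 if x is at or above the fitted threshold, else 0."""
    return int(x >= threshold)

def agreement(t1, t2, inputs):
    """Fraction of inputs on which two fitted models give the same output."""
    same = sum(predict(t1, x) == predict(t2, x) for x in inputs)
    return same / len(inputs)

# Hypothetical datasets from three sites: (feature_value, label).
site_a = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
site_b = [(1.5, 0), (2.5, 0), (3.5, 0), (7.0, 1), (8.0, 1), (8.7, 1)]
site_c = [(4.0, 0), (5.0, 0), (6.0, 0), (10.0, 1), (11.0, 1), (12.0, 1)]

models = {name: fit_threshold(data)
          for name, data in [("a", site_a), ("b", site_b), ("c", site_c)]}

# Shared probe inputs that every model is asked to classify.
probe = [0.5, 2.0, 4.0, 6.0, 7.0, 9.0, 11.0]

print("a vs b:", agreement(models["a"], models["b"], probe))
print("a vs c:", agreement(models["a"], models["c"], probe))
```

In this toy setup the models from sites a and b agree on every probe, so combining them looks reasonable, while the model from site c disagrees on some inputs. That disagreement is a prompt to look at how the site c data differs before merging anything.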
A similar approach is to input the multiple datasets to the AI, and train it to learn all at once. The advantage of this approach is that the AI will learn to reinforce the similarities between input datasets, and yet generalize to each dataset.
As AI systems continue to be used, one design choice is to update their training dataset so that they are increasingly tailored to their user base. This can introduce unintended consequences. First, as the AI becomes more and more tailored to the user base, this may introduce bias compared to the carefully curated data often used originally for training.
Second, the system may become less accurate over time because the oversight used to ensure AI accuracy may no longer be in place in the real world. A good example of this is the Microsoft ChatBot, which was designed to be a friendly companion but, on release, rapidly learned undesirable language and behaviors, and had to be shut down.
Finally, the AI is no longer the same as the original version, which is an issue for regulation of medical devices as outlined in the Food and Drug Administration guidelines on Software as a Medical Device.
Q. What is another best practice for preventing AI bias?
A. There are multiple approaches to eliminate bias in AI, and none are foolproof. These range from approaches to formulate an application so that it is relatively free of bias, to collecting data in a relatively unbiased way, to designing mathematical algorithms to minimize bias.
The technology of AI is moving inexorably toward greater integration across all aspects of life. As this happens, bias is more likely to occur through the compounding of complex systems but also, paradoxically, less easy to identify and prevent.
It remains to be seen how this field of ethical AI develops, and whether quite different approaches are developed for highly regulated fields such as medicine, where transparency and explainable AI are of critical importance, than for other endeavors.