How a Data Scientist Thinks about Risk Stratification
“Risk”. It’s a word we hear every day in the healthcare industry. We want to avoid risk, we want to predict risk, we want to find patients who are high risk. We want to risk stratify populations (organize people into a set number of mutually exclusive tiers of increasing risk).
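To make the idea of mutually exclusive tiers concrete, here is a minimal Python sketch of risk stratification. It assumes each person already has a risk score between 0 and 1. The function name `stratify`, the tier labels, and the cutoffs are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical cutoffs and labels; real tiers depend on the application.
CUTOFFS = [0.2, 0.5]               # boundaries between adjacent tiers
TIERS = ["low", "medium", "high"]  # ordered by increasing risk


def stratify(score: float) -> str:
    """Assign a risk score in [0, 1] to exactly one tier."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    # bisect_right counts how many cutoffs are <= score, which is
    # the index of the single tier the score falls into.
    return TIERS[bisect_right(CUTOFFS, score)]


# Made-up scores for three made-up patients.
scores = {"patient_a": 0.05, "patient_b": 0.35, "patient_c": 0.80}
tiers = {name: stratify(score) for name, score in scores.items()}
```

Because every score maps to exactly one tier, the tiers are mutually exclusive by construction. Where to place the cutoffs is a judgment call that depends on how the tiers will be used.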
My recent blog posts have centered on the concept of Population Health. Clearly, the idea of risk is particularly important in this world, where the goals are to keep well individuals healthy, avoid poor outcomes for those who are already sick, and minimize costs. Understanding, assessing, and predicting risk are all essential to this effort.
But what is “risk”? If you asked a physician, an insurer, and an average Joe on the street to describe “high risk” from a healthcare perspective, you would likely get very different answers. A physician might describe someone at high risk of developing a disease, of a serious disease complication, or of mortality. An insurer might describe someone at risk of high spending in the immediate future. The average Joe might describe someone at high risk of impairment or inability to function in daily life. Understanding the context-appropriate definition of risk is the first step toward building analytics to support risk analysis. And the appropriate definition always depends on the real-world application.
Even when the application is understood, there is still considerable work to be done to identify the appropriate data and characteristics that lead to poor outcomes. Consider a discharge nurse who sees hundreds of patients a month as they prepare to depart from the hospital. Most knowledgeable hospital staff are aware that the most experienced discharge nurses will be able to tell you, with a high degree of accuracy, who is likely to show up back in the hospital in the near future. Multiple studies have tried to quantify the drivers of this type of “nurse’s intuition”. How do they know?
In 1964, United States Supreme Court Justice Potter Stewart used the now-famous phrase “I know it when I see it” to describe his threshold test for obscenity in the case of Jacobellis v. Ohio. A discharge nurse might say much the same thing when asked to describe a patient at high risk for readmission: I know it when I see it. Characteristics such as illness burden, past behavior, social situation, self-care ability, and home support are often cited, but the reality is that it’s the entire picture, often with a bit of an ambiguous “gut feeling” thrown in for good measure.
So how does Data Science fit into this picture? Our challenge as Data Scientists is to turn “I know it when I see it” into a measurable mathematical formula, so that everyone “knows it” even without seeing it in person. That involves extensive experimentation with different data sources, variables, and modeling techniques, as well as building in the capability for models to evolve and learn over time. At Truven Health Analytics, my team is exclusively focused on developing and testing new models, using the various kinds of data that are readily available to us. In future blog posts, we’ll describe some of these models, including risk of developing diabetes and risk of admission. Truven Health, an IBM Company, is now positioned to move deeply into this space and develop these types of risk models by bringing together traditionally disparate data sources, clinical knowledge, and cutting-edge modeling techniques.
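As one way to picture turning “I know it when I see it” into a formula, here is a minimal Python sketch of a logistic risk score. It is not a model described in this post: the function name `readmission_risk`, the feature names, the weights, and the intercept are all hypothetical and hard-coded purely for illustration. A real model would estimate its weights from data.

```python
import math

# Hypothetical weights, for illustration only. A real model would
# learn these from data rather than hard-code them.
WEIGHTS = {
    "illness_burden": 1.2,
    "prior_admissions": 0.8,
    "lives_alone": 0.5,
}
INTERCEPT = -3.0  # hypothetical baseline log-odds


def readmission_risk(patient: dict) -> float:
    """Return a score in (0, 1) from a weighted sum of characteristics."""
    # Characteristics missing from the record contribute nothing.
    z = INTERCEPT + sum(
        weight * patient.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    # The logistic function squeezes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up patients that differ only in illness burden.
lower = readmission_risk({"illness_burden": 1.0, "prior_admissions": 0.0})
higher = readmission_risk({"illness_burden": 3.0, "prior_admissions": 0.0})
```

With positive weights, a patient with a heavier illness burden always scores higher than an otherwise identical patient, which is the kind of pattern an experienced discharge nurse recognizes without writing it down.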
Senior Director, Advanced Analytics