From Code to Cure

Armed with enormous amounts of clinical data, teams of computer scientists, statisticians, and physicians are rewriting the rules of medical research.

By David J. Craig. Published Spring 2018.

Every year, in the fall or winter, a wave of influenza hits the United States. And every year, health officials struggle to respond, because they don’t know when the flu will strike or what parts of the country will be hardest hit. In a typical flu season, tens of thousands of Americans are killed by the virus, but if the timing and severity of outbreaks could be anticipated, then health officials could respond more effectively and save lives.

Jeffrey Shaman ’03GSAS, an associate professor of environmental health sciences at Columbia’s Mailman School of Public Health, has found a way to predict flu outbreaks using big-data analysis. Originally trained as a climate scientist, Shaman has for the past several years been developing computer systems that can anticipate the timing and magnitude of flu epidemics by analyzing many different types of data, some of which pertain to reported cases of influenza and others to the conditions in which the virus tends to spread. A typical forecast produced by his team might declare, for example, that there is a 60 percent chance that a given city’s flu season will peak in intensity in five weeks.

“That can give health-care workers more time to prepare,” says Shaman, whose team currently publishes weekly flu forecasts for eighty-one US cities and all fifty states on its Columbia website. “They can stock up on medications like Tamiflu, assign more staff to emergency rooms, and launch public-awareness campaigns to maximize their impact.”

Predicting flu outbreaks has long been a dream of public-health researchers, but until recently scientists knew too little about how influenza spreads. Even the most obvious feature of influenza’s global migration cycle — that it emerges in temperate regions in both the Northern and Southern Hemispheres during cold months — had been difficult to explain. 

Shaman achieved a major breakthrough in this area when, in 2008, he discovered that the flu virus is adept at spreading in conditions of low humidity, such as those that prevail in North America during the winter. “No one’s sure why this is, but there are a number of theories that attempt to explain why the flu virus, when expelled from a host as tiny airborne droplets, would be sensitive to ambient humidity,” he says. “Some scientists have speculated that when it’s less humid, chemical changes occur in the droplets that may protect flu viral particles trapped inside and make them more likely to infect people who inhale them.”

Shaman, who studied hydrology and atmospheric sciences for many years before turning his attention to influenza, made this discovery by reanalyzing data that a group of Mount Sinai Hospital virologists had collected in a series of lab experiments that assessed the impact of humidity and temperature on the flu’s transmissibility. The virologists had concluded that these factors had only a modest impact on flu transmission; Shaman, who as an environmental scientist was accustomed to dealing with such data sets, showed that humidity was, in fact, a very important factor.

“Whereas the original authors had looked at the effects of relative humidity, or the amount of water vapor in the air as a percentage of what it can hold at a given temperature, my team looked at the effects of absolute humidity, which is a more straightforward, mass-based measure, and we found that its effects were pronounced,” he says.
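The distinction Shaman draws can be made concrete with a short calculation. The sketch below is not taken from his study: it assumes the widely used Magnus-type approximation for saturation vapor pressure and the ideal gas law for water vapor, and the function names are hypothetical. It shows that the same relative humidity corresponds to far less water in the air on a cold day than on a warm one.

```python
import math

# Specific gas constant for water vapor, in J/(kg*K).
R_WATER_VAPOR = 461.5


def saturation_vapor_pressure_hpa(temp_c):
    """Approximate saturation vapor pressure over water, in hPa.

    Uses a Magnus-type formula (an assumption of this sketch, reasonable
    for ordinary outdoor and indoor temperatures).
    """
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def absolute_humidity_g_m3(temp_c, rel_humidity_pct):
    """Mass of water vapor per cubic meter of air, in grams."""
    # Actual vapor pressure in pascals (1 hPa = 100 Pa).
    vapor_pressure_pa = (
        rel_humidity_pct / 100.0 * saturation_vapor_pressure_hpa(temp_c) * 100.0
    )
    temp_k = temp_c + 273.15
    # Ideal gas law: density = pressure / (R * T), converted from kg to g.
    return 1000.0 * vapor_pressure_pa / (R_WATER_VAPOR * temp_k)


# Same relative humidity, very different absolute humidity.
winter_air = absolute_humidity_g_m3(temp_c=0.0, rel_humidity_pct=50.0)
summer_air = absolute_humidity_g_m3(temp_c=20.0, rel_humidity_pct=50.0)
print(f"0 C, 50% RH:  {winter_air:.1f} g/m^3")
print(f"20 C, 50% RH: {summer_air:.1f} g/m^3")
```

Under these assumptions, air at 50 percent relative humidity holds roughly three and a half times as much water vapor at 20 degrees Celsius as it does at freezing, which is why the two measures can tell different stories about the same winter day.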

Armed with this insight, Shaman and his colleagues began work on a flu forecasting system that was one of the first of its type. In order to train their computer to predict future epidemics, they first downloaded and studied information about every case of influenza reported to the US Centers for Disease Control and Prevention (CDC) since 2003, along with detailed climate data covering the same period. The researchers then developed a computer model capable of making probabilistic predictions from a steady stream of climate data and of flu data supplied by a number of disease-monitoring organizations, including the CDC. They also taught the system to incorporate data that Google had just begun releasing daily on the locations and numbers of people searching for flu-related keywords.
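The article does not describe the internals of the Columbia model, so the following is only an illustrative sketch of what a probabilistic prediction can mean. It assumes, hypothetically, that a forecasting system runs many simulations of the coming season, each predicting a peak week, and reports the share of runs that agree. The function name and the sample numbers are invented for illustration.

```python
from collections import Counter


def peak_week_probabilities(simulated_peak_weeks):
    """Turn a set of simulated peak weeks into forecast probabilities.

    Each entry is the number of weeks from now at which one simulation
    run predicts the flu season will peak. The probability assigned to
    a week is simply the fraction of runs that chose it.
    """
    if not simulated_peak_weeks:
        raise ValueError("at least one simulation run is required")
    counts = Counter(simulated_peak_weeks)
    total = len(simulated_peak_weeks)
    return {week: counts[week] / total for week in sorted(counts)}


# Hypothetical ensemble of ten runs: six peak in five weeks,
# two peak in four weeks, and two peak in six weeks.
ensemble = [5, 5, 5, 5, 5, 5, 4, 4, 6, 6]
forecast = peak_week_probabilities(ensemble)

for week, probability in forecast.items():
    print(f"peak in {week} weeks: {probability:.0%}")
```

With these made-up runs, the forecast assigns a 60 percent chance to a peak in five weeks, the kind of statement quoted earlier in the article. A real system would weight and update its simulations as new flu and climate data arrive, which this sketch does not attempt.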

“The Google data stream was vital because it gave us nearly instantaneous knowledge about what was happening on the ground,” says Sasi Kandula, a Columbia computer scientist who has contributed to the project. “Traditional epidemiological data, which consists of doctors’ reports of flu cases, is typically a week or two old by the time an organization like the CDC releases it.”

In 2012, after nearly four years spent developing their system, Shaman and his colleagues began releasing real-time flu predictions. The next year, CDC officials evaluated the Columbia team’s predictions along with those produced by five other research groups, and they declared the Columbia team’s the most reliable.

Since then, Shaman and his colleagues have been refining their models. By studying the pace at which influenza spreads through populations of varying densities and cities with different types of infrastructure, for example, they’ve improved the geographic resolution of their predictions to the point where, last winter, they developed a new forecasting system able to specify where in large cities the flu would hit first, down to the level of individual neighborhoods.

At the same time, the researchers have taken their work to the international stage, collaborating with scientists in Hong Kong and several other cities in Southeast Asia to build flu-prediction systems designed specifically for that region. “Forecasting flu outbreaks in this part of the world is important, since new and dangerous strains of the virus often emerge there,” says Wan Yang, a Columbia epidemiologist, environmental engineer, and computer scientist who is working on the project.

In the US, meanwhile, Shaman’s team is attempting to plug some major gaps in our knowledge of how influenza spreads from person to person. One possibility that has long kept epidemiologists awake at night, Shaman says, is that some people carrying the flu virus may not develop symptoms and therefore go about their days blithely infecting others. The winter before last, Shaman and his colleagues, as part of a federally funded study, began collecting nasal swabs from large numbers of people in schools, daycare centers, and other public places in New York City.

“We’re on the lookout for people who aren’t visibly sick, yet are shedding the virus,” says Shaman. 

He says that if significant numbers of asymptomatic people are found to be contagious, this might prompt city health officials to proactively screen people for influenza. No matter what he and his colleagues discover through swab sampling, Shaman says, the study will move them one step closer to their ultimate goal, which is gaining a comprehensive understanding of how influenza moves through populations.

“Right now, flu forecasting is probably at the point where weather forecasting was fifty years ago,” says Shaman, who notes that his forecasts are used only informally by health officials. “But as we develop better, more sophisticated influenza surveillance, and as we’re better able to assimilate all the available data, that situation is going to change very quickly.”
