From Code to Cure

Armed with enormous amounts of clinical data, teams of computer scientists, statisticians, and physicians are rewriting the rules of medical research.

By David J. Craig
Published Spring 2018

Portraits by Jörg Meyer

The deluge is upon us.

We are living in the age of big data, and with every link we click, every message we send, and every movement we make, we generate torrents of information. 

In the past two years, the world has produced more than 90 percent of all the digital data that has ever been created. New technologies churn out an estimated 2.5 quintillion bytes per day. Data pours in from social media and cell phones, weather satellites and space telescopes, digital cameras and video feeds, medical records and library collections. Technologies monitor the number of steps we walk each day, the structural integrity of dams and bridges, and the barely perceptible tremors that indicate a person is developing Parkinson’s disease. These are the building blocks of our knowledge economy.

This tsunami of information is also providing opportunities to study the world in entirely new ways. Nowhere is this more evident than in medicine. Today, breakthroughs are being made not just in labs but on laptops, as biomedical researchers trained in mathematics, computer science, and statistics use powerful new analytic tools to glean insights from enormous data sets and help doctors prevent, treat, and cure disease.

“The medical field is going through a major period of transformation, and many of the changes are driven by information technology,” says George Hripcsak ’85PS,’00PH, a physician who chairs the Department of Biomedical Informatics at Columbia University Irving Medical Center (CUIMC). “Diagnostic techniques like genomic screening and high-resolution imaging are generating more raw data than we’ve ever handled before. At the same time, researchers are increasingly looking outside the confines of their own laboratories and clinics for data, because they recognize that by analyzing the huge streams of digital information now available online they can make discoveries that were never possible before.” 

To date, the most dramatic achievements of data science in medicine have been in the realm of genomics. Physicians at many leading health-care organizations and medical schools, including Columbia’s, now routinely analyze the DNA of their patients, parsing the millions of chemical units that make each one of us unique, in order to more precisely diagnose illness. This has enabled physicians to craft personalized treatments for many forms of cancer, as well as for certain cardiovascular, neurological, pulmonary, and ophthalmological disorders.

But the use of data science in medicine extends far beyond genomics. Today, researchers at CUIMC are using the power of data to identify previously unrecognized drug side effects; they are predicting outbreaks of infectious diseases by monitoring Google search queries and social-media activity; and they are developing novel cancer treatments by using predictive analytics to model the internal dynamics of diseased cells. These ambitious projects, many of which involve large interdisciplinary teams of computer scientists, engineers, statisticians, and physicians, represent the future of academic research.

“Our ability to collect, analyze, and interpret more and larger data sets is infusing new ideas and energy into virtually every academic field today — from data-rich disciplines like astronomy, biology, and climate science to increasingly data-driven professions like law, business, and journalism,” says Jeannette M. Wing, director of Columbia’s Data Science Institute, which supports collaborations between data scientists and researchers in other fields across the University. “Since data is everywhere, data science is applicable everywhere. What’s happening at the medical campus right now represents a kind of collaboration we’re bringing to every corner of Columbia.”


For CUIMC researcher Nicholas Tatonetti, any sizable collection of digital medical records represents a treasure trove of potential discoveries. 

Consider, for example, what the young computer scientist has been able to accomplish in recent years by mining an FDA database of prescription-drug side effects. The archive, which contains millions of reports of adverse drug reactions that physicians have observed in their patients, is continuously monitored by government scientists whose job it is to spot problems and pull drugs off the market if necessary. And yet by drilling down into the database with his own analytic tools, Tatonetti has found evidence that dozens of commonly prescribed drugs may interact in dangerous ways that had previously gone unnoticed. Among his most alarming findings: the antibiotic ceftriaxone, when taken with the heartburn medication lansoprazole, can cause QT prolongation, a disturbance of the heart's electrical cycle that can trigger fatal arrhythmias and cause otherwise healthy people to die suddenly.

“What’s surprising is that neither of those medications had ever been linked to heart problems on its own,” says Tatonetti, an assistant professor of biomedical informatics, systems biology, and medicine. “That’s part of the reason nobody had spotted the risk.” 

Tatonetti made the discovery by employing a novel deductive technique: he searched the FDA database for instances of people developing heart problems after taking drugs that aren’t known to cause cardiovascular issues but that share numerous other side effects with medications that are. Then, to assess the strength of the correlations he found, he designed a set of algorithms inspired by an analytic approach called signal-detection theory, which was developed by the US Air Force in the 1940s to help radar operators determine whether objects picked up by their antennas were actually airplanes. These tools enabled Tatonetti to separate the signal from the noise in the FDA archive, accomplishing something that was, to a data scientist, akin to detecting a pea beneath a pile of mattresses.
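The general idea of hunting for "latent" signals can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Tatonetti's method: the reports below are made up (not FDA data), the drug names are placeholders, and the helpers `profile`, `jaccard`, `reporting_odds_ratio`, and `screen_pairs`, along with the 0.5 similarity threshold, are hypothetical. Jaccard similarity stands in for whatever profile-matching he used, and the reporting odds ratio, a standard disproportionality measure in drug-safety work, stands in for his signal-detection-inspired algorithms. The sketch flags drug pairs whose *other* side effects resemble those of a drug already known to cause the target problem, then checks whether the target problem is over-reported when both drugs are taken together.

```python
from itertools import combinations

# Made-up toy reports (hypothetical, not real FDA data): (drugs taken, side effects observed).
REPORTS = [
    ({"drug_a", "drug_b"}, {"qt_prolongation", "nausea", "dizziness"}),
    ({"drug_a", "drug_b"}, {"qt_prolongation", "dizziness"}),
    ({"drug_a", "drug_b"}, {"rash"}),
    ({"drug_a"}, {"rash"}),
    ({"drug_b"}, {"nausea"}),
    ({"drug_c"}, {"qt_prolongation", "dizziness", "nausea"}),  # drug_c: known to cause the event
    ({"drug_c"}, {"dizziness"}),
    ({"drug_d"}, {"headache"}),
]


def profile(reports, drugs):
    """Union of side effects reported when every drug in `drugs` was taken."""
    effects = set()
    for taken, observed in reports:
        if drugs <= taken:
            effects |= observed
    return effects


def jaccard(x, y):
    """Overlap between two side-effect profiles (0 = none shared, 1 = identical)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def reporting_odds_ratio(reports, drugs, event):
    """How much more often `event` appears in reports with all `drugs` than in the rest.
    A 0.5 continuity correction keeps sparse 2x2 tables from dividing by zero."""
    a = b = c = d = 0
    for taken, observed in reports:
        exposed = drugs <= taken
        has_event = event in observed
        if exposed and has_event:
            a += 1
        elif exposed:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def screen_pairs(reports, known_drugs, event, threshold=0.5):
    """Flag drug pairs whose non-`event` side effects resemble those of drugs
    already known to cause `event`, then score each flagged pair by its ROR."""
    reference = set()
    for drug in known_drugs:
        reference |= profile(reports, {drug})
    reference.discard(event)  # compare only the *other* side effects
    candidates = sorted({drug for taken, _ in reports for drug in taken} - known_drugs)
    flagged = {}
    for pair in combinations(candidates, 2):
        drugs = set(pair)
        similarity = jaccard(profile(reports, drugs) - {event}, reference)
        if similarity >= threshold:
            flagged[pair] = (round(similarity, 3), reporting_odds_ratio(reports, drugs, event))
    return flagged


print(screen_pairs(REPORTS, {"drug_c"}, "qt_prolongation"))
# prints {('drug_a', 'drug_b'): (0.667, 5.0)}
```

On this toy data, the pair drug_a plus drug_b shares two of its three non-cardiac side effects with drug_c, and QT prolongation shows up in it about five times more often than the comparison reports would suggest. A real analysis would also need to correct for confounding and test statistical significance, which is part of why Tatonetti went on to check the finding against patient records and laboratory experiments.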

But Tatonetti didn’t stop there. He then dove into CUIMC’s own patient archive, which contains clinical data on five million patients dating back to 1989. This confirmed that people who had been prescribed ceftriaxone and lansoprazole at the same time often developed irregular heartbeats. Finally, Tatonetti teamed up with Robert Kass, a CUIMC pharmacologist, to undertake a series of experiments to see exactly how ceftriaxone and lansoprazole affect the heart. The results were dramatic: in combination, the drugs were shown to block an electrical pathway inside heart cells that controls their pulsing.

“The scope of data collection that went into these studies, and the level of analytic sophistication that was required along the way, is like nothing else I’ve ever seen in the area of drug safety,” says Raymond Woosley, a national expert on QT prolongation who assisted in Tatonetti’s investigation. 
