COVER STORY

From Code to Cure

Armed with enormous amounts of clinical data, teams of computer scientists, statisticians, and physicians are rewriting the rules of medical research.

by David J. Craig
Published Spring 2018

Since coming to Columbia straight out of Stanford’s graduate school in 2012, Tatonetti has surprised colleagues time and time again with his ability to glean answers to big, bold questions by trawling collections of digital health records. Last year, he published a comprehensive analysis of how a person’s birth date influences his or her lifetime risk for developing many common health problems. That study, which is based on Tatonetti’s analysis of the medical records of ten million people in the United States, South Korea, and Taiwan, will likely be picked over for years by public-health researchers eager to understand how seasonal environmental conditions — like levels of sunlight, mold, or air pollution — affect pregnant women and their unborn children. And later this year, Tatonetti and several colleagues will publish a groundbreaking analysis of CUIMC medical records that reveals the relative heritability of 467 medical conditions — from anxiety to celiac disease to cystic fibrosis — for which no reliable estimates of heritability have ever been available. The key to the mystery had been staring at researchers for years, right on the hospital intake forms that every patient fills out: the familial relationships of the patients’ emergency contacts.

“We mapped out the relationships of millions of patients and then looked to see the degree to which medical conditions run in families,” says Tatonetti, noting that the results could help researchers identify genes that contribute to disease.

A thirty-five-year-old with a neatly trimmed beard and tattoos on his forearms, Tatonetti is part of a new wave of tech-savvy medical researchers who have largely bypassed traditional investigative approaches, such as observing patients firsthand in clinical studies, in favor of sifting through piles of existing medical data in search of scientific gold. The potential for this kind of research, its proponents say, has grown dramatically in recent years as the health-care industry has fully embraced digital record-keeping: whereas ten years ago the majority of US health-care institutions still relied on paper files to track their patients’ medical care, today only a small percentage of them do.

“The shift toward electronic record-keeping has just totally blown open the possibilities for what you can do as a medical researcher,” says Tatonetti. “A few years ago, if I’d told epidemiologists that I was planning to investigate how a person’s birth month relates to her health, they would have laughed me out of the room.” 

Tatonetti came to Columbia, he says, because CUIMC was one of the first medical centers to adopt electronic record-keeping and therefore possesses one of the richest patient databases in the world. Today it contains tens of millions of hospital intake forms, lab results, X-ray reports, prescription orders, immunization records, echocardiograms, vital signs, doctors’ and nurses’ notes, and discharge summaries. Faculty in Columbia’s Department of Biomedical Informatics have pioneered innovative ways of using such data — both to improve patient care and to advance scientific knowledge. On the clinical side, they have developed artificial-intelligence systems that can analyze a patient’s entire medical history within seconds and then alert a CUIMC physician if, for instance, the patient is due for an immunization, is allergic to a medication that he or she is about to be prescribed, or is showing early signs of difficult-to-diagnose conditions like chronic kidney disease. To support new kinds of research, they have created special database-management tools that enable CUIMC officials to share patient data with researchers at Columbia and beyond in ways that protect the patients’ privacy.

“A big priority within the research community right now is figuring out how scientists from different medical centers can pool our data, so that we can all conduct more powerful studies,” says George Hripcsak, the chair of the biomedical-informatics department. He says that CUIMC is at the forefront of efforts to meet this challenge. “We’ve organized a number of national and international consortiums that expand scientists’ access to medical data while at the same time protecting patient privacy.”

The most ambitious of these initiatives, the Observational Health Data Sciences and Informatics program (OHDSI), has created a data-sharing network that enables researchers at academic institutions in twenty-five countries to study the medical records of some four hundred million people, drawn from eighty health-care organizations around the world. Researchers participating in the network, of which CUIMC is the coordinating center, are now mining the records for insights into any number of topics: racial disparities in health-care access, country-by-country differences in how physicians treat common diseases, and problems that arise when children are prescribed adult medications, to name a few. Hripcsak himself is using the archive to assemble what will be a first-of-its-kind catalog revealing the rates at which people who take any of thousands of prescription drugs experience side effects. He says that physicians currently have no way of knowing how frequently many drug side effects occur, because the clinical trials conducted by pharmaceutical companies prior to releasing new drugs — which remain a primary source of scientific information about drug safety today — are too small to accurately assess their prevalence. But Hripcsak believes that by documenting all the health problems that millions of people have experienced shortly after starting on prescription drugs, and then using a number of analytic tricks to weed out incidental correlations in the data, he will be able to provide solid estimates for the prevalence of many drug side effects for the first time.

“Does a particular medication carry a 20 percent chance of causing a seizure or a 0.2 percent chance? That difference might determine whether or not you prescribe it to somebody,” he says. “But today, physicians are often in the dark when trying to make these kinds of judgment calls. They’ll read the list of potential side effects on a drug’s label but have little idea what real risk they pose.”

In addition to containing enormous amounts of information of value to physicians and patients, the new catalog could also be a boon for researchers.

“One of the things I’ll be using the catalog for is to spot more dangerous drug combinations,” says Tatonetti. “Knowing the rates at which certain side effects occur will provide us clues as to which pairs of drugs — among the thousands of pairs that may at first glance appear to be troublesome — are the most important to investigate.”

None of this is to say that data mining is going to replace traditional forms of medical research. Both Hripcsak and Tatonetti acknowledge, for example, that the only way to evaluate the safety of new drugs is to see how they work on small numbers of people in closely monitored clinical trials. But they predict that as the insights of big-data analytics are gradually integrated into routine medical practice, with data scientists tapping into the rivers of digital information flowing out of doctors’ offices and sharing their insights with practitioners in real time, a fundamentally different kind of health-care system will emerge.

“This will create what data scientists like to call a ‘learning health system,’ where medical treatments and procedures can be continuously monitored and tweaked, in accordance with how they’re performing,” says Hripcsak. “Eventually, we’ll also have massive quantities of data coming in from mobile monitoring devices, like smart watches that record your vital signs. By analyzing that data, we could enable a physician to provide you individually tailored medical advice without you even stepping into his or her office.”
