The Ghost Files
US historians have long complained about gaps in the National Archives. Can big-data analysis show what kinds of information the government is keeping classified?
By David J. Craig
Published Winter 2013-14
Matthew Connelly had an idea for a book.
The Pentagon, he realized, was one of the first organizations ever to undertake a large, scientifically based effort to predict the future. During the Cold War, it had invested billions of dollars in the development of computer-based war games, statistical models, and elaborate role-playing exercises in hopes of anticipating Soviet military activity. How successful had the Pentagon’s program been at predicting the Soviets’ next moves? And how had the Pentagon’s predictions been skewed by the group dynamics of the generals, intelligence analysts, diplomats, and statisticians involved? Did they tend to push more cautious or more alarmist conclusions? Did they favor predictions that looked so far ahead that they could not be proved wrong while the forecasters were still on the job? These were questions that had never been thoroughly investigated.
“I thought this would provide insights into how all sorts of predictions get made today, whether about climate change, disease outbreaks, or rogue states acquiring nuclear weapons,” says Connelly ’90CC, who is a professor of history at Columbia. “How seriously should we take these predictions? And what’s the best way to gauge their relative validity? The US government has been in the business of forecasting the future for fifty years, so it seemed logical to evaluate its record.”
He didn’t get very far. In the spring of 2009, a few months after starting his research, Connelly decided it would be impossible to tell the story that he envisioned. Too little information was available. Connelly had spent long hours researching the Pentagon’s forecasting efforts at the National Archives in College Park, Maryland, and at other government archives around the country. He had found a decent amount of material related to the program’s beginnings in the 1960s, but few records from later decades.
“The Pentagon was certainly making forecasts throughout the course of the Cold War,” says Connelly, the author of the 2008 book Fatal Misconception: The Struggle to Control World Population. “So it was pretty obvious that the records from the 1970s onward were incomplete.”
What Connelly experienced was something that researchers had been complaining about for years: that the National Archives’ contemporary holdings had more holes than a donut factory. The problem was that the US government was not releasing classified documents on schedule. Although federal policy requires that most documents labeled “Confidential,” “Secret,” or “Top Secret” be released within thirty years, by the time George W. Bush left office some four hundred million pages of classified material had been sitting in filing cabinets and on computer hard drives for longer than that. This was evident from the National Archives’ own annual reports.
To many people who study the declassification process, this was a startling abrogation of the government’s responsibility to act as its own archivist. The only classified documents that were supposed to be kept hidden for more than three decades were those whose disclosure would pose a serious risk to national security, such as by revealing details of an ongoing military or intelligence operation. “Very few of those four hundred million pages could possibly have met the standard for remaining secret that long,” says Steven Aftergood, a transparency advocate who directs the Federation of American Scientists’ Project on Government Secrecy. “This was very troubling. The government’s prerogative to classify sensitive materials is supposed to be a temporary refuge from public oversight, not a permanent shield.”
Confronted with the gaps he saw in the National Archives, Connelly did what he says most scholars do: he muddled through. After reading the documents available to him, he cobbled together the best history he could, and soon published a paper about the power struggles among the CIA, the FBI, and the State Department over which agency got to issue the authoritative interpretations of the military forecasts made early in the Cold War.
But afterward, Connelly couldn’t put the experience out of his mind. He wanted to know how long it would take the government to release those records. He also wondered: what other stories were hiding in those millions of backlogged documents? Other historians were asking similar questions, but Connelly grew angrier than most. The way he saw it, the government was not just standing in the way of new books being written; it was delaying a revolution in historical scholarship. Connelly was among a small but growing number of historians who believed that the future of their field lay in using computers to analyze huge volumes of documents. For years, he had been going into archives with a digital camera and photographing paper records. He would then convert those images into text files and feed them into software that, by analyzing the documents in aggregate, could show him, for instance, where the paths of certain people, institutions, and companies had crossed at different points in history. He was excited about the prospect of using similar techniques to analyze US government records from the digital era. Many sensitive electronic records should already have been declassified, since some federal agencies had embraced digital communications and record-keeping as early as the 1970s.