The Ghost Files

US historians have long complained about gaps in the National Archives. Can big-data analysis show what kinds of information the government is keeping classified?

by David J. Craig Published Winter 2013-14

Costs of complacency

On a recent Friday afternoon, Connelly sat behind his desk in Fayerweather Hall, quietly observing a group of graduate students who had gathered to work in a lounge outside of his office. Some were historians, others computer scientists. It was impossible to tell who was who, based on their conversations, which flowed with references to Nixon, Kissinger, Saigon, mean probabilities, gap-time distributions, and applets.

“Twenty years from now, when historians are writing the story of our time, their archive is going to include Google and Facebook,” Connelly remarked. “They’re going to need to understand data-mining techniques to do that work. I’m trying to develop those tools.”

It had been a busy day. Connelly was preparing for talks with representatives of several federal agencies, including the State Department and the National Security Agency. He planned to address any concerns they had about his research. He would also offer to demonstrate his team’s analytic techniques in case the government had any interest in using them. Connelly had come away from previous conversations with federal officials convinced that the same tools his team is using to analyze the public record could help the government better manage its secrets. The government, too, is sifting through enormous numbers of documents and trying to make categorical assessments about their contents. In the government’s case, this means determining which of the millions of classified documents that come up for review every year ought to be released, with or without redactions, and which ought to remain locked up in drawers. Federal employees do this work by reading documents one at a time, page by page, using black felt pens to ink over sensitive passages. Connelly said that many officials he has spoken to believe this needs to change soon; in order to process the tidal wave of electronic records that are coming due for review in the next few years, the government will need to implement its own data-mining system. One strategy that Connelly and many others have advocated to the government, he says, would involve screening large numbers of documents for language that is associated with sensitive topics. Human censors could then inspect these documents carefully, while funneling the others straight into the public domain.

“This would be a risk-management approach, and it would start from the position that it’s impossible to catch everything, and that it’s a mistake to try,” Connelly says. “Time and time again government boards have proposed using technology in this way to make the declassification process more efficient.”
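The screening strategy described above can be illustrated with a minimal sketch. It is not the Declassification Engine's method or any government system. The term list, weights, threshold, and function names are all hypothetical, chosen only to show the idea of flagging documents whose language is associated with sensitive topics and routing only those to human reviewers.

```python
import re

# Hypothetical term weights: illustrative only, not taken from any real review guide.
SENSITIVE_TERMS = {
    "covert": 3.0,
    "informant": 3.0,
    "nuclear": 2.0,
    "intercept": 2.0,
}


def sensitivity_score(text, weights=SENSITIVE_TERMS):
    """Sum the weights of flagged terms, counting every occurrence."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(word, 0.0) for word in words)


def triage(documents, threshold=3.0):
    """Split documents into those routed to human review and those released.

    `documents` maps a document id to its text. The threshold is an assumed
    risk tolerance: lowering it sends more documents to reviewers.
    """
    review, release = [], []
    for doc_id, text in documents.items():
        if sensitivity_score(text) >= threshold:
            review.append(doc_id)
        else:
            release.append(doc_id)
    return review, release


documents = {
    "cable-1": "Report on the covert operation and the informant network.",
    "cable-2": "Schedule for the ambassador's dinner on Friday.",
}
review, release = triage(documents)
```

A simple word list like this would miss sensitive passages that use unlisted vocabulary, which is consistent with the risk-management premise that no screen can catch everything. A working system would more plausibly rely on a statistical classifier trained on previously reviewed documents.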

That the US government would even consider releasing large numbers of sensitive documents, sight unseen, may sound surprising. Yet the current system may already be collapsing under its own weight. Connelly, echoing an argument that many experts on US secrecy have made, says that the rash of illegal leaks that the US government has experienced in recent years is partly a manifestation of a cynicism that has taken root about the government’s perceived lack of transparency. When the government classifies too much information for too long, he says, the irony is that none of it is safe.

“What we need is a system that protects those secrets that are truly sensitive and releases the rest,” he says. “Right now, neither of these goals is being accomplished. Technology has to be part of the solution.”

Exactly how the Declassification Engine team could help the US government is unclear. Today, it is widely assumed by academics who study secrecy that the government must be pursuing its own data-mining research to speed the declassification process. It is also assumed that if this kind of research is taking place it is poorly funded, as most work related to declassification is perceived to be. It is hard to know for sure, though.

Why is that? Connelly pauses, and one can almost hear a drum roll. “The research is all classified.”

To visit Connelly’s Declassification Engine:

Matthew Connelly ’90CC is a professor of history at Columbia. He codirects the Declassification Engine and is the author of Fatal Misconception: The Struggle to Control World Population.
