The Ghost Files

US historians have long complained about gaps in the National Archives. Can big-data analysis show what kinds of information the government is keeping classified?

By David J. Craig. Published Winter 2013-14.

Connelly and his colleagues have so far refrained from doing this kind of research while they evaluate its legal and ethical implications. They have formed a steering committee of historians, computer scientists, and national-security experts that will convene in January to help them decide whether to go ahead with it. If they do proceed, Connelly says, they might rig the technology so that when it produces guesses about what lies beneath a redaction, it excludes names of people and other highly sensitive types of information.

“The last thing we want to do is out the name of a CIA agent,” Connelly says. “Our main goal, even with this kind of research, would be to discover what types of information are getting classified, and why.”

But who is to say what information is safe to disclose? And might historians, by taking it upon themselves to decide this, inadvertently provoke the US government into releasing even less information, leaving them fewer clues to work with?

It is conceivable, according to several Columbia professors, that the US government will tighten its grip on classified information in response to Connelly’s work. They worry that the Declassification Engine, by demonstrating the kind of redaction-cracking capacity that US intelligence experts have long feared foreign spies would develop, might strengthen the hand of federal officials who are inclined to keep a lid on information.

“Those who advance a conservative approach to declassification could say, ‘Look, now there’s this small band of academics who are able to break down our redactions; can you imagine what others are capable of?’” says law professor David Pozen. “My concern would be that government officials might now say, ‘OK, instead of releasing these documents with redactions, we just won’t release them at all.’”

Yet these same Columbia experts say that the US government has for years been quietly taking steps to limit the information it releases, specifically to frustrate attempts to examine its records with data-mining techniques. One of the best things that could result from the Declassification Engine, they say, is that it will provoke debate about when it is justifiable to limit access to federal records as a way of offsetting this perceived risk. That this public conversation will take place soon seems inevitable. The analytic tools that Connelly and his colleagues are developing embody some of government censors’ worst fears about data mining, fears that, according to these Columbia experts, likely contributed to the enormous backlog of declassified documents that inspired Connelly’s work in the first place.

Removing clues

Pozen’s own research has shown that the US government began fighting FOIA requests in court more aggressively in the early 2000s to avert the threat of computer-savvy spycraft. He has found that when FOIA cases go to court, Justice Department lawyers have often argued that documents that look innocuous in isolation ought to remain classified, because if they were analyzed in conjunction with many other documents, vital secrets could be revealed. A hypothetical example goes like this: a document that references a café is released and then analyzed alongside another that references a waiter, another a street, another a city, and another an unnamed CIA informant, until, finally, a computer generates a list of people who could be that informant.

According to Pozen, this sort of hypothetical is plausible but is often treated by courts as a pretext for deferring to the government. “I don’t think judges carefully weigh the validity of this argument in each case, and they often don’t understand the technology that’s involved,” he says. “On top of this, they’re generally inclined to err on the side of caution whenever national-security concerns get raised. The result is they’ve tended to side with the government whenever they hear this argument.”

Pozen has argued in several papers that judges ought to take more time to consider these cases and push the government harder to justify why FOIA requests ought to be rejected on these grounds. But he says there has been little discussion of the issue among legal scholars or the judiciary so far. “It remains a pretty esoteric topic,” he says. “Anything that drums up some discussion about it will be a benefit to the legal community.”

Robert Jervis, a Columbia political-science professor, has for the past ten years chaired the Central Intelligence Agency’s Historical Review Panel, advising the agency on which of its classified materials ought to be prioritized for review and potential release. He adds another twist to the story: CIA officials, he says, worry that the Declassification Engine, by making huge numbers of federal documents drawn from disparate sources available on its website, could enable foreign spies or terrorist groups to conduct more powerful data-mining analyses of the nation’s public record than they could otherwise. Jervis says it is partly to prevent enemies of the United States from data-mining old intelligence reports that CREST, the CIA’s main digital repository for declassified documents, is accessible not on the Internet but only on computer terminals at the National Archives in College Park, Maryland, an inconvenience that has long irritated scholars.

The specter of data mining, Jervis says, could also cause some CIA officials to work more slowly while reviewing documents.

“These guys would love to have the budget that’s necessary for reviewing all the documents that are before them carefully and getting them all out on time,” Jervis says. “But they’re not going to do anything that endangers an agent or his informants. So they’re looking at this technology that’s out there now, and they may say to themselves, ‘We’re going to have to work more scrupulously than ever.’” 
