In the 100 days since President Trump took office, concerned Americans have downloaded over 2 million government datasets. Their goal? To back up information they believe is in danger of going dark: climate science research, housing discrimination reports, gun violence statistics. But public data preservation isn’t just a job for citizens working in university libraries and on digital archiving GitHub repositories. Now, Washington is getting in on the action.
On Thursday, Senators Gary Peters and Cory Gardner introduced a bipartisan bill that would make it much, much harder for any administration to disappear public data. If passed, the Preserving Government Data Act of 2017 would affect the availability of everything from census numbers to sea level rise.
While Congress has been working toward an open government data platform since 2010, the new bill is a clear reaction to the new administration and its attitude toward transparency. Earlier this month Trump announced that he wouldn’t be voluntarily providing White House visitor logs, invoking the Presidential Records Act to conceal the comings and goings at 1600 Pennsylvania Avenue for five years after his last term ends. And last week the Justice Department mounted a legal defense of the US Agriculture Department’s decision to yank a massive set of records detailing animal abuse enforcement.
Those are clear signals that the executive branch isn’t interested in continuing lawmakers’ commitment to public data access, which began with the 2010 Public Online Information Act. That law put all publicly available data on the internet—and last December, the Senate passed a bill to make it easier for a machine to read and extract that data. (The Open Government Data Act is now waiting on a House vote this session.) The Preserving Government Data Act introduced on Thursday lays the third and final brick—ensuring that once a government data set is published online, it can’t be taken down. Well, not easily anyway.
The bill forces federal agencies to give six months’ notice, and a good explanation, if they want to remove publicly available data. That closes a significant loophole: Right now, agency heads can pull data for a variety of reasons, including if they think it’s too costly to maintain or not valuable to the public. “It imposes a sort of accountability tool,” says Aaron Mackey, a legal fellow at the Electronic Frontier Foundation. “Once something is out there, this makes it really hard to make it secret again.” The bill has the backing of the Center for Data Innovation and The Sunlight Foundation, a government data watchdog. Sunlight’s deputy director, Alex Howard, who helped craft the proposed law, sees it as a no-brainer. “Keeping public data available and easy for the public to access isn’t a Republican idea or a Democratic idea,” he says. “It’s an American idea.”
But the bill could just as easily hobble data preservation by overburdening the system. Its definition of “data” is so broad it would include anything created or funded by the federal government, with narrow exceptions for national security and personal privacy. It’s kind of like a rule that says you have to save every school assignment ever. Is your first-grade crayon drawing just as valuable as your college senior thesis? “We don’t want a world where agencies need to provide notice every time they update text on a website,” says Rebecca Williams, an information activist who serves on the board of Legal Hackers and works by day at the Johns Hopkins Center for Government Excellence. “That would create an undue burden.”
The law would also be difficult to apply during major funding disruptions, like the government shutdown that turned off dozens of websites in 2013. Last week, just the threat of a shutdown sent scientists into a frantic data grab that temporarily crashed servers at the Environmental Protection Agency. Preventing such loss of access is the goal of the new bill, but it’s not clear how agencies would comply if the money got turned off.
Which is why independent data-scraping groups aren’t slowing down. Over the past few months, groups like the Environmental Data and Governance Initiative, Data Refuge, and Climate Mirror have pulled together hundreds of activists, computer scientists, and hackers to spend their free time duplicating government datasets. Even though Congress is on the case, the new bill’s implementation could take years, and agencies might still take a loose approach to compliance.
“The bottom line is we shouldn’t rely solely on the federal government to make sure this stuff persists,” says Daniel Schuman, who co-founded the Congressional Data Coalition and worked closely with lawmakers to write the Open Government Data Act. “A formal process supported by the government would be great, but we should also always be keeping lots of copies in lots of different places.”
At the very least, if you’ve been feeling all of the data removal angst with none of the data recovery skills to put it at ease, now you can champion transparency the old-fashioned way: by calling your legislator.