Government data availability and integrity
It is a time for increased vigilance to safeguard the government's social and scientific data resources
In an ideal world, governments collect, evaluate and process data to create evidence with which to assess the state of the natural and social worlds for which they share responsibility. Such data are intended to guide public policy and public investment, to improve government services, and to enable governments to act on behalf of the population in general. The data governments use include demographic and population data; weather, climate, and other environmental data; and economic data, among others.
The United States government is by far the world’s biggest collector of data. Its U.S. Government Open Data portal includes just a minute portion of the data held by the government and its agencies, yet as of 5 January 2025 it held 306,121 datasets. As required by the Open Government Data Act, these datasets conform to a high standard of quality and integrity, sufficient for use as evidence in the establishment of actionable government policy. These data, published for public consumption, are just the tip of the iceberg. Some government data are too sensitive to share openly: think of the data collected by the military, by the Internal Revenue Service, or even recent census data. And some are simply too voluminous: think of the amount of data transmitted from orbiting remote sensing platforms, for example, or the extraordinary amount of data collected by entities such as NOAA or NASA and its research partners.
Of course, data can be exploited or abused for political gain. In the United States, perhaps the best-known and probably the most widely practiced abuse is gerrymandering: the practice of drawing electoral boundaries in a way that provides advantage to a given political party, or even to an incumbent, by minimising the electoral impact of the opposition. In this case the abuse involves census data, which in effect create social profiles of geographic areas, and more specifically electoral data and how they align with the geographic distribution of the population.
In the campaign for the 2024 U.S. presidential election, voters were presented with conflicting assertions from the presidential candidates about issues of concern, and the candidates often cited data to support their claims. While politicians of all stripes often commit errors when referring to data and statistics, many wildly inaccurate and utterly false claims about the state of the U.S. economy and of immigration to the country, allegedly based on data, became a part of the everyday rhetoric of the Republican candidates.
In the context of the campaign, it appears that a near majority of voters accepted these falsifications and distortions of data as part of political speech, or even as fact. As the incoming administration takes the reins of government this month, with an extensive playbook in the form of Project 2025: Presidential Transition Project, data that do not support the proposed changes in government policy and operations become, at the very least, extremely inconvenient.
What, then, can we expect?
We can look back to 2016 for some clues. At that time, there was widespread concern about the suppression of government data, and many independent researchers who require such data for their own work began to systematically download or otherwise capture it for safekeeping. The actual suppression of data at that time turned out, however, to be somewhat limited. For example, in 2017 the White House removed the website that the Obama administration had established to publish executive branch data and so provide greater transparency in the use of public resources.[1] There is also some evidence that during the first Trump administration some health and climate data that did not support White House rhetoric and policy were removed from agency websites.[2] However, the White House did not possess a sufficient level of control over the various agencies of government to actually disrupt access to critical, curated public data.
But as has been widely noted, the Republican administration of 2017 and the incoming administration in 2025 are two different animals. Loyalists who toe the party line and promote its rhetoric will be placed in leadership and management roles throughout agencies that report to the Executive and they will be guided, presumably, by the objectives of Project 2025. Given the incoming administration’s public positions on a broad range of issues, it seems reasonable to expect that existing data that contradict its stance will be suppressed to the extent possible, that “alternative data” held to different data integrity standards are likely to sometimes take their place, and that some agencies will cease to collect data that undermine the administration’s goals: climate and weather, immigration, economic, and education data come to mind.[3] Indeed, some of the executive agencies whose activity is incongruous with administration objectives may cease to exist or be radically downsized over the next four years.[4]
The National Archives plays a role in ensuring compliance with government regulations that prohibit the destruction of government data, but will this be enough, or can the Archives’ role also be undermined?
The best protection against the loss or corruption of data in the coming years is the vigilant monitoring of the federal government’s data collection and curation activities—an activity that is likely to fall to independent actors in the research domain, inside and outside of academe. As with many other things, it may be that the statement often attributed to Thomas Jefferson applies aptly to government data: “eternal vigilance is the price of liberty.”
[1] The site has nevertheless been preserved by the National Archives; see https://open.obamawhitehouse.archives.gov/ (consulted 8 Jan 2025).
[2] Scott Waldman, “Climate Web Pages Erased and Obscured under Trump,” Scientific American (10 January 2018) https://www.scientificamerican.com/article/climate-web-pages-erased-and-obscured-under-trump/; and Charles S. Clark, “Groups Monitoring Agency Website Changes See Deeper Trump Agenda,” Government Executive (27 April 2018) https://www.govexec.com/technology/2018/04/groups-monitoring-agency-website-changes-see-deeper-trump-agenda/147806/ (consulted 8 Jan 2025).
[3] Regarding potential doctoring of economic data, see Paul Krugman, “The Real Threat of Fake Numbers,” Krugman Wonks Out: Substack (10 Jan 2025) https://substack.com/inbox/post/154512702 (consulted 10 Jan 2025).
[4] With regard to NOAA, Project 2025 asserts that the services within NOAA “… form a colossal operation that has become one of the main drivers of the climate change alarm industry and, as such, is harmful to future U.S. prosperity. This industry’s mission emphasis on prediction and management seems designed around the fatal conceit of planning for the unplannable. That is not to say NOAA is useless, but its current organization corrupts its useful functions. It should be broken up and downsized.” Thomas F. Gilman, “Department of Commerce,” Project 2025: Presidential Transition Project (Heritage Foundation, 2023-2024) https://static.project2025.org/2025_MandateForLeadership_FULL.pdf#page=708 (consulted 8 Jan 2025).
This is good and important stuff. I would chime in to say that at the same time, search engines are restricting results. We are getting access to less and less.
I actually did not realize this. I think many Americans would imagine that it's those "socialist European countries" that collect the most data. In fact, I myself imagined that, but without the pejorative outlook.