DOGE, DHS, and data matching
“Data matching” may seem abstract, but its consequences can be life-changing: visa revocation, deportation, sudden cessation of Social Security payments, all without warning or opportunity to present argument or evidence to a human fact-finder.
One of the hallmarks of the new U.S. Department of Government Efficiency (DOGE) is large-scale algorithmic analysis and comparison of existing databases of personally-identified information. In many cases, algorithms, AI, and data matching are being substituted for human judgement as the basis for decisions about individuals. Similar projects are being carried out by the Department of Homeland Security (DHS).
These activities appear likely to violate the Privacy Act (including its rarely-enforced criminal provisions) and/or the Computer Matching and Privacy Protection Act.
DOGE’s programmers are working to aggregate and correlate databases that have been compiled by different agencies or commercial third parties such as social media platforms, identified in different ways, and ingested in different formats.
Data matching is central to the methods of DOGE and the Trump 2.0 Administration. One of President Trump’s Executive Orders to heads of all Federal agencies directs that:
Agency Heads shall take all necessary steps, to the maximum extent consistent with law, to ensure Federal officials designated by the President… have full and prompt access to all unclassified agency records, data, software systems, and information technology systems… This includes authorizing and facilitating both the intra- and inter-agency sharing and consolidation of unclassified agency records.
How is this working out, and what does this say about ID-linked records?
Aggregations of death records created by local health officials over many decades are being cross-tabulated against lists of current recipients of Social Security benefits. The inevitable result has been that an unknown number of living people have had their Social Security entitlement payments cut off without human fact-finding. Weeks later, many are still struggling to prove they aren’t dead and recover the payments they are owed.
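To see why false positives are inevitable, consider a minimal sketch of this kind of match. Everything in it is hypothetical: the records, the field names, and the choice of name plus date of birth as the matching key are assumptions made for illustration. The actual matching criteria have not been disclosed.

```python
from collections import namedtuple

# Hypothetical record layout; real death and benefit files differ.
Person = namedtuple("Person", ["name", "dob", "ssn"])

def normalize(name):
    """Crude name normalization: uppercase, drop periods, collapse spaces."""
    return " ".join(name.upper().replace(".", "").split())

def flag_as_deceased(death_records, beneficiaries):
    """Return beneficiaries whose (name, dob) key appears in the death records.

    No SSN comparison and no human review: a key collision is treated as a match.
    """
    death_keys = {(normalize(d.name), d.dob) for d in death_records}
    return [b for b in beneficiaries if (normalize(b.name), b.dob) in death_keys]

# Made-up data. The SSNs are deliberately invalid placeholders.
death_records = [Person("Maria Garcia", "1950-03-14", "000-00-0001")]
beneficiaries = [
    Person("MARIA GARCIA", "1950-03-14", "000-00-0002"),  # a different, living person
    Person("John Q. Public", "1948-07-01", "000-00-0003"),
]

flagged = flag_as_deceased(death_records, beneficiaries)
for b in flagged:
    print(b.name, b.ssn)
```

In this sketch a living beneficiary who happens to share a name and birth date with a decedent is flagged, and nothing in the process gives her a chance to object before the payments stop.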
Living immigrants have reportedly been coded as “deceased” in Social Security records on the basis of unknown criteria and matching of immigration or other records. Those marked as dead by this data-matching effort include living individuals who obtained Social Security numbers legally and have been working and paying taxes legally. This extra-judicial punishment is intended to make their lives more difficult and induce them to “voluntarily” leave the U.S., regardless of their actual legal status or any claims they might make in answer to any attempt to remove them through legal proceedings.
Keyword searches and analysis of social media posts by U.S. Customs and Border Protection’s National Targeting Center and National Vetting Center (the same CBP components that operate the largest U.S. government blocklist, the extra-judicial no-fly list of a million and a half mostly-Muslim names) have been used to generate blocklists of foreign students that are being used to summarily revoke their visas.
More recently, attempts have been made to match lists of holders of student visas with the incomplete and inaccurate aggregation of Federal, state, and local criminal history records held by the FBI’s National Crime Information Center (NCIC).
Although the FBI has officially acknowledged that the process of compiling NCIC records makes them inherently unreliable, this data matching is being used for algorithmic revocation of student visas. NCIC often includes records of arrests without including records of the disposition of the charges, especially if the charges were dismissed. As a foreseeable result, this data matching is leading to the revocation of visas of many foreign student “criminals” who have never actually been convicted of any crime.
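The failure mode can be sketched in a few lines. The record layout and field names below are hypothetical, not NCIC’s actual schema. The point is only that matching on the existence of a record is not the same as matching on a conviction.

```python
# Hypothetical criminal-history entries. "disposition" is None when the
# outcome of the charge was never reported to the aggregator.
records = [
    {"person_id": "A1", "event": "arrest", "disposition": "convicted"},
    {"person_id": "B2", "event": "arrest", "disposition": "dismissed"},
    {"person_id": "C3", "event": "arrest", "disposition": None},
]
visa_holders = ["A1", "B2", "C3", "D4"]

def naive_flags(visa_holders, records):
    """Flag anyone who has any record at all, ignoring disposition."""
    ids_with_records = {r["person_id"] for r in records}
    return [v for v in visa_holders if v in ids_with_records]

def conviction_flags(visa_holders, records):
    """Flag only those with a recorded conviction."""
    convicted = {r["person_id"] for r in records if r["disposition"] == "convicted"}
    return [v for v in visa_holders if v in convicted]

print(naive_flags(visa_holders, records))       # match on any record
print(conviction_flags(visa_holders, records))  # match on convictions only
```

With this made-up data the naive match flags three visa holders, two of whom have no recorded conviction: one whose charges were dismissed and one whose case outcome is simply missing.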
According to the latest report on these DOGE data matching programs:
DOGE comes in to comb through datasets at all relevant agencies and facilitate data-sharing between agencies so the enforcement actions can be carried out systematically and at-scale, one of [the] administration officials said.
“DOGE is working in all the agencies. We are looking at our books, at people who don’t belong here that Joe Biden allowed in with bogus claims,” said a White House official. “They shouldn’t be receiving federal tax dollars in the form of government benefits.”
In the absence of legally required notices, what we know about these programs comes largely through anecdotal reports and reverse engineering of government practices.
The story of mass revocations of foreign students’ visas based on analysis of social media profiles and postings and attempts to match them with visa records, for example, broke first in the Times of India based on reports from Indian students in the U.S. and Indian-American immigration attorneys.
What we haven’t seen are the notices, reviews, or approvals that are required for this sort of matching of data from disparate agencies, especially when data matching is used as a basis (and in many or most of these cases the sole basis) for automated decisions to deny or revoke Federal benefits ranging from Social Security to visas.
There are good reasons for restrictions on the use of personal records by agencies or for purposes other than those for which they were collected. Records created for other purposes are often, and perhaps even definitionally, not fit for purpose for these new uses.
President Trump’s Executive Order on data silos directs the head of each agency to report to the Office of Management and Budget (OMB) within 30 days as to whether any regulations “including system of records notices” would need to be modified to accommodate the integration of personal data held by all Federal agencies, regardless of the purpose for which it was collected, for use for DOGE’s purposes.
The problem with this is that the Privacy Act requires the promulgation of a System of Records Notice (SORN) disclosing all of the routine uses of records in the system, including the other agencies with which it will be shared, before a system of records is created, new uses are made of the records in the system, or the data is shared with additional agencies.
DOGE has promulgated no new SORNs. So if any of DOGE’s activities have involved the creation of a new system of records, such as through merger or mining of previously existing systems, the operation of that new database constitutes a criminal violation of the Privacy Act on the part of each responsible DOGE official or employee.
Inquiring minds in Congress already want to know what SORNs, if any, provide the basis for DOGE’s data aggregation, data mining, and data analytics across agencies and databases.
On the other hand, what if DOGE hasn’t created any new systems of records? What if it has just matched existing records from different agency “silos” to determine eligibility or ineligibility for Federal benefits?
Those activities would be subject to the Computer Matching and Privacy Protection Act of 1988 (CMPPA), which imposes a different set of notice and oversight requirements intended to protect individuals against exactly what DOGE has done: automated denial or revocation of Federal benefits on the basis of cross-correlation of data collected for other purposes.
As the Congressional Research Service summarizes it in a 2022 report, “the CMPPA requires a written matching agreement between a source agency and a recipient agency prior to the disclosure and matching of records…. A matching agreement is not effective until 30 days after a copy has been sent to the Senate Committee on Homeland Security and Governmental Affairs and the House Committee on Oversight and Reform…. The CMPPA generally requires an agency to assess the costs and benefits of a proposed matching program before approving a matching agreement.” Each agency must establish a Data Integrity Board to conduct this prior review and cost-benefit analysis, and must publish an annual inventory in the Federal Register of its data matching programs.
OMB guidelines also require each agency to post its data matching agreements on its public website, but DOGE has posted none. There’s been no public indication that DOGE has submitted any matching agreements to Congress, much less that it has waited 30 days after submitting them or completed any cost-benefit analyses before starting work.
We don’t expect that the staff of DOGE will suffer any consequences for what seem likely to have been ongoing criminal violations of the Privacy Act and flagrant violations of the CMPPA.
We do hope, though, that the public will learn a lesson from these violations of individuals’ rights and the impunity with which they have been conducted.
The potential for future abuse of personally identified data for future malign purposes by a future malign administration inheres in the collection of such data, even when it is being collected for benign purposes by well-meaning officials and agency staff, under policies at the time of collection that restrict how it can be used and shared.
Silos and segregation can limit the danger of data matching, but these are policy barriers that can be changed at the stroke of a pen. DOGE is showing us that current privacy, data protection, usage, or sharing policies cannot protect against future abuse.
The only way to prevent future misuse of personal data is not to collect it. And to the extent that, as we’ve pointed out before, identification is the enabler of ID-linked record keeping, the best pathway to “Do Not Collect” is “Do Not Identify.”