The dangers of identity databases
After a white-supremacist hacker got access to digital records of old applications for admission to Columbia College and passed them on to the New York Times, the Times reported that in 2009 Zohran Mamdani checked boxes identifying his national origin and ancestry as African-American (he is a US citizen — an American — and was born in Uganda — the country in Africa where his paternal ancestors had lived for generations) and Asian (his mother was born in India, where her ancestors had lived for generations).
Questions have since been raised by Liam Scott in the Columbia Journalism Review and others as to the use in a news story of data obtained illegally by an unnamed third party with an unmentioned political bias, and by Dan Froomkin at Presswatchers.org and others as to whether these truthful factual 2009 statements by Mr. Mamdani are newsworthy. The Times has responded to some of these questions in a follow-up article.
But we have other questions that we haven’t seen asked elsewhere. These are questions not for Mr. Mamdani or the Times but for Columbia University:
- Why does Columbia still have this information about an unsuccessful 2009 applicant for admission in its records?
- Even if there was some reason to retain these records, why were they accessible online, with or without whatever passwords or access restrictions were circumvented by a hacker to obtain them?
- Now that this incident has made the potential for misuse of records like this apparent, what are Columbia and other institutions and entities with similarly dangerous data doing to expunge it?
At a time when naturalized US citizens, including but not limited to Mr. Mamdani, are being threatened with denaturalization followed by detention and/or expulsion to overseas death camps, and when pogroms are being carried out by masked armed gangs snatching people off the streets on the basis of perceptions of national origin, these are questions for anyone in charge of a database with a field for citizenship, race, or national origin.
Mr. Mamdani had good reason to apply to Columbia, even if his application may have been a long shot, since as the child of a tenured Columbia professor he would have been entitled to free tuition if admitted. But whatever purpose Columbia may have had in 2009 for asking applicants for admission to its colleges to categorize their national origin by continent, that purpose was completed when Mr. Mamdani’s application was rejected.
The lesson of this teachable moment is that personally identified information, even information about attributes and activities that were lawful at the time and that was collected for innocent purposes, can be weaponized against innocent individuals, sometimes by unforeseen actors in unforeseen ways, for as long as it is retained. It’s happened before in the US, as when census data on national origin was used to round up Japanese-Americans and send them to concentration camps, and could happen again as long as data like this is collected and retained.
Columbia may claim that it retained this data in case it might have needed it to defend against potential litigation by unsuccessful applicants. But the statute of limitations for any such litigation related to 2009 admission decisions would have passed years ago.
Columbia may claim that, having collected this data, it retained it for research purposes. But there’s been no indication that it made any attempt to even semi-anonymize this data. And would possible future research use justify retention of information that could endanger past applicants for admission?
Under Canadian or European data privacy law, retaining this data when it was no longer needed for the purposes for which it was collected would be illegal.
This data was collected for the purpose of making admissions decisions in 2009. If there was some adequate justification for retaining this data for possible future use when it was unquestionably no longer useful for that original purpose (which we doubt there was), it could have been stored on an air-gapped device or medium, such as a backup tape or disk locked in an archival vault.
But even that would pose the danger of government-compelled disclosure.
Imagine that you were the director of a business or institution in Germany in 1933. Imagine that, at a time when German Jews still had all the rights of German citizens, you had compiled information about your employees’, students’, customers’, or suppliers’ “nationality” or “race” as indicated on their ID cards, including which Germans were identified as Jewish.
When the government began to redefine German Jews as not German citizens, deny them rights, and exclude them from more and more categories of employment, wouldn’t it have been your moral duty to expunge those records identifying Jews? Then you could truthfully say, if the government demanded to know which of your employees were (under Nazi laws) illegally employed non-citizens, that you had no records of who was a Jew.
The best way to avoid misuse of personal data is not to collect it. If it has been collected, and especially if it is no longer needed for the purpose for which it was collected, the best way to mitigate the risk to the individuals to whom it pertains is to expunge it.
Columbia has no excuse. Nor do other institutions in the same position. No law required Columbia to collect this information about Mr. Mamdani and an untold number of others. No law requires Columbia to retain it. Now Columbia knows, as it should have known all along, how this information can be weaponized.
Columbia and its peers in both the public and private sectors should expunge these records now, before even more damage is done to Mr. Mamdani and millions of other naturalized US citizens and other immigrants.