A Short History Lesson: In August 2006 AOL offered a database containing 20 million search queries by roughly 650,000 AOL users for download on its website. Although AOL removed the data shortly afterwards, it had already found its way onto the net and has stayed there ever since. This not only proved to be a PR disaster (the ‘Data Valdez’ case) but also triggered an interesting legal dispute (Does v. AOL LLC, Case No. C06-5866 SBA (N.D. Cal. June 22, 2010)).
Although the AOL users had been assigned random numbers to protect their identities, it took reporters of the New York Times less than a month to identify at least one user solely on the basis of that user's search queries:
“No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.”
And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.”
It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs.”
Although most users have already accepted that “online anonymity” is a mistaken concept that does not exist under normal circumstances, the case of Thelma Arnold may serve as a textbook example: if you have enough small pieces of information, you will sooner or later be able to put them together and see the big picture.
For more information on this issue, please see my earlier and quite extensive post: How Much Information Does A Search Query Reveal About A User?
(And if you enjoy doing this on a regular basis, eh… here is a video that might help you stop.)