The Netflix Geeked account tweeted out a quick GIF of Ed that shows a little of what fans should expect going forward. The tweet shows that they really captured the vibe of the character and the show to a tee. Ed is prancing around against a dark-looking background with decrepit childlike decorations as well as lights. The official quote reads, “We really take our time to get to know our characters. And Ed is a very complicated character in all of the good and right ways. And I wanted to make sure that, fingers crossed, we get to a season 2 that we really have the appropriate amount of time to explore the character of Ed.” The show went its own route, which is why they chose not to keep everything 1 to 1 accurate with the original. Ed clearly has a very limited role in the Cowboy Bebop live-action show, but if they do end up getting a season 2, then fans will be able to see more of the character.
Someone with access to an anonymous dataset of telephone records, for example, might partially de-anonymize it by correlating it with a catalog merchant's telephone order database. Or Amazon's online book reviews could be the key to partially de-anonymizing a public database of credit card purchases, or a larger database of anonymous book reviews. Google, with its database of users' internet searches, could easily de-anonymize a public database of internet purchases, or zero in on searches of medical terms to de-anonymize a public health database. Merchants who maintain detailed customer and purchase information could use their data to partially de-anonymize any large search engine's data, if it were released in an anonymized form. A data broker holding databases of several companies might be able to de-anonymize most of the records in those databases.
What the University of Texas researchers demonstrate is that this process isn't hard, and doesn't require a lot of data. It turns out that if you eliminate the top 100 movies everyone watches, our movie-watching habits are all pretty individual. This would certainly hold true for our book reading habits, our internet shopping habits, our telephone habits and our web searching habits.
The obvious countermeasures for this are, sadly, inadequate. Netflix could have randomized its dataset by removing a subset of the data, changing the timestamps or adding deliberate errors into the unique ID numbers it used to replace the names. It turns out, though, that this only makes the problem slightly harder. Narayanan and Shmatikov's de-anonymization algorithm is surprisingly robust, and works with partial data, data that has been perturbed, even data with errors in it. With only eight movie ratings (of which two may be completely wrong), and dates that may be up to two weeks in error, they can uniquely identify 99 percent of the records in the dataset. After that, all they need is a little bit of identifiable data: from the IMDb, from your blog, from anywhere. The moral is that it takes only a small named database for someone to pry the anonymity off a much larger anonymous database.
Other research reaches the same conclusion. Using public anonymous data from the 1990 census, Latanya Sweeney found that 87 percent of the population in the United States, 216 million of 248 million, could likely be uniquely identified by their five-digit ZIP code, combined with their gender and date of birth. [...] population is likely identifiable by gender, date of birth and the city, town or municipality in which the person resides. Expanding the geographic scope to an entire county reduces that to a still-significant 18 percent. "In general," the researchers wrote, "few characteristics are needed to uniquely identify a person." Stanford University researchers reported similar results using 2000 census data. It turns out that date of birth, which (unlike birthday month and day alone) sorts people into thousands of different buckets, is incredibly valuable in disambiguating people.
This has profound implications for releasing anonymous data. On one hand, anonymous data is an enormous boon for researchers: AOL did a good thing when it released its anonymous dataset for research purposes, and it's sad that the CTO resigned and an entire research team was fired after the public outcry. Large anonymous databases of medical data are enormously valuable to society: for large-scale pharmacology studies, long-term follow-up studies and so on. Even anonymous telephone data makes for fascinating research.
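To make the correlation idea in this piece concrete, here is a minimal sketch of how a small named dataset might be matched against a larger "anonymous" one using ZIP code, gender and date of birth. Everything in it is an assumption made for illustration: the records are invented, and the names `unique_fraction`, `link` and `QUASI_IDENTIFIERS` are hypothetical. It is not the method used by any of the researchers mentioned, and unlike their work it uses exact matching only, with no tolerance for errors or perturbed data.

```python
from collections import defaultdict

# Hypothetical "anonymous" records: names removed, other attributes kept.
anonymous_records = [
    {"zip": "78701", "gender": "F", "dob": "1970-03-14", "diagnosis": "asthma"},
    {"zip": "78701", "gender": "M", "dob": "1982-11-02", "diagnosis": "diabetes"},
    {"zip": "78705", "gender": "F", "dob": "1970-03-14", "diagnosis": "migraine"},
    {"zip": "78705", "gender": "M", "dob": "1990-07-21", "diagnosis": "allergy"},
    {"zip": "78705", "gender": "M", "dob": "1990-07-21", "diagnosis": "fracture"},
]

# Hypothetical small named dataset (for example, a public membership list).
named_records = [
    {"name": "Alice Example", "zip": "78701", "gender": "F", "dob": "1970-03-14"},
    {"name": "Bob Example", "zip": "78705", "gender": "M", "dob": "1990-07-21"},
]

# Attributes assumed to be present in both datasets.
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def key_of(record, fields=QUASI_IDENTIFIERS):
    """Build the tuple of attribute values used to compare records."""
    return tuple(record[f] for f in fields)


def unique_fraction(records, fields=QUASI_IDENTIFIERS):
    """Fraction of records whose attribute combination occurs exactly once."""
    counts = defaultdict(int)
    for r in records:
        counts[key_of(r, fields)] += 1
    unique = sum(1 for r in records if counts[key_of(r, fields)] == 1)
    return unique / len(records)


def link(named, anonymous, fields=QUASI_IDENTIFIERS):
    """Attach a name to each anonymous record that matches unambiguously."""
    index = defaultdict(list)
    for r in anonymous:
        index[key_of(r, fields)].append(r)
    matches = {}
    for person in named:
        candidates = index.get(key_of(person, fields), [])
        if len(candidates) == 1:  # ambiguous matches are left alone
            matches[person["name"]] = candidates[0]
    return matches


if __name__ == "__main__":
    print(unique_fraction(anonymous_records))             # all three attributes
    print(unique_fraction(anonymous_records, ("gender",)))  # gender alone
    print(link(named_records, anonymous_records))
```

In this toy data, the combination of all three attributes singles out most records while gender alone singles out none, and only the named person whose combination is unique gets linked. Real datasets would need fuzzy matching to cope with the kinds of errors and perturbations the text describes.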