Researchers have often released huge amounts of data collected from apps and social media online for their studies. When confronted with ethical and privacy concerns, they almost invariably fall back on the claim that the data was already public. The most recent instance came when a team of researchers from Denmark publicly released a dataset of 70,000 users of the dating site OkCupid. It includes not only basic demographics such as username, age, gender and location but also private information such as sexual orientation and answers to many of the intimate profiling questions the site asks users in order to find them the most suitable matches. When asked whether he had made any effort to anonymize the data, Emil O. W. Kirkegaard, who led the team, simply replied that the data was already public.
While it is true that the researchers were trying to advance the understanding of user behavior by analyzing users' sensitive data, this does not resolve the ethical dilemma of releasing such data publicly. The data was, admittedly, already accessible, but only to those who had created an OkCupid profile with the intention of using the service. The researchers' documentation reveals that they initially built a bot to scrape profile data from OkCupid, but it collected only the profiles that OkCupid "suggested" to the account, and they feared this would bias the data. This indicates that the researchers did in fact create a profile to obtain the data and then released it to the public. The data was meant to be visible only to members of the OkCupid community, so it was not "public" as the researchers claimed. The method they ultimately used to collect the data remains unclear.
Their method of obtaining this huge amount of data clearly violates several established principles of research ethics, such as protecting the privacy of subjects, obtaining their consent before sharing their data publicly, and minimizing harm to the people being studied. Releasing user data in this way is not new, however. In 2008, a research team at Harvard released a dataset comprising four years' worth of Facebook profile data from 1,700 of their fellow students. In 2010, Pete Warden, a former Apple employee, exploited a loophole in Facebook to collect around 100 GB of user data, comprising 215 million Facebook profile names along with their fan pages and friend lists, and announced that he would release it publicly for academic research. In both instances, the researchers claimed that the data was already public.
While Kirkegaard currently faces heavy criticism over the ethics of his data-collection methods, Big Data scholars and researchers need a better shared understanding of how to keep their own processes within ethical limits. The problem is that there is no clearly defined line separating ethical methods from unethical or unjustified ones. Communities of scholars and researchers must be formed to address this issue head-on. Having access to data about subjects through social media does not give researchers the right to publish that data. The Harvard dataset mentioned above is no longer publicly accessible, and Pete Warden deleted the data he had planned to release. In light of the criticism, Kirkegaard has also removed the OkCupid data from the public repository. But this does not diminish the urgency of addressing the ethical issues in data science, especially now that data about users is so readily available.
These measures should not aim to discourage scientists' work; rather, they should establish a common platform and set out rules that researchers must abide by to protect the privacy and interests of their study subjects. There should be peer-review forums to which researchers submit their data-collection methods for review by their fellow researchers. Such forums also need active participation from Internet research ethics specialists. Forums and platforms of this kind already exist, but their influence and authority need a serious boost in light of unethical research practices like these.