The right to be forgotten

 

Abstract:

Nowadays, our social life is largely dominated by information. Every day, we take part in a wide network of data exchange, much of which is personal in nature. However, some people have been stigmatized, or simply want to get rid of their digital footprint, and this is where the problem arises: completely erasing personal data from the web, or from any other medium, is unmanageable unless the provider itself does it. A European Union (EU) proposal aims to address this problem by granting a right to be forgotten to individuals who request it.

Objectives:

The vast number of information sources accessible through search engines such as Google, together with the rise of social media such as Facebook and Twitter, makes one’s digital past easy to retrieve. This may raise ethical issues. The right to be forgotten (RTBF) is part of the broader idea of personal data protection: an individual who wants personal data erased from the web, or from any other medium, is entitled to ask the provider to rectify, erase or block the unwanted information concerning him or her. The project will focus mainly on the RTBF for past activity, with an emphasis on ethics, security, public interest, freedom to know, and freedom of expression.

Deliverables:

  1. The RTBF-Bot implementation (probably in R or Python).
  2. A synthesis report on the issue, covering definitions, evolution, technologies and challenges, ethics, public interest, and freedom of expression.

Methodology:

1. What will the RTBF-Bot do?

The bot will gather a large number of tweets about the “Right to be Forgotten” and analyse them using sentiment analysis to find out what Twitter users think about the issue. Sentiment analysis uses natural language processing, statistics, and machine learning techniques to determine whether a text is positive, neutral, or negative.
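The following sketch illustrates the three-way labelling. It uses a simple lexicon-based approach rather than the machine learning techniques mentioned above: it counts positive and negative words in an already tokenised, lower-cased tweet. The word lists `POSITIVE` and `NEGATIVE` are hypothetical placeholders, not a published sentiment lexicon, and `classify` and `summarise` are illustrative names.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only; a real run would use a
# published sentiment lexicon or a trained classifier instead.
POSITIVE = {"good", "great", "fair", "support", "protect", ":)"}
NEGATIVE = {"bad", "censorship", "dangerous", "unfair", "abuse", ":("}
LABELS = ("positive", "neutral", "negative")


def classify(tokens):
    """Label a tokenised, lower-cased tweet by counting lexicon hits."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarise(tweets):
    """Return the share of each label over a collection of tokenised tweets."""
    counts = Counter(classify(tokens) for tokens in tweets)
    total = sum(counts.values())
    if total == 0:
        # No tweets collected yet: report zero for every label.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

For example, `summarise([["great", "ruling"], ["pure", "censorship"]])` gives a share of 0.5 each for positive and negative and 0.0 for neutral.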

To collect the tweets, we will use the Twitter Search API to retrieve tweets relevant to the topic. Unfortunately, this API only returns tweets from the past week, so to build a large database of tweets we will download new tweets every week.
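Each weekly search returns only recent tweets, and consecutive downloads may overlap, so the weekly batches have to be merged without duplicates. Below is a minimal sketch, assuming each downloaded tweet is a dictionary with an `id` field and that the collection is kept in a local JSON file. The file layout and the function names are hypothetical, and the API call itself is omitted.

```python
import json
import os
from typing import Dict, Iterable


def load_store(path: str) -> Dict[str, dict]:
    """Load previously collected tweets, keyed on their `id` field."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return {tweet["id"]: tweet for tweet in json.load(f)}


def merge_weekly_batch(store: Dict[str, dict], batch: Iterable[dict]) -> int:
    """Add one week's download to the store, skipping tweets already seen.

    Consecutive weekly searches can overlap, so keying on the tweet id
    ensures each tweet is stored once. Returns the number of new tweets.
    """
    added = 0
    for tweet in batch:
        if tweet["id"] not in store:
            store[tweet["id"]] = tweet
            added += 1
    return added


def save_store(store: Dict[str, dict], path: str) -> None:
    """Write the accumulated tweets back to disk as a JSON list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(store.values()), f, ensure_ascii=False)
```

A weekly run would then load the store, merge the new batch, and save it again. The return value of `merge_weekly_batch` shows how much genuinely new material each week contributed.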

The major fields of a tweet are the following:

  • text: the actual text of the tweet
  • favouriteCount: the number of times the tweet has been favourited
  • created: the date of creation
  • id: the tweet identifier
  • statusSource: the client application used to post the tweet
  • screenName: the screen name of the user who posted the tweet
  • retweetCount: the number of times the tweet has been retweeted
  • longitude, latitude: the geographic location of the tweet, if available
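The fields above map naturally onto a small record type. The sketch below is a hypothetical Python representation. Its attribute names mirror the list, which is why they use camelCase. The `weight` method is an assumed way of letting favourites and retweets count towards the analysis, not a scheme fixed by this proposal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Tweet:
    """One collected tweet; attribute names mirror the field list above."""
    id: str
    text: str
    created: datetime
    screenName: str
    statusSource: str
    favouriteCount: int = 0
    retweetCount: int = 0
    longitude: Optional[float] = None  # geotags are optional and often missing
    latitude: Optional[float] = None

    def has_location(self) -> bool:
        """True only if the tweet carries both coordinates."""
        return self.longitude is not None and self.latitude is not None

    def weight(self) -> int:
        """Hypothetical engagement weight: the tweet itself plus every
        favourite and retweet, so widely shared opinions count more."""
        return 1 + self.favouriteCount + self.retweetCount
```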

Our analysis will take into account each tweet’s text and its numbers of favourites and retweets. After collection, the tweets must be pre-processed before any analysis. Pre-processing consists of extracting the text from each tweet and tokenising it, that is, breaking it into words or phrases. Emoticons should also be taken into account, as they express positive or negative emotions. Finally, as the picture below shows, tweeted URLs are an important part of the analysis, since the content they point to can strongly reflect the users’ emotions. After pre-processing, the RTBF-Bot will estimate the sentiment of every tweet and classify it as positive, neutral, or negative.

Screenshot 2015-12-02 19.16.02
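The pre-processing step can be sketched with a single regular expression that keeps URLs and emoticons as whole tokens. The patterns below are assumptions for illustration. They cover only `http(s)` links and a small set of Western emoticons such as `:)`, `;-(` and `:D`, and a real tweet corpus would need a fuller emoticon list.

```python
import re

# Hypothetical patterns for illustration: a simple URL matcher, a small set of
# Western emoticons (e.g. :) ;-( =D :P), and words, hashtags and mentions.
# The (?!\w) lookahead stops ":P" being read as an emoticon inside "RTBF:Pros".
TOKEN_RE = re.compile(
    r"(?P<url>https?://\S+)"
    r"|(?P<emoticon>[:;=]-?[()DPp](?!\w))"
    r"|(?P<word>[#@]?\w+(?:'\w+)?)"
)


def tokenise(text):
    """Split tweet text into (kind, token) pairs.

    URLs and emoticons are kept verbatim (lower-casing would turn ':D' into
    ':d'); ordinary words, hashtags and mentions are lower-cased.
    """
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        token = match.group(kind)
        tokens.append((kind, token.lower() if kind == "word" else token))
    return tokens
```

Returning the token kind alongside each token makes it easy to route URLs to a separate step that fetches and analyses the linked content, and emoticons to a sentiment lexicon.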

2. Focus of the theoretical report

The theoretical report will focus on the definition of the RTBF and its evolution to date. It will give examples of people who requested the erasure of personal information, along with the outcome of these requests. It will also discuss the impact of new technologies and social media (e.g. Facebook and Twitter) on information sharing, data privacy, and data protection. In addition, the report will present various challenges related to the RTBF, mainly concerning ethics, public interest, freedom of expression, and security.

3. What does the law say, and how can the information be identified?

So far, there is no global framework that allows individuals to control online information about themselves. In May 2014, the Court of Justice of the European Union ruled that individuals have the right to ask search engines to remove links to personal information about them when that information is inaccurate, inadequate, irrelevant, or excessive [1]. However, the court also made clear that the RTBF has limits, since it can conflict with freedom of expression [2]. For that reason, requests must be assessed case by case. Google was the first company to form a committee of experts, responsible for evaluating and acting on the RTBF requests it receives [3,4]. Most of these requests come from individuals, and numerous examples are given in [5]. According to Google, the sites from which the most URLs were removed are www.facebook.com and www.profileengine.com [5].

Milestones

  • weeks 1 to 2: skeleton of the RTBF-Bot
  • weeks 3 to 5: software development of the RTBF-Bot
  • week 6: testing of the RTBF-Bot
  • weeks 7 to 9: bibliographical research
  • weeks 10 to 11: data analysis
  • weeks 12 to 13: writing of the report
  • week 14: presentation

Throughout the semester, new tweets will be downloaded every week.

References

  1. http://ec.europa.eu/justice/dataprotection/files/factsheets/factsheet_data_protection_en.pdf
  2. “We have no right to be forgotten online”
  3. https://support.google.com/legal/contact/lr_eudpa?product=websearch
  4. “How Google determined our right to be forgotten”
  5. http://www.google.com/transparencyreport/removals/europeprivacy/?hl=en-US