The Right to be Forgotten, Progress Report 1

Our project consists of two parts. The first part is a theoretical report on the topic of the “Right To Be Forgotten” (RTBF). It will emphasize matters of ethics, security, public interest, freedom to know, and freedom of expression. The second part is more practical: we will perform a sentiment analysis on tweets in order to find out how people on Twitter [1] think about the RTBF issue. This part of the report is intended to present the skeleton of the RTBF-bot we are planning to design and implement. We will deviate from the milestones we set after the assignment of the project: we will work on the two parts simultaneously, so that we have better control over the progress of both the report and the RTBF-bot.
In the first weeks of the project, our goal is to define what we will include in the theoretical report and to present the skeleton of the RTBF-bot. For the first part, we briefly give some information on the RTBF and outline what the theoretical report will contain. For the second part, we present the skeleton of the RTBF-bot and the tools we are planning to use.

Theoretical Report

The RTBF has been put into practice in the European Union (EU) since 2006, and an analytical definition is provided in [2]. On 15 December 2015, the European Parliament, the Council and the Commission reached agreement on the new data protection rules, establishing a modern and harmonized data protection framework across the EU [3].

To better understand the RTBF, we should first distinguish three concepts, namely:

  • data protection
  • privacy
  • identity

We will present the analytical definitions of the aforementioned terms in our full report. Here, we just mention that the right to identity concerns the correct image that one wants to project in society. The RTBF, as the right for individuals to have information about themselves deleted after a certain period of time, not only concerns the fundamental identity interest, but also enables individuals to be different from their “past self”. The RTBF has a wider scope of application than the right to privacy. The relation between the aforementioned concepts is depicted in the following figure. In our report, we will try to define properly and explain the difference between data protection, privacy and identity and present how they are related to the RTBF. We will also provide examples of real cases related to the RTBF.

[Figure: personality_rights]

RTBF-bot

First of all, we will use the Twitter Search API to collect tweets relevant to the RTBF. Tweets are short messages, up to 140 characters in length. Due to this limitation, people use acronyms, emoticons, and other characters that express special meaning. As a second step, we will perform sentiment analysis [4] to extract polarity from the tweets we collect. To achieve this goal, we will follow the lexicon-based approach, which involves calculating the polarity of a text from the semantic orientation of the words or phrases in it. Hence, using a decent lexicon plays an important role in determining the correct polarity of a message.

After researching online, we decided to use the following two lexicons and combine their results:

  1. NRC Word-Emotion Association Lexicon of Saif Mohammad and Peter Turney [5]. These researchers have built a lexicon containing a large number of words with associated scores for eight different emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) and two sentiments (positive/negative). Each word in the lexicon has a “one” or “zero” for each of the emotions and sentiments.

  2. AFINN Lexicon of Finn Årup Nielsen [6]. This is a list of words associated with a valence between minus five (very negative) and plus five (very positive). The valence of the tweet is calculated as the sum of the valences of the individual words in the tweet.
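
The two scoring schemes described above can be sketched as follows. This is only a minimal illustration under stated assumptions: `TOY_AFINN` and `TOY_NRC` are hypothetical hand-written excerpts rather than the real lexicon files, the tokenizer is deliberately naive, and the function names are ours. How the two results will be combined is not decided here, so the sketch simply reports them side by side.

```python
import re

# Hypothetical toy excerpt in AFINN style: word -> integer valence in [-5, 5].
# The real lexicon [6] is much larger; these values are illustrative only.
TOY_AFINN = {"good": 3, "bad": -3, "love": 3, "hate": -3, "threat": -2}

# Hypothetical toy excerpt in NRC style: word -> {category: 0 or 1}.
# The real lexicon [5] covers eight emotions and two sentiments per word.
TOY_NRC = {
    "love": {"positive": 1, "negative": 0, "joy": 1},
    "threat": {"positive": 0, "negative": 1, "fear": 1, "anger": 1},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def afinn_valence(text, lexicon=TOY_AFINN):
    """Sum the valences of the individual words; unknown words count as 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def nrc_counts(text, lexicon=TOY_NRC):
    """Count how many words in the text are flagged for each category."""
    counts = {}
    for token in tokenize(text):
        for category, flag in lexicon.get(token, {}).items():
            counts[category] = counts.get(category, 0) + flag
    return counts


tweet = "I love privacy but hate this threat"
print(afinn_valence(tweet))
print(nrc_counts(tweet))
```

With the toy entries above, the valence of the example tweet is the sum 3 + (-3) + (-2), while the NRC-style counts show which emotions the matched words are flagged for.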

Apart from words, tweets often contain emoticons, i.e. sequences of printable characters such as :-) or ^_^. These sequences are intended to represent a human facial expression and convey an emotion. Emoticons are not only full of meaning by themselves, but can also change the meaning of the sentences they are appended to. To create an emoticon lexicon, we will use the list from this website http://apps.timwhitlock.info/emoji/tables/unicode and manually annotate each entry as positive, neutral or negative. Last but not least, acronyms are widely used in tweets, so we will crawl the website noslang.com in order to create a dictionary of acronyms. For example, “lol” has the translation “laughing out loud”.
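
The preprocessing implied by this paragraph might look like the sketch below. It assumes the acronym dictionary and the annotated emoticon list have already been built; `TOY_ACRONYMS`, `TOY_EMOTICONS` and `preprocess` are hypothetical names with hand-written entries, not data taken from noslang.com or from the emoji table.

```python
# Hypothetical excerpt of the acronym dictionary: acronym -> expansion.
TOY_ACRONYMS = {"lol": "laughing out loud", "imo": "in my opinion"}

# Hypothetical excerpt of the manually annotated emoticon lexicon.
TOY_EMOTICONS = {":)": "positive", "^_^": "positive", ":(": "negative", ":|": "neutral"}


def preprocess(tweet):
    """Split a tweet on whitespace, expand acronyms, and pull out emoticons.

    Returns the cleaned text and the list of emoticon labels found.
    """
    words = []
    emoticon_labels = []
    for token in tweet.split():
        # Check emoticons first, before any punctuation stripping destroys them.
        if token in TOY_EMOTICONS:
            emoticon_labels.append(TOY_EMOTICONS[token])
            continue
        # Look up acronyms case-insensitively, ignoring trailing punctuation.
        key = token.lower().strip(".,!?")
        words.append(TOY_ACRONYMS.get(key, token))
    return " ".join(words), emoticon_labels


text, labels = preprocess("lol the RTBF is great ^_^")
print(text)
print(labels)
```

Keeping the emoticon labels separate from the expanded text leaves open how they will later be weighted against the word-level polarity.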

Team:

Giannakopoulos Athanasios, Kyritsis Georgios, Zhang Fuzhi, Zhong Hua

References

[1]. https://en.wikipedia.org/wiki/Twitter

[2]. https://en.wikipedia.org/wiki/Right_to_be_forgotten

[3]. Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data. EU Directive 1995, http://ec.europa.eu/justice/data-protection/reform/index_en.htm

[4]. http://arxiv.org/pdf/1507.00955.pdf

[5]. http://saifmohammad.com/WebPages/lexicons.html

[6]. http://neuro.imm.dtu.dk/wiki/AFINN