Online Survey on the RTBF
For the theoretical report, we decided to deviate from the milestones presented in our first blog post and run an online survey to gauge public awareness of, and opinions on, the “Right to be Forgotten” (RTBF). We address the EPFL community and conduct the survey via the surveymonkey.com platform.
The first step in planning our survey is to determine its objectives and design the corresponding questions. We also keep the survey as short as possible, because surveys that are too long may suffer from a reduced response rate and/or biased results.
First of all, we aim to determine public awareness of the RTBF. We therefore add a question asking whether the respondent has already heard of it. The results are depicted below:
Surprisingly, more than 45% of respondents are already aware of the RTBF, although action on this issue has been taken only during the last 3-4 years.
To help respondents who were not aware of the RTBF answer the rest of the questions, we provide a short definition. Then, we ask questions to find out how people relate the RTBF to the right to privacy / data protection and to the freedom of expression.
At this point, we would like to draw the dividing line between the right to privacy / data protection and the freedom of expression. The former is defined from the point of view of the person who is mentioned in an article, whereas the latter is defined from the point of view of the person who wrote / published the article. On the one hand, if the article content concerns strictly the personal life of somebody (e.g. murder of a family member, personal default, etc.) and, at the same time, the incident described has only a minor impact on society, then any request for article removal falls under privacy / data protection. On the other hand, if the incident described in the article has a stronger impact on society (e.g. child abuse, tax evasion by public officials, etc.), then the freedom of expression should be taken into account and the removal request should most probably be rejected.
We give several request examples regarding privacy and data protection and ask users to tell us if they would approve or deny them.
Then, we repeat the same procedure for requests regarding the freedom of expression.
Finally, we ask people which information they think should be removed from search engines.
As already mentioned in previous posts, sentiment analysis means extracting the opinions, sentiments, and emotions expressed in a given text. One application is to extract the attitudes and opinions of people on Twitter about a specific topic. In our case, we try to find out what people on Twitter think about “The Right to Be Forgotten”.
Details about the lexicons used and about how we preprocess the text of the tweets before analyzing them can be found in our previous blog posts.
We use the R programming language to implement our RTBF Bot. We present and comment on various data mining results, such as:
- tweets by day
- sentiment analysis
Each of the aforementioned tasks is defined and analyzed in the rest of the blogpost.
Tweets by day
We make a basic histogram showing the distribution of the number of tweets over time.
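As a rough illustration of the counting step behind such a histogram, here is a minimal sketch. Our bot is written in R, so this Python version is not our actual code; the timestamps, the timestamp format and the function name `tweets_per_day` are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweet timestamps, invented for this example (not our data).
timestamps = [
    "2020-01-01 09:15:00",
    "2020-01-01 18:40:12",
    "2020-01-02 07:05:33",
    "2020-01-04 11:00:00",
    "2020-01-04 11:30:00",
    "2020-01-04 23:59:59",
]


def tweets_per_day(stamps):
    """Count tweets per calendar day and return (day, count) pairs in date order."""
    days = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S").date() for s in stamps]
    return sorted(Counter(days).items())


daily_counts = tweets_per_day(timestamps)

# Print a crude text histogram: one '#' per tweet.
for day, n in daily_counts:
    print(day.isoformat(), "#" * n)
```

Days without any tweet simply do not appear in the result; a real plot would probably want to fill those in with zero.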
We notice peaks at the beginning and at the end of March. After some research, we find that the first peak is related to actions taken by Google. More concretely, Google said that it would change how it applies the so-called right to be forgotten to online searches made in Europe. Until then, Google had been deleting certain results only from searches made on google.de, google.fr, google.co.uk and other European domains. Google now also uses geo-location signals (like IP addresses) to restrict access to the deleted URL on all Google Search domains, including google.com, when accessed from the country of the person requesting the removal. The second peak is related to France fining Google over the ‘right to be forgotten’ on March 24. More concretely, France accused Google of not scrubbing web search results widely enough in response to a European privacy ruling.
We also look at the sentiment scores for eight emotions from the NRC Word-Emotion Association Lexicon.
The number shown for each bar of the histogram is the number of words associated with that emotion across all the tweets and URLs we analyze. Two emotions dominate, namely trust and fear. Tweets that contribute to the emotion of trust strongly support the RTBF and include words like “achieve” and “advocate”. Tweets that contribute to the emotion of fear include words like “abandon” and “abuse”.
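The word counting behind such a histogram can be sketched as follows. This is a Python illustration rather than our R code, and the lexicon is a toy stand-in: it contains only the four example words mentioned above, mapped to the emotion named in the text, whereas the real NRC lexicon covers eight emotions and can associate one word with several of them. The sample tweets and the function name `emotion_counts` are invented.

```python
import re
from collections import Counter

# Toy stand-in for an emotion lexicon (assumption: only the four words
# quoted in the text, each mapped to the single emotion named there).
LEXICON = {
    "achieve": {"trust"},
    "advocate": {"trust"},
    "abandon": {"fear"},
    "abuse": {"fear"},
}

# Invented sample tweets, not taken from our data set.
sample_tweets = [
    "We advocate the right to be forgotten",
    "Search engines could abuse or abandon the rule",
    "Hard to achieve in practice",
]


def emotion_counts(texts, lexicon):
    """Count, per emotion, how many word occurrences in texts carry that emotion."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only (very crude tokenizer).
        for word in re.findall(r"[a-z]+", text.lower()):
            for emotion in lexicon.get(word, ()):
                counts[emotion] += 1
    return counts


counts = emotion_counts(sample_tweets, LEXICON)
for emotion, n in sorted(counts.items()):
    print(emotion, n)
```

Note that this counts word occurrences, not tweets, which matches how the bars are described above.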
The following histogram shows the frequency of tweets with respect to the scores allotted to each tweet. For this analysis we use tools provided in  and .
The above histogram is skewed towards positive scores, which suggests that the overall sentiment regarding the RTBF is positive. Of the 13773 tweets fetched from Twitter, the majority are neutral, 1737 have positive polarity and 930 have negative polarity.
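To make the scoring idea concrete, here is a minimal sketch of one common lexicon-based approach: the score of a tweet is the number of positive words minus the number of negative words, and the sign of the score gives the polarity. This is an assumption about the general technique, written in Python, and not necessarily what the tools we used do; the word lists and sample tweets are invented. The last lines derive the neutral count from the figures above, assuming every tweet falls into exactly one of the three classes.

```python
import re

# Invented miniature opinion word lists (assumption, not the real lexicons).
POSITIVE = {"good", "support", "protect"}
NEGATIVE = {"bad", "censorship", "threat"}


def score(text):
    """Return (#positive words) - (#negative words) for one tweet."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def polarity(s):
    """Map a numeric score to a polarity label."""
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


# Invented sample tweets.
tweets = [
    "I support the right to be forgotten, it will protect people",
    "This is censorship and a threat to free speech",
    "Google changes how it applies the right to be forgotten",
]
labels = [polarity(score(t)) for t in tweets]
print(labels)

# Figures reported above; the neutral count is derived under the assumption
# that positive, negative and neutral partition the data set.
total, positive, negative = 13773, 1737, 930
neutral_count = total - positive - negative
print(neutral_count)  # 11106
```

Under that assumption, neutral tweets make up well over half of the data set, which is consistent with the statement that the majority are neutral.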
Giannakopoulos Athanasios, Kyritsis Georgios, Zhang Fuzhi, Zhong Hua