VenitianBot

Introduction

With the rise of social media, we share more and more information online about where we are, what we do and what we like, and the VenitianBot is an autonomous program that tries to take advantage of this. The bot tweets about Digital Humanities work related to Venice and interacts with Twitter users to get their attention; the goal is to reach people who use Twitter and care about Venice. People who visit Venice often share their impressions on Twitter, and the VenitianBot project is about them.

Twitter is one of the most popular social networks, with 236 million monthly active users. There are multiple ways to communicate on Twitter, namely by updating the profile status, directing a tweet to a user or using hashtags. Our program uses all three.

Methodology

1. Tweet recognition

           1.1 Tweet rank

To classify the received tweets, we decided to rank them. Every tweet is assigned a rank, computed with the classification function below, which allows us to sort the tweets by rank. The classification function is the following:

α·A + β·B + γ·C > 0         if A + B > 0
false                       otherwise

with:

  • A the number of areas the tweet belongs to
  • B the number of precise keywords detected in the tweet
  • C the number of general keywords in the body of the tweet
  • α, β and γ the weights given to A, B and C respectively.

After testing against real activity on Twitter, we decided to use 0 as the threshold that defines a relevant tweet, since the number of tweets about Venice is not as high as we first expected.
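As a rough illustration, the classification function above can be written as a short Python sketch. The weight values are assumptions made for this example only (the post does not state the values of α, β and γ), and the names `rank` and `is_relevant` are hypothetical, not taken from the project code.

```python
from typing import Optional

# Assumed weights, for illustration only: the post does not give their values.
ALPHA, BETA, GAMMA = 2.0, 1.0, 0.5
THRESHOLD = 0.0  # threshold stated in the text


def rank(areas: int, precise: int, general: int) -> Optional[float]:
    """Weighted rank of a tweet, or None when A + B == 0.

    areas   -- A, number of areas the tweet belongs to
    precise -- B, number of precise keywords detected
    general -- C, number of general keywords detected
    """
    if areas + precise <= 0:
        # No area and no precise keyword: general keywords alone never count.
        return None
    return ALPHA * areas + BETA * precise + GAMMA * general


def is_relevant(areas: int, precise: int, general: int) -> bool:
    """True when the tweet has a rank and that rank exceeds the threshold."""
    score = rank(areas, precise, general)
    return score is not None and score > THRESHOLD


# Hypothetical (A, B, C) counts for three tweets.
candidates = {"t1": (1, 1, 2), "t2": (0, 1, 0), "t3": (0, 0, 3)}
relevant = sorted(
    (name for name, abc in candidates.items() if is_relevant(*abc)),
    key=lambda name: rank(*candidates[name]),
    reverse=True,
)
```

With these assumed weights, `t3` is rejected because it only contains general keywords, and the remaining tweets are sorted from highest to lowest rank.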

We have defined three types of keywords: precise, general and red flags. Precise keywords are words that clearly indicate that the tweet is of interest to us (examples: Venice, Grand Canal, San Marco, …). A general keyword is a word that is only considered if a relevant precise keyword or a location was previously detected (examples: Italy, canal, piazza, …). These words help us boost a tweet that has already been recognized as relevant. The last category, red flags, exists to filter out false positives or words we don’t want to consider (examples: beach, ocean, Los Angeles, California, …).
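The three keyword categories can be sketched in Python as follows. The keyword sets only contain the examples listed above, and the helper names are hypothetical. The sketch also assumes that a red flag discards the tweet outright, which is one possible reading; the post does not say exactly how red flags are applied.

```python
import re
from typing import Optional, Set, Tuple

# Example keywords taken from the text; the real lists are longer.
PRECISE = {"venice", "grand canal", "san marco"}
GENERAL = {"italy", "canal", "piazza"}
RED_FLAGS = {"beach", "ocean", "los angeles", "california"}


def count_matches(text: str, keywords: Set[str]) -> int:
    """Count how many keywords occur in the text as whole words."""
    lowered = text.lower()
    return sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )


def keyword_counts(text: str, in_area: bool = False) -> Optional[Tuple[int, int]]:
    """Return (B, C) for a tweet, or None when a red flag is present.

    in_area -- True when the tweet was geolocated inside a known area.
    """
    if count_matches(text, RED_FLAGS) > 0:
        return None  # assumption: a red flag rejects the tweet entirely
    precise = count_matches(text, PRECISE)
    # General keywords only count once a precise keyword or a location matched.
    general = count_matches(text, GENERAL) if (precise > 0 or in_area) else 0
    return precise, general
```

Note that a tweet mentioning the Grand Canal would also match the general keyword "canal" in this sketch; whether that double counting is desirable is a design choice.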

           1.2 Tag extraction

We use a simple mechanism to extract relevant keywords. We have already built a set of keywords that we use to detect interesting tweets, so we also use them to extract “tags” that let us send more targeted tweets. For example, let’s assume that we have the following answers in our database:

“Did you know that the Venice Republic was the biggest power of the Mediterranean during 1300’s-1500’s  #Venice”

“Did you know that the #SanMarco bell tower – or campanile – is the #Italy’s fifth tallest bell tower, measuring 98,6m. #Venice”

and that we receive the following tweet:

[Image of the received tweet: SanMarco_Venice]

Then the expected behaviour of the bot would be to reply with the second tweet, since it carries the more specific San Marco tag.
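The tag mechanism can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the matching is a crude substring test, and the incoming tweet text is invented for the example since the original tweet is not reproduced here.

```python
import re
from typing import Iterable, List, Optional, Set

# The two answers quoted above.
ANSWERS = [
    "Did you know that the Venice Republic was the biggest power of the "
    "Mediterranean during 1300’s-1500’s  #Venice",
    "Did you know that the #SanMarco bell tower – or campanile – is the "
    "#Italy’s fifth tallest bell tower, measuring 98,6m. #Venice",
]
KEYWORDS = ["Venice", "Grand Canal", "San Marco", "Italy"]


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def answer_tags(answer: str) -> Set[str]:
    """Tags of a stored answer: its hashtags, normalized."""
    return {normalize(tag) for tag in re.findall(r"#(\w+)", answer)}


def tweet_tags(text: str, keywords: Iterable[str]) -> Set[str]:
    """Tags of an incoming tweet: known keywords found in its text."""
    compact = normalize(text)
    return {normalize(kw) for kw in keywords if normalize(kw) in compact}


def pick_answer(text: str, answers: List[str], keywords: Iterable[str]) -> Optional[str]:
    """Return the answer sharing the most tags with the tweet, if any."""
    tags = tweet_tags(text, keywords)
    best = max(answers, key=lambda a: len(tags & answer_tags(a)), default=None)
    if best is None or not tags & answer_tags(best):
        return None
    return best


# Hypothetical incoming tweet mentioning San Marco and Venice.
reply = pick_answer("Beautiful morning at San Marco, Venice", ANSWERS, KEYWORDS)
```

Here the second answer is selected because it shares two tags with the tweet, while the first one only shares the generic Venice tag.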

           1.3 Locations

Twitter allows us to retrieve the geolocation from which a tweet was sent. We can use this to locate people who are in Venice and, even better, people who are close to important monuments. We expect high accuracy from GPS sensors in an urban environment such as Venice [http://blogs.esri.com/esri/arcgis/2013/07/15/smartphones-tablets-and-gps-accuracy/]. We then use coordinates and a radius to define the places where important monuments are located (such as the Rialto Bridge and San Marco). We use the equirectangular approximation to compute the distance between two points, as opposed to the Haversine formula, which is more precise but also more computationally expensive (our experiments show that the differences are negligible, and computation resources are not a bottleneck). However, we found that not many tweets have an associated location. This might be because people don’t enable GPS on their phones or send tweets from their computer.
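To make the comparison concrete, here is a minimal Python sketch of both distance computations and of the "coordinates plus radius" test. The function names are hypothetical, and the coordinates are approximate values chosen for this example, not the ones used by the bot.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def equirectangular_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation: cheap and accurate over short distances."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    # Scale the longitude difference by the cosine of the mean latitude.
    x = math.radians(lon2 - lon1) * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return EARTH_RADIUS_M * math.hypot(x, y)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance, shown here for comparison."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_near(lat: float, lon: float, place_lat: float, place_lon: float, radius_m: float) -> bool:
    """True when the point lies within radius_m metres of the place."""
    return equirectangular_m(lat, lon, place_lat, place_lon) <= radius_m


# Approximate coordinates, for illustration only.
RIALTO = (45.4380, 12.3359)
SAN_MARCO = (45.4341, 12.3388)

approx = equirectangular_m(*RIALTO, *SAN_MARCO)
exact = haversine_m(*RIALTO, *SAN_MARCO)
```

At the scale of a city, the two results differ by far less than a metre, which is consistent with the observation that the difference is negligible.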

2. Tweet generation

           2.1 Search on the Internet

As a starting point in the quest for tweets, we needed a basis to lure Twitter users into paying attention to what the VenitianBot was saying. The obvious choice was to wander the internet in search of fun yet captivating facts about the Queen of the Adriatic. So we consulted the mighty Google and found some good results to begin with, which can be found in the GitHub repository.

           2.2 Crowdsourcing as a source of information

The problem with mining websites and other elaborate techniques to gather knowledge is that they take a lot of time and resources. Instead, we thought of harnessing the power of the crowd to build a bigger database of tweets that we could use.

Of course, this method has its risks: anyone could post anything, including meaningless sentences, which would be disastrous for the bot. We therefore need a human to verify tweet proposals, formulate them correctly, etc.

           2.3 Use other projects as a way to expand

One thing we could not use yet, since the semester is not over, is the output of other projects from this course. There are projects about movies in Venice, which could provide new facts, projects about palace recognition, which could help us recognize tweets in a better way, etc.

           2.4 Featured users

We thought that some users are more important than others and that we can retweet everything they say. A good example is the Twitter profile of Francesco Foscari, which is run by our fellow students, but other accounts could join that group. A candidate should not tweet too heavily and should always tweet about pertinent subjects, which is why profiles like EPFL cannot be included.

3. Handle tweets directed to the VenitianBot

           3.1 Pre-recorded tweets

At the moment, we have only one simple answer: “Hey, I’m just a simple bot. Tell me more here: goo.gl/forms/fyx0PSmBzk”. It seems sufficient right now, but of course more elaborate and diverse answers would be good.

[Figure: screenshot taken 2015-05-13 at 22.53.20]

           3.2 What the future holds

It would be very interesting to add a component that understands the tweets addressed to the bot and is able to respond to them, but this is a well-known open problem that the best minds in the community are still working on.

Results

1. Perception by the public

So far, we have left the bot running a few hours at a time to tune and test it, and the results have been promising. We have been replied to 12 times, retweeted 14 times and favorited 29 times! This suggests that people appreciate the presence of the bot and that, even though it is not yet well known, its tweets are spot on. One person even sent us a picture as a response:

[Figure: screenshot of the picture sent in response, taken 2015-05-13 at 22.49.21]

2. User feedback

In the Google form that we link to whenever we receive a reply, we asked users what they think about the bot. The form has not yet been used by many, perhaps because people do not want to take the time to answer the questions or are not interested in giving feedback, but the one answer we had was promising. Someone shared their experience in Venice and, although it was not something we could use, it showed us we were going in the right direction.

Conclusion

This project was one of many encounters. We had to think like an average user: when would I care for that sort of thing, where would I be, what would I tweet, what would awaken my curiosity, what would quench that thirst for knowledge about the Bride of the Sea? We tried our best to answer those questions through filter mechanisms, tag extraction, etc.

We have built a simple web interface to visualize the results of the bot, mainly the rank of each tweet, what we reply and to whom. Although important for testing and tuning our bot, it is not detailed here because it has been discussed in blog posts 2 and 3 on the course’s website (see the links and references below).

[Figure: screenshot taken 2015-05-13 at 23.43.11]

Links

Project repository

References

Previous blogposts

Twitter Streaming API

Typesafe Activator