The project is still on track according to our plan. This progress post outlines the work done in the last three weeks. In our initial plan, these weeks were dedicated to “Defining replies”, so this post focuses mainly on that part, along with some additional features.
1. About replies
Now that we have defined our ranking and tweet recognition system, a crucial part of this project is what the VenitianBot will post on Twitter. As previously pointed out, the VenitianBot shares facts about Venice in two ways: as a reply directed to a user and as a status update.
1.1 Personalised responses
When we detect that a tweet is interesting, we would like to share information that is as meaningful to the user as possible. To do this, we added “categories” to each predefined answer that we can send in response to a tweet. Every answer and every tweet can have many categories, and whenever we have to choose a response we try to match as many categories as possible.
For example, if a tweet mentions San Marco and our database contains the following two responses:
“Did you know that the Venice Republic was the biggest power of the Mediterranean during 1300’s-1500’s #Venice”
“Did you know that the #SanMarco bell tower – or campanile – is the #Italy’s fifth tallest bell tower, measuring 98,6m. #Venice”
we would like to use the second answer because it is more relevant.
In our current implementation, both the tweet and the second answer belong to the category “san marco”, whereas the first answer does not belong to any category: it is a general response. In fact, the second answer belongs to more than one category, since it is also part of “bell tower” and “campanile”.
We have a predefined set of categories to which each tweet and answer can belong. For now we set the categories of each answer manually, but this will be automated for the last deliverable. That way, when someone adds a new response for the bot, its categories will be extracted automatically and used to provide more targeted answers.
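The category matching described above can be sketched as follows. This is only a minimal illustration, not the bot's actual code: the `Answer` class, the `pick_answer` function and the shortened answer texts are hypothetical names chosen for the example, and the tie-breaking rule (the earliest answer wins) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    # An empty set marks a general response (assumption for this sketch).
    categories: set = field(default_factory=set)


def pick_answer(tweet_categories, answers):
    """Return the answer that shares the most categories with the tweet.

    On a tie (including the case where nothing matches), the earliest
    answer in the list is returned, so a general response placed first
    acts as a fallback.
    """
    if not answers:
        return None
    tweet_categories = set(tweet_categories)
    return max(answers, key=lambda a: len(a.categories & tweet_categories))


# Hypothetical, shortened stand-ins for the two responses quoted above.
answers = [
    Answer("General fact about the Venice Republic"),
    Answer("Fact about the San Marco bell tower",
           {"san marco", "bell tower", "campanile"}),
]

chosen = pick_answer({"san marco"}, answers)
print(chosen.text)
```

With a tweet tagged “san marco”, the second answer is selected because it shares one category with the tweet while the general response shares none.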
1.2 Frequency of tweets
In previous blog posts we mentioned that we would define the frequency of the VenitianBot’s tweets as a period of time between tweets. However, after observing the flow of tweets, we noticed that it varies a lot. Based on these observations, we decided to make the period of time between two of the bot’s tweets depend on the number of incoming tweets that our system recognizes and ranks. This approach has two main advantages:
When people are talking a lot about Venice, the bot will be more reactive, because the chances of users noticing it are higher (e.g. in summer).
When there is no activity on Twitter about Venice, the bot will not post updates, since it is very likely that no one would see them (e.g. at night).
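One possible way to express this activity-dependent frequency is sketched below. The post does not state the actual formula, so everything here is an assumption made for illustration: the `TweetPacer` class, the one-hour sliding window, and the rule that the interval is a base interval divided by the number of recently ranked tweets, bounded below by a minimum interval.

```python
from collections import deque


class TweetPacer:
    """Decide whether the bot may post, based on recent ranked-tweet activity."""

    def __init__(self, window_seconds=3600, base_interval=1800, min_interval=300):
        self.window_seconds = window_seconds  # how far back activity is counted
        self.base_interval = base_interval    # interval when only one tweet was seen
        self.min_interval = min_interval      # never post more often than this
        self.ranked_times = deque()           # timestamps of ranked tweets
        self.last_post = None

    def record_ranked_tweet(self, now):
        self.ranked_times.append(now)

    def _recent_count(self, now):
        # Drop timestamps that have fallen out of the sliding window.
        while self.ranked_times and now - self.ranked_times[0] > self.window_seconds:
            self.ranked_times.popleft()
        return len(self.ranked_times)

    def current_interval(self, now):
        count = self._recent_count(now)
        if count == 0:
            return None  # no activity: the bot stays silent
        return max(self.min_interval, self.base_interval / count)

    def may_post(self, now):
        interval = self.current_interval(now)
        if interval is None:
            return False
        return self.last_post is None or now - self.last_post >= interval

    def mark_posted(self, now):
        self.last_post = now
```

Timestamps are passed in explicitly (in seconds) rather than read from the clock, which keeps the sketch easy to test; a real bot would likely pass `time.time()`.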
1.3 Keep history
It is important that Twitter users are not annoyed by the bot, so it needs to keep a history of the people it has already tweeted at. To achieve this, we have enabled the bot to keep track of the users it has addressed.
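A minimal sketch of such a history is shown below. It assumes that each user is contacted at most once and keeps the history in memory; the `ContactHistory` class and the `maybe_reply` helper are hypothetical names, and a real bot would presumably persist this information so that it survives restarts.

```python
class ContactHistory:
    """Remember which users the bot has already tweeted at."""

    def __init__(self):
        self._contacted = set()

    def already_contacted(self, user_id):
        return user_id in self._contacted

    def record(self, user_id):
        self._contacted.add(user_id)


def maybe_reply(history, user_id, send_reply):
    """Call send_reply(user_id) only if this user was never addressed before."""
    if history.already_contacted(user_id):
        return False
    send_reply(user_id)
    history.record(user_id)
    return True


# Usage: the list stands in for the real "send a tweet" call.
sent = []
history = ContactHistory()
maybe_reply(history, "alice", sent.append)
maybe_reply(history, "alice", sent.append)  # skipped, already contacted
maybe_reply(history, "bob", sent.append)
```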
2. Crowdsourcing: learn facts from the users
While experimenting with the bot, we have noticed that people may be willing to share some interesting facts about Venice with the bot. This tweet is a good example of that:
We have decided to predefine a reply for people who try to talk to the bot. The reply will contain a link to a simple form where interested users can submit interesting information about Venice. We will then review the submissions and decide whether to add those facts to our database of posts.
3. Refine tweet recognition
We noticed that, for now, most of the tweets we see do not provide much information: they contain the word Venice but no other interesting keywords (like Rialto Bridge or San Marco). To refine the ranking, we added some common positive keywords that boost a tweet’s rank when it contains them, and negative ones that decrease it. This is a first step towards natural language processing, but unfortunately we do not expect to gain much insight from the tweets because of the 140-character limit.
However, we should note that few tourists visit Venice during this period of the year, which might explain why we do not find many “interesting” tweets. We expect the number of tweets with a high ranking (due to keywords) to increase a lot during July and August, when many tourists will be visiting Venice.
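The keyword-based refinement of the ranking could look roughly like the sketch below. The keyword lists, the `adjust_rank` function and the fixed boost and penalty of 1.0 per keyword are all assumptions made for illustration; the post does not list the actual keywords or weights. For simplicity, the sketch only handles single-word keywords.

```python
import re

# Hypothetical keyword lists, not the ones used by the bot.
POSITIVE_KEYWORDS = {"visiting", "holiday"}
NEGATIVE_KEYWORDS = {"beach"}


def adjust_rank(text, base_rank, boost=1.0, penalty=1.0):
    """Raise the rank for each positive keyword found, lower it for each negative one."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    rank = base_rank
    rank += boost * len(words & POSITIVE_KEYWORDS)
    rank -= penalty * len(words & NEGATIVE_KEYWORDS)
    return rank


print(adjust_rank("Visiting Venice on holiday!", 1.0))
print(adjust_rank("Venice beach sunset", 1.0))
```

Each keyword counts once per tweet, so repeating a word does not inflate the rank.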
4.1 Web interface
The web interface will allow us to easily demonstrate that the VenitianBot works as expected and show results.
For the moment, it displays the tweets received from the Twitter Streaming API with their rank as follows:
We will add a way to see when the bot responds to tweets as well.
4.2 Configuration: define parameters
Since we use a Google form to gather answers from the crowd, we need a convenient way to add candidate answers to a database so that the bot can use them later. This will be added to the web interface, along with the ability to modify parameters such as the number of tweets per day.
In blog post 2, we talked about CometSockets vs WebSockets and explained that the latter were not necessary because we planned to send messages only from the server to the browser. With this configuration page, we will also need the other communication channel, from the browser to the server, in order to modify the bot’s behaviour.
We use one database to save all the relevant tweets for review or future use. We will add another database to store the answers that the bot will use later. It will allow us to easily add new answers, delete them, and reuse them in other projects.
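As an illustration of the answer database, the sketch below stores answers in SQLite using Python's standard `sqlite3` module. The choice of SQLite, the table layout and the function names are assumptions for this example only; the post does not say which database the project uses.

```python
import sqlite3


def open_answer_store(path=":memory:"):
    """Open (or create) the answer store; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answers ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE)"
    )
    return conn


def add_answer(conn, text):
    # 'with conn' commits on success and rolls back on error.
    with conn:
        cur = conn.execute("INSERT INTO answers (text) VALUES (?)", (text,))
    return cur.lastrowid


def delete_answer(conn, answer_id):
    with conn:
        conn.execute("DELETE FROM answers WHERE id = ?", (answer_id,))


def list_answers(conn):
    rows = conn.execute("SELECT text FROM answers ORDER BY id")
    return [row[0] for row in rows]


conn = open_answer_store()
first_id = add_answer(conn, "fact one")
add_answer(conn, "fact two")
delete_answer(conn, first_id)
print(list_answers(conn))
```

Keeping answers in their own store, separate from the saved tweets, is what makes it straightforward to reuse them elsewhere.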