VenitianBot: Progress post 2

This blog post summarizes the work we have done from week 3 to week 7. In this phase of the project we have continued with the implementation of the tweet recognition and started to define possible replies of the bot. We have also created a web interface that will later display the tweets that were classified as relevant and what the VenitianBot replied.

The second phase of our project was “Tweet recognition”. For now we have implemented the tweet recognition based on:

  1. Key words
  2. Hashtags
  3. Geo-location

  • Key words and Hashtags

The Twitter API doesn’t provide any way to distinguish hashtags from other words in a tweet, so we decided to process them the same way. Currently we use some well-known monuments as keywords (Rialto Bridge, for instance), as well as Venice itself. We have to account for the different spellings the same location or monument can have. To keep the same example, Rialto Bridge can be written as Rialto-bridge, rialtoBridge, etc. This is an example tweet recognized by the bot, based on the keywords.
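One way to handle these spelling variants is to normalize both the tweet and the keywords before comparing them. The sketch below is only an illustration under our own assumptions: the class name `KeywordMatcher`, its methods and the two-entry keyword list are hypothetical and not the bot's actual code.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: matches tweets against keywords, ignoring case, spaces, hyphens and '#'. */
final class KeywordMatcher {

    // Illustrative keyword list (assumption); the bot's real list is longer.
    private static final List<String> KEYWORDS = List.of("Venice", "Rialto Bridge");

    /** Lower-cases the text and drops every character that is not a letter or a digit. */
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    /** Returns true if the normalized tweet contains any normalized keyword. */
    static boolean isRelevant(String tweet) {
        String normalizedTweet = normalize(tweet);
        for (String keyword : KEYWORDS) {
            if (normalizedTweet.contains(normalize(keyword))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isRelevant("Sunset at the #Rialto-bridge"));
        System.out.println(isRelevant("Hello from Paris"));
    }
}
```

With this approach "Rialto-bridge", "rialtoBridge" and "#RialtoBridge" all reduce to the same string. The price is that removing spaces can create matches across word boundaries ("seven ice creams" contains "venice" once the space is gone), so a stricter version would compare word by word.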

  • Geo-location

Tweet recognition based on geo-location was discussed in the previous blog post.

  Defining replies

We have also started the third phase of the project, which is “Defining replies”. For now we have defined a simple database of answers that the VenitianBot would use to reply to users who tweet about Venice. Ideally, in the following weeks the bot will be able to post related tweets. By related tweets, as we previously explained, we mean that if, for example, a tweet contains the keyword “Rialto”, we would like to share a fact about “Rialto”.
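To make the idea of keyword-related replies concrete, here is a minimal sketch of such a database as an in-memory map from keyword to fact. The class `ReplyDatabase`, its methods and the sample fact are assumptions made for illustration; the real database may be stored and queried differently.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory reply database: one fact per keyword. */
final class ReplyDatabase {

    // Keys are stored in lower case so that lookups ignore case.
    private final Map<String, String> factsByKeyword = new LinkedHashMap<>();

    /** Registers a fact to share when the keyword appears in a tweet. */
    void addFact(String keyword, String fact) {
        factsByKeyword.put(keyword.toLowerCase(Locale.ROOT), fact);
    }

    /** Returns the fact for the first registered keyword found in the tweet, if any. */
    Optional<String> replyFor(String tweet) {
        String lower = tweet.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : factsByKeyword.entrySet()) {
            if (lower.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ReplyDatabase database = new ReplyDatabase();
        // Sample entry for illustration only.
        database.addFact("Rialto", "The Rialto Bridge crosses the Grand Canal.");
        System.out.println(database.replyFor("Crossing the Rialto today!").orElse("(no reply)"));
    }
}
```

Returning an `Optional` keeps the "no related fact" case explicit, so the bot can fall back to a generic answer or stay silent.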

Here are some examples of replies and reactions of users.

We will also define replies for people who show interest in the bot, such as asking them to follow the VenitianBot or sending links to DHlab projects related to Venice, such as Venice Atlas. We will consider that people show interest in the VenitianBot if they reply to one of our tweets, favourite it or retweet it, which can be detected with the Streaming API.

Twitter streaming API

As planned in the last blog report, we moved from the Twitter REST API to the Streaming API. This means that, instead of having to send multiple requests to retrieve tweets, we now just have to ask once when the program is launched, and we then receive all matching tweets in real time. Fortunately we can specify some criteria, which avoids having to process thousands of tweets per second. With the current keyword and location criteria we receive around one tweet per second. This is likely to vary depending on the time of day, and even more with the period of the year (we expect to see many more tweets during the summer).
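The effect of these criteria can be sketched locally as a simple predicate. This is not the Streaming API itself and not our actual configuration: the class `StreamFilter`, the keyword list and the bounding box values are rough assumptions for illustration. As far as we understand the filter endpoint, keyword and location criteria are alternatives, so a tweet is delivered if it matches either one, and bounding boxes are given in longitude, latitude order.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical local model of the stream criteria: keyword OR location. */
final class StreamFilter {

    // Illustrative keywords (assumption).
    private static final List<String> TRACK = List.of("venice", "rialto");

    // Rough bounding box around Venice in degrees (assumed values, longitude first).
    private static final double WEST = 12.2, SOUTH = 45.3, EAST = 12.5, NORTH = 45.6;

    /** Returns true if the point lies inside the bounding box. */
    static boolean insideBox(double longitude, double latitude) {
        return longitude >= WEST && longitude <= EAST
                && latitude >= SOUTH && latitude <= NORTH;
    }

    /** A tweet passes if it contains a keyword or was geo-tagged inside the box. */
    static boolean matches(String text, Double longitude, Double latitude) {
        String lower = text.toLowerCase(Locale.ROOT);
        boolean keywordHit = TRACK.stream().anyMatch(lower::contains);
        boolean locationHit = longitude != null && latitude != null
                && insideBox(longitude, latitude);
        return keywordHit || locationHit;
    }

    public static void main(String[] args) {
        System.out.println(matches("Back from Venice!", null, null));
        System.out.println(matches("Good morning", 12.3155, 45.4408));
    }
}
```

Tweets without coordinates are passed with `null` for both values, in which case only the keyword criterion applies.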

Play framework and web interface

As we continued to implement the VenitianBot, a question remained as to how we would display results in real time. We found no effective solution in a standard project except Java applets, but this technology is now obsolete and intrusive, as you need to install a Java plugin in your browser for it to work.

  1. MVC and the Play framework:

With all that in mind, we turned to web frameworks, which exist for Java, Scala and other languages. These frameworks often follow the MVC (or MVCC) pattern. MVC stands for model-view-controller; these three parts work together to form, in this case, a web application.

The pattern involves four main elements, the three components plus the user:

  • Model: it describes what the information is and how it is structured. It is often paired with a database that stores this information.
  • View: it defines how the data is displayed in the browser. It is mostly composed of HTML and JavaScript files that are sent to the browser. The framework may help the programmer by making these files hybrids between HTML and other languages, so that information from the model is easier to manipulate (see index.scala.html in Play).
  • Controller: this is the heart of the concept. It essentially controls everything in the web application. It defines how the application should behave: which view to send for which URL, which data to send to the browser, and all the computations in between.
  • User: the user sees the view and interacts with the controller on every request.
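The roles listed above can be sketched in plain Java, without any framework. Everything here (`MvcSketch`, `TweetModel`, `TweetListView`, `Controller` and the `/tweets` path) is a hypothetical illustration of the pattern and not Play's API, where routing and templates are handled by the framework.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, framework-free illustration of the MVC roles. */
final class MvcSketch {

    /** Model: what a recognized tweet looks like. */
    static final class TweetModel {
        final String author;
        final String text;

        TweetModel(String author, String text) {
            this.author = author;
            this.text = text;
        }
    }

    /** View: turns model objects into HTML for the browser. */
    static final class TweetListView {
        static String render(List<TweetModel> tweets) {
            StringBuilder html = new StringBuilder("<ul>");
            for (TweetModel tweet : tweets) {
                html.append("<li>").append(escape(tweet.author)).append(": ")
                    .append(escape(tweet.text)).append("</li>");
            }
            return html.append("</ul>").toString();
        }

        // Minimal escaping so tweet text cannot inject markup.
        private static String escape(String raw) {
            return raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    /** Controller: decides which view and which data answer a given URL. */
    static final class Controller {
        private final List<TweetModel> store = new ArrayList<>();

        void record(TweetModel tweet) {
            store.add(tweet);
        }

        String handle(String path) {
            if ("/tweets".equals(path)) {
                return TweetListView.render(store);
            }
            return "404 Not Found";
        }
    }

    public static void main(String[] args) {
        Controller controller = new Controller();
        controller.record(new TweetModel("alice", "Lovely evening in Venice"));
        // The user requests a URL; the controller picks the view and the data.
        System.out.println(controller.handle("/tweets"));
    }
}
```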

The Play framework is a Java/Scala framework based on MVC. We chose this framework in particular because we use it for another course and are more familiar with it than with any other. It gives us the web interface we need for showing results and lets us keep coding in Java (which has one of the best Twitter API libraries out there).


Diagram representing the MVC (Model-View-Controller)

  2. Displaying real-time information:

To make the web interface more useful, we need to display real-time information about the VenitianBot: what it recognizes as an interesting tweet, which tweets should be responded to and what has been replied. This will also help with testing, to see whether some tweets shouldn’t have been recognized or whether some replies are too intrusive or not interesting enough.

To send information in real time to the client’s web browser, we need to keep a channel open between the browser and the server. One option for this is WebSockets. They allow us to keep this connection open and to send elements from the server to the browser without needing to reload the page. WebSockets work both ways: the server can send data to the browser and the browser can send data to the server. The former is exactly what we want, so that tweets are added to the view as they arrive, but the latter is not needed. The Play framework gives us exactly what we want under the name of CometSockets. They allow the server to communicate with the browser in a one-way fashion. This matters for security: we do not want users to be able to crash the server by streaming data to it.

This will be implemented during the Easter break and ready when school resumes.