Venice in Movies: Progress Report 1

Recap

The aim of this project is to organize information about movies set in Venice and to design a tool that lets users experience Venice as portrayed in those movies. We plan to develop an Android app that, based on the user's current location in Venice, lists the movies and relevant scenes set at or near that location. We also plan to export a database and/or map of movie locations in Venice.

Setting up

We decided to spend the first two weeks designing how we would collect and store the data. To collect data, we proposed using Python scripts to parse web pages and extract information from them. On some web pages the scene locations were clearly marked and easy to extract, whereas on others the location was described in a paragraph of text from which it could not be extracted directly. For the latter case, we are compiling a dictionary of places in Venice against which the paragraph text can be matched. To understand the problems we might encounter during the data collection phase, we tested this approach on filmaps.com. From this site we were able to extract several locations, which we added to our dictionary. We also marked them on a map (one of our final deliverables) to see how it would look.
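The dictionary matching described above can be sketched roughly as follows. This is only a minimal illustration, not our final script: the place list, the function name find_places, and the sample paragraph are all hypothetical, and the real dictionary is still being compiled.

```python
import re

# Hypothetical sample of the place dictionary (assumption: the real one
# is still being compiled and will be much larger).
VENICE_PLACES = [
    "Piazza San Marco",
    "Rialto Bridge",
    "Grand Canal",
    "Bridge of Sighs",
    "San Marco",
]


def find_places(text, places=VENICE_PLACES):
    """Return the dictionary places mentioned in text, longest names first."""
    found = []
    remaining = text
    # Try longer names first so "Piazza San Marco" wins over "San Marco".
    for place in sorted(places, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(place) + r"\b", re.IGNORECASE)
        if pattern.search(remaining):
            found.append(place)
            # Blank out the match so a shorter name inside it is not counted again.
            remaining = pattern.sub(" ", remaining)
    return found


# Made-up paragraph, used only to exercise the function.
paragraph = ("The chase begins in Piazza San Marco and ends with a boat "
             "ride along the Grand Canal.")
print(find_places(paragraph))
```

Matching longer names first is a design choice to avoid reporting both a place and a shorter name contained in it. Spelling variants and Italian names would still need to be added to the dictionary as separate entries.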

We also thought about what the tables in our database should look like. We plan to create six tables. We needed a design in which the movies, scenes, actors and directors are stored in separate tables related by keys, because a movie can have several relevant scenes, several actors can appear in a scene, and a movie can have multiple directors. We also need the database to be searchable by movie, actor, director, and genre. Since the set of genres is small and well defined, the genres will be represented as an alphabetically ordered list for each movie. The scene table shown will be expanded with columns for links to video clips, screenshots, or scene descriptions, as available. To display places on the map, Google Maps requires a single table containing all the information, so the separate tables will be joined on their keys for map generation. The database itself will be stored as separate tables to prevent repetition of entries. Below, we show some sample entries in the tables.

 

[Sample tables: Movie, Scene, Actor, Director]
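To make the table design more concrete, here is a rough sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not our final schema: all column names are hypothetical, the two link tables (scene_actor and movie_director) are our guess at how the many-to-many relations would be stored, and the rows are placeholders rather than real data.

```python
import sqlite3

# Assumed schema: four entity tables plus two link tables for the
# many-to-many relations. All names here are illustrative.
SCHEMA = """
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, genres TEXT);
CREATE TABLE scene (scene_id INTEGER PRIMARY KEY,
                    movie_id INTEGER REFERENCES movie(movie_id),
                    location TEXT, latitude REAL, longitude REAL);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE director (director_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scene_actor (scene_id INTEGER REFERENCES scene(scene_id),
                          actor_id INTEGER REFERENCES actor(actor_id),
                          PRIMARY KEY (scene_id, actor_id));
CREATE TABLE movie_director (movie_id INTEGER REFERENCES movie(movie_id),
                             director_id INTEGER REFERENCES director(director_id),
                             PRIMARY KEY (movie_id, director_id));
"""

# One flat row per scene, as a single-table map layer would need.
FLAT_QUERY = """
SELECT m.title, m.genres, s.location, s.latitude, s.longitude,
       GROUP_CONCAT(DISTINCT a.name) AS actors,
       GROUP_CONCAT(DISTINCT d.name) AS directors
FROM scene s
JOIN movie m ON m.movie_id = s.movie_id
LEFT JOIN scene_actor sa ON sa.scene_id = s.scene_id
LEFT JOIN actor a ON a.actor_id = sa.actor_id
LEFT JOIN movie_director md ON md.movie_id = m.movie_id
LEFT JOIN director d ON d.director_id = md.director_id
GROUP BY s.scene_id
"""


def build_flat_rows(conn):
    """Join the separate tables into one row per scene for map export."""
    return conn.execute(FLAT_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows only; the coordinates are illustrative, not surveyed.
conn.execute("INSERT INTO movie VALUES (1, 'Example Movie', 'Drama, Romance')")
conn.execute("INSERT INTO scene VALUES (1, 1, 'Example Square', 45.434, 12.339)")
conn.executemany("INSERT INTO actor VALUES (?, ?)", [(1, "Actor A"), (2, "Actor B")])
conn.execute("INSERT INTO director VALUES (1, 'Director A')")
conn.executemany("INSERT INTO scene_actor VALUES (?, ?)", [(1, 1), (1, 2)])
conn.execute("INSERT INTO movie_director VALUES (1, 1)")

rows = build_flat_rows(conn)
for row in rows:
    print(row)
```

Storing each actor and director once and linking by key avoids repeated entries, while the join produces the single flat table only when the map is generated.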

 

Problems Encountered

  1. As mentioned above, on some web sites the film location is not easily identifiable because it is described in a paragraph of text. We need to use a dictionary of places in Venice to match such locations.
  2. Many web sites list the location as just “Venice, Italy” with no further details. For these cases we will expand our search to more web sites. A movie can be shot at several locations, so we applied a filter on “Venice” to extract only the locations relevant to us. However, the number of results was lower than expected because several entries gave the city name as “Venezia” or even “Venecia”. Movies shot in “Venice, California, USA” also appeared on our list, and we had to apply additional filters to remove them.
  3. The map looks acceptable for now, but only because we have few locations. We need to find a way to cluster markers to keep the map presentable once we have more locations.
  4. In this initial stage we had some difficulty obtaining links to video clips or screenshots of certain scenes because of copyright issues. We will expand our search to more scenes and movies and explore ways of obtaining licensed clips. If clips are still unavailable for certain scenes, we will provide minimal information about the movie and a description of the scene.
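The location filter from problem 2 could look roughly like the sketch below. It is only an illustration under assumptions: the function name is_venice_italy, the exclusion keywords, and the sample strings are hypothetical, and the actual filters we applied may differ.

```python
import re

# Spellings of the city name seen in the data: English, Italian, Spanish.
VENICE_NAMES = re.compile(r"\b(venice|venezia|venecia)\b", re.IGNORECASE)
# Assumed exclusion keywords for the Venice in California.
EXCLUDE = re.compile(r"\b(california|usa|united states)\b", re.IGNORECASE)


def is_venice_italy(location):
    """True if the location names Venice and does not look like a US entry."""
    return bool(VENICE_NAMES.search(location)) and not EXCLUDE.search(location)


# Made-up location strings in the style of the listings we parsed.
samples = [
    "Piazza San Marco, Venice, Italy",
    "Venezia, Italia",
    "Venecia, Italia",
    "Venice, California, USA",
    "Rome, Italy",
]
for location in samples:
    print(location, "->", is_venice_italy(location))
```

A keyword exclusion like this is crude, so entries it rejects would still be worth a manual check before being discarded.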