The city of Venice has been the setting of several great movies over the years. Important locations known for their architectural beauty or historical value have been featured in many scenes. Experiencing these locations as they are portrayed in the movies would be of great interest both to the large number of tourists who visit Venice every year and to researchers studying how these places change over time. However, very few tools currently make this possible in a user-friendly manner. Although some of them, such as MovieLoci, help to identify movie locations, they are not specific to Venice and cover only a very limited set of the movies set there. Our aim in this project is to help solve this problem by:
- Researching and organizing information about movies that are set in Venice along with their locations and year of release.
- Making an Android app that allows users to experience Venetian locations as portrayed in movies, using maps with georeferenced movie information.
One of the most important parts of this project is the collection of relevant data about movies and locations. As a first step, possible sources of data were searched for and identified. They are listed here, along with the advantages and disadvantages of each with respect to data extraction. We then decided on the data collection method. The sources were initially divided broadly into books, websites and movie databases. Extracting data from books was scheduled for the end of the project, as it would have to be done manually. For the websites and databases, we decided to use Python scripts for parsing and querying to obtain the data. Many websites provide information in the form of paragraphs. To extract information from these, we initially tried to build a dictionary of places against which we could match the locations mentioned in the paragraphs. This approach required more time than we had expected, so we decided to focus on movie databases such as IMDb.
We first planned to use a simple Python-based HTML parser to extract movie names and location data from the IMDb search page. This turned out to be difficult, as the parser needed to keep state information about how the data is organized on the page. We then found a separate interface from which we obtained the information in text format. We were able to extract 239 entries of movies and locations, as well as the genres of the movies. The problem was that the pieces of information in the text format were not linked to one another. To link them, we needed a unique identifier for each movie, as movie names are not unique. The IMDb ID of a movie is its unique identifier, but it was not directly available from the text interface. We found a source (the OMDb API) that can be searched for movie details by giving the movie title and year of release, and that returns the result in JSON format. An example is given here. The result contains the IMDb ID as well as other details that we require, such as Genre, Director and Actors. However, some problems remained. The OMDb API uses only the primary titles in IMDb. It does not return results when queried with the titles of non-English movies, as the primary title in the database is translated into English. In addition, there were different TV series, movies and documentaries with the same title and year, for which the correct IMDb ID could not be determined by this method.
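The title-and-year lookup described above can be sketched as follows. This is a minimal illustration, not the script used in the project: the endpoint URL, the `t`/`y`/`apikey` query parameters and the `imdbID`/`Response` field names reflect the OMDb API as we understand it, and the helper names `build_omdb_query` and `extract_imdb_id` are hypothetical. The network call itself is left out, so the sketch only builds the query URL and reads the ID from a response body.

```python
import json
from urllib.parse import urlencode

# Assumed OMDb endpoint; check the current OMDb documentation before use.
OMDB_ENDPOINT = "http://www.omdbapi.com/"


def build_omdb_query(title, year, api_key=None):
    """Build a lookup URL for a movie by title and year of release."""
    params = {"t": title, "y": str(year)}
    if api_key:
        # Assumption: current OMDb deployments expect an API key.
        params["apikey"] = api_key
    return OMDB_ENDPOINT + "?" + urlencode(params)


def extract_imdb_id(response_text):
    """Return the IMDb ID from an OMDb JSON response, or None if not found."""
    data = json.loads(response_text)
    # OMDb signals a failed lookup with "Response": "False".
    if data.get("Response") != "True":
        return None
    return data.get("imdbID")
```

Returning `None` for a failed lookup makes it easy to collect the titles that need another route, such as the non-English titles and the ambiguous title/year pairs mentioned above.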
As mentioned, we had abandoned the IMDb search feature earlier because it seemed to require a complex, stateful parser. On examining it further, however, we found that extracting the IMDb IDs with a parser is easy, as it only needs to match hyperlinks of the format imdb.com/title/<id>. Once the IMDb ID is known, the movie name and year can be obtained from the OMDb API as described above. Information on the filming locations of a particular movie was extracted from imdb.com/title/<id>/locations. Many entries gave the location only as Venice, Veneto, Italy. These were discarded, and only the entries with more precise information were kept.
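The two steps above, matching title hyperlinks and discarding the generic location, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the project's actual script: it assumes IMDb IDs have the form `tt` followed by digits, and the names `extract_imdb_ids` and `keep_precise` are hypothetical.

```python
import re

# Assumption: IMDb title IDs look like "tt" followed by digits,
# and appear in hyperlinks of the form imdb.com/title/<id>.
TITLE_LINK = re.compile(r"/title/(tt\d+)")

# The generic location string that carries no precise information.
GENERIC_LOCATIONS = {"venice, veneto, italy"}


def extract_imdb_ids(html):
    """Return the IMDb IDs linked from a page, in order, without duplicates."""
    seen = []
    for imdb_id in TITLE_LINK.findall(html):
        if imdb_id not in seen:
            seen.append(imdb_id)
    return seen


def keep_precise(locations):
    """Drop location entries that are just the generic city-level string."""
    return [
        loc for loc in locations
        if loc.strip().lower() not in GENERIC_LOCATIONS
    ]
```

Because the pattern only looks for the `/title/<id>` fragment, no state about the surrounding page layout has to be kept.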
The model of our database is given below. The description will contain a YouTube link to the scene if one is available. If not, a link to a screenshot and a verbal description of the scene with its time codes can be added.
To display the information on a map, the geolocation-python module was used to get the coordinates (latitude, longitude) of each location. However, the geocoding was not very accurate, and there are some mistakes. Either a better geocoding module needs to be used, or the incorrect results need to be fixed manually. There could also be several entries with different names for the same place. These could be removed by a script that checks all latitudes and longitudes and merges places whose coordinate values differ by less than a threshold, say 0.00001, into a single entry.
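The proposed merging script could look roughly like the sketch below. It is only an illustration of the idea: the function name `merge_nearby_places` and the `(name, latitude, longitude)` input format are assumptions, and the default threshold is the 0.00001 suggested above.

```python
def merge_nearby_places(places, threshold=0.00001):
    """Merge entries whose latitude and longitude both differ by less
    than `threshold` from an already-seen entry.

    `places` is an iterable of (name, latitude, longitude) tuples.
    Returns a list of dicts, each holding all names for one place.
    """
    merged = []
    for name, lat, lon in places:
        for entry in merged:
            close_lat = abs(entry["lat"] - lat) < threshold
            close_lon = abs(entry["lon"] - lon) < threshold
            if close_lat and close_lon:
                # Same place under another name: keep the first coordinates.
                entry["names"].append(name)
                break
        else:
            # No existing entry is close enough, so start a new one.
            merged.append({"names": [name], "lat": lat, "lon": lon})
    return merged
```

Each new entry is compared against the first entry of every existing group, which is quadratic in the number of places but adequate for a few hundred entries. The threshold is in degrees, so the merging distance it corresponds to on the ground differs slightly between latitude and longitude.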
For the map, we decided to create a new map in Google Maps and then embed it. We first created the map by uploading a CSV file containing the movie and location details, one entry per row. However, the markers for different entries at the same location overlapped, and only one of them was visible. To solve this, we created the map by uploading the data in KML format instead. We wrote a script to generate a KML file with one entry per location, in which the description for that location contains the information (IMDb link, movie details, pictures, etc.) for all movies that were shot there.
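A minimal sketch of such a KML generator is shown below. It is not the project's script: the function name `build_kml` and the row keys (`place`, `lat`, `lon`, `movie`, `imdb_id`) are hypothetical, and the description here holds only the movie name and IMDb link, without pictures or other details.

```python
from collections import defaultdict
from xml.sax.saxutils import escape


def build_kml(rows):
    """Build a KML document with one Placemark per location.

    `rows` is an iterable of dicts with the (assumed) keys
    place, lat, lon, movie and imdb_id.
    """
    # Group all movies shot at the same place and coordinates.
    by_location = defaultdict(list)
    for row in rows:
        by_location[(row["place"], row["lat"], row["lon"])].append(row)

    placemarks = []
    for (place, lat, lon), movies in by_location.items():
        lines = [
            "{} (http://www.imdb.com/title/{}/)".format(m["movie"], m["imdb_id"])
            for m in movies
        ]
        placemarks.append(
            "<Placemark><name>{}</name><description>{}</description>"
            # KML expects coordinates in longitude,latitude order.
            "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
                escape(place), escape("\n".join(lines)), lon, lat
            )
        )

    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
        + "\n".join(placemarks)
        + "\n</Document></kml>\n"
    )
```

Grouping by location before writing the placemarks is what avoids the overlapping markers: every location produces exactly one marker, however many movies were shot there.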
To get more scene information, in the last step of data collection we turned to the book World Film Locations: Venice. The book contains a lot of information, including screenshots and time codes of scenes shot at several locations in Venice. This information was used to search for YouTube videos of the scenes, which were then linked to in the map as well as in the database. In the time available, we were able to obtain links to 10 scenes. This can be expanded further just by relying on the information given in the book. In addition, for scenes whose YouTube link is unavailable, the scene descriptions and screenshots given in the book could be used, after satisfying copyright requirements.
An Android app that displays the collected information to the user has also been developed.
1. The map is given below. There are three layers: ‘Links only’, which displays just the movie names and IMDb links; ‘Links with pics’, which additionally displays screenshots; and ‘Full media’, which displays the YouTube link with the scene description.
3. All source files (Android app, scripts, etc.) and other files will be uploaded to https://github.com/sachinbjohn/venice-in-movies no later than 20th May, 2015.