Tourist Pictures


People get to know a city in many ways; the most common is to read official booklets or visit official websites. We would instead like to show how tourists see a city, using the pictures they take. By analyzing a large image dataset, we extract 2D tourist pictures of Venice and reconstruct them into 3D models.

Fig 1. Tourists in Venice

Project objective

The project aims to reconstruct 3D models of points of interest in Venice from pictures taken by tourists, and thereby to present Venice the way tourists see it.


We plan to deliver several programs and scripts, a database, a web interface, and a visualization of the 3D building models.

  • √ Programs and scripts: scripts that collect tourists’ pictures of Venice from Google and popular image websites such as Flickr; programs that cluster the pictures by location, creating an image database that includes location information; and a program that extracts the pictures to be used by the 3D reconstruction software.
  • √ Database: an organized database including the location information.
  • √ Web interface: a website built around a map of Venice's places of interest and the 3D models we generate.
Fig 2. Deliverables


1. Getting tourists’ pictures of Venice

  • √ About 10,000 pictures (around 2 MB per picture).
  • √ A Python script downloads the pictures from Flickr and Google Images.
  • √ The search uses the keyword or tag ‘Venice’.
Fig 3. Venice

Using Google's “Point of interest” results, we also get pictures of the various famous points of interest. The names of those sites can serve as the classification catalogue, and the pictures can serve as a training dataset. They are also the picture source for the semantic query step.
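The download step can be sketched as follows. This is only a minimal illustration, assuming the Flickr REST method flickr.photos.search is used with a tag query; the helper names, the page size and the placeholder API key are hypothetical, the actual HTTP requests are left out, and Google Images is not covered.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_flickr_search_url(api_key, tag, page=1, per_page=250):
    """Return a flickr.photos.search request URL for one tag and one result page."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,          # placeholder, a real key is required
        "tags": tag,
        "extras": "geo,url_o",       # ask for GPS coordinates and the original-size URL
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def plan_requests(api_key, keywords, target_count, per_page=250):
    """List the request URLs needed to cover about target_count pictures per keyword."""
    pages = -(-target_count // per_page)  # ceiling division
    return [build_flickr_search_url(api_key, keyword, page, per_page)
            for keyword in keywords
            for page in range(1, pages + 1)]
```

The keyword list could hold the tag ‘Venice’ together with the names of the points of interest, so that the same script feeds both the general dataset and the per-site training pictures.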

2. Clustering by location

Step 1 yields a huge number of tourist images when we search with the keyword ‘Venice’. The challenge is to group them into clusters of pictures of the same building, using either a graph-based approach or a classification approach.

Graph-based approach

Some of these images carry GPS information in their JPEG metadata; others are purely visual files. In paper [1], the researchers use a graph-based approach to cluster landmarks worldwide. In this landmark graph, images with similar GPS coordinates are connected by an edge. Visual similarity is used as well: images that share many common features are also connected by an edge. An edge therefore means that its two nodes are either geographically or visually related. Finally, each cluster is obtained from the connected groups of the graph.

Fig 4. Graph
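The graph idea can be illustrated with a small sketch. It is not the method of [1] itself, only a simplified version under assumptions: features are represented as sets of hypothetical feature ids, two images are linked when they lie within max_dist_m metres of each other or share at least min_shared features, and clusters are the connected components of the resulting graph (found with union-find). Both thresholds are illustrative values.

```python
import math
from collections import defaultdict


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def cluster_images(images, max_dist_m=100.0, min_shared=3):
    """Group images into connected components of the geo/visual graph.

    images: dict id -> {"gps": (lat, lon) or None, "features": set of feature ids}
    """
    parent = {i: i for i in images}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    ids = list(images)
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            a, b = images[i], images[j]
            geo = (a["gps"] is not None and b["gps"] is not None
                   and haversine_m(a["gps"], b["gps"]) <= max_dist_m)
            visual = len(a["features"] & b["features"]) >= min_shared
            if geo or visual:
                parent[find(i)] = find(j)  # an edge merges the two components

    clusters = defaultdict(set)
    for i in ids:
        clusters[find(i)].add(i)
    return list(clusters.values())
```

An image without GPS data can still join a cluster through a visual edge, which is how the purely visual files end up attached to a located landmark.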

Classification approach

Another feasible approach is to use the images with GPS information as a labelled training set for classifying the images without it. A naïve approach is to compare every image to be classified with the images that are already classified and assign it to the most likely cluster.
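A minimal sketch of the naïve approach, assuming each image is reduced to a set of hypothetical feature ids and that "most likely" means "shares the most features with an already labelled image":

```python
def classify_by_best_match(query_features, labelled_images):
    """Return the label of the labelled image sharing the most features with the query.

    labelled_images: list of (label, set of feature ids) built from GPS-tagged images.
    Returns None when no labelled image shares any feature with the query.
    """
    best_label, best_score = None, 0
    for label, features in labelled_images:
        score = len(query_features & features)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Every query is compared with every labelled image, so the cost grows with the size of the labelled set; this is the main reason to look at an indexed retrieval scheme instead.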

In paper [2], the researchers treat image similarity search like text search using tf-idf. The query ‘words’ of an image are the descriptors of all its features. In the database, two images that share a common feature share a ‘word’ (descriptor). A descriptor vector space is therefore built in which nearby descriptors are clustered; each cluster is one visual word with an occurrence frequency, and each image becomes a bag of such words. Finally, a new image can be searched against the whole database to retrieve the most relevant images (10 images in the paper).

We can apply this approach by simply adding a classification step after the retrieval: we assign the image to the cluster that is most likely given the retrieved images.
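The retrieval plus classification idea could look roughly like the sketch below. It assumes the features have already been quantized into integer visual word ids (the quantization and the fast spatial matching of [2] are not shown), ranks database images by tf-idf cosine similarity, and takes a majority vote over the k best matches; all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict image id -> non-empty list of visual word ids. Returns (vectors, idf)."""
    n = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(n / df[w]) for w in df}
    vectors = {}
    for img, words in docs.items():
        tf = Counter(words)
        vectors[img] = {w: tf[w] / len(words) * idf[w] for w in tf}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify_by_retrieval(query_words, docs, labels, k=10):
    """Retrieve the k most similar images and return the majority label among them."""
    vectors, idf = tfidf_vectors(docs)
    tf = Counter(query_words)
    # Words unseen in the database carry no weight and are dropped from the query.
    q = {w: tf[w] / len(query_words) * idf[w] for w in tf if w in idf}
    ranked = sorted(docs, key=lambda img: cosine(q, vectors[img]), reverse=True)
    votes = Counter(labels[img] for img in ranked[:k])
    return votes.most_common(1)[0][0]
```

The vote over several retrieved images makes the label less sensitive to a single bad match than the naïve best-match comparison.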

3. Getting the image database

After clustering, we have a classification of the whole picture database. The result can be stored as tags in the pictures' info files or simply as separate folders, one per class. The database is then used in the extraction step class by class, where each class corresponds to one tourist site.
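As one possible way to keep these results queryable, the classification could be stored in a small SQLite table instead of (or in addition to) tags or folders. The sketch below is an assumption, not a decided design: the table name, the columns and the helper functions are all hypothetical.

```python
import sqlite3


def create_image_db(path=":memory:"):
    """Open (or create) the image database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " path TEXT PRIMARY KEY,"   # file location of the picture
        " site TEXT NOT NULL,"      # cluster label, i.e. the tourist site
        " lat REAL, lon REAL)"      # GPS coordinates, NULL when unknown
    )
    return conn


def add_images(conn, rows):
    """Insert (path, site, lat, lon) rows, replacing any existing entry for a path."""
    conn.executemany(
        "INSERT OR REPLACE INTO images (path, site, lat, lon) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def images_for_site(conn, site):
    """Return the picture paths of one site, which is the input of the extraction step."""
    cur = conn.execute("SELECT path FROM images WHERE site = ? ORDER BY path", (site,))
    return [row[0] for row in cur]
```

Querying by site gives exactly the class-by-class access that the extraction step needs.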

4. Semantic Query

We can also use semantic queries in the Google image search engine. This procedure runs in parallel with the clustering procedure and also serves as a backup for it.

5. Extracting pictures from the database

This is the core of the project and its most technically demanding part. In this step, a certain number of pictures (around 30) of a given scenic spot are selected by a computational method. The goal of the extraction is to select pictures suitable for the 3D reconstruction.

The first step of the extraction uses histograms. They are useful for separating pictures taken at different times (day or night) and for discarding pictures unsuitable for 3D reconstruction. Other characteristics of the histogram, such as contrast and brightness, could probably be used for further classification (Fig. 5).

Fig 5. Histogram
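A rough sketch of the histogram filter, assuming each picture has already been converted to a flat list of 8-bit grayscale values. The bin count, the number of bins treated as dark and the 0.6 threshold are illustrative guesses that would need tuning on real pictures.

```python
def brightness_histogram(pixels, bins=16):
    """Histogram of 8-bit grayscale values (0..255) with equal-width bins."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // 256] += 1
    return hist


def dark_fraction(hist, dark_bins=4):
    """Share of pixels that fall into the darkest bins of the histogram."""
    total = sum(hist)
    return sum(hist[:dark_bins]) / total if total else 0.0


def is_night_shot(pixels, threshold=0.6):
    """Flag a picture as a night shot when most of its pixels are dark."""
    return dark_fraction(brightness_histogram(pixels)) >= threshold
```

With the default 16 bins, the four darkest bins cover gray values 0 to 63, so a picture is flagged as a night shot when at least 60% of its pixels fall in that range.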

Then we have to select canonical images from this still large cluster. Paper [3] gives a way to compute a bounded canonical set. This set may be far too small for 3D reconstruction, but it can be treated as a base set that we extend. The extension algorithm selects images that share a common area with two or three images in the canonical set. This gives us intermediate images between two or three canonical images, which in theory helps the reconstruction.
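The extension step could be sketched as below. The canonical set itself is taken as given (the selection method of [3] is not reproduced); "sharing a common area" is approximated, as an assumption, by sharing at least min_shared hypothetical feature ids, and the target size of 30 follows the number of pictures mentioned above.

```python
def extend_canonical_set(canonical, candidates, features,
                         min_links=2, min_shared=3, target_size=30):
    """Add candidates that overlap with at least min_links canonical images.

    canonical:  list of image ids forming the base set
    candidates: iterable of image ids from the same cluster
    features:   dict image id -> set of feature ids
    """
    selected = list(canonical)
    scored = []
    for img in candidates:
        if img in canonical:
            continue
        links = sum(1 for c in canonical
                    if len(features[img] & features[c]) >= min_shared)
        if links >= min_links:
            scored.append((links, img))
    # Prefer images bridging more canonical views; break ties by id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    for _, img in scored:
        if len(selected) >= target_size:
            break
        selected.append(img)
    return selected
```

Images linked to only one canonical view are skipped, since they add no intermediate viewpoint between two canonical images.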

6. 3D reconstruction

We use one 3D reconstruction program, 123D Catch [4], to process the pictures produced by the extraction step and obtain the 3D model of the target buildings. Because 123D Catch is robust, we will initially supply 30-50 pictures, allowing for some erroneous data (pictures completely unrelated to the target building) that the software discards itself. We will also refer to Photo Tourism [5], a University of Washington project on 3D reconstruction that uses the location information of tourists’ photos to build a better visualization and photo exploration tool for collections of photos of the same scene. Its algorithm will serve as a reference for our final 3D reconstruction, because it uses the same picture source as we do: tourists’ photos.

Fig 6. 3D Reconstruction

Video Demo

Plan and Milestone

Fig 7. Planning
Fig 8. Workflow


[1] Zheng, Yan-Tao, et al. “Tour the world: building a web-scale landmark recognition engine.” Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
[2] Philbin, James, et al. “Object retrieval with large vocabularies and fast spatial matching.” Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 2007.
[3] Denton, Trip, et al. “Selecting canonical views for view-based 3-D object recognition.” Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 2. IEEE, 2004.
[4] 123D Catch,
[5] Photo Tourism,