Category Archives: Tourists pictures (2014/D1)

Progress report post 3

In our previous research, we focused on methods for classifying images. After classification, we ran the classified results through a test procedure on several selected buildings in Venice.

PART I. Python script of semantic query.

We use the standard Python HTTP API together with the Google API to download images from Google's AJAX servers. Unfortunately, the Google API allows at most 64 images to be downloaded per query, and we found no way to download thousands of Google images programmatically. The following steps are therefore based on 64 images.

PART II.  Classification of images from semantic query

Taking the query results for “San Giorgio Maggiore” as an example, the returned dataset typically contains four types of images.


Figure 1a. the tourist images we want.


Figure 1b. some sketch or 3d-model


Figure 1c. interior of the architecture


Figure 1d. some painting

The semantic query returns more diverse results than we would like. Queries for different landmarks are also biased in different ways, and we do not want to feed these unsuitable images into the 3D reconstruction.

The general classification strategy, which can also be seen as filtering out the images that are inappropriate for 3D reconstruction, consists of two filters. To motivate it, consider the characteristics of a target image such as ‘1a’ among the four types above: it typically shows blue sky, and the building is the main object in the picture.

The first filter eliminates images with too little or too much blue. For example, image 2a contains a reasonable amount of blue but image 2b does not, so we filter 2b out.


Figure 2a. image and mask.

A white pixel in the mask indicates that the corresponding image pixel belongs to either the sky or the water.


Figure 2b. image and mask.

The mask is almost black because the image does not contain any sky or water. This is what we want, since we do not want images taken from the interior.
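The first filter can be sketched roughly as follows. This is a minimal illustration rather than our actual script: the rule used to decide that a pixel is “blue” (blue channel exceeds red and green by `margin`) and the `low`/`high` thresholds are assumptions chosen only for the example, and the images are tiny synthetic pixel lists.

```python
def blue_mask(pixels, margin=20):
    """Return a list of booleans, True where a pixel looks like sky or water.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    The rule "blue exceeds red and green by `margin`" is an assumed
    heuristic, not necessarily the criterion used in the original script.
    """
    return [b > r + margin and b > g + margin for r, g, b in pixels]


def blue_fraction(pixels):
    """Fraction of pixels marked as blue (white in the mask)."""
    mask = blue_mask(pixels)
    return sum(mask) / len(mask) if mask else 0.0


def passes_blue_filter(pixels, low=0.10, high=0.70):
    """Keep an image only if its blue fraction lies in [low, high].

    `low` and `high` are placeholder thresholds chosen for illustration.
    """
    return low <= blue_fraction(pixels) <= high


# Tiny synthetic "images": flat lists of RGB pixels.
sky = (90, 150, 230)      # bluish pixel
stone = (180, 170, 150)   # building facade pixel
outdoor = [sky] * 40 + [stone] * 60   # 40% blue: kept
interior = [stone] * 100              # no blue: rejected
all_sky = [sky] * 100                 # only blue: rejected

for name, img in [("outdoor", outdoor), ("interior", interior), ("all_sky", all_sky)]:
    print(name, blue_fraction(img), passes_blue_filter(img))
```

For a real photo, the pixel list could come from an image library (for example Pillow's `Image.getdata()` on an RGB image), and the thresholds would need tuning on the actual dataset.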

After the first filter, we are left with a subset of the original dataset. In our experiment, the first filter eliminated 30% of the original dataset. With an original dataset of 1000 images, the intermediate result is still large, so we apply a second filter that assigns a score to each image. The score is based on histogram analysis, using the distance between histograms. We first calculate the pairwise distances between all images: for a dataset of 500 images, this gives a 500 × 500 distance matrix. The score of an image is then the sum of its column in the distance matrix, so it measures the overall distance from that image to the rest of the dataset.
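The scoring step of the second filter can be sketched as below. The choice of an 8-bin grey-level histogram and of the L1 distance between normalised histograms is an assumption made for illustration (the text above does not fix a particular histogram distance), and ranking the smallest total distance first is likewise an assumed convention.

```python
def histogram(values, bins=8, max_value=256):
    """Normalised histogram of intensity values in the range 0..max_value-1."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // max_value] += 1
    total = sum(counts) or 1   # avoid dividing by zero for an empty image
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute bin differences (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def distance_matrix(histograms):
    """Symmetric n x n matrix of pairwise histogram distances."""
    n = len(histograms)
    return [[l1_distance(histograms[i], histograms[j]) for j in range(n)]
            for i in range(n)]


def scores(matrix):
    """Column sums: the total distance from each image to all the others."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix))]


# Four toy "images" given as flat lists of grey levels.
images = {
    "bright_a": [200] * 50 + [240] * 50,
    "bright_b": [200] * 60 + [240] * 40,
    "bright_c": [200] * 90 + [240] * 10,
    "dark": [10] * 100,
}
names = list(images)
hists = [histogram(images[n]) for n in names]
score = scores(distance_matrix(hists))
ranking = sorted(zip(score, names))   # smallest total distance first
for s, n in ranking:
    print(f"{n}: {s:.2f}")
```

In this toy dataset the dark image is far from all the others, so it receives the largest score and ends up last in the ranking.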


Figure 3a


Figure 3b. histograms of two images of 3a

For example, if the histograms of the two images in 3a are similar to each other, this is a hint that the two images were taken under similar conditions. There are, however, many exceptions: it is easy to find two totally unrelated images that have the same histogram.

The second filter produces a distance score for each image and a ranking of the images according to that score. The left image of 3c ranks first and the right image of 3c ranks last.


Figure 3c

PART III.  Canonical images selection

Here we use SIFT detection to measure the distance between images. This is similar in spirit to the histogram distance, but it is more precise and computationally more expensive. We use this distance to select canonical images. If 100 images remain after the two filters, we again obtain a 100 × 100 matrix. Each entry (i, j) of this matrix measures how confident we are that image i and image j are related. This value is calculated by SIFT detection and feature matching. Images taken under similar conditions usually have a higher confidence.
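To make the idea of a matching confidence concrete, here is a rough sketch. It assumes the SIFT descriptors have already been extracted by some library and are given as lists of numeric vectors (real SIFT descriptors are 128-dimensional; the toy ones below are 2-D). The brute-force nearest-neighbour search, Lowe's ratio test with `ratio=0.75`, and the definition of confidence as the share of matched descriptors are all assumptions for illustration, not necessarily the measure used in our pipeline.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Count descriptors in desc_a whose nearest neighbour in desc_b is
    clearly closer than the second nearest (Lowe's ratio test)."""
    good = 0
    for d in desc_a:
        dists = sorted(math.dist(d, e) for e in desc_b)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def match_confidence(desc_a, desc_b):
    """Hypothetical confidence in [0, 1]: share of desc_a that is matched."""
    if not desc_a or not desc_b:
        return 0.0
    return ratio_test_matches(desc_a, desc_b) / len(desc_a)


# Toy 2-D descriptors standing in for real SIFT descriptors.
a = [(0, 0), (10, 0), (0, 10)]
b_similar = [(0.1, 0), (10, 0.1), (0, 9.9)]    # nearly the same features
b_unrelated = [(5, 5), (5.2, 5), (5, 5.2)]     # no distinctive match

print(match_confidence(a, b_similar))
print(match_confidence(a, b_unrelated))
```

Note that this particular definition is not symmetric in its two arguments; a symmetric matrix could be obtained, for instance, by taking the smaller of the two directions.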

We then call leaveBiggestComponent(matrix, conf_threshold) to extract the biggest connected component of the matrix, keeping only links whose confidence is above the threshold. This step eliminates the irrelevant images. The result varies with the confidence threshold.
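The idea behind this step can be sketched as a largest-connected-component search on the thresholded confidence matrix. The function `biggest_component` below is a hypothetical stand-in written for illustration, not the leaveBiggestComponent routine itself, and it assumes the confidence matrix is symmetric.

```python
def biggest_component(conf, threshold):
    """Indices of the largest group of images linked by confident matches.

    `conf` is a square matrix (list of lists); images i and j are treated
    as linked when conf[i][j] exceeds `threshold`.
    """
    n = len(conf)
    seen = set()
    best = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search starting from an image not visited yet.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and conf[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        if len(component) > len(best):
            best = component
    return sorted(best)


# Toy confidence matrix for five images.
conf = [
    [0.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 0.0, 0.4, 0.0, 0.1],
    [0.8, 0.4, 0.0, 0.2, 0.0],
    [0.1, 0.0, 0.2, 0.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 0.0],
]
print(biggest_component(conf, 0.5))    # [0, 1, 2]
print(biggest_component(conf, 0.85))   # [0, 1]
```

The two calls show how the outcome depends on the threshold: a stricter threshold removes weaker links and shrinks the surviving group of images.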

PART IV.  Test result of 3D reconstruction software.

Once we have our classification results, we test them in the 3D reconstruction software. In this test, we tried several buildings in Venice. For each building, a well-selected original tourist picture is shown on the left and the 3D reconstruction result on the right.

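1) San Giorgio Maggiore

[Images: tourist picture (left), 3D reconstruction (right)]

2) Ponte di Rialto

[Images: tourist picture (left), 3D reconstruction (right)]

3) Basilica di Santa Maria della Salute

[Images: tourist picture (left), 3D reconstruction (right)]

4) Chiesa dei Gesuati

[Images: tourist picture (left), 3D reconstruction (right)]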

As can be seen, because tourists can only take photos from a restricted set of places (angles), we do not have pictures of these buildings from every angle. We can therefore only produce a partial reconstruction of each building. Comparing 3) Basilica di Santa Maria della Salute with the others also shows that the more viewing angles the pictures cover, the better the 3D reconstruction we can achieve.