I. Definition of the project and deliverables
The aim of the project is to build a web service that recognizes any Venetian palace from an image. The user uploads an image of a palace to a website, which identifies the palace or building and returns a brief history of the place and directions for reaching it. To recognize images, we extract distinctive signatures from each one, such as the positions of windows, doors and arches, the color of the walls, the size of the palace and other specific architectural features. This image recognition system runs in the backend of our web service.
First, we make a list of all the palaces that we want in our database. To build an automated image recognition system, we need a large number of good-quality images of each palace in the database. We collect these images from various sources, including Wikipedia, books such as “Palazzi di Venezia” by Andrea Fasolo, and Google searches using images from Wikipedia. Once we have a sufficient number of images for each palace, we study the architectural features of Venetian palaces to decide which features to extract from the images. This step is important because it tells us which distinguishing features to look for in an image in order to classify it as accurately as possible.
The next step in the project is the recognition problem, for which we propose two approaches, both of which will be implemented. In the first approach, we apply several state-of-the-art computer vision algorithms, such as SIFT, HOG and GIST, to extract features around spatial keypoints of an image using the OpenCV package. We then use these features to train a classification model, such as a linear SVM or a neural network, and test its performance on our Venetian palace dataset. This first approach does not use any details of the architecture in the classification process.

In the second approach, we embed the details of the architecture in the recognition problem. We segment each image into parts containing distinct architectural details such as doors, windows and arches. We identify the type of each segment and the distances between the segments, and encode them in a graph. Each palace is thus represented by a set of nodes for the architectural details of the building and a set of edges for the spatial orientation or positions of these segments in the image. We train the graph model for each palace on the large number of images we collected, so that at the end of this process we have a unique graph for each palace. This kind of modelling approach could also be used for 3D reconstruction of palaces.

Finally, we compare the two approaches to determine which gives better results. The following figure gives a pictorial view of the recognition part of the project.
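To make the graph encoding of the second approach more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's implementation: the `Segment` class, the normalised coordinates, the fully connected edge set, the rounded-distance signature and the `dissimilarity` measure are all hypothetical choices, and the segments are assumed to come from a segmentation step that is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Segment:
    """One detected architectural element (hypothetical representation)."""
    kind: str  # e.g. "door", "window", "arch"
    x: float   # centre of the segment, as a fraction of image width
    y: float   # centre of the segment, as a fraction of image height


def build_graph(segments):
    """Nodes are segment types; each pair of segments is joined by an
    edge weighted with the distance between the segment centres."""
    edges = {}
    for i, j in combinations(range(len(segments)), 2):
        a, b = segments[i], segments[j]
        edges[(i, j)] = hypot(a.x - b.x, a.y - b.y)
    return {"nodes": [s.kind for s in segments], "edges": edges}


def signature(graph):
    """Order-independent summary of a graph: a sorted list of
    (type, type, rounded distance) triples, one per edge."""
    nodes = graph["nodes"]
    triples = []
    for (i, j), distance in graph["edges"].items():
        first, second = sorted((nodes[i], nodes[j]))
        triples.append((first, second, round(distance, 1)))
    return sorted(triples)


def dissimilarity(sig_a, sig_b):
    """Number of edge triples present in one signature but not the other."""
    count_a, count_b = Counter(sig_a), Counter(sig_b)
    return sum(((count_a - count_b) + (count_b - count_a)).values())


# Two made-up facades, used only to exercise the functions above.
facade_a = [Segment("door", 0.5, 0.9),
            Segment("window", 0.3, 0.4),
            Segment("window", 0.7, 0.4)]
facade_b = [Segment("door", 0.5, 0.9),
            Segment("arch", 0.5, 0.5),
            Segment("window", 0.2, 0.3)]

sig_a = signature(build_graph(facade_a))
sig_b = signature(build_graph(facade_b))
print(dissimilarity(sig_a, sig_a), dissimilarity(sig_a, sig_b))
```

In this sketch a query image would be matched to the palace whose stored signature has the lowest dissimilarity. A trained graph model, as proposed above, would replace this fixed comparison with parameters learned from many images of each palace.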
The last part of the project is to build a web interface. The web interface asks the user to upload an image, which we then try to identify using our image recognition algorithms. If we identify the image with good accuracy, we return useful information about the palace, such as its history, how to reach it and its location on Google Maps. If we cannot identify the image uniquely, we prompt the user to answer some specific questions about the image, such as how many doors or windows they see, the color of the walls, or some geographical information (if possible). We use these additional parameters to narrow down the set of possible palaces and return the one that most closely matches the user's request. A possible web interface picture is shown below.
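The fallback step, in which the user's answers narrow down the possible palaces, could be sketched roughly as follows. This is only an illustration: the `PalaceRecord` fields, the placeholder palace names and the scoring rule (count of matching answers) are assumptions, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PalaceRecord:
    """Hypothetical database entry; the attribute set is an assumption."""
    name: str
    doors: int
    windows: int
    wall_color: str


def narrow_candidates(candidates, answers):
    """Return the candidates that match the most user answers.

    `answers` maps an attribute name to the user's reply; questions the
    user skipped are simply absent from the dictionary.
    """
    def score(record):
        return sum(getattr(record, key) == value
                   for key, value in answers.items())

    ranked = sorted(candidates, key=score, reverse=True)
    best = score(ranked[0]) if ranked else 0
    return [record for record in ranked if score(record) == best]


# Placeholder records, not real palace data.
candidates = [
    PalaceRecord("Palace A", doors=1, windows=12, wall_color="red"),
    PalaceRecord("Palace B", doors=2, windows=12, wall_color="white"),
    PalaceRecord("Palace C", doors=1, windows=8, wall_color="red"),
]

# Two answers leave two candidates; a third answer resolves the tie.
partial = narrow_candidates(candidates, {"doors": 1, "wall_color": "red"})
resolved = narrow_candidates(
    candidates, {"doors": 1, "wall_color": "red", "windows": 12})
print([r.name for r in partial], [r.name for r in resolved])
```

Counting matches rather than requiring every answer to match keeps a palace in the running when the user miscounts a window or misjudges a color, which seems likely for answers given from a photograph.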
III. Project Plan with milestones
- Collection of data sets – Weeks 1 and 2.
- Extracting features for the first approach and segmenting the images for the second approach – Weeks 3 and 4.
- Training the first model and building a graph structure for the second approach – Weeks 5, 6 and 7.
- Training the graph structures for all the palaces – Weeks 8 and 9.
- Basic design of the web interface – Weeks 10 and 11.
- Integrating image recognition into our web interface and testing – Weeks 11 and 12.
- Final report and poster preparation – Week 13.