The original project plan reported here has proven infeasible for now. Following up on our first progress report (Report 1), we redefine our goals and present our current results below.
While we have followed our project plan for the most part so far, we were not able to obtain the hardware needed to complete the upcoming steps. The intended collaboration with another laboratory at EPFL was not successful. For this reason, and because of the high costs we mentioned earlier, we are not able to work with our desired drone, the AscTec Hummingbird. We searched intensively for other customizable solutions that would offer both a programmable robot controller and sufficient payload to mount a higher-quality camera to deliver the pictures used for reconstruction. However, there is currently no feasible solution within the scope of this project.
Our progress in the reconstruction task allowed us to discuss other ambitious changes to the project goals with our supervisor. Instead of working with drones, we decided to offer our reconstruction pipeline to a much larger group of users by developing a web service. Processing amateur videos not only meets a demand from the general public but also offers many opportunities to develop this project further. Possible extensions include enhancing street maps by embedding 3D models of important points of interest. A more advanced approach might stitch videos from different users together to refine models or to connect various models, and finally create 3D representations of whole streets or even cities.
Updated progress plan:
Because of the difficulties we have faced, we have updated our project plan as follows. The new milestones are described in more detail to explain our new idea.
#1 [Done] Regulations regarding UAV flights in Lausanne and in Venice
#2 [Done] Reconstruction software
The 3D reconstruction pipeline is now complete and fully automated: given images and their metadata (e.g. focal length, distortion parameters), it constructs a dense 3D model. The main principle is the so-called Structure from Motion (SfM) technique, which recovers the 3D structure of a scene from images taken from different viewpoints. The reconstruction pipeline consists of multiple modules. In the first module, features are extracted from the images and matched across images to compute the transformations (relative rotation and translation) between them. These transformations are used to triangulate points, which gives rise to “point clouds”. The point clouds and camera poses are finally combined into one scene using an optimization process called bundle adjustment, which aims to minimize the reprojection error of the point cloud. We used an open-source framework called openMVG (open Multiple View Geometry) for bundle adjustment and sparse 3D reconstruction.
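To make the notion of reprojection error concrete, here is a minimal sketch in plain Python. It is not openMVG code: the function names, the zero-skew pinhole camera model, and the data layout (a shared intrinsic matrix K, one rotation R and translation t per camera) are assumptions chosen for illustration. Bundle adjustment adjusts the camera poses and 3D points so that the sum of squared errors of this kind becomes as small as possible.

```python
import math


def project(K, R, t, X):
    """Project a 3D world point X into pixel coordinates.

    Assumes an ideal pinhole camera without lens distortion and with zero skew.
    K is a 3x3 intrinsic matrix, R a 3x3 rotation, t a 3-vector (nested lists).
    """
    # Transform the point into the camera frame: Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    # Perspective division, then focal lengths and principal point from K
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    u = K[0][0] * x + K[0][2]
    v = K[1][1] * y + K[1][2]
    return (u, v)


def reprojection_error(K, R, t, X, observed):
    """Pixel distance between the projected point and the observed feature."""
    u, v = project(K, R, t, X)
    return math.hypot(u - observed[0], v - observed[1])


def mean_reprojection_error(K, poses, points, observations):
    """Average error over all observations.

    poses:        list of (R, t) tuples, one per camera
    points:       list of 3D points
    observations: list of (camera_index, point_index, (u, v)) tuples
    """
    errors = []
    for cam_idx, pt_idx, uv in observations:
        R, t = poses[cam_idx]
        errors.append(reprojection_error(K, R, t, points[pt_idx], uv))
    return sum(errors) / len(errors)
```

A well-converged reconstruction typically shows a small mean error in pixels; large residuals on individual observations usually point to wrong feature matches.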
The second module in the pipeline, multi-view stereo, uses this sparse 3D reconstruction and the camera poses to create a dense 3D reconstruction of the scene. Patches are grown iteratively from the sparse 3D points, taking into account certain photometric constraints. We used an open-source library called CMVS (Clustering Views for Multi-view Stereo) for this dense 3D reconstruction.
The final (but optional) step in the pipeline is generating meshes from the 3D point cloud and applying texture to the 3D model. We used the freely available MeshLab for this post-processing step.
As a demo we downloaded a video from YouTube, extracted images from the video stream and constructed a dense 3D reconstruction in a fully automated way.
The video above was taken with a GoPro camera onboard a quadcopter (DJI Phantom) flying above the famous Colosseum in Rome. We extracted images from a continuous shot in the video stream and fed them to the sparse reconstruction module. Below is the result. The green dots are the camera poses in the scene.
We then fed the sparse reconstruction to the multi-view stereo module for patch expansion and dense 3D reconstruction. Below is the result. As can be seen, even a small collection of 26 image frames can lead to a reasonably good reconstruction result.
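The frame selection step of such a demo can be sketched as follows. This is an illustration only: the report does not state how the frames were chosen, so the even spacing, the function name and its parameters are assumptions. The sketch only computes which frame indices to keep within one continuous shot; decoding the frames themselves would be left to a video tool of choice.

```python
def sample_frame_indices(total_frames, num_samples, start=0, end=None):
    """Return evenly spaced frame indices within the shot [start, end).

    total_frames: number of frames in the video
    num_samples:  how many frames to keep (e.g. 26, as in the demo above)
    start, end:   frame range of the continuous shot (hypothetical parameters)
    """
    if end is None:
        end = total_frames
    if not 0 <= start < end <= total_frames:
        raise ValueError("invalid shot range")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    span = end - start
    # Never ask for more frames than the shot contains
    num_samples = min(num_samples, span)
    # Integer arithmetic keeps the indices exact and strictly increasing
    return [start + (i * span) // num_samples for i in range(num_samples)]
```

Spacing the frames out rather than taking consecutive ones gives the camera enough movement between images for triangulation, while neighbouring frames still overlap enough for feature matching.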
#3 [Discarded] Hardware – selection and ordering
[In progress] Reconfigured project plan
#4 Web interface [5 weeks]
Since we are not able to work with drones, we decided to extract the images for 3D reconstruction from videos. After discussing this with our professor, we decided to create a web interface where people submit their videos and the system reconstructs 3D shapes from them. We divide the web interface into several modules, each of which requires 1 to 2 weeks to complete:
User management: Each user will be able to register with the system and create their own profile, where they manage their videos and personal information.
Video upload module: Users will be able to upload a video together with several optional fields (video description, location (city, country), coordinates). We can extract GPS coordinates directly from the video metadata, but this option is natively enabled only for video files taken with iPhone cameras. We have to find other (manual or automatic) ways of retrieving coordinates for other types of files.
Videos and 3D model listing and details: All videos and reconstructed 3D models will be displayed to the user on a listing page. A more detailed page will show a particular video and its 3D model with a description and the location on a map. Functions to edit, delete and share content will also be available.
Notification after reconstructing the 3D model: After the video has been uploaded and the model reconstruction process has finished, users will be notified by e-mail with a link to the detail page, where they will see the 3D model along with the video itself.
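For the coordinate handling in the video upload module, one possible approach is sketched below. It assumes that the location read from the video metadata is an ISO 6709 string in decimal degrees, such as "+46.5191+006.5668+400.000/", and falls back to coordinates typed in by the user when no such string is available. The function names and parameters are hypothetical, reading the metadata from the file is not shown, and other ISO 6709 variants (degrees and minutes) are deliberately not handled.

```python
import re
from typing import Optional, Tuple

# ISO 6709 in decimal degrees: +DD.DDDD+DDD.DDDD, optional altitude, optional "/"
_ISO6709_DECIMAL = re.compile(
    r"^([+-]\d{2}(?:\.\d+)?)"    # latitude
    r"([+-]\d{3}(?:\.\d+)?)"     # longitude
    r"(?:[+-]\d+(?:\.\d+)?)?"    # altitude (ignored here)
    r"/?$"
)


def parse_iso6709(value: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude), or None if the string is not usable."""
    match = _ISO6709_DECIMAL.match(value.strip())
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return (lat, lon)


def resolve_location(metadata_location: Optional[str],
                     manual_lat: Optional[float] = None,
                     manual_lon: Optional[float] = None
                     ) -> Optional[Tuple[float, float]]:
    """Prefer coordinates from the video metadata, else the user's manual input."""
    if metadata_location:
        parsed = parse_iso6709(metadata_location)
        if parsed is not None:
            return parsed
    if manual_lat is not None and manual_lon is not None:
        return (float(manual_lat), float(manual_lon))
    # No location known; the video can still be reconstructed without one
    return None
```

Returning None instead of raising an error keeps the location optional, in line with the optional fields of the upload form.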
#5 Making our service publicly available [2 weeks]
Since reconstruction is computationally expensive, we cannot present our idea to a larger public while we are working on a limited number of machines. Together with our supervisor, we have identified a company that would allow us to install our service on their platform.
#6 Testing [optional]
Depending on the performance we reach after deploying our service on the servers, we will advertise it to a smaller or larger number of people through mailing lists and announcements on social media.
The problems we faced in obtaining hardware turned out to be a valuable advantage for us, as the adaptations to the plan came to mind quickly and naturally. Given that we still have several weeks left, we are confident that we can accomplish the redesigned tasks.