Feature-less Stitching of Cylindrical Tunnel

Ramanpreet Singh Pahwa, Wei Kiat Leong, Shaohui Foong, Karianto Leman, Minh N. Do

R. S. Pahwa and K. Leman are with the Institute for Infocomm Research (I2R), Singapore (e-mail: ramanpreetpahwa, karianto@i2r.a-star.edu.sg). W. K. Leong is with the University of Glasgow, UK (e-mail: weikiat.leong87@gmail.com). S. Foong is with the Singapore University of Technology and Design (SUTD), Singapore (e-mail: foongshaohui@sutd.edu.sg). Minh N. Do is with the Department of ECE, University of Illinois at Urbana-Champaign, IL, USA (e-mail: minhdo@uiuc.edu).

Traditional image stitching algorithms use transforms such as homography to combine different views of a scene. They usually work well when the scene is planar or when the camera is only rotated, keeping its position static. This severely limits their use in real-world scenarios where an unmanned aerial vehicle (UAV) hovers and flies in an enclosed area while rotating to capture a video sequence. We utilize known scene geometry along with the recorded camera trajectory to create cylindrical images of a given environment, such as a tunnel, where the camera rotates around its center. The captured images of the inner surface of the scene are combined into a composite panoramic image that is textured onto a 3D geometrical object in the Unity graphics engine to create an immersive environment for end users.

Image Stitching, Cylindrical Projection, Unity Simulation

I Introduction

Aging infrastructure is becoming an increasing concern in developed countries. There is a growing need for automatic or user-assisted assessment, diagnosis and fault detection of old structures such as sewage tunnels, bridges and roof-tops [1, 2]. Some of these structures may also be inaccessible or too dangerous for human inspection. For example, manual inspection of deep tunnel networks is an extremely challenging and risky task due to the inaccessibility and potentially hazardous environment of these tunnels. Given the health risks involved, UAVs coupled with scene understanding techniques [3, 4] are a well-suited choice, as they are compact and can be automated or controlled by a user to remotely capture the necessary information.

This paper builds towards imaging and inspection of the Deep Tunnel Sewerage System (DTSS). DTSS is a massive integrated project currently being developed by the Public Utilities Board (PUB) in Singapore to meet the country’s long-term clean water needs through the collection, treatment, reclamation and disposal of used water from industries, homes and businesses [5]. These DTSS tunnels are covered with a corrosion protection lining (CPL). This paper aims to automatically stitch the images collected by the UAV into a cylindrical panoramic view of the tunnel and to render the tunnel in 3D, so that the physical condition of the CPL as well as the structural integrity of the tunnel as a whole can be inspected.

While UAVs provide a viable alternative for remote assessment of deep tunnels as they are unaffected by debris and sewage flow, they are primarily designed for high altitude aerial imagery and are not appropriate for short range detailed imaging of tunnel surfaces. An alternative is to attach a 360° camera in front of a UAV and capture the panoramic view of the tunnel. However, these images have a low resolution that is not suitable for fault detection. Moreover, most of these cameras are too heavy and/or have odd shapes that make them difficult to attach to a UAV. Instead, we can use a lightweight, high resolution camera. After performing calibration [6, 7, 8], the camera can be mounted on a UAV. The camera rotates around the shaft of the UAV while it moves forward in a tunnel, in turn providing us with spiral-like images.

Fig. 1: The UAV is assumed to be moving horizontally during the capturing process. This constant horizontal motion of the UAV coupled with the rotating camera results in a panoramic spiral image.

This paper presents a framework where we simulate a UAV flying through a cylindrical sewage tunnel as shown in Fig. 1. We record the trajectory of the UAV moving in the scene and integrate it with 2D color images captured by a rotating camera to stitch the images using cylindrical projection. Thereafter, the stitched cylindrical images are textured on a tunnel-like 3D object and displayed in Unity [9] to assist users in visualization, remote inspection, and fault detection of these tunnels.

In particular, we make the following contributions:

  • A novel revolving camera system is simulated in a virtual environment which, coupled with the maneuverability of the UAV, allows capturing high definition images of the tunnel surface efficiently in a spiraling motion.

  • We develop a mathematical framework to combine recorded camera trajectory with known scene geometry of a tunnel to automatically combine the images captured and create a “panoramic” view of the tunnel.

  • A geometrical visualization framework is developed using Unity to assist the users in visualizing the tunnel and room-like geometry for inspection and fault detection.

II Related Work

Image Stitching: A lot of work has been done in the computer vision [10, 11, 12, 13] and photogrammetry [14, 15, 16, 17] communities to perform image stitching. Traditional image stitching techniques rely on an underlying transform, usually an affine or homographic matrix, that maps pixels from one coordinate frame to another. Typical image stitching techniques such as AutoStitch [18] assume the camera’s location to be static, i.e. a pure camera rotation between captured images, or the captured scene to be roughly planar. In our panoramic spiral imaging system, the camera both rotates and translates while capturing the scene. This translation is often not negligible compared to the distance of the tunnel surface from the camera. Moreover, the planar assumption is invalid for us since we capture images of a cylindrical tunnel. Dornaika and Chung [19] proposed a heuristic approach of piecewise planar patches to overcome this issue. Other recent methods [20, 21, 22] propose a different strategy of aligning the images partially in order to find a good seam to stitch different images in the presence of parallax. However, these methods rely heavily on reliable feature detection and matching, which might be difficult in the tunnel environment. Furthermore, these methods do not exploit known geometry of the scene.

Structure-from-Motion: Structure-from-Motion (SfM) refers to the recovery of the 3D structure of a scene from given images. A widely known application of SfM is the reconstruction of ancient Rome from Internet images [23, 24]. SfM has made tremendous progress in recent years [25, 26]. Since we use a simulated environment for this work, we use the ground-truth camera pose recorded while capturing the dataset.

360° cameras: MIT Technology Review [27] identified 360° cameras as one of the top ten breakthrough technologies of 2017. Usually two or more spherical cameras are used to stitch images and videos in real-time [28, 29] or offline [30, 31]. However, the current technology suffers from extensive motion blur and low resolution. Expensive and dedicated hardware is required to capture and post-process high resolution video [32]. To our knowledge, the best resolution for video capture is provided by VIRB [28] at pixels per resolution at fps. This is not sufficient for anomaly inspection and identifying faults in sewage tunnel linings. We intend to use a light-weight GoPro camera [33] for our future data collection which will provide up to pixels per resolution at fps. This will result in twice the resolution and twice the frame rate compared to the best solution currently available in the market.

III Cylindrical Projection

Fig. 2: Cylindrical projection - A 3D point is projected onto a cylinder with unit radius to obtain cylindrical coordinates. These cylindrical coordinates are then flattened out onto a planar image.

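In this section, we set up the cylindrical projection that models spiral imaging of tunnel surfaces. As shown in Fig. 2, a generic 2D pixel of an acquired image, $\mathbf{p} = (u, v)^\top$, can be projected to a 3D point $\mathbf{P} = (X, Y, Z)^\top$ using the camera’s intrinsic projection parameters, the focal length $f$ and the optical center $(c_x, c_y)$, as follows:

$$\mathbf{P} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = z\, K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad K = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

where $K$ represents the internal calibration matrix of the camera and $z$ refers to the pixel’s depth. This 3D point is projected onto a unit cylinder as follows:

$$(\sin\theta,\ h,\ \cos\theta)^\top = \frac{1}{\sqrt{X^2 + Z^2}}\,(X,\ Y,\ Z)^\top \tag{2}$$

where the angle $\theta$ and the height $h$ are the two parameters required to represent a 3D point lying on a unit cylinder. The unit cylinder can be unwrapped onto a planar image as shown in Fig. 2 as follows:

$$(\tilde{u},\ \tilde{v}) = (f\theta + c_x,\ f h + c_y) \tag{3}$$

An example of cylindrical projection of a 2D planar image captured by a camera is shown in Fig. 3.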

Fig. 3: (a) A sample brick image. (b) Image obtained after cylindrical projection.
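
The projection of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for Fig. 3: it assumes a pinhole camera with a single focal length `f` and optical center `(cx, cy)`, a cylinder axis along the camera's y axis, and an unwrapping that scales the angle and height by `f`. The function name `pixel_to_cylinder` and the numeric values are hypothetical.

```python
import math

def pixel_to_cylinder(u, v, f, cx, cy):
    """Map an image pixel (u, v) to unwrapped unit-cylinder coordinates."""
    # Back-project the pixel to a viewing ray. The depth is set to 1
    # because it cancels when the point is normalised onto the cylinder.
    X = (u - cx) / f
    Y = (v - cy) / f
    Z = 1.0
    # Scale the point so that it lies on the unit cylinder around the y axis.
    norm = math.hypot(X, Z)
    theta = math.atan2(X, Z)   # angle around the cylinder axis
    h = Y / norm               # height along the cylinder axis
    # Unwrap the cylinder onto a planar image (assumed scale factor: f).
    return f * theta + cx, f * h + cy

# Hypothetical 640x480 camera with f = 500 pixels.
for u, v in [(320.0, 240.0), (0.0, 0.0), (639.0, 479.0)]:
    print((u, v), "->", pixel_to_cylinder(u, v, 500.0, 320.0, 240.0))
```

The pixel at the optical center maps to itself, while pixels far from the center are pulled inwards, which is what bends straight image edges in the warped result.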

IV Image Stitching

We use Unity to generate our simulated dataset. A hollow cylinder, resembling a tunnel, is created in Blender [34] and a texture is applied to it in Unity. The camera is placed near the center of the tunnel and is rotated by a fixed angle every few milliseconds. Capturing an image only after the camera has completed a pre-defined rotation avoids issues such as motion blur and rolling shutter noise that would occur if we captured a video sequence in a real environment.

As we use a simulated environment, we can generate the ground-truth measurements for the movement of the camera in the tunnel. We consider these rotation and translation measurements of the camera as the final camera pose for our framework. The image capturing and stitching design is shown in Fig. 4.

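Using the camera pose enables us to project and transform the 2D pixel information per image into the world coordinate frame. Let $\mathbf{P}_w = (X_w, Y_w, Z_w)^\top$ represent the 3D points, per frame, in the world coordinate frame. A global point lying on the cylindrical tube of radius $r$ can be represented by two parameters, $\theta$ and $Y_w$, as follows:

$$\mathbf{P}_w = (r\sin\theta,\ Y_w,\ r\cos\theta)^\top \tag{4}$$

Let $\mathbf{P}_c$ represent the points in the camera frame of reference for every image captured by the rotating camera. Every pixel $\mathbf{p}$ ($= (u, v)^\top$) can be projected onto 3D as:

$$\mathbf{P}_c = z\, K^{-1}\,(u,\ v,\ 1)^\top \tag{5}$$

where $K$ is the internal calibration matrix of the camera and $z$ refers to the depth of pixel $\mathbf{p}$, which is unknown. $\mathbf{P}_w$ and $\mathbf{P}_c$ are related by:

$$\mathbf{P}_w = R\,\mathbf{P}_c + \mathbf{t} \tag{6}$$

where $R$ is a $3 \times 3$ orthonormal rotation matrix and $\mathbf{t} = (t_x, t_y, t_z)^\top$ represents the translation of the UAV in the world coordinate frame. Initially the UAV is assumed to be at the center of the tunnel ($\mathbf{t} = (0, 0, 0)^\top$). As the UAV moves horizontally across the tunnel, $t_y$ increases. $t_x$ and $t_z$ denote the deviation of the UAV from the center of the tunnel in the $x$ and $z$ directions respectively.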

(c) Image captured at
(d) Image captured at
Fig. 4: Top row shows the unity simulation when camera is rotated by and with respect to y axis (green arrow). Bottom row displays the images captured by the virtual camera at these two angles respectively.

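For every pixel, we obtain three equations for three parameters: $\theta$, $Y_w$, and $z$. The two equations involving $\theta$ can be deterministically solved as follows:

$$r\sin\theta = z\,q_1 + t_x \tag{7}$$

$$r\cos\theta = z\,q_3 + t_z \tag{8}$$

where $q_1$ and $q_3$ represent the first and third element of the column vector $\mathbf{q} = R\,K^{-1}(u, v, 1)^\top$, $r$ is the tunnel radius, and $(t_x, t_y, t_z)$ are the components of the camera translation $\mathbf{t}$. Let

$$a = q_1^2 + q_3^2, \qquad b = 2\,(q_1 t_x + q_3 t_z), \qquad c = t_x^2 + t_z^2 - r^2 \tag{9}$$

Solving for $z$ using Eqs. (6,7,8,9), we obtain:

$$a z^2 + b z + c = 0 \tag{10}$$

$$z = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \tag{11}$$

The positive root is taken because $c < 0$ whenever the camera lies inside the tunnel, so the quadratic has exactly one positive solution, which corresponds to the wall point in front of the camera.

Thereafter, we can compute $\theta$ by finding an intersection of the following two solutions for Eqs. (7,8):

$$\theta = \arcsin\!\left(\frac{z\,q_1 + t_x}{r}\right), \qquad \theta = \arccos\!\left(\frac{z\,q_3 + t_z}{r}\right) \tag{12}$$

Thereafter, we can also estimate the $y$ component of $\mathbf{P}_w$ by:

$$Y_w = z\,q_2 + t_y \tag{13}$$

where $q_2$ is the second element of $\mathbf{q}$.

Once we compute the world coordinate of every pixel’s location, we can obtain the cylindrical projection for it as described in Sec. III.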
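
The per-pixel solve described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the tunnel axis is taken to be the world y axis, `R` is a camera-to-world rotation given as nested lists, `t` is the camera position, and the angle is recovered with `atan2`, which is used here as a convenient stand-in for intersecting the arcsin and arccos solutions. The function name `pixel_to_tunnel` and all numeric values are hypothetical.

```python
import math

def pixel_to_tunnel(u, v, f, cx, cy, R, t, r):
    """Intersect the viewing ray of pixel (u, v) with a tunnel of radius r.

    Returns (theta, y_world, depth) for the wall point seen by the pixel.
    """
    ray_cam = ((u - cx) / f, (v - cy) / f, 1.0)        # K^{-1} [u, v, 1]^T
    # Rotate the ray into the world frame: q = R * ray_cam.
    q = [sum(R[i][j] * ray_cam[j] for j in range(3)) for i in range(3)]
    # The x and z components give a quadratic a*z^2 + b*z + c = 0 in depth z.
    a = q[0] ** 2 + q[2] ** 2
    b = 2.0 * (q[0] * t[0] + q[2] * t[2])
    c = t[0] ** 2 + t[2] ** 2 - r ** 2
    if a == 0.0:
        raise ValueError("ray is parallel to the tunnel axis")
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("ray does not hit the tunnel wall")
    # With the camera inside the tunnel (c < 0) this is the positive root.
    z = (-b + math.sqrt(disc)) / (2.0 * a)
    theta = math.atan2(z * q[0] + t[0], z * q[2] + t[2])
    y_world = z * q[1] + t[1]
    return theta, y_world, z

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Camera 0.5 m off the axis and 1 m along a tunnel of radius 2 m.
print(pixel_to_tunnel(320.0, 240.0, 500.0, 320.0, 240.0,
                      identity, (0.5, 1.0, 0.0), 2.0))
```

Because the depth follows from the known radius and pose, no feature matching between images is needed to place a pixel on the tunnel wall.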

V Experimental Results

In this section, we perform synthetic experiments and compare our results in both noiseless and noisy scenarios. We also discuss our Unity framework and display a few examples of the rendered cylindrical scene in Unity for visualization. Unity is a game engine used mainly to create platform-independent video game applications. However, it also provides us with a good set of tools to provide an immersive 3D visualization for various purposes. The cylinder of radius, m, is positioned such that the geometrical center of the cylinder is located at . We texture-mapped a brick wall with a downward facing light source onto the inner face of the cylinder for visualization purposes. The light source is fixed to look vertically down for our experimental evaluation. Hence, images captured around rotation are brightly lit while the images captured around appear dark due to lack of illumination. A few images captured by the virtual camera are shown in Figs. 4(c-d) and 5(a-d). Let us denote the cylindrical image to be synthesized as $I_{\text{cyl}}$.

Stationary Camera Positioned at Center: In our first experiment, we positioned the camera at the center of a cylindrical tunnel. The camera is held stationary throughout this experiment and only rotated by per frame and images are captured consecutively. An example of this process is shown in Fig. 5. It takes images to complete the full rotation. Thereafter, the images are stitched together as described in Sec. IV. In this experiment, the camera stays at the tunnel center, so its translation $\mathbf{t}$ is zero:

$$\mathbf{t} = (0,\ 0,\ 0)^\top \tag{14}$$

For each image captured, we use the pixel-to-cylinder mapping of Sec. IV (up to Eq. 13) to project each pixel onto the cylindrical stitched image $I_{\text{cyl}}$. However, performing this “forward warping” may leave some holes in the stitched image. Thus, instead of performing forward warping, we use the four corners of each image to obtain the forward warped boundary. Thereafter, for every pixel inside this boundary of $I_{\text{cyl}}$, we perform “inverse warping” to obtain its pixel location and intensity information in the captured image. The fully stitched image is shown in Fig. 5(e). We observe that the images align perfectly with each other and all the “curved” bricks are straightened after performing cylindrical projection.
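
Inverse warping can be sketched as below. This is an illustrative sketch only: instead of first computing the forward-warped boundary from the four image corners, it simply visits every output pixel and skips those the camera cannot see, and it uses nearest-neighbour lookup rather than interpolation. The layout of the output image (columns span the angle from -π to π, rows span `y_min` to `y_max` along the tunnel axis), the function name `inverse_warp`, and all parameter values are assumptions made for the example.

```python
import math

def inverse_warp(src, f, cx, cy, R, t, r, out_w, out_h, y_min, y_max):
    """Fill a cylindrical image by looking up each output pixel in src.

    src is a list of rows of intensities; R (3x3 nested lists) and t are
    the camera-to-world rotation and the camera position. Output pixels
    that the camera does not see are left as None.
    """
    src_h, src_w = len(src), len(src[0])
    out = [[None] * out_w for _ in range(out_h)]
    for row in range(out_h):
        y = y_min + (y_max - y_min) * (row + 0.5) / out_h
        for col in range(out_w):
            theta = -math.pi + 2.0 * math.pi * (col + 0.5) / out_w
            # Offset from the camera to the wall point of this output pixel.
            d = (r * math.sin(theta) - t[0], y - t[1], r * math.cos(theta) - t[2])
            # Express it in the camera frame: R^T (P_w - t).
            pc = [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
            if pc[2] <= 0.0:
                continue  # the wall point is behind the camera
            u = f * pc[0] / pc[2] + cx
            v = f * pc[1] / pc[2] + cy
            ui, vi = int(round(u)), int(round(v))  # nearest-neighbour lookup
            if 0 <= ui < src_w and 0 <= vi < src_h:
                out[row][col] = src[vi][ui]
    return out

# Tiny 5x5 test image whose intensity encodes (row, column).
src = [[10 * rr + cc for cc in range(5)] for rr in range(5)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
strip = inverse_warp(src, 2.0, 2.0, 2.0, identity, (0.0, 0.0, 0.0),
                     1.0, 9, 1, -0.5, 0.5)
print(strip[0])
```

Every output pixel inside the camera's view receives exactly one value, which is why this direction of warping leaves no holes.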

(e) Stitched image
Fig. 5: Top row displays the images captured from unity when the camera is rotated from to degrees. Bottom row displays the stitched image obtained after our stitching process.

Stationary Camera Positioned off Center: Under ideal conditions, the camera should be positioned at the center of cylindrical tunnel. However, in reality this will not happen as UAVs tend to hover around and might have difficulty maneuvering to the center of the cylinder before each image is captured. We move the camera off center, i.e. . The camera is held stationary throughout this experiment and only rotated by per frame and images are captured consequently.

A cylindrical image is a view of the scene around the camera flattened on a planar image. Thus, even though the camera is off center, we can still view what the cylindrical projection of the scene looks like assuming the UAV’s current position to be the center of the tunnel, as shown in Fig. 6(a). While the stitching is perfect, the straight lines (bricks) are no longer straight and we can see a zoom-out and zoom-in effect when the camera is far from or near to the tunnel boundary respectively. We use the pixel-to-cylinder mapping of Sec. IV (up to Eq. 13) to synthesize what each image would look like if the camera were positioned at the center of the tunnel. The fully stitched image is shown in Fig. 6(b). We observe that the images align perfectly with each other and all the “curved” bricks are straightened after performing cylindrical projection.

Fig. 6: (a) Stitched cylindrical image without accounting for camera location off center. (b) Stitched cylindrical image after accounting for camera’s location off center.
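
The difference between the two panels of Fig. 6 can be illustrated numerically. The sketch below is a hypothetical 2D example in the tunnel cross-section, not the authors' code: for a ray leaving an off-center camera in direction `phi`, it returns the true tunnel angle of the wall point and the distance to the wall. The function name `corrected_angle`, the offset of 1 m, and the radius of 2 m are assumptions chosen for illustration.

```python
import math

def corrected_angle(phi, tx, tz, r):
    """Return (theta, distance) for a ray from a camera at offset (tx, tz).

    phi is the ray direction in radians measured from the z axis, theta is
    the angle of the hit point as seen from the tunnel axis, and distance
    is the length of the ray from the camera to the wall.
    """
    qx, qz = math.sin(phi), math.cos(phi)      # unit ray, so a = 1
    b = 2.0 * (qx * tx + qz * tz)
    c = tx * tx + tz * tz - r * r              # negative inside the tunnel
    dist = (-b + math.sqrt(b * b - 4.0 * c)) / 2.0
    theta = math.atan2(dist * qx + tx, dist * qz + tz)
    return theta, dist

# Camera shifted 1 m towards +x inside a tunnel of radius 2 m.
for deg in (-90, 0, 90):
    theta, dist = corrected_angle(math.radians(deg), 1.0, 0.0, 2.0)
    print(deg, round(math.degrees(theta), 3), round(dist, 3))
```

Treating `phi` as the tunnel angle is what produces the bent bricks, and the unequal distances to the near and far walls account for the zoom effect. Using `theta` instead places each pixel where a centered camera would have seen it.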

Freely moving camera in a simulated tunnel: In our last experiment, we aim to simulate UAV movements in real-world conditions. A UAV is expected to suffer from jitters and sideways movements while it tries to balance itself and move forward in the tunnel. This means that the camera’s movement and rotation per image given by IMU may be unreliable for our stitching purposes.

(a) Simulated camera trajectory inside the tunnel.
(b) Using baseline pre-estimated camera pose.
(c) Using groundtruth camera pose, four rotations.
(d) Using groundtruth camera pose, ten rotations.
Fig. 7: We add random jitter and movement to the camera after each image capture. Using baseline rotation and translation leads to an extremely inaccurate and incoherent stitch.

We simulate a tunnel of radius m. We initially positioned the camera at the center of the cylindrical tunnel. Our baseline movement and rotation in the three orthogonal directions are and respectively. This implies that we expect the ideal movements in the tunnel to be cm horizontally forward (y direction) with a rotation across y axis. We add Gaussian noise with zero mean and standard deviation of to our translation and Gaussian noise with zero mean and standard deviation of to camera rotation per image. We run the simulation till the camera completes ten rotations and record the groundtruth translation and rotation of the camera per frame.
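
The noise model of this experiment can be sketched as follows. This is a simplified, hypothetical illustration: it perturbs only the rotation about the y axis (the noise in the experiment may affect all three axes), and the step sizes and standard deviations passed in the usage example are placeholders, not the values used in the experiment. The function name `simulate_poses` is likewise an assumption.

```python
import random

def simulate_poses(n_frames, step_y, step_angle_deg, sigma_t, sigma_rot_deg, seed=0):
    """Return (baseline, ground_truth) lists of (tx, ty, tz, angle_deg)."""
    rng = random.Random(seed)
    baseline, truth = [], []
    tx = ty = tz = angle = 0.0
    for k in range(n_frames):
        # Baseline: the planned motion, with no jitter at all.
        baseline.append((0.0, k * step_y, 0.0, k * step_angle_deg))
        truth.append((tx, ty, tz, angle))
        # Planned motion plus zero-mean Gaussian jitter after each capture.
        tx += rng.gauss(0.0, sigma_t)
        ty += step_y + rng.gauss(0.0, sigma_t)
        tz += rng.gauss(0.0, sigma_t)
        angle += step_angle_deg + rng.gauss(0.0, sigma_rot_deg)
    return baseline, truth

# Placeholder values: 1 cm forward and 30 degrees per frame.
baseline, truth = simulate_poses(12, 0.01, 30.0, 0.005, 1.0, seed=1)
for planned, actual in zip(baseline[:3], truth[:3]):
    print(planned, actual)
```

The jitter accumulates from frame to frame, so the planned pose drifts further from the true pose the longer the camera flies.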

Fig. 8: A hollow cylinder is created in Blender and UV mapped. This cylinder is imported in unity to create both (a) straight and (b,c) curved tunnels based on the user requirements. (d) The user can move freely inside the tunnel and move closer to the boundaries to carefully inspect for any damage in the tunnel.

In the baseline stitch, we assume that the camera does not suffer from any jitter or sideways movement and blindly trust the initially planned rotation and translation per frame. In our second approach, we use groundtruth measurements to perform image stitching. The results of the baseline and groundtruth stitch are shown in Fig. 7. The baseline stitch, understandably, fails to provide us with a good stitch of the simulated tunnel. This validates our framework and shows that we can obtain good results for stitching and visualizing long underground tunnels as long as we are able to obtain a good estimate of the camera pose per image. This cylindrical projection stitch can be wrapped around a cylinder in the Unity engine to provide an immersive 3D display of the panoramic view for users.

Visualization in Unity: We display the cylindrical images of the scene rendered as described in previous sections in Unity. We envision our system as a “fly-through” of the scene. The user controls the camera position using keyboard and mouse and, just like in a first-person shooter game, can float around freely in the 3D scene. This enables the user not only to view the rendered tunnel images but also to move closer to the areas where the user suspects faults in the tunnel. Fig. 8 shows our setup for straight and curved tunnels. The user provides a text file with the curve of the tunnel, and the cylinders for the curved tunnel are rendered when the user starts the application. Video demos for Unity simulation and user inspection can be seen here and here.

VI Conclusion

We presented a simple and accurate system to capture images of a given scene using a spirally moving camera system and display the panoramic stitched images in Unity for an interactive 3D display. The presented method excels in scenes where prior geometrical information is available. This allows us to project the images in 3D and warp them onto a unit cylinder to obtain unit cylindrical images of the scene. This approach can be easily extended to other commonly found geometries such as underpasses, rooms, and train tracks.

In future, we plan to extract geometrical information such as the tunnel radius and the curve angle of pipes automatically using depth sensors mounted on the UAV. We also intend to test our framework on a real dataset.


This research grant is supported by the Singapore National Research Foundation under its Environmental Water Technologies Strategic Research Programme and administered by the PUB, Singapore’s National Water Agency.


  • [1] S. Samiappan, G. Turnage, L. Hathcock, L. Casagrande, P. Stinson, and R. Moorhead, “Using unmanned aerial vehicles for high-resolution remote sensing to map invasive phragmites australis in coastal wetlands,” International Journal of Remote Sensing, pp. 1–19, 2016.
  • [2] N. Metni and T. Hamel, “A UAV for bridge inspection: Visual servoing control law with orientation limits,” Automation in Construction, vol. 17, no. 1, pp. 3–10, 2007.
  • [3] R. S. Pahwa, J. Lu, N. Jiang, T. T. Ng, and M. N. Do, “Locating 3D Object Proposals: A Depth-Based Online Approach,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 28, no. 3, pp. 626–639, March 2018.
  • [4] R. S. Pahwa, T. T. Ng, and M. N. Do, “Tracking objects using 3D object proposals,” in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec 2017, pp. 1657–1660.
  • [5] PUB, DTSS Home, Singapore’s National Water Agency, Singapore, 2017. [Online]. Available: https://www.pub.gov.sg/dtss/
  • [6] R. S. Pahwa, M. N. Do, T. T. Ng, and B. S. Hua, “Calibration of depth cameras using denoised depth images,” in IEEE International Conference on Image Processing (ICIP), Oct 2014, pp. 3459–3463.
  • [7] R. S. Pahwa, “Depth camera calibration using depth measurements,” Master’s thesis, UIUC, U.S.A, 2013.
  • [8] ——, “3D sensing and mapping using mobile color and depth sensors,” Ph.D. dissertation, UIUC, U.S.A, 2017.
  • [9] Unity, Unity - Game Engine, https://unity3d.com/, Unity Technologies, San Francisco, USA, 2017, [Online; accessed 10-Jan-2017].
  • [10] J. Lu, D. Min, R. S. Pahwa, and M. N. Do, “A revisit to MRF-based depth map super-resolution and enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, pp. 985–988.
  • [11] T. T. Ng, R. S. Pahwa, B. Jiamin, K. H. Tan, and R. Ramamoorthi, “From the Rendering Equation to Stratified Light Transport Inversion,” International Journal of Computer Vision (IJCV), vol. 96, no. 2, pp. 235–251, Jan. 2012. [Online]. Available: https://doi.org/10.1007/s11263-011-0467-6
  • [12] T. T. Ng, R. S. Pahwa, B. Jiamin, T. Q. S. Quek, and K. H. Tan, “Radiometric compensation using stratified inverses,” in IEEE International Conference on Computer Vision (ICCV), Sept 2009, pp. 1889–1894.
  • [13] X. Chu, T. T. Ng, R. S. Pahwa, T. Q. S. Quek, and T. S. Huang, “Compressive Inverse Light Transport,” British Machine Vision Conference (BMVC), vol. 10, no. 16, p. 27, 2011.
  • [14] R. Szeliski, “Image alignment and stitching: A tutorial,” Foundations and Trends in Computer Graphics and Vision, vol. 2, no. 1, pp. 1–104, 2006.
  • [15] A. Zomet, A. Levin, S. Peleg, and Y. Weiss, “Seamless image stitching by minimizing false edges,” IEEE Transactions on Image Processing (TIP), vol. 15, no. 4, pp. 969–977, 2006.
  • [16] A. Levin, A. Zomet, S. Peleg, and Y. Weiss, “Seamless image stitching in the gradient domain,” in European Conference on Computer Vision (ECCV).   Springer, 2004, pp. 377–389.
  • [17] M. Brown and D. G. Lowe, “Automatic panoramic image stitching using invariant features,” International Journal of Computer Vision (IJCV), vol. 74, no. 1, pp. 59–73, 2007.
  • [18] M. Brown and D. Lowe, “Autostitch,” 2008.
  • [19] F. Dornaika and R. Chung, “Mosaicking images with parallax,” Signal Processing: Image Communication, vol. 19, no. 8, pp. 771–786, 2004.
  • [20] J. Zaragoza, T.-J. Chin, M. S. Brown, and D. Suter, “As-projective-as-possible image stitching with moving DLT,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 2339–2346.
  • [21] F. Zhang and F. Liu, “Parallax-tolerant image stitching,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 3262–3269.
  • [22] W.-Y. Lin, S. Liu, Y. Matsushita, T.-T. Ng, and L.-F. Cheong, “Smoothly varying affine stitching,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR).   IEEE, 2011, pp. 345–352.
  • [23] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski, “Building rome in a day,” Communications of the ACM, vol. 54, no. 10, pp. 105–112, 2011.
  • [24] J.-M. Frahm, F.-G. Pierre, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y.-H. Jen, E. Dunn, B. Clipp, S. Lazebnik et al., “Building rome on a cloudless day,” in European Conference on Computer Vision (ECCV).   Springer, 2010, pp. 368–381.
  • [25] A. Kushal, B. Self, Y. Furukawa, D. Gallup, C. Hernandez, B. Curless, and S. M. Seitz, “Photo tours,” in Second International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT).   IEEE, 2012, pp. 57–64.
  • [26] C. Wu, “Towards linear-time incremental structure from motion,” in International Conference on 3DTV.   IEEE, 2013, pp. 127–134.
  • [27] E. Woyke, The 360-Degree Selfie: 10 Breakthrough Technologies 2017 - MIT Technology Review, https://www.technologyreview.com/s/603496/10-breakthrough-technologies-2017-the-360-degree-selfie/, MIT Technology Review, 2017, [Online; accessed 10-April-2018].
  • [28] Garmin Ltd., VIRB 360 — Cameras — Products — Garmin — Singapore — Home, http://www.garmin.com.sg/products/cameras/virb-360, 2017, [Online; accessed 10-April-2018].
  • [29] Insta360, Insta360 ONE - A camera crew in your hand, https://www.insta360.com/product/insta360-one, 2018, [Online; accessed 10-Apr-2018].
  • [30] Xiaomi Inc., Mi Sphere Camera Kit 360 Degree Panoramic Camera, http://www.mi.com/us/mj-panorama-camera/, 2018, [Online; accessed 10-Apr-2018].
  • [31] Rylo Inc., Rylo - The powerful little 360 camera, https://www.rylo.com/, 2018, [Online; accessed 10-Apr-2018].
  • [32] S. Kasahara and J. Rekimoto, “Jackin head: An immersive human-human telepresence system,” in SIGGRAPH Asia 2015 Emerging Technologies, ser. SA ’15.   New York, NY, USA: ACM, 2015, pp. 14:1–14:3. [Online]. Available: http://doi.acm.org/10.1145/2818466.2818486
  • [33] GoPro, Inc., GoPro Official Website - Capture + share your world - HERO4, https://shop.gopro.com/APAC/cameras/, 2017, [Online; accessed 10-Jan-2017].
  • [34] B. O. Community, Blender - A 3D Modelling and Rendering Package, Blender Foundation, Blender Institute, Amsterdam, 2016. [Online]. Available: http://www.blender.org