Real-Time Image Distortion Correction: Analysis and Evaluation of FPGA-Compatible Algorithms
Abstract
Image distortion correction is a critical preprocessing step for a variety of computer vision and image processing algorithms. Standard real-time software implementations are generally not suited for direct hardware porting, so appropriate versions need to be designed in order to obtain implementations deployable on FPGAs. In this paper, hardware-compatible techniques for image distortion correction are introduced and analyzed in detail. The considered solutions are compared in terms of output quality by using a geometrical-error-based approach, with particular emphasis on robustness with respect to increasing lens distortion. The required amount of hardware resources is also estimated for each considered approach.
Paolo Di Febbo, Stefano Mattoccia, Carlo Dal Mutto
DISI, University of Bologna, Italy, paolo.difebbo1 at gmail.com
DISI, University of Bologna, Italy, stefano.mattoccia at unibo.it
Aquifi, Inc., cdm at aquifi.com
I Introduction
Image distortion correction is a fundamental task in several practical computer vision and image processing applications. Among them, stereo vision is a very common scenario. Many single-camera applications also require distortion correction. Examples of such applications are: ego-motion estimation [1], metric measurements [2], food-portion size estimation [3], image stitching [4], and many more.
A large number of these applications are real-time and mobile-oriented, and therefore require low-latency and low-power image processing. Implementing distortion correction algorithms on specialized or reconfigurable devices can be an optimal solution to reduce latency and power consumption, as well as to offload this task from the CPU. In particular, Field Programmable Gate Arrays (FPGAs) are often the preferred choice when aiming at maximizing the performance/power-consumption ratio. Moreover, in a typical vision processing pipeline, image distortion correction is often performed immediately after image acquisition from the raw sensor(s). Since most common embedded systems implement the entire vision pipeline, or a portion of it, on specialized or reconfigurable devices, image distortion correction almost inevitably has to be implemented on such hardware rather than on a GPU, in order to avoid inefficient transfers back and forth through shared memory.
Nevertheless, porting image distortion correction algorithms to specialized hardware is not straightforward. Even if locality is assumed in the distortion correction process, there are still parts of the algorithm which cannot be ported naively from their software counterpart. For instance, the distortion correction function which relates input and output images is computationally complex. While a Look-up Table (LUT) approach is the ideal solution in software implementations, it might be too demanding in terms of memory when it comes to FPGA implementation. Different solutions have been proposed in the literature in order to overcome this problem, but they are rarely compared with each other. Therefore, in practical implementations, the choice of one approach over another is often made without considering the full spectrum of options. This may lead to oversized, power-hungry, inefficient, and expensive systems, which often provide results comparable to the ones enabled by more efficient but less popular alternatives. The scope of this paper is to clarify the current status of image distortion correction on FPGA, by reviewing and evaluating different solutions in terms of image quality and hardware resources. In particular, we evaluate state-of-the-art approaches with respect to their robustness to increasing lens distortion as well as their hardware footprint. Both these factors have to be considered in order to determine the best trade-off between resource usage and output quality.
In the first part of the paper, an overview of the image distortion correction algorithm is given and its standard software implementation is explained. In the second part, an appropriate hardware (FPGA) design is described, with particular focus on the real-time output capability and memory usage optimization. The main hardware modules used within the design are described with a particular focus on the most critical component, i.e., the pixel-to-pixel remapping function module. Concerning this task, we consider the de-facto standard distortion model proposed by [5, 6] and implemented in [7] and in the OpenCV library [8, 9], which accounts for both radial and tangential distortion. Similar reasoning can also be applied to more complex models, such as the rational one proposed by [10], which however is less commonly adopted.
Possible approximated versions of the standard software-implemented approach are defined in order to meet the hardware constraints, and compared by analyzing their geometric error with respect to the reference software map. Along with hardware resource utilization estimates for each approach, this study makes it possible to determine which of these solutions is the best choice for different lenses and target FPGA architectures.
It is important to state that the proposed evaluation is built over the specific distortion model chosen [9], and conclusions might be different if simpler models are considered. This model has been selected since it is very commonly used and general purpose, covering most of the typical scenarios for both stereo vision and single camera image processing.
II Related Work
Distortion correction algorithms have been the subject of several studies in the literature. With regard to the mathematical description of the distortion geometry, different models have been introduced [5, 6, 7, 10]. Determining the distortion of a particular lens requires a calibration stage, in which the image information from the camera is used to estimate the parameters of the distortion model through different techniques [6, 11]. The distortion correction methodologies considered in this paper are agnostic to the calibration approach deployed.
Image distortion correction has been implemented in a wide variety of different contexts. The well-known OpenCV library [8] includes probably one of the most popular software versions of the distortion correction algorithm [9]. The algorithm has also been implemented on GPUs [12] and FPGAs [13]. Focusing on FPGA implementations, many alternatives have been designed for overcoming the problems mentioned earlier about the input/output mapping function complexity. Several papers adopt a straightforward porting of the software Look-up Table version [14, 15]. Given the high amount of memory needed in these approaches, external dynamic memories are required, leading to bulky and expensive hardware systems when such memory is not required for other purposes. Other approaches compute the mapping function on-the-fly by implementing the complete camera and distortion model functions [9] in hardware [16, 17]. Although this approach is very accurate, it may be expensive in terms of gates and also redundant or inefficient from the point of view of power consumption, since the same model computation is performed repeatedly, from scratch, for each point of each frame. Another solution is based on a subsampled version of the full-resolution LUT method, using simple bilinear interpolation to get the specific pixel-to-pixel mapping value. This solution has been considered less frequently [18, 19] than other approaches, even though it provides an effective trade-off between resources and performance.
III Image Distortion Correction Algorithm
The distortion correction algorithm takes a raw camera image as input, and produces as output a distortion-corrected version of the same image, according to the camera parameters estimated at calibration time. The image distortion correction algorithm warps the input image such that the lens distortion effects are compensated and it optionally performs a virtual rotation of the camera frame according to the specific usage needs. This latter transformation is needed for example in case of image rectification for stereo vision, where the images from the camera pair are required to appear aligned as if they were actually acquired by a perfectly aligned system. This setup is often referred to as standard form.
Usually, a pixel-to-pixel mapping function is used to relate every pixel of the output (distortion-corrected) image to a pixel (located with sub-pixel accuracy) of the input distorted image, according to the specific correction to apply, which is determined at camera calibration time. This design makes it easy to compute each output pixel value by simply retrieving its corresponding pixel from the input image through the mapping function; this specific part of the algorithm is called image remapping.
III-A Image Remapping
Image distortion correction can be regarded as a particular version of a more generic algorithm, that is image remapping. The main idea behind image remapping consists in defining a coordinate relationship between the input (src) and the output (dst) pixels through two mapping functions (i.e., $\mathrm{map}_x$, $\mathrm{map}_y$):
$$\mathrm{dst}(x, y) = \mathrm{src}\big(\mathrm{map}_x(x, y),\ \mathrm{map}_y(x, y)\big) \qquad (1)$$
in which $\mathrm{map}_x$ is the mapping function relative to the $x$ component and $\mathrm{map}_y$ is the mapping function relative to the $y$ component.
Considering Equation (1), and given a predefined mapping function ($\mathrm{map}_x$, $\mathrm{map}_y$), the output image dst can be computed according to the following two steps. First, for each output pixel $(x, y)$, the values of $\mathrm{map}_x(x, y)$ and $\mathrm{map}_y(x, y)$ are retrieved according to an appropriate approach, with $\mathrm{map}_x$ and $\mathrm{map}_y$ generally encoded as floating-point values. Then, the algorithm gets the corresponding pixel value from the input image src, and stores it in the output pixel $\mathrm{dst}(x, y)$. Since the source image coordinates $\mathrm{map}_x(x, y)$ and $\mathrm{map}_y(x, y)$ are not necessarily integers, the desired source pixel actually lies somewhere in between four source pixels; it is therefore required to perform a bilinear interpolation of these pixel values according to the map function.
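The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the OpenCV implementation: the function name `remap_bilinear`, the list-of-rows image layout, and the clamping of out-of-range coordinates to the image border are assumptions made here for the example.

```python
import math

def remap_bilinear(src, map_x, map_y):
    """dst(x, y) = src(map_x(x, y), map_y(x, y)), sampled with bilinear interpolation.

    src, map_x and map_y are lists of rows; the maps hold floating-point
    source coordinates for every output pixel. src must be at least 2x2.
    """
    h, w = len(src), len(src[0])
    dst = []
    for row_x, row_y in zip(map_x, map_y):
        out_row = []
        for sx, sy in zip(row_x, row_y):
            # Assumption: clamp to the image so the 2x2 neighbourhood is always valid.
            sx = min(max(sx, 0.0), w - 1.0)
            sy = min(max(sy, 0.0), h - 1.0)
            # Top-left pixel of the interpolation quartet.
            x0 = min(int(math.floor(sx)), w - 2)
            y0 = min(int(math.floor(sy)), h - 2)
            # Fractional offsets inside the quartet.
            fx, fy = sx - x0, sy - y0
            top = (1 - fx) * src[y0][x0] + fx * src[y0][x0 + 1]
            bottom = (1 - fx) * src[y0 + 1][x0] + fx * src[y0 + 1][x0 + 1]
            out_row.append((1 - fy) * top + fy * bottom)
        dst.append(out_row)
    return dst
```

With an identity map the output equals the input, while a map value of (0.5, 0.5) returns the average of the four top-left pixels.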
The important part of the image distortion correction algorithm, which characterizes it compared to a generic image remapping algorithm, is the specific mapping function ($\mathrm{map}_x$, $\mathrm{map}_y$). The $\mathrm{map}_x$ and $\mathrm{map}_y$ functions are computed according to the considered distortion model [9], such that the input image, after remapping, will actually appear distortion-corrected and possibly rotated.
III-B Standard Software Implementation
In order to understand how the distortion correction algorithm works, and what are the main problems when it comes to its implementation on FPGA, it is convenient first of all to analyze its software implementation. In this paper, the OpenCV implementation is considered as a reference [8, 9].
The mapping function is usually defined according to a pinhole camera model which includes radial and tangential lens distortion, along with a rotation operation for possible use when required. In this paper, we consider the commonly used model of [5, 6], implemented in [7, 8]. The actual function is fully described in [9].
In the OpenCV implementation, this mapping function is computed beforehand, for each pixel, and all of its values are stored in memory. The remapping method accesses the precomputed map values at run time in a Look-up Table fashion. Although this is a very efficient solution in software, since the map function is computed only once at initialization time, it might not be feasible on a standard-sized FPGA without external memory, due to the size of the full-resolution LUTs.
IV Image Distortion Correction on FPGA
This section provides a description of our distortion correction algorithm for FPGA. This design is aimed at maximizing throughput and clock frequency through deep datapath pipelining, and at minimizing resource usage, both in terms of memory footprint and gate usage, through appropriate design techniques. Both goals have been successfully achieved, allowing real-time distortion correction at a pixel clock frequency of 100+ MHz. This easily enables processing of 720p video streams at 60+ Hz.
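As a rough sanity check on these figures, the pixel clock needed by a video mode is the product of its total frame size (active area plus blanking) and its frame rate. The sketch below assumes the common 720p60 timing of 1650 x 750 total pixels per frame; this timing and the function names are assumptions made here for illustration and are not stated in the text.

```python
def required_pixel_clock_hz(total_cols, total_rows, frame_rate_hz):
    """Pixel clock needed to stream frames of the given total size (blanking included)."""
    return total_cols * total_rows * frame_rate_hz

def max_frame_rate_hz(pixel_clock_hz, total_cols, total_rows):
    """Highest frame rate sustainable at the given pixel clock."""
    return pixel_clock_hz / (total_cols * total_rows)

# Assumed 720p timing: 1280x720 active pixels inside a 1650x750 total frame.
clock_720p60 = required_pixel_clock_hz(1650, 750, 60)
rate_at_100mhz = max_frame_rate_hz(100e6, 1650, 750)
```

Under this assumed timing, 720p at 60 Hz needs a 74.25 MHz pixel clock, which is below the 100 MHz figure quoted above.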
In this section, a detailed hardware description of the inverse mapping function for distortion correction (Map Module in Figure 1) is omitted, since its adaptation from the standard software implementation is not straightforward and requires further explanation. Therefore, Section V focuses on this specific problem, analyzing in detail the constraints imposed by each examined solution on the underlying hardware design.
IV-A Top Level Architecture and Submodules
The high-level block diagram of this design is outlined in Figure 1. Incoming image data is written into a circular buffer, whose size is defined as a number of image rows. This number of rows depends on the minimum and maximum relative vertical displacement values of the specific distortion correction map that is currently being used. On the read side of the buffer, the output image generation begins with a delay (in number of lines, relative to the input frame start) that corresponds at least to the maximum relative vertical displacement value, and it uses the same timing specifications as the input. In this way, the buffer always contains all the pixels of the input image that are required for remapping purposes. A brief description of the most important modules in Figure 1 is provided below.
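The dependence of the buffer size on the vertical displacement range can be illustrated with a small sketch. It is only a lower-bound estimate under assumptions made here: the map is given as relative vertical displacements (source row minus output row), each output pixel reads a two-row interpolation quartet, and a row can be read only after it has been completely written. The function name `buffer_requirements` is hypothetical.

```python
import math

def buffer_requirements(rel_map_y):
    """Return (min_rows, start_delay_lines) for a relative vertical displacement map.

    rel_map_y[y][x] is the vertical offset of the source pixel with respect
    to the output pixel (x, y), in pixels.
    """
    d_min = min(min(row) for row in rel_map_y)
    d_max = max(max(row) for row in rel_map_y)
    lowest = math.floor(d_min)        # topmost row offset ever read
    highest = math.floor(d_max) + 1   # bottom row of the interpolation quartet
    min_rows = highest - lowest + 1   # rows that must be resident at the same time
    # Assumption: the bottom row must be fully written before it is read.
    start_delay = max(highest + 1, 0)
    return min_rows, start_delay
```

For example, a map whose vertical displacements span [-3.2, 4.7] pixels needs at least 10 buffered rows and a start delay of 6 lines under these assumptions; a real design may add margin for concurrent writes.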
IV-A1 Buffer Write Manager
This module manages the writing mechanism that stores every input pixel into one of four different buffer memories, according to a strategy that makes it possible for the read side to access, at any time and in the same clock cycle, the four different pixels that are needed for an interpolation, as depicted in Figure 2. For any interpolation quartet within the input image, its four pixels can be accessed in parallel, since they are always stored in different buffers.
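One simple way to obtain this property is to select the buffer from the parity of the pixel coordinates. The sketch below illustrates that idea; it is an assumption made here for illustration, and the actual strategy of Figure 2 may differ. The names `bank_of` and `quartet_banks` are hypothetical.

```python
def bank_of(x, y):
    """Assumed layout: buffer index built from the parity of the pixel coordinates."""
    return ((y & 1) << 1) | (x & 1)

def quartet_banks(x0, y0):
    """Buffers holding the 2x2 interpolation quartet whose top-left pixel is (x0, y0)."""
    return [bank_of(x0 + dx, y0 + dy) for dy in (0, 1) for dx in (0, 1)]
```

Since moving one pixel horizontally or vertically always flips the corresponding parity bit, any 2x2 quartet touches each of the four buffers exactly once, so the four reads never conflict.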
IV-A2 Buffer Read Manager
This module combines the current buffer read address, generated by the address manager, with the remapping function coordinates related to the current output pixel, in order to compute the buffer addresses containing the input pixels that are needed to fill the current output pixel. Specifically, the remap coordinates are provided by the map module as relative coordinates, meaning that they represent the relative position of the input distorted pixel with respect to the current output pixel.
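One plausible formalization of this relative encoding is sketched below. The symbols $\Delta_x$ and $\Delta_y$ are introduced here for illustration only and are not taken from the original text; $\mathrm{map}_x$ and $\mathrm{map}_y$ denote the two mapping functions of the image remapping step.

$$\Delta_x(x, y) = \mathrm{map}_x(x, y) - x, \qquad \Delta_y(x, y) = \mathrm{map}_y(x, y) - y$$

Under this reading, the input pixel needed for output pixel $(x, y)$ is located at $(x + \Delta_x(x, y),\ y + \Delta_y(x, y))$, so the stored values are bounded by the maximum lens displacement rather than by the image size.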
IV-A3 Pixel Interpolation Module
This module, given the four neighboring pixels coming from the buffers in parallel in the same clock cycle, computes their bilinear interpolation according to the sub-pixel map value.
V Mapping Function on FPGA
Within our FPGA design discussed in the previous section, the task of determining the input (distorted) coordinates from the output image coordinates is assigned to the Map Module depicted in Figures 1 and 3. Several issues arise when implementing the distortion correction mapping function on FPGA. The most challenging problem is that a conventional software implementation of the image distortion correction algorithm uses precomputed full-resolution maps that are entirely stored in memory as Look-up Tables. Even considering low-resolution VGA frames, the complete map is too big to be stored in static memories (block RAM/ROM). Therefore, if external memory devices (e.g., DDR, SRAM) are not included in the hardware design, the methodology borrowed from the conventional software approach is not feasible for hardware implementation. Moreover, this solution requires a significant amount of memory bandwidth.
It is then required to find alternative solutions to the conventional Look-up Table approach adopted for typical software implementations. Different techniques can be used in order to achieve different goals, which primarily include: robustness to lens distortion with accuracy comparable to the conventional approach, reduced FPGA resource utilization, and a small memory footprint compatible with the amounts available even in cheaper FPGAs.
V-A Complete On-the-fly Function Computation
A first hardware-compatible approach that comes to mind when looking at the considered remapping function [9] is to compute each map value from scratch from the current coordinates. This approach, usually called on-the-fly map computation [16, 17], avoids storing the full precomputed map and thus completely removes the need for map memory, but it has some drawbacks.
First of all, for each input frame within the same stream session, these values are computed from scratch each time, even though they always have exactly the same value for the same coordinates across frames. This introduces redundant computational costs, which directly depend on the complexity of the distortion model. Moreover, the floating-point computation required by this approach might not be available, or might be too demanding in terms of resource utilization, and thus an approximated fixed-point computation is mandatory in most cases. This strategy might result in an approximation error that is not negligible in comparison with the resolution of the standard software approach. Therefore a further analysis, provided in the experimental results section, is needed in order to better understand how accurate the fixed-point computation should be, in terms of fractional bits, in order to preserve the integrity of the mapping. Despite these facts, most of the image distortion correction approaches on FPGA adopt this strategy.
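To make the accuracy question concrete, the following minimal sketch emulates a fixed-point version of the radial and tangential model documented in [9] by rounding every intermediate result to a given number of fractional bits, and compares it against a floating-point evaluation. It assumes no rotation, the same camera matrix for input and output, and a third radial coefficient equal to zero. The rounding points, the function names, and the parameter values in `CAM` and `DIST` are illustrative assumptions and do not describe the datapath evaluated in this paper.

```python
def make_quantizer(frac_bits):
    """Return a function that rounds to the nearest multiple of 2**-frac_bits."""
    if frac_bits is None:              # None: keep full floating-point precision
        return lambda value: value
    scale = float(1 << frac_bits)
    return lambda value: round(value * scale) / scale

def map_on_the_fly(u, v, cam, dist, frac_bits=None):
    """Source coordinates for output pixel (u, v), computed from scratch."""
    q = make_quantizer(frac_bits)
    fx, fy, cx, cy = cam
    k1, k2, p1, p2 = (q(c) for c in dist)
    # Normalized coordinates of the output pixel.
    x = q((u - cx) / fx)
    y = q((v - cy) / fy)
    xx, yy, xy = q(x * x), q(y * y), q(x * y)
    r2 = xx + yy
    r4 = q(r2 * r2)
    radial = 1.0 + q(k1 * r2) + q(k2 * r4)
    # Radial plus tangential distortion.
    xd = q(x * radial) + q(2.0 * p1 * xy) + q(p2 * (r2 + 2.0 * xx))
    yd = q(y * radial) + q(p1 * (r2 + 2.0 * yy)) + q(2.0 * p2 * xy)
    # Back to pixel coordinates.
    return q(xd * fx + cx), q(yd * fy + cy)

def max_error_px(frac_bits, cam, dist, width=640, height=480, step=40):
    """Largest deviation (in pixels) from the floating-point map on a coarse grid."""
    worst = 0.0
    for v in range(0, height + 1, step):
        for u in range(0, width + 1, step):
            ref_x, ref_y = map_on_the_fly(u, v, cam, dist)
            fix_x, fix_y = map_on_the_fly(u, v, cam, dist, frac_bits)
            worst = max(worst, abs(fix_x - ref_x), abs(fix_y - ref_y))
    return worst

# Hypothetical VGA camera: (fx, fy, cx, cy) and (k1, k2, p1, p2).
CAM = (500.0, 500.0, 320.0, 240.0)
DIST = (-0.3, 0.1, 0.001, 0.001)
```

Calling `max_error_px(bits, CAM, DIST)` for `bits` in `(12, 16, 20)` gives a quick feel for how the number of fractional bits affects the map error for a given lens.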
V-B Map Sampling
As outlined before, the main problem that the standard software solution faces when it needs to be implemented in hardware is the size of the full-resolution Look-up Table for the remapping function. Different implementations in the current literature “force” a port of this exact solution to hardware by using a complete software-like Look-up Table [14, 15]. However, because of the size of such a structure, an external memory is needed in order to store it, making the design more complex, expensive, and demanding in terms of memory bandwidth.
An alternative approach is to use a size-reduced version of the Look-up Table. The software-computed pixel-to-pixel LUT can be subsampled, generating a lower-resolution version of it, by picking one value every $2^n$ values along both axes, where $n$ is the subsampling factor. Bilinear interpolation of these samples can be used to reconstruct an approximation of the specific pixel’s map value. The approximation error gets bigger when $n$ increases, i.e., when using fewer samples.
However, as reported in the next section, the subsampling factor can be significantly increased, making the actual sampled map very small, without compromising the quality of the map values too much. This approach is very convenient in terms of hardware resources since it requires only two simple bilinear interpolators (one for the $x$ map coordinates, one for the $y$ ones), plus a small amount of internal memory to store the subsampled map.
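The sampling and reconstruction steps can be sketched as follows for one map component. This is a minimal software model under assumptions made here: the map is available as a function `map_fn(u, v)` that can also be evaluated on the image border, grid nodes are placed every `step` pixels including the far border, and all names are hypothetical.

```python
import math

def sample_map(map_fn, width, height, step):
    """Evaluate map_fn on a grid with one node every `step` pixels (borders included)."""
    cols = -(-width // step) + 1    # ceil(width / step) + 1
    rows = -(-height // step) + 1
    return [[map_fn(i * step, j * step) for i in range(cols)] for j in range(rows)]

def interpolate_map(samples, u, v, step):
    """Bilinearly reconstruct the map value at pixel (u, v) from the sampled grid."""
    i, j = u // step, v // step
    fu, fv = (u % step) / step, (v % step) / step
    a, b = samples[j][i], samples[j][i + 1]
    c, d = samples[j + 1][i], samples[j + 1][i + 1]
    top = a + fu * (b - a)
    bottom = c + fu * (d - c)
    return top + fv * (bottom - top)

def rmse(map_fn, width, height, step):
    """Root mean square error between the reconstructed and the exact map."""
    samples = sample_map(map_fn, width, height, step)
    total = 0.0
    for v in range(height):
        for u in range(width):
            err = interpolate_map(samples, u, v, step) - map_fn(u, v)
            total += err * err
    return math.sqrt(total / (width * height))
```

A map that is affine in the pixel coordinates is reconstructed without error, while a curved map (as produced by lens distortion) yields an error that grows with the sampling step.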
However, even though this approach can often be considered the best trade-off for the hardware map function problem, it is much less frequently adopted than on-the-fly computation. In fact, only a few papers about distortion correction or rectification on FPGA rely on this strategy [13, 18, 19]. The main goal of this paper is to highlight that the sampling approach is instead typically very accurate even with a small number of samples (i.e., a large subsampling factor $n$), which require only a fraction of the whole LUT size, making it compatible with the amount of memory available even in cheaper FPGAs.
In the next section we thoroughly evaluate the performance of each considered distortion correction approach, in terms of accuracy against increasing lens distortion, and we provide a brief analysis of their resource utilization on FPGA.
VI Evaluation of the Proposed Solutions
In this section, the considered hardware solutions are analyzed in detail from two different perspectives:

Geometric error: The root mean square error (RMSE) between the software-generated map values and those of each FPGA solution is computed and plotted for increasing lens distortions (see footnote 1). Unfortunately, given the limited number of pages, we are not able to detail this matter exhaustively.
Footnote 1: The lowest lens distortion factor considered corresponds to a minimal distortion, while the highest distortion factor corresponds to the maximum distortion allowed by OpenCV 2.4 (almost equivalent to the distortion introduced by the short focal length of GoPro lenses).
VI-A Complete On-the-fly Function Computation
An important parameter to consider in this solution is the number of fixed-point fractional digits used in the hardware implementation of the on-the-fly computation operators. Recall that a greater number of fractional digits leads on the one hand to an improved accuracy of the final result, but on the other hand to a higher computational cost. This trade-off is fundamental and therefore required some further exploration. In particular, different numbers of fractional bits (12, 16, 20) have been considered and compared.
From this analysis, two very positive aspects of this approach emerge. The first one is that, with the exception of the 12-bit fractional precision, the average RMSE is quite low, which means that the FPGA map is almost equivalent to the floating-point software one even when considering strong lens distortion (Figure 5). The second one is that the geometric error is almost uniformly distributed within the frame (Figure 4).
VI-B Map Sampling
An important parameter to consider when evaluating this approach is the subsampling factor $n$, which is directly related to the total number of samples that are extracted from the original full-resolution Look-up Table: the bigger $n$, the smaller the number of samples. Sampling factors of 7 (128 px), 6 (64 px), and 5 (32 px) have been considered during this study.
Different aspects of this approach can be noticed. First of all, the error given by the bilinear interpolation of the samples causes a “cushion” effect (Figure 6). Then, the geometric error decreases as the number of samples grows: the more samples are used, the lower the error (Figure 7). Moreover, there is a strong dependency on the amount of distortion: an increase of the distortion leads to an increase of the error.
Nevertheless, a significant conclusion is that very good accuracy is achieved even with a very small number of samples (i.e., 336 for a VGA resolution image), also in the case of lenses characterized by severe distortion. When analyzing the RMSE reported in Figure 7, it is worth considering that distortion correction maps estimated with camera calibration procedures are characterized by an accuracy in the range [0.1 - 0.5] px. Therefore the approximation introduced by this method is comparable with the precision of the map itself, hence this procedure does not introduce a fundamental precision limitation.
Considering the relative hardware costs, in this case it is no longer necessary to implement the map function in hardware, as a software-precomputed Look-up Table of the function is used. Since the map is now sampled in order to fit into static memory within the chip, a hardware implementation of a bilinear interpolator is needed for interpolating the samples and obtaining an approximation of the specific pixel’s map value. It can be easily proven that a bilinear interpolation module can be implemented by using 3 multipliers, which means that 6 multipliers in total are needed for the two interpolation modules (one for $\mathrm{map}_x$ and another one for $\mathrm{map}_y$). Along with some glue logic for address generation, an estimate of the needed resources is again shown in Figure 8.
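The three-multiplier count can be checked by factoring the bilinear formula into three linear interpolations. The symbols below ($a, b, c, d$ for the four neighboring samples, and $\alpha, \beta \in [0, 1]$ for the horizontal and vertical fractional offsets) are notation introduced here for illustration, not taken from the original text.

$$f = (1-\beta)\,\big[(1-\alpha)\,a + \alpha\,b\big] + \beta\,\big[(1-\alpha)\,c + \alpha\,d\big]$$

$$t = a + \alpha\,(b - a), \qquad u = c + \alpha\,(d - c), \qquad f = t + \beta\,(u - t)$$

Each of the three steps uses a single multiplication plus two additions or subtractions, which gives 3 multipliers per interpolator.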
A very important achievement is that, with this solution, the amount of hardware resources needed for computation decreases significantly (especially considering that dividers can be very expensive), and it is now independent of the mapping function complexity. Moreover, only a small amount of memory is now needed for storing the map samples (e.g., in the case of VGA resolution with sampling factor 5, i.e., 32 px), compared to that of a full-resolution LUT. Table I reports the actual resource usage of our specific implementation for the Xilinx Artix-7 architecture. The report is relative to a single image distortion correction module, using an 8 px subsampling step (very high accuracy) on VGA resolution (4941 total samples); the image buffer size is 50 image lines.
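The sample counts quoted in the text are consistent with a grid that keeps one node every `step` pixels along both axes, borders included. The short sketch below reproduces them and shows the resulting reduction with respect to a full-resolution map; the function names are hypothetical, and the word width of each stored value is not stated in the text, so only counts and ratios are computed.

```python
def sample_count(width, height, step):
    """Grid nodes when one map value is kept every `step` pixels, borders included."""
    return (width // step + 1) * (height // step + 1)

def reduction_ratio(width, height, step):
    """How many times fewer values are stored compared to the full-resolution LUT."""
    return (width * height) / sample_count(width, height, step)

vga_32px = sample_count(640, 480, 32)  # sampling factor 5 (32 px)
vga_8px = sample_count(640, 480, 8)    # 8 px step used for Table I
```

For VGA resolution this gives 336 nodes with a 32 px step and 4941 nodes with an 8 px step, i.e., roughly 914 and 62 times fewer values than the 307200 entries of a full-resolution map.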
As theoretically analyzed before, this approach is definitely cheap in terms of hardware resources as well as in terms of memory footprint. The only significant hardware costs are related to the interpolation modules (three of them needed in total: two for the LUT values interpolation, one for the output pixel value computation).
Finally, observing Figure 8, we can notice again that the subsampling method completely avoids expensive dividers, and requires a significantly lower number of multipliers and adders, which no longer depends on the distortion model function. Moreover, by selecting appropriate parameters such as the sampling factor, the subsampled approach leads to an accuracy that is suited for most practical applications.
VII Conclusions
In this paper, we considered the image distortion correction problem for reconfigurable devices. Although most approaches in the literature rely either on the external memory solution or on the on-the-fly computation one, our analysis and evaluation show that a good trade-off between the two approaches is represented by the map subsampling strategy. Moreover, compared to the on-the-fly solution, this approach decouples the hardware logic from the specific mapping function chosen, allowing flexibility in choosing different or more complex models with no need to change the architecture. A drawback of this approach is that, as the lens distortion factor increases, it might not be accurate enough in very particular setups that include lenses with very short focal length (i.e., wide-angle lenses).
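Table I: Resource usage of a single image distortion correction module on the Xilinx Artix-7 architecture (8 px subsampling step, VGA resolution, 50-line image buffer).
Site Type        Used
Slice LUTs       475
Slice Registers  525
Block RAM Tile   16
DSPs             19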
References
 [1] G. P. Stein, O. Mano, A. Shashua. A Robust Method for Computing Vehicle Ego-motion. In IEEE Intelligent Vehicles Symposium (IV2000), Oct. 2000, Dearborn, MI.
 [2] M. Johnson, H. Farid. Metric measurements on a plane from a single image. Technical Report TR2006-579. Department of Computer Science, Dartmouth College, 2006.
 [3] Z. Li, M. Sun, H.-C. Chen, J. Li, K. Wang, W. Jia. Distortion correction in wide-angle images for picture-based food portion size estimation. Proc. 38th Annual Northeast Bioengineering Conference, Philadelphia, PA, 2012.
 [4] P. Azzari, L. Di Stefano, S. Mattoccia. An Evaluation Methodology for Image Mosaicing Algorithms. In Proceedings of ACIVS, 10th International Conference, Juan-les-Pins, France, 2008.
 [5] J. Heikkila, O. Silven. A four-step camera calibration procedure with implicit image correction. Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (p. 1106), 1997.
 [6] Z. Zhang. A flexible new technique for camera calibration. Microsoft Technical Report MSR-TR-98-71, 1998.
 [7] J.-Y. Bouguet. Camera calibration toolbox for Matlab. Retrieved March 8, 2016, from http://www.vision.caltech.edu/bouguetj/calib_doc/index.html.
 [8] G. Bradski, A. Kaehler. Learning OpenCV. O’Reilly Media, Inc, 2008.
 [9] OpenCV Documentation, Geometric Image Transformations. Retrieved July 28th, 2016 from: http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#initundistortrectifymap
 [10] D. Claus, A. Fitzgibbon. A rational function lens distortion model for general cameras. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 213–219, San Diego, CA, 2005.
 [11] F. Devernay, O. Faugeras. Straight lines have to be straight. Machine Vision and Applications, 13:14–24, 2001.
 [12] J. Fung. Chapter 40: Computer vision on the GPU. In GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, pages 649–665. Addison-Wesley, 2005.
 [13] S. Mattoccia, M. Poggi. A Passive RGB-D Sensor for Accurate and Real-time Depth Sensing Self-contained into an FPGA. In Proc. of the 9th Int. Conf. on Distributed Smart Cameras, Seville, Spain, 2015.
 [14] C. Vancea, S. Nedevschi. LUT-based Image Rectification Module Implemented in FPGA. In IEEE International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, 2007, pp. 147–154.
 [15] A. Kjær-Nielsen, L. B. W. Jensen, A. S. Sørensen, N. Krüger. A Real-Time Embedded System for Stereo Vision Preprocessing Using an FPGA. In International Conference on Reconfigurable Computing and FPGAs (ReConFig), Cancun, 2008, pp. 37–42.
 [16] S. Jin et al. FPGA Design and Implementation of a Real-Time Stereo Vision System. In IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 1, pp. 15–26, Jan. 2010.
 [17] S. Jorg, J. Langwald, M. Nickl. FPGA based real-time visual servoing. Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, pp. 749–752, Vol. 1.
 [18] K. Jawed, J. Morris, T. Khan, G. Gimel’farb. Real time rectification for stereo correspondence. In Proc. IEEE/IFIP Int. Conf. Embedded Ubiquitous Computing, pages 277–284, 2009.
 [19] H. Akeila, J. Morris. High resolution stereo in real time. In International Workshop on Robot Vision, 2008, pp. 72–84.