A High-Speed, Real-Time Vision System for Texture Tracking and Thread Counting
Abstract
In garment manufacturing, an automatic sewing machine is desirable to reduce cost. To accomplish this, a high-speed vision system is required to track fabric motions and recognize repetitive weave patterns with high accuracy, from a micro perspective near a sewing zone. In this paper, we present an innovative framework for real-time texture tracking and weave pattern recognition. Our framework includes a module for motion estimation using blob detection and feature matching. It also includes a module for lattice detection to facilitate weave pattern recognition. Our lattice-detection algorithm utilizes blob detection and template matching to assess pairwise similarity in blobs' appearance. In addition, it extracts dominant-orientation information to obtain a global constraint on the topology. By incorporating constraints on both the appearance similarity and the global topology, the algorithm determines a lattice that characterizes the topological structure of the repetitive weave pattern, thus allowing for thread counting. In our experiments, the proposed thread-based texture tracking system is capable of tracking denim fabric with high accuracy (e.g., 0.03° rotation and 0.02 weave-thread translation errors) and high speed (3 frames per second), demonstrating its high potential for automatic real-time textile manufacturing.
 Citation

Y. Hu, Z. Long, and G. AlRegib, “A High-Speed, Real-Time Vision System for Texture Tracking and Thread Counting,” IEEE Signal Processing Letters, vol. 25, no. 6, pp. 758–762, 2018.
 Review

Date of publication: 9 March 2018
 Bib

@article{hu2018high,
  title={A High-Speed, Real-Time Vision System for Texture Tracking and Thread Counting},
  author={Hu, Yuting and Long, Zhiling and AlRegib, Ghassan},
  journal={IEEE Signal Processing Letters},
  volume={25},
  number={6},
  pages={758--762},
  year={2018},
  publisher={IEEE}
}
 Copyright

©2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
I Introduction
Textures are ubiquitous in the real world, and texture analysis has been studied for years in the fields of image processing and computer vision. One application in the textile industry is an automatic sewing machine for garment manufacturing¹, which requires a vision system to recognize fabric patterns and track fabric motions. Although texture tracking in general has been investigated in the literature [1, 2], few works focus on robust and efficient algorithms specially designed for fabrics in a real-time environment. In the cutting or sewing part of the sewing machine, to avoid aliasing, a real-time vision system should be capable of continuously monitoring a small region of a fabric near the sewing needle and tracking a very small motion (possibly less than the width of a thread) in successive frames. Besides fabric tracking, the vision system should be capable of thread counting, which reduces the effect of local fabric distortions on tracking accuracy. To illustrate the concepts of weave patterns, threads, and lattices, we show an example in Fig. 1. Given an original starting point and a center point in a thread-based coordinate system, counting threads means maintaining a cumulative count of the warp and weft (filling) threads that have passed the center point.
¹ This work was partially supported by a Walmart Foundation grant (1806K45).
To design a fast and robust vision system for an automatic sewing machine, Book et al. [3] proposed a prototype in which the concept of thread count was first introduced. In their vision system, they used the Harris corner detection algorithm to detect corner features and fabric translation, and the 2D fast Fourier transform (FFT) to track fabric angles. However, the FFT-based algorithm does not accurately estimate a small rotation angle (e.g., 0.1°). In addition, thread count remains a concept, with no solution provided in their work. Recognizing fabric threads is closely related to discovering texture regularity or granularity. In the literature, Liang and Weller [4] detected granularity (i.e., the size of texture primitives) for general texture with simple edge detection techniques. To discover the lattices of near-regular texture (NRT), Hays et al. [5] first formulated a lattice-finding problem for NRT as a higher-order correspondence problem. This technique uses interest point detectors, iteratively proposes and assigns neighboring texture primitives, and then seeks an optimal lattice assignment by maximizing the pairwise visual similarity and the geometric consistency. The approach fits thread detection and counting well; however, its optimization and iteration process is complex and time consuming, and thus not suitable for real-time applications. In addition, Lin and Liu [6, 7] proposed the first deformed lattice detection and tracking algorithm for dynamic NRTs and compared it with optical flow and Lucas-Kanade algorithms [8]. Their methods (as well as [9]) are also computationally expensive because they involve MRF-based tracking models.
In this paper, we propose an efficient, robust, and accurate feature-based approach to track individual fabric threads and provide the associated motion information in terms of position and orientation. Our contributions include:

an innovative framework that integrates a lattice detection module to accomplish fabric tracking in a thread-based coordinate system instead of a pixel-based system, to ensure robustness to local fabric deformation;

a novel algorithm for fast and efficient lattice detection for thread counting, achieved by constraining both the appearance similarity (through template learning and matching) and the global topology; and

an extensive comparative study evaluating various methods of keypoint detection and description for their applicability to the fabric tracking problem of interest.
II Texture Tracking and Thread Counting
We present a framework to automatically track small motions and detect the lattice in near-regular texture (e.g., the denim fabric shown in Fig. 3(a)). The framework includes three phases, shown in Fig. 2. Phase I is feature extraction, in which we detect feature points and describe the local region around each feature point. In Phase II, we match feature points between two successive frames and estimate a geometric transformation with translation and rotation offsets. In Phase III, we learn the repetitive pattern template and seek a local lattice using both local appearance similarities and the underlying topological relationship among texture patterns.
II-A Feature Detection and Description
To establish reliable matching between two frames, we need a feature detector that extracts ample feature points and a feature descriptor that distinctively describes local regions.
II-A1 MSER feature detection
Since the image exhibits abundant bright blob regions with a near-regular placement, and a large percentage of the same blobs appear in successive frames, we choose maximally stable extremal regions (MSER) [10] to detect stable blobs. The detected MSERs are visualized in Fig. 3(b), in which each unique color represents one individual MSER region. After MSER detection, we fit ellipses to the detected regions and compute their centroids, displayed as green ellipses and points in Fig. 3(c). We later use the MSER regions to match feature points in two frames and to generate a texture template.
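The ellipse-and-centroid step can be illustrated with image moments. The following is a minimal sketch, assuming each detected region is available as a list of pixel coordinates; the function name `fit_ellipse` and the moment-based fit are illustrative choices, not necessarily what the authors' MATLAB implementation does.

```python
import math

def fit_ellipse(pixels):
    """Fit an ellipse to a region given as a list of (x, y) pixel coordinates.

    Returns (centroid, (semi_major, semi_minor), orientation_in_radians).
    The ellipse has the same second-order central moments as the region.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    # Orientation of the major axis, measured from the x axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # For a uniformly filled ellipse, variance along an axis is (semi-axis)^2 / 4.
    return (cx, cy), (2.0 * math.sqrt(lam_major), 2.0 * math.sqrt(lam_minor)), theta

# Hypothetical region: a 9 x 3 horizontal bar of pixels.
region = [(x, y) for x in range(9) for y in range(3)]
centroid, axes, theta = fit_ellipse(region)
print(centroid, axes, theta)
```

The centroid is what the later matching and lattice steps consume; the axes and orientation are only needed for visualization or for filtering out implausible regions.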
II-A2 BRISK feature description
For feature description, binary robust invariant scalable keypoints (BRISK) [11] offers a fast alternative to well-known algorithms (e.g., scale-invariant feature transform (SIFT) and speeded-up robust features (SURF)) while maintaining comparable matching performance. We pair each MSER feature point with a BRISK descriptor. Taking Fig. 3(c) as an example, we use the red column vector to represent the BRISK feature vector of the red MSER feature point. It is worth noting that typical texture features (e.g., [12, 13, 14, 15]) normally cannot satisfy both the robustness and the high-speed requirements of this paper simultaneously.
II-B Translation and Rotation Estimation
We estimate translation and rotation offsets between two frames from feature points and feature vectors through feature-point matching and geometric transformation estimation.
II-B1 Feature-point matching
Feature-point matching involves finding corresponding interest points between a pair of images. Since BRISK feature descriptors are binary strings, we use the Hamming distance for computational efficiency. We apply the nearest-neighbor approach, in which a match threshold selects the strongest matches. In this way, the local neighborhood information encoded in the feature descriptors yields reliable matching points. Fig. 4 illustrates feature-point matching; we use the matched pairs to estimate the geometric transformation between a pair of images.
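As an illustration of nearest-neighbor matching under the Hamming distance, here is a minimal sketch. It assumes descriptors are stored as Python integers (real BRISK descriptors are 512-bit strings; the 8-bit values below are made up), and the absolute threshold `max_distance` is a hypothetical stand-in for the match threshold mentioned in the text.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_descriptors(desc_a, desc_b, max_distance):
    """For each descriptor in desc_a, find its nearest neighbor in desc_b.

    Returns a list of (index_a, index_b, distance), keeping only matches
    whose Hamming distance does not exceed max_distance.
    """
    matches = []
    for i, da in enumerate(desc_a):
        j, dist = min(((j, hamming(da, db)) for j, db in enumerate(desc_b)),
                      key=lambda pair: pair[1])
        if dist <= max_distance:
            matches.append((i, j, dist))
    return matches

# Toy 8-bit descriptors for two frames (hypothetical values).
frame1 = [0b10110010, 0b01001101, 0b11110000]
frame2 = [0b01001111, 0b10110011, 0b00001111]
matches = match_descriptors(frame1, frame2, max_distance=2)
print(matches)  # [(0, 1, 1), (1, 0, 1)]
```

The third descriptor of the first frame has no neighbor within two bits, so it is dropped; this is the role the match threshold plays in suppressing weak correspondences.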
II-B2 Geometric transformation estimation
We model the motion between two frames with an affine matrix parameterized by two translation offsets, in camera pixels, and a rotation angle. To estimate these parameters from a set of observations that contains outliers, we use M-estimator sample consensus (MSAC) [16], a variant of random sample consensus (RANSAC) [17]. MSAC estimates a 2D geometric transform from the matching pairs, yielding a global affine matrix in a standard orthogonal coordinate system, from which we report the translation and rotation offsets.
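To make the robust estimation step concrete, the sketch below fits a rotation-plus-translation model to matched point pairs with an MSAC-style truncated quadratic cost. It assumes the common convention x' = x cos θ − y sin θ + t_x, y' = x sin θ + y cos θ + t_y; the function names, the two-point minimal solver, the threshold, and the iteration count are all illustrative assumptions rather than the authors' settings.

```python
import math
import random

def rigid_from_two_pairs(p1, q1, p2, q2):
    """Rotation angle and translation that map p1 to q1 and p2 to q2."""
    theta = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap to (-pi, pi]
    c, s = math.cos(theta), math.sin(theta)
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return theta, tx, ty

def apply_rigid(model, p):
    theta, tx, ty = model
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def msac_cost(model, pairs, threshold):
    """Squared residual for inliers, threshold^2 for outliers (truncated quadratic)."""
    cost = 0.0
    for p, q in pairs:
        x, y = apply_rigid(model, p)
        cost += min((x - q[0]) ** 2 + (y - q[1]) ** 2, threshold ** 2)
    return cost

def estimate_rigid_msac(pairs, threshold=1.0, iterations=200, seed=0):
    """Keep the minimal-sample model with the lowest MSAC cost."""
    rng = random.Random(seed)
    best_model, best_cost = None, float("inf")
    for _ in range(iterations):
        (p1, q1), (p2, q2) = rng.sample(pairs, 2)
        model = rigid_from_two_pairs(p1, q1, p2, q2)
        cost = msac_cost(model, pairs, threshold)
        if cost < best_cost:
            best_model, best_cost = model, cost
    return best_model

# Synthetic matches: 12 noise-free inliers under a known motion plus 3 gross outliers.
true_model = (math.radians(0.5), 7.5, -0.2)
points = [(x, y) for x in (0, 40, 80, 120) for y in (0, 50, 100)]
pairs = [(p, apply_rigid(true_model, p)) for p in points]
pairs += [((10, 10), (90, 5)), ((60, 20), (0, 110)), ((100, 70), (30, 30))]

theta, tx, ty = estimate_rigid_msac(pairs)
print(math.degrees(theta), tx, ty)
```

Unlike plain RANSAC, which only counts inliers, the truncated quadratic cost also rewards models whose inliers fit tightly, which matters when the motion of interest is a fraction of a pixel.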
II-C Thread Counting
Thread counting starts with low-level vision cues (e.g., blobs) and ends with the high-level lattice models shown in Fig. 5. We generate a representative blob template and seek a vector pair consistent with the geometric relationships between blobs.
II-C1 Template learning
As shown in Fig. 3(b), MSER generates potential blobs, some of which are connected and others not. From attributes of each blob region (e.g., its area and its intensity values), we group blobs into two clusters: individual blobs and grouped (connected) blobs. We use all individual blobs, such as the highlighted blob regions shown in Fig. 5(a), to propose a blob template. We align all individual blobs according to their centroids and average their intensity values to obtain the blob template shown in Fig. 5(b).
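The align-and-average step can be sketched as follows. This is a simplified illustration that assumes integer centroids, a fixed square patch size, and blobs far enough from the image border; `extract_patch`, `learn_template`, and the toy image are hypothetical.

```python
def extract_patch(image, cx, cy, half):
    """Square patch of side 2*half+1 centered on an integer centroid (cx, cy)."""
    return [row[cx - half:cx + half + 1] for row in image[cy - half:cy + half + 1]]

def learn_template(image, centroids, half):
    """Average the patches centered on each individual-blob centroid."""
    size = 2 * half + 1
    acc = [[0.0] * size for _ in range(size)]
    for cx, cy in centroids:
        patch = extract_patch(image, cx, cy, half)
        for r in range(size):
            for c in range(size):
                acc[r][c] += patch[r][c]
    n = len(centroids)
    return [[value / n for value in row] for row in acc]

# Toy 7 x 12 image with two cross-shaped bright blobs of slightly different intensity.
image = [[0] * 12 for _ in range(7)]
blob_centroids = [(3, 3), (8, 3)]
for (cx, cy), peak in zip(blob_centroids, (200, 220)):
    image[cy][cx] = peak
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        image[cy + dy][cx + dx] = peak - 100

template = learn_template(image, blob_centroids, half=1)
for row in template:
    print(row)
```

Averaging over many individual blobs suppresses per-blob intensity variation, so the template reflects the shape shared by the repeating pattern.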
II-C2 Template matching
We use the obtained blob template to detect the nearest neighboring blobs of the current MSER center (i.e., the red point in Fig. 5(c)). To find neighboring blobs, we adopt correlation-based template matching, which locates local peaks on a correlation map between the candidate neighboring blob region and the blob template. In Fig. 5(c), we show the centroids of the detected neighboring blobs in blue and use them as an appearance-similarity constraint for the later lattice detection.
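A correlation map and its local peaks can be computed as in the sketch below. It uses zero-mean normalized cross-correlation, which is one common choice and an assumption here; the score threshold `min_score` and the toy image and template are likewise hypothetical.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches carry no shape information

def correlation_map(image, template):
    """Score of the template at every fully contained position (top-left indexing)."""
    th, tw = len(template), len(template[0])
    return [[ncc([row[c:c + tw] for row in image[r:r + th]], template)
             for c in range(len(image[0]) - tw + 1)]
            for r in range(len(image) - th + 1)]

def peak_centers(cmap, template, min_score):
    """Centers (x, y) of positions that reach min_score and dominate their 8 neighbors."""
    half_h, half_w = len(template) // 2, len(template[0]) // 2
    centers = []
    for r, row in enumerate(cmap):
        for c, score in enumerate(row):
            if score < min_score:
                continue
            neighborhood = [cmap[rr][cc]
                            for rr in range(max(r - 1, 0), min(r + 2, len(cmap)))
                            for cc in range(max(c - 1, 0), min(c + 2, len(row)))]
            if score >= max(neighborhood):
                centers.append((c + half_w, r + half_h))
    return centers

# Toy 7 x 12 image with two cross-shaped blobs centered at (3, 3) and (8, 3).
image = [[0] * 12 for _ in range(7)]
for (cx, cy), peak in (((3, 3), 200), ((8, 3), 220)):
    image[cy][cx] = peak
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        image[cy + dy][cx + dx] = peak - 100

template = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]
centers = peak_centers(correlation_map(image, template), template, min_score=0.9)
print(centers)  # [(3, 3), (8, 3)]
```

Because the score is normalized, the brighter second blob is still detected even though its intensities are not an exact multiple of the template.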
II-C3 Dominant orientation determination
Blobs form a near-regular placement of repetitive patterns along dominant orientations. For a square image, the angular orientation of a peak AC component in the frequency domain and the corresponding dominant repeating orientation in the spatial domain are perpendicular. For example, the 2D FFT of a denim image is shown in Fig. 5(d), in which the red point represents the DC component and the yellow points represent AC components with strong peaks. The three AC peaks (yellow points) in the frequency domain correspond to three dominant orientations in the spatial domain (blue lines) in Fig. 5(e). We use only two of the three orientations as reference orientations. Peak features in the frequency domain thus help determine dominant directions and provide a geometric constraint for the later lattice detection.
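The perpendicularity between a spectral peak and the spatial orientation it induces can be checked on a synthetic pattern. The sketch below uses a naive 2D DFT on a small square image and picks only the single strongest AC peak (the text uses several); the function names and the test image are assumptions made for illustration, and angles are measured in image coordinates (x to the right, y downward).

```python
import cmath
import math

def dft2(image):
    """Naive 2D DFT of a square image (rows first, then columns)."""
    n = len(image)
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    rows = [[sum(row[x] * w[(u * x) % n] for x in range(n)) for u in range(n)]
            for row in image]
    # spectrum[v][u]: v is the vertical frequency, u the horizontal frequency.
    return [[sum(rows[y][u] * w[(v * y) % n] for y in range(n)) for u in range(n)]
            for v in range(n)]

def dominant_orientation_deg(image):
    """Spatial orientation (0..180 degrees) perpendicular to the strongest AC peak."""
    n = len(image)
    spectrum = dft2(image)
    best, best_mag = None, -1.0
    for v in range(n):
        for u in range(n):
            if u == 0 and v == 0:
                continue  # skip the DC component
            mag = abs(spectrum[v][u])
            if mag > best_mag:
                best, best_mag = (u, v), mag
    u, v = best
    fu = u if u <= n // 2 else u - n  # map to signed frequencies
    fv = v if v <= n // 2 else v - n
    peak_angle = math.degrees(math.atan2(fv, fu)) % 180.0
    return (peak_angle + 90.0) % 180.0

# Synthetic 16 x 16 pattern whose intensity is constant along the direction (-1, 2).
n = 16
image = [[math.cos(2 * math.pi * (2 * x + y) / n) for x in range(n)] for y in range(n)]
orientation = dominant_orientation_deg(image)
print(round(orientation, 3))
```

For this pattern the spectral peak lies along (2, 1), so the pattern's constant-intensity direction comes out near 116.6 degrees, that is, along (-1, 2).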
II-C4 Lattice detection
Proposing a lattice model amounts to determining a vector pair that connects the current blob centroid to its two nearest matched neighbors. The orientations of the two basis vectors should follow the dominant directions obtained above, which yields global topological consistency. Among the potential vector pairs, we minimize the distance defined in Eq. 1 to generate the final lattice proposal.
(1) 
In Eq. 1, the distance depends on the following quantities: the coordinates of the current blob centroid and of the candidate blob centroid; the angle of the vector connecting the two centroids; a dominant direction of the repetitive patterns, estimated from the frequency domain; a constraint on appearance similarity, obtained from template matching; and a topological constraint. A weighting factor, related to prior knowledge of weave patterns, balances the appearance constraint against the topological constraint. By selecting the candidate blobs with the smallest distance, we determine basis vectors along the two dominant directions. Fig. 5(f) shows the final proposal of a local lattice superimposed on the inlier MSER feature points. The final lattice proposal thus accounts for both the pairwise similarity of texture appearance and the global consistency of topological relationships.
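The selection of basis vectors can be illustrated with a simple weighted score. The form used below, Euclidean distance to a template-matched candidate plus a weight times the angular deviation from a dominant direction, is an assumed stand-in and not the paper's Eq. 1; the weight `lam`, the candidate coordinates, and the two dominant directions are hypothetical.

```python
import math

def angular_diff(a_deg, b_deg):
    """Smallest difference between two undirected orientations, in degrees (0..90)."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)

def pick_basis_vector(current, candidates, dominant_deg, lam):
    """Return the offset to the candidate with the lowest combined score.

    candidates are centroids already accepted by template matching
    (the appearance constraint); dominant_deg supplies the topological constraint.
    """
    best_score, best_vector = float("inf"), None
    for cand in candidates:
        dx, dy = cand[0] - current[0], cand[1] - current[1]
        distance = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx))
        score = distance + lam * angular_diff(angle, dominant_deg)  # assumed form
        if score < best_score:
            best_score, best_vector = score, (dx, dy)
    return best_vector

current = (50, 50)
candidates = [(60, 52), (48, 61), (70, 54), (57, 45)]
basis = tuple(pick_basis_vector(current, candidates, direction, lam=0.5)
              for direction in (10.0, 100.0))
print(basis)  # ((10, 2), (-2, 11))
```

In this toy case the candidate at (57, 45) is the nearest one, yet it is rejected for both directions because its orientation deviates strongly from either dominant direction, which is the effect the topological constraint is meant to have.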
II-C5 Thread counting mapping
To obtain a fractional thread count, and thereby reduce the effect of fabric distortion, we calculate the translation vector for each inlier feature point and decompose it in the local lattice coordinate system. This step assumes prior knowledge of the fabric type and of the mapping between the local lattice and the physical fabric threads.
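Decomposing a pixel translation in the lattice basis is a 2×2 linear solve. The sketch below assumes, for simplicity, that one lattice step along each basis vector corresponds to one thread; the basis vectors and per-frame translations are made-up values.

```python
def to_lattice_coords(translation, v1, v2):
    """Solve translation = a * v1 + b * v2 for (a, b) using Cramer's rule."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if abs(det) < 1e-12:
        raise ValueError("lattice basis vectors are collinear")
    a = (translation[0] * v2[1] - translation[1] * v2[0]) / det
    b = (v1[0] * translation[1] - v1[1] * translation[0]) / det
    return a, b

# Hypothetical lattice basis (in pixels) and per-frame translations (in pixels).
v1, v2 = (10.0, 2.0), (-2.0, 11.0)
frame_translations = [(15.5, 0.25), (5.0, 1.0)]

# Cumulative fractional count along each lattice direction.
count = [0.0, 0.0]
for t in frame_translations:
    a, b = to_lattice_coords(t, v1, v2)
    count[0] += a
    count[1] += b
print(count)
```

Because the basis is re-estimated from the fabric itself, a locally stretched region changes the basis vectors along with the translation, so the count in lattice units is less sensitive to the distortion than a count in pixels.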
III Experiments and Discussion
To evaluate the tracking performance of our proposed framework, we conducted a set of experiments in which we estimated a translation offset in camera space, a rotation angle, and a translation offset in a thread-based coordinate system between two frames with small motions, as shown in Fig. 6 (a demo video is available online²). Our target texture is a piece of denim fabric mounted on a micro stage that allows precise translation and rotation. Since our camera captures only a small region of the denim fabric, the field of view (FOV) of the captured images contains only the denim texture and no background texture. The size of the images is . We implemented the algorithms in MATLAB® 2014b on a PC (Intel i7-4790K, 4 GHz, RAM: 32 GB).
² https://ghassanalregib.com/texturetrackinginvideostreamsandweavepatternrecognition/
Translation estimation: To evaluate the accuracy of translation estimation, we obtained ground truth using the micro stage and conducted an extensive experiment. We translated the micro stage from 0 to 10 mm at intervals of 0.5 mm (i.e., around 7.53 pixels) in the horizontal direction and acquired 20 test images for our experiment. With the ground truth of translation offsets, we combined various feature detectors and descriptors (e.g., SURF [18], MSER [10], BRISK [11], FAST [19], and Harris [20]) and compared their estimation accuracy. To quantify the accuracy of translation estimation, we use three metrics: (1) the maximum absolute error between the estimated and actual translation values, in pixels; (2) the mean absolute error, in pixels; and (3) the computational cost, in seconds. Fig. 7 shows the performance of the six algorithms on tracking translation, in which “A+B” denotes “feature detector+feature descriptor.” From Fig. 7, we observe that (1) “MSER+SURF” yields the lowest mean absolute error (i.e., 0.12 pixels); and (2) “FAST+BRISK” yields the lowest maximum absolute error (i.e., 0.51 pixels) and the lowest computational time (i.e., 0.035 seconds). To determine which algorithm to use, besides the three metrics mentioned above, we also evaluate the accuracy and computation time of rotation estimation and thread counting.
Rotation estimation: We evaluated rotation estimation with an experimental setup similar to that used for translation estimation. By rotating the micro stage from 0° to 10° at intervals of 1/6°, we obtained 61 images and chose the first as a reference. The actual rotation angles between the test images and the reference image are successively 1/6°, 2/6°, …, 10°. To evaluate the accuracy and computation time of rotation tracking, we tested the same six feature extraction schemes as in translation estimation. Since we estimated translation and rotation parameters simultaneously, the curves of computational cost in Figs. 7 and 8 have a similar shape and trend. Compared with the other feature extraction methods shown in Fig. 8, “MSER+BRISK” and “MSER+SURF” achieve superior tracking accuracy at the expense of computational efficiency. Their mean absolute errors are 0.026° and 0.018°, respectively, and their maximum absolute errors are 0.075° and 0.057°, respectively. Among all the compared methods, “FAST+BRISK” consumes the least computation time (i.e., 0.017 seconds) but yields the greatest error. In addition, comparing feature detectors paired with the same feature descriptor, we observe that MSER extracts more blob features, and of higher quality, than SURF. Combined with the same feature detector, BRISK consumes less time than SURF. Weighing tracking accuracy against computational efficiency, we choose “MSER+BRISK” for the thread-counting system.
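The ground-truth sweeps and the two error metrics can be written down directly. The sketch below assumes that, in both sweeps, the first image serves as the reference (the text states this for rotation; it is an assumption for translation), and the perturbed `estimates` are hypothetical values used only to exercise the metric.

```python
from fractions import Fraction

# Translation sweep: 0 to 10 mm in 0.5 mm steps, first position as reference.
translation_mm = [Fraction(k, 2) for k in range(21)]
translation_truth_mm = translation_mm[1:]   # 20 test offsets

# Rotation sweep: 0 to 10 degrees in 1/6 degree steps, first image as reference.
rotation_deg = [Fraction(k, 6) for k in range(61)]
rotation_truth_deg = rotation_deg[1:]       # 60 test angles

def abs_error_stats(estimates, truths):
    """Return (maximum, mean) absolute error between estimates and ground truth."""
    errors = [abs(e - float(t)) for e, t in zip(estimates, truths)]
    return max(errors), sum(errors) / len(errors)

# Hypothetical estimates: ground truth with an alternating 0.02 degree error.
estimates = [float(t) + (0.02 if k % 2 else -0.02)
             for k, t in enumerate(rotation_truth_deg)]
max_ae, mean_ae = abs_error_stats(estimates, rotation_truth_deg)
print(len(translation_truth_mm), len(rotation_truth_deg), max_ae, mean_ae)
```

Using exact fractions for the stage positions avoids accumulating rounding error when the step (1/6 degree) is not representable in binary floating point.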
Thread counting: The outcomes of thread counting include the basis vectors of a final lattice proposal in camera space and two translation offsets in a lattice-based coordinate system. The mean error of translation estimation is about 0.02 thread (1 thread = 0.33 mm).
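To relate the thread-based error to the pixel-based figures, the quoted scales can be chained. This is a rough conversion that assumes the horizontal calibration from the translation experiment (0.5 mm ≈ 7.53 pixels) also applies to the thread-counting images; the constant and function names are illustrative.

```python
PIXELS_PER_HALF_MM = 7.53   # from the translation experiment
THREAD_WIDTH_MM = 0.33      # 1 thread = 0.33 mm

pixels_per_mm = PIXELS_PER_HALF_MM / 0.5
pixels_per_thread = THREAD_WIDTH_MM * pixels_per_mm

def threads_to_mm(threads):
    return threads * THREAD_WIDTH_MM

def threads_to_pixels(threads):
    return threads * pixels_per_thread

error_threads = 0.02
print(pixels_per_thread)                 # about 4.97 pixels per thread
print(threads_to_mm(error_threads))      # about 0.0066 mm
print(threads_to_pixels(error_threads))  # about 0.1 pixel
```

Under these assumptions, a 0.02-thread error is on the order of a tenth of a pixel, comparable to the mean translation errors reported in pixels.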
Computational time: The proposed system (unoptimized MATLAB code, which will be available online²) processes about three frames per second (fps), where processing a pair of images includes both motion estimation and thread counting. In comparison, the algorithm from Hays et al. [21] takes 1.8 minutes for lattice detection alone on an image of the same size. With a C++ implementation (MSER and BRISK are both available in the OpenCV library) and code optimization, our proposed system is expected to operate in real time.
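The timing figures can be cross-checked with a short calculation. It assumes that the 0.339 s reported for our system in Table I is the per-pair processing time behind the quoted frame rate; note that the 1.8 minutes for [21] covers lattice detection only, so the ratio is a lower bound on the gap for that step.

```python
pair_time_s = 0.339        # "Ours", Time (s) column of Table I
hays_lattice_s = 1.8 * 60  # 1.8 minutes for lattice detection alone [21]

fps = 1.0 / pair_time_s
ratio = hays_lattice_s / pair_time_s

print(round(fps, 2))   # 2.95, i.e., about three frames per second
print(round(ratio))    # 319
```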
Table I
System | TmaxAE (pixel) | TmeanAE (pixel) | RmaxAE (°) | RmeanAE (°) | Time (s) | Thread counting
[3]    | 0.474          | 0.124           | 0.637      | 0.064       | 0.156    | N/A
Ours   | 0.636          | 0.118           | 0.075      | 0.026       | 0.339    | Yes
Comparison with existing system: The vision system proposed by Book et al. [3] is the first and only existing system for automatic garment sewing. Therefore, we compare the performance of their system and ours in Table I, where TmaxAE, TmeanAE, RmaxAE, and RmeanAE represent the translation maximum, translation mean, rotation maximum, and rotation mean absolute errors, respectively. Our system outperforms theirs in two aspects: (1) our rotation tracking errors are significantly lower; and (2) our system performs thread counting, which is critical in a practical setting, whereas theirs does not.
IV Conclusion
We proposed an innovative thread-based texture tracking system that accurately tracks texture and detects the lattice underlying fabric weave patterns at high speed. We adopted a feature extraction approach that not only detects feature points to establish valid matching between images, but also facilitates the generation of a template proposal and the discovery of a repetitive lattice. To detect a reliable local lattice, we designed a computationally efficient algorithm that utilizes both local appearance similarities and global topological relationships. We applied the system successfully to denim fabric tracking and thread counting, demonstrating its high potential for automatic real-time textile manufacturing.
References
 [1] Z. Wang, T. Hegazy, Z. Long, and G. AlRegib, “Noise-robust detection and tracking of salt domes in postmigrated volumes using texture, tensors, and subspace learning,” Geophysics, vol. 80, no. 6, pp. WD101–WD116, 2015.
 [2] M. Pham, G. Mercier, E. Trouvé, and S. Lefèvre, “SAR image texture tracking using a pointwise graph-based model for glacier displacement measurement,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017, pp. 1083–1086.
 [3] W. J. Book, R. C. Winck, M. D. Killpack, J. D. Huggins, S. L. Dickerson, S. Jayaraman, T. R. Collins, and R. J. Prado, “Automated garment manufacturing system using novel sensing and actuation,” in International Symposium on Flexible Automation, 2010.
 [4] H. Liang and D. S. Weller, “Edge-based texture granularity detection,” in IEEE International Conference on Image Processing (ICIP), 2016, pp. 3563–3567.
 [5] J. Hays, M. Leordeanu, A. A. Efros, and Y. Liu, “Discovering texture regularity as a higher-order correspondence problem,” in European Conference on Computer Vision (ECCV), 2006, pp. 522–535.
 [6] W. Lin and Y. Liu, “Tracking dynamic near-regular texture under occlusion and rapid movements,” in European Conference on Computer Vision (ECCV), 2006, pp. 44–55.
 [7] W. Lin and Y. Liu, “A lattice-based MRF model for dynamic near-regular texture tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 29, no. 5, pp. 777–792, 2007.
 [8] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in International Joint Conference on Artificial Intelligence, 1981, pp. 674–679.
 [9] S. Liu, T. Ng, K. Sunkavalli, M. N. Do, E. Shechtman, and N. Carr, “PatchMatch-based automatic lattice detection for near-regular textures,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 181–189.
 [10] M. Donoser and H. Bischof, “Efficient maximally stable extremal region (MSER) tracking,” in Computer Vision and Pattern Recognition (CVPR), 2006, vol. 1, pp. 553–560.
 [11] S. Leutenegger, M. Chli, and R. Y. Siegwart, “BRISK: Binary robust invariant scalable keypoints,” in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2548–2555.
 [12] M. Cimpoi, S. Maji, I. Kokkinos, and A. Vedaldi, “Deep filter banks for texture recognition, description, and segmentation,” International Journal of Computer Vision (IJCV), vol. 118, no. 1, pp. 65–94, 2016.
 [13] Y. Hu, Z. Long, and G. AlRegib, “Scale selective extended local binary pattern for texture classification,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 1413–1417.
 [14] Y. Hu, Z. Long, and G. AlRegib, “Completed local derivative pattern for rotation invariant texture classification,” in IEEE International Conference on Image Processing (ICIP), 2016, pp. 3548–3552.
 [15] L. Liu, P. Fieguth, Y. Guo, X. Wang, and M. Pietikäinen, “Local binary features for texture classification: taxonomy and experimental study,” Pattern Recognition, vol. 62, pp. 135–160, 2017.
 [16] P. H. Torr and A. Zisserman, “MLESAC: A new robust estimator with application to estimating image geometry,” Computer Vision and Image Understanding, vol. 78, no. 1, pp. 138–156, 2000.
 [17] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
 [18] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
 [19] E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” in IEEE International Conference on Computer Vision (ICCV), 2005, vol. 2, pp. 1508–1515.
 [20] C. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, 1988, vol. 15, p. 50.
 [21] J. Hays, M. Leordeanu, A. A. Efros, and Y. Liu, “Discovering texture regularity as a higher-order correspondence problem,” http://balaton.graphics.cs.cmu.edu/jhhays/texture_match_06_08_2007.zip, 2006 (accessed 2006).