Tree Species Identification from Bark Images Using Convolutional Neural Networks
Tree species identification using images of the bark is a challenging problem that could help in tasks such as drone navigation in forest environments and autonomous forest inventory management. It could also bring more value to harvesting operations, since sorting timber by species leads to greater market values of trees. While recent progress in deep learning has shown its effectiveness for visual classification, it cannot currently be applied to bark classification due to the lack of a suitable dataset. In this work, we present a novel dataset containing more than 23,000 high-resolution bark images from 23 different species and establish a benchmark using deep learning. We obtain an accuracy of 93.88% on individual images and show that a majority voting approach over all the images of a tree raises the accuracy to 97.81%. We also perform experiments indicating that it is more important to collect images from a large number of trees than a large number of images per tree, and that the images of a single tree should be taken at different locations along its trunk.
Being able to automatically and reliably identify tree species from images of bark is an important problem. Early work in mobile robotics has already shown that the ability to distinguish trees from non-trees in combined LiDAR+camera sensing can improve localization robustness . More recent work on data-efficient semantic localization and mapping algorithms [2, 3] has demonstrated the value of semantically meaningful landmarks; in our situation, trees and the knowledge of their species would act as such semantic landmarks. The robotics community is also increasingly interested in flying drones in forests . For the forestry community, visual species identification could be used to perform autonomous forest inventory. In the context of autonomous tree harvesting operations , the harvester or forwarder would be able to sort timber by species, improving the operator’s margins. Similarly, sawmill processes such as debarking could be fine-tuned or optimized based on the species of the log currently being processed.
For tree species identification, relying on bark has many advantages compared to other attributes, such as the appearance of the leaves or fruits. First of all, bark is always present, regardless of seasonal changes. It is also present on logs long after they have been cut and stored in a lumber yard. In the case of standing tree inventory, bark tends to be visually accessible to most robots, as foliage is not prevalent at the robot’s height in forests of commercial value. However, tree species classification using only images of the bark is a challenging task that even trained humans struggle with, as some species differ only very subtly in their bark structure. For example, two human experts obtained 56.6% and 77.8% classification accuracy, respectively, on the Austrian Federal Forests (AFF) dataset .
Recent progress in deep learning has shown that neural networks are able to surpass human performance on many visual recognition tasks . One significant drawback of deep learning approaches is that they generally require very large datasets to obtain satisfactory results. For instance, the ImageNet database contains 14 million images organized into almost 22,000 synsets.
In the literature, there is no equivalent database for bark recognition in terms of size or variety. For example, the largest one is the AFF dataset , with only around 1200 images covering 11 species. This dataset is also private, making it difficult to use in an open, scientific context. This lack of data might explain why research on bark recognition has mostly centered on hand-crafted features such as Gabor filters [8, 9], SIFT  or Local Binary Patterns [10, 11], as classifiers built on these features can be trained using smaller datasets.
To address this issue, we gathered a novel bark dataset specifically designed to train deep neural networks. It contains 23,000 high-resolution images of 23 different tree species found in forests and parks near Quebec City, from which over 800,000 unique 224x224-pixel crops can be extracted. The species are typical of the forests of the eastern seaboard of Canada, and most of them have commercial value. In addition to the species annotation, we also collected the tree diameter at breast height (DBH), a commonly used metric in forest inventories. The DBH is, to some extent, a proxy for the age of the tree, and could therefore provide auxiliary information to the network during training. Indeed, bark appearance can change drastically with age, and this information might help a network optimizer find solutions that generalize better. Moreover, having this extra label opens up the possibility of experimenting with multi-task learning approaches, for which few datasets exist in the literature .
The contributions presented in this paper are as follows:
- We collected and curated a novel bark image dataset that is compatible with deep learning research on fine-grained and texture classification problems. This dataset can also be used for multi-task benchmarking.
- We demonstrate that, using this dataset, we can perform visual tree recognition of 20 species, far more than any previous work. We also quantify, via confusion matrices, the difficulty of differentiating between certain species.
- We performed experiments to determine the impact of several key factors on recognition performance (number of images used during training, use of a voting scheme on classifications during testing).
This paper is organized as follows. In Section 2, we review existing methods and datasets used for bark image classification. Section 3 introduces our dataset and details how it was collected. Section 4 describes the network architecture used to perform classification. Section 5 presents the results obtained for various test cases. Finally, Section 6 concludes this paper.
2 Related work
Bark classification has most frequently been formulated as a texture classification problem, for which a number of hand-crafted features have historically been employed. For instance, some works based their approaches on Local Binary Patterns (LBP) [10, 11, 13].  used SIFT descriptors combined with a support vector machine (SVM) to obtain around 70% accuracy on the AFF dataset. Meanwhile,  extracted from trunk images four statistical parameters used in texture classification (uniformity, entropy, asymmetry and smoothness), and employed a decision tree for classification.  developed a custom segmentation algorithm based on watershed segmentation methods, extracted saliency, roughness, curvature and shape features, and fed them to a Random Forest classifier.
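To make the hand-crafted descriptors mentioned above more concrete, the following minimal Python sketch computes the basic 3x3 Local Binary Pattern of each interior pixel and a normalized 256-bin histogram that a classifier such as an SVM could consume. It is an illustration only: the cited works use multi-scale and other LBP variants, and the function names are hypothetical.

```python
from collections import Counter

# Offsets of the 8 neighbours, clockwise from the top-left; neighbour i sets bit i.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 3x3 LBP code of every interior pixel of a grayscale image.

    `img` is a list of equal-length rows of intensities. Border pixels are
    skipped because they do not have a full 3x3 neighbourhood.
    """
    codes = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A neighbour at least as bright as the centre contributes a 1.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes, used as a texture descriptor."""
    codes = lbp_codes(img)
    counts = Counter(codes)
    return [counts[k] / len(codes) for k in range(256)]
```

Each bark image (or crop) would thus be summarized by one 256-dimensional vector, which is the kind of compact input that lets a classifier be trained on a few hundred images.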
Interestingly, some early works used neural networks for bark classification. For instance,  extracted texture features based on Gabor wavelets and used a radial basis probabilistic network as the classifier. With their method, they obtained close to 80% accuracy on a dataset containing around 300 images. This work predates, however, the advent of deep learning approaches, spearheaded by AlexNet .
Looking at the more general task of tree classification, some authors have applied deep learning methods. For instance, in the LifeCLEF competition, which aims to classify plants using images of different parts such as the leaves, the fruit or the stem, the best-performing methods all employed deep learning [17, 18, 19, 20]. For our purposes, however, the number of images with significant bark content in their training database is very small. Less related to us, work on leaf classification by  extracted features from deep neural networks in order to determine which factors were most discriminative when classifying leaves.
Deep learning has also been employed for tree identification from bark information, but using a different kind of image. In their work,  used LiDAR scans instead of RGB images. They used a point cloud with a spatial resolution of 5 at a 10 distance, from which they generated a depth image of size 256x256. For the classification, they used a pre-trained AlexNet  that they fine-tuned on around 35,000 scans. This allowed them to obtain around 90% precision on their test set containing 1536 scans. However, they only used two different species, Japanese Cedar and Japanese Cypress, making the problem less challenging.
Finally, some authors have started exploring deep learning on RGB images of textures.  extracted features from CNNs pre-trained on ImageNet and used different region segmentation algorithms along with an SVM to classify texture materials, notably on the Flickr Material Dataset . They improved the state of the art by at least 6% on all the datasets on which they tested.  modified the standard convolutional layer to learn rotation-invariant filters. They did this by organizing the filters into groups and tying the weights of the filters within each group, so that they all correspond to rotated versions of one another. They tested their layer on 3 Outex texture classification benchmarks, where they obtained better results than the state of the art on one of the benchmarks and similar results on the other two.
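The weight-tying idea behind rotation-invariant filters can be illustrated with the following sketch, in which a single learnable kernel generates a group of four filters at 0, 90, 180 and 270 degrees. Two simplifications are ours and not necessarily those of the cited work: rotations are restricted to multiples of 90 degrees (exact on a square pixel grid, no interpolation needed), and the group is reduced to one value by taking the maximum response over orientations. All function names are hypothetical.

```python
def rot90(kernel):
    """Rotate a square kernel (list of rows) by 90 degrees clockwise."""
    n = len(kernel)
    return [[kernel[n - 1 - c][r] for c in range(n)] for r in range(n)]


def tied_group(base):
    """The four filters of one group: the same weights at 0, 90, 180, 270 degrees.

    Only `base` would be a learnable parameter; the other three filters are
    derived from it, so their weights are tied.
    """
    group = [base]
    for _ in range(3):
        group.append(rot90(group[-1]))
    return group


def response(patch, kernel):
    """One convolution output: sum of element-wise products of patch and kernel."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


def pooled_response(patch, base):
    """Max over the tied group, which is invariant to 90-degree patch rotations."""
    return max(response(patch, k) for k in tied_group(base))
```

Because rotating the patch by 90 degrees only permutes the four responses of the group, the pooled value is unchanged, which is the invariance the weight tying is meant to provide.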
3 Bark dataset
3.1 Existing bark datasets
One significant hurdle when trying to use deep learning for bark classification is the lack of existing datasets for training purposes. Table 1 shows the datasets that were used in previous work for the bark classification task. Notably, most of these datasets contain only a very small number of images, as well as a limited number of classes. Another important point is that only one of these datasets is publicly available, hindering the global research effort on this problem.
3.2 Image collection and annotation
To solve the dataset issue, we collected images from 23 different species of trees found in parks and forests near Quebec City, Canada. We hired a forestry specialist to identify the species on site. Indeed, tree identification is much easier and more reliable when relying on extra cues such as leaf shape or needle distribution. To accelerate the data collection process, we used the following protocol. First, a tree was selected and its species and circumference were written on a whiteboard by the forestry specialist. While the specialist moved on to another tree, a second person took a picture of the whiteboard as the first picture of the tree. This was followed by 10-40 images of the bark at different locations and heights around the tree, depending on its circumference. Images were captured at a distance between 20-60 away from the trunk. This distance was highly variable, depending on the conditions in which the photos were taken (obstacles, tree size, etc.). This variability helps prevent overfitting to a particular camera distance. Finally, all images were taken with the trunk parallel to the vertical axis of the camera’s image plane.
We also gathered the images under varied conditions, to make the dataset as diverse as possible. First, we used four different cameras, some of which were cellphones: Nexus 5, Samsung Galaxy S5, Samsung Galaxy S7, and a Panasonic Lumix DMC-TS5 camera. To increase illumination variability, we took the pictures under various weather conditions, ranging from sunny to light rain. Finally, we selected trees at a number of different locations, both in open areas, such as the university campus or parks, and in the forest. This can greatly affect the appearance of the bark, especially in locations with dense vegetation, where light reflected by the leaves of the canopy can give the bark different shades of green. In total, we gathered pictures over 15 outings, spread over the summer.
From the picture of the whiteboard, we obtained the species and circumference information used to annotate the subsequent pictures. This means that each photo in our database comes with a unique number identifying the tree, its species, its DBH, the camera used, and the date and time at which it was taken. We also cleaned the dataset by removing approximately 25% of the pictures, most of them blurred due to camera motion. Each remaining picture was then manually cropped, so as to keep only the part of the image where bark was visible. As a side effect, younger trees yielded very narrow pictures (Figure 2 (1)), while mature trees yielded full-sized pictures (Figure 2 (2)). Table 2 shows the composition of our dataset. We aimed to keep the dataset as balanced as possible, while maximizing the number of different trees used for each class. The data collection strategy was also adjusted based on initial classification results: we increased the number of trees collected for species that were found to be difficult to separate. This can be seen as a loose form of active learning, implemented with humans in the loop.
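A minimal sketch of how the whiteboard protocol can turn an ordered stream of pictures into per-photo annotations is given below. The `BarkPhoto` record, the `annotate` function and its input format are hypothetical, and we assume that the circumference was measured at breast height, so that DBH = circumference / π.

```python
import math
from dataclasses import dataclass


@dataclass
class BarkPhoto:
    tree_id: int
    species: str
    dbh: float        # same length unit as the circumference written on the board
    camera: str
    timestamp: str


def annotate(shots):
    """Propagate whiteboard information to the bark photos that follow it.

    `shots` is the chronological list of pictures taken during an outing, as
    (kind, payload) pairs:
      ("board", (species, circumference))  -> starts a new tree
      ("bark", (camera, timestamp))        -> a bark photo of the current tree
    Bark photos taken before any whiteboard picture are ignored.
    """
    photos = []
    tree_id = 0
    current = None
    for kind, payload in shots:
        if kind == "board":
            tree_id += 1
            species, circumference = payload
            # Assumption: circumference taken at breast height, so DBH = C / pi.
            current = (species, circumference / math.pi)
        elif kind == "bark" and current is not None:
            camera, timestamp = payload
            photos.append(BarkPhoto(tree_id, current[0], current[1], camera, timestamp))
    return photos
```

Keeping the tree identifier on every photo is what later makes it possible to split the data by tree rather than by image.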
We also aimed for a wide distribution of DBH, shown in Figure 1. Most of the trees are between 20 and 30 , but we also have a few trees near 100 . This can have an impact on classification, since the size of the tree can greatly affect the appearance of the bark. Figure 2 shows an example of this: the younger tree has relatively smooth bark, while the older one is covered with ridges and furrows.
Table 2: Composition of the dataset.
| Id | Species | Common name | Number of trees | Number of images | Number of potential unique crops |
|---|---|---|---|---|---|
| 1 | Abies balsamea | Balsam fir | 41 | 922 | 28235 |
| 2 | Acer platanoides | Norway maple | 1 | 70 | 2394 |
| 3 | Acer rubrum | Red maple | 64 | 1676 | 48925 |
| 4 | Acer saccharum | Sugar maple | 81 | 1999 | 68040 |
| 5 | Betula alleghaniensis | Yellow birch | 43 | 1255 | 37325 |
| 6 | Betula papyrifera | White birch | 32 | 1285 | 33892 |
| 7 | Fagus grandifolia | American beech | 41 | 840 | 23904 |
| 8 | Fraxinus americana | White ash | 61 | 1472 | 53995 |
| 10 | Ostrya virginiana | American hophornbeam | 29 | 612 | 28723 |
| 11 | Picea abies | Norway spruce | 72 | 1324 | 35434 |
| 12 | Picea glauca | White spruce | 44 | 596 | 19673 |
| 13 | Picea mariana | Black spruce | 44 | 885 | 43127 |
| 14 | Picea rubens | Red spruce | 27 | 740 | 22819 |
| 15 | Pinus rigida | Pitch pine | 4 | 123 | 2264 |
| 16 | Pinus resinosa | Red pine | 29 | 596 | 14694 |
| 17 | Pinus strobus | Eastern white pine | 39 | 1023 | 25621 |
| 18 | Populus grandidentata | Big-tooth aspen | 3 | 64 | 3146 |
| 19 | Populus tremuloides | Quaking aspen | 58 | 1037 | 63247 |
| 20 | Quercus rubra | Northern red oak | 109 | 2724 | 72618 |
| 21 | Thuja occidentalis | Northern white cedar | 38 | 746 | 19523 |
| 22 | Tsuga canadensis | Eastern hemlock | 45 | 986 | 27271 |
| 23 | Ulmus americana | American elm | 24 | 739 | 27821 |
As is commonly done in image recognition tasks, we employed networks pre-trained on ImageNet. Specifically, we used the ResNet architecture , as it is both powerful and easy to train on standard classification problems.
4.2 Training Details
We used PyTorch 0.3.0.post4  for all experiments and downloaded the weights of the resnet18 and resnet34 networks pre-trained on ImageNet. Following commonly accepted practice, we froze the first layer and, since our problem is very different from ImageNet, fine-tuned the rest of the networks using an initial learning rate of 0.0001. We reduced the learning rate by a factor of 5 at fixed epochs (16 and 33), and trained for a total of 40 epochs. We used Adam as the optimizer, with a weight decay of 0.0001.
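The step schedule described above can be written down explicitly. The plain-Python sketch below mirrors, under the assumption that the learning rate is updated once per epoch (counted from 0), what `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[16, 33], gamma=0.2)` would produce; the function name and its defaults are hypothetical, not taken from the authors' code.

```python
def learning_rate(epoch, base_lr=1e-4, milestones=(16, 33), factor=5.0):
    """Step schedule: divide base_lr by `factor` at every milestone reached.

    Epochs are counted from 0, so epochs 0-15 use base_lr, 16-32 use
    base_lr / 5, and 33-39 use base_lr / 25.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / factor ** drops


schedule = [learning_rate(e) for e in range(40)]  # 40 training epochs in total
```

In an actual PyTorch training loop, this configuration would typically be paired with `torch.optim.Adam(params, lr=1e-4, weight_decay=1e-4)`. How the first layer was frozen is not detailed; setting `requires_grad = False` on the `conv1` parameters of a torchvision ResNet is one plausible option, stated here as an assumption.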
Since the photos are high-resolution, we resized them to half of their original size. This allowed for faster loading and processing of the images when creating the mini-batches. It also takes into account the Bayer filter pattern of color cameras, which samples each color only at a subset of the pixels (roughly every other pixel) of the imaging element. For each element of a mini-batch, we uniformly sampled a random tree species (class), then a random tree of that species, and finally a random image of that tree. This allowed us to mitigate the problems caused by an unbalanced dataset, similarly to the class-aware sampling used in . We then augmented the data using random horizontal flips, and finally took a random 224x224 crop of the resulting image. Recall that the data gathering process already introduced a fair amount of variability in illumination and scale, so we did not perform color, scale or contrast jittering.
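The hypothetical helpers below sketch the sampling pipeline described above: class-aware sampling (a species uniformly at random, then a tree, then an image), halving of the image size, and a random 224x224 crop with a random horizontal flip. The sketch only computes crop coordinates and flip flags; image decoding and resizing (for example with PIL or torchvision transforms) are left out, and drawing a new species for every element of the mini-batch is our reading of the text.

```python
import random

CROP = 224


def sample_image(index, rng):
    """Class-aware draw: species uniformly, then a tree, then an image of it.

    `index` maps species -> {tree_id: [image paths]}.
    """
    species = rng.choice(sorted(index))
    tree = rng.choice(sorted(index[species]))
    return species, rng.choice(index[species][tree])


def random_crop(width, height, rng):
    """Random 224x224 window plus a horizontal-flip flag.

    Flipping the crop instead of the full image gives the same distribution
    of crops, because the crop position is drawn uniformly.
    """
    if width < CROP or height < CROP:
        raise ValueError("image smaller than the crop size")
    x = rng.randint(0, width - CROP)   # randint is inclusive on both ends
    y = rng.randint(0, height - CROP)
    flip = rng.random() < 0.5
    return x, y, flip


def make_batch(index, full_sizes, batch_size, rng):
    """Return (species, path, (x, y, flip)) tuples for one mini-batch.

    `full_sizes` maps each path to its original (width, height); images are
    halved before cropping.
    """
    batch = []
    for _ in range(batch_size):
        species, path = sample_image(index, rng)
        w, h = full_sizes[path]
        batch.append((species, path, random_crop(w // 2, h // 2, rng)))
    return batch
```

With this scheme, a species with 24 trees (e.g. Ulmus americana) is drawn as often as one with 109 trees (Quercus rubra), which is how class-aware sampling compensates for the imbalance visible in Table 2.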
In our experiments, we compared the effect of network depth (18 vs. 34 layers) on classification accuracy. We also tested different batch sizes, to evaluate their regularization effect . For the evaluation, we used 5-fold cross-validation, with 80% of the trees used for training and the remaining 20% for testing in each fold. Care was taken to perform the split on the trees instead of the images, to avoid positively biasing the results through the network learning to recognize individual trees instead of species. We report the average accuracy over the 5 folds. Note that we did not use Acer platanoides, Pinus rigida and Populus grandidentata, since we did not collect enough images of these species to obtain meaningful results.
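The tree-level split described above can be sketched as follows. The helper name is hypothetical, and the per-species round-robin assignment (which keeps every species represented in each test fold) is an assumption: the text only states that the split is made on trees, with 80% of them used for training and 20% for testing.

```python
import random


def tree_level_folds(tree_species, k=5, seed=0):
    """Split *trees* (not images) into k folds for cross-validation.

    `tree_species` maps tree_id -> species. Trees of each species are shuffled
    and dealt round-robin into the folds, so every fold holds roughly 1/k of
    each species. Returns k (train_trees, test_trees) pairs of sets.
    """
    rng = random.Random(seed)
    by_species = {}
    for tree, species in sorted(tree_species.items()):
        by_species.setdefault(species, []).append(tree)
    folds = [set() for _ in range(k)]
    start = 0  # rotating start keeps fold sizes balanced across species
    for species in sorted(by_species):
        trees = by_species[species]
        rng.shuffle(trees)
        for i, tree in enumerate(trees):
            folds[(start + i) % k].add(tree)
        start = (start + len(trees)) % k
    all_trees = set(tree_species)
    return [(all_trees - fold, fold) for fold in folds]
```

Each image is then routed to the training or test set of a fold according to its tree id, so no tree ever contributes images to both sides.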
5.1 Test results when using individual images
Table 3 contains the results of evaluating the two models on each image individually, for several batch sizes. We report both single-crop (random) and multiple-crop results. For the latter, we split the test image into multiple non-overlapping 224x224 crops and classified each one individually. We then performed majority voting to determine the final outcome. As can be seen from Table 3, moving from single crops (87.04%) to multiple crops (93.88%) on a complete image significantly improves the accuracy, as expected. Figure 4 displays two examples of classification using multiple tiled crops, showing the spatial distribution of the classification along with the species Id (see Table 2) assigned to each crop.
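The multiple-crop evaluation can be sketched as follows: enumerate the non-overlapping 224x224 tiles of a test image, classify each one, and keep the majority label. The anchoring of the tile grid at the top-left corner, the tie-breaking rule and the hypothetical `classify_crop` callback (standing in for a forward pass of the network on one tile) are assumptions for illustration.

```python
from collections import Counter

CROP = 224


def tile_origins(width, height, size=CROP):
    """Top-left corners of the non-overlapping size x size tiles that fit in the image.

    Tiles start at the top-left corner; leftover strips at the right and
    bottom edges are ignored.
    """
    return [(x, y)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]


def majority_vote(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def classify_image(width, height, classify_crop):
    """Vote over the per-tile predictions of `classify_crop(x, y)`."""
    return majority_vote([classify_crop(x, y) for x, y in tile_origins(width, height)])
```

Narrow images of young trees that are less than 224 pixels wide produce no tile at all, and `majority_vote` would then fail on an empty list; how such images were handled is not specified here.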
Figure 5 shows the average confusion matrix of our multiple-crop voting on individual images, using a resnet34 and a batch size of 32. As one might expect, trees from the same genus are harder to differentiate. For instance, Betula papyrifera and Betula alleghaniensis, as well as Acer rubrum and Acer saccharum, are often confused with one another. The matrix also reveals other difficult combinations, such as Fraxinus americana and Acer saccharum.
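Averaging confusion matrices over the 5 folds can be sketched as follows. Normalizing each row (so that entry (t, p) is the fraction of class t predicted as p) before averaging is our assumption about how such a figure is usually produced, and class indices are assumed to run from 0 to n_classes - 1; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """counts[t][p] = number of test images of class t predicted as class p."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


def row_normalize(counts):
    """Turn each row into per-class rates (rows for absent classes stay at 0)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in counts]


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices, e.g. one per fold."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(cols)]
            for i in range(rows)]
```

Large off-diagonal entries, such as those linking the two Betula species, are what indicate species pairs that the network tends to confuse.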
| Network | Batch size | Single crop | Multiple crops |
|---|---|---|---|
5.2 Test results when using all images of a tree
We wanted to see whether using images taken at more than one location along the trunk could improve the classification results. We thus performed majority voting across all the images of a given tree, both for single and multiple crops per picture. Note that the number of available images per tree was variable, as stated in Section 3.2. Table 4 contains the results of this evaluation, again for several batch sizes. This voting further improves the classification accuracy, up to 97.81%. More interestingly, we did not see any real difference between using a single crop or multiple crops in each image. This seems to indicate that having a greater variety of locations along a trunk is more beneficial than having a large number of closely located crops. This can probably be explained by our anecdotal observations in the field, where we noticed that bark appearance changed significantly from one trunk region to another.
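Tree-level voting can be sketched as a second majority vote, this time over the per-image predictions of each test tree. The function name is hypothetical, and it assumes that each image has already been reduced to a single label (its single random crop, or the crop-level vote in the multiple-crop variant).

```python
from collections import Counter, defaultdict


def tree_level_accuracy(predictions, tree_species):
    """Accuracy after a majority vote over all test images of each tree.

    `predictions` holds one (tree_id, predicted_species) pair per test image;
    `tree_species` maps tree_id -> true species. Ties go to the label seen first.
    """
    votes = defaultdict(list)
    for tree, label in predictions:
        votes[tree].append(label)
    correct = sum(Counter(labels).most_common(1)[0][0] == tree_species[tree]
                  for tree, labels in votes.items())
    return correct / len(votes)
```

For example, a tree whose three images are predicted as B, B and A is still counted as correct if its species is B, even though one image-level prediction is wrong; this is how voting across locations absorbs individual errors.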
| Network | Batch size | Single crop | Multiple crops |
|---|---|---|---|
5.3 Effect of dataset size on training performance
A common question arising when developing new classifier systems is: how much data do we need for training purposes? To answer this, we empirically evaluated the impact of the size of the training dataset on the classification accuracy. Moreover, we performed this evaluation for two cases that are particular to our classification problem: a) a reduced number of images and b) a reduced number of individual trees. To accomplish this, we took one of the folds from the previous experiment in Section 5.1 and created 9 smaller training datasets per case. For case a), we randomly sampled images from the training set until reaching a target number of images. For case b), we instead sampled individual trees directly, until reaching a target number of trees. Figure 6 shows the results we obtained.
As can be seen, the general trend is that the more images are used for training, the better the results. However, the network is much more sensitive to the number of trees in the training dataset than to the overall number of pictures. Indeed, when randomly removing 90% of the overall images, we only lose about 5% of accuracy. On the other hand, when we randomly remove 90% of the trees, accuracy falls by more than 30%. This indicates that it is much more important to collect training data over a large number of trees than to take a large number of pictures per tree. In other words, only a fairly limited number of pictures per tree is needed to obtain good performance.
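The two subsampling protocols of cases a) and b) can be sketched as follows, with sampling without replacement standing in for "sampling until a target is reached". The function names are hypothetical, and the sketch does not stratify by species, since the text does not say whether this was done.

```python
import random


def subsample_images(train_images, n_images, seed=0):
    """Case a): keep n_images images drawn at random from the whole training set.

    `train_images` is a list of (tree_id, image_path) pairs.
    """
    return random.Random(seed).sample(train_images, n_images)


def subsample_trees(train_images, n_trees, seed=0):
    """Case b): keep *all* images of n_trees randomly chosen trees."""
    trees = sorted({tree for tree, _ in train_images})
    kept = set(random.Random(seed).sample(trees, n_trees))
    return [(tree, path) for tree, path in train_images if tree in kept]
```

With ten trees of five images each, `subsample_images(imgs, 5)` typically spreads the five images over several trees, whereas `subsample_trees(imgs, 1)` yields the same number of images from a single tree, which corresponds to the situation the experiment above shows to be much more harmful.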
In this paper, we have empirically demonstrated the ability of ResNets to perform tree species identification from pictures of bark, for 20 Canadian species. The accuracy of the method goes from 93.88% (multiple crops on a single image) to 97.81% (using all images of a trunk), far above the 5% chance level (1/20). We have found empirically that training is significantly more sensitive to the number of trees in the database than to the overall number of images. This result will help tailor further data gathering efforts on our side.
In the process, we have also created a large public dataset (to be released upon publication).
Nevertheless, more work is needed to adapt the network architecture specifically to this task. As future work, we aim to leverage the DBH in a multi-task approach . We will also explore multi-scale classification, as we do not yet know the optimal scale at which to perform bark image classification. We will also explore novel deep architectures tailored to texture classification. We also plan on testing the approach on a sawmill floor, where we will have access to thousands of logs for data gathering. A new challenge will be to ensure that damage to the bark caused by logging operations does not adversely affect classification performance.
The authors would like to thank Luca Gabriel Serban and Martin Robert for their help in building the dataset.
- F. T. Ramos, J. Nieto, and H. F. Durrant-Whyte, “Recognising and modelling landmarks to close loops in outdoor slam,” in Proceedings 2007 IEEE International Conference on Robotics and Automation, April 2007, pp. 2036–2041.
- N. Atanasov, M. Zhu, K. Daniilidis, and G. J. Pappas, “Localization from semantic observations via the matrix permanent,” The International Journal of Robotics Research, vol. 35, no. 1-3, pp. 73–99, 2016.
- A. Ghasemi Toudeshki, F. Shamshirdar, and R. Vaughan, “UAV Visual Teach and Repeat Using Only Semantic Object Features,” ArXiv e-prints, Jan. 2018.
- N. Smolyanskiy, A. Kamenev, J. Smith, and S. Birchfield, “Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness,” CoRR, 2017.
- T. Hellström, P. Lärkeryd, T. Nordfjell, and O. Ringdahl, “Autonomous forest vehicles: Historic, envisioned, and state-of-the-art,” International Journal of Forest Engineering, vol. 20, no. 1, 2009.
- S. Fiel and R. Sablatnig, “Automated Identification of Tree Species from Images of the Bark, Leaves and Needles,” Proceedings of the 16th Computer Vision Winter Workshop, pp. 67–74, 2011.
- K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2016, pp. 1026–1034.
- Z.-k. Huang, D.-S. Huang, J.-X. Du, Z.-h. Quan, and S.-B. Gua, “Bark Classification Based on Contourlet Filter Features,” In Intelligent Computing, pp. 1121–1126, 2006.
- Z. Chi, L. Houqiang, and W. Chao, “Plant species recognition based on bark patterns using novel Gabor filter banks,” in Proceedings of the 2003 International Conference on Neural Networks and Signal Processing, vol. 2, Dec. 2003, pp. 1035–1038.
- S. Boudra, I. Yahiaoui, and A. Behloul, “A comparison of multi-scale local binary pattern variants for bark image retrieval,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2015, vol. 9386, pp. 764–775.
- M. Sulc, “Tree Identification from Images,” 2014.
- Y. Zhang and Q. Yang, “A Survey on Multi-Task Learning,” ArXiv e-prints, July 2017.
- M. Sulc and J. Matas, “Kernel-mapped histograms of multi-scale lbps for tree bark recognition,” in Image and Vision Computing New Zealand (IVCNZ), 2013 28th International Conference of. IEEE, 2013, pp. 82–87.
- A. Bressane, J. A. F. Roveda, and A. C. G. Martins, “Statistical analysis of texture in trunk images for biometric identification of tree species,” Environmental Monitoring and Assessment, vol. 187, no. 4, 2015.
- A. A. Othmani, C. Jiang, N. Lomenie, J. M. Favreau, A. Piboule, and L. F. C. L. Y. Voon, “A novel Computer-Aided Tree Species Identification method based on Burst Wind Segmentation of 3D bark textures,” Machine Vision and Applications, vol. 27, no. 5, pp. 751–766, 2016.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25. Curran Associates, Inc., 2012, pp. 1097–1105.
- J. Champ, T. Lorieul, M. Servajean, and A. Joly, “A comparative study of fine-grained classification methods in the context of the LifeCLEF plant identification challenge 2015,” in CEUR Workshop Proceedings, vol. 1391, 2015.
- M. Šulc, D. Mishkin, and J. Matas, “Very deep residual networks with maxout for plant identification in the wild,” Working notes of CLEF, 2016.
- N. Sünderhauf, C. McCool, B. Upcroft, and T. Perez, “Fine-grained plant classification using convolutional neural networks for feature extraction,” Working notes of CLEF 2014 conference, pp. 756–762, 2014.
- H. Goëau, P. Bonnet, and A. Joly, “Plant identification based on noisy web data: the amazing performance of deep learning (LifeCLEF 2017),” CLEF working notes, vol. 2017, 2017.
- S. H. Lee, C. S. Chan, S. J. Mayo, and P. Remagnino, “How deep learning extracts and learns leaf features for plant classification,” Pattern Recognition, vol. 71, pp. 1–13, 2017.
- T. Mizoguchi, A. Ishii, H. Nakamura, T. Inoue, and H. Takamatsu, “Lidar-based individual tree species classification using convolutional neural network,” Proc.SPIE, vol. 10332, pp. 10 332 – 10 332 – 7, 2017.
- M. Cimpoi, S. Maji, and A. Vedaldi, “Deep Filter Banks for Texture Recognition and Segmentation,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
- L. Sharan, R. Rosenholtz, and E. Adelson, “Material perception: What can you see in a brief glance?” Journal of Vision, vol. 9, no. 8, pp. 784–784, Aug 2009.
- D. Marcos, M. Volpi, and D. Tuia, “Learning rotation invariant convolutional filters for texture classification,” in 2016 23rd International Conference on Pattern Recognition (ICPR), Dec. 2016, pp. 2012–2017.
- T. Ojala, T. Mäenpää, M. Pietikäinen, J. Viertola, J. Kyllönen, and S. Huovinen, “Outex - new framework for empirical evaluation of texture analysis algorithms.” 2002, proc. 16th International Conference on Pattern Recognition, Quebec, Canada, 1:701 - 706.
- M. Švab, “Computer-vision-based tree trunk recognition,” 2014.
- L. J. Blaanco, C. M. Travieso, J. M. Quinteiro, P. V. Hernandez, M. K. Dutta, and A. Singh, “A bark recognition algorithm for plant classification using a least square support vector machine,” in 2016 Ninth International Conference on Contemporary Computing (IC3), Aug. 2016, pp. 1–5.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 770–778.
- A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
- L. Shen, Z. Lin, and Q. Huang, “Relay backpropagation for effective learning of deep convolutional neural networks,” in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham: Springer International Publishing, 2016, pp. 467–482.
- S. Jastrzębski, Z. Kenton, D. Arpit, N. Ballas, A. Fischer, Y. Bengio, and A. Storkey, “Finding flatter minima with SGD,” in ICLR Workshop, 2018.
- L. Trottier, P. Giguère, and B. Chaib-draa, “Multi-Task Learning by Deep Collaboration and Application in Facial Landmark Detection,” ArXiv e-prints, Oct. 2017.