3D Reconstruction of Incomplete Archaeological Objects Using a Generative Adversarial Network

Renato Hermoza                   Ivan Sipiran
Pontificia Universidad Católica del Perú, Lima, Peru
renato.hermoza@pucp.edu.pe
Abstract

We introduce a data-driven approach to aid the repair and conservation of archaeological objects: ORGAN, an object reconstruction generative adversarial network (GAN). By using an encoder-decoder 3D deep neural network on a GAN architecture, and combining two loss objectives, a completion loss and an Improved Wasserstein GAN loss, we can train a network to effectively predict the missing geometry of damaged objects. As archaeological objects can greatly differ from one another, the network is conditioned on a variable, which can be a culture, a region or any metadata of the object. In our results, we show that our method can recover most of the information from damaged objects, even in cases where more than half of the voxels are missing, without producing many errors.

1 Introduction

Figure 1: From left to right: complete objects, objects with simulated fractures, reconstruction from ORGAN and a second iteration with ORGAN.

During archaeological excavations, it is common to find fractured or damaged objects. The process of repairing and conserving these objects is tedious and delicate: objects are often fragile and the time for manipulation must be short. With the recent progress in geometry processing and shape analysis, one can address the repair problem from a computational perspective. The process starts with a 3D scan of the object. Then, an algorithm analyzes the 3D shape to guide the conservation process. Previous experience shows that unsupervised shape analysis for repairing damaged objects gives good approximations to conservators, and therefore reduces the workload and processing time [20].

The main problem is the prediction of the missing geometry of damaged objects. Current methods assume that man-made objects exhibit some kind of structure and regularity [18]. The most common type of structure used is symmetry. If an algorithm can detect symmetries in the object, the symmetric transformation can be applied to create what is missing. Although this approach is a promising direction, there are still some drawbacks: 1) if the object is too damaged, the symmetries cannot be recovered from the object itself; 2) the computational time required to search for symmetries is still high.

Deep learning techniques have proved highly successful in processing 3D voxelized inputs [29, 4] and have also recently been used with generative adversarial network (GAN) [5] architectures [27]. We hypothesize that the aforementioned drawbacks can be addressed by a data-driven approach. That is, we can learn the structure and regularity from a collection of complete known objects (at training time) and use them to complete and repair incomplete damaged objects (at testing time).

In this work we propose an object reconstruction generative adversarial network (ORGAN), for which we employ a 3D convolutional neural network (CNN) with skip connections as the generator in a Conditional GAN (CGAN) [17] architecture. With two optimization targets, a mean absolute error (MAE) and an Improved Wasserstein GAN (IWGAN) [6] loss, the final model is encouraged to find solutions that resemble the structure of real objects. An example of a reconstructed object is shown in Figure 1. The code for the project is publicly available on GitHub: https://github.com/renato145/3D-ORGAN.

2 Related work

Shape completion has gained considerable attention in recent years, and many approaches have been proposed so far. Pauly et al. [21] proposed to complete 3D scans using similar objects from a shape repository. A post-processing step of non-rigid alignment fixes the transitions between the input geometry and the generated geometry. On the other hand, Huang et al. [10] computed feature-conforming fields which were used to complete missing geometry. Another interesting approach is the use of local features to guide the completion process. For example, Harary et al. [8] proposed to transfer geometry between two 3D objects using a similarity assessment on Heat Kernel Signatures [25]. Likewise, Harary et al. [7] proposed to complete an object with knowledge extracted from curves around the missing geometry.

An important concept that has been used to synthesize geometry is symmetry. If one can extract information about the symmetry of an object, that information can be used to replicate portions of the object until it is complete. Thrun and Wegbreit [26] proposed to complete partial scans using a probabilistic measure to score the symmetry of a given object. Similarly, Xu et al. [30] designed a method to find the intrinsic reflectional symmetry axis of tubular structures, and used that information to complete human-like 3D shapes. More related to archaeological objects, Sipiran et al. [24] defined a strategy to find symmetric correspondences in damaged objects, which were later used to synthesize the missing geometry. Likewise, Mavridis et al. [15] formulated symmetry detection as an optimization problem with sparse constraints, whose output provided good hints for completing broken archaeological objects. More recently, Sipiran [23] described an algorithm to determine the axial symmetry of damaged objects, which was subsequently used to restore objects with good precision.

2.1 Deep learning methods

With the availability of 3D shape databases [2, 12, 28], deep learning approaches have started to be applied to tasks involving 3D data. In Wu et al. [29], a 3D convolutional neural network (CNN) is proposed for classification and shape completion from 2.5D depth maps. More recently, and most relevant to our case, Dai et al. [4] propose an encoder-predictor network (following the idea of an autoencoder) for the task of shape completion.

Recent advances in generative models using generative adversarial networks (GANs) [5] have proven effective in tasks that require recovering missing information while producing plausible-looking outputs. By adding a GAN loss to our model, the network is encouraged to produce outputs that reside on the manifold of the training objects. Examples can be seen in tasks such as super-resolution [13], image completion [31] and 3D object reconstruction from 2D images [27].

One of the drawbacks of GAN models is the instability of their training. An alternative to the traditional GAN is the Wasserstein GAN algorithm (WGAN) [1], which shows improved stability by minimizing the Wasserstein distance between distributions instead of the Kullback-Leibler (KL) divergence; however, this algorithm forces the critic to model only K-Lipschitz functions by clamping the weights to a fixed box. The Improved Wasserstein GAN (IWGAN) [6] proposes an alternative method for enforcing the Lipschitz constraint, penalizing the norm of the critic's gradient with respect to its input, resulting in faster convergence and higher-quality samples.

3 Method

The goal of our method is to take a 3D scan of a fractured object as input and predict the complete object as output. To achieve this, we use a shape completion network that represents 3D objects as a fixed-size voxel grid; the completion network is then used as the generator in a GAN architecture. The final training objective combines the completion network loss and the adversarial loss.

3.1 Data generation

In order to train the network, pairs of fractured and complete objects are needed. To simulate fractures from complete objects, we sample random voxels from the occupied grid; at each sampled voxel, a fracture of random size is created, having a given probability of a spherical shape and otherwise a cubic shape. All our models were trained with the number of fractures sampled from 1 to 4 and the fracture size from 3 to 6.
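The fracture-simulation step above can be sketched as follows. This is an illustrative NumPy implementation, not the paper's code; the function name and the parameter defaults (number of fractures, size range, probability of a spherical shape) are assumptions.

```python
import numpy as np

def simulate_fractures(grid, n_fractures=3, min_size=3, max_size=6,
                       p_sphere=0.5, rng=None):
    """Carve random spherical or cubic fractures centred on occupied voxels."""
    rng = np.random.default_rng(0) if rng is None else rng
    out = grid.copy()
    occupied = np.argwhere(out > 0)
    if len(occupied) == 0:
        return out
    centers = occupied[rng.choice(len(occupied), size=n_fractures)]
    idx = np.indices(out.shape)                    # (3, X, Y, Z) coordinate grid
    for c in centers:
        r = rng.integers(min_size, max_size + 1)   # random fracture size
        d = idx - c.reshape(3, 1, 1, 1)            # offsets from the center
        if rng.random() < p_sphere:                # spherical fracture
            mask = (d ** 2).sum(axis=0) <= r ** 2
        else:                                      # cubic fracture
            mask = np.all(np.abs(d) <= r, axis=0)
        out[mask] = 0                              # remove the voxels
    return out
```

Pairing each complete grid with `simulate_fractures(grid)` yields the (fractured, complete) training pairs described above.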

3.2 Shape completion network

Figure 2: Network architecture for the generator.
Figure 3: Reconstruction GAN architecture, conditioned on the object label.
Figure 4: Network architecture for the discriminator.
Figure 5: 3D Squeeze-and-Excitation block.

The network, illustrated in Figure 2, starts with a 3D encoder that compresses the input voxel grid using a series of 3D convolutional layers. The compressed hidden values are then concatenated with the embedded information about the input class label, and finally a 3D decoder uses 3D transposed convolutional layers to predict the voxel output. Similarly to the U-Net architecture [22], and as also done by Dai et al. [4] for 3D voxels, we add skip connections in the decoder part of the network, concatenating the output of the transposed convolutions with the corresponding encoder outputs. This doubles the feature-map size and allows the network to propagate local structure of the input data into the generated output.
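The skip connections amount to a channel-wise concatenation of matching encoder and decoder feature maps. A minimal sketch of the shape bookkeeping (the sizes here are illustrative, not the network's actual dimensions):

```python
import numpy as np

# A decoder feature map from a transposed convolution and the matching
# encoder feature map share the same spatial size; concatenating along
# the channel axis doubles the feature-map depth, letting local input
# structure propagate into the generated output.
decoder_features = np.zeros((8, 8, 8, 32))   # upsampled decoder output
encoder_features = np.ones((8, 8, 8, 32))    # matching encoder output
merged = np.concatenate([decoder_features, encoder_features], axis=-1)
# merged.shape -> (8, 8, 8, 64)
```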

All the layers use ReLU activations and batch normalization, with the exception of the last one, which uses a tanh activation and no batch normalization. After each convolutional operation, a 3D Squeeze-and-Excitation (SE) block [9] is applied, with the reduction ratio used by Hu et al. [9] (see Figure 5). To train the network, we use the L1 norm as the completion loss:

(1)   L_c = ‖G(x, y) − t‖₁

where t is the target sample, x is the incomplete object, y is the object label and G is the completion network.
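The 3D SE block of Figure 5 can be sketched as a single forward pass in NumPy. All names here are illustrative; the reduction ratio is implied by the shapes of the two fully connected weight matrices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block_3d(x, w1, w2):
    """3D Squeeze-and-Excitation forward pass on an (X, Y, Z, C) feature map.

    w1: (C, C // ratio) and w2: (C // ratio, C) are the two FC layers of
    the excitation path; the reduction ratio is encoded in their shapes.
    """
    squeezed = x.mean(axis=(0, 1, 2))                     # global avg pool -> (C,)
    excited = sigmoid(np.maximum(squeezed @ w1, 0) @ w2)  # FC-ReLU-FC-sigmoid
    return x * excited                                    # channel-wise rescale
```

Each channel of the feature map is rescaled by a learned gate in (0, 1), letting the network emphasize informative channels.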

3.3 Adversarial network architecture

As proposed by Goodfellow et al. [5], the GAN algorithm consists of a generator G and a discriminator D, where G captures the data distribution and D estimates the probability that a sample came from the training data rather than from G. Following the idea of Conditional Generative Adversarial Nets (CGAN) [17], we extend our GAN to a conditional model by feeding G and D information about class labels as an additional input layer, as shown in Figures 2, 3 and 4.

D starts similarly to G, compressing the voxel input and concatenating the class label, and is then followed by fully connected layers, as shown in Figure 4. All layers of D use Leaky ReLU activations and no batch normalization, with the exception of the last fully connected layer, which outputs a single value with no activation function. As in G, the 3D convolutional layers are followed by an SE block.

Combining the ideas of IWGAN and CGAN, we define the discriminator loss function as:

(2)   L_D = E_{x̃∼P_g}[D(x̃, y)] − E_{x∼P_r}[D(x, y)] + λ E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂, y)‖₂ − 1)²]

where λ is the gradient penalty coefficient, y is the class label data, P_g is the generator distribution, P_r is the target distribution and P_x̂ is the distribution obtained by sampling uniformly on straight lines between points from P_r and P_g. We use λ = 10, as proposed by Gulrajani et al. [6].
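To make the gradient-penalty term concrete, the sketch below evaluates it for a toy linear critic, whose input gradient is simply its weight vector. Everything here is illustrative; in the real model the critic is the network D and the gradient comes from backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 10.0                            # gradient penalty coefficient lambda
w = rng.normal(size=64)               # toy linear critic: D(x) = w @ x
real = rng.normal(size=(8, 64))       # samples from the target distribution P_r
fake = rng.normal(size=(8, 64))       # samples from the generator distribution P_g

# sample x_hat uniformly on straight lines between real and fake points
eps = rng.random((8, 1))
x_hat = eps * real + (1 - eps) * fake

# for a linear critic the gradient w.r.t. any input (including x_hat)
# is w, so the penalty reduces to lam * (||w||_2 - 1)^2 per sample
grad_norms = np.full(len(x_hat), np.linalg.norm(w))
penalty = lam * np.mean((grad_norms - 1.0) ** 2)

# Wasserstein terms E[D(fake)] - E[D(real)], plus the gradient penalty
critic_loss = (fake @ w).mean() - (real @ w).mean() + penalty
```

The penalty pulls the critic's gradient norm toward 1 on the interpolated points, enforcing the Lipschitz constraint softly instead of clamping weights.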

Finally, to define the generator loss function, we combine a typical WGAN loss with Equation 1:

(3)   L_G = −E_{x̃∼P_g}[D(x̃, y)] + α L_c

where α controls the learning contribution of the completion loss; the final model uses a fixed α.
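Equation 3 is a direct sum of the adversarial term and the α-weighted completion term; a NumPy transcription (function name, arguments and the α value in the test are illustrative):

```python
import numpy as np

def generator_loss(critic_fake_scores, generated, target, alpha):
    """-E[D(G(x), y)] plus an alpha-weighted L1 completion term (Eq. 3)."""
    adversarial = -np.mean(critic_fake_scores)          # WGAN generator term
    completion = np.mean(np.abs(generated - target))    # mean absolute error
    return adversarial + alpha * completion
```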

4 Experiments

4.1 Data

We perform model validation and hyperparameter search using the ModelNet10 dataset [28], a subset of a larger 3D CAD model dataset (ModelNet) containing 10 classes (bathtub, bed, chair, desk, dresser, monitor, nightstand, sofa, table and toilet), divided into 3991 objects for the train set and 908 for the test set.

For the final model, we built a custom dataset, starting with ModelNet10 and adding an 11th class of archaeological-looking objects, as the objective of this model is to reconstruct damaged archaeological objects. This new class contains 659 handpicked objects from ModelNet40 and 492 handpicked objects from the 3D Pottery dataset [12]. The resulting dataset is divided into 4923 objects for the train set and 1127 for the test set. Finally, we show how the model performs on real fractured objects, obtained from 3D scans of archaeological objects from the Larco Museum (http://www.museolarco.org).

For all the experiments in this work, we voxelize each object into a fixed-resolution grid using Binvox [16, 19] and scale the binary voxel values to the network's input range.

4.2 Training details

The models were implemented using the Keras framework [3] with TensorFlow [14] as backend. We ran all the experiments on an NVIDIA TITAN Xp with a batch size of 64, training the discriminator on every batch and the generator every 5 batches. For optimization we use Adam [11] with the hyperparameter settings proposed by Gulrajani et al. [6]. The final model is trained for 400 epochs.
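The update schedule above (discriminator on every batch, generator on every 5th batch) can be sketched as a plain loop, where `train_critic` and `train_generator` stand in for the real update steps:

```python
def run_epoch(n_batches, train_critic, train_generator):
    """Train the critic every batch and the generator every 5th batch."""
    for batch in range(n_batches):
        train_critic(batch)              # discriminator updates every batch
        if batch % 5 == 0:               # generator updates once per 5 batches
            train_generator(batch)
```

Training the critic more often than the generator is the usual WGAN-style regime: it keeps the critic close to optimal so its scores provide a useful gradient signal.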

4.3 Performance of the final network

Skip-connections | Squeeze-and-Excitation | L1 loss
No  | No  | 0.0611
Yes | No  | 0.0061
Yes | Yes | 0.0057
Table 1: L1 loss results for different network settings on the test data set. Each setting was run for 100 epochs.
Label      | Input loss | Output loss
Archeology | 0.0209 | 0.0077
Bathtub    | 0.0180 | 0.0068
Bed        | 0.0235 | 0.0058
Chair      | 0.0185 | 0.0103
Desk       | 0.0202 | 0.0114
Dresser    | 0.0192 | 0.0027
Monitor    | 0.0181 | 0.0062
Nightstand | 0.0203 | 0.0052
Sofa       | 0.0231 | 0.0063
Table      | 0.0129 | 0.0039
Toilet     | 0.0216 | 0.0100
Table 2: L1 loss results by class label for the test data set.
Figure 6: Model average performance against different randomly generated fractures. The number and size of the fractures vary between 1 and 15. We can see that even when 40% of the voxels are missing, we can still recover 80% of the information.
Figure 7: Results on different labels with a maximum fracture size of 6. First row: Complete objects. Second row: Objects with fractures. Third row: Results from our model.
Figure 8: Results on different labels with a maximum fracture size of 12. First row: Complete objects. Second row: Objects with fractures. Third row: Results from our model. Fourth row: A second iteration with our model.
Figure 9: Results from scanned archaeological objects. First row: Scanned archaeological objects. Second row: Obtained voxels. Third row: Results from our model.
Figure 10: Unexpected artifacts being reconstructed. First row: Scanned archaeological objects. Second row: Obtained voxels. Third row: Results from our model.

In order to choose the final network configuration, we trained different settings. The results show that skip connections are important for the network's performance, and that SE blocks give an additional boost, as seen in Table 1.

At inference time, only the generator is used. We tested the trained model on our custom dataset, significantly reducing the loss, as seen in Table 2. Figure 6 shows the performance against different fracture sizes: the model can recover most of the information, even when more than half of the voxels are missing, without producing many misplaced voxels. Results for different labels from the test data set are shown in Figure 7. In some cases, where the number of missing voxels was greater than 2000, the model was run a second time on its first result, as shown in Figures 1 and 8.
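The second-pass refinement for heavily damaged inputs can be expressed as a small wrapper. Here `generator` stands in for the trained model and the 2000-voxel threshold follows the experiment description; the helper itself is hypothetical:

```python
def reconstruct(generator, fractured, label, missing_voxels, threshold=2000):
    """Run the generator, and once more on its own output for large fractures."""
    output = generator(fractured, label)
    if missing_voxels > threshold:       # heavily damaged: refine the result
        output = generator(output, label)
    return output
```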

We also performed tests on real fractured archaeological objects; results can be seen in Figure 9. Some of the real objects differ greatly from those used in training: in Figure 10, we show examples of unexpected fragments being reconstructed, which happened for objects whose structure greatly differed from that of the training objects.

5 Conclusion

This paper presents a method to predict the missing geometry of damaged objects: ORGAN, an object reconstruction generative adversarial network. Our results show that we can accurately recover an object's structure, even in cases where the missing information represents more than half of the occupied input voxels. When tested on real archaeological objects, we observed some cases of unexpected artifacts being reconstructed; this was expected, since those objects had structures that differed from the ones in the training set. As the objective of this work is to aid the conservation of archaeological objects, and knowing that these objects can greatly differ from one culture to another, we designed our method to act as a conditional model on a variable, which can be a culture, a region or any metadata of the object. An important next task is to increase the amount of data available on archaeological objects, which we leave as future work.

References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv:1701.07875 [cs, stat], Jan. 2017. arXiv: 1701.07875.
  • [2] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An Information-Rich 3d Model Repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University — Princeton University — Toyota Technological Institute at Chicago, 2015.
  • [3] F. Chollet and others. Keras. https://github.com/fchollet/keras, 2015.
  • [4] A. Dai, C. R. Qi, and M. Nießner. Shape Completion using 3d-Encoder-Predictor CNNs and Shape Synthesis. arXiv:1612.00101 [cs], Nov. 2016. arXiv: 1612.00101.
  • [5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
  • [6] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved Training of Wasserstein GANs. arXiv:1704.00028 [cs, stat], Mar. 2017. arXiv: 1704.00028.
  • [7] G. Harary, A. Tal, and E. Grinspun. Context-based coherent surface completion. ACM Trans. Graph., 33(1):5:1–5:12, Feb. 2014.
  • [8] G. Harary, A. Tal, and E. Grinspun. Feature-preserving surface completion using four points. Computer Graphics Forum, 33(5):45–54, 2014.
  • [9] J. Hu, L. Shen, and G. Sun. Squeeze-and-Excitation Networks. arXiv:1709.01507 [cs], Sept. 2017. arXiv: 1709.01507.
  • [10] H. Huang, M. Gong, D. Cohen-Or, Y. Ouyang, F. Tan, and H. Zhang. Field-guided registration for feature-conforming shape composition. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2012), 31:171:1–171:11, 2012.
  • [11] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations (ICLR), 2015.
  • [12] A. Koutsoudis, G. Pavlidis, V. Liami, D. Tsiafakis, and C. Chamzas. 3d Pottery content-based retrieval based on pose normalisation and segmentation. Journal of Cultural Heritage, 11(3):329–338, July 2010.
  • [13] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv:1609.04802 [cs, stat], Sept. 2016. arXiv: 1609.04802.
  • [14] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org/, 2015.
  • [15] P. Mavridis, A. Andreadis, and G. Papaioannou. Fractured object reassembly via robust surface registration. In Eurographics 2015 Short Papers, Eurographics 2015, 2015.
  • [16] P. Min. binvox. http://www.patrickmin.com/binvox, 2004.
  • [17] M. Mirza and S. Osindero. Conditional Generative Adversarial Nets. arXiv:1411.1784 [cs, stat], Nov. 2014. arXiv: 1411.1784.
  • [18] N. J. Mitra, M. Pauly, M. Wand, and D. Ceylan. Symmetry in 3D Geometry: Extraction and Applications. Computer Graphics Forum, 32(6):1–23, 2013.
  • [19] F. S. Nooruddin and G. Turk. Simplification and repair of polygonal models using volumetric techniques. IEEE Transactions on Visualization and Computer Graphics, 9(2):191–205, Apr. 2003.
  • [20] G. Papaioannou, T. Schreck, A. Andreadis, P. Mavridis, R. Gregor, I. Sipiran, and K. Vardis. From reassembly to object completion: A complete systems pipeline. J. Comput. Cult. Herit., 10(2):8:1–8:22, Mar. 2017.
  • [21] M. Pauly, N. J. Mitra, J. Giesen, M. Gross, and L. J. Guibas. Example-based 3d scan completion. In Proceedings of the Third Eurographics Symposium on Geometry Processing, SGP ’05, Aire-la-Ville, Switzerland, Switzerland, 2005. Eurographics Association.
  • [22] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, pages 234–241. Springer, Cham, Oct. 2015.
  • [23] I. Sipiran. Analysis of partial axial symmetry on 3d surfaces and its application in the restoration of cultural heritage objects. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [24] I. Sipiran, R. Gregor, and T. Schreck. Approximate symmetry detection in partial 3d meshes. Computer Graphics Forum (proc. Pacific Graphics), 33:131–140, 2014.
  • [25] J. Sun, M. Ovsjanikov, and L. J. Guibas. A Concise and Provably Informative Multi-Scale Signature Based on Heat Diffusion. Comput. Graph. Forum, 28(5), 2009.
  • [26] S. Thrun and B. Wegbreit. Shape from symmetry. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 2, pages 1824–1831 Vol. 2, Oct 2005.
  • [27] J. Wu, C. Zhang, T. Xue, B. Freeman, and J. Tenenbaum. Learning a Probabilistic Latent Space of Object Shapes via 3d Generative-Adversarial Modeling. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 82–90. Curran Associates, Inc., 2016.
  • [28] Z. Wu, S. Song, A. Khosla, X. Tang, and J. Xiao. 3d shapenets for 2.5 d object recognition and next-best-view prediction. arXiv:1406.5670 [cs.CV], 2014.
  • [29] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
  • [30] K. Xu, H. Zhang, A. Tagliasacchi, L. Liu, G. Li, M. Meng, and Y. Xiong. Partial Intrinsic Reflectional Symmetry of 3D Shapes. ACM Trans. Graph., 28(5):138:1–138:10, 2009.
  • [31] R. A. Yeh, C. Chen, T. Y. Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do. Semantic Image Inpainting with Deep Generative Models. arXiv:1607.07539 [cs], July 2016. arXiv: 1607.07539.