Ancient Coin Classification Using Graph Transduction Games
Abstract
Recognizing the type of an ancient coin requires theoretical expertise and years of experience in the field of numismatics. Our goal in this work is to automate this time-consuming and demanding task with a visual classification framework. Specifically, we propose to model ancient coin image classification using Graph Transduction Games (GTG). GTG casts the classification problem as a noncooperative game in which the players (the coin images) decide their strategies (class labels) according to the choices made by the others, which results in a global consensus on the final labeling. Experiments are conducted on the only publicly available dataset, which is composed of 180 images of 60 types of Roman coins. We demonstrate that our approach outperforms prior work on the same dataset, with classification accuracies of 73.6% and 87.2% when there are one and two images per class in the training set, respectively.
I Introduction
Ancient coins, which depict cultural, political and military events, natural phenomena, ideologies and portraits of gods and emperors, are an important source of information for historians and archaeologists. Recognizing the type of an ancient coin requires theoretical expertise and years of experience in the field of numismatics. A common way to determine the period of a discovered coin is to search through reference books in which ancient coins are indexed [4], which is highly time-consuming. Our goal in this paper is to automate the recognition of Roman coins by employing computer vision and pattern recognition techniques. Automating such a manual procedure not only speeds up processing but can also support historians and archaeologists in making more accurate decisions. A visual classification framework for ancient coin recognition can also be used in museums or by individual collectors to organize large collections of coins.
From the computer vision point of view, classification of ancient coin images is a highly challenging task. One difficulty arises from the large number of types (i.e. classes) of ancient coins (e.g. Portuguese coins from the medieval period and coins from the Roman Republic comprise over 1500 [17] and 550 [4] different classes, respectively), while most of the classes include few known specimens, as mentioned in [17, 23]. Moreover, intraclass variation is large due to local spatial variations arising from missing parts and degradation of the coins, and from the manual manufacturing of coins by different engravers. Another cause of large intraclass variation is that the metallic structure of these coins leads to strong reflection and shading variations, so the appearance of the same coin changes significantly under different lighting conditions. A further challenge in ancient coin classification is the typically low interclass variation due to high global similarity between classes [22]. Images from two coin classes are presented in Fig. 1 to demonstrate the challenges of large intraclass and low interclass variations.
Ancient coin classification can be accomplished by adopting one of the following approaches for classifiers [3]: (i) learning-based classifiers, where the parameters of the classifier (e.g. Deep Neural Networks, SVM, Random Forests, etc.) are learned from data in an intensive training phase; (ii) nonparametric classifiers, where the classification decision is based directly on the data without any training phase (e.g. Nearest Neighbor based classifiers). Although the first group has proved to be superior to the second, it requires the extraction of highly discriminative features (possibly from abundant training data) for robust classification. Moreover, such a time-consuming training phase can be impractical for handling dynamic databases where new classes are added continuously.
In this paper, we adopt a nonparametric classifier for ancient coin classification, which is preferable in the presence of the aforementioned challenges, i.e. large intraclass and low interclass variation and a lack of abundant training data. We follow the same approach as [22, 20], i.e. our nonparametric classifier uses a dissimilarity measure derived from the costs of dense matching of SIFT features. As in [22, 20], for dense feature matching we use SIFT flow [11], a flow estimation technique developed for image alignment. SIFT flow preserves discontinuities and thus allows matching objects located at different parts of the image. This property makes SIFT flow well suited to coin images [22], i.e. it helps to deal with large intraclass variation, since images from the same class have a similar spatial arrangement of features. Additionally, defining the similarity between two coin images based on local matches between them helps to deal with low interclass variation, since two classes mostly differ from each other in local regions.
Unlike [22, 20], in this work we do not use a greedy Nearest Neighbor (NN) based classifier, in which a query image is labeled with the class of its nearest (most similar) image in the dataset. Instead, we use a semi-supervised learning approach, namely the Graph Transduction Game (GTG) [6], for ancient coin classification. GTG casts the classification problem as a noncooperative game in which the players (the coin images) decide their strategies (class labels) according to the choices made by the others, which results in a global consensus on the final labeling. Experimenting on a small-scale ancient coin dataset that exhibits the aforementioned classification challenges, we show that the notion of label consistency [8] provided by GTG brings a significant performance gain over the conventional NN-based classifier for this challenging problem.
II Previous works
One of the main problems of ancient coin image analysis addressed in the literature is coin identification, where the goal is to recognize a specific coin instance rather than a coin type [7, 9]. This type of application is used for the identification of stolen coins. Most other works have focused on coin type recognition (or coin classification), which has a wider range of practical uses. A number of works [10, 22, 20, 23] employed an NN-based classifier, where a query coin image is assigned the label of its most similar image in the training set. Among these, [10] defined coin similarity by the number of matched SIFT features detected sparsely on the images, while [22, 20] employed the dense matching costs of SIFT flow as a dissimilarity metric. In [23], the authors used densely computed illumination-invariant LIDRIC features and fused several similarity scores that indicate the matching quality into an overall similarity score. High performance is reported in these works, although the datasets employed were quite small-scale, i.e. classification accuracies of 90% [10] and 82% [20] were obtained on datasets with 390 images of 3 classes and 180 images of 60 classes, respectively.
Other works employed learning-based classifiers. Earlier attempts [1, 2] relied on a Bag of Visual Words representation of local image features, where a visual dictionary is learned from a training set and classification is achieved with an SVM in [1] and a GMM in [2]. Recently, Schlag and Arandjelovic proposed to use a deep convolutional neural network for Roman coin classification [18]. They trained on a large set of images, i.e. around 20K images of 83 classes, and reported around 83% accuracy on 10K images.
A significant obstacle to employing learning-based classifiers for this particular research problem is the scarcity of publicly available datasets. A number of works employed datasets of Sassanian dynasty coins [15], some others focused on medieval coins [17], and most have worked on coins of the Roman Republic [1, 2, 18, 22, 20, 23]. However, the only publicly available ancient coin dataset is the one published by Zambanini and Kampel [20], which is composed of 180 images of 60 Roman coin classes and on which we experiment in this work.
III Graph Transduction Game
The Graph Transduction Game (GTG) [6] is a semi-supervised learning method which has recently attracted renewed interest and has been successfully applied in different contexts, e.g. bioinformatics [19] and label augmentation [5]. GTG casts the problem as a noncooperative multiplayer game in which the objects (the images of a dataset) are the players and the possible strategies are the class labels. The idea is that two randomly chosen players each choose a strategy with a certain probability and receive a payoff proportional to the agreement of the chosen strategies (labels). Since the game is noncooperative, it is in each player's own interest to maximize its payoff, and hence to choose the labels with the highest agreement. The game is played until all the objects have chosen a strategy (label) and none of them would like to change its membership hypothesis. This particular condition is known as a Nash equilibrium [14]. Once the game reaches an equilibrium, every player plays its best strategy, which corresponds to a consistent labeling [8, 13]. A peculiarity of GTG is that consistency is a global property, which is not related to a single player but is achieved for all of the players.
For the sake of completeness, we recap some basic game-theoretic concepts here. Given a set of players $I = \{1, \dots, n\}$ (i.e. the images of our dataset) and a set of possible pure strategies $S = \{1, \dots, m\}$ (the set of labels):

mixed strategy: a mixed strategy $x_i$ is a probability distribution over the possible strategies for player $i$. Then, $x_i \in \Delta^m$, where $\Delta^m = \{x_i \in \mathbb{R}^m : \sum_{h=1}^{m} x_{ih} = 1,\ x_{ih} \geq 0 \ \forall h\}$ is the standard $m$-dimensional simplex and $x_{ih}$ is the probability of player $i$ choosing the pure strategy $h$.

strategy space: it corresponds to the set of all mixed strategies of the players, $x = \{x_1, \dots, x_n\}$, which is represented as a stochastic matrix of size $n \times m$. The starting point of the game is defined by a proper setting of $x(0)$.

utility function: it is responsible for computing the gain obtained by the $i$th player when it chooses a mixed strategy $x_i$. In particular, $u_i : \Delta^m \to \mathbb{R}_{\geq 0}$.
In this context, the players are separated into labeled ($L$) and unlabeled ($U$) sets (in terms of a standard learning algorithm, the set of labeled players corresponds to the training set and the unlabeled ones to the test set). The strategy space is initialized in two different ways, depending on whether an object is labeled or unlabeled. A one-hot vector is assigned to each of the labeled objects, since their labels are known:
$$x_{ih}(0) = \begin{cases} 1 & \text{if } i \in L \text{ has label } h, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$
whereas, since no prior knowledge is available for the unlabeled objects, the same probability is assigned to all of their labels (with $m$ the number of class labels):
$$x_{ih}(0) = \frac{1}{m} \quad \text{for every } i \in U \text{ and every label } h. \tag{2}$$
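The initialization of Eqs. 1 and 2 can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `init_strategy_space` and the convention of marking unlabeled players with `-1` are our assumptions and are not taken from [6].

```python
import numpy as np

def init_strategy_space(labels, n_classes):
    """Build the initial strategy matrix (one row per player, one column per label).

    `labels` holds a class index in [0, n_classes) for labeled players and
    -1 for unlabeled players (the -1 marker is an assumption of this sketch).
    """
    labels = np.asarray(labels)
    # Eq. 2: unlabeled players start from the uniform distribution over labels.
    x0 = np.full((labels.size, n_classes), 1.0 / n_classes)
    labeled = labels >= 0
    # Eq. 1: labeled players get a one-hot row at their known label.
    x0[labeled] = 0.0
    x0[labeled, labels[labeled]] = 1.0
    return x0

# Four players, three labels: players 0, 1 and 3 are labeled, player 2 is not.
x0 = init_strategy_space([0, 2, -1, 1], n_classes=3)
```

Every row of the resulting matrix sums to one, so it is a valid stochastic matrix and can serve as the starting point of the game.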
Payoff function
The payoff function reflects the likelihood of a player (object) choosing a particular strategy (label), considering the similarities between labeled and unlabeled players. It ensures that more similar players are more likely to influence each other in choosing one of the possible strategies (labels).
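Formally, given a player $i$ and a strategy $h$, the utility function is as follows:
$$\mathcal{N}_i = \mathcal{N}_i^{L} \cup \mathcal{N}_i^{U} \tag{3}$$
$$u_i(e_h) = \sum_{j \in \mathcal{N}_i^{U}} (A_{ij} x_j)_h + \sum_{j \in \mathcal{N}_i^{L}} (A_{ij} x_j)_h \tag{4}$$
$$u_i(x) = \sum_{j \in \mathcal{N}_i^{U}} x_i^{T} A_{ij} x_j + \sum_{j \in \mathcal{N}_i^{L}} x_i^{T} A_{ij} x_j \tag{5}$$
where $\mathcal{N}_i^{L}$ and $\mathcal{N}_i^{U}$ are the labeled and unlabeled nearest neighbors of $i$, respectively, and $e_h$ denotes the pure strategy $h$. Here, $u_i(e_h)$ and $u_i(x)$ are the payoffs received by player $i$ when it uses the strategy $h$ and when it plays the mixed strategy $x_i$, respectively. The matrix $A_{ij}$ is the partial payoff matrix between players $i$ and $j$, which is computed as $A_{ij} = \omega_{ij} \times I_m$ [6], where $\omega_{ij}$ is the similarity between players $i$ and $j$ and $I_m$ is the identity matrix of size $m \times m$.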
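Because each partial payoff matrix is a similarity value times the identity, the payoff of a pure strategy reduces to a similarity-weighted sum of the neighbors' probabilities for that label, so the payoffs of all players can be obtained with one matrix product. The sketch below illustrates this under our own assumptions: the similarities are stored in a matrix `W` whose entries are zero outside each player's neighbor set, and the names `W`, `X` and `payoffs` are hypothetical.

```python
import numpy as np

def payoffs(W, X):
    """Return (U, u) for similarity matrix W (n x n) and strategy matrix X (n x m).

    U[i, h] is the payoff of player i for the pure strategy h (Eq. 4);
    u[i] is the expected payoff of player i's mixed strategy (Eq. 5).
    Entries of W outside a player's neighbor set are assumed to be zero.
    """
    U = W @ X                   # sum over j of w_ij * x_jh
    u = np.sum(X * U, axis=1)   # x_i^T (sum over j of A_ij x_j)
    return U, u

# Toy example: three players, two labels, no self-similarity.
W = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
X = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
U, u = payoffs(W, X)
```

Labeled players need no special treatment here, since their rows in `X` are one-hot and therefore contribute their full similarity to a single label.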
Players similarity
Finding Nash Equilibria
To find a Nash equilibrium of the game, we use a result from Evolutionary Game Theory [21] known as the Replicator Dynamics (RD) [12]. The RD are dynamical systems that mimic a Darwinian selection process over the set of strategies of each player. The underlying idea is that the fittest strategies are favored and survive, while the others become extinct.
More formally, the RD are defined as follows:
$$x_{ih}(t+1) = x_{ih}(t)\, \frac{u_i(e_h)}{u_i(x(t))} \tag{7}$$
where $x_{ih}(t)$ is the probability of strategy $h$ at time $t$ for player $i$, $u_i(e_h)$ is the payoff of the pure strategy $h$ (see Eq. 4) and $u_i(x(t))$ is the expected payoff of the entire mixed strategy (see Eq. 5). Eq. 7 is iterated until convergence (convergence criteria: (i) the distance between two successive steps falls below a threshold $\varepsilon$, or (ii) a maximum number of iterations is reached; typically 20 iterations are sufficient; see [16] for a detailed analysis). Once Eq. 7 has converged, we simply take the index of the maximum value in the $i$th row of $x$ in order to label the $i$th object.
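The iteration of Eq. 7 and the final labeling step can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in our experiments: the similarity matrix `W` is assumed to be non-negative with zeros outside the neighbor sets, the tolerance value `1e-6` is an arbitrary choice, and the function name `replicator_dynamics` is hypothetical. Only the cap of 20 iterations comes from the text.

```python
import numpy as np

def replicator_dynamics(W, X0, max_iter=20, tol=1e-6):
    """Iterate Eq. 7 from X0 until the update is smaller than tol or max_iter is hit."""
    X = X0.copy()
    for _ in range(max_iter):
        U = W @ X                                  # payoffs of the pure strategies
        num = X * U                                # x_ih(t) * u_i(e_h)
        den = num.sum(axis=1, keepdims=True)       # u_i(x(t)), the expected payoff
        safe = np.where(den > 0, den, 1.0)
        # Players with zero expected payoff keep their current strategy.
        X_new = np.where(den > 0, num / safe, X)
        if np.linalg.norm(X_new - X) < tol:
            return X_new
        X = X_new
    return X

# Players 0 and 1 are labeled (classes 0 and 1); players 2 and 3 are unlabeled.
W = np.array([[0.0, 0.1, 0.9, 0.1],
              [0.1, 0.0, 0.1, 0.9],
              [0.9, 0.1, 0.0, 0.1],
              [0.1, 0.9, 0.1, 0.0]])
X0 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.5, 0.5],
               [0.5, 0.5]])
X = replicator_dynamics(W, X0)
labels = X.argmax(axis=1)   # label of each player: index of the largest probability
```

One-hot rows are fixed points of the multiplicative update, so the labeled players keep their known labels without being handled separately. In the toy example each unlabeled player ends up with the label of the labeled player it is most similar to.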
IV Ancient Coin Classification using Graph Transduction Game (GTG)
By considering the training set of coin images as the labeled players, GTG can be applied to the ancient coin classification problem to estimate the labels of the test set images, i.e. the unlabeled players. The steps that we employ to apply GTG to ancient coin classification are as follows:
Feature extraction
We compute two types of features on the images: (i) to analyze local similarities, we compute 128-dimensional SIFT features in the local neighborhood of every image pixel, which results in a tensor called the SIFT image [11]; (ii) to analyze global similarities between images, we compute CNN features. Specifically, since our dataset is quite small, which makes training a CNN infeasible, we apply transfer learning by using a CNN architecture pretrained on ImageNet. For each input image, we take as its feature the output of the last fully-connected layer of the CNN.
Initialization of the strategy space
Computation of similarity between objects
A correct choice of similarity computation between images is important to avoid failures in label estimation. We employ different schemes of similarity computation depending on the extracted feature type:
i. Similarity between local features of images: It is demonstrated in [22] that the matching scores of the SIFT flow technique are a powerful dissimilarity metric for ancient coin classification. In SIFT flow, SIFT images are matched along the flow vectors and optimal correspondences are found by minimizing an energy function ($E(\mathbf{w})$ in [11]) using dual-layer belief propagation [11]. Since the runtime of such an optimization scales with the image size, the authors of [11] proposed a coarse-to-fine search, which results in faster computation and better matching performance. Similar to [20], in this work we use the minimum energy value, say $E_{ij}$ (to which the SIFT flow algorithm converges at the finest level of the coarse-to-fine search), as a dissimilarity metric between images $i$ and $j$, i.e. we use $E_{ij}$ as the dissimilarity in Eq. 6.
Execution of transduction game
Given the similarities, GTG plays the game between the players, i.e. images, until convergence. At the output we obtain the final probabilities of the strategies, i.e. labels, for the unlabeled objects, and we assign each object the strategy with the highest probability.
V Experiments
Dataset
We experimented on the only published ancient coin dataset [20] (http://cvl.tuwien.ac.at/research/cvldatabases/coinimagedataset/), which was acquired at the Coin Cabinet of the Museum of Fine Arts in Vienna, Austria. The dataset is composed of 180 images (reverse sides of the coins, which include motifs and legends) of 60 classes with 3 images in each class. Images are resized as in [20].
Experimental setup
Since we experiment on the same dataset, we follow the same experimental setting as [20] to make a fair comparison of techniques. In [20], taking one of the coins as the query image (or test image), the remaining one or two images per class are used to create the training set. At each classification run, the nearest neighbor of the query image is searched for in the training set. This procedure leads to 180 and 360 classification runs when two and one training images per class are used, respectively. When the training set is created from two images per class, the nearest neighbor search is performed over the dissimilarity values of the training set images accumulated per class.
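The difference between the plain nearest neighbor rule and the class-wise accumulated variant of [20] can be illustrated with the following sketch. It reflects our reading of the procedure (summing the dissimilarities of the training images of each class and picking the class with the smallest sum); the function name `nn_classify` and its arguments are hypothetical.

```python
import numpy as np

def nn_classify(d_query, train_labels, n_classes, accumulate=True):
    """Label a query from its dissimilarities to the training images.

    d_query[j] is the dissimilarity between the query and training image j,
    and train_labels[j] is the class index of training image j.
    """
    d_query = np.asarray(d_query, dtype=float)
    train_labels = np.asarray(train_labels)
    if not accumulate:
        # Plain NN: take the class of the single most similar training image.
        return int(train_labels[np.argmin(d_query)])
    # Accumulated variant: sum the dissimilarities per class.
    cost = np.bincount(train_labels, weights=d_query, minlength=n_classes)
    counts = np.bincount(train_labels, minlength=n_classes)
    cost[counts == 0] = np.inf   # classes without training images cannot win
    return int(np.argmin(cost))

# Two classes with two training images each.
d = [0.2, 0.9, 0.5, 0.5]
y = [0, 0, 1, 1]
plain = nn_classify(d, y, n_classes=2, accumulate=False)
accumulated = nn_classify(d, y, n_classes=2, accumulate=True)
```

In this toy case the two rules disagree: the single closest image belongs to class 0, while the accumulated cost is lower for class 1.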
Adopting the same experimental setting in our approach, we create a dissimilarity matrix with the entries computed as in [20], i.e. as mentioned in Section IV.c. We then symmetrize it (by taking the maximum of each pair of entries mirrored across the diagonal) before giving it as input to the GTG algorithm. Additionally, at each run we treat the test image as the unlabeled object and the training images as the labeled objects in GTG, and we obtain the class label of the unlabeled object at the output. In all experiments, the parameter of the neighboring set in Eq. 3 is set to 2.
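The symmetrization step can be sketched as an element-wise maximum of the dissimilarity matrix and its transpose. The neighbor selection shown alongside it is only an assumption about how a neighboring set of a given size could be built (keeping, for each image, the images with the smallest dissimilarity); the function names `symmetrize_max` and `knn_mask` are hypothetical.

```python
import numpy as np

def symmetrize_max(D):
    """Symmetrize a dissimilarity matrix by taking max(D[i, j], D[j, i])."""
    D = np.asarray(D, dtype=float)
    return np.maximum(D, D.T)

def knn_mask(D, k):
    """Boolean mask marking, for each row, its k smallest off-diagonal entries."""
    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, np.inf)                 # an image is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]
    mask = np.zeros(D.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

# Asymmetric matching costs between three images.
D = np.array([[0.0, 1.0, 4.0],
              [2.0, 0.0, 3.0],
              [5.0, 1.0, 0.0]])
D_sym = symmetrize_max(D)
mask = knn_mask(D_sym, k=1)
```

Taking the maximum is the conservative choice for a dissimilarity: two images are considered close only if the matching cost is low in both directions.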
Performance evaluation
We ran GTG with two feature types and the corresponding dissimilarity metrics, as explained in Section IV. In the first experiment, we compute off-the-shelf CNN features with DenseNet201, one of the state-of-the-art CNN architectures, and use the Euclidean distance to measure the dissimilarity between the features. In the second experiment, we employ densely computed local SIFT features and use the matching costs of SIFT flow as the dissimilarity measure. The results of these experiments and a comparison with the state-of-the-art work on the same dataset [20] are given in Table I.
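For the first experiment, the Euclidean dissimilarity between feature vectors can be computed for all image pairs at once. The sketch below assumes that the CNN features have already been extracted into a matrix `F` with one row per image; the feature extraction itself is not shown, and the function name `pairwise_euclidean` is hypothetical.

```python
import numpy as np

def pairwise_euclidean(F):
    """Return the matrix of Euclidean distances between the rows of F."""
    F = np.asarray(F, dtype=float)
    sq = np.sum(F ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at zero against round-off.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (F @ F.T), 0.0)
    return np.sqrt(D2)

# Three toy feature vectors standing in for CNN features.
F = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D = pairwise_euclidean(F)
```

The resulting matrix is symmetric by construction, so no additional symmetrization is needed for this feature type.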
It can be seen in Table I that the lowest performance for both training set sizes is obtained when we use the CNN features. This is an expected outcome, because CNN features provide a global description of images and a high global similarity exists between different classes in this coin dataset. Using GTG with the matching costs of dense SIFT features, we outperform [20], which employs an NN-based classifier, achieving 73.6% and 87.2% classification accuracy when the training set is constructed from one and two images per class, respectively. We additionally checked the performance of a conventional NN-based classifier that does not accumulate class-wise dissimilarities (as was done in [20]) when there are two images per class in the training set. In that case, we obtained 81.67% accuracy, which is slightly lower than the performance reported in [20] (83.3%).
Table I. Classification results for training sets with one and two images per class.

| Technique | Correct classifications (1 image per class) | Classification accuracy (1 image per class) | Correct classifications (2 images per class) | Classification accuracy (2 images per class) |
| --- | --- | --- | --- | --- |
| CNN features + Euclidean distance + GTG | 188 / 360 | 52.2% | 113 / 180 | 62.8% |
| Dense SIFT + Matching cost + NN [20] | 257 / 360 | 71.4% | 150 / 180 | 83.3% |
| Dense SIFT + Matching cost + GTG | 265 / 360 | 73.6% | 157 / 180 | 87.2% |
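As a quick arithmetic check, the accuracies in Table I follow directly from the counts of correct classifications and the number of classification runs implied by the dataset (60 classes with 3 images each). The variable names in this short script are ours.

```python
n_classes, images_per_class = 60, 3

# Two training images per class: every image serves as the query once.
runs_two_train = n_classes * images_per_class
# One training image per class: every query is paired with either remaining image.
runs_one_train = n_classes * images_per_class * (images_per_class - 1)

# Correct classifications from Table I: (1 image per class, 2 images per class).
correct = {
    "CNN features + Euclidean distance + GTG": (188, 113),
    "Dense SIFT + Matching cost + NN [20]": (257, 150),
    "Dense SIFT + Matching cost + GTG": (265, 157),
}

accuracy = {
    name: (round(100.0 * c1 / runs_one_train, 1),
           round(100.0 * c2 / runs_two_train, 1))
    for name, (c1, c2) in correct.items()
}

for name, (a1, a2) in accuracy.items():
    print(f"{name}: {a1}% / {a2}%")
```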
In Fig. 2, we present two misclassifications of the proposed approach. It can be seen that the misclassifications are mostly due to low variability between different classes.
VI Conclusion
In this paper, we studied the ancient coin classification problem using Graph Transduction Games (GTG), which follow the nonparametric approach to classification. GTG is a game-theoretic semi-supervised learning algorithm, grounded on the notion of label consistency, in which the final labeling of the objects is achieved by reaching an equilibrium between all labeling hypotheses. Our experimental results show that, for the problem of ancient coin classification, which is highly complex due to large intraclass and low interclass variations, GTG works better than conventional nearest neighbor based nonparametric classifiers, which do not consider global agreement among the labeling choices of all dataset images.
Acknowledgment
The authors would like to thank Sebastian Zambanini, Ismail Elezi, Leulseged Tesfaye Alemu and Alessandro Torcinovich for their invaluable advice and help with various technical issues.
References
 [1] H. Anwar, S. Zambanini, and M. Kampel, “Coarse-grained ancient coin classification using image-based reverse side motif recognition,” Machine Vision and Applications, vol. 26, no. 2-3, pp. 295–304, 2015.
 [2] O. Arandjelovic, “Automatic attribution of ancient roman imperial coins,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 1728–1734.
 [3] O. Boiman, E. Shechtman, and M. Irani, “In defense of nearest-neighbor based image classification,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
 [4] M. H. Crawford, Roman republican coinage. Cambridge University Press, 1974, vol. 2.
 [5] I. Elezi, A. Torcinovich, S. Vascon, and M. Pelillo, “Transductive label augmentation for improved deep network learning,” arXiv preprint arXiv:1805.10546, 2018.
 [6] A. Erdem and M. Pelillo, “Graph transduction as a noncooperative game,” Neural Computation, vol. 24, no. 3, pp. 700–723, 2012.
 [7] R. Huber-Mörk, S. Zambanini, M. Zaharieva, and M. Kampel, “Identification of ancient coins based on fusion of shape and local features,” Machine Vision and Applications, vol. 22, no. 6, pp. 983–994, 2011.
 [8] R. A. Hummel and S. W. Zucker, “On the foundations of relaxation labeling processes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 3, pp. 267–287, 1983.
 [9] M. Kampel, R. Huber-Mörk, and M. Zaharieva, “Image-based retrieval and identification of ancient coins,” IEEE Intelligent Systems, vol. 24, no. 2, pp. 26–34, 2009.
 [10] M. Kampel and M. Zaharieva, “Recognizing ancient coins based on local features,” in International Symposium on Visual Computing, 2008, pp. 11–22.
 [11] C. Liu, J. Yuen, and A. Torralba, “SIFT flow: Dense correspondence across scenes and its applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978–994, 2011.
 [12] J. Maynard Smith, Evolution and the Theory of Games. Cambridge University Press, 1982.
 [13] D. A. Miller and S. W. Zucker, “Copositive-plus Lemke algorithm solves polymatrix games,” Operations Research Letters, vol. 10, no. 5, pp. 285–290, 1991.
 [14] J. Nash, “Non-cooperative games,” Annals of Mathematics, pp. 286–295, 1951.
 [15] S. S. Parsa, M. Sourizaei, M. M. Dehshibi, R. E. Shateri, and M. R. Parsaei, “Coarse-grained correspondence-based ancient Sasanian coin classification by fusion of local features and sparse representation-based classifier,” Multimedia Tools and Applications, vol. 76, no. 14, pp. 15535–15560, 2017.
 [16] M. Pelillo, “The dynamics of nonlinear relaxation labeling processes,” Journal of Mathematical Imaging and Vision, vol. 7, no. 4, pp. 309–323, 1997.
 [17] L. Salgado, “Medieval coin automatic recognition by computer vision,” Ph.D. dissertation, 2016.
 [18] I. Schlag and O. Arandjelovic, “Ancient roman coin recognition in the wild using deep learning based recognition of artistically depicted face profiles,” in IEEE International Conference on Computer Vision Workshop (ICCVW), 2017, pp. 2898–2906.
 [19] S. Vascon, M. Frasca, R. Tripodi, G. Valentini, and M. Pelillo, “Protein function prediction as a graph-transduction game,” Pattern Recognition Letters, 2018 (in press).
 [20] S. Zambanini and M. Kampel, “Coarse-to-fine correspondence search for classifying ancient coins,” in Asian Conference on Computer Vision, 2012, pp. 25–36.
 [21] J. Weibull, Evolutionary Game Theory. MIT Press, 1997.
 [22] S. Zambanini and M. Kampel, “Automatic coin classification by image matching,” in 12th International conference on Virtual Reality, Archaeology and Cultural Heritage, 2011, pp. 65–72.
 [23] S. Zambanini, A. Kavelar, and M. Kampel, “Classifying ancient coins by local feature matching and pairwise geometric consistency evaluation,” in 22nd International IEEE Conference on Pattern Recognition (ICPR), 2014, pp. 3032–3037.
 [24] L. Zelnik-Manor and P. Perona, “Self-tuning spectral clustering,” in Advances in Neural Information Processing Systems (NIPS), 2005, pp. 1601–1608.