An In-Depth Study on Open-Set Camera Model Identification

Pedro Ribeiro Mendes Júnior, Luca Bondi, Paolo Bestagini, Stefano Tubaro, and Anderson Rocha

P. R. Mendes Júnior and A. Rocha are with the Institute of Computing, University of Campinas (Unicamp), Av. Albert Einstein, 1251, CEP 13083-852, Campinas, São Paulo, Brazil (e-mail: pedrormjunior@gmail.com / anderson.rocha@ic.unicamp.br). L. Bondi, P. Bestagini, and S. Tubaro are with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milan, Italy (e-mail: luca.bondi / paolo.bestagini / stefano.tubaro@polimi.it). This material is based on research sponsored by DARPA and Air Force Research Laboratory (AFRL) under agreement number FA8750-16-2-0173. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA and Air Force Research Laboratory (AFRL) or the U.S. Government. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. This work was supported in part by São Paulo Research Foundation (FAPESP) under the grant #2017/12646-3 (DéjàVu project), and the CAPES DeepEyes project.
Abstract

Camera model identification refers to the problem of linking a picture to the camera model used to shoot it. As this might be an enabling factor in different forensic applications to single out possible suspects (e.g., detecting the author of child abuse or terrorist propaganda material), many accurate camera model attribution methods have been developed in the literature. One of their main drawbacks, however, is the typical closed-set assumption of the problem. This means that an investigated photograph is always assigned to one camera model within a set of known ones present during the investigation, i.e., at training time, and the fact that the picture can come from a completely unrelated camera model during actual testing is usually ignored. Under realistic conditions, it is not possible to assume that every picture under analysis belongs to one of the available camera models. To deal with this issue, in this paper, we present the first in-depth study on the possibility of solving the camera model identification problem in open-set scenarios. Given a photograph, we aim at detecting whether it comes from one of the known camera models of interest or from an unknown device. We compare different feature extraction algorithms and classifiers specially targeting open-set recognition. We also evaluate possible open-set training protocols that can be applied along with any open-set classifier; in particular, we evaluate one training protocol specifically targeted at open-set classifiers relying on deep features. We observe that a simpler version of those training protocols yields results similar to the one that requires extra data, which can be useful in many applications in which deep features are employed. Thorough testing on independent datasets shows that it is possible to leverage a recently proposed convolutional neural network as feature extractor, paired with a properly trained open-set classifier, to solve the open-set camera model attribution problem even on small-scale image patches, improving over state-of-the-art available solutions.

Camera model identification, image forensics, open-set recognition, open-set training protocol.
Acronyms used throughout the paper: OSNN (Open-Set Nearest Neighbors), SSVM (Specialized Support Vector Machines), PISVM (SVM with Probability of Inclusion), WSVM (Weibull-calibrated SVM), SVM (Support Vector Machines), OCSVM (One-class SVM), DBC (Decision Boundary Carving), NCM (Nearest Class Mean with cosine distance), ET (extremely randomized trees, a.k.a. Extra-Trees), PSVM (SVM with Platt's probability for rejection), 2PSVM (2-phase SVM), SOFTMAX (thresholding softmax probability), Dresden (Dresden Image Database), ISA Unicamp (Image Source Attribution Unicamp), Flickr (Flickr Unicamp), NA (Normalized Accuracy), AKS (Accuracy on Known Samples), AUS (Accuracy on Unknown Samples), OSFM (open-set macro/micro-averaging f-measure), FM (traditional binary-based macro/micro-averaging f-measure), DA (Detection Accuracy), DKS (Detection on Known Samples), DUS (Detection on Unknown Samples), TP/FP/FN (true positive/false positive/false negative), NetOpen (Network Open training protocol), CNN (Convolutional Neural Network), KLOS (known-labeled open space). The labels conv, ip1, ip2, rich, and cfa denote the feature vectors defined in Section III-B.

I Introduction

From social networks to media sharing platforms, digital pictures are spreading all over the Internet at an ever-growing pace. However, a major drawback of this phenomenon is the diffusion of illicit or illegal material online, especially visual content. In order to fight this trend, multimedia forensic researchers have focused on the development of numerous solutions aiming at inferring pieces of information related to the acquisition and editing history of images [Stamm2013, Piva2013, Rocha2011], among others.

A common problem of interest for forensic analysts is camera model identification. This means being capable of detecting which camera model has been used to shoot a given digital photograph based solely on its content. Indeed, this is a first step toward tracking down the author of distributed illicit content [Kirchner2015] (e.g., pictures related to acts of violence, images linked to terrorist behavior, sexually exploitative imagery of children, among others). Given the social relevance of this problem, in the last few years, a continuous effort has been put forward to the development of more accurate and efficient camera model identification solutions. These can be broadly split into two categories: (i) model-based methods leveraging the study of characteristic traces left behind by specific operations applied by different camera models on acquired images; and (ii) data-driven methods based on machine-learning techniques that seek to “learn” the patterns of such telltales automatically. Considering the first category, we can cite methods relying on traces left by color filter array (CFA) interpolation [Bayram2005, Cao2010, Zhao2016], on histogram equalization footprints [Chen2007a], on traces left by camera lenses [Choi2006], and on characteristic noise analysis [Thai2014]. Considering the second category, in turn, we can cite the works of Chen2015a, Marra2015, and Tuama2016, which extract statistical features in the pixel domain to train supervised machine-learning classifiers specialized for the problem. More recently, relying upon advancements in deep learning techniques, data-driven solutions based on \glsplcnn have outperformed the prior art [Tuama2016a, Bondi2017, Bondi2017a] and are becoming a staple of the field.

(a) Closed-set Classification.
(b) Open-set Detection.
(c) Open-set Classification.
Fig. 1: Different camera model identification problem formulations. \subreffig:closed_classification In the closed-set problem, an image is attributed to a known camera model. \subreffig:open_detection In the open-set detection problem, an image is detected as belonging either to the known set or the unknown set of models. \subreffig:open_classification In the open-set classification problem faced in this work, an image is detected as belonging to the known or unknown model sets and, in the first case, the correct model is also estimated.

The drawback of all aforementioned data-driven techniques—or, more precisely, the evaluation setup to validate such techniques—is that they mainly cope with camera model identification in a closed-set setup. This means that a finite set of camera models is considered when designing the solution, and each image is attributed to one of these models. However, oftentimes analysts must work in open-set scenarios. This means that the investigator must also be able to recognize whether an image does not belong to any of the known models of interest [Kirchner2015].

In this vein, we present herein an in-depth study on open-set camera model attribution based on a supervised learning pipeline. Specifically, we focus on methodologies that perform an analysis at patch level rather than on the whole image, as this opens the door to future development of tampering detection and localization methods as shown by Bondi2017b. To the best of our knowledge, open-set camera model attribution has only been introduced by Gloe2012a and later on approached by Bayar2018. Bayar2018 focus on an open-set binary detection problem, i.e., detecting whether an image comes from a known or unknown camera model. Conversely, we aim to solve the joint problem of (i) detecting whether the image under analysis comes from a known or from an unknown camera model and (ii) determining the image source model if it comes from the set of known models.

In previous work [Jain2014], a general open-set classifier has been proposed along with cross-class validation, a method tailored to open-set scenarios that searches for the parameters of the proposed classifier. In parallel, another previous work [MendesJunior2017], also proposing an open-set classifier, introduced a parameter optimization procedure tailored to searching the parameters of that classifier, which shares the same essence of cross-class validation. In the latter work, the authors suggested as future work the employment of their parameter optimization method as a general grid-search procedure that could be applied to any open-set classifier. In our work, we follow this direction and evaluate what we call the \glsclosed training protocol (the traditional form) and the \glsopen training protocol (with the same essence of cross-class validation [Jain2014] and parameter optimization [MendesJunior2017]). We further study those alternatives and formalize and evaluate what we call the \glsnetworkopen training protocol, specifically tailored for situations in which deep features are employed. As we shall see along with the presented results, the equivalence of \glsopen and \glsnetworkopen indicates the \glsopen training protocol as the better and cheaper alternative in terms of the data required.

Camera model identification is not to be confused with camera source attribution at instance level (i.e., distinguishing pictures shot with different devices of the same model) based on sensor pattern noise (SPN) analysis [Lukas2006]. Indeed, SPN-based solutions [Costa2012, Costa2014] exploit a strong correlation-test that is known to sometimes fail in case of unknown camera models [Lukas2006]. However, the same test cannot be applied for model identification, as SPN is a device-specific trace.

In light of these considerations, our key contributions are the following:

  • We study the open-set camera model identification problem analyzing state-of-the-art open-set classification methods.

  • We evaluate the effectiveness of \glsplcnn features, compared to hand-crafted ones, for per-patch classification in open-set setups.

  • We formalize and evaluate open-set training protocols applied to open-set classification methods during training, for a proper estimate of parameters in the open-set scenario.

  • We carry out the first large-scale testing on the open-set camera model identification problem considering independent datasets and several algorithms, also comparing with known solutions in the literature [Bayar2018].

The best evaluated solution for the problem combines a deep feature extraction method and a state-of-the-art open-set classifier trained with an open-set training protocol of intermediate complexity. This solution works on color patches, making it useful for forgery localization techniques [Bondi2017b]. Moreover, it is capable of reaching state-of-the-art accuracy also in the closed-set framework.

The rest of the paper is structured as follows. Section II formally introduces the camera model identification problem under different points of view. Section III provides all the details about the algorithmic pipeline used in our evaluation. Section IV reports information about the considered experimental setup. Section V presents the performed experiments and achieved results. Finally, Section VI concludes the paper.

II Open-set Camera Model Identification Problem

In this section, we introduce the problem of camera model identification, from the closed-set formulation to the open-set one faced in this paper.

Camera model identification refers to the problem of assigning an image, in a blind fashion, to the camera model that was used to shoot it. This means that no watermarks or side information such as header or EXIF data are used, assuming they will not be available during investigation. Depending on the considered constraints, camera model identification can be cast into different kinds of problems, as shown in Figure 1. In the following, we report the main differences between these problem formulations.

II-A Closed-set Classification

Closed-set camera model classification is the problem of assigning an image to a camera model within a known set of possible models, as depicted in Figure 1(a). In this scenario, it is required to assume that the investigator is sure that the camera model of the picture under analysis belongs to the set of candidate models.

Formally, let $I$ be a color image acquired with the camera model identified by label $y$. Consider further $\mathcal{K}$ as the set of labels belonging to the known camera model dataset, i.e., the models available to the analyst when developing the solution. The goal in closed-set camera model identification is to estimate the label $\hat{y} \in \mathcal{K}$ associated to the picture $I$ under analysis.

This is by far the most widely considered scenario in the literature [Kirchner2015]. However, closed-set classification is bound to fail whenever the analyst has no full knowledge on all the possibly used camera models: in real-case open-set scenarios, it may happen that $y \notin \mathcal{K}$, in which case we denote by $u$, $u \notin \mathcal{K}$, the unknown label that represents any unknown class.

II-B Open-set Detection

Relaxing the constraint of knowing all possible camera models, we enter the open-set realm. Indeed, in an open-set scenario, the image under analysis can belong to either known or unknown camera models. In particular, we refer to open-set camera model detection as the problem of detecting whether an image belongs to the set of known models, or to the set of unknown ones, as depicted in Figure 1(b).

Formally, the goal of open-set camera model detection is to estimate whether $y \in \mathcal{K}$ or $y = u$ for a given image $I$. This is basically a two-class classification problem that does not provide the analyst with information on the actual used camera model. To infer the possibly used camera model, an open-set detection solution should be paired with a subsequent step of closed-set classification, as proposed by Bayar2018.

II-C Open-set Classification

The most complete camera model identification problem formulation is that of open-set classification. As a matter of fact, this refers to the problem of jointly estimating whether the image under analysis comes from a camera in the known set of models or from an unknown model and, if condition one holds, also detecting which model it is, as depicted in Figure 1(c).

Formally, the goal of open-set camera model identification is to estimate $\hat{y} \in \mathcal{K} \cup \{u\}$ for a given image $I$.

Typically, to properly deal with an open-set classification problem, three different kinds of data are employed:

  • Known data (train and test): images shot with models that the analyst must correctly detect and classify.

  • Known-unknown data (optional; train and test): images shot with models available at training time but assumed as unknown in order to model unknown camera models at algorithm validation time. Those data might or might not be available.

  • Unknown-unknown data (test only): images shot with models outside the known set and not used for either training or validation, used to properly evaluate a method’s performance in the wild. These data only appear for classification once the classifier is trained.

Open-set classification is by far the most complete problem formulation of the overall camera model identification problem. In this paper, we present an algorithmic pipeline to solve this problem, deeply analyzing each building block of the algorithm in all combinations of the alternatives.

Previous works in open-set camera model identification have not fully evaluated the multiclass open-set classification problem. Bayar2018 have considered the performance of the classification methods for detecting known vs. unknown and, independently, the closed-set classification performance among the classes. In this latter evaluation, the classifiers work in a closed-set scenario, i.e., they never predict a test instance as unknown. As we shall see in Section V, the results we have obtained in our study are relatively lower in terms of accuracy compared to the ones reported in their work. This happens because our evaluated classifiers always have the ability to predict test instances as unknown at the same time as choosing which known class they belong to, in case they are considered as belonging to one of the known ones. It is worth considering that the accuracy in a problem as described in Section II-C tends to be smaller than when considering, independently, the detection accuracy and the closed-set accuracy of the methods without the option for rejection, as in the open-set classification problem the classification methods can make the following types of errors: misclassification, false unknown, and false known [MendesJunior2017].

III Evaluation Pipeline

In this section, we provide all the details about the factors we evaluate in this work. We first provide an overview of the overall algorithmic pipeline. Then, we focus on each separate block of it, reporting information about all methodologies employed in this paper.

III-A Pipeline

To solve open-set camera model attribution, we study the possibility of exploiting a supervised classification strategy leveraging image descriptors tailored to capture camera-based traces proposed in the closed-set literature. Specifically, we follow the pipeline depicted in Figure 2, which is composed of three main modules: (i) a feature extractor, (ii) a training protocol for preparing training data, and (iii) an open-set classifier. For each module, we investigate the possibility of using different strategies.

Fig. 2: Supervised pipeline for open-set camera model identification. At training time (top), a classifier learns how to associate features $\mathbf{x}$ extracted from labeled images to their labels $y$. At testing time (bottom), the learned model is used to estimate the label $\hat{y}$ of the image under analysis.
(a) \glsclosed strategy.
(b) \glsopen strategy.
(c) \glstextnetworkopen strategy.
Fig. 3: Different training strategies: \subreffig:closed \glsclosed, \subreffig:open \glsopen, and \subreffig:netopen \glstextnetworkopen. The full set contains all available camera models, whereas the known set $\mathcal{K}$ only contains known models. Green regions represent data for training the \glscnn employed for feature extraction. Striped regions represent known-unknown camera models employed for the classifiers’ parameter search and also for training the final model in the case of \subreffig:open. Orange regions represent unknown-unknown camera models that appear only at testing time. Blank regions are never employed, for a fair comparison among strategies. Although not represented, the test set also contains instances from known camera models, for proper evaluation of the classifiers.

Feature extraction consists in computing a discriminative feature vector $\mathbf{x}$ from an image $I$. The feature extraction algorithm is tuned to obtain characteristic camera model information while compacting data dimensionality. Feature vectors extracted from pictures sharing the same camera model should be similar. Conversely, feature vectors extracted from images shot with different models should be, ideally, strongly dissimilar.

Open-set classifiers, as we shall see, tend to associate a bounded region of the feature space to the known classes. A recent work [MendesJunior2017] has shown that the split of training data for parameter search can influence the final model obtained by an open-set classifier. The training protocol splits the training data into fitting data and validation data for parameter search, as depicted in Figure 2. This is a delicate step, as a good open-set classifier must “learn” its parameters taking into account the risk of the unknown, not just the empirical risk measured on known data [Scheirer2013]. In essence, prominent alternatives at this stage employ part of the known training data as known-unknown data, as a form of simulation of the unknown.

The role of an open-set classifier is to learn a mapping between feature vectors $\mathbf{x}$ and camera labels $y$. This mapping is learned at training time by observing several different pairs $(\mathbf{x}, y)$ for many different images and label values. The open-set classifier partitions the space spanned by all possible vectors $\mathbf{x}$, associating different regions of the feature space to different labels $y \in \mathcal{K} \cup \{u\}$.

Once the system has been fully trained, it can be deployed. Whenever a new image under investigation is considered, a feature vector $\mathbf{x}$ is extracted. The open-set classifier model is then employed to assign the vector one class label $\hat{y} \in \mathcal{K} \cup \{u\}$.

III-B Feature Extractors

Different feature extractors for camera-related features have been proposed in the literature. We decided to focus on recently proposed ones that have shown good performance in closed-set camera model attribution setups.

III-B1 Rich features

Fridrich2012 have proposed the use of statistical descriptors known as rich features for steganalysis. Rich features are obtained by pre-processing an image through high-pass filtering, quantization, and truncation. The rich feature vector is then computed by counting the occurrences of different pixel group combinations. The use of rich features has subsequently proved successful for other forensic applications, from tampering detection [Cozzolino2014] to camera model attribution [Marra2015]. We denote as \glsrich the rich feature vector referred to as SPAM by Marra2015 for camera model identification. It has already proved to be more discriminative than the features proposed by Gloe2012a, Xu2012, and Celiktutan2008, as shown by Marra2015.

III-B2 CFA features

As shown by Chen2015a, the concept of rich features can be extended to work across different image color planes. Chen2015a have shown that it is possible to capture characteristics related to color filter arrays (CFA) for camera model identification. For this reason, we denote as \glscfa the CFA-based feature vector proposed by Chen2015a. As shown by Bondi2017, this can be considered a baseline solution, especially when large images are concerned.

III-B3 \glscnn-derived features

We adopt as a data-driven method the \glscnn proposed by Bondi2017, with an architecture comprising four convolutional layers followed by two inner product layers. It has been successfully applied to attribute images to different camera models using patches as input. In principle, the output of each \glscnn layer can be employed as a feature vector. We employ three layers in this work: (i) \glsconv, obtained after the last convolutional layer; (ii) \glsip1, obtained after the first inner product layer; and (iii) \glsip2, obtained after the second inner product layer, whose dimensionality is the cardinality of the set of known cameras (18 in our experiments, as in the work of Bondi2017). A minimal sketch of extracting such intermediate-layer features is given below.
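The following sketch, assuming a PyTorch implementation, illustrates how an intermediate layer's activations can be read out as a feature vector via a forward hook. The architecture below is a simplified placeholder rather than the exact network of Bondi2017; only the layer names ip1 and ip2 mirror the paper's notation, and the patch size is a toy value.

```python
# Minimal sketch of reading an intermediate CNN layer as a feature vector.
# The architecture is a simplified placeholder, NOT the exact network of
# Bondi et al.; only the layer names (ip1, ip2) mirror the paper's notation.
import torch
import torch.nn as nn

class ToyCamNet(nn.Module):
    def __init__(self, num_known=18):
        super().__init__()
        self.features = nn.Sequential(                      # stands in for the conv layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.ip1 = nn.Linear(64 * 4 * 4, 128)                # first inner-product layer
        self.ip2 = nn.Linear(128, num_known)                 # second inner-product layer

    def forward(self, x):
        x = self.features(x).flatten(1)
        x = torch.relu(self.ip1(x))
        return self.ip2(x)                                   # one score per known model

model = ToyCamNet().eval()
captured = {}
model.ip1.register_forward_hook(lambda m, i, o: captured.update(ip1=o.detach()))

patch = torch.randn(1, 3, 64, 64)                            # one color patch (toy size)
with torch.no_grad():
    _ = model(patch)
feature_vector = captured["ip1"].squeeze(0)                  # ip1 activations as features
print(feature_vector.shape)                                  # torch.Size([128])
```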

III-C Training Protocols

To train open-set classifiers, a set of hyper-parameters must be tuned through some method of parameter search to maximize classification accuracy and the generalization/specialization capabilities of the employed method. A typical way to do this consists in splitting training data into fitting and validation data. The selected classifier is then trained on fitting data using different sets of hyper-parameters. Finally, the parameter set whose model provides the highest accuracy on the validation data is selected. The final model is generated on the entire training set with those parameters, and results are reported on images belonging to a completely separate (independent) test dataset. In this work, we explore three different training strategies for open-set classifiers. The introduction of this stage in the pipeline was inspired by the work of MendesJunior2017, which pointed out their parameter optimization as a general form of grid search for future investigation. In Figure 3, we depict those alternatives as described below.

III-C1 \glsclosed strategy

Depicted in Figure 3(a), this is the simplest training strategy, in which no knowledge on the unknown classes is simulated. Indeed, both fitting and validation datasets contain samples from all known classes (i.e., camera models), and no instance from known-unknown data is used in validation. In other words, parameter search is performed simulating a closed-set setup. This means that the classifier will set the boundaries for each class in the feature space taking into account only the empirical risk aiming at optimizing the separability of the known classes.

III-C2 \glsopen strategy

Depicted in Figure 3(b), in order to let the classifier better tune against unknown samples, a straightforward strategy consists in training the classifier on known data and tuning it considering the presence of both known and known-unknown samples. When the open strategy is selected, a subset of the classes is employed as known and the remaining ones are employed as known-unknown in validation. The classifier fitting procedure is carried out on the classes kept as known; however, validation during parameter search is carried out on all classes, i.e., known and known-unknown camera models. In doing that, parameter search is performed simulating an open-set setup, as shown in the sketch below. After the best parameters are obtained, the final model is trained with all known training classes to provide a fair comparison with the \glsclosed strategy, i.e., the same number of classes to correctly detect is employed.
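The sketch below, a minimal and hypothetical Python implementation of this split, relabels the held-out classes as unknown during parameter search only and then refits on all known classes. The names `make_classifier` and `score_fn` are placeholders for any open-set classifier factory and any validation metric (e.g., NA); they are not part of the paper's code.

```python
# Minimal sketch of the "Open" protocol: some known classes are relabeled as
# unknown during parameter search only; the final model is refit on all known
# classes with the selected hyper-parameters.
import numpy as np

UNKNOWN = -1  # label reserved for "unknown"

def open_protocol_search(X, y, param_grid, held_out_classes, make_classifier, score_fn):
    fit_mask = ~np.isin(y, held_out_classes)
    y_val = np.where(fit_mask, y, UNKNOWN)        # held-out classes simulate the unknown

    best_params, best_score = None, -np.inf
    for params in param_grid:
        clf = make_classifier(**params)
        clf.fit(X[fit_mask], y[fit_mask])         # fit only on the classes kept as known
        score = score_fn(y_val, clf.predict(X))   # validate on known + known-unknown
        if score > best_score:
            best_params, best_score = params, score

    final = make_classifier(**best_params)
    final.fit(X, y)                               # final model uses all known classes
    return final, best_params
```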

III-C3 \glstextnetworkopen strategy

Depicted in Figure 3(c), the \glsnetworkopen strategy employs unknown data—from the point of view of the network used for feature extraction—as known-unknown data for validation. When dealing with data-driven features (i.e., those extracted using a \glscnn), special attention must be paid to the fact that the \glscnn, as a feature extractor, must also be trained and validated on the known classes in order to enable discrimination within the set of known camera models.

This strategy considers that the \glscnn has been separately trained using all available known classes. The validation set employed during the \glscnn training process comes from the set of known classes—as it also happens with the \glsopen and \glsclosed strategies. For \glsnetworkopen, to better guide the choice of the classifiers’ parameters, in addition to the known classes, the validation set also includes samples from extra known-unknown classes, i.e., classes never employed for \glscnn training or validation. Parameter search of the classifiers is carried out using all known data along with those extra known-unknown data. Finally, when the hyper-parameters have been selected, the final model training is performed using just the known classes, for a paired experiment with the other strategies. In doing that, parameter search is performed simulating an open-set setup also from the point of view of the network.

This approach is appropriate for use with \glscnn-derived features; however, for the sake of fairness, those extra classes, which are known-unknown from the point of view of the network, are also employed in experiments with the hand-crafted \glsrich and \glscfa features when the \glsnetworkopen strategy is applied.

III-D Open-set Classifiers

In the open-set scenario, a classifier should be able to assign one or more bounded regions of the feature space to each known class. In contrast, closed-set classifiers simply split unbounded portions of the feature space among the known classes. This concept is illustrated in Figure 4.

(a) Closed-set classifier
(b) Open-set classifier
Fig. 4: Illustration of a three-class classifier in \subreffig:closed_classifier closed- and \subreffig:open_classifier open-set configurations, considering a two-dimensional feature space. The open-set classifier partitions the feature space with bounded regions referring to the known classes in order to enable rejection of unknown classes. In contrast, a closed-set classifier tends to classify any point in the feature space as belonging to one of the known classes.

In this work, we employ for evaluation multiple open-set classifiers available in the literature. \Glssvm have been applied in the literature to solve various classification problems, including open-set ones in recent works. Traditional \glssvm can be straightforwardly employed for open-set problems by means of the one-vs-all [Rocha2014] multiclass-from-binary approach [MendesJunior2017]: when a feature vector is classified as negative by all binary \glsplsvm that compose the multiclass classifier, it is rejected as unknown (see the sketch below). Alternatively, \glsocsvm can also be easily used in open-set setups, as it focuses on carving a decision boundary around known classes, so points related to unknown classes can be rejected. The same all-negative criterion can be employed for any one-class classifier [Heflin2012, Pritsos2013]. Additionally, other methods derived from \glssvm have been proposed in the literature specifically for open-set problems. In this work, we considered the \glswsvm [Scheirer2014], \glsdbc [Costa2012, Costa2014], \glsssvm [MendesJunior2018b], and \glspisvm [Jain2014].
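The following minimal sketch shows the all-negative rejection rule with a one-vs-all SVM; integer class labels are assumed and the hyper-parameters are illustrative, so this is not the exact configuration used in our experiments.

```python
# Minimal sketch of the all-negative rejection rule with a one-vs-all SVM:
# a sample is rejected as unknown when every binary SVM scores it negatively.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

UNKNOWN = -1

class OneVsAllRejectSVM:
    def __init__(self, C=1.0):
        self.ovr = OneVsRestClassifier(LinearSVC(C=C))

    def fit(self, X, y):
        self.ovr.fit(X, y)
        return self

    def predict(self, X):
        scores = self.ovr.decision_function(X)        # one score per known class
        labels = self.ovr.classes_[scores.argmax(axis=1)]
        labels[scores.max(axis=1) < 0] = UNKNOWN      # all binary scores negative -> reject
        return labels
```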

In addition to these \glssvm-based approaches, we also consider the \Glsosnn classifier proposed by MendesJunior2017. This is a recently proposed technique that extends the classic nearest neighbors approach. The main rationale behind this method is to avoid relying on raw similarity scores for thresholding; rejection of unknown instances is instead accomplished through the ratio of similarity scores (the idea is illustrated below). Furthermore, we also consider the classifiers employed by Bayar2018, i.e., \glset [Geurts2006], \glspsvm, \glssoftmax, and \glsncm. Also, following suggestions in previous work [Bayar2018], we employ a \gls2psvm, which consists of an \glsocsvm that solves the known vs. unknown problem; then, if the test instance is classified as known, a \glspsvm is employed to choose the class, otherwise the image is attributed to an unknown model.
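The sketch below illustrates the ratio-based rejection idea in a simplified nearest-class-mean variant; it is not the exact OSNN algorithm of MendesJunior2017, and the threshold value is illustrative.

```python
# Illustrative sketch of ratio-based rejection: the distance to the nearest
# class is compared with the distance to the second-nearest class, and
# ambiguous ratios lead to rejection. Simplified variant, not the exact OSNN.
import numpy as np

UNKNOWN = -1

def ratio_reject_predict(X, class_means, class_labels, ratio_threshold=0.8):
    preds = []
    for x in np.atleast_2d(X):
        d = np.linalg.norm(class_means - x, axis=1)   # distance to each class mean
        order = np.argsort(d)
        ratio = d[order[0]] / (d[order[1]] + 1e-12)
        # if the nearest class is not clearly closer than the runner-up, reject
        preds.append(class_labels[order[0]] if ratio <= ratio_threshold else UNKNOWN)
    return np.array(preds)
```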

IV Experimental Setup

In this section, we provide details regarding the employed datasets and evaluation metrics.

IV-A Datasets

To evaluate all tested methodologies thoroughly, it is important to consider a large enough image database. In this work, we merged three different datasets freely available from previous work.

IV-A1 \glsdresden [gloe2010dresden]

This dataset contains images from a large variety of camera models. Exactly as in the work of Bondi2017, we selected the images from 18 models (Nikon D70 and Nikon D70s are considered as a single class due to the negligible differences between them, as reported by gloe2010dresden) as the set of images from known camera models. This set was split into training, validation, and test sets [Bondi2017]. The training set was used to train the \glscnn-based feature extractor and all classifiers. All images from the remaining models—not considered in the subset of models previously selected—have been considered as known-unknown along with the \glsnetworkopen strategy and ignored for both the \glsclosed and \glsopen strategies.

IV-A2 \glsisa

This dataset contains images from several camera models. All images from models not overlapping with the Dresden Image Database have been selected as unknown-unknown models for the test set in the open-set experiments. (Available at: http://www.recod.ic.unicamp.br/~filipe/dataset [Costa2014].)

IV-A3 \glsflickr

This dataset comprises images from a large number of camera models. Differently from the previously mentioned datasets, these images have been downloaded from the Flickr image hosting service (https://www.flickr.com). To avoid dealing with images from the same camera taken at different resolutions, only images at the maximum resolution for each model have been selected. All images have been considered as belonging to unknown-unknown camera models in the test set for the open-set experiments.

As done by Bondi2017, we extract, in a content-aware way, non-overlapping pixel patches from each image. Reported results are based on majority voting over per-patch classifications (a sketch of this aggregation is given below). All patches coming from the same image have been carefully placed into only one of the training, validation, and test sets in order to avoid overfitting problems and training/testing contamination.
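A minimal sketch of this per-image aggregation follows: per-patch predictions are grouped by their source image and the most voted label wins, so a majority of rejections yields an unknown image. The names are illustrative.

```python
# Minimal sketch of the per-image decision by majority voting over patches.
from collections import Counter, defaultdict

UNKNOWN = -1

def aggregate_by_image(image_ids, patch_predictions):
    per_image = defaultdict(list)
    for img_id, pred in zip(image_ids, patch_predictions):
        per_image[img_id].append(pred)
    # majority vote per image; a majority of rejections yields an unknown image
    return {img_id: Counter(preds).most_common(1)[0][0]
            for img_id, preds in per_image.items()}

# Example: image "a" is attributed to model 2, image "b" is rejected as unknown.
print(aggregate_by_image(["a", "a", "a", "b", "b"], [2, 2, UNKNOWN, UNKNOWN, UNKNOWN]))
```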

IV-B Metrics

As evaluation metrics, we employ a set of commonly used ones, as well as others recently proposed for the open-set scenario [MendesJunior2017, Bayar2018]. In particular, we consider different definitions of accuracy and f-measure. Concerning accuracy, we employ the following definitions:

IV-B1 \glsaks [MendesJunior2017]

This is the accuracy in correctly attributing images from known models to the actual models. This metric encompasses two kinds of misclassification errors: known-model images attributed to unknown class (false unknown) and known-model images attributed to wrong known classes (misclassification).

IV-B2 \glsaus [MendesJunior2017]

This is the accuracy in correctly classifying as unknown the images from unknown camera models.

IV-B3 \glsna [MendesJunior2017]

This is the average between \glsaks and \glsaus and provides an overall view of a classifier’s performance in terms of both open- and closed-set scenarios.

IV-B4 \glsda [Bayar2018]

This averages the percentage of images from known cameras detected as coming from known models, and the percentage of images from unknown cameras detected as coming from unknown models. This metric does not take into account whether images from known cameras are misclassified to the wrong camera model.
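A minimal sketch of these accuracy metrics, written directly from their textual definitions above, is the following; it assumes both known and unknown samples are present in the test set and uses a reserved label for the unknown class.

```python
# Minimal sketch of the accuracy metrics: AKS/AUS on known/unknown samples,
# NA as their average, and DKS/DUS/DA for the known-vs-unknown decision only.
import numpy as np

UNKNOWN = -1

def open_set_accuracies(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    known = y_true != UNKNOWN
    aks = np.mean(y_pred[known] == y_true[known])   # correct model for known samples
    aus = np.mean(y_pred[~known] == UNKNOWN)        # correct rejection of unknown samples
    dks = np.mean(y_pred[known] != UNKNOWN)         # known samples detected as known
    return {"AKS": aks, "AUS": aus, "NA": (aks + aus) / 2.0,
            "DKS": dks, "DUS": aus, "DA": (dks + aus) / 2.0}
```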

Concerning f-measure, an additional comment is in order. Traditionally, f-measure is defined in terms of precision $P$ and recall $R$ as

$F = 2 \cdot \frac{P \cdot R}{P + R}$.

Depending on the definitions of precision and recall employed, we obtain different f-measure definitions. MendesJunior2017 has pointed out that it might be inappropriate to consider the unknown classes as any other known class in terms of \glstp, \glsfp, and \glsfn calculations. Therefore, considering $N$ the number of known camera models, and the $(N+1)$-th class representing the unknown classes, we resort to the following f-measure definitions:

IV-B5 \GlsosfmM [MendesJunior2017]

F-measure using precision and recall defined as

$P_M = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{TP}_i}{\mathrm{TP}_i + \mathrm{FP}_i}, \qquad R_M = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{TP}_i}{\mathrm{TP}_i + \mathrm{FN}_i}.$ (1)

IV-B6 \Glsosfmm [MendesJunior2017]

F-measure using precision and recall defined as

$P_\mu = \frac{\sum_{i=1}^{N} \mathrm{TP}_i}{\sum_{i=1}^{N} \left(\mathrm{TP}_i + \mathrm{FP}_i\right)}, \qquad R_\mu = \frac{\sum_{i=1}^{N} \mathrm{TP}_i}{\sum_{i=1}^{N} \left(\mathrm{TP}_i + \mathrm{FN}_i\right)}.$ (2)

IV-B7 \GlsfmM [Sokolova2009]

F-measure using precision and recall defined as in Equation (1), but with the sum index spanning $1, \dots, N+1$, i.e., also including the class that represents the unknown.

IV-B8 \Glsfmm [Sokolova2009]

F-measure using precision and recall defined as in Equation (2), but with the sum index spanning $1, \dots, N+1$, i.e., also including the class that represents the unknown.

The main difference between the traditional and the open-set versions of f-measure is that the latter does not consider the effect of the unknown class in terms of \glstp, as the unknown cannot represent a single positive class. Indeed, the sum index spans the range $1, \dots, N$ rather than $1, \dots, N+1$, thus excluding the label representing the unknown classes. However, both \glsosfmM and \glsosfmm account for false known and false unknown through \glsfp and \glsfn, respectively, in Equations (1) and (2).
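A minimal sketch of the open-set f-measures follows, written from Equations (1) and (2): per-class TP/FP/FN counts are accumulated over the N known classes only, so the unknown label never acts as a positive class of its own.

```python
# Minimal sketch of the open-set macro- and micro-averaging f-measures.
import numpy as np

UNKNOWN = -1

def open_set_f_measures(y_true, y_pred, known_labels):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp, fp, fn = [], [], []
    for c in known_labels:                          # sum index spans known classes only
        tp.append(np.sum((y_pred == c) & (y_true == c)))
        fp.append(np.sum((y_pred == c) & (y_true != c)))
        fn.append(np.sum((y_pred != c) & (y_true == c)))
    tp, fp, fn = map(np.asarray, (tp, fp, fn))

    # macro-averaging: average per-class precision and recall, then combine
    prec_m = np.mean(tp / np.maximum(tp + fp, 1))
    rec_m = np.mean(tp / np.maximum(tp + fn, 1))
    osfm_macro = 2 * prec_m * rec_m / max(prec_m + rec_m, 1e-12)

    # micro-averaging: pool the counts before computing precision and recall
    prec_u = tp.sum() / max(tp.sum() + fp.sum(), 1)
    rec_u = tp.sum() / max(tp.sum() + fn.sum(), 1)
    osfm_micro = 2 * prec_u * rec_u / max(prec_u + rec_u, 1e-12)
    return osfm_macro, osfm_micro
```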

V Results

We have evaluated all combinations of extracted features, training protocols, and classifiers described in the previous section. Results for each metric are reported in a complete and detailed table of all our experiments, provided as supplementary material. (See supplementary material available at https://tinyurl.com/ya85fr5h; the webpage will be transferred to Github upon acceptance.)

Results show that, overall, better performance is obtained with the \glspisvm, \glset, and \glsssvm classifiers. Regarding the training protocols, interestingly, \glsopen has presented slightly superior results compared to \glsnetworkopen, despite using less known-unknown data. Finally, \glsip1 presents the best result among the features, although \glsip2, in general, seems to be the most discriminative one. Hereinafter we report a subset of the obtained results in order to highlight the most interesting findings in terms of best feature set, training protocol, and classifier.

V-A Feature Extractors

To identify the feature vector most suitable for the open-set camera model identification problem, we analyzed the behavior of all features (i.e., \glsrich, \glscfa, \glsconv, \glsip1, and \glsip2) paired with different training strategies and classifiers. To summarize the achieved results, we rely on \glsna as the preferred analysis metric. As a matter of fact, \glsna clearly takes into account the ability of correctly classifying known samples at camera level as well as rejecting the unknown. Therefore, an algorithm with a high \glsna value is a good candidate to work for both known and unknown classes.

Table I reports the best \glsna achieved with each feature extractor. Specifically, it shows which combination of classifier and training strategy yields the achieved \glsna, as well as all the other metric values for the selected classifier. From this table, it is possible to notice that the best results are obtained by \glscnn-based features. In particular, \glsip1 achieves the best \glsna. This confirms the behavior observed by Bondi2017 for the closed-set scenario: hand-crafted features (i.e., \glsrich and \glscfa) perform better on high-resolution images, whereas the \glscnn is superior when trained on the small pixel patches considered in this work. Their explanation for the reduced accuracy of hand-crafted features on such small patches is that these features rely on co-occurrence statistics [Fridrich2012, Chen2015a], whose computation on small patches is less stable and reliable.

Feature Classifier Training Protocol Best \glsna \glsaks \glsaus \glsda \glsosfmM \glsosfmm \glsfmM \glsfmm
\glsip1 \glspisvm \glsopen
\glsip2 \glset \glsopen
\glsconv \glspisvm \glsnetworkopen
\glscfa \glsssvm \glsnetworkopen
\glsrich \glssvm \glsopen
TABLE I: Best results in terms of \glsna achieved with each feature extractor. For each metric, the highest results are reported in bold and the lowest ones are reported in italics.
Training Protocol Feature Classifier Best \glsna \glsaks \glsaus \glsda \glsosfmM \glsosfmm \glsfmM \glsfmm
\glsopen \glsip1 \glspisvm
\glsclosed \glsip1 \glsssvm
\glsnetworkopen \glsip2 \glssoftmax
TABLE II: Best results in terms of \glsna achieved with each training protocol. For each metric, the highest results are reported in bold and the lowest ones are reported in italics.
Classifier Feature Training Protocol Best \glsna \glsaks \glsaus \glsda \glsosfmM \glsosfmm \glsfmM \glsfmm
\glspisvm \glsip1 \glsopen
\glset \glsip2 \glsopen
\glsssvm \glsip1 \glsclosed
\glssoftmax \glsip2 \glsnetworkopen
\glsosnn \glsip2 \glsnetworkopen
\glssvm \glsip1 \glsopen
\glspsvm \glsconv \glsopen
\glsncm \glsip2 \glsopen
\glsocsvm \glsip2 \glsopen
\glsdbc \glsconv \glsopen
\gls2psvm \glscfa \glsclosed
\glswsvm \glscfa \glsopen
TABLE III: Best results in terms of \glsna achieved with each open-set classifier. For each metric, the highest results are reported in bold and the lowest ones are reported in italics.

It is interesting to notice how \glsaks and \glsaus are unbalanced for hand-crafted features. For instance, \glsrich and \glscfa show \glsaus considerably higher than \glsaks. This means that the classifier rejects many more images as unknown than it should. This makes these features not appealing for open-set problems, as the presence of unknown devices greatly hinders the closed-set classification capability of these features. The same behavior is also captured by the metrics based on f-measure. Conversely, \glsip1 is able to correctly classify unknown images with high accuracy (\glsaus) while also correctly attributing known-camera images to their model (\glsaks).

V-B Training Protocols

To evaluate the different training protocols, we considered \glsna as reference metric for the same reasons previously mentioned. Table II reports the best \glsna results for each protocol, also showing which feature and classifier is used to obtain the reported result. Also, the other metrics are then reported for each case.

It is possible to notice that the \glsopen strategy presents better results than the best result with \glsnetworkopen. In Table II, although the \glsclosed strategy presents better results than \glsnetworkopen, in general we have observed that \glsclosed tends to perform the worst. Also in a general evaluation, we observe that, in fact, \glsopen tends to perform slightly better than \glsnetworkopen.

It is worth highlighting one aspect about the \glsclosed strategy. Despite this strategy’s name, all classifiers employed along with it are open-set ones. Therefore, even if trained only considering known camera images, they still have the ability to reject new data as unknown (remember, from Section III-C, that the different training protocols refer only to the split of the training data). This explains why, even with the \glsclosed strategy, it is still possible to achieve a reasonably high \glsaus. Nevertheless, the \glsopen strategy reaches a noticeably higher \glsaus than the \glsclosed one.

Furthermore, considering all the measured combinations, classifiers trained with \glsopen obtained better results than their versions trained with \glsnetworkopen in most of the cases, which also indicates a slightly better performance for the \glsopen protocol. However, when \glsnetworkopen evinces better results, its margin of improvement tends to be larger than the margin by which \glsopen improves when it wins.

This is a counter-intuitive result, as \glsnetworkopen uses the same known data as the \glsopen strategy does, along with extra known-unknown data from the other \glsdresden classes not employed as known. The numbers regarding the difference between those two training protocols indicate some similarity in the representativeness of the two sets of training data. Therefore, those results indicate that simply having some known-unknown data, even though they are not unknown from the point of view of the network (\glsopen strategy), is enough to improve the performance compared to the traditional \glsclosed form. It means those extra data are not necessary, which also makes the training process cheaper.

Moreover, those results are good evidence that the representations of unknown instances are as distinct from the known classes as the representations of known-unknown instances are, from the point of view of the network; that is, once a trained network is employed for feature extraction, both kinds of instances are similarly distinct from the known instances. Those results are also in tune with the ones presented by Bondi2017: they performed a closed-set experiment with a distinct set of camera models not employed for network training and showed that the representations of those camera models are distinct enough to allow discrimination among them.

V-C Open-set Classifiers

To analyze the effect of different classifiers, Table III reports the best \glsna result obtained with each classifier, showing also the feature and training strategy used in each case. For each selected methodology, all other metrics are also reported.

From these results—as we saw in other tables as well—it is possible to see that \glspisvm performs better than its counterparts, achieving the best \glsna; however, the best \glsaks and \glsaus are obtained with \glsssvm and \glsocsvm, respectively. Results in Table III show many classifiers with reasonable performance: among the cases, \glset has obtained the best performance for the macro-averaging versions of the f-measure and \glsosnn presents the best results for the micro-averaging versions. \glsocsvm also outperforms the other methods based on \glsda, despite its high propensity to reject instances as unknown. Additionally, the \glsclosed protocol only appears as the best one for the \glsssvm and \gls2psvm classifiers; all other classifiers have the \glsopen or \glsnetworkopen variations as the best training protocol, with \glsopen appearing in most cases.

It is important to notice that \gls2psvm appears as one of the last methods in the ranking of Table III. This low performance of \gls2psvm can be justified by its implicit assumption that all known classes can be modeled as a single class. It does not take into account the fact that known classes can be sparse in the feature space and some intermediate regions among those classes can refer to the unknown, i.e., it is difficult to specialize on the known classes by means of a single model. Furthermore, the best \glsna result with \gls2psvm is obtained with the \glsclosed training protocol, which indicates that, even when a simulation of the open-set scenario is performed for parameter optimization, a one-class classifier is not able to handle the feature space well.

In general, we verify that the straightforward employment of an open-set classifier, as is, improves results in the open-set scenario compared to closed-set classifiers adapted for open-set recognition by means of rejection through thresholding of similarity scores. Further details regarding the comparison with those state-of-the-art solutions are presented in the next section.

V-D Comparison with State-of-the-art

Reference \glsna \glsaks \glsaus \glsda \glsosfmM \glsosfmm \glsfmM \glsfmm
\glssoftmax
\glset
\glsncm
\glspsvm
TABLE IV: Difference achieved by the best solution found through the pipeline considered in this work (\glspisvm) and the baselines. Results obtained for the \glsip1 feature for the corresponding methods implemented along with the \glsopen training protocol. The consistency of positive differences evinces the improvement of the found solution over the state-of-the-art methods.

To the best of our knowledge, the only work presenting results for the open-set camera model identification problem is that of Bayar2018. In particular, the authors propose two different approaches. The first one (Section V-D1) relies on confidence-score thresholding: when the classifier is not “sure” about its classification to a certain known class, the test instance is rejected as unknown. The second approach (Section V-D2) assumes known-unknown data is available for training a classifier that detects whether a test instance is known or unknown. For this approach, previous work has evaluated only the detection ability, although in a real open-set scenario a further decision would be required to choose the correct class in case an instance is detected as known.

V-D1 Approach 1

The first approach proposed by Bayar2018 works as follows. A multi-class classifier is trained with the \glsclosed training protocol (previous work [Bayar2018] evaluated neither the \glsopen nor the \glsnetworkopen protocol; to the best of our knowledge, our work evaluates them for the first time in this problem). This classifier is chosen in order to also provide a confidence score about the detected class. Instances providing a low confidence score are classified as unknown, as sketched below. For this class of methods, we implemented their solutions based on \glsfirstsoftmax, \glsfirstncm, \glsfirstpsvm, and \glset [Geurts2006].
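A minimal sketch of this confidence-thresholding rule follows; the threshold value is illustrative and the probabilities are assumed to come from any closed-set classifier (e.g., the softmax output of a network).

```python
# Minimal sketch of Approach 1: reject as unknown when the highest class
# probability of a closed-set classifier falls below a threshold.
import numpy as np

UNKNOWN = -1

def softmax_threshold_predict(probabilities, class_labels, threshold=0.5):
    """probabilities: (n_samples, n_known_classes) array whose rows sum to 1."""
    probabilities = np.asarray(probabilities)
    preds = np.asarray(class_labels)[probabilities.argmax(axis=1)]
    preds[probabilities.max(axis=1) < threshold] = UNKNOWN   # not confident -> unknown
    return preds
```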

Table IV reports the metric differences achieved by the best solution we have evaluated in the previous sections compared to the baselines, considering, for each method, the setup that maximizes \glsna. From this comparison, in general, it is possible to notice that the best solution found in our analysis is able to achieve better results than all strategies reported by Bayar2018. For most of the measures, and for each of the compared baselines, \glspisvm improves the accuracy.

(a) \glspisvm
(b) \glset
Fig. 5: Decision boundaries of the \glspisvm open-set classifier compared to competing \glset classifier.

We see in the same table that \glset, as employed by Bayar2018, is the most competitive method compared to a classifier specially designed for the open-set scenario (\glspisvm). Despite its high accuracy, we should analyze some theoretical properties of the classifier. For instance, consider the ability of bounding the region of the feature space in which a possible test instance would be classified as belonging to one of the known classes, i.e., bounding the \glsklos [Scheirer2013, MendesJunior2017]. Figures 5(a) and 5(b) depict the decision boundaries of the \glspisvm and \glset classifiers, respectively, in the feature space formed by the first two features of the \glsip2 layer. For those images, only training samples from the first 4 classes, out of the 18, were employed, to avoid cluttering the visualization. Small circles represent training samples. Colored regions indicate that a possible test instance there would be classified to the class of the same color. The white region represents rejection as unknown. In Figure 5, we observe that \glspisvm is able to bound the \glsklos, properly ensuring the rejection of any data point that would appear far from the support of the training samples in the feature space. However, by thresholding the probability score of the \glset classifier, the same property is not ensured. In general, we see that \glspisvm demonstrates a more controlled behavior.

V-D2 Approach 2

The second approach proposed by Bayar2018 works as follows. A binary classifier is trained to distinguish between images from known and unknown camera models; the objective here is to analyze only the detection ability. All samples from all known classes are merged into a single class labeled known. Extra data from other classes not of interest are employed as the unknown class for the binary classification (a minimal sketch is given below). As in the previous experiments, we consider the 18 classes of \glsdresden as the known classes of interest. For the extra known-unknown data, we employed the remaining classes of the \glsdresden dataset, as those classes were also employed along with the \glsnetworkopen training protocol. For this method, we implemented both solutions shown by Bayar2018, i.e., \glspsvm (notice, however, that Platt’s probability is not required in this context, as only the class decision matters) and \glset.
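The sketch below illustrates this setup under the stated assumptions; the linear SVM is an illustrative choice, not the exact classifier configuration of Bayar2018.

```python
# Minimal sketch of Approach 2: all known classes collapse into a single "known"
# label and known-unknown data plays the "unknown" role, so a binary classifier
# answers only the detection question.
import numpy as np
from sklearn.svm import LinearSVC

def train_known_vs_unknown(X_known, X_known_unknown):
    X = np.vstack([X_known, X_known_unknown])
    y = np.concatenate([np.ones(len(X_known)),            # 1 = known
                        np.zeros(len(X_known_unknown))])  # 0 = unknown
    return LinearSVC().fit(X, y)
```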

In Table V, we present the \glsda for the two baselines as well as for the \glspisvm solution, which has presented the best results throughout our experiments. Furthermore, \glsdks and \glsdus are also presented for a more in-depth evaluation of the performance of the classifiers. The \glsnetworkopen protocol was selected for those results because the baselines require extra known-unknown data for training, although \glspisvm has obtained its best performance along with the \glsopen strategy. \glsip2 is employed in this case because, as previously seen, it has comparable or better results than \glsip1 in general and, furthermore, the baselines have presented slightly better results with this feature compared to \glsip1.

Our results for this approach, as seen in Table V, are far from the ones reported by Bayar2018, as the baselines have almost no ability to reject instances as unknown. Our conclusion from those results is that relying solely on known-unknown data for training a classifier to distinguish, in the wild, known versus unknown classes is susceptible to a worst-case scenario. We conjecture that the known-unknown data employed for those classifiers makes them create a decision frontier in the feature space in such a way that most of the real unknown data (from the \glsisa and \glsflickr datasets) is accepted as known. If a different set of known-unknown data were employed in place of the unknown part of the \glsdresden dataset, we believe results might drastically differ. Taking an essentially distinct approach, \glspisvm along with the \glsnetworkopen training protocol does not rely solely on the known-unknown data for defining its decision boundary: instead, it minimizes the risk of the unknown also taking advantage of the inter-class information gathered from the known data [Jain2014].

Reference \glstextda \glstextdks \glstextdus
\glspisvm
\glspsvm
\glset
TABLE V: Difference achieved by the best solution found through the pipeline considered in this work (\glspisvm) and the baselines considering Approach 2 of Bayar2018. Results obtained for \glsip2 feature for the corresponding methods implemented along with \glsnetworkopen training protocol. For each metric, the highest results are reported in bold.

V-E Post-fusion Analysis

In the machine learning field, it is well known that jointly using a series of different models can help increase classification performance. This is known as ensemble learning [Sagi2018]. In light of this, here we present results achieved with a very simple yet effective ensemble fusion technique: majority voting among different models. Given a set of trained models, we test the image under analysis with all of them and perform majority voting on their outputs; if the majority of the votes is for rejecting as unknown, the image is classified as unknown. A minimal sketch of this fusion rule is given below.
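The following sketch implements the fusion rule just described, assuming each trained model has already produced its predicted label (possibly unknown) for the image under analysis.

```python
# Minimal sketch of the post-fusion rule: each trained model votes with its
# predicted label (possibly UNKNOWN) and the most voted label is the decision.
from collections import Counter

UNKNOWN = -1

def fuse_predictions(per_model_predictions):
    """per_model_predictions: one predicted label per trained model, for one image."""
    return Counter(per_model_predictions).most_common(1)[0][0]

# Example: three of five models agree on model 4, so the fused decision is model 4.
print(fuse_predictions([4, 4, UNKNOWN, 4, 1]))
```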

We considered all combinations obtained by fusing up to 8 single models achieving \glsna greater than 0.7. The top-three results are reported in Table VI. Notice that the selected features are always \glsip1 and \glsip2. Moreover, the top results include all three training protocols. The classifiers that appear among those selected solutions are \glspisvm, \glsssvm, \glsosnn, and \glset. These results confirm that by using post-fusion it is actually possible to further increase \glsna, and that only a few models are needed. This paves the way to the development of more complex ensemble methods for camera model identification.

Combination N. models \glsna
(\glsopen, \glsosnn, \glsip2), (\glsnetworkopen, \glspisvm, \glsip2), (\glsopen, \glsssvm, \glsip2), (\glsclosed, \glsssvm, \glsip1), (\glsopen, \glset, \glsip2), (\glsopen, \glspisvm, \glsip1)
(\glsnetworkopen, \glsssvm, \glsip1), (\glsopen, \glsosnn, \glsip2), (\glsopen, \glsssvm, \glsip2), (\glsclosed, \glsssvm, \glsip1), (\glsopen, \glset, \glsip2), (\glsopen, \glspisvm, \glsip1)
(\glsopen, \glsosnn, \glsip2), (\glsopen, \glsssvm, \glsip2), (\glsclosed, \glsssvm, \glsip1), (\glsopen, \glset, \glsip2), (\glsopen, \glspisvm, \glsip1)
TABLE VI: Top three post-fusion results in terms of \glsna by considering the alternatives with accuracy greater than 0.7 and the fusion of at most 8 models.

V-F Impact of an open-set solution

(a) Open-set solution with \glspisvm on \glsip1 features along with \glsopen training protocol.
(b) Closed-set solution as the output of the network, i.e., without the reject option.
Fig. 6: Confusion matrices comparing the best open-set results against the baseline closed-set \glscnn solution [Bondi2017], which is equivalent to the \glssoftmax method [Bayar2018] without the rejection threshold.

In Figure 6, we present two confusion matrices: one, in Figure 6(a), obtained by an open-set solution, and the other, in Figure 6(b), by the closed-set output of the neural network employed along this work. By comparing Figures 6(a) and 6(b), we observe that the ability to recognize instances of each individual model is affected in the open-set solution. That is expected, since an open-set solution can also make the mistake of rejecting known instances as unknown, i.e., false unknowns, while a closed-set solution can only make misclassifications. On the other hand, we clearly see the undesirable behavior of the closed-set solution of assigning every unknown instance to one of the known models, i.e., 0% \glsaus or, from another perspective, 100% false known. The false known rate obtained by the open-set solution in this example is much lower. Anyhow, it is worth noticing that most open-set classifiers can be tuned to further decrease their false known rate, although at the expense of increasing their false unknown rate.

VI Conclusions

In this paper, we studied the use of a supervised-learning strategy for image camera model identification in an open-set scenario. In doing so, we explored the possibility of using multiple camera-related features proposed in the literature for closed-set camera model identification, however under the more challenging open-set regime. We considered pairing feature vectors with different open-set classifiers, also exploring the use of three alternative training protocols. All tests have been performed considering a selection of three independent image datasets freely available online, comprising a large number of images from more than 300 camera models.

In terms of training protocols, we found out that employing extra known-unknown classes, as in the \glsnetworkopen approach, in general does not help improve the performance of the classifiers compared to the simpler and cheaper employment of the \glsopen strategy. This result is interesting as it evinces that extra known-unknown classes, from the point of view of the network, are not required, as their impact is limited. It means one can successfully train any open-set classifier, along with an \glsopen training protocol, with only the data available for the known classes. A better intuition on this behavior requires a deeper study of the network’s representation of unknown classes not employed in network training, comparing it with the representation of each of the known classes employed for training the network; this remains as future work.

Another evidence of the limited usefulness of the known-unknown data, from the point of view of the network, was presented by employing a binary classifier for recognizing known versus unknown camera models: when a known-unknown set of data (from the unknown part of \glsdresden) is employed to train this classifier, its performance in detecting unknown camera models from the \glsisa and \glsflickr datasets is highly affected (Section V-D2). It also reinforces previous arguments in the open-set area that more theoretically sound and less data-reliant solutions should be developed for general open-set problems [Scheirer2013].

Our results have shown that appropriate means of dealing with the open-set camera model attribution problem should be sought in order to properly handle it, considering that a recently proposed open-set method [Jain2014], as is, obtains considerably improved results compared to the straightforward idea of thresholding the softmax probability of neural networks for rejection as unknown (Section V-D1). This problem with thresholding the softmax probability for open-set recognition has been evinced in one of our previous works, hence the current work also confirms the previously more theoretical perspective [MendesJunior2018b, Chapter 7].

For the open-set camera model identification problem, promising future research can be performed by investigating recently proposed alternatives to the softmax loss, e.g., the center loss [Wen2019], the angular softmax loss [Liu2017], etc., as the authors of those works have claimed improvements on the open-set face recognition problem.

References
