Bilateral Asymmetry Guided Counterfactual Generating Network for Mammogram Classification
Mammogram benign/malignant classification with only image-level labels is challenging due to the absence of lesion annotations. Motivated by the symmetric prior that lesions on one side of the breasts rarely appear in the corresponding areas on the other side, given a diseased image we can explore a counterfactual problem: how would the features have behaved if there were no lesions in the image? Answering this question helps identify the lesion areas. We derive a new theoretical result for counterfactual generation based on the symmetric prior. By building a causal model that entails such a prior for bilateral images, we obtain two optimization goals for counterfactual generation, which can be accomplished via our newly proposed counterfactual generative network. Our model is mainly composed of a Generative Adversarial Network and a prediction feedback mechanism; they are optimized jointly and prompt each other. Specifically, the former improves classification performance by generating counterfactual features used to calculate lesion areas, while the latter helps counterfactual generation through the supervision of the classification loss. The utility of our method and the effectiveness of each module are verified by state-of-the-art performance on INBreast and an in-house dataset, together with ablation studies.
Breast cancer is the leading cause of cancer death among women. Mammography-based Benign/Malignant Classification (BMC) is considered an effective means of early breast cancer diagnosis. Note that only images with lesions need benign/malignant classification: it is meaningless to assess the malignancy of healthy images, since they contain no lesions, and whether an image contains lesions can be parsed from clinical reports. Since the existence of lesions is a necessary condition for a malignant diagnosis, we are interested in benign/malignant classification for samples with lesions. Annotations of lesion areas require extra effort, such as bounding boxes of lesion areas [6, 18, 32, 24, 30] or binary masks for segmentation, which demand expert domain knowledge and are costly and difficult to obtain. Therefore, addressing BMC with only image-level labels is valuable for clinical application. The key to BMC with only image-level labels as supervision is to extract abnormal features for classification from a full mammogram image. Such abnormality can manifest as masses, calcification clusters, structural distortions, and associated signs such as skin retraction and skin thickening. However, high-intensity breast tissues in the 2D image (a projection of the 3D organ) may partially obscure the lesions, making the problem more challenging.
To solve this problem, existing works mainly utilize specific rules or attention modules for feature selection, such as selecting the local features with the maximum response or largest prediction score, or selecting the most discriminative region via an attention branch supervised by a classification signal [7, 34]. The common problem of these methods lies in their failure to take advantage of mammogram domain knowledge, which can be very valuable for lesion localization.
One important piece of mammogram domain knowledge is "Anatomical Symmetry", which has been authenticated by the BI-RADS standard of the American College of Radiology. It states that a lesion area in the target image (the image of the side to be classified) rarely appears in the corresponding area of the reference image (the image of the opposite side); there is no lesion in the corresponding area on the other side, as shown in Fig. 1. Because of this prior, radiologists commonly compare bilateral breasts to find asymmetric regions for further diagnosis.
Such a prior naturally motivates a counterfactual generation question: what would the features of the target image have looked like had the lesions been removed, given the observed target image with lesions and the reference image that is lesion-free in the corresponding area? Once such counterfactual features are generated, the residue of the original target features minus the counterfactual ones incorporates the lesion information and hence can provide informative and interpretable guidance for BMC. We answer this question by constructing a structural causal model, in which counterfactual learning is well defined. Specifically, we propose a structural causal model (SCM) that introduces latent bilateral variables for generating bilateral images. To depict the bilateral symmetry, we further introduce a hidden confounder (including DNA, environment, etc.) that generates such bilateral features via the same causal mechanism, naturally leading to an inspiring conclusion: the counterfactually generated target features share the same distribution (i) with the reference features in lesion areas and (ii) with the target features in lesion-free areas, namely the counterfactual constraints. Based on this theoretical finding, we propose a novel Counterfactual Generation Network (CGN). Since pixel-to-pixel registration between bilateral images is challenging, due to spatial distortion during image capture and imperfect anatomical symmetry, we apply counterfactual generation at the feature level, motivated by . Moreover, this achieves faster training without losing prediction power, which is also why many domain adaptation methods work in feature space. Our CGN iteratively optimizes counterfactual generation under the counterfactual constraints and lesion-area estimation via an attention-based prediction feedback mechanism.
Both lesion-area estimation and counterfactual generation are optimized jointly and prompt each other, supervised by the classification loss. Finally, the residual features, which incorporate accurate lesion information, and the original target features, which encode contextual information, are concatenated for the final classification.
In contrast to existing GAN-based works [35, 28, 25] for counterfactual generation, our method is endowed with a theoretical guarantee regarding the counterfactual distribution by exploiting the symmetric prior. Specifically, AnoGAN learns the latent space of healthy data and assumes that lesions cannot be reconstructed within that latent space; areas with large reconstruction errors are therefore more likely to be lesions. Its performance relies heavily on how well the healthy data are modeled. However, in our mammogram application, the glandular structure and characterization of healthy images can be very diverse, and healthy patterns can even resemble lesions, as shown in Fig. 1. Thus it is challenging to model healthy patterns well and distinguish lesions at the same time using only healthy data. Another line of methods, based on a cycle consistency loss, targets lesion removal [35, 28]. Although these methods can utilize lesion information by learning a back translation (i.e., from the counterfactual to the original), they also suffer from the healthy-modeling problem in the forward translation (i.e., from the original to the counterfactual). Moreover, these methods all assume that the translated data can be translated back to the original data [13, 21]. In our application, this means the back-translation network should be able to model the location and appearance of the removed lesion. However, mammogram lesions can appear anywhere, i.e., their location is unpredictable. Therefore, it is an ill-posed problem to translate the counterfactual data back to the corresponding original data perfectly.
In this paper, we introduce the symmetry prior into counterfactual learning and propose a bilateral asymmetry guided counterfactual generating network (CGN) to improve mammogram classification. Instead of learning from healthy images, our CGN performs counterfactual generation conditioned on the bilateral information. Based on the symmetry prior, we formulate the generated counterfactual features and estimated lesion areas jointly through the counterfactual constraints: the counterfactual features should follow a similar distribution to the reference features in lesion areas and maintain most of the information of the target features in lesion-free areas. Therefore, we first apply a deep generator with the AdaIN mechanism to provide the feature generation ability. Then we design a prediction feedback mechanism to help estimate the lesion areas. Meanwhile, an adversarial reference loss, a feedback triplet loss, and an auxiliary negative embedding loss are proposed to encourage the generated features to satisfy the above counterfactual constraints. Lesion-area estimation and counterfactual generation are optimized jointly and prompt each other. Further, we obtain the residual features by computing the difference between the generated counterfactual features and the target features. Finally, we aggregate the residual features together with the target features for the final classification.
We evaluate the proposed method on a public dataset INBreast  and an in-house dataset. Our CGN achieves an area under the curve (AUC) of 91.1% on INBreast and 78.1% on the in-house dataset, which largely outperforms the representative methods. To summarize, our contributions are mainly three-fold:
First, for benign/malignant classification with only image-level labels, we propose a novel counterfactual-based method to learn the healthy features of the target image, which helps localize the lesions and thereby prompts further classification;
Second, we introduce the bilateral symmetry prior of mammogram (molybdenum-target) images into counterfactual generation, so that counterfactual features are learned reasonably and effectively;
Third, we achieve state-of-the-art performance for mammogram classification on both the public and in-house datasets.
II Related Work
II-A BMC with only image-level labels
Previous approaches that address BMC with only image-level labels, without any extra annotations, can be roughly categorized into two classes: (i) attention-based methods, e.g., Zhu et al. , Zhou et al.  and Fukui et al. ; (ii) simple multi-view fusion methods, e.g., Wu et al. . Methods in class (i) extend a response-based visual explanation model with an attention module or specific rules. However, they all ignore medical domain knowledge, which is valuable for BMC, and are fragile when facing dense breasts because they do not learn from bilateral information. For class (ii), since bilateral breasts are not pixel-to-pixel symmetric, simple multi-view fusion can be very sensitive to bilateral misalignment. Motivated by the above, we take advantage of domain knowledge and design CGN to improve BMC.
II-B Counterfactual Generation
Existing GAN-based models for counterfactual generation can be roughly categorized into two classes: (i) healthy-modeling methods, e.g., AnoGAN , and (ii) cycle-consistency-based methods, e.g., CycleGAN  and Fixed-Point GAN . Class (i), which learns to model the pattern of healthy data, suffers from unstable results due to the large diversity of glandular structure and characterization in healthy images, which are hence difficult to model. The other line of work, class (ii), uses a cycle consistency loss to perform bi-directional translation: forward translation (from the original to the counterfactual) and back translation (from the counterfactual to the original). These methods suffer from two problems: a) the healthy-modeling problem in the forward translation, similar to class (i); b) the ill-posed back translation, since the location and appearance of the removed lesion are diverse and unpredictable. In contrast to existing works, our method learns healthy patterns by exploiting the symmetric prior, thereby avoiding the problems above and achieving more robust counterfactual generation.
Problem Setup and Notations The goal of mammogram benign/malignant classification is to learn a classifier $f: \mathcal{X}^t \times \mathcal{X}^r \to \mathcal{Y}$ that predicts the disease label of the target side, where $\mathcal{X}^t$ ($\mathcal{X}^r$) denotes the input space of bilateral breast images, with $x^t$ denoting the target-side breast image and $x^r$ the image of the other, a.k.a. reference, side, and $y \in \mathcal{Y} = \{0, 1\}$ denotes the disease label of the target side (1 denotes malignant and 0 denotes benign). To achieve this goal, we are given training data $\{(x^t_i, x^r_i, y_i)\}_{i=1}^{n}$ (for some integer $n$). During the test stage, our goal is to predict $y$ for a new instance $(x^t, x^r)$.
III-A Counterfactual Learning
Symmetric Prior  For a pair of bilateral images, if the target image contains lesions, the corresponding symmetrical area in the reference image almost certainly contains no lesions.
This symmetric prior provides guidance for localizing lesion areas, as the residue of the target image's features after subtracting those of the same image with the lesions removed. Generating the latter, which can leverage the information of the reference features due to the symmetric prior, is a counterfactual problem: what would the features of the target image have looked like had the lesions been removed, given the observed target image with lesions and the reference image that is lesion-free in the corresponding area? Such a counterfactual problem has been well defined and explored in the framework of the (Structural) Causal Model (SCM) , which describes the generating process of observational variables, with assumptions entailed in the corresponding causal graph.
To describe bilateral images, we propose an SCM that introduces a hidden common factor (denoted as $C$, which can refer to DNA, growth environment, etc.) generating the bilateral variables, which depicts our symmetric prior, as shown in Fig. 2 (a). Besides, our SCM incorporates bilateral latent features, denoted as $f^t$ and $f^r$ ($t$ denotes the target side and $r$ the reference side), as abstractions/concepts of the bilateral images. These bilateral features are affected by $C$ and by the disease status, which is in turn determined by the lesion status. The distribution of these variables is assigned by the following structural equations:
Equipped with such an SCM, we can mathematically formulate the symmetric prior, with $A$ denoting the lesion areas of the target image $x^t$, and formulate the counterfactual generation problem as the value the target features would have taken had there been no lesions. Since this situation is induced by the factual event (the observed target image with lesions and the reference image), our counterfactual distribution is the distribution of the lesion-removed target features conditioned on that factual event. Under our SCM and the symmetric prior, we have the following results for counterfactual generation:
The proof of Theorem III.1 is given in our appendix. The theorem implies that the generated counterfactual features should be equal (i) to the reference features in lesion areas and (ii) to the target features in lesion-free areas, which leads to the following two goals for counterfactual generation:
where $d(\cdot,\cdot)$ denotes a generalized distance measure, e.g., the KL divergence. With such counterfactual learning, it is expected that the lesion areas, obtained by subtracting the counterfactual generation (with lesions removed) from the original target features, can be detected precisely, leading to accurate classification performance. To achieve the above two goals, we propose a counterfactual generating network (CGN), which cooperatively localizes the lesion areas and achieves counterfactual generation simultaneously. We explain the CGN in detail in the subsequent section.
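The two goals above can be illustrated with a minimal numpy sketch, using squared error as a stand-in for the generalized distance $d(\cdot,\cdot)$ and a binary mask for the lesion areas; the function name and single-channel shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def counterfactual_goals(f_cf, f_target, f_ref, lesion_mask):
    """Toy illustration of the two counterfactual goals: the counterfactual
    features should match the reference features inside lesion areas, and
    the target features in lesion-free areas.

    f_cf, f_target, f_ref: (H, W) feature maps (one channel for brevity).
    lesion_mask: (H, W) binary map, 1 inside the estimated lesion areas.
    """
    # Goal 1 (Eq. 4): distance to the reference features inside lesion areas.
    d_lesion = np.sum(lesion_mask * (f_cf - f_ref) ** 2)
    # Goal 2 (Eq. 5): distance to the target features in lesion-free areas.
    d_free = np.sum((1 - lesion_mask) * (f_cf - f_target) ** 2)
    return d_lesion, d_free
```

A perfect counterfactual, equal to the reference inside the mask and to the target outside it, drives both distances to zero.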
III-B Counterfactual Generating Network (CGN)
As illustrated in Fig. 3, our counterfactual generation network for mammogram classification contains the following steps: (i) generation of the target and reference features $f^t$ and $f^r$ from images $x^t$ and $x^r$ via a feature extractor chosen from a backbone network, e.g., AlexNet  or ResNet ; (ii) a counterfactual generation module that generates counterfactual features from both $f^t$ and $f^r$; (iii) a classification module that predicts malignant/benign, with the aggregated target and residual features as input. To accurately identify the lesion areas for generation in step (ii), a prediction feedback mechanism and a set of counterfactual constraints motivated by Eq. (4) and (5) are designed. In what follows, we explain these mechanisms in more detail.
Counterfactual Generation Module The Adaptive Instance Normalization (AdaIN) , which has proven effective for style transfer tasks, is adopted in the generator (as shown in Fig. 3) for counterfactual generation, with $f^t$ as content and $f^r$ as style in our case:
with $\mu(\cdot)$ and $\sigma(\cdot)$ denoting the mean and standard deviation functions. As suggested by , an interpolation of $f^t$ and $\mathrm{AdaIN}(f^t, f^r)$ is fed into a generator network containing nine residual blocks to generate the counterfactual features:
where $\lambda$ is a hyper-parameter, the interpolation weight.
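The AdaIN step and the interpolated generator input can be sketched as follows; this is a minimal numpy illustration of the standard AdaIN formula with channel-wise statistics, where `lam` stands for the interpolation weight and the function names are hypothetical.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: align the channel-wise mean and
    standard deviation of `content` (target features) to those of `style`
    (reference features). Shapes: (C, H, W)."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu

def generator_input(f_target, f_ref, lam=0.5):
    """Interpolate the target features with their AdaIN-normalized version
    before feeding the residual-block generator; `lam` plays the role of
    the interpolation weight hyper-parameter."""
    return lam * f_target + (1 - lam) * adain(f_target, f_ref)
```

With `lam=1.0` the generator input degenerates to the target features themselves, so the weight controls how much reference "style" is injected.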
Classification Module The residual features (entailing lesion information), obtained by subtracting the counterfactual features from the target features, and the original target features (carrying additional contextual information, which has been shown useful for medical image inference  besides the lesion-related information), are fed into a classifier in a concatenated way. This classifier, which implements a convolutional block as a FusionLayer to obtain the fused features, is trained via the commonly used cross-entropy loss:
where $p$ denotes the classification probability.
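The residual-plus-target aggregation can be sketched as below; for brevity a global-average-pooled linear head with hypothetical parameters `w`, `b` stands in for the convolutional FusionLayer, so this is an assumption-laden illustration rather than the paper's architecture.

```python
import numpy as np

def classify(f_target, f_cf, w, b):
    """Sketch of the classification module: residual features (target minus
    counterfactual, entailing lesion information) are concatenated with the
    target features, pooled, and scored by a linear head (a stand-in for
    the convolutional FusionLayer). w: (2C,), b: scalar."""
    residual = f_target - f_cf                    # lesion-related signal
    fused = np.concatenate([residual, f_target])  # channel-wise concat
    pooled = fused.mean(axis=(1, 2))              # global average pooling
    logit = pooled @ w + b
    return 1.0 / (1.0 + np.exp(-logit))           # malignant probability

def cross_entropy(p, y):
    """Binary cross-entropy for a label y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```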
Prediction Feedback Mechanism This mechanism estimates the lesion areas for better counterfactual generation. Specifically, we use the attention map, in which locations with higher values imply higher lesion probabilities, as the final estimate of the lesion areas. The attention map is calculated by normalization/softmax following the class activation map (CAM) , and gives the predicted probability of being a lesion at each position.
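A CAM-style attention map of this kind can be sketched as follows, assuming the usual CAM recipe (channel features weighted by the classifier weights of the predicted class, then a spatial softmax); the function name is illustrative.

```python
import numpy as np

def lesion_attention(feature_map, class_weights):
    """CAM-style lesion-area estimate: weight the feature channels by the
    classifier weights of the predicted class, then normalize spatially
    with a softmax so the map sums to one.
    feature_map: (C, H, W); class_weights: (C,)."""
    cam = np.tensordot(class_weights, feature_map, axes=1)  # (H, W)
    z = np.exp(cam - cam.max())  # stable spatial softmax
    return z / z.sum()
```

Positions with higher attention values are then treated as more likely to be lesions when feeding back into counterfactual generation.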
| Method | AUC (a) | AUC (b) | AUC (c) | AUC (d) |
|---|---|---|---|---|
| Pretrained CNN  | 0.690 | | | |
| Pretrained CNN + Random Forest  | 0.760 | | | |
| Vanilla AlexNet, Zhu et al.  | 0.790 | | | |
| Zhu et al.  | 0.890 | | | |
| Fixed-Point GAN * | 0.835 | 0.837 | 0.805 | 0.734 |
| Wu et al.  | 0.863 | 0.860 | 0.810 | 0.723 |
| Zhu et al. * | 0.860 | 0.862 | 0.830 | 0.720 |
Counterfactual Constraints Since the direct optimization of Eq. (4) and (5) can be intractable or unstable for a general distance measure such as the KL divergence, we adopt an adversarial learning strategy . For the optimization of Eq. (4), the GAN generates features similar to those of the whole reference image and can thus constrain the desired features to match the reference features in lesion areas. Specifically, a Discriminator (which learns to distinguish the reference features from the generated ones) and a Generator (which fools the discriminator) are designed and trained in a competing way:
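The competing objectives can be sketched with the standard non-saturating GAN loss pair; this is an illustrative stand-in under the assumption of probability-valued discriminator outputs, not necessarily the exact adversarial formulation used in the paper.

```python
import numpy as np

def d_loss(d_ref, d_cf):
    """Discriminator objective: push reference features toward 'real' and
    generated counterfactual features toward 'fake'. d_ref, d_cf are
    discriminator output probabilities in (0, 1)."""
    return -np.mean(np.log(d_ref) + np.log(1.0 - d_cf))

def g_loss(d_cf):
    """Non-saturating generator objective: fool the discriminator on the
    generated counterfactual features."""
    return -np.mean(np.log(d_cf))
```

The discriminator loss is small when it separates reference from counterfactual features well, while the generator loss drops as the counterfactual features become indistinguishable from the reference ones.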
However, the features generated through the GAN loss alone remain unconstrained in the lesion-free areas. For the optimization of Eq. (5), we use the prediction feedback mechanism to localize the lesion areas. One intuitive use of the feedback mechanism is to directly constrain the generated features to equal the target features in lesion-free areas, or to only constrain the generated features to equal the reference features in lesion areas inside the discriminator. However, motivated by , a triplet loss can be better than such designs, which suffer from slow convergence and easily fall into local minima; we analyze and evaluate such variants in Sec. IV-G. Thus, we propose a feedback triplet loss that minimizes the distance between the target features and the counterfactual features in lesion-free areas, measured by a target-counterfactual distance with a weighted mean square error:
where $H$ and $W$ denote the height and width of the CAM, respectively. Motivated by the minimization of the distance between the counterfactual and reference features enforced by Eq. (III-B), we choose a distance between the counterfactual and reference features as an adaptive reference term. This distance is measured by the chamfer distance  to tolerate misalignment, and is defined by
Therefore, the feedback triplet loss is defined as:
The triplet loss pulls the counterfactual features closer to the target features than to the reference features in the lesion-free areas, while the GAN loss pulls the counterfactual and reference features together in the lesion areas. Through the cooperation of the GAN loss and the triplet loss, the generated features satisfy Eq. (4) and (5). Besides, the adaptive reference term acts as a margin that avoids learning an identity mapping from the target features to the counterfactual ones during minimization. Catering for misalignment is not needed for the target-counterfactual distance, since the counterfactual features are generated for the target side and hence are perfectly pixel-wise aligned with the target features.
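The pieces of the feedback triplet loss can be sketched as below; the CAM down-weighting, a simplified chamfer distance over flattened feature values, and the triplet combination are all illustrative assumptions standing in for the paper's exact formulas.

```python
import numpy as np

def weighted_mse(f_cf, f_target, cam):
    """Target-counterfactual distance: mean squared error down-weighted
    inside lesion areas via the CAM, so it mainly acts on lesion-free
    regions. f_cf, f_target, cam: (H, W)."""
    w = 1.0 - cam / (cam.max() + 1e-8)
    return np.sum(w * (f_cf - f_target) ** 2) / w.sum()

def chamfer(f_cf, f_ref):
    """Simplified symmetric chamfer distance between the flattened feature
    values of the counterfactual and reference maps, tolerant to spatial
    misalignment (a stand-in for the paper's chamfer-distance term)."""
    a, b = f_cf.reshape(-1, 1), f_ref.reshape(1, -1)
    d = (a - b) ** 2
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def feedback_triplet(f_cf, f_target, f_ref, cam):
    """Triplet form: pull the counterfactual toward the target (positive)
    in lesion-free areas, with the counterfactual-reference chamfer
    distance as the adaptive negative/margin term."""
    pos = weighted_mse(f_cf, f_target, cam)
    neg = chamfer(f_cf, f_ref)
    return max(0.0, pos - neg)
```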
Besides, since the lesion regions of the target image have been removed in the counterfactual features, the counterfactual features must also be non-malignant. Such knowledge can be enforced via an auxiliary negative embedding loss as a constraint:
where $\hat{p}$ denotes the malignant probability predicted from the counterfactual features.
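One natural realization of this constraint, shown here as an assumption rather than the paper's exact formula, is cross-entropy against the benign label applied to the counterfactual prediction:

```python
import numpy as np

def negative_embedding_loss(p_cf_malignant):
    """Auxiliary constraint: the counterfactual features describe a
    lesion-removed image, so their predicted malignant probability should
    go to zero; this is cross-entropy against the 'benign' (0) label."""
    return -np.log(1.0 - p_cf_malignant)
```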
where $i$ denotes the sample index; that is, we calculate the corresponding losses for each sample and sum them to derive the final joint loss. By optimizing this joint loss, the modules are optimized cooperatively and compatibly: counterfactual generation helps discover the lesions for classification; in turn, the classification module supervises and thereby helps counterfactual generation. The effect of these modules is validated by our ablation study, explained in detail in the next section.
IV-A Implementation Details
Mammogram images are commonly stored in a 14-bit DICOM format. A simple linear mapping is used to convert them into 8-bit gray images. Then, Otsu's method  is used for breast-region segmentation and background removal. The segmented images are resized and fed to the networks. We implement all models with PyTorch. The models are initialized with ImageNet pre-trained weights for a fair comparison with the representative method . For training, we use the Adam optimizer and train for 50 epochs. For all experiments, we select the best model on the validation set for testing. Both target and reference features are extracted from the last convolution layer.
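The preprocessing pipeline (linear 14-bit to 8-bit mapping, then Otsu thresholding for breast/background separation) can be sketched in a few lines; the function names are hypothetical and the Otsu routine below is a textbook between-class-variance implementation, not the paper's code.

```python
import numpy as np

def to_uint8(dicom_pixels, bits=14):
    """Linear mapping from 14-bit DICOM intensities to 8-bit gray."""
    scale = 255.0 / (2 ** bits - 1)
    return (dicom_pixels.astype(np.float64) * scale).astype(np.uint8)

def otsu_threshold(img):
    """Otsu's method on an 8-bit image: choose the threshold maximizing
    the between-class variance of the intensity histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    cum_p = np.cumsum(p)
    cum_mu = np.cumsum(p * np.arange(256))
    mu_total = cum_mu[-1]
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0, w1 = cum_p[t], 1.0 - cum_p[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mu[t] / w0
        mu1 = (mu_total - cum_mu[t]) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Pixels above the returned threshold would be kept as the breast region and the rest treated as background.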
| Methodology | Top-1 error (b) | Top-1 error (d) |
|---|---|---|
| Fixed-Point GAN * | 0.646 | 0.737 |
| Wu et al. * | 0.627 | 0.650 |
| Zhu et al. * | 0.627 | 0.625 |
We evaluate our method on the public INBreast dataset , chosen for its high quality compared to other public datasets , and on an in-house dataset. The INBreast dataset contains 115 cases and 410 mammograms. INBreast provides each image with a BI-RADS result as image-wise ground truth, and we use the same process as Zhu et al.  (malignant if BI-RADS 3; benign otherwise). Our experimental setting on INBreast is the same as that of Zhu et al. , who use 100 mammogram images with masses and report image-wise malignant classification performance. We consider two settings: mass-lesion image classification and mixed-lesion classification, in which the lesion can be masses, calcification clusters, or distortions. For the former, we follow  and select only the images containing masses; we discard 9 of these images for the absence of the contralateral (reference) image, and the remaining 91 images all have opposite sides, i.e., 91 pairs for mass malignancy classification. For the latter, to be more general, we evaluate mixed-lesion malignancy classification including masses, calcification clusters, or distortions. We use five-fold cross-validation for evaluation and the area under the curve (AUC) as the metric.
The in-house dataset contains 2500 images, of which 1303 have image-level malignancy annotations. The dataset contains 589 images with only masses, 120 with only suspicious calcifications, 34 with only architectural distortions, 197 with only asymmetries, and 363 with multiple lesions, from 642 patients. All 1303 images have opposite sides, i.e., 1303 pairs (note that a target image A with a malignancy annotation paired with B counts as one pair; if B also has a malignancy annotation, then conversely B can be the target and A the reference, counting as another pair). We randomly divide the dataset into training, validation, and testing sets patient-wise.
IV-C Experiment Settings
To compare our method fairly with others in a more general way, we use AlexNet as the backbone on both INBreast (for mass malignancy classification) and the in-house dataset (for mixed-lesion malignancy classification). We also use ResNet50 as the backbone on INBreast (for both mass and mixed-lesion malignancy classification).
IV-D Bilateral Distribution Verification
In this section, we verify the correctness of the symmetric prior assumption, which motivates our proposed framework. Specifically, we choose 1,000 unhealthy couples of bilateral images from the in-house dataset, each of which contains at least one lesion. For comparison, we choose another 1,000 healthy couples. We do not use the public INBreast dataset since it contains few healthy couples. To measure the image distribution distances, we use the Fréchet Inception Distance (FID) , which has been used to evaluate medical images [9, 19]. After calculating the FID values of the healthy set and the unhealthy set, we conduct hypothesis testing with the null hypothesis and alternative hypothesis defined as:
We obtain a p-value of , which provides evidence for rejecting the null hypothesis, i.e., the bilateral distribution distance of unhealthy cases is significantly larger than that of healthy cases. This result can be regarded as a manifestation of our symmetric prior assumption.
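A one-sided test of this kind can be illustrated with a nonparametric permutation test over per-couple distance values; this is an assumed stand-in (the paper does not specify its exact test procedure), with a hypothetical function name.

```python
import numpy as np

def permutation_pvalue(unhealthy_d, healthy_d, n_perm=2000, seed=0):
    """One-sided permutation test for H1: mean distance of unhealthy
    couples > mean distance of healthy couples. Inputs are 1-D arrays of
    per-couple bilateral distance values (illustrative, not the paper's
    exact procedure)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(unhealthy_d), np.asarray(healthy_d)
    obs = x.mean() - y.mean()            # observed mean difference
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # break the group labels
        if pooled[: len(x)].mean() - pooled[len(x):].mean() >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)    # add-one smoothed p-value
```

A small p-value indicates that the observed asymmetry gap between unhealthy and healthy couples is unlikely under the null of no difference.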
IV-E Experimental Analysis
Compared Baselines for Malignancy Classification. We conduct experiments on both mass malignancy classification (the 2nd and 3rd columns of Table I) and mixed-lesion malignancy classification (the last two columns of Table I).
The first four rows in Table I summarize the official results of the representative methods. To be fair, we compare results with the AlexNet  and ResNet50  backbones separately. Due to the slight difference in the number of images used, caused by reference-image absence, we re-implement some baselines for a fair comparison, including vanilla methods (plain AlexNet  / ResNet50 ), classification methods [36, 33], natural image classification methods [34, 7], and counterfactual generation methods [35, 25, 28].
Result Analysis. As shown in Table I, we achieve state-of-the-art performance. We outperform the attention-based methods (Zhu , ABN  and CAM ) by a large margin of to , the multi-view method (Wu ) by to , and the GAN-based methods (AnoGAN , Fixed-Point GAN  and CycleGAN ) by to . Specifically, Zhu , ABN  and CAM  take advantage of the attention mechanism and all outperform the vanilla baseline. However, without exploiting the domain knowledge of mammograms, their performance is limited. Wu  uses simple multi-view fusion; its better results compared with the vanilla baseline indicate that bilateral information is useful. However, it is inferior to ours since mammograms cannot be pixel-to-pixel aligned. As for AnoGAN , compared with the vanilla baseline, it performs slightly worse on the INBreast dataset than on the in-house dataset. We argue this is because the in-house dataset contains relatively more healthy images, leading to better healthy modeling. Even so, its results are still much lower than ours, as it suffers from the varied healthy patterns in mammograms. Fixed-Point GAN  and CycleGAN  achieve similar performance due to similar cycle consistency constraints. They outperform AnoGAN since they can make use of the image-level annotations, but their performance is limited by the ill-posed translation for lesion removal.
Localization Evaluation To verify whether the proposed model focuses on the lesion areas, we evaluate the localization error via CAM . Following , we first calculate the CAMs based on the predicted category. To generate a bounding box from a CAM, we segment the regions whose CAM value is larger than 20% of the maximum CAM value and take the bounding box of the largest connected component in the segmentation map. We use the top-1 localization error as in ILSVRC, except with an intersection-over-union (IoU) threshold of 0.1: since our main concern is classification performance, precise localization is not necessary. As shown in Table II, the proposed method obtains a localization error of 0.421 for masses and 0.455 for all lesions, outperforming the other methods.
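The CAM-to-box procedure described above (20%-of-max threshold, largest connected component, bounding box, IoU check) can be sketched as follows; the simple flood-fill labeling and function names are illustrative assumptions.

```python
import numpy as np

def cam_to_bbox(cam, frac=0.2):
    """Bounding box from a CAM: threshold at `frac` of the max value, keep
    the largest 4-connected component, return (r0, c0, r1, c1) inclusive."""
    mask = cam >= frac * cam.max()
    labels = np.zeros(cam.shape, dtype=int)
    cur = 0
    for i in range(cam.shape[0]):          # simple flood-fill labeling
        for j in range(cam.shape[1]):
            if mask[i, j] and labels[i, j] == 0:
                cur += 1
                stack = [(i, j)]
                while stack:
                    r, c = stack.pop()
                    if (0 <= r < cam.shape[0] and 0 <= c < cam.shape[1]
                            and mask[r, c] and labels[r, c] == 0):
                        labels[r, c] = cur
                        stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    sizes = np.bincount(labels.ravel())[1:]
    big = np.argmax(sizes) + 1             # largest connected component
    rs, cs = np.where(labels == big)
    return rs.min(), cs.min(), rs.max(), cs.max()

def iou(a, b):
    """IoU of two (r0, c0, r1, c1) boxes with inclusive coordinates."""
    r0, c0 = max(a[0], b[0]), max(a[1], b[1])
    r1, c1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0 + 1) * max(0, c1 - c0 + 1)
    area = lambda x: (x[2] - x[0] + 1) * (x[3] - x[1] + 1)
    return inter / (area(a) + area(b) - inter)
```

A prediction would then count as a localization hit when its IoU with the ground-truth box exceeds the 0.1 threshold.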
Visualization To verify the effectiveness of CGN in learning lesion areas, we visualize the class activation maps in Fig. 4. The asymmetry of lesions in the bilateral images validates the bilateral asymmetry prior (the first three columns). The proposed CGN succeeds in focusing on all lesions since it incorporates the bilateral symmetry prior. In contrast, the other methods show uneven results. In the first two cases, the other methods also show reasonable attention, since the mass areas differ strongly from the background. However, in the last two cases the lesions are relatively indistinct, making it quite challenging to find them without bilateral information.
IV-F Counterfactual Validation
Since there are no ground-truth images under counterfactual conditions, we validate the effectiveness and reasonableness of the generated counterfactual features in two aspects, FID measurement and feature visualization, motivated by the counterfactual evidence in .
Counterfactual Visualization We visualize the target features, reference features, and generated counterfactual features in Fig. 5 to further verify the effectiveness of our counterfactual generation qualitatively. Since all three kinds of features are high-dimensional, we perform max-pooling across the channel dimension to generate a visualization heatmap for each of them, shown in the last three columns respectively. We can see that the activated lesion features in the target features, marked by green rectangles, disappear in the counterfactual features, while the counterfactual features in lesion-free areas remain similar to the target features. This means the proposed method can effectively generate a healthy version of the target features, i.e., the counterfactual features.
We also visualize the predicted lesion locations during the iterative training process to further verify the effectiveness of CGN in Fig. 6. As the iterations proceed, the predicted lesion locations become more and more accurate.
FID Measurement To further evaluate the effectiveness of the generated counterfactual features, we calculate the mean FID  to measure feature distribution distances on INBreast. The mean FID between the target and reference features is 56.15; the counterfactual-reference mean FID is 27.04; the target-counterfactual mean FID is 25.42, while the same distance after removing the ground-truth lesion areas is 0.60. Comparing these four distances, we find that the learned counterfactual features contain both reference information and, in healthy areas, target information.
IV-G Ablation Study
We evaluate several variant models to verify the effectiveness of each component. The ablative results in Table III show that deleting or changing any of the components leads to a drop in classification performance. Specifically, naive bilateral feature fusion already boosts performance by to over the vanilla model, proving the bilateral symmetric prior is quite helpful for malignancy classification. Meanwhile, the proposed prediction feedback mechanism outperforms the non-feedback variant largely by : the classification module provides additional useful supervision for lesion localization, making learning more accurate and stable. For the additional counterfactual constraint of the negative embedding loss, we show that it improves performance by . The variants are interpreted as follows:
Vanilla (first row): the vanilla single-view network;
SBF: Simple Bilateral features. The bilateral features are directly concatenated and fed into the fusion layer;
TF-GAN: Target-feature GAN. Replace AdaIN input by target features only;
BF-GAN: Bilateral-feature GAN. Replace AdaIN input by simple combination of bilateral features;
Non-feedback: Estimate the lesion areas as the areas with the largest target-counterfactual distance.
To further verify the effectiveness of the proposed adversarial loss and feedback triplet loss, we design two variants:
Variant (1): For the discriminator loss, we directly minimize the distance between counterfactual features and reference features in the lesion areas, while still estimating the lesion areas via the prediction feedback mechanism.
Compared with the adversarial losses used for the discriminator and generator in our paper, we denote the modified discriminator loss and generator loss of variant (1) accordingly, and obtain the final losses, which are trained iteratively.
Variant (2): For the feedback triplet loss, we design a variant feedback loss instead, which directly constrains the generated features in lesion-free areas to be similar to the target features.
The variant feedback loss is defined using the same distance function as in Eq. (10). We then obtain the final losses, which are trained iteratively together with the generator loss and the discriminator loss used in the competing loss of our paper.
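The masked constraint of variant (2) can be sketched as a feature-space loss that is active only outside the estimated lesion areas. This is a minimal numpy sketch; the mean-squared form and the shapes are illustrative assumptions, not the exact loss of Eq. (10):

```python
import numpy as np

def masked_feedback_loss(gen_feat, target_feat, lesion_mask):
    """Penalize the distance between generated and target features only in
    lesion-free areas. gen_feat/target_feat: C x H x W arrays,
    lesion_mask: H x W boolean (True = estimated lesion location)."""
    free = ~lesion_mask                       # lesion-free locations
    if not free.any():
        return 0.0
    diff = (gen_feat - target_feat)[:, free]  # C x n_free
    return float(np.mean(diff ** 2))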
The experimental results of the two variants against our proposed method are shown in Table IV. Modifying either the adversarial loss or the feedback triplet loss leads to a performance drop, which suggests that our proposed losses are robust and effective. As noted earlier, because bilateral images are not registered pixel-to-pixel, we perform counterfactual generation at the feature level instead of the image level. In practice, feature-level generation achieves higher performance than image-level generation, verifying the benefit of generating features, and it also trains much faster (6.6 s/epoch vs. 23.5 s/epoch).
IV-H Bilateral Analysis
For bilateral analysis, we re-implement several bilateral fusion modules used in recent papers.
SBF: Simple Bilateral Features, as mentioned in the ablation study; e.g., Kim et al. applied this in ToMO;
GF: Gated Fusion based on SBF. Learns weights for asymmetric enhancement on top of SBF;
SFF: Simple Four-view Feature Fusion. Simply ensembles cross-view and contralateral-view information;
SFF and GF can be seen as variants of SBF. As shown in Table V, SFF and GF slightly outperform SBF by using more information, but remain inferior to our proposed method due to their naive use of view-wise information. Both share the same disadvantage as SBF: even for healthy breasts, bilateral mammograms are only roughly symmetric rather than pixel-to-pixel aligned, so the similarity of bilateral features cannot be guaranteed. In contrast, our method exploits the symmetric prior through counterfactual generation with an improved GAN, and therefore suffers less from these problems and leads to better results.
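For intuition, the gated fusion (GF) idea can be sketched as a per-location gate computed from the concatenated bilateral features that re-weights the target features. This is a minimal numpy sketch with hypothetical learned parameters `w_gate` and `b_gate` (real implementations use convolutional gates), not the referenced paper's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(target_feat, reference_feat, w_gate, b_gate):
    """Gated fusion sketch: a gate derived from the concatenated bilateral
    features re-weights the target features, emphasizing asymmetric (and
    hence suspicious) locations. Shapes: feats C x N (flattened spatial
    locations), w_gate 1 x 2C, b_gate scalar."""
    both = np.concatenate([target_feat, reference_feat], axis=0)  # 2C x N
    gate = sigmoid(w_gate @ both + b_gate)                        # 1 x N weight
    return target_feat * gate                                     # broadcast over channels
```

The sketch makes the shared weakness visible: the gate compares bilateral features location by location, so any misalignment between the two breasts directly corrupts the asymmetry signal.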
In this paper, we propose a novel approach called the bilateral asymmetry guided Counterfactual Generating Network (CGN) to improve mammogram classification performance. The proposed method performs counterfactual generation by effectively exploiting the symmetric prior. Experimental results indicate that the proposed CGN achieves state-of-the-art results on both public and in-house datasets. Our work can be regarded as a showcase of exploiting the symmetric prior, which holds widely across many human organs, e.g., brains, eyes, skeletal structures, and kidneys. We therefore believe our method can generalize to the corresponding medical imaging problems, which we leave for future work.
Appendix A Proof of Theorem III.1
If the causal graph satisfies that the common factor influences the bilateral variables simultaneously, then,
Lemma A.1 shows that the causal factor influences the bilateral mammograms through an equal functional relationship.
Proof of Theorem III.1.
Proof of Eq. (2):
where the first equation holds because the common factor is the only parent node of the bilateral variable; the second follows from the Markov condition; and the third from the symmetric prior.
Proof of Eq. (3): Since the features in the lesion-free areas coincide, the probabilities are derived from the actual hidden features directly, i.e.,
- (2017) Learning representations and generative models for 3D point clouds. arXiv preprint arXiv:1707.02392.
- (2005) Retrieval of IVUS images using contextual information and elastic matching. Int. J. Intell. Syst. 20, pp. 541–559.
- (2020) Counterfactuals uncover the modular structure of deep generative models. In International Conference on Learning Representations.
- (2013) Inference on counterfactual distributions. Econometrica 81 (6), pp. 2205–2268.
- (2017) Dual path networks. In Advances in Neural Information Processing Systems, pp. 4467–4475.
- (2016) The automated learning of deep features for breast mass classification from mammograms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 106–114.
- (2019) Attention branch network: learning of attention mechanism for visual explanation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10705–10714.
- (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- (2019) Multiparametric magnetic resonance image synthesis using generative adversarial networks. The Eurographics Association.
- (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- (2017) In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737.
- (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pp. 6626–6637.
- (2019) Mask-ShadowGAN: learning to remove shadows from unpaired data. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2472–2481.
- (2017) Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1501–1510.
- (2016) Latent feature representation with 3-D multi-view deep convolutional neural network for bilateral analysis in digital breast tomosynthesis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 927–931.
- (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
- (2019) From unilateral to bilateral learning: detecting mammogram masses with contrasted bilateral network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 477–485.
- (2017) A multi-scale CNN and curriculum learning strategy for mammogram classification. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 169–177.
- (2019) Conditional WGANs with adaptive gradient balancing for sparse MRI reconstruction. arXiv preprint arXiv:1905.00985.
- (2012) INbreast: toward a full-field digital mammographic database. Academic Radiology 19 (2), pp. 236–248.
- (2019) Breaking the cycle: colleagues are all you need. arXiv preprint arXiv:1911.10538.
- (1979) A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9 (1), pp. 62–66.
- (2009) Causality: models, reasoning, and inference. Cambridge University Press.
- (2018) Detecting and classifying lesions in mammograms with deep learning. Scientific Reports 8 (1), pp. 4165.
- (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International Conference on Information Processing in Medical Imaging, pp. 146–157.
- (2015) FaceNet: a unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823.
- (2013) ACR BI-RADS® mammography. ACR BI-RADS® Atlas, Breast Imaging Reporting and Data System. American College of Radiology.
- (2019) Learning fixed points in generative adversarial networks: from image-to-image translation to disease detection and localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 191–200.
- (2019) Cancer statistics, 2019. CA: A Cancer Journal for Clinicians 69 (1), pp. 7–34.
- (2013) An automatic mass detection system in mammograms based on complex texture features. IEEE Journal of Biomedical and Health Informatics 18 (2), pp. 618–627.
- (2011) Computer-aided detection of breast masses: four-view strategy for screening mammography. Medical Physics 38 (4), pp. 1867–1876.
- (2018) Conditional infilling GANs for data augmentation in mammogram classification. In Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 98–106.
- (2019) Deep neural networks improve radiologists' performance in breast cancer screening. IEEE Transactions on Medical Imaging.
- (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929.
- (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232.
- (2017) Deep multi-instance networks with sparse label assignment for whole mammogram classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 603–611.