Wasserstein Style Transfer

Youssef Mroueh
IBM Research
IBM T.J Watson Research Center
mroueh@us.ibm.com
Abstract

We propose Gaussian optimal transport for image style transfer in an encoder/decoder framework. Optimal transport between Gaussian measures admits closed-form Monge mappings from source to target distributions. Moreover, interpolations between a content and a style image can be seen as geodesics in the Wasserstein geometry. Using this insight, we show how to mix different target styles using the Wasserstein barycenter of Gaussian measures. Since Gaussians are closed under Wasserstein barycenters, this yields a simple procedure for style transfer, style mixing and interpolation. We also show how mixing different styles can be achieved using other geodesic metrics between Gaussians, such as the Fisher Rao metric, while the transport of the content to the new interpolated style is still performed with Gaussian OT maps. Our simple methodology makes it possible to generate new stylized content that interpolates between many artistic styles. The metric used in the interpolation results in different stylizations.

 


Preprint. Under review.

1 Introduction

Image style transfer is the task of modifying an image so that it preserves its content while matching the artistic style of a target image or a collection of images. Defining a loss function that captures this content/style constraint is challenging. Major progress in this field followed the introduction of neural style transfer in the seminal work of Gatys et al. [1, 2]. Gatys et al. showed that a style loss function can be defined by matching statistics of the spatial distribution of images in the feature space of deep convolutional neural networks (the spatial Gram matrix). In the method of Gatys et al., the image is updated via an optimization process that minimizes this "network loss". One shortcoming of this approach is that it is slow and requires a separate optimization for each pair of content and style images. Many workarounds have been introduced to speed up this process by training feedforward networks that produce stylizations in a single forward pass [3, 4, 5, 6]. Nevertheless, this approach was still limited to a single style image. Instance Normalization (IN) was introduced in [7] to improve the quality and diversity of stylization. Multiple-style neural transfer was then introduced in [8] thanks to Conditional Instance Normalization (CIN). CIN adapts the normalized statistics of the transposed convolutional layers in the feedforward network with learned scalings and biases for each style image, for a fixed number of style images. The concept of layer swap in [9] resulted in one of the first arbitrary style transfer methods. Adaptive Instance Normalization (AdaIN) was introduced in [10] by making the CIN scalings and biases learned functions of the style image, which also enabled arbitrary style transfer.
The Whitening Coloring Transform (WCT) [11], which we discuss in detail in Section 2, developed a simple framework for arbitrary style transfer: it uses an encoder/decoder architecture and applies a simple normalization transform (WCT) in the encoder feature space to perform the style transfer.

Our work is closest to WCT. We start by noticing that instance normalization layers (IN, CIN, AdaIN and WCT) perform a transport map from the spatial distribution of a content image to that of a style image, and that the implicit assumption in deriving those maps is the Gaussianity of the spatial distribution of images in a deep CNN feature space. The Wasserstein geometry of Gaussian measures is well studied in optimal transport [12], and Gaussian Optimal Transport (OT) maps have closed forms. We show in Section 3 that those normalization transforms are approximations of the OT maps. Linear interpolations of different contents or styles at the level of those normalization feature transforms have been successfully applied in [10, 8]; we show in Section 4 that this can be interpreted, and improved, as Gaussian geodesics in the Wasserstein geometry. Furthermore, using this insight, we show in Section 5 that we can define novel styles using Wasserstein barycenters of Gaussians [13]. We also extend this to other Fréchet means in order to study how the ground metric used on the covariances affects the novel style obtained via this nonlinear interpolation. Experiments are presented in Section 7.

2 Universal Style Transfer

We review in this section the universal style transfer approach of WCT [11].

Encoding Map. Given a content image $I_c$, a style image $I_s$, and a feature extractor $F$ that maps an image $I$ to features $F(I) \in \mathbb{R}^{n \times m}$, where $n$ is the spatial output size of $F$ and $m$ is its feature dimension, define the following encoding map:

$$E : I \mapsto \nu_I = \frac{1}{n} \sum_{j=1}^{n} \delta_{F_j(I)} \in \mathcal{P}(\mathbb{R}^m),$$

where $\mathcal{P}(\mathbb{R}^m)$ is the space of empirical measures on $\mathbb{R}^m$. For example, $F$ is a VGG [14] CNN that maps an image to $\mathbb{R}^{C \times (H \times W)}$ ($C$ is the number of channels, $H$ the height and $W$ the width). In other words, the CNN defines a distribution in a space of dimension $m = C$, and we are given $n = H \times W$ samples of this distribution. We denote by $\nu_I$ this empirical distribution, i.e. the spatial distribution of the image $I$ in the feature space of a deep convolutional network.

Decoding Map. We assume that the encoding is invertible, i.e. there exists a decoder $D$ such that $D(E(I)) = I$. For instance, $(E, D)$ is a VGG image encoder/decoder pair trained to map from the pixel domain to a spatial convolutional layer output in VGG and vice versa.

Universal Style Transfer in Feature Space. The universal style transfer approach [11] works as follows. WCT (Whitening Coloring Transform) defines a transform $T_{c \to s} : \mathbb{R}^m \to \mathbb{R}^m$ in the feature space (we elaborate on this transform later). This style transfer transform naturally defines a push-forward map on the spatial distribution of the features of the content image $I_c$:

$$T_{c \to s, \#}\, \nu_{I_c} = \frac{1}{n} \sum_{j=1}^{n} \delta_{T_{c \to s}(F_j(I_c))}.$$

$T_{c \to s}$ is defined so that the style transfer happens in the feature space, i.e. $T_{c \to s, \#}\, \nu_{I_c} = \nu_{I_s}$. We obtain the stylized image by decoding back to the image domain:

$$I_{c \to s} = D\big(T_{c \to s, \#}\, E(I_c)\big).$$

From this formalism we see that the universal style transfer problem amounts to finding a transport map from the spatial distribution of a content image in a feature space to the spatial distribution of a target image in the same feature space. We show in the next section how to leverage optimal transport theory to define such maps. Moreover, we show that the WCT transform and Adaptive Instance Normalization are approximations of the optimal transport maps.

3 Wasserstein Universal Style Transfer

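Given $\nu_c = E(I_c)$ and $\nu_s = E(I_s)$, we formulate the style transfer problem as finding an optimal Monge map:

$$\min_{T \,:\, T_{\#} \nu_c = \nu_s} \int \|x - T(x)\|_2^2 \, d\nu_c(x). \tag{1}$$

The optimal value of this problem is $W_2^2(\nu_c, \nu_s)$, the squared Wasserstein-2 distance between $\nu_c$ and $\nu_s$. Under some regularity conditions on the distributions, the optimal transport map exists, is unique, and is the gradient of a convex potential [15].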

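Wasserstein Geometry of Gaussian Measures. Computationally, Problem (1) can be solved using, for example, entropic regularization of the equivalent Kantorovich form [16, 17], or in an end-to-end approach using automatic differentiation of a Sinkhorn loss [18, 19]. We take here another route, using the known fact that OT between Gaussian measures provides a lower bound on the Wasserstein distance [20]. For any two measures $\nu_1$ and $\nu_2$:

$$W_2^2(\nu_1, \nu_2) \geq W_2^2\big(\mathcal{N}(\mu_1, \Sigma_1), \mathcal{N}(\mu_2, \Sigma_2)\big),$$

where $\mu_1, \Sigma_1$ are the mean and covariance of $\nu_1$, and $\mu_2, \Sigma_2$ those of $\nu_2$. The Wasserstein geometry of Gaussian measures is well studied and has many convenient computational properties [12], which we summarize in the following:

1) Closed Form $W_2$. Given two Gaussian distributions $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$, we have:

$$W_2^2\big(\mathcal{N}(\mu_1, \Sigma_1), \mathcal{N}(\mu_2, \Sigma_2)\big) = \|\mu_1 - \mu_2\|_2^2 + B^2(\Sigma_1, \Sigma_2),$$

where

$$B^2(\Sigma_1, \Sigma_2) = \mathrm{Tr}\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2}\big)^{1/2}\Big)$$

and $B$ is the Bures metric between covariances. The Bures metric is a geodesic metric on the PSD cone (we discuss properties of this metric in Section 5.2).

2) Closed Form Monge Map. The optimal transport map between two Gaussians with non-degenerate (full rank) covariances has a closed form: $T(x) = \mu_2 + A(x - \mu_1)$, where $A = \Sigma_1^{-1/2}\big(\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2}\big)^{1/2}\Sigma_1^{-1/2}$, i.e. $T_{\#}\mathcal{N}(\mu_1, \Sigma_1) = \mathcal{N}(\mu_2, \Sigma_2)$ and $T$ is optimal in the $W_2$ sense. If the Gaussians are degenerate, we can replace the inverses of the square root matrices with pseudo-inverses [21].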
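The two closed forms above can be checked numerically. The following is a minimal NumPy sketch, not the author's implementation: the helper names (`psd_power`, `bures_sq`, `gaussian_w2_sq`, `gaussian_ot_matrix`) and the eigenvalue threshold `eps` are assumptions introduced for illustration. It computes the squared Bures metric and the matrix $A$ of the Monge map, and verifies that $A \Sigma_1 A = \Sigma_2$.

```python
import numpy as np

def psd_power(S, p, eps=1e-12):
    """Power of a symmetric PSD matrix via eigendecomposition.
    Eigenvalues below eps are dropped, so negative powers act like a pseudo-inverse."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    wp = np.where(w > eps, np.clip(w, eps, None) ** p, 0.0)
    return (V * wp) @ V.T

def bures_sq(S1, S2):
    """Squared Bures metric: Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    r1 = psd_power(S1, 0.5)
    cross = psd_power(r1 @ S2 @ r1, 0.5)
    return float(np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross))

def gaussian_w2_sq(mu1, S1, mu2, S2):
    """Squared W2 distance between N(mu1, S1) and N(mu2, S2)."""
    return float(np.sum((mu1 - mu2) ** 2)) + bures_sq(S1, S2)

def gaussian_ot_matrix(S1, S2):
    """Matrix A of the Monge map T(x) = mu2 + A (x - mu1)."""
    r1, ir1 = psd_power(S1, 0.5), psd_power(S1, -0.5)
    return ir1 @ psd_power(r1 @ S2 @ r1, 0.5) @ ir1

# Two random full-rank covariances (illustrative data only).
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
S1, S2 = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)

A = gaussian_ot_matrix(S1, S2)
# The map pushes N(mu1, S1) onto N(mu2, S2) exactly when A S1 A^T = S2.
pushforward_error = np.abs(A @ S1 @ A.T - S2).max()
```

Here `pushforward_error` should be at the level of floating-point round-off, and `A` is symmetric, as expected for the gradient of a convex quadratic potential.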

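Gaussian Wasserstein Style Transfer. The spatial distribution of images in CNN feature space is not exactly Gaussian, but instead of having to solve Problem (1) we can use the Gaussian lower bound and obtain a closed-form optimal map from the content distribution to the style distribution as follows:

$$T^{W}_{c \to s}(x) = \mu_s + A_{c \to s}\,(x - \mu_c), \tag{2}$$

where $A_{c \to s} = \Sigma_c^{-1/2}\big(\Sigma_c^{1/2} \Sigma_s \Sigma_c^{1/2}\big)^{1/2}\Sigma_c^{-1/2}$, and $\mu_c, \Sigma_c$ and $\mu_s, \Sigma_s$ are the means and covariances of $\nu_c$ and $\nu_s$ respectively, and

$$\mu_c = \frac{1}{n}\sum_{j=1}^{n} F_j(I_c), \qquad \Sigma_c = \frac{1}{n}\sum_{j=1}^{n} \big(F_j(I_c) - \mu_c\big)\big(F_j(I_c) - \mu_c\big)^{\top}$$

(and similarly for $\mu_s, \Sigma_s$).

Finally, the Universal Wasserstein Style Transfer can be written in the following compact way, summarized in Figure 1:

$$I_{c \to s} = D\big(T^{W}_{c \to s, \#}\, E(I_c)\big). \tag{3}$$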

Figure 1: Wasserstein Style Transfer
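As an illustration of Equation (2), the sketch below applies the Gaussian OT map to a matrix of content features (one row per spatial location). It is a minimal NumPy sketch under stated assumptions: the function names and the truncation threshold `eps` are hypothetical, random arrays stand in for the encoder outputs, and the encoder/decoder themselves are not modeled.

```python
import numpy as np

def psd_power(S, p, eps=1e-8):
    """Power of a symmetric PSD matrix; eigenvalues below eps are truncated."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    wp = np.where(w > eps, np.clip(w, eps, None) ** p, 0.0)
    return (V * wp) @ V.T

def moments(F):
    """Empirical mean and covariance of features F with shape (n, m)."""
    mu = F.mean(axis=0)
    Fc = F - mu
    return mu, Fc.T @ Fc / F.shape[0]

def wasserstein_feature_transfer(F_content, F_style):
    """Map content features with the Gaussian OT map toward the style statistics."""
    mu_c, S_c = moments(F_content)
    mu_s, S_s = moments(F_style)
    r, ir = psd_power(S_c, 0.5), psd_power(S_c, -0.5)
    A = ir @ psd_power(r @ S_s @ r, 0.5) @ ir  # symmetric matrix A_{c->s}
    return mu_s + (F_content - mu_c) @ A

# Stand-ins for encoder outputs: n spatial samples of m-dimensional features.
rng = np.random.default_rng(0)
F_content = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
F_style = rng.normal(size=(400, 5)) @ rng.normal(size=(5, 5)) + 2.0

stylized = wasserstein_feature_transfer(F_content, F_style)
```

After the transform, the empirical mean and covariance of `stylized` match those of `F_style`, while each row remains an affine function of the corresponding content feature.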

Relation to WCT and to Adaptive Instance Normalization. We consider two particular cases:
1) Commuting covariances and WCT [11]. Assuming that the covariances $\Sigma_c$ and $\Sigma_s$ commute, meaning that $\Sigma_c \Sigma_s = \Sigma_s \Sigma_c$ ($\Sigma_c$ and $\Sigma_s$ share a common orthonormal eigenbasis), it is easy to see that the optimal transport map reduces to:

$$T_{c \to s}(x) = \mu_s + \Sigma_s^{1/2}\,\Sigma_c^{-1/2}\,(x - \mu_c),$$

which is exactly the Whitening and Coloring Transform (WCT). Hence WCT [11] is optimal only when the covariances commute (a particular case is diagonal covariances).

2) Diagonal Covariances and AdaIN, Instance Normalization (IN) and Conditional Instance Normalization (CIN) [10, 7, 8]. Let $\sigma_c^2$ be the diagonal of $\Sigma_c$ and $\sigma_s^2$ be the diagonal of $\Sigma_s$. If the covariances are diagonal, it is easy to see that:

$$T_{c \to s}(x) = \mu_s + \frac{\sigma_s}{\sigma_c} \odot (x - \mu_c),$$

where the division and the product $\odot$ are elementwise. This is exactly the expression of Adaptive Instance Normalization (AdaIN). We conclude that AdaIN, IN and CIN implement a diagonal approximation of the optimal Gaussian transport map ($\sigma_s$ and $\mu_s$ are learned constant scalings and biases in IN and CIN, and are adaptive in AdaIN).
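The two special cases can be verified numerically. In the following minimal NumPy sketch (helper names and test matrices are illustrative assumptions, not taken from the paper), the Gaussian OT matrix coincides with the WCT matrix $\Sigma_s^{1/2}\Sigma_c^{-1/2}$ when the covariances share an eigenbasis, differs from it otherwise, and reduces to the elementwise AdaIN scaling $\sigma_s / \sigma_c$ for diagonal covariances.

```python
import numpy as np

def psd_power(S, p):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * w ** p) @ V.T

def ot_matrix(S_c, S_s):
    """Gaussian OT matrix: S_c^{-1/2} (S_c^{1/2} S_s S_c^{1/2})^{1/2} S_c^{-1/2}."""
    r, ir = psd_power(S_c, 0.5), psd_power(S_c, -0.5)
    return ir @ psd_power(r @ S_s @ r, 0.5) @ ir

def wct_matrix(S_c, S_s):
    """Whitening followed by coloring: S_s^{1/2} S_c^{-1/2}."""
    return psd_power(S_s, 0.5) @ psd_power(S_c, -0.5)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # shared eigenbasis
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # unrelated eigenbasis
S_c = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T
S_s_commuting = Q @ np.diag([4.0, 0.5, 2.0, 1.0]) @ Q.T
S_s_general = R @ np.diag([4.0, 0.5, 2.0, 1.0]) @ R.T

# Case 1: commuting covariances, OT and WCT agree; otherwise they differ.
gap_commuting = np.abs(ot_matrix(S_c, S_s_commuting) - wct_matrix(S_c, S_s_commuting)).max()
gap_general = np.abs(ot_matrix(S_c, S_s_general) - wct_matrix(S_c, S_s_general)).max()

# Case 2: diagonal covariances, the OT matrix is the AdaIN scaling sigma_s / sigma_c.
var_c, var_s = np.array([1.0, 4.0, 9.0]), np.array([4.0, 1.0, 16.0])
A_diag = ot_matrix(np.diag(var_c), np.diag(var_s))
adain_scale = np.sqrt(var_s / var_c)
```

Both the WCT matrix and the OT matrix push $\Sigma_c$ onto $\Sigma_s$; only the OT matrix is symmetric in general, which is what makes it the minimal-displacement choice.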

4 Wasserstein Style/Content Interpolation with McCann Interpolates

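One shortcoming of the formulation in Problem (1) is that it does not allow balancing content and style preservation, as is the case in end-to-end style transfer. Let $t \in [0, 1]$; we formulate the style transfer problem with content preservation as follows:

$$\min_{\nu}\; (1 - t)\, W_2^2(\nu, \nu_c) + t\, W_2^2(\nu, \nu_s). \tag{4}$$

The first term in Equation (4) measures the usual "content loss" in style transfer and the second term measures the "style loss"; $t$ balances the interpolation between the style and the content. In optimal transport theory, the solution of Problem (4) is known as the McCann interpolate [22] between $\nu_c$ and $\nu_s$; it is a Wasserstein geodesic from $\nu_c$ to $\nu_s$ and is given by:

$$\nu_t = \big((1 - t)\,\mathrm{Id} + t\, T_{c \to s}\big)_{\#}\, \nu_c,$$

where $T_{c \to s}$ is the optimal Monge map of Problem (1).

The spatial distribution of images in CNN feature space is not exactly Gaussian, but instead of having to solve Problem (4) we can again use the following Gaussian lower bound:

$$\min_{\nu}\; (1 - t)\, W_2^2\big(\nu, \mathcal{N}(\mu_c, \Sigma_c)\big) + t\, W_2^2\big(\nu, \mathcal{N}(\mu_s, \Sigma_s)\big). \tag{5}$$

Fortunately this problem also has a closed form [22]:

$$\nu_t = T^{t}_{\#}\, \mathcal{N}(\mu_c, \Sigma_c), \qquad T^{t}(x) = (1 - t)\, x + t\, T^{W}_{c \to s}(x),$$

where $T^{W}_{c \to s}$ is given in Equation (2). $\{\nu_t\}_{t \in [0,1]}$ is a geodesic between $\mathcal{N}(\mu_c, \Sigma_c)$ and $\mathcal{N}(\mu_s, \Sigma_s)$. Finally, the Wasserstein Style/Content Interpolation can be written in the following compact way:

$$I_t = D\big(T^{t}_{\#}\, E(I_c)\big). \tag{6}$$

In practice, both WCT and AdaIN propose similar interpolations in feature space; we give here a formal justification for this approach. This formalism allows us to generalize to the interpolation of multiple styles using the Gaussian Wasserstein geometry of the spatial distribution of CNN image features.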
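The geodesic property of the Gaussian McCann interpolate can be illustrated with a short numerical check. The sketch below is a minimal NumPy example under stated assumptions (all function names and the random test covariances are hypothetical): it builds the interpolated Gaussian at time $t$ from the map $(1 - t)\,\mathrm{Id} + t\,A$ and checks that its $W_2$ distance to the content Gaussian is $t$ times the content-to-style distance.

```python
import numpy as np

def psd_power(S, p):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * np.clip(w, 1e-12, None) ** p) @ V.T

def ot_matrix(S_c, S_s):
    r, ir = psd_power(S_c, 0.5), psd_power(S_c, -0.5)
    return ir @ psd_power(r @ S_s @ r, 0.5) @ ir

def gaussian_w2_sq(mu1, S1, mu2, S2):
    r1 = psd_power(S1, 0.5)
    bures = np.trace(S1) + np.trace(S2) - 2.0 * np.trace(psd_power(r1 @ S2 @ r1, 0.5))
    return float(np.sum((mu1 - mu2) ** 2) + bures)

def mccann_gaussian(mu_c, S_c, mu_s, S_s, t):
    """Mean and covariance of the push-forward of N(mu_c, S_c) by (1 - t) Id + t T."""
    A_t = (1.0 - t) * np.eye(len(mu_c)) + t * ot_matrix(S_c, S_s)
    return (1.0 - t) * mu_c + t * mu_s, A_t @ S_c @ A_t.T

def interpolate_features(F_c, mu_c, S_c, mu_s, S_s, t):
    """Apply T^t(x) = (1 - t) x + t (mu_s + A (x - mu_c)) to each row of F_c."""
    A = ot_matrix(S_c, S_s)
    return (1.0 - t) * F_c + t * (mu_s + (F_c - mu_c) @ A)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
mu_c, S_c = rng.normal(size=3), X @ X.T + np.eye(3)
mu_s, S_s = rng.normal(size=3), Y @ Y.T + np.eye(3)

total = gaussian_w2_sq(mu_c, S_c, mu_s, S_s)
# Along a geodesic, W2^2(nu_c, nu_t) = t^2 * W2^2(nu_c, nu_s).
ratios = {t: gaussian_w2_sq(mu_c, S_c, *mccann_gaussian(mu_c, S_c, mu_s, S_s, t)) / total
          for t in (0.25, 0.5, 0.75)}
```

Setting `t = 0` returns the content statistics and `t = 1` the style statistics, so `t` plays the role of the content/style trade-off.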

5 Wasserstein Style Interpolation

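Given $p$ target style images $I_{s_1}, \dots, I_{s_p}$ and a content image $I_c$, let $\lambda_j$ be interpolation factors such that $\lambda_j \geq 0$ and $\sum_j \lambda_j = 1$. A naive approach to content/styles interpolation is to combine linearly, in feature space, the Gaussian OT maps of Equation (2) toward each style and to decode the result:

$$I = D\Big(\big(\textstyle\sum_{j} \lambda_j\, T^{W}_{c \to s_j}\big)_{\#}\, E(I_c)\Big).$$

This approach was proposed in both WCT and AdaIN, with $T^{W}_{c \to s_j}$ replaced by the WCT and AdaIN transforms respectively. We show here how to define a nonlinear interpolation that exploits the Wasserstein geometry of Gaussian measures.

Figure 2: Wasserstein barycenter interpolation between a content image and two target style images. The weights used are given above the two examples.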

5.1 Interpolation with Wasserstein Barycenters

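Similarly to the content/style interpolation, we formulate the content/styles interpolation problem as a Wasserstein barycenter problem [13] as follows. Let $\nu_c = E(I_c)$ and $\nu_{s_j} = E(I_{s_j})$, $j = 1, \dots, p$. Writing $\nu_j$ for the measures to be mixed (the style distributions together with the content distribution) and $\lambda_j$ for their weights, we propose to solve the following Wasserstein barycenter problem:

$$\bar{\nu} = \arg\min_{\nu} \sum_{j} \lambda_j\, W_2^2(\nu, \nu_j),$$

and then find the optimal map $T_{c \to \bar{\nu}}$ from $\nu_c$ to the barycenter measure $\bar{\nu}$. The final stylized image is obtained as follows:

$$I = D\big(T_{c \to \bar{\nu}, \#}\, E(I_c)\big).$$

Again we resort to the Gaussian optimal transport lower bound of the above problem:

$$\min_{\nu} \sum_{j} \lambda_j\, W_2^2\big(\nu, \mathcal{N}(\mu_j, \Sigma_j)\big), \tag{7}$$

where $\mu_j, \Sigma_j$ are the mean and covariance of $\nu_j$. As shown by Agueh and Carlier [13], the Wasserstein barycenter of Gaussians is itself a Gaussian $\mathcal{N}(\bar{\mu}, \bar{\Sigma})$, where $\bar{\mu} = \sum_j \lambda_j \mu_j$ and $\bar{\Sigma}$ is a Bures mean:

$$\bar{\Sigma} = \arg\min_{\Sigma} \sum_{j} \lambda_j\, B^2(\Sigma, \Sigma_j),$$

with $B$ the Bures metric of Section 3. Agueh and Carlier showed that $\bar{\Sigma}$ is the unique positive definite matrix solution of the following fixed point problem: $\bar{\Sigma} = \sum_j \lambda_j \big(\bar{\Sigma}^{1/2} \Sigma_j \bar{\Sigma}^{1/2}\big)^{1/2}$. In order to solve this problem we use an alternative fixed point strategy proposed in [23], since it converges faster in practice:

$$\Sigma^{(k+1)} = \big(\Sigma^{(k)}\big)^{-1/2} \Big(\sum_{j} \lambda_j \big(\big(\Sigma^{(k)}\big)^{1/2} \Sigma_j \big(\Sigma^{(k)}\big)^{1/2}\big)^{1/2}\Big)^{2} \big(\Sigma^{(k)}\big)^{-1/2}. \tag{8}$$

We initialized $\Sigma^{(0)}$ as in [21] and found that a fixed number of iterations $K$ was enough for convergence, i.e. we set $\bar{\Sigma} = \Sigma^{(K)}$. Matrix square roots and inverses were computed using SVD, at a cost of $O(m^3)$ per decomposition for $m \times m$ covariances, and we used truncated SVD to stabilize the inverses. Finally, since the barycenter is a Gaussian, the optimal transport map from the Gaussian spatial content distribution to the barycenter (a mix of styles and content) is given in closed form as in Equation (2):

$$T^{W}_{c \to \bar{\nu}}(x) = \bar{\mu} + A_{c \to \bar{\nu}}\,(x - \mu_c), \qquad A_{c \to \bar{\nu}} = \Sigma_c^{-1/2}\big(\Sigma_c^{1/2}\, \bar{\Sigma}\, \Sigma_c^{1/2}\big)^{1/2}\Sigma_c^{-1/2}. \tag{9}$$

Finally, to obtain the stylized image that targets the mixed style, we decode back:

$$I = D\big(T^{W}_{c \to \bar{\nu}, \#}\, E(I_c)\big).$$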

Figure 2 gives an example of our approach for mixing content images with style images. We see that the Wasserstein barycenter captures not only the color distribution but also the details of the artistic style (for instance, Frida Kahlo's unibrow is captured smoothly in the transition between the Picasso self-portrait and Frida Kahlo).
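The fixed-point iteration for the Bures mean can be sketched in a few lines. The following NumPy code is a minimal illustration, not the author's implementation: the function names, the arithmetic-mean initialization, and the iteration count are assumptions. It iterates $\Sigma \leftarrow \Sigma^{-1/2}\big(\sum_j \lambda_j (\Sigma^{1/2}\Sigma_j\Sigma^{1/2})^{1/2}\big)^2\Sigma^{-1/2}$ and measures how well the result satisfies the Agueh and Carlier fixed-point equation.

```python
import numpy as np

def psd_power(S, p):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * np.clip(w, 1e-12, None) ** p) @ V.T

def bures_barycenter(covs, weights, n_iter=100):
    """Fixed-point iteration for the Wasserstein barycenter covariance."""
    S = sum(w * C for w, C in zip(weights, covs))  # assumed initialization
    for _ in range(n_iter):
        r, ir = psd_power(S, 0.5), psd_power(S, -0.5)
        M = sum(w * psd_power(r @ C @ r, 0.5) for w, C in zip(weights, covs))
        S = ir @ M @ M @ ir
    return S

def fixed_point_residual(S, covs, weights):
    """Max abs deviation from S = sum_j w_j (S^{1/2} C_j S^{1/2})^{1/2}."""
    r = psd_power(S, 0.5)
    rhs = sum(w * psd_power(r @ C @ r, 0.5) for w, C in zip(weights, covs))
    return np.abs(S - rhs).max()

# Commuting (diagonal) case: the barycenter is (sum_j w_j C_j^{1/2})^2.
diag_covs = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
diag_bary = bures_barycenter(diag_covs, [0.5, 0.5])

# General case with three random well-conditioned covariances.
rng = np.random.default_rng(0)
covs = []
for _ in range(3):
    X = rng.normal(size=(3, 3))
    covs.append(X @ X.T + np.eye(3))
weights = [0.2, 0.3, 0.5]
bary = bures_barycenter(covs, weights)
residual = fixed_point_residual(bary, covs, weights)
```

In the diagonal example the result is `diag(4, 9)`, i.e. the square of the averaged standard deviations, which differs from the arithmetic mean `diag(5, 10)` of the covariances.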

5.2 Style Interpolation with Fréchet Means

In the previous section we defined interpolations between the content and the style images. In this section we define a "novel style" via an interpolation of the style images only; we then map the content to the novel style using Gaussian optimal transport.

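From Wasserstein Barycenter to Fréchet Means on the PSD manifold. As discussed earlier, the Wasserstein barycenter of the Gaussian approximations of the spatial distributions of style images in CNN feature space can be written as:

$$\min_{\mu, \Sigma} \sum_{j=1}^{p} \lambda_j \Big(\|\mu - \mu_j\|_2^2 + d^2(\Sigma, \Sigma_j)\Big), \tag{10}$$

where the means are compared with the Euclidean metric and $d = B$ is the Bures metric. The Bures metric is a geodesic metric on the positive definite cone and has another representation as a Procrustes registration metric [24]:

$$B^2(\Sigma_1, \Sigma_2) = \min_{U \,:\, U U^{\top} = I} \big\|\Sigma_1^{1/2} - \Sigma_2^{1/2}\, U\big\|_F^2.$$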
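The Procrustes representation can be checked against the trace formula of the Bures metric. In this minimal NumPy sketch (function names and test matrices are illustrative assumptions), the optimal rotation is obtained from the SVD of $\Sigma_1^{1/2}\Sigma_2^{1/2}$, which is the standard solution of the orthogonal Procrustes problem.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a PSD matrix."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_sq_trace(S1, S2):
    """Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    r1 = psd_sqrt(S1)
    return float(np.trace(S1) + np.trace(S2) - 2.0 * np.trace(psd_sqrt(r1 @ S2 @ r1)))

def bures_sq_procrustes(S1, S2):
    """min over orthogonal U of ||S1^{1/2} - S2^{1/2} U||_F^2, solved by SVD."""
    r1, r2 = psd_sqrt(S1), psd_sqrt(S2)
    P, _, Qt = np.linalg.svd(r1 @ r2)
    U = Qt.T @ P.T  # maximizes Tr(S1^{1/2} S2^{1/2} U) over orthogonal U
    return float(np.sum((r1 - r2 @ U) ** 2)), U

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
S1, S2 = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)

via_trace = bures_sq_trace(S1, S2)
via_procrustes, U_opt = bures_sq_procrustes(S1, S2)
# Without alignment (U = I) the squared distance between square roots is never smaller.
unaligned = float(np.sum((psd_sqrt(S1) - psd_sqrt(S2)) ** 2))
```

The gap between `unaligned` and `via_procrustes` is the part of the discrepancy that the rotation removes.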

From this we see the advantage of the Wasserstein barycenter over, for example, using the Frobenius norm: the Bures metric aligns the square roots of the covariances using a rotation. We also see that by defining a new metric $d$ on covariances we can obtain different forms of interpolates. We keep the Euclidean metric on the means, hence we always use the arithmetic mean $\bar{\mu} = \sum_j \lambda_j \mu_j$. We give here different metrics that define different Fréchet means on the PSD manifold (see [25] and references therein).

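1) Arithmetic Mean: Solving Eq. (10) for $d(\Sigma_1, \Sigma_2) = \|\Sigma_1 - \Sigma_2\|_F$, we define the target style $\mathcal{N}(\bar{\mu}, \bar{\Sigma}_A)$, where $\bar{\Sigma}_A = \sum_j \lambda_j \Sigma_j$.
2) Harmonic Mean: Solving Eq. (10) for $d(\Sigma_1, \Sigma_2) = \|\Sigma_1^{-1} - \Sigma_2^{-1}\|_F$, we define the target style $\mathcal{N}(\bar{\mu}, \bar{\Sigma}_H)$, where $\bar{\Sigma}_H = \big(\sum_j \lambda_j \Sigma_j^{-1}\big)^{-1}$.
3) Fisher Rao Mean (Karcher or Geometric Mean). For $d(\Sigma_1, \Sigma_2) = \big\|\log\big(\Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2}\big)\big\|_F$, that is, the natural Riemannian metric or the Fisher Rao metric between centered Gaussians. Here $\log$ refers to the matrix logarithm. The Fisher Rao metric is a geodesic distance and its metric tensor is the Fisher information matrix. Solving Eq. (10) with the Fisher Rao metric we obtain the so-called Karcher mean $\bar{\Sigma}_K$ between PSD matrices, and we define the target style $\mathcal{N}(\bar{\mu}, \bar{\Sigma}_K)$.
In order to find the Karcher mean we use the manifold optimization techniques of [26] as follows. The manifold gradient update is:

$$\Sigma^{(k+1)} = \big(\Sigma^{(k)}\big)^{1/2} \exp\Big(\eta \sum_{j} \lambda_j \log\big(\big(\Sigma^{(k)}\big)^{-1/2}\, \Sigma_j\, \big(\Sigma^{(k)}\big)^{-1/2}\big)\Big) \big(\Sigma^{(k)}\big)^{1/2}. \tag{11}$$

We initialize $\Sigma^{(0)}$ as in the Wasserstein case and iterate for $K$ iterations with learning rate $\eta$.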
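The three Fréchet means listed above can be compared on a small example. The sketch below is a minimal NumPy illustration under stated assumptions: the function names, the arithmetic-mean initialization, the step size `lr = 0.5` and the iteration count are all choices made here, not values from the paper. The Karcher mean is computed with the manifold gradient update, using eigendecompositions for the matrix logarithm and exponential of symmetric matrices.

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    return (V * f(w)) @ V.T

def arithmetic_mean(covs, weights):
    return sum(w * C for w, C in zip(weights, covs))

def harmonic_mean(covs, weights):
    return np.linalg.inv(sum(w * np.linalg.inv(C) for w, C in zip(weights, covs)))

def karcher_direction(S, covs, weights):
    """sum_j w_j log(S^{-1/2} C_j S^{-1/2}); it vanishes at the Karcher mean."""
    ir = sym_fun(S, lambda w: 1.0 / np.sqrt(w))
    return sum(w * sym_fun(ir @ C @ ir, np.log) for w, C in zip(weights, covs))

def karcher_mean(covs, weights, n_iter=100, lr=0.5):
    S = arithmetic_mean(covs, weights)  # assumed initialization
    for _ in range(n_iter):
        r = sym_fun(S, np.sqrt)
        S = r @ sym_fun(lr * karcher_direction(S, covs, weights), np.exp) @ r
    return S

# Diagonal example: the Karcher mean is the elementwise geometric mean.
diag_covs, half = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])], [0.5, 0.5]
means = {
    "arithmetic": arithmetic_mean(diag_covs, half),
    "harmonic": harmonic_mean(diag_covs, half),
    "karcher": karcher_mean(diag_covs, half),
}

# General example: check stationarity of the computed Karcher mean.
rng = np.random.default_rng(0)
covs = []
for _ in range(3):
    X = rng.normal(size=(3, 3))
    covs.append(X @ X.T / 3 + np.eye(3))
weights = [0.2, 0.3, 0.5]
S_k = karcher_mean(covs, weights)
stationarity = np.abs(karcher_direction(S_k, covs, weights)).max()
```

On the diagonal example the three means are `diag(5, 10)` (arithmetic), `diag(1.8, 6.4)` (harmonic) and `diag(3, 8)` (Karcher), which shows how strongly the choice of metric changes the target covariance.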

Remark 1.

While we define here the barycenter style for each metric as a Gaussian, the Wasserstein barycenter is the only one that is guaranteed to be Gaussian [13].

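Mapping a content image to a target novel style. Given the new style $\mathcal{N}(\bar{\mu}, \bar{\Sigma})$, where $\bar{\Sigma}$ is one of the Fréchet means defined above (Bures, arithmetic, harmonic or Karcher), we stylize a content image using Gaussian optimal transport as described in the paper:

$$I = D\big(T^{W}_{c \to \mathcal{N}(\bar{\mu}, \bar{\Sigma}), \#}\, E(I_c)\big).$$

Our approach is summarized in Algorithms 1 and 2, given in the Appendix.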

6 Related works

OT for Style Transfer and Image Coloring. Color transfer between images using regularized optimal transport on the color distribution of images (RGB for example) was studied and applied in [27]. The color distribution is not Gaussian and hence the OT problem has to be solved using regularization. Optimal transport for style transfer using the spatial distribution in the feature space of a deep CNN was also explored in [28, 29]. [28] uses the Wasserstein distance between Gaussians as content and style loss and optimizes it in an end-to-end fashion similar to [1, 2]. [29] uses an approximation of the Wasserstein distance as a loss that is also optimized in an end-to-end fashion. Neither approach allows universal style transfer, and an optimization is needed for every style/content image pair.

Wasserstein Barycenter for Texture Mixing. Similarly to our approach for Wasserstein mixing in an encoder/decoder framework, [30] uses the wavelet transform to encode textures, applies Wasserstein barycenters on the wavelet coefficients, and then decodes back using the inverse wavelet transform to synthesize a novel mixed texture. The Wasserstein barycenter problem there has to be solved exactly, and the Gaussian approximation cannot be used since the distribution of wavelet coefficients is not Gaussian. A special model for Gaussian texture mixing was developed in [21]. The advantage of using features of a CNN is that the Gaussian lower bound of the Wasserstein distance seems to be tight.

Other Approaches to Style Transfer. While our focus in this paper is on OT metrics for style transfer, other approaches exist (see [31] for a review) and have used different types of losses such as the MRF loss [32], the MMD loss [33], the GAN loss [5] and the cycle GAN loss [34].

7 Experiments

In order to test our approach of geometric mixing of styles we use the WCT framework [11], with a pyramid of encoders/decoders $(E_r, D_r)$ at different spatial resolutions $r$, ranging from the coarsest to the finest resolution. Following WCT, we use a coarse-to-fine approach to style transfer as follows. Given interpolation weights $\lambda_1, \dots, \lambda_p$, we start at the coarsest resolution $r$ with the content image $I_c$:

  1. We encode all style images at resolution $r$: $E_r(I_{s_j})$, $j = 1, \dots, p$. We define the mixed style at resolution $r$ using one of the mixing strategies (Fréchet means) of Section 5, using Algorithms 1 and 2.

  2. We find the Wasserstein transport map at resolution $r$ between the content and the novel style and compute the transformed features.

  3. We decode the novel image at resolution $r$ with $D_r$.

  4. We set the content image to this decoded image, move $r$ to the next finer resolution, and go to step 1 until reaching the finest resolution.

The stylized output of this procedure is the image decoded at the finest resolution. We also experimented in the Appendix with applying the same approach in a fine-to-coarse way, starting from the higher-resolution encoder and moving to the lower-resolution one. We show in Figure 3 the output of our mixing strategy using two of the geodesic metrics, namely the Wasserstein and Fisher Rao barycenters. We give as a baseline the AdaIN output for this example (the same example was given in [10]; we reproduce it using their available code). We show that using geodesic metrics to define the mixed style successfully captures the subtle details of the different styles. More examples, comparisons to the literature, and other types of mixing can be found in the Appendix.

Figure 3: (Panel 1): Wasserstein barycenter interpolation between a content image given above and four target style images given at the corners of the square. Each image in the square corresponds to a set of interpolation weights defined on a grid on the square. (Panel 2): Fisher Rao interpolation between the same content image given above and the same four target style images given at the corners of the square. In both cases Gaussian Wasserstein transport plans are used to map the image to the novel mixed style in the feature space, and the final image is obtained using the decoder. (Panel 3): the AdaIN baseline, which we showed performs a diagonal approximation, fails at capturing the subtle details of the style of the target images. Both the Wasserstein and Fisher Rao approaches are successful. We notice that while the Wasserstein barycenter is color dominant in defining the new style, the Fisher Rao barycenter captures more of the strokes and better captures color variations in the novel artistic styles. We note that the Wasserstein interpolation is smoother than the Fisher Rao one as we change the interpolation weights. (The figure is better seen in color and zoomed in; see the Appendix for a full-resolution version.)
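The coarse-to-fine procedure above can be sketched as a short loop. The code below is a toy NumPy illustration under loudly stated assumptions: the "encoders" and "decoders" are hypothetical invertible linear maps standing in for the VGG pyramid, images are represented directly as feature matrices, and the arithmetic mean of covariances is used as the mixing strategy purely for brevity (any Fréchet mean could be passed as `mix_cov`). All names are introduced here for illustration.

```python
import numpy as np

def psd_power(S, p, eps=1e-10):
    """Power of a symmetric PSD matrix; eigenvalues below eps are truncated."""
    w, V = np.linalg.eigh((S + S.T) / 2)
    wp = np.where(w > eps, np.clip(w, eps, None) ** p, 0.0)
    return (V * wp) @ V.T

def moments(F):
    mu = F.mean(axis=0)
    Fc = F - mu
    return mu, Fc.T @ Fc / len(F)

def arithmetic_mix(covs, weights):
    return sum(w * S for w, S in zip(weights, covs))

def transport(F, mu_t, S_t):
    """Gaussian OT map from the statistics of F to the target N(mu_t, S_t)."""
    mu_c, S_c = moments(F)
    r, ir = psd_power(S_c, 0.5), psd_power(S_c, -0.5)
    A = ir @ psd_power(r @ S_t @ r, 0.5) @ ir
    return mu_t + (F - mu_c) @ A

def mixed_style(styles, weights, encode, mix_cov):
    stats = [moments(encode(s)) for s in styles]
    mu_bar = sum(w * mu for w, (mu, _) in zip(weights, stats))
    return mu_bar, mix_cov([S for _, S in stats], weights)

def coarse_to_fine(content, styles, weights, levels, mix_cov=arithmetic_mix):
    """levels: list of (encode, decode) pairs ordered from coarse to fine."""
    image = content
    for encode, decode in levels:
        mu_bar, S_bar = mixed_style(styles, weights, encode, mix_cov)  # step 1
        features = transport(encode(image), mu_bar, S_bar)              # step 2
        image = decode(features)                                        # steps 3-4
    return image

def linear_level(W):
    """Hypothetical invertible 'encoder/decoder' pair used only for this toy example."""
    W_inv = np.linalg.inv(W)
    return (lambda x: x @ W), (lambda f: f @ W_inv)

rng = np.random.default_rng(0)
levels = [linear_level(np.eye(4) + 0.1 * rng.normal(size=(4, 4))) for _ in range(3)]
content = rng.normal(size=(300, 4))
styles = [rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4)) + k for k in range(2)]
weights = [0.3, 0.7]
output = coarse_to_fine(content, styles, weights, levels)
```

In this toy setting the encoded output at the last level has exactly the mixed-style mean and covariance; with real, only approximately invertible encoder/decoder networks this would hold only approximately.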

8 Discussion and Conclusion

We conclude this paper with the following three observations on the spatial distribution of features in a deep convolutional neural network:

  1. The success of Gaussian optimal transport between spatial distributions of deep CNN features that we demonstrated in this paper suggests that the network learned to "gaussianize" the space. Gaussianization [35] is a principle in unsupervised learning. It will be interesting to further study this Gaussianity hypothesis and to see whether Gaussianization can be used as a regularizer for learning deep CNNs or as an objective in self-supervised learning.

  2. We showed that many of the spatial normalization layers used in deep learning, such as Instance Normalization [7] and related variants, can be understood as approximations of Gaussian optimal transport. When used in an architecture between layers, the normalization layer acts like a transport map between the spatial distributions of consecutive layers. We hope this angle will help in developing new normalization layers and in better understanding the existing ones.

  3. Geodesic metrics such as the Wasserstein and the Fisher Rao metrics allow better nonlinear interpolation in feature space.

References

  • [1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, 2016.
  • [2] Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, Aaron Hertzmann, and Eli Shechtman. Controlling perceptual factors in neural style transfer. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, 2017.
  • [3] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. Lecture Notes in Computer Science, 2016.
  • [4] Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor S. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. In ICML, 2016.
  • [5] Chuan Li and Michael Wand. Precomputed real-time texture synthesis with markovian generative adversarial networks. In ECCV (3), 2016.
  • [6] Xin Wang, Geoffrey Oxholm, Da Zhang, and Yuan-Fang Wang. Multimodal transfer: A hierarchical deep convolutional neural network for fast artistic style transfer. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [7] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [8] Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. A learned representation for artistic style. 2017.
  • [9] Tian Qi Chen and Mark Schmidt. Fast patch-based style transfer of arbitrary style, 2016.
  • [10] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
  • [11] Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Universal style transfer via feature transforms. In Advances in Neural Information Processing Systems 30. 2017.
  • [12] Asuka Takatsu. Wasserstein geometry of gaussian measures. Osaka J. Math., 2011.
  • [13] Martial Agueh and Guillaume Carlier. Barycenters in the wasserstein space. SIAM J. Math. Analysis, 43, 2011.
  • [14] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR arXiv:1409.1556, 2014.
  • [15] Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik, 2000.
  • [16] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in neural information processing systems, pages 2292–2300, 2013.
  • [17] Gabriel Peyré and Marco Cuturi. Computational optimal transport. Technical report, 2017.
  • [18] Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya, and Tomaso A Poggio. Learning with a wasserstein loss. In Advances in Neural Information Processing Systems, pages 2053–2061, 2015.
  • [19] J. Feydy, T. Séjourné, F.-X. Vialard, S.-i. Amari, A. Trouvé, and G. Peyré. Interpolating between Optimal Transport and MMD using Sinkhorn Divergences. ArXiv e-prints, 2018.
  • [20] J. A. Cuesta-Albertos, C. Matrán-Bea, and A. Tuero-Diaz. On lower bounds for the L2-Wasserstein metric in a Hilbert space. Journal of Theoretical Probability, 1996.
  • [21] Gui-Song Xia, Sira Ferradans, Gabriel Peyré, and Jean-François Aujol. Synthesizing and mixing stationary gaussian texture models. SIAM J. Imaging Sciences, 2014.
  • [22] Robert J. McCann. A convexity principle for interacting gases. Advances in Mathematics, 1997.
  • [23] Pedro C. Álvarez Esteban, E. del Barrio, J.A. Cuesta-Albertos, and C. Matrán. A fixed-point approach to barycenters in wasserstein space. http://arxiv.org/pdf/1511.05355.
  • [24] Valentina Masarotto, Victor M Panaretos, and Yoav Zemel. Procrustes metrics on covariance operators and optimal transportation of gaussian processes. Sankhya A, 2018.
  • [25] Rajendra Bhatia. The Riemannian Mean of Positive Matrices. Springer Berlin Heidelberg, 2013.
  • [26] Hongyi Zhang and Suvrit Sra. First-order methods for geodesically convex optimization. In COLT, 2016.
  • [27] Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré, and Jean-Francois Aujol. Regularized discrete optimal transport. Scale Space and Variational Methods in Computer Vision, 2013.
  • [28] Style transfer as optimal transport. https://github.com/VinceMarron/style_transfer.
  • [29] Nicholas Kolkin, Jason Salavon, and Greg Shakhnarovich. Style transfer by relaxed optimal transport and self-similarity, 2019.
  • [30] Julien Rabin, Gabriel Peyré, Julie Delon, and Marc Bernot. Wasserstein barycenter and its application to texture mixing. In Proceedings of the Third International Conference on Scale Space and Variational Methods in Computer Vision, SSVM’11, 2012.
  • [31] Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, and Mingli Song. Neural style transfer: A review. CoRR.
  • [32] Chuan Li and Michael Wand. Combining markov random fields and convolutional neural networks for image synthesis. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2016.
  • [33] Yanghao Li, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. Demystifying neural style transfer. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017.
  • [34] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.
  • [35] Scott Saobing Chen and Ramesh A. Gopinath. Gaussianization. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 423–429. MIT Press, 2001.

Supplementary Material for Wasserstein Style Transfer

Appendix A Algorithms

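  Inputs: style images $\{I_{s_j}\}_{j=1}^{p}$, content image $I_c$, interpolation weights $\{\lambda_j\}$, metric $d$, encoder/decoder $(E, D)$.
  Encode: $\nu_c = E(I_c)$, $\nu_{s_j} = E(I_{s_j})$, $j = 1, \dots, p$.
  Statistics: means and covariances $(\mu_c, \Sigma_c)$ of $\nu_c$ and $(\mu_j, \Sigma_j)$ of $\nu_{s_j}$.
  Content/Style or Style only: if content/style, add $(\mu_c, \Sigma_c)$ to the statistics to be mixed; else pass.
  Target Bary Mean: $\bar{\mu} = \sum_j \lambda_j \mu_j$.
  Target Bary Covariance: $\bar{\Sigma} = $ FRECHET MEAN$(\{\Sigma_j\}, \{\lambda_j\}, d)$.
  Novel Style: $\mathcal{N}(\bar{\mu}, \bar{\Sigma})$.
  Gaussian OT Content to Target: compute the map $T^{W}$ from $\mathcal{N}(\mu_c, \Sigma_c)$ to $\mathcal{N}(\bar{\mu}, \bar{\Sigma})$ given in Eq. (9).
  Decode: $I = D(T^{W}_{\#}\, \nu_c)$.
Algorithm 1 Fréchet Mean Style Interpolation and Content Stylization

  Initialize: $\Sigma^{(0)}$.
  if $d$ is the Bures metric: find $\bar{\Sigma}$ using $K$ iterations of Eq. (8).
  if $d$ is the Fisher Rao metric: find $\bar{\Sigma}$ using $K$ iterations of Eq. (11).
  if $d$ is the Frobenius metric (arithmetic mean): $\bar{\Sigma} = \sum_j \lambda_j \Sigma_j$.
  if $d$ is the harmonic metric: $\bar{\Sigma} = \big(\sum_j \lambda_j \Sigma_j^{-1}\big)^{-1}$.
Algorithm 2 FRECHET MEAN($\{\Sigma_j\}$, $\{\lambda_j\}$, $d$)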

Appendix B Examples of Interpolating Content and Styles with Wasserstein Barycenter and Optimal Transport

In Figures 4, 5 and 6 we show examples of interpolations of content images with style images. We used in this experiment a coarse-to-fine approach, i.e. starting from matching the upper layers of VGG and moving to the lower layers.

Figure 4: Wasserstein barycenters for style mixing and transfer. The content image on the right corner of the triangle is mixed with the two style images. Each image in the triangle corresponds to a set of interpolation weights defined by proximity to the content or style images in the triangle.
Figure 5: Wasserstein barycenters for style mixing and transfer. The content image on the right corner of the triangle is mixed with the two style images. Each image in the triangle corresponds to a set of interpolation weights defined by proximity to the content or style images in the triangle.
Figure 6: Wasserstein barycenters for style mixing and transfer. The content image on the right corner of the triangle is mixed with the two style images. Each image in the triangle corresponds to a set of interpolation weights defined by proximity to the content or style images in the triangle.

Appendix C Mixing Styles with Frechet Means and Optimal Transport Style Transfer

Coarse to Fine.

We give results of different mixing strategies and content stylization in a coarse-to-fine procedure as follows: Wasserstein mixing in Figure 8; Fisher Rao mixing in Figure 9; arithmetic mixing, which is close to the WCT baseline [11], in Figure 10; harmonic mixing in Figure 11; AdaIN mixing in Figure 12. We also give another set of results for Wasserstein barycenter mixing in Figure 15, Fisher Rao mixing in Figure 16 and AdaIN in Figure 17.

Fine to Coarse.

We compare the WCT mixing baseline [11] and Wasserstein mixing in a fine-to-coarse strategy (from the lower layers to the upper layers); results are given in Figure 13 and Table 1.

Figure 7: Content image. We give results of different mixing strategies and content stylization in a coarse-to-fine procedure as follows: Wasserstein mixing in Figure 8; Fisher Rao mixing in Figure 9; arithmetic mixing, which is close to the WCT baseline [11], in Figure 10; harmonic mixing in Figure 11; AdaIN mixing in Figure 12. The style images are given on the four corners of each square.
Figure 8: Coarse-to-fine style transfer. The content image is given in Figure 7. Wasserstein barycenter mixing of the styles (the four images in the corners of the square) and Wasserstein transport of the content image to the novel style defined by the Wasserstein barycenter, for various interpolation weights. The stylized image is generated by following a coarse-to-fine scheme.
Figure 9: Coarse-to-fine generation. Karcher (Fisher Rao) barycenter mixing of the styles and Wasserstein transport of the content image to the novel style defined by the Fisher Rao barycenter. The stylized image is generated following a coarse-to-fine scheme.
Figure 10: Coarse-to-fine style transfer. Arithmetic (Euclidean) barycenter mixing of covariances and Wasserstein transport. This is similar to a WCT type of mixing. The stylized image is similar to the one obtained with Wasserstein barycenter mixing; nevertheless, a closer look shows subtle differences. This hints at the fact that the coarse layers are almost diagonal.
Figure 11: Coarse-to-fine style transfer. Harmonic barycenter mixing of covariances and Wasserstein transport. Harmonic mixing has saturation problems and does not produce good results.
Figure 12: Adaptive Instance Normalization mixing baseline [10].
Figure 13: Fine-to-coarse style transfer. The content image is given in Figure 7. Wasserstein barycenter mixing of the styles (the four images in the corners of the square) and Wasserstein transport of the content image to the novel style defined by the Wasserstein barycenter, for various interpolation weights. The stylized image is generated by following a fine-to-coarse scheme.
Table 1: Fine-to-coarse style transfer. Results of arithmetic mean mixing (WCT mixing [11]). We see here that the coloring puts a lot of black shadow over the face, unlike the Wasserstein barycenter mixing approach in Figure 13.
Figure 14: Content image (a photo of the Hotel Dieu painted by Van Gogh, in the rightmost corner of the square). Stylizations with a mixture of four styles, including the Van Gogh painting, are given in Figure 15 for Wasserstein mixing and Figure 16 for Fisher Rao mixing. Figure 17 is the AdaIN baseline.
Figure 15: Coarse-to-fine style transfer. Wasserstein barycenter mixing and Wasserstein transport. The content image is given in Figure 14. The four styles are on the four corners of the square.
Figure 16: Coarse-to-fine style transfer. Fisher Rao mixing and Wasserstein transport.
Figure 17: Adaptive Instance Normalization mixing baseline [10].