Image Harmonization with Attention-based Deep Feature Modulation
Guoqing Hao
Satoshi Iizuka
Kazuhiro Fukui
[Paper]
[GitHub]

Abstract

We present a learning-based approach for image harmonization, which adjusts the appearance of the foreground to make it compatible with the background. We improve realism by adjusting the high-level feature statistics of the foreground according to those of the background, motivated by the fact that specific image statistics of the foreground and background typically match in realistic composite images. Based on a fully convolutional network, we propose a novel attention-based module that aligns the standard deviation of the foreground features with that of the background features, capturing global dependencies in the entire image. This module is easily inserted into any type of convolutional neural network, and improves the harmony of the composites with only a small additional computational cost. Experimental results on the image harmonization dataset and real composite images show that our method outperforms existing methods both quantitatively and qualitatively. Furthermore, in our experiments, our module is able to boost existing harmonization networks by simply inserting it into intermediate layers of those networks.


Presentation


[Slides]

Network Architecture

The model takes a composite image and a foreground mask as input. An attention-based foreground-background feature map modulation layer performs the modulation in feature space. This layer modulates the feature map of the foreground according to the similarity-weighted background features. The output of the network is a harmonized image.
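The input handling described above could look roughly like the following NumPy sketch. It is illustrative only and not taken from the released code: the helper names `build_network_input` and `downsample_mask` are hypothetical, and both the channel-wise concatenation of image and mask and the average pooling of the mask to feature resolution are assumptions about how such a network is commonly fed.

```python
import numpy as np

def build_network_input(composite, mask):
    """Stack an RGB composite (H, W, 3) and a mask (H, W) into (H, W, 4).

    Assumption: the mask is passed to the network as a fourth channel.
    """
    composite = np.asarray(composite, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    if composite.shape[:2] != mask.shape:
        raise ValueError("composite and mask must share height and width")
    return np.concatenate([composite, mask[..., None]], axis=-1)

def downsample_mask(mask, factor):
    """Average-pool the mask by an integer factor.

    Assumption: the mask must match the spatial size of an intermediate
    feature map before foreground and background can be separated there.
    """
    mask = np.asarray(mask, dtype=np.float32)
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by factor")
    # Split each axis into (blocks, factor) and average inside each block.
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
```

For an 8x8 composite, `build_network_input` returns an array of shape (8, 8, 4), and `downsample_mask(mask, 4)` returns a 2x2 mask that can be thresholded to index a feature map at one quarter of the input resolution.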


Deep Feature Modulation Layer

This module first uses a self-attention block to compute non-local information from the entire image. The foreground and background regions are then separated using the corresponding mask, and the standard deviation of the foreground features is modulated according to that of the similarity-weighted background features. This last step is the feature map modulation (FMM).
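As a rough illustration of the steps above, the following NumPy sketch rescales foreground features so that their per-channel standard deviation follows that of attention-weighted background features. It is a simplified sketch under stated assumptions, not the authors' implementation: the function name `modulate_foreground` is hypothetical, the attention uses raw dot products with no learned projections, and the self-attention step is folded into a single foreground-to-background similarity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modulate_foreground(features, mask, eps=1e-5):
    """Rescale foreground features using background statistics.

    features: array of shape (C, H, W)
    mask:     array of shape (H, W), 1 = foreground, 0 = background
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w).T.astype(np.float64)  # (N, C)
    fg = mask.reshape(-1) > 0.5
    bg = ~fg
    if not fg.any() or not bg.any():
        return features.copy()  # nothing to modulate

    # Similarity of every foreground position to every background position.
    # Assumption: plain scaled dot products stand in for learned attention.
    scores = x[fg] @ x[bg].T / np.sqrt(c)   # (N_fg, N_bg)
    attn = softmax(scores, axis=1)

    # Similarity-weighted background feature for each foreground position.
    weighted_bg = attn @ x[bg]              # (N_fg, C)

    # Per-channel standard deviations of both feature sets.
    sigma_fg = x[fg].std(axis=0)
    sigma_bg = weighted_bg.std(axis=0)

    # Assumed form of the modulation: rescale the foreground only.
    out = x.copy()
    out[fg] = x[fg] * (sigma_bg / (sigma_fg + eps))
    return out.T.reshape(c, h, w)
```

Background positions pass through unchanged, and each foreground channel is multiplied by a single scale factor, so the output keeps the input shape and the layer could in principle sit between existing convolutional layers.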


Paper

G. Hao, S. Iizuka, K. Fukui
Image Harmonization with Attention-based Deep Feature Modulation
In BMVC, 2020.
(hosted on BMVC)


[Bibtex]


Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.