Original Feature Preserving Virtual Try-on Network Based on Receptive Field Block

Zhaoyang WANG, Ran TAO, Hailun LU

Journal of Donghua University (English Edition) ›› 2024, Vol. 41 ›› Issue (01): 28-36. DOI: 10.19884/j.1672-5220.202211004
Special Topic: Artificial Intelligence on Fashion and Textiles


Abstract

Computer vision-based virtual try-on (VITON) technology warps try-on clothing according to the features of a model image and composites it into that image to replace the originally worn clothing. Current VITON methods face two main challenges: insufficient preservation of the original features of the model image, such as the head, the lower body, and the background, and poor matching between the warped try-on clothing and the model image. To solve these two problems, an original feature preserving virtual try-on network (OFP-VTON) is proposed, which consists of semantic segmentation map generation, try-on clothing warping, and try-on image synthesis. In the try-on clothing warping phase, the network learns the mapping by which the clothing worn in the model image is warped, so as to better constrain the warping of the try-on clothing. In the try-on image synthesis phase, the original features of the model image are extracted and preserved, and a receptive field block (RFB) is introduced to preserve the features of the try-on clothing as much as possible. Qualitative and quantitative experiments on the publicly available VITON dataset show that the proposed OFP-VTON preserves original features better and that the warped try-on clothing matches the model image more closely than in the baseline method.
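The paper's implementation is not reproduced here, so the sketch below is only a minimal PyTorch-style illustration of a generic receptive field block in the spirit of Liu and Huang's RFB design: parallel convolution branches with increasing dilation rates enlarge the effective receptive field, and their outputs are fused by a 1×1 convolution over a shortcut connection. The class name, branch count, channel widths, and dilation rates are illustrative assumptions, not the exact OFP-VTON configuration.

```python
import torch
import torch.nn as nn


class ReceptiveFieldBlock(nn.Module):
    """Minimal multi-branch receptive field block (sketch after Liu and Huang's RFB).

    Each branch pairs small convolutions with a dilated 3x3 convolution, so the
    branches see the input at different effective receptive-field sizes; their
    outputs are concatenated, fused by a 1x1 convolution, and added to a shortcut.
    Branch widths and dilation rates here are illustrative assumptions only.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        mid = out_ch // 4
        # Branch 1: 1x1 reduction, then 3x3 convolution with dilation 1.
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, dilation=1),
        )
        # Branch 2: 1x1 then 3x3, followed by a 3x3 convolution with dilation 3.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=3, dilation=3),
        )
        # Branch 3: 5x5 receptive field (two stacked 3x3s), then dilation 5.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=5, dilation=5),
        )
        # Branch 4: plain 1x1 convolution keeps fine local detail.
        self.branch4 = nn.Conv2d(in_ch, mid, 1)
        # 1x1 fusion of the concatenated branches, plus a projected shortcut.
        self.fuse = nn.Conv2d(4 * mid, out_ch, 1)
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x)],
            dim=1,
        )
        return self.relu(self.fuse(feats) + self.shortcut(x))


if __name__ == "__main__":
    # Sanity check: a 256-channel feature map keeps its spatial size.
    rfb = ReceptiveFieldBlock(256, 256)
    y = rfb(torch.randn(1, 256, 32, 32))
    print(y.shape)  # torch.Size([1, 256, 32, 32])
```

Because the dilated branches aggregate context at several scales within a single layer, a block of this kind can widen the receptive field of the synthesis network without extra downsampling, which is consistent with the abstract's stated goal of retaining try-on clothing features.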

Keywords

virtual try-on (VITON) / deep learning / receptive field block (RFB) / original feature preserving

Cite this article

Zhaoyang WANG, Ran TAO, Hailun LU. Original Feature Preserving Virtual Try-on Network Based on Receptive Field Block. Journal of Donghua University (English Edition), 2024, 41(01): 28-36. https://doi.org/10.19884/j.1672-5220.202211004

Funding
National Key Research and Development Program of China (2020YFB1707700); Fundamental Research Funds for the Central Universities, China (20D111201)