Application of the Improved PF-Flow-Style-VTON in Virtual Try-On

Jiajia TIAN , Rong HUANG , Aihua DONG , Zhijie WANG

Journal of Donghua University(English Edition) ›› 2026, Vol. 43 ›› Issue (1): 104-117. DOI: 10.19884/j.1672-5220.202412002

Artificial Intelligence on Fashion and Textiles

Abstract

During the image generation phase, the parser-free Flow-Style-VTON model (PF-Flow-Style-VTON), which utilizes distilled appearance flows, faces two main challenges: blurring, deformation, occlusion, or loss of the arm or palm regions in the generated image when these regions of the person occlude the garment; and blurring and deformation in the generated image when the person performs large pose movements and the target garment is complex with detailed patterns. To solve these two problems, an improved virtual try-on network model, denoted as IPF-Flow-Style-VTON, is proposed. Firstly, a target warped garment mask refinement module (M-RM) is introduced to refine the warped garment mask and remove erroneous information in the arm and palm regions, thereby improving the quality of subsequent image generation. Secondly, an improved global attention module (GAM) is integrated into the original image generation network, enhancing the ResUNet's understanding of global context and optimizing the fusion of local features and global information, thereby further improving image generation quality. Finally, the UniPose model is used to provide pose keypoint information for the target person image, guiding task execution during the image generation phase. Experiments conducted on the VITON dataset show that the proposed method outperforms the original Flow-Style-VTON by 5.4%, 0.3%, 6.7%, and 2.2% in Fréchet inception distance (FID), structural similarity index measure (SSIM), learned perceptual image patch similarity (LPIPS), and peak signal-to-noise ratio (PSNR), respectively. Overall, the proposed method effectively remedies the shortcomings of the original network and achieves better visual results.
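The abstract describes an improved global attention module (GAM) integrated into the ResUNet generator, building on the global attention mechanism of Liu et al. [14], which reweights features with sequential channel and spatial attention. As a rough orientation, below is a minimal PyTorch sketch of that baseline mechanism under stated assumptions: the specific "improved" modifications are not given in this abstract, and the class name, `reduction` ratio, and tensor shapes are illustrative choices rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Minimal sketch of the global attention mechanism of Liu et al. [14].

    Channel attention (a per-position MLP over the channel axis) is applied
    first, followed by spatial attention (two 7x7 convolutions). The paper's
    "improved" GAM is not detailed in the abstract, so only the baseline is
    shown; `reduction` = 4 is an illustrative choice.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        # Channel attention: permute (B, C, H, W) -> (B, H, W, C) so the MLP
        # mixes channels while retaining per-position spatial information.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial attention: 7x7 convolutions with a channel bottleneck.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        att = self.channel_mlp(x.permute(0, 2, 3, 1))   # (B, H, W, C)
        x = x * torch.sigmoid(att.permute(0, 3, 1, 2))  # channel reweighting
        x = x * torch.sigmoid(self.spatial(x))          # spatial reweighting
        return x

# Usage sketch: reweight a hypothetical decoder feature map of the generator.
feat = torch.randn(2, 64, 32, 32)
out = GAM(64)(feat)
assert out.shape == feat.shape
```

In a ResUNet-style generator, such a block would plausibly sit after a residual stage so that the attention reweights the fused skip-connection features; where exactly the authors insert it, and how they modify it, is left to the full paper.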

Keywords

virtual try-on / image generation network / pose keypoint / deep learning

Cite this article

Jiajia TIAN, Rong HUANG, Aihua DONG, Zhijie WANG. Application of the Improved PF-Flow-Style-VTON in Virtual Try-On. Journal of Donghua University(English Edition), 2026, 43(1): 104-117. DOI: 10.19884/j.1672-5220.202412002


References

[1]

WEI X X, ZHAO J, XU Z B. Present situation and development trend of virtual fitting technology based on deep learning[J]. Journal of Donghua University(Natural Science), 2022, 48(3):131-138.(in Chinese)

[2]

TAN Z L, BAI J, CHEN R, et al. FP-VTON:attention-based feature preserving virtual try-on network[J]. Computer Engineering and Applications, 2022, 58(23):186-196.(in Chinese)

[3]

WANG Q, JAGADEESH V, RESSLER B, et al. Im2Fit:fast 3D model fitting and anthropometrics using single consumer depth camera and synthetic data[EB/OL].(2014-11-19)[2025-10-22]. https://doi.org/10.48550/arXiv.1410.0745.

[4]

BOGO F, KANAZAWA A, LASSNER C, et al. Keep it SMPL:automatic estimation of 3D human pose and shape from a single image[C]//Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016:561-578.

[5]

XU A J, ZHOU J. Research on key technologies of virtual fitting system[J]. Progress in Textile Science & Technology, 2020(3):28-32.(in Chinese)

[6]

WANG Z Y, TAO R, LU H L, et al. Original feature preserving virtual try-on network based on receptive field block[J]. Journal of Donghua University(English Edition), 2024, 41(1):28-36.

[7]

YU L, ZHONG Y Q. Two-dimensional virtual try-on based on clothing transfer[J]. Wool Textile Journal, 2021, 49(4):7-12.(in Chinese)

[8]

WANG Q L, XU Z B, YANG S. 3D garment modeling based on scattered point cloud[J]. Journal of Clothing Research, 2021, 6(4):366-373.(in Chinese)

[9]

WANG Z C, HUANG R, DONG A H, et al. Normalized restoration of clothing images for virtual try-on oriented towards open scenes[J]. Journal of Donghua University(Natural Science), 2024, 50(6):133-139.(in Chinese)

[10]

GE Y Y, SONG Y B, ZHANG R M, et al. Parser-free virtual try-on via distilling appearance flows[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2021:8481-8489.

[11]

ALBAHAR B, LU J W, YANG J M, et al. Pose with style[J]. ACM Transactions on Graphics, 2021, 40(6):1-11.

[12]

HE S, SONG Y Z, XIANG T. Style-based global appearance flow for virtual try-on[C]// 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2022:3460-3469.

[13]

ZHOU T H, TULSIANI S, SUN W L, et al. View synthesis by appearance flow[C]//Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016:286-301.

[14]

LIU Y, SHAO Z, HOFFMANN N. Global attention mechanism:retain information to enhance channel-spatial interactions[EB/OL].(2021-12-10)[2025-06-12]. https://doi.org/10.48550/arXiv.2112.05561.

[15]

HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2016:770-778.

[16]

ARTACHO B, SAVAKIS A. UniPose:unified human pose estimation in single images and videos[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2020:7033-7042.

[17]

YANG H Z, GUO N. Overview of image-based virtual fitting:from deep learning to diffusion model[J]. Computer Engineering and Applications, 2024, 60(2):1-21.(in Chinese)

[18]

GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[EB/OL].(2014-07-10)[2025-11-30]. https://doi.org/10.48550/arXiv.1406.2661.

[19]

HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[J]. Advances in Neural Information Processing Systems, 2020, 33:6840-6851.

[20]

JETCHEV N, BERGMANN U. The conditional analogy GAN:swapping fashion articles on people images[C]// 2017 IEEE International Conference on Computer Vision Workshops(ICCVW). New York: IEEE, 2017:2287-2292.

[21]

ZHU L Y, YANG D W, ZHU T, et al. TryOnDiffusion:a tale of two UNets[C]// 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2023:4606-4615.

[22]

KIM J, GU G, PARK M, et al. StableVITON:learning semantic correspondence with latent diffusion model for virtual try-on[C]// 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2024:8176-8185.

[23]

HAN X T, WU Z X, WU Z, et al. VITON:an image-based virtual try-on network[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018:7543-7552.

[24]

CAO Z, SIMON T, WEI S H, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2017:1302-1310.

[25]

GONG K, LIANG X D, ZHANG D Y, et al. Look into person:self-supervised structure-sensitive learning and a new benchmark for human parsing[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2017:6757-6765.

[26]

WOOD S N. Thin plate regression splines[J]. Journal of the Royal Statistical Society:Series B(Statistical Methodology), 2003, 65(1):95-114.

[27]

WANG B C, ZHENG H B, LIANG X D, et al. Toward characteristic-preserving image-based virtual try-on network[C]//Computer Vision-ECCV 2018. Cham: Springer International Publishing, 2018:607-623.

[28]

HAN X T, HUANG W L, HU X J, et al. ClothFlow:a flow-based model for clothed person generation[C]// 2019 IEEE/CVF International Conference on Computer Vision(ICCV). New York: IEEE, 2019:10470-10479.

[29]

ISSENHUTH T, MARY J, CALAUZÈNES C. Do not mask what you do not need to mask:a parser-free virtual try-on[C]//Computer Vision-ECCV 2020. Cham: Springer International Publishing, 2020:619-635.

[30]

HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[EB/OL].(2015-03-09)[2025-11-30]. https://doi.org/10.48550/arXiv.1503.02531.

[31]

JOHNSON J, ALAHI A, LI F F. Perceptual losses for real-time style transfer and super-resolution[C]//Computer Vision-ECCV 2016. Cham: Springer International Publishing, 2016:694-711.

[32]

DENG J, DONG W, SOCHER R, et al. ImageNet:a large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. NewYork: IEEE, 2009:248-255.

[33]

SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL].(2015-04-10)[2025-11-30]. https://doi.org/10.48550/arXiv.1409.1556.

[34]

KINGMA D P, BA J. Adam:a method for stochastic optimization[EB/OL].(2015-01-30). https://doi.org/10.48550/arXiv.1412.6980.

[35]

HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. Advances in Neural Information Processing Systems, 2017, 30:6626-6637.

[36]

WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment:from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4):600-612.

[37]

SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[J]. Advances in Neural Information Processing Systems, 2016, 29:1-8.

[38]

ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018:586-595.

[39]

MINAR M R, TUAN T T, AHN H, et al. CP-VTON+:clothing shape and texture preserving image-based virtual try-on[C]// CVPR Workshops. New York: IEEE, 2020, 3:10-14.

[40]

YANG H, ZHANG R M, GUO X B, et al. Towards photo-realistic virtual try-on by adaptively generating-preserving image content[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2020:7847-7856.

[41]

HE C, LIU R, et al. Image-based virtual try-on via channel attention and appearance flow[C]// Proceedings of the 2024 5th International Conference on Computing, Networks and Internet of Things. New York: ACM, 2024:198-203.

[42]

GU X L, ZHU J K, WONG Y, et al. Recurrent appearance flow for occlusion-free virtual try-on[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2024, 20(8):1-17.

[43]

LIU R, LEHMAN J, MOLINO P, et al. An intriguing failing of convolutional neural networks and the CoordConv solution[J]. Advances in Neural Information Processing Systems, 2018, 31:1-15.

Funding

National Key R&D Program of China(2019YFC1521300)
