Introduction
Fashion is a dynamic and influential form of self-expression and cultural representation, shaped by historical events, cultural movements, media, and technology. It plays a significant role in society, impacting personal identity, social interactions, and the economy. Meanwhile, artificial intelligence (AI) has garnered considerable attention, particularly in image processing [1], [2], [3], [4], [5], with notable advancements in deep learning and generative models driven by the prevalence of images in social media [4], [6]. Nowadays, AI and fashion design have developed a strong and evolving relationship. AI technologies are increasingly utilized in the fashion industry to enhance various aspects of the design process, from fashion detection and synthesis to recommendation. They empower fashion designers with tools and insights that streamline the design process, enhance creativity, and meet the evolving demands of consumers, serving as a valuable ally that drives innovation, efficiency, and sustainability. The main objective of this paper is to introduce the application and development of AI in the field of fashion design.
Our review focuses on the application of artificial intelligence in the field of fashion design, summarizing more than sixty recent research achievements at the intersection of fashion and computer vision. The review encompasses a comprehensive overview, ranging from traditional methods to deep learning techniques, revealing the diversity and innovation of AI technologies within the fashion domain. It includes literature of various types, including journal articles, conference papers, and preprints. The covered literature spans from the year 2011 to 2023, showcasing the recent research trends in this field.
We divide the field of fashion design into three major parts: fashion detection [7], [8], [9], [10], fashion synthesis [2], [11], [12], [13], and fashion recommendation [4], [14]. In addition, we also give an overview of main applications in these fashion domains, showing the strengths and shortcomings of intelligent fashion in the fashion industry. The reason why we organize the survey in this structure is to simulate the working process of a fashion designer. In most design work, designers first analyze fashion trends, gather inspiration, and extract and categorize fashion elements. Then they proceed to the fashion synthesis process, where they design new fashion items or modify existing ones. Once the fashion items are created, designers also need to provide recommendations for outfit combinations based on user preferences and usage scenarios.
Before delving further into the intricacies of our survey, it is essential to review the relevant literature and consolidate the existing knowledge on this topic. Gu et al. [15] presented a survey focused on fashion analysis and understanding with artificial intelligence from 2011 to 2019. Nonetheless, a fresh and comprehensive survey has yet to emerge that consolidates contemporary methodologies, appraises evaluation metrics, and offers valuable insights into prospective avenues of research. Moreover, we have narrowed our selection of research papers to concentrate on specific areas. Building on these earlier accomplishments, our survey incorporates advancements up to the year 2023, highlighting significant developments in AI such as multimodal approaches [13], large-scale models, and diffusion models [3].
In addition, we are inspired by the survey proposed by Cheng et al. [14], especially its classification method. That survey introduced more than 200 major fashion-related works published from 2012 to 2020. However, our review focuses more on the field of fashion design, so we have refined the classification and supplemented recent developments in the field on a large scale. Mohammadi et al. [16] extensively studied AI uses in fashion and apparel, analyzing over 580 articles across 22 fashion tasks using a structured multi-label classification approach. The major characteristic of that survey lies in its vast number of categories, while our survey offers a higher level of summarization and a greater concentration of content.
Overall, the contributions of our work can be summarized as follows:
We present a comprehensive survey that examines the current research progress in the field of fashion design. We categorize the research topics into three main categories: detection, synthesis, and recommendation.
For each category, we conduct an in-depth and organized review of the most significant methods and their respective contributions and existing limitations and shortcomings.
Lastly, we outline existing challenges together with the possible future directions that can contribute to further advancements in the field.
Section II reviews the tasks of fashion detection, which include landmark detection, fashion parsing, and item retrieval. Section III provides an overview of fashion synthesis, which focuses on assisting designers in creating and improving fashion items. Section IV discusses the works of fashion recommendation, specifically on fashion items matching and innovative recommender systems.
Fashion Detection
Fashion detection [7] is the automated analysis and recognition of various fashion-related attributes, elements, and categories within images. The primary objective of fashion detection in clothing is to enable automated analysis, understanding, and interpretation of fashion-related elements, contributing to a range of applications.
The improvement of detection models in fashion design plays an important role in enabling the efficient analysis of extensive datasets of fashion images. By accurately detecting and classifying various fashion elements, including clothing items, styles, and patterns, the reliance on manual identification by designers is alleviated. This improvement not only enhances productivity but also grants designers the freedom to devote more time to creative pursuits such as ideation and design. Hence, the pursuit of improved fashion detection bears significant importance within the industry. However, traditional detection methods [17], [18], [19], developed over several decades, exhibit certain limitations. These methods typically involve a three-step process encompassing region suggestion, feature extraction, and classification and regression. Yet, they are plagued by computationally intensive calculations during the region suggestion stage and are limited in their ability to capture nuanced, high-level features during feature extraction. Moreover, the process's division into distinct stages renders the search for a globally optimal solution unviable. Consequently, substantial improvements in fashion detection are imperative to propel advancements in the field of fashion design, addressing the aforementioned shortcomings and opening avenues for more efficient and accurate analysis of fashion imagery.
We divide this section into three subsections: landmark detection, fashion parsing, and item retrieval. Landmark detection accurately identifies and localizes landmarks on a user's body for precise mapping onto virtual clothing items. Fashion parsing is the process of analyzing and segmenting images to extract fine-grained information about clothing. Item retrieval combines historical data, visual content, and fashion attributes to provide personalized and accurate recommendations. For each subsection, we first introduce the concept and processing pipeline, and then discuss improvements made within the past few years.
A. Landmark Detection
1) Overview
Landmark detection [8] is a crucial component in fashion detection, serving the purpose of accurately identifying and localizing landmarks on a user's body for precise mapping onto virtual clothing items. This process plays a pivotal role in generating realistic depictions of how garments would fit and appear on individuals, especially in virtual fitting rooms. By comprehending body shape and proportions through landmarks such as shoulders and hips, landmark detection significantly enhances the virtual try-on experience. It facilitates the seamless integration of virtual garments with the user's body, resulting in immersive virtual shopping interactions and advancing the field of fashion design. Figure 2 is a simple illustration of the landmark detection process. First, the visibility of each landmark is determined. Then, a convolutional neural network (CNN) extracts local feature maps around each landmark location, which are aggregated to produce the final feature maps. These final feature maps are decoded to estimate the classes and attributes of the clothing.
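As a concrete illustration of the decoding step described above, the sketch below shows one common way to turn predicted heatmaps and visibility scores into landmark coordinates. The function name, array shapes, and threshold are hypothetical, not taken from any specific model in this survey.

```python
import numpy as np

def decode_landmarks(heatmaps, visibility, threshold=0.5):
    """Decode (x, y) landmark coordinates from predicted heatmaps.

    heatmaps:   array of shape (K, H, W), one heatmap per landmark
    visibility: array of shape (K,), predicted visibility scores in [0, 1]
    Returns a list of (x, y) tuples, or None for invisible landmarks.
    """
    coords = []
    for k in range(heatmaps.shape[0]):
        if visibility[k] < threshold:
            coords.append(None)  # landmark judged occluded or absent
            continue
        # The peak of the heatmap is the most likely landmark location.
        y, x = np.unravel_index(np.argmax(heatmaps[k]), heatmaps[k].shape)
        coords.append((int(x), int(y)))
    return coords

# Toy example: two 8x8 heatmaps with known peaks.
hm = np.zeros((2, 8, 8))
hm[0, 2, 3] = 1.0   # landmark 0 at (x=3, y=2), predicted visible
hm[1, 6, 5] = 1.0   # landmark 1 at (x=5, y=6), predicted invisible
print(decode_landmarks(hm, np.array([0.9, 0.1])))  # [(3, 2), None]
```

Real detectors refine these peak locations (e.g., with sub-pixel offsets or the grammar constraints of [20]), but the visibility-gated argmax above captures the basic decoding idea.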
2) Development
Landmark detection emerged as a concept proposed by Liu et al. [8] to predict crucial positions of fashion items, such as neckline corners, hemlines, and cuffs, and to facilitate fashion clothing retrieval. This approach has found extensive usage within the fashion industry, effectively addressing challenges associated with capturing clothing features. Researchers have undertaken significant endeavors to enhance the applicability of landmark detection in fashion analysis. Wang et al. [20] introduced a fashion grammar model that combines the learning capabilities of neural networks with domain-specific grammars, aiming to tackle issues related to fashion landmark localization and clothing category classification. This model successfully captures kinematic and symmetric relationships between clothing landmarks, yielding improved accuracy in landmark localization.
In addition to the fashion grammar model, researchers have proposed innovative methods to further refine landmark detection in fashion analysis. Huang et al. [21] presented a novel deep end-to-end architecture based on part affinity fields (PAFs) for landmark localization. This method employs a stack of convolution and deconvolution layers to generate initial probabilistic maps of landmark locations, which are subsequently refined through the exploitation of associations between landmark locations and orientations. Notably, this approach yields notable enhancements in clothing category and attribute prediction. Moreover, Lee et al. [22] introduced a global-local embedding module into landmark detection, effectively predicting landmark heatmaps and leveraging comprehensive contextual knowledge of clothing. This module adeptly handles challenges posed by substantial variations and non-rigid deformations in clothing images, significantly improving the accuracy of landmark detection.
These advancements in landmark detection techniques have made significant contributions to the field of fashion analysis, facilitating precise and reliable identification of fashion landmarks within images. Through the integration of neural networks, domain-specific grammars, and the exploitation of contextual knowledge, researchers have achieved remarkable progress in enhancing the accuracy and efficacy of fashion landmark detection. Ultimately, these advancements deepen the detection and analysis of fashion items.
B. Fashion Parsing
1) Overview
Fashion parsing is the computational procedure of analyzing and segmenting images or videos in order to extract fine-grained information pertaining to clothing and fashion-related elements. This intricate process entails the identification and categorization of diverse fashion attributes, encompassing garment types, collars, necklines, patterns, and textures, among others. By effectively unraveling the intricacies of fashion representations, parsing facilitates a deeper understanding and organization of fashion-related data, thereby significantly contributing to the advancement of fashion detection and optimization endeavors. The accurate parsing of fashion imagery yields valuable insights that find application in various domains, including personalized styling, fashion recommendation systems, and trend analysis. The procedure of fashion parsing is shown in Figure 3. First, the model uses a pre-trained backbone to extract features from the input image. After decoding, the recovered features are used for contour prediction of the person in the edge branch and for person segmentation in the parsing branch, yielding the final parsing prediction.
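At its simplest, the parsing branch's final prediction reduces to a per-pixel argmax over class score maps. The label set and logits below are purely illustrative stand-ins for a real network's output.

```python
import numpy as np

# Hypothetical label set for a toy parsing example.
LABELS = ["background", "top", "skirt", "shoes"]

def parse_image(logits):
    """Per-pixel segmentation: assign each pixel the highest-scoring label.

    logits: array of shape (C, H, W), one score map per clothing class,
    as would be produced by the decoding stage of a parsing network.
    Returns an (H, W) integer label map.
    """
    return np.argmax(logits, axis=0)

# Toy 2x2 image: scores chosen so two pixels receive clothing labels.
logits = np.zeros((len(LABELS), 2, 2))
logits[1, 0, 0] = 3.0   # "top" wins at pixel (0, 0)
logits[2, 1, 1] = 2.0   # "skirt" wins at pixel (1, 1)
label_map = parse_image(logits)
print(label_map)
```

Models such as Co-CNN [26] improve on this by injecting cross-layer and image-level context before the final per-pixel decision, but the output format is the same label map.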
2) Development
Since Yamaguchi et al. [9] put forward fashion parsing, it has quickly gained popularity in fashion detection. During fashion parsing, clothing labels can be predicted according to body parts. Yamaguchi et al. [23] then proposed a retrieval-based approach. The process involves retrieving similar images from a parsed dataset based on a given image, and then transferring the nearest-neighbor parses to the final result using dense matching. However, the initial fashion parsing methods only worked under the constrained condition, meaning that tags had to be given to indicate the clothing items. Therefore, Yamaguchi et al. [23] also combined the model with pre-trained global models for clothing items, dynamically learning local models from retrieved examples, transferring parse mask predictions from retrieved examples to the query image, and incorporating iterative label smoothing for enhanced output coherence. In this case, labels do not need to be defined in advance but can be extracted directly from the labels of the pre-trained model, which provides a new idea.
However, when there were no or few tags and annotations for the clothing, i.e., under the unconstrained condition, the above approaches could not work as well as expected. To make fashion parsing models work better under the unconstrained condition, Liang et al. [24] introduced a joint image segmentation and labeling approach that consists of two phases of inference: image co-segmentation and region co-labeling. Image co-segmentation extracts distinguishable clothing regions, while region co-labeling recognizes garment items. The approach helps overcome the problem of severe occlusions between clothing items and human bodies, enabling precise localization of the region of interest for a query. Liang et al. [25] also combined fashion parsing with a CNN model to capture the complex correlations between clothing appearance and structure. Liang et al. [26] then built an improved model called the contextualized CNN (Co-CNN), which aims to simultaneously capture the cross-layer context and global image-level context to improve the accuracy of parsing results.
Thanks to the introduction of CNNs, fashion parsing models have been further optimized. Ruan et al. [27] used Mask R-CNN [28] to achieve multi-person parsing. Liu et al. [29] proposed the Matching CNN (M-CNN), which addressed the dependence of previous human parsing methods on hand-designed pipelines composed of multiple sequential components. Clearly, CNN models have greatly helped solve fashion parsing related problems and merit further work.
C. Item Retrieval
1) Overview
In order to cater to the diverse preferences of consumers in clothing purchases, it is necessary to incorporate personalized recommendations based on their past searches and feedback. This integration of historical data has the potential to significantly enhance the efficiency of the search process. Item retrieval is a widely employed technique that allows for the searching and retrieval of visually similar items or images by leveraging their visual content. By combining item retrieval with fashion detection, a more comprehensive approach can be achieved. To recommend appropriate clothing for users, recall and ranking play an important role. Recall is the ability of the retrieval system to find and retrieve all relevant fashion items that are similar to the query image. A high recall indicates that the system can successfully identify a significant portion of similar items, ensuring that relevant results are not missed. For example, if a user queries a specific dress, high recall means the system can find most similar dresses in the database. After that, the system sorts the results based on how similar they are to the queried image. The items with the highest similarity or relevance are ranked at the top of the list, while less relevant or dissimilar items are ranked lower. To make item retrieval better fit the user's need, some studies have explored incorporating deep learning methodologies into item retrieval, yielding notable advancements in the effectiveness of personalized recommendations.
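As a minimal, hypothetical sketch of the recall-and-ranking step described above, the snippet below ranks gallery embeddings by cosine similarity to a query and computes recall@k. Real systems use learned embeddings (e.g., from DARN [31] or FashionSearchNet [32]) rather than these toy vectors.

```python
import numpy as np

def rank_by_similarity(query, gallery):
    """Rank gallery items by cosine similarity to the query embedding."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    order = np.argsort(-sims)          # most similar first
    return order, sims[order]

def recall_at_k(order, relevant, k):
    """Fraction of relevant items that appear in the top-k results."""
    return len(set(order[:k]) & set(relevant)) / len(relevant)

# Toy embeddings: items 0 and 2 point in roughly the query's direction.
query = np.array([1.0, 0.0])
gallery = np.array([[0.9, 0.1], [0.0, 1.0], [1.0, 0.2], [-1.0, 0.0]])
order, sims = rank_by_similarity(query, gallery)
print(order)                                    # items 0 and 2 ranked first
print(recall_at_k(order, relevant={0, 2}, k=2)) # 1.0
```

High recall here means the relevant items land inside the top-k cut, after which the similarity-sorted order determines the final ranking shown to the user.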
2) Development
Item retrieval methods have swiftly gained popularity in the recognition of clothing trends, layering techniques, body shapes, and postures. Li et al. [10] proposed a two-step approach for cross-scenario retrieval of clothing items and fine-grained clothing style recognition. First, they introduced a hierarchical super-pixel merging algorithm based on semantic segmentation to obtain intact query clothing items. Second, to address the challenges of clothing style recognition across different scenarios, they employed sparse coding based on domain-adaptive dictionary learning to enhance the accuracy of the classifier and the adaptability of the dictionary. By leveraging the acquired fine-grained attributes of the clothing items and utilizing matching scores, the retrieval results can be re-ranked, optimizing the effectiveness of the item retrieval process.
In order to optimize fashion image retrieval, Goenka et al. [30] proposed the FashionVLP model, which is based on a feedback model. This model consists of two parallel blocks: one for processing the reference image and feedback, and the other for processing target images. This approach effectively fuses target image features without relying on text or transformer layers, resulting in increased efficiency in recognition tasks.
The advancements in deep learning have greatly influenced the development of item retrieval by leveraging deep neural network architectures. Huang et al. [31] presented the Dual Attribute-aware Ranking Network (DARN) to capture comprehensive features. During the feature learning phase, semantic attributes and visual similarity constraints are embedded, while addressing inter-domain differences. Ak et al. [32] also utilized the structure of pooling layers and proposed the FashionSearchNet model, which exploits attribute activation maps to learn region-specific attribute representations. This approach enhances the understanding of regions within fashion images and enables the retrieval of fashion items based on specific attributes.
These studies show that item retrieval can undoubtedly improve the accuracy, efficiency, and effectiveness of fashion detection.
Fashion Synthesis
A. Image-Guided Synthesis
1) Overview
Due to the advancement of the generative models such as Generative Adversarial Networks (GANs) [1], [33] and Diffusion Models [3], we have witnessed a rapid development in the field of image processing [2], [34]. These sophisticated models have revolutionized the way we generate, manipulate, and enhance images, enabling unprecedented levels of realism and creativity.
Some of these models have been applied in the fashion industry to assist designers in creating and improving fashion items. We categorize them based on the different types of input, including image-guided, sketch-guided, text-guided, and multimodal-guided models. These models provide valuable guidance and inspiration to fashion designers, allowing them to leverage the power of artificial intelligence in their creative process. The following image illustrates the process of fashion synthesis using various inputs.
Most image-guided fashion synthesis models are based on fashion style transfer, a branch of neural style transfer. Fashion style transfer involves taking a fashion style image as the target and generating clothing images that embody the desired fashion styles. The main challenge in fashion style transfer is to preserve the underlying design of the input clothing while seamlessly integrating the desired style patterns. Generative Adversarial Networks (GANs) [33] and diffusion models have seen significant development and impact in the field of image style transfer.
Goodfellow et al. [33] proposed a new framework for estimating generative models via an adversarial process, in which they simultaneously trained two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. GANs have been successfully applied to image synthesis, showing great potential in generating realistic and diverse data. The following fashion design works are based on GANs.
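The adversarial process just described is commonly summarized by the two-player minimax objective from [33]:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here D(x) is the discriminator's estimated probability that x came from the data distribution, and G(z) maps a noise vector z to a generated sample; D tries to maximize V while G tries to minimize it.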
Many of the GAN-based fashion design methods are inspired by the first NST algorithm proposed by Gatys et al. [1], [33]. They construct the content component of the output image by penalizing the difference between high-level representations obtained from the content input images and output images. Additionally, they create the style component by aligning the Gram-based summary statistics of the style input images and output images. As for the task of image-guided fashion synthesis, the models are given appearance input images I_a, representing textures, patterns, and colors, and structure input images I_s, representing the silhouette and type of the clothing.
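To make the Gram-based style statistics concrete, here is a minimal sketch of a Gatys-style style loss. The layer features are simulated with random arrays; real methods extract them from a pretrained network and weight losses across several layers.

```python
import numpy as np

def gram_matrix(features):
    """Gram-based summary statistics of a feature map.

    features: array of shape (C, H, W) from some network layer.
    Returns the (C, C) matrix of channel-wise feature correlations.
    """
    C, H, W = features.shape
    F = features.reshape(C, H * W)
    return F @ F.T / (H * W)

def style_loss(style_features, output_features):
    """Squared difference between Gram matrices, as in Gatys-style NST."""
    G_s = gram_matrix(style_features)
    G_o = gram_matrix(output_features)
    return np.mean((G_s - G_o) ** 2)

# Identical feature maps give zero style loss; different ones do not.
rng = np.random.default_rng(0)
f1 = rng.normal(size=(4, 8, 8))
f2 = rng.normal(size=(4, 8, 8))
print(style_loss(f1, f1))        # 0.0
print(style_loss(f1, f2) > 0)    # True
```

Because the Gram matrix discards spatial layout and keeps only channel correlations, matching it transfers textures and patterns while the separate content loss preserves the garment's structure.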
Denoising diffusion probabilistic model (DDPM) [3] is another generative modeling approach based on diffusion processes, aimed at modeling and generating high-dimensional data. It iteratively updates the conditional density using random noise to progressively generate realistic samples. DDPM introduces denoising priors to further improve the quality of generated samples. It is capable of generating high-quality samples, handling complex data, and finds extensive applications across multiple domains. DDPM has provided new possibilities for the advancement of the fashion design field.
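A minimal sketch of DDPM's iterative denoising is shown below, with a zero "predicted noise" standing in for a trained network; the variance schedule and step count are illustrative only, not those of [3].

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_pred, betas, rng):
    """One reverse (denoising) step of DDPM sampling.

    x_t:      current noisy sample
    t:        current timestep (0-indexed int)
    eps_pred: noise predicted by the (here: hypothetical) denoising network
    betas:    the fixed forward-process variance schedule
    """
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[: t + 1])
    # Posterior mean: subtract the predicted noise contribution, rescale.
    mean = (x_t - betas[t] / np.sqrt(1 - alpha_bar) * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                       # final step adds no noise
    return mean + np.sqrt(betas[t]) * rng.normal(size=x_t.shape)

# Toy run: start from pure noise and iterate T reverse steps.
T = 10
betas = np.linspace(1e-4, 0.02, T)
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 2))
for t in reversed(range(T)):
    eps_pred = np.zeros_like(x)           # stand-in for a trained network
    x = ddpm_reverse_step(x, t, eps_pred, betas, rng)
print(x.shape)
```

In a real model the network's eps_pred steers each step toward the data distribution; iterating from pure noise down to t = 0 is what "progressively generates realistic samples" means in practice.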
The development of GANs and diffusion models has had a profound impact on the field of image style transfer. They provide powerful tools and methods for designers, artists, and researchers to generate images with unique styles and artistic qualities. These techniques not only offer new possibilities for image generation and style conversion but also hold innovative potential in domains such as digital art, design, and visual effects. Additionally, they have stimulated research and development in computer vision and deep learning, advancing the understanding and exploration of image generation and processing. Therefore, the development of the aforementioned two technologies has created the technological background for the emergence of Image-guided Synthesis Models.
2) Development
The ground breaking work in this area was proposed by Jiang et al. [11] on the basis of GAN. They are the first one applying artificial intelligence to automatically generate fashion style images. They proposed an end-to-end feed-forward neural network consisting of a fashion style generator
Jiang等人提出了该领域的开创性工作。 [11] 基于GAN。他们是第一个应用人工智能自动生成时尚风格图像的人。他们提出了一个端到端的前馈神经网络,由时尚风格生成器 G 组成
和判别器 D
。由鉴别器计算的全局和基于补丁的样式和内容损失交替反向传播到生成器网络以进行优化。在全局优化阶段,保留服装的形状和设计,而局部优化阶段则保留详细的款式图案。
Although this model has a better effect compared to the previous related global or patch based neural style transfer works, it still has some limitations. When the texture in the input image is not prominent, the output results may be less satisfactory. Also, the network may preserve some of the original colors from the content image and the resolution of the generated clothing is relatively low.
Then, Jiang et al. [35] proposed the FashionG framework, also on the basis of GANs, for single-style generation, and the SC-FashionG framework for mix-and-match style generation. FashionG includes a generator and a discriminator. It is worth noting that for SC-FashionG, they incorporated a spatial segmentation mask into the input channels to ensure that each style is exclusively assigned to particular regions. This process involves two stages: offline training and online generation. Inputs for the offline stage consist of a content image, two style images, and two additional channels containing opposite up-down and down-up spatial masks, which guide one style onto the upper part of the output and the other style onto the bottom part. At this stage, the SC-FashionG training framework computes spatially constrained patch and global reconstruction losses. In the online generation stage, for an input clothing image and an arbitrary spatial mask, outputs are generated with the offline-trained generator G. In this way, the framework can produce mix-and-match fashion style outputs.
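The role of the opposite spatial masks can be sketched as a simple masked blend of two single-style outputs. The arrays below are toy stand-ins, not the actual SC-FashionG pipeline, which learns the blend through its spatially constrained losses.

```python
import numpy as np

def mix_styles(styled_a, styled_b, mask):
    """Combine two style-transferred images with a spatial mask.

    mask is 1 where style A should apply and 0 where style B should,
    mirroring the opposite up-down / down-up mask channels described above.
    """
    return mask * styled_a + (1.0 - mask) * styled_b

# Toy 4x4 single-channel "images" and an up-down mask.
styled_a = np.full((4, 4), 1.0)      # stand-in for the style-A output
styled_b = np.full((4, 4), 2.0)      # stand-in for the style-B output
mask = np.zeros((4, 4))
mask[:2, :] = 1.0                    # style A applies to the upper half
mixed = mix_styles(styled_a, styled_b, mask)
print(mixed)
```

Because the mask is an input channel at generation time, an arbitrary mask can be supplied online to place each style in any region, which is what enables the mix-and-match outputs.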
Sbai et al. [12] introduced a specific conditioning of GANs on texture and shape elements for generating fashion design images. They tried different Generative Adversarial Network architectures and novel loss functions to encourage creativity. Moreover, they put together an evaluation protocol combining automatic metrics and human experimental studies to evaluate the results. Although the generated clothing closely resembles designs created by human designers rather than computers, the generation process is relatively random and lacks a specific design direction.
Huang et al. [2] proposed a GAN-based method for multimodal unsupervised image-to-image translation called Multimodal Unsupervised Image-to-image Translation (MUNIT). It can generate output images that preserve the content of an input image while adopting different styles, translating between two image domains. It was not invented for fashion design, but it can still be applied in this field by training the model on a set of prepared fashion design images. Then, by inputting a design image into the MUNIT model and adjusting the style code, we can achieve transformations to other styles. This allows designers to quickly explore variations in fashion designs with different styles, textures, and patterns.
黄等人。 [2] 提出了一种基于 GAN 的多模态无监督图像到图像转换方法,称为多模态无监督图像到图像转换(MUNIT)。它可以生成与两类输入图像内容相似但风格不同的输出图像。它不是为时装设计而发明的,但仍然可以通过使用一组准备好的时装设计图像来训练模型来应用于该领域。然后,通过将设计图像输入到MUNIT模型中并调整样式代码,我们可以实现到其他样式的转换。这使得设计师能够快速探索具有不同风格、纹理和图案的时装设计的变化。
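The style-code recombination that MUNIT enables can be illustrated with a toy sketch; the split-vector "encoder" and the dimensions below are stand-ins for MUNIT's actual convolutional encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, content_dim=4):
    """Toy stand-in for MUNIT's encoders: split a feature vector into
    a content code and a style code."""
    return x[:content_dim], x[content_dim:]

def decode(content, style):
    """Toy decoder: recombine content and style codes."""
    return np.concatenate([content, style])

a = rng.normal(size=8)
b = rng.normal(size=8)
c_a, s_a = encode(a)
c_b, s_b = encode(b)

# Keep image a's content but borrow image b's style.
a_restyled = decode(c_a, s_b)
```

Sampling fresh style codes in place of `s_b` is what lets a designer enumerate many styled variants of one garment.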
Apart from this, more GAN-based projects adapted to the fashion design area have appeared. Yan et al. [36] proposed a texture and shape disentangled generative adversarial network (TSD-GAN) to perform creative design through the transformation of texture and shape in fashion items.
In the TSD-GAN, an FAEnc module is designed to disentangle the input image features into texture and shape representations. They proposed TMNet and SMNet modules to decompose the texture and shape features into hierarchical representations to capture coarse and fine styles. Their MFGen module aims to utilize these hierarchical representations to synthesize mixed-style fashion items. A Fusionblock module learns the mapping relationship between texture and shape representations. Additionally, a fashion attributes discriminator predicts real-or-fake distributions, while a patch discriminator calculates pixel-wise texture similarity.
Based on interpolation of shape and texture codes and principal component analysis [37], the TSD-GAN method assists designers in quickly generating multiple different clothing design options without altering the overall design style and texture. Designers can manipulate variations in the shape and texture of the clothing by adjusting the weights of principal component vectors, without manually editing or redesigning the garments. This allows designers to quickly realize their creative ideas and explore a wider range of design possibilities.
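The principal-component editing described above can be sketched as follows; the code dimensions, sample count, and `edit_code` helper are hypothetical, not taken from [36]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shape codes for 50 garments, 16 dimensions each.
codes = rng.normal(size=(50, 16))
mean = codes.mean(axis=0)

# Principal axes via SVD of the centred code matrix.
_, _, vt = np.linalg.svd(codes - mean, full_matrices=False)

def edit_code(code, component, weight):
    """Move a shape code along one principal component; `weight`
    plays the role of the designer-adjusted slider."""
    return code + weight * vt[component]

# Three variants of the same garment at different slider positions.
variants = [edit_code(codes[0], 0, w) for w in (-2.0, 0.0, 2.0)]
```

Decoding each edited code through the generator would yield a family of designs that vary smoothly along one interpretable axis.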
Yan et al. [38] proposed a generative adversarial network with heatmap-guided semantic disentanglement (HSD-GAN) to perform an “intelligent” design with “inspiration” transfer.
Specifically, Texture Brush utilizes two main techniques: heatmap-guided semantic disentanglement and texture brush. The heatmap-guided semantic disentanglement technique decomposes the semantic information in fashion designs or clothing photos into several heatmaps, each representing a specific texture or color. These heatmaps can be used to control texture transfer and color variations. On the other hand, the texture brush technique applies these textures to another clothing photo, enabling texture transfer. They introduced a semantic disentanglement attention-based encoder to capture the most discriminative regions of input items and disentangle the features into attributes and texture. A generator is developed to synthesize mixed-style fashion items by utilizing them. Additionally, they introduced a heatmap-based patch loss to evaluate the visual-semantic matching degree between the input and generated texture.
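A minimal sketch of a heatmap-weighted patch loss in the spirit of [38]; the exact weighting and normalization here are assumptions:

```python
import numpy as np

def heatmap_patch_loss(generated, target, heatmap):
    """Mean squared patch difference, weighted so that regions the
    heatmap marks as semantically important dominate the loss."""
    weights = heatmap / (heatmap.sum() + 1e-8)
    return float(np.sum(weights * (generated - target) ** 2))

g = np.ones((4, 4))
t = np.zeros((4, 4))
hm = np.zeros((4, 4))
hm[:2, :2] = 1.0  # only the top-left patch is semantically important

loss = heatmap_patch_loss(g, t, hm)

# If the generated image matches the target inside the heated region,
# differences elsewhere contribute nothing.
g_match = g.copy()
g_match[:2, :2] = 0.0
loss_match = heatmap_patch_loss(g_match, t, hm)
```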
Yan et al. [39] also proposed a novel intelligent design approach with a similar function, named FadGAN. FadGAN encodes the source and target images into shared latent vectors and independent attribute vectors using two encoders. During this process, the attribute vectors are separated through an additional attribute encoder, and the inspiration transfer from the source image to the target image is achieved by swapping the attribute vectors. To ensure high-quality fashion attributes in the generated designs, the model uses predefined fashion attribute vectors to constrain their consistency with fashion elements. FadGAN consists of an attribute encoder based on variational autoencoders (VAEs) [40] and an image generator based on conditional GANs [41]. During training, the entire network is optimized by minimizing the reconstruction error and adversarial loss of the image generator and attribute encoder. Ultimately, the model can generate fashion designs with high-quality fashion attributes, aiding designers in faster creation.
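The training objective, reconstruction error plus adversarial loss, can be written schematically; the weight `lam` and the function names are illustrative, not FadGAN's actual hyperparameters:

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared reconstruction error."""
    return float(np.mean((x - x_hat) ** 2))

def adversarial_loss(d_fake):
    """Non-saturating generator loss on discriminator outputs in (0, 1)."""
    return float(-np.mean(np.log(d_fake + 1e-8)))

def fadgan_objective(x, x_hat, d_fake, lam=10.0):
    """Weighted sum of reconstruction and adversarial terms; `lam`
    is an assumed balance weight for illustration."""
    return lam * reconstruction_loss(x, x_hat) + adversarial_loss(d_fake)
```

A perfect reconstruction with a fully fooled discriminator drives the objective toward zero, which is the fixed point training aims for.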
To sum up, GAN-based methods are well suited to fashion design for generating new clothes. However, these methods lack control over the appearance and shape of clothes when transferring from non-fashion-domain images.
Before the advent of diffusion models, GANs had already undergone a significant period of development; as a result, relatively few models use diffusion models as the underlying network architecture. To address a new fashion design task that aims to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image, Cao, Chai, et al. [42] presented DiffFashion, a reference-based fashion design method with structure-aware transfer by diffusion models. It can semantically generate new clothes from a provided clothing image and a reference appearance image. Specifically, the method separates the foreground clothing using automatically generated semantic masks conditioned on labels. These masks serve as guidance in the denoising process to preserve structural information. Additionally, a pretrained vision Transformer (ViT) is utilized to guide both the appearance and structure aspects.
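The mask-guided denoising idea can be sketched as a blending step applied at each iteration; the toy "model prediction" and mask below are placeholders for the actual diffusion model and the automatically generated semantic mask:

```python
import numpy as np

rng = np.random.default_rng(3)

def masked_denoise_step(x_t, denoised, structure_ref, mask):
    """One illustrative guidance step: inside the garment mask keep the
    model's denoised estimate; outside it, keep the reference structure."""
    return mask * denoised + (1.0 - mask) * structure_ref

structure = np.zeros((4, 4))          # stand-in for the structure reference
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                  # hypothetical foreground-clothing mask

x = rng.normal(size=(4, 4))           # start from noise
for _ in range(5):                    # a few toy denoising iterations
    denoised = x * 0.5                # stand-in for the model's prediction
    x = masked_denoise_step(x, denoised, structure, mask)
```

Pixels outside the mask are pinned to the structure reference at every step, which is what preserves the garment's layout while the masked region is resynthesized.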
Compared to the previous work, DiffFashion can generate more realistic images in the fashion design task. However, due to the randomness of diffusion, the mask cannot guarantee good results every time. Better masks need to be generated to improve the task.
B. Sketch-Guided Synthesis
1) Overview
Some fashion design models take sketches as input to create new fashion items or revise existing ones. These models can assist fashion designers in quickly and efficiently designing new garments, with excellent human-computer interaction. Designers can fill sketches with different fabric options and make changes to the detailed style of the garments.
2) Development
Isola et al. [43] introduced a method named conditional adversarial networks for image-to-image translation. It achieves impressive results in various image translation tasks including fashion design by transforming a sketch into a photo. The approach involves training a generator and discriminator network simultaneously. The generator network generates images in the desired domain, while the discriminator network distinguishes between real and generated images. It lays the foundation for the subsequent development of the field of image translation for fashion design.
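The conditional GAN generator objective at the heart of this method, an adversarial term plus an L1 term on the paired ground truth, can be sketched as follows (the non-saturating adversarial form and the `lam` value are a common choice, used here for illustration):

```python
import numpy as np

def l1_loss(fake, target):
    """Mean absolute pixel difference to the paired ground-truth photo."""
    return float(np.mean(np.abs(fake - target)))

def generator_loss(d_on_fake, fake, target, lam=100.0):
    """pix2pix-style generator objective: fool the discriminator while
    staying close to the paired target image in L1."""
    adv = float(-np.mean(np.log(d_on_fake + 1e-8)))
    return adv + lam * l1_loss(fake, target)

target = np.full((8, 8), 0.5)
good_fake = np.full((8, 8), 0.5)      # matches the target photo
loss = generator_loss(np.array([0.99]), good_fake, target)
```

The L1 term is what keeps a sketch-to-photo translation faithful to the specific paired garment rather than just plausible.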
Xian et al. [44] were the first to examine texture control in the image synthesis area. They proposed an approach called TextureGAN for controlling fashion image synthesis. It allows users to place a texture patch on a sketch at arbitrary locations and scales, controlling the desired output texture by training the generative network with a local texture loss in addition to adversarial and content losses. The proposed algorithm is able to generate plausible images that are faithful to user controls, but it cannot handle more complex scenes.
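A minimal sketch of a local texture loss computed over Gram-matrix statistics restricted to the user-placed patch; the feature shapes and the use of Gram statistics are illustrative assumptions about [44]:

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, N) feature patch."""
    return features @ features.T / features.shape[1]

def local_texture_loss(fake_feats, ref_feats, y0, y1, x0, x1):
    """Compare texture statistics only inside the user-placed patch,
    in the spirit of a local texture loss."""
    fp = fake_feats[:, y0:y1, x0:x1].reshape(fake_feats.shape[0], -1)
    rp = ref_feats[:, y0:y1, x0:x1].reshape(ref_feats.shape[0], -1)
    return float(np.mean((gram(fp) - gram(rp)) ** 2))

rng = np.random.default_rng(4)
f = rng.normal(size=(3, 8, 8))        # hypothetical feature maps
loss_same = local_texture_loss(f, f, 2, 6, 2, 6)
loss_diff = local_texture_loss(f, rng.normal(size=(3, 8, 8)), 2, 6, 2, 6)
```

Restricting the comparison to the patch region is what lets the rest of the sketch stay unconstrained by the chosen texture.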
Cui et al. [45] proposed an end-to-end virtual garment display method called FashionGAN based on Conditional GAN. The method requires user input of a fashion sketch and fabric image, enabling quick and automatic display of a virtual garment image that aligns with the input sketch and fabric. Moreover, it can also be extended to contour images and garment images, which further improves the reuse rate of fashion design.
FashionGAN establishes a bijective mapping between fabric and latent vector so that the latent vector can be interpreted with fabric information. Moreover, some local losses are added to help generate images with stripe textures. The method can indeed achieve impressive results in fashion design; however, the pattern design is relatively limited, and there is still room for improvement.
Dong et al. [46] proposed a novel Fashion Editing Generative Adversarial Network (FE-GAN), which enables users to manipulate fashion images with arbitrary sketches and a few sparse color strokes.
To achieve realistic interactive results, the FE-GAN incorporates a free-form parsing network that predicts the complete human parsing map, guiding fashion image manipulation and crucially contributing to producing convincing results. Additionally, a foreground-based partial convolutional encoder is developed, and an attention normalization layer is designed for the multiple scales layers of the decoder in the fashion editing network. Compared to previous works, the network demonstrates good performance in fashion editing. However, the editing operations themselves have limited functionality, and the quality of the generated results is highly dependent on the input sketch.
Yan et al. [47] introduced a GAN-based AI-driven framework for completing the design of fashion items’ sketches.
To incorporate different textures into various designed sketches, conditional feature interaction (CFI) is proposed to learn semantic mapping from the sketch to the texture. Two training schemes are developed, namely end-to-end training and divide-and-conquer training, with the latter demonstrating superior performance in terms of compatibility, diversity, and authenticity. However, the proposed network structure has the following limitations: the generation process is relatively random and lacks controllability, and the model cannot generate images based on color ratios.
C. Text-Guided Synthesis
1) Overview
With the development of multimodal machine learning techniques, some AI systems are capable of assisting in fashion design based on textual guidance. Designers simply need to input a textual description of the desired garment into the network architecture to obtain a designed outfit. This method is highly convenient to use, but if more complex images are desired, it places a relatively high demand on the user’s descriptive abilities.
2) Development
Given an input image of a person and a sentence describing a different outfit, the GAN-based model put forward by Zhu et al. [48] can dress the person as described while preserving their pose. They decomposed the complex generative process into two conditional stages. In the first stage, they generated a plausible semantic segmentation map that aligns with the wearer's pose as a latent spatial arrangement, incorporating an effective spatial constraint to guide its generation. In the second stage, they employed a generative model with a newly proposed compositional mapping layer to render the final image, considering precise regions and textures conditioned on the generated semantic segmentation map.
However, the generated results are limited by the database they adopted, as the training set mostly contains images with a plain background.
Günel et al. [49] proposed a novel approach called FiLMedGAN, which utilizes feature-wise linear modulation (FiLM) to establish a connection between visual features and natural language representations, enabling their transformation without relying on additional spatial information. The approach adopts a GAN-based architecture that enables users to edit outfit images by inputting different descriptions to generate new outfits. The FiLMedGAN model, incorporating skipping connections and FiLMed residual blocks, achieves excellent performance and meets the desired objectives.
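The FiLM mechanism itself is simple to sketch: a linear layer maps the text embedding to per-channel scales and shifts that modulate the visual features (all dimensions and names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each channel by
    parameters predicted from the text embedding."""
    return gamma[:, None, None] * features + beta[:, None, None]

def text_to_film_params(text_embedding, w_gamma, w_beta):
    """A single linear layer standing in for the learned FiLM predictor."""
    return w_gamma @ text_embedding, w_beta @ text_embedding

channels, emb_dim = 4, 6
feats = rng.normal(size=(channels, 8, 8))     # visual feature maps
text = rng.normal(size=emb_dim)               # text description embedding
gamma, beta = text_to_film_params(
    text, rng.normal(size=(channels, emb_dim)),
    rng.normal(size=(channels, emb_dim)),
)
out = film(feats, gamma, beta)
```

With `gamma = 1` and `beta = 0` the layer is an identity, so the text can leave regions of the image untouched; this is why FiLM needs no extra spatial information.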
Zhou et al. [50] presented a method to manipulate the visual appearance, such as the pose and attributes, of a person's image according to natural language descriptions. They presented a novel two-stage pipeline: it first learns to infer a reasonable target human pose from the description, and then synthesizes an appearance-transferred person image according to the text in conjunction with the target pose. However, the project mainly showcases changing the color and design of clothing, which is relatively simple and limited in scope.
There is a task called composing text and image to image retrieval (CTI-IR), which aims to retrieve images relevant to a query image based on text descriptions of desired modifications. To deal with it, Zhang et al. [51] proposed an end-to-end trainable network based on GAN for simultaneous image generation and CTI-IR. The model presented in this study learns generative and discriminative features for the query image through joint training of a generative model and a retrieval model. It automatically manipulates the visual features of the reference image based on the text description using adversarial learning between the synthesized and target images. Global-local collaborative discriminators and attention-based generators are leveraged to focus on both global and local differences between the query and target images.
As a result, the model enhances semantic consistency and fine-grained details in the generated images, which can be utilized for interpretation and empowerment of the retrieval model. The network combines the fields of retrieval and image generation to achieve the effect of fashion design.
D. Multimodal-Guided Synthesis
With the advancements of multimodal technologies in the field of AI, fashion design tasks have witnessed significant enhancements, enabling designers to leverage multimodal inputs for greater creativity and convenience. There are three main modalities of control signals for conditional image synthesis: textual controls; visual controls, including image and sketch input; and preservation controls, which refer to completing the missing parts of an image.
Zhang et al. [13] proposed a novel two-stage architecture called M6-UFC, which aims to unify multiple multimodal controls in a universal form for conditional image synthesis. Non-autoregressive (NAR) generation is utilized to improve inference speed, enhance holistic consistency, and support preservation controls. Additionally, the authors designed a progressive generation algorithm based on relevance and fidelity estimators to ensure relevance and fidelity.
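The progressive, non-autoregressive generation loop can be caricatured as parallel re-prediction that freezes the most confident tokens each pass; the confidence rule and schedule below are toy assumptions, not M6-UFC's actual estimators:

```python
import numpy as np

rng = np.random.default_rng(6)

def progressive_decode(model_fn, length, steps=4, vocab=16):
    """Toy non-autoregressive loop: each pass re-predicts all image
    tokens in parallel and freezes the most confident fraction."""
    tokens = np.full(length, -1)            # -1 marks an undecided token
    for step in range(1, steps + 1):
        probs = model_fn(tokens)            # (length, vocab) confidences
        keep = int(length * step / steps)   # how many should be frozen so far
        conf = probs.max(axis=1)
        conf[tokens >= 0] = np.inf          # already-frozen tokens stay frozen
        for i in np.argsort(-conf)[:keep]:
            if tokens[i] < 0:
                tokens[i] = int(probs[i].argmax())
    return tokens

fake_model = lambda toks: rng.random((8, 16))  # stand-in for the network
tokens = progressive_decode(fake_model, length=8)
```

Because every pass predicts all positions at once, the number of network calls is fixed by the schedule rather than by the image length, which is the source of the NAR speedup.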
Fashion Recommendation
After fashion detection and synthesis, fashion recommendation has emerged as a developing and challenging area. Customers now require the ability to discover desired products tailored to various situations. Over the past fifteen years, an increasing number of highly efficient computer vision algorithms have been developed to address this demand in the market.
Based on user requirements, recommender systems [52], [53] can be categorized into two types: task-based and input-based recommender systems [4]. Task-based recommender systems are designed for specific scenarios that users intend to engage in, such as parties or trips. These systems assist users in selecting fashion items or generating matching designs that align with the desired scenario. On the other hand, input-based recommender systems cater to users who already possess certain items and seek recommendations for matching items from their existing collection. These systems aid users in selecting a complementary item that pairs well with their current possessions.
A. Task-Based Recommender Systems
In the realm of fashion recommendations, users frequently make choices based on their individual needs, such as specific activities they plan to participate in.
Shen et al. [54] have pioneered the development of a Scenario-Oriented recommendation system. This system utilizes the open mind common sense (OMCS) [55], a comprehensive knowledge corpus encompassing people’s everyday common sense. It represents the first technology that offers product recommendations based on real-world user scenarios, encompassing a wide array of themes. Leveraging contextual information and potential product associations, this system assists users in easily identifying the most suitable product, even when they are unsure about specific details. Jagadeesh et al. [56] have introduced two categories of recommenders: deterministic recommenders (DFR) and stochastic recommenders (SFR). Their approach involves extracting fashion insights from vast amounts of online fashion images and their accompanying rich metadata. These recommenders enable the identification of valuable fashion-related patterns and trends.
The previously mentioned recommender methods were considered relatively basic, lacking the utilization of neural networks. As consumer needs continue to grow, more advanced and effective recommender models have been proposed.
Huynh et al. [57] introduced a groundbreaking unsupervised learning approach called complementary recommendation using adversarial feature transform (CRAFT). CRAFT builds upon the principles of GANs [33]. However, unlike direct image synthesis, CRAFT focuses on training in the feature space. The feature transformer within CRAFT is capable of generating diverse characteristics through the utilization of random input vectors. The transformed feature vector is then employed to recommend images based on their nearest neighbors in the feature space. By learning the joint distribution of co-occurring visual objects in an unsupervised manner, CRAFT is applied to visual complementary recommendation. Remarkably, this approach does not necessitate annotations or labels indicating complementary relationships.
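CRAFT's retrieve-after-transform idea can be sketched as follows; the linear "transformer", the dimensions, and the sampling scheme are toy stand-ins for the learned adversarial feature transform:

```python
import numpy as np

rng = np.random.default_rng(7)

def transform(query_feat, z, w):
    """Toy feature transform: map a query feature plus a random vector z
    into the complementary item's feature space."""
    return w @ np.concatenate([query_feat, z])

def recommend(query_feat, catalog, w, n_samples=3):
    """Sample several transforms and return nearest-neighbour indices,
    so a single query yields a diverse set of complements."""
    picks = []
    for _ in range(n_samples):
        target = transform(query_feat, rng.normal(size=4), w)
        dists = np.linalg.norm(catalog - target, axis=1)
        picks.append(int(dists.argmin()))
    return picks

catalog = rng.normal(size=(20, 8))   # hypothetical complement-item features
w = rng.normal(size=(8, 12))         # stand-in for the trained transformer
picks = recommend(rng.normal(size=8), catalog, w)
```

Varying the random vector `z` is what produces diversity: each sample lands in a different region of the complement's feature space, so different neighbours are retrieved.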
Previous recommender systems often relied on recommending fashion items similar to a user’s previous purchases, resulting in a lack of item diversity. To address this challenge, De Divitiis et al. [58] proposed a self-supervised contrastive learning model known as memory augmented neural networks (MANNs). This approach tackles the issue by combining color and shape feature disentanglement. It incorporates external memory modules that store pairing modalities between different types of clothing, such as tops and bottoms. By addressing issues arising from imbalanced data distribution, compact and representative memories are obtained. These memories are then used to expand the common controller loss, facilitating the training of the memory modules. The usage of MANNs with disentangled features and memory augmentation enables the recommender system to recommend a more diverse range of fashion items to users. This approach enhances the overall recommendation quality and overcomes the limitations of previous systems.
Ye et al. [59] introduced a scene-aware fashion recommender system (SAFRS) that takes into account the context of recommendations, which accurately selects outfits based on the scene context provided and completes diverse scene-aware fashion recommendation tasks.
B. Input-Based Recommender Systems
Users not only choose fashion items based on usage scenarios; they also choose according to their own circumstances. When selecting among the fashion items they already own, input-based algorithms come in handy. In early work, Iwata et al. [60] proposed a probabilistic topic model to recommend tops for bottoms by learning coordinate information from visual features of each fashion item. Given a photo of a fashion item (top or bottom) used as a query, the recommender system first detects the face region, and then determines the top and bottom regions by assuming that they are divided in a certain proportion. This segmentation method is rough and far from reality. To meet user requirements, new input-based algorithms have emerged in recent years, with increasingly innovative network architectures and data processing methods used to better serve users' needs.
Lu et al. [61] proposed a learnable personalized anchor embedding (LPAE) method for personalized clothing recommendation and new-user analysis. The model uses a stacked self-attention mechanism to encode clothing into compact embeddings that capture high-level relationships between fashion items. It models each user's fashion preferences with a set of anchors and computes a preference score from the similarity between the clothing embedding and the user's anchors, so that similarly encoded fashion items can be retrieved according to their distance in the embedding space. The method leads the field in both ordinary and cold-start settings.
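The anchor-based scoring can be sketched as a best-match cosine similarity between an item embedding and the user's anchors; the dimensions and the perturbation used to fabricate a "liked" item are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)

def preference_score(item_emb, user_anchors):
    """Score an item as its best cosine similarity to any of the
    user's anchor embeddings."""
    item = item_emb / np.linalg.norm(item_emb)
    anchors = user_anchors / np.linalg.norm(user_anchors, axis=1, keepdims=True)
    return float((anchors @ item).max())

anchors = rng.normal(size=(4, 16))                # hypothetical per-user anchors
liked = anchors[0] + 0.01 * rng.normal(size=16)   # item close to one anchor
random_item = rng.normal(size=16)                 # unrelated item

s_liked = preference_score(liked, anchors)
s_random = preference_score(random_item, anchors)
```

Taking the maximum over anchors rather than an average lets one user hold several distinct tastes at once.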
Delmas et al. [62] proposed attention-based retrieval with text-explicit matching and implicit similarity (ARTEMIS), a new method for image search with free-form text modifiers. It has two modules focusing on how potential targets fit the textual modifier, and comparing potential target images to the reference image assisted by the text, respectively.
Among numerous structures, some methods have innovated around some core elements. Below are some representative network architectures or methods.
1) Methods Based on Knowledge Graph
Zhan et al. [63] proposed an end-to-end personalized outfit recommender system (A³-FKG), which explores the use of knowledge graphs to capture the connectivity between entities and leverages the complementary strengths of multimodal information. It includes a two-level attention module, corresponding to user-specific relation-aware and target-aware networks, to incorporate the knowledge graph into recommender systems in the fashion domain.
Dong et al. [64] proposed a new designer-oriented recommender system using a knowledge graph to support personalized fashion design. It provides a feedback and self-adjustment mechanism that can collect users' perceptual feedback and adjust its knowledge base automatically.
2) Textual Features
Facing the challenges of visual understanding and visual matching, Lin et al. [65] proposed a co-supervision learning framework, namely FARM. It captures aesthetic characteristics under the supervision of generation learning and includes a layer-to-layer matching mechanism to evaluate the matching score. FARM can handle the situation where, given an image of a top and a query for a bottom item, it generates an image of a matching bottom.
Discussion
A. Marketing and Convenience
AI-generated content (AIGC) [66], [67] is revolutionizing industries, fashion design included. The market potential is enormous, especially in fields like fashion. AIGC meets the demand for captivating content, such as product descriptions and trend analyses, sparking interest among designers and businesses.
AIGC's cost-cutting prowess is impressive. Traditional content creation requires various specialists, but AIGC automates these processes, leading to substantial savings. This efficiency lets brands allocate resources strategically. Beyond cost, AIGC brings unprecedented convenience: it generates content on demand, eliminating the delays of human content creation. This ensures a consistent, timely flow of material, a boon for fashion brands chasing real-time trends.
In fashion design, AI complements human designers. It offers insights from data, predicts trends, and automates design elements. This blend enhances designers’ capabilities, combining creativity with data-driven insights. In short, AIGC’s success is transformative. It reshapes creativity, efficiency, and convenience across markets, particularly in fashion. The partnership of human and AI promises an innovative future.
B. Challenges and Future Trends
With the development in recent years, we are still facing some challenges in fashion design, especially in terms of model training and user requirements. Here are a few prominent challenges listed.
Data Quality and Diversity: Ensuring access to high-quality and diverse fashion datasets remains a challenge. Developing more comprehensive and representative datasets covering different styles and cultures will improve the performance and inclusiveness of AI models in fashion design and recommendation. Currently, many datasets are not open-source, so resources cannot be shared, which poses challenges for design.
Real-Time Fashion Synthesis: Improving the efficiency and speed of fashion synthesis models is crucial for real-time design applications. Optimizing algorithms and utilizing hardware advancements can enable designers to interactively explore and iterate design options, thereby simplifying the design process.
Multimodal Integration: Further research is needed to strengthen the integration of multimodal inputs in fashion design and recommender systems. Creating seamless interaction between text, images, sketches, and other forms will enable designers to express their ideas more effectively, resulting in more accurate and personalized output.
Interpretability and Explainability: As artificial intelligence models become increasingly complex, understanding their decision-making process and providing explanations for their recommendations becomes crucial. Developing interpretable and explainable AI systems in fashion design and recommendation will increase trust and enable designers and users to understand and modify the generated output.
Combination with Industry: AIGC has brought a transformative shift to designers' roles, demanding adaptation. They must embrace data-driven insights and collaborate with AI to stay trend-aligned. Combining human creativity with AI-generated content sparks unique designs and enables rapid prototyping for trend responsiveness.
The future of fashion design is heading towards a convergence of enhanced personalization and seamless multimodal interfaces. With development in fashion design, personalized experiences will take center stage, as AI systems leverage user preferences, body measurements, and style choices to provide tailored recommendations. Simultaneously, the integration of various input modalities such as text, images, sketches, and virtual reality will enable designers and users to communicate their ideas effortlessly. This fusion of enhanced personalization and seamless multimodal interfaces will empower fashion designers and consumers to co-create unique and personalized designs that perfectly align with individual tastes and preferences. It will revolutionize the fashion industry, fostering a deeper level of engagement, satisfaction, and creativity in the design process. Furthermore, augmented reality is gradually applied in fashion industry to enhance the user experiences [68], where Users can perform some virtual try-on, rather than just viewing items.
Conclusion
The survey covered different topics related to fashion design, including fashion detection, synthesis, recommendation, and the use of multimodal inputs.
In the area of fashion detection, computer vision algorithms have been developed to accurately identify fashion items in images or videos. These algorithms help in analyzing fashion trends, understanding consumer preferences, and providing valuable insights to fashion designers and retailers.
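Detection models of the kind surveyed here are typically evaluated by matching predicted bounding boxes against ground-truth annotations using intersection-over-union (IoU). The sketch below is a minimal, self-contained illustration of that metric; the `(x1, y1, x2, y2)` box format and the example coordinates are assumptions for illustration, not taken from any specific paper.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A hypothetical predicted "handbag" box vs. its ground-truth annotation:
pred = (50, 50, 150, 150)
truth = (100, 100, 200, 200)
score = iou(pred, truth)  # intersection 50x50 = 2500, union 17500
```

A detection usually counts as correct when this score exceeds a threshold such as 0.5, which is how detection benchmarks in the fashion literature report accuracy.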
Fashion synthesis techniques, such as GAN-based models, have emerged as powerful tools for generating new clothing designs. These models can disentangle and manipulate shape, texture, and style representations, allowing designers to explore a wide range of design possibilities quickly and efficiently. Diffusion models, with denoising priors, have also shown promise in generating high-quality and realistic samples.
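The denoising prior mentioned above rests on a simple forward process: clean data is progressively mixed with Gaussian noise under a schedule, and the model learns to reverse it. The toy sketch below shows that forward step for a single scalar value; real models operate on image tensors with a learned denoising network, and the linear `beta` schedule here is one common but assumed choice.

```python
import math
import random

def make_alpha_bar(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear schedule."""
    alpha_bar, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def q_sample(x0, t, alpha_bar, rng):
    """Noise a clean value x0 to step t: sqrt(ab_t)*x0 + sqrt(1-ab_t)*eps."""
    a = alpha_bar[t]
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps

rng = random.Random(0)
schedule = make_alpha_bar(1000)
noisy = q_sample(1.0, 999, schedule, rng)  # near the end, almost pure noise
```

Because `alpha_bar` shrinks towards zero, late steps are dominated by noise; sampling a new garment image amounts to running this process in reverse from pure noise.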
Fashion recommendation systems play a crucial role in helping customers find products suited to different situations and conditions. Traditional recommendation methods have been enhanced by integrating computer vision algorithms, probabilistic topic models, and deep learning architectures. These systems can provide personalized recommendations, considering factors such as user preferences, contextual information, and complementary relationships between fashion items.
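At their simplest, such personalized recommendations can be framed as ranking items by the similarity between a user-preference vector and item-attribute vectors. The sketch below is a deliberately minimal content-based recommender; the attribute axes, item names, and scores are all hypothetical, and production systems learn these embeddings rather than hand-coding them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user_profile, catalog, top_k=2):
    """Rank catalog items by similarity to the user profile."""
    ranked = sorted(catalog.items(),
                    key=lambda kv: cosine(user_profile, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical attribute axes: [casual, formal, sporty]
catalog = {
    "denim jacket": [0.9, 0.1, 0.3],
    "tuxedo":       [0.0, 1.0, 0.0],
    "track pants":  [0.4, 0.0, 0.9],
}
user = [0.8, 0.1, 0.5]  # mostly casual, somewhat sporty
picks = recommend(user, catalog)  # -> ["denim jacket", "track pants"]
```

The deep-learning systems cited above replace these hand-built vectors with learned image and text embeddings, but the ranking step remains essentially this similarity comparison.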
The survey also highlighted the importance of multimodal inputs in fashion design. Models that utilize textual, visual, and preservation controls have been developed to enable designers to create new garments or revise existing ones more efficiently. Sketch-guided models allow designers to input sketches and generate fashion items accordingly, while text-guided models assist in designing outfits based on textual descriptions.
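One common way the multimodal models above combine a text description with a sketch is late fusion: each modality is encoded into a fixed-length embedding, and the embeddings are merged before conditioning the generator. The sketch below illustrates a weighted-average fusion; the function, vectors, and weights are illustrative assumptions, and real systems typically learn the fusion (e.g. via attention) rather than fixing it.

```python
def fuse(modalities, weights):
    """Late-fuse fixed-length embeddings by a normalized weighted average."""
    dim = len(next(iter(modalities.values())))
    total = sum(weights[name] for name in modalities)
    fused = [0.0] * dim
    for name, vec in modalities.items():
        w = weights[name] / total
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused

text_emb = [1.0, 0.0, 0.0]    # hypothetical text-encoder output
sketch_emb = [0.0, 1.0, 0.0]  # hypothetical sketch-encoder output
fused = fuse({"text": text_emb, "sketch": sketch_emb},
             {"text": 0.5, "sketch": 0.5})
# fused -> [0.5, 0.5, 0.0]
```

Adjusting the weights lets one modality dominate, which mirrors how sketch-guided and text-guided designs trade off structural against semantic control.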
In conclusion, the survey demonstrates the significant progress made in the application of AI and machine learning techniques in fashion design. These advancements have resulted in more efficient and creative design processes, personalized recommendations, and improved user experiences. As the field continues to evolve, further research and development in areas such as self-attention mechanisms, knowledge graphs, and textual features are expected to enhance the capabilities of fashion design and recommendation systems even further.
ACKNOWLEDGMENT
(Ziyue Guo, Zongyang Zhu, and Yizhi Li contributed equally to this work.)