My Manifestation Guru



Generate Text2Human via Hugging Face 🤗

Human Image Generation using Text

Text2Human is a Python library that generates an image of a person from nothing more than a textual description of gender and clothes.

Image Source

Project Specifications

  1. Raw data: DeepFashion-MultiModal
  2. Pre-processed dataset
  3. Pre-trained models
  4. Web demo to try out

Raw Dataset

DeepFashion-MultiModal is a large-scale, high-quality human dataset with rich multi-modal annotations. It has the following properties:

  1. There are 44,096 high-resolution human images in all, including 12,701 full-body images.
  2. Human parsing labels of 24 classes are manually annotated for each full-body image.
  3. Keypoints are carefully annotated on each full-body image.
  4. DensePose is extracted for each human image.
  5. Each image is manually tagged with clothing shape and texture attributes.
  6. Each image comes with a textual description.

Image Source: https://paperswithcode.com/dataset/deepfashion

Dataset pre-processing

The following components make up the pre-processing pipeline:

  1. Align the human body to the center of the image, based on the human pose.
  2. Combine the clothes color and fabric annotations into a single texture annotation.
  3. Tidy up the annotations and apply some image filtering.
  4. Divide the entire dataset into two groups: training and testing.

Image Source: https://raw.githubusercontent.com/sameer-goel/Text2Human/main/assets/dataset_overview.png
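Two of the pre-processing steps above, merging the color and fabric annotations and splitting the data, can be sketched in a few lines of Python. This is only an illustration and not the project's actual code: the file names, the label values, the `color_fabric` merge rule, and the 25% test fraction are all assumptions made for the example.

```python
import random


def combine_texture(color: str, fabric: str) -> str:
    """Merge separate color and fabric labels into one texture label.

    The "color_fabric" naming scheme is a hypothetical choice for this sketch.
    """
    return f"{color.strip().lower()}_{fabric.strip().lower()}"


def split_dataset(image_ids, test_fraction=0.1, seed=0):
    """Shuffle the ids reproducibly and return (train_ids, test_ids)."""
    ids = sorted(image_ids)            # fixed starting order
    random.Random(seed).shuffle(ids)   # seeded, so the split is repeatable
    n_test = int(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical annotations, for illustration only.
annotations = {
    "img_001.png": {"color": "Blue", "fabric": "Denim"},
    "img_002.png": {"color": "White", "fabric": "Cotton"},
    "img_003.png": {"color": "Black", "fabric": "Leather"},
    "img_004.png": {"color": "Red", "fabric": "Knit"},
}

textures = {
    name: combine_texture(a["color"], a["fabric"])
    for name, a in annotations.items()
}
train_ids, test_ids = split_dataset(textures, test_fraction=0.25)

print(textures["img_001.png"])  # blue_denim
print(len(train_ids), len(test_ids))  # 3 1
```

Seeding the shuffle keeps the training and testing groups identical across runs, which matters when results from different training stages are compared.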

Model Training

Model training starts with the parsing generation network, followed by the top level of the hierarchical VQ-VAE. (VQ-VAE stands for Vector Quantized Variational Autoencoder. It was proposed in the paper Neural Discrete Representation Learning, https://arxiv.org/abs/1711.00937.)
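For readers new to VQ-VAE, the core idea is that each continuous encoder output is replaced by its nearest entry in a learned codebook, so an image ends up represented as a grid of discrete indices. The sketch below shows only that nearest-neighbor lookup in plain Python. The tiny two-dimensional codebook and the encoder outputs are made-up values, and this is not the Text2Human implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return (index, code) for the codebook entry nearest to `vector`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical codebook with three 2-D entries (learned during training in a real model).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

# Hypothetical encoder outputs, one vector per spatial position.
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.7]]

indices = [quantize(v, codebook)[0] for v in encoder_outputs]
print(indices)  # [0, 1, 2]
```

In a trained model the decoder reconstructs the image from the selected codebook entries, and later stages only need to predict the discrete indices.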

Next comes the sampler with mixture-of-experts. Training the sampler starts with a model that tokenizes the parsing maps. Finally, the index prediction network is trained.

Results

You can install the project from its GitHub repository, https://github.com/sameer-goel/Text2Human, and run the UI demo.

Image Source: https://raw.githubusercontent.com/sameer-goel/Text2Human/main/assets/ui.png

In the UI, you can select attributes to customize the generated human images.

Image Source: https://raw.githubusercontent.com/sameer-goel/Text2Human/main/assets/results.png

Kudos to Yuming Jiang, Shuai Yang, Haonan Qiu, Wayne Wu, Chen Change Loy and Ziwei Liu from MMLab@NTU, affiliated with S-Lab, Nanyang Technological University and SenseTime Research.

More Results

Here are some more synthetic images generated from this project.

Image Source: https://www.youtube.com/watch?v=yKh4VORA_E0

I must say, the results were quite interesting.
