What trousers should i wear this shirt with ? What bag goes well with this dress and these boots?This is a sort of question that would require a fashion brain.
The world’s biggest fashion etailers (ASOS & NetAPorter) show this as the only recommendation on their fashion product page.(How to wear this/Buy the look)
Lets see if we can make an ensemble recommendation engine using Deep Learning
Standard way to find a vector image representation would be fine tune a pre-trained CNN (Inception Model) with fashion tags in a multi-label training environment.
If we can assemble deep tags like neck types and skirt lengths etc, we can create a CNN based tag classification engine and use the fully connected last layer as the image representation.That representation can be used via transfer learning into multiple problems like similarity recommendations.
The problem with such representations is that they contain only the visual cues on the image, i.e that all round neck t-shirts will be closer to each other in that vector space.This does not capture syntactic information or information based on things that co-occur. It does not capture the intrinsic relationship between a white shirt and denim jeans.
A word embedding is a learned representation for text where words that have the same meaning have a similar representation.It is a dense representation of a word in a vocabulary.
Almost all state of the art advances in NLP use the concept of word embeddings.They can be trained using multiple methods.CBOW and Skip gram models are very common.[ReadMore]
The most interesting property about word embeddings is that the word vectors capture many linguistic regularities, for example vector operations vector(‘Paris’) — vector(‘France’) + vector(‘Italy’) results in a vector that is very close to vector(‘Rome’), and vector(‘king’) — vector(‘man’) + vector(‘woman’) is close to vector(‘queen’) [ReadMore]
Word embeddings learn contextual and syntactic information in the language vector space.
Inspired by word embeddings i trained Fashion Embeddings: Dense Representations of fashion images which contain visual + relational(styling) information
Fashion Embeddings
A CNN+CBOW+Triplet Loss model that captures both visual and relational information
Example of ensembles.For each ensemble i have a set individual product images.
Alternatively you can run object detection on instagram fashion images to detect and segment fashion items that are worn together to get trending ensembles.
Two examples with the same label have their embeddings close together in the embedding space.Two examples with different labels have their embeddings far away. [ReadMore]
Collected around 1000 ensembles from ASOS which were created by their stylists as recommendations on their product pages.
A Bucket is a category of fashion items.e.g Shirts,Shoes,Dresses.
Create PQ Index for entire catalog per bucket.Pick an image(A) from a evaluation ensemble.
Ea=VogueNet(A)
Eb,B=IndexSearch(Ea) in BucketBIndex
Ec,C=IndexSearch(VogueNet(Ea,Eb)) in BucketCIndex
Ensemble = {A,B,C}
Calculate Top10 precision for each search assuming ground truth from ASOS ensemble
I feel there is a lot of scope for future work on this and i would love to collaborate with people/startups working on such products.
I am available on LinkedIn
Bibliography:
This story is published in Noteworthy, where thousands come every day to learn about the people & ideas shaping the products we love.
Follow our publication to see more stories featured by the Journal team.