Product categorization, also called product classification, is the organization of products into their respective departments or categories. A large part of the process is also the design of the product taxonomy as a whole.

Product categorization was initially a text classification task that analyzed the product’s title to choose the appropriate category. However, numerous methods have since been developed that take into account the product title, description, images, and other available metadata.

The following papers on product categorization represent essential reading in the field and offer novel approaches to the task.

1. Don’t Classify, Translate

In this paper, researchers from the National University of Singapore and the Rakuten Institute of Technology propose and explain a novel machine translation approach to product categorization. The experiment uses the Rakuten Data Challenge and Rakuten Ichiba datasets.

Their method translates a product’s description into a sequence of tokens that represents a root-to-leaf path to the correct category. Using this method, they are also able to propose meaningful new paths in the taxonomy. The researchers state that their method outperforms many of the existing classification algorithms commonly used in machine learning today.

Published/Last Updated: Dec. 14, 2018
Authors and Contributors: Maggie Yundi Li (National University of Singapore), Stanley Kok (National University of Singapore), and Liling Tan (Rakuten Institute of Technology)

2. Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models

The authors of this paper propose attention convolutional neural network (ACNN) models over baseline convolutional neural network (CNN) models and gradient boosted tree (GBT) classifiers. The study uses Japanese product titles taken from Rakuten Ichiba as training data.

Using this data, the authors compare the performance of the three methods (ACNN, CNN, and GBT) for large-scale product categorization. While differences in accuracy can be less than 5%, even minor improvements in accuracy can result in millions of additional correct categorizations. Lastly, the authors explain how an ensemble of ACNN and GBT models can further reduce false categorizations.

Published/Last Updated: April 2017, for EACL 2017
Authors and Contributors: From the Rakuten Institute of Technology: Yandi Xia, Aaron Levine, Pradipto Das, Giuseppe Di Fabbrizio, Keiji Shinzato, and Ankur Datta

3. Atlas: A Dataset and Benchmark for Ecommerce Clothing Product Classification

Image via github.com/vumaasha/Atlas

Researchers at the University of Colorado and Ericsson Research (Chennai, India) have created a large product dataset known as Atlas. In this paper, the team presents their dataset, which includes over 186,000 images of clothing products along with their product titles.

Furthermore, they introduce related work in the field that has influenced their study. Finally, they test their dataset using a ResNet34 classification model and a sequence-to-sequence (Seq2Seq) model to categorize the products. The data is taken from Indian ecommerce stores, so some of the categories used may not be applicable to Western markets. However, the dataset has been open-sourced and is available on GitHub.

Published/Last Updated: Aug. 19, 2019
Authors and Contributors: Venkatesh Umaashankar (Ericsson Research), Girish Shanmugam (Ericsson Research), and Aditi Prakash (University of Colorado)

4. Large Scale Product Categorization using Structured and Unstructured Attributes

In this study, a team at WalmartLabs compares hierarchical models to flat models for product categorization. The researchers employ deep-learning-based models that extract features from each product to create a product signature.

In the paper, the researchers describe a multi-LSTM and multi-CNN based approach to this extreme classification task. Furthermore, they present a novel way to use structured attributes. The team states that their methods can be scaled to take into account any number of product attributes during categorization.

Published/Last Updated: Mar. 1, 2019
Authors and Contributors: From WalmartLabs: Abhinandan Krishnan and Abilash Amarthaluri

5. Multi-Label Product Categorization Using Multi-Modal Fusion Models

In this paper, researchers from New York University and U.S. Bank investigate multi-modal approaches to categorizing products on Amazon. Their approach uses multiple classifiers, each trained on one type of input data from the product listings.

Using a dataset of 9.4 million Amazon products, they developed a tri-modal model for product classification based on product images, titles, and descriptions. Their tri-modal late fusion model achieves an F1 score of 88.2%. The findings of their study demonstrate that increasing the number of modalities could improve performance in multi-label product categorization.

Published/Last Updated: June 30, 2019
Authors and Contributors: Pasawee Wirojwatanakul (New York University) and Artit Wangperawong (U.S. Bank)

In the papers on product categorization above, the researchers trained their models on open datasets which included millions of products. However, if you are building a product categorization model for commercial use, these datasets will not be available to you. Take a look at Kaggle, Google Dataset Search, and the Ultimate Dataset Library for open data that may be of use to you.

Also published at https://lionbridge.ai/articles/5-must-read-papers-on-product-categorization-for-data-scientists/
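
For readers new to the late fusion idea mentioned in paper 5, here is a minimal Python sketch of one common variant: each modality's classifier produces a probability per label, and the probabilities are combined by a weighted average before thresholding. This is an illustration only and not the authors' implementation. The paper may combine modalities differently, and the label names, probabilities, weights, and the 0.5 threshold below are all made-up assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical label set, for illustration only.
LABELS = ["Electronics", "Toys", "Books"]


def late_fusion(
    probs_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Combine per-label probabilities from several modalities.

    Uses a weighted average, which is one simple form of late fusion.
    Every modality must supply one probability per label, in the same order.
    """
    modalities = list(probs_by_modality)
    if weights is None:
        # Assumption: with no weights given, every modality counts equally.
        weights = {m: 1.0 for m in modalities}
    total_weight = sum(weights[m] for m in modalities)
    n_labels = len(probs_by_modality[modalities[0]])

    fused = []
    for i in range(n_labels):
        score = sum(weights[m] * probs_by_modality[m][i] for m in modalities)
        fused.append(score / total_weight)
    return fused


def predict_labels(
    fused: List[float], labels: List[str], threshold: float = 0.5
) -> List[str]:
    """Multi-label decision: keep every label whose fused score passes the threshold."""
    return [label for label, p in zip(labels, fused) if p >= threshold]


# Made-up outputs of three separately trained classifiers for one product.
example = {
    "image": [0.9, 0.2, 0.1],
    "title": [0.7, 0.6, 0.1],
    "description": [0.8, 0.1, 0.3],
}

fused = late_fusion(example)
predicted = predict_labels(fused, LABELS)
print(predicted)
```

Because the decision is made per label, a product can receive several categories at once or none at all, which is what makes the setup multi-label. Passing a `weights` dictionary lets you trust one modality more than another, for example if titles turn out to be more reliable than images on your data.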


5 Essential Product Classification Papers for Data Scientists
