Late 2018, I participated in Kaggle’s **“Quick, Draw! Doodle Recognition Challenge”.** For those of you who are unaware, below is a short description of the game:

“Quick, Draw!” was released as an experimental game to educate the public in a playful way about how AI works. The game prompts users to draw an image depicting a certain category, such as “banana,” “table,” etc.

As part of this competition, a subset of the more than 1B drawings was released, covering 340 labels. Competitors needed to improve the existing AI that decides whether a user has correctly drawn what was asked for. For each test image, the task was to predict the three most probable classes the doodle might belong to, as in this sample submission:

```
key_id,word
9000003627287624,The_Eiffel_Tower airplane donut
9000010688666847,The_Eiffel_Tower airplane donut
```

The best submission was chosen based on its Mean Average Precision @ 3 (MAP@3):

$$MAP@3 = \frac{1}{U} \sum_{u=1}^{U} \sum_{k=1}^{\min(n,3)} P(k)$$

where $U$ is the number of scored drawings in the test data, $P(k)$ is the precision at cutoff $k$, and $n$ is the number of predictions per drawing.

Initial deep dive into the data

The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. Each of the 340 classes had a CSV file in the format below, defining how each doodle was drawn by the corresponding player.

[Image: A sample for doodles with apple as class]

Using the code below, we can convert the strokes in the `drawing` column of such a file into an image.
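As a first step, the `drawing` column has to be decoded. In the simplified CSVs each doodle is stored as a JSON string holding a list of strokes, where each stroke is a pair of lists `[[x0, x1, ...], [y0, y1, ...]]`. The sketch below assumes that layout and uses only the Python standard library; `SAMPLE_CSV`, `load_rows` and `count_points` are hypothetical names, and the two rows are invented for illustration, not taken from the real data.

```python
import csv
import io
import json

# Made-up two-row excerpt for illustration only. The column layout is assumed
# to match the simplified training CSVs; the values are not real data.
SAMPLE_CSV = """countrycode,drawing,key_id,recognized,timestamp,word
US,"[[[10, 60, 110], [20, 80, 20]], [[30, 90], [50, 50]]]",1,True,2018-01-01 00:00:00,apple
GB,"[[[0, 255], [0, 255]]]",2,False,2018-01-02 00:00:00,apple
"""


def parse_drawing(raw: str) -> list:
    """Decode the JSON `drawing` string into a list of [xs, ys] strokes."""
    return json.loads(raw)


def load_rows(text: str) -> list[dict]:
    """Read CSV text and attach the decoded strokes to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["strokes"] = parse_drawing(row["drawing"])
        rows.append(row)
    return rows


def count_points(strokes) -> int:
    """Total number of (x, y) points over all strokes of one drawing."""
    return sum(len(xs) for xs, _ys in strokes)


if __name__ == "__main__":
    for row in load_rows(SAMPLE_CSV):
        print(row["key_id"], row["word"],
              len(row["strokes"]), "strokes,", count_points(row["strokes"]), "points")
```

The decoded list of `[xs, ys]` pairs is the shape of input that a stroke-rendering function such as `draw_cv2` iterates over.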
```python
import cv2
import numpy as np

BASE_SIZE = 256  # simplified drawings use coordinates in 0..255


def draw_cv2(raw_strokes, size=299, lw=4, time_color=False):
    """Render a list of [xs, ys] strokes as a grayscale image of shape (size, size)."""
    img = np.zeros((BASE_SIZE, BASE_SIZE), np.uint8)
    for t, stroke in enumerate(raw_strokes):
        for i in range(len(stroke[0]) - 1):
            # With time_color, later strokes are drawn in dimmer gray (down to 125),
            # which encodes stroke order in the pixel intensity
            color = 255 - min(t, 10) * 13 if time_color else 255
            cv2.line(img, (stroke[0][i], stroke[1][i]),
                     (stroke[0][i + 1], stroke[1][i + 1]), color, lw)
    # Pad 4 px of black on every side so strokes touching the edge are not clipped
    img = cv2.copyMakeBorder(img, 4, 4, 4, 4, cv2.BORDER_CONSTANT)
    # The padded canvas is (BASE_SIZE + 8) px wide, so compare against its real size
    if img.shape[0] != size:
        img = cv2.resize(img, (size, size), interpolation=cv2.INTER_LINEAR)
    return img
```

For example, here is one from the snowman file:

[Image: snowman]

We converted all of the strokes into images and stored them in the corresponding folders (train and test).

[Image: A batch of images in the training data]

Using a convolutional neural network to identify the doodle

There are multiple ways this problem could have been tackled. For example, since there is a sequential component to it (each stroke is a sequence of coordinates), a recurrent neural network could also be used. I preferred to treat it as a computer vision problem, because it is easier to test and learn by visualizing the results of an image problem than when treating it as a sequence problem.

The architecture we chose was ResNet and its variants (https://arxiv.org/abs/1512.03385). We started off with ResNet18 and gradually tested performance on bigger architectures. Empirically, I observed that ResNet34 gave us more bang for the buck than any of the other networks.

What did I learn from the initial experiments?

A look at the data and subsequent runs suggested that this problem called for a simple network that could zoom through this large number of doodles: the need of the hour was the ability to run multiple epochs within a limited time frame. Hence, I did not even try any complicated architectures, which in the end was a great decision.
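Comparing ResNet18, ResNet34 and larger variants like this means scoring every run on the competition metric. With exactly one correct label per drawing, the MAP@3 sum effectively counts $P(k)$ only at the rank $k$ where the true label first appears, so each drawing contributes $1/k$ if its label is in the top three and 0 otherwise. Below is a minimal NumPy sketch under that assumption; `top3`, `map_at_3` and `to_submission_words` are hypothetical helper names, not part of any competition kit, and the underscore replacement simply mirrors the sample submission.

```python
import numpy as np


def top3(probs: np.ndarray) -> np.ndarray:
    """Indices of the three highest-probability classes per row, best first."""
    return np.argsort(-probs, axis=1)[:, :3]


def map_at_3(true_labels, top3_preds) -> float:
    """MAP@3 assuming exactly one correct label per drawing.

    A hit at rank k (1-based) scores 1/k and a miss scores 0; if the true
    label appears more than once, only its first (best) rank counts.
    """
    true_labels = np.asarray(true_labels)
    top3_preds = np.asarray(top3_preds)
    hits = top3_preds == true_labels[:, None]       # (U, 3) boolean matrix
    per_drawing = (hits / np.arange(1, 4)).max(axis=1)
    return float(per_drawing.mean())


def to_submission_words(top3_preds, class_names) -> list[str]:
    """Space-separated labels per drawing, with spaces inside a label
    replaced by underscores as in the sample submission."""
    return [" ".join(class_names[j].replace(" ", "_") for j in row)
            for row in top3_preds]


if __name__ == "__main__":
    classes = ["apple", "snowman", "The Eiffel Tower", "donut"]
    probs = np.array([[0.10, 0.70, 0.20, 0.00],
                      [0.50, 0.05, 0.30, 0.15],
                      [0.25, 0.20, 0.05, 0.50]])
    preds = top3(probs)
    print(to_submission_words(preds, classes))
    print(map_at_3([1, 3, 2], preds))
```

In this toy example the first drawing is a hit at rank 1, the second at rank 3 and the third a miss, so the score is (1 + 1/3 + 0) / 3.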
With ResNet34, the largest share of the data I could run my experiments on was 30%, and even that showed that more data does help generalization when the dataset is simple but quite diverse (many labels).

Noise in the data

On further inspection, I found a lot of noise in the training data: many doodles were wrongly labeled. This hurt the model, since mislabeled examples inherently give it wrong instructions. Possible solutions, which I could not try, are training a separate network to identify wrongly labeled images, or hand-labeling high-loss images (those with the largest difference between the actual and predicted labels).

Where did I land?

The highest MAP@3 I got was 0.91444 on the public leaderboard, which generalized quite well, with a score of 0.91318 on the private leaderboard. Considering that the winning score was 0.95480, I was on the right path. The one strategy that could have made a difference was spending more time reducing the noise in the data, but that is a lesson for next time.

Thanks, everyone, for reading about my experience tackling this extremely interesting problem! For anyone looking to try their hand at it, below is the link to the competition.

Quick, Draw! Doodle Recognition Challenge: _How accurately can you identify a doodle?_ (www.kaggle.com)

What I learned when trying to improve an AI agent in a game using deep learning

Shubrashankh Chatterjee
