Recently, we launched a new series of machine learning articles written by Artur Kuzin, our Lead Data Scientist. Today, Artur is back with a new story about taking part as a mentor in the “IEEE’s Camera Model Identification” competition, sharing his recent experience in team management and problem-solving.
This winter I participated as a mentor in the “IEEE’s Signal Processing Society - Camera Model Identification” competition. The problem was to identify the camera model that an image was taken with. There were ten camera models as classes in the image collection: two iPhones, seven Android phones, and one DSLR camera.
Usually, Deep Learning solutions distinguish different kinds of objects in an image from each other (e.g., cats vs dogs, porn vs bikini, cars vs buses). For those tasks, it should make no difference how and on which device an image of a cat (or a car) was taken.
In this competition it was exactly the opposite: regardless of which objects were in the photos, we needed to identify the device. This means we had to consider such things as sensor noise, image processing artifacts, optical defects, etc.
The majority of Kaggle teams are formed of participants with similar leaderboard scores, with every team member having their own working solution pipeline. Earlier, I wrote a post with a good example of this approach. But this time we did it the other way: we split the pipeline into consecutive, independent parts, so each team member focused on his own part.
Data mining. External datasets were allowed in this competition. Our team member Artur Fattakhov wrote a web scraper based on the BeautifulSoup library. We also had to use Selenium, a browser automation tool, to handle the page structure on Flickr correctly. All in all, we downloaded more than 500 GB of images from Yandex.Fotki, Flickr, and Wikimedia Commons.
Data filtering. That was my only code-level contribution to this project. I chose the following criteria for filtering: 1) typical image size (camera-wise), 2) JPEG compression level, 3) information on camera model in EXIF, 4) information on processing software in EXIF.
A vast majority of Moto-X images had to be left out because there were several different hardware versions in that class.
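The four filtering criteria above can be sketched as a single predicate. This is only an illustration: the `ImageMeta` record, the `EXPECTED` table, the quality threshold, and the list of editing tools are all hypothetical, since the actual values used by the team are not given here. It assumes the metadata (size, estimated JPEG quality, EXIF tags) has already been read from each file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageMeta:
    width: int
    height: int
    jpeg_quality: int              # estimated JPEG quality, 0-100
    exif_model: Optional[str]      # EXIF "Model" tag, if present
    exif_software: Optional[str]   # EXIF "Software" tag, if present


# Hypothetical per-camera expectations (illustrative values only).
EXPECTED = {
    "iPhone 6": {"sizes": {(3264, 2448)}, "min_quality": 90},
}

# Hypothetical list of tools whose presence suggests post-processing.
EDITING_SOFTWARE = ("photoshop", "lightroom", "gimp")


def keep_image(meta: ImageMeta, camera: str) -> bool:
    """Return True if the image passes all four filtering criteria."""
    spec = EXPECTED[camera]

    # 1) Typical image size for this camera, ignoring orientation.
    size = (max(meta.width, meta.height), min(meta.width, meta.height))
    if size not in spec["sizes"]:
        return False

    # 2) JPEG compression level: drop heavily recompressed images.
    if meta.jpeg_quality < spec["min_quality"]:
        return False

    # 3) The EXIF camera model must be present and match the class.
    if meta.exif_model is None:
        return False
    if meta.exif_model.strip().lower() != camera.lower():
        return False

    # 4) EXIF software must not point to an image editor.
    software = (meta.exif_software or "").lower()
    if any(tool in software for tool in EDITING_SOFTWARE):
        return False

    return True
```

A check like this is cheap to run over hundreds of gigabytes because it only looks at metadata, not pixel content.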
Validation scheme. Development of the training and validation module was delegated to Ilya Kibardin. Unfortunately, validation on a holdout from the Kaggle dataset was inconsistent with the leaderboard: our neural network scored close to 1.0 accuracy locally, while we had about 0.96 on the leaderboard.
Therefore, we decided to validate on additional data (URLs were provided by Gleb Posobin: https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/47235). We also added 10% of the Kaggle training set to this additional dataset in order to make the classes more balanced.
The best checkpoints were selected by the lowest cross-entropy value on the manipulated and unaltered validation data.
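As a rough sketch of this selection rule: compute the mean cross-entropy of each checkpoint on both validation subsets and keep the checkpoints with the lowest values. How the two losses were combined is not stated, so the equal-weight average below is an assumption, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class.

    probs:  list of per-image class probability lists
    labels: list of true class indices
    """
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


def select_checkpoints(val_losses, k=3):
    """Pick the k checkpoints with the lowest combined validation loss.

    val_losses maps checkpoint name -> (loss on unaltered, loss on manipulated).
    Assumption: both subsets are weighted equally.
    """
    ranked = sorted(val_losses, key=lambda name: sum(val_losses[name]) / 2)
    return ranked[:k]
```

Cross-entropy is a finer-grained criterion than accuracy when accuracy is already close to 1.0, because it still separates checkpoints by how confident their correct and incorrect predictions are.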
Training approach. Somewhere in the middle of the competition, Andres Torrubia made his complete solution publicly available on the Kaggle forum. That was a real game changer because the models turned out to be quite good and everybody immediately started using them. The situation changed again when Ivan Romanov posted a PyTorch version of the same solution: it was fast and easy to parallelize across multiple GPUs. The sad part is that these guys ended up at #30 and #45 on the final leaderboard, but they will always remain unbeaten in our hearts.
Our team member Ilya extended Misha’s code (https://www.kaggle.com/c/sp-society-camera-model-identification/discussion/48679) in the following way:
Learning. The learning step consisted of fine-tuning the complete neural network without any frozen layers, since the problem is very different from ImageNet.
1. Start with Adam at a learning rate of 1e-4. If the loss shows no improvement for 2–3 epochs, halve the learning rate. Repeat until convergence.
2. Switch from Adam to SGD with a cyclic learning rate going from 1e-3 down to 1e-6 (three cycles).
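The two-phase schedule above can be sketched in plain Python, independent of any framework. The shape of each cycle is not specified, so the cosine decay used here is an assumption, as are the function names and the `patience` default. In PyTorch, the first phase roughly corresponds to `ReduceLROnPlateau` with `factor=0.5`, and the second to a cosine-annealing scheduler.

```python
import math


def halve_on_plateau(losses, lr=1e-4, patience=3):
    """Phase 1: return the learning rate used at each epoch.

    The rate is halved whenever the loss has not improved
    for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    history = []
    for loss in losses:
        history.append(lr)
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                lr /= 2
                stale = 0
    return history


def cyclic_lr(step, steps_per_cycle, lr_max=1e-3, lr_min=1e-6):
    """Phase 2: learning rate for a given step of a cyclic schedule.

    Assumption: cosine decay from lr_max to lr_min, restarting each cycle.
    """
    pos = (step % steps_per_cycle) / max(steps_per_cycle - 1, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * pos))


# Three cycles of 10 steps each.
schedule = [cyclic_lr(s, steps_per_cycle=10) for s in range(30)]
```

Each restart jumps the learning rate back to 1e-3, which gives the optimizer a chance to leave the minimum it settled into during the previous cycle.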
The final ensemble. I asked Ilya to apply my recent approach from the “Amazon from Space” competition: we fine-tuned nine models, chose the three best checkpoints from them, used every selected checkpoint to calculate predictions with Test Time Augmentation (TTA), and averaged these predictions with a geometric mean.
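A minimal sketch of geometric-mean averaging for a single image is shown below. Each input vector stands for the class probabilities from one checkpoint or one TTA view. The function name is hypothetical, and the final renormalization (so the result sums to one) is an addition for convenience; it does not change which class wins.

```python
import math


def geometric_mean_ensemble(predictions, eps=1e-12):
    """Average probability vectors with a geometric mean.

    predictions: list of probability lists, one per checkpoint or TTA view,
                 all of the same length (number of classes).
    """
    n_classes = len(predictions[0])
    # Average in log space for numerical stability.
    log_mean = [
        sum(math.log(max(p[c], eps)) for p in predictions) / len(predictions)
        for c in range(n_classes)
    ]
    unnormalized = [math.exp(v) for v in log_mean]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


combined = geometric_mean_ensemble([[0.9, 0.1], [0.4, 0.6]])
```

Compared with an arithmetic mean, the geometric mean penalizes a class more strongly when any single model assigns it a low probability.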
This approach led us to second place in the overall ranking and #1 in the student category. We successfully passed the first stage and made it to the second stage of the competition, which took place during the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing in Canada. It is quite remarkable that the #3 team was also considered a student team. Judging by their score, it seems that we outperformed their solution by only one correct prediction.
After we received our confirmation letters, Ilya and Artur F. began to prepare for the second stage, while Valery, Andrey and I decided that we could not participate in it. Unfortunately, Ilya and Artur’s visas were denied by the Canadian embassy. But luckily for us, the organizing committee understood our situation and allowed us to participate in the competition remotely.
There was no leaderboard in the second stage: by the rules of the competition, we had only one attempt to submit our solution. Also, there was no information on camera models in the training set. So, basically, that was a double fail: no external data and no representative local validation.
At first, we tried to use the pipeline from the first stage. Every model quickly reached high accuracy (around 0.97), but the models' predictions on the test sample agreed with each other only partially (around 0.87). It seemed to me that our models were heavily overfitting. So I came up with a new plan:
1. Use the best models from the first stage as feature extractors;
2. Extract features and reduce their dimensionality with PCA;
3. Train a LightGBM classifier on the reduced features.
I proposed this plan for the following reasons. The nets were already trained to extract important features regardless of the specific image content. Therefore, it would be enough to train something lightweight on top of them. However, since the data in this stage might be very different from the data in the first stage, it would be better if the classifier was non-linear, like gradient boosting on decision trees. I have successfully applied this idea in previous competitions (I’ve also shared some code). DenseNet turned out to outperform ResNeXt and SE-ResNeXt on local validation. As a result, our final solution looked like this:
The number of training samples for the manipulated part was multiplied by seven because I extracted features from every manipulation transform of the data separately.
In the end, we took second place in the final stage, but there were a number of important circumstances. First, the jury evaluated solutions not by accuracy but by presentation. The winner of the final created not only a presentation but also a live demo of their algorithm. And second, we still have no idea about the final score of any team (the committee does not give this information away even when asked directly).
Fun fact: the vast majority of Russian-speaking data scientists are members of the Open Data Science community. That is a very friendly environment, and during the first stage, every team from our community added [ods.ai] to the team name. It turned out that there were a lot of them at the top of the leaderboard! After that, such Kaggle legends as inversion and Giba joined our community to figure out what we were up to.
I really enjoyed participating as a mentor in a competition. I was able to give several valuable pieces of advice on baseline refinement and on setting up local validation, all of it based on my experience in prior competitions. I think this approach to team forming has proved itself and should be applied in future contests: a Kaggle Master/Grandmaster as a solution architect and 2–3 Kaggle Experts as coders and researchers. In my opinion, that’s a definite win-win situation: experienced Kaggle users don’t have to spend much time writing code, while the beginners achieve a better result, avoid common mistakes and gain experience really fast.