Hi, my name is Prashant Kikani and in this blog post, I share some tips and tricks to compete in Kaggle competitions, along with some code snippets which help in achieving good results with limited resources. Here is my Kaggle profile.
1. TTA (test time augmentation) at inference, similar to the augmentation done in training
TTA means making predictions on the same data-sample multiple times, but each time the sample is augmented. So, the underlying sample is the same, but we apply data augmentation similar to what is done while training, and then combine (for example, average) the predictions.
Doing TTA is a common way to make our model predictions more robust & reduce the variance in our predictions, ultimately improving the score.
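As an illustration, here is a minimal TTA sketch in PyTorch. The model and the batch of images are placeholders; it simply averages the predictions over the original images and a horizontally flipped copy.

import torch

def predict_with_tta(model, images):
    # `model` is any image classifier, `images` is a (N, C, H, W) batch.
    model.eval()
    with torch.no_grad():
        preds = model(images)                                 # original view
        preds_flipped = model(torch.flip(images, dims=[-1]))  # horizontally flipped view
    return (preds + preds_flipped) / 2                        # average over the two views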
2. Change image size at inference time
While training a computer vision model, we often resize our images to 512x512 or 256x256. We do that to fit our data and model in the GPU memory; in most cases, we can't train the model with the original high-resolution images.
But at inference time, we can keep the images in their original shape or at a higher resolution, because we don't do (and can't do) back-propagation at inference time, so the memory requirement is much lower. Doing this helps because we are giving our model more relevant information in the form of those extra pixels.
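As a rough sketch (the sizes and the torchvision pipeline below are just an example), the only thing that changes between training and inference is the resize step:

from torchvision import transforms

# Training pipeline: resize down so batches fit in GPU memory.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

# Inference pipeline: keep a higher resolution, since no gradients
# (and therefore far fewer activations) need to be stored.
test_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

Note that this only works if the architecture can handle variable input sizes, for example CNNs that end with global/adaptive pooling.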
3. Ensemble of multiple diverse models
Ensembling is a technique in which we combine multiple diverse models (mostly trained on the same data) by using all of them at inference time - for example, averaging the predictions of all the models on a test data-sample. The goal of ensembling is to reduce the bias and/or variance in our predictions. Here's a great notebook to learn more about ensembling with Python code. Also, here is a great blog about ensembling.
On Kaggle, people share their code along with its performance on the public leaderboard. What we can do is train our own model and ensemble it with the best public model.
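A minimal sketch of such a blend, assuming two hypothetical submission files that share a target column, could be a simple weighted average:

import pandas as pd

# Hypothetical submission files: our own model and the best public one.
our_sub = pd.read_csv("our_submission.csv")
public_sub = pd.read_csv("public_submission.csv")

blend = our_sub.copy()
# Weighted average of the predicted probabilities;
# the weights are something to tune with cross-validation.
blend["target"] = 0.6 * our_sub["target"] + 0.4 * public_sub["target"]
blend.to_csv("blend_submission.csv", index=False)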
4. Gradient Accumulation or effective batch size
Generally, GPU RAM becomes a hurdle when training bigger models in a robust manner. Kaggle provides a GPU with 16 GB of memory, but in some cases we can't fit a higher batch_size in that RAM. A higher batch_size is good for training robust models. So, we can do gradient accumulation to make our batch_size effectively higher. Here's a sample code in PyTorch from this gist.

model.zero_grad()                                   # Reset gradient tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass, accumulates gradients
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradient tensors
        if (i + 1) % evaluation_steps == 0:         # Evaluate the model when we...
            evaluate_model()                        # ...have no gradients accumulated
We call optimizer.step() not after every batch, but once every accumulation_steps batches. The gradients from those batches keep adding up in between, so each weight update is based on accumulation_steps batches, which effectively makes our batch_size accumulation_steps times higher.
5. Post-processing on predictions
Once we get our predictions on the test data, we can do some processing on them based on the metric of the competition & the nature of the data. For example:
- If AUC is the metric used to measure performance, then a rank average of the ensembled models will perform better than a simple average. You can find a sample Python code for this here.
- If LogLoss is the metric of the competition, then a simple average of the labels gives the best naive baseline. Also, multiplying all the predictions by 0.99 or 1.01 can help with the LogLoss metric.
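For illustration, here is a minimal rank-averaging sketch with made-up prediction arrays, using scipy:

import numpy as np
from scipy.stats import rankdata

# Made-up prediction arrays from two models on the same test set.
preds_model_1 = np.array([0.10, 0.40, 0.85, 0.30])
preds_model_2 = np.array([0.20, 0.35, 0.90, 0.25])

# Convert each model's predictions to ranks, then average the ranks.
# AUC only cares about the ordering of predictions, so rank averaging
# removes the effect of the models being calibrated on different scales.
rank_average = (rankdata(preds_model_1) + rankdata(preds_model_2)) / 2
rank_average = rank_average / rank_average.max()   # rescale back to (0, 1]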
6. Feature Engineering and Data Augmentation
This is a very common and well-known thing to do in Kaggle. Using all the given data in a competition, can we create more relevant data to make our model more robust & better?
In tabular competitions, we combine multiple columns of our data to make more relevant columns. For example, if the height and width columns of a house are given, we can create a total_area column by multiplying those two columns. Lots of things can be done in feature engineering - we just need to use our brain to figure out how we can create more relevant data out of the existing data.
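A minimal pandas sketch of that total_area example, with made-up data:

import pandas as pd

# Made-up tabular data: dimensions of a few houses.
df = pd.DataFrame({
    "height": [10, 12, 8],
    "width":  [20, 15, 25],
})

# A new, more relevant feature derived from the existing columns.
df["total_area"] = df["height"] * df["width"]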
Data augmentation is mostly done on image and text data. On image data, we can apply all sorts of transformations that produce more data with the same label as the original image, like horizontal flips, rotations, random crops, and brightness or contrast changes.
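For example, a small torchvision augmentation pipeline (the choice of transforms and their parameters are only illustrative) could look like this:

from torchvision import transforms

# A typical label-preserving augmentation pipeline for training images.
train_augmentations = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])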
On text data, we can augment using back-translation. For example, given an English sentence, we translate it to, let's say, German & then translate it back from German to English. The new English sentence may not be exactly the same as the original one, but the meaning will be more or less the same.
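A rough back-translation sketch, assuming the Hugging Face transformers library and the Helsinki-NLP opus-mt translation models are available:

from transformers import pipeline

# English -> German and German -> English translation models.
en_to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def back_translate(sentence):
    # Translate to German and back; the result is a paraphrase that
    # keeps (more or less) the same meaning and the same label.
    german = en_to_de(sentence)[0]["translation_text"]
    return de_to_en(german)[0]["translation_text"]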
Feature engineering is a skill that requires creativity & logical thinking. And that's what differentiates a good Kaggler from a novice!
If you have enjoyed this blog, you may find some of my other blogs interesting!
Happy Kaggling. Cheers!