Hi, my name is Prashant Kikani, and in this blog post I share some tips and tricks to compete in Kaggle competitions, along with some code snippets that help achieve good results with limited resources. Here is my Kaggle profile.

For Deep Learning competitions

1. TTA (test time augmentation) at inference, similar to the augmentation done in training
TTA means making predictions on the same data sample multiple times, but with the sample augmented differently each time. So the sample is essentially the same, but we apply data augmentations similar to those used during training and average the resulting predictions. TTA is a common way to make model predictions more robust and reduce their variance, ultimately improving the score.

2. Change image size at inference time
While training a computer vision model, we often resize our images to 512x512 or 256x256 to fit our data and model in GPU memory. In most cases, we can't train the model on the original high-resolution images. But at inference time, we can keep the images at their original, higher resolution, because we don't do (and can't do) back-propagation at inference time, so far less memory is needed. Doing this helps because the extra pixels give our model more relevant data.

3. Ensemble of multiple diverse models
An ensemble combines multiple diverse models (mostly trained on the same data) by using all of them at inference time, for example by averaging the predictions of all the models on each test sample. The goal of an ensemble is to reduce the bias and/or variance of our predictions. Here's a great notebook to learn more about ensembling with Python code, and here is a great blog about it.

On Kaggle, people share their code along with its performance on the public leaderboard. What we can do is train our own model and ensemble it with the best public model.
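The TTA and ensembling ideas above can be sketched together in a few lines of NumPy. This is a minimal illustration under loud assumptions: model_a and model_b are dummy stand-ins for real trained models, and the flip-based augmentations are just one possible TTA choice.

```python
import numpy as np

np.random.seed(0)

def model_a(img):
    # Dummy "model" for illustration: mean pixel value as a fake probability.
    return float(img.mean())

def model_b(img):
    # Dummy "model" for illustration: half the max pixel value.
    return float(img.max()) / 2.0

def predict_with_tta(model, img):
    """TTA: average the model's predictions over several augmented views."""
    views = [img, np.fliplr(img), np.flipud(img)]  # original + two flips
    return float(np.mean([model(v) for v in views]))

def ensemble_predict(models, img):
    """Ensemble: average the TTA predictions of all models."""
    return float(np.mean([predict_with_tta(m, img) for m in models]))

img = np.random.rand(8, 8)  # a fake 8x8 "image"
pred = ensemble_predict([model_a, model_b], img)
```

In a real pipeline the augmented views would go through the same preprocessing as training-time augmentations, and the averaging could be weighted by each model's validation score.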
4. Gradient Accumulation, or effective batch size
Generally, GPU RAM is the hurdle to training bigger models in a robust manner. Kaggle provides 16 GB of GPU memory, but in some cases we can't fit a higher batch_size in that RAM, and a higher batch_size is good for training robust models. So, we can use gradient accumulation to make our batch_size effectively higher. Here's a sample code in PyTorch from this gist:

```python
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i + 1) % evaluation_steps == 0:         # Evaluate the model when we...
            evaluate_model()                        # ...have no gradients accumulated
```

We do not call optimizer.step() on every batch, but once every accumulation_steps batches. The gradients keep accumulating in between, which also changes the back-propagation period and effectively makes our batch_size accumulation_steps times higher.

5. Post-processing on predictions
Once we get our predictions on the test data, we can process them based on the metric of the competition and the nature of the data. For example, if AUC is the metric that measures performance, then a rank average of the ensembled models will perform better than a simple average, because AUC only depends on the ordering of the predictions. You can find a sample Python code for this here.

If LogLoss is the metric of the competition, then a simple average of the predicted labels gives the best naive baseline. Also, multiplying all the predictions by 0.99 or 1.01 sometimes helps the LogLoss metric.

Sometimes, rather than a simple average of the predictions of the models in the ensemble, a geometric mean can be better. Here is a sample Python snippet for this.

Most post-processing techniques depend on the nature of the competition data. The training data may contain signals that let us tweak the model predictions to improve the score.
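To make the rank-average and geometric-mean ideas concrete, here is a small sketch on made-up numbers: preds_a and preds_b are hypothetical model outputs, not from any real competition, and a production rank average would typically use scipy.stats.rankdata instead of the double argsort trick.

```python
import numpy as np

# Hypothetical probabilities from two models on four test samples.
preds_a = np.array([0.10, 0.40, 0.35, 0.80])
preds_b = np.array([0.02, 0.90, 0.60, 0.95])

def to_ranks(p):
    """Scale scores to their ranks in [0, 1]; only the ordering survives."""
    ranks = np.argsort(np.argsort(p)).astype(float)  # rank of each element
    return ranks / (len(p) - 1)

# Rank average: useful when the metric (like AUC) only cares about ordering,
# since it removes differences in the two models' score scales.
rank_avg = (to_ranks(preds_a) + to_ranks(preds_b)) / 2.0

# Geometric mean: an alternative to the simple (arithmetic) average.
geo_mean = (preds_a * preds_b) ** 0.5
```

Note how the geometric mean is pulled toward the smaller of the two predictions, which can help when overconfident outliers from one model hurt the score.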
6. Feature Engineering and Data Augmentation
This is a very common and famous thing to do in Kaggle. Using all the data given in a competition, can we create more relevant data to make our model better and more robust?

In tabular competitions, we combine multiple columns of our data to make more relevant columns. For example, if the height and width columns of a house are given, we can create its total_area by multiplying those two columns. A lot can be done in feature engineering; we just need to use our brain to figure out how to create more relevant data out of the existing data.

Data augmentation is mostly done on image and text data. With image data, we can apply all sorts of transformations that keep the label of the original image, like:

Rotate the image by any degree between 0 and 360.
Crop the irrelevant parts.
Change the opacity of the image.
Flip the image horizontally/vertically.

With text data, we can augment using back-translation: given an English sentence, we translate it to, let's say, German, and then translate it back from German to English. The new English sentence may not be exactly the same as the original, but its meaning will be more or less the same.

Feature engineering is a skill that requires creativity and logical thinking, and that's what differentiates a good Kaggler from a novice!

If you have enjoyed this blog, you may find some of my other blogs interesting!

Happy Kaggling. Cheers!
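P.S. As a tiny illustration of the total_area feature idea from section 6, here is a sketch on a made-up pandas DataFrame (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical house data with height and width columns.
df = pd.DataFrame({"height": [10.0, 12.0], "width": [20.0, 15.0]})

# Combine the existing columns into a new, more relevant feature.
df["total_area"] = df["height"] * df["width"]
```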