Imagine if you could get all the tips and tricks you need to hammer a Kaggle competition. I have gone over 39 Kaggle competitions, including:

- $1,000,000 Data Science Bowl 2017
- $100,000 Intel & MobileODT Cervical Cancer Screening
- $100,000 2018 Data Science Bowl
- $60,000 Airbus Ship Detection Challenge
- $60,000 Planet: Understanding the Amazon from Space
- $50,000 APTOS 2019 Blindness Detection
- $37,000 Human Protein Atlas Image Classification
- $30,000 SIIM-ACR Pneumothorax Segmentation
- $25,000 Inclusive Images Challenge

and extracted that knowledge for you. Dig in.

Contents

- External Data
- Data Exploration and Gaining Insights
- Preprocessing
- Data Augmentations
- Modeling
- Hardware Setups
- Loss Functions
- Training Tips
- Evaluation and Cross-validation
- Ensembling Methods
- Post Processing

External Data

- Use of the LUng Node Analysis Grand Challenge data, because it contains detailed annotations from radiologists
- Use of the LIDC-IDRI data, because it has radiologist descriptions of each tumor they found
- Use of the Flickr CC and Wikipedia Commons datasets
- Use of the Human Protein Atlas dataset
- Use of the IDRiD dataset

Data Exploration and Gaining Insights

- Clustering of 3D segmentations with the 0.5 threshold
- Identify if there is a substantial difference in train/test label distributions

Preprocessing

- Perform blob detection using the Difference of Gaussian (DoG) method, with the implementation available in the skimage package
- Use patch-based inputs for training in order to reduce training time
- Use cudf for loading data instead of pandas, because it has a faster reader
- Ensure that all the images have the same orientation
- Apply contrast limited adaptive histogram equalization (CLAHE) with kernel size 32×32 (a short OpenCV sketch follows the augmentation list below)
- Use OpenCV for all general image preprocessing
- Employ automatic active learning and add manual annotations
- Resize all images to the same resolution in order to apply the same model to scans of different thicknesses
- Convert scan images into normalized 3D numpy arrays
- Apply single image haze removal using Dark Channel Prior
- Convert all data to Hounsfield units
- Find duplicate images using pairwise correlation on RGBY channels
- Make labels more balanced by developing a sampler
- Apply pseudo labeling to test data in order to improve the score
- Scale down images/masks to 320×480
- Histogram equalization
- Convert DCM to PNG
- Calculate the md5 hash for each image when there are duplicate images

Data Augmentations

- Use the albumentations package for augmentations (see the sketch after this list)
- Apply random rotation by 90 degrees
- Use horizontal, vertical or both flips
- Attempt heavy geometric transformations: elastic transform, perspective transform, piecewise affine transforms, pincushion distortion
- Apply random HSV
- Use loss-less augmentation for generalization to prevent loss of useful image information
- Apply channel shuffling
- Do data augmentation based on class frequency
- Apply gaussian noise
- Use lossless permutations of 3D images for data augmentation
- Rotate by a random angle from 0 to 45 degrees
- Scale by a random factor from 0.8 to 1.2
- Brightness changing
- Randomly change hue, saturation and value
- Apply D4 augmentations
- Use contrast limited adaptive histogram equalization
- Use the AutoAugment augmentation strategy
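Since albumentations comes up so often, here is a minimal sketch of how several of the augmentations listed above (flips, 90-degree rotations, scaling and rotation, elastic transforms, HSV shifts, noise, CLAHE) can be chained into one pipeline. The probabilities and limits are illustrative, not taken from any particular winning solution.

```python
# A minimal albumentations pipeline echoing several tips above: flips, 90-degree
# rotations, scale 0.8-1.2 / rotation up to 45 degrees, elastic transform, HSV
# shifts, gaussian noise and CLAHE. Probabilities and limits are illustrative.
import numpy as np
import albumentations as A

train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.ShiftScaleRotate(scale_limit=0.2, rotate_limit=45, p=0.3),
    A.ElasticTransform(p=0.2),
    A.HueSaturationValue(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
    A.GaussNoise(p=0.2),
    A.CLAHE(clip_limit=2.0, tile_grid_size=(32, 32), p=0.2),
])

# Dummy 8-bit image/mask pair just to make the snippet self-contained;
# geometric transforms are applied to the mask as well.
image = np.zeros((320, 480, 3), dtype=np.uint8)
mask = np.zeros((320, 480), dtype=np.uint8)
augmented = train_transform(image=image, mask=mask)
image_aug, mask_aug = augmented["image"], augmented["mask"]
```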
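And going back to the preprocessing list for a moment: CLAHE is typically applied to the lightness channel so that colour information is preserved. A small OpenCV sketch follows, assuming 8-bit BGR input; the apply_clahe helper name is hypothetical, and the 32×32 tile grid simply mirrors the "kernel size 32×32" tip.

```python
# CLAHE on the L channel of LAB, then a 320x480 resize, as listed under
# Preprocessing. apply_clahe is a hypothetical helper name; the 32x32 grid
# mirrors the "kernel size 32x32" tip above.
import cv2
import numpy as np

def apply_clahe(image_bgr: np.ndarray, grid=(32, 32), clip=2.0) -> np.ndarray:
    """Contrast limited adaptive histogram equalization on an 8-bit BGR image."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=grid)
    return cv2.cvtColor(cv2.merge([clahe.apply(l), a, b]), cv2.COLOR_LAB2BGR)

img = np.random.randint(0, 256, (1024, 1024, 3), dtype=np.uint8)  # stand-in image
out = cv2.resize(apply_clahe(img), (480, 320), interpolation=cv2.INTER_AREA)
```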
Modeling

Architectures

- Use of a U-Net based architecture. Adopted the concepts and applied them to 3D input tensors
- Employing automatic active learning and adding manual annotations
- The inception-ResNet v2 architecture for training features with different receptive fields
- Siamese networks with adversarial training
- ResNet50, Xception, Inception ResNet v2 x 5 with a Dense (FC) layer as the final layer
- Use of a global max-pooling layer, which returns a fixed-length output no matter the input size
- Use of stacked dilated convolutions
- VoxelNet
- Replace the plus sign in LinkNet skip connections with concat and conv1x1
- Generalized mean pooling
- Keras NASNetLarge to train the model from scratch using 224x224x3
- Use of a 3D convnet to slide over the images
- ImageNet-pre-trained ResNet152 as the feature extractor
- Replace the final fully-connected layers of ResNet by 3 fully connected layers with dropout
- Use ConvTranspose in the decoder
- Applying the VGG baseline architecture
- Implementing the C3D network with adjusted receptive fields and a 64-unit bottleneck layer at the end of the network
- Use of UNet-type architectures with pre-trained weights to improve convergence and performance of binary segmentation on 8-bit RGB input images
- LinkNet, since it's fast and memory efficient
- Mask R-CNN
- BN-Inception
- Fast Point R-CNN
- Seresnext
- UNet and Deeplabv3
- Faster RCNN
- SENet154
- ResNet152
- NASNet-A-Large
- EfficientNetB4
- ResNet101
- GAPNet
- PNASNet-5-Large
- Densenet121
- AC-GAN
- XceptionNet (96), XceptionNet (299), Inception v3 (139), InceptionResNet v2 (299), DenseNet121 (224)
- AlbuNet (resnet34) from ternausnets
- Resnet50 from selim_sef SpaceNet 4
- SCSE Unet (seresnext50) from selim_sef SpaceNet 4
- A custom Unet and Linknet architecture
- FPNetResNet50 (5 folds)
- FPNetResNet101 (5 folds)
- FPNetResNet101 (7 folds with different seeds)
- PANetDilatedResNet34 (4 folds)
- PANetResNet50 (4 folds)
- EMANetResNet101 (2 folds)
- RetinaNet
- Deformable R-FCN
- Deformable Relation Networks

Hardware Setups

- Use of the AWS GPU instance p2.xlarge with a NVIDIA K80 GPU
- Pascal Titan-X GPU
- Use of 8 TITAN X GPUs
- 6 GPUs: 2 1080Ti + 4 1080
- Server with 8×NVIDIA Tesla P40, 256 GB RAM and 28 CPU cores
- Intel Core i7 5930k, 2×1080, 64 GB of RAM, 2x512GB SSD, 3TB HDD
- GCP 1x P100, 8x CPU, 15 GB RAM, SSD or 2x P100, 16x CPU, 30 GB RAM
- NVIDIA Tesla P100 GPU with 16GB of RAM
- 980Ti GPU, 2600k CPU, and 14GB RAM

Loss Functions

- Dice Coefficient, because it works well with imbalanced data
- Weighted boundary loss, whose aim is to reduce the distance between the predicted segmentation and the ground truth
- MultiLabelSoftMarginLoss, which creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input and target
- Balanced cross entropy (BCE), which involves weighing the positive and negative examples by a certain coefficient
- Lovasz with logit loss, which performs direct optimization of the mean intersection-over-union loss in neural networks based on the convex Lovasz extension of sub-modular losses
- FocalLoss + Lovasz, obtained by summing the Focal and Lovasz losses
- Arc margin loss, which incorporates margin in order to maximise face class separability
- Npairs loss, which computes the npairs loss between y_true and y_pred
- A combination of BCE and Dice loss functions
- LSEP – a pairwise ranking loss that is smooth everywhere and thus easier to optimize
- Center loss, which simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers
- Ring loss, which augments standard loss functions such as Softmax
- Hard triplet loss, which trains a network to embed features of the same class while maximizing the embedding distance of different classes
- 1 + BCE – Dice, which involves subtracting the BCE and DICE losses and then adding 1
- Binary cross-entropy – log(dice), which is the binary cross-entropy minus the log of the dice loss
- Combinations of BCE, Dice and Focal
- Lovasz loss, which performs direct optimization of the mean intersection-over-union loss
- BCE + DICE – the Dice loss is obtained by calculating the smooth dice coefficient function (see the sketch after this list)
- Focal loss with Gamma 2, which is an improvement to the standard cross-entropy criterion
- BCE + DICE + Focal – this is basically a summation of the three loss functions
- Active Contour Loss, which incorporates area and size information and integrates the information in a dense deep learning model
- 1024 * BCE(results, masks) + BCE(cls, cls_target)
- Focal + kappa – Kappa is a loss function for multi-class classification of ordinal data in deep learning. In this case we sum it and the focal loss
- ArcFaceLoss – Additive Angular Margin Loss for Deep Face Recognition
- Soft Dice trained on positives only – Soft Dice uses predicted probabilities
- 2.7 * BCE(pred_mask, gt_mask) + 0.9 * DICE(pred_mask, gt_mask) + 0.1 * BCE(pred_empty, gt_empty), a custom loss used by one Kaggler
- nn.SmoothL1Loss(), which creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise
- Use of the Mean Squared Error objective function in scenarios where it seems to work better than the binary cross-entropy objective function
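Many of the combined losses above are variations on the same idea: add a pixel-wise BCE term to a soft Dice term. A minimal PyTorch sketch of a BCE + DICE loss follows; the weights and the smoothing constant are arbitrary placeholders rather than values from any specific solution.

```python
# BCE on logits plus (1 - soft Dice) on the sigmoid probabilities. The weights
# and smoothing constant are placeholders, not values from a specific solution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BCEDiceLoss(nn.Module):
    def __init__(self, bce_weight=1.0, dice_weight=1.0, smooth=1.0):
        super().__init__()
        self.bce_weight, self.dice_weight, self.smooth = bce_weight, dice_weight, smooth

    def forward(self, logits, targets):
        bce = F.binary_cross_entropy_with_logits(logits, targets)
        # Flatten per batch element and compute the smooth Dice coefficient.
        probs = torch.sigmoid(logits).view(logits.size(0), -1)
        targets = targets.view(targets.size(0), -1)
        intersection = (probs * targets).sum(dim=1)
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum(dim=1) + targets.sum(dim=1) + self.smooth
        )
        return self.bce_weight * bce + self.dice_weight * (1.0 - dice.mean())

# Usage: loss = BCEDiceLoss()(model(images), masks.float())
```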
Training Tips

- Try different learning rates
- Try different batch sizes
- Use SGD with momentum and manual learning rate scheduling
- Too much augmentation will reduce the accuracy
- Train on image crops and predict on full images
- Use of Keras's ReduceLROnPlateau() to reduce the learning rate
- Train without augmentation until plateau, then apply soft and hard augmentation for some epochs
- Freeze all layers except the last one and use 1000 images from Stage1 for tuning
- Make labels more balanced by developing a sampler (class-aware sampling)
- Use dropout and augmentation while tuning the last layer
- Pseudo labeling to improve the score
- Use Adam, reducing the LR on plateau with patience 2–4
- Use cyclic LR with SGD
- Reduce the learning rate by a factor of two if validation loss does not improve for two consecutive epochs
- Repeat the worst batch out of 10 batches
- Train with the default UNET
- Overlap tiles so that each edge pixel is covered twice
- Hyperparameter tuning: learning rate on training, non-maximum suppression and score threshold on inference
- Remove bounding boxes with a low confidence score
- Train different convolutional neural networks, then build an ensemble
- Stop training when the F1 score is decreasing
- Differential learning rate with gradual reducing
- Train ANNs in 5 folds and 30 repeats, using stacking
- Track your experiments using Neptune

Evaluation and Cross-validation

- Split on classes with non-uniform stratification
- Avoid overfitting by applying cross-validation while tuning the last layer
- 10-fold CV ensemble for classification
- Combination of 5 10-fold CV ensembles for detection
- Sklearn's stratified K-fold function
- 5-fold cross-validation
- Adversarial validation and weighting

Ensembling Methods

- Use simple majority voting for ensembling
- XGBoost on the max malignancy at 3 zoom levels, the z-location and the amount of strange tissue
- LightGBM for models with too many classes. This was done for raw data features only
- CatBoost for a second-layer model
- Training with 7 features for the gradient boosting classifier
- Use 'curriculum learning' to speed up model training. In this technique, models are first trained on simple samples, then progressively move to hard ones
- Ensemble with ResNet50, InceptionV3, and InceptionResNetV2
- Ensemble method for object detection
- An ensemble of Mask R-CNN, YOLOv3, and Faster RCNN architectures, with a classification network (DenseNet-121 architecture)

Post Processing

- Apply test time augmentation (TTA) – present an image to the model several times with different random transformations and average the predictions (see the sketch after this list)
- Equalize test prediction probabilities instead of only using predicted classes
- Apply the geometric mean to the predictions
- Overlap tiles during inference so that each edge pixel is covered at least thrice, because UNET tends to have bad predictions around edge areas
- Non-maximum suppression and bounding box shrinkage
- Watershed post-processing to detach objects in instance segmentation problems
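As a concrete illustration of the first post-processing tip, horizontal-flip test time augmentation can be as simple as averaging two forward passes. A minimal PyTorch sketch, assuming a model that outputs per-pixel logits:

```python
# Average sigmoid predictions over the original and horizontally flipped input.
# A geometric mean, (probs * probs_flipped).sqrt(), is another common choice.
import torch

@torch.no_grad()
def predict_with_flip_tta(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    model.eval()
    probs = torch.sigmoid(model(images))
    flipped = torch.flip(images, dims=[-1])                               # flip the width axis
    probs_flipped = torch.flip(torch.sigmoid(model(flipped)), dims=[-1])  # flip the prediction back
    return (probs + probs_flipped) / 2.0
```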
Final Thoughts

Hopefully, this article gave you some background into image segmentation tips and tricks, along with some tools and frameworks that you can use to start competing. We've covered tips on: architectures, training tricks, losses, pre-processing, post-processing, ensembling, and tools and frameworks. If you want to go deeper down the rabbit hole, simply follow the links and see how the best image segmentation models are built.

Happy segmenting!

This article was originally posted by Derrick Mwiti on the Neptune blog. If you liked it, you may like it there :) You can also find me tweeting @Neptune_ai or posting on LinkedIn about ML and Data Science stuff.