Using PyTorch, FastAI and the CIFAR-10 image dataset

In this article, we’ll try to replicate the approach the FastAI team used to win the Stanford DAWNBench competition, where they trained a model that reaches 94% accuracy on the CIFAR-10 dataset in under 3 minutes.

NOTE: Some basic familiarity with PyTorch and the FastAI library is assumed here. If you want to follow along, see these instructions for a quick setup.

Dataset

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images (5,000 per class) and 10,000 test images. Here are 10 random images from each class:

You can download the data here or by running the following commands:

```bash
cd data
wget http://files.fast.ai/data/cifar10.tgz
tar -xf cifar10.tgz
```

Once the data is downloaded, start the Jupyter notebook server using the command `jupyter notebook` and create a new notebook called `cifar10-fast.ipynb` inside `fastai/courses/dl1`.

Let’s define a helper function to create data loaders with data augmentation:

```python
import torchvision.transforms as tt
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from fastai.dataset import ModelData

def get_data(bs, num_workers):
    # Expects one sub-folder per class inside train/ and test/ (ImageFolder layout)
    PATH = "data/cifar10/"
    trn_dir, val_dir = PATH + 'train', PATH + 'test'

    # Data transforms (normalization & data augmentation)
    stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    tfms = [tt.ToTensor(), tt.Normalize(*stats)]
    aug_tfms = tt.Compose([tt.RandomCrop(32, padding=4),
                           tt.RandomHorizontalFlip()] + tfms)

    # PyTorch datasets
    trn_ds = ImageFolder(trn_dir, aug_tfms)
    val_ds = ImageFolder(val_dir, tt.Compose(tfms))
    aug_ds = ImageFolder(val_dir, aug_tfms)

    # PyTorch data loaders
    trn_dl = DataLoader(trn_ds, batch_size=bs, shuffle=True,
                        num_workers=num_workers, pin_memory=True)
    val_dl = DataLoader(val_ds, batch_size=bs, shuffle=False,
                        num_workers=num_workers, pin_memory=True)
    aug_dl = DataLoader(aug_ds, batch_size=bs, shuffle=False,
                        num_workers=num_workers, pin_memory=True)

    # FastAI model data
    data = ModelData(PATH, trn_dl, val_dl)
    data.aug_dl = aug_dl
    data.sz = 32
    return data
```

A few things to note about `get_data`:

- We’re using `data/cifar10/test` as the validation dataset, to keep things simple. Typically, you should use a subset of the training data for validation.
- We’re using multiple workers to take advantage of multi-core CPUs. This helps load the images and apply transformations faster.
- The variable `stats` contains the channel-wise means and standard deviations for the entire dataset, and is used to normalize the data.
- The `aug_dl` data loader applies data augmentation to the validation dataset. It is used for test time augmentation (TTA).

Network Architecture

We’ll use a model called WideResNet-22, inspired by the family of architectures introduced in the paper Wide Residual Networks. It has the following architecture:

A few notable aspects of the architecture:

- It’s quite similar to popular ResNet architectures, except that the intermediate layers have a lot more channels (96, 192 & 384).
- It has 22 convolutional layers, indicated in the diagram as Conv(size, input_channels, output_channels, stride=1).
- There are 9 residual blocks with shortcut connections, organized into 3 groups.
- The first block of each group increases the number of channels to 96, 192 and 384 respectively. The first blocks of groups 2 & 3 also downsample the feature map from 32x32 to 16x16 and 8x8 respectively, using convolutional layers with stride 2 (highlighted in orange).
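The following pure-Python sketch makes these numbers concrete. It traces the channel count and feature-map size through the three groups and tallies the convolutional layers. `conv_out` and `wrn_summary` are hypothetical helpers and are not part of the article’s code. The sketch also makes three assumptions:

- All 3x3 convolutions use a padding of 1.
- A 1x1 shortcut convolution appears only where a block changes the channel count.
- The stem convolution and those shortcuts are both included in the total of 22.

```python
def conv_out(size, ks=3, stride=1):
    """Spatial output size of a conv layer whose padding is ks // 2 (assumed)."""
    pad = ks // 2
    return (size + 2 * pad - ks) // stride + 1

def wrn_summary(n_groups=3, N=3, k=6, n_start=16, size=32):
    """Trace channels and feature-map size per group, and count conv layers."""
    convs = 1                                # stem conv: 3 -> n_start channels
    ni, rows = n_start, []
    for i in range(n_groups):
        nf = n_start * (2 ** i) * k          # 96, 192, 384 for k=6
        stride = 2 if i > 0 else 1           # groups 2 and 3 downsample
        size = conv_out(size, 3, stride)     # the first block's 3x3 conv sets the size
        convs += 2 * N                       # two 3x3 convs in every block
        if ni != nf:
            convs += 1                       # assumed 1x1 shortcut conv in the first block
        rows.append((i + 1, nf, size))
        ni = nf
    return rows, convs

rows, convs = wrn_summary()
for group, channels, size in rows:
    print(f"group {group}: {channels} channels, {size}x{size} feature map")
print("conv layers:", convs)
```

This prints 96, 192 and 384 channels at 32x32, 16x16 and 8x8 respectively, and a total of 22 conv layers. A 1x1 shortcut with the same stride also gives `conv_out(32, 1, 2) == 16`, so the shortcut and the residual branch have matching shapes and can be added together.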
Let’s first implement a generic module class for creating the residual blocks:

```python
import torch.nn as nn
import torch.nn.functional as F

def conv_2d(ni, nf, stride=1, ks=3):
    # padding=ks//2 preserves the spatial size when stride=1; no bias term
    return nn.Conv2d(in_channels=ni, out_channels=nf,
                     kernel_size=ks, stride=stride,
                     padding=ks//2, bias=False)

def bn_relu_conv(ni, nf):
    return nn.Sequential(nn.BatchNorm2d(ni),
                         nn.ReLU(inplace=True),
                         conv_2d(ni, nf))

class BasicBlock(nn.Module):
    def __init__(self, ni, nf, stride=1):
        super().__init__()
        self.bn = nn.BatchNorm2d(ni)
        self.conv1 = conv_2d(ni, nf, stride)
        self.conv2 = bn_relu_conv(nf, nf)
        # Identity shortcut, replaced by a 1x1 conv when the channel count changes
        self.shortcut = lambda x: x
        if ni != nf:
            self.shortcut = conv_2d(ni, nf, stride, 1)

    def forward(self, x):
        x = F.relu(self.bn(x), inplace=True)
        r = self.shortcut(x)
        x = self.conv1(x)
        # Scale the residual branch by 0.2 before adding the shortcut
        x = self.conv2(x) * 0.2
        return x.add_(r)
```

Next, let’s define a generic class `WideResNet`. It lets us create a network with `n_groups` groups, `N` blocks per group and a factor `k` that adjusts the width of the network, i.e. the number of channels. It also adds the pooling and linear layers at the end.

```python
def make_group(N, ni, nf, stride):
    # Only the first block changes the channel count and (optionally) downsamples
    start = BasicBlock(ni, nf, stride)
    rest = [BasicBlock(nf, nf) for j in range(1, N)]
    return [start] + rest

class Flatten(nn.Module):
    def __init__(self): super().__init__()
    def forward(self, x): return x.view(x.size(0), -1)

class WideResNet(nn.Module):
    def __init__(self, n_groups, N, n_classes, k=1, n_start=16):
        super().__init__()
        # Increase channels to n_start using conv layer
        layers = [conv_2d(3, n_start)]
        n_channels = [n_start]

        # Add groups of BasicBlock (increase channels & downsample)
        for i in range(n_groups):
            n_channels.append(n_start*(2**i)*k)
            stride = 2 if i>0 else 1
            layers += make_group(N, n_channels[i],
                                 n_channels[i+1], stride)

        # Pool, flatten & add linear layer for classification
        # (n_channels[-1] is the width of the last group, for any n_groups)
        layers += [nn.BatchNorm2d(n_channels[-1]),
                   nn.ReLU(inplace=True),
                   nn.AdaptiveAvgPool2d(1),
                   Flatten(),
                   nn.Linear(n_channels[-1], n_classes)]

        self.features = nn.Sequential(*layers)

    def forward(self, x): return self.features(x)

def wrn_22():
    return WideResNet(n_groups=3, N=3, n_classes=10, k=6)
```

Finally, the `wrn_22` helper creates WideResNet-22, which has 3 groups, 3 residual blocks per group and k=6. It’s always a good idea to define flexible and generic models, so that you can easily experiment with deeper or wider networks.
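Because `k` multiplies every channel count, it also has a large effect on model size. The pure-Python sketch below counts the trainable parameters that `WideResNet` would create. The `wrn_param_count` function is hypothetical and is not part of the article’s code. It assumes the layer layout of `WideResNet`:

- Convolutions from `conv_2d` have no bias.
- Each `BatchNorm2d` has a learnable weight and bias per channel.
- The final `nn.Linear` layer has a bias.

```python
def wrn_param_count(n_groups=3, N=3, n_classes=10, k=6, n_start=16):
    """Count trainable parameters of the WideResNet layout (hypothetical helper)."""
    def conv(ni, nf, ks=3):           # conv_2d uses bias=False
        return ni * nf * ks * ks
    def bn(c):                        # BatchNorm2d: weight + bias per channel
        return 2 * c

    total = conv(3, n_start)          # stem conv
    ni = n_start
    for i in range(n_groups):
        nf = n_start * (2 ** i) * k
        for j in range(N):
            cin = ni if j == 0 else nf
            total += bn(cin) + conv(cin, nf)      # block BN + conv1
            total += bn(nf) + conv(nf, nf)        # bn_relu_conv (conv2)
            if cin != nf:
                total += conv(cin, nf, ks=1)      # 1x1 shortcut conv
        ni = nf
    total += bn(ni) + ni * n_classes + n_classes  # final BN + Linear (with bias)
    return total

print(f"WideResNet-22 (k=6): {wrn_param_count():,} parameters")
print(f"Same depth, k=1:     {wrn_param_count(k=1):,} parameters")
```

For `wrn_22()` (3 groups, 3 blocks per group, k=6), this gives 9,658,458 parameters, roughly 9.7 million. The third group accounts for most of them, because convolution weights grow with the product of input and output channels.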
Training and Evaluation

Let’s define a couple of helper functions for instantiating the model and evaluating the results:

```python
from fastai.conv_learner import ConvLearner, num_cpus, accuracy
from fastai.metrics import accuracy_np

def get_learner(arch, bs):
    """Create a FastAI learner using the given model"""
    data = get_data(bs, num_cpus())
    learn = ConvLearner.from_model_data(arch.cuda(), data)
    learn.crit = nn.CrossEntropyLoss()
    learn.metrics = [accuracy]
    return learn

def get_TTA_accuracy(learn):
    """Calculate accuracy with Test Time Augmentation (TTA)"""
    preds, targs = learn.TTA()
    # preds[0]: predictions on the original images; preds[1:]: on augmented copies
    preds = 0.6 * preds[0] + 0.4 * preds[1:].sum(0)
    return accuracy_np(preds, targs)
```

Finally, let’s train the model using the 1cycle policy. It gradually increases the learning rate and decreases the momentum until about halfway through the cycle, and then does the opposite. Here’s what it looks like:

On a 6-core Intel i5 CPU and an NVIDIA GTX 1080 Ti, the training takes about 15 minutes. You might see slightly different results depending on your hardware. Here’s a plot of the loss, learning rate and momentum over time:

And that’s it! Feel free to play around with the network architecture, learning rate, cycle length and other factors to try to get a better result in a shorter time. You can find the entire code for this post in this GitHub gist.
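The 1cycle policy described above is easy to sketch in plain Python. The `one_cycle` function below is a hypothetical, simplified illustration with linear ramps. It is not FastAI’s implementation. The default `div`, `mom_max` and `mom_min` values are illustrative assumptions, not necessarily the settings used for the run in this article.

```python
def one_cycle(n_iters, lr_max, div=10.0, mom_max=0.95, mom_min=0.85):
    """Simplified 1cycle schedule with linear ramps (illustrative only).

    First half: lr rises from lr_max/div to lr_max while momentum falls
    from mom_max to mom_min. Second half: both ramps run in reverse.
    """
    lr_min = lr_max / div
    half = n_iters // 2
    schedule = []
    for it in range(n_iters):
        if it < half:
            t = it / max(half - 1, 1)                          # 0 -> 1
        else:
            t = 1 - (it - half) / max(n_iters - half - 1, 1)   # 1 -> 0
        lr = lr_min + t * (lr_max - lr_min)
        mom = mom_max - t * (mom_max - mom_min)
        schedule.append((lr, mom))
    return schedule

sched = one_cycle(n_iters=10, lr_max=1.0)
for it, (lr, mom) in enumerate(sched):
    print(f"iter {it}: lr={lr:.3f}, momentum={mom:.3f}")
```

The printed schedule shows the learning rate peaking, and the momentum bottoming out, in the middle of training. In a real PyTorch training loop, you would write each pair into the optimizer’s `param_groups` (`lr` and `momentum` for SGD) before every step.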