How to Train DeepLabV3 with MobileNetV2 Using TensorFlow

Written by tensorflow | Published 2025/10/14
Tech Story Tags: tensorflow-model-garden | deeplabv3 | mobilenetv2 | semantic-segmentation | tensorflow-tutorial | computer-vision | deep-learning | tfrecords

TL;DR: This tutorial shows how to fine-tune TensorFlow's DeepLabV3 with a MobileNetV2 backbone using Model Garden. You'll load Oxford-IIIT Pets via TFDS, convert samples to TFRecords, initialize from a pre-trained checkpoint, adjust experiment configs (classes, input size, batch, optimizer), train/evaluate with a mirrored strategy, and export the trained model for serving. It's a practical, reproducible path from dataset prep to SavedModel.

Content Overview

  • Install necessary dependencies
  • Import required libraries
  • Custom dataset preparation for semantic segmentation
  • Configure the DeepLabV3 MobileNet model for the custom dataset
  • Create the Task object (tfm.core.base_task.Task) from the config_definitions.TaskConfig

This tutorial trains a DeepLabV3 model with MobileNetV2 as the backbone, from the TensorFlow Model Garden package (tensorflow-models).

Model Garden contains a collection of state-of-the-art models implemented with TensorFlow's high-level APIs. The implementations demonstrate best practices for modeling, letting users take full advantage of TensorFlow for their research and product development.

Dataset: Oxford-IIIT Pets

  • The Oxford-IIIT pet dataset is a 37-category pet image dataset with roughly 200 images for each class. The images have large variations in scale, pose, and lighting. All images have an associated ground truth annotation of breed.

This tutorial demonstrates how to:

  1. Use models from the TensorFlow Models package.
  2. Train/fine-tune a pre-built DeepLabV3 with MobileNet as the backbone for semantic segmentation.
  3. Export the trained/tuned DeepLabV3 model.

Install necessary dependencies

pip install -U -q "tf-models-official"

Import required libraries

import os
import pprint
import numpy as np
import matplotlib.pyplot as plt

from IPython import display

import tensorflow as tf
import tensorflow_datasets as tfds


import orbit
import tensorflow_models as tfm
from official.vision.data import tfrecord_lib
from official.vision.utils import summary_manager
from official.vision.serving import export_saved_model_lib
from official.vision.utils.object_detection import visualization_utils

pp = pprint.PrettyPrinter(indent=4) # Set Pretty Print Indentation
print(tf.__version__) # Check the version of tensorflow used

%matplotlib inline

2024-02-02 12:12:13.799558: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-02 12:12:13.799625: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-02 12:12:13.801330: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2.15.0

Custom dataset preparation for semantic segmentation

Models in the Official repository (of Model Garden) require datasets in the TFRecord data format.

Please check this resource to learn more about the TFRecord data format.

The oxford_iiit_pet:3.*.* dataset is loaded from TensorFlow Datasets:

(train_ds, val_ds, test_ds), info = tfds.load(
    'oxford_iiit_pet:3.*.*',
    split=['train+test[:50%]', 'test[50%:80%]', 'test[80%:100%]'],
    with_info=True)
info

tfds.core.DatasetInfo(
    name='oxford_iiit_pet',
    full_name='oxford_iiit_pet/3.2.0',
    description="""
    The Oxford-IIIT pet dataset is a 37 category pet image dataset with roughly 200
    images for each class. The images have large variations in scale, pose and
    lighting. All images have an associated ground truth annotation of breed.
    """,
    homepage='http://www.robots.ox.ac.uk/~vgg/data/pets/',
    data_dir='gs://tensorflow-datasets/datasets/oxford_iiit_pet/3.2.0',
    file_format=tfrecord,
    download_size=773.52 MiB,
    dataset_size=774.69 MiB,
    features=FeaturesDict({
        'file_name': Text(shape=(), dtype=string),
        'image': Image(shape=(None, None, 3), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=37),
        'segmentation_mask': Image(shape=(None, None, 1), dtype=uint8),
        'species': ClassLabel(shape=(), dtype=int64, num_classes=2),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=3669, num_shards=4>,
        'train': <SplitInfo num_examples=3680, num_shards=4>,
    },
    citation="""@InProceedings{parkhi12a,
      author       = "Parkhi, O. M. and Vedaldi, A. and Zisserman, A. and Jawahar, C.~V.",
      title        = "Cats and Dogs",
      booktitle    = "IEEE Conference on Computer Vision and Pattern Recognition",
      year         = "2012",
    }""",
)
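
Before converting anything, it can help to eyeball one example. The following optional snippet (not part of the original pipeline, but using only the imports above) plots one training image alongside its trimap mask:

# Optional sanity check: preview one training image and its trimap mask.
# Mask pixel values are 1 (pet), 2 (background), and 3 (boundary).
for example in train_ds.take(1):
  fig, axs = plt.subplots(1, 2, figsize=(8, 4))
  axs[0].imshow(example['image'].numpy())
  axs[0].set_title(example['file_name'].numpy().decode())
  axs[1].imshow(example['segmentation_mask'].numpy()[..., 0])
  axs[1].set_title('segmentation_mask')
  plt.show()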

Helper function to encode the dataset as TFRecords

def process_record(record):
  """Converts one TFDS example into a tf.train.Example for TFRecords."""
  keys_to_features = {
      'image/encoded': tfrecord_lib.convert_to_feature(
          tf.io.encode_jpeg(record['image']).numpy()),
      'image/height': tfrecord_lib.convert_to_feature(record['image'].shape[0]),
      'image/width': tfrecord_lib.convert_to_feature(record['image'].shape[1]),
      # The trimap uses pixel values {1, 2, 3}; subtracting 1 shifts them
      # to {0, 1, 2} so they line up with num_classes = 3 below.
      'image/segmentation/class/encoded': tfrecord_lib.convert_to_feature(
          tf.io.encode_png(record['segmentation_mask'] - 1).numpy()),
  }
  example = tf.train.Example(
      features=tf.train.Features(feature=keys_to_features))
  return example

Write TFRecords to a folder

output_dir = './oxford_iiit_pet_tfrecords/'
LOG_EVERY = 100
if not os.path.exists(output_dir):
  os.mkdir(output_dir)

def write_tfrecords(dataset, output_path, num_shards=1):
  """Serializes a dataset into `num_shards` TFRecord files."""
  writers = [
      tf.io.TFRecordWriter(
          output_path + '-%05d-of-%05d.tfrecord' % (i, num_shards))
      for i in range(num_shards)
  ]
  for idx, record in enumerate(dataset):
    if idx % LOG_EVERY == 0:
      print('On image %d' % idx)
    tf_example = process_record(record)
    writers[idx % num_shards].write(tf_example.SerializeToString())

Write training data as TFRecords

output_train_tfrecs = output_dir + 'train'
write_tfrecords(train_ds, output_train_tfrecs, num_shards=10)

On image 0
On image 100
On image 200
On image 300
On image 400
On image 500
On image 600
On image 700
On image 800
Corrupt JPEG data: 240 extraneous bytes before marker 0xd9
Corrupt JPEG data: premature end of data segment
On image 900
On image 1000
On image 1100
On image 1200
On image 1300
On image 1400
On image 1500
On image 1600
On image 1700
On image 1800
On image 1900
On image 2000
On image 2100
On image 2200
On image 2300
On image 2400
On image 2500
On image 2600
On image 2700
On image 2800
On image 2900
On image 3000
On image 3100
On image 3200
On image 3300
On image 3400
On image 3500
On image 3600
On image 3700
On image 3800
On image 3900
On image 4000
On image 4100
On image 4200
On image 4300
On image 4400
On image 4500
On image 4600
On image 4700
On image 4800
On image 4900
On image 5000
On image 5100
On image 5200
On image 5300
On image 5400
On image 5500

Write validation data as TFRecords

output_val_tfrecs = output_dir + 'val'
write_tfrecords(val_ds, output_val_tfrecs, num_shards=5)

On image 0
On image 100
On image 200
On image 300
On image 400
On image 500
On image 600
On image 700
On image 800
On image 900
On image 1000
On image 1100

Write test data as TFRecords

output_test_tfrecs = output_dir + 'test'
write_tfrecords(test_ds, output_test_tfrecs, num_shards=5)

On image 0
On image 100
On image 200
On image 300
On image 400
On image 500
On image 600
On image 700
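
As a quick sanity check (an optional step, not part of the original tutorial), you can parse one record back out of the shards and confirm that the image and mask decode cleanly, with mask values in {0, 1, 2}:

# Read one serialized example back and decode its image and mask.
raw_ds = tf.data.TFRecordDataset(tf.io.gfile.glob(output_dir + 'train*'))
feature_spec = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/height': tf.io.FixedLenFeature([], tf.int64),
    'image/width': tf.io.FixedLenFeature([], tf.int64),
    'image/segmentation/class/encoded': tf.io.FixedLenFeature([], tf.string),
}
for raw in raw_ds.take(1):
  parsed = tf.io.parse_single_example(raw, feature_spec)
  image = tf.io.decode_jpeg(parsed['image/encoded'])
  mask = tf.io.decode_png(parsed['image/segmentation/class/encoded'])
  print(image.shape, mask.shape, np.unique(mask.numpy()))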

Configure the DeepLabV3 MobileNet model for the custom dataset

train_data_tfrecords = './oxford_iiit_pet_tfrecords/train*'
val_data_tfrecords = './oxford_iiit_pet_tfrecords/val*'
test_data_tfrecords = './oxford_iiit_pet_tfrecords/test*'
trained_model = './trained_model/'
export_dir = './exported_model/'

In Model Garden, the collections of parameters that define a model are called configs. Model Garden can create a config based on a known set of parameters via a factory.

Use the mnv2_deeplabv3_pascal experiment configuration, as defined by tfm.vision.configs.semantic_segmentation.mnv2_deeplabv3_pascal.

Please find all the registered experiments here.

The configuration defines an experiment to train a DeepLabV3 model with a MobileNetV2 backbone and an ASPP decoder.

Other alternative experiments are also available, such as

  • seg_deeplabv3_pascal
  • seg_deeplabv3plus_pascal
  • seg_resnetfpn_pascal
  • mnv2_deeplabv3plus_cityscapes

and more. One can switch to them by changing the experiment name argument to the get_exp_config function.

exp_config = tfm.core.exp_factory.get_exp_config('mnv2_deeplabv3_pascal')

model_ckpt_path = './model_ckpt/'
if not os.path.exists(model_ckpt_path):
  os.mkdir(model_ckpt_path)

!gsutil cp gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63.data-00000-of-00001 './model_ckpt/'
!gsutil cp gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63.index './model_ckpt/'

Copying gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63.data-00000-of-00001...

Operation completed over 1 objects/28.2 MiB.                                     
Copying gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63.index...

Operation completed over 1 objects/12.5 KiB.
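
Optionally, you can confirm the checkpoint downloaded intact by listing a few of its variables (a quick check, not part of the original flow):

# List (name, shape) pairs stored in the downloaded checkpoint.
ckpt_vars = tf.train.list_variables(model_ckpt_path + 'best_ckpt-63')
print('Total variables:', len(ckpt_vars))
pp.pprint(ckpt_vars[:5])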

Adjust the model and dataset configurations so that they work with the custom dataset. The pet trimap masks contain three pixel classes (pet, background, and boundary), which is why num_classes is set to 3.

num_classes = 3
WIDTH, HEIGHT = 128, 128
input_size = [HEIGHT, WIDTH, 3]
BATCH_SIZE = 16

# Backbone Config
exp_config.task.init_checkpoint = model_ckpt_path + 'best_ckpt-63'
exp_config.task.freeze_backbone = True

# Model Config
exp_config.task.model.num_classes = num_classes
exp_config.task.model.input_size = input_size

# Training Data Config
exp_config.task.train_data.aug_scale_min = 1.0
exp_config.task.train_data.aug_scale_max = 1.0
exp_config.task.train_data.input_path = train_data_tfrecords
exp_config.task.train_data.global_batch_size = BATCH_SIZE
exp_config.task.train_data.dtype = 'float32'
exp_config.task.train_data.output_size = [HEIGHT, WIDTH]
exp_config.task.train_data.preserve_aspect_ratio = False
exp_config.task.train_data.seed = 21 # Reproducible training data

# Validation Data Config
exp_config.task.validation_data.input_path = val_data_tfrecords
exp_config.task.validation_data.global_batch_size = BATCH_SIZE
exp_config.task.validation_data.dtype = 'float32'
exp_config.task.validation_data.output_size = [HEIGHT, WIDTH]
exp_config.task.validation_data.preserve_aspect_ratio = False
exp_config.task.validation_data.groundtruth_padded_size = [HEIGHT, WIDTH]
exp_config.task.validation_data.seed = 21 # Reproducible validation data
exp_config.task.validation_data.resize_eval_groundtruth = True # To enable validation loss

Adjust the trainer configuration.

logical_device_names = [logical_device.name
                        for logical_device in tf.config.list_logical_devices()]

if 'GPU' in ''.join(logical_device_names):
  print('This may be broken in Colab.')
  device = 'GPU'
elif 'TPU' in ''.join(logical_device_names):
  print('This may be broken in Colab.')
  device = 'TPU'
else:
  print('Running on CPU is slow, so only train for a few steps.')
  device = 'CPU'


train_steps = 2000
exp_config.trainer.steps_per_loop = int(train_ds.cardinality().numpy()) // BATCH_SIZE

exp_config.trainer.summary_interval = exp_config.trainer.steps_per_loop # write summaries once per pass over the training data
exp_config.trainer.checkpoint_interval = exp_config.trainer.steps_per_loop
exp_config.trainer.validation_interval = exp_config.trainer.steps_per_loop
exp_config.trainer.validation_steps = int(train_ds.cardinality().numpy()) // BATCH_SIZE # mirrors steps_per_loop; num_val_examples // BATCH_SIZE would give exactly one validation pass
exp_config.trainer.train_steps = train_steps
exp_config.trainer.optimizer_config.warmup.linear.warmup_steps = exp_config.trainer.steps_per_loop
exp_config.trainer.optimizer_config.learning_rate.type = 'cosine'
exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = train_steps
exp_config.trainer.optimizer_config.learning_rate.cosine.initial_learning_rate = 0.1
exp_config.trainer.optimizer_config.warmup.linear.warmup_learning_rate = 0.05

This may be broken in Colab.
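
To see what these settings imply, here is a rough sketch of the resulting schedule (an approximation built from stock Keras classes, not Model Garden's internal warmup wrapper): the learning rate warms up linearly from 0.05 over the first steps_per_loop steps, then follows a cosine decay from 0.1 down to 0 at step 2000.

# Approximate plot of the configured schedule: linear warmup followed by
# cosine decay. Model Garden composes these internally; this is a sketch.
cosine = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.1, decay_steps=train_steps)
warmup_steps = exp_config.trainer.steps_per_loop
steps = np.arange(train_steps)
lr = [0.05 + (float(cosine(warmup_steps)) - 0.05) * s / warmup_steps
      if s < warmup_steps else float(cosine(s)) for s in steps]
plt.plot(steps, lr)
plt.xlabel('train step')
plt.ylabel('learning rate')
plt.show()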

Print the modified configuration.

pp.pprint(exp_config.as_dict())
display.Javascript('google.colab.output.setIframeHeight("500px");')

{   'runtime': {   'all_reduce_alg': None,
                   'batchnorm_spatial_persistent': False,
                   'dataset_num_private_threads': None,
                   'default_shard_dim': -1,
                   'distribution_strategy': 'mirrored',
                   'enable_xla': False,
                   'gpu_thread_mode': None,
                   'loss_scale': None,
                   'mixed_precision_dtype': None,
                   'num_cores_per_replica': 1,
                   'num_gpus': 0,
                   'num_packs': 1,
                   'per_gpu_thread_count': 0,
                   'run_eagerly': False,
                   'task_index': -1,
                   'tpu': None,
                   'tpu_enable_xla_dynamic_padder': None,
                   'use_tpu_mp_strategy': False,
                   'worker_hosts': None},
    'task': {   'allow_image_summary': True,
                'differential_privacy_config': None,
                'eval_input_partition_dims': [],
                'evaluation': {   'report_per_class_iou': True,
                                  'report_train_mean_iou': True},
                'export_config': {'rescale_output': False},
                'freeze_backbone': True,
                'init_checkpoint': './model_ckpt/best_ckpt-63',
                'init_checkpoint_modules': ['backbone', 'decoder'],
                'losses': {   'class_weights': [],
                              'gt_is_matting_map': False,
                              'ignore_label': 255,
                              'l2_weight_decay': 4e-05,
                              'label_smoothing': 0.0,
                              'loss_weight': 1.0,
                              'mask_scoring_weight': 1.0,
                              'top_k_percent_pixels': 1.0,
                              'use_binary_cross_entropy': False,
                              'use_groundtruth_dimension': True},
                'model': {   'backbone': {   'mobilenet': {   'filter_size_scale': 1.0,
                                                              'model_id': 'MobileNetV2',
                                                              'output_intermediate_endpoints': False,
                                                              'output_stride': 16,
                                                              'stochastic_depth_drop_rate': 0.0},
                                             'type': 'mobilenet'},
                             'decoder': {   'aspp': {   'dilation_rates': [],
                                                        'dropout_rate': 0.0,
                                                        'level': 4,
                                                        'num_filters': 256,
                                                        'output_tensor': False,
                                                        'pool_kernel_size': [],
                                                        'spp_layer_version': 'v1',
                                                        'use_depthwise_convolution': False},
                                            'type': 'aspp'},
                             'head': {   'decoder_max_level': None,
                                         'decoder_min_level': None,
                                         'feature_fusion': None,
                                         'level': 4,
                                         'logit_activation': None,
                                         'low_level': 2,
                                         'low_level_num_filters': 48,
                                         'num_convs': 0,
                                         'num_filters': 256,
                                         'prediction_kernel_size': 1,
                                         'upsample_factor': 1,
                                         'use_depthwise_convolution': False},
                             'input_size': [128, 128, 3],
                             'mask_scoring_head': None,
                             'max_level': 6,
                             'min_level': 3,
                             'norm_activation': {   'activation': 'relu',
                                                    'norm_epsilon': 0.001,
                                                    'norm_momentum': 0.99,
                                                    'use_sync_bn': True},
                             'num_classes': 3},
                'name': None,
                'train_data': {   'additional_dense_features': [],
                                  'apply_tf_data_service_before_batching': False,
                                  'aug_policy': None,
                                  'aug_rand_hflip': True,
                                  'aug_scale_max': 1.0,
                                  'aug_scale_min': 1.0,
                                  'autotune_algorithm': None,
                                  'block_length': 1,
                                  'cache': False,
                                  'crop_size': [],
                                  'cycle_length': 10,
                                  'decoder': {   'simple_decoder': {   'attribute_names': [   ],
                                                                       'mask_binarize_threshold': None,
                                                                       'regenerate_source_id': False},
                                                 'type': 'simple_decoder'},
                                  'deterministic': None,
                                  'drop_remainder': True,
                                  'dtype': 'float32',
                                  'enable_shared_tf_data_service_between_parallel_trainers': False,
                                  'enable_tf_data_service': False,
                                  'file_type': 'tfrecord',
                                  'global_batch_size': 16,
                                  'groundtruth_padded_size': [],
                                  'image_feature': {   'feature_name': 'image/encoded',
                                                       'mean': (   123.675,
                                                                   116.28,
                                                                   103.53),
                                                       'num_channels': 3,
                                                       'stddev': (   58.395,
                                                                     57.120000000000005,
                                                                     57.375)},
                                  'input_path': './oxford_iiit_pet_tfrecords/train*',
                                  'is_training': True,
                                  'output_size': [128, 128],
                                  'prefetch_buffer_size': None,
                                  'preserve_aspect_ratio': False,
                                  'resize_eval_groundtruth': True,
                                  'seed': 21,
                                  'sharding': True,
                                  'shuffle_buffer_size': 1000,
                                  'tf_data_service_address': None,
                                  'tf_data_service_job_name': None,
                                  'tfds_as_supervised': False,
                                  'tfds_data_dir': '',
                                  'tfds_name': '',
                                  'tfds_skip_decoding_feature': '',
                                  'tfds_split': '',
                                  'trainer_id': None,
                                  'weights': None},
                'train_input_partition_dims': [],
                'validation_data': {   'additional_dense_features': [],
                                       'apply_tf_data_service_before_batching': False,
                                       'aug_policy': None,
                                       'aug_rand_hflip': True,
                                       'aug_scale_max': 1.0,
                                       'aug_scale_min': 1.0,
                                       'autotune_algorithm': None,
                                       'block_length': 1,
                                       'cache': False,
                                       'crop_size': [],
                                       'cycle_length': 10,
                                       'decoder': {   'simple_decoder': {   'attribute_names': [   ],
                                                                            'mask_binarize_threshold': None,
                                                                            'regenerate_source_id': False},
                                                      'type': 'simple_decoder'},
                                       'deterministic': None,
                                       'drop_remainder': False,
                                       'dtype': 'float32',
                                       'enable_shared_tf_data_service_between_parallel_trainers': False,
                                       'enable_tf_data_service': False,
                                       'file_type': 'tfrecord',
                                       'global_batch_size': 16,
                                       'groundtruth_padded_size': [128, 128],
                                       'image_feature': {   'feature_name': 'image/encoded',
                                                            'mean': (   123.675,
                                                                        116.28,
                                                                        103.53),
                                                            'num_channels': 3,
                                                            'stddev': (   58.395,
                                                                          57.120000000000005,
                                                                          57.375)},
                                       'input_path': './oxford_iiit_pet_tfrecords/val*',
                                       'is_training': False,
                                       'output_size': [128, 128],
                                       'prefetch_buffer_size': None,
                                       'preserve_aspect_ratio': False,
                                       'resize_eval_groundtruth': True,
                                       'seed': 21,
                                       'sharding': True,
                                       'shuffle_buffer_size': 1000,
                                       'tf_data_service_address': None,
                                       'tf_data_service_job_name': None,
                                       'tfds_as_supervised': False,
                                       'tfds_data_dir': '',
                                       'tfds_name': '',
                                       'tfds_skip_decoding_feature': '',
                                       'tfds_split': '',
                                       'trainer_id': None,
                                       'weights': None} },
    'trainer': {   'allow_tpu_summary': False,
                   'best_checkpoint_eval_metric': 'mean_iou',
                   'best_checkpoint_export_subdir': 'best_ckpt',
                   'best_checkpoint_metric_comp': 'higher',
                   'checkpoint_interval': 344,
                   'continuous_eval_timeout': 3600,
                   'eval_tf_function': True,
                   'eval_tf_while_loop': False,
                   'loss_upper_bound': 1000000.0,
                   'max_to_keep': 5,
                   'optimizer_config': {   'ema': None,
                                           'learning_rate': {   'cosine': {   'alpha': 0.0,
                                                                              'decay_steps': 2000,
                                                                              'initial_learning_rate': 0.1,
                                                                              'name': 'CosineDecay',
                                                                              'offset': 0},
                                                                'type': 'cosine'},
                                           'optimizer': {   'sgd': {   'clipnorm': None,
                                                                       'clipvalue': None,
                                                                       'decay': 0.0,
                                                                       'global_clipnorm': None,
                                                                       'momentum': 0.9,
                                                                       'name': 'SGD',
                                                                       'nesterov': False},
                                                            'type': 'sgd'},
                                           'warmup': {   'linear': {   'name': 'linear',
                                                                       'warmup_learning_rate': 0.05,
                                                                       'warmup_steps': 344},
                                                         'type': 'linear'} },
                   'preemption_on_demand_checkpoint': True,
                   'recovery_begin_steps': 0,
                   'recovery_max_trials': 0,
                   'steps_per_loop': 344,
                   'summary_interval': 344,
                   'train_steps': 2000,
                   'train_tf_function': True,
                   'train_tf_while_loop': True,
                   'validation_interval': 344,
                   'validation_steps': 344,
                   'validation_summary_subdir': 'validation'} }
<IPython.core.display.Javascript object>

Set up the distribution strategy.

# Setting up the Strategy
if exp_config.runtime.mixed_precision_dtype == tf.float16:
  tf.keras.mixed_precision.set_global_policy('mixed_float16')

if 'GPU' in ''.join(logical_device_names):
  distribution_strategy = tf.distribute.MirroredStrategy()
elif 'TPU' in ''.join(logical_device_names):
  tf.tpu.experimental.initialize_tpu_system()
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='/device:TPU_SYSTEM:0')
  distribution_strategy = tf.distribute.experimental.TPUStrategy(tpu)
else:
  print('Warning: this will be really slow.')
  distribution_strategy = tf.distribute.OneDeviceStrategy(logical_device_names[0])

print("Done")

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
Done

Create the Task object (tfm.core.base_task.Task) from the config_definitions.TaskConfig.

The Task object has all the methods necessary for building the dataset, building the model, and running training & evaluation. These methods are driven by tfm.core.train_lib.run_experiment.

model_dir = './trained_model/'

with distribution_strategy.scope():
  task = tfm.core.task_factory.get_task(exp_config.task, logging_dir=model_dir)
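
From here, the usual next steps are to launch training with tfm.core.train_lib.run_experiment and export a SavedModel with the export_saved_model_lib helper imported earlier. The sketch below uses arguments matching this config; consult the Model Garden docs for the exact signatures in your version.

# Train and evaluate under the distribution strategy, then export the
# trained checkpoint as a SavedModel for serving.
model, eval_logs = tfm.core.train_lib.run_experiment(
    distribution_strategy=distribution_strategy,
    task=task,
    mode='train_and_eval',
    params=exp_config,
    model_dir=model_dir,
    run_post_eval=True)

export_saved_model_lib.export_inference_graph(
    input_type='image_tensor',
    batch_size=1,
    input_image_size=[HEIGHT, WIDTH],
    params=exp_config,
    checkpoint_path=tf.train.latest_checkpoint(model_dir),
    export_dir=export_dir)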

Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.


Written by tensorflow | TensorFlow is an open-source machine learning framework developed by Google for numerical computation and building machine learning models.
Published by HackerNoon on 2025/10/14