What Data Scientists Should Know About Multi-output and Multi-label Training

by Sharmistha ChatterjeeJanuary 18th, 2021


Multi-output Machine Learning — MixedRandomForest

Introduction

Multi-output learning subsumes many learning problems across multiple disciplines and deals with complex decision-making in many real-world applications. It is multivariate in nature, and the multiple outputs may have complex interactions. Many structured inference problems have been addressed with multi-output machine learning. The output values have diverse data types, depending on the type of ML problem.

For example:

  • Binary (0/1) output values correspond to a multi-label classification problem
  • Nominal output values correspond to a multi-dimensional classification problem
  • Ordinal output values correspond to a label ranking problem
  • Real-valued outputs correspond to a multi-target regression problem

Special Use Cases of Multi-output Learning

Multi-class Classification: Multi-class classification can be treated as a traditional single-output learning paradigm when each output class is represented by an integer encoding. It can also be extended to a multi-output learning scenario if each output class is represented by a one-hot vector.
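
As a small illustration of the two encodings (synthetic labels, plain NumPy):

import numpy as np

# Integer encoding: single-output view, one scalar class index per instance
y_int = np.array([0, 2, 1, 2])

# One-hot encoding: multi-output view, one binary vector per instance
n_classes = 3
y_onehot = np.eye(n_classes)[y_int]
print(y_onehot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]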

Fine-grained Classification: In this type of classification, although the vector representation of fine-grained classification outputs is the same as that of multi-class classification outputs, the internal structures of the vectors are different. Labels under the same parent in the label hierarchy tend to have a closer relationship than labels under different parents.

Multi-task Learning: Multi-task learning aims at learning multiple related tasks simultaneously, where each task outputs a single label, so learning multiple tasks is similar to learning multiple outputs. It leverages the relatedness between tasks to improve the performance of learning models. The major difference between multi-task learning and multi-output learning is that in multi-task learning, different tasks might be trained on different training sets or features, whereas in multi-output learning the output variables usually share the same training data or features.

In multi-output pattern recognition problems, each instance in the dataset has two or more output values (nominal or real-valued), i.e., the output value is a vector rather than a scalar. Such problems are solved by one of the following methods:

  • Problem transformation: transforming the multi-label (or multi-output) problem into multiple single-output problems (see the sketch after the pros and cons below).
  • Algorithm adaptation: adapting a pattern recognition algorithm so that it directly handles multi-output data.

However, there are certain pros and cons to the above-mentioned approaches:

The first (problem transformation) approach, training one inductive classifier or regression model per output, can be time-consuming, particularly when training datasets are very large. When multiple models need to be trained on the same input data but with different output data, the total training time grows quickly, making this approach unsuitable for large datasets. This also inflates the processing requirements.
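
As a point of reference, the transformation approach is what scikit-learn's MultiOutputClassifier implements: it fits one independent estimator per output column. A minimal sketch on synthetic data (the data and parameter values here are illustrative):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Illustrative data: 100 instances, 4 features, 2 binary output columns
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
Y = np.column_stack([
    (X[:, 0] > 0.5).astype(int),           # output 1
    (X[:, 1] + X[:, 2] > 1.0).astype(int)  # output 2
])

# Problem transformation: one independent forest is fitted per output
# column, so training cost grows with the number of outputs
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=20))
clf.fit(X, Y)
print(clf.predict(X[:3]))  # one row per instance, one column per output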

The second (algorithm adaptation) approach makes it possible to create a model that simultaneously predicts a set of two or more classification labels, regression values, or even joint classification-regression outputs from a single training iteration. If the prediction tasks are related (i.e., there is correlation or covariance between output values), training a coherent multi-output model can bring increased predictive performance compared to training multiple disjoint models.

In this blog, we discuss a mixed/multi-target Random Forest model that supports:

"multi-output problems with multiple classification outputs, multiple regression outputs, as well as arbitrary joint classification-regression outputs”.

Further, the algorithm supports mixed multi-task learning, i.e., the model can be trained on any number of classification tasks and regression tasks simultaneously. The Random Forest predictor lets each individual ensemble member vote for the most probable output according to its learned decision rule. The ensemble members' votes are then tallied and aggregated, using the mode for classification outputs and the mean for regression outputs, to yield a common ensemble output.
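
As an illustration of this aggregation rule (a sketch, not morfist's actual internals), the following tallies per-tree votes, taking the mode for outputs flagged as categorical and the mean for the rest:

import numpy as np
from collections import Counter

def aggregate_votes(member_preds, class_targets):
    # member_preds: (n_trees, n_outputs) array of per-tree predictions
    member_preds = np.asarray(member_preds, dtype=float)
    combined = np.empty(member_preds.shape[1])
    for j in range(member_preds.shape[1]):
        votes = member_preds[:, j]
        if j in class_targets:
            # classification output: majority vote (mode)
            combined[j] = Counter(votes).most_common(1)[0][0]
        else:
            # regression output: average (mean)
            combined[j] = votes.mean()
    return combined

# Three trees, two outputs: column 0 categorical, column 1 numeric
print(aggregate_votes([[1, 2.0], [1, 2.4], [0, 2.6]], class_targets=[0]))
# [1.         2.33333333]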

Multi-Output vs Multi-Label Classification

In multi-output classification, the goal is to learn a classification rule whose output is a set, or vector, of labels, i.e., y1 ∈ Y1, y2 ∈ Y2, …, yn ∈ Yn, where the output vector is v = (y1, y2, …, yn).

Multi-label classification involves classifying instances into several labels that share semantics. Consider the problem of classifying songs according to their genre: it is possible to classify a song as either pop or rock, but also possible to classify it as both (i.e., "pop rock"). Here, pop and rock share semantics: they both relate to the song's genre, and are thus two different values of the same label. There is also no a priori knowledge regarding the size of the output: it is entirely possible that a song cannot be classified as any previously known genre, or that it is best classified as several different genres.

The problem of multi-output classification is effectively the opposite of multi-label classification — the output values do not share semantics, but the number of outputs is known a priori.

Consider a classification problem where the goal is to simultaneously predict temperature (low, medium, or high) and pressure (low, medium, or high) inside a pressure cooker. In this case, the model is expected to output exactly two values: one for the temperature label and another for the pressure label.
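
Concretely, the training targets for such a problem form a two-column matrix; the 0/1/2 encoding below for low/medium/high is just one illustrative choice:

import numpy as np

# Each row is one instance: (temperature, pressure), with 0=low, 1=medium, 2=high
Y = np.array([
    [0, 0],  # low temperature, low pressure
    [1, 2],  # medium temperature, high pressure
    [2, 2],  # high temperature, high pressure
])
# A multi-output model must emit exactly two labels per instance, one per
# column; the columns measure different concepts and have separate label sets.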

The machine learning task of solving a multi-output problem thus involves building a predictive model that simultaneously outputs a set of (two or more) labels that measure different concepts — essentially two or more separate (although related) classification problems are solved concurrently within the same model.

Multi-output classification is also called multi-task classification, which reflects the fact that a multi-output classification problem is effectively equivalent to multiple simultaneous (multi-tasked) single-label classification problems.

While multi-label problems can be transformed into multi-output problems, the opposite is not necessarily true.
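
The forward direction can be sketched with scikit-learn's MultiLabelBinarizer: each known label becomes one binary output column, turning the multi-label problem into a multi-output problem with a fixed number of binary outputs:

from sklearn.preprocessing import MultiLabelBinarizer

# Multi-label targets: each song carries any subset of known genres
y_multilabel = [{'pop'}, {'rock'}, {'pop', 'rock'}, set()]

# Transformation: one binary output column per known label
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_multilabel)
print(mlb.classes_)  # ['pop' 'rock']
print(Y)
# [[1 0]
#  [0 1]
#  [1 1]
#  [0 0]]

The reverse fails in general because multi-output columns, such as temperature and pressure above, take values from different label sets and do not share semantics.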

Joint Classification-Regression Problems

In a multi-output problem containing both classification tasks and regression tasks, solving unrelated joint classification-regression problems need not be more difficult than training a set of classifiers and regressors on the individual tasks. If the tasks are related, the algorithm adaptation method can provide the best results in terms of predictive performance.

Joint classification-regression trees can be built with a tree induction algorithm that simultaneously solves one classification task and one regression task. Much like multi-target decision trees (MT-DT) and multi-target regression trees (MRT), the joint classification-regression tree (JCRT) solves multiple simultaneous prediction tasks by modifying the node-split function in the inductive step and marking terminal nodes with appropriate values for each task. Due to the nature of joint classification-regression problems, the modified split function must consider the error of both the classification part and the regression part simultaneously.

The split function uses an entropy function consisting of three parts:

  • Shannon entropy, computed for the classification part.
  • A weighted differential entropy, calculated for the regression part.
  • A normalization step: Shannon entropies and differential entropies exist in different ranges, so normalization is applied to combine the two entropies.
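
The following is a hedged sketch of such a combined impurity, assuming a Gaussian approximation for the differential entropy and a simple squashing step for normalization; the function names and the normalization choice are illustrative, not morfist's or the JCRT paper's exact formulation:

import numpy as np

def shannon_entropy(labels):
    # classification part: H = -sum(p * log p) over class frequencies
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def differential_entropy(values):
    # regression part, Gaussian approximation: 0.5 * log(2 * pi * e * var)
    var = np.var(values) + 1e-12  # guard against zero variance
    return 0.5 * np.log(2 * np.pi * np.e * var)

def joint_impurity(class_labels, reg_values, weight=0.5):
    # the two entropies live on different scales, so each is squashed
    # into (0, 1) before being combined into a single split criterion
    h_cls = shannon_entropy(class_labels)
    h_reg = differential_entropy(reg_values)   # can be negative
    norm_cls = h_cls / (1.0 + h_cls)
    norm_reg = 1.0 / (1.0 + np.exp(-h_reg))    # logistic squashing
    return weight * norm_cls + (1.0 - weight) * norm_reg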

Joint classification-regression forests, when evaluated on spatially structured data in the form of CT scans, perform two tasks:

  • Classify pixels according to objects (organs)
  • Estimate the distance from each pixel to the object’s boundary.

Research results demonstrate that joint forests are not only able to provide accurate estimates of object boundaries, but also improve the accuracy of the classification of objects.

Reference Implementation: Classification-Regression with MixedRandomForest

The example below is based on the GitHub source: https://github.com/donlnz/morfist

from morfist import MixedRandomForest, cross_validation
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import cross_val_score
import sklearn.datasets as dst
import numpy as np

# Config
n_trees = 20

# Datasets: linnerud (multivariate regression) and digits (classification)
x_reg, y_reg = dst.load_linnerud(return_X_y=True)
x_cls, y_cls = dst.load_digits(return_X_y=True)

# Mixed targets: stack additional derived output(s) alongside the originals
x_mix_1, y_mix_1 = x_reg, np.vstack([y_reg, y_reg < y_reg.mean()]).T
x_mix_2, y_mix_2 = x_cls, np.vstack([y_cls, y_cls]).T

The default hyper-parameters for MixedRandomForest are:

n_estimators=10,
max_features='sqrt',
min_samples_leaf=5,
choose_split='mean',
class_targets=None

The code snippet below compares scikit-learn's RandomForestClassifier against the multi-output MixedRandomForest on a single-output classification task.

# morfist forest; class_targets=[0] marks the output column as categorical
cls_rf = MixedRandomForest(
    n_estimators=n_trees,
    min_samples_leaf=1,
    class_targets=[0]
)

# scikit-learn baseline
cls_skrf = RandomForestClassifier(n_estimators=n_trees)

# 10-fold cross-validation for the morfist model
cls_scores = cross_validation(
    cls_rf,
    x_cls,
    y_cls,
    class_targets=[0],
    folds=10
)

# default cross-validation for the scikit-learn baseline
scores = cross_val_score(
    cls_skrf,
    x_cls,
    y_cls
)

print('Classification with Single output: ')
print('\t morfist (accuracy): {}'.format(cls_scores.mean()))
print('\t scikit-learn (accuracy): {}'.format(scores.mean()))

Results

morfist (accuracy): 0.9632721202003339
scikit-learn (accuracy): 0.928187335049821

The code snippet below compares scikit-learn's RandomForestRegressor against the multi-output MixedRandomForest on the multivariate regression task.

# morfist forest; with no class_targets, all outputs are treated as regression
reg_rf = MixedRandomForest(
    n_estimators=n_trees,
    min_samples_leaf=5
)

# scikit-learn baseline
reg_skrf = RandomForestRegressor(n_estimators=n_trees)

# 10-fold cross-validation for the morfist model
reg_scores = cross_validation(
    reg_rf,
    x_reg,
    y_reg,
    folds=10
)

scores = cross_val_score(
    reg_skrf,
    x_reg,
    y_reg,
    scoring='neg_mean_squared_error'
)

print('Multivariate Regression multiple outputs: ')
print('\t morfist (rmse): {}'.format(reg_scores.mean()))
# convert scikit-learn's negative MSE into an RMSE for comparison
print('\t scikit-learn (rmse): {}'.format(np.sqrt(-scores.mean())))

Results

morfist (rmse): 11.758534341303097
scikit-learn (rmse): 17.79445305492162

The code snippet below demonstrates multiple output labels on the classification dataset: column 0 of y_mix_2 is treated as a classification target (class_targets=[0]), while the second column is treated as a regression target.

# Mixed-output forest: output 0 is categorical, output 1 is numeric
mix_rf = MixedRandomForest(
    n_estimators=n_trees,
    min_samples_leaf=1,
    class_targets=[0]
)

mix_scores = cross_validation(
    mix_rf,
    x_mix_2,
    y_mix_2,
    folds=10,
    class_targets=[0]
)
print('Mixed output on Classification Dataset: ')
print('\t Task 1 (original) (accuracy): {}'.format(mix_scores[0]))
print('\t Task 2 (additional) (rmse): {}'.format(mix_scores[1]))

Results

Mixed output on Classification Dataset:
Task 1 (original) (accuracy): 0.9627156371730662
Task 2 (additional) (rmse): 1.117109567700808

Other Python Libraries

scikit-multiflow: a Python-based multi-output stream/batch learning framework that can be used within Jupyter Notebooks and interoperates with scikit-learn. The figure below compares the multi-output capabilities of different Java and Python libraries.

Figure: multi/single-output learning frameworks (Source)

Performance Evaluation of Multi-Output Learning

Label identification and evaluation is one of the primary steps in quantifying the quality of labels and label representations, and it plays a key role in the performance of multi-output tasks. Learning models with different multi-outputs can be used to determine, and then improve upon, label quality across different tasks. Labels can be evaluated from three different perspectives:

  • Annotation quality: add validity checks to determine whether the annotation is of good quality (Step A).
  • Label representation: determine and infer the possible labels, so as to conclude how well the chosen label representation actually represents them (Step B).
  • Coverage: evaluate how well the provided label set covers the dataset (Label Set).

After the evaluation, a human expert intervenes to explore and address the underlying issues, and provides feedback to improve the different aspects of the labels accordingly.

Real-World Applications of Multi-output Learning

The sample figure below illustrates a real application: different labels applied to an image on a social network. These labels correspond to the varying output structures suited to multi-output learning.


Independent Vector: an independent vector is a vector with independent dimensions, where each dimension represents a particular label that does not necessarily depend on the other labels. This includes tags, attributes, bag-of-words, bag-of-visual-words, hash codes, etc. of the given data.

Distribution: provides probability-distribution information for each dimension, e.g., the tag with the largest weight.

Ranking: shows the tags ordered from most important to least important. Example applications include text categorization ranking, question answering, and visual object recognition.

Text: text can be in the form of keywords, sentences, paragraphs, or even documents. Applications for text outputs include document summarization and paragraph generation.

Sequence: a sequence (used in speech recognition and language translation) is usually a sequence of elements selected from a label set or word set. Each element's prediction depends on the past predicted outputs and the present input, and an output sequence often corresponds to an input sequence.

Tree: the output is represented as a hierarchically labeled structure, where each output belongs to a label as well as its ancestors in the tree. This is useful in syntactic parsing.

Image: the output object can be an image consisting of multiple pixel values. A single pixel is predicted based on the input and the pixels around it, to account for the overall region prediction. Image output applications include super-resolution construction, text-to-image synthesis (which generates images from natural-language descriptions), and face generation.

Bounding Box: the bounding box is often used to find the exact locations of objects that appear in an image, and it is commonly used in object recognition and object detection.

Link: in a partitioned social network whose edges represent friendships between users, the goal is to predict whether two currently unlinked users will become friends in the future.

Graph: a graph is made up of a set of nodes and edges and is used to model the relations between objects, where each object is represented by a node and connected objects are linked by an edge.

Others: contours and polygons are similar to the bounding box and can be used to localize objects in an image. In information retrieval, the output can be a list of data objects similar to a given query. In image segmentation, the output is usually a set of segmentation masks for different objects, which can be used to detect common saliency across multiple images.

Conclusion

The key motivation for solving multi-output pattern recognition problems using algorithm adaption is that a single model trained on a set of related tasks will show an improvement in predictive performance as compared to a set of individual models, each trained on a single task.

It is seen that MixedRandomForest gives better accuracy for classification and better RMSE for regression when applied to scikit-learn's datasets: the linnerud dataset (multivariate regression) and the digits dataset (classification).

However, MixedRandomForest takes more time to run than scikit-learn, and there is only a slight increase in accuracy on the multi-output classification dataset. You can read more on the deep learning mechanisms of Keras's multi-output classification at https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/ and multi-label classification at https://www.pyimagesearch.com/2018/05/07/multi-label-classification-with-keras/. skflow (TensorFlow) also provides a method to evaluate multi-output regression models.
