Too Long; Didn't Read
In data science, imbalanced datasets are no surprise. A dataset intended for a classification problem, such as sentiment analysis, medical imaging, or another discrete predictive analytics task (for example, flight delay prediction), is said to be imbalanced when its classes have unequal numbers of instances (samples or data points). In other words, the classes are imbalanced because the number of instances belonging to each differs by a large margin. The class with comparatively fewer instances is known as the <strong>minority</strong> class, and the class with comparatively more instances is known as the <strong>majority</strong> class. An example of an imbalanced dataset is given below: