Use Up-Sampling and Weights to Address the Imbalanced Data Problem

Written by ryan-yu | Published 2020/03/24
Tech Story Tags: machine-learning | statistics | data-science | predictive-analytics | data | ml | data-analytics | coding

TLDR Imbalanced data means the classes we want to predict are disproportionate. Classes that make up a large proportion of the data are called majority classes; those that make up a smaller proportion are minority classes. The true positive rate drops from 97% to 33% for class 1. Using balanced class weights improves recall from 33% to 96%, but incurs many false positives, and precision decreases from 100% to 36%. Another approach is up-sampling: we randomly sample with replacement from the minority class to increase its proportion in the data.
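The two remedies named in the summary can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption for illustration: the summary does not say which library or dataset the story used, the helper names `balanced_class_weights` and `upsample_minority` are hypothetical, and the 95/5 toy labels are made up. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), which is also what scikit-learn's `class_weight="balanced"` computes.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by
    n_samples / (n_classes * class_count), so rarer classes weigh more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def upsample_minority(rows, labels, seed=0):
    """Hypothetical helper: sample minority-class rows with replacement
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        if count < target:
            pool = [r for r, y in zip(rows, labels) if y == cls]
            extra = rng.choices(pool, k=target - count)  # with replacement
            out_rows.extend(extra)
            out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Made-up toy data: 95 rows of class 0 (majority), 5 rows of class 1 (minority).
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

weights = balanced_class_weights(labels)
up_rows, up_labels = upsample_minority(rows, labels)

print(weights)             # class 1 receives a much larger weight than class 0
print(Counter(up_labels))  # both classes now have 95 rows
```

In practice, up-sampling is usually applied to the training split only, so that duplicated minority rows do not leak into the data used for evaluation.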


Published by HackerNoon on 2020/03/24