We often get blocked at different steps while working on a machine learning problem. To help get past them, I have listed the major challenges we face and the steps we can take to overcome each one. For easier reading, the challenges are grouped into three sub-domains: Data Preparation, Model Training and Model Deployment.

Data Preparation

Data collection:
- Incomplete data is often a headache when we first start collecting data.
- Even when we do get data, it can turn out to be biased. Bias is any deviation from the truth in data collection or data analysis that can lead to false conclusions.
- Then comes the curse of dimensionality. This term refers to phenomena that arise when analyzing data in high-dimensional spaces and do not occur in low-dimensional settings.
- Finally, there is data sparsity. Imagine a table with lots of null or impossible values: those values represent the sparsity in your data.

Steps to overcome:
- Dedicate time to understanding the problem and the datasets you need to solve it.
- Enrich the data.
- Apply dimensionality-reduction techniques.

Outliers:
- These are out-of-range numerical values or unknown categorical values in the data.
- They have a drastic influence on squared loss functions, because squaring lets a single large residual dominate the total loss.

Steps to overcome:
- Discretization techniques such as binning can limit the effect of extreme values on a squared loss.
- Use robust loss functions such as the Huber loss.

Missing data:
- Missing values cause information loss and therefore reduce the model's accuracy.
- They can also introduce information bias, which happens when key information is measured, collected or interpreted inaccurately.

Steps to overcome:
- Tree-based modelling techniques can help deal with this problem.
- Discretization can also help here, because missing values can be placed in a bin of their own.
- Imputation.

Sparse target variables:
- This happens when the primary event has a low occurrence rate.
- The target is overwhelmingly made up of zero or missing values.

Steps to overcome:
- Proportional oversampling.
- Mixture models.
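The outlier problem above is easiest to see with numbers. Here is a minimal Python sketch that compares squared loss with the Huber loss on a set of residuals where one value comes from an outlier. It also shows equal-frequency (quantile) binning as one form of discretization. The function names and the default `delta = 1.0` are illustrative choices and are not taken from any particular library.

```python
import bisect
from typing import List, Sequence


def squared_loss(residual: float) -> float:
    """Squared loss: the penalty grows with the square of the residual."""
    return residual ** 2


def huber_loss(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)


def quantile_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-frequency binning: replace each value by its bin index.

    Bin edges come from the sorted data, so an extreme value simply lands in
    the top bin instead of stretching the scale for every other value.
    """
    ranked = sorted(values)
    n = len(ranked)
    edges = [ranked[(i * n) // n_bins] for i in range(1, n_bins)]
    return [bisect.bisect_right(edges, v) for v in values]


if __name__ == "__main__":
    residuals = [0.5, -0.3, 0.2, 50.0]  # the last residual comes from an outlier
    print("squared:", [squared_loss(r) for r in residuals])
    print("huber:  ", [huber_loss(r) for r in residuals])
    print("bins:   ", quantile_bins([1, 2, 3, 4, 5, 6, 7, 1000], n_bins=4))
```

With `delta = 1.0`, the outlier residual of 50 contributes 2500 to the squared loss but only 49.5 to the Huber loss. In the binning example, the value 1000 lands in the top bin alongside 7.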
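The imputation and oversampling steps listed above can also be sketched in plain Python. This version assumes median imputation with a missing-value indicator, which is one common choice among many. It implements proportional oversampling as random duplication of rare-event rows (label `1`) until they reach a chosen share of the data. The parameter `target_ratio` and the helper names are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def median_impute(column: Sequence[Optional[float]]) -> Tuple[List[float], List[int]]:
    """Fill missing values (None) with the median of the observed values.

    Also returns a 0/1 indicator so the model can still tell which values
    were originally missing.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = statistics.median(observed)
    imputed = [fill if v is None else v for v in column]
    was_missing = [1 if v is None else 0 for v in column]
    return imputed, was_missing


def oversample_minority(rows: Sequence[T], labels: Sequence[int], target_ratio: float,
                        seed: int = 0) -> Tuple[List[T], List[int]]:
    """Randomly duplicate rare-event rows (label 1) until they make up
    roughly `target_ratio` of the returned data set."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be between 0 and 1")
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos:
        raise ValueError("no rare-event rows to oversample")
    rng = random.Random(seed)
    # Solve n_pos / (n_pos + n_neg) = target_ratio for n_pos.
    n_pos_needed = round(target_ratio * len(neg) / (1.0 - target_ratio))
    extra = [rng.choice(pos) for _ in range(max(0, n_pos_needed - len(pos)))]
    new_rows = neg + pos + extra
    new_labels = [0] * len(neg) + [1] * (len(pos) + len(extra))
    return new_rows, new_labels


if __name__ == "__main__":
    print(median_impute([1.0, None, 3.0, 10.0, None]))
    rows, labels = list(range(100)), [1] * 5 + [0] * 95
    new_rows, new_labels = oversample_minority(rows, labels, target_ratio=0.2)
    print(len(new_rows), sum(new_labels))
```

Apply oversampling only to the training split, after splitting the data. Otherwise, duplicated rare-event rows can end up in both training and evaluation data and inflate the measured accuracy.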
Model Training

Overfitting:
- Overfitting is usually characterized by high variance and low bias: the model fits the training data closely but fails to generalize to new data.

Steps to overcome:
- Regularization: adds a penalty term to the error function, which discourages overly complex models (for example, large coefficients).
- Noise injection: artificially adds "noise" to the input data during training.
- Cross-validation: assesses how the results of a statistical analysis generalize to an independent data set.

Computational resource exploitation:
- Most of the time, we implement algorithms as single-threaded code.
- We rely heavily on interpreted languages.

Steps to overcome:
- Train many single-threaded models in parallel.
- Use hardware acceleration, for example GPUs, and SSDs for faster data access.
- Use low-level native libraries.
- Use cloud resources, for example Google Colab notebooks.

Ensemble models:
- A single model sometimes fails to provide adequate accuracy.
- A single model can also overfit (high variance and low bias) and fail to generalize properly.

Steps to overcome:
- Ensemble methods such as bagging, boosting and stacking can help overcome the problem.
- Custom or manual combinations of predictions sometimes achieve better accuracy.

Hyperparameter tuning:
- Conventional algorithms suffer from combinatorial explosion in their hyperparameters: the number of combinations to try grows rapidly with every hyperparameter and candidate value added.

Steps to overcome:
- Local search optimization, which also includes genetic algorithms.
- Grid search or random search techniques help find the best combination of hyperparameters from the candidate values we feed in.
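Two of the overfitting remedies in the Model Training section, regularization and cross-validation, combine naturally. Cross-validation estimates how well each penalty strength generalizes, and the best one is kept. The sketch below assumes the simplest possible model, a one-feature ridge regression without an intercept, so that the penalized error has a closed form. The helper names and candidate penalties are illustrative.

```python
import random
from typing import Sequence


def fit_ridge_slope(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Ridge regression for y ~ w * x without an intercept.

    Minimising sum((y - w*x)^2) + lam * w^2 gives the closed form
    w = sum(x*y) / (sum(x^2) + lam); lam = 0 is ordinary least squares.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def k_fold_mse(xs: Sequence[float], ys: Sequence[float], lam: float,
               k: int = 5, seed: int = 0) -> float:
    """Average held-out mean squared error over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in fold) / len(fold))
    return sum(fold_errors) / k


def choose_lambda(xs: Sequence[float], ys: Sequence[float],
                  candidates: Sequence[float], k: int = 5) -> float:
    """Return the penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: k_fold_mse(xs, ys, lam, k))


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    best = choose_lambda(xs, ys, [0.0, 0.1, 1.0, 10.0, 100.0])
    print("chosen lambda:", best, "slope:", fit_ridge_slope(xs, ys, best))
```

A larger `lam` shrinks the slope toward zero. For `xs = [1, 2, 3]` and `ys = [2, 4, 6]`, the slope is 2.0 at `lam = 0` and 1.0 at `lam = 14`. The penalty trades a little bias for lower variance, and cross-validation decides how much of that trade is worthwhile.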
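Bagging is the first ensemble method listed above. It trains the same base model on bootstrap samples, which are drawn with replacement, and then averages the predictions. The sketch below assumes a one-split regression tree (a decision stump) as the base model. The helper names and `n_models = 25` are illustrative choices.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# A decision stump: (threshold, prediction if x <= threshold, prediction otherwise)
Stump = Tuple[float, float, float]


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Stump:
    """Fit a one-split regression tree by trying every split point and
    keeping the one with the lowest squared error."""
    pairs = sorted(zip(xs, ys))
    mean_all = statistics.fmean(ys)
    best_sse = float("inf")
    best: Stump = (pairs[0][0], mean_all, mean_all)  # fallback: constant model
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best_sse:
            best_sse = sse
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_bagged_stumps(xs: Sequence[float], ys: Sequence[float],
                      n_models: int = 25, seed: int = 0) -> List[Stump]:
    """Fit one stump per bootstrap sample (same size as the data, with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models: List[Stump], x: float) -> float:
    """Bagged prediction: the average of the individual stump predictions."""
    return statistics.fmean([predict_stump(m, x) for m in models])


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(40)]
    ys = [(0.0 if x < 20 else 10.0) + rng.gauss(0.0, 2.0) for x in xs]
    models = fit_bagged_stumps(xs, ys)
    print([round(predict_bagged(models, x), 2) for x in (5.0, 19.0, 21.0, 35.0)])
```

Averaging over bootstrap samples smooths out the split position near the step. A single stump's split can move a lot when a few noisy points change, and that variance is what bagging reduces.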
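Counting configurations makes the combinatorial explosion behind hyperparameter tuning easy to see. A grid search must evaluate every combination, while random search samples a fixed budget of combinations from the same candidate values. In this sketch, `toy_validation_loss` is a hypothetical stand-in for "train the model with these hyperparameters and return its validation loss". The hyperparameter names and values in `GRID` are made up for illustration.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, object]
Objective = Callable[[Params], float]


def grid_size(grid: Dict[str, List[object]]) -> int:
    """Number of configurations an exhaustive grid search must evaluate."""
    return math.prod(len(values) for values in grid.values())


def grid_search(objective: Objective, grid: Dict[str, List[object]]) -> Tuple[float, Optional[Params]]:
    """Evaluate every combination and keep the one with the lowest loss."""
    best_score, best_params = math.inf, None
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = objective(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params


def random_search(objective: Objective, grid: Dict[str, List[object]],
                  n_iter: int, seed: int = 0) -> Tuple[float, Optional[Params]]:
    """Evaluate n_iter randomly sampled combinations (repeats are possible)."""
    rng = random.Random(seed)
    best_score, best_params = math.inf, None
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = objective(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params


def toy_validation_loss(p: Params) -> float:
    # Hypothetical objective with its minimum at learning_rate=0.1, max_depth=6, n_estimators=200.
    return ((p["learning_rate"] - 0.1) ** 2
            + ((p["max_depth"] - 6) / 10) ** 2
            + (0.0 if p["n_estimators"] == 200 else 0.01))


GRID = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [2, 4, 6, 8, 10],
    "n_estimators": [100, 200, 400],
}

if __name__ == "__main__":
    print("grid size:", grid_size(GRID))
    print("grid search:", grid_search(toy_validation_loss, GRID))
    print("random search (15 tries):", random_search(toy_validation_loss, GRID, n_iter=15))
```

This grid has 4 × 5 × 3 = 60 combinations. Adding a fourth hyperparameter with five candidate values would raise that to 300. Random search, by contrast, stays within whatever `n_iter` budget you give it, at the cost of possibly missing the best combination.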
Model interpretation:
- A large number of parameters and rules makes a model difficult to interpret.

Steps to overcome:
- Variable selection using regularization techniques.
- Surrogate models.
- Interpretation methods such as LIME.
- Partial dependence plots and feature importance graphs can assist in interpreting models.

Model Deployment

Model deployment:
- The trained model logic must be moved from the development environment to an operational computing system, where it helps the organization make decisions.

Steps to overcome:
- Web-service scoring lets people and applications get results from the model.
- Dashboards of the model's output are easier for any organization to understand.

Model decay:
- Since the model was created, the business problem and market conditions might have changed.
- New observations may fall outside the domain of the training data.

Steps to overcome:
- Monitor the model regularly, especially when its accuracy decreases.
- Update the model whenever changes in the data or system affect it.

Thanks for reading till the end, and I hope you liked it!

Previously published at https://medium.com/@siddhesh_jadhav/how-to-deal-with-major-challenges-in-machine-learning-1fc7e719bd0b
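Feature importance graphs, mentioned under Model Interpretation, need importance scores to plot. Permutation importance is one model-agnostic way to compute them. It is a swapped-in example, not a method the article prescribes: shuffle one feature column at a time and measure how much the error grows. This sketch assumes a regression model exposed as a `predict(row)` function and uses mean squared error. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float], X: List[Row], y: Sequence[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Average increase in MSE when each feature column is shuffled.

    A large increase means the model relies on that feature; a value near
    zero means shuffling it barely changes the predictions.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical model that uses only the first feature.
    X = [[float(i), float(i % 3)] for i in range(30)]
    y = [3.0 * i for i in range(30)]
    print(permutation_importance(lambda row: 3.0 * row[0], X, y))
```

Because the scores come from held-out predictions rather than model internals, the same function works for any model that can predict. For that reason, it is a convenient input for a feature importance graph.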
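For web-service scoring under Model Deployment, the trained model is wrapped behind an HTTP endpoint that other systems can call. The sketch below uses only Python's standard-library `http.server`. The model, its weights and the port are hypothetical placeholders. The model is a logistic regression with the made-up features `tenure_months` and `monthly_charges`.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model exported from the training environment: a logistic
# regression with two made-up features. Replace with your own trained model.
WEIGHTS = {"tenure_months": -0.03, "monthly_charges": 0.02}
BIAS = -0.5


def score_payload(raw: bytes) -> dict:
    """Parse a JSON object of feature values and return a probability."""
    features = json.loads(raw)
    if not isinstance(features, dict):
        raise ValueError("expected a JSON object of feature values")
    # Features missing from the request default to 0.0 (an assumption of this sketch).
    z = BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    return {"probability": 1.0 / (1.0 + math.exp(-z))}


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            result, status = score_payload(self.rfile.read(length)), 200
        except (ValueError, TypeError, OverflowError) as exc:
            result, status = {"error": str(exc)}, 400
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    HTTPServer(("127.0.0.1", 8000), ScoringHandler).serve_forever()
```

A client would POST a JSON object such as `{"tenure_months": 12, "monthly_charges": 70}` and receive `{"probability": ...}` back. The Python documentation advises against using `http.server` in production, so a real deployment would typically run behind a proper web framework or application server. The scoring function itself can stay the same.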
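Regular monitoring for model decay can be partly automated. One approach compares the distribution of a feature in live traffic with its distribution in the training data. The sketch below uses the population stability index (PSI), a common drift measure, computed as sum((a_i - e_i) * ln(a_i / e_i)) over bins built from the training data. It pairs PSI with a simple accuracy-drop check. The thresholds `psi_threshold = 0.25` and `max_accuracy_drop = 0.05` are assumptions: 0.25 is a commonly quoted rule of thumb for a significant PSI shift, not a universal standard.

```python
import bisect
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float], actual: Sequence[float],
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between training-time values (expected) and live values (actual)."""
    ranked = sorted(expected)
    n = len(ranked)
    # Equal-frequency bin edges taken from the training data.
    edges = [ranked[(i * n) // n_bins] for i in range(1, n_bins)]

    def bin_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # eps keeps empty bins from producing log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def needs_retraining(train_feature: Sequence[float], live_feature: Sequence[float],
                     train_accuracy: float, live_accuracy: float,
                     psi_threshold: float = 0.25, max_accuracy_drop: float = 0.05) -> bool:
    """Flag the model when the input drifts or accuracy drops too far."""
    psi = population_stability_index(train_feature, live_feature)
    return psi > psi_threshold or (train_accuracy - live_accuracy) > max_accuracy_drop


if __name__ == "__main__":
    train = [i / 100 for i in range(1000)]
    live = [v + 5 for v in train]  # hypothetical shift in the live data
    print("PSI:", population_stability_index(train, live))
    print("retrain?", needs_retraining(train, live, train_accuracy=0.90, live_accuracy=0.88))
```

Live values outside the training range fall into the first or last bin, so new observations outside the training domain show up as a large PSI even before labelled accuracy figures are available.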

How to Deal With Major Challenges in Machine Learning
