Subhojit Banerjee


No, you don’t need personal data for personalization

1. What? So all those articles proclaiming that personalization is impossible unless you hand over the last ounce of your personal data, all to make the world a better place, were just shams? UNDOUBTEDLY.

2. Before I go into the details, here is the proof from a recent classification use case: a random forest model using only aggregated, GDPR-compliant features achieved a precision of 94%.

3. The results do not use any personal data. The trick is to use carefully engineered features, like those used in the RecSys 15 paper, with minor tweaks to accommodate the use case, on about 100 GB of clickstream data. A simple random forest classifier was used with default parameters and no model stacking, which has the obvious advantages of scalability and model interpretability, an important element of GDPR.
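To make "aggregated features" a bit more concrete, here is a minimal sketch of turning raw clicks into one feature row per session. It is not the feature set from the use case above: the event layout (`session_id`, timestamp, `item_id`), the sample data, and the feature names are all assumptions made up for illustration. The point is only that nothing in the rows identifies a person.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical clickstream rows: (session_id, timestamp, item_id).
# No user id, name, email, or IP address appears anywhere.
CLICKS = [
    ("s1", "2018-04-01T10:00:00", "item_a"),
    ("s1", "2018-04-01T10:00:30", "item_b"),
    ("s1", "2018-04-01T10:02:00", "item_a"),
    ("s2", "2018-04-01T11:15:00", "item_c"),
]


def session_features(clicks):
    """Aggregate raw clicks into one feature dict per session."""
    by_session = defaultdict(list)
    for session_id, ts, item_id in clicks:
        by_session[session_id].append((datetime.fromisoformat(ts), item_id))

    features = {}
    for session_id, events in by_session.items():
        events.sort(key=lambda e: e[0])  # order clicks by time
        times = [t for t, _ in events]
        item_counts = Counter(item for _, item in events)
        features[session_id] = {
            "n_clicks": len(events),
            "n_unique_items": len(item_counts),
            "duration_s": (times[-1] - times[0]).total_seconds(),
            "max_clicks_one_item": max(item_counts.values()),
            "hour_of_day": times[0].hour,
        }
    return features


FEATURES = session_features(CLICKS)
```

Rows like these could then be handed to an off-the-shelf classifier, for example scikit-learn's `RandomForestClassifier()` with its default parameters, which matches the "no tuning, no stacking" spirit described above.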

4. The GDPR comes into force on 25 May 2018. This has huge ramifications for machine learning models deployed in production that use personal data: companies simply have to stop using models that don’t comply with its provisions, or else face fines of up to 20 million euros or 4% of global turnover, whichever is higher.

5. As the results show, personalization is possible without personal information, albeit by spending a few more brain cycles on feature engineering and understanding the domain. It was possible all along, but with GDPR coming into force there are no excuses anymore.

6. The age-old wisdom of crowds, aka aggregation and pseudonymization, still works.
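For readers who want to see what pseudonymization plus aggregation might look like in practice, here is a small sketch. It assumes keyed hashing (HMAC-SHA256) as the pseudonymization step, which is one common option and not necessarily what was used in the use case above. The key, the sample events, and the function name `pseudonymize` are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; in practice it would be stored apart from the data.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token that does not reveal it."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw events: (user identifier, product category viewed).
EVENTS = [
    ("alice@example.com", "shoes"),
    ("bob@example.com", "shoes"),
    ("alice@example.com", "books"),
]

# Step 1: replace the identifier with its pseudonym.
PSEUDONYMIZED = [(pseudonymize(user), category) for user, category in EVENTS]

# Step 2: aggregate across the crowd; the counts carry no identifier at all.
CATEGORY_COUNTS = Counter(category for _, category in PSEUDONYMIZED)
```

The same user always maps to the same token, so behaviour can still be linked across events, while the crowd-level counts in `CATEGORY_COUNTS` drop the identifier entirely.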

7. Hit me up if you have any questions, and remember: if you are not paying for the product, you are the product.
