Return to Origin (RTO) refers to a buyer’s order being sent back to its place of origin. RTO is one of the most significant problems for e-commerce companies: it accounts for roughly 30% or more of orders, and every RTO costs INR 100–200 per order (shipping, manpower, packaging, fuel, time, etc.).
To reduce or solve any problem, one must first understand its root cause. From my experience in data science at e-commerce companies, and in general, the following are the reasons for RTO:
RTO undoubtedly cannot be brought to zero, but even a slight reduction in RTO can save a business a lot of money.
Identify non-deliverable addresses before shipping, improve success rates and increase revenue.
Detect risky and fraudulent orders that carry a high risk of RTO, and substantially cut down on RTO losses.
You can build your own model if you have a lot of users on the platform. Startups can consider Amazon Fraud Detector, which helps catch fraudulent user registrations.
Sample Input and Output
This is the trickiest and most interesting part of the core data science modeling. Fraud problems are highly imbalanced, and a shortage of data is a common issue.
Objective: To identify whether a transaction by a user is fraudulent or risky.
To solve this, you don’t need lots of data points, just quality ones. Fraudulent users tend to share common traits; we just need to catch those patterns.
Features to consider:
Demographics — Age, Gender, etc.
Geographics — IP Location, GPS Location, Input Address, etc.
Technographics — Device Info like Device Model, Device Brand, etc.
Behavioral Features — # of sessions, Avg. session duration on the app, account login and logout activity, items visited per session, # of RTO by a user in past, past orders reason for RTO, lifetime value of the user, etc.
Registration Information — User email domain (legit or not), signup method, phone number, etc.
Transactional Features — Payment Mode (COD, UPI, Wallet, Credit Card, Debit Card, etc.), Checkout time, Transaction Amount, OTP auto-detected or manually entered, etc.
Using these features, build a fraud prediction classifier. Since this is an imbalanced classification problem, use class weights to handle the imbalance. Metrics: use F-beta and AUC-ROC.
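To make the class-weight idea concrete, here is a minimal sketch of one common heuristic: weight each class inversely to its frequency, so the rare fraud class counts for more during training. The function name `balanced_class_weights` and the toy labels are assumptions made for illustration. The formula is the same one scikit-learn applies when a classifier is given `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 1 = fraudulent, 0 = legitimate (5 fraud in 100 orders).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

With these toy labels, a misclassified fraudulent order is penalized 19 times more heavily than a misclassified legitimate one, which pushes the model to pay attention to the minority class.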
Recall > Precision
In the fraud example, recall is the percentage of fraudulent transactions we manage to detect, whilst precision is the percentage of the transactions we classify as fraudulent that are actually fraudulent. When classifying fraud, we mainly care about the recall of the fraudulent class; that is, we want to correctly classify as many fraudulent transactions as possible. We still care about precision, just less than recall.
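As a small worked example of these definitions, the sketch below computes precision, recall, and F-beta from scratch. The labels and predictions are made-up toy data, and `beta=2` is an assumed choice that weights recall more heavily than precision; pick the value that matches how costly a missed fraud is for your business.

```python
def precision_recall_fbeta(y_true, y_pred, beta=2.0):
    """Return (precision, recall, F-beta) for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical toy data: 1 = fraudulent, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall, f2 = precision_recall_fbeta(y_true, y_pred, beta=2.0)
print(f"precision={precision:.3f} recall={recall:.3f} F2={f2:.3f}")
```

Here 3 of the 4 fraudulent transactions are caught (recall 0.75), while 3 of the 5 flagged transactions are truly fraudulent (precision 0.6). Because beta is greater than 1, the F2 score sits closer to recall than to precision.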
Benford’s law applies almost everywhere
As a bonus for those who are curious, I wanted to touch on Benford’s law quickly.
Benford’s law is a curve that appears all over maths: it describes how often each digit from 1 to 9 occurs as the leading digit of a number, with smaller digits occurring more often. It appears in the Fibonacci sequence and the Collatz conjecture. If you take the first digit of the amounts of non-fraudulent transactions, you can see that its distribution (see the image below) follows the curve in orange. If you do the same for fraudulent data, it does not follow Benford’s law as nicely. One could use this in a model by, for example, weighting transactions by the digit their amount starts with.
Incorrect address identification, fraud registration/signup detection, and risky or fraudulent user identification together help us estimate the probability of RTO. Based on the severity of the risk, we can take the right measures, such as:
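The Benford check can be sketched in a few lines. Benford’s law gives the expected share of leading digit d as log10(1 + 1/d), so one can compare the first-digit distribution of a batch of transaction amounts against it. Everything below is illustrative: the helper names are hypothetical, powers of 2 stand in for well-behaved amounts because they are known to follow Benford’s law closely, and the "suspicious" batch is invented.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """Leading non-zero digit of a non-zero amount, via scientific notation."""
    return int(f"{abs(amount):.10e}"[0])


def first_digit_distribution(amounts):
    """Observed share of each leading digit 1..9 in a batch of amounts."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_abs_deviation(observed, expected):
    """Average absolute gap between observed and expected digit shares."""
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


expected = benford_expected()

# Stand-in for "normal" amounts: powers of 2 track Benford's law closely.
normal_amounts = [2 ** k for k in range(1, 201)]
observed_normal = first_digit_distribution(normal_amounts)

# Invented suspicious batch: every amount sits just under a round threshold.
suspicious_amounts = [999, 9500, 9900, 950, 9999, 990]
observed_suspicious = first_digit_distribution(suspicious_amounts)

print(mean_abs_deviation(observed_normal, expected))
print(mean_abs_deviation(observed_suspicious, expected))
```

A large deviation for a user’s or a segment’s amounts is only a signal, not proof of fraud; it could be fed to the classifier as one more feature alongside the first digit itself.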
I am open to Gigs or Consults you can reach out to me on LinkedIn: https://www.linkedin.com/in/shaurya-uppal/
This story was first published here.