
Using Data Science To Deal With RTOs

by Shaurya Uppal, August 22nd, 2022



Return to Origin (RTO) refers to an order that is not delivered to the buyer and is sent back to its place of origin. RTO is one of the most significant problems for e-commerce companies: it can account for more than 30% of orders, and every RTO means a loss of INR 100–200 per order (shipping, manpower, packaging, fuel, time, etc.).

Reasons for RTO

To reduce or solve any problem, one must first understand its root cause. From my experience in data science at e-commerce companies, and in general, the following are the main reasons for RTO:

Incorrect Address entered by the user
User Fraud — Evilness / Revenge 😈
Impulsive buying with the COD (cash on delivery) option, followed by the user casually refusing to accept the delivery (studies generally claim that 40% of all COD orders are returned to the sellers)
Item available at a lower cost on some other platform

Business Impact by RTO

Loss in forward and reverse logistics costs (fuel, manpower, packaging, etc.)
Blocked Inventory

RTO cannot be brought down to zero, but even a slight reduction can save a business a lot of money.
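
To make the scale concrete, here is a back-of-the-envelope sketch. The 30% RTO rate and the INR 100–200 loss per RTO are the figures quoted above; the monthly order volume and the two-percentage-point reduction are assumptions chosen purely for illustration.

```python
def monthly_rto_loss(orders: int, rto_rate: float, loss_per_rto: float) -> float:
    """Estimated monthly loss in INR: orders x RTO rate x loss per RTO."""
    return orders * rto_rate * loss_per_rto


ORDERS_PER_MONTH = 100_000  # assumption, for illustration only
LOSS_PER_RTO = 150          # midpoint of the INR 100-200 range quoted above

baseline = monthly_rto_loss(ORDERS_PER_MONTH, 0.30, LOSS_PER_RTO)
improved = monthly_rto_loss(ORDERS_PER_MONTH, 0.28, LOSS_PER_RTO)  # assumed 2-point drop

print(f"Baseline loss:   INR {baseline:,.0f}")
print(f"Loss at 28% RTO: INR {improved:,.0f}")
print(f"Monthly saving:  INR {baseline - improved:,.0f}")
```

Under these assumed numbers, a two-point drop in the RTO rate is worth roughly INR 300,000 a month.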

How can we reduce RTO?

Identify non-deliverable addresses before shipping, improve success rates and increase revenue.
Detect risky and fraudulent orders that carry a high risk of RTO, and substantially cut down on RTO losses.

Approach

Incorrect Address Detection

The most common case is gibberish text in the address input, which a Gibberish Detector (based on a Markov Chain Model) can catch.
The user's IP location (pincode/state) does not match the pincode and state of the input address. The user may simply be placing an order for someone else, so give this leeway to established users; for new users, treat the mismatch as a possible sign of fraud. (Where is my IP Location?)
Even when you take the user's geolocation/IP location, a fraudulent user may be hiding behind a VPN, so use a VPN detector service.
Identify the presence of each address component using NLP techniques: house number, area name, city, state, and pincode. Pass an address only if it contains all the components. (Address Matching using Fuzzy Logic)
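
The gibberish check from the first point can be sketched as a character-level Markov chain: learn how likely each letter is to follow another in known-good addresses, then score new input by its average log transition probability. This is a minimal sketch, not a production detector. The training addresses, the add-one smoothing, and the midpoint threshold are all assumptions for illustration; a real detector would train on a large set of successfully delivered addresses and tune the threshold on labelled data.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz "

# Hypothetical sample of known-good addresses (illustration only).
TRAINING_ADDRESSES = [
    "Flat 12, Green Park Apartments, MG Road, Bengaluru, Karnataka",
    "House 221, Sector 15, Gurgaon, Haryana",
    "Plot 8, Nehru Nagar, near City Mall, Pune, Maharashtra",
    "Shop 3, Main Market, Lajpat Nagar, New Delhi",
    "Block B, Gandhi Road, Andheri East, Mumbai, Maharashtra",
]


def normalize(text):
    """Lowercase, keep only a-z, and collapse everything else to single spaces."""
    kept = "".join(ch if ch in ALPHABET else " " for ch in text.lower())
    return " ".join(kept.split())


def train(corpus):
    """Build a table of log transition probabilities with add-one smoothing."""
    counts = {a: {b: 1 for b in ALPHABET} for a in ALPHABET}
    for line in corpus:
        chars = normalize(line)
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: math.log(c / total) for b, c in row.items()}
    return model


def score(model, text):
    """Average log-probability per character transition (higher = more natural)."""
    chars = normalize(text)
    pairs = list(zip(chars, chars[1:]))
    if not pairs:
        return float("-inf")  # no letters at all: treat as maximally suspicious
    return sum(model[a][b] for a, b in pairs) / len(pairs)


model = train(TRAINING_ADDRESSES)
good = score(model, "House 7, Gandhi Nagar, Pune")
bad = score(model, "qxzv zqxj vjqz xzqv")
threshold = (good + bad) / 2  # crude calibration from one good and one bad example


def is_gibberish(text):
    return score(model, text) < threshold


print(f"good={good:.2f} bad={bad:.2f} threshold={threshold:.2f}")
```

Averaging per transition keeps the score comparable across short and long addresses, which matters because address lengths vary widely.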

After identifying the address components, validate the correctness of the area name, city, etc. against your past data (address store) or a map API.
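
As a rough illustration of the component check and the fuzzy validation against an address store, here is a rule-based stand-in built on Python's standard `re` and `difflib` modules. It is not the NLP approach itself. `KNOWN_CITIES` is a hypothetical address store, the 0.8 similarity cutoff is an assumption, and the pincode pattern assumes 6-digit Indian PIN codes.

```python
import re
from difflib import get_close_matches

KNOWN_CITIES = ["bengaluru", "gurgaon", "pune", "mumbai", "new delhi"]  # hypothetical store
PINCODE_RE = re.compile(r"\b[1-9][0-9]{5}\b")       # 6-digit PIN, first digit non-zero
HOUSE_NO_RE = re.compile(r"\b\d{1,4}[a-zA-Z]?\b")   # short number such as 12, 221 or 14B


def extract_pincode(address):
    match = PINCODE_RE.search(address)
    return match.group() if match else None


def match_city(address, known_cities=KNOWN_CITIES, cutoff=0.8):
    """Return the first known city that closely matches a comma-separated part."""
    for part in address.split(","):
        hits = get_close_matches(part.strip().lower(), known_cities, n=1, cutoff=cutoff)
        if hits:
            return hits[0]
    return None


def missing_components(address):
    """List the components this simple check could not find."""
    missing = []
    if extract_pincode(address) is None:
        missing.append("pincode")
    if match_city(address) is None:
        missing.append("city")
    if HOUSE_NO_RE.search(address) is None:
        missing.append("house_number")
    return missing


address = "House 221, Sector 15, Gurgoan, Haryana 122001"  # note the misspelt city
print(match_city(address), extract_pincode(address), missing_components(address))
```

The misspelt "Gurgoan" still resolves to the stored "gurgaon", which is the point of fuzzy matching: typos get corrected rather than rejected.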

A) Identify Fraud Registrations

You may build your own model if you have lots of users on the platform. Startups can consider Amazon Fraud Detector, which helps catch fraudulent user registrations.
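
For orientation, a call to Amazon Fraud Detector through `boto3` might look roughly like the sketch below. The detector name, event type, entity type, and variable names are hypothetical: they must match whatever you defined when setting up the detector in your own AWS account. The request builder is kept separate so it can be inspected without AWS credentials.

```python
from datetime import datetime, timezone


def build_prediction_request(event_id, user_id, email, ip_address):
    """Assemble keyword arguments for the GetEventPrediction API call."""
    return {
        "detectorId": "registration_detector",   # hypothetical detector name
        "eventId": event_id,
        "eventTypeName": "registration",         # hypothetical event type
        "entities": [{"entityType": "customer", "entityId": user_id}],  # hypothetical entity type
        "eventTimestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Variable names below are assumptions; use the ones defined in your event type.
        "eventVariables": {"email_address": email, "ip_address": ip_address},
    }


def get_registration_prediction(request):
    """Send the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder above works without AWS set up

    client = boto3.client("frauddetector")
    return client.get_event_prediction(**request)


request = build_prediction_request("evt-001", "user-42", "user@example.com", "203.0.113.7")
print(request["eventVariables"])
```

The response from `get_event_prediction` includes the model scores and the outcomes of the rules that matched, which you can then map to actions such as allowing, reviewing, or blocking the signup.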

(Image: Sample Input and Output)

B) Detect Risky or Fraudulent Transactions by a User

This is the trickiest and most interesting part of the core data science modeling. Fraud problems are highly imbalanced, and a shortage of data is a common issue.

Objective: To identify whether a transaction by a user is fraudulent or risky.

To solve this, you don't need lots of data points, just quality ones. Fraudulent users tend to share some common traits; we just need to catch those patterns.

Features to consider:

Demographics — Age, Gender, etc.

Geographics — IP Location, GPS Location, Input Address, etc.

Technographics — Device Info like Device Model, Device Brand, etc.

Behavioral Features — # of sessions, Avg. session duration on the app, account login and logout activity, items visited per session, # of RTO by a user in past, past orders reason for RTO, lifetime value of the user, etc.

Registration Information — User email domain (legit or not), signup method, phone number, etc.

Transactional Features — Payment Mode (COD, UPI, Wallet, Credit Card, Debit Card, etc.), Checkout time, Transaction Amount, OTP auto-detected or manually entered, etc.

Using these features, build a fraud prediction classifier. Since this is an imbalanced classification problem, use class weights to handle the imbalance. Metrics: use F-beta and AUC-ROC.
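
One common way to pick class weights is the "balanced" heuristic, where each class gets weight n_samples / (n_classes * class_count); this is the same formula scikit-learn applies for `class_weight="balanced"`. A minimal sketch, with a hypothetical 5% fraud rate chosen only for illustration:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


labels = [0] * 95 + [1] * 5  # hypothetical data: 1 = fraudulent/risky, 0 = genuine
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, misclassifying one fraudulent order costs the model as much as misclassifying nineteen genuine ones, which offsets the 95:5 imbalance. A dictionary like this can be passed as the `class_weight` argument of many scikit-learn classifiers.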

Recall > Precision

In the fraud example, the recall would be the percentage of fraudulent transactions we manage to detect, whilst precision is the percentage of the transactions we classify as fraudulent that are actually fraudulent. When classifying fraud, we mainly care about the recall of the fraudulent class, that is we want to correctly classify as many fraudulent transactions as possible. We still care about precision, just less than recall.
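
The preference for recall can be made explicit with the F-beta score, where beta > 1 weights recall more heavily than precision. Below is a small sketch computed from confusion-matrix counts; the two sets of counts and the choice of beta = 2 are made up for illustration.

```python
def precision_recall_fbeta(tp, fp, fn, beta=2.0):
    """Return (precision, recall, F-beta) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Two hypothetical models evaluated on the same 50 fraudulent transactions.
high_recall = precision_recall_fbeta(tp=40, fp=40, fn=10)     # precision 0.5, recall 0.8
high_precision = precision_recall_fbeta(tp=25, fp=6, fn=25)   # precision ~0.81, recall 0.5

print("high recall model:    F2 = %.3f" % high_recall[2])
print("high precision model: F2 = %.3f" % high_precision[2])
```

With beta = 2, the model that catches more fraud scores higher even though half of its alerts are false alarms, which matches the priority described above.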

Bonus Content: Benford’s Law

Benford’s law is applicable mostly everywhere

As a bonus for those who are curious, I wanted to touch on Benford’s law quickly.

Benford's law describes how often each digit from 1 to 9 appears as the leading digit in many naturally occurring sets of numbers: digit d is expected with probability log10(1 + 1/d), so 1 leads about 30% of the time and 9 less than 5%. The resulting curve appears all over maths, for example in the Fibonacci sequence and the Collatz conjecture. If you take the first digit of the amounts of non-fraudulent transactions, its distribution follows the Benford curve closely. If you do the same for fraudulent transactions, the data does not follow Benford's law as nicely. One could build this into a model by, for example, weighting transactions by their leading digit.
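
A minimal sketch of how one might measure the fit: compute the expected Benford shares, the observed first-digit shares, and the mean absolute deviation between them. The deviation measure is one simple choice among several, and the two data sets (Fibonacci numbers and evenly spread three-digit amounts) are stand-ins, not real transaction data.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(amount):
    """First significant digit of a non-zero number, via scientific notation."""
    return int(f"{abs(amount):e}"[0])


def benford_deviation(amounts):
    """Mean absolute gap between observed and expected first-digit shares."""
    digits = [first_digit(a) for a in amounts if a]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    expected = benford_expected()
    return sum(abs(counts.get(d, 0) / len(digits) - expected[d]) for d in range(1, 10)) / 9


def fibonacci(n):
    """First n Fibonacci numbers, starting 1, 1, 2, ..."""
    out, a, b = [], 1, 1
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out


benford_like = benford_deviation(fibonacci(300))
evenly_spread = benford_deviation(range(100, 1000))  # each leading digit equally common
print(f"Fibonacci: {benford_like:.4f}  evenly spread amounts: {evenly_spread:.4f}")
```

A deviation like this could be computed per user or per merchant over a window of transactions and fed to the classifier as one more feature.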

Conclusion

Incorrect address identification, fraudulent registration/signup detection, and risky or fraudulent user identification together help us estimate the probability of RTO. Based on the severity of the risk, we can take measures such as:

Disabling COD for an order
Banning a user from placing an order
Changing the return or cancellation policy for that user, i.e. charging a penalty in case of frequent RTOs
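
As a sketch of how these measures could be tied to the predicted risk, the function below maps an RTO risk score to one action. The thresholds (0.9, 0.6, three past RTOs) and the action names are hypothetical; in practice they would be tuned against the business cost of each mistake.

```python
def choose_action(rto_risk: float, past_rtos: int = 0) -> str:
    """Pick a measure based on predicted RTO risk (0 to 1) and the user's RTO history."""
    if not 0.0 <= rto_risk <= 1.0:
        raise ValueError("rto_risk must be between 0 and 1")
    if rto_risk >= 0.9:       # hypothetical threshold: near-certain RTO or fraud
        return "block_order"
    if rto_risk >= 0.6:       # hypothetical threshold: risky, so require prepayment
        return "disable_cod"
    if past_rtos >= 3:        # hypothetical threshold: frequent past RTOs
        return "charge_rto_penalty"
    return "allow"


for risk, history in [(0.95, 0), (0.7, 1), (0.2, 4), (0.1, 0)]:
    print(risk, history, choose_action(risk, history))
```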

If you liked this blog, don’t forget to hit the ❤️ . Stay tuned for the next one!

Connect, Follow, or Endorse me on LinkedIn if you found this read useful. To learn more about me visit: Here

Email me if you have anything interesting for me: [email protected]

Book a 1:1 call with me here: https://topmate.io/shaurya

I am open to gigs or consulting; you can reach out to me on LinkedIn: https://www.linkedin.com/in/shaurya-uppal/

I am nominated for the HackerNoon 2022 Noonies, Vote for me: https://www.noonies.tech/2022/programming/2022-hackernoon-contributor-of-the-year-data