Android is the most popular mobile operating system today, with more than 76% market share. This is partly due to its open nature and the many vendors that ship Android on their devices. But that openness also brings huge problems with fragmentation. For example, Android Oreo was released in August 2017 and iOS 11 in September 2017, yet Oreo has 14.2% market adoption while iOS 11 has 85%, even though Oreo had a head start. To reach 85% adoption on Android, you have to target KitKat, which was released in October 2013. And this is only the tip of the iceberg.
I was the lead for Flock’s Android app. Flock is a team messaging app that competes with the likes of Slack. We started receiving some weird complaints from users about missing notifications, and since this was a business messaging app, notifications were critical. We tried everything to reproduce the issue, including bombarding devices with notifications from test scripts and trying different networks, but we just could not reproduce it reliably. Worse, we had a couple of incidents on our own team’s devices, and the logs showed absolutely nothing. Google Cloud Messaging (GCM) reported that the notification had been sent, yet there was no trace of it in the app’s logs; it was as if the OS was swallowing notifications. Somehow, popular apps like WhatsApp and Facebook were immune to these issues. The final nail in the coffin was the CEO himself missing notifications on a few occasions. Google was unhelpful as usual, with its nonexistent developer support. We did find a bunch of support articles online in which other apps listed the steps for turning off battery optimizations, and that seemed to work (interestingly, WhatsApp and Facebook are usually on these lists by default). But there was no way to do this automatically, and while we could walk users through it in response to a support ticket, that is painful for them.
After a couple of days, we finally caught a few missed notifications on a OnePlus device that was connected to a machine capturing system logs. And voilà, we got a few log lines showing what was happening:
01-12 09:05:23.649 2719 2839 I ActivityManager: [BgDetect]chkExcessCpu level: 0 doKills: true auto_mode: false uptime: 183054
01-12 09:05:23.661 2719 2839 I ActivityManager: [BgDetect]detect excessive cpu on process to.talk(pid : 13406) level 0 usage 29
01-12 09:05:23.892 2719 2839 I ActivityManager: [BgDetect]force stop to.talk (uid 10151) level 0
01-12 09:05:23.893 2719 2839 I ActivityManager: Force stopping to.talk appid=10151 user=0: from pid 2719
01-12 09:05:23.893 2719 2839 I ActivityManager: Killing 13406:to.talk/u0a151 (adj 200): stop to.talk
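If you need to catch these kills on your own test devices, a small filter over captured logcat output can flag them. The following is a minimal Python sketch, assuming log lines in exactly the format shown above. The `[BgDetect]` tag comes from one OnePlus build and may differ or be absent on other builds, and `find_force_stops` is a hypothetical helper, not part of any Android tooling.

```python
import re

# Matches "[BgDetect]force stop <package> (uid <uid>)" lines like the ones above.
# The exact format is an assumption taken from a single OnePlus build.
BGDETECT_STOP = re.compile(
    r"^(?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"\[BgDetect\]force stop (?P<package>\S+) \(uid (?P<uid>\d+)\)"
)


def find_force_stops(lines, package):
    """Return (timestamp, uid) for every BgDetect force stop of `package`."""
    hits = []
    for line in lines:
        match = BGDETECT_STOP.search(line)
        if match and match.group("package") == package:
            timestamp = f"{match.group('date')} {match.group('time')}"
            hits.append((timestamp, int(match.group("uid"))))
    return hits


if __name__ == "__main__":
    sample = [
        "01-12 09:05:23.661 2719 2839 I ActivityManager: [BgDetect]detect excessive cpu on process to.talk(pid : 13406) level 0 usage 29",
        "01-12 09:05:23.892 2719 2839 I ActivityManager: [BgDetect]force stop to.talk (uid 10151) level 0",
        "01-12 09:05:23.893 2719 2839 I ActivityManager: Force stopping to.talk appid=10151 user=0: from pid 2719",
    ]
    # Only the "[BgDetect]force stop" line matches.
    for when, uid in find_force_stops(sample, "to.talk"):
        print(f"{when}: BgDetect force-stopped uid {uid}")
```

Matching on the `[BgDetect]` line rather than the generic "Force stopping" line distinguishes these background kills from force stops the user triggers from Settings.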
Apparently something called BgDetect, logging under the system’s ActivityManager, had decided that our app was using too much CPU and killed it. Since Android is open source, we figured we should be able to find the source for BgDetect. But not only was BgDetect missing from the Android sources, it was absent from OnePlus’s sources too. We then went through our marketing team to reach some contacts at OnePlus, who directed us to their dev team. They asked for our APK, and within a week we were whitelisted alongside the likes of WhatsApp and Facebook.
But the issue persisted on devices from Xiaomi, Oppo and a number of other Chinese manufacturers, and for those we still had to hand out steps for whitelisting Flock from battery optimizations. Apparently these devices have a concept of ‘autostart’ built in on top of standard Android.
In Android, a push notification wakes up an application or service, which in turn displays the notification. If an application has been force-stopped, the system will not wake it up for pushes until the user opens it again. We discovered that on some devices, such as Xiaomi’s, our app would not wake up at all, because by default it was treated as force-stopped. The only way to avoid this was to be on the ‘autostart’ list, the set of apps allowed to wake up (‘auto start’) after being closed.
For convenience, manufacturers add popular applications like Facebook and WhatsApp to this list by default. Since we were an up-and-coming startup, there was no way we would be on it by default. A significant chunk of our users were in India, where these manufacturers’ devices are very popular, so as our user base grew the complaints became more frequent and people started posting negative reviews. We had to find a solution. We also realized that the problem was pretty common: multiple websites have FAQ pages dedicated to making notifications work on Android, for example Hike.
Using double-acks to detect missed pushes
To solve the problem, we first had to figure out which users were missing notifications. While the standard HTTP GCM push API doesn’t give you delivery receipts, the XMPP API does. GCM delivery receipts essentially tell you whether the device received the push notification. We paired these with our own delivery acknowledgements, sent from inside the app for the same notifications. So any device for which we got acks from GCM but not from the app itself was potentially missing pushes. Once we identified these devices, we sent their users bot messages explaining how to add Flock to their device’s autostart list.
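The double-ack comparison can be sketched on the server side as follows. This is a minimal Python illustration, not Flock’s actual implementation: the post does not describe its matching rules, so the `Ack` record, the `find_suspect_devices` function, the five-minute grace period and the three-miss threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ack:
    device_id: str
    message_id: str
    received_at: float  # seconds since epoch, as recorded by our server


def find_suspect_devices(gcm_receipts, app_acks, now,
                         grace_seconds=300, min_missing=3):
    """Flag devices where GCM confirmed delivery but the app never acked.

    gcm_receipts: Acks built from GCM XMPP delivery receipts.
    app_acks: Acks the app itself sent back after handling the push.
    Receipts younger than grace_seconds are skipped, because the app's
    own ack may simply still be in flight.
    """
    acked = {(a.device_id, a.message_id) for a in app_acks}
    missing = defaultdict(int)
    for receipt in gcm_receipts:
        if now - receipt.received_at < grace_seconds:
            continue  # too recent to judge
        if (receipt.device_id, receipt.message_id) not in acked:
            missing[receipt.device_id] += 1
    # Require several misses so a single lost app ack does not flag a device.
    return {device for device, count in missing.items() if count >= min_missing}


if __name__ == "__main__":
    receipts = [Ack("phone-a", f"m{i}", 0.0) for i in range(3)]
    receipts += [Ack("phone-b", f"m{i}", 0.0) for i in range(3)]
    app_acks = [Ack("phone-b", f"m{i}", 1.0) for i in range(3)]
    print(find_suspect_devices(receipts, app_acks, now=1000.0))  # {'phone-a'}
```

The threshold trades detection speed for fewer false positives: an app ack can be lost for reasons unrelated to the OS killing the app, so flagging a device only after repeated misses avoids pestering users whose notifications actually work.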
Figure: Architecture to determine missed pushes
Android is an incredibly fragmented ecosystem, and it only seems to be getting more fragmented. While Google is making some determined efforts to address this through Project Treble and other initiatives, it cannot stop manufacturers from adding features like autostart. Autostart makes slower devices feel faster by keeping fewer apps in memory, so manufacturers of low-end devices will keep using it. Android seems to be following in the footsteps of desktop Linux, with every manufacturer essentially creating its own custom distribution. The future of Android looks even more fragmented.