How a Unicorn Startup in Japan Leveraged the Power of Microservices

Written by gong023 | Published 2022/05/21
Tech Story Tags: software-architecture | startup | microservices | storytelling | kubernetes | golang | terraform | software-development


About this Article

Hello, folks. My name is Sho Ogihara. I am the Tech Lead of the Mercari US Microservices Platform team. Mercari is known as the first startup unicorn in Japan, and also for its Super Bowl ad in the US. I have worked at this company for 7+ years, but I will leave soon to become a freelancer. Today I'd like to write about my experience at Mercari US, with gratitude.

We started the business in 2013, and when I joined at the end of 2014, our backend was a traditional PHP and MySQL (LAMP) stack. Now we run a microservices architecture built on Kubernetes, Istio, gRPC, Go, Prometheus, and so on. If you're interested in our current tech stack, please visit our engineering blog. There are many fantastic posts.

How did we modernize our backend system? I'll tell you the story, but please remember that this is NOT the official account. It is my own perspective and experience, and it may contain mistakes. Even so, I hope it will be helpful for other engineers.

Mockup (<=2014)

I heard that our app was originally built with Ruby on Rails as a mockup, and was then fully rewritten in PHP. This may sound crazy, because RoR was one of the most popular frameworks at that time, around 2014.

Actually, this makes sense to me, because our CEO had good connections with many senior PHP engineers. At first, he recruited members through social media like Facebook and Twitter. For startups, it's very important to ship the product, and it's best to work on it with your friends. The current trend is not the only reason to choose a technology.

Old-fashioned LAMP (2014~2015)

Our app was released using a PHP framework named dietcake, which is inspired by CakePHP but is thinner and easier to learn. It's designed so that newcomers can develop a new API endpoint from their very first day on the team.

What was unique in this phase was that the API endpoints were designed specifically for the iOS and Android screens. For example, consider an endpoint GET /items/:id. In a typical REST design, this endpoint would return an object like this:

{
    "id": "m1000",
    "name": "iPhone 13 pro max",
    "price": "1000",
    "status": "sold",
    "buyer_id": 12345
}

However, our API response was designed like this on purpose:

{
    "item": {
        "id": "m1000",
        "name": "iPhone 13 pro max",
        "price": "1000",
        "status": "sold"
    },
    "buyer": {
        "id": 12345,
        "name": "buyer-12345",
        "icon": "https://icon-url"
    },
    "youMayAlsoLike": [
        {
            "name": "iPhone 12",
            "price": 900,
            "status": "onSale"
        }
    ]
}

As you can see, all the related objects are included in a single response. So, for example, the clients don't have to call another API to get the buyer's details. This design differs from the well-known REST best practice. A similar idea can be seen in GraphQL today.

I would not say that REST is wrong and the GraphQL approach is better. My point is that this design was evidence that the backend developers' mental model was close to the mobile app screens. Regardless of our roles, we discussed the app spec thoroughly, perhaps for longer than we discussed the design of the backend system. At that time we were driven by the design of the app itself, and I think that is very important, especially for startups.

Anyway, the tech stack was chosen from a startup perspective, carefully and consciously. Many engineers tend to be satisfied with merely being able to use a technology. But engineering skill is a means to achieve the goal, not the goal itself. I think the first engineering leaders were aware of that.

Many tries, got more complex (2015~2017)

Though our app originally started in Japan, we wanted to take on the global market early. The US version was released very quickly, only 1 year after the JP version. We could release so quickly because we didn't add much localization beyond shipping, payment, and i18n messaging. I think that was on purpose, because we were confident in the product. Facebook and Twitter don't change their apps by region, right? Our idea was similar. Also, as we had experienced, speed was the most important thing. We should release the app and see the market's reaction as soon as possible, rather than spend too long on investigation. So the backend API code was shared between JP and US.

However, the market was not as easy as we expected. We run an e-commerce app, so it was critical that shipping in the US is very different from shipping in Japan. In addition, starting a new team is never easy, especially when it's in another country and the members are starting a new life there. Actually, I was one of them. Now Mercari is quite a global company and offers a lot of support, but it was not so from the beginning.

Eventually, the company decided to devote all the product members' resources to the US version for a while. Because many members suddenly started contributing to the US version, many "if region == US" branches were created in the code. We knew this practice was not good, but there were no other options, because the spec was literally to solve problems specific to the US. No matter what coding technique we used, it was impossible to avoid it completely. And "if region == US" was not the only thing. To adapt to the new market, we tried many features. It's absolutely good to run many sprints rapidly, with trial and error. But by the time we noticed, half of the endpoints were not being used in the production environment. I feel that our development agility degraded in this era.

Start Microservices (2017~2018)

We needed a breakthrough. One day after Thanksgiving, our CPO announced the start of project ‘double’, named after its goal of doubling the retention rate. The highlight of this project was building a completely different app from scratch. Even the app's theme color was changed from red to purple, and the logo was changed too. It was literally from zero: we decided to rewrite everything on iOS and Android.

Unlike the clients, the backend couldn't be rebuilt entirely from scratch, because it's too challenging to discard the existing users and items. Instead, we decided to put a new gateway in front of the previous backend. In this gateway, we used new technologies like Kubernetes, gRPC, Go, and more. And we decided to implement new features as microservices in the new gateway world. In overview, clients talk to the new gateway, which serves new features from microservices and passes everything else to the previous backend.

However, for me, starting a new tech stack was NOT the most important thing. What mattered most was that we got the chance to rethink our app spec. The new gateway became a filter for distinguishing what was really necessary for us.

When we find problems in the code, there are two approaches to solving them: refactoring or rewriting. Refactoring means fixing the code without changing its behavior. It is a good approach if the problem is ‘how to write’.

But at that time, our problem was ‘what to write’. Our code had become worn out from our many experiments. We had to rethink what our app really needed. If we need to change the behavior of the code, it is not refactoring but rewriting. Usually, this is hard for the backend, because a backend API has to maintain compatibility. Luckily, we got a rare chance to do it. If you're a backend architect considering a refactoring, I would suggest involving product managers in the project and reconsidering the spec. Cleaning up the spec is the most powerful way to clean up the code. The spec tends to be the worst debt in many cases. In this respect, the project went smoothly because the CPO was on our side.

After 3 months of hard work, we finally released the new version. The current purple Mercari started here. Not everything was solved, but this was our starting point.

Start Microservices Platform (2018~2021)

We had just gotten a new place to write microservices. The next step was to accelerate this architecture. We worked on many solutions for this. Let me introduce some of them.

The first was to analyze the most used PHP classes in the previous monolith codebase and then implement them as microservices. Classes such as User, Item, Payment, etc. are the basics of our app. It's hard to implement any feature without them, so unless they were available as microservices, we would have had to continue implementing most features in the legacy codebase.

Also, we gained knowledge of microservice development by implementing these core-object microservices. We had a policy of separating Kubernetes namespaces by microservice, but beyond that, we didn't have many ideas about code design, monitoring, networking, and so on. These microservices became good experiments for working out our practices.

By the way, this is off-topic, but it took about one week to finish the static analysis of the legacy PHP codebase, even though I used a machine we usually use for machine learning. The generated SVG file of the dependency graph could never be opened in a web browser. I learned how impossible it is for a human being to comprehend our codebase.
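The tool used for that analysis is not described here, so the following is only a toy Go sketch of the idea of ranking the most used classes. It assumes that counting `new Foo` and `Foo::` occurrences with a regular expression is a good enough signal, which a real analysis with a proper PHP parser would not need to assume. The names classRef, usage, and rankClasses are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// classRef matches two common ways PHP code refers to a class:
// "new Foo" and "Foo::". This is a crude approximation, not a parser.
var classRef = regexp.MustCompile(`\bnew\s+([A-Z]\w*)|\b([A-Z]\w*)::`)

// usage records how often a class name was referenced.
type usage struct {
	Class string
	Count int
}

// rankClasses counts class references across all sources and returns them
// ordered by count (highest first), with ties broken alphabetically.
func rankClasses(sources []string) []usage {
	counts := map[string]int{}
	for _, src := range sources {
		for _, m := range classRef.FindAllStringSubmatch(src, -1) {
			name := m[1] // set when "new Foo" matched
			if name == "" {
				name = m[2] // set when "Foo::" matched
			}
			counts[name]++
		}
	}
	ranked := make([]usage, 0, len(counts))
	for class, n := range counts {
		ranked = append(ranked, usage{Class: class, Count: n})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Count != ranked[j].Count {
			return ranked[i].Count > ranked[j].Count
		}
		return ranked[i].Class < ranked[j].Class
	})
	return ranked
}

func main() {
	// Made-up PHP snippets standing in for files of the monolith.
	sources := []string{
		`<?php $u = new User($id); $i = Item::find($itemId); $u->pay(Payment::charge($i));`,
		`<?php $seller = User::find($i->sellerId); $buyer = new User($buyerId);`,
	}
	for _, u := range rankClasses(sources) {
		fmt.Printf("%-8s %d\n", u.Class, u.Count)
	}
}
```

Classes at the top of such a ranking are the ones most features depend on, which is why they were the first candidates to become microservices.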

After we gained knowledge from some microservices, we made a tool named starter-kit, which consists of a Terraform module and a microservice template. Actually, it was developed by the JP team, not by us, and the idea is described in this presentation. The tool is shared by both the JP and US microservices.

I think one of the most important ideas of starter-kit is to clarify the owners of the microservices. To me, microservices are not only a way to design software, but also a way to design human organizations. When we think about the bottleneck of development, it tends to be communication with others. The more stakeholders developers have, the more frustration they feel. Microservices abstract communication into API interfaces, and abstract problems into success rate and latency. The only thing we have to do is speak the same protocol (we use gRPC). When we design communication between teams this way, the most important prerequisite is to clarify who the owners are.

Another big challenge after the starter-kit was moving the legacy PHP to the Kubernetes environment. I know some people are allergic to PHP. To them, PHP is inherently evil and must be erased as soon as possible. But IMO the genuine evil is the complexity of the code. It arises no matter which programming language we use, and no matter whether we use microservices or not.

In any case, my point is that we'll have to get along with the previous PHP monolith for a long time. We have to admit that the things written in the monolith are correct to some extent. At least it is NOT legacy by my definition, because it has maintainers. As long as development is active, it is not legacy.

In that case, the problem was that the development tools were very different between the microservices and the previous monolith. Because we created the microservices as a brand new place, practices such as how to develop, how to deploy, and how to monitor were not the same. The team was about to be split according to whether or not people worked on microservices. However, development tools are definitely NOT the essence of development. We must use the practices, but must not be used by them. Teams must be designed by goals, never by tools or tech stack.

So we started a project to migrate the monolith, which was running on VMs, to Kubernetes. It started with containerization and took about 1 year to complete. The details of this project are written here. Please check it out.

Wrap up

Actually, in addition to the above projects, there were many other interesting ones, like integrating Istio, scoring microservices, and recreating our cluster as VPC-native. Unfortunately, I cannot write about everything, but these will be published on our company blog.

Through over 7 years of experience at this company, one of the most important lessons for me was that we should regard software as a human body. Because things cannot be perfect from birth, we should let the software renew itself, just as the body replaces its cells every day. I believe one of the reasons we could become the best e-commerce company in JP is that we updated our app rapidly. Though we had some difficulties because of this, we got over them by shifting to microservices. I love these words, often attributed to Charles Darwin:

It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change.

Another important lesson is that we should sometimes follow our intuition. By shifting to a microservices architecture, we got many benefits such as observability, a brand new machine learning approach, and more. But we didn't really aim for all of these benefits from the beginning. They are a kind of side effect of our curiosity. If we pursue only small and certain wins, we are likely to become conservative and lose out in the long term. So if I'm asked why microservices, I will answer with confidence: “one of the reasons is just for fun”.

Lastly, as stated at the beginning, I'll become a freelancer next month. Please feel free to contact gon.gong.gone@gmail if you're interested in working with me.


Written by gong023 | Freelance (present) <- Tech Lead of Mercari, US Microservice Platform (7+ years)
Published by HackerNoon on 2022/05/21