Introduction
At Grab, we build a seamless user experience that addresses more and more of the daily lifestyle needs of people across Southeast Asia. We’re proud of our مینون rides, payments, and delivery services, and want to provide a unified experience across these offerings.
Here are a couple of examples of what مینون does for millions of people across Southeast Asia every day:
The مینون Passenger application reached superapp status more than a year ago and continues to provide hundreds of life-changing use cases in dozens of areas for millions of users.
A product of this scale brings even bigger technical challenges. Here are a couple of dimensions that give a sense of the scale we’re working with.
Engineering and Product Structure
Technical and product teams work in close collaboration to outserve our consumers. These teams are combined into dedicated groups to form Tech Families and focus on similar use cases and areas.
مینون consists of many Tech Families who work on food, payments, transport, and other services, which are supported by hundreds of engineers. The diverse landscape makes the development process complicated and requires the industry’s best practices and approaches.
Codebase Scale Overview
The Passenger Applications (Android and iOS) each contain more than 2.5 million lines of code, and they keep growing. We have 1000+ modules in the Android app and 700+ targets in the iOS app. Mobile engineers merge hundreds of commits every day.
To maintain the health of the codebase and product stability, we run 40K+ unit tests on Android and 30K+ unit tests on iOS, as well as thousands of UI tests and hundreds of end-to-end tests on both platforms.
Build Time Challenges
The complexity and scale described above do not come without challenges. A huge codebase pushes the build process to the extreme, straining both the build systems and the hardware used to compile the superapp, and creating unusual challenges that need to be addressed.
Local Build Time
Local build time (the build on an engineer’s laptop) is one of the most obvious challenges. The more code goes into the application binary, the more time the build system needs to compile it.
Android Local Build Time
The Android ecosystem provides a great out-of-the-box tool to build your project called Gradle. It’s flexible and user friendly, and provides huge capabilities for a reasonable cost. But is this always true? It appears not, for several reasons, which we unpack below.
Gradle performs well for medium-sized projects with, say, 1 million lines of code. Once the codebase passes that mark (or so), Gradle’s flexibility starts to cost engineers a reasonable build time. That’s exactly what we observed in our Android application.
At some point, the Android local build became ridiculously long. We even encountered cases where engineers’ laptops simply failed to build the project because of hardware resource limits. Clean builds took hours, and incremental builds easily ran to dozens of minutes.
iOS Local Build Time
Xcode behaved a bit better than Gradle. With the Xcode build cache, incremental builds were somewhat bearable and didn’t exceed a couple of minutes. Clean builds still took dozens of minutes though. When Xcode failed to provide a valid cache, engineers had to rerun everything as a clean build, which killed the experience entirely.
CI Pipeline Time
Each time an engineer submits a Merge Request (MR), our CI kicks in, running a wide variety of jobs to ensure the commit is valid and doesn’t introduce regressions to the master branch. The feedback loop time is critical here as well, and the pipeline time tends to skyrocket as the codebase grows. We found ourselves on a trend where feedback took hours to arrive, which again broke the engineering experience and prevented us from delivering the world’s best features to our consumers.
As mentioned, we have a large number of unit tests (30K-40K+) and UI tests (700+) that we run on a pre-merge pipeline. This added up to hours of execution time before we could actually allow MRs to land on the master branch.
The number of daily commits, which runs into the hundreds, adds another stone to the basket of challenges.
All this clearly indicated an area for improvement. We were missing opportunities in terms of engineering productivity.
The Extra Mile
The biggest question for us to answer was how to turn all this scale into a reasonable experience with minimal engineering idle time and a fast feedback loop.
Build Time Critical Path Optimisation
The most reasonable thing to do was to pay attention to the utilisation of the hardware resources and make the build process optimal.
This literally boiled down to the simplest approach:
- Decouple building blocks
- Make building blocks as small as possible
This approach is valid for any build system and applies to both iOS and Android. The first thing we focused on was understanding what our build graph looked like, how dependencies were distributed, and which blocks were bottlenecks.
Given the scale of the apps, it’s practically impossible to manage a dependency tree manually, so we created a tool to help us.
Critical Path Overview
We introduced the Critical Path concept:
The critical path is the longest (time) chain of sequential dependencies, which must be built one after the other.
Even with an infinite number of parallel processors/cores, the total build time cannot be less than the critical path time.
We implemented a tool that parsed the dependency trees (for both Android and iOS), aggregated the build time of each module/target, and calculated the critical path.
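To make the idea concrete, here is a minimal sketch of a critical path calculation. It is not the tool described above: the module names, build times, and the `critical_path` helper are all hypothetical, and the graph is assumed to be acyclic.

```python
from functools import lru_cache

# Hypothetical module graph: name -> (build time in seconds, direct dependencies).
MODULES = {
    "app": (60, ["food", "payments"]),
    "food": (120, ["network", "ui-kit"]),
    "payments": (90, ["network"]),
    "network": (45, []),
    "ui-kit": (30, []),
}

def critical_path(modules):
    """Return (total_time, path) for the longest chain of sequential dependencies.

    The path is listed from the top-level module down to its deepest dependency.
    Assumes the dependency graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def longest_from(name):
        build_time, deps = modules[name]
        best_time, best_path = 0, ()
        # A module can only start building once its slowest dependency chain is done.
        for dep in deps:
            dep_time, dep_path = longest_from(dep)
            if dep_time > best_time:
                best_time, best_path = dep_time, dep_path
        return build_time + best_time, (name,) + best_path

    return max((longest_from(name) for name in modules), key=lambda r: r[0])

total, path = critical_path(MODULES)
serial_total = sum(build_time for build_time, _ in MODULES.values())
print(f"critical path: {' <- '.join(path)} ({total}s of {serial_total}s total work)")
```

In this toy graph the modules add up to 345 seconds of work, but the chain app, food, network takes 225 seconds on its own, so no amount of parallelism can bring the build below that.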
The concept of the critical path introduced a number of action items, which we prioritised:
- The critical path must be as short as possible.
- Any huge module/target on the critical path must be split into smaller modules/targets.
- Depend on interfaces/bridges rather than implementations to shorten the critical path.
- The presence of other teams’ implementation modules/targets in the critical path of the given team is a red flag.
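The effect of depending on interfaces rather than implementations can be sketched with a toy graph. Everything here is an illustrative assumption (the module names, the build times, and the `critical_path_time` helper), not data from our apps.

```python
def critical_path_time(modules):
    """Length of the longest dependency chain; modules maps name -> (seconds, deps)."""
    memo = {}

    def finish(name):
        # Earliest finish time of a module, given unlimited parallelism.
        if name not in memo:
            build_time, deps = modules[name]
            memo[name] = build_time + max((finish(d) for d in deps), default=0)
        return memo[name]

    return max(finish(name) for name in modules)

# Before: checkout depends directly on the heavy payments implementation.
before = {
    "payments-impl": (300, []),
    "checkout": (100, ["payments-impl"]),
    "app": (50, ["checkout", "payments-impl"]),
}

# After: a small interface module is extracted; checkout depends only on it,
# so checkout and payments-impl can now build in parallel.
after = {
    "payments-api": (20, []),
    "payments-impl": (280, ["payments-api"]),
    "checkout": (100, ["payments-api"]),
    "app": (50, ["checkout", "payments-impl"]),
}

before_time = critical_path_time(before)
after_time = critical_path_time(after)
print(f"before: {before_time}s, after: {after_time}s")
```

The total amount of code to compile is unchanged in this sketch. The gain comes purely from removing the heavy implementation module from the chain that checkout has to wait for.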
Project’s Scale Factor
To implement these conceptually easy action items, we ran a Grab-wide programme. It impacted almost every mobile team at مینون and involved 200+ engineers to some degree. The whole implementation took 6 months to complete.
During this period, we assigned engineers who were responsible for reviewing the changes, supporting engineers across Grab, and monitoring the results.
Results
Even though the overall plan seemed to be good on paper, the results were minimal – it just flattened the build time curve of