The Ledger
Engineering

How we cut build times in half without changing the stack

Our continuous integration pipeline had quietly grown into a bottleneck. Every push meant a fourteen-minute wait, and on a busy afternoon time spent in the queue stretched that wait far longer. We assumed the only fix was a bigger machine or a different toolchain. It turned out we were wrong on both counts.

Before touching a single line of configuration, we wrote down what we actually believed was slow. Most of the team pointed at the test suite. A few blamed the dependency install. Nobody mentioned the step that, once we looked, was eating nearly forty percent of every run. The lesson arrived early and stayed with us: intuition about performance is almost always wrong, and it is wrong in expensive ways.

Start by measuring, not guessing

We added lightweight timing to each stage of the build and let it collect data across a few hundred runs. The picture that came back was unambiguous. The work was not where we thought it was, and the headline number hid a long tail of cache misses that only showed up under concurrency. With a real profile in hand, the changes almost suggested themselves.
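As a rough illustration of what per-stage timing can look like, here is a minimal Python sketch. It is not our actual instrumentation: the class name StageTimer, its methods, and the stage names in the usage note are all hypothetical, and it assumes the build steps can be wrapped from a Python driver script. It records wall-clock time per stage and reports each stage's median and its share of the total.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class StageTimer:
    """Collects wall-clock durations per named build stage (hypothetical helper)."""

    def __init__(self):
        # stage name -> list of durations in seconds, one entry per run
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        """Time the body of a `with` block and store it under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception.
            self.samples[name].append(time.perf_counter() - start)

    def record(self, name, seconds):
        """Add a duration measured elsewhere, e.g. parsed from a CI log."""
        self.samples[name].append(seconds)

    def report(self):
        """Return (stage, median_seconds, share_of_total) rows, slowest first."""
        medians = {name: median(values) for name, values in self.samples.items()}
        total = sum(medians.values())
        rows = [
            (name, value, value / total if total else 0.0)
            for name, value in medians.items()
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)
```

A driver script might wrap each step as `with timer.stage("install"): ...` and print `timer.report()` at the end of the run. The median is a deliberate choice for the headline figure, but it will not reveal a long tail on its own; keeping the raw samples makes it possible to look at high percentiles as well.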

"The fastest build is the one you never have to run twice. Cache aggressively, invalidate precisely, and measure everything in between."
- A principle we now keep on the wall

First we made the dependency cache deterministic, keyed on the lockfile rather than the branch. That alone reclaimed several minutes on the common path. Then we split the test run so that the slow integration suite ran in parallel with the unit tests instead of after them. None of this required a new platform, a new runner, or a rewrite. The stack stayed exactly the same.
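The two changes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our pipeline configuration: the function names cache_key and run_in_parallel, the "deps" prefix, and the use of a SHA-256 digest are all choices made for the example. The first function derives a cache key from the lockfile contents alone, so any two branches with an identical lockfile share a cache entry. The second starts each test suite as its own process so the suites run side by side rather than one after the other.

```python
import hashlib
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def cache_key(lockfile_bytes, prefix="deps"):
    """Build a cache key from lockfile contents only (no branch name).

    The same lockfile always yields the same key; any change to the
    lockfile yields a different key, which invalidates the cache.
    """
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"


def run_in_parallel(commands):
    """Run each command as a separate process, all at the same time.

    `commands` maps a label to an argument list. Returns a dict mapping
    each label to that process's exit code.
    """
    # One worker thread per command; each thread just waits on its process.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {
            name: pool.submit(subprocess.run, argv)
            for name, argv in commands.items()
        }
        return {name: future.result().returncode for name, future in futures.items()}


if __name__ == "__main__":
    # Placeholder commands; a real pipeline would invoke its own test runners.
    results = run_in_parallel({
        "unit": [sys.executable, "-c", "print('unit tests')"],
        "integration": [sys.executable, "-c", "print('integration tests')"],
    })
    # Fail the build if any suite failed.
    sys.exit(max(results.values()))
```

With suites running side by side, the wall-clock cost of the test stage approaches that of the slowest suite instead of the sum of both, provided the runner has enough spare capacity for the two processes.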

Six weeks later the median run sits just under seven minutes, down from fourteen, and the afternoon queue has all but disappeared. The surprising part was not the result but how ordinary the work was. We did not need cleverness. We needed to look honestly at what was in front of us, and then fix the part that was actually broken.