How to Figure Out What to Build First

The most interesting technical problem at your firm is almost never the right starting point. Here is how to find the one that pays off fastest.

February 2026 · 7 min read

Firms that decide to build custom AI tools almost always start with the wrong thing. Not the wrong tool - the wrong criteria for choosing.

The most common pattern is ambition: the leader wants to build the most transformative thing, the thing that will change how the firm works at its core. The second most common pattern is visibility: they want to build the thing that will impress clients, or that partners will notice, or that sounds most interesting to describe. Both lead to first builds that are too complex, too dependent on edge cases being handled correctly, and too hard to evaluate once they are built.

The right first build is usually unglamorous. It is the workflow that has been quietly taking up senior time every week for years. Nobody has complained about it loudly because it works - it just requires more expensive human attention than it should.

The four criteria that actually matter

A good first build has four characteristics:

  • Frequency: it happens often enough that time savings compound quickly.
  • Senior involvement: a partner or senior associate currently has to touch it - not because it requires their judgment on every instance, but because the firm has not found another way to maintain quality.
  • Output clarity: there is a reasonably well-defined version of a good output - not perfectly defined, but clear enough that you could tell a new hire what done looks like.
  • Manageable edge cases: the exceptions and unusual situations are infrequent enough that version one does not have to handle all of them.

When you find a workflow that checks all four, you have found a good first build. When you find a workflow that checks three of four, it is worth understanding which criterion it fails before proceeding.

A practical way to score candidates: estimate weekly frequency, multiply by the hours of senior time it currently requires, then discount by how unclear the output definition is. The high-scorers tend to be unglamorous but are the right place to start. You can revisit the interesting problems after you have proven the pattern works.
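The scoring heuristic above can be sketched as a small function. The example workflows and the clarity scale are illustrative assumptions, not data from any real firm; the only logic taken from the text is frequency times senior hours, discounted by how unclear the output definition is:

```python
def score_candidate(weekly_frequency, senior_hours_per_instance, output_clarity):
    """Rough ROI score for a candidate first build.

    weekly_frequency: how many times the workflow occurs per week
    senior_hours_per_instance: senior time each occurrence consumes
    output_clarity: 0.0 (nobody agrees what good looks like) to 1.0
        (you could tell a new hire exactly what done means)
    """
    # Weekly senior hours at stake, discounted by how fuzzy the target is.
    return weekly_frequency * senior_hours_per_instance * output_clarity

# Hypothetical candidates, scored and ranked highest-first.
candidates = {
    "weekly status memo": score_candidate(5, 1.5, 0.9),      # 6.75
    "novel client pitch deck": score_candidate(0.5, 4.0, 0.3),  # 0.6
    "intake triage": score_candidate(20, 0.25, 0.8),         # 4.0
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
# The unglamorous, frequent, well-defined workflow comes out on top.
```

The exact weights matter less than the discipline of estimating all three inputs before committing to a build.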

How to map the candidates

Start with a conversation that most firms skip: where does senior time actually go? Not where leaders think it goes, not what shows up in time-tracking software. Ask a few senior people to walk through their last two weeks in concrete terms. What did they actually do? What could have been done by a more junior person if the quality bar had been clearly defined? What did they do themselves because it was faster than explaining it?

That last question surfaces the best candidates. The thing that a senior person does themselves not because it genuinely requires their judgment, but because explaining the standard is harder than just doing it - that is the thing most worth building a tool around. The standard already exists in that person's head. The work of building the tool is mostly the work of externalizing it.

After mapping, look for the intersection of high volume and senior touch. A task that happens fifty times a week but is done by junior staff with no senior involvement is a different problem - there may be a training or process issue, not necessarily a build opportunity. A task that requires a senior person but only happens twice a year does not have enough volume to make the build worthwhile. The sweet spot is the task that is recurring and that a senior person either does directly or has to review carefully.
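The volume-and-senior-touch filter described above can be expressed as a simple gate applied before scoring. The recurrence threshold here (roughly weekly) is an illustrative assumption:

```python
def is_sweet_spot(occurrences_per_year, senior_touches_it):
    """Gate a candidate before scoring it: recurring AND senior-touched.

    A high-volume task done entirely by junior staff is a training or
    process problem, not a build opportunity; a rare senior task lacks
    the volume to repay the build cost.
    """
    recurring = occurrences_per_year >= 50  # roughly weekly; assumed threshold
    return recurring and senior_touches_it

# Usage, mirroring the examples in the text:
is_sweet_spot(100, True)     # recurring and senior-reviewed: a candidate
is_sweet_spot(2600, False)   # fifty times a week, junior-only: not a build
is_sweet_spot(2, True)       # twice a year: not enough volume
```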

Output clarity is the thing most people underestimate

The hardest part of scoping a first build is usually not the technical side. It is forcing a precise definition of what good output looks like. Most firms have a tacit standard - experienced people know it when they see it - but that standard has never been written down at the level of specificity that a tool requires.

This is worth working through before building, not after. Ask the people who currently do the work to describe a perfect output in detail. Then ask them to describe a common failure mode - not a catastrophic failure, but the subtle version of a not-quite-right output that gets quietly fixed before the client sees it. Those failure modes are the things the tool needs to be designed around from the start.

If the senior people on your team cannot agree on what good output looks like, that is important information. It means the standard is more contested than you realized, and building a tool will surface that disagreement rather than resolve it. Better to resolve it first.

What version one is actually for

The goal of the first build is not to solve the problem entirely. It is to prove that this category of problem is solvable with a custom tool, and to do it in a way that produces real data about what works and what needs refinement.

Version one should handle the common case well. It can decline gracefully on edge cases. It can flag uncertainty for human review. What it cannot do is try to handle everything - that scope expansion is what turns a focused, useful first build into a sprawling second build that never quite works.

Firms that build the first tool with the right scope almost always build a second. The proof of concept generates confidence and surfaces the next obvious candidate. Firms that overbuild the first tool often stall, spend months trying to make an overly ambitious tool perform consistently, and end up with something that gets used inconsistently or quietly abandoned.

Wrong first build

  • Solves the most ambitious problem
  • Handles every edge case in v1
  • Chosen for visibility or impressiveness
  • Output definition unclear at the start

Right first build

  • Solves a frequent, senior-touched workflow
  • Handles the common case, flags the rest
  • Chosen because the ROI math is obvious
  • Output definition explicit before a line is written

The right first build will probably not be the thing you would describe at a conference. It will be the thing that quietly saves your most expensive people three hours a week, produces consistent output without requiring their involvement, and makes it obvious what to build next.

Figuring out what to build is the first conversation in any Apparatus custom development engagement. Before any technical work starts, we map your firm's candidate workflows and apply these criteria to find the one that is most likely to produce a useful, maintainable tool in the shortest time. If you want a sense of what that conversation looks like, understanding why engagements fail is a useful place to start.

Next step

Ready to build something your firm owns?

We scope each engagement around your highest-value workflow. The conversation starts with what you want to build and whether we are the right people to build it.