What AI Development for Professional Services Firms Should Actually Look Like
Most firms have seen the bad version. Discovery that goes nowhere, builds that do not survive client work, handoffs to nobody. Here is what the good version looks like.

A lot of professional services firms have been through at least one AI engagement that did not produce what they expected. The pattern is recognizable: a discovery phase that generated a detailed strategy document, a build that worked in demos but struggled on real client work, a handoff that assumed capabilities the team did not have. Sometimes the relationship just faded. Sometimes the tool got quietly shelved.
The firms that come back for another attempt usually have a clear sense of what went wrong. What they are less sure about is what the right version should look like.
Discovery that produces a specification, not a strategy
The worst version of discovery is when a vendor spends several weeks interviewing your team, synthesizes their findings into a document full of opportunity assessments, and hands it back to you with a proposal attached. You now have a strategy. You do not have anything that runs.
Good discovery has a specific output: a working specification for one thing worth building. That means the workflow is mapped in enough detail to design a system around it. The inputs and outputs are defined. The edge cases are identified. The success criteria are written down in terms specific enough to test against. And the first candidate for development - the one with the clearest return and the most achievable scope - is chosen for reasons you can follow.
Discovery should end with an unambiguous answer to the question: what exactly are we building, and how will we know when it is working?
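To make that concrete, here is a minimal sketch of what such a specification might capture, written as code so nothing can hide behind adjectives. Everything in it - the contract-review workflow, the field names, the thresholds - is a hypothetical illustration, not a template from any real engagement.

```python
from dataclasses import dataclass

@dataclass
class BuildSpec:
    """One buildable thing, defined tightly enough to test against."""
    workflow: str                       # the single workflow being automated
    inputs: list[str]                   # what the system receives
    outputs: list[str]                  # what it must produce
    edge_cases: list[str]               # known failure modes to handle explicitly
    success_criteria: dict[str, float]  # measurable thresholds, not adjectives

# Hypothetical first candidate: clear return, narrow scope.
contract_review = BuildSpec(
    workflow="First-pass risk flagging of inbound vendor contracts",
    inputs=["Contract as PDF or DOCX", "Firm risk checklist"],
    outputs=["Flagged clauses with checklist references", "Draft summary memo"],
    edge_cases=[
        "Scanned contracts with no text layer",
        "Contracts that reference master agreements we have not seen",
        "Non-standard indemnification language",
    ],
    success_criteria={
        "recall_vs_partner_flags": 0.95,  # catches 95% of clauses partners flagged on past matters
        "minutes_per_contract": 5.0,      # ceiling, not average
        "false_flags_per_contract": 3.0,  # tolerable noise ceiling
    },
)
```

The format matters less than the property it enforces: every field is concrete enough to argue about before the build starts, and to test against when it ends.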
A build the firm is inside of, not briefed about
The other common failure mode: the vendor disappears for six weeks, then comes back with a demo. The demo looks good. You approve it, the tool gets deployed, and then you discover the gaps. The edge cases your team knew to anticipate but never articulated. The output format that does not match what your clients actually receive. The logic that works on clean examples but not on messy real work.
These gaps are predictable. They exist because the people who know the work were not involved in the build closely enough to catch them while there was still time to fix them cheaply.
Good development is iterative and the firm is inside it. That does not mean your team is writing code. It means you are seeing the tool work against real examples from your practice - not toy scenarios constructed to demonstrate the concept - before the build is finished. You are giving feedback that shapes the output. You are catching the gaps while they are still cheap to address.
The test that matters is not "does this work on a clean example?" It is "does this produce output we would actually use on the Smith matter?" If you cannot run that test during the build, you will run it after deployment - which is a much more expensive place to find problems.
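One lightweight way to hold a build to that standard is a regression suite assembled from real, anonymized matters rather than demo inputs. The pytest-style sketch below assumes a hypothetical `review_contract` function under test and one JSON file per past matter recording the clauses partners actually flagged; every name here is illustrative.

```python
import json
from pathlib import Path

from contract_tool import review_contract  # hypothetical module under test

CASES = Path("regression_cases")  # one JSON file per anonymized past matter

def test_real_matters():
    for case_file in sorted(CASES.glob("*.json")):
        case = json.loads(case_file.read_text())
        flagged = set(review_contract(case["document_path"]))
        expected = set(case["partner_flagged_clauses"])
        missed = expected - flagged
        # A miss on a real matter is a deployment-blocking gap, caught while it is cheap.
        assert not missed, f"{case_file.name}: missed clauses {sorted(missed)}"
```

A suite like this is also what makes the handoff real: when your team modifies the tool later, the same tests tell them whether it still works on your matters.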
What a good build looks like compared to the common version
| Common version | What it should look like |
| --- | --- |
| Long discovery that produces a strategy document | Discovery produces a specific build specification |
| Build happens mostly out of view | Firm is involved in iterative review throughout |
| Testing against constructed demo examples | Testing against real work from real matters |
| Gaps discovered after deployment | Edge cases caught and handled before deployment |
| Handoff requires ongoing vendor support to maintain | Handoff includes documentation your team can follow |
| Tool lives on vendor infrastructure | Tool runs on your infrastructure; you own the code |
A handoff that actually transfers capability
The handoff question is where a lot of AI development engagements quietly reveal what they were really built for. If the handoff requires the vendor to remain involved - if the tool breaks without ongoing support, if your team cannot modify it as your workflows evolve, if the documentation is too thin to follow - then what was delivered was not a tool. It was a dependency.
A real handoff transfers capability. Your team understands how the tool works, what it is doing, and where it can fail. The documentation is specific enough for someone competent to maintain it. The code lives in your environment. The tool works when the engagement ends because it was built to work without the people who built it.
This matters particularly for professional services firms, where the workflows the tool is encoding are themselves competitive advantages. If your contract risk scoring logic, your client intake process, or your research methodology lives on a vendor's platform and requires their ongoing involvement to maintain, you have handed over control of something that should belong to the firm.
The precondition that changes everything
Among firms that pursue custom AI development, one thing distinguishes those that get good outcomes from those that do not. It is not firm size, practice area, or technical sophistication. It is preparation.
Firms with documented workflows, connected data, and clear output criteria get better tools built faster at lower cost. Firms without that foundation spend development budget on discovery work that could have been done more cheaply beforehand. The preparation work - mapping workflows, connecting data sources, building the internal knowledge base that an AI system can actually use - is what makes custom development tractable.
That is worth doing before you start a development conversation, not during it. See also: what to have in place before you build custom AI agents and how to figure out what to build first.
If you want to see exactly what a well-run AI development engagement looks like from the inside - what happens in each phase, what your involvement looks like, and what you own at the end - see how the engagement works.
