
How Codex Turned an Onboarding Problem Into a Product

Apparatus built Codex because its own onboarding kept depending on whoever had time to run it. The before and after - and what the build actually involved.

February 2026 · 7 min read

For a long time, onboarding at Apparatus worked the way it works at most professional services firms: a senior person sat down with a new hire and walked through how things are done. Standards, SOPs, expectations, the specific ways the firm handles edge cases that no document quite captures.

When that senior person had bandwidth, it worked well. The new hire got a thorough orientation. They understood not just what the firm did but why - the reasoning behind the standards. When that person was in the middle of a client engagement, the new hire got a pile of documents and an open invitation to ask questions, which most people used far less than they should have.

The result was inconsistency. Not dramatic inconsistency - the kind that surfaces in a complaint or an obvious mistake. The quieter kind, where two people who joined three months apart had meaningfully different understandings of the firm's standards, and this only became apparent when something fell through the cracks on a real engagement.

The decision to build

The goal was not to automate onboarding. A new hire's first weeks involve dozens of things that require human judgment, relationship-building, and real-time dialogue. Trying to automate all of that would be both technically harder and practically worse.

The goal was narrower: make the knowledge-transfer part consistent regardless of who had time that week. The firm had SOPs. It had documented standards. The gap was not documentation - it was that nobody was actively checking whether new hires had actually internalized the important parts before those parts became relevant on client work.

The insight that shaped the build was this: the problem is not that the knowledge doesn't exist. It is that the firm had no way to know what any given person actually understood until something happened.

What Codex does

Codex is not a chatbot. It is not a document repository. It is a system that delivers scenario-based questions about the firm's SOPs through Slack or email, tracks who has demonstrated knowledge of what, and flags gaps before they surface in client work.

The mechanics are straightforward. A new hire joins. Over the following weeks, Codex surfaces questions - not quiz-style true/false, but scenario questions. A situation comes up on an engagement. How would you handle it? What does our standard call for here? The person responds. The response is evaluated against the firm's documented standard. If there is a meaningful gap, it gets flagged for a follow-up conversation - a human conversation, not another automated question.
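The flow described above can be sketched in a few lines. This is an illustrative model, not Codex's actual implementation: the type names, the keyword-overlap scoring, and the threshold are all invented for the example (a real evaluator would compare responses to the standard far more carefully).

```python
from dataclasses import dataclass

@dataclass
class ScenarioQuestion:
    sop_area: str   # e.g. "conflict-of-interest protocol"
    prompt: str     # the scenario posed to the new hire
    standard: str   # the firm's documented standard to evaluate against

@dataclass
class EvaluationResult:
    score: float    # 0.0 (no alignment) to 1.0 (fully aligned)
    flagged: bool   # True -> route to a human follow-up conversation

GAP_THRESHOLD = 0.6  # illustrative cutoff; a real value would be calibrated

def evaluate(response: str, question: ScenarioQuestion) -> EvaluationResult:
    # Placeholder scoring: check word overlap between the response and the
    # documented standard. A real system would use a much richer comparison.
    keywords = set(question.standard.lower().split())
    mentioned = set(response.lower().split())
    score = len(keywords & mentioned) / max(len(keywords), 1)
    # A meaningful gap is flagged for a human conversation, not another
    # automated question.
    return EvaluationResult(score=score, flagged=score < GAP_THRESHOLD)
```

The shape matters more than the scoring: each response either clears the bar or produces a flag that a person, not the system, follows up on.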

The key design principle: Codex surfaces gaps while there is still time to address them, not after they have caused a problem. A junior consultant who has not internalized the firm's conflict-of-interest protocol should not discover that gap on a client call.

Nobody has to run a training session. Nobody has to remember to check in. The system runs the process and reports on it. A senior person looking at the Codex dashboard on a Friday afternoon can see which team members have covered which areas of the firm's SOPs and which areas have not been addressed yet.
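That Friday-afternoon dashboard view reduces to a simple aggregation. The sketch below assumes each evaluated response is logged as a (person, SOP area) pair; the area names and schema are hypothetical, not Codex's real data model.

```python
from collections import defaultdict

# Hypothetical list of SOP areas every new hire must cover.
REQUIRED_AREAS = {"conflicts", "engagement-scoping", "data-handling", "escalation"}

def coverage_report(log: list[tuple[str, str]]) -> dict[str, dict[str, set]]:
    """Per person: which required SOP areas are covered, and which remain."""
    covered: defaultdict[str, set] = defaultdict(set)
    for person, area in log:
        if area in REQUIRED_AREAS:
            covered[person].add(area)
    return {
        person: {"covered": areas, "remaining": REQUIRED_AREAS - areas}
        for person, areas in covered.items()
    }
```

The point of the view is the "remaining" set: areas nobody has to remember to check, because the report surfaces them automatically.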

What the build actually involved

The discovery phase was a few weeks. Not weeks of technical scoping - weeks of working through the firm's SOPs to understand which knowledge was most consequential to get wrong. Every firm has areas where a misunderstanding is annoying versus areas where a misunderstanding is expensive. Those are different lists, and the tool needed to prioritize accordingly.

From those conversations, the question bank was built. The questions were designed to reveal understanding, not just recall. Knowing a policy exists is different from knowing how to apply it when a specific situation comes up at 4pm on a Thursday with a client waiting. The questions were scenario-based for this reason.

The delivery mechanism - Slack and email, rather than a dedicated platform - was a deliberate choice. Adding a new tool to someone's workflow during their first weeks is asking them to build a new habit at the moment when they are already building many new habits. Meeting people where their work already happens means the system gets used without requiring behavior change.

The evaluation layer required the most careful work. Automated evaluation of open-ended responses is harder than evaluating multiple choice, and the stakes of false positives (flagging a correct answer as a gap) were high - you do not want to undermine a confident new hire's self-assessment based on an incorrect automated read. The evaluation was calibrated against real responses during testing, with human review of edge cases, until the false positive rate was low enough to trust.
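The calibration loop described above amounts to measuring how often the automated evaluator flags answers that a human reviewer judges correct. A minimal sketch, with invented data and no claim about Codex's actual metrics:

```python
def false_positive_rate(flags: list[bool], human_says_gap: list[bool]) -> float:
    """FPR = correct answers wrongly flagged as gaps / all correct answers.

    `flags` is the automated verdict; `human_says_gap` is the reviewer's label.
    """
    # Restrict to responses the human reviewer judged correct (no real gap).
    correct = [flag for flag, gap in zip(flags, human_says_gap) if not gap]
    if not correct:
        return 0.0
    # Of those, how many did the system wrongly flag?
    return sum(correct) / len(correct)
```

During testing, a number like this would be tracked against human review of edge cases, and the evaluator's threshold adjusted until the rate is low enough to trust.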

Where it went from there

Codex started as an internal tool. It solved Apparatus's onboarding problem. Then, in conversations with other professional services firms, it became clear that the problem was not specific to Apparatus. Most firms with documented SOPs have the same gap: the documentation exists, but the knowledge transfer is inconsistent, and there is no systematic way to know what any given person understands until something happens.

The pattern that made Codex useful internally - active knowledge testing rather than passive documentation review - is the pattern that made it useful for other firms. Not a new workflow for new hires to adopt. A system that meets people in the tools they already use and actively tracks what they know.

That transition from internal tool to product is a pattern worth paying attention to. The best internal tools often solve problems that are common enough to matter elsewhere. The constraint is building the internal tool well enough that it is actually worth using - which requires the same discipline as any external product, even when the only users are your own team.

If this pattern is relevant to a problem at your firm - a workflow that depends on a particular person having time, where the consistency of what gets transferred matters - that is often a good candidate for a custom build. The criteria for identifying a good first build apply here: frequency, senior involvement, output clarity. The Apparatus custom development practice starts with exactly this kind of scoping.

Next step

Ready to build something your firm owns?

We scope each engagement around your highest-value workflow. The conversation starts with what you want to build and whether we are the right people to build it.