
How to Measure Whether Your Firm's AI Infrastructure Is Actually Working

Metrics, usage signals, and what to do when a Skill is not being used.

February 2026 · 6 min read

Measuring whether a team is using AI is straightforward. Measuring whether the AI infrastructure the team has built is actually working is harder - because the metrics are less obvious and more of them require direct observation rather than passive tracking.

The distinction matters. A firm where everyone is using AI individually - and where each person has a different set of prompts, different quality standards, and no shared infrastructure - can score well on AI adoption metrics. That firm is not building infrastructure that compounds.

The three things worth measuring

Infrastructure effectiveness comes down to three questions: Is it being used? Does using it produce better output? Is it growing over time?

01 — Skills library usage

Track which Skills are being used, how often, and by how many people. A Skill that is owned by one person and used only by that person is not a firm asset - it is a personal tool. A Skill that is used regularly by multiple team members on similar tasks is what the library is for.

The diagnostic question: if you look at the Skills library usage, which Skills have not been run in the past month? Those are candidates for archiving or improvement. The ones being run frequently are working; understanding why they work well informs how to build the next ones.
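If your platform can export a usage log, even a rough script can surface the stale Skills. The sketch below is a minimal illustration, not a prescribed tool: it assumes a hypothetical CSV export named skill_usage_log.csv with skill_name and run_date columns, so adapt the field names to whatever your tooling actually provides.

```python
import csv
from datetime import datetime, timedelta

# Hypothetical export: one row per Skill run, with "skill_name" and
# "run_date" (ISO format) columns. Adjust names to your platform's export.
CUTOFF = datetime.now() - timedelta(days=30)

last_run = {}
with open("skill_usage_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        run = datetime.fromisoformat(row["run_date"])
        name = row["skill_name"]
        if name not in last_run or run > last_run[name]:
            last_run[name] = run

# Skills with no runs in the past month are candidates for archiving or repair.
stale = sorted(name for name, run in last_run.items() if run < CUTOFF)
print("Not run in the past 30 days:", ", ".join(stale) or "none")
```

One caveat: a Skill that has never been run will not appear in the log at all, so compare the result against the full library listing as well.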

02 — Output quality consistency

When two people run the same Skill on similar inputs, how similar is the output? This is the infrastructure quality test. High-quality infrastructure produces output within a predictable range - different people, similar inputs, similar quality floors. Individual AI use without shared infrastructure produces wide variance.

The practical way to measure this: pick a Skill used by multiple team members, gather recent outputs from it, and do a quick comparison. If the outputs differ significantly in quality or format, the Skill needs work or the usage instructions are not clear enough.
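No single number captures output quality, but a rough structural comparison can flag obvious drift before a human review. The sketch below is one possible starting point: it assumes a hypothetical folder of recent outputs from one Skill (outputs/client-memo-skill, one text file per output) and uses Python's standard difflib to score pairwise similarity. Treat low scores as a prompt to look closer, not as a verdict.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path

# Hypothetical layout: one folder per Skill, one text file per recent output.
outputs = {p.name: p.read_text() for p in Path("outputs/client-memo-skill").glob("*.txt")}

# Pairwise similarity of wording and structure, from 0.0 to 1.0.
# Consistently low scores suggest the Skill's format instructions need work.
for a, b in combinations(sorted(outputs), 2):
    ratio = SequenceMatcher(None, outputs[a], outputs[b]).ratio()
    print(f"{a} vs {b}: {ratio:.2f}")
```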

03 — Library growth rate

A healthy Skills library grows over time because the team is actively identifying new workflows worth systematizing and adding them. A library that stopped growing six months after it was built has stalled - usually because the governance is too heavy (making it hard to add things) or the habit of contributing has not taken hold.

Target: at least one new Skill added per month per team, on average. More is better, but one per month is a signal that the habit is alive.
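If each Skill records a creation date, checking the growth rate is a small exercise. A minimal sketch, assuming a hypothetical skills.csv inventory with a created_on column in YYYY-MM-DD format:

```python
import csv
from collections import Counter

# Hypothetical inventory: one row per Skill with a "created_on" date (YYYY-MM-DD).
added_per_month = Counter()
with open("skills.csv", newline="") as f:
    for row in csv.DictReader(f):
        added_per_month[row["created_on"][:7]] += 1  # group by YYYY-MM

# Months absent from this list had zero additions - the stall signal to investigate.
for month in sorted(added_per_month):
    print(month, added_per_month[month])
```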

What to measure for data connections

Data connections are harder to measure than Skills usage, but two signals are worth tracking:

Are the connections being used in work product? If AI has access to your document archive but team members are still manually searching for and pasting documents into context, the connection is not being used. The question is not whether the connection works technically - it is whether the team has formed the habit of using it.

Is the retrieved data appearing in outputs? When AI retrieves from a connected source and uses it in an answer, that should be visible in the output - cited, referenced, or clearly drawing on firm-specific information rather than general knowledge. If you cannot tell whether AI is using the connection or not, you need to add explicit sourcing requirements to the relevant Skills.
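One crude but useful check is to scan recent outputs for whatever sourcing convention your Skills require. The sketch below assumes a hypothetical convention in which Skills include a "Source:" line whenever they draw on the connected archive, and a hypothetical outputs folder of recent work product; substitute your own marker and file layout.

```python
from pathlib import Path

# Hypothetical convention: Skills are instructed to add a "Source:" line
# whenever they draw on the connected document archive.
MARKER = "Source:"

outputs = list(Path("outputs").rglob("*.txt"))
sourced = [p for p in outputs if MARKER in p.read_text()]

# A low share means either the connection is not being used or the
# sourcing requirement has not made it into the Skills yet.
share = len(sourced) / len(outputs) if outputs else 0.0
print(f"{len(sourced)} of {len(outputs)} recent outputs cite a connected source ({share:.0%})")
```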

When a Skill is not being used

A Skill that is not being used almost always points to one of four problems:

It is hard to find. The name does not match how people think about the task it handles. Fix: rename it to match the natural language the team uses for that task.

The input requirements are unclear. People try it once, are not sure what to provide, get mediocre results, and stop using it. Fix: add a concrete example of a good input alongside the Skill.

The output quality does not justify the overhead. Running the Skill takes as long as doing the task manually, or the output requires significant cleanup. Fix: rebuild the Skill from scratch, starting from the output format and working backward.

The task it handles is not actually recurring. It seemed like a useful Skill when it was built, but the task comes up rarely enough that people forget it exists. Fix: archive it and focus the library on workflows that come up at least weekly.

Module 6 of Apparatus 202 covers infrastructure assessment and the process of auditing what is working. For the adoption-level version of this question - how to measure whether your team is using AI at all - see this earlier piece.

Next step

Ready to turn what your team knows into something that lasts?

Apparatus 202 gives your team a Skills library, connected data sources, and workflows the firm owns. One session — no ongoing subscription.