The Real Reason Law Firms Get Bad Output from ChatGPT
Spoiler: it's the prompt. Here's what's actually going wrong - and how to fix it.

The most common complaint from law firms about AI is that the output is generic, hallucinated, or just not useful for legal work. The conclusion is usually that ChatGPT isn't suited for legal practice - too much risk, too little precision.
That conclusion is almost always wrong. The output is generic because the prompt is generic. The hallucinations happen because nobody told the model to flag its uncertainty. The legal-specific failure isn't a model problem. It's a prompting problem.
What a typical legal prompt looks like
The gap between what gets typed and what's needed
Ask a room of lawyers how they typically prompt an AI tool and you'll hear things like: "Summarize this contract," "What are the risks in this clause?" or "Draft a demand letter for this situation." Those aren't prompts. They're requests - the same kind of thing you'd type into a search engine.
The model gets a one-sentence request and fills in everything it doesn't know with its own defaults: a generic legal context, a generic standard of review, generic assumptions about what the output is for. Generic defaults produce generic output - and the defaults stay in place until you replace them with specifics.
Compare that to how a supervising partner would brief a junior associate. They wouldn't say "summarize this contract." They'd say: "Review the indemnification clause and flag anything that shifts more risk to our client than the standard terms we use. The client is a mid-size tech company. This is an enterprise SaaS agreement. We're trying to decide whether to push back or accept as-is." That's the difference between a request and a brief. And the model responds the way an associate would: far better to a brief than to a bare request.
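To make the difference concrete, here is a minimal sketch using the OpenAI Python SDK. The model name, client setup, and clause placeholder are illustrative assumptions - the same pattern applies in any chat interface, API or not.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# The request: what most lawyers type. The model fills every gap with defaults.
request = "Summarize this contract."

# The brief: the same ask, with the context a supervising partner would give.
brief = """Review the indemnification clause below and flag anything that shifts
more risk to our client than standard enterprise SaaS terms.

Context:
- Client: mid-size technology company (the customer, not the vendor)
- Document: enterprise SaaS agreement, currently in negotiation
- Goal: decide whether to push back on this clause or accept as-is

Clause under review:
[paste the indemnification clause here]
"""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whatever model your firm has approved
    messages=[{"role": "user", "content": brief}],
)
print(response.choices[0].message.content)
```

The API call is identical either way; the only thing that changed is the prompt.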
Where legal prompts specifically fail
Four patterns that produce bad legal output
No jurisdiction or governing law
"What are my client's remedies here?" - remedies for what, under which state's law, in which context? The model will answer based on whatever feels most common. Adding "under Delaware corporate law" or "this is a California employment matter" narrows the output significantly. Jurisdiction is context the model needs and most lawyers forget to provide.
No document in the prompt
"What are the risks in a standard assignment clause?" produces generic output about assignment clauses in general. "Here is the assignment clause from the contract we're reviewing: [paste clause]. What are the risks specific to this language?" produces specific output about that clause. Legal work is almost always document-grounded. The prompt should be too. Working from documents rather than memory is also the main way to reduce hallucination risk.
No standard of review or client context
The model doesn't know whether this is a transactional review where you're negotiating aggressively, a diligence review where you're flagging issues for an acquisition, or a litigation matter where you're looking for leverage. Each posture calls for a different analysis. Tell the model which situation you're in and what the client needs to walk away with.
No instruction to flag uncertainty
By default, AI models produce confident-sounding output whether they're certain or not. In legal work, this is where hallucination risk concentrates - fabricated citations, mischaracterized case holdings, made-up statutory provisions. Adding "flag any claim you are uncertain about" and "cite the specific section of the document for any factual claim" to every legal prompt catches most of this before it matters.
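Taken together, the four fixes are mechanical enough to template. Here is a minimal sketch in plain Python - the helper name, section labels, and wording are one way to structure it, not a standard:

```python
def build_legal_prompt(task: str, jurisdiction: str, document: str, posture: str) -> str:
    """Assemble a prompt that supplies the four things legal prompts usually omit."""
    return "\n\n".join([
        f"Task: {task}",
        f"Jurisdiction / governing law: {jurisdiction}",   # fix 1: jurisdiction
        f"Posture and client context: {posture}",          # fix 3: standard of review
        f"Document under review:\n---\n{document}\n---",   # fix 2: ground in the document
        # fix 4: force the model to surface uncertainty instead of hiding it
        "Flag any claim you are uncertain about. "
        "Cite the specific section of the document above for every factual claim. "
        "If something cannot be determined from the document, say so explicitly.",
    ])

prompt = build_legal_prompt(
    task="Identify the risks specific to this assignment clause.",
    jurisdiction="Delaware law governs this agreement.",
    document="[paste the assignment clause here]",
    posture="Transactional review; we represent the assignor and are deciding "
            "whether to negotiate this language or accept it as-is.",
)
print(prompt)
```

Nothing about the helper is clever. Its value is that jurisdiction, the document, the posture, and the uncertainty instruction can't silently go missing from any prompt the firm sends.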
The fix
Brief the model like you'd brief an associate
The mental model that changes legal prompting is treating the AI like a capable but completely uninformed junior associate who started today. They need jurisdiction, the actual documents, the client situation, what good output looks like, and what to flag when they're not sure. That's not optional context - it's the minimum brief for useful work.
The mechanics of how to structure that brief are covered in detail in the anatomy of a structured prompt. For law firms specifically, the role component ("you are a senior associate at a corporate law firm specializing in technology transactions") does more work than in most other contexts - it sets vocabulary, judgment calls, and the standard the model applies when something is ambiguous.
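As a sketch of how that role component sits on top of the brief - again assuming the OpenAI Python SDK, with a placeholder model name and illustrative wording:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# The role goes in the system message: it sets vocabulary, judgment calls,
# and the standard the model applies when the brief is ambiguous.
role = (
    "You are a senior associate at a corporate law firm specializing in "
    "technology transactions. You write for a supervising partner: precise, "
    "document-grounded, and explicit about anything you are unsure of."
)

brief = """Review the indemnification clause below and flag anything that shifts
more risk to our client than standard enterprise SaaS terms.

Jurisdiction: California law governs; this is a commercial matter, not employment.
Posture: negotiation; the client is the customer and must decide whether to push back.

Clause under review:
[paste the clause here]

Flag any claim you are uncertain about, and cite the specific language of the
clause for every factual claim."""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": role},
        {"role": "user", "content": brief},
    ],
)
print(response.choices[0].message.content)
```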
The firms that have stopped getting bad output from ChatGPT didn't switch tools. They changed how they prompt.
