TL;DR

When Sullivan & Cromwell apologized to a federal judge for AI-generated errors in a court filing, the story made headlines, but the same pattern is playing out across every industry. Hallucinations are a feature of how LLMs work; the real risk sits downstream, when humans treat plausible output as fact without verifying it.

Diagnosing AI Hallucinations: Treatment or Workflow?

In April 2026, Sullivan & Cromwell, one of the most prestigious law firms in the United States, filed a public apology to a federal bankruptcy judge after submitting a motion that contained AI-generated errors and fabricated citations. The firm has more than 900 attorneys. Its partners reportedly charge more than $2,000 an hour.

The apology said internal AI-use policies were not followed and the firm's citation review process did not catch the fabricated content before filing.¹ A 900-attorney firm with mandatory AI training, tracked completions, and office manuals telling lawyers to "trust nothing and verify everything" still ended up writing that letter. The irony is hard to miss: a firm that has advised OpenAI on major matters saw its own verification process fail on a routine bankruptcy motion.

In the first quarter of 2026 alone, U.S. courts imposed at least $145,000 in sanctions in cases involving AI-fabricated content in filings.² Researcher Damien Charlotin's tracking database has now logged nearly 1,500 reported instances of AI hallucinations in legal proceedings worldwide.

Most coverage treats these stories as a lawyer problem. That framing is too narrow. Courtrooms keep records, so courtroom failures make the news. The same pattern plays out in every business that uses AI; it just leaves no public record.

What hallucination actually means

The word suggests a malfunction, and that framing keeps misleading people.

Instead of retrieving information, a large language model predicts the next plausible word based on statistical patterns learned during training. When the answer is well represented in that data, the output can look like knowledge. When the answer is missing or poorly grounded, the model can still produce text that fits the shape of a correct answer without being one. That is a feature of how the system works: the model is doing what it was built to do.

Once that lands, the legal news stops being surprising. The model generated cases that sound like real cases because generating plausible language is the entire function.
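To make the mechanism concrete, here is a deliberately tiny sketch in Python. It is a bigram model, far simpler than an LLM, trained on a handful of citations invented for this example (none are real cases), and it assumes greedy decoding: at each step it emits the word that most often followed the previous one. Every individual step is well supported by the training data, yet the finished citation appears nowhere in it.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

# Hypothetical training "corpus": invented citations, not real cases.
CORPUS = [
    "Smith v. Jones 330 F.2d 12",
    "Smith v. Jones 512 F.3d 101",
    "Brown v. Lee 330 F.2d 88",
    "Doe v. Roe 330 F.2d 88",
]


def train_bigrams(lines):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = [START] + line.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, max_tokens=20):
    """Greedily emit the most frequent next token, breaking ties alphabetically."""
    out, current = [], START
    for _ in range(max_tokens):
        candidates = counts[current]
        if not candidates:
            break
        current = min(candidates, key=lambda tok: (-candidates[tok], tok))
        if current == END:
            break
        out.append(current)
    return " ".join(out)


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    citation = generate(model)
    print(citation)            # Smith v. Jones 330 F.2d 88
    print(citation in CORPUS)  # False: each step was plausible, the whole is invented
```

The output recombines fragments of real-looking training lines into something that fits the shape of a citation without being one. Real LLMs condition on far richer context, but the basic property carries over: plausibility is scored step by step, and nothing in the generation loop itself checks the result against a source.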

Why should you care?

A model that confidently produces text shaped like a correct statute citation but isn't one creates liability risk. You may not be filing briefs, but you are still using AI, or your team is, or your vendors are. Often all three.

Your operations team drafts client-facing emails with ChatGPT and pastes in compliance answers nobody verified against an actual policy.
Your developer accepts code suggestions from Copilot that reference libraries that may not exist, may carry unclear licensing terms, or may introduce a vulnerability you cannot easily trace to its source.
Your project management tool added an AI summary feature and updated its terms of service. You clicked accept. That feature now interprets client conversations and generates action items that someone downstream treats as the record.
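For the developer case, one cheap first check can be automated. The Python sketch below assumes the suggestion is Python source; it parses the snippet's import statements and reports top-level modules that cannot be found in the current environment. The function name unresolved_imports and the module name nonexistent_pkg_for_demo_xyz are hypothetical, chosen for illustration.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that are not installed locally."""
    tree = ast.parse(source)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        # Skip relative imports (level > 0); they refer to the local package.
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)


if __name__ == "__main__":
    # Hypothetical AI-suggested snippet with one invented dependency.
    suggestion = "import json\nfrom nonexistent_pkg_for_demo_xyz import validate\n"
    print(unresolved_imports(suggestion))  # expected: ['nonexistent_pkg_for_demo_xyz']
```

Passing this check only means a module resolves locally. It does not establish that the package is the one you intended, that its license is acceptable, or that it is safe; those still need a person.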

In each case, the real exposure is downstream. Someone is treating the output as fact.

The legal exposure is real

The Texas Data Privacy and Security Act does not distinguish between a privacy notice written by a human and one generated by a model; what matters is whether the notice is accurate.

The FTC launched Operation AI Comply and has brought enforcement actions against companies over deceptive AI-related marketing claims. Meanwhile, some cyber insurance carriers have begun adding exclusions or tighter conditions for losses tied to unverified AI-generated content.

Then there are the contracts. Most vendor agreements signed before 2024 contain no AI-specific terms. Since then, vendors have added AI features and updated their terms, and the auto-accepted version now governs how your client data is processed.

If a model trained on that data produces something that ends up in a customer-facing output, the question of who is liable does not have a friendly answer.

What competence looks like now

The Sullivan & Cromwell apology made one thing clear: a written AI policy is not the same as a working verification process. Courts are responding, and a growing number of federal courts now require disclosure of AI use in submissions.³ Similar expectations are already taking hold in healthcare, financial services, and government contracting, and other regulated industries are likely next.

Responsible AI use is a set of practices:

Verification before delivery. Every output, every time, especially when the recipient is a client, a regulator, a court, or a counterparty.
Named accountability. If no one is assigned to verify, no one will. The role has to live somewhere on an org chart.
Vendor review. The agreement you signed two years ago may have been quietly replaced. Read the current one.
Correct mental model. AI tools generate text. Verification is a separate step that someone has to do. Treat output as plausible until proven true.
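Verification before delivery and named accountability can be enforced as a gate rather than left as good intentions. The Python sketch below is hypothetical: the Draft structure, the release_blockers function, and the trusted_sources set are illustrative names, and the citations are invented. It refuses to clear a draft until a named verifier is recorded and every citation matches an entry in a list of sources someone has already checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Draft:
    """An AI-assisted draft awaiting release (hypothetical structure)."""
    text: str
    citations: List[str] = field(default_factory=list)
    verifier: Optional[str] = None  # the named person accountable for checking it


def release_blockers(draft: Draft, trusted_sources: Set[str]) -> List[str]:
    """Return reasons the draft must not go out yet; an empty list means clear to send."""
    problems = []
    if not draft.verifier:
        problems.append("no named verifier assigned")
    for cite in draft.citations:
        # Exact-match lookup against sources a person has already checked.
        if cite not in trusted_sources:
            problems.append(f"unverified citation: {cite}")
    return problems


if __name__ == "__main__":
    trusted = {"Doe v. Roe 330 F.2d 88"}  # invented citation, for illustration only
    draft = Draft(
        text="Draft client memo",
        citations=["Doe v. Roe 330 F.2d 88", "Doe v. Roe 331 F.2d 90"],
    )
    for problem in release_blockers(draft, trusted):
        print(problem)
    # no named verifier assigned
    # unverified citation: Doe v. Roe 331 F.2d 90
```

A membership test only catches citations missing from the trusted list. It will not catch a real case cited for something it does not say, so the named verifier still has to read the source.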

The point is to use AI well. The tools are too useful to ignore, and the firms pretending otherwise lose ground every quarter. What matters is what you do with the time AI saves you. Spend it on verification and the speed compounds in your favor. Spend it on shipping faster without checking, and the saved minutes accumulate as a liability that resolves through sanctions, lost deals, voided coverage, or a customer noticing that your contract terms and your privacy notice no longer agree.

The reframe

Hallucinations are the model doing its job, which is generating plausible language. The failure, where it occurs, sits with the human.

The lawyer who did not read the brief. The founder who did not check the compliance answer. The operations lead who let a vendor change the terms without flagging it. AI did not invent a new category of risk; it made existing categories cheaper to trigger.

If your team uses AI, the work is to be specific about who verifies what, when, and against which source. That is the whole game.

The diagnosis

A lawyer using a properly grounded legal AI tool with built-in citation verification will see far fewer fake cases than one pasting a question into raw ChatGPT. That gap exists today and is likely to widen.

Hallucinations in open-ended use, where someone asks the model a general question with no retrieval and no verification, are not going away. The model cannot reliably distinguish fluent recall from fluent invention, and no simple architectural fix for that exists today.

Sources

¹ Reuters, "Sullivan & Cromwell Law Firm Apologizes for AI 'Hallucinations' in Court Filing," April 2026. https://www.tradingview.com/news/reuters.com,2026:newsml_L1N4140K7:0-sullivan-cromwell-law-firm-apologizes-for-ai-hallucinations-in-court-filing/

² ComplexDiscovery, "The AI Sanction Wave: $145K in Q1 Penalties Signals Courts Have Lost Patience with GenAI Filing Failures," April 9, 2026. https://complexdiscovery.com/the-ai-sanction-wave-145k-in-q1-penalties-signals-courts-have-lost-patience-with-genai-filing-failures/

³ Bloomberg Law, "Federal Court Judicial Standing Orders on Artificial Intelligence," 2026. https://www.bloomberglaw.com/external/document/XCN3LDG000000/litigation-comparison-table-federal-court-judicial-standing-orde