

The gap between demo and deployment
Most legal AI you see in 2026 is demoware. A clean transcript, a confident answer, a screenshot in a fundraising deck. We've shipped AinSeen — a Saudi legal counsel agent — to real users in a real regulatory environment, and we've learned that almost everything interesting about this problem lives in the space between a working demo and a deployment a partner is willing to put their name on.
This post is a set of field notes from that space, not a marketing piece. If you're a team trying to build legal AI in a regulated market, Saudi or otherwise, these are the things we wish someone had told us.
What AinSeen is
AinSeen is built for legal teams operating inside Saudi Arabia: in-house counsel at enterprises and government bodies, and the firms that work with them. Day to day, it does the work that can eat most of a junior associate's afternoon. It finds the right regulation, Council of Ministers resolution, or ministry circular for a question a senior lawyer just asked, and returns a cited answer in Arabic. The sooner the senior lawyer reaches a defensible starting position, the more of their day goes to the parts of the job a machine can't do.
The things that turned out to matter most
Citation integrity is the entire game. An LLM that hallucinates a case, a regulation, or a fatwa is not a legal tool. It's a liability. The architecture is retrieval-bound: the model cannot cite a source that wasn't returned by the retrieval step over our vetted Saudi legal corpus, and every cited passage is verified against the original document before it reaches the user. We treat the model less as a knower of law and more as a writer with read access to a curated library. The lawyering lives in the citation, not in the prose.
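A minimal sketch of what a retrieval-bound citation check could look like, in Python. It assumes the drafting step returns each citation as a source identifier plus the passage it quotes, and that the vetted corpus can be looked up by that identifier. `Citation`, `verify_citations`, and the ID scheme are hypothetical names for illustration, not AinSeen's actual code. A citation fails if its source was not among the documents retrieved for this query, or if the quoted text does not appear in the original document after Unicode and whitespace normalization.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source_id: str  # identifier of a document in the vetted corpus (hypothetical scheme)
    quote: str      # passage the draft attributes to that document


def normalize(text: str) -> str:
    """Canonicalize Unicode (NFC) and collapse runs of whitespace, so layout
    differences between the stored document and the quote don't cause misses."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def verify_citations(
    citations: list[Citation],
    retrieved_ids: set[str],
    corpus: dict[str, str],
) -> list[str]:
    """Return a list of problems; an empty list means every citation passed.

    Rule 1 (retrieval-bound): a citation may only point at a document that the
    retrieval step actually returned for this query.
    Rule 2 (verbatim check): the quoted passage must appear in the original
    document text, not merely resemble it.
    """
    problems = []
    for c in citations:
        if c.source_id not in retrieved_ids:
            problems.append(f"{c.source_id}: not returned by retrieval")
            continue
        original = corpus.get(c.source_id)
        if original is None:
            problems.append(f"{c.source_id}: missing from corpus")
        elif normalize(c.quote) not in normalize(original):
            problems.append(f"{c.source_id}: quote not found in source text")
    return problems
```

In this sketch any non-empty result would block the draft from reaching the user. Exact substring matching is deliberately strict: it rejects paraphrases that a fuzzier matcher might let through.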
Arabic legal language is not Arabic. The vocabulary, register, and citation conventions of Saudi legal text are a dialect inside a dialect. Off-the-shelf Arabic-tuned models speak Modern Standard Arabic competently. They do not speak the language a Saudi attorney actually drafts in. What worked for us was building our evaluation set with practising Saudi attorneys before we touched the prompt, not after. The Arabic that came back from the model on day one looked competent and was wrong in ways only a Saudi legal practitioner could see. Our evals catch that now; an off-the-shelf benchmark would not.
“Counsel” is a verb you cannot delegate without consent. In a regulated practice, the agent does not give advice. It assembles, summarises, and cites. The lawyer gives advice. We spent more time getting that boundary right — in the UI, in the prompts, in the disclaimers, in the audit log — than we did improving raw model quality.
Latency is a trust signal. Lawyers wait for answers; that's the job. What they won't do is wait in silence while an AI thinks. The product feels far more trustworthy when the stream is honest about which step it's on (searching, drafting, citing) than when it just shows a spinner.
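One way to make streaming honest about progress is to emit a status event before each stage, not just the final text. The sketch below assumes a three-stage pipeline matching the steps named above; `Event`, `answer_with_progress`, and the `search`, `draft`, and `cite` callables are hypothetical stand-ins for the real retrieval, generation, and citation-check stages.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Event:
    kind: str     # "status" for progress updates, "answer" for the final result
    payload: str


def answer_with_progress(
    question: str,
    search: Callable[[str], list[str]],
    draft: Callable[[str, list[str]], str],
    cite: Callable[[str, list[str]], str],
) -> Iterator[Event]:
    """Yield a status event just before each stage starts, so the UI can show
    what the agent is doing instead of an anonymous spinner."""
    yield Event("status", "searching")
    passages = search(question)

    yield Event("status", "drafting")
    text = draft(question, passages)

    yield Event("status", "citing")
    # The final event carries the cited answer.
    yield Event("answer", cite(text, passages))
```

In a web client, each event could be forwarded as soon as it is yielded (for example over server-sent events), so the status line changes at the moment the underlying work does.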
What we got wrong the first time
The first thing we got wrong was confidence calibration. Our earliest internal build was good at sounding right. It had answers for everything. In a legal tool, sounding right is a vice, and users punish it the first time a confident answer turns out to be wrong. We rebuilt the agent to be honest about uncertainty: to say “this is what I found, and here is where the corpus is thin,” to refuse to draft an opinion the cited material doesn't support, and to surface conflicting authorities rather than pick one. The model is meaningfully less impressive in a demo now. It is meaningfully more useful in a real legal workflow.
What we'd tell another team building in this space
Three things.
Pick a partner before you pick a feature. Legal AI built without a practising attorney in the loop converges on the average of the internet. The internet is not where good legal work lives. AinSeen has been built with practising Saudi attorneys reviewing real outputs from the first week. We do not ship a feature that hasn't been tested against an actual case file. That partnership is the single largest reason the product is usable today.
Build for audit before you build for speed. Every output the agent produces should be inspectable, citable, and reproducible. If a regulator asks “why did it say that?”, “I don't know” is not an acceptable answer. We log the full chain behind every interaction: the sources retrieved, the prompts used, and the model version.
Be honest about what an LLM is and isn't doing. Pretending the model “understands” the law is a sales tactic that backfires the first time the model shows it doesn't. We tell users what AinSeen is, a fast and careful research assistant, and the trust we get back is durable.
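The audit-first advice (log the sources retrieved, the prompts used, and the model version for every interaction) can be sketched as one append-only JSON Lines record per interaction. The field names, the JSON Lines format, and the `append_record` and `find_record` helpers are assumptions made for illustration; a production deployment would likely write to tamper-evident storage rather than a local file.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class AuditRecord:
    interaction_id: str
    question: str
    retrieved_source_ids: list[str]  # what the retrieval step returned
    prompt: str                      # the exact prompt sent to the model
    model_version: str               # pinned model identifier (hypothetical format)
    output: str                      # what the user was shown
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(log_path: Path, record: AuditRecord) -> None:
    """Append one interaction as a single JSON line; this function never
    rewrites earlier entries."""
    with log_path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Arabic text readable in the raw log.
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def find_record(log_path: Path, interaction_id: str) -> Optional[AuditRecord]:
    """Return the stored inputs and output for one interaction, or None."""
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            data = json.loads(line)
            if data["interaction_id"] == interaction_id:
                return AuditRecord(**data)
    return None
```

Looking up a record by interaction ID returns exactly what the model was given and which version produced the answer, which is the raw material a reviewer needs to reconstruct why a given output appeared.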
Where this goes next
The pattern that worked for AinSeen — narrow scope, strict citation, audit-first design, partner-validated outputs — is the pattern we're now applying to other agents we're building. We are interested in regulated workflows where the corpus is large, the answer must be cited, and the cost of being wrong is high. We'll write about the next one when there's something worth saying.
If you're a Saudi government or enterprise team thinking about deploying production AI in a regulated workflow, we'd like to talk. Get in touch.


