We build agentic systems for companies that have moved past the demo phase. Every engagement starts with an evaluation harness — written before we touch a foundation model — and ends with a system your team can operate, your governance committee can defend, and your auditor can read. Hosted on your cloud, governed by your policies.
Every capability below is staffed by the same senior bench. The lead who scopes the engagement also writes the first PR. You will see the same names on the status report you see on the commit history.
Multi-step agents built on the Claude API and equivalent foundation models. Tool use, structured outputs, planning, reflection and recovery. We instrument every step so failure modes are observable, not hypothetical.
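A minimal sketch of what "every step is observable" means in practice. Everything here is illustrative: `call_model` is a stub standing in for a real foundation-model call, and the `lookup_order` tool and its data are invented for the example. The point is the shape of the loop — each model action and tool result lands in a trace before anything else happens.

```python
import json
import time

# Hypothetical tool registry -- the tool name and its behaviour are invented
# for illustration only.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def call_model(messages):
    """Stand-in for a foundation-model call (e.g. the Claude API).
    This stub asks for one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "lookup_order", "input": {"order_id": "A-17"}}
    return {"type": "final", "text": "Order A-17 has shipped."}

def run_agent(user_msg, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    trace = []  # every step recorded: failure modes become observable
    for step in range(max_steps):
        action = call_model(messages)
        trace.append({"step": step, "action": action, "t": time.time()})
        if action["type"] == "final":
            return action["text"], trace
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[action["name"]](**action["input"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError(f"no final answer within {max_steps} steps; trace={trace}")
```

Because the trace is built step by step rather than reconstructed after the fact, a run that loops, stalls, or calls the wrong tool leaves evidence even when it never produces a final answer.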
Hybrid retrieval over vector and keyword indices, reranking, query rewriting and citation enforcement. We treat search as a product surface, not a side effect of an embedding model.
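One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF); the sketch below shows it over two hypothetical result lists. The document IDs and the choice of `k=60` are illustrative, not a claim about any particular engagement.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank).
    `rankings` is a list of ranked doc-ID lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative ranked IDs from two independent retrievers.
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

A document that both retrievers rank highly rises to the top even though the two score scales are incomparable, which is why RRF needs no score normalisation.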
Golden datasets, regression suites, online evals, red-team prompts and adversarial test packs. We write the rubric your auditor will ask for before your first prompt ships.
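The smallest useful version of a golden-dataset regression suite looks like this. The cases, the `must_contain` rubric, and `fake_model` are all invented for the sketch; in a real harness the model under test replaces the stub and the rubric comes from the domain expert or auditor.

```python
# Hypothetical golden cases: each pairs an input with the strings a
# correct answer must contain (a deliberately simple rubric).
GOLDEN = [
    {"input": "What is our refund window?", "must_contain": ["30 days"]},
    {"input": "Do you ship to Canada?", "must_contain": ["yes", "canada"]},
]

def fake_model(prompt):
    """Stand-in for the system under test."""
    answers = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Do you ship to Canada?": "Yes, we ship to Canada and Mexico.",
    }
    return answers[prompt]

def run_regression(model, golden):
    """Return the list of failing cases; an empty list means a green run."""
    failures = []
    for case in golden:
        output = model(case["input"]).lower()
        missing = [s for s in case["must_contain"] if s.lower() not in output]
        if missing:
            failures.append({"input": case["input"], "missing": missing})
    return failures
```

Run on every prompt or model change, a suite like this turns "the new prompt feels worse" into a diff of named failing cases.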
When prompting is not enough: parameter-efficient fine-tuning, distillation of large models into smaller ones, and the documentation pack your model risk committee will require.
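For a sense of what "parameter-efficient" means, here is the arithmetic behind a LoRA-style update, written in plain Python with toy matrices: instead of retraining a d×d weight matrix W, you train two small factors A (r×d) and B (d×r) and merge W' = W + (α/r)·B·A. The matrices and the rank below are illustrative only.

```python
def matmul(A, B):
    """Plain-Python matrix product for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, alpha):
    """Merged weight W' = W + (alpha / r) * (B @ A), where r = rank = len(A).
    Only A and B (2 * r * d values) are trained, not the full d * d of W."""
    r = len(A)
    delta = matmul(B, A)
    return [[w + (alpha / r) * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 base weight with a rank-1 update (all values illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]            # r x d
B = [[0.0], [1.0]]          # d x r
W_merged = apply_lora(W, A, B, alpha=1.0)
```

At d = 4096 and r = 8 the trained parameters shrink from ~16.8M per matrix to ~65K, which is why the approach fits on modest hardware and produces small, auditable adapter artefacts.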
Most engagements start small. A fixed-scope discovery sprint, an architecture review, an evals harness. If the work continues, we shape it into a T&M arrangement or an embedded squad — never the other way around.
Every artefact is yours from day one. We don't hold source, infrastructure or accounts hostage. The work product lives in your repositories, your cloud accounts, your wiki.
Customer support agents and document understanding.
Clinical summarisation under HIPAA constraints.
Contract review with citation-locked outputs.
AI-native product features and copilots.
First call is with the practice lead. We'll come back within 48 hours with either a scoped proposal or a written redirect — including to other firms when that's the right answer.