We build agentic systems for companies that have moved past the demo phase. Every engagement starts with an evaluation harness — written before we touch a foundation model — and ends with a system your team can operate, your governance committee can defend, and your auditor can read. Hosted on your cloud, governed by your policies.
Every capability below is staffed by the same senior bench. The lead who scopes the engagement also writes the first PR. You will see the same names on the status report you see on the commit history.
Multi-step agents built on the Claude API and equivalent foundation models. Tool use, structured outputs, planning, reflection and recovery. We instrument every step so failure modes are observable, not hypothetical.
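A minimal sketch of what "every step is observable" means in practice. Everything here is illustrative: `call_model` is a stub standing in for a real foundation-model call, and the `lookup_order` tool and its data are invented for the example. The point is the shape of the loop — each model action and tool result lands in a trace before anything else happens.

```python
import json
import time

# Hypothetical tool registry -- the tool name and its behaviour are invented
# for illustration only.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def call_model(messages):
    """Stand-in for a foundation-model call (e.g. the Claude API).
    This stub asks for one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "lookup_order", "input": {"order_id": "A-17"}}
    return {"type": "final", "text": "Order A-17 has shipped."}

def run_agent(user_msg, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    trace = []  # every step recorded: failure modes become observable
    for step in range(max_steps):
        action = call_model(messages)
        trace.append({"step": step, "action": action, "t": time.time()})
        if action["type"] == "final":
            return action["text"], trace
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[action["name"]](**action["input"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError(f"no final answer within {max_steps} steps; trace={trace}")
```

Because the trace is built step by step rather than reconstructed after the fact, a run that loops, stalls, or calls the wrong tool leaves evidence even when it never produces a final answer.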
Hybrid retrieval over vector and keyword indices, reranking, query rewriting and citation enforcement. We treat search as a product surface, not a side effect of an embedding model.
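One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF); the sketch below shows it over two hypothetical result lists. The document IDs and the choice of `k=60` are illustrative, not a claim about any particular engagement.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankers of 1 / (k + rank).
    `rankings` is a list of ranked doc-ID lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative ranked IDs from two independent retrievers.
vector_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

A document that both retrievers rank highly rises to the top even though the two score scales are incomparable, which is why RRF needs no score normalisation.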
Golden datasets, regression suites, online evals, red-team prompts and adversarial test packs. We write the rubric your auditor will ask for before your first prompt ships.
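The smallest useful version of a golden-dataset regression suite looks like this. The cases, the `must_contain` rubric, and `fake_model` are all invented for the sketch; in a real harness the model under test replaces the stub and the rubric comes from the domain expert or auditor.

```python
# Hypothetical golden cases: each pairs an input with the strings a
# correct answer must contain (a deliberately simple rubric).
GOLDEN = [
    {"input": "What is our refund window?", "must_contain": ["30 days"]},
    {"input": "Do you ship to Canada?", "must_contain": ["yes", "canada"]},
]

def fake_model(prompt):
    """Stand-in for the system under test."""
    answers = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Do you ship to Canada?": "Yes, we ship to Canada and Mexico.",
    }
    return answers[prompt]

def run_regression(model, golden):
    """Return the list of failing cases; an empty list means a green run."""
    failures = []
    for case in golden:
        output = model(case["input"]).lower()
        missing = [s for s in case["must_contain"] if s.lower() not in output]
        if missing:
            failures.append({"input": case["input"], "missing": missing})
    return failures
```

Run on every prompt or model change, a suite like this turns "the new prompt feels worse" into a diff of named failing cases.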
When prompting is not enough: parameter-efficient fine-tuning, distillation of large models into smaller ones, and the documentation pack your model risk committee will require.
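For a sense of what "parameter-efficient" means, here is the arithmetic behind a LoRA-style update, written in plain Python with toy matrices: instead of retraining a d×d weight matrix W, you train two small factors A (r×d) and B (d×r) and merge W' = W + (α/r)·B·A. The matrices and the rank below are illustrative only.

```python
def matmul(A, B):
    """Plain-Python matrix product for the toy example."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, alpha):
    """Merged weight W' = W + (alpha / r) * (B @ A), where r = rank = len(A).
    Only A and B (2 * r * d values) are trained, not the full d * d of W."""
    r = len(A)
    delta = matmul(B, A)
    return [[w + (alpha / r) * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 base weight with a rank-1 update (all values illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]            # r x d
B = [[0.0], [1.0]]          # d x r
W_merged = apply_lora(W, A, B, alpha=1.0)
```

At d = 4096 and r = 8 the trained parameters shrink from ~16.8M per matrix to ~65K, which is why the approach fits on modest hardware and produces small, auditable adapter artefacts.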
Most engagements start small. A fixed-scope discovery sprint, an architecture review, an evals harness. If the work continues, we shape it into a T&M arrangement or an embedded squad — never the other way around.
Every artefact is yours from day one. We don't hold source, infrastructure or accounts hostage. The work product lives in your repositories, your cloud accounts, your wiki.
Customer support agents and document understanding.
Clinical summarisation under HIPAA constraints.
Contract review with citation-locked outputs.
AI-native product features and copilots.
First call is with the practice lead. We'll come back within 48 hours with either a scoped proposal or a written redirect — including to other firms when that's the right answer.