Ship agentic systems for enterprise clients: RAG, multi-step tool use, and evaluation pipelines. You write evals before you write prompts.
§ 02
Responsibilities
- Design and ship agentic AI features for enterprise clients
- Build robust evals and offline test harnesses
- Tune retrieval, prompts and tools against real metrics
- Partner with infra to deploy and observe in production
§ 03
Requirements
- 2+ years of professional engineering experience
- Strong Python or TypeScript; comfort with vector DBs (e.g. pgvector, Weaviate)
- Practical experience with at least one foundation-model API (Claude, OpenAI, Gemini)
- Bias for measurement: you can describe how you'd write evals before you build
§ 04
Nice to have
- Experience with agent frameworks, ReAct, or function calling at scale