Adverse-event root-cause analysis across 45 hospitals.
Clinician-graded RCA rubric on historical cases; case-pack completeness; governance review.
Microsoft Delta is a team of engineers, designers, and product managers embedded directly with customers that turns ambiguous business problems into production-grade agentic systems with measurable impact. If this sounds interesting, we’d like to learn more about you.
Apply nowWe don’t ship agents we can’t measure. Every engagement starts with an evaluation harness — graded by domain experts, replayed against historical decisions, and re-run on every release. Here’s what that looks like in production.
Adverse-event root-cause analysis across 45 hospitals.
Clinician-graded RCA rubric on historical cases; case-pack completeness; governance review.
Specialty-drug denial appeal letters.
3-axis rubric — factual (QAFactEval) · clinical (MedHELM) · payer-acceptance — graded weekly with humans.
Power-of-Attorney abuse investigations.
Labelled past investigations; agent vs. human on the same cases; replay vs. historical outcomes.
Trade-claim validation.
Replay vs. historical analyst override decisions; $-weighted scoring; audit-grade explainability.
Alt-fund historical onboarding to custody schema.
Schema-mapping accuracy on labelled fund samples; HITL on ambiguous mappings.
Inbound chat lead routing & qualification.
~1k chat golden dataset; 3× class F1 + accuracy; re-run every release.
Returns-inspection agent — vendor chargebacks (60% of returns).
Per-measurement abs deviation vs. ground truth on labelled returns; tolerance-aware pass/fail (e.g. 2" waistband, 1.5"); vendor-chargeback acceptance rate.
We embed directly with the customer, turning ambiguous problems into production agents end-to-end. We work alongside the customer through discovery and prototyping, then deploy directly into their tenant.
Immerse in the customer's workflow, map the problem space, and identify where AI agents create measurable impact.
Build a working prototype on real data to demonstrate speed, quality, and ROI potential.
Deploy a production-grade AI agent in the customer's tenant. Design custom evals and hill climb against real workflows to deliver measurable ROI.
We’re looking for engineers, designers, and product managers who want to build evaluable AI agents alongside Microsoft’s most strategic customers.
Apply now