Boutique SRE & Platform Practice

Reliable, observable,
and leaner to run.

Auti is my boutique practice — senior SRE, platform, and cost engineering for AI-era infrastructure. You work directly with me: no juniors, no hand-off, no filler.

Start a conversation · Book a call

I'm not an agency and I'm not staff augmentation. Auti is a founder-led practice: I take on the specific reliability, cost, and platform problems you need solved — and I do the work myself.

The person you talk to is the person who does the work. When an engagement needs more hands, I bring in specialists to match.

What I do

Senior reliability engineering, end to end.

Reliability & SRE

On-call, incident response, and the resilience work that stops 3am pages — Kubernetes, AWS, and the production systems your product actually runs on.

Observability

A single, alertable view across metrics, logs, traces, and profiles — Prometheus, Grafana, Loki, Mimir, Tempo, Pyroscope — so diagnosis is fast instead of guesswork.

Cost & FinOps

Cloud spend you can see and control — per-team cost visibility, usage governance, and the reporting that turns a surprise bill into a decision you made on purpose.

Platform & AI Infrastructure

The foundation your developers ship on — Kubernetes platforms, CI/CD, internal developer platforms, and secure, governed paths to deploy AI agents.

The model

Not a vendor. Not a team extension. A partner with skin in the game.

You get me

The person you brief is the person who architects and ships it. No sales lead, no account manager, no quiet hand-off to juniors.

Senior by default

No padding, no managed-service layer. Just a senior engineer who has run these systems in production and been on the pager for them.

Accountable, by name

One person owns the outcome and you always know who it is. That's me — start to finish.

Ways to work together

Three ways to bring me in.

Fixed scope

Cost & Reliability Audit

1–2 weeks · fixed fee

A focused deep-dive into a system's reliability and cloud cost. You leave with written findings and a prioritized roadmap your team can act on — not a slide deck.

Embedded

Fractional SRE & Platform

2–4 days/week · monthly retainer

I join your team as a senior SRE and platform engineer — on-call judgment, platform work, and cost governance — for a set slice of the week, for as long as you need it.

Defined outcome

Delivery Sprint

Fixed outcome · clear timeline

A specific thing, built and shipped: a migration, an observability stack, an internal platform. Clear scope, a clear timeline, and a clean handover with docs and runbooks.

Not sure which fits? Tell me the problem and I'll recommend the smallest engagement that solves it.

The short version

Nine years as an SRE — through platform migrations, the incidents nobody planned for, the 3am pages, and the cost reviews no one enjoys. That's the judgment I bring to every engagement.

Work

A few problems I've been trusted with.

AI Infrastructure · Enterprise

Secure AI-agent platform

Helped build the platform developers deploy AI agents on: Langfuse observability, a gateway governing model and token usage, and managed RAG. Turned ad-hoc, self-wired deployments into a self-serve path that takes minutes, with observability and cost governance built in by default.

Health-tech · HITRUST

ECS→EKS migration & CI/CD

Led automation for a months-long, multi-team migration from ECS to EKS, and built a reusable CI library that took new services from days of pipeline plumbing to minutes of YAML — with supply-chain scanning and hardened base images by default.

See all work

About

A small firm. High standards. A long view.

I'm Dinesh Auti. Before founding Auti, I spent ~9 years as an SRE inside the teams that built and ran the systems — migrations, platforms, observability, cost, and on-call. The work taught me that reliability is less about heroics than about what you put in place before the incident.

Auti is deliberately small. I take on a limited number of engagements so the work stays senior and hands-on — not spread thin across a growth-at-all-costs roster.

Learn more about how I work

Also from Auti

Good incident triage shouldn't require your best engineer.

tarkya.io

Tarka is an open-source AI investigation agent that ingests Prometheus alerts and delivers a structured triage report in under 60 seconds — what failed, how severe, and the exact commands to run next. Built from the same reliability engineering principles I bring to every engagement.

Stop being the human runbook.

Tarka encodes the investigation steps your senior engineers carry in their heads. Whoever picks up the alert can run a real, structured triage — without waking anyone up.

A full report before Slack catches fire.

27 diagnostic checks across Prometheus, Kubernetes, and logs. What failed, how severe, ranked hypotheses, and the exact commands to run — in under 60 seconds.

Your infra. Your data. No SaaS.

Apache 2.0, fully self-hosted. Sensitive incident data never leaves your environment. Run it from a laptop CLI or wire it into Alertmanager as an automated webhook pipeline.
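For a sense of what a webhook pipeline receives, here is a minimal sketch of reading an Alertmanager webhook payload and keeping only the alerts that are still firing. It is not Tarka's code: the function name `firing_alerts` and the sample alert data are hypothetical, and only the payload shape (`alerts`, `status`, `labels`, `annotations`, `startsAt`) follows Alertmanager's documented webhook format.

```python
import json

# Hypothetical sample in the shape Alertmanager POSTs to a webhook receiver.
SAMPLE_PAYLOAD = json.loads("""
{
  "version": "4",
  "status": "firing",
  "receiver": "triage",
  "commonLabels": {"alertname": "HighErrorRate"},
  "alerts": [
    {"status": "firing",
     "labels": {"alertname": "HighErrorRate", "severity": "critical",
                "namespace": "checkout"},
     "annotations": {"summary": "5xx rate above 5%"},
     "startsAt": "2024-01-01T03:00:00Z"},
    {"status": "resolved",
     "labels": {"alertname": "DiskPressure", "severity": "warning"},
     "annotations": {},
     "startsAt": "2024-01-01T02:00:00Z"}
  ]
}
""")


def firing_alerts(payload):
    """Return one summary dict per alert that is still firing."""
    summaries = []
    for alert in payload.get("alerts", []):
        # Resolved alerts need no triage, so skip them.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        summaries.append({
            "name": labels.get("alertname", "unknown"),
            "severity": labels.get("severity", "unknown"),
            "summary": alert.get("annotations", {}).get("summary", ""),
            "started": alert.get("startsAt", ""),
        })
    return summaries


if __name__ == "__main__":
    for item in firing_alerts(SAMPLE_PAYLOAD):
        print(item)
```

A real pipeline would sit behind an HTTP endpoint listed under `webhook_configs` in the Alertmanager receiver configuration; the parsing step above is the part that stays the same whichever server you put in front of it.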

Free · No email

Is your cloud bill a decision — or a monthly surprise?

Ten honest questions on cost visibility, waste, purchasing, and Kubernetes efficiency. You get a score out of 30 and the exact areas worth fixing first. Two minutes, no signup.

Take the Reality Check

Got a reliability problem worth solving?

I work best with founders and engineering leaders who have a real problem, a real deadline, and the judgment to know when they need senior help.

Start the conversation · Book a call

I take on a limited number of engagements at a time. If your timeline is tight, reach out early.
