Production LLM Engineering
5 days · Intermediate → Advanced
End-to-end: routing, serving compilers, KV-cache, streaming, evals, cost telemetry. Hands-on with vLLM, TensorRT-LLM, and Triton on real models.
- Stand up a production-grade inference endpoint
- Tune throughput and latency with compilers and batching
- Wire cost-per-call telemetry that holds up in finance review