LLM engineering
18 posts
Stanford teaches LLMs by making you build one
What CS336 actually teaches LLM engineers, where the course exposes silent drift, and why the skills transfer directly to RAG, agents, and eval.
The bottleneck moved past the model
Notes from the Mistral AI Now summit on what the new enterprise stack means for automation pipelines and workforce transformation.
The refund letter addressed to Dear [Name]
Why ChatGPT's first output is a draft, not a deliverable, and what production AI systems actually require beyond the prompt.
Better AI isn't what separates winning deployments
Stanford studied 51 AI deployments and found a 71 vs 40 productivity gap. The difference was pipeline design, not model choice.
arXiv just raised the bar
arXiv's one-year ban on unchecked LLM errors signals a shift: validation pipelines, not better prompts, now define competent AI systems.
Complexity theory never said that
Complexity theory does not prove human-level ML is impossible. Here is what the theorems actually say and how to design AI systems around real constraints.
AI costs more than humans
Nvidia says AI costs more than human workers. The real issue is architecture, not compute price. Here is how to fix the unit economics.
Managed Agents pricing is an architecture decision
Claude Managed Agents pricing isn't a cost center; it's an orchestration lever. Here's how to evaluate it against real total cost of ownership.
How Production Systems Actually Work With LLMs, Not Which Model You Choose
Production-grade AI systems don't depend on choosing between Claude and ChatGPT. They rely on consistent engineering (input sanitization, output validation, fallback logic, and structured pipelines), regardless of the underlying LLM.
Running Gemma 4 Locally via Codex CLI: What Actually Works in Practice
Running Gemma 4 locally via Codex CLI offers isolation but not guaranteed consistency. Real reliability comes from input validation, output schema checks, and disciplined system design, not the model alone.
Why 'AI Agent in Seconds' Platforms Fail in Production
Most 'AI agent in seconds' platforms sacrifice reliability for speed. Real production use demands validation, state persistence, and observability, all of which most no-code tools lack. This post explains why quick deployments fail at scale and how to build systems that actually endure.
Why Cloudflare CLI Automation Fails Without Verification
Cloudflare CLI automation fails without verification. This post explains, without speculation or exaggerated risk claims, why input validation, output checking, and idempotency are essential for reliable deployments.