AI eval Archives - Kato Coaching

Abstract mosaic pattern of interconnected spirals in copper and teal

The oracle has opinions

AI, AI eval / Katja Obring

This is part of a series where I write about what I’m learning in an AI evals and analytics course. Earlier posts cover the basics of evals, sniff tests, and quantitative evaluation with LLM-as-a-judge. In traditional testing, the oracle is trustworthy. You know what the right answer is, or at least you know who decides. […]

An intricate digital illustration featuring a large, circular mosaic design set against a background of teal and gold cracked-tile patterns. The central focus is a series of concentric rings and spirals. A prominent white outer ring is inscribed with dark, abstract symbols resembling ancient script or runic markings. Inside this, a spiral of blocky tiles in shades of teal, turquoise, and burnt orange winds toward a glowing golden-orange center. Thin gold lines branch out from the central circle across the surrounding teal mosaic, creating a map-like or biological appearance. The style combines elements of ancient artifacts with modern digital geometric art.

The Hard Part of AI Evals Isn’t the Tooling

AI, AI eval, Critical Thinking / Katja Obring

Additional thoughts on session three of the AI Evals and Analytics Playbook[1]. The first part is here[2]. Every major shift in how we build and ship software has been followed by a wave of tooling that automates the tractable part and leaves the actual problem to the practitioner. Agile gave us story pointing ceremonies and […]

A digital illustration in a steampunk style showing a woman operating a complex, copper-toned industrial machine. The woman, with her hair in a braided bun and wearing a dark dress with a brown apron, stands at a control panel with several round pressure gauges and a large lever. To her right, a conveyor belt carries square tiles marked with green checkmarks and red "X" symbols. A mechanical arm hangs over the belt, and the tiles lead toward a brick furnace filled with bright orange flames. The massive machine behind her is adorned with glowing teal lanterns, interlocking gears, and pipes emitting small puffs of steam. The overall atmosphere is warm and industrial, with a blend of historical and fantastical technology.

The Verification Trap

AI, AI eval, Critical Thinking / Katja Obring

Last week, someone posted a think piece in a Slack channel I'm in: long, earnest and well-structured, arguing that quality engineering needs to evolve for the age of agentic AI, that we need to stop thinking about quality as testing and start treating it as a systemic property. I read it and felt a […]

A minimalist, stylized illustration of a grid containing symbols on a cream-colored background. The grid is four columns wide by five rows tall, outlined in dark blue. Most of the cells contain simple dark blue icons such as dots, horizontal bars, and small squares. Two cells in the center stand out: one is a solid teal green rectangle, and the one to its right contains a tan-colored checkmark. The bottom-left and bottom-center cells are blank. The overall aesthetic is clean and geometric, resembling a dashboard or a data tracking table.

The anatomy of a metric

AI eval, Critical Thinking, DevOps, testing / Katja Obring

Session two closed with a question I couldn't answer.[1] When a product scores 78% on a given metric, what tells you whether that's good enough to ship? I flagged it as something session three would probably address, and it did, though not in the way I expected. The question can't be meaningfully answered until you've […]

The AI evals field chose a flawed tool and stuck with it

AI, AI eval, Critical Thinking, testing / Katja Obring

Session one left me with two things I hadn't resolved.[1] The first was a line the instructor said almost in passing: "the hard part is scalability, not automation." I wrote it down because it piqued my curiosity, but I couldn't quite work out what problem it was pointing at. The second was a question I kept […]