What four experiments taught me about model personality

A practical experiment in classifying model personalities and assigning them to planner, reviewer, and executor roles

Most people who use more than one frontier model eventually learn that models have different task strengths. One model is better for fast edits. Another is better for long-form reasoning. Another is better for exhaustive review. That part is not especially surprising anymore.

What I wanted to test was more specific: whether those differences could be described as personality. Not personality in the human sense, but a repeatable instinct that shows up across tasks. If that instinct can be named, tested, and connected to failure modes, then model selection becomes less vibes-based.

[Read More]

Prompt Forge: Multi-Model Prompt Evaluation with Snowflake Cortex

Score, compare, and optimize your prompts across 9 dimensions

Give three frontier models the same prompt, and you’ll get three distinct interpretations of what “good” means. I discovered this gradually, over months of working with AI coding assistants on real projects.

Early on, I’d occasionally switch models mid-task, sometimes because I was curious and sometimes because I wanted a second opinion on a tricky problem. I’d notice changes in the outputs, but I couldn’t pin down what was driving them. Was it my prompt? The task itself? The model? Some combination of all three? The variables were tangled together, and I wasn’t being rigorous enough to isolate them.

[Read More]

Four Signals, One Decision: How Ensemble AI Solves Unstructured Data Matching

A Weekend Build Powered by Snowflake Cortex AI and Cortex Code

Retail Data Harmonizer Dashboard

Some of the hardest work in retail analytics happens long before dashboards, forecasts, or machine learning models.

It begins with a much less glamorous challenge: making sure “the same product” is actually the same product across systems.

The average supermarket had 31,795 items on the shelf in 2024 [1]. Across suppliers, distributors, and point-of-sale systems, the data behind those items is often fragmented, inconsistent, and noisy. GS1 reports that retailers may need 10–15 interactions with suppliers to launch each SKU, and that they face about 15,000 issues with inaccurate product data per year on average [2].

[Read More]

The Ultimate Pair Programmer - Why AI Coding Needs Human Experience

vibecoding is an incredible partner, but it's not the senior dev.

I. Introduction: The AI Honeymoon

If you’re in tech, you’ve felt the magic. As a regular user of {vibecoding} and other AI-assisted tools, I’ve seen it firsthand: boilerplate code vanishes in seconds, complex functions appear faster than I could have typed them, and my overall workflow is genuinely accelerated. It feels like a superpower, a quantum leap in productivity.

But there’s a catch. Yes, it accelerates my development. However, I say that with more than 20 years of programming experience behind me. And that experience, I’ve learned, is more critical than ever.

[Read More]

When Three AIs Fixed a README - The Unanimous Verdict Nobody Expected

Line 184 Was the Problem. 4,776 Lines Were the Solution.

Image courtesy of Google Gemini

This is the third article in my AI Code-Off experiments series. You can read the first article and second article for context.

Line 184. That’s where the Quick Start section lived in our project README.

Think about that for a moment. A new developer lands on your project, excited to try it out, and has to scroll through 183 lines of architectural philosophy, memory bank explanations, and universal design principles before learning the single most important thing: how to actually use it.

[Read More]

When Three AIs Tried to Fix 1,717 Lines of Code

Three AIs, 3,808 Lines of Analysis, One Unanimous Winner

Image courtesy of Google Gemini

This blog is a continuation of my AI Code-Off experiments series. You can read the first article here: I Pitted Gemini, Claude, and GPT in a 4-Stage AI ‘Code-Off.’

1,717 lines. That’s how long my Streamlit UI rule, intended as a guide for LLMs and Agents, had grown.

This wasn’t documentation for humans—it was a rule file for AI coding assistants in Cursor. When you ask an AI to build a Streamlit dashboard, it consults files like this to understand best practices and requirements. But somewhere in those 1,717 lines, the guidance had become contradictory. Plotly or PyDeck for visualizations? The rule said both, with no clear preference. Duplicate sections appeared throughout. Critical production guidance (performance, accessibility, deployment) was completely missing.

[Read More]

I Pitted Gemini, Claude, and GPT in a 4-Stage AI 'Code-Off.'

The Winner Was Decided by a Syntax Error.

Image courtesy of Google Gemini

With over two decades of self-taught programming experience, I’ve seen AI revolutionize my coding over the last ten months. Daily use of LLMs has dramatically sped up projects, from quick prototyping and boilerplate generation to faster debugging. This frees me to focus on architectural decisions and complex problem-solving. However, effective AI use demands a strong grasp of programming fundamentals. LLMs aren’t perfect, and my experience has been crucial in quickly spotting and fixing bugs and logical flaws in AI-generated code.

[Read More]

Update on my blog hosting platform and workflow

Blog Hosting Platform

I’ve been working on my blog workflow recently. I’ve used Ghost as my blogging platform for a few years. At $5 per month, it is hard to beat the value of the service. However, my needs are fairly simple when it comes to blogging, so I wanted to get back to the basics. I decided to switch to a static site generator because I’m simply not using any of the advanced features of Ghost.

[Read More]

The Curse of Knowledge

Curse of Knowledge

TL;DR

The curse of knowledge happens far more than we realize. Effective communication is not just about conveying information but also about ensuring that your message is well-received and understood. By working to establish a shared understanding, a common context, and a consistent frame of reference, we can avoid confusion and foster a deeper connection with our audience. Let’s strive to communicate effectively and make a positive impact in our interactions with others.

[Read More]

Being More Active - The Sequel

TL;DR

This post is a bit about my background and what I envision for my blog going forward. I need to update my blog more frequently. An 8-year gap between posts is really long!


It’s been a while since my last post

I am updating my blog one more time. I failed to achieve my objective in my previous blog post, Being More Active, which I wrote in 2016. That post came three years after my last one in 2013. I suppose if we looked at the posting gaps as personal records to break, going from a three-year writing gap to an eight-year gap might be quite the record. I really should do better.

[Read More]