LLM

What four experiments taught me about model personality

A practical experiment in classifying model personalities and assigning them to planner, reviewer, and executor roles

Posted on May 5, 2026 | 29 minutes | 6101 words | Michael Young

Most people who use more than one frontier model eventually learn that models have different task strengths. One model is better for fast edits. Another is better for long-form reasoning. Another is better for exhaustive review. That part is not especially surprising anymore.

What I wanted to test was more specific: whether those differences could be described as personality. Not personality in the human sense, but a repeatable instinct that shows up across tasks. If that instinct can be named, tested, and connected to failure modes, then model selection becomes less vibes-based.

[Read More]

llm ai-tools model-evaluation prompt-engineering ai-agents model-selection software-engineering

Prompt Forge: Multi-Model Prompt Evaluation with Snowflake Cortex

Score, compare, and optimize your prompts across 9 dimensions

Posted on May 1, 2026 | 12 minutes | 2431 words | Michael Young

Give three frontier models the same prompt, and you’ll get three distinct interpretations of what “good” means. I discovered this gradually, over months of working with AI coding assistants on real projects.

Early on, I’d occasionally switch models mid-task, sometimes because I was curious and sometimes because I wanted a second opinion on a tricky problem. I’d notice changes in the outputs, but I couldn’t pin down what was driving them. Was it my prompt? The task itself? The model? Some combination of all three? The variables were tangled together, and I wasn’t being rigorous enough to isolate them.

[Read More]

llm snowflake cortex prompt-engineering ai-tools

The Ultimate Pair Programmer - Why AI Coding Needs Human Experience

Vibecoding is an incredible partner, but it's not the senior dev.

Posted on November 12, 2025 | 7 minutes | 1296 words | Michael Young

I. Introduction: The AI Honeymoon

If you’re in tech, you’ve felt the magic. As a regular user of vibecoding and other AI-assisted tools, I’ve seen it firsthand: boilerplate code vanishes in seconds, complex functions appear faster than I could have typed them, and my overall workflow is genuinely accelerated. It feels like a superpower, a quantum leap in productivity.

But there’s a catch. Yes, it accelerates my development; however, I say that as someone with more than 20 years of programming experience. And that experience, I’ve learned, is more critical than ever.

[Read More]

vibecoding pairprogamming llm

When Three AIs Fixed a README - The Unanimous Verdict Nobody Expected

Line 184 Was the Problem. 4,776 Lines Were the Solution.

Posted on November 1, 2025 | 19 minutes | 3902 words | Michael Young

Image courtesy of Google Gemini

This is the third article in my AI Code-Off experiments series. You can read the first article and second article for context.

Line 184. That’s where the Quick Start section lived in our project README.

Think about that for a moment. A new developer lands on your project, excited to try it out, and has to scroll through 183 lines of architectural philosophy, memory bank explanations, and universal design principles before learning the single most important thing: how to actually use it.

[Read More]

llm ai/ml anthropic openai google

When Three AIs Tried to Fix 1,717 Lines of Code

Three AIs, 3,808 Lines of Analysis, One Unanimous Winner

Posted on October 30, 2025 | 13 minutes | 2657 words | Michael Young

Image courtesy of Google Gemini

This post is a continuation of my AI Code-Off experiments series. You can read the first article here: I Pitted Gemini, Claude, and GPT in a 4-Stage AI ‘Code-Off.’

1,717 lines. That’s how long my Streamlit UI rule, intended as a guide for LLMs and Agents, had grown.

This wasn’t documentation for humans—it was a rule file for AI coding assistants in Cursor. When you ask an AI to build a Streamlit dashboard, it consults files like this to understand best practices and requirements. But somewhere in those 1,717 lines, the guidance had become contradictory. Plotly or PyDeck for visualizations? The rule said both, with no clear preference. Duplicate sections appeared throughout. Critical production guidance (performance, accessibility, deployment) was completely missing.

[Read More]

llm ai/ml anthropic openai google

I Pitted Gemini, Claude, and GPT in a 4-Stage AI 'Code-Off.'

The Winner Was Decided by a Syntax Error.

Posted on October 25, 2025 | 9 minutes | 1825 words | Michael Young

Image courtesy of Google Gemini

With over two decades of self-taught programming experience, I’ve seen AI revolutionize my coding over the last ten months. Daily use of LLMs has dramatically sped up projects, from quick prototyping and boilerplate generation to faster debugging. This frees me to focus on architectural decisions and complex problem-solving. However, effective AI use demands a strong grasp of programming fundamentals. LLMs aren’t perfect, and my experience has been crucial in quickly spotting and fixing bugs and logical flaws in AI-generated code.

[Read More]

llm ai/ml anthropic openai google