Prompt Forge: Multi-Model Prompt Evaluation with Snowflake Cortex

Score, compare, and optimize your prompts across 9 dimensions

Give three frontier models the same prompt, and you’ll get three distinct interpretations of what “good” means. I discovered this gradually, over months of working with AI coding assistants on real projects.

Early on, I’d occasionally switch models mid-task, sometimes because I was curious and sometimes because I wanted a second opinion on a tricky problem. I’d notice changes in the outputs, but I couldn’t pin down what was driving them. Was it my prompt? The task itself? The model? Some combination of all three? The variables were tangled together, and I wasn’t being rigorous enough to isolate them.

Four Signals, One Decision: How Ensemble AI Solves Unstructured Data Matching

A Weekend Build Powered by Snowflake Cortex AI and Cortex Code

Retail Data Harmonizer Dashboard

Some of the hardest work in retail analytics happens long before dashboards, forecasts, or machine learning models.

It begins with a much less glamorous challenge: making sure “the same product” is actually the same product across systems.

The average supermarket carried 31,795 items on its shelves in 2024.¹ Across suppliers, distributors, and point-of-sale systems, the data behind those items is often fragmented, inconsistent, and noisy. GS1 reports that retailers may need 10–15 interactions with suppliers to launch each SKU, and that they face roughly 15,000 issues with inaccurate product data per year on average.²
