Model Selection and the Large vs Small Trade-off
Over the last few months, I’ve moved from experimenting with chatbots to engineering full generative-AI systems, tools that don’t just talk but work: they extract, validate, generate, and report.
These posts are a reflection of what I’ve learned along the way — about models, prompts, APIs, and design choices that make LLMs practical in production.
This first part looks at model selection — not in abstract benchmarks, but in the realities of development and deployment.
The Two Worlds: Large vs Small Models
I use large models for development — when I need reasoning, creativity, or cross-domain insight.
They are my partners in architecture, design, coding, debugging, testing, and research.
They reason broadly, explain alternatives, and are forgiving when I haven’t defined a task precisely.
By contrast, I use small models for deployment — where privacy, latency, and reproducibility matter.
Once a workflow is clear and structured, a smaller instruction-tuned model executes it faster, cheaper, and often more reliably.
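To make the deployment side concrete, here is a minimal sketch, assuming a small instruction-tuned model served behind an OpenAI-compatible chat endpoint. The URL, model name, and task wording are illustrative assumptions, not part of any specific setup described above.

```python
import requests

# Assumption: a small instruction-tuned model served locally behind an
# OpenAI-compatible /v1/chat/completions endpoint (URL and model name are hypothetical).
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "small-instruct-7b"

def run_structured_task(system_msg: str, user_msg: str) -> str:
    """Execute a well-defined, pre-structured task on the small deployment model."""
    payload = {
        "model": MODEL_NAME,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.0,  # favour reproducibility over creativity at deployment time
    }
    response = requests.post(LOCAL_ENDPOINT, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

The point of the sketch is the division of labour: the hard thinking happens earlier, with a large model in the loop; at deployment time the small model only executes a task that is already fully specified.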
Why Larger Isn’t Always Better
On average, a large model produces fewer errors a priori: its predictions generalise more consistently to unseen cases.
But that same confidence can be deceptive: when it does err, it often does so with high certainty — making the error less visible and harder to correct.
Smaller models may make more frequent mistakes, but their uncertainty can actually help expose flaws earlier, allowing a posteriori corrections and stronger validation loops.
In other words:
Large models are confident teachers. Small models are humble apprentices.
The trick is to know when to learn from one — and when to rely on the other.
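To make the a-posteriori correction idea concrete, here is a minimal sketch of a validation loop around a small model. The `generate` callable stands in for whatever inference client is in use, and the required fields are an illustrative schema, not a fixed one.

```python
import json
from typing import Callable

REQUIRED_FIELDS = {"invoice_id", "total", "currency"}  # illustrative schema

def generate_with_validation(generate: Callable[[str], str],
                             prompt: str,
                             max_attempts: int = 3) -> dict:
    """Call a small model, validate its JSON output, and retry with feedback."""
    last_error = "no attempt made"
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
        else:
            if not isinstance(data, dict):
                last_error = "expected a JSON object"
            elif REQUIRED_FIELDS.issubset(data):
                return data  # passed the a-posteriori check
            else:
                last_error = f"missing fields: {sorted(REQUIRED_FIELDS.difference(data))}"
        # Feed the failure back so the next attempt can correct itself.
        prompt = (f"{prompt}\n\nYour previous answer failed validation "
                  f"({last_error}). Return valid JSON with the required fields only.")
    raise ValueError(f"validation failed after {max_attempts} attempts: {last_error}")
```

A loop like this is exactly where the small model's "humility" pays off: its mistakes surface as explicit validation failures that the pipeline can catch and correct, rather than as confident answers that slip through.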
Outlook
In the next posts, I’ll cover:
- System Prompts and Role Separation — how I structure `system_msg`, `user_msg`, and `style` layers for predictable output.
- Prompt APIs and Templates — how Jinja templates and JSON schemas define logic before generation.
- Client/Server and Deployment Models — running local vs remote inference, caching, and parametrisation.
Together, these reflections form a blueprint for moving from prompting to building.
Tags: ai, applied-ai-systems, agentic-pipelines, precision-ai