Prompt Engineering: Zero-Shot vs Few-Shot Prompting

This is Part 8 of the AI Agents series. Parts 1–7 covered how LLMs work, API usage, parameters, Groq, and running models locally. This post starts a new sub-series on Prompt Engineering — the discipline of getting reliable, accurate output from LLMs by changing how you ask, not what you ask.

1. Why prompting matters more than people think

When you get a bad response from an LLM, the instinct is to blame the model. Often, the real problem is the prompt.

An LLM has no context about you, your use case, or what format you need. It infers everything from what you give it. A vague request produces a vague response — not because the model is broken, but because it’s doing its best with incomplete information.

Prompt engineering is the practice of structuring inputs so the model consistently produces the output you actually want. The more complex or unusual the task, the more structure the prompt usually needs.

2. Zero-Shot prompting

Zero-shot means asking the model directly, with no examples. You give it a question or task and trust that it already knows what you need.

This works when:

The task is factual and unambiguous
The output format is standard and well-understood
The model was clearly trained on this kind of request

Examples where zero-shot is the right choice:

Task	Prompt
Basic math	`What is 3 + 2?`
Factual lookup	`What is the capital of India?`
Translation	`Translate "Hello, how are you?" into French`
Summarization	`Summarize this paragraph in two sentences: [text]`

These work zero-shot because LLMs are trained on enormous datasets that include math, geography, languages, and writing. There’s no need to explain what “translate” means or show an example French sentence.

Start here. Zero-shot first, every time. Only move to few-shot if results are inconsistent.

3. Few-Shot prompting

Few-shot means including examples in your prompt — input/output pairs that show the model exactly what you want before giving it the real task.

The model doesn’t “learn” from these examples permanently. It reads them as context and pattern-matches your request against them within the current prompt. One accurate example is often enough. Two or three make the pattern clearer.
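The prompts in this post embed the few-shot pairs in one prompt string. A common alternative layout puts each example in the chat `messages` list as an earlier user/assistant exchange. The sketch below is an assumption and not what the post's own code does. The model still reads these turns only as context for the current request. `few_shot_messages` is a hypothetical helper name, and passing an empty example list gives a plain zero-shot request.

```python
from typing import Dict, List, Sequence, Tuple

Message = Dict[str, str]


def few_shot_messages(
    system: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> List[Message]:
    """Build a chat `messages` list with each (input, output) example as a past exchange."""
    messages: List[Message] = [{"role": "system", "content": system}]
    for example_input, example_output in examples:
        # Each example looks like a user request the assistant already answered correctly.
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real task goes last, as the only unanswered user turn.
    messages.append({"role": "user", "content": query})
    return messages


zero_shot = few_shot_messages("Convert a title into a URL slug.", [], "This Is a New Post")
few_shot = few_shot_messages(
    "Convert a title into a URL slug.",
    [("This Is a New Post", "this-is-a-new-post")],
    "Nerchuko Academy Launched AI Agents Series",
)
print([m["role"] for m in few_shot])  # ['system', 'user', 'assistant', 'user']
```

Either list can be passed as `messages=` to the same `client.chat.completions.create(...)` call used in the Groq examples later in this post.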

When to use it:

The output format is custom or non-standard
Zero-shot results are inconsistent or wrong
The model doesn’t “know” your domain-specific conventions

4. Few-shot in practice: URL slugs

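URL slugs have specific rules: lowercase, hyphens instead of spaces, no special characters. A zero-shot prompt might get this right sometimes, but not consistently. Few-shot examples make the output far more consistent. They do not make it strictly deterministic, because the model still samples its response.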

import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

prompt = """Convert a title into a URL slug. Lowercase, hyphens for spaces, no special characters.

Input: "This Is a New Post"
Output: this-is-a-new-post

Input: "My Journey Into Deep Learning: A Beginner's Guide"
Output: my-journey-into-deep-learning-a-beginners-guide

Input: "Nerchuko Academy Launched AI Agents Series"
Output:"""

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)

The two examples show the model what to do with colons, apostrophes, and capitalization, so it has much less to guess.
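The slug rules are mechanical, so you can also compute the expected slug in plain Python and compare it with the model's answer while you test the prompt. The `slugify` function below is a hypothetical reference written for this check and is not part of the Groq SDK. It encodes the rules stated in the prompt, plus one assumption taken from the second example: apostrophes are dropped rather than turned into hyphens.

```python
import re


def slugify(title: str) -> str:
    """Reference slug: lowercase, apostrophes dropped, other non-alphanumerics become hyphens."""
    slug = title.lower()
    slug = re.sub(r"['\u2019]", "", slug)    # "beginner's" -> "beginners" (straight or curly apostrophe)
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # each run of spaces/punctuation -> a single hyphen
    return slug.strip("-")                    # no leading or trailing hyphens


for title in [
    "This Is a New Post",
    "My Journey Into Deep Learning: A Beginner's Guide",
    "Nerchuko Academy Launched AI Agents Series",
]:
    print(slugify(title))
```

While you iterate on the prompt, compare `response.choices[0].message.content.strip()` with `slugify(title)` over a list of tricky titles. Any mismatch shows an edge case your examples do not yet cover.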

5. System prompts + few-shot: structured JSON output

For structured output, combine a system prompt (sets the model’s role and constraints) with few-shot examples (shows the exact format). This is the pattern you’ll use in most real applications.

import os
import json
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

system_prompt = """You are an expert content assistant.
Given a blog post title, generate a URL-friendly slug and three relevant tags.
Rules:
- slug: lowercase, hyphens for spaces, no special characters
- tags: array of 3 strings, title case
Output strictly as JSON with keys "slug" and "tags". No explanation."""

few_shot_examples = """Examples:

Input: "My Journey Into Deep Learning: A Beginner's Guide"
Output: {"slug": "my-journey-into-deep-learning-a-beginners-guide", "tags": ["Deep Learning", "Machine Learning", "Python"]}

Input: "How to Bake the Perfect Bread"
Output: {"slug": "how-to-bake-the-perfect-bread", "tags": ["Baking", "Recipes", "Food"]}

Now process this input:
Input: "Nerchuko Academy Launched AI Agents Series"
Output:"""

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": few_shot_examples}
    ]
)

result = json.loads(response.choices[0].message.content)
print(result["slug"])
print(result["tags"])

The system prompt establishes the role and the rules. The few-shot examples in the user message define the exact format. Together they constrain the model tightly enough that json.loads() on the output usually succeeds, though production code should still handle the case where it does not.
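Even with both constraints in place, `json.loads()` raises an error if the model adds a sentence of explanation or wraps the JSON in a Markdown code fence, so it pays to parse defensively. The sketch below is a minimal version that assumes the `{"slug": ..., "tags": [...]}` shape from the system prompt above. `parse_slug_tags` is a hypothetical helper. The fence-unwrapping covers one common failure, not every possible one.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built here so the source has no literal triple backtick


def parse_slug_tags(raw: str) -> dict:
    """Parse a model reply into {"slug": str, "tags": [3 strings]}, or raise ValueError."""
    text = raw.strip()
    # Unwrap a fenced reply such as FENCE + "json\n{...}\n" + FENCE.
    if text.startswith(FENCE) and text.endswith(FENCE) and "\n" in text:
        text = text[text.index("\n") + 1 : -len(FENCE)].strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {raw!r}") from exc
    if not isinstance(data, dict) or set(data) != {"slug", "tags"}:
        raise ValueError(f"expected exactly the keys 'slug' and 'tags', got {data!r}")
    if not isinstance(data["slug"], str):
        raise ValueError(f"slug must be a string, got {data['slug']!r}")
    tags = data["tags"]
    if not (isinstance(tags, list) and len(tags) == 3 and all(isinstance(t, str) for t in tags)):
        raise ValueError(f"expected a list of 3 string tags, got {tags!r}")
    return data


# Simulated fenced reply for illustration, not actual model output.
reply = FENCE + 'json\n{"slug": "nerchuko-academy-launched-ai-agents-series", "tags": ["AI", "Agents", "Education"]}\n' + FENCE
result = parse_slug_tags(reply)
print(result["slug"])  # nerchuko-academy-launched-ai-agents-series
```

In the Groq example above, `result = parse_slug_tags(response.choices[0].message.content)` would replace the bare `json.loads(...)` call. Groq's chat completions API also offers a JSON mode (`response_format={"type": "json_object"}`) on models that support it, which reduces malformed replies. A check like this still catches wrong keys or the wrong number of tags.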

6. When few-shot is essential: custom domains

Two scenarios where zero-shot is very likely to fail and few-shot is usually the practical fix:

Custom language or syntax. Suppose your company built a query language called “Nerchuko Language” where SQL’s SELECT is written as FETCH and GROUP BY is written as FACET. The model has never seen Nerchuko Language in training, so a zero-shot request will most likely come back as plain SQL.

With few-shot, you show SQL → Nerchuko translation pairs in the prompt. The model infers the mapping from those pairs and can then translate new SQL into Nerchuko queries.

prompt = """Translate SQL to Nerchuko Language.

SQL: SELECT name, COUNT(*) FROM users GROUP BY name
Nerchuko: FETCH name, COUNT(*) FROM users FACET name

SQL: SELECT city, AVG(age) FROM employees GROUP BY city
Nerchuko: FETCH city, AVG(age) FROM employees FACET city

SQL: SELECT product, SUM(revenue) FROM sales GROUP BY product
Nerchuko:"""
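Every few-shot prompt in this post follows the same layout. It opens with an instruction, then gives labeled input/output pairs separated by blank lines, and ends with a final input whose output label is left empty for the model to complete. A small helper keeps that layout consistent as you add or remove examples. `build_few_shot_prompt` is a hypothetical name, and the sketch below reproduces the Nerchuko prompt above exactly.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Input",
    output_label: str = "Output",
) -> str:
    """Join an instruction, worked examples, and the real query into one few-shot prompt."""
    blocks = [instruction]
    for example_input, example_output in examples:
        blocks.append(f"{input_label}: {example_input}\n{output_label}: {example_output}")
    # Leave the last output label empty so the model's continuation is the answer.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


nerchuko_prompt = build_few_shot_prompt(
    "Translate SQL to Nerchuko Language.",
    [
        ("SELECT name, COUNT(*) FROM users GROUP BY name",
         "FETCH name, COUNT(*) FROM users FACET name"),
        ("SELECT city, AVG(age) FROM employees GROUP BY city",
         "FETCH city, AVG(age) FROM employees FACET city"),
    ],
    "SELECT product, SUM(revenue) FROM sales GROUP BY product",
    input_label="SQL",
    output_label="Nerchuko",
)
print(nerchuko_prompt)
```

Adding an edge-case example then means appending one tuple to the list. You no longer have to hand-edit the blank lines and labels in a long string.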

Custom classification labels. A standard LLM can classify sentiment as Positive or Negative zero-shot. But suppose your pipeline expects a five-point scale (Highly Positive, Positive, Neutral, Negative, Highly Negative) with boundaries you define. The model has no way to know where those boundaries fall. Few-shot examples that show which kind of text maps to which label make the classification far more reliable.

prompt = """Classify customer review sentiment using these labels: Highly Positive, Positive, Neutral, Negative, Highly Negative.

Review: "This product completely changed my life, absolutely incredible!"
Sentiment: Highly Positive

Review: "It works fine, does what it says."
Sentiment: Neutral

Review: "Terrible experience, broken on arrival and no support."
Sentiment: Highly Negative

Review: "Good quality, arrived on time, happy with the purchase."
Sentiment:"""
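Custom labels also need checking on the way out, because the model may add a period, change the capitalization, or echo the `Sentiment:` prefix. The sketch below assumes the five labels from the prompt above. `parse_sentiment` is a hypothetical helper that maps the reply onto exactly one allowed label or raises an error.

```python
ALLOWED_LABELS = ("Highly Positive", "Positive", "Neutral", "Negative", "Highly Negative")


def parse_sentiment(reply: str) -> str:
    """Map a model reply onto one of ALLOWED_LABELS, or raise ValueError."""
    cleaned = reply.strip().rstrip(".").strip()
    # Drop an echoed field name such as "Sentiment: Positive".
    if cleaned.lower().startswith("sentiment:"):
        cleaned = cleaned[len("sentiment:"):].strip()
    for label in ALLOWED_LABELS:
        # Whole-string comparison: a substring test would match "Positive" inside "Highly Positive".
        if cleaned.lower() == label.lower():
            return label
    raise ValueError(f"unexpected sentiment label: {reply!r}")


print(parse_sentiment("Sentiment: Positive."))  # Positive
```

The helper rejects anything outside the label set instead of guessing. A downstream pipeline then sees either a valid label or an explicit error, and never a near-miss string.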

7. Zero-shot vs few-shot: decision rule

Start with zero-shot.
│
├─ Output is correct and consistent? → Ship it.
│
└─ Output is wrong, inconsistent, or in the wrong format?
   │
   ├─ Add 1–2 examples (few-shot) → Test again.
   │
   └─ Still inconsistent? → Add more examples or move to chain-of-thought (next post).

Most prompts end at zero-shot or with 1–2 examples. You rarely need more than three examples for format-based tasks. If you do, the problem is usually under-specified rules, not example count.
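You can make the "correct and consistent?" check in the decision rule concrete. Send the same prompt several times and measure how often the most common answer appears. The sketch below keeps this minimal. `generate` is a hypothetical stand-in for whatever function wraps your API call, for example the Groq call earlier in this post returning `response.choices[0].message.content`. A fake model is used here so the check runs without a network.

```python
from collections import Counter
from typing import Callable, Tuple


def consistency(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Tuple[str, float]:
    """Run one prompt several times; return the most common output and its share of runs."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outputs = [generate(prompt).strip() for _ in range(runs)]
    most_common, count = Counter(outputs).most_common(1)[0]
    return most_common, count / runs


# A fake model for illustration: it answers one way three times, then drifts.
fake_replies = iter(["this-is-a-new-post"] * 3 + ["This-Is-A-New-Post"] * 2)
answer, share = consistency(lambda _prompt: next(fake_replies), "Convert to a slug: This Is a New Post")
print(answer, share)  # this-is-a-new-post 0.6
```

A share below 1.0 points to the "add examples" branch of the decision rule. How low a share you can tolerate depends on your application, and the sampling temperature you choose also affects how much outputs vary between runs.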

8. Practical checklist before shipping a prompt

Before using any prompt in production:

Define the exact output format you expect (JSON schema, string pattern, etc.)
Test zero-shot first — saves time if it already works
If adding examples, make sure they cover edge cases (punctuation, empty inputs, capitalization)
Pin the model name — different models respond differently to the same prompt
Log prompt + output in development so you can debug regressions when you change the prompt
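For the logging item, one simple way to keep a history of prompts and outputs is an append-only JSON Lines file. The sketch below assumes a local file is acceptable during development. `log_llm_call` and its record fields are hypothetical choices. Recording the model name in every record also helps with the "pin the model name" item.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Dict, List


def log_llm_call(path: Path, model: str, messages: List[Dict[str, str]], output: str) -> dict:
    """Append one prompt/response record as a single JSON line and return it."""
    record = {
        "timestamp": time.time(),  # seconds since the epoch, for ordering runs
        "model": model,            # the pinned model name used for this call
        "messages": messages,      # the exact prompt sent, system message included
        "output": output,          # the raw text returned, before any parsing
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "prompt_log.jsonl"
    # "5" is a simulated reply for illustration, not actual model output.
    log_llm_call(log_path, "llama-3.3-70b-versatile",
                 [{"role": "user", "content": "What is 3 + 2?"}], "5")
    lines = log_path.read_text(encoding="utf-8").splitlines()
    print(len(lines), json.loads(lines[0])["model"])  # 1 llama-3.3-70b-versatile
```

Each record keeps the raw output before any parsing. When a prompt change causes a regression, you can diff old and new outputs for the same input.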

What’s next

Part 9 covers three advanced prompting techniques: Chain of Thought (making the model reason step by step before answering), Self-Consistency (running the same prompt multiple times and taking the majority answer), and Tree of Thoughts (exploring multiple reasoning branches in parallel). These help on complex multi-step tasks where zero-shot and few-shot prompting fall short.
