ReAct and RAG: Giving LLMs Access to the External World
An LLM's knowledge stops at its training cutoff and it can't access your private data. ReAct and RAG are two prompt engineering frameworks that address both problems, turning a plain LLM into an agent that can act and retrieve.
This is Part 10 of the AI Agents series and the final part of the prompt engineering sub-series. Parts 8–9 covered Zero-Shot, Few-Shot, Chain-of-Thought, Self-Consistency, and Tree of Thoughts — techniques for improving how an LLM reasons over the information you give it.
This post covers the next level: giving the LLM access to information it doesn’t have at all — live data from the internet, real-time APIs, and your own private documents.
1. The two gaps prompting alone can’t fix
Standard prompting — even with advanced techniques — is still limited to what the model learned during training:
- Temporal gap: Training data has a cutoff. Ask an LLM for last weekend’s box office results or the current temperature in Hyderabad and it cannot answer accurately. It will guess.
- Private knowledge gap: Your company’s internal documents, policies, and data were never in the training set. Ask about “Nerchuko’s remote work policy” and it will hallucinate a plausible-sounding but wrong answer.
Two frameworks address these gaps: ReAct for live external data, RAG for private internal knowledge.
2. ReAct: Reason + Act
ReAct (Reason + Act) gives an LLM access to external tools — web search, APIs, databases, calculators — and lets it decide when and how to use them.
The model doesn’t just generate text. It operates in a loop:
Reason → Act (call a tool) → Observe (read the result) → Reason again
It keeps looping until it has enough information to answer confidently.
Example — current weather:
```
User: What's the current temperature in Hyderabad?
Reason: I don't have real-time weather data. I need to call a weather API.
Act: call weather_api(city="Hyderabad")
Observe: {"temperature": 32, "condition": "partly cloudy"}
Reason: I now have the data. I can answer.
Answer: It's currently 32°C and partly cloudy in Hyderabad.
```
Without ReAct, the model guesses a temperature based on seasonal patterns in its training data. With ReAct, it fetches the actual current reading.
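The loop itself does not depend on any provider feature. Here is a minimal, provider-agnostic sketch, assuming the model writes plain-text `Reason:`, `Act:`, and `Answer:` lines like the trace above. The `react_loop` function, the `Act: tool_name({json args})` line format, and `scripted_llm` (a fake model that replays the weather example) are hypothetical names for illustration. A real setup would call an actual LLM, put the format instructions in the prompt, and typically set a stop sequence so the model does not write its own `Observe:` line.

```python
import json
import re
from typing import Callable

# Hypothetical tool for this sketch: canned data stands in for a weather API
def get_weather(city: str) -> dict:
    return {"city": city, "temperature": 32, "condition": "partly cloudy"}

TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}

# Assumed line formats: 'Act: get_weather({"city": "Hyderabad"})' and 'Answer: ...'
ACT_PATTERN = re.compile(r"^Act:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
ANSWER_PATTERN = re.compile(r"^Answer:\s*(.+)$", re.MULTILINE)

def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Repeat Reason -> Act -> Observe until the model emits an Answer line."""
    transcript = f"User: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # the model reasons, then either acts or answers
        transcript += step + "\n"
        answer = ANSWER_PATTERN.search(step)
        if answer:
            return answer.group(1).strip()
        action = ACT_PATTERN.search(step)
        if action:
            name, raw_args = action.group(1), action.group(2)
            if name in TOOLS:
                observation = TOOLS[name](**json.loads(raw_args))
            else:
                observation = {"error": f"unknown tool: {name}"}
            # Observe: append the tool result so the next Reason step can read it
            transcript += f"Observe: {json.dumps(observation)}\n"
    return "Stopped: step limit reached without an answer."

# Fake model that replays the weather trace, for demonstration only
def scripted_llm(transcript: str) -> str:
    if "Observe:" not in transcript:
        return ('Reason: I need live weather data.\n'
                'Act: get_weather({"city": "Hyderabad"})')
    return ("Reason: I now have the data.\n"
            "Answer: It's currently 32°C and partly cloudy in Hyderabad.")

print(react_loop(scripted_llm, "What's the current temperature in Hyderabad?"))
```

The `max_steps` cap matters: without it, a model that never emits an `Answer:` line would loop forever.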
Implementing ReAct with tool calling
Most LLM providers support tool calling natively. You define the tools available, and the model decides when to call them.
```python
import os
import json
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Define the tools the model can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Simulate the tool's actual implementation
def get_weather(city: str) -> dict:
    # In production: call a real weather API here
    return {"city": city, "temperature": 32, "condition": "partly cloudy"}

messages = [{"role": "user", "content": "What's the current temperature in Hyderabad?"}]

# First call: model reasons and decides whether to call the tool
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools
)
response_message = response.choices[0].message

# Check if the model wants to call a tool
tool_calls = response_message.tool_calls
if tool_calls:
    # Keep the assistant's tool-call message in the conversation history
    messages.append(response_message)
    # Execute every requested call and feed each result back as a "tool" message
    for tool_call in tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    # Second call: model observes the results and generates the final answer
    final_response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages
    )
    print(final_response.choices[0].message.content)
else:
    # The model answered directly without needing a tool
    print(response_message.content)
```
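The model handles the reasoning: it decides whether a tool is needed, chooses the arguments, and interprets the result once it is fed back. Your code performs the Act step by running the actual tool function and returning its output as the observation. In a real application, get_weather would call a live weather API instead of returning hardcoded data.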
When to use ReAct
ReAct is the right choice when answers require:
- Real-time data: weather, stock prices, sports scores, news
- Computation: a calculator tool for math that needs guaranteed accuracy
- External actions: sending emails, querying databases, running code
- Information beyond the training cutoff: anything recent
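For the computation case, the point of a tool is that the arithmetic is done exactly instead of being predicted token by token. A minimal sketch of a calculator tool, assuming plain arithmetic is enough: `calculate` walks the expression's syntax tree with Python's `ast` module and allows only a handful of operators, so model-supplied input is never passed to `eval()`. The name `calculate` and its schema are illustrative, written in the same shape as the `tools` list in the Groq example above.

```python
import ast
import operator

# Operators the calculator accepts; any other syntax is rejected
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")
    return _eval(ast.parse(expression, mode="eval").body)

# Hypothetical tool definition, same structure as the weather tool
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate an arithmetic expression such as '1234 * 5678'",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Arithmetic expression"}
            },
            "required": ["expression"],
        },
    },
}

print(calculate("1234 * 5678"))  # → 7006652
```

Rejected input raises `ValueError`, which the calling code can catch and send back to the model as an error observation. Exponentiation is still allowed, so a production version would also cap very large exponents.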
3. RAG: Retrieval-Augmented Generation
RAG solves the private knowledge problem. Instead of fine-tuning a model on your internal documents (expensive, slow, and stale the moment documents update), RAG retrieves the relevant documents at query time and includes them in the prompt.
Think of it as an open-book test. The LLM doesn’t need to have memorized your company handbook — it just needs to be handed the right page before it answers.
How RAG works
```
User query
    │
    ▼
Convert query to vector embedding
    │
    ▼
Search vector database for similar document chunks
    │
    ▼
Retrieve top-k matching chunks
    │
    ▼
Inject chunks into prompt as context
    │
    ▼
LLM answers based only on the provided context
```
The vector database stores your documents as numerical embeddings — representations of meaning rather than exact text. When a question comes in, it’s embedded the same way and matched against stored chunks by similarity. Only the most relevant chunks are retrieved.
This keeps the context window manageable. You’re not dumping an entire 500-page handbook into every prompt — you’re retrieving the 2–3 sections that actually answer the question.
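To make the retrieval step concrete without an embedding model or vector database, here is a toy sketch that substitutes bag-of-words count vectors for real embeddings and ranks chunks by cosine similarity. The function names and the two non-holiday sample chunks are illustrative assumptions. Word counts only capture shared vocabulary, not meaning, so a real system would replace `embed` with an embedding model and the in-memory list with a vector store such as ChromaDB.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk once, like loading documents into a vector database."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Embed the query the same way, rank chunks by similarity, return the top-k."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

chunks = [
    "Nerchuko employees are entitled to 22 paid holidays per year.",
    "Remote work requires manager approval and a signed agreement.",
    "Expense reports must be submitted within 30 days.",
]
index = build_index(chunks)
print(retrieve("How many paid holidays do employees get?", index, k=1))
# → ['Nerchuko employees are entitled to 22 paid holidays per year.']
```

Only the top-k chunks leave `retrieve`, which is what keeps the prompt small no matter how many chunks the index holds.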
RAG prompt pattern
Once you’ve retrieved the relevant document chunks, the prompt structure is straightforward:
```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def answer_from_context(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)
    prompt = f"""Answer the question using only the information provided in the context below.
If the answer is not in the context, say "I don't have that information."
Do not use any outside knowledge.

Context:
{context}

Question: {question}"""
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Example: company policy documents retrieved from vector DB
retrieved = [
    "Nerchuko employees are entitled to 22 paid holidays per year.",
    "Holidays include national public holidays and company-specific days as announced annually."
]
print(answer_from_context("How many holidays do Nerchuko employees get?", retrieved))
```
The instruction "Do not use any outside knowledge" is critical. Without it, the model is more likely to blend retrieved content with its training data and hallucinate. Keeping the answer confined to the retrieved chunks is what makes it traceable to source documents.
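Traceability is easier to check when each chunk carries its source into the prompt. One possible variant of the prompt in `answer_from_context`, sketched under assumptions: chunks are dicts with `source` and `text` keys (a format chosen for this example, not a standard), each is numbered in the context, and the model is asked to cite the numbers it used. The source labels in the usage example are hypothetical.

```python
def build_cited_prompt(question: str, chunks: list[dict]) -> str:
    """Build a grounded prompt whose context passages are numbered and source-labelled.

    Each chunk is assumed to look like {"source": "...", "text": "..."}.
    """
    context = "\n\n".join(
        f"[{i}] (source: {chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    instructions = (
        "Answer the question using only the information provided in the context below.\n"
        "Cite the numbered passages you used, for example [1].\n"
        'If the answer is not in the context, say "I don\'t have that information."\n'
        "Do not use any outside knowledge."
    )
    return f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"

# Hypothetical source labels, for illustration only
policy_chunks = [
    {"source": "hr-handbook/holidays", "text": "Nerchuko employees are entitled to 22 paid holidays per year."},
    {"source": "hr-handbook/holidays", "text": "Holidays include national public holidays and company-specific days as announced annually."},
]
prompt = build_cited_prompt("How many holidays do Nerchuko employees get?", policy_chunks)
print(prompt)
```

If the answer cites `[1]`, you can map it straight back to `policy_chunks[0]["source"]`.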
Why not just fine-tune instead?
Fine-tuning bakes knowledge into the model weights. RAG keeps it in documents.
| | Fine-tuning | RAG |
|---|---|---|
| When documents change | Retrain the model | Update the vector database |
| Cost | High | Low |
| Answer traceability | Hard | Easy (you know the source chunk) |
| Good for | Behavior/style changes | Knowledge/facts |
For knowledge that changes (policies, product docs, FAQs), RAG is almost always the better choice. Fine-tuning is for changing how the model behaves, not what it knows.
4. ReAct vs RAG: which one to use
| | ReAct | RAG |
|---|---|---|
| Problem | Real-time or external data | Private or internal knowledge |
| Data source | APIs, web, tools | Your own documents |
| Latency | Depends on tool response time | Depends on vector search speed |
| Hallucination risk | Reduced (grounded in tool results) | Reduced (grounded in retrieved docs) |
| Use case | "What’s the weather?" | "What’s our refund policy?" |
They’re not mutually exclusive. A real AI agent often uses both: RAG for internal knowledge and ReAct for anything requiring live external data.
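One common way to combine them, sketched here under assumptions, is to expose retrieval as just another tool, so a single tool-calling loop can choose between internal documents and live data. `POLICY_CHUNKS`, `search_policies`, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names, and the keyword overlap inside `search_policies` is only a placeholder for the vector search described in the RAG section.

```python
import json
import re
from typing import Callable

# Hypothetical internal documents
POLICY_CHUNKS = [
    "Employees get 22 paid holidays per year.",
    "Refund requests are accepted within 14 days of purchase.",
]

def search_policies(query: str) -> dict:
    """RAG exposed as a tool; naive keyword overlap stands in for vector search."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = [c for c in POLICY_CHUNKS if words & set(re.findall(r"[a-z0-9]+", c.lower()))]
    return {"chunks": hits[:3]}

def get_weather(city: str) -> dict:
    """Live-data tool; canned response stands in for a real weather API."""
    return {"city": city, "temperature": 32, "condition": "partly cloudy"}

TOOL_REGISTRY: dict[str, Callable[..., dict]] = {
    "search_policies": search_policies,
    "get_weather": get_weather,
}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Route one tool call to its function and return JSON for a "tool" message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(func(**json.loads(arguments_json)))

print(run_tool_call("search_policies", '{"query": "What is our refund policy?"}'))
print(run_tool_call("get_weather", '{"city": "Hyderabad"}'))
```

In the Groq example, you would pass each call's `function.name` and `function.arguments` to `run_tool_call` and use the returned string as the `content` of the `"tool"` message, after adding a schema for each tool to `tools`.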
5. The full picture: plain LLM → AI Agent
Looking back at the full series:
- Parts 1–5: What LLMs are, how to use APIs, how to control output
- Parts 6–7: Open-source models via Groq and locally via Ollama
- Parts 8–10: Prompt engineering — from basic prompting to tool use and retrieval
The progression across Parts 8–10 traces how prompts evolve:
- Zero/Few-Shot — tell the model what format you want
- Chain-of-Thought — tell the model how to reason
- ReAct + RAG — give the model external capabilities
At this point, you have a complete foundation. An LLM with a well-structured prompt, tool access via ReAct, and a document retrieval layer via RAG is, functionally, an AI agent: it can reason, act, retrieve, and ground its responses in live and private information.
What’s next
Part 11 goes deep on RAG specifically — how the retrieval layer actually works: chunking strategies, vector embeddings, cosine similarity vs Euclidean distance, and a full implementation using ChromaDB.
Full video walkthrough is embedded above.