What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by giving them access to external knowledge. Instead of relying solely on what the model learned during training, a RAG system retrieves relevant documents from a vector database and passes them to the LLM as context, which tends to produce more accurate, up-to-date, and domain-specific responses.
💡 Why RAG Matters
LLMs have a knowledge cutoff and can hallucinate facts. RAG grounds the model in your own data, which makes it a good fit for customer support bots, internal knowledge bases, legal document analysis, and similar applications.
Architecture Overview
Your data → Chunks → Vectors → FAISS → Retrieved Chunks → LLM → Grounded Answer
Step 1: Imports & Setup
First, import all required libraries:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from pathlib import Path
print("Ok, let's start with RAG demo")
Step 2: Load the Document
Load your text file using TextLoader and Path:
# Get file
file_path = Path("../data/information.txt").resolve()
print(f"File path: {file_path}")
# Load document
loader = TextLoader(file_path, encoding="utf-8")
documents = loader.load()
print(f"Loaded {len(documents)} document(s)")
# Inspect first document
documents[0]
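To make the shape of the loader's output concrete, here is a minimal standard-library sketch. `SimpleDocument` and `load_text_file` are hypothetical stand-ins written for this illustration, not LangChain classes; the assumption is only that a loaded document carries its text in `page_content` and its origin in `metadata`.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: Path, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read one text file and wrap it in a single-item list of documents."""
    text = path.read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Write a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "information.txt"
    sample_path.write_text("Call me Ishmael.", encoding="utf-8")
    sample_docs = load_text_file(sample_path)

print(len(sample_docs), sample_docs[0].page_content)
```

One file produces one document, which is why the chunking step that follows is needed before anything is embedded.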
Step 3: Create Chunks
Split the document into overlapping chunks for better context retrieval:
# Create chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_documents(documents=documents)
print(f"Created {len(chunks)} chunk(s)")
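To see what `chunk_size` and `chunk_overlap` mean in practice, here is a simplified fixed-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one tries to break on separators such as paragraphs and spaces first); `split_with_overlap` is a hypothetical helper that only illustrates how consecutive chunks share characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample_text = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3)
print(toy_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.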
Step 4: Create Embeddings
Initialize OpenAI embeddings with text-embedding-3-small, shortened to 1024 dimensions (the model's default is 1536):
# Create embeddings
load_dotenv() # load environment variables from .env file
embedding_model = OpenAIEmbeddings(
model="text-embedding-3-small", dimensions=1024)
# Sanity check only, not part of the RAG flow: FAISS.from_documents embeds the chunks itself
embeddings = embedding_model.embed_documents(
[chunk.page_content for chunk in chunks])
embeddings
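Embeddings are useful because texts with similar meaning end up with similar vectors. The sketch below illustrates that idea with a deliberately crude stand-in: `toy_embed` just counts words from a small hand-picked vocabulary, whereas a real embedding model returns dense learned vectors. Both function names and the vocabulary are assumptions made for this example.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical embedding: one word count per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocabulary = ["captain", "whale", "ship", "recipe", "flour"]
whale_vec = toy_embed("the captain chased the whale", vocabulary)
ship_vec = toy_embed("the captain sailed the ship", vocabulary)
recipe_vec = toy_embed("a recipe that needs flour", vocabulary)

related = cosine_similarity(whale_vec, ship_vec)
unrelated = cosine_similarity(whale_vec, recipe_vec)
print(related, unrelated)
```

The two seafaring sentences score higher than the seafaring/baking pair, which is the property a vector store relies on when it ranks chunks against a query.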
Step 5: Build FAISS Vector Store
Store chunks as vectors in FAISS for fast similarity search:
# Create vectorstore
vector_store = FAISS.from_documents(
documents=chunks, embedding=embedding_model)
# Retrieve the chunks most similar to the query (k=4 by default)
relevant_docs = vector_store.similarity_search(
    "story of Captain Ahab")
Step 6: Format Documents for Context
Create a helper to format retrieved documents:
# Join the text of the retrieved chunks into a single context string
def format_relevant_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
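To show how Steps 5 and 6 fit together, here is a small sketch that retrieves the nearest chunks and turns them into one context string. It assumes hand-written two-dimensional vectors and an exhaustive Euclidean-distance scan as a stand-in for FAISS; `ToyDoc`, `toy_similarity_search`, and `format_context` are hypothetical names, and a real index may use faster or approximate search strategies.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a stored chunk and its vector."""
    page_content: str
    vector: list[float]


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_similarity_search(query_vector: list[float], docs: list[ToyDoc], k: int = 4) -> list[ToyDoc]:
    """Brute-force search: rank every stored vector by distance to the query."""
    ranked = sorted(docs, key=lambda doc: l2_distance(query_vector, doc.vector))
    return ranked[:k]


def format_context(docs: list[ToyDoc]) -> str:
    """Number each chunk and join the texts into one context string."""
    return "\n\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1)
    )


toy_store = [
    ToyDoc("How to bake bread.", [0.0, 1.0]),
    ToyDoc("Ahab hunts the white whale.", [0.9, 0.1]),
    ToyDoc("The Pequod leaves Nantucket.", [0.6, 0.4]),
]
hits = toy_similarity_search([1.0, 0.0], toy_store, k=2)
toy_context = format_context(hits)
print(toy_context)
```

Passing plain text rather than whole document objects keeps metadata and object representations out of the prompt, so the model sees only the passages it should answer from.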
Step 7: Create Prompt Template
Build a prompt where {context} is the relevant documents and {question} is the user query:
# Prompt template where context is the relevant documents and question is the user query
prompt = ChatPromptTemplate.from_messages(
[
("system",
"You are a helpful assistant that answers questions based on the following context: {context}"),
("user", "Question: {question}")
])
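If the placeholders feel abstract, the following sketch shows the substitution with plain string formatting. It is only an illustration of what ends up in each message; `fill_messages` is a hypothetical helper, and `ChatPromptTemplate` itself builds proper message objects and validates its inputs.

```python
system_template = (
    "You are a helpful assistant that answers questions "
    "based on the following context: {context}"
)
user_template = "Question: {question}"


def fill_messages(context: str, question: str) -> list[tuple[str, str]]:
    """Hypothetical helper: substitute the placeholders into (role, text) pairs."""
    return [
        ("system", system_template.format(context=context)),
        ("user", user_template.format(question=question)),
    ]


messages = fill_messages(
    context="Ahab is captain of the Pequod.",
    question="Who is Ahab?",
)
for role, text in messages:
    print(f"{role}: {text}")
```

The retrieved text lands in the system message and the user's query in the user message, so the model can tell the supporting material from the question.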
Step 8: Initialize LLM & Output Parser
Set up GPT-3.5-turbo with temperature 0 so that answers are as repeatable as possible (this reduces randomness but does not guarantee identical output):
# Instantiate the model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Output parser
output_parser = StrOutputParser()
Step 9: Build & Run the RAG Chain
Connect everything into a single pipeline and invoke it:
# Create RAG chain
rag_chain = prompt | llm | output_parser
# Invoke the RAG chain
context = format_relevant_docs(relevant_docs)
question = "What is the story of Captain Ahab?" # Happy-path test case
# question = "What is the story of shubham?" # Negative test case (not in the document)
rag_chain.invoke(
{"context": context, "question": question}
)
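The `|` operator is what turns the three components into one pipeline: the output of each step becomes the input of the next. The sketch below imitates that behaviour with a tiny class. `Step` and the three fake components are hypothetical and are not how LangChain implements its runnables; they only show the data flow from input dictionary to final string.

```python
class Step:
    """Hypothetical stand-in for a pipeline component that supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # self | other: run self first, then feed its result to other.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Fake components that mirror the roles of prompt, model, and output parser.
fake_prompt = Step(
    lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}"
)
fake_llm = Step(
    lambda prompt_text: {"content": f"(model reply to {len(prompt_text)} characters)"}
)
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_llm | fake_parser
answer = toy_chain.invoke(
    {"context": "Ahab commands the Pequod.", "question": "Who is Ahab?"}
)
print(answer)
```

Because every step has the same `invoke` interface, a component can be swapped out (a different model, a different parser) without touching the rest of the chain.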
✅ How It Works
1. Load your text file → 2. Split it into chunks → 3. Embed the chunks with OpenAI → 4. Store the vectors in FAISS → 5. Search for the chunks most relevant to the question → 6. Pass them to the LLM with the prompt → 7. Get an answer grounded in your data.
Dependencies
Install these packages before running:
pip install langchain langchain-openai langchain-community langchain-text-splitters faiss-cpu python-dotenv
⚠️ Requirements
• Create a .env file with your OPENAI_API_KEY
• Place your text file at ../data/information.txt (relative to notebook)
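• FAISS runs in-memory; for persistence, use vector_store.save_local() and FAISS.load_local(). Loading needs the same embedding model, and recent langchain-community versions also require allow_dangerous_deserialization=True because part of the saved index is pickled, so only load files you created yourself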
Next Steps
You now have a working RAG pipeline! To take it further:
- Try the negative test case: ask about something that is not in the document and see how the LLM handles it
- Add conversation memory for multi-turn Q&A
- Swap FAISS for ChromaDB or Pinecone for persistent storage
- Deploy as a Streamlit or FastAPI web app