Building a RAG Pipeline with LangChain & FAISS

Python LangChain OpenAI FAISS RAG LLM

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by giving them access to external knowledge. Instead of relying solely on what the model learned during training, RAG retrieves relevant documents from a vector database and passes them to the LLM as context, so answers can draw on current, domain-specific information the model was never trained on.

💡 Why RAG Matters

LLMs have a knowledge cutoff and can hallucinate facts. RAG grounds the model's answers in your own data, which reduces (though does not eliminate) hallucination and makes it a good fit for customer support bots, internal knowledge bases, legal document analysis, and similar applications.

Architecture Overview

Ingestion Pipeline

📄 Text File → 🔨 TextLoader → ✂️ Splitter → 🔢 Embeddings → 🗄️ FAISS

Query Time: Retrieval & Generation

❓ Question → 🔍 Similarity Search → 📝 Prompt + Context → 🤖 GPT-3.5 Answer

Your data → Chunks → Vectors → FAISS → Retrieved Chunks → LLM → Grounded Answer

Step 1: Imports & Setup

First, import all required libraries:

python

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from pathlib import Path

print("Ok, let's start with RAG demo")

Step 2: Load the Document

Load your text file using TextLoader and Path:

python

# Get file
file_path = Path("../data/information.txt").resolve()
print(f"File path: {file_path}")

# Load document
loader = TextLoader(file_path, encoding="utf-8")
documents = loader.load()
print(f"Loaded {len(documents)} document(s)")

# Inspect first document
documents[0]

Step 3: Create Chunks

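Split the document into overlapping chunks. With chunk_size=1000 and chunk_overlap=50, each chunk holds up to about 1,000 characters, and neighboring chunks share up to 50 characters so text near a boundary is not lost. Smaller chunks let the search return just the relevant passage instead of the whole file: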

python

# Create chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_documents(documents=documents)
print(f"Created {len(chunks)} chunk(s)")
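To make chunk_size and chunk_overlap concrete, here is a simplified sliding-window chunker over characters. It is an illustration, not what LangChain does internally: RecursiveCharacterTextSplitter first tries to split on separators (paragraph breaks, then line breaks, then spaces) and only falls back to raw characters when a piece is still too long, so its chunk boundaries will differ. The function name sliding_window_chunks is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: it ignores separators such as paragraph breaks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each window starts chunk_size - chunk_overlap characters after the previous one, so the last chunk_overlap characters of one chunk reappear at the start of the next. With chunk_size=1000 and chunk_overlap=50, a 2,000-character text gives three windows starting at offsets 0, 950, and 1900.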

Step 4: Create Embeddings

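Initialize OpenAI embeddings with the text-embedding-3-small model. It returns 1536-dimensional vectors by default; dimensions=1024 asks the API for shorter vectors, which use less memory at a possible small cost in retrieval quality. Call load_dotenv() first so OPENAI_API_KEY is available: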

python

# Create embeddings
load_dotenv()  # load environment variables from .env file
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small", dimensions=1024)

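# Sanity check only (not needed for the RAG flow): embed every chunk once
# and confirm the vector count and dimension. This makes a billable API call.
embeddings = embedding_model.embed_documents(
    [chunk.page_content for chunk in chunks])
print(f"{len(embeddings)} vector(s) of dimension {len(embeddings[0])}")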
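Each entry in embeddings is a plain list of floats, one vector per chunk. Similarity between two texts is then measured between their vectors, most commonly with cosine similarity. A minimal pure-Python sketch, using the hypothetical helper name cosine_similarity and toy 3-dimensional vectors in place of real 1024-dimensional embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 0.0
```

With the real data you could compare two chunks via cosine_similarity(embeddings[0], embeddings[1]), assuming the document produced at least two chunks; values closer to 1.0 indicate more similar meaning.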

Step 5: Build FAISS Vector Store

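Embed the chunks and store the vectors in FAISS for fast similarity search. FAISS.from_documents calls the embedding model itself, so it does not reuse the vectors from the check above: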

python

# Create vectorstore
vector_store = FAISS.from_documents(
    documents=chunks, embedding=embedding_model)

# Quick check: fetch the chunks most similar to a sample query (k=4 by default)
relevant_docs = vector_store.similarity_search(
    "story of Captain Ahab")
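What similarity_search does can be sketched as exact nearest-neighbor search. In current langchain-community versions, FAISS.from_documents builds a flat (exhaustive) index ranked by Euclidean (L2) distance by default, and similarity_search returns the 4 closest chunks unless you pass k. The brute-force version below uses hypothetical helper names (l2_distance, top_k) and toy 2-D vectors; real FAISS performs this kind of ranking in optimized native code and also offers approximate indexes for large collections.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], vectors: list[list[float]], k: int = 4) -> list[int]:
    """Indices of the k vectors closest to query (exact, brute-force search)."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]


# Toy 2-D "embeddings" for four chunks and one query
chunk_vectors = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7], [-1.0, 0.0]]
query_vector = [0.9, 0.1]
print(top_k(query_vector, chunk_vectors, k=2))  # [1, 2]
```

For unit-length vectors, ||a - b||² = 2 - 2·cos(a, b), so ranking by L2 distance gives the same order as ranking by cosine similarity. To inspect scores for your own query, vector_store.similarity_search_with_score(query, k=4) returns (Document, score) pairs, where a lower score means closer under the default L2 metric.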

Step 6: Format Documents for Context

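Create a helper that joins the text of the retrieved Document objects into a single string for the prompt's {context} placeholder (passing the raw list would insert each Document's Python representation, metadata included):

python

# Join the retrieved chunks' text, separated by blank lines, for the {context} slot
def format_relevant_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)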
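A slightly richer formatter can number each chunk and label its source, which makes the context easier to read and lets the prompt ask the model to refer to chunk numbers. This is a sketch: Doc is a hypothetical stand-in for LangChain's Document (which exposes the same page_content and metadata attributes), and format_docs_with_sources is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: chunk text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_sources(docs) -> str:
    """Number each chunk and tag it with its source (or 'unknown')."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        parts.append(f"[{i}] (source: {source})\n{doc.page_content}")
    return "\n\n".join(parts)


docs = [Doc("First chunk.", {"source": "a.txt"}), Doc("Second chunk.")]
print(format_docs_with_sources(docs))
```

TextLoader records the file path under metadata["source"], and split_documents copies that metadata to every chunk, so with a single input file all chunks share one source and the numbering does most of the work; source labels become more useful once you load several files.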

Step 7: Create Prompt Template

Build a chat prompt with two placeholders: {context} for the retrieved text and {question} for the user's query:

python

# Prompt template where context is the relevant documents and question is the user query
prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         "You are a helpful assistant that answers questions based on the following context: {context}"),
        ("user", "Question: {question}")
    ])
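ChatPromptTemplate uses f-string-style placeholders by default, so filling it in behaves much like Python's str.format. The plain-Python sketch below shows the messages the model would receive, using a hypothetical stricter system message that tells the model to admit when the context lacks the answer, which matters for the negative test case. SYSTEM_TEMPLATE, USER_TEMPLATE and render_messages are illustrative names, not part of LangChain.

```python
# Hypothetical stricter system message; {context} and {question} are filled at run time
SYSTEM_TEMPLATE = (
    "You are a helpful assistant. Answer the question using only the context below. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "Context:\n{context}"
)
USER_TEMPLATE = "Question: {question}"


def render_messages(context: str, question: str) -> list[tuple[str, str]]:
    """Fill the placeholders the way an f-string-style template would."""
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("user", USER_TEMPLATE.format(question=question)),
    ]


for role, text in render_messages("Ahab is the captain of the Pequod.", "Who is Ahab?"):
    print(f"{role}: {text}")
```

The same system wording could be placed in the ("system", ...) tuple passed to ChatPromptTemplate.from_messages, keeping the {context} placeholder.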

Step 8: Initialize LLM & Output Parser

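Set up gpt-3.5-turbo with temperature=0 to make answers as repeatable as possible (OpenAI does not guarantee fully deterministic output even at temperature 0), plus a StrOutputParser that returns the model's reply as a plain string: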

python

# Instantiate the model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Output parser
output_parser = StrOutputParser()

Step 9: Build & Run the RAG Chain

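Compose the prompt, model, and parser into a single chain with the | operator (LangChain Expression Language), then invoke it with the context and the question: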

python

# Create RAG chain: prompt -> model -> plain-string output
rag_chain = prompt | llm | output_parser

# Retrieve context for the actual question (not a fixed query), then invoke the chain
question = "What is the story of Captain Ahab?"  # Happy test case
# question = "What is the story of shubham?"  # Negative test case
relevant_docs = vector_store.similarity_search(question)
context = format_relevant_docs(relevant_docs)

rag_chain.invoke(
    {"context": context, "question": question}
)
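The | operator is what turns prompt, llm and output_parser into one chain: each step's output becomes the next step's input. LCEL also accepts a dict of steps and runs every branch on the same input, which is a common way to have the chain retrieve context from the question itself. The toy below mimics both ideas in plain Python to make the data flow visible; Step, as_step, retrieve, build_prompt and fake_llm are hypothetical names, and this is not LangChain's actual implementation.

```python
from typing import Any, Callable


class Step:
    """Toy runnable: wraps a function and supports `a | b` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Step":
        # Run self first, then feed its output into the next step
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # Handles `dict | Step` and `function | Step`, where the left side is not a Step
        return as_step(other) | self


def as_step(obj: Any) -> Step:
    """Coerce a Step, a dict of branches, or a plain function into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Every branch receives the same input; results are collected into a dict
        branches = {key: as_step(val) for key, val in obj.items()}
        return Step(lambda value: {key: s.invoke(value) for key, s in branches.items()})
    return Step(obj)


# Toy chunks and components standing in for the vector store, prompt, model and parser
CHUNKS = ["Ahab captains the Pequod.", "Ishmael narrates the story."]


def retrieve(question: str) -> list[str]:
    """Keep chunks that share at least one word with the question (toy scoring)."""
    words = set(question.lower().rstrip("?").split())
    return [c for c in CHUNKS if words & set(c.lower().rstrip(".").split())]


def build_prompt(inputs: dict) -> str:
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt: str) -> dict:
    # Pretend model: echoes the first context line inside a message-like dict
    first_line = prompt.splitlines()[0].removeprefix("Context: ")
    return {"content": "Based on the context: " + first_line}


chain = (
    {"context": Step(retrieve) | (lambda docs: "\n".join(docs)), "question": lambda q: q}
    | Step(build_prompt)
    | Step(fake_llm)
    | Step(lambda message: message["content"])  # like StrOutputParser: keep only the text
)

print(chain.invoke("Who is Ahab?"))  # Based on the context: Ahab captains the Pequod.
```

In LangChain itself, the equivalent wiring would typically put vector_store.as_retriever() piped into a formatting function in the "context" branch and RunnablePassthrough() (from langchain_core.runnables) in the "question" branch, so the whole chain is invoked with just the question string.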

✅ How It Works

1. Load your text file → 2. Split into chunks → 3. Embed with OpenAI → 4. Store in FAISS → 5. Search for relevant chunks → 6. Pass them to the LLM in the prompt → 7. Get an answer grounded in your data.

Dependencies

Install these packages before running:

bash

pip install langchain langchain-openai langchain-community faiss-cpu python-dotenv

⚠️ Requirements

• Create a .env file with your OPENAI_API_KEY
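• Place your text file at ../data/information.txt (the path is resolved relative to the notebook's working directory)
• FAISS runs in-memory; for persistence, use vector_store.save_local() and FAISS.load_local(). Recent langchain-community versions require allow_dangerous_deserialization=True when loading, because the docstore is saved with pickle, so only load index files you created yourself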

Next Steps

You now have a working RAG pipeline! To take it further:

Try the negative test case — ask about something not in the document and see how the LLM handles it
Add conversation memory for multi-turn Q&A
Swap FAISS for ChromaDB or Pinecone for persistent storage
Deploy as a Streamlit or FastAPI web app

