AI / LLM Tutorial

Building a RAG Pipeline with LangChain & FAISS

A complete, working Retrieval-Augmented Generation demo using LangChain, OpenAI embeddings, FAISS vector store, and GPT-3.5-turbo.

📅 May 2026 ⏱️ 8 min read 👤 Trenzy Vibes
Python LangChain OpenAI FAISS RAG LLM

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by giving them access to external knowledge. Instead of relying solely on what the model learned during training, RAG retrieves relevant documents from a vector database and feeds them into the LLM as context, which tends to produce more accurate, up-to-date, and domain-specific responses.

💡 Why RAG Matters

LLMs have a knowledge cutoff and can hallucinate facts. RAG grounds the model in your own data, which makes it a good fit for customer support bots, internal knowledge bases, legal document analysis, and more.

Architecture Overview

Ingestion Pipeline: 📄 Text File → 🔨 TextLoader → ✂️ Splitter → 🔢 Embeddings → 🗄️ FAISS

Query Time (Retrieval & Generation): ❓ Question → 🔍 Similarity Search → 📝 Prompt + Context → 🤖 GPT-3.5 Answer

Your data → Chunks → Vectors → FAISS → Retrieved Chunks → LLM → Grounded Answer
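
To make the two phases concrete, here is a toy sketch in plain Python. It is not the LangChain pipeline built in the steps that follow: word overlap stands in for embeddings, a Python list stands in for FAISS, and the sample sentences and helper names (ingest, retrieve, build_prompt) are made up for illustration.

```python
def ingest(text: str) -> list[str]:
    """Ingestion phase: split the text into sentence-sized chunks (the toy 'index')."""
    return [s.strip() for s in text.split(".") if s.strip()]


def words(s: str) -> set[str]:
    """Lowercase a string and return its words without trailing punctuation."""
    return {w.strip(".,?!").lower() for w in s.split()}


def retrieve(question: str, index: list[str], k: int = 1) -> list[str]:
    """Query phase, part 1: rank chunks by how many words they share with the question."""
    ranked = sorted(index, key=lambda chunk: len(words(chunk) & words(question)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Query phase, part 2: put the retrieved chunks in front of the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up sample data for the sketch
sample_text = "Ahab is the captain of the Pequod. Ishmael narrates the voyage. The white whale is named Moby Dick."
index = ingest(sample_text)
question = "Who is the captain of the Pequod?"
top_chunks = retrieve(question, index)
toy_prompt = build_prompt(question, top_chunks)
print(toy_prompt)
```

In the real pipeline the prompt would then go to the LLM; the sketch stops at the prompt because that is where retrieval has done its job.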

Step 1: Imports & Setup

First, import all required libraries:

python
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from pathlib import Path

print("Ok, let's start with RAG demo")

Step 2: Load the Document

Load your text file using TextLoader and Path:

python
# Get file
file_path = Path("../data/information.txt").resolve()
print(f"File path: {file_path}")

# Load document
loader = TextLoader(file_path, encoding="utf-8")
documents = loader.load()
print(f"Loaded {len(documents)} document(s)")

# Inspect first document
documents[0]
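
A relative path such as ../data/information.txt is resolved against the current working directory, so a quick existence check can save a confusing error later. The sketch below mimics, in plain Python, the general shape of what a text loader hands back (the file's text plus its source path). The load_text helper and the dict layout are illustrative assumptions, not LangChain's API.

```python
import tempfile
from pathlib import Path


def load_text(path) -> dict:
    """Read a UTF-8 text file and return its content with the resolved source path."""
    resolved = Path(path).resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No text file at {resolved}")
    return {
        "page_content": resolved.read_text(encoding="utf-8"),
        "metadata": {"source": str(resolved)},
    }


# Demo with a temporary file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "information.txt"
    demo_file.write_text("Call me Ishmael.", encoding="utf-8")
    doc = load_text(demo_file)
    print(doc["page_content"])
```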

Step 3: Create Chunks

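Split the document into overlapping chunks so that text cut at a chunk boundary still appears with some of its surrounding context. By default, chunk_size and chunk_overlap are both measured in characters: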

python
# Create chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_documents(documents=documents)
print(f"Created {len(chunks)} chunk(s)")
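
To see what the overlap does, here is a simplified sliding-window sketch. It is only an approximation: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and spaces, while this hypothetical sliding_chunks helper cuts at fixed character positions.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece starts with the last three characters of the one before it, which is the same idea as chunk_overlap=50 in the step above, just at a smaller scale.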

Step 4: Create Embeddings

Initialize OpenAI embeddings with text-embedding-3-small, shortened to 1024 dimensions (the model's default is 1536):

python
# Create embeddings
load_dotenv()  # load environment variables from .env file
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small", dimensions=1024)

# Sanity check only; FAISS.from_documents in Step 5 embeds the chunks itself
embeddings = embedding_model.embed_documents(
    [chunk.page_content for chunk in chunks])
print(f"{len(embeddings)} vector(s) of dimension {len(embeddings[0])}")
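
Embeddings are useful because texts with similar meaning end up as vectors that point in similar directions. A minimal sketch of how that closeness can be measured with cosine similarity, using made-up 3-dimensional vectors (real embeddings here would have 1024 values):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical vectors, chosen by hand for the illustration
whale = [0.9, 0.1, 0.0]
ship = [0.7, 0.3, 0.1]
tax = [0.0, 0.2, 0.9]

print(cosine_similarity(whale, ship) > cosine_similarity(whale, tax))  # True
```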

Step 5: Build FAISS Vector Store

Store chunks as vectors in FAISS for fast similarity search:

python
# Create vectorstore
vector_store = FAISS.from_documents(
    documents=chunks, embedding=embedding_model)

# Retrieve the chunks most similar to the query
relevant_docs = vector_store.similarity_search(
    "story of Captain Ahab")

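Conceptually, a similarity search embeds the query and returns the stored vectors closest to it. The brute-force sketch below shows that idea with Euclidean (L2) distance and made-up 2-dimensional vectors. To my understanding, LangChain's FAISS wrapper defaults to a flat L2 index and returns k=4 results, but treat both as assumptions and check the docs for your version. The top_k helper is hypothetical.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 4):
    """Return the k stored texts nearest to the query, with their distances."""
    ranked = sorted(stored, key=lambda item: l2_distance(query_vec, item[1]))
    return [(text, l2_distance(query_vec, vec)) for text, vec in ranked[:k]]


# Hypothetical chunk texts and vectors
stored = [
    ("A recipe for chowder", [0.1, 0.9]),
    ("Ahab hunts the whale", [0.9, 0.1]),
    ("The crew repairs the sails", [0.5, 0.5]),
]
hits = top_k([0.8, 0.2], stored, k=2)
print([text for text, _ in hits])  # ['Ahab hunts the whale', 'The crew repairs the sails']
```

A real FAISS index gets the same kind of answer without comparing the query against every vector one by one in Python, which matters once you have more than a handful of chunks.
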
Step 6: Format Documents for Context

Create a helper that joins the text of the retrieved documents into a single context string:

python
# Join the page_content of each retrieved document into one context string
def format_relevant_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
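
The {context} slot in the prompt ends up as plain text, so one common approach is to pass it only the page_content of each retrieved document, separated by blank lines. A self-contained sketch, where Doc is a hypothetical stand-in for a LangChain Document and join_page_content is an illustrative name:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is modeled."""
    page_content: str


def join_page_content(docs, separator: str = "\n\n") -> str:
    """Concatenate the text of each document, leaving metadata out of the prompt."""
    return separator.join(doc.page_content for doc in docs)


retrieved = [Doc("Ahab commands the Pequod."), Doc("He seeks the white whale.")]
context_text = join_page_content(retrieved)
print(context_text)
```

Joining the text keeps object representations and metadata out of the prompt, so the model only sees the passage content.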

Step 7: Create Prompt Template

Build a prompt where {context} is the relevant documents and {question} is the user query:

python
# Prompt template where context is the relevant documents and question is the user query
prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         "You are a helpful assistant that answers questions based on the following context: {context}"),
        ("user", "Question: {question}")
    ])
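
To see what the model actually receives, it helps to picture the placeholders being filled in. The sketch below uses Python's str.format as a stand-in for the template engine; render_messages is a hypothetical helper, not part of LangChain.

```python
SYSTEM_TEMPLATE = (
    "You are a helpful assistant that answers questions "
    "based on the following context: {context}"
)
USER_TEMPLATE = "Question: {question}"


def render_messages(context: str, question: str) -> list[tuple[str, str]]:
    """Fill both templates and return (role, text) pairs in chat order."""
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("user", USER_TEMPLATE.format(question=question)),
    ]


messages = render_messages(
    context="Ahab commands the Pequod.",
    question="Who commands the Pequod?",
)
for role, text in messages:
    print(f"{role}: {text}")
```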

Step 8: Initialize LLM & Output Parser

Set up GPT-3.5-turbo with temperature=0 so answers are as repeatable as possible (output is still not guaranteed to be identical across calls):

python
# Instantiate the model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Output parser
output_parser = StrOutputParser()

Step 9: Build & Run the RAG Chain

Pipe the prompt, model, and output parser into a single chain, then invoke it with the context retrieved in Step 5:

python
# Create RAG chain
rag_chain = prompt | llm | output_parser

# Invoke the RAG chain
context = format_relevant_docs(relevant_docs)
question = "What is the story of Captain Ahab?"  # Happy-path test case
# question = "What is the story of shubham?"  # Negative test case

rag_chain.invoke(
    {"context": context, "question": question}
)
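
The | operator reads as "send the output of the left step into the right step". A toy sketch of that idea follows. The Step class and fake_llm are hypothetical and far simpler than LangChain's runnables, but the data flow has the same shape as prompt | llm | output_parser.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # The combined step runs self first, then feeds the result to other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Three stand-in steps: fill the prompt, fake a model reply, extract the text
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: {"content": "ECHO: " + prompt_text})
parse = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | parse
result = toy_chain.invoke(
    {"context": "Ahab commands the Pequod.", "question": "Who commands the Pequod?"}
)
print(result)
```

Note that retrieval is not one of the piped steps here, just as in the tutorial: the similarity search runs beforehand and its result is passed in as context.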

✅ How It Works

1. Load your text file → 2. Split into chunks → 3. Embed with OpenAI → 4. Store in FAISS → 5. Search for relevant chunks → 6. Pass to LLM with prompt → 7. Get an answer grounded in your data.

Dependencies

Install these packages before running:

bash
pip install langchain langchain-core langchain-openai langchain-community langchain-text-splitters faiss-cpu python-dotenv

⚠️ Requirements

• Create a .env file with your OPENAI_API_KEY
• Place your text file at ../data/information.txt (relative to notebook)
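• FAISS runs in-memory; for persistence, use vector_store.save_local() and FAISS.load_local(). Recent versions require allow_dangerous_deserialization=True when loading, so only load index files you created yourself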
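
A missing API key usually shows up as an authentication error deep inside the first API call. One optional way to fail earlier, sketched with a hypothetical require_env helper that you would call right after load_dotenv():

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the named environment variable, or raise a clear error if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value


# Demo with an explicit mapping so the sketch runs without a real key
api_key = require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-placeholder"})
print(api_key[:3] + "...")
```

In the notebook you would call require_env("OPENAI_API_KEY") with no second argument so that it reads the real environment.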

Next Steps

You now have a working RAG pipeline! To take it further:

  1. Try the negative test case — ask about something not in the document and see how the LLM handles it
  2. Add conversation memory for multi-turn Q&A
  3. Swap FAISS for ChromaDB or Pinecone for persistent storage
  4. Deploy as a Streamlit or FastAPI web app
