Build a ChatGPT for Your Team Docs: RAG System Tutorial
Step-by-step guide to building your own documentation chatbot. Works with OpenAI, Ollama, or OpenRouter. Complete code included - no AI experience needed!
Want to build a smart assistant that answers questions about your team's documentation? This step-by-step tutorial shows you how - no AI experience needed.
What You'll Build
A production-ready RAG (Retrieval Augmented Generation) system with:
- Smart document search using the ChromaDB vector database
- Multiple LLM options: Ollama (local/free), OpenAI, or OpenRouter
- Beautiful chat interface using Gradio
- Docker deployment - just `docker-compose up`
- Sample documentation to get started
- Flexible: works with or without API keys
Live Demo:
You: "How do I deploy to production?"
RAG System: "Based on your deployment documentation:
1. Run tests: `npm test`
2. Build production: `npm run build`
3. Deploy: `./scripts/deploy.sh production`
4. Monitor at: https://status.example.com
Source: docs/deployment-guide.md"
Prerequisites
- Docker & Docker Compose installed
- Choose one:
- Option A: 8GB+ RAM for local Ollama (free, private)
- Option B: OpenAI or OpenRouter API key (cloud, fast)
- 30 minutes of your time
- No AI experience required!
Architecture Overview
┌─────────────┐
│   Gradio    │  ← User asks question
│     UI      │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   RAG App   │  ← Orchestrates everything
└──────┬──────┘
       │
   ┌───┴────┬──────────┬────────────┐
   │        │          │            │
   ▼        ▼          ▼            ▼
┌──────┐ ┌────────┐ ┌─────────┐ ┌────────────┐
│ Docs │ │ Chroma │ │ Ollama  │ │ Embeddings │
│      │ │ Vector │ │   LLM   │ │   Model    │
│      │ │   DB   │ │         │ │            │
└──────┘ └────────┘ └─────────┘ └────────────┘
Project Structure
rag-system/
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── app.py                 # Main RAG application
├── ingest.py              # Document ingestion
├── docs/                  # Your documentation
│   ├── deployment.md
│   ├── architecture.md
│   └── troubleshooting.md
└── chroma_db/             # Vector database (auto-created)
Step 1: Create Project Structure
mkdir rag-system
cd rag-system
mkdir docs
Step 2: Create Sample Documentation
Create `docs/deployment.md`:
# Deployment Guide
## Production Deployment
To deploy to production:
1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh production`
4. Monitor: https://status.example.com
## Staging Deployment
For staging environment:
1. Push to staging branch: `git push origin staging`
2. Auto-deploys via GitHub Actions
3. Check: https://staging.example.com
Create `docs/troubleshooting.md`:
# Troubleshooting Guide
## Service Won't Start
If the service fails to start:
1. Check logs: `docker logs app-name`
2. Verify environment variables in `.env`
3. Ensure port 3000 is not in use
4. Restart: `docker-compose restart`
## Database Connection Issues
Common database errors:
- "Connection refused" β Check if DB container is running
- "Auth failed" β Verify credentials in `.env`
- "Timeout" β Check network settings
Step 3: Create Requirements File
Create `requirements.txt`:
langchain==0.1.0
langchain-community==0.0.20
langchain-openai==0.0.5
chromadb==0.4.22
ollama==0.1.6
gradio==4.19.0
sentence-transformers==2.3.1
openai==1.12.0
Step 4: Create Ingestion Script
Create `ingest.py`:
"""
Document Ingestion Script
Supports: Ollama (local) or OpenAI/OpenRouter (cloud)
"""
import os
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
def ingest_documents():
print("π Loading documents from ./docs...")
# Load all markdown files
loader = DirectoryLoader('./docs', glob="**/*.md", loader_cls=TextLoader)
documents = loader.load()
print(f"β
Loaded {len(documents)} documents")
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)
print(f"β
Created {len(chunks)} chunks")
# Choose embeddings based on provider
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower()
if LLM_PROVIDER == "ollama":
        embeddings = OllamaEmbeddings(
            model="nomic-embed-text",
            base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434")
        )
else:
# Use OpenAI for embeddings (works for OpenAI & OpenRouter)
embeddings = OpenAIEmbeddings(
model="text-embedding-3-small",
openai_api_key=os.getenv("OPENAI_API_KEY")
)
# Store in ChromaDB
print("πΎ Storing in vector database...")
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db",
collection_name="docs"
)
print(f"β¨ Ingestion complete! {len(chunks)} chunks stored")
return vectorstore
if __name__ == "__main__":
ingest_documents()
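Before wiring up the chat app, you can sanity-check that ingestion worked from inside the `rag-app` container. The snippet below is a minimal sketch (the `verify_ingest.py` filename is an assumption, not one of the tutorial files); it reopens the persisted collection with the same embeddings and runs one similarity search:

```python
"""Quick check that chunks landed in ChromaDB (hypothetical verify_ingest.py)."""
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Use the same embedding model the ingestion used, otherwise the vectors won't match
if os.getenv("LLM_PROVIDER", "ollama").lower() == "ollama":
    embeddings = OllamaEmbeddings(model="nomic-embed-text",
                                  base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434"))
else:
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small",
                                  openai_api_key=os.getenv("OPENAI_API_KEY"))

store = Chroma(persist_directory="./chroma_db",
               embedding_function=embeddings,
               collection_name="docs")

# How many chunks are stored?
print("Chunks in collection:", len(store.get()["ids"]))

# Run one retrieval and show which files the hits came from
for doc in store.similarity_search("How do I deploy to production?", k=3):
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```

Run it the same way as the ingestion script, e.g. `docker-compose run --rm rag-app python verify_ingest.py`.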
Step 5: Create Main RAG Application
Create `app.py`:
"""
RAG System with Multi-Provider Support
Supports: Ollama (local), OpenAI, or OpenRouter
"""
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import gradio as gr
print("π Initializing RAG system...")
# Get provider from environment
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower()
# Initialize embeddings
if LLM_PROVIDER == "ollama":
    embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434"))
else:
embeddings = OpenAIEmbeddings(
model="text-embedding-3-small",
openai_api_key=os.getenv("OPENAI_API_KEY")
)
# Load vector database
vectorstore = Chroma(
persist_directory="./chroma_db",
embedding_function=embeddings,
collection_name="docs"
)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
# Initialize LLM based on provider
if LLM_PROVIDER == "ollama":
    llm = Ollama(model="llama3.2", base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434"), temperature=0.3)
model_name = "llama3.2 (Ollama)"
elif LLM_PROVIDER == "openai":
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3, openai_api_key=os.getenv("OPENAI_API_KEY"))
model_name = "gpt-4o-mini (OpenAI)"
elif LLM_PROVIDER == "openrouter":
llm = ChatOpenAI(
model="anthropic/claude-3.5-sonnet",
temperature=0.3,
openai_api_key=os.getenv("OPENROUTER_API_KEY"),
openai_api_base="https://openrouter.ai/api/v1"
)
    model_name = "claude-3.5-sonnet (OpenRouter)"
else:
    raise ValueError(f"Unsupported LLM_PROVIDER: {LLM_PROVIDER} (use ollama, openai, or openrouter)")
# Create RAG chain
prompt = PromptTemplate(
template="""Answer based on the documentation. Cite sources.
Context: {context}
Question: {question}
Answer:""",
input_variables=["context", "question"]
)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=retriever,
return_source_documents=True,
chain_type_kwargs={"prompt": prompt}
)
print(f"β
RAG ready! Using {model_name}")
def chat(message, history):
try:
result = qa_chain.invoke({"query": message})
answer = result.get('result', str(result))
sources = result.get('source_documents', [])
if sources:
answer += "\n\n**Sources:**\n" + "\n".join([f"{i+1}. {doc.metadata.get('source')}" for i, doc in enumerate(sources[:3])])
return answer
except Exception as e:
return f"β Error: {str(e)}"
demo = gr.ChatInterface(
chat,
title=f"π€ Documentation Assistant",
description=f"Powered by RAG with {model_name}",
examples=["How do I deploy to production?", "Fix database errors?"],
theme=gr.themes.Soft()
)
if __name__ == "__main__":
demo.launch(server_name="0.0.0.0", server_port=7860)
Step 6: Create Dockerfile
Create `Dockerfile`:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application files
COPY . .
# Expose Gradio port
EXPOSE 7860
# Default command (can be overridden)
CMD ["python", "app.py"]
Step 7: Create Docker Compose & Environment
Create `docker-compose.yml`:
version: '3.8'
services:
# Ollama (optional - only for local LLM)
ollama:
image: ollama/ollama:latest
container_name: rag-ollama
volumes:
- ollama_data:/root/.ollama
ports:
- "11434:11434"
networks:
- rag-network
profiles:
- ollama # Only start with --profile ollama
# RAG Application
rag-app:
build: .
container_name: rag-app
volumes:
- ./docs:/app/docs
- ./chroma_db:/app/chroma_db
ports:
- "7860:7860"
networks:
- rag-network
environment:
# Choose provider: ollama, openai, or openrouter
- LLM_PROVIDER=${LLM_PROVIDER:-ollama}
- OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://ollama:11434}
- OPENAI_API_KEY=${OPENAI_API_KEY:-}
- OPENROUTER_API_KEY=${OPENROUTER_API_KEY:-}
networks:
rag-network:
driver: bridge
volumes:
ollama_data:
Create a `.env` file:
# Choose: ollama, openai, or openrouter
LLM_PROVIDER=openai
# OpenAI key (required for openai or openrouter)
OPENAI_API_KEY=your-key-here
# OpenRouter key (only if using OpenRouter for LLM)
# OPENROUTER_API_KEY=your-key-here
Step 8: Choose Your Provider & Run
Option 1: Using OpenAI (Recommended for laptops)
# 1. Set your API key in .env
echo "LLM_PROVIDER=openai" > .env
echo "OPENAI_API_KEY=your-key-here" >> .env
# 2. Build and ingest
docker-compose build rag-app
docker-compose run --rm rag-app python ingest.py
# 3. Start
docker-compose up
Option 2: Using Ollama (Free, requires 8GB+ RAM)
# 1. Start Ollama
docker-compose --profile ollama up -d ollama
sleep 10
# 2. Pull models
docker exec rag-ollama ollama pull llama3.2
docker exec rag-ollama ollama pull nomic-embed-text
# 3. Build and ingest
docker-compose build rag-app
docker-compose run --rm rag-app python ingest.py
# 4. Start
docker-compose up
Option 3: Using OpenRouter (Access to Claude, Gemini, etc.)
# 1. Set keys in .env (needs both!)
echo "LLM_PROVIDER=openrouter" > .env
echo "OPENAI_API_KEY=sk-..." >> .env # For embeddings
echo "OPENROUTER_API_KEY=sk-or-..." >> .env # For LLM
# 2. Build and ingest
docker-compose build rag-app
docker-compose run --rm rag-app python ingest.py
# 3. Start
docker-compose up
Access at: http://localhost:7860
Testing Your RAG System
Try these questions in the Gradio interface (or run them as a script - see the sketch after this list):
- "How do I deploy to production?" - should return deployment steps from deployment.md
- "What should I do if the service won't start?" - should return troubleshooting steps
- "How do I fix database errors?" - should cite troubleshooting.md
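If you prefer a scripted smoke test, you can reuse the chain defined in `app.py`. This is a sketch under two assumptions: it runs inside the `rag-app` container (so the vector store and provider credentials are available), and the `smoke_test.py` filename is hypothetical. Importing `app` runs its module-level setup but does not launch Gradio, because `demo.launch()` sits behind the `__main__` guard.

```python
"""Hypothetical smoke_test.py: run the example questions through the chain."""
from app import qa_chain  # reuses the retriever, prompt, and LLM from app.py

questions = [
    "How do I deploy to production?",
    "What should I do if the service won't start?",
    "How do I fix database errors?",
]

for q in questions:
    result = qa_chain.invoke({"query": q})
    # Collect the source files the answer was grounded in
    sources = {doc.metadata.get("source") for doc in result.get("source_documents", [])}
    print(f"Q: {q}")
    print(f"A: {result['result'][:200]}...")
    print(f"Sources: {sources}\n")
```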
How It Works
1. Document Ingestion
docs → split into chunks → create embeddings → store in ChromaDB
2. Question Answering
question → create embedding → search similar chunks →
add to LLM context → generate answer
3. Key Components
- ChromaDB: Stores document embeddings for fast similarity search
- Ollama: Runs the LLM locally (no API keys!)
- LangChain: Orchestrates the RAG pipeline
- Gradio: Provides the chat interface
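To make step 2 concrete, here is a hedged sketch of roughly what the chain does for one question: embed the query, pull the top-k chunks, stuff them into the prompt as `{context}`, and call the LLM. It assumes the `retriever`, `prompt`, and `llm` objects from `app.py` above; the real `RetrievalQA` chain handles this plumbing for you.

```python
# What the "stuff" chain in RetrievalQA roughly does for a single question
question = "How do I deploy to production?"

# 1. Embed the question and find the most similar chunks
docs = retriever.get_relevant_documents(question)

# 2. Stuff the chunk texts into the prompt template as {context}
context = "\n\n".join(doc.page_content for doc in docs)
final_prompt = prompt.format(context=context, question=question)

# 3. Ask the LLM to answer from that context only
response = llm.invoke(final_prompt)
print(getattr(response, "content", response))  # ChatOpenAI returns a message object, Ollama returns a string
```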
LLM Provider Comparison
| Provider | Cost | Speed | Privacy | Best For |
|---|---|---|---|---|
| Ollama | Free | Medium | 100% private | Desktops, privacy-focused |
| OpenAI | $0.15/1M tokens | Fast | Sent to OpenAI | Most use cases |
| OpenRouter | Varies | Fast | Sent to provider | Multi-model access |
When to use each:
- Ollama: Desktop with 8GB+ RAM, need privacy, no ongoing costs
- OpenAI: Laptops, fastest setup, best performance/cost
- OpenRouter: Want Claude, Gemini, or other models
Customization Tips
Use Different Models
Pull a different Ollama model:
# For faster responses (smaller model)
docker exec rag-ollama ollama pull llama3.2:1b
# For better quality (larger model)
docker exec rag-ollama ollama pull llama3.1:8b
Update `app.py`:
llm = Ollama(
model="llama3.1:8b", # Change model here
base_url="http://ollama:11434"
)
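If you switch models often, it may be cleaner to read the model name from the environment instead of editing code each time. A small sketch, assuming a new `OLLAMA_MODEL` variable (not part of the compose file above) that you would also add under `environment:` in `docker-compose.yml`:

```python
import os

# Hypothetical OLLAMA_MODEL variable; falls back to the tutorial default
llm = Ollama(
    model=os.getenv("OLLAMA_MODEL", "llama3.2"),
    base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434"),
    temperature=0.3
)
```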
Add More Documents
Just drop markdown files in `./docs`:
cp ~/my-team-docs/*.md ./docs/
docker-compose run --rm rag-app python ingest.py
Adjust Retrieval Settings
In `app.py`:
retriever = vectorstore.as_retriever(
search_kwargs={"k": 5} # Return more chunks
)
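If answers keep pulling near-duplicate chunks, one option supported by LangChain's Chroma wrapper is maximal marginal relevance (MMR) search, which trades a little raw similarity for diversity among the returned chunks. A sketch:

```python
# MMR: fetch a wider candidate pool, then pick k diverse chunks from it
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20}
)
```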
Change Chunk Size
In `ingest.py`:
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=500, # Smaller chunks = more precise
chunk_overlap=100
)
Production Deployment
Add Authentication
demo.launch(
server_name="0.0.0.0",
auth=("username", "password") # Basic auth
)
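Hard-coding credentials is fine for a quick demo, but for anything shared you probably want them in the environment. A minimal sketch, assuming new `GRADIO_USERNAME` / `GRADIO_PASSWORD` variables that you would add to `.env` and the compose `environment:` block yourself:

```python
import os

demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    # Hypothetical env vars - define them in .env and docker-compose.yml
    auth=(os.getenv("GRADIO_USERNAME", "admin"),
          os.getenv("GRADIO_PASSWORD", "change-me"))
)
```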
Add Monitoring
import time
def chat(message, history):
    start = time.time()
    result = qa_chain.invoke({"query": message})
    duration = time.time() - start
    print(f"Query: {message}")
    print(f"Duration: {duration:.2f}s")
    return result['result']
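`print()` is enough to get started, but if you ship this, Python's standard `logging` module gives you timestamps and levels for free. A sketch of the same idea:

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("rag")

def chat(message, history):
    start = time.time()
    result = qa_chain.invoke({"query": message})
    # One structured log line per query: text, latency, and how many chunks were used
    logger.info("query=%r duration=%.2fs sources=%d",
                message, time.time() - start,
                len(result.get("source_documents", [])))
    return result["result"]
```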
Persist Data
Volumes in `docker-compose.yml` already persist:
- `ollama_data`: Downloaded models
- `./chroma_db`: Vector database
Troubleshooting
"ModuleNotFoundError"
# Rebuild with updated requirements
docker-compose build --no-cache
"Ollama connection refused"
# Check Ollama is running
docker ps | grep ollama
# Check logs
docker logs rag-ollama
"No documents found"
# Re-run ingestion
docker-compose run --rm rag-app python ingest.py
Slow responses
# Use smaller model
docker exec rag-ollama ollama pull llama3.2:1b
# Or lower the temperature for more focused, deterministic answers
llm = Ollama(model="llama3.2", temperature=0.1)
What You Learned
- How RAG systems work (Retrieval + Augmentation + Generation)
- Document embedding and vector search
- Running local LLMs with Ollama
- Building chat interfaces with Gradio
- Containerizing AI applications with Docker
Next Steps
- Add conversation memory - Remember chat history (see the sketch after this list)
- Multi-modal RAG - Add images, PDFs, videos
- Hybrid search - Combine semantic + keyword search
- Agent-based retrieval - Let AI decide when to search
- Deploy to production - Add auth, monitoring, scaling
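As a taste of the first item, here is a hedged sketch of conversation memory using LangChain's `ConversationalRetrievalChain`. It reuses the `llm` and `retriever` already defined in `app.py` and is a starting point rather than a drop-in replacement for `qa_chain` (the chain expects a `question` key and returns an `answer` key):

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Keeps prior turns and rewrites follow-up questions into standalone queries
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

conv_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
)

print(conv_chain.invoke({"question": "How do I deploy to production?"})["answer"])
print(conv_chain.invoke({"question": "And to staging?"})["answer"])  # follow-up resolved via memory
```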
Complete Code Repository
Get the full working code: Download Complete RAG System on GitHub
Clone and run:
git clone https://github.com/alwaysnix/ai-literacy.git
cd ai-literacy/rag-system
docker-compose build
docker-compose run --rm rag-app python ingest.py
docker-compose up
Open http://localhost:7860 and start asking questions!
Want to master advanced RAG techniques? Our training covers:
- Multi-modal RAG (images, videos, code)
- Production deployment at scale
- Cost optimization strategies
- Evaluation and monitoring
Quick Reference
# Setup (build image and ingest docs)
docker-compose build
docker-compose run --rm rag-app python ingest.py
# Start
docker-compose up
# Stop
docker-compose down
# Re-ingest docs
docker-compose run --rm rag-app python ingest.py
# View logs
docker-compose logs -f
# Reset everything
docker-compose down -v
rm -rf chroma_db
Questions? The RAG system can answer questions about itself! Just ask it "How does this RAG system work?"