Tutorial

Build a ChatGPT for Your Team Docs: RAG System Tutorial

Step-by-step guide to building your own documentation chatbot. Works with OpenAI, Ollama, or OpenRouter. Complete code included - no AI experience needed!

AI Literacy Team
2025-03-05
15 min read

Want to build a smart assistant that answers questions about your team's documentation? This step-by-step tutorial shows you how - no AI experience needed.

What You'll Build

A production-ready RAG (Retrieval-Augmented Generation) system with:

  • πŸ” Smart document search using ChromaDB vector database
  • πŸ€– Multiple LLM options: Ollama (local/free), OpenAI, or OpenRouter
  • πŸ’¬ Beautiful chat interface using Gradio
  • 🐳 Docker deployment - just docker-compose up
  • πŸ“š Sample documentation to get started
  • πŸ”‘ Flexible: Works with or without API keys

Live Demo:

You: "How do I deploy to production?"

RAG System: "Based on your deployment documentation:

1. Run tests: `npm test`
2. Build production: `npm run build`
3. Deploy: `./scripts/deploy.sh production`
4. Monitor at: https://status.example.com

Source: docs/deployment-guide.md"

Prerequisites

  • Docker & Docker Compose installed
  • Choose one:
    • Option A: 8GB+ RAM for local Ollama (free, private)
    • Option B: OpenAI or OpenRouter API key (cloud, fast)
  • 30 minutes of your time
  • No AI experience required!

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Gradio    β”‚ ← User asks question
β”‚     UI      β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  RAG App    β”‚ ← Orchestrates everything
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
   β”Œβ”€β”€β”€β”΄β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚        β”‚          β”‚          β”‚
   β–Ό        β–Ό          β–Ό          β–Ό
β”Œβ”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Docs β”‚ β”‚Chromaβ”‚ β”‚ Ollama  β”‚ β”‚Embeddingsβ”‚
β”‚ DB  β”‚ β”‚Vectorβ”‚ β”‚  LLM    β”‚ β”‚  Model  β”‚
β””β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Project Structure

rag-system/
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ app.py                 # Main RAG application
β”œβ”€β”€ ingest.py             # Document ingestion
β”œβ”€β”€ docs/                 # Your documentation
β”‚   β”œβ”€β”€ deployment.md
β”‚   β”œβ”€β”€ architecture.md
β”‚   └── troubleshooting.md
└── chroma_db/            # Vector database (auto-created)

Step 1: Create Project Structure

mkdir rag-system
cd rag-system
mkdir docs

Step 2: Create Sample Documentation

Create docs/deployment.md:

# Deployment Guide

## Production Deployment

To deploy to production:

1. Run tests: `npm test`
2. Build: `npm run build`
3. Deploy: `./scripts/deploy.sh production`
4. Monitor: https://status.example.com

## Staging Deployment

For staging environment:

1. Push to staging branch: `git push origin staging`
2. Auto-deploys via GitHub Actions
3. Check: https://staging.example.com

Create docs/troubleshooting.md:

# Troubleshooting Guide

## Service Won't Start

If the service fails to start:

1. Check logs: `docker logs app-name`
2. Verify environment variables in `.env`
3. Ensure port 3000 is not in use
4. Restart: `docker-compose restart`

## Database Connection Issues

Common database errors:

- "Connection refused" β†’ Check if DB container is running
- "Auth failed" β†’ Verify credentials in `.env`
- "Timeout" β†’ Check network settings

Step 3: Create Requirements File

Create requirements.txt:

langchain==0.1.0
langchain-community==0.0.20
langchain-openai==0.0.5
chromadb==0.4.22
ollama==0.1.6
gradio==4.19.0
sentence-transformers==2.3.1
openai==1.12.0

Step 4: Create Ingestion Script

Create ingest.py:

"""
Document Ingestion Script
Supports: Ollama (local) or OpenAI/OpenRouter (cloud)
"""
import os
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

def ingest_documents():
    print("πŸ“š Loading documents from ./docs...")

    # Load all markdown files
    loader = DirectoryLoader('./docs', glob="**/*.md", loader_cls=TextLoader)
    documents = loader.load()
    print(f"βœ… Loaded {len(documents)} documents")

    # Split into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = text_splitter.split_documents(documents)
    print(f"βœ… Created {len(chunks)} chunks")

    # Choose embeddings based on provider
    LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower()

    if LLM_PROVIDER == "ollama":
        embeddings = OllamaEmbeddings(
            model="nomic-embed-text",
            base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434")
        )
    else:
        # Use OpenAI for embeddings (works for OpenAI & OpenRouter)
        embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            openai_api_key=os.getenv("OPENAI_API_KEY")
        )

    # Store in ChromaDB
    print("πŸ’Ύ Storing in vector database...")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory="./chroma_db",
        collection_name="docs"
    )

    print(f"✨ Ingestion complete! {len(chunks)} chunks stored")
    return vectorstore

if __name__ == "__main__":
    ingest_documents()

Step 5: Create Main RAG Application

Create app.py:

"""
RAG System with Multi-Provider Support
Supports: Ollama (local), OpenAI, or OpenRouter
"""
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import gradio as gr

print("πŸš€ Initializing RAG system...")

# Get provider from environment
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower()

# Initialize embeddings
if LLM_PROVIDER == "ollama":
    embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url=os.getenv("OLLAMA_BASE_URL", "http://ollama:11434"))
else:
    embeddings = OpenAIEmbeddings(
        model="text-embedding-3-small",
        openai_api_key=os.getenv("OPENAI_API_KEY")
    )

# Load vector database
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
    collection_name="docs"
)

retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# Initialize LLM based on provider
if LLM_PROVIDER == "ollama":
    llm = Ollama(model="llama3.2", base_url="http://ollama:11434", temperature=0.3)
    model_name = "llama3.2 (Ollama)"
elif LLM_PROVIDER == "openai":
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3, openai_api_key=os.getenv("OPENAI_API_KEY"))
    model_name = "gpt-4o-mini (OpenAI)"
elif LLM_PROVIDER == "openrouter":
    llm = ChatOpenAI(
        model="anthropic/claude-3.5-sonnet",
        temperature=0.3,
        openai_api_key=os.getenv("OPENROUTER_API_KEY"),
        openai_api_base="https://openrouter.ai/api/v1"
    )
    model_name = "claude-3.5-sonnet (OpenRouter)"

# Create RAG chain
prompt = PromptTemplate(
    template="""Answer based on the documentation. Cite sources.

Context: {context}
Question: {question}

Answer:""",
    input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt}
)

print(f"βœ… RAG ready! Using {model_name}")

def chat(message, history):
    try:
        result = qa_chain.invoke({"query": message})
        answer = result.get('result', str(result))
        sources = result.get('source_documents', [])

        if sources:
            answer += "\n\n**Sources:**\n" + "\n".join([f"{i+1}. {doc.metadata.get('source')}" for i, doc in enumerate(sources[:3])])

        return answer
    except Exception as e:
        return f"❌ Error: {str(e)}"

demo = gr.ChatInterface(
    chat,
    title=f"πŸ€– Documentation Assistant",
    description=f"Powered by RAG with {model_name}",
    examples=["How do I deploy to production?", "How do I fix database errors?"],
    theme=gr.themes.Soft()
)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)

Step 6: Create Dockerfile

Create Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY . .

# Expose Gradio port
EXPOSE 7860

# Default command (can be overridden)
CMD ["python", "app.py"]

Step 7: Create Docker Compose & Environment

Create docker-compose.yml:

version: '3.8'

services:
  # Ollama (optional - only for local LLM)
  ollama:
    image: ollama/ollama:latest
    container_name: rag-ollama
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    networks:
      - rag-network
    profiles:
      - ollama  # Only start with --profile ollama

  # RAG Application
  rag-app:
    build: .
    container_name: rag-app
    volumes:
      - ./docs:/app/docs
      - ./chroma_db:/app/chroma_db
    ports:
      - "7860:7860"
    networks:
      - rag-network
    environment:
      # Choose provider: ollama, openai, or openrouter
      - LLM_PROVIDER=${LLM_PROVIDER:-ollama}
      - OLLAMA_BASE_URL=${OLLAMA_BASE_URL:-http://ollama:11434}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY:-}

networks:
  rag-network:
    driver: bridge

volumes:
  ollama_data:

Create .env file:

# Choose: ollama, openai, or openrouter
LLM_PROVIDER=openai

# OpenAI key (required for openai or openrouter)
OPENAI_API_KEY=your-key-here

# OpenRouter key (only if using OpenRouter for LLM)
# OPENROUTER_API_KEY=your-key-here

Step 8: Choose Your Provider & Run

Option 1: Using OpenAI (Recommended for laptops)

# 1. Set your API key in .env
echo "LLM_PROVIDER=openai" > .env
echo "OPENAI_API_KEY=your-key-here" >> .env

# 2. Build and ingest
docker-compose build rag-app
docker-compose run --rm rag-app python ingest.py

# 3. Start
docker-compose up

Option 2: Using Ollama (Free, requires 8GB+ RAM)

# 1. Start Ollama
docker-compose --profile ollama up -d ollama
sleep 10

# 2. Pull models
docker exec rag-ollama ollama pull llama3.2
docker exec rag-ollama ollama pull nomic-embed-text

# 3. Build and ingest
docker-compose build rag-app
docker-compose run --rm rag-app python ingest.py

# 4. Start
docker-compose up

Option 3: Using OpenRouter (Access to Claude, Gemini, etc.)

# 1. Set keys in .env (needs both!)
echo "LLM_PROVIDER=openrouter" > .env
echo "OPENAI_API_KEY=sk-..." >> .env  # For embeddings
echo "OPENROUTER_API_KEY=sk-or-..." >> .env  # For LLM

# 2. Build and ingest
docker-compose build rag-app
docker-compose run --rm rag-app python ingest.py

# 3. Start
docker-compose up

Access at: http://localhost:7860
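
You can also query the assistant programmatically instead of through the browser. Here's a minimal sketch using gradio_client (not in requirements.txt, so pip install gradio-client first). The "/chat" endpoint name is what gr.ChatInterface normally exposes; confirm it via the "Use via API" link at the bottom of the Gradio page.

# query the running Gradio app from a script (a sketch, assuming the app is up on port 7860)
from gradio_client import Client

client = Client("http://localhost:7860")
answer = client.predict("How do I deploy to production?", api_name="/chat")
print(answer)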

Testing Your RAG System

Try these questions in the Gradio interface:

  1. "How do I deploy to production?"

    • Should return deployment steps from deployment.md
  2. "What should I do if the service won't start?"

    • Should return troubleshooting steps
  3. "How do I fix database errors?"

    • Should cite troubleshooting.md

How It Works

1. Document Ingestion

docs β†’ split into chunks β†’ create embeddings β†’ store in ChromaDB (see the sanity check below)
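
To confirm ingestion worked, a quick sanity check (a sketch, assuming ingest.py has already created ./chroma_db) is to count the stored chunks with the chromadb client, e.g. inside the container via docker-compose run --rm rag-app python:

# count the chunks that ingest.py stored in the "docs" collection
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("docs")
print(f"Stored chunks: {collection.count()}")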

2. Question Answering

question β†’ create embedding β†’ search similar chunks β†’
add to LLM context β†’ generate answer
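
Here is that loop done by hand, as a minimal sketch assuming the Ollama setup from app.py (the ask_docs helper is just for illustration). It shows exactly what the RetrievalQA chain automates: retrieve similar chunks, stuff them into the prompt, generate.

# manual question answering: retrieve β†’ build prompt β†’ generate
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://ollama:11434")
db = Chroma(persist_directory="./chroma_db", collection_name="docs", embedding_function=embeddings)
llm = Ollama(model="llama3.2", base_url="http://ollama:11434")

def ask_docs(question: str) -> str:
    # 1. embed the question and fetch the 3 most similar chunks
    chunks = db.similarity_search(question, k=3)
    # 2. stuff the chunks into the prompt as context
    context = "\n\n".join(c.page_content for c in chunks)
    prompt = f"Answer based on the documentation.\n\nContext: {context}\nQuestion: {question}\n\nAnswer:"
    # 3. let the LLM generate the grounded answer
    return llm.invoke(prompt)

print(ask_docs("How do I deploy to production?"))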

3. Key Components

  β€’ ChromaDB: Stores document embeddings for fast similarity search
  β€’ Ollama: Runs the LLM locally (no API keys!)
  β€’ LangChain: Orchestrates the RAG pipeline
  β€’ Gradio: Provides the chat interface

LLM Provider Comparison

| Provider   | Cost            | Speed  | Privacy          | Best For                  |
|------------|-----------------|--------|------------------|---------------------------|
| Ollama     | Free            | Medium | 100% private     | Desktops, privacy-focused |
| OpenAI     | $0.15/1M tokens | Fast   | Sent to OpenAI   | Most use cases            |
| OpenRouter | Varies          | Fast   | Sent to provider | Multi-model access        |

When to use each:

  • Ollama: Desktop with 8GB+ RAM, need privacy, no ongoing costs
  • OpenAI: Laptops, fastest setup, best performance/cost
  • OpenRouter: Want Claude, Gemini, or other models

Customization Tips

Use Different Models

Pull a different Ollama model:

# For faster responses (smaller model)
docker exec rag-ollama ollama pull llama3.2:1b

# For better quality (larger model)
docker exec rag-ollama ollama pull llama3.1:8b

Update app.py:

llm = Ollama(
    model="llama3.1:8b",  # Change model here
    base_url="http://ollama:11434"
)

Add More Documents

Just drop markdown files in ./docs:

cp ~/my-team-docs/*.md ./docs/
docker-compose run --rm rag-app python ingest.py

Adjust Retrieval Settings

In app.py:

retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}  # Return more chunks
)

Change Chunk Size

In ingest.py:

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,   # Smaller chunks = more precise
    chunk_overlap=100
)

Production Deployment

Add Authentication

demo.launch(
    server_name="0.0.0.0",
    auth=("username", "password")  # Basic auth
)

Add Monitoring

import time

def chat(message, history):
    start = time.time()
    result = qa_chain.invoke({"query": message})
    duration = time.time() - start

    print(f"Query: {message}")
    print(f"Duration: {duration:.2f}s")

    return result['result']

Persist Data

Volumes in docker-compose.yml already persist:

  • ollama_data: Downloaded models
  • ./chroma_db: Vector database

Troubleshooting

"ModuleNotFoundError"

# Rebuild with updated requirements
docker-compose build --no-cache

"Ollama connection refused"

# Check Ollama is running
docker ps | grep ollama

# Check logs
docker logs rag-ollama

"No documents found"

# Re-run ingestion
docker-compose run --rm rag-app python ingest.py

Slow responses

# Use smaller model
docker exec rag-ollama ollama pull llama3.2:1b

# Or lower the temperature in app.py for more focused, deterministic answers
llm = Ollama(model="llama3.2", temperature=0.1)

What You Learned

βœ… How RAG systems work (Retrieval + Augmentation + Generation)
βœ… Document embedding and vector search
βœ… Running local LLMs with Ollama
βœ… Building chat interfaces with Gradio
βœ… Containerizing AI applications with Docker

Next Steps

  1. Add conversation memory - Remember chat history (see the sketch after this list)
  2. Multi-modal RAG - Add images, PDFs, videos
  3. Hybrid search - Combine semantic + keyword search
  4. Agent-based retrieval - Let AI decide when to search
  5. Deploy to production - Add auth, monitoring, scaling
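
As a starting point for item 1, here is a minimal sketch of conversation memory using LangChain's ConversationalRetrievalChain, reusing the llm and retriever already built in app.py (the names memory_chain and chat_with_memory are just for illustration).

# conversation memory sketch: the chain rewrites follow-up questions using the chat history
from langchain.chains import ConversationalRetrievalChain

memory_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
)

chat_history = []  # list of (question, answer) tuples

def chat_with_memory(message, history):
    result = memory_chain.invoke({"question": message, "chat_history": chat_history})
    chat_history.append((message, result["answer"]))
    return result["answer"]

Pass chat_with_memory to gr.ChatInterface in place of chat to get follow-up questions like "and for staging?" answered in context.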

Complete Code Repository

Get the full working code: πŸ‘‰ Download Complete RAG System on GitHub

Clone and run:

git clone https://github.com/alwaysnix/ai-literacy.git
cd ai-literacy/rag-system
docker-compose build
docker-compose run --rm rag-app python ingest.py
docker-compose up

Open http://localhost:7860 and start asking questions!

Want to master advanced RAG techniques? Our training covers:

  • Multi-modal RAG (images, videos, code)
  • Production deployment at scale
  • Cost optimization strategies
  • Evaluation and monitoring

Join the Next Cohort β†’


Quick Reference

# Setup (build image and ingest docs)
docker-compose build
docker-compose run --rm rag-app python ingest.py

# Start
docker-compose up

# Stop
docker-compose down

# Re-ingest docs
docker-compose run --rm rag-app python ingest.py

# View logs
docker-compose logs -f

# Reset everything
docker-compose down -v
rm -rf chroma_db

Questions? The RAG system can answer questions about itself! Just ask it "How does this RAG system work?" πŸ˜‰

Ready to Become AI-Literate?

Join our 2-week hands-on training and go from curious to confident with AI.