Tutorial

Deploy Your First Local LLM with Ollama in 10 Minutes

Run ChatGPT-like AI models on your laptop—100% private, 100% free, no API keys needed. Perfect for engineers who want to experiment without cloud dependencies.

AI Literacy Team
2025-10-08
8 min read

You need to test an AI feature. Do you:

  • A) Sign up for OpenAI, add your credit card, worry about costs?
  • B) Use a free tier that'll cut you off mid-project?
  • C) Run a powerful LLM locally, for free, forever?

If you picked C, keep reading.

Why Local LLMs Matter for Engineers

As engineers, we value control, privacy, and not being at the mercy of third-party APIs. But until recently, running LLMs locally meant wrestling with Python environments, downloading 10GB+ model files, and configuring GPU drivers.

Ollama changed everything.

It's like Docker for AI models—one command to pull and run any LLM. No configuration. No complexity. Just AI on your machine.

Who This Is For

  • DevOps/SRE: Test AI-powered runbooks without sharing production logs with OpenAI
  • Security Engineers: Analyze sensitive logs locally, no data leaves your network
  • Developers: Prototype AI features without API rate limits or costs
  • QA Engineers: Generate test data without internet dependency
  • Anyone who wants AI without vendor lock-in

What You'll Learn

By the end of this 10-minute tutorial, you'll have:

  • ✅ Ollama installed and running
  • ✅ A ChatGPT-quality model running locally
  • ✅ Practical examples for your role
  • ✅ Integration with your existing tools

System requirements: 8GB+ RAM (16GB recommended), macOS/Linux/Windows

Let's go.


Step 1: Install Ollama (2 minutes)

Ollama works on macOS, Linux, and Windows. Pick your OS:

macOS

# Install with Homebrew
brew install ollama

# Or download the installer from https://ollama.com/download

Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download from ollama.com/download and run the installer.

Verify installation:

ollama --version
# Example output: ollama version 0.3.12 (your version may be newer)

Step 2: Pull Your First Model (3 minutes)

Ollama supports dozens of models. Let's start with Llama 3.2, Meta's lightweight general-purpose model:

ollama pull llama3.2

What's happening:

  • Downloads the model weights (a few GB, one time)
  • Runs a quantized build, using your GPU automatically when one is available
  • Stores everything locally in ~/.ollama/models

Other popular models:

ollama pull mistral        # Fast, 7B parameters
ollama pull codellama      # Code-focused
ollama pull phi3           # Tiny but powerful (3.8B)
ollama pull llama3.1:70b   # Maximum quality (needs 64GB+ RAM)

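To see what's actually installed, run ollama list, or query the local API from Python. A minimal sketch (assumes the Ollama server is running; start it with ollama serve if it isn't):

import requests

# GET /api/tags lists every model the local server has on disk
response = requests.get("http://localhost:11434/api/tags")
response.raise_for_status()
for model in response.json().get("models", []):
    print(f'{model["name"]}: {model["size"] / 1e9:.1f} GB')
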
Step 3: Run Your First Prompt (1 minute)

ollama run llama3.2

You're now in an interactive chat. Try:

>>> Explain Kubernetes to a 5-year-old

Kubernetes is like a toy box organizer! Imagine you have lots of toy
containers (apps), and Kubernetes is the grown-up who decides which
shelf (server) each container goes on, makes sure they don't fall off,
and brings you a new one if yours breaks.

>>> /bye

That's it. You're running AI locally.


Step 4: Use It Like an API (2 minutes)

For automation, use Ollama's REST API:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a Python function to check if a number is prime",
  "stream": false
}'

Response:

{
  "response": "def is_prime(n):\n    if n < 2:\n        return False\n    for i in range(2, int(n**0.5) + 1):\n        if n % i == 0:\n            return False\n    return True"
}

Python Integration

import requests
import json

def ask_ollama(prompt):
    response = requests.post('http://localhost:11434/api/generate',
        json={
            "model": "llama3.2",
            "prompt": prompt,
            "stream": False
        })
    return response.json()['response']

# Use it
code = ask_ollama("Write a bash script to monitor disk usage")
print(code)

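The generate endpoint is single-shot. For multi-turn conversations, Ollama also exposes a chat endpoint (/api/chat) that accepts a running message history. A minimal sketch (the prompts are just examples):

import requests

def chat(messages, model="llama3.2"):
    # /api/chat takes the whole conversation as a list of role/content messages
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]

history = [{"role": "user", "content": "Define a Kubernetes pod in one line"}]
answer = chat(history)
print(answer)

# Keep the reply in the history so follow-ups have context
history.append({"role": "assistant", "content": answer})
history.append({"role": "user", "content": "Show a kubectl command to list them"})
print(chat(history))
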
Real-World Use Cases by Role

For DevOps/SRE Engineers

Analyze error logs:

ollama run llama3.2 "Analyze this error and suggest a fix:
FATAL: database connection failed
Error: ECONNREFUSED 127.0.0.1:5432"

Generate Terraform:

ollama run codellama "Generate Terraform code for an AWS S3 bucket
with versioning enabled and public access blocked"

For Security Engineers

Analyze security logs (privately):

cat auth.log | ollama run llama3.2 "Summarize suspicious login attempts
and identify potential brute force attacks"

Generate security policies:

ollama run llama3.2 "Create an AWS IAM policy for read-only S3 access"

For QA/Test Engineers

Generate test data:

import requests

def generate_test_users(count=10):
    prompt = f"Generate {count} realistic test user records in JSON format"
    response = requests.post('http://localhost:11434/api/generate',
        json={"model": "llama3.2", "prompt": prompt, "stream": False})
    return response.json()['response']

print(generate_test_users(5))

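One caveat: models sometimes wrap JSON in explanatory prose, which breaks parsing. Ollama's API accepts a format parameter that constrains the output to valid JSON. A minimal sketch (the field names in the prompt are just examples, and the model still decides the exact structure):

import json
import requests

def generate_test_users_json(count=5):
    prompt = (
        f"Generate {count} test user records as a JSON array of objects "
        "with name, email, and signup_date fields. Return only JSON."
    )
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",
            "prompt": prompt,
            "format": "json",  # constrain the response to valid JSON
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()
    return json.loads(response.json()["response"])

print(generate_test_users_json(3))
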
For Developers

Code review:

ollama run codellama "Review this function for bugs:
$(cat my_function.py)"

Documentation generation:

ollama run llama3.2 "Generate API documentation for this endpoint:
$(cat routes/api.js)"

Advanced: Model Customization

Create a specialized model for your domain:

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM llama3.2

# Set temperature (creativity)
PARAMETER temperature 0.3

# System prompt
SYSTEM """You are an expert DevOps engineer specializing in Kubernetes
and AWS. Provide concise, production-ready solutions with code examples."""
EOF

# Build custom model
ollama create devops-expert -f Modelfile

# Use it
ollama run devops-expert "How do I set up auto-scaling in EKS?"

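If you'd rather not bake parameters into a Modelfile, the REST API accepts the same settings per request through an options field. A minimal sketch (the values shown are illustrative):

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "How do I set up auto-scaling in EKS? Be concise.",
        # "options" mirrors Modelfile PARAMETER settings on a per-request basis
        "options": {"temperature": 0.3, "num_ctx": 4096},
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
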
Performance Tips

1. Choose the Right Model Size

Model        | Size  | RAM Needed | Speed | Quality
phi3         | 2.3GB | 8GB        | ⚡⚡⚡   | ⭐⭐⭐
mistral      | 4.1GB | 8GB        | ⚡⚡    | ⭐⭐⭐⭐
llama3.2     | 4.7GB | 16GB       | ⚡⚡    | ⭐⭐⭐⭐⭐
llama3.1:70b | 40GB  | 64GB       | ⚡     | ⭐⭐⭐⭐⭐

2. GPU Acceleration

If you have an NVIDIA GPU:

# Ollama auto-detects GPU
ollama run llama3.2
# Check GPU usage: nvidia-smi

3. Running Multiple Models

# Terminal 1
ollama serve

# Terminal 2
ollama run llama3.2

# Terminal 3
ollama run codellama

Troubleshooting

"Model not found"

ollama list  # See installed models
ollama pull llama3.2  # Download if missing

"Out of memory"

Use a smaller model:

ollama pull phi3  # Only 2.3GB

"Slow responses"

Lower temperature for faster inference:

ollama run llama3.2
>>> /set parameter temperature 0.1
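
If the model can't go faster, streaming at least makes long answers feel more responsive by printing tokens as they're generated. A minimal sketch against the generate endpoint (streaming is the API default; the prompt is just an example):

import json
import requests

def stream_ollama(prompt, model="llama3.2"):
    # With "stream": true the server sends one JSON object per line as tokens arrive
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()

stream_ollama("Explain blue-green deployments in three sentences")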

"Can't connect to API"

Start the server:

ollama serve

Comparison: Ollama vs Cloud LLMs

Feature  | Ollama (Local)           | OpenAI API
Cost     | Free forever             | $0.002/1K tokens
Privacy  | 100% local               | Data sent to OpenAI
Speed    | Depends on hardware      | ~2-5 seconds
Internet | Not needed               | Required
Quality  | Very good                | Excellent
Best for | Experimentation, privacy | Production at scale

When to use Ollama:

  • ✅ Development and testing
  • ✅ Sensitive data processing
  • ✅ Learning AI without cost
  • ✅ Offline environments
  • ✅ Prototyping

When to use OpenAI:

  • ✅ Production applications
  • ✅ Need GPT-4 level quality
  • ✅ High-volume requests
  • ✅ Multimodal (vision, audio)

Next Steps

You now have a local LLM running. Here's what to explore next:

1. Build a RAG System

Combine Ollama with a vector database to chat with your documentation. 👉 Tutorial: Build a RAG System

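A building block you can try right away: Ollama also generates embeddings locally, which is what the vector database would store. A minimal sketch using the /api/embeddings endpoint (nomic-embed-text is one embedding model available on Ollama; pull it first):

import requests

def embed(text, model="nomic-embed-text"):
    # Requires: ollama pull nomic-embed-text
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embedding"]

vector = embed("Kubernetes pods share a network namespace")
print(len(vector))  # dimensionality depends on the embedding model
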
2. Integrate with LangChain

from langchain.llms import Ollama
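# Note: newer LangChain releases move this import to langchain_community
# (from langchain_community.llms import Ollama) and prefer llm.invoke(...)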

llm = Ollama(model="llama3.2")
response = llm("Explain Docker in one sentence")
print(response)

3. Create Custom Tools

# AI-powered log analyzer
alias analyze-logs="cat /var/log/app.log | ollama run llama3.2 'Summarize errors:'"

# Commit message generator
alias commit-msg="git diff | ollama run codellama 'Generate a commit message:'"

4. Run Multiple Models

ollama pull mistral      # Fast responses
ollama pull codellama    # Code tasks
ollama pull llama3.2     # General purpose

Summary

In 10 minutes, you:

  • ✅ Installed Ollama
  • ✅ Downloaded and ran a ChatGPT-quality model
  • ✅ Integrated it via API and Python
  • ✅ Saw practical examples for your role
  • ✅ Learned customization and optimization

Key takeaways:

  • Local LLMs are production-ready in 2025
  • No API keys, no costs, no vendor lock-in
  • Perfect for experimentation and sensitive data
  • Ollama makes it as easy as Docker

Want to Go Deeper?

This is just scratching the surface. In our AI Literacy for Engineers training, you'll learn:

  • Build production RAG systems with Ollama
  • Create AI agents that automate your workflows
  • Deploy LLMs with Docker and Kubernetes
  • Cost optimization: when to use local vs cloud
  • Security best practices for AI systems

Next cohort: November 2 & 9, 2025 Format: 2 Saturdays, 9am-1:30pm (Hybrid) Who: ALL engineers—no AI background needed

Register for the Next Cohort →


Questions? Drop them in the comments or contact us.

Found this helpful? Share it with your team and subscribe for weekly AI tips.

Ready to Become AI-Literate?

Join our 2-week hands-on training and go from curious to confident with AI.