Local Zero-Cost AI Agent Stack
Build a production AI agent that answers questions about your documents — fully local, zero API cost, no cloud services. Built for solo developers and founders who want AI capabilities without recurring LLM costs.
BUILD PROMPT
Paste into Claude, ChatGPT, or Cursor to start building.
AI-Guided Setup Prompt — Local Zero-Cost AI Agent Stack
Paste this entire prompt into any AI tool (Claude, ChatGPT, Cursor, Gemini) to get step-by-step guidance building a fully local AI agent with zero API cost.
---
REFERENCE IMPLEMENTATION:
A working, verified version of this entire stack is available at:
https://github.com/devopsballog25-pixel/local-ai-agent-starter
Fork that repo and follow along, or build from scratch. The AI guiding you should reference that repo if anything is unclear or if a step fails.
---
You are a senior developer helping me build a local AI agent from scratch using the Local Zero-Cost AI Agent Stack verified
at repoverifier.dev/solutions/local-zero-cost-ai-agent.
The reference implementation is at:
https://github.com/devopsballog25-pixel/local-ai-agent-starter
I want to build an AI agent that answers questions about my own documents — fully local, zero API cost, no cloud services,
no API keys required.
Guide me one step at a time. Wait for my confirmation before moving to the next step. If something fails, help me debug
it before continuing. Reference the repo above if I get stuck.
---
STACK:
- ollama — runs the LLM locally (llama3.2 model)
- litellm — unified interface to call ollama
- llama_index — handles document ingestion and RAG retrieval
- chroma — stores document embeddings locally (persistent)
All four components have been independently verified as SOLID at repoverifier.dev. This is not a theoretical stack — it has
been tested end-to-end on macOS M4 Pro with Python 3.12.
---
KNOWN GOTCHAS — warn me about these before each relevant step:
1. Python version: litellm requires Python <3.14.
Use Python 3.12 specifically. Check before anything else.
2. Virtual environment: Homebrew's Python on macOS is marked as externally managed (PEP 668), so pip install outside a virtual environment is refused.
Always create .venv with Python 3.12 before installing.
3. OpenAI default: llama_index defaults to OpenAI embeddings.
Without overriding this it throws:
"ValueError: No API key found for OpenAI"
Must set HuggingFaceEmbedding explicitly in every file.
4. Model download: First run downloads BAAI/bge-small-en-v1.5
(~90MB embedding model). This is expected and happens once.
5. Chroma data folder: Add data/ to .gitignore — embeddings
should not be committed to git.
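Gotcha 1 can be checked from Python itself before anything is installed. The following is a minimal sketch, assuming the limits stated above (3.12 is the tested version, 3.14 and later are unsupported by litellm). The function name `check_python` is hypothetical and not taken from the reference repo.

```python
import sys


def check_python(version=None):
    """Return (ok, message) for an interpreter version tuple.

    Assumption from the gotcha list: 3.12 is the tested version and
    anything at or above 3.14 is too new for litellm.
    """
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) == (3, 12):
        return True, "Python 3.12 detected (tested version)"
    if (major, minor) >= (3, 14):
        return False, f"Python {major}.{minor} is too new for litellm"
    return False, f"Python {major}.{minor} is untested; use 3.12"


if __name__ == "__main__":
    ok, message = check_python()
    print(message)
```

Running this with the interpreter that will create the virtual environment shows whether that interpreter matches the tested version.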
---
STEP 1 — Verify prerequisites
Ask me to run these three commands and share the output:
python3.12 --version
ollama --version
curl http://localhost:11434/api/tags
If python3.12 is missing:
Tell me: brew install python@3.12
If ollama is missing:
Tell me: go to https://ollama.com and install it
Then: ollama pull llama3.2
If llama3.2 is not in the model list:
Tell me: ollama pull llama3.2
Do not move to Step 2 until all three pass.
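The ollama checks in this step can also be scripted. The sketch below uses only the standard library and assumes the `/api/tags` response has the shape `{"models": [{"name": "llama3.2:latest"}]}`. The helper names `model_available` and `check_ollama` are hypothetical.

```python
import json
import shutil
import urllib.error
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # URL from Step 1


def model_available(tags, wanted="llama3.2"):
    """True if `wanted` appears in an /api/tags payload.

    Assumes the payload looks like {"models": [{"name": "llama3.2:latest"}]}.
    """
    names = [m.get("name", "") for m in tags.get("models", [])]
    return any(n == wanted or n.startswith(wanted + ":") for n in names)


def check_ollama(url=OLLAMA_TAGS_URL, timeout=3):
    """Return a short status string describing the local ollama setup."""
    if shutil.which("ollama") is None:
        return "ollama not installed"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return "ollama server not reachable"
    return "ready" if model_available(tags) else "llama3.2 not pulled"
```

Calling `check_ollama()` returns one of four status strings, which map onto the three failure branches listed in this step plus the success case.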
---
STEP 2 — Project setup
Tell me to run:
mkdir my-ai-agent && cd my-ai-agent
python3.12 -m venv .venv
source .venv/bin/activate
Verify the .venv is active (prompt should show (.venv)).
Do not move to Step 3 until confirmed.
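If the shell prompt is customized and does not show (.venv), the active environment can be confirmed from Python instead. This is a small sketch based on how the standard `venv` module sets `sys.prefix`; the function name is hypothetical.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("venv active" if in_virtualenv() else "venv NOT active")
```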
---
STEP 3 — Install dependencies
Tell me to run exactly these commands in order:
pip install llama-index==0.14.21
pip install llama-index-vector-stores-chroma
pip install llama-index-llms-litellm
pip install llama-index-embeddings-huggingface
pip install chromadb==1.5.9
pip install litellm==1.83.14
Warn me before this step: the HuggingFace package is required to avoid the OpenAI API key error — do not skip it.
Wait for all installs to succeed before moving on.
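To confirm that every install succeeded, the installed versions can be compared against the pins above. The sketch below assumes the package names and versions listed in this step. `installed_version` and `report` are hypothetical helpers built on the standard `importlib.metadata` module.

```python
from importlib import metadata

# Pins copied from Step 3; None means "any version".
EXPECTED = {
    "llama-index": "0.14.21",
    "llama-index-vector-stores-chroma": None,
    "llama-index-llms-litellm": None,
    "llama-index-embeddings-huggingface": None,
    "chromadb": "1.5.9",
    "litellm": "1.83.14",
}


def installed_version(package):
    """Return the installed version string, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(expected=EXPECTED):
    """Map each package to 'ok', 'missing', or 'found X, expected Y'."""
    result = {}
    for package, pin in expected.items():
        found = installed_version(package)
        if found is None:
            result[package] = "missing"
        elif pin is not None and found != pin:
            result[package] = f"found {found}, expected {pin}"
        else:
            result[package] = "ok"
    return result
```

Printing `report()` inside the activated .venv gives one status per package, so a skipped HuggingFace install shows up as "missing" before it causes the OpenAI error.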
---
STEP 4 — Add my documents
Tell me to run:
mkdir docs
Tell me to copy my own .md, .txt, or .pdf files into the docs folder. These are the documents the agent will
answer questions about.
If I have no documents, tell me to use the sample ones from the reference repo:
https://github.com/devopsballog25-pixel/local-ai-agent-starter/tree/main/docs
Wait for me to confirm files are in the docs folder.
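A quick way to confirm the folder contents is to list the supported files. This sketch assumes only the three extensions named in this step and looks at the top level of the folder only. `find_documents` is a hypothetical helper.

```python
from pathlib import Path

SUPPORTED = {".md", ".txt", ".pdf"}  # extensions named in Step 4


def find_documents(folder="docs"):
    """Return supported files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

An empty list means either the folder is missing or it holds no supported files yet.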
---
STEP 5 — Create ingest.py
Give me the exact code for ingest.py based on the reference implementation at:
https://github.com/devopsballog25-pixel/local-ai-agent-starter/blob/main/ingest.py
The critical requirements:
- Must set Settings.embed_model to HuggingFaceEmbedding right after the imports, BEFORE any index is built or loaded
- Must use chromadb.PersistentClient(path="./data/chroma")
- Must print: "Ingested N documents from ./docs"
Tell me to run: python ingest.py
Expected: "Ingested 2 documents from ./docs" (or however many)
If it fails with OpenAI error: remind me about HuggingFaceEmbedding
If ollama is not running: tell me to run ollama serve in a separate terminal
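For orientation, the requirements above could be met roughly as follows. This is a sketch under stated assumptions, not the code from the reference repo: the collection name "docs" and the function names are hypothetical. The embedding model is set after the imports and before any index is created, which is what avoids the OpenAI key error.

```python
def summary_line(count, source="./docs"):
    """Build the status line that Step 5 requires."""
    return f"Ingested {count} documents from {source}"


def ingest(docs_dir="./docs", chroma_path="./data/chroma", collection="docs"):
    """Embed the files in docs_dir and persist them in a Chroma collection."""
    # Imports live inside the function so summary_line stays usable
    # even before the Step 3 packages are installed.
    import chromadb
    from llama_index.core import (
        Settings,
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
    )
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.vector_stores.chroma import ChromaVectorStore

    # Override the OpenAI default before any index is built (gotcha 3).
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

    documents = SimpleDirectoryReader(docs_dir).load_data()

    # "docs" is an assumed collection name; agent.py must use the same one.
    client = chromadb.PersistentClient(path=chroma_path)
    store = ChromaVectorStore(
        chroma_collection=client.get_or_create_collection(collection)
    )
    storage = StorageContext.from_defaults(vector_store=store)
    VectorStoreIndex.from_documents(documents, storage_context=storage)

    print(summary_line(len(documents), docs_dir))
    return len(documents)
```

A real ingest.py would end by calling `ingest()`. The reported count is the number of loaded document objects, which may differ from the number of files depending on how the reader splits them.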
---
STEP 6 — Create agent.py
Give me the exact code for agent.py based on the reference implementation at:
https://github.com/devopsballog25-pixel/local-ai-agent-starter/blob/main/agent.py
The critical requirements:
- Must set HuggingFaceEmbedding at the top (same as ingest.py)
- Must configure litellm to use ollama:
    from llama_index.core import Settings
    from llama_index.llms.litellm import LiteLLM
    llm = LiteLLM(
        model="ollama/llama3.2",
        api_base="http://localhost:11434",
    )
    Settings.llm = llm
- Must load existing chroma index (not re-ingest)
- Must take question as CLI argument:
python agent.py "Your question here"
Tell me to test with a question about my documents.
Wait for a successful answer before moving on.
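The requirements above could be combined along these lines. This is a hedged sketch, not the reference repo's agent.py: the collection name "docs" and the function names are assumptions, and the collection name must match whatever ingest.py used.

```python
def parse_question(argv):
    """Join CLI arguments into one question; return None when absent."""
    question = " ".join(argv[1:]).strip()
    return question or None


def answer(question, chroma_path="./data/chroma", collection="docs"):
    """Query the existing Chroma index without re-ingesting documents."""
    # Heavy imports stay inside the function so argument parsing
    # works even when the stack is not installed yet.
    import chromadb
    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.litellm import LiteLLM
    from llama_index.vector_stores.chroma import ChromaVectorStore

    # Same embedding model as ingest.py, otherwise query vectors
    # would not be comparable with the stored ones.
    Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    Settings.llm = LiteLLM(model="ollama/llama3.2", api_base="http://localhost:11434")

    client = chromadb.PersistentClient(path=chroma_path)
    store = ChromaVectorStore(
        chroma_collection=client.get_or_create_collection(collection)
    )
    index = VectorStoreIndex.from_vector_store(store)
    return str(index.as_query_engine().query(question))


def main(argv):
    """Return a process exit code: 0 on success, 1 on missing question."""
    question = parse_question(argv)
    if question is None:
        print('Usage: python agent.py "Your question here"')
        return 1
    print(answer(question))
    return 0
```

A real agent.py would finish with `import sys` and `sys.exit(main(sys.argv))` so the question comes from the command line.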
---
STEP 7 — Run 3 verification tests
Tell me to run these three tests:
Test 1 — question answered by my documents:
python agent.py "[question about something in my docs]"
Expected: accurate answer from document content
Test 2 — different question answered by my documents:
python agent.py "[another question about my docs]"
Expected: accurate answer from document content
Test 3 — question NOT in my documents:
python agent.py "What is the capital of France?"
Expected: agent says it cannot find relevant information
If it hallucinates an answer: explain why and how to fix
All 3 must pass before we continue.
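Test 3 depends on the agent declining when retrieval finds nothing relevant. One common approach is to drop retrieved chunks below a similarity cutoff and refuse when none remain. The plain-Python sketch below illustrates that idea only: the cutoff of 0.5, the refusal wording, and all function names are assumptions, and the reference repo may handle this differently. llama_index offers a `SimilarityPostprocessor` with a `similarity_cutoff` parameter that applies comparable filtering inside a query engine.

```python
REFUSAL = "I cannot find relevant information in the documents."


def filter_chunks(scored_chunks, cutoff=0.5):
    """Keep (text, score) pairs whose score reaches the cutoff."""
    return [(text, score) for text, score in scored_chunks if score >= cutoff]


def build_prompt(question, scored_chunks, cutoff=0.5):
    """Return a grounded prompt, or None when no chunk is relevant enough."""
    kept = filter_chunks(scored_chunks, cutoff)
    if not kept:
        return None
    context = "\n\n".join(text for text, _ in kept)
    return (
        "Answer using only the context below. "
        f"If the context is insufficient, reply: {REFUSAL}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def respond(question, scored_chunks, ask_llm, cutoff=0.5):
    """Refuse without calling the model when retrieval found nothing useful."""
    prompt = build_prompt(question, scored_chunks, cutoff)
    return REFUSAL if prompt is None else ask_llm(prompt)
```

Here `ask_llm` stands for any callable that sends a prompt to the local model. A suitable cutoff depends on the embedding model and the documents, so it needs tuning against real questions.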
---
STEP 8 — Cleanup files
Give me a .gitignore containing:
data/
.venv/
__pycache__/
*.pyc
.env
.DS_Store
Give me a README.md that includes:
- What this agent does
- Prerequisites (Python 3.12, ollama, llama3.2)
- Setup: venv, pip install, ingest, query
- How to add new documents
- Link to repoverifier.dev/solutions/local-zero-cost-ai-agent
---
STEP 9 — Push to GitHub
Tell me to run:
git init
git add .
git commit -m "initial: local zero-cost AI agent"
git branch -M main
git remote add origin <my-github-repo-url>
git push -u origin main
---
You are done. I now have a working local AI agent that:
- Answers questions about my documents
- Runs entirely on my machine
- Uses zero API budget
- Can be extended with any documents I add to ./docs (re-run python ingest.py after adding them)
For questions or issues, reference:
https://github.com/devopsballog25-pixel/local-ai-agent-starter
Architecture
Dev Tools
- andrej-karpathy-skills — Coding behavior — stops Claude Code from making silent assumptions
- everything-claude-code (ECC) — Agent harness — 68 skills, 53 agents, 26 hooks
- gstack — Development workflow — /plan-eng-review, /feature-dev, /review, /qa, /ship
- litellm — LLM routing — unified interface across 100+ providers including ollama
- ollama — Local LLM runner — run llama3.2 and 100+ models with no API key
- llama_index — RAG pipeline — document ingestion, chunking, embedding and retrieval
- chroma — Local vector store — persistent embedding storage, no server required
Infrastructure
- Local machine — Local infrastructure
Stack Components
| Tool / Service | Type | Role | Verdict |
|---|---|---|---|
| andrej-karpathy-skills (★ 93.9k) | Dev Tool | Coding behavior: stops Claude Code from making silent assumptions | SOLID |
| everything-claude-code (ECC) (★ 172.0k) | Dev Tool | Agent harness: 68 skills, 53 agents, 26 hooks | SOLID |
| gstack (★ 88.4k) | Dev Tool | Development workflow: /plan-eng-review, /feature-dev, /review, /qa, /ship | SOLID |
| litellm (★ 45.4k) | Dev Tool | LLM routing: unified interface across 100+ providers including ollama | SOLID |
| ollama (★ 171.0k) | Dev Tool | Local LLM runner: run llama3.2 and 100+ models with no API key | SOLID |
| llama_index (★ 49.1k) | Dev Tool | RAG pipeline: document ingestion, chunking, embedding and retrieval | SOLID |
| chroma (★ 26.0k) | Dev Tool | Local vector store: persistent embedding storage, no server required | SOLID |
| Local machine | Service | Local infrastructure | |
Proof it works
Built with this exact stack: the local-ai-agent-starter demo repo (https://github.com/devopsballog25-pixel/local-ai-agent-starter)