Local Zero-Cost AI Agent Stack

Build a production AI agent that answers questions about your documents — fully local, zero API cost, no cloud services. Built for solo developers and founders who want AI capabilities without recurring LLM costs.

BUILD PROMPT

Paste into Claude, ChatGPT, or Cursor to start building.

AI-Executable Build Brief

AI-Guided Setup Prompt — Local Zero-Cost AI Agent Stack

Paste this entire prompt into any AI tool (such as Claude, ChatGPT, Cursor, or Gemini) to get step-by-step guidance on building a fully local AI agent with zero API cost.

---

REFERENCE IMPLEMENTATION:

A working, verified version of this entire stack is available at:

https://github.com/devopsballog25-pixel/local-ai-agent-starter

Fork that repo and follow along, or build from scratch. The AI guiding you should reference that repo if anything is unclear or if a step fails.

---

You are a senior developer helping me build a local AI agent from scratch using the Local Zero-Cost AI Agent Stack verified at repoverifier.dev/solutions/local-zero-cost-ai-agent.

The reference implementation is at:

https://github.com/devopsballog25-pixel/local-ai-agent-starter

I want to build an AI agent that answers questions about my own documents: fully local, zero API cost, no cloud services, no API keys required.

Guide me one step at a time. Wait for my confirmation before moving to the next step. If something fails, help me debug it before continuing. Reference the repo above if I get stuck.

---

STACK:

  • ollama — runs the LLM locally (llama3.2 model)
  • litellm — unified interface to call ollama
  • llama_index — handles document ingestion and RAG retrieval
  • chroma — stores document embeddings locally (persistent)

All four components have been independently verified as SOLID at repoverifier.dev. This is not a theoretical stack: it has been tested end-to-end on macOS (M4 Pro) with Python 3.12.

---

KNOWN GOTCHAS — warn me about these before each relevant step:

1. Python version: litellm requires Python <3.14. Use Python 3.12 specifically. Check before anything else.

2. Virtual environment: Homebrew's Python on macOS blocks system-wide pip installs (PEP 668). Always create .venv with Python 3.12 before installing.

3. OpenAI default: llama_index defaults to OpenAI embeddings. Without overriding this it throws "ValueError: No API key found for OpenAI". Every file that builds or queries the index must set HuggingFaceEmbedding explicitly.

4. Model download: The first run downloads BAAI/bge-small-en-v1.5 (~90MB embedding model). This is expected and happens once.

5. Chroma data folder: Add data/ to .gitignore. Embeddings should not be committed to git.

---

STEP 1 — Verify prerequisites

Ask me to run these three commands and share the output:

python3.12 --version

ollama --version

curl http://localhost:11434/api/tags

If python3.12 is missing:

Tell me: brew install python@3.12

If ollama is missing:

Tell me: go to https://ollama.com and install it

Then: ollama pull llama3.2

If llama3.2 is not in the model list:

Tell me: ollama pull llama3.2

Do not move to Step 2 until all three pass.
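
Optional shortcut for this step: the three checks can be bundled into one script. The following is a minimal sketch, not part of the reference repo. It assumes Ollama listens on the default port used in the curl command above, and it looks for the ollama binary on PATH instead of running ollama --version. All function names are illustrative.

```python
import json
import shutil
import sys
import urllib.error
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # same endpoint as the curl check
REQUIRED_MODEL = "llama3.2"


def python_ok(version=sys.version_info):
    """True for Python 3.12, the version this stack was tested on."""
    return tuple(version[:2]) == (3, 12)


def model_listed(tags_payload, model=REQUIRED_MODEL):
    """True if the /api/tags payload lists the model under any tag (e.g. llama3.2:latest)."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=3):
    """Return the parsed model list, or None if the Ollama server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def report():
    """Run every check and return {check name: passed}."""
    tags = fetch_tags()
    return {
        "python 3.12": python_ok(),
        "ollama on PATH": shutil.which("ollama") is not None,
        "ollama server reachable": tags is not None,
        "llama3.2 pulled": tags is not None and model_listed(tags),
    }


if __name__ == "__main__":
    for check, passed in report().items():
        print(f"{'PASS' if passed else 'FAIL'}  {check}")
```

Run it with python3.12 so the version check reflects the interpreter the project will use. Any FAIL line maps to one of the fixes listed above.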

---

STEP 2 — Project setup

Tell me to run:

mkdir my-ai-agent && cd my-ai-agent

python3.12 -m venv .venv

source .venv/bin/activate

Verify the .venv is active (prompt should show (.venv)).

Do not move to Step 3 until confirmed.
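
If the shell prompt is ambiguous, the interpreter itself can confirm whether a venv is active. A minimal sketch, relying on the fact that inside a venv sys.prefix differs from sys.base_prefix; the function name is illustrative.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """A venv is active when the interpreter's prefix differs from its base prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"venv active: {sys.prefix}")
    else:
        print("No venv active; run: source .venv/bin/activate")
```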

---

STEP 3 — Install dependencies

Tell me to run exactly these commands in order:

pip install llama-index==0.14.21

pip install llama-index-vector-stores-chroma

pip install llama-index-llms-litellm

pip install llama-index-embeddings-huggingface

pip install chromadb==1.5.9

pip install litellm==1.83.14

Warn me before this step: the HuggingFace package is required to avoid the OpenAI API key error — do not skip it.

Wait for all installs to succeed before moving on.
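
To confirm the installs, the pinned versions can be compared against what pip actually installed. A minimal sketch using only the standard library; the distribution names and pins are copied from the commands above, and the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip commands above; None means "no version pinned".
EXPECTED = {
    "llama-index": "0.14.21",
    "llama-index-vector-stores-chroma": None,
    "llama-index-llms-litellm": None,
    "llama-index-embeddings-huggingface": None,
    "chromadb": "1.5.9",
    "litellm": "1.83.14",
}


def installed_version(dist):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def check(expected=EXPECTED, lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    for dist, pinned in expected.items():
        found = lookup(dist)
        if found is None:
            problems.append(f"{dist} is not installed")
        elif pinned is not None and found != pinned:
            problems.append(f"{dist} is {found}, expected {pinned}")
    return problems


if __name__ == "__main__":
    issues = check()
    print("\n".join(issues) if issues else "All six packages are installed as expected.")
```

Run it inside the activated .venv; outside the venv every package will be reported as missing.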

---

STEP 4 — Add my documents

Tell me to run:

mkdir docs

Tell me to copy my own .md, .txt, or .pdf files into the docs folder. These are the documents the agent will answer questions about.

If I have no documents, tell me to use the sample ones from the reference repo:

https://github.com/devopsballog25-pixel/local-ai-agent-starter/tree/main/docs

Wait for me to confirm files are in the docs folder.
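
One way to confirm the folder contents is to list the supported files before ingesting. A minimal sketch, assuming only the three extensions named in this step and only files at the top level of docs; the function name is illustrative.

```python
from pathlib import Path

SUPPORTED = {".md", ".txt", ".pdf"}  # the file types named in this step


def find_documents(folder="docs"):
    """Return supported files directly inside `folder`, sorted; [] if it is missing."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


if __name__ == "__main__":
    files = find_documents()
    for f in files:
        print(f)
    print(f"{len(files)} supported file(s) found in docs/")
```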

---

STEP 5 — Create ingest.py

Give me the exact code for ingest.py based on the reference implementation at:

https://github.com/devopsballog25-pixel/local-ai-agent-starter/blob/main/ingest.py

The critical requirements:

  • Must set Settings.embed_model to a HuggingFaceEmbedding BEFORE building or loading any index
  • Must use chromadb.PersistentClient(path="./data/chroma")
  • Must print: "Ingested N documents from ./docs"

Tell me to run: python ingest.py

Expected: "Ingested 2 documents from ./docs" (or however many)

If it fails with OpenAI error: remind me about HuggingFaceEmbedding

If ollama is not running: tell me to run ollama serve in a separate terminal
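
For orientation, the requirements above could be met by a script shaped roughly like the following. This is a minimal sketch under stated assumptions, not the reference repo's ingest.py: the collection name "docs" and the helper names are hypothetical, and the embedding model is the one named in the gotchas. Imports are deferred into the function so a missing package surfaces only when ingestion actually runs.

```python
from pathlib import Path

DOCS_DIR = "./docs"
CHROMA_PATH = "./data/chroma"           # path required by this step
COLLECTION = "docs"                     # hypothetical collection name
EMBED_MODEL = "BAAI/bge-small-en-v1.5"  # embedding model named in the gotchas


def summary(count):
    """Build the status line this step expects."""
    return f"Ingested {count} documents from {DOCS_DIR}"


def ingest(docs_dir=DOCS_DIR):
    """Embed every file in docs_dir into a persistent Chroma collection."""
    import chromadb
    from llama_index.core import (
        Settings,
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
    )
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.vector_stores.chroma import ChromaVectorStore

    # Override the OpenAI default before any index is built.
    Settings.embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL)

    documents = SimpleDirectoryReader(input_dir=docs_dir).load_data()

    client = chromadb.PersistentClient(path=CHROMA_PATH)
    collection = client.get_or_create_collection(COLLECTION)
    vector_store = ChromaVectorStore(chroma_collection=collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Embeds the documents and writes the vectors into the Chroma collection.
    VectorStoreIndex.from_documents(documents, storage_context=storage_context)
    return len(documents)


if __name__ == "__main__":
    if Path(DOCS_DIR).is_dir():
        print(summary(ingest()))
    else:
        print(f"{DOCS_DIR} not found; create it and add documents first.")
```

The reported count is the number of loaded Document objects, which may be higher than the number of files if the reader splits a file (a multi-page PDF, for example) into several documents.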

---

STEP 6 — Create agent.py

Give me the exact code for agent.py based on the reference implementation at:

https://github.com/devopsballog25-pixel/local-ai-agent-starter/blob/main/agent.py

The critical requirements:

  • Must set HuggingFaceEmbedding at the top (same as ingest.py)
  • Must configure litellm to use ollama:

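    from llama_index.core import Settings
    from llama_index.llms.litellm import LiteLLM

    # Route all LLM calls through litellm to the local ollama server
    llm = LiteLLM(
        model="ollama/llama3.2",
        api_base="http://localhost:11434",
    )
    Settings.llm = llm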

  • Must load existing chroma index (not re-ingest)
  • Must take question as CLI argument:

python agent.py "Your question here"

Tell me to test with a question about my documents.

Wait for a successful answer before moving on.
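
As a point of comparison, the requirements above fit together roughly as follows. This is a minimal sketch, not the reference repo's agent.py: the collection name "docs" is hypothetical and has to match whatever ingest.py used, and the helper names are illustrative. Imports are deferred into the function so the argument handling works even before the packages are installed.

```python
import sys

CHROMA_PATH = "./data/chroma"
COLLECTION = "docs"                     # hypothetical; must match ingest.py
EMBED_MODEL = "BAAI/bge-small-en-v1.5"  # embedding model named in the gotchas
USAGE = 'Usage: python agent.py "Your question here"'


def parse_question(argv):
    """Join everything after the script name into one question; None if empty."""
    question = " ".join(argv[1:]).strip()
    return question or None


def answer(question):
    """Query the existing Chroma index with the local llama3.2 model."""
    import chromadb
    from llama_index.core import Settings, VectorStoreIndex
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding
    from llama_index.llms.litellm import LiteLLM
    from llama_index.vector_stores.chroma import ChromaVectorStore

    # Same embedding override as ingest.py, plus the local LLM.
    Settings.embed_model = HuggingFaceEmbedding(model_name=EMBED_MODEL)
    Settings.llm = LiteLLM(
        model="ollama/llama3.2",
        api_base="http://localhost:11434",
    )

    client = chromadb.PersistentClient(path=CHROMA_PATH)
    collection = client.get_or_create_collection(COLLECTION)
    vector_store = ChromaVectorStore(chroma_collection=collection)

    # Wrap the stored vectors in an index; nothing is re-embedded here.
    index = VectorStoreIndex.from_vector_store(vector_store)
    return str(index.as_query_engine().query(question))


if __name__ == "__main__":
    question = parse_question(sys.argv)
    print(answer(question) if question else USAGE)
```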

---

STEP 7 — Run 3 verification tests

Tell me to run these three tests:

Test 1 — question answered by my documents:

python agent.py "[question about something in my docs]"

Expected: accurate answer from document content

Test 2 — different question answered by my documents:

python agent.py "[another question about my docs]"

Expected: accurate answer from document content

Test 3 — question NOT in my documents:

python agent.py "What is the capital of France?"

Expected: agent says it cannot find relevant information

If it hallucinates an answer: explain why and how to fix

All 3 must pass before we continue.
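
On Test 3, a likely cause of a hallucinated answer is that top-k retrieval always returns the nearest chunks, even when none is actually relevant, so the model falls back on its own training knowledge. One common mitigation is to refuse when no retrieved chunk clears a similarity cutoff. The sketch below shows the idea in plain Python. The 0.5 cutoff, the refusal text, and the function names are hypothetical and need tuning against real documents. LlamaIndex's SimilarityPostprocessor (similarity_cutoff parameter) applies a comparable filter inside a query engine.

```python
NO_ANSWER = "I could not find relevant information in the documents."
MIN_SCORE = 0.5  # hypothetical cutoff; tune it against your own documents


def relevant_chunks(scored_chunks, min_score=MIN_SCORE):
    """Keep (text, score) pairs whose similarity score clears the cutoff."""
    return [
        (text, score)
        for text, score in scored_chunks
        if score is not None and score >= min_score
    ]


def guarded_answer(scored_chunks, generate, min_score=MIN_SCORE):
    """Call generate(context) only when at least one chunk is relevant enough."""
    kept = relevant_chunks(scored_chunks, min_score)
    if not kept:
        # Nothing relevant was retrieved, so refuse instead of asking the LLM.
        return NO_ANSWER
    context = "\n\n".join(text for text, _ in kept)
    return generate(context)
```

Here generate stands in for whatever function sends the retrieved context and the question to the LLM. With no chunk above the cutoff it is never called, which is the behaviour Test 3 expects.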

---

STEP 8 — Cleanup files

Give me a .gitignore containing:

data/

.venv/

__pycache__/

*.pyc

.env

.DS_Store

Give me a README.md that includes:

  • What this agent does
  • Prerequisites (Python 3.12, ollama, llama3.2)
  • Setup: venv, pip install, ingest, query
  • How to add new documents
  • Link to repoverifier.dev/solutions/local-zero-cost-ai-agent

---

STEP 9 — Push to GitHub

Tell me to run:

git init

git add .

git commit -m "initial: local zero-cost AI agent"

git branch -M main

git remote add origin <my-github-repo-url>

git push -u origin main

---

You are done. I now have a working local AI agent that:

  • Answers questions about my documents
  • Runs entirely on my machine
  • Uses zero API budget
  • Can be extended with any documents I add to /docs

For questions or issues, reference:

https://github.com/devopsballog25-pixel/local-ai-agent-starter

Architecture

[Architecture diagram: Dev Tools (andrej-karpathy-skills, everything-claude-code, gstack, litellm, ollama, llama_index, chroma), Infrastructure (Local machine), and your SaaS app.]

Dev Tools

  • andrej-karpathy-skills — Coding behavior — stops Claude Code from making silent assumptions
  • everything-claude-code (ECC) — Agent harness — 68 skills, 53 agents, 26 hooks
  • gstack — Development workflow — /plan-eng-review, /feature-dev, /review, /qa, /ship
  • litellm — LLM routing — unified interface across 100+ providers including ollama
  • ollama — Local LLM runner — run llama3.2 and 100+ models with no API key
  • llama_index — RAG pipeline — document ingestion, chunking, embedding and retrieval
  • chroma — Local vector store — persistent embedding storage, no server required

Infrastructure

  • Local machine — Local infrastructure

Stack Components

| Tool / Service | Type | Role | Verdict |
|---|---|---|---|
| andrej-karpathy-skills ★ 93.9k | Dev Tool | Coding behavior: stops Claude Code from making silent assumptions | SOLID |
| everything-claude-code (ECC) ★ 172.0k | Dev Tool | Agent harness: 68 skills, 53 agents, 26 hooks | SOLID |
| gstack ★ 88.4k | Dev Tool | Development workflow: /plan-eng-review, /feature-dev, /review, /qa, /ship | SOLID |
| litellm ★ 45.4k | Dev Tool | LLM routing: unified interface across 100+ providers including ollama | SOLID |
| ollama ★ 171.0k | Dev Tool | Local LLM runner: run llama3.2 and 100+ models with no API key | SOLID |
| llama_index ★ 49.1k | Dev Tool | RAG pipeline: document ingestion, chunking, embedding and retrieval | SOLID |
| chroma ★ 26.0k | Dev Tool | Local vector store: persistent embedding storage, no server required | SOLID |
| Local machine | Service | Local infrastructure | |

Proof it works

Built with this exact stack: local-ai-agent-starter demo repo