SOLID

chroma — Does the "4 Functions" Vector Database Actually Work Locally?

Claim tested

An open-source vector database claiming a 4-function API with automatic embedding and fully local operation. Tested in-memory and persistent modes with no API key. Core claim verified. One undocumented behaviour: first run silently downloads a 79.3MB embedding model.

Criteria Scorecard

Criterion: Score
install_works: true
claim_testable: true
readme_accurate: true
creator_notified: false
errors_documented: true
claim_tested_clean_env: true
verdict_matches_evidence: true


Environment

os: macOS
ram: 24GB
machine: MacBook Pro 14-inch M4 Pro
modes_tested: in-memory, persistent, metadata filtering
test_account: fresh macOS user, no prior Python environment
python_version: 3.13.5
chromadb_version: 1.5.9

Full Review

What This Repo Claims



An open-source vector database with a 4-function API — add documents, query, get, delete. Chroma handles
tokenization, embedding, and indexing automatically. No API key required. Runs fully locally.

The core promise: semantic search in a few lines of Python, with no external services.

Two modes:
  • In-memory (ephemeral, for prototyping)

  • Persistent (survives restarts, production-ready locally)
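The tests below exercise only add and query, so here is a minimal sketch of all four functions in one round trip. It assumes a collection object exposing `add`, `query`, `get`, `delete`, and `count`; the helper name `four_function_roundtrip` is hypothetical and not part of Chroma.

```python
def four_function_roundtrip(collection):
    """Exercise add, query, get, and delete on a Chroma-style collection.

    `collection` is assumed to behave like the object returned by
    client.create_collection(...) in the tests in this review.
    """
    # 1. add: store two documents under explicit ids
    collection.add(
        documents=[
            "Redis is an in-memory data structure store",
            "PostgreSQL is a relational database system",
        ],
        ids=["doc1", "doc2"],
    )
    # 2. query: semantic search by text, one result requested
    hits = collection.query(query_texts=["in-memory store"], n_results=1)
    # 3. get: fetch by id, with no ranking involved
    fetched = collection.get(ids=["doc2"])
    # 4. delete: remove one document by id
    collection.delete(ids=["doc1"])
    return {
        "top_hit_ids": hits["ids"][0],
        "fetched_documents": fetched["documents"],
        "remaining": collection.count(),
    }
```

With chromadb installed, passing in a fresh collection such as `chromadb.Client().create_collection("roundtrip-demo")` (the collection name is illustrative) should leave one document behind after the delete.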


What I Tested



Environment:
  • macOS, MacBook Pro 14-inch M4 Pro, 24GB RAM

  • Python 3.13.5, pip 25.3

  • chromadb 1.5.9

  • Fresh macOS user — no prior Python environment


Install:
pip3 install chromadb

No errors. Clean install.

Test 1: In-memory client
import chromadb

client = chromadb.Client()
collection = client.create_collection("test-docs")
collection.add(
    documents=[
        "Python is a high-level programming language",
        "JavaScript runs in the browser and on servers",
        "PostgreSQL is a relational database system",
        "Redis is an in-memory data structure store",
        "Docker containers package applications and dependencies"
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"]
)
results = collection.query(
    query_texts=["what database stores data in memory?"],
    n_results=2
)


Results:
  • Top result: "Redis is an in-memory data structure store" ✅

  • Second result: "PostgreSQL is a relational database system" ✅


Semantic ranking is correct: Redis ranked above PostgreSQL for a query about in-memory storage, even though neither document contains the query's exact wording.
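The query call returns a dict of parallel lists, with one inner list per query text. A small sketch for flattening one query's hits into readable tuples follows; the helper name `top_hits` is hypothetical, and the fallback for a missing `distances` key is an assumption made for robustness.

```python
def top_hits(results, query_index=0):
    """Return (id, document, distance) tuples for one query, best match first."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    # Smaller distance means a closer match. If the key is absent,
    # fill with None so the tuples keep a consistent shape.
    distances = results.get("distances")
    dists = distances[query_index] if distances else [None] * len(ids)
    return list(zip(ids, docs, dists))
```

Applied to the `results` from Test 1, `top_hits(results)` would give the Redis entry first and the PostgreSQL entry second, each paired with its distance.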

Test 2: Topic switching

Query: "frontend web development"
Top result: "JavaScript runs in the browser and on servers" ✅

Test 3: Persistent client
client = chromadb.PersistentClient(path="./chroma-data")
collection = client.get_or_create_collection("persistent-docs")
collection.add(
    documents=["This document will survive restart"],
    ids=["persist1"]
)

# Simulate a restart by opening a second client on the same path
client2 = chromadb.PersistentClient(path="./chroma-data")
collection2 = client2.get_or_create_collection("persistent-docs")
print(collection2.count())  # → 1


Documents survive a client restart. On disk, the specified path contains a SQLite file plus a UUID-named folder.
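To confirm what the persistent client wrote, the storage path can be inspected with the standard library alone. This is a minimal sketch; `describe_store` is a hypothetical helper, and it makes no assumption about file names inside the folder.

```python
from pathlib import Path


def describe_store(path):
    """List top-level entries of a storage folder as (name, kind) pairs."""
    root = Path(path)
    if not root.is_dir():
        # Nothing has been persisted at this path yet
        return []
    return sorted(
        (entry.name, "dir" if entry.is_dir() else "file")
        for entry in root.iterdir()
    )
```

Running `describe_store("./chroma-data")` after Test 3 should list one file entry and one directory entry, matching the layout described above.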

Test 4: Metadata filtering
collection.add(
    documents=["Python tutorial for beginners",
               "Advanced Python patterns"],
    metadatas=[{"level": "beginner"}, {"level": "advanced"}],
    ids=["meta1", "meta2"]
)
results = collection.query(
    query_texts=["Python"],
    n_results=2,
    where={"level": "beginner"}
)
# → "Python tutorial for beginners" ✅


Metadata filtering works correctly alongside semantic search.

Finding: Silent Model Download on First Run



On first use, Chroma automatically downloads the all-MiniLM-L6-v2 embedding model (79.3MB):

/Users/repoverifiertest/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 
100%|████| 79.3M/79.3M [00:01<00:00, 61.6MiB/s]


The README states "we handle tokenization, embedding, and indexing automatically". That is accurate, but it does not mention the model download. Developers on slow connections or in restricted environments should be aware of it.

After the first download, subsequent runs use the cached model with no network calls.
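Before a demo on a restricted network, it may help to check whether the model is already cached. The sketch below uses the cache directory shown in the download log above; treating that location as the default on other machines or chromadb versions is an assumption, and `model_is_cached` is a hypothetical helper.

```python
from pathlib import Path

# Location taken from the download log in this review. Assumption: other
# machines and chromadb versions use the same path under the home directory.
DEFAULT_MODEL_DIR = (
    Path.home() / ".cache" / "chroma" / "onnx_models" / "all-MiniLM-L6-v2"
)


def model_is_cached(model_dir=DEFAULT_MODEL_DIR):
    """Return True if the model folder exists and holds at least one file."""
    model_dir = Path(model_dir)
    return model_dir.is_dir() and any(p.is_file() for p in model_dir.rglob("*"))
```

If `model_is_cached()` returns False, running any query once on an open connection should populate the cache ahead of time.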

What I Did Not Test



  • Client-server mode (separate chroma server process)

  • Chroma Cloud (hosted service — requires account)

  • JavaScript/TypeScript client

  • Collections with 10k+ documents (performance at scale)

  • Docker deployment


This review covers the Python SDK in local mode — the most common starting point for developers building RAG prototypes.

Verdict: Solid



Install works. The 4-function claim is real. Semantic search returns correct results without any embedding configuration. Persistence works out of the box.

The silent model download is worth knowing before you demo in a restricted environment. Everything else works exactly as documented.

26k stars. Downloaded 11M times a month per the project website. The attention is deserved — this is genuinely the fastest path to local semantic search in Python.

Worth installing if you are building anything that needs to search documents by meaning rather than keywords.
This review follows RepoVerifier Standard v1.0.