SOLID

chroma — Does the "4 Functions" Vector Database Actually Work Locally?

Item: chroma
Rating: 6
Author: Balaji Loganathan

github.com/chroma-core/chroma ★ 26000 stars

Claim tested

An open-source vector database claiming a 4-function API with automatic embedding and fully local operation. Tested in-memory and persistent modes with no API key. Core claim verified. One undocumented behaviour: first run silently downloads a 79.3MB embedding model.

Criteria Scorecard

Criterion	Score
install_works	true
claim_testable	true
readme_accurate	true
creator_notified	false
errors_documented	true
claim_tested_clean_env	true
verdict_matches_evidence	true

Environment

os: macOS
ram: 24GB
machine: MacBook Pro 14-inch M4 Pro
modes_tested: in-memory, persistent, metadata filtering
test_account: fresh macOS user, no prior Python environment
python_version: 3.13.5
chromadb_version: 1.5.9

Full Review

What This Repo Claims

An open-source vector database with a 4-function API: add documents, query, get, delete. Chroma handles tokenization, embedding, and indexing automatically. No API key required. Runs fully locally.

The core promise: semantic search in a few lines of Python, with no external services.

Two modes:

In-memory (ephemeral, for prototyping)

Persistent (survives restarts, production-ready locally)

What I Tested

Environment:

macOS, MacBook Pro 14-inch M4 Pro, 24GB RAM

Python 3.13.5, pip 25.3

chromadb 1.5.9

Fresh macOS user — no prior Python environment

Install:

pip3 install chromadb

No errors. Clean install.
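To compare your own setup with the versions listed in this review, a small check along these lines can help. It is a sketch using only the standard library; the helper name `installed_version` is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the interpreter version and the chromadb version (None if missing).
print("python  :", sys.version.split()[0])
print("chromadb:", installed_version("chromadb"))
```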

Test 1: In-memory client

import chromadb

client = chromadb.Client()
collection = client.create_collection("test-docs")
collection.add(
    documents=[
        "Python is a high-level programming language",
        "JavaScript runs in the browser and on servers",
        "PostgreSQL is a relational database system",
        "Redis is an in-memory data structure store",
        "Docker containers package applications and dependencies"
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"]
)
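# Ask a question whose wording differs from the stored documents
results = collection.query(
    query_texts=["what database stores data in memory?"],
    n_results=2
)
print(results["documents"][0])  # matching documents, best match first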

Results:

Top result: "Redis is an in-memory data structure store" ✅

Second result: "PostgreSQL is a relational database system" ✅

Semantic ranking is correct: Redis ranks above PostgreSQL for the in-memory query, even though neither document contains the query's exact wording.
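The idea behind this kind of ranking can be illustrated with a toy example. The sketch below is not Chroma's implementation: the 3-dimensional vectors are invented by hand (real embeddings from all-MiniLM-L6-v2 have far more dimensions), and cosine similarity is used as one common choice of metric, which may not be the one Chroma applies by default.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors; the axes loosely mean "database", "in-memory", "web".
toy_embeddings = {
    "redis": [0.8, 0.9, 0.0],
    "postgresql": [0.9, 0.1, 0.0],
    "javascript": [0.0, 0.1, 1.0],
}
# Made-up vector standing in for "what database stores data in memory?"
toy_query = [0.7, 0.8, 0.0]

# Rank document names by similarity to the query, most similar first.
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_similarity(toy_query, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

Documents whose vectors point in a direction similar to the query's rank first, regardless of whether they share exact words with it.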

Test 2: Topic switching

Query: "frontend web development"
Top result: "JavaScript runs in the browser and on servers" ✅

Test 3: Persistent client

client = chromadb.PersistentClient(path="./chroma-data")
collection = client.get_or_create_collection("persistent-docs")
collection.add(
    documents=["This document will survive restart"],
    ids=["persist1"]
)

# Simulate restart
client2 = chromadb.PersistentClient(path="./chroma-data")
collection2 = client2.get_or_create_collection("persistent-docs")
print(collection2.count())  # → 1

Documents survive a client restart. On disk, the specified path contains a SQLite file and a UUID-named folder.
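If you want to confirm the on-disk layout yourself, a helper like the following lists what it finds. It is a sketch using only the standard library; the function name `describe_store` is hypothetical, and the `*.sqlite3` file pattern is an assumption about how the SQLite file is named.

```python
import uuid
from pathlib import Path


def describe_store(path):
    """Summarise a persistence directory: SQLite files and UUID-named folders."""
    root = Path(path)
    # Assumption: the SQLite database file uses a .sqlite3 extension.
    sqlite_files = sorted(p.name for p in root.glob("*.sqlite3"))
    uuid_dirs = []
    for child in root.iterdir():
        if not child.is_dir():
            continue
        try:
            uuid.UUID(child.name)  # raises ValueError if the name is not a UUID
        except ValueError:
            continue
        uuid_dirs.append(child.name)
    return {"sqlite_files": sqlite_files, "uuid_dirs": sorted(uuid_dirs)}


# Inspect the directory used in Test 3, if it exists in the working directory.
store = Path("./chroma-data")
if store.is_dir():
    print(describe_store(store))
```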

Test 4: Metadata filtering

collection.add(
    documents=["Python tutorial for beginners",
               "Advanced Python patterns"],
    metadatas=[{"level": "beginner"}, {"level": "advanced"}],
    ids=["meta1", "meta2"]
)
results = collection.query(
    query_texts=["Python"],
    n_results=2,
    where={"level": "beginner"}
)
# → "Python tutorial for beginners" ✅

Metadata filtering works correctly alongside semantic search.

Finding: Silent Model Download on First Run

On first use, Chroma automatically downloads the all-MiniLM-L6-v2 embedding model (79.3MB):

/Users/repoverifiertest/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 
100%|████| 79.3M/79.3M [00:01<00:00, 61.6MiB/s]

The README states "we handle tokenization, embedding, and indexing automatically". That is accurate, but the README does not mention the model download. Developers on slow connections or in restricted environments should be aware of it.

After the first download, subsequent runs use the cached model with no network calls.
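Before a demo on a restricted network, it may be worth checking whether the model is already cached. The sketch below uses only the standard library; the helper name `model_is_cached` is hypothetical, and the cache location is taken from the download log above, so it may differ on other platforms or chromadb versions.

```python
from pathlib import Path

# Assumption: cache directory as shown in the download log in this review.
DEFAULT_CACHE = (
    Path.home() / ".cache" / "chroma" / "onnx_models" / "all-MiniLM-L6-v2"
)


def model_is_cached(cache_dir: Path = DEFAULT_CACHE) -> bool:
    """Return True if the cache directory exists and holds at least one file."""
    return cache_dir.is_dir() and any(p.is_file() for p in cache_dir.rglob("*"))


if model_is_cached():
    print("Embedding model found in cache.")
else:
    print("No cached model found; the first query will likely trigger a download.")
```

This only checks that files are present. It does not verify that the download completed or that the archive is intact.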

What I Did Not Test

Client-server mode (separate chroma server process)

Chroma Cloud (hosted service — requires account)

JavaScript/TypeScript client

Collections with 10k+ documents (performance at scale)

Docker deployment

This review covers the Python SDK in local mode — the most common starting point for developers building RAG prototypes.

Verdict: Solid

Install works. The 4-function claim is real. Semantic search returns correct results without any embedding configuration. Persistence works out of the box.

The silent model download is worth knowing before you demo in a restricted environment. Everything else works exactly as documented.

26k stars. Downloaded 11M times a month per the project website. The attention is deserved — this is genuinely the fastest path to local semantic search in Python.

Worth installing if you are building anything that needs to search documents by meaning rather than keywords.

This review follows RepoVerifier Standard v1.0.