
Web Intelligence Agent Stack

Answer questions about websites and documentation without paid APIs or cloud services. Point it at a static web page and ask questions in plain English. It runs fully locally, with zero cost per query.

BUILD PROMPT

AI-Guided Setup Prompt — Web Intelligence Agent Stack

Paste this entire prompt into any AI tool (Claude, ChatGPT, Cursor, Gemini) to get step-by-step guidance on building a local web intelligence agent with zero API cost.

---

Demo App

vibe-coding-price-compare — a working app built on this stack.

Scrapes Cursor, Windsurf, GitHub Copilot, Codeium, and Tabnine pricing pages and answers questions in plain English using local AI.

  • Repo: https://github.com/devopsballog25-pixel/vibe-coding-price-compare
  • Video: https://youtu.be/z2RCak111y0

Fork it, run it, and use it as a starting point for your own web intelligence agent.
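The demo's core pattern, running one extraction prompt over several pages, can be sketched as a loop around the Step 4 scraper. This is an illustrative sketch, not code from the vibe-coding-price-compare repo: scrape_all is a hypothetical helper, and the scrape function is passed in so it can be swapped for a stub when testing without ollama.

```python
from typing import Any, Callable, Dict


def scrape_all(pages: Dict[str, str], prompt: str,
               scrape: Callable[[str, str], Any]) -> Dict[str, Any]:
    """Run the same extraction prompt against every page, keyed by page name."""
    results: Dict[str, Any] = {}
    for name, url in pages.items():
        try:
            results[name] = scrape(prompt, url)
        except Exception as exc:  # one failing page should not abort the whole batch
            results[name] = {"error": str(exc)}
    return results


# Example wiring (assumes graph_config from scraper.py in Step 4; the URL is a placeholder):
# from scrapegraphai.graphs import SmartScraperGraph
# def scrape(prompt, url):
#     return SmartScraperGraph(prompt=prompt, source=url, config=graph_config).run()
# results = scrape_all({"Tool A": "https://example.com/pricing"},
#                      "List each plan and its monthly price", scrape)
```

Recording errors per page rather than raising keeps one slow or blocked site from hiding the results of the others.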

---

REFERENCE IMPLEMENTATION:

Verified stack components:

  • ScrapeGraphAI: repoverifier.dev/reviews/scrapegraphai-scrapegraph-ai
  • ollama: repoverifier.dev/reviews/ollama-ollama

---

You are a senior developer helping me build a local web intelligence agent using the Web Intelligence Agent Stack verified at repoverifier.dev/solutions/web-intelligence-agent-stack.

I want to build an agent that fetches a web page and extracts structured data from it using natural language prompts. It must be fully local: zero API cost, no cloud services, and no API keys required.

Guide me one step at a time. Wait for my confirmation before moving to the next step. If something fails, help me debug it before continuing.

---

STACK:

  • ollama — runs the LLM locally (llama3.1:8b model)
  • nomic-embed-text — embeddings model for chunking and RAG (via ollama)
  • ScrapeGraphAI — fetches web pages and extracts structured JSON using natural language prompts
  • Playwright — headless browser for fetching web content (the Python package is installed as a ScrapeGraphAI dependency; the Chromium browser must be installed separately)

ScrapeGraphAI and ollama have been independently verified as SOLID at repoverifier.dev. The stack was tested end-to-end on macOS Apple Silicon with Python 3.13.

---

KNOWN GOTCHAS — warn me about these before each relevant step:

1. Model size matters: llama3.2 (2GB) produces unreliable results on real web pages. Use llama3.1:8b (4.9GB) as the minimum recommended model.

2. Two ollama models are required: both llama3.1:8b and nomic-embed-text must be pulled before running ScrapeGraphAI. Missing either one causes a silent failure.

3. JS-rendered pages are unreliable: ScrapeGraphAI works best on static HTML pages such as Wikipedia articles and documentation sites. Avoid React/Next.js single-page apps (SPAs).

4. Playwright browser install: after pip install scrapegraphai, you must run python3 -m playwright install chromium separately. It is not automatic.

5. Telemetry is enabled by default: set SCRAPEGRAPHAI_TELEMETRY_ENABLED=false in your environment to opt out.
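Gotcha 3 can be checked before spending time on a scrape. The sketch below is a rough heuristic, not part of ScrapeGraphAI: it downloads the raw HTML with the standard library (no JavaScript runs) and flags the page if it contains common SPA root markers or very little server-rendered text. The marker list, helper names, and the 500-character threshold are assumptions to tune; very short static pages, such as the example.com test page in Step 5, will also be flagged, so treat a True result as a reason to look closer rather than a verdict.

```python
import re
from urllib.request import Request, urlopen

# Hypothetical markers that often appear in client-side-rendered app shells.
SPA_MARKERS = (
    'id="__next"',       # Next.js root element
    "__NEXT_DATA__",     # Next.js page data payload
    'id="root"></div>',  # empty React root element
    'id="app"></div>',   # empty Vue root element
)


def fetch_raw_html(url, timeout=15.0):
    """Download the page's HTML as served, without running any JavaScript."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def visible_text_length(html):
    """Rough count of text characters left after dropping scripts, styles, and tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return len(" ".join(text.split()))


def looks_js_rendered(html, min_text_chars=500):
    """Heuristic: flag pages with SPA root markers or very little server-rendered text."""
    if any(marker in html for marker in SPA_MARKERS):
        return True
    return visible_text_length(html) < min_text_chars
```

Usage: looks_js_rendered(fetch_raw_html(url)) before passing url to scraper.py.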

---

STEP 1 — Verify prerequisites

Ask me to run these commands and share the output:

python3 --version

ollama --version

ollama list

Required:

  • Python 3.10 or higher
  • ollama installed and running
  • llama3.1:8b in the model list
  • nomic-embed-text in the model list

If ollama is missing: go to https://ollama.com and install it.

If models are missing, tell me to run:

ollama pull llama3.1:8b

ollama pull nomic-embed-text

Do not move to Step 2 until all prerequisites pass.
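The same checks can be scripted. This sketch assumes ollama is listening on its default address (http://localhost:11434, the base_url used in Step 4) and reads the pulled models from ollama's /api/tags endpoint, which reports names such as nomic-embed-text:latest. The function names are hypothetical.

```python
import json
import sys
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # default ollama address, as in scraper.py
REQUIRED_MODELS = ("llama3.1:8b", "nomic-embed-text")


def missing_models(installed, required=REQUIRED_MODELS):
    """Return required models not in `installed`; a bare name also matches its :latest tag."""
    suffix = ":latest"
    bare = {n[: -len(suffix)] if n.endswith(suffix) else n for n in installed}
    names = set(installed) | bare
    return [model for model in required if model not in names]


def installed_models(base_url=OLLAMA_URL, timeout=5.0):
    """Ask the running ollama server which models are pulled (GET /api/tags)."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def main():
    """Print any problems and return a shell-style exit code (0 means all checks pass)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python {sys.version.split()[0]} found; 3.10 or higher required")
    try:
        for model in missing_models(installed_models()):
            problems.append(f"Missing model: run `ollama pull {model}`")
    except OSError as exc:  # URLError, refused connections, and timeouts are all OSErrors
        problems.append(f"Cannot reach ollama at {OLLAMA_URL} ({exc}); is it running?")
    for problem in problems:
        print(problem)
    if not problems:
        print("All prerequisites pass")
    return 1 if problems else 0
```

To run it as a script, save it under a name of your choice and append raise SystemExit(main()); it exits non-zero when anything is missing.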

---

STEP 2 — Project setup

Tell me to run:

mkdir web-intelligence-agent && cd web-intelligence-agent

python3 -m venv .venv

source .venv/bin/activate

Verify that the virtual environment is active (your shell prompt should start with (.venv)).
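If your editor's terminal does not show the prompt prefix, Python can confirm the venv itself: inside a virtual environment, sys.prefix points at .venv while sys.base_prefix still points at the base interpreter. A minimal check, with in_virtualenv as a hypothetical helper name:

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv, i.e. sys.prefix differs from the base interpreter's."""
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv(), "| interpreter:", sys.executable)
```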

---

STEP 3 — Install dependencies

Tell me to run in order:

pip install scrapegraphai

python3 -m playwright install chromium

Warn me: the Chromium browser for Playwright must be installed separately after scrapegraphai. Do not skip it.

Wait for both installs to succeed.
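To confirm the second install actually downloaded a browser, this sketch asks Playwright where it expects its bundled Chromium executable and checks that the file exists. It assumes the playwright package was pulled in by scrapegraphai; chromium_installed is a hypothetical helper name.

```python
import os


def chromium_installed() -> bool:
    """Check whether Playwright's bundled Chromium executable exists on disk."""
    try:
        from playwright.sync_api import sync_playwright
    except ImportError:
        return False  # the playwright package itself is not installed
    with sync_playwright() as p:
        path = p.chromium.executable_path  # where Playwright expects the browser binary
    return bool(path) and os.path.exists(path)


print("Chromium ready:", chromium_installed())
```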

---

STEP 4 — Create scraper.py

Give me this exact code:

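import json
import os

# Opt out of ScrapeGraphAI telemetry (gotcha 5); set before scrapegraphai is imported.
# setdefault keeps any value you have already exported in your shell.
os.environ.setdefault("SCRAPEGRAPHAI_TELEMETRY_ENABLED", "false")

from scrapegraphai.graphs import SmartScraperGraph

# Both models are served by the local ollama instance on its default port.
graph_config = {
    "llm": {
        "model": "ollama/llama3.1:8b",  # minimum recommended model (gotcha 1)
        "model_tokens": 8192,  # context size ScrapeGraphAI assumes for the model
        "format": "json",  # ask ollama to return JSON output
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",  # must also be pulled (gotcha 2)
        "base_url": "http://localhost:11434",
    },
    "headless": True,  # run Playwright's Chromium without a visible window
}

url = input("Enter URL to scrape: ").strip()
prompt = input("What do you want to extract? ").strip()

# Fetch the page, then let the local LLM extract what the prompt asks for.
result = SmartScraperGraph(
    prompt=prompt,
    source=url,
    config=graph_config,
).run()

print("\n--- RESULT ---")
print(json.dumps(result, indent=4))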

Tell me to run:

python3 scraper.py

Test with:

URL: https://en.wikipedia.org/wiki/Python_(programming_language)

Prompt: Extract the main title, summary, and list all section headings

Expected: structured JSON with title, summary, and section headings.

---

STEP 5 — Verify it works

Tell me to test with a second URL:

URL: https://example.com

Prompt: Extract the domain name, main heading, and description

Expected output (the model may phrase keys or values slightly differently between runs):

{
  "domain_name": "example.com",
  "main_heading": "Example Domain",
  "description": "This domain is for use in documentation examples without needing permission."
}

If both tests pass, the stack is working correctly.
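Because the answers come from an LLM, key names and nesting can differ between runs even when the content is right. A looser check, sketched here with hypothetical helper names, searches every string value in the result for the expected text instead of comparing keys exactly:

```python
def iter_strings(value):
    """Yield every string found anywhere in a nested JSON-like structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def contains_all(result, expected_snippets):
    """True if each expected snippet appears, case-insensitively, in some string value."""
    haystack = [s.lower() for s in iter_strings(result)]
    return all(any(snip.lower() in s for s in haystack) for snip in expected_snippets)
```

For the example.com test, contains_all(result, ["example.com", "Example Domain"]) should be True whatever key names the model chose.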

Architecture

Diagram: ScrapeGraphAI (★ 23.0k), ollama, and nomic-embed-text (dev tools) run on the local machine (infrastructure) and support your SaaS app.

Dev Tools

  • ScrapeGraphAI — Fetches web pages and extracts structured JSON using natural language prompts — no CSS selectors needed
  • ollama — Runs llama3.1:8b locally — minimum recommended model for reliable extraction
  • nomic-embed-text — Embeddings model required by ScrapeGraphAI for chunking and semantic search — pull via ollama

Infrastructure

  • Local machine — Local infrastructure

Stack Components

Tool / Service | Type | Role | Verdict
ScrapeGraphAI (★ 23.0k) | Dev Tool | Fetches web pages and extracts structured JSON using natural language prompts; no CSS selectors needed | SOLID
ollama | Dev Tool | Runs llama3.1:8b locally; minimum recommended model for reliable extraction | SOLID
nomic-embed-text | Dev Tool | Embeddings model required by ScrapeGraphAI for chunking and semantic search; pull via ollama | (none listed)
Local machine | Service | Local infrastructure | (none listed)

Proof it works

Built with this exact stack and tested on macOS Apple Silicon: vibe-coding-price-compare, a local app that scrapes the Cursor, Windsurf, GitHub Copilot, Codeium, and Tabnine pricing pages and answers questions about them in plain English. All five pricing pages were scraped successfully with zero API keys, running entirely on ollama/llama3.1:8b locally.