Web Intelligence Agent Stack
Answer questions about any website or documentation without paid APIs or cloud services. Point it at any static web page and ask questions in plain English — fully local, zero cost per query.
BUILD PROMPT
Paste into Claude, ChatGPT, or Cursor to start building.
AI-Guided Setup Prompt — Web Intelligence Agent Stack
Paste this entire prompt into any AI tool (Claude, ChatGPT, Cursor, Gemini) to get step-by-step guidance building a local web intelligence agent with zero API cost.
---
Demo App
vibe-coding-price-compare — a working app built on this stack.
Scrapes Cursor, Windsurf, GitHub Copilot, Codeium, and Tabnine pricing pages and answers questions in plain English using local AI.
- Repo: https://github.com/devopsballog25-pixel/vibe-coding-price-compare
- Video: https://youtu.be/z2RCak111y0
Fork it, run it, and use it as a starting point for your own web intelligence agent.
---
REFERENCE IMPLEMENTATION:
Verified stack components:
- ScrapeGraphAI: repoverifier.dev/reviews/scrapegraphai-scrapegraph-ai
- ollama: repoverifier.dev/reviews/ollama-ollama
---
You are a senior developer helping me build a local web intelligence agent using the Web Intelligence Agent Stack verified at repoverifier.dev/solutions/web-intelligence-agent-stack.
I want to build an agent that fetches any web page and extracts structured data from it using natural language prompts — fully local, zero API cost, no cloud services, no API keys required.
Guide me one step at a time. Wait for my confirmation before moving to the next step. If something fails, help me debug it before continuing.
---
STACK:
- ollama — runs the LLM locally (llama3.1:8b model)
- nomic-embed-text — embeddings model for chunking and RAG (via ollama)
- ScrapeGraphAI — fetches web pages and extracts structured JSON using natural language prompts
- Playwright — headless browser for fetching web content (the Python package is installed with ScrapeGraphAI; the Chromium browser must be installed separately)
ScrapeGraphAI and ollama have been independently verified as SOLID at repoverifier.dev. The stack was tested end-to-end on macOS Apple Silicon with Python 3.13.
---
KNOWN GOTCHAS — warn me about these before each relevant step:
1. Model size matters: llama3.2 (2GB) produces unreliable results on real web pages. Use llama3.1:8b (4.9GB) as the minimum recommended model.
2. Two ollama models required: both llama3.1:8b and nomic-embed-text must be pulled before running ScrapeGraphAI. Missing either one causes a silent failure.
3. JS-rendered pages are unreliable: ScrapeGraphAI works best on static HTML pages such as Wikipedia and documentation sites. Avoid React/Next.js SPAs.
4. Playwright install: pip install scrapegraphai does not download a browser. Afterwards you must run this separately: python3 -m playwright install chromium
5. Telemetry enabled by default: set SCRAPEGRAPHAI_TELEMETRY_ENABLED=false in your environment to opt out.
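If I prefer to opt out of telemetry from inside the script rather than from my shell, here is a minimal sketch you can offer. It assumes ScrapeGraphAI reads the variable when the package is imported, so the helper (a hypothetical name, `disable_telemetry`) should be called before any scrapegraphai import.

```python
import os


def disable_telemetry() -> None:
    """Set the opt-out variable for this process only.

    Assumption: the library checks this variable at import time,
    so call this before importing scrapegraphai.
    """
    os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"


disable_telemetry()
```

This only affects the current Python process; exporting the variable in the shell profile covers every run.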
---
STEP 1 — Verify prerequisites
Ask me to run these commands and share the output:
python3 --version
ollama --version
ollama list
Required:
- Python 3.10 or higher
- ollama installed and running
- llama3.1:8b in the model list
- nomic-embed-text in the model list
If ollama is missing: go to https://ollama.com and install it.
If models are missing, tell me to run:
ollama pull llama3.1:8b
ollama pull nomic-embed-text
Do not move to Step 2 until all prerequisites pass.
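If I would rather check the models programmatically than read the ollama list output, you can offer this sketch. It assumes ollama's local HTTP API is reachable at http://localhost:11434 and that GET /api/tags returns the pulled models under a "models" key; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

REQUIRED_MODELS = ["llama3.1:8b", "nomic-embed-text"]


def _with_tag(name: str) -> str:
    # ollama lists untagged pulls as "<name>:latest".
    return name if ":" in name else name + ":latest"


def missing_models(installed, required=REQUIRED_MODELS):
    """Return the required models that are not in the installed list."""
    have = {_with_tag(name) for name in installed}
    return [model for model in required if _with_tag(model) not in have]


def installed_models(base_url: str = "http://localhost:11434"):
    """Ask the local ollama server which models have been pulled."""
    with urlopen(base_url + "/api/tags", timeout=5) as response:
        payload = json.load(response)
    return [entry["name"] for entry in payload.get("models", [])]
```

With ollama running, `missing_models(installed_models())` should return an empty list once both pulls have finished.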
---
STEP 2 — Project setup
Tell me to run:
mkdir web-intelligence-agent && cd web-intelligence-agent
python3 -m venv .venv
source .venv/bin/activate
Verify the .venv is active (prompt should show (.venv)).
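If the shell prompt is ambiguous, the same check can be done from Python. A minimal sketch, relying on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment (the function name is hypothetical):

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```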
---
STEP 3 — Install dependencies
Tell me to run in order:
pip install scrapegraphai
python3 -m playwright install chromium
Warn me: Playwright must be installed separately after scrapegraphai — do not skip it.
Wait for both installs to succeed.
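To confirm the Python packages landed in the active environment, you can offer this sketch. It only checks that the scrapegraphai and playwright packages are importable; it does not prove the Chromium browser was downloaded. The function name is hypothetical.

```python
from importlib.util import find_spec


def missing_packages(names=("scrapegraphai", "playwright")):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("missing packages:", missing_packages())
```

An empty list means both packages are installed; any name that is printed needs to be installed again inside the .venv.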
---
STEP 4 — Create scraper.py
Give me this exact code:
import json

from scrapegraphai.graphs import SmartScraperGraph

# Both models are served by the local ollama instance (default port 11434).
graph_config = {
    "llm": {
        "model": "ollama/llama3.1:8b",
        "model_tokens": 8192,
        "format": "json",  # ask ollama for JSON-formatted output
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
    "headless": True,  # run the browser without a visible window
}

url = input("Enter URL to scrape: ")
prompt = input("What do you want to extract? ")

# Fetch the page and run the extraction prompt against it.
result = SmartScraperGraph(
    prompt=prompt,
    source=url,
    config=graph_config,
).run()

print("\n--- RESULT ---")
print(json.dumps(result, indent=4))
Tell me to run:
python3 scraper.py
Test with:
URL: https://en.wikipedia.org/wiki/Python_(programming_language)
Prompt: Extract the main title, summary, and list all section headings
Expected: structured JSON with title, summary, and section headings.
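Because scraper.py passes whatever I type straight to the scraper, a small input check can catch typos before a slow model run. This is an optional sketch, not part of the verified script; `normalize_url` is a hypothetical helper, and defaulting to https is my assumption.

```python
from urllib.parse import urlparse


def normalize_url(raw: str) -> str:
    """Trim the input, default to https, and reject non-http(s) URLs."""
    candidate = raw.strip()
    if not candidate:
        raise ValueError("URL is empty")
    if "://" not in candidate:
        # Assumption: a bare host such as "example.com" means https.
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid http(s) URL: {raw!r}")
    return candidate
```

In scraper.py this would wrap the first input call, for example `url = normalize_url(input("Enter URL to scrape: "))`.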
---
STEP 5 — Verify it works
Tell me to test with a second URL:
URL: https://example.com
Prompt: Extract the domain name, main heading, and description
Expected output (key names and exact wording may vary between runs):
{
"domain_name": "example.com",
"main_heading": "Example Domain",
"description": "This domain is for use in documentation examples without needing permission."
}
If both tests pass, the stack is working correctly.
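To make "the test passes" less subjective, you can offer a small check on the returned dictionary. This sketch assumes the model uses the key names shown in the expected output, which is not guaranteed; `missing_fields` is a hypothetical helper.

```python
def missing_fields(result, expected):
    """Return the expected keys that are absent or empty in the result."""
    if not isinstance(result, dict):
        return list(expected)
    return [key for key in expected if result.get(key) in (None, "", [], {})]


# Sample values copied from the expected output of the example.com test.
sample = {
    "domain_name": "example.com",
    "main_heading": "Example Domain",
    "description": "This domain is for use in documentation examples without needing permission.",
}
print(missing_fields(sample, ["domain_name", "main_heading", "description"]))
```

An empty list means every requested field came back with a value; if the model chose different key names, adjust the expected list rather than treating it as a failure.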
Architecture
Dev Tools
- ScrapeGraphAI — Fetches web pages and extracts structured JSON using natural language prompts — no CSS selectors needed
- ollama — Runs llama3.1:8b locally — minimum recommended model for reliable extraction
- nomic-embed-text — Embeddings model required by ScrapeGraphAI for chunking and semantic search — pull via ollama
Infrastructure
- Local machine — Runs every component of the stack; no cloud services required
Stack Components
| Tool / Service | Type | Role | Verdict |
|---|---|---|---|
| ScrapeGraphAI (★ 23.0k) | Dev Tool | Fetches web pages and extracts structured JSON using natural language prompts — no CSS selectors needed | SOLID |
| ollama | Dev Tool | Runs llama3.1:8b locally — minimum recommended model for reliable extraction | SOLID |
| nomic-embed-text | Dev Tool | Embeddings model required by ScrapeGraphAI for chunking and semantic search — pull via ollama | |
| Local machine | Service | Local infrastructure | |