Don't want to self-host? Try the hosted version: its free tier includes Neo4j, GraphRAG, and Llama 3.


Developer Documentation

From Docker to First Query in Minutes

Install, ingest, query. Everything you need to self-host Graffold.

graffold
v0.2.0  Turn anything into actionable knowledge

API        http://localhost:8000 (granian)
Memgraph   bolt://localhost:7687 (memgraph@mydb)
Redis      localhost:6379 (sessions + cache)
Auth       enabled (API_AUTH_TOKEN)
LLM        bedrock (default)

Usage: graffold <resource> <command> [flags]

Resources:
  ingest        PubMed, bioRxiv, PMC, PDF ingestion
  enrich        CSV/Excel enrichment pipeline
  pipeline      Full automated KG creation
  query         Natural language graph queries
  consolidate   Entity & relationship merging
  embeddings    Vector embedding generation
  serve         Start the API server
  health        Check API / Memgraph / Redis status

Documentation:
  Quick Start   docs/KNOWLEDGE_GRAPH_CREATION_GUIDE.md
  API Docs      http://localhost:8000/docs
  Frontend      http://localhost:5173
  GitHub        github.com/graffold/graffold-api

Press Ctrl+C to stop. Run with --help for all options.

1. Setup

Option A: Docker (recommended)

# Start the full stack
docker compose up -d

# What's running:
#   - API server on http://localhost:8000
#   - Memgraph on bolt://localhost:7687
#   - Redis for session caching

# Verify
curl http://localhost:8000/health/ready
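If you script the Docker setup (in CI, for example), a short readiness loop keeps later steps from racing the containers while they start. This is a minimal sketch that uses only the standard library. It assumes `/health/ready` returns HTTP 200 once the database and LLM checks pass, and a non-2xx status otherwise. The helper names `http_status` and `wait_until_ready` are hypothetical and are not part of the graffold package.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while a dependency is still starting
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, or timeout


def wait_until_ready(
    url: str = "http://localhost:8000/health/ready",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll the readiness endpoint until it returns 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        # Give up if another wait would overshoot the deadline.
        if time.monotonic() + interval_s > deadline:
            return False
        time.sleep(interval_s)
```

Call `wait_until_ready()` right after `docker compose up -d`. If the stack is still down after 60 seconds, it returns `False` instead of raising. The `probe` parameter lets you swap in a different check, such as `/health/live`.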

Option B: Python package

pip install graffold

# Or with uv
uv pip install graffold

Configure .env

# .env
MEMGRAPH_URI=bolt://localhost:7687
MEMGRAPH_USER=memgraph
MEMGRAPH_PASSWORD=your_password

# LLM provider (pick one)
OLLAMA_API_URL=http://localhost:11434/api/generate    # local
# BEDROCK_MODEL_ID=meta.llama3-8b-instruct-v1:0      # cloud
# CF_API_TOKEN=your_token                              # cloudflare

# Auth
API_AUTH_TOKEN=your_secret_token
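A typo in `.env` usually only shows up when the first query fails. A pre-flight check can catch it earlier. The sketch below is hypothetical and is not part of Graffold. It parses plain `KEY=value` lines and ignores quoting and escapes. It verifies that the Memgraph and auth keys listed above are set. It also reads the "pick one" comment as "exactly one LLM provider variable"; that rule is an assumption, and the server itself may be more lenient.

```python
import re
from pathlib import Path

REQUIRED_KEYS = ("MEMGRAPH_URI", "MEMGRAPH_USER", "MEMGRAPH_PASSWORD", "API_AUTH_TOKEN")
PROVIDER_KEYS = ("OLLAMA_API_URL", "BEDROCK_MODEL_ID", "CF_API_TOKEN")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comment lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment preceded by whitespace, e.g. ".../api/generate    # local"
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        values[key.strip()] = value
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file looks OK."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    providers = [key for key in PROVIDER_KEYS if values.get(key)]
    if len(providers) != 1:
        problems.append(f"expected exactly one LLM provider, found {providers or 'none'}")
    return problems


def check_env_file(path: str = ".env") -> list[str]:
    """Convenience wrapper: read and check an .env file on disk."""
    return check_env(parse_env(Path(path).read_text(encoding="utf-8")))
```

Run `check_env_file()` before `docker compose up -d` or `graffold serve`. It prints nothing on its own, so print or assert on the returned list.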

2. Ingest Data

Feed documents into the knowledge graph. The pipeline handles chunking, entity extraction, consolidation, and embedding generation.

CLI

# Ingest PDFs
graffold ingest --source pdf --files report.pdf contract.pdf --database mydb

# Ingest from any API source
graffold ingest --source api --endpoint https://your-api.com/docs --database mydb

# Ingest CSVs with column mapping
graffold ingest --source csv --files data.csv --database mydb \
  --column-handlers "0:entity-id,1:entity-name,2:properties"

# Bulk parallel ingestion
graffold ingest --source pdf --files docs/*.pdf --database mydb --parallel
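In the `--column-handlers` value, each comma-separated entry is `column_index:role`, so `0:entity-id` maps column 0 to the entity ID. Before a long CSV run, you can preview how the first few rows will be labelled. This is a hypothetical sketch, not the CLI's own parser. It only splits the mapping string and applies it to rows. How Graffold interprets each role, for example the expected format of the `properties` column, is not covered here.

```python
import csv
import io


def parse_column_handlers(spec: str) -> dict[int, str]:
    """Parse "0:entity-id,1:entity-name,2:properties" into {0: "entity-id", ...}."""
    mapping: dict[int, str] = {}
    for item in spec.split(","):
        index, sep, role = item.strip().partition(":")
        if not sep or not index.isdigit() or not role:
            raise ValueError(f"bad handler entry: {item!r}")
        mapping[int(index)] = role
    return mapping


def map_row(row: list[str], mapping: dict[int, str]) -> dict[str, str]:
    """Label each mapped column of one CSV row with its role."""
    return {role: row[i] for i, role in mapping.items() if i < len(row)}


def preview_csv(text: str, spec: str, limit: int = 3) -> list[dict[str, str]]:
    """Apply the mapping to the first `limit` rows of CSV text."""
    mapping = parse_column_handlers(spec)
    reader = csv.reader(io.StringIO(text))
    return [map_row(row, mapping) for _, row in zip(range(limit), reader)]
```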

REST API

curl -X POST "http://localhost:8000/v1/ingestion/jobs" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -F "source=pdf" \
  -F "database=mydb" \
  -F "files=@report.pdf" \
  -F "files=@contract.pdf"

# Response:
# {"job_id": "ing_abc123", "status": "processing", "files": 2}

Python

import pandas as pd
from graffold import Client

client = Client(base_url="http://localhost:8000", token="your_token")

# Ingest PDFs
job = client.ingest(
    source="pdf",
    files=["report.pdf", "contract.pdf"],
    database="mydb",
)
print(f"Job {job.id}: {job.status}")

# Ingest from a pandas DataFrame
df = pd.read_csv("data.csv")
job = client.ingest(source="dataframe", data=df, database="mydb")

The API accepts up to 10 MB per file and 50 files per batch; the CLI has no such limits. Supported sources are PDF (vision + OCR), CSV, Excel, Parquet, and any REST API.
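To push a large folder through the REST API rather than the CLI, split the file list to fit those limits first. The sketch below is not a Graffold utility. It treats "10 MB" as 10,000,000 bytes, which is stricter than 10 MiB. It does not assume any per-batch total size limit, because none is stated. Files over the per-file limit come back in a separate list, so you can send them through the CLI instead.

```python
import os
from typing import Callable, Iterable

MAX_FILE_BYTES = 10_000_000   # "10MB"; decimal MB assumed (stricter than 10 MiB)
MAX_FILES_PER_BATCH = 50


def plan_batches(
    paths: Iterable[str],
    size_of: Callable[[str], int] = os.path.getsize,
) -> tuple[list[list[str]], list[str]]:
    """Split paths into API-sized batches; return (batches, files too large for the API)."""
    batches: list[list[str]] = []
    current: list[str] = []
    oversized: list[str] = []
    for path in paths:
        if size_of(path) > MAX_FILE_BYTES:
            oversized.append(path)  # route these through the CLI, which has no limit
            continue
        current.append(path)
        if len(current) == MAX_FILES_PER_BATCH:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches, oversized
```

Each batch can then become one `POST /v1/ingestion/jobs` call, or one `client.ingest(source="pdf", files=batch, database="mydb")`.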

3. Query

Create a session

# Create a session
curl -X POST "http://localhost:8000/v1/sessions" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "llm_service": "ollama",
    "database_name": "mydb",
    "agent_type": "graph_rag_agent",
    "connection_details": {
      "uri": "bolt://localhost:7687",
      "user": "memgraph",
      "password": "your_password"
    }
  }'

# Response:
# {"session_id": "sess_abc123", "created_at": "2026-04-09T09:00:00"}

Execute a query

# Basic query
curl -X POST "http://localhost:8000/v1/sessions/sess_abc123/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What connects entity A to entity B?",
    "mode": "hybrid",
    "search_depth": "balanced"
  }'
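Without the Python client, the session and query calls above map directly onto standard-library HTTP. This is a minimal sketch. `build_request` and `post_json` are hypothetical helper names, and the sketch assumes every endpoint returns a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict, token: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated JSON POST equivalent to the curl examples."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def post_json(path: str, payload: dict, token: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload, token, base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_json("/v1/sessions", {...}, token)` returns the `session_id`. You then pass that ID in the path of `post_json(f"/v1/sessions/{session_id}/query", {"question": "...", "mode": "hybrid", "search_depth": "balanced"}, token)`.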

Streaming (SSE)

# Streaming query (SSE)
curl -N "http://localhost:8000/v1/sessions/sess_abc123/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "Overview of risk factors", "stream": true}'

# Events:
# event: retrieval_started
# event: retrieval_complete  → {"source_count": 12, "retrieval_time_ms": 340}
# event: token               → {"text": "Several", "index": 0}
# event: token               → {"text": " risk", "index": 1}
# ...
# event: complete            → {"answer": "...", "sources": [...], "cost": {...}}
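The listing above is a condensed view. On the wire, an SSE stream sends each event as an `event:` line, optional `data:` lines, and a blank line that dispatches it. The sketch below assumes each payload arrives as one JSON `data:` line. It parses that framing and reassembles the answer from `token` events. A browser `EventSource` drops events that have no data. This parser, by contrast, also yields named events without a payload, because `retrieval_started` appears to carry none (an assumption).

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: dispatch the buffered event
            if data or event != "message":
                payload = "\n".join(data)
                yield event, (json.loads(payload) if payload else {})
            event, data = "message", []
            continue
        if line.startswith(":"):  # SSE comment / keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def stream_answer(lines: Iterable[str]) -> str:
    """Concatenate token texts; prefer the final answer if a complete event arrives."""
    parts: list[str] = []
    for name, payload in parse_sse(lines):
        if name == "token":
            parts.append(payload["text"])
        elif name == "complete":
            return payload.get("answer", "".join(parts))
    return "".join(parts)
```

With `urllib`, wrap the response in `io.TextIOWrapper(resp, encoding="utf-8")` and pass it to `stream_answer`. The wrapper iterates line by line as the data arrives.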

Python client (full example)

from graffold import Client

client = Client(base_url="http://localhost:8000", token="your_token")

# Create session
session = client.create_session(
    llm_service="ollama",
    database="mydb",
    connection={"uri": "bolt://localhost:7687", "user": "memgraph", "password": "..."}
)

# Query
result = session.query("What connects entity A to entity B?", mode="hybrid")
print(result.answer)
print(result.sources)  # [{doc_id, excerpt, confidence}, ...]
print(result.cost)     # {prompt_tokens, completion_tokens, estimated_usd}

# Follow-up (uses session context)
result2 = session.query("Which of those have active certifications?")

# KNN expansion
expanded = session.expand_knn(expansion_level=1)
print(expanded.entities)  # newly discovered neighbors

# Streaming
for event in session.query("Summarize all findings", stream=True):
    if event.type == "token":
        print(event.text, end="", flush=True)
print()  # end the streamed line
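Each `result.cost` reports token counts and a cost estimate for one query. To track spend across a conversation on the client side, you can sum the per-query figures. This small sketch assumes `cost` behaves like a mapping with the three keys shown in the comment; if the client returns an object instead, read the fields with `getattr`. For a server-side view, the endpoint list includes `GET /v1/costs`.

```python
from collections.abc import Iterable, Mapping

COST_FIELDS = ("prompt_tokens", "completion_tokens", "estimated_usd")


def total_cost(costs: Iterable[Mapping[str, float]]) -> dict[str, float]:
    """Sum per-query cost dicts into one session total; missing fields count as 0."""
    totals: dict[str, float] = {field: 0 for field in COST_FIELDS}
    for cost in costs:
        for field in COST_FIELDS:
            totals[field] += cost.get(field, 0)
    return totals
```

For the session above, that would be `total_cost([result.cost, result2.cost])`.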

Query Modes

naive: direct LLM answer, no graph retrieval
local: entity neighborhood search
global: community summary-based retrieval
hybrid: merges local and global results with deduplication
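Hybrid mode's "merge with deduplication" step can be pictured as follows. This is a generic sketch, not Graffold's implementation. It assumes each hit carries a `doc_id` and that local and global scores are directly comparable. It keeps the best-scoring copy of each document and re-ranks the union. The `Hit` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    excerpt: str
    score: float


def merge_hybrid(local: list[Hit], global_: list[Hit], limit: int = 10) -> list[Hit]:
    """Union local and global hits, keeping the best-scoring copy of each doc_id."""
    best: dict[str, Hit] = {}
    for hit in [*local, *global_]:
        kept = best.get(hit.doc_id)
        if kept is None or hit.score > kept.score:
            best[hit.doc_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:limit]
```

If the two retrievers score on different scales, comparing raw scores favours one side. A rank-based fusion such as reciprocal rank fusion avoids that.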

Search Depth

fast: discovery only, no expansion
balanced: one expansion round; runs the Cypher fallback if fewer than 3 targets are found
deep: up to 3 expansion rounds; always runs the Cypher fallback
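The three depths differ in two settings: how many expansion rounds run, and when the Cypher fallback fires. Writing them as a table makes the rules easy to check. This sketch encodes the rules as stated above. The "never" fallback for `fast` is an assumption, since the docs only say it does no expansion. `DepthPolicy` and `runs_cypher_fallback` are hypothetical names, not Graffold internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthPolicy:
    max_rounds: int  # expansion rounds after initial discovery
    fallback: str    # "never", "if_few_targets", or "always"


POLICIES = {
    "fast": DepthPolicy(max_rounds=0, fallback="never"),  # fallback behaviour assumed
    "balanced": DepthPolicy(max_rounds=1, fallback="if_few_targets"),
    "deep": DepthPolicy(max_rounds=3, fallback="always"),
}
MIN_TARGETS = 3  # balanced falls back when fewer targets than this are found


def runs_cypher_fallback(depth: str, targets_found: int) -> bool:
    """Decide whether the Cypher fallback runs for a given depth and target count."""
    policy = POLICIES[depth]
    if policy.fallback == "always":
        return True
    if policy.fallback == "if_few_targets":
        return targets_found < MIN_TARGETS
    return False
```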

4. Expand & Explore

Discover connected entities beyond the initial answer with KNN expansion.

# Expand results with k-nearest neighbors
curl -X POST "http://localhost:8000/v1/sessions/sess_abc123/expand-knn" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"expansion_level": 1}'
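The exact semantics of the endpoint are not specified here, including how `expansion_level` translates into k or into hops. If "KNN" means nearest neighbours in embedding space (the `embeddings` CLI resource generates vectors), the core idea is brute-force cosine similarity. The sketch below illustrates that general technique, not Graffold's implementation. It skips entities that are already in the answer, so it returns only newly discovered neighbours.

```python
import math
from collections.abc import Collection


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query: list[float], embeddings: dict[str, list[float]], k: int = 3,
        exclude: Collection[str] = ()) -> list[tuple[str, float]]:
    """Return the k entities most similar to query, skipping ids already in the answer."""
    scored = [(eid, cosine(query, vec)) for eid, vec in embeddings.items() if eid not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan is linear in the number of entities. At graph scale, a vector index usually replaces it.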

All Endpoints

GET    /health                               # Version, uptime (no auth)
GET    /health/ready                         # DB + LLM connectivity check (no auth)
GET    /health/live                          # Process alive (no auth)

POST   /v1/sessions                          # Create session
GET    /v1/sessions/{session_id}             # Get session details
DELETE /v1/sessions/{session_id}             # Delete session

POST   /v1/sessions/{session_id}/query       # Execute query
POST   /v1/sessions/{session_id}/expand-knn  # KNN expansion
GET    /v1/sessions/{session_id}/stats       # Database statistics

POST   /v1/ingestion/jobs                    # Start ingestion job

GET    /v1/costs                             # LLM cost summary
GET    /v1/metrics                           # Performance metrics
GET    /v1/metrics/latency                   # Query latency percentiles
GET    /v1/metrics/multi-hop                 # Multi-hop success rate
GET    /v1/metrics/community                 # Community usage stats