
Developer Documentation

From Docker to First Query in Minutes

Install, ingest, query. Everything you need to self-host Graffold.

graffold

 ██████  ██████   █████  ███████ ███████  ██████  ██      ██████
██       ██   ██ ██   ██ ██      ██      ██    ██ ██      ██   ██
██   ███ ██████  ███████ █████   █████   ██    ██ ██      ██   ██
██    ██ ██   ██ ██   ██ ██      ██      ██    ██ ██      ██   ██
 ██████  ██   ██ ██   ██ ██      ██       ██████  ███████ ██████
  v0.2.0  Turn anything into actionable knowledge
  API           http://localhost:8000  (granian)
  Memgraph      bolt://localhost:7687  (memgraph@mydb)
  Redis         localhost:6379  (sessions + cache)
  Auth          enabled  (API_AUTH_TOKEN)
  LLM           bedrock  (default)
  Usage:  graffold <resource> <command> [flags]
  Resources:
    ingest        PubMed, bioRxiv, PMC, PDF ingestion
    enrich        CSV/Excel enrichment pipeline
    pipeline      Full automated KG creation
    query         Natural language graph queries
    consolidate   Entity & relationship merging
    embeddings    Vector embedding generation
    serve         Start the API server
    health        Check API / Memgraph / Redis status
  Documentation:
    Quick Start   docs/KNOWLEDGE_GRAPH_CREATION_GUIDE.md
    API Docs      http://localhost:8000/docs
    Frontend      http://localhost:5173
    GitHub        github.com/graffold/graffold-api
  Press Ctrl+C to stop
  Run with --help for all options

Setup

Option A: Docker (recommended)

# Start the full stack
docker compose up -d

# What's running:
#   - API server on http://localhost:8000
#   - Memgraph on bolt://localhost:7687
#   - Redis for session caching

# Verify
curl http://localhost:8000/health/ready
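
The containers can take a few seconds to come up, so a single curl may fail right after `docker compose up -d`. A minimal Python sketch of a retry loop follows. It assumes `/health/ready` answers HTTP 200 once the stack is ready (the status code is not documented here), and the names `http_probe` and `wait_until_ready` are illustrative, not part of Graffold.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str = "http://localhost:8000/health/ready") -> bool:
    """Return True when the readiness endpoint answers with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe: Callable[[], bool] = http_probe,
                     attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` with no arguments polls the local API for up to roughly 30 seconds. The probe is injectable so the loop can be exercised without a running server.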

Option B: Python package

pip install graffold

# Or with uv
uv pip install graffold

Configure .env

# .env
MEMGRAPH_URI=bolt://localhost:7687
MEMGRAPH_USER=memgraph
MEMGRAPH_PASSWORD=your_password

# LLM provider (pick one)
OLLAMA_API_URL=http://localhost:11434/api/generate    # local
# BEDROCK_MODEL_ID=meta.llama3-8b-instruct-v1:0      # cloud
# CF_API_TOKEN=your_token                              # cloudflare

# Auth
API_AUTH_TOKEN=your_secret_token
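
Since only one LLM provider should be active, it can help to check the file before starting the server. The sketch below parses `.env` text and reports which provider is configured. The provider labels and the "exactly one" rule are this example's reading of "pick one", not documented Graffold behavior, and the helper names are hypothetical.

```python
# Hypothetical mapping from .env key to a provider label.
PROVIDER_KEYS = {
    "OLLAMA_API_URL": "ollama",
    "BEDROCK_MODEL_ID": "bedrock",
    "CF_API_TOKEN": "cloudflare",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comment lines, and trailing comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing "  # comment" such as the "# local" marker.
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def active_provider(env: dict[str, str]) -> str:
    """Return the single configured provider, or raise if zero or several are set."""
    chosen = [name for key, name in PROVIDER_KEYS.items() if env.get(key)]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one LLM provider, found {chosen}")
    return chosen[0]
```

With the sample file above, the commented-out Bedrock and Cloudflare lines are ignored and only the Ollama entry counts as configured.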

Ingest Data

Feed documents into the knowledge graph. The pipeline handles chunking, entity extraction, consolidation, and embedding generation.

CLI

# Ingest PDFs
graffold ingest --source pdf --files report.pdf contract.pdf --database mydb

# Ingest from any API source
graffold ingest --source api --endpoint https://your-api.com/docs --database mydb

# Ingest CSVs with column mapping
graffold ingest --source csv --files data.csv --database mydb \
  --column-handlers "0:entity-id,1:entity-name,2:properties"

# Bulk parallel ingestion
graffold ingest --source pdf --files docs/*.pdf --database mydb --parallel
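
The `--column-handlers` value in the CSV example reads as a comma-separated list of `index:handler` pairs. The sketch below shows one way to interpret such a string. The format is inferred from that single example, and `parse_column_handlers` is a hypothetical helper, not Graffold's own parser.

```python
def parse_column_handlers(spec: str) -> dict[int, str]:
    """Turn "0:entity-id,1:entity-name" into {0: "entity-id", 1: "entity-name"}."""
    mapping = {}
    for item in spec.split(","):
        index, sep, handler = item.strip().partition(":")
        if not sep or not index.isdigit() or not handler:
            raise ValueError(f"malformed column handler: {item!r}")
        mapping[int(index)] = handler
    return mapping
```

Under this reading, column 0 of `data.csv` supplies the entity ID, column 1 the entity name, and column 2 the properties.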

REST API

curl -X POST "http://localhost:8000/v1/ingestion/jobs" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -F "source=pdf" \
  -F "database=mydb" \
  -F "files=@report.pdf" \
  -F "files=@contract.pdf"

# Response:
# {"job_id": "ing_abc123", "status": "processing", "files": 2}

Python

from graffold import Client

client = Client(base_url="http://localhost:8000", token="your_token")

# Ingest PDFs
job = client.ingest(
    source="pdf",
    files=["report.pdf", "contract.pdf"],
    database="mydb"
)
print(f"Job {job.id}: {job.status}")

# Ingest from DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
job = client.ingest(source="dataframe", data=df, database="mydb")

API limits: 10 MB per file and 50 files per batch. The CLI has no limits. Supported sources: PDF (vision + OCR), CSV, Excel, Parquet, and any REST API source.
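
When uploading through the API, a client-side check can catch limit violations before any request is sent. A minimal sketch, assuming "10MB" means 10 × 1024 × 1024 bytes (the exact unit is not stated) and using a hypothetical helper `plan_batches`:

```python
MAX_FILE_BYTES = 10 * 1024 * 1024  # 10 MB per file (assumes binary megabytes)
MAX_FILES_PER_BATCH = 50           # 50 files per batch via the API


def plan_batches(sizes: dict[str, int]) -> list[list[str]]:
    """Split file names into batches of at most 50, rejecting oversized files.

    `sizes` maps file name to size in bytes, e.g. from os.path.getsize().
    """
    too_big = [name for name, size in sizes.items() if size > MAX_FILE_BYTES]
    if too_big:
        raise ValueError(f"files exceed the per-file limit: {too_big}")
    names = list(sizes)
    return [names[i:i + MAX_FILES_PER_BATCH]
            for i in range(0, len(names), MAX_FILES_PER_BATCH)]
```

Each returned batch can then be submitted as its own ingestion job.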

Query

Create a session

# Create a session
curl -X POST "http://localhost:8000/v1/sessions" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "llm_service": "ollama",
    "database_name": "mydb",
    "agent_type": "graph_rag_agent",
    "connection_details": {
      "uri": "bolt://localhost:7687",
      "user": "memgraph",
      "password": "your_password"
    }
  }'

# Response:
# {"session_id": "sess_abc123", "created_at": "2026-04-09T09:00:00"}
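
For scripts that cannot shell out to curl, the same request can be built with the Python standard library. The sketch below mirrors the curl call field by field and only constructs the request; `build_session_request` is an illustrative name, not part of the Graffold client.

```python
import json
import urllib.request


def build_session_request(base_url: str, token: str, database: str,
                          uri: str, user: str, password: str,
                          llm_service: str = "ollama") -> urllib.request.Request:
    """Build (but do not send) the POST /v1/sessions request shown above."""
    payload = {
        "llm_service": llm_service,
        "database_name": database,
        "agent_type": "graph_rag_agent",
        "connection_details": {"uri": uri, "user": user, "password": password},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` sends it. The `session_id` from the JSON response is what the query endpoints expect in their path.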

Execute a query

# Basic query
curl -X POST "http://localhost:8000/v1/sessions/sess_abc123/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What connects entity A to entity B?",
    "mode": "hybrid",
    "search_depth": "balanced"
  }'

Streaming (SSE)

# Streaming query (SSE)
curl -N "http://localhost:8000/v1/sessions/sess_abc123/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "Overview of risk factors", "stream": true}'

# Events:
# event: retrieval_started
# event: retrieval_complete  → {"source_count": 12, "retrieval_time_ms": 340}
# event: token              → {"text": "Several", "index": 0}
# event: token              → {"text": " risk", "index": 1}
# ...
# event: complete           → {"answer": "...", "sources": [...], "cost": {...}}
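
The event listing above is abbreviated. Assuming the server uses standard Server-Sent Events framing (`event:` and `data:` lines, with a blank line closing each event) and JSON payloads, a client could decode the stream roughly as follows. The wire format is an assumption, and `parse_sse` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from SSE text lines.

    Unlike a strict SSE client, named events without data are still yielded
    (with an empty payload), since retrieval_started appears to carry none.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data or event != "message":
                payload = json.loads("\n".join(data)) if data else {}
                yield event, payload
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))
```

Concatenating the `text` field of each `token` event in order rebuilds the answer as it streams in.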

Python client (full example)

from graffold import Client

client = Client(base_url="http://localhost:8000", token="your_token")

# Create session
session = client.create_session(
    llm_service="ollama",
    database="mydb",
    connection={"uri": "bolt://localhost:7687", "user": "memgraph", "password": "..."}
)

# Query
result = session.query("What connects entity A to entity B?", mode="hybrid")
print(result.answer)
print(result.sources)  # [{doc_id, excerpt, confidence}, ...]
print(result.cost)     # {prompt_tokens, completion_tokens, estimated_usd}

# Follow-up (uses session context)
result2 = session.query("Which of those have active certifications?")

# KNN expansion
expanded = session.expand_knn(expansion_level=1)
print(expanded.entities)  # newly discovered neighbors

# Streaming
for event in session.query("Summarize all findings", stream=True):
    if event.type == "token":
        print(event.text, end="")

Query Modes

naive   Direct LLM answer, no graph retrieval
local   Entity neighborhood search
global  Community summary-based retrieval
hybrid  Merges local + global with deduplication
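
To illustrate what "merges with deduplication" can mean in hybrid mode, here is a small sketch that combines two hit lists and keeps one copy per document. The dedup key (`doc_id`) and the "keep the highest confidence" rule are assumptions for illustration; how Graffold actually merges results is not specified here.

```python
def merge_hybrid(local_hits: list[dict], global_hits: list[dict]) -> list[dict]:
    """Merge two hit lists, keeping the highest-confidence copy of each doc_id."""
    best = {}
    for hit in [*local_hits, *global_hits]:
        current = best.get(hit["doc_id"])
        if current is None or hit["confidence"] > current["confidence"]:
            best[hit["doc_id"]] = hit
    # Strongest evidence first.
    return sorted(best.values(), key=lambda h: h["confidence"], reverse=True)
```

A document found by both local and global retrieval therefore appears once in the merged list.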

Search Depth

fast      Discovery only, no expansion
balanced  One expansion round; Cypher fallback if fewer than 3 targets are found
deep      Up to 3 expansion rounds; always runs the Cypher fallback
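
The three depth settings can be read as a small policy table. The sketch below encodes that reading. The field names are hypothetical, and treating `fast` as never running the Cypher fallback is an assumption, since the description only says it skips expansion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthPolicy:
    max_rounds: int  # expansion rounds allowed
    fallback: str    # "never", "if_sparse", or "always"


POLICIES = {
    "fast": DepthPolicy(max_rounds=0, fallback="never"),  # fallback: assumed
    "balanced": DepthPolicy(max_rounds=1, fallback="if_sparse"),
    "deep": DepthPolicy(max_rounds=3, fallback="always"),
}
MIN_TARGETS = 3  # balanced falls back to Cypher below this many targets


def needs_cypher_fallback(depth: str, targets_found: int) -> bool:
    """Decide whether the Cypher fallback runs for a given depth setting."""
    policy = POLICIES[depth]
    if policy.fallback == "always":
        return True
    if policy.fallback == "if_sparse":
        return targets_found < MIN_TARGETS
    return False
```

Under this reading, `balanced` with two targets triggers the fallback, while `balanced` with three or more does not.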

Expand & Explore

Discover connected entities beyond the initial answer with KNN expansion.

# Expand results with k-nearest neighbors
curl -X POST "http://localhost:8000/v1/sessions/sess_abc123/expand-knn" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"expansion_level": 1}'

All Endpoints

GET  /health              # Version, uptime (no auth)
GET  /health/ready         # DB + LLM connectivity check (no auth)
GET  /health/live          # Process alive (no auth)

POST /v1/sessions                          # Create session
GET  /v1/sessions/{session_id}             # Get session details
DELETE /v1/sessions/{session_id}           # Delete session

POST /v1/sessions/{session_id}/query       # Execute query
POST /v1/sessions/{session_id}/expand-knn  # KNN expansion
GET  /v1/sessions/{session_id}/stats       # Database statistics

POST /v1/ingestion/jobs                    # Start ingestion job

GET  /v1/costs                             # LLM cost summary
GET  /v1/metrics                           # Performance metrics
GET  /v1/metrics/latency                   # Query latency percentiles
GET  /v1/metrics/multi-hop                 # Multi-hop success rate
GET  /v1/metrics/community                 # Community usage stats