Beginner

Graphify — Codebase Knowledge Graphs

Graphify is an open-source tool that turns a codebase, or any folder of code, SQL schemas, documentation, papers, and diagrams, into a queryable knowledge graph. Released in April 2026, it crossed 22,000 GitHub stars in under ten days. Its main selling point: source code is parsed entirely locally by a deterministic parser (no LLM, no network call), so your proprietary code never leaves your machine.

The Problem It Solves

When you ask an AI coding assistant about a large codebase, it has to read many files to build context, and every file consumes tokens. On a 52-file repository, a typical query costs around 123,000 tokens if the AI reads the relevant files directly. With Graphify, the same query costs around 1,700 tokens, a reported 71.5× reduction.
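The arithmetic behind these figures is easy to check. The sketch below uses the rounded token counts quoted above; the price per million tokens is a hypothetical placeholder chosen for illustration, not a figure from Graphify or any provider.

```python
# Rounded figures quoted in the text above.
RAW_TOKENS = 123_000    # reading the relevant files directly
GRAPH_TOKENS = 1_700    # querying the graph instead

# Hypothetical price, for illustration only (USD per million input tokens).
ASSUMED_USD_PER_MILLION = 3.00


def reduction_factor(raw: int, graph: int) -> float:
    """How many times fewer tokens the graph query uses."""
    return raw / graph


def query_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Input cost of a single query at the assumed price."""
    return tokens / 1_000_000 * usd_per_million


factor = reduction_factor(RAW_TOKENS, GRAPH_TOKENS)
saved = query_cost_usd(RAW_TOKENS - GRAPH_TOKENS, ASSUMED_USD_PER_MILLION)
print(f"reduction: {factor:.1f}x")
print(f"saved per query at the assumed price: ${saved:.4f}")
```

With the rounded inputs the factor comes out slightly above 72, a little higher than the quoted 71.5×, which presumably reflects unrounded token counts.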

This matters in two ways: lower API costs at scale, and better answers, because the model can fit a richer picture of the codebase into a single context window instead of a handful of raw files.

Why the token reduction is so dramatic

A knowledge graph of a codebase is a compressed, structured summary of relationships — which files import which, which functions call which, which tables a schema references. An AI reading the graph gets the architecture in one pass. Reading raw files gives the AI text to reason about; the graph gives it pre-extracted structure to query.
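To make the contrast concrete, here is a minimal sketch of the kind of structure a code graph holds and the kind of question it answers without reading any file contents. The file names and relation labels are hypothetical, and this is not Graphify's internal representation.

```python
from collections import defaultdict

# Hypothetical edges: (source, relation, target).
EDGES = [
    ("api/routes.py", "imports", "services/billing.py"),
    ("api/routes.py", "imports", "services/users.py"),
    ("services/billing.py", "imports", "db/models.py"),
    ("services/users.py", "imports", "db/models.py"),
    ("db/models.py", "references", "schema.sql:invoices"),
]


def build_reverse_index(edges):
    """Map each target to the set of nodes that point at it."""
    incoming = defaultdict(set)
    for source, _relation, target in edges:
        incoming[target].add(source)
    return incoming


def dependents(node, incoming):
    """All nodes that depend on `node`, directly or transitively."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in incoming.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


incoming = build_reverse_index(EDGES)
print(sorted(dependents("db/models.py", incoming)))
```

Answering "what breaks if db/models.py changes?" here is a walk over five edges. Getting the same answer from raw files would mean reading every file that might import it.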

How Graphify Works

  • Point at folder: code, SQL, docs, images, PDFs
  • Tree-sitter parse: source code only, no network calls
  • LLM extract: docs, PDFs, and images via your API key
  • NetworkX + Leiden: build graph, detect clusters
  • Three outputs: graph.html, GRAPH_REPORT.md, graph.json

Graphify pipeline: source code stays local; only non-code files touch the LLM API
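The local parsing step can be illustrated with Python's built-in `ast` module standing in for Tree-sitter. This is a swapped-in technique chosen to keep the example dependency-free, not Graphify's parser. What it shares with the real step is that extraction is deterministic and makes no network call. The sample source and the edge labels are hypothetical.

```python
import ast

SAMPLE_SOURCE = '''
import json
from pathlib import Path

def load(path):
    return json.loads(Path(path).read_text())

def main():
    data = load("config.json")
    print(data)
'''


def extract_edges(source: str, module: str):
    """Return (source, relation, target) edges for imports and direct calls."""
    tree = ast.parse(source)
    edges = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            edges += [(module, "imports", alias.name) for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append((module, "imports", node.module))
        elif isinstance(node, ast.FunctionDef):
            # Record calls to plain names made inside this function body.
            # Attribute calls such as json.loads(...) are skipped for brevity.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((f"{module}.{node.name}", "calls", inner.func.id))
    return edges


for edge in extract_edges(SAMPLE_SOURCE, "app"):
    print(edge)
```

Running this twice on the same input always yields the same edges, which is the property that separates this stage from the LLM extraction stage.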

The Privacy Model

This is the detail that makes Graphify safe for proprietary codebases:

Source code — stays local

All .ts, .py, .go, .sql and other source files are parsed by Tree-sitter — a deterministic, rule-based parser. No language model is invoked, no network call is made. Your source code does not leave your machine.

Docs, PDFs, images — go to your LLM API

Markdown, PDFs, images, and video transcripts require semantic understanding, so they are sent to your configured AI API (Anthropic, OpenAI, etc.). There is no Graphify relay server — traffic flows directly from your machine to the LLM provider.
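One way to reason about this split before running the tool is to sort a project's files by which side of the boundary they fall on. The extension sets below are assumptions for illustration, drawn from the examples in this section; Graphify's actual routing rules may differ.

```python
from pathlib import PurePosixPath

# Assumed extension sets; not Graphify's actual configuration.
LOCAL_PARSE = {".ts", ".py", ".go", ".sql", ".rs", ".java", ".c", ".cpp", ".sh", ".r"}
LLM_EXTRACT = {".md", ".pdf", ".png", ".jpg", ".jpeg", ".svg"}

# Hypothetical project listing.
PROJECT_FILES = [
    "src/app.ts",
    "db/schema.sql",
    "docs/ARCHITECTURE.md",
    "docs/design.pdf",
    "assets/flow.png",
    "LICENSE",
]


def route(path: str) -> str:
    """Classify a file as parsed locally, sent to the LLM API, or skipped."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in LOCAL_PARSE:
        return "local"
    if suffix in LLM_EXTRACT:
        return "llm"
    return "skipped"


def files_leaving_machine(paths):
    """Files whose contents would be sent to the configured LLM provider."""
    return [p for p in paths if route(p) == "llm"]


for path in files_leaving_machine(PROJECT_FILES):
    print("sent to LLM API:", path)
```

A listing like this is a quick audit: if anything sensitive shows up under the LLM route, move it out of the folder before building the graph.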

What Graphify Produces

graph.html

Interactive visual graph you can open in any browser. Pan, zoom, hover over nodes to see their connections and properties.

GRAPH_REPORT.md

A Markdown summary of central nodes and surprising connections — the 10–20 most connected entities in your codebase, and cross-module links that may not be obvious from the code.
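The two ideas in this report, central nodes and cross-module links, can be sketched from a plain edge list. The node names are hypothetical, "module" is taken here to mean the top-level directory, and the ranking uses simple degree counts, which may not be the measure Graphify applies.

```python
from collections import Counter

# Hypothetical dependency edges: (source, target).
EDGES = [
    ("api/routes.py", "services/billing.py"),
    ("api/routes.py", "services/users.py"),
    ("api/health.py", "db/models.py"),
    ("services/billing.py", "db/models.py"),
    ("services/users.py", "db/models.py"),
    ("services/users.py", "services/billing.py"),
    ("jobs/nightly.py", "db/models.py"),
]


def top_connected(edges, k=3):
    """Rank nodes by degree (incoming plus outgoing edge count)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(k)


def module_of(node: str) -> str:
    """Treat the top-level directory as the module (an assumption)."""
    return node.split("/", 1)[0]


def cross_module(edges):
    """Edges whose endpoints live in different modules."""
    return [(s, t) for s, t in edges if module_of(s) != module_of(t)]


print(top_connected(EDGES, k=1))
print(len(cross_module(EDGES)), "cross-module links")
```

In this sample db/models.py is the hub with four connections, and the one link that stays inside a module (users to billing) is filtered out of the cross-module list.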

graph.json

Machine-readable graph for querying. AI coding assistants (Claude Code, Cursor, Gemini CLI) can read this file to answer questions about your codebase without re-reading source files.
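Here is a minimal sketch of querying such a file with the standard library. The schema shown (a `nodes` list with `id` fields and a `links` list with `source`, `target`, and `relation` fields) is an assumption modeled on the common node-link JSON layout. Check the real graph.json for its actual field names before relying on any of this.

```python
import json

# Hypothetical sample in an assumed node-link layout.
SAMPLE_GRAPH_JSON = """
{
  "nodes": [
    {"id": "api/routes.py", "kind": "file"},
    {"id": "services/billing.py", "kind": "file"},
    {"id": "db/models.py", "kind": "file"}
  ],
  "links": [
    {"source": "api/routes.py", "target": "services/billing.py", "relation": "imports"},
    {"source": "services/billing.py", "target": "db/models.py", "relation": "imports"}
  ]
}
"""


def neighbors(graph: dict, node_id: str) -> dict:
    """Outgoing and incoming links for one node, as (relation, other_node) pairs."""
    outgoing = [
        (link["relation"], link["target"])
        for link in graph["links"]
        if link["source"] == node_id
    ]
    incoming = [
        (link["relation"], link["source"])
        for link in graph["links"]
        if link["target"] == node_id
    ]
    return {"outgoing": outgoing, "incoming": incoming}


graph = json.loads(SAMPLE_GRAPH_JSON)
print(neighbors(graph, "services/billing.py"))
```

For a real project you would load the file from disk instead of the inline sample, then adjust the key names to match what the file contains.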

What Can Be Graphified

Graphify handles multi-modal input — not just source code:

  • Source code — any language Tree-sitter supports (TypeScript, Python, Go, Rust, Java, C/C++, and many more)
  • SQL schemas — tables, columns, foreign keys, indexes as graph nodes and relationships
  • R scripts and shell scripts
  • Markdown documentation — parsed for entities and cross-references
  • PDFs and papers — via LLM semantic extraction
  • Images and diagrams — via LLM vision extraction
  • Video transcripts

This means you can build a single graph that spans app code + database schema + infrastructure scripts + architecture documentation — and query across all of it in one pass.
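As an illustration of how a schema becomes graph edges, the sketch below loads a small hypothetical schema into an in-memory SQLite database and reads the foreign keys back as relationships. Using SQLite for this is a stand-in chosen for a self-contained example; it is not how Graphify parses SQL, and the table names are invented.

```python
import sqlite3

# Hypothetical schema for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER REFERENCES invoices(id)
);
"""


def foreign_key_edges(schema_sql: str):
    """Return (table.column, 'references', table.column) edges from a schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        edges = []
        for table in tables:
            # Table names come from sqlite_master, not from user input.
            # Row layout: (id, seq, referenced_table, from_column, to_column, ...)
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
                edges.append((f"{table}.{row[3]}", "references", f"{row[2]}.{row[4]}"))
        return edges
    finally:
        conn.close()


for edge in foreign_key_edges(SCHEMA):
    print(edge)
```

Edges like these can sit in the same graph as import and call edges, which is what lets one query span application code and database schema.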

AI Coding Assistant Integration

Graphify is explicitly designed as a skill for AI coding assistants. Once you have a graph.json, you can include it in your AI assistant's context (e.g., as a CLAUDE.md reference in Claude Code, or as a file in Cursor's context). The assistant queries the graph instead of reading raw files, dramatically reducing token usage per session.

Supported assistants: Claude Code, OpenAI Codex, OpenCode, Cursor, Gemini CLI, and any assistant that can read a file from your project directory.

Getting Started

# Install and run on your codebase
pip install graphify
graphify /path/to/your/project
# Outputs: graph.html, GRAPH_REPORT.md, graph.json

MIT license. Source: github.com/safishamsi/graphify

Checklist: Do You Understand This?

  • Graphify builds a knowledge graph from your codebase locally — source code is parsed by Tree-sitter with no network calls
  • Only docs, PDFs, and images are sent to the LLM API; traffic goes directly to Anthropic/OpenAI, not through Graphify servers
  • Produces three outputs: graph.html (visual), GRAPH_REPORT.md (summary), graph.json (queryable)
  • A reported 71.5× token reduction on a 52-file repo (roughly 1,700 vs 123,000 tokens per query)
  • Works as a skill for Claude Code, Cursor, Gemini CLI, and other AI coding assistants
  • Open-source, MIT license, released April 2026

Page built: 01 Jun 2026