Graphify — Codebase Knowledge Graphs
Graphify is an open-source tool that turns any codebase, or any folder of code, SQL schemas, documentation, papers, and diagrams, into a queryable knowledge graph. Released in April 2026, it crossed 22,000 GitHub stars in under ten days. Its main selling point: source code is parsed entirely on your machine by a deterministic parser (no LLM, no network call), so proprietary code never leaves your machine.
The Problem It Solves
When you ask an AI coding assistant about a large codebase, it has to read many files to build context — and each file consumes tokens. On a 52-file repository, a typical query costs around 123,000 tokens if the AI reads the relevant files directly. With Graphify, the same query costs around 1,700 tokens — a 71.5× reduction.
This matters in two ways: API costs drop at scale, and answers improve because the model can fit a richer picture of the codebase into a single context window instead of a handful of raw files.
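As a quick check on the figures above, the following sketch recomputes the reduction from the rounded token counts and estimates the saving per query. The price per million tokens is a hypothetical placeholder chosen for illustration, not a quoted provider rate.

```python
# Rounded figures quoted in the text above.
TOKENS_RAW_FILES = 123_000   # query answered by reading files directly
TOKENS_WITH_GRAPH = 1_700    # same query answered from the graph

# Hypothetical input price, for illustration only (not a real provider rate).
PRICE_PER_MILLION_TOKENS_USD = 3.00


def reduction_factor(before: int, after: int) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


def query_cost_usd(tokens: int, price_per_million: float) -> float:
    """Return the input-token cost of `tokens` in US dollars."""
    return tokens / 1_000_000 * price_per_million


ratio = reduction_factor(TOKENS_RAW_FILES, TOKENS_WITH_GRAPH)
saved = query_cost_usd(TOKENS_RAW_FILES - TOKENS_WITH_GRAPH,
                       PRICE_PER_MILLION_TOKENS_USD)

print(f"reduction: {ratio:.1f}x")
print(f"saved per query: ${saved:.4f}")
```

The rounded counts give about 72.4×, slightly above the quoted 71.5×. The quoted figure was presumably computed from the unrounded token counts.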
Why the token reduction is so dramatic
A knowledge graph of a codebase is a compressed, structured summary of relationships — which files import which, which functions call which, which tables a schema references. An AI reading the graph gets the architecture in one pass. Reading raw files gives the AI text to reason about; the graph gives it pre-extracted structure to query.
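To make "pre-extracted structure to query" concrete, here is a minimal sketch of a relationship graph stored as edges with a reverse index. The file names, relation labels, and helper functions are all hypothetical and are not Graphify's internal data model.

```python
from collections import defaultdict

# Hypothetical edges for a tiny project: (source, relation, target).
EDGES = [
    ("api/routes.py", "imports", "services/orders.py"),
    ("services/orders.py", "imports", "db/models.py"),
    ("services/billing.py", "imports", "db/models.py"),
    ("services/orders.py", "references_table", "orders"),
]


def build_reverse_index(edges):
    """Map each (relation, target) pair to the sources that point at it."""
    index = defaultdict(list)
    for source, relation, target in edges:
        index[(relation, target)].append(source)
    return index


def who(index, relation, target):
    """Answer 'which nodes have `relation` to `target`?' with one lookup."""
    return sorted(index.get((relation, target), []))


index = build_reverse_index(EDGES)
importers = who(index, "imports", "db/models.py")
print(importers)
```

Answering "what imports db/models.py?" is a single dictionary lookup here, whereas answering it from raw files would mean scanning every file for import statements.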
How Graphify Works
[Figure: Graphify pipeline. Source code stays local; only non-code files touch the LLM API.]
The Privacy Model
This is the detail that makes Graphify safe for proprietary codebases:
Source code — stays local
All .ts, .py, .go, .sql and other source files are parsed by Tree-sitter — a deterministic, rule-based parser. No language model is invoked, no network call is made. Your source code does not leave your machine.
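To illustrate what deterministic, rule-based extraction looks like, the sketch below pulls imports and direct function calls out of a Python snippet. It uses Python's built-in `ast` module as a stand-in so that it runs without extra packages. This is not Tree-sitter and not Graphify's code, only the same idea: a parser walks a syntax tree and emits facts, with no model and no network involved.

```python
import ast

# A small sample file to analyze (hypothetical content).
SOURCE = '''
import json
from pathlib import Path

def load(path):
    return json.loads(Path(path).read_text())

def main():
    load("config.json")
'''


def extract_facts(source: str):
    """Return (imports, calls) found by walking the syntax tree."""
    tree = ast.parse(source)
    imports, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name; attribute calls such as
            # json.loads(...) are skipped to keep the sketch short.
            calls.add(node.func.id)
    return sorted(imports), sorted(calls)


imports, calls = extract_facts(SOURCE)
print(imports)
print(calls)
```

Running the same input twice always yields the same facts, which is the property that distinguishes this kind of parsing from LLM-based extraction.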
Docs, PDFs, images — go to your LLM API
Markdown, PDFs, images, and video transcripts require semantic understanding, so they are sent to your configured AI API (Anthropic, OpenAI, etc.). There is no Graphify relay server — traffic flows directly from your machine to the LLM provider.
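The split described above amounts to a routing decision per file. A minimal sketch follows, assuming routing is decided by file extension. The extension sets and the `route` function are illustrative assumptions, and the real tool's lists and logic may differ.

```python
from pathlib import PurePosixPath

# Assumed extension sets, for illustration; the real tool's lists may differ.
LOCAL_PARSE = {".ts", ".py", ".go", ".sql"}   # parsed on the local machine
LLM_EXTRACT = {".md", ".pdf", ".png", ".jpg"}  # sent to the configured LLM API


def route(path: str) -> str:
    """Return 'local', 'llm', or 'skip' for a file path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in LOCAL_PARSE:
        return "local"
    if suffix in LLM_EXTRACT:
        return "llm"
    return "skip"


files = ["src/app.ts", "db/schema.sql", "docs/design.md", "paper.PDF", "notes.bin"]
plan = {f: route(f) for f in files}
for name, destination in plan.items():
    print(f"{name:16} -> {destination}")
```

A dry run like this is a simple way to see which files would leave the machine before any API call is made.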
What Graphify Produces
graph.html
Interactive visual graph you can open in any browser. Pan, zoom, hover over nodes to see their connections and properties.
GRAPH_REPORT.md
A Markdown summary of central nodes and surprising connections — the 10–20 most connected entities in your codebase, and cross-module links that may not be obvious from the code.
graph.json
Machine-readable graph for querying. AI coding assistants (Claude Code, Cursor, Gemini CLI) can read this file to answer questions about your codebase without re-reading source files.
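As an illustration of querying a machine-readable graph, the sketch below ranks nodes by how many edges touch them, similar in spirit to the "most connected entities" listed in GRAPH_REPORT.md. The node-link layout (`nodes` and `links` keys with `source` and `target` fields) is an assumption made for this example, so check the actual graph.json before relying on any field names.

```python
import json
from collections import Counter

# Assumed node-link layout; the real graph.json schema may differ.
SAMPLE_GRAPH_JSON = '''
{
  "nodes": [{"id": "auth.py"}, {"id": "db.py"}, {"id": "api.py"}, {"id": "users"}],
  "links": [
    {"source": "api.py", "target": "auth.py"},
    {"source": "api.py", "target": "db.py"},
    {"source": "auth.py", "target": "db.py"},
    {"source": "db.py", "target": "users"}
  ]
}
'''


def most_connected(graph: dict, top_n: int = 3):
    """Return the `top_n` nodes with the most incident edges."""
    degree = Counter()
    for link in graph["links"]:
        degree[link["source"]] += 1
        degree[link["target"]] += 1
    # Sort by degree descending, then by name for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


graph = json.loads(SAMPLE_GRAPH_JSON)
top = most_connected(graph, top_n=2)
print(top)
```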
What Can Be Graphified
Graphify handles multi-modal input — not just source code:
- Source code — any language Tree-sitter supports (TypeScript, Python, Go, Rust, Java, C/C++, and many more)
- SQL schemas — tables, columns, foreign keys, indexes as graph nodes and relationships
- R scripts and shell scripts
- Markdown documentation — parsed for entities and cross-references
- PDFs and papers — via LLM semantic extraction
- Images and diagrams — via LLM vision extraction
- Video transcripts
This means you can build a single graph that spans app code + database schema + infrastructure scripts + architecture documentation — and query across all of it in one pass.
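To show how a SQL schema maps onto nodes and relationships, the sketch below loads a small hypothetical schema into an in-memory SQLite database and reads the foreign keys back as graph edges. It uses SQLite introspection as a convenient stand-in and does not reproduce how Graphify itself parses SQL.

```python
import sqlite3

# Hypothetical schema for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id)
);
"""


def foreign_key_edges(schema_sql: str):
    """Return (tables, edges), where each edge is a foreign-key reference."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        edges = []
        for table in tables:
            # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
            for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
                edges.append((f"{table}.{fk[3]}", "references", f"{fk[2]}.{fk[4]}"))
        return tables, edges
    finally:
        conn.close()


tables, edges = foreign_key_edges(SCHEMA)
print(tables)
for edge in edges:
    print(edge)
```

Once tables are nodes and foreign keys are edges, they can sit in the same graph as the application code that queries them, which is what makes cross-cutting questions possible.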
AI Coding Assistant Integration
Graphify is explicitly designed as a skill for AI coding assistants. Once you have a graph.json, you can include it in your AI assistant's context (e.g., as a CLAUDE.md reference in Claude Code, or as a file in Cursor's context). The assistant queries the graph instead of reading raw files, dramatically reducing token usage per session.
Supported assistants: Claude Code, OpenAI Codex, OpenCode, Cursor, Gemini CLI, and any assistant that can read a file from your project directory.
Getting Started
# Install and run on your codebase
pip install graphify
graphify /path/to/your/project
# Outputs: graph.html, GRAPH_REPORT.md, graph.json
MIT license. Source: github.com/safishamsi/graphify
Checklist: Do You Understand This?
- Graphify builds a knowledge graph from your codebase locally — source code is parsed by Tree-sitter with no network calls
- Only docs, PDFs, images, and video transcripts are sent to the LLM API; traffic goes directly to your provider (Anthropic, OpenAI, etc.), not through Graphify servers
- Produces three outputs: graph.html (visual), GRAPH_REPORT.md (summary), graph.json (queryable)
- 71.5× token reduction on a 52-file repo (1,700 vs 123,000 tokens per query)
- Works as a skill for Claude Code, Cursor, Gemini CLI, and other AI coding assistants
- Open-source, MIT license, released April 2026