🧠 All Things AI
Beginner

Schema Design with AI

AI is a useful thinking partner for database schema design — it can generate initial schemas from requirements, compare normalisation strategies, and explain trade-offs. But schema design requires domain knowledge and understanding of your actual access patterns, which AI does not have. Use it to explore options and spot issues, not to make final decisions. This page also covers how to annotate your schema to make it AI-readable — which directly improves the accuracy of any LLM-generated SQL.

Prompting for Schema Design

1. Initial Schema from Requirements

Design a PostgreSQL schema for the following requirements:

[DESCRIBE YOUR DOMAIN — e.g., "a SaaS application where organisations have multiple users, each user can have multiple projects, and projects contain tasks with assignees and due dates"]

Requirements:

- [LIST KEY ENTITIES AND RELATIONSHIPS]

- [LIST COMMON QUERY PATTERNS — what will be queried most frequently]

- [LIST ANY CONSTRAINTS — e.g., "a task must belong to exactly one project"]

For each table: provide CREATE TABLE DDL with column names, types, constraints, and foreign keys.

Then: list three design decisions you made and one alternative approach for each.

Asking for design decisions + alternatives makes AI's assumptions explicit, so you can correct the ones that don't fit your actual requirements.
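For a concrete sense of what this prompt returns, here is a sketch of the kind of DDL a model might produce for the SaaS example in the template above. All table and column names are illustrative, and a real response should be reviewed against your own requirements:

```sql
CREATE TABLE organisations (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE users (
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    organisation_id bigint NOT NULL REFERENCES organisations (id),
    email           text NOT NULL UNIQUE
);

CREATE TABLE projects (
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    organisation_id bigint NOT NULL REFERENCES organisations (id),
    name            text NOT NULL
);

-- The constraint "a task must belong to exactly one project"
-- becomes a NOT NULL foreign key.
CREATE TABLE tasks (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    project_id  bigint NOT NULL REFERENCES projects (id),
    assignee_id bigint REFERENCES users (id),
    due_date    date,
    title       text NOT NULL
);
```

Note how each stated constraint should map to an explicit NOT NULL, UNIQUE, or foreign key — if a requirement has no corresponding constraint in the DDL, that is a gap worth asking the model about.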

2. Schema Review and Trade-offs

Review this schema design and identify:

1. Normalisation issues (over- or under-normalised for typical OLTP use)

2. Missing indexes for the access patterns I described

3. Potential data integrity risks (missing constraints, nullable columns that should be NOT NULL)

4. Any column names or types that could be ambiguous or misleading

Access patterns: [DESCRIBE HOW YOU WILL QUERY THIS DATA]

Schema: [PASTE DDL]

Making Your Schema AI-Readable

Traditional database schemas were designed for humans who know the business domain. LLMs reading your schema to generate SQL do not have that context. Adding semantic annotations improves LLM SQL generation accuracy by up to 27% in documented benchmarks.

Schema annotation techniques

  • Table COMMENT: explain what the table stores and how it relates to others in business terms
  • Column COMMENT: explain what each column means, especially non-obvious ones (e.g., "plan: the subscription tier — values: free, starter, pro, enterprise")
  • Enum values: document what each allowed value means (not just the value itself)
  • Units annotation: for numeric columns, note the unit (e.g., "amount: transaction value in USD cents, not dollars")
  • Preferred joins: note which columns are the correct join partners (prevents AI from joining on wrong columns)
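In PostgreSQL, the first four techniques map directly onto COMMENT ON statements, and the fifth can be folded into a column comment. A sketch, applied to a hypothetical subscriptions table (names and values are illustrative):

```sql
COMMENT ON TABLE subscriptions IS
    'One row per organisation''s current billing subscription; relates to organisations via organisation_id.';

-- Enum values: document what each allowed value means
COMMENT ON COLUMN subscriptions.plan IS
    'The subscription tier. Values: free, starter, pro, enterprise.';

-- Units annotation for a numeric column
COMMENT ON COLUMN subscriptions.amount IS
    'Transaction value in USD cents, not dollars.';

-- Preferred join, stated explicitly so an LLM does not guess
COMMENT ON COLUMN subscriptions.organisation_id IS
    'FK to organisations.id; the correct join column. Do not join on name.';
```

Comments stored this way survive schema dumps, so any tool that feeds your schema to an LLM picks them up automatically.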

Schema context to include in SQL prompts

  • Table and column names with their business meaning
  • Which columns have enum constraints and valid values
  • Which numeric columns have units (cents vs dollars, seconds vs milliseconds)
  • Soft-delete pattern if used (e.g., "deleted_at IS NULL means active record")
  • The correct join path between tables (especially in complex schemas with multiple possible joins)
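Put together, the context block you paste into a SQL prompt might look like this (tables, columns, and values are hypothetical):

```
Tables:
  orders(id, user_id, amount, status, deleted_at)
    - amount: order total in USD cents, not dollars
    - status: one of pending, paid, refunded
    - deleted_at IS NULL means active record; filter on it in every query
  users(id, email)

Join path: orders.user_id = users.id
```

A few lines like this are usually enough to prevent the most common failure modes: summing cents as if they were dollars, including soft-deleted rows, or joining on the wrong column.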

Common Schema Design Decisions AI Can Help With

| Decision | Good prompt to explore it | What to decide yourself |
| --- | --- | --- |
| Normalisation level | "Compare 3NF vs denormalised for this access pattern: [describe]" | Actual query frequency and performance requirements |
| JSON vs columns | "Should this data be stored as a JSONB column or normalised into separate columns? Access pattern: [describe]" | How often individual fields are queried vs the whole object |
| Soft delete vs hard delete | "What are the implications of adding a deleted_at column vs hard deleting rows?" | Regulatory requirements, audit needs, referential integrity rules |
| Index design | "Suggest indexes for these queries: [list queries]" | Acceptable write overhead; storage budget; actual query frequency in production |
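As an example of what exploring one of these decisions might produce, here is a common soft-delete pattern in PostgreSQL, paired with a partial index so queries on active rows stay fast. Table and column names are assumptions for illustration:

```sql
-- NULL means the row is active; a timestamp records when it was "deleted"
ALTER TABLE tasks ADD COLUMN deleted_at timestamptz;

-- Index only the live rows that most queries touch
CREATE INDEX tasks_active_project_idx
    ON tasks (project_id)
    WHERE deleted_at IS NULL;

-- "Delete" becomes an update; history and foreign keys stay intact
UPDATE tasks SET deleted_at = now() WHERE id = 42;
```

Whether this beats hard deletion still depends on the right-hand column of the table above: your regulatory, audit, and referential-integrity requirements.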

Limits of AI Schema Design

What AI cannot know

AI cannot know your actual data volumes, query patterns, growth trajectory, or the business rules that must be enforced. It generates plausible, technically correct schemas — but "plausible" and "right for your system" are different things. Always validate AI schema designs against your real access patterns before building on top of them.

Always verify:

  • Join correctness — run sample queries with actual data to confirm cardinality
  • Migration safety — never apply AI-generated migrations to production without testing in staging first
  • Constraint completeness — AI often under-specifies NOT NULL and UNIQUE constraints
  • Performance at scale — a schema that works for 1,000 rows may need index changes at 10 million
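For the first check, a quick cardinality probe on real data is often enough to expose a wrong or non-unique join key before you build on it. A sketch, with illustrative table names:

```sql
-- Count rows produced per key after the join; unexpectedly large
-- counts suggest the join column is wrong or not unique.
SELECT t.project_id, count(*) AS rows_after_join
FROM tasks t
JOIN projects p ON p.id = t.project_id
GROUP BY t.project_id
ORDER BY rows_after_join DESC
LIMIT 10;
```

If the top counts are far larger than the number of tasks you expect per project, the relationship the AI modelled does not match your data.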

Checklist: Do You Understand This?

  • Why should you ask AI to list its design decisions and alternatives when generating a schema, rather than just accepting the output?
  • What five types of annotation make a schema more AI-readable and improve SQL generation accuracy?
  • What does the "units annotation" prevent — give a concrete example of a bug it would catch?
  • For a schema with a soft-delete pattern, what context must you include in any SQL prompt to get correct queries?
  • Name two schema design decisions where AI can explore trade-offs well, and one decision that only you can make.
  • What must you do before applying any AI-generated migration to a production database?