Most software teams build applications from the outside in. They start with the UI. A designer hands off screens. A frontend engineer builds components. Someone realizes the components need data, so they define an API. Someone else realizes the API needs to store data, so they write a migration. The database schema becomes the last artifact, reverse-engineered from whatever the UI happened to need. This approach ships fast in week one. It creates architectural debt by week four. By month six, the team is talking about a rewrite.
Schema-first engineering inverts this sequence entirely. You start with the database. You model the domain as tables, columns, constraints, and relationships. You generate types from the schema. You derive API contracts from the types. The UI consumes the API. Every layer inherits its shape from the layer below, and the database is the foundation. Changes flow upward through generation and validation, not downward through manual synchronization. This is not a philosophical preference. It is an engineering strategy with measurable outcomes: fewer rewrites, fewer bugs at integration boundaries, and faster onboarding for new engineers who can read the schema and understand the entire domain in minutes.
Why Outside-In Architecture Fails
The outside-in approach feels productive because it produces visible output immediately. A developer sees a mockup, builds a component, and has something to show in hours. But visible output and correct architecture are different things. The problems emerge at integration points.
Consider a multi-tenant SaaS application. The UI team builds a dashboard that displays a list of projects. They define a Project type in the frontend code with the fields they need: id, name, status, createdAt. The API team builds an endpoint that returns project data, defining their own Project type with slightly different naming conventions: project_id, project_name, project_status, created_at. The database engineer creates a projects table with columns that match neither convention perfectly. Now there are three definitions of "project" in the codebase, each maintained independently, each drifting from the others over time.
When someone adds a description field, they add it to the database, forget to update the API serializer, and the frontend shows undefined. When someone renames status to phase in the frontend, the API still returns status, and a mapping layer appears. When someone adds a tenant_id column to enforce multi-tenancy, the API needs a new filter, the frontend needs new context, and three pull requests have to merge in the right order or the system breaks.
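The drift described above can be sketched in a few lines. The type names and mapping function here are hypothetical illustrations, but the pattern will be familiar to anyone who has maintained an outside-in codebase:

```typescript
// Three independent definitions of "project", each owned by a different layer.

// Frontend type: camelCase, only the fields the UI happened to need.
interface Project {
  id: string;
  name: string;
  status: string;
  createdAt: string;
}

// API response shape: snake_case, defined separately by the API team.
interface ApiProject {
  project_id: string;
  project_name: string;
  project_status: string;
  created_at: string;
  description?: string; // added to the database later, forgotten here for weeks
}

// The mapping layer that inevitably appears to bridge the two.
function toFrontendProject(api: ApiProject): Project {
  return {
    id: api.project_id,
    name: api.project_name,
    status: api.project_status,
    createdAt: api.created_at,
    // description is silently dropped here, so the frontend shows undefined
  };
}
```

Every field in that mapping function is a place where a rename, an addition, or a type change in one layer can silently diverge from the others.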
Every time two layers define the same concept independently, you have created a synchronization problem that will outlive the engineers who created it.
The Schema-First Approach: Design the Data Model First
In schema-first engineering, the database migration is the first artifact for any new feature. Before writing a component, before designing an API endpoint, you write the SQL that defines the data model. This forces you to answer the hard questions early: What are the entities? What are the relationships? What are the constraints? What needs to be unique? What can be null? What has a default?
Here is a concrete example. You are building an invoicing feature. Before touching any application code, you write the migration:
-- Migration: 007_create_invoicing.sql
create table invoices (
  id uuid primary key default gen_random_uuid(),
  tenant_id uuid not null references tenants(id),
  client_id uuid not null references clients(id),
  invoice_number text not null,
  status text not null default 'draft'
    check (status in ('draft', 'sent', 'paid', 'overdue', 'void')),
  issued_date date,
  due_date date,
  subtotal integer not null default 0, -- cents
  tax_rate numeric(5,4) not null default 0,
  total integer not null default 0, -- cents
  notes text,
  created_at timestamptz not null default now(),
  updated_at timestamptz not null default now(),
  unique (tenant_id, invoice_number)
);

create table invoice_line_items (
  id uuid primary key default gen_random_uuid(),
  invoice_id uuid not null references invoices(id) on delete cascade,
  description text not null,
  quantity integer not null default 1 check (quantity > 0),
  unit_price integer not null default 0, -- cents
  total integer not null default 0, -- cents
  sort_order integer not null default 0,
  created_at timestamptz not null default now()
);

-- Row-level security
alter table invoices enable row level security;
create policy invoices_tenant_isolation on invoices
  using (tenant_id = current_setting('app.current_tenant')::uuid);

alter table invoice_line_items enable row level security;
create policy line_items_tenant_isolation on invoice_line_items
  using (invoice_id in (
    select id from invoices where tenant_id = current_setting('app.current_tenant')::uuid
  ));
This single file encodes an enormous amount of architectural decision-making. Money is stored in cents as integers, eliminating floating-point rounding errors. Status is constrained to a known set of values via a check constraint, making invalid states unrepresentable at the database level. Invoice numbers are unique per tenant, not globally. Line items cascade on delete, so you cannot have orphaned line items. Tenant isolation is enforced at the row level, meaning even a bug in the application code cannot leak data across tenants.
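The cents-as-integers decision is worth a two-line demonstration. IEEE-754 binary floating point cannot represent most decimal fractions exactly, so arithmetic on dollar amounts drifts; arithmetic on integer cents does not:

```typescript
// The classic floating-point trap, applied to money.
const dollars = 0.1 + 0.2; // 0.30000000000000004 in IEEE-754 doubles
const cents = 10 + 20;     // 30, exact

console.log(dollars === 0.3); // false
console.log(cents === 30);    // true
```

Storing cents in an `integer` column pushes this decision into the schema, where no application layer can accidentally undo it.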
None of these decisions would have surfaced if the team had started with the UI. They would have discovered them one at a time, across multiple sprints, usually in the form of bugs.
A database migration is a design document that happens to be executable. If you cannot express your feature as a migration, you do not yet understand the feature well enough to build it.
Type Generation: One Source, Zero Drift
Once the schema exists, types should be generated from it, not written by hand. This is the mechanism that eliminates drift between layers. When the database is the single source of truth and every other layer derives its types from the database, renaming a column is a one-step operation that propagates automatically.
With Supabase, the type generation command is a single CLI call:
npx supabase gen types typescript --local > src/lib/database.types.ts
This produces a TypeScript file with complete type definitions for every table, view, and function in the database. Your application code imports these types directly:
import type { Database } from '@/lib/database.types';
type Invoice = Database['public']['Tables']['invoices']['Row'];
type InvoiceInsert = Database['public']['Tables']['invoices']['Insert'];
type InvoiceUpdate = Database['public']['Tables']['invoices']['Update'];
type LineItem = Database['public']['Tables']['invoice_line_items']['Row'];
The Row type represents what you get back from a SELECT. The Insert type represents what you must provide for an INSERT: columns with defaults or generated values (like id, status, and the timestamps) become optional, while columns without them stay required. The Update type makes all fields optional for partial updates. These three variants cover every CRUD operation, and they are all generated from the same migration. When you add a column to the migration and re-run the generator, the TypeScript compiler immediately flags every file that needs to handle the new field. Zero drift. Zero manual synchronization.
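To make the three variants concrete, here is a hand-written stand-in for what the generator emits for the invoices table. This is illustration only; the real definitions come from `supabase gen types`, and the field list mirrors the migration above:

```typescript
// Hand-written approximation of the generated Row shape (illustration only).
type InvoiceRow = {
  id: string;
  tenant_id: string;
  client_id: string;
  invoice_number: string;
  status: 'draft' | 'sent' | 'paid' | 'overdue' | 'void';
  issued_date: string | null;
  due_date: string | null;
  subtotal: number; // cents
  tax_rate: number;
  total: number; // cents
  notes: string | null;
  created_at: string;
  updated_at: string;
};

// Insert: the three columns without defaults stay required;
// everything else (defaults, nullables, timestamps) becomes optional.
type InvoiceInsert = Pick<InvoiceRow, 'tenant_id' | 'client_id' | 'invoice_number'> &
  Partial<Omit<InvoiceRow, 'tenant_id' | 'client_id' | 'invoice_number'>>;

// Update: everything optional, for partial updates.
type InvoiceUpdate = Partial<InvoiceRow>;

// This compiles with just the required fields; the database fills in the rest.
const draft: InvoiceInsert = {
  tenant_id: '0b0e8f6e-0000-4000-8000-000000000001', // example uuid
  client_id: '0b0e8f6e-0000-4000-8000-000000000002',
  invoice_number: 'INV-0001',
};
```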
API Contracts from Types
The generated types also define your API contracts. If your API returns an Invoice, the response shape is determined by the database schema, not by a separately maintained interface. If your API accepts an InvoiceInsert, the request validation is determined by the column constraints. You can layer additional validation on top (Zod schemas for business rules that go beyond what SQL constraints express), but the structural shape is inherited, not duplicated.
import { z } from 'zod';

// Extend generated types with business validation
const CreateInvoiceRequest = z.object({
  client_id: z.string().uuid(),
  invoice_number: z.string().min(1).max(50),
  issued_date: z.string().date().optional(),
  due_date: z.string().date().optional(),
  notes: z.string().max(2000).optional(),
  line_items: z.array(z.object({
    description: z.string().min(1).max(500),
    quantity: z.number().int().positive(),
    unit_price: z.number().int().nonnegative(),
  })).min(1),
});
Notice that this Zod schema does not redefine the entity. It validates the request payload, adding constraints like minimum line items and maximum string lengths that are business rules, not data model rules. The structural shape still comes from the database. The Zod schema is a refinement layer, not a replacement.
Foreign Keys as Business Logic, RLS as Authorization
One of the most powerful consequences of schema-first engineering is that the database enforces business rules directly. Foreign key constraints ensure referential integrity: you cannot create an invoice for a client that does not exist. Check constraints ensure valid states: an invoice status cannot be "banana." Unique constraints enforce business uniqueness: no duplicate invoice numbers within a tenant.
These constraints are not duplicated in application code. They exist in one place, at the lowest layer, where they cannot be bypassed by a new API endpoint, a background job, or a direct database query. Application code does not need to check whether a client exists before creating an invoice. The database will reject the insert if the foreign key is invalid. Application code does not need to validate status values. The database will reject any value not in the allowed set.
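Application code still has to surface these rejections to users. One common pattern is a small translator from PostgreSQL SQLSTATE codes to HTTP statuses. The codes below are standard PostgreSQL error codes; the function name and status choices are mine, sketched as one plausible mapping:

```typescript
// Map PostgreSQL constraint-violation SQLSTATE codes to HTTP statuses.
// Because constraints live in the database, the application handles
// violations as errors instead of pre-checking every invariant.
function constraintErrorToStatus(pgCode: string): number {
  switch (pgCode) {
    case '23503': return 422; // foreign_key_violation, e.g. nonexistent client
    case '23505': return 409; // unique_violation, e.g. duplicate invoice_number
    case '23514': return 422; // check_violation, e.g. invalid status value
    default:      return 500; // anything else is unexpected
  }
}
```

The point is not the specific statuses; it is that the check itself runs in exactly one place, and every caller inherits it.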
Row-Level Security (RLS) extends this principle to authorization. In a multi-tenant application, tenant isolation is the most critical security boundary. With RLS, this boundary is enforced at the database level. Every query, regardless of which application layer issues it, is automatically filtered by the current tenant. A bug in a new API endpoint cannot accidentally return another tenant's data. A background job that forgets to add a WHERE tenant_id = ? clause is still filtered by the RLS policy.
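For RLS policies keyed on `app.current_tenant` to work, the application must pin the tenant on each connection before querying. A minimal sketch, assuming the work runs inside a transaction and with `QueryFn` as a stand-in for whatever driver you use (pg, postgres.js, etc.):

```typescript
// Stand-in for a database driver's parameterized query function.
type QueryFn = (sql: string, params: unknown[]) => Promise<unknown>;

// Pin the tenant for a unit of work. Passing `true` as set_config's third
// argument makes the setting transaction-local, so it cannot leak across
// pooled connections once the transaction ends.
async function withTenant<T>(
  query: QueryFn,
  tenantId: string,
  work: () => Promise<T>
): Promise<T> {
  await query("select set_config('app.current_tenant', $1, true)", [tenantId]);
  return work();
}
```

Every tenant-scoped code path funnels through one helper, and the RLS policies do the actual filtering.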
If your authorization logic only exists in middleware, it will eventually be bypassed. If it exists in the database, every query path is protected by default.
Migration-Driven Development
In schema-first engineering, the migration file is the unit of feature development. Each feature begins with a migration and ends with a migration. The migration is versioned, sequential, and reviewable. It serves as both the implementation and the documentation of the data model change.
A healthy migration history reads like a changelog of the domain model:
001_create_tenants.sql
002_create_users_and_auth.sql
003_create_projects.sql
004_add_project_phases.sql
005_create_tasks.sql
006_add_task_assignments.sql
007_create_invoicing.sql
008_add_invoice_payment_tracking.sql
009_add_project_archiving.sql
010_create_audit_log.sql
Each migration is a small, focused change. Each is independently reviewable. Each is testable in isolation. And critically, each is reversible. A well-structured migration includes both up and down operations, so a failed deployment can be rolled back to the previous schema state without data loss.
This approach also solves the onboarding problem. A new engineer can read the migration history from top to bottom and understand the entire domain model, including when and why each change was made. The migrations are the authoritative history of the application's data architecture. No wiki page, no Confluence document, no README can match the precision of executable SQL that is actually running in production.
Migrations are the only documentation that is guaranteed to be accurate, because if they were wrong, the application would not be running.
The Compound Benefit: Fewer Abstractions, Less Drift
The cumulative effect of schema-first engineering is a codebase with dramatically fewer abstractions. There is no ORM mapping layer that transforms database rows into objects with different names. There is no API serialization layer that reshapes the data for the frontend. There is no frontend type definition that redefines the entity a fourth time. There is one definition, in the database, and every other layer references it.
This reduction in abstraction layers has several measurable effects:
- Fewer integration bugs: When all layers share the same shape, field name mismatches and type mismatches are caught at compile time, not at runtime.
- Faster feature development: Adding a column requires one migration, one type regeneration, and one UI update. Not three separate PRs across three repos with manual coordination.
- Simpler debugging: When a field is wrong in the UI, the investigation path is short. The UI reads the API. The API reads the database. The database is the source. There are no intermediate transformations where data could be corrupted.
- Safer refactoring: Renaming a column in the migration and regenerating types produces compiler errors everywhere the old name was used. The compiler becomes your refactoring assistant.
- Better security posture: Authorization at the database layer means every new endpoint, every new background job, every new admin tool inherits tenant isolation automatically. You cannot forget to add the security check because the security check is the database itself.
The teams that adopt schema-first engineering do not write less code. They write code with less duplication, fewer translation layers, and tighter contracts between components. The database becomes the spine of the application, and every other layer is a projection of that spine into a different context: TypeScript types for compile-time safety, API endpoints for network access, UI components for human interaction.
This is not a new idea. It is how systems were built before the era of heavyweight ORMs and client-side-first architectures made it fashionable to treat the database as an implementation detail. The database is not a detail. It is the most durable, most constrained, most reliable component in your stack. Let it lead.
The database will outlive your framework, your frontend library, your API layer, and your deployment platform. Design accordingly.