Best Practices and Project Structure

Separating data transformation logic into distinct layers is key to building trust and maintainability. In our dbt projects, we enforce a clear three-tier structure.

dbt is opinionated software. The team that adopted it has, almost certainly, also adopted those opinions in principle — testable transformations, modular SQL, documentation as a first-class artefact, version control over warehouse logic. The gap between adopting dbt and running a dbt project that holds up over years of evolution is where most implementations quietly accumulate technical debt: marts that depend directly on sources, ad-hoc business logic embedded in the BI layer, tests added once and never revisited, documentation that drifts the moment the original author leaves.

The three-tier structure below isn’t a stylistic preference. It’s an architectural decision that encodes the boundaries between raw ingestion, business logic, and consumption-ready data — and it enforces those boundaries through directory conventions that survive team turnover. Each tier has a different audience, a different change cadence, and a different set of invariants. Mixing them is the most common reason dbt projects become unmaintainable.

The principles that make a dbt project survive its second year

The three tiers below are easy to see in a project’s directory tree; what’s harder to see is the discipline that distinguishes a healthy project from one slowly drifting into technical debt. Three principles consistently separate the two.

Staging is dumb; intermediate is smart; marts are products.

Staging models do nothing the source system could not have done — they rename, type-cast, and clean. Intermediate models are where business logic lives, modularised so a single concept (e.g. “active customer”) has one definition reused across every downstream mart. Marts are the products: stable, documented, owned, with an SLA. Crossing these boundaries — a calculation in staging, a raw source reference in a mart — is the first sign of drift.
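As a rough illustration of how these boundaries could be checked automatically, the sketch below flags models that select from a tier they should not. The dependency map, the "source:" prefix and the allowed-upstream policy are all assumptions made for this example; in a real project the dependencies would come from whatever metadata your tooling exposes.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: model name -> upstream nodes it selects from.
# Entries prefixed with "source:" stand for raw source tables.
DEPENDENCIES: Dict[str, List[str]] = {
    "stg_stripe__customers": ["source:stripe.customers"],
    "int_customer_lifetime_value": ["stg_stripe__customers"],
    "dim_customers": ["int_customer_lifetime_value"],
    "fct_orders": ["source:shop.orders"],  # drift: a mart reading raw data
}

# Assumed policy: which upstream tiers each tier may select from.
ALLOWED_UPSTREAM: Dict[str, Set[str]] = {
    "staging": {"source"},
    "intermediate": {"staging", "intermediate"},
    "mart": {"staging", "intermediate", "mart"},
}


def layer_of(name: str) -> str:
    """Infer the tier from the naming prefixes used in this article."""
    if name.startswith("source:"):
        return "source"
    if name.startswith("stg_"):
        return "staging"
    if name.startswith("int_"):
        return "intermediate"
    if name.startswith(("dim_", "fct_")):
        return "mart"
    return "unknown"


def boundary_violations(deps: Dict[str, List[str]]) -> List[str]:
    """Return one message per dependency that crosses a tier boundary."""
    violations = []
    for model, upstreams in sorted(deps.items()):
        # Models in an unknown tier have no allowed upstreams, so they are flagged.
        allowed = ALLOWED_UPSTREAM.get(layer_of(model), set())
        for upstream in upstreams:
            if layer_of(upstream) not in allowed:
                violations.append(
                    f"{model} ({layer_of(model)}) must not select from "
                    f"{upstream} ({layer_of(upstream)})"
                )
    return violations


for message in boundary_violations(DEPENDENCIES):
    print(message)
```

Run in CI, a check along these lines turns "a raw source reference in a mart" from a code-review judgement into a failing build.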

Tests are contracts, not afterthoughts.

Generic tests (not_null, unique, accepted_values, and relationships for referential integrity) are the floor. The real value lives in singular tests that encode business invariants, for example that total revenue must equal the sum of segment revenue to within £100. When a test fails, it’s a signal that the business has changed in a way the data layer doesn’t yet model: actionable information, not noise.
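The revenue invariant can be sketched as a query that returns one row per violation, which is the shape a dbt singular test takes: zero rows means the test passes. The example below runs that query against an in-memory SQLite database so it is self-contained; the table names, column names and figures are invented for illustration.

```python
import sqlite3

# Singular-test style query: each returned row is a period that breaks the
# invariant, so an empty result means the test passes.
# Note: the inner join ignores periods missing from either table.
INVARIANT_SQL = """
with segment_totals as (
    select period, sum(revenue) as segment_sum
    from revenue_by_segment
    group by period
)
select t.period, t.revenue, s.segment_sum
from revenue_total as t
join segment_totals as s on s.period = t.period
where abs(t.revenue - s.segment_sum) > :tolerance
order by t.period
"""


def build_demo_db() -> sqlite3.Connection:
    """Create hypothetical revenue tables with one consistent and one broken period."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table revenue_total (period text primary key, revenue real);
        create table revenue_by_segment (period text, segment text, revenue real);
        insert into revenue_total values ('2024-01', 10000), ('2024-02', 12000);
        insert into revenue_by_segment values
            ('2024-01', 'retail', 6000), ('2024-01', 'wholesale', 3950),
            ('2024-02', 'retail', 7000), ('2024-02', 'wholesale', 4500);
    """)
    return conn


def failing_periods(conn: sqlite3.Connection, tolerance: float = 100.0):
    """Return the periods where total and segment revenue differ by more than tolerance."""
    return conn.execute(INVARIANT_SQL, {"tolerance": tolerance}).fetchall()


conn = build_demo_db()
for period, total, segment_sum in failing_periods(conn):
    print(f"{period}: total {total} vs segments {segment_sum}")
```

In this demo data, 2024-01 is off by 50 and passes, while 2024-02 is off by 500 and is reported. The tolerance is a parameter because it is a business decision, not a technical one.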

Documentation is the public API of the data layer.

A mart is only as valuable as the next user’s ability to use it without asking the original author. We enforce 100% column-level documentation on every mart-tier model, with descriptions written for the consumer, not the author. The dbt docs site then becomes a searchable, living data catalogue — the cheapest enablement tool a data team can ship.
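A 100% documentation rule is only enforceable if something measures it. The sketch below computes column-level coverage over a simplified mapping of model to column descriptions. That mapping is a hypothetical shape chosen for this example, not dbt's own metadata format, and the model and column names are illustrative.

```python
from typing import Dict, List

# Hypothetical, simplified metadata: model name -> {column name -> description}.
MART_COLUMNS: Dict[str, Dict[str, str]] = {
    "dim_customers": {
        "customer_id": "Unique identifier for a customer.",
        "first_order_date": "Date of the customer's first completed order.",
    },
    "fct_orders": {
        "order_id": "Unique identifier for an order.",
        "order_total": "",  # missing description
    },
}


def undocumented_columns(models: Dict[str, Dict[str, str]]) -> List[str]:
    """List every model.column whose description is empty or whitespace."""
    missing = []
    for model, columns in sorted(models.items()):
        for column, description in sorted(columns.items()):
            if not (description or "").strip():
                missing.append(f"{model}.{column}")
    return missing


def documentation_coverage(models: Dict[str, Dict[str, str]]) -> float:
    """Fraction of columns with a non-empty description (1.0 when there are no columns)."""
    total = sum(len(columns) for columns in models.values())
    if total == 0:
        return 1.0
    return 1 - len(undocumented_columns(models)) / total


missing = undocumented_columns(MART_COLUMNS)
if missing:
    print("Undocumented mart columns:", ", ".join(missing))
```

A check like this only proves that a description exists; whether it is written for the consumer still needs a human reviewer.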

The underlying discipline that ties all three together is ruthless naming. A model called customer_dim is reasonable; a model called customer_dim_v2_final_REAL is a confession that the team has lost the ability to deprecate. Naming is how a healthy dbt project tells its own history; it is also how an unhealthy one buries it.

The three tiers

Source-shaped

Staging

Lightly transformed, one staging model per source table, named conventionally (e.g. stg_stripe__customers). Renames, type casts, light cleaning — no joins, no aggregations.

Deliverable: A source rename touches exactly one model.
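To make "renames, type casts, light cleaning" concrete, here is a minimal sketch of a staging-style view built in an in-memory SQLite database. The raw table and its columns are invented for this example and are not the real Stripe schema. In a dbt project the select would live in its own model file and read from a declared source rather than a hard-coded table name.

```python
import sqlite3

# One select over one raw table: no joins, no aggregations.
STAGING_VIEW_SQL = """
create view stg_stripe__customers as
select
    id                       as customer_id,       -- rename
    lower(trim(email))       as email,             -- light cleaning
    cast(created as integer) as created_at_epoch,  -- type cast
    (delinquent = 'true')    as is_delinquent      -- text flag to 0/1
from raw_stripe_customers
"""


def build_staging_demo() -> sqlite3.Connection:
    """Load a hypothetical raw table and create the staging view on top of it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table raw_stripe_customers (
            id text, email text, created text, delinquent text
        );
        insert into raw_stripe_customers values
            ('cus_1', '  Ada@Example.com ', '1700000000', 'false'),
            ('cus_2', 'bob@example.com', '1700086400', 'true');
    """)
    conn.execute(STAGING_VIEW_SQL)
    return conn


def staged_customers(conn: sqlite3.Connection):
    """Read the cleaned rows back in a stable order."""
    return conn.execute(
        "select customer_id, email, created_at_epoch, is_delinquent "
        "from stg_stripe__customers order by customer_id"
    ).fetchall()


for row in staged_customers(build_staging_demo()):
    print(row)
```

Because every downstream model reads the view's column names, a rename in the raw table would be absorbed by changing the one select above.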

Business-shaped

Intermediate

Reusable building blocks: joins, light aggregations, derived columns, named for the business concept (e.g. int_customer_lifetime_value). Materialised as ephemeral or table depending on cost.

Deliverable: Business logic defined once, reused everywhere downstream.

Consumer-facing

Marts

Production-ready, query-optimised star-schema fact and dimension tables, named for the consumer (e.g. dim_customers, fct_orders).

Deliverable: A stable, documented interface your BI tool can trust.
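The naming conventions used for the three tiers lend themselves to a simple automated check. The sketch below validates model names against one pattern per tier directory. The patterns are inferred from the examples in this article (stg_stripe__customers, int_customer_lifetime_value, dim_customers, fct_orders) and the directory names are assumptions, so treat both as a starting point.

```python
import re
from typing import Dict, List

# Assumed patterns, inferred from the example names in this article.
_WORDS = r"[a-z0-9]+(?:_[a-z0-9]+)*"
NAMING_RULES: Dict[str, "re.Pattern[str]"] = {
    "staging": re.compile(rf"stg_{_WORDS}__{_WORDS}"),  # stg_<source>__<entity>
    "intermediate": re.compile(rf"int_{_WORDS}"),       # int_<business_concept>
    "marts": re.compile(rf"(?:dim|fct)_{_WORDS}"),      # dim_<entity> / fct_<event>
}


def misnamed_models(models_by_directory: Dict[str, List[str]]) -> List[str]:
    """Return 'directory/model' for every name that breaks its tier's pattern."""
    problems = []
    for directory, names in sorted(models_by_directory.items()):
        rule = NAMING_RULES.get(directory)
        for name in names:
            # A directory with no rule is treated as a violation for every model in it.
            if rule is None or not rule.fullmatch(name):
                problems.append(f"{directory}/{name}")
    return problems


# Hypothetical project layout: tier directory -> model names.
PROJECT = {
    "staging": ["stg_stripe__customers", "stg_customers"],
    "intermediate": ["int_customer_lifetime_value"],
    "marts": ["dim_customers", "fct_orders", "customer_dim_v2_final_REAL"],
}

for problem in misnamed_models(PROJECT):
    print("Naming violation:", problem)
```

Here stg_customers is flagged for omitting its source system, and customer_dim_v2_final_REAL for lacking a dim_ or fct_ prefix.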

A caveat

dbt’s strengths are in batch SQL transformation against a modern warehouse. It is not the right substrate for streaming pipelines, for compute-heavy ML feature engineering that needs to live outside the warehouse, or for ad-hoc analytics that don’t need a permanent home. We have inherited projects where dbt was bent to all three; the result was always more code, slower iteration, and more debt than a fit-for-purpose tool would have produced. The Data Foundations pillar we run with clients includes the architectural-fit assessment that establishes whether dbt is the right tool before the project-structure question is even asked.
