The premise of a feature store is uncontroversial: features used in training should be identical to features used in serving, and the cheapest way to enforce that invariant is to centralise feature definitions in a single layer that both training jobs and live inference reach into. The premise is right. The execution — across the implementations we have run and the failed implementations we have inherited — is where the discipline gets lost.
Most feature-store programmes stall not because the technology is wrong, but because they are scoped as data-engineering projects when they are, in practice, organisational alignment projects. The data-engineering team builds a beautiful online/offline split. The ML team continues to generate features in their own notebooks because the registered features don’t yet cover their use case. Six months later the registry has thirty features and the team’s models still depend on a hundred ad-hoc transformations that live in someone’s local Jupyter kernel.
The disciplines that determine whether the feature store actually gets used
Adoption, not architecture, is the metric. Three disciplines repeatedly distinguish feature-store programmes that compound from programmes that become shelfware.
Prioritise use cases before choosing technology.
Build the online and offline stores against the same definitions.
Pass production-readiness benchmarks before announcing the platform.
Underneath the three sits the most under-attended decision: ownership. The feature store sits at the boundary between data engineering (who runs the pipelines) and machine learning (who defines the features). Programmes that leave ownership ambiguous stall on every disagreement about who is responsible when a feature breaks. We define a feature-store owner — typically embedded in the ML platform team but with a hard escalation path into data engineering — before any code ships.
Three phases
Pre-implementation
Deliverable Two or three named models ready to migrate first.
Technical setup
Deliverable Training-serving consistency built in from day one.
Production readiness
Deliverable A platform the ML team will actually adopt.
A feature store earns its complexity when the organisation has three to five production ML models that share substantial feature surface. Below that threshold the centralisation cost is higher than the duplication cost; teams should keep their notebook-defined features and revisit when the model portfolio matures. The Production AI & ML pillar we run with clients includes the maturity assessment that establishes whether a feature store is the right next investment.
