Data governance is now AI governance’s load-bearing wall.
Every AI control a regulator names — validation, monitoring, explanation, fairness, redress — eventually asks the same question of the deployer. Show us the data. Lineage, consent, retrieval provenance, training-data posture. The firm reads the intersection of privacy regimes and AI frameworks, and describes the single data-governance file that answers both.
The question every AI framework eventually asks.
ISO/IEC 42001 says it in operational clauses. The NIST AI RMF says it in the Map and Measure functions. The EU AI Act says it at Article 10. OSFI E-23 says it in the model-lifecycle expectations. The four frameworks do not use the same language, and they do not enforce in the same way, but they converge on a single evidentiary ask. The deployer must be able to describe, on request, where the data used to train, fine-tune or ground retrieval for its AI system came from, under what consent basis it sits, how its quality was assessed, and how it is kept current. This is not privacy. It is not data quality. It is the data-governance file that AI governance now rests on.
The firm has watched the ask harden over the last eighteen months. Two years ago, a deployer could answer most AI-governance reviews with a model card and a validation report. Today the first follow-up question is invariably about the data. Which corpus. Collected under which consent basis. Retained under which retention rule. Refreshed on which cadence. Tested for which shifts. Removed under which deletion rights. The substrate has become the conversation.
Privacy regimes have already answered half the question.
Canadian, US and European privacy law has been moving data governance upstream for more than a decade. PIPEDA codifies accountability, meaningful consent and safeguards. Quebec’s Law 25 goes further: privacy impact assessments are mandatory for many AI-adjacent projects, automated decisions must be disclosed to affected individuals, and cross-border transfers demand a written assessment. GDPR’s Article 22 on automated individual decision-making, combined with DPIA thresholds and the lawfulness-of-processing analysis, already forces most of the data documentation AI frameworks now require. The overlap is not theoretical. A deployer with a mature GDPR and Law 25 posture already has the skeleton of the EU AI Act Article 10 data-governance file.
What has changed is the reviewer. Privacy regulators ask about lawful basis and rights; AI regulators ask about lineage and provenance; model-risk supervisors ask about representativeness and drift. The three questions are variations on the same core demand. The deployer that treats them as separate programmes runs three files; the deployer that treats them as one substrate runs one.
Retrieval is the new governance surface.
The rise of retrieval-augmented generation has pulled the data-governance surface forward into production. In a classical model, the data-governance story ended at training and validation. In a RAG system, the story continues into every query. The vector index is a live corpus. Its sources have provenance, its chunks have retention and redaction posture, its content has drift, and in some architectures it holds personal data subject to deletion rights the deployer must be able to honour. Regulators have noticed. EU AI Act post-market monitoring contemplates ongoing documentation of the corpus; Law 25 deletion rights extend to any store that holds identifiable content; and OSFI's model-risk expectations are increasingly read against systems whose behaviour changes as the corpus changes.
The control architecture has to answer this. A retrieval manifest with source-level lineage. A provenance ledger that survives reindexing. A retention and deletion posture that works at chunk granularity. A grounding-evaluation loop that catches retrieval drift before it becomes outcome drift. These are not novel controls — they read from the same primitives as classical data governance — but they now have to operate at the pace of inference.
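To make those controls concrete, here is a minimal sketch of what a retrieval manifest and a chunk-level provenance record might look like. Every name and field below is hypothetical, chosen to mirror the controls above rather than any standard schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical chunk-level provenance record for a RAG corpus.
# Field names are illustrative, not a standard schema.

@dataclass
class ChunkRecord:
    chunk_id: str                 # stable across reindexing, so provenance survives rebuilds
    source_uri: str               # document-of-origin lineage
    consent_basis: str            # e.g. "contract", "consent", "legitimate interest"
    contains_personal_data: bool  # flags exposure to deletion rights
    retention_until: date         # drives chunk-granularity deletion
    redactions_applied: list[str] = field(default_factory=list)
    last_grounding_check: Optional[date] = None  # feeds the grounding-evaluation loop

@dataclass
class RetrievalManifest:
    corpus_id: str
    index_version: str            # changes on every reindex; chunk_ids do not
    chunks: list[ChunkRecord]

    def deletable(self, on: date) -> list[ChunkRecord]:
        """Chunks whose retention has lapsed and must leave the index."""
        return [c for c in self.chunks if c.retention_until <= on]
```

The design choice that matters is the stable chunk identifier: if identifiers survive reindexing, the provenance ledger and the deletion posture survive with them.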
Training-data posture is a procurement conversation.
The flow-down story is the other piece regulated deployers are now negotiating in real time. Foundation-model providers, GPAI vendors, embedded-AI suppliers and retrieval-stack vendors all sit upstream of the deployer’s governance file. The posture the deployer can present to a regulator depends on what the vendor can present to the deployer. Training-data description. Copyright and trade-secret warranties. Transparency on known exclusions. Incident history. Change-notification covenants. The EU AI Act’s GPAI obligations make some of this explicit; the revised Product Liability Directive makes disclosure a civil-procedure matter; OSFI B-10 makes the cascade a supervisory matter for Canadian FRFIs. The procurement file of 2026 looks different because the governance file it feeds looks different.
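One way to operationalise that procurement ask is a structured posture record the deployer completes per vendor. A hedged sketch, assuming nothing about any framework's required fields; the attributes simply mirror the list above.

```python
from dataclasses import dataclass, field

# Hypothetical record of what a deployer asks an upstream model or
# retrieval-stack vendor to warrant. Attribute names are illustrative.

@dataclass
class VendorDataPosture:
    vendor: str
    component: str                    # model, embedding service, retrieval stack, etc.
    training_data_description: str    # what the vendor will say about its corpus
    copyright_warranty: bool          # contractual warranty over training inputs
    trade_secret_warranty: bool
    known_exclusions: list[str] = field(default_factory=list)   # disclosed corpus gaps
    incident_history: list[str] = field(default_factory=list)
    change_notice_days: int = 0       # covenant: advance notice before component changes

    def gaps(self) -> list[str]:
        """What the deployer cannot yet evidence to its own reviewers."""
        missing = []
        if not self.training_data_description:
            missing.append("training-data description")
        if not self.copyright_warranty:
            missing.append("copyright warranty")
        if self.change_notice_days <= 0:
            missing.append("change-notification covenant")
        return missing
```

A gaps() check of this kind is how the procurement file feeds the governance file: whatever the vendor cannot warrant is what the deployer cannot evidence.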
One substrate. Many reviewers.
The firm’s read is that the split between privacy, data governance, AI governance and model risk is increasingly administrative rather than substantive. The evidence a Canadian FRFI produces for an OSFI E-23 validation review answers most of what a privacy regulator asks at a Law 25 PIA and most of what an EU AI Act technical file requires under Article 10. The artifact set is one. The reviewers are many. The operational cost of maintaining three separate files — with three reconciliation cycles, three taxonomies, three owners — is what we see second-line teams quietly failing to sustain.
Our data-governance pillar stands up the substrate. Lineage graphs for training, fine-tuning and retrieval corpora. Consent and purpose ledgers mapped to PIPEDA, Law 25 and GDPR. Retrieval manifests a regulator can read. Data-quality controls tied to AI RMF Measure functions and OSFI E-23 validation expectations. Written once, readable by every reviewer the deployer faces. The data conversation is no longer the conversation before the AI conversation. It is the AI conversation. The firm builds the file that recognises that.
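A sketch of what "written once, readable by every reviewer" can mean in practice: each evidence artifact carries tags for the framework clauses it answers, and a reviewer-specific file is just a filter over the shared set. The clause strings here are illustrative labels, not legal mappings.

```python
from dataclasses import dataclass

# Hypothetical: one evidence artifact, tagged with every framework clause
# it answers. Each reviewer's "file" is a view over the same records.

@dataclass(frozen=True)
class EvidenceRecord:
    artifact: str             # e.g. "lineage graph: fine-tuning corpus v3"
    answers: frozenset        # framework citations this artifact satisfies

SUBSTRATE = [
    EvidenceRecord("lineage graph: training corpus",
                   frozenset({"EU AI Act Art. 10", "AI RMF Map", "OSFI E-23 validation"})),
    EvidenceRecord("consent and purpose ledger",
                   frozenset({"PIPEDA", "Law 25 PIA", "GDPR Art. 22 / DPIA"})),
    EvidenceRecord("retrieval manifest",
                   frozenset({"EU AI Act Art. 10", "Law 25 deletion", "OSFI E-23 monitoring"})),
]

def file_for(reviewer_clauses: set) -> list:
    """A reviewer-specific file is a filter over one substrate, not a second file."""
    return [r.artifact for r in SUBSTRATE if r.answers & reviewer_clauses]

# file_for({"Law 25 PIA", "Law 25 deletion"})
# -> ["consent and purpose ledger", "retrieval manifest"]
```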
One substrate answers every reviewer.
Lineage, consent, retrieval provenance, training-data posture — built once, read by privacy regulators, AI regulators and model-risk supervisors. Talk to us about the file your portfolio has to be able to produce on demand.