The Field-Level Lineage Gap in Microsoft Fabric (And How We're Closing It)
Data Engineering

Francois de Wet
2026-02-06
6 min read

Someone asks "where does this number in the report actually come from?" and suddenly you're navigating six notebooks, three warehouse views, and a semantic model. While Microsoft Fabric provides basic item-level lineage visualisation, it lacks the field-level tracking that reveals how individual columns transform across systems.

The challenge becomes acute in medallion architectures where transformations span multiple engines—PySpark notebooks, T-SQL views, Power Query expressions, and pipeline activities—each using distinct syntax and metadata formats.

Why This Matters

Without field-level lineage visibility:

  • Impact analysis becomes speculative rather than data-driven
  • Developers cannot automatically detect downstream breakage from changes
  • Regulatory compliance teams cannot demonstrate data provenance
  • Documentation drifts immediately after code changes

Our Approach: A Hybrid Solution

We've implemented a hybrid approach as a Python tool that runs post-deployment in CI/CD pipelines, combining two strategies:

Deterministic Parsing (80% of the work)

  • SQL view definitions analysed via the SQLGlot library
  • Semantic model JSON structure parsed directly
  • Pipeline copy activities examined for column mappings
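
To make the deterministic side concrete, here is a minimal sketch of the copy-activity case. It is not the tool's actual code. It assumes the pipeline definition carries explicit column mappings under `typeProperties.translator.mappings`, and the function name, the `Edge` type, and the table names passed in are hypothetical (a real run would resolve the tables from the activity's source and sink).

```python
import json
from typing import NamedTuple


class Edge(NamedTuple):
    source: str  # fully qualified source column, e.g. "bronze.orders.amt"
    target: str  # fully qualified target column


def copy_activity_edges(activity: dict, source_table: str, sink_table: str) -> list[Edge]:
    """Turn the explicit column mappings of one copy activity into lineage edges."""
    translator = activity.get("typeProperties", {}).get("translator", {})
    edges = []
    for mapping in translator.get("mappings", []):
        src = mapping.get("source", {}).get("name")
        dst = mapping.get("sink", {}).get("name")
        if src and dst:  # skip mappings that do not name both columns
            edges.append(Edge(f"{source_table}.{src}", f"{sink_table}.{dst}"))
    return edges


# Illustrative activity definition (assumed shape, invented names).
activity_json = """
{
  "name": "CopyOrders",
  "type": "Copy",
  "typeProperties": {
    "translator": {
      "type": "TabularTranslator",
      "mappings": [
        {"source": {"name": "order_id"}, "sink": {"name": "OrderId"}},
        {"source": {"name": "amt"}, "sink": {"name": "Amount"}}
      ]
    }
  }
}
"""

edges = copy_activity_edges(json.loads(activity_json), "bronze.orders", "silver.orders")
for edge in edges:
    print(f"{edge.source} -> {edge.target}")
```

Because the mapping is declared rather than computed, no inference is needed here, which is why this kind of source sits on the deterministic side of the split.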

LLM-Assisted Parsing (the remaining 20%)

  • PySpark notebook code sent to Claude with extraction prompts
  • Handles conditional logic, dynamic columns, and UDFs
  • Every edge receives a confidence score to support human review
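
The confidence step can be sketched roughly as follows. This is an illustration under assumptions, not our production code: it assumes the model is prompted to reply with a JSON array of objects holding `source`, `target`, and `confidence`, and the 0.8 threshold, the `Edge` type, and both function names are hypothetical. The call to the model itself is left out; only the handling of its reply is shown.

```python
import json
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.8  # assumed cut-off; a team would tune this


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    confidence: float


def parse_llm_edges(reply: str) -> list[Edge]:
    """Validate the model's JSON reply and drop malformed entries."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable output is treated as "no edges found"
    if not isinstance(items, list):
        return []
    edges = []
    for item in items:
        if not isinstance(item, dict):
            continue
        source, target = item.get("source"), item.get("target")
        if not (isinstance(source, str) and isinstance(target, str)):
            continue  # an edge needs both endpoints
        try:
            confidence = float(item.get("confidence", 0.0))
        except (TypeError, ValueError):
            confidence = 0.0  # a missing or odd score falls to the lowest value
        # Clamp into [0, 1] so scores stay comparable across notebooks.
        edges.append(Edge(source, target, min(max(confidence, 0.0), 1.0)))
    return edges


def split_for_review(edges: list[Edge]) -> tuple[list[Edge], list[Edge]]:
    """Separate edges that can be accepted from those a human should check."""
    accepted, needs_review = [], []
    for edge in edges:
        if edge.confidence >= REVIEW_THRESHOLD:
            accepted.append(edge)
        else:
            needs_review.append(edge)
    return accepted, needs_review


# Invented example reply; the third entry is malformed and gets dropped.
reply = """[
  {"source": "silver.orders.Amount", "target": "gold.sales.NetAmount", "confidence": 0.95},
  {"source": "silver.orders.Region", "target": "gold.sales.Territory", "confidence": 0.55},
  {"target": "gold.sales.Orphan"}
]"""

accepted, needs_review = split_for_review(parse_llm_edges(reply))
print("accepted:", accepted)
print("needs review:", needs_review)
```

Treating the reply as untrusted input keeps a malformed or overconfident answer from silently entering the lineage graph.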

Output & Workflow

The tool produces actionable outputs integrated into the development workflow:

  • Versioned lineage graphs created per deployment
  • Diff engine compares snapshots to surface breaking changes
  • Impact analysis reports generated automatically in pull requests
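
The snapshot comparison can be pictured as set arithmetic over edges. The sketch below is a simplified illustration, not the diff engine itself. The `Edge` type, the function name, and the rule for what counts as "breaking" are assumptions: here a removed edge is flagged when its target column has lost every upstream input but is still consumed downstream.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str


def diff_snapshots(old: set[Edge], new: set[Edge]) -> dict[str, list[Edge]]:
    """Compare two lineage snapshots and flag likely breaking removals."""
    removed = old - new
    added = new - old
    still_fed = {e.target for e in new}       # columns that still have an input
    still_consumed = {e.source for e in new}  # columns something still reads
    breaking = {
        e for e in removed
        if e.target not in still_fed and e.target in still_consumed
    }
    return {
        "added": sorted(added),
        "removed": sorted(removed),
        "breaking": sorted(breaking),
    }


# Invented snapshots for two consecutive deployments.
old = {
    Edge("silver.orders.Amount", "gold.sales.NetAmount"),
    Edge("silver.orders.Discount", "gold.sales.NetAmount"),
    Edge("gold.sales.NetAmount", "report.Revenue"),
    Edge("silver.orders.Region", "gold.sales.Territory"),
    Edge("gold.sales.Territory", "report.Territory"),
}
new = {
    Edge("silver.orders.Amount", "gold.sales.NetAmount"),
    Edge("silver.orders.Tax", "gold.sales.NetAmount"),
    Edge("gold.sales.NetAmount", "report.Revenue"),
    Edge("gold.sales.Territory", "report.Territory"),
}

result = diff_snapshots(old, new)
for kind, edges in result.items():
    print(kind, [f"{e.source} -> {e.target}" for e in edges])
```

In this example the Discount edge is removed without being flagged, because NetAmount still has other inputs. The Region edge is flagged because the report still reads Territory, which no longer has any source.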

Keeping It Simple

The implementation intentionally prioritises simplicity:

  • SQLite for storage
  • REST API for queries
  • React UI with D3.js force graph visualisation
  • Designed for team adoption within one week
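
To show why SQLite is enough for this, here is a small sketch of an edge table and a downstream-impact query. The schema, column names, snapshot id, and `downstream` helper are hypothetical and chosen for illustration. The only feature relied on is SQLite's support for recursive common table expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (
    snapshot_id TEXT NOT NULL,
    source      TEXT NOT NULL,
    target      TEXT NOT NULL,
    confidence  REAL NOT NULL DEFAULT 1.0,
    PRIMARY KEY (snapshot_id, source, target)
);
""")

# Invented edges for a single deployment snapshot.
rows = [
    ("deploy-42", "bronze.orders.amt", "silver.orders.Amount", 1.0),
    ("deploy-42", "silver.orders.Amount", "gold.sales.NetAmount", 0.95),
    ("deploy-42", "gold.sales.NetAmount", "report.Revenue", 1.0),
    ("deploy-42", "silver.orders.Region", "gold.sales.Territory", 0.55),
]
conn.executemany(
    "INSERT INTO edge (snapshot_id, source, target, confidence) VALUES (?, ?, ?, ?)",
    rows,
)


def downstream(conn: sqlite3.Connection, snapshot_id: str, column: str) -> list[str]:
    """Return every column reachable from `column` within one snapshot."""
    query = """
    WITH RECURSIVE impacted(col) AS (
        SELECT target FROM edge
        WHERE snapshot_id = :snap AND source = :col
        UNION  -- UNION (not UNION ALL) de-duplicates, so cycles terminate
        SELECT e.target FROM edge e
        JOIN impacted i ON e.source = i.col
        WHERE e.snapshot_id = :snap
    )
    SELECT col FROM impacted ORDER BY col
    """
    params = {"snap": snapshot_id, "col": column}
    return [row[0] for row in conn.execute(query, params)]


print(downstream(conn, "deploy-42", "bronze.orders.amt"))
```

A query like this is what a REST endpoint could wrap to answer "what breaks if I change this column?" without a dedicated graph database.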

The Bigger Picture

While Fabric adoption accelerates, governance tooling hasn't kept pace. Microsoft Purview integration remains inconsistent for Spark workloads, and third-party solutions lack sufficient field-level depth—particularly in regulated or enterprise environments.

The hybrid deterministic-plus-LLM approach is the right pattern for any modern platform where transformation logic spans multiple languages and engines. It's not about replacing human oversight; it's about giving teams the visibility they need to move fast without breaking things.

Get in touch if you're dealing with the same lineage challenges in your Fabric environment.

Tags:
#Microsoft Fabric #Data Lineage #Governance #Python #CI/CD