Dagster Deep Dive on Data Quality

August 6, 2024

Dimensions of data quality:

  • Timeliness
    • data is ready within a certain time frame
  • Validity
    • data values conform to an accepted format
  • Completeness
    • data is fully populated in attributes and records
  • Consistency
    • data is aligned across systems and sources
  • Accuracy
    • data values are aligned with a source of truth
  • Uniqueness
    • data is free of duplicate values
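
The six dimensions above can each be expressed as a simple boolean check. The following is a minimal sketch in plain Python, not anything shown in the session: the row layout (`id`, `email`, `country`, `loaded_at`), the 24-hour freshness window, the email pattern, and the function name `check_dimensions` are all assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical, deliberately loose email pattern used only for this sketch.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_dimensions(rows, other_system_ids, source_of_truth, now,
                     max_age=timedelta(hours=24)):
    """Return one pass/fail flag per data quality dimension.

    rows             -- list of dicts with id, email, country, loaded_at
    other_system_ids -- set of ids held by a second system (consistency)
    source_of_truth  -- dict mapping id -> expected country (accuracy)
    """
    ids = [r["id"] for r in rows]
    return {
        # Timeliness: every row was loaded within the allowed window.
        "timeliness": all(now - r["loaded_at"] <= max_age for r in rows),
        # Validity: populated emails conform to the accepted format.
        "validity": all(
            EMAIL_RE.match(r["email"]) is not None
            for r in rows
            if r["email"] is not None
        ),
        # Completeness: no attribute is missing in any record.
        "completeness": all(v is not None for r in rows for v in r.values()),
        # Consistency: this dataset and the other system hold the same ids.
        "consistency": set(ids) == other_system_ids,
        # Accuracy: values agree with the source of truth.
        "accuracy": all(source_of_truth.get(r["id"]) == r["country"] for r in rows),
        # Uniqueness: no duplicate ids.
        "uniqueness": len(ids) == len(set(ids)),
    }
```

Returning one flag per dimension, rather than a single pass/fail, keeps it clear which kind of problem a dataset has, since each dimension usually has a different owner and a different fix.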

Data validation tools:

Common challenges:

  • Managing data quality across distributed teams
  • Retroactively enforcing standards and dealing with legacy systems
  • Upfront developer cost of following data quality best practices
  • Establishing ownership of data

Notes:

  • validation should occur at all stages of the data lifecycle (orchestration is a natural home for this)
  • platform owners and governance teams should establish frameworks that promote data quality enforcement and validation
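
To illustrate validating at every stage rather than only at the end, here is a minimal sketch in plain Python of a pipeline where each step runs its own checks before handing data to the next. It is not Dagster code: the helper `run_stage`, the `ValidationError` class, the stage functions, and the sample rows are all hypothetical. In an orchestrator, checks like these would typically be attached to each step by the framework instead of being wired by hand.

```python
from typing import Callable

Rows = list[dict]


class ValidationError(Exception):
    """Raised when a stage's output fails one or more checks."""


def run_stage(name: str, transform: Callable[[Rows], Rows],
              checks: list[Callable[[Rows], bool]], rows: Rows) -> Rows:
    """Run one stage, then validate its output before passing it on."""
    out = transform(rows)
    failed = [check.__name__ for check in checks if not check(out)]
    if failed:
        raise ValidationError(f"{name}: failed checks {failed}")
    return out


# Reusable checks that a platform team could share across pipelines.
def non_empty(rows: Rows) -> bool:
    return len(rows) > 0


def ids_unique(rows: Rows) -> bool:
    ids = [r["id"] for r in rows]
    return len(ids) == len(set(ids))


def amounts_non_negative(rows: Rows) -> bool:
    return all(r["amount"] >= 0 for r in rows)


# Hypothetical stages: ingest raw records, then clean them.
def ingest(_: Rows) -> Rows:
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]


def clean(rows: Rows) -> Rows:
    return [{**r, "amount": float(r["amount"])} for r in rows]


def run_pipeline() -> Rows:
    rows = run_stage("ingest", ingest, [non_empty, ids_unique], [])
    rows = run_stage("clean", clean, [non_empty, amounts_non_negative], rows)
    return rows
```

Because a failing check stops the pipeline at the stage that produced the bad data, problems surface close to their cause instead of in a downstream report.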