Dagster and Databricks integration
January 29, 2024
Dagster / Databricks integration:
dagster-databricks
- https://docs.dagster.io/integrations/databricks
- https://docs.dagster.io/_apidocs/libraries/dagster-databricks
- ops
  - launch existing job
  - launch one-time run of a set of tasks
- step launcher
  - resource for running ops as a Databricks job
  - op executed on Databricks, pipeline code zipped and copied to DBFS
- pipes (see below)
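Rough sketch of the difference between the two op styles, assuming they map onto the Databricks Jobs API 2.1 endpoints `run-now` (existing job) and `runs/submit` (one-time run of a set of tasks). Stdlib only; the requests are built but not sent. The helper names, host, token, job ID, notebook path and cluster ID are all placeholders, not values from these notes.

```python
import json
import urllib.request

API_PREFIX = "/api/2.1/jobs"  # assumption: Jobs API version 2.1


def _post(host, token, path, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_now_request(host, token, job_id):
    """Launch an existing job: only the job ID is needed."""
    return _post(host, token, f"{API_PREFIX}/run-now", {"job_id": job_id})


def submit_run_request(host, token, run_name, tasks):
    """Launch a one-time run: the task definitions travel with the request."""
    payload = {"run_name": run_name, "tasks": tasks}
    return _post(host, token, f"{API_PREFIX}/runs/submit", payload)


# Placeholder values for illustration only.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-placeholder"

existing = run_now_request(HOST, TOKEN, job_id=123)
one_time = submit_run_request(
    HOST,
    TOKEN,
    run_name="example one-time run",
    tasks=[
        {
            "task_key": "example_task",
            "existing_cluster_id": "0000-000000-example",
            "notebook_task": {"notebook_path": "/Users/someone@example.com/example"},
        }
    ],
)
```

Passing either request to `urllib.request.urlopen` would actually send it. In practice the dagster-databricks ops or the Databricks Python SDK handle this; the point here is only that "existing job" needs a job ID, while "one-time run" carries the task definitions itself.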
dagster-pipes
- experimental toolkit for building integrations with external execution environments (allows you to stream logs / events etc)
- requires dagster-pipes to be included on Databricks environment, and for jobs to be executed via Dagster
- https://docs.dagster.io/guides/dagster-pipes
- https://docs.dagster.io/guides/dagster-pipes/databricks
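Minimal sketch of the Databricks-side script, based on the Pipes guides linked above. It assumes dagster-pipes is installed on the cluster and that the job was launched by Dagster through its Pipes client; otherwise no context is available to load. `build_metadata`, `main` and the row count are hypothetical, for illustration only.

```python
def build_metadata(row_count):
    """Hypothetical helper: wrap a value in the metadata shape Pipes reports back."""
    return {"row_count": {"raw_value": row_count, "type": "int"}}


def main():
    # Imported here so the module still loads where dagster-pipes is missing.
    from dagster_pipes import (
        PipesDbfsContextLoader,
        PipesDbfsMessageWriter,
        open_dagster_pipes,
    )

    # Context is read from, and messages written to, DBFS.
    with open_dagster_pipes(
        context_loader=PipesDbfsContextLoader(),
        message_writer=PipesDbfsMessageWriter(),
    ) as pipes:
        pipes.log.info("starting work on Databricks")  # streamed back to Dagster
        row_count = 42  # placeholder for the result of the real computation
        pipes.report_asset_materialization(metadata=build_metadata(row_count))


# On Databricks, the script entry point would call main().
```

The logs and the materialization event show up in the Dagster run, which is the "stream logs / events" part noted above.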
Databricks:
- https://docs.databricks.com/en/workflows/jobs/jobs-quickstart.html
- https://docs.databricks.com/en/dev-tools/sdk-python.html
- https://learn.microsoft.com/en-us/training/paths/data-engineer-azure-databricks/
- https://georgheiler.com/2023/12/11/dagster-dbt-duckdb-as-new-local-mds/
Job execution:
Dagster example repos: