Orchestrating data workflows with precision and observability
Building robust, observable, and testable data pipelines is no longer a luxury; it is a necessity. At the Databricks Meetup Belgium, Flavien Hancart introduced Dagster, an open-source orchestrator, and demonstrated how it integrates with tools such as Azure Data Factory, Databricks, and dbt.
- What is Dagster?
Dagster is a data orchestration platform designed for modern data architectures. Unlike traditional orchestrators, Dagster offers data-aware orchestration: it doesn’t just execute tasks, it understands the structure and context of the data being manipulated.
With Dagster, pipelines are written in Python in a modular and declarative manner. Workflows, called “jobs,” consist of tasks (called “ops” in Dagster) that are easily testable, versionable, and observable. This approach encourages a clear separation of responsibilities, which makes pipelines easier to maintain and scale.
- What makes Dagster stand out?
Dagster offers a modern and efficient developer experience, including:
A web interface (Dagster UI) for viewing workflows in real time.
Built-in testing capabilities, allowing you to validate transformations before they go live.
Monitoring and alerts, so you are immediately notified if a problem arises.
Software-defined assets, ensuring data traceability and dependency management.
Flavien Hancart’s demonstration highlighted how Dagster brings transparency and reliability to orchestration.
- Integration with Azure
Dagster can be positioned as a central orchestration layer, connecting services such as Azure Data Factory (ADF) for ingestion, Databricks for transformations, and Power BI for visualization.
Azure Data Factory (ADF) pipelines can be triggered and monitored from Dagster jobs, which centralizes logs and management. This requires custom code that calls the Azure APIs.
Databricks notebooks and workflows can be orchestrated directly in Dagster pipelines thanks to ready-to-use integrations.
It is even possible to trigger Power BI refreshes following data validation steps.
This interconnection enables the construction of end-to-end data pipelines that are testable and fully observable across the entire Azure stack, while maintaining the flexibility required by data teams.
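As a rough illustration of the custom code that triggering ADF and Power BI might involve, the sketch below builds the REST requests for an ADF pipeline run and a Power BI dataset refresh using only the standard library. It is not the code shown at the meetup. The helper names and all identifiers passed to them are hypothetical, token acquisition is left out, and the requests are built but not sent.

```python
import json
import urllib.request

ADF_API_VERSION = "2018-06-01"


def _post_request(url, token, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def adf_create_run_request(subscription_id, resource_group, factory,
                           pipeline, token, parameters=None):
    """Request that starts a run of an Azure Data Factory pipeline."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory}"
        f"/pipelines/{pipeline}/createRun?api-version={ADF_API_VERSION}"
    )
    return _post_request(url, token, parameters or {})


def powerbi_refresh_request(group_id, dataset_id, token):
    """Request that triggers a refresh of a Power BI dataset in a workspace."""
    url = (
        "https://api.powerbi.com/v1.0/myorg"
        f"/groups/{group_id}/datasets/{dataset_id}/refreshes"
    )
    return _post_request(url, token, {})


if __name__ == "__main__":
    # Placeholder values only; sending would use urllib.request.urlopen(req).
    req = adf_create_run_request("sub-id", "my-rg", "my-factory",
                                 "ingest_sales", token="<token>")
    print(req.get_method(), req.full_url)
```

Wrapping helpers like these in Dagster ops or assets would let a refresh run only after the upstream validation step succeeds.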
- Simplified testing and notifications
With Dagster, testing is not an afterthought. Unit and integration tests can be defined for each transformation and run automatically during deployments. In addition, with native support for notifications through Slack, email, or custom webhooks, teams stay informed at every stage, whether a pipeline succeeds or fails.
- Conclusion
Dagster is a modern orchestrator, designed for the needs of today’s data teams.