[ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that spans more than one day.
Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
[dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
[embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
[dagster-openai] The dagster-openai library is now available.
[dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
[ui] Search now uses a cache that persists across page loads, which should greatly improve search performance for very large orgs.
[ui] Groups and code locations in the asset graph’s sidebar are now sorted alphabetically.
Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
[auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
[asset checks] Fixed a bug with asset checks in step launchers.
[embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
[dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
[ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
[ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.
[asset checks] UI performance of asset checks related pages has been improved.
[dagster-dbt] The DbtArtifacts class has been added to manage rebuilding the manifest during development while expecting a pre-built manifest in production.
Microsoft Teams is now supported for alerts.
A “Send sample alert” button now exists on both the alert policies page and in the alert policies editor, making it easier to debug and configure alerts without waiting for an event to trigger them.
Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
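A subprocess could use this variable to adjust its behavior during development. The sketch below relies only on what the entry states: dagster dev sets DAGSTER_IS_DEV_CLI. The entry does not document the variable's value, so the hypothetical helper launched_by_dagster_dev checks whether the variable is present rather than testing for a particular value.

```python
import os
from typing import Mapping, Optional


def launched_by_dagster_dev(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if DAGSTER_IS_DEV_CLI is present in the environment.

    Hypothetical helper: only presence is checked, because the value that
    `dagster dev` assigns to the variable is not specified here.
    """
    env = os.environ if environ is None else environ
    return "DAGSTER_IS_DEV_CLI" in env
```

Passing an explicit mapping makes the helper easy to test. With no argument it reads the real process environment.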
[ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
[ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.
The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
[dagster-embedded-elt] A new asset decorator, @sling_assets, and a new resource, SlingConnectionResource, have been added to the dagster-embedded-elt.sling package. The build_sling_asset, SlingSourceConnection, and SlingTargetConnection APIs have been deprecated.
Added support for op-concurrency-aware run dequeuing to the QueuedRunCoordinator.
dagster-polars has been added as an integration. Thanks @danielgafni!
[dagster-dbt] @dbt_assets now supports loading projects with semantic models.
[dagster-dbt] @dbt_assets now supports loading projects with model versions.
[dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
[dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
[UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.
Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
[ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
[ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
[dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!
Observable source assets can now yield ObserveResults with no data_version.
You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
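The “Overdue” condition can be modelled as a comparison between the current time and the latest dagster/data_time value. This minimal sketch makes three assumptions: the policy is defined only by FreshnessPolicy's maximum_lag_minutes, any cron_schedule is ignored, and a lag exactly equal to the limit still counts as fresh. The helper is_overdue is hypothetical, not a Dagster API.

```python
from datetime import datetime, timedelta


def is_overdue(data_time: datetime, maximum_lag_minutes: float, now: datetime) -> bool:
    """Hypothetical model of the "Overdue" check for an observable source asset.

    data_time: the latest "dagster/data_time" metadata value for the asset.
    maximum_lag_minutes: the allowed lag, mirroring FreshnessPolicy's parameter.
    A lag exactly equal to the limit is treated as fresh (an assumption).
    """
    return now - data_time > timedelta(minutes=maximum_lag_minutes)
```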
[ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
[kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while it was being terminated.
Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
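Lexicographic order is plain string order, which matches chronological or numeric order only when keys are zero-padded. The small illustration below uses submission_order, a hypothetical stand-in for the ordering step rather than Dagster's own code.

```python
from typing import Iterable, List


def submission_order(partition_keys: Iterable[str]) -> List[str]:
    """Order partition keys lexicographically (plain string comparison)."""
    return sorted(partition_keys)
```

Date keys such as 2024-01-02 and 2024-01-10 come out in chronological order. Unpadded numeric keys do not: "10" sorts before "2".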
[dagster-k8s] Run worker failure messages now include k8s pod debug info.
[dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
[dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
[instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
[asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
[dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.
@observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
[auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
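The rule's condition can be sketched as a check over parent update times. This is a hypothetical model, not Dagster's implementation, and it makes three simplifications. First, should_skip takes the latest cron tick as an input rather than computing it from a cron string. Second, an update at exactly the tick time counts as an update. Third, a parent that has never been updated counts as not updated.

```python
from datetime import datetime
from typing import Mapping, Optional


def should_skip(
    parent_last_updated: Mapping[str, Optional[datetime]],
    latest_cron_tick: datetime,
) -> bool:
    """Hypothetical model: skip unless every parent updated since the latest tick.

    parent_last_updated maps each parent's name to its latest update time,
    or None if it has never been updated (which also causes a skip).
    """
    return not all(
        updated is not None and updated >= latest_cron_tick
        for updated in parent_last_updated.values()
    )
```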
[Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
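One way to picture priority-aware slot claiming is a priority queue. The sketch below is a hypothetical model; SlotClaimQueue is not a Dagster class. It assumes that, as with queued-run priority, a larger number means higher priority, and it breaks ties by arrival order.

```python
import heapq
import itertools
from typing import List, Tuple


class SlotClaimQueue:
    """Hypothetical model: grant concurrency slots by run priority.

    Higher priority values are served first; claims with equal priority
    are served in the order they were requested.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def request(self, step_key: str, run_priority: int = 0) -> None:
        # heapq is a min-heap, so negate the priority to pop the largest first;
        # the arrival counter breaks ties in first-come, first-served order.
        heapq.heappush(self._heap, (-run_priority, next(self._arrival), step_key))

    def grant(self, free_slots: int) -> List[str]:
        # Hand out at most free_slots claims, highest priority first.
        granted: List[str] = []
        while self._heap and len(granted) < free_slots:
            _, _, step_key = heapq.heappop(self._heap)
            granted.append(step_key)
        return granted
```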
RepositoryDefinition now takes schedule_defs and partition_set_defs directly. The loading
scheme for these definitions via repository.yaml under the scheduler: and partitions: keys
is deprecated and expected to be removed in 0.8.0.
Published modules are now marked as Python 3.8 compatible.
The dagster-airflow package supports loading all Airflow DAGs within a directory path, file path,
or Airflow DagBag.
The dagster-airflow package supports loading all 23 DAGs in Airflow example_dags folder and
execution of 17 of them (see: make_dagster_repo_from_airflow_example_dags).
The dagster-celery CLI tools now allow you to pass additional arguments through to the underlying
celery CLI, e.g., running dagster-celery worker start -n my-worker -- --uid=42 will pass the
--uid flag to celery.
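The -- separator follows a common CLI convention: everything after it is forwarded untouched. The sketch below shows that split in Python. split_passthrough_args is a hypothetical helper, not dagster-celery's actual parsing code.

```python
from typing import List, Sequence, Tuple


def split_passthrough_args(argv: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split argv at the first "--" into (own_args, passthrough_args).

    Hypothetical helper illustrating the convention only.
    """
    args = list(argv)
    if "--" in args:
        idx = args.index("--")
        return args[:idx], args[idx + 1:]
    return args, []
```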
It is now possible to create a PresetDefinition that has no environment defined.
Added dagster schedule debug command to help debug scheduler state.
The SystemCronScheduler now verifies that a cron job has been successfully added to the
crontab when turning a schedule on, and shows an error message if it was not.
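A verification step like this can be sketched as reading the crontab back and looking for the expected line. The helpers below are hypothetical and are not the SystemCronScheduler's code. read_crontab shells out to crontab -l. cron_job_installed does an exact line match, which is an assumption about how such a check might compare entries.

```python
import subprocess


def read_crontab() -> str:
    """Return the current user's crontab text, or "" if none is available."""
    try:
        result = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    except FileNotFoundError:
        # The crontab binary is not installed on this machine.
        return ""
    # `crontab -l` exits non-zero when the user has no crontab.
    return result.stdout if result.returncode == 0 else ""


def cron_job_installed(crontab_text: str, job_line: str) -> bool:
    """Check whether job_line appears as a whole line in the crontab text."""
    return any(line.strip() == job_line.strip() for line in crontab_text.splitlines())
```

A caller could then report an error when cron_job_installed(read_crontab(), expected_line) returns False after installing the job.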
Breaking Changes
Running dagster instance migrate is required for this release to support the new experimental
assets view.
Runs created prior to 0.7.8 will no longer render their execution plans as DAGs. We are only
rendering execution plans that have been persisted. Logs are still available.
Path is no longer valid in config schemas. Use str or dagster.String instead.
Removed the @pyspark_solid decorator; its functionality, which was experimental, is subsumed by
requiring a StepLauncher resource (e.g. emr_pyspark_step_launcher) on the solid.
Dagit
Merged "re-execute", "single-step re-execute", "resume/retry" buttons into one "re-execute" button
with three dropdown selections on the Run page.
Experimental
Added a new asset_key string parameter to Materializations and created a new “Assets” tab in Dagit
to view pipelines and runs associated with these keys. The API and UI of these asset-based features
are likely to change, but feedback is welcome and will be used to inform these changes.
Added an emr_pyspark_step_launcher that enables launching PySpark solids in EMR. The
"simple_pyspark" example demonstrates how it’s used.
Bugfix
Fixed an issue when running Jupyter notebooks in a Python 2 kernel through dagstermill with
Dagster running in Python 3.
Improved error messages produced when dagstermill spins up an in-notebook context.
Fixed an issue with retrieving step events from CompositeSolidResult objects.
If you are launching runs using DagsterInstance.launch_run, this method now takes a run id
instead of an instance of PipelineRun. Additionally, DagsterInstance.create_run and
DagsterInstance.create_empty_run have been replaced by DagsterInstance.get_or_create_run and
DagsterInstance.create_run_for_pipeline.
If you have implemented your own RunLauncher, there are two required changes:
RunLauncher.launch_run takes a pipeline run that has already been created. You should remove
any calls to instance.create_run in this method.
Instead of calling startPipelineExecution (defined in the
dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION) in the run launcher, you
should call startPipelineExecutionForCreatedRun (defined in
dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION).
Refer to the RemoteDagitRunLauncher for an example implementation.
New
Improved preset and solid subselection in the Playground: subselection now shows an inline preview
of the pipeline instead of a modal, and selecting a preset now chooses the correct subselection.
Improved log searching, with tokenization and autocompletion for searching by message type and by
specific step.
You can now view the structure of pipelines from historical runs, even if that pipeline no longer
exists in the loaded repository or has changed structure.
Historical execution plans are now viewable, even if the pipeline has changed structure.
Added a metadata link to raw compute logs for all StepStart events in the PipelineRun and Step
views.
Improved error handling for the scheduler. If a scheduled run has config errors, the errors are
persisted to the event log for the run and can be viewed in Dagit.
Bugfix
dagster-postgres no longer manually disposes of the SQLAlchemy engine
Made boto3 dependency in dagster-aws more flexible (#2418)
Fixed tooltip UI cleanup in partitioned schedule view
The execute_pipeline_with_mode and execute_pipeline_with_preset APIs have been dropped in
favor of new top level arguments to execute_pipeline, mode and preset.
The use of RunConfig to pass options to execute_pipeline has been deprecated, and RunConfig
will be removed in 0.8.0.
The execute_solid_within_pipeline and execute_solids_within_pipeline APIs, intended to support
tests, now take new top level arguments mode and preset.
New
The dagster-aws Redshift resource now supports providing an error callback to debug failed
queries.
We now persist serialized execution plans for historical runs. They will render correctly even if
the pipeline structure has changed or if it does not exist in the current loaded repository.
Clicking on a pipeline tag in the Runs view will apply that tag as a filter.
Bugfix
Fixed a bug where telemetry logger would create a log file (but not write any logs) even when
telemetry was disabled.
Experimental
The dagster-airflow package supports ingesting Airflow DAGs and running them as dagster pipelines
(see: make_dagster_pipeline_from_airflow_dag). This is in the early experimentation phase.
Improved the layout of the experimental partition runs table on the Schedules detailed view.
The default sqlite and dagster-postgres implementations have been altered to extract the
event step_key field as a column, to enable faster per-step queries. You will need to run
dagster instance migrate to update the schema. You may optionally migrate your historical event
log data to extract the step_key using the migrate_event_log_data function. This will ensure
that your historical event log data will be captured in future step-key based views. This
event_log data migration can be invoked as follows:
from dagster.core.storage.event_log.migration import migrate_event_log_data
from dagster import DagsterInstance
migrate_event_log_data(instance=DagsterInstance.get())
We have made pipeline metadata serializable and persist that along with run information.
While there are no user-facing features to leverage this yet, it does require an instance
migration. Run dagster instance migrate. If you have already run the migration for the
event_log changes above, you do not need to run it again. Any unforeseen errors related to the
new snapshot_id in the runs table or the new snapshots table are related to this migration.
dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and
ColumnDTypeInSetConstraint.
New
You can now specify that dagstermill output notebooks be yielded as an output from dagstermill
solids, in addition to being materialized.
You may now set the extension on files created using the FileManager machinery.
dagster-pandas typed PandasColumn constructors now support pandas 1.0 dtypes.
The Dagit Playground has been restructured to make the relationship between Presets, Partition
Sets, Modes, and subsets clearer. All of these buttons have been reconciled and moved to the
left side of the Playground.
Config sections that are required but not filled out in the Dagit playground are now detected
and labeled in orange.
dagster-celery config now supports using env: to load from environment variables.
Bugfix
Fixed a bug where selecting a preset in dagit would not populate tags specified on the pipeline
definition.
Fixed a bug where metadata attached to a raised Failure was not displayed in the error modal in
dagit.
Fixed an issue where reimporting dagstermill and calling dagstermill.get_context() outside of
the parameters cell of a dagstermill notebook could lead to unexpected behavior.
Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using
the Postgres-backed storages.
Experimental
Added a longitudinal view of runs on the Schedule tab for scheduled, partitioned pipelines.
Includes views of run status, execution time, and materializations across partitions. The UI is
in flux and is currently optimized for daily schedules, but feedback is welcome.
default_value in Field no longer accepts native instances of Python enums. Instead, the
underlying string representation in the config system must be used.
default_value in Field no longer accepts callables.
The dagster_aws imports have been reorganized; you should now import resources from
dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch
modules.
The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can
now use any boto3 S3 API directly on an S3 resource, e.g.
context.resources.s3.list_objects_v2. (#2292)
New
New Playground view in dagit showing an interactive config map
Improved storage and UI for showing schedule attempts
Added the ability to set default values in InputDefinition
Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
Added ability to specify pipeline run tags using the CLI
Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid
as context.pdb
Added PresetDefinition.with_additional_config to allow for config overrides
Added resource name to log messages generated during resource initialization
Added grouping tags for runs that have been retried or re-executed.
Bugfix
Fixed a bug where date range partitions with a specified end date were clipping the last day
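The sketch below assumes the specified end date is meant to be inclusive and generates daily partition keys accordingly. An off-by-one in the range bound is the kind of mistake that drops the final day. daily_partitions is a hypothetical helper, not the Dagster partition API.

```python
from datetime import date, timedelta
from typing import List


def daily_partitions(start: date, end: date) -> List[str]:
    """Return one ISO-formatted key per day from start through end, inclusive."""
    num_days = (end - start).days
    # range(num_days + 1) keeps the end date itself; range(num_days) would clip it.
    return [(start + timedelta(days=i)).isoformat() for i in range(num_days + 1)]
```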
Fixed an issue where some schedule attempts that failed to start would be marked running forever.
Fixed the @weekly partitioned schedule decorator
Fixed timezone inconsistencies between the runs view and the schedules view
Integers are now accepted as valid values for Float config fields
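A validator with this behavior might look like the sketch below. is_valid_float_config is hypothetical. Rejecting bool is an assumption of this sketch: Python treats bool as a subclass of int, so a naive isinstance check would otherwise accept True.

```python
def is_valid_float_config(value: object) -> bool:
    """Accept floats and ints for a Float field, rejecting bools (an assumption)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)
```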
Fixed an issue when executing dagstermill solids with config that contained quote characters.
dagstermill
The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel
flag.
dagster-dbt
dbt_solid now has a Nothing input to allow for sequencing
dagster-k8s
Added get_celery_engine_config to select the Celery engine, leveraging Celery infrastructure
Documentation
Improvements to the airline and bay bikes demos
Improvements to our dask deployment docs (Thanks jswaney!!)