Changelog#

1.6.9 (core) / 0.22.9 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0.
  • [embedded-elt] @sling_assets now accepts an optional name parameter for the underlying op.
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across page loads, which should greatly improve search performance for very large orgs.
  • [ui] Groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes#

  • Fixed an issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input/output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks @christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)!
  • Allow github actions to checkout branch from forked repo for docs changes (ci fix) (thanks @hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved the guides and reference for running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts.
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
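
As a rough illustration of the DAGSTER_IS_DEV_CLI entry above, a subprocess could branch on the variable as sketched below. The changelog does not say what value the variable is set to, so this sketch only checks for its presence; launched_from_dagster_dev is a hypothetical helper name, not part of Dagster.

```python
import os


def launched_from_dagster_dev(environ=None):
    """Return True when DAGSTER_IS_DEV_CLI is present in the environment.

    Hypothetical helper: only the variable name comes from the changelog.
    Its value is not documented there, so presence is tested, not content.
    """
    env = os.environ if environ is None else environ
    return "DAGSTER_IS_DEV_CLI" in env


if __name__ == "__main__":
    mode = "development" if launched_from_dagster_dev() else "non-development"
    print(f"Running in a {mode} context")
```

Passing a mapping explicitly (for example launched_from_dagster_dev({})) keeps the check easy to test without touching the real environment.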

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] The new asset decorator @sling_assets and resource SlingConnectionResource have been added to the dagster-embedded-elt.sling package. build_sling_asset, SlingSourceConnection, and SlingTargetConnection are now deprecated.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value of the “dagster/data_time” metadata is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
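
The “Overdue” rule for observable source assets can be pictured with a small sketch. It assumes a policy expressed only as a maximum lag in minutes and ignores cron-based policies; is_overdue is a hypothetical helper, not Dagster's implementation.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(data_time, now, maximum_lag_minutes):
    """Return True when data_time is older than the allowed lag.

    Simplified model (an assumption): the policy is just a maximum lag,
    and data_time stands for the latest "dagster/data_time" value.
    """
    return now - data_time > timedelta(minutes=maximum_lag_minutes)


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 3, 1, 11, 30, tzinfo=timezone.utc)
stale = datetime(2024, 3, 1, 10, 0, tzinfo=timezone.utc)

print(is_overdue(fresh, now, maximum_lag_minutes=60))  # False
print(is_overdue(stale, now, maximum_lag_minutes=60))  # True
```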

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while it was being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
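
To make the lexicographical ordering of partition keys concrete, here is a minimal sketch that assumes plain string comparison, which is what Python's sorted applies to strings. Whether Dagster uses exactly this comparison is not stated here, and submission_order is a hypothetical name.

```python
def submission_order(partition_keys):
    """Order partition keys as plain strings (assumed comparison rule)."""
    return sorted(partition_keys)


# ISO-formatted dates sort chronologically under string comparison.
date_keys = submission_order(["2024-01-10", "2024-01-02", "2024-01-09"])

# Numeric-looking keys do not sort numerically: "10" comes before "2".
numeric_keys = submission_order(["9", "10", "2"])

print(date_keys)     # ['2024-01-02', '2024-01-09', '2024-01-10']
print(numeric_keys)  # ['10', '2', '9']
```

Zero-padding numeric partition keys (for example "02", "09", "10") is one way to make string order and numeric order agree.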

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Added specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O managers versus resources more prominent.

0.12.11#

Community Contributions#

  • [helm] The ingress now supports TLS (thanks @cpmoser!)
  • [helm] Fixed an issue where dagit could not be configured with an empty workspace (thanks @yamrzou!)

New#

  • [dagstermill] You can now control the IO of output notebooks more precisely by specifying output_notebook_name in define_dagstermill_solid and providing your own IO manager via the "output_notebook_io_manager" resource key.

  • We've deprecated the output_notebook argument in define_dagstermill_solid in favor of output_notebook_name.

  • Previously, the output notebook functionality required the "file_manager" resource and resulted in a FileHandle output. Now, when output_notebook_name is specified, it requires the "output_notebook_io_manager" resource and results in a bytes output.

  • You can now customize your own "output_notebook_io_manager" by extending OutputNotebookIOManager. A built-in local_output_notebook_io_manager is provided for handling local output notebook materialization.

  • See the detailed migration guide in https://github.com/dagster-io/dagster/pull/4490.

  • Dagit fonts have been updated.

Bugfixes#

  • Fixed a bug where log messages of the form context.log.info("foo %s", "bar") would not get formatted as expected.
  • Fixed a bug that caused the QueuedRunCoordinator’s tag_concurrency_limits to not be respected in some cases.
  • When loading a Run with a large volume of logs in Dagit, a loading state is shown while logs are retrieved, clarifying the loading experience and improving render performance of the Gantt chart.
  • Using solid selection with pipelines containing dynamic outputs no longer causes unexpected errors.

Experimental#

  • You can now set tags on a graph by passing a dictionary to the tags argument of the @graph decorator or GraphDefinition constructor. These tags will be set on any runs of jobs that are built by invoking to_job on the graph.
  • You can now set separate images per solid when using the k8s_job_executor or celery_k8s_job_executor. Use the key image inside the container_config block of the k8s solid tag.
  • You can now target multiple jobs with a single sensor, by using the jobs argument. Each RunRequest emitted from a multi-job sensor’s evaluation function must specify a job_name.

0.12.10#

Community Contributions#

  • [helm] The KubernetesRunLauncher image pull policy is now configurable in a separate field (thanks @yamrzou!).
  • The dagster-github package is now usable for GitHub Enterprise users (thanks @metinsenturk!). A hostname can now be provided via config to the dagster-github resource with the key github_hostname:

import os
from dagster import execute_pipeline

# github_pipeline: a pipeline that uses the "github" resource (defined elsewhere).
execute_pipeline(
    github_pipeline,
    {"resources": {"github": {"config": {
        "github_app_id": os.getenv("GITHUB_APP_ID"),
        "github_app_private_rsa_key": os.getenv("GITHUB_PRIVATE_KEY"),
        "github_installation_id": os.getenv("GITHUB_INSTALLATION_ID"),
        "github_hostname": os.getenv("GITHUB_HOSTNAME"),
    }}}},
)

New#

  • Added a database index over the event log to improve the performance of pipeline_failure_sensor and run_status_sensor queries. To take advantage of these performance gains, run a schema migration with the CLI command: dagster instance migrate.

Bugfixes#

  • Performance improvements have been made to allow dagit to more gracefully load a run that has a large number of events.
  • Fixed an issue where DockerRunLauncher would raise an exception when no networks were specified in its configuration.

Breaking Changes#

  • dagster-slack has migrated off of the deprecated slackclient and now uses [slack_sdk](https://slack.dev/python-slack-sdk/v3-migration/).

Experimental#

  • OpDefinition, the replacement for SolidDefinition and the type produced by the @op decorator, is now part of the public API.
  • The daily_partitioned_config, hourly_partitioned_config, weekly_partitioned_config, and monthly_partitioned_config now accept an end_offset parameter, which allows extending the set of partitions so that the last partition ends after the current time.
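
The effect of end_offset can be sketched with a simplified model of daily partitions. This is not Dagster's implementation: daily_partition_starts is a hypothetical helper, and it assumes that with end_offset=0 the last partition is the most recent one that has fully ended.

```python
from datetime import datetime, timedelta


def daily_partition_starts(start, now, end_offset=0):
    """Start times of daily partitions, as a simplified model.

    Hypothetical helper: with end_offset=0 the last partition returned has
    already ended at `now`; each extra unit of end_offset appends one more
    partition after that.
    """
    completed = (now - start) // timedelta(days=1)  # whole days elapsed
    count = max(completed + end_offset, 0)
    return [start + timedelta(days=i) for i in range(count)]


start = datetime(2021, 8, 1)
now = datetime(2021, 8, 3, 12, 0)

default = daily_partition_starts(start, now)
extended = daily_partition_starts(start, now, end_offset=1)

print(default[-1].date())   # 2021-08-02
print(extended[-1].date())  # 2021-08-03
```

With end_offset=1, the final partition (2021-08-03) ends after the current time, which matches the behavior the entry describes.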

0.12.9#

Community Contributions#

  • A service account can now be specified via Kubernetes tag configuration (thanks @skirino)!

New#

  • Previously in Dagit, when a repository location had an error when reloaded, the user could end up on an empty page with no context about the error. Now, we immediately show a dialog with the error and stack trace, with a button to try reloading the location again when the error is fixed.

  • Dagster is now compatible with Python’s logging module. In your config YAML file, you can configure log handlers and formatters that apply to the entire Dagster instance. Configuration instructions and examples are detailed in the docs: https://docs.dagster.io/concepts/logging/python-logging

  • [helm] The timeout of database statements sent to the Dagster instance can now be configured using .dagit.dbStatementTimeout.

  • The QueuedRunCoordinator now supports setting separate limits for each unique value with a certain key. In the below example, 5 runs with the tag (backfill: first) could run concurrently with 5 other runs with the tag (backfill: second).

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: backfill
        value:
          applyLimitPerUniqueValue: True
        limit: 5
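
One way to read the per-value limit configured above is as a separate counter for each distinct value of the backfill tag. The sketch below models that reading only; can_launch is a hypothetical function, not the coordinator's actual dequeuing logic.

```python
from collections import Counter


def can_launch(running_tag_values, candidate_value, limit=5):
    """Return True if one more run with candidate_value fits under the limit.

    Hypothetical model of applyLimitPerUniqueValue: the limit is applied to
    each unique tag value independently rather than to the tag key overall.
    """
    in_flight = Counter(running_tag_values)
    return in_flight[candidate_value] < limit


# Values of the "backfill" tag on currently running runs.
running = ["first"] * 5 + ["second"] * 4

print(can_launch(running, "first"))   # False: "first" is already at 5
print(can_launch(running, "second"))  # True: "second" has one slot left
print(can_launch(running, "third"))   # True: no runs with this value yet
```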

Bugfixes#

  • Previously, when specifying hooks on a pipeline, resource-to-resource dependencies on those hooks would not be resolved. This is now fixed, so resources with dependencies on other resources can be used with hooks.
  • When viewing a run in Dagit, the run status panel to the right of the Gantt chart did not always allow scrolling behavior. The entire panel is now scrollable, and sections of the panel are collapsible.
  • Previously, attempting to directly invoke a solid with Nothing inputs would fail. Now, the defined behavior is that Nothing inputs should not be provided to an invocation, and the invocation will not error.
  • Skip and fan-in behavior during execution now works correctly when solids with dynamic outputs are skipped. Previously solids downstream of a dynamic output would never execute.
  • [helm] Fixed an issue where the image tag wasn’t set when running an instance migration job via .migrate.enabled=True.

0.12.8#

New#

  • Added instance on RunStatusSensorContext for accessing the Dagster Instance from within the run status sensors.

  • The inputs of a Dagstermill solid are now loaded the same way all other inputs are loaded in the framework. This allows rerunning output notebooks with properly loaded inputs outside of a Dagster context. Previously, the IO handling depended on a temporary marshal directory.

  • Previously, the Dagit CLI could not target a bare graph in a file, like so:

    from dagster import op, graph
    
    @op
    def my_op():
        pass
    
    @graph
    def my_graph():
        my_op()
    

    This has been remedied. Now, a file foo.py containing just a graph can be targeted by the dagit CLI: dagit -f foo.py.

  • When a solid, pipeline, schedule, etc. description or event metadata entry contains a markdown-formatted table, that table is now rendered in Dagit with better spacing between elements.

  • The hacker-news example now includes instructions on how to deploy the repository in a Kubernetes cluster using the Dagster Helm chart.

  • [dagster-dbt] The dbt_cli_resource now supports the dbt source snapshot-freshness command (thanks @emilyhawkins-drizly!)

  • [helm] Labels are now configurable on user code deployments.

Bugfixes

  • Dagit’s dependency on graphql-ws is now pinned to < 0.4.0 to avoid a breaking change in its latest release. We expect to remove this dependency entirely in a future Dagster release.
  • Execution steps downstream of a solid that emits multiple dynamic outputs now correctly resolve without error.
  • In Dagit, when repositories are loaded asynchronously, pipelines/jobs now appear immediately in the left navigation.
  • Pipeline/job descriptions with markdown are now rendered correctly in Dagit, and styling is improved for markdown-based tables.
  • The Dagit favicon now updates correctly during navigation to and from Run pages.
  • In Dagit, navigating to assets with keys that contain slashes would sometimes fail due to a lack of URL encoding. This has been fixed.
  • When viewing the Runs list on a smaller viewport, tooltips on run tags no longer flash.
  • Dragging the split panel view in the Solid/Op explorer in Dagit would sometimes leave a broken rendered state. This has been fixed.
  • Dagstermill notebook previews now work with remote user code deployments.
  • [dagster-shell] When a pipeline run fails, subprocesses spawned from dagster-shell utilities will now be properly terminated.
  • Fixed an issue associated with using EventMetadata.asset and EventMetadata.pipeline_run in AssetMaterialization metadata. (Thanks @ymrzkrrs and @drewsonne!)
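
The asset-key navigation fix above concerns slashes that were not URL-encoded. As a generic illustration using Python's standard library (Dagit's actual encoding scheme is not described here, and encode_path_segment is a hypothetical helper), a key component containing a slash can be percent-encoded so it survives as a single path segment:

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment):
    """Percent-encode a string so it is safe as one URL path segment.

    safe="" makes quote() encode "/" as well, which it leaves alone by default.
    """
    return quote(segment, safe="")


encoded = encode_path_segment("raw/events")
print(encoded)           # raw%2Fevents
print(unquote(encoded))  # raw/events
```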

Breaking Changes

  • Dagstermill solids now require a shared-memory io manager, e.g. fs_io_manager, which allows data to be passed out of the Jupyter process boundary.

Community Contributions

  • [helm] Added missing documentation to fields in the Dagster User Deployments subchart (thanks @jrouly!)

0.12.7#

New#

  • In Dagit, the repository locations list has been moved from the Instance Status page to the Workspace page. When repository location errors are present, a warning icon will appear next to “Workspace” in the left navigation.
  • Calls to context.log.info() and other similar functions now fully respect the Python logging API. Concretely, log statements of the form context.log.error("something %s happened!", "bad") will now work as expected, and you are allowed to add things to the "extra" field to be consumed by downstream loggers: context.log.info("foo", extra={"some":"metadata"}).
  • Utility functions config_from_files, config_from_pkg_resources, and config_from_yaml_strings have been added for constructing run config from yaml files and strings.
  • DockerRunLauncher can now be configured to launch runs that are connected to more than one network, by configuring the networks key.
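
The two logging behaviors mentioned above (%-style arguments and the extra field) come from Python's standard logging API. The sketch below demonstrates them with a plain logging.Logger standing in for context.log; the logger name and ListHandler class are made up for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("changelog_example")  # stand-in for context.log
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Positional arguments are interpolated into the message lazily.
logger.error("something %s happened!", "bad")
# Keys in `extra` become attributes on the record for downstream handlers.
logger.info("foo", extra={"some": "metadata"})

first, second = handler.records
print(first.getMessage())  # something bad happened!
print(second.some)         # metadata
```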

Bugfixes#

  • Fixed an issue with the pipeline and solid Kubernetes configuration tags. env_from and volume_mounts are now properly applied to the corresponding Kubernetes run worker and job pods.
  • Fixed an issue where Dagit sometimes couldn’t start up when using MySQL storage.
  • [dagster-mlflow] The end_mlflow_run_on_pipeline_finished hook no longer errors when invoked.

Breaking Changes#

  • Non-standard keyword arguments to context.log calls are now not allowed. context.log.info("msg", foo="hi") should be rewritten as context.log.info("msg", extra={"foo":"hi"}).
  • [dagstermill] When writing the output notebook fails (e.g. because no file manager is provided), no AssetMaterialization is yielded. Previously, an AssetMaterialization would still be yielded with a temp file path that would not exist after the notebook execution.

Experimental#

  • Previously, in order to use memoization, it was necessary to provide a resource version for every resource used in a pipeline. Now, resource versions are optional, and memoization can be used without providing them.
  • InputContext and OutputContext now each have an asset_key that returns the asset key that was provided to the corresponding InputDefinition or OutputDefinition.

Documentation#

  • The Spark documentation now discusses all the ways of using Dagster with Spark, not just using PySpark.