Changelog#

1.6.9 (core) / 0.22.9 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] groups/code locations in the asset graph’s sidebar are now sorted alphabetically.
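
The asset-key suggestions mentioned above can be pictured as a fuzzy match of the missing key against the known keys. The following is a minimal sketch using Python's standard difflib module. It is not Dagster's implementation, and the function name suggest_asset_keys and the sample keys are hypothetical.

```python
import difflib


def suggest_asset_keys(missing_key, known_keys, limit=3):
    """Return up to `limit` known keys that closely resemble `missing_key`.

    Hypothetical helper for illustration only; Dagster's own matching
    logic may differ.
    """
    return difflib.get_close_matches(missing_key, known_keys, n=limit, cutoff=0.6)


# Sample keys are made up for this example.
known = ["orders", "customers", "order_items"]
print(suggest_asset_keys("order", known))
```

An error message could then append these candidates after a "did you mean" hint, while a key with no close match yields an empty list and no hint.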

Bugfixes#

  • Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow github actions to checkout branch from forked repo for docs changes (ci fix) (thanks hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved the guides and reference for running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts. Documentation
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
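
A subprocess could detect the development context by looking for the DAGSTER_IS_DEV_CLI variable. The sketch below checks only for the variable's presence, since the value that dagster dev assigns is not stated here. The helper name launched_from_dagster_dev is hypothetical.

```python
import os


def launched_from_dagster_dev(environ=None):
    """Return True if DAGSTER_IS_DEV_CLI is present in the environment.

    Only presence is checked, because this changelog does not state
    which value `dagster dev` assigns to the variable.
    """
    env = os.environ if environ is None else environ
    return "DAGSTER_IS_DEV_CLI" in env


if launched_from_dagster_dev():
    print("Launched from `dagster dev`: using development settings")
```

Passing a plain dictionary as environ makes the check easy to exercise in tests without modifying the real process environment.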

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] New Asset Decorator @sling_assets and Resource SlingConnectionResource have been added for the dagster-embedded-elt.sling package. Deprecated build_sling_asset, SlingSourceConnection and SlingTargetConnection.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
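
The “Overdue” condition described above amounts to comparing the age of the latest “dagster/data_time” value with the lag the policy allows. The sketch below illustrates that comparison in plain Python. It is not Dagster's evaluation logic: the is_overdue helper is hypothetical, the maximum_lag_minutes name mirrors the FreshnessPolicy parameter, and any cron-based settings are ignored.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(data_time, maximum_lag_minutes, now):
    """Return True when `data_time` is older than the allowed lag.

    Hypothetical helper; cron-based freshness settings are not modeled.
    """
    return now - data_time > timedelta(minutes=maximum_lag_minutes)


# Fixed timestamps keep the example deterministic.
now = datetime(2024, 2, 1, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=30)
stale = now - timedelta(minutes=90)

print(is_overdue(fresh, 60, now), is_overdue(stale, 60, now))
```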

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while it was being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
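
Lexicographical ordering of partition keys compares them as strings, which can differ from numeric ordering. The short sketch below uses made-up partition keys and Python's built-in sorted to show the resulting order. It only illustrates the ordering rule, not Dagster's submission code.

```python
# Made-up partition keys: some numeric-looking, some zero-padded dates.
partition_keys = ["2024-01-15", "10", "2", "2024-01-02", "1"]

# sorted() on strings compares character by character (lexicographically).
submission_order = sorted(partition_keys)
print(submission_order)
```

Because the comparison is character by character, "10" sorts before "2", while zero-padded date keys such as "2024-01-02" and "2024-01-15" still sort chronologically.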

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
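
The waiting behavior of skip_on_not_all_parents_updated_since_cron can be thought of as a check that every parent has an update newer than the latest cron tick. The sketch below expresses that check in plain Python under stated assumptions: the helper name, the sample asset names, and the strict "after" comparison are all hypothetical, and the latest tick is passed in rather than computed from a cron string.

```python
from datetime import datetime


def all_parents_updated_since(latest_tick, parent_update_times):
    """Return True if every parent has an update strictly after `latest_tick`.

    `parent_update_times` maps a parent name to its latest update time,
    or None if the parent has never been updated. Hypothetical helper.
    """
    return all(
        updated is not None and updated > latest_tick
        for updated in parent_update_times.values()
    )


tick = datetime(2024, 1, 1, 0, 0)
parents = {
    "raw_orders": datetime(2024, 1, 1, 0, 5),
    "raw_customers": datetime(2023, 12, 31, 23, 50),  # not yet refreshed
}

# The rule would skip materialization until all parents have caught up.
should_skip = not all_parents_updated_since(tick, parents)
print(should_skip)
```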

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

0.12.1#

Bugfixes#

  • Fixed implementation issues in @pipeline_failure_sensor that prevented it from working.

0.12.0 “Into The Groove”#

Major Changes#

  • With the new first-class Pipeline Failure sensors, you can now write sensors to perform arbitrary actions when pipelines in your repo fail using @pipeline_failure_sensor. Out-of-the-box sensors are provided to send emails using make_email_on_pipeline_failure_sensor and slack messages using make_slack_on_pipeline_failure_sensor.

    See the Pipeline Failure Sensor docs to learn more.

  • New first-class Asset sensors help you define sensors that launch pipeline runs or notify appropriate stakeholders when specific asset keys are materialized. This pattern also enables Dagster to infer cross-pipeline dependency links. Check out the docs here!

  • Solid-level retries: A new retry_policy argument to the @solid decorator allows you to easily and flexibly control how specific solids in your pipelines will be retried if they fail by setting a RetryPolicy.

  • Writing tests in Dagster is now even easier, using the new suite of direct invocation apis. Solids, resources, hooks, loggers, sensors, and schedules can all be invoked directly to test their behavior. For example, if you have some solid my_solid that you'd like to test on an input, you can now write assert my_solid(1, "foo") == "bar" (rather than explicitly calling execute_solid()).

  • [Experimental] A new set of experimental core APIs. Among many benefits, these changes unify concepts such as Presets and Partition sets, make it easier to reuse common resources within an environment, make it possible to construct test-specific resources outside of your pipeline definition, and more. These changes are significant and impactful, so we encourage you to try them out and let us know how they feel! You can learn more about the specifics here

  • [Experimental] There’s a new reference deployment for running Dagster on AWS ECS and a new EcsRunLauncher that launches each pipeline run in its own ECS Task.

  • [Experimental] There’s a new k8s_job_executor (https://docs.dagster.io/_apidocs/libraries/dagster-k8s#dagster_k8s.k8s_job_executor) which executes each solid of your pipeline in a separate Kubernetes job. This addition means that you can now choose at runtime (https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#executor) between single pod and multi-pod isolation for solids in your run. Previously this was only configurable for the entire deployment: you could either use the K8sRunLauncher with the default executors (in process and multiprocess) for low isolation, or you could use the CeleryK8sRunLauncher with the celery_k8s_job_executor for pod-level isolation. Now, your instance can be configured with the K8sRunLauncher and you can choose between the default executors or the k8s_job_executor.

New since 0.11.16#

  • Using the @schedule, @resource, or @sensor decorator no longer requires a context parameter. If you are not using the context parameter in these, you can now do this:

    from dagster import RunRequest, resource, schedule, sensor

    # None of these definitions take a context parameter.
    @schedule(cron_schedule="* * * * *", pipeline_name="my_pipeline")
    def my_schedule():
      return {}
    
    @resource
    def my_resource():
      return "foo"
    
    @sensor(pipeline_name="my_pipeline")
    def my_sensor():
      return RunRequest(run_config={})
    
  • Dynamic mapping and collect features are no longer marked “experimental”. DynamicOutputDefinition and DynamicOutput can now be imported directly from dagster.

  • Added a repository_name property on SensorEvaluationContext, which is the name of the repository that the sensor belongs to.

  • get_mapping_key is now available on SolidExecutionContext, allowing for discerning which downstream branch of a DynamicOutput you are in.

  • When viewing a run in Dagit, you can now download its debug file directly from the run view. This can be loaded into dagit-debug.

  • [dagster-dbt] A new dbt_cli_resource simplifies the process of working with dbt projects in your pipelines, and allows for a wide range of potential uses. Check out the integration guide for examples!

Bugfixes#

  • Fixed a bug where retrying from failure with fan-in solids didn’t load the correct input source. Fan-in solids can now load the persisted source from the corresponding previous run when retrying from failure.
  • Fixed a bug in the k8s_job_executor that caused user-defined Kubernetes config set in solid tags to not be applied to the Kubernetes jobs.
  • Fixed an issue in dagstermill when concurrently running pipelines that contain multiple dagstermill solids with inputs of the same name.

Breaking Changes#

  • The deprecated SystemCronScheduler and K8sScheduler schedulers have been removed. All schedules are now executed using the dagster-daemon process. See the deployment docs for more information about how to use the dagster-daemon process to run your schedules.

  • If you have written a custom run launcher, the arguments to the launch_run function have changed in order to enable faster run launches. launch_run now takes in a LaunchRunContext object. Additionally, run launchers should now obtain the PipelinePythonOrigin to pass as an argument to dagster api execute_run. See the implementation of DockerRunLauncher for an example of the new way to write run launchers.

  • [helm] .Values.dagsterDaemon.queuedRunCoordinator has had its schema altered. It is now referenced at .Values.dagsterDaemon.runCoordinator. Previously, if you set up your run coordinator configuration in the following manner:

    dagsterDaemon:
      queuedRunCoordinator:
        enabled: true
        module: dagster.core.run_coordinator
        class: QueuedRunCoordinator
        config:
          max_concurrent_runs: 25
          tag_concurrency_limits: []
          dequeue_interval_seconds: 30
    

    It is now configured like:

    dagsterDaemon:
      runCoordinator:
        enabled: true
        type: QueuedRunCoordinator
        config:
          queuedRunCoordinator:
            maxConcurrentRuns: 25
            tagConcurrencyLimits: []
            dequeueIntervalSeconds: 30
    
  • The method events_for_asset_key on DagsterInstance has been deprecated and will now issue a warning. This method was previously used in our asset sensor example code. This can be replaced by calls using the new DagsterInstance API get_event_records. The example code in our sensor documentation has been updated to use our new APIs as well.

Community Contributions#

Experimental#

  • You can now configure the EcsRunLauncher to use an existing Task Definition of your choosing. By default, it continues to register its own Task Definition for each run.

0.11.16#

New#

  • In Dagit, a new page has been added for user settings, including feature flags and timezone preferences. It can be accessed via the gear icon in the top right corner of the page.
  • SensorExecutionContext and ScheduleExecutionContext have been renamed to SensorEvaluationContext and ScheduleEvaluationContext, respectively. The old names will be supported until 0.12.0.

Bugfixes#

  • When turning on a schedule in Dagit, if the schedule had an identical name and identical pipeline name to a schedule in another repository in the workspace, both schedules would incorrectly appear to be turned on, due to a client-side rendering bug. The same bug occurred for sensors. This has now been fixed.
  • The “Copy URL” button on a Run view in Dagit was inoperative for users not using Dagit in localhost or https. This has been fixed.
  • Fixed a bug in Dagit where Dagit would leak memory for each websocket connection.
  • When executing a pipeline that contains composite solids, the composite solids mistakenly ignored the upstream outputs. This bug was introduced in 0.11.15, and is now fixed.

Community Contributions#

  • Fixed a link to the Kubernetes deployment documentation. Thanks to @jrouly!

Documentation#

0.11.15#

New#

  • The Python GraphQL client now includes a shutdown_repository_location API call that shuts down a gRPC server. This is useful in situations where you want Kubernetes to restart your server and re-create your repository definitions, even though the underlying Python code hasn’t changed (for example, if your pipelines are loaded programmatically from a database).

  • io_manager_key and root_manager_key are now disallowed on composite solids’ InputDefinitions and OutputDefinitions. Instead, custom IO managers on the solids inside composite solids will be respected:

    @solid(input_defs=[InputDefinition("data", dagster_type=str, root_manager_key="my_root")])
    def inner_solid(_, data):
      return data
    
    @composite_solid
    def my_composite():
      return inner_solid()
    
  • Schedules can now be directly invoked. This is intended to be used for testing. To learn more, see https://docs.dagster.io/master/concepts/partitions-schedules-sensors/schedules#testing-schedules

Bugfixes#

  • Dagster libraries (for example, dagster-postgres or dagster-graphql) are now pinned to the same version as the core dagster package. This should reduce instances of issues due to backwards compatibility problems between Dagster packages.
  • Due to a recent regression, when viewing a launched run in Dagit, the Gantt chart would inaccurately show the run as queued well after it had already started running. This has been fixed, and the Gantt chart will now accurately reflect incoming logs.
  • In some cases, navigation in Dagit led to overfetching a workspace-level GraphQL query that would unexpectedly reload the entire app. The excess fetches are now limited more aggressively, and the loading state will no longer reload the app when workspace data is already available.
  • Previously, execution would fail silently when trying to use memoization with a root input manager. The error message now more clearly states that this is not supported.

Breaking Changes#

  • Invoking a generator solid now yields a generator, and output objects are not unpacked.

    @solid
    def my_solid():
      yield Output("hello")
    
    assert isinstance(list(my_solid())[0], Output)
    

Experimental#

  • Added an experimental EcsRunLauncher. This creates a new ECS Task Definition and launches a new ECS Task for each run. You can use the new ECS Reference Deployment to experiment with the EcsRunLauncher. We’d love your feedback in our #dagster-ecs Slack channel!

Documentation#

0.11.14#

New#

  • Supplying the "metadata" argument to InputDefinitions and OutputDefinitions is no longer considered experimental.
  • The "context" argument can now be omitted for solids that have required resource keys.
  • The S3ComputeLogManager now takes a boolean config argument skip_empty_files, which skips uploading empty log files to S3. This should enable a workaround for timeout errors when using the S3ComputeLogManager to persist logs to MinIO object storage.
  • The Helm subchart for user code deployments now allows for extra manifests.
  • Running dagit with flag --suppress-warnings will now ignore all warnings, such as ExperimentalWarnings.
  • PipelineRunStatus, which represents the run status, is now exported in the public API.
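
The effect of a skip_empty_files option can be illustrated by filtering out zero-byte log files before an upload step. The sketch below uses only the standard library and does not reflect the S3ComputeLogManager internals. The files_to_upload helper and the file names are hypothetical, and no upload is performed.

```python
import os
import tempfile


def files_to_upload(paths, skip_empty_files=False):
    """Return the paths that would be uploaded.

    Hypothetical helper: when `skip_empty_files` is True, zero-byte
    files are left out; otherwise every path is kept.
    """
    if not skip_empty_files:
        return list(paths)
    return [path for path in paths if os.path.getsize(path) > 0]


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "stdout.log")
    full = os.path.join(tmp, "stderr.log")
    open(empty, "w").close()  # zero-byte log file
    with open(full, "w") as handle:
        handle.write("step failed\n")

    selected = files_to_upload([empty, full], skip_empty_files=True)
    everything = files_to_upload([empty, full])

print(len(selected), len(everything))
```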

Bugfixes#

  • The asset catalog now has better backwards compatibility for supporting deprecated Materialization events. Previously, these events were causing loading errors.

Community Contributions#

  • Improved documentation of the dagster-dbt library with some helpful tips and example code (thanks @makotonium!).
  • Fixed the example code in the dagster-pyspark documentation for providing and accessing the pyspark resource (thanks @Andrew-Crosby!).
  • Helm chart serviceaccounts now allow annotations (thanks @jrouly!).

Documentation#

  • Added section on testing resources.
  • Revamped IO manager testing section to use build_input_context and build_output_context APIs.