Changelog#

1.6.9 (core) / 0.22.9 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] groups/code locations in the asset graph’s sidebar are now sorted alphabetically.
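
The asset-key suggestions mentioned in this list can be pictured with a small standard-library sketch. This is not Dagster's implementation: `suggest_asset_keys`, `missing_key_message`, and the message wording are hypothetical, and using `difflib` for fuzzy matching is only an assumption about how such suggestions might be produced.

```python
import difflib


def suggest_asset_keys(missing_key, known_keys, limit=3):
    """Return known keys that look similar to a key that was not found."""
    # get_close_matches ranks candidates by similarity ratio, best first,
    # and drops anything scoring below the cutoff.
    return difflib.get_close_matches(missing_key, known_keys, n=limit, cutoff=0.6)


def missing_key_message(missing_key, known_keys):
    """Build an error message, appending suggestions when any exist."""
    # Hypothetical wording; the real Dagster error text may differ.
    message = f"Asset key '{missing_key}' was not found."
    suggestions = suggest_asset_keys(missing_key, known_keys)
    if suggestions:
        message += " Did you mean one of: " + ", ".join(suggestions) + "?"
    return message
```

With a near-miss such as `my_aset`, the helper would surface `my_asset` as a candidate, while an unrelated key produces the plain message with no suggestions.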

Bugfixes#

  • Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks @christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow GitHub Actions to check out a branch from a forked repo for docs changes (CI fix) (thanks @hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved the guides and reference to better cover running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts. See the documentation for details.
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
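
A subprocess could use the DAGSTER_IS_DEV_CLI variable along these lines. This is a minimal sketch: the helper name `launched_from_dagster_dev` is hypothetical, and because the entry above does not say what value the variable holds, the check only tests for its presence.

```python
import os


def launched_from_dagster_dev(environ=None):
    """Return True if DAGSTER_IS_DEV_CLI is present in the environment."""
    # The changelog entry does not state the variable's value, so this
    # sketch only checks for presence (an assumption).
    env = os.environ if environ is None else environ
    return "DAGSTER_IS_DEV_CLI" in env
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to exercise in tests.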

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] New Asset Decorator @sling_assets and Resource SlingConnectionResource have been added for the dagster-embedded-elt.sling package. Deprecated build_sling_asset, SlingSourceConnection and SlingTargetConnection.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
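
Lexicographical ordering of partition keys has a practical consequence that a short sketch makes visible. It assumes the ordering behaves like plain string comparison, which is what Python's `sorted` does for `str` values. The function name and the sample keys are illustrative only.

```python
def submission_order(partition_keys):
    """Order partition keys by plain string comparison."""
    return sorted(partition_keys)


# Zero-padded date keys sort chronologically under string comparison.
date_keys = ["2024-01-10", "2024-01-02", "2023-12-31"]
ordered_dates = submission_order(date_keys)

# Unpadded numeric keys do not sort numerically: "10" comes before "2".
numeric_keys = ["10", "9", "2"]
ordered_numbers = submission_order(numeric_keys)
```

If run order matters for numeric partition keys, zero-padding them (for example `02`, `09`, `10`) makes the string order match the numeric order.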

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

0.10.3#

New

  • [dagster] Sensors can now specify a minimum_interval_seconds argument, which determines the minimum amount of time between sensor evaluations.
  • [dagit] After manually reloading the current repository, users will now be prompted to regenerate preset-based or partition-set based run configs in the Playground view. This helps ensure that the generated run config is up to date when launching new runs. The prompt does not occur when the repository is automatically reloaded.

Bugfixes

  • Updated the -n/--max_workers default value for the dagster api grpc command to be None. When set to None, the gRPC server will use the default number of workers which is based on the CPU count. If you were previously setting this value to 1, we recommend removing the argument or increasing the number.
  • Fixed issue loading the schedule tick history graph for new schedules that have not been turned on.
  • In Dagit, newly launched runs will open in the current tab instead of a new tab.
  • Dagit bugfixes and improvements, including changes to loading state spinners.
  • When a user specifies both an intermediate storage and an IO manager for a particular output, we no longer silently ignore the IO manager.

0.10.2#

Community Contributions

New

  • [dagstermill] Users can now specify custom tags & descriptions for notebook solids.
  • [dagster-pagerduty / dagster-slack] Added built-in hook integrations to create pagerduty/slack alerts when solids fail.
  • [dagit] Added ability to preview runs for upcoming schedule ticks.

Bugfixes

  • Fixed an issue where run start times and end times were displayed in the wrong timezone in Dagit when using Postgres storage.

  • Schedules with partitions that weren’t able to execute due to not being able to find a partition will now display the name of the partition they were unable to find on the “Last tick” entry for that schedule.

  • Improved timing information display for queued and canceled runs within the Runs table view and on individual Run pages in Dagit.

  • Improvements to the tick history view for schedules and sensors.

  • Fixed formatting issues on the Dagit instance configuration page.

  • Miscellaneous Dagit bugfixes and improvements.

  • The dagster pipeline launch command will now respect run concurrency limits if they are applied on your instance.

  • Fixed an issue where re-executing a run created by a sensor would cause the daemon to stop executing any additional runs from that sensor.

  • Sensor runs with invalid run configuration will no longer create a failed run - instead, an error will appear on the page for the sensor, allowing you to fix the configuration issue.

  • General dagstermill housekeeping: test refactoring & type annotations, as well as repinning ipykernel to solve #3401

Documentation

  • Improved dagster-dbt example.
  • Added examples to demonstrate experimental features, including Memoized Development and Dynamic Graph.
  • Added a PR template and guidance on how to pick an issue for first-time contributors.

0.10.1#

Community Contributions

  • Reduced image size of k8s-example by 25% (104 MB) (thanks @alex-treebeard and @mrdavidlaing!)
  • [dagster-snowflake] snowflake_resource can now be configured to use the SQLAlchemy connector (thanks @basilvetas!)

New

  • When setting userDeployments.deployments in the Helm chart, replicaCount now defaults to 1 if not specified.

Bugfixes

  • Fixed an issue where the Dagster daemon process couldn’t launch runs in repository locations containing more than one repository.
  • Fixed an issue where Helm chart was not correctly templating env, envConfigMaps, and envSecrets.

Documentation

  • Added new troubleshooting guide for problems encountered while using the QueuedRunCoordinator to limit run concurrency.
  • Added documentation for the sensor command-line interface.

0.10.0 "The Edge of Glory"#

Major Changes#

  • A native scheduler with support for exactly-once, fault tolerant, timezone-aware scheduling. A new Dagster daemon process has been added to manage your schedules and sensors with a reconciliation loop, ensuring that all runs are executed exactly once, even if the Dagster daemon experiences occasional failure. See the Migration Guide for instructions on moving from SystemCronScheduler or K8sScheduler to the new scheduler.
  • First-class sensors, built on the new Dagster daemon, allow you to instigate runs based on changes in external state - for example, files on S3 or assets materialized by other Dagster pipelines. See the Sensors Overview for more information.
  • Dagster now supports pipeline run queueing. You can apply instance-level run concurrency limits and prioritization rules by adding the QueuedRunCoordinator to your Dagster instance. See the Run Concurrency Overview for more information.
  • The IOManager abstraction provides a new, streamlined primitive for granular control over where and how solid outputs are stored and loaded. This is intended to replace the (deprecated) intermediate/system storage abstractions. See the IO Manager Overview for more information.
  • A new Partitions page in Dagit lets you view your pipeline runs organized by partition. You can also launch backfills from Dagit and monitor them from this page.
  • A new Instance Status page in Dagit lets you monitor the health of your Dagster instance, with repository location information, daemon statuses, instance-level schedule and sensor information, and linkable instance configuration.
  • Resources can now declare their dependencies on other resources via the required_resource_keys parameter on @resource.
  • Our support for deploying on Kubernetes is now mature and battle-tested. Our Helm chart is now easier to configure and deploy, and we’ve made big investments in observability and reliability. You can view Kubernetes interactions in the structured event log and use Dagit to help you understand what’s happening in your deployment. The defaults in the Helm chart will give you graceful degradation and failure recovery right out of the box.
  • Experimental support for dynamic orchestration with the new DynamicOutputDefinition API. Dagster can now map the downstream dependencies over a dynamic output at runtime.

Breaking Changes#

Dropping Python 2 support

  • We’ve dropped support for Python 2.7, based on community usage and enthusiasm for Python 3-native public APIs.

Removal of deprecated APIs

These APIs were marked for deprecation with warnings in the 0.9.0 release, and have been removed in the 0.10.0 release.

  • The decorator input_hydration_config has been removed. Use the dagster_type_loader decorator instead.
  • The decorator output_materialization_config has been removed. Use dagster_type_materializer instead.
  • The system storage subsystem has been removed. This includes SystemStorageDefinition, @system_storage, and default_system_storage_defs. Use the new IOManagers API instead. See the IO Manager Overview for more information.
  • The config_field argument on decorators and definitions classes has been removed and replaced with config_schema. This is a drop-in rename.
  • The argument step_keys_to_execute to the functions reexecute_pipeline and reexecute_pipeline_iterator has been removed. Use the step_selection argument to select subsets for execution instead.
  • Repositories can no longer be loaded using the legacy repository key in your workspace.yaml; use load_from instead. See the Workspaces Overview for documentation about how to define a workspace.

Breaking API Changes

  • SolidExecutionResult.compute_output_event_dict has been renamed to SolidExecutionResult.compute_output_events_dict. A solid execution result is returned from methods such as result_for_solid. Any call sites will need to be updated.
  • The .compute suffix is no longer applied to step keys. Step keys that were previously named my_solid.compute will now be named my_solid. If you are using any API method that takes a step_selection argument, you will need to update the step keys accordingly.
  • The pipeline_def property has been removed from the InitResourceContext passed to functions decorated with @resource.
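
For the step key change above, stored step selections can be updated mechanically. The helpers below are a hypothetical sketch, not part of Dagster; they assume the selection is a list of plain step keys with no selection-query syntax.

```python
COMPUTE_SUFFIX = ".compute"


def migrate_step_key(step_key):
    """Drop a trailing '.compute' from a pre-0.10.0 step key."""
    if step_key.endswith(COMPUTE_SUFFIX):
        return step_key[: -len(COMPUTE_SUFFIX)]
    return step_key


def migrate_step_selection(step_selection):
    """Apply migrate_step_key to every key in a list of step keys."""
    return [migrate_step_key(key) for key in step_selection]
```

Keys that never carried the suffix pass through unchanged, so the helper is safe to run over already-migrated selections.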

Dagstermill

  • If you are using define_dagstermill_solid with the output_notebook parameter set to True, you will now need to provide a file manager resource (subclass of dagster.core.storage.FileManager) on your pipeline mode under the resource key "file_manager", e.g.:

    from dagster import ModeDefinition, local_file_manager, pipeline
    from dagstermill import define_dagstermill_solid
    
    my_dagstermill_solid = define_dagstermill_solid("my_dagstermill_solid", output_notebook=True, ...)
    
    @pipeline(mode_defs=[ModeDefinition(resource_defs={"file_manager": local_file_manager})])
    def my_dagstermill_pipeline():
        my_dagstermill_solid(...)
    

Helm Chart

  • The schema for the scheduler values in the helm chart has changed. Instead of a simple toggle on/off, we now require an explicit scheduler.type to specify usage of the DagsterDaemonScheduler, K8sScheduler, or otherwise. If your specified scheduler.type has required config, these fields must be specified under scheduler.config.
  • snake_case fields have been changed to camelCase. Please update your values.yaml as follows:
    • pipeline_run → pipelineRun
    • dagster_home → dagsterHome
    • env_secrets → envSecrets
    • env_config_maps → envConfigMaps
  • The Helm values celery and k8sRunLauncher have now been consolidated under the Helm value runLauncher for simplicity. Use the field runLauncher.type to specify usage of the K8sRunLauncher, CeleryK8sRunLauncher, or otherwise. By default, the K8sRunLauncher is enabled.
  • All Celery message brokers (i.e. RabbitMQ and Redis) are disabled by default. If you are using the CeleryK8sRunLauncher, you should explicitly enable your message broker of choice.
  • userDeployments are now enabled by default.
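
The snake_case to camelCase renames listed above follow a simple rule, sketched here in Python. The function names are hypothetical, and the sketch only renames the four listed fields at the top level of whatever dictionary it is given; where those fields sit inside a real values.yaml is not assumed.

```python
RENAMED_FIELDS = ("pipeline_run", "dagster_home", "env_secrets", "env_config_maps")


def snake_to_camel(name):
    """Convert a snake_case name to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def rename_listed_fields(values):
    """Rename only the listed fields among the top-level keys of a dict."""
    return {
        (snake_to_camel(key) if key in RENAMED_FIELDS else key): value
        for key, value in values.items()
    }
```

Restricting the rename to an explicit list avoids touching keys that the chart did not change.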

Core#

  • Event log messages streamed to stdout and stderr have been streamlined to be a single line per event.

  • Experimental support for memoization and versioning lets you execute pipelines incrementally, selecting which solids need to be rerun based on runtime criteria and versioning their outputs with configurable identifiers that capture their upstream dependencies.

    To set up memoized step selection, users can provide a MemoizableIOManager, whose has_output function decides whether a given solid output needs to be computed or already exists. To execute a pipeline with memoized step selection, users can supply the dagster/is_memoized_run run tag to execute_pipeline.

    To set the version on a solid or resource, users can supply the version field on the definition. To access the derived version for a step output, users can access the version field on the OutputContext passed to the handle_output and load_input methods of IOManager and the has_output method of MemoizableIOManager.

  • Schedules that are executed using the new DagsterDaemonScheduler can now execute in any timezone by adding an execution_timezone parameter to the schedule. Daylight Savings Time transitions are also supported. See the Schedules Overview for more information and examples.
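
The memoization entry in this list describes two ideas: deriving an output version from a solid's own version plus its upstream versions, and asking a `has_output` check whether that version already exists. The toy below illustrates both in plain Python. It does not use or reproduce the Dagster API: `derive_version`, `ToyMemoizedStore`, and `run_step` are hypothetical names, the hashing scheme is an assumption, and the real `has_output` method receives a context object rather than a version string.

```python
import hashlib


def derive_version(solid_version, upstream_versions):
    """Combine a solid's own version with its upstream output versions."""
    digest = hashlib.sha256()
    digest.update(solid_version.encode("utf-8"))
    # Sort so the result does not depend on the order upstreams are listed.
    for upstream in sorted(upstream_versions):
        digest.update(b"|" + upstream.encode("utf-8"))
    return digest.hexdigest()


class ToyMemoizedStore:
    """In-memory stand-in for storage that can report existing outputs."""

    def __init__(self):
        self._outputs = {}

    def has_output(self, version):
        return version in self._outputs

    def handle_output(self, version, value):
        self._outputs[version] = value

    def load_input(self, version):
        return self._outputs[version]


def run_step(store, solid_version, upstream_versions, compute):
    """Run compute() only when no output exists for the derived version."""
    version = derive_version(solid_version, upstream_versions)
    if store.has_output(version):
        return version, False  # reused the stored output
    store.handle_output(version, compute())
    return version, True  # computed a fresh output
```

Running the same step twice with an unchanged version skips the second computation, while bumping the solid's version (or any upstream version) yields a new derived version and forces a rerun.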

Dagit#

  • Countdown and refresh buttons have been added for pages with regular polling queries (e.g. Runs, Schedules).
  • Confirmation and progress dialogs are now presented when performing run terminations and deletions. Additionally, hanging/orphaned runs can now be forced to terminate, by selecting "Force termination immediately" in the run termination dialog.
  • The Runs page now shows counts for "Queued" and "In progress" tabs, and individual run pages show timing, tags, and configuration metadata.
  • The backfill experience has been improved with means to view progress and terminate the entire backfill via the partition set page. Additionally, errors related to backfills are now surfaced more clearly.
  • Shortcut hints are no longer displayed when attempting to use the screen capture command.
  • The asset page has been revamped to include a table of events and enable organizing events by partition. Asset key escaping issues in other views have been fixed as well.
  • Miscellaneous bug fixes, frontend performance tweaks, and other improvements are also included.

Kubernetes/Helm#

Helm

  • We've added schema validation to our Helm chart. You can now check that your values YAML file is correct by running:

    helm lint helm/dagster -f helm/dagster/values.yaml
    
  • Added support for resource annotations throughout our Helm chart.

  • Added Helm deployment of the dagster daemon & daemon scheduler.

  • Added Helm support for configuring a compute log manager in your dagster instance.

  • User code deployments now include a user ConfigMap by default.

  • Changed the default liveness probe for Dagit to use httpGet "/dagit_info" instead of tcpSocket:80

Dagster-K8s [Kubernetes]

  • Added support for user code deployments on Kubernetes.
  • Added support for tagging pipeline executions.
  • Fixes to support version 12.0.0 of the Python Kubernetes client.
  • Improved implementation of Kubernetes+Dagster retries.
  • Many logging improvements to surface debugging information and failures in the structured event log.

Dagster-Celery-K8s

  • Improved interrupt/termination handling in Celery workers.

Integrations & Libraries#

  • Added a new dagster-docker library with a DockerRunLauncher that launches each run in its own Docker container. (See Deploying with Docker docs for an example.)
  • Added support for AWS Athena. (Thanks @jmsanders!)
  • Added mocks for AWS S3, Athena, and Cloudwatch in tests. (Thanks @jmsanders!)
  • Allow setting of S3 endpoint through env variables. (Thanks @marksteve!)
  • Various bug fixes and new features for the Azure, Databricks, and Dask integrations.
  • Added a create_databricks_job_solid for creating solids that launch Databricks jobs.

0.9.22.post0#

Bugfixes

  • [Dask] Pin dask[dataframe] to <=2.30.0 and distributed to <=2.30.1