Changelog

1.6.9 (core) / 0.22.9 (libraries)

New

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across page loads, which should greatly improve search performance for very large orgs.
  • [ui] Groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes

  • Fixed an issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks @christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow GitHub Actions to check out a branch from a forked repo for docs changes (ci fix) (thanks @hainenber)!

Experimental

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a GitHub Codespace to explore Dagster.
  • Improved the guides and reference docs to better cover running multiple isolated agents with separate queues on ECS.

Dagster Cloud

  • Microsoft Teams is now supported for alerts. See the documentation for details.
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)

Bugfixes

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)

New

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
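
A subprocess can use the DAGSTER_IS_DEV_CLI variable mentioned above to branch on whether it was started under dagster dev. The sketch below only checks that the variable is present, because the value it is set to is not stated here. The helper name launched_from_dagster_dev is hypothetical.

```python
import os
from typing import Mapping, Optional


def launched_from_dagster_dev(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if DAGSTER_IS_DEV_CLI is present in the environment.

    Hypothetical helper: only presence is checked, because the value that
    `dagster dev` assigns to the variable is not known here.
    """
    env = os.environ if env is None else env
    return "DAGSTER_IS_DEV_CLI" in env


if __name__ == "__main__":
    mode = "development" if launched_from_dagster_dev() else "non-development"
    print(f"Running in a {mode} context")
```

Passing the environment as an argument keeps the check easy to test without mutating os.environ.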

Bugfixes

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] New Asset Decorator @sling_assets and Resource SlingConnectionResource have been added for the dagster-embedded-elt.sling package. Deprecated build_sling_asset, SlingSourceConnection and SlingTargetConnection.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)

New

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
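
The "Overdue" rule for observable source assets described above can be illustrated with a minimal sketch. It assumes the freshness policy is expressed only as a maximum lag in minutes and that the "dagster/data_time" value is available as a timezone-aware datetime. The function is_overdue is hypothetical and is not Dagster's implementation.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(data_time: datetime, now: datetime, maximum_lag_minutes: float) -> bool:
    """Return True when the observed data time is older than the allowed lag.

    Hypothetical helper: it ignores cron-based policies and treats the
    freshness policy as a single maximum lag.
    """
    return now - data_time > timedelta(minutes=maximum_lag_minutes)


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)

# Data observed 30 minutes ago is within a 60-minute lag.
fresh = is_overdue(now - timedelta(minutes=30), now, maximum_lag_minutes=60)
# Data observed 90 minutes ago exceeds a 60-minute lag.
stale = is_overdue(now - timedelta(minutes=90), now, maximum_lag_minutes=60)

print(fresh, stale)
```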

Documentation

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while being terminated.

1.6.5 (core) / 0.22.5 (libraries)

New

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
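
Lexicographical ordering of partition keys, as mentioned for backfill and auto-materialize run submission above, compares keys as strings. This matters when keys contain numbers that are not zero-padded. The illustration below uses made-up partition keys and assumes plain string comparison, which is what Python's sorted applies to strings.

```python
# Made-up partition keys, for illustration only.
date_keys = ["2024-01-10", "2024-01-09", "2024-01-02"]
shard_keys = ["shard_2", "shard_10", "shard_1"]

# Plain string comparison: ISO-style dates come out chronologically...
date_order = sorted(date_keys)
# ...but keys with unpadded numbers do not come out in numeric order.
shard_order = sorted(shard_keys)

print(date_order)
print(shard_order)
```

Zero-padding numeric suffixes (shard_01, shard_02, shard_10) makes the string order match the numeric order.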

Bugfixes

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
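
The waiting behavior of skip_on_not_all_parents_updated_since_cron described above can be pictured with a simplified sketch. It assumes the time of the latest cron tick is already known and that each parent has a last-updated timestamp. The function should_skip and the asset names are hypothetical, and this is not Dagster's implementation.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def should_skip(
    latest_cron_tick: datetime,
    parent_last_updated: Mapping[str, Optional[datetime]],
) -> bool:
    """Skip unless every parent has been updated after the latest cron tick.

    A parent that has never been updated (None) also causes a skip.
    Treating "after" as strictly later than the tick is an assumption.
    """
    return not all(
        updated is not None and updated > latest_cron_tick
        for updated in parent_last_updated.values()
    )


tick = datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc)  # e.g. a daily midnight tick

# "orders" was refreshed after the tick, "customers" was last updated before it.
waiting = should_skip(
    tick,
    {
        "orders": datetime(2024, 3, 1, 0, 30, tzinfo=timezone.utc),
        "customers": datetime(2024, 2, 29, 23, 0, tzinfo=timezone.utc),
    },
)

# Once "customers" is refreshed as well, the rule no longer skips.
ready = should_skip(
    tick,
    {
        "orders": datetime(2024, 3, 1, 0, 30, tzinfo=timezone.utc),
        "customers": datetime(2024, 3, 1, 1, 0, tzinfo=timezone.utc),
    },
)

print(waiting, ready)
```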

Documentation

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O managers versus resources more prominent.

0.8.3

Breaking Changes

  • Previously, the gcs_resource returned a GCSResource wrapper which had a single client property that returned a google.cloud.storage.client.Client. Now, the gcs_resource returns the client directly.

    To update solids that use the gcs_resource, change:

    context.resources.gcs.client
    

    To:

    context.resources.gcs
    

New

  • Introduced a new Python API reexecute_pipeline to reexecute an existing pipeline run.
  • Performance improvements in Pipeline Overview and other pages.
  • Long metadata entries in the asset details view are now scrollable.
  • Added a project field to the gcs_resource in dagster_gcp.
  • Added new CLI command dagster asset wipe to remove all existing asset keys.

Bugfix

  • Several Dagit bugfixes and performance improvements
  • Fixes pipeline execution issue with custom run launchers that call executeRunInProcess.
  • Updates dagster schedule up output to be repository location scoped

0.8.2

Bugfix

  • Fixes issues with dagster instance migrate.
  • Fixes bug in launch_scheduled_execution that would mask configuration errors.
  • Fixes bug in dagit where schedule related errors were not shown.
  • Fixes JSON-serialization error in dagster-k8s when specifying per-step resources.

New

  • Makes label an optional parameter for materializations with asset_key specified.
  • Changes Assets page to have a typeahead selector and hierarchical views based on asset_key path.
  • dagster-ssh
    • adds SFTP get and put functions to SSHResource, replacing sftp_solid.

Docs

  • Various docs corrections

0.8.1

Bugfix

  • Fixed a file descriptor leak that caused OSError: [Errno 24] Too many open files when enough temporary files were created.
  • Fixed an issue where an empty config in the Playground would unexpectedly be marked as invalid YAML.
  • Removed "config" deprecation warnings for dask and celery executors.

New

  • Improved performance of the Assets page.

0.8.0 "In The Zone"

Major Changes

Please see the 080_MIGRATION.md migration guide for details on updating existing code to be compatible with 0.8.0

  • Workspace, host and user process separation, and repository definition: Dagit and other tools no longer load a single repository containing user definitions such as pipelines into the same process as the framework code. Instead, they load a "workspace" that can contain multiple repositories sourced from a variety of different external locations (e.g., Python modules and Python virtualenvs, with containers and source control repositories soon to come).

    The repositories in a workspace are loaded into their own "user" processes distinct from the "host" framework process. Dagit and other tools now communicate with user code over an IPC mechanism. This architectural change has a couple of advantages:

    • Dagit no longer needs to be restarted when there is an update to user code.
    • Users can use repositories to organize their pipelines, but still work on all of their repositories using a single running Dagit.
    • The Dagit process can now run in a separate Python environment from user code so pipeline dependencies do not need to be installed into the Dagit environment.
    • Each repository can be sourced from a separate Python virtualenv, so teams can manage their dependencies (or even their own Python versions) separately.

    We have introduced a new file format, workspace.yaml, in order to support this new architecture. The workspace yaml encodes what repositories to load and their location, and supersedes the repository.yaml file and associated machinery.

    As a consequence, Dagster internals are now stricter about how pipelines are loaded. If you have written scripts or tests in which a pipeline is defined and then passed across a process boundary (e.g., using the multiprocess_executor or dagstermill), you may now need to wrap the pipeline in the reconstructable utility function for it to be reconstructed across the process boundary.

    In addition, rather than instantiate the RepositoryDefinition class directly, users should now prefer the @repository decorator. As part of this change, the @scheduler and @repository_partitions decorators have been removed, and their functionality subsumed under @repository.

  • Dagit organization: The Dagit interface has changed substantially and is now oriented around pipelines. Within the context of each pipeline in an environment, the previous "Pipelines" and "Solids" tabs have been collapsed into the "Definition" tab; a new "Overview" tab provides summary information about the pipeline, its schedules, its assets, and recent runs; the previous "Playground" tab has been moved within the context of an individual pipeline. Related runs (e.g., runs created by re-executing subsets of previous runs) are now grouped together in the Playground for easy reference. Dagit also now includes more advanced support for display of scheduled runs that may not have executed ("schedule ticks"), as well as longitudinal views over scheduled runs, and asset-oriented views of historical pipeline runs.

  • Assets: Assets are named materializations that can be generated by your pipeline solids, which support specialized views in Dagit. For example, if we represent a database table with an asset key, we can now index all of the pipelines and pipeline runs that materialize that table, and view them in a single place. To use the asset system, you must enable an asset-aware storage such as Postgres.

  • Run launchers: The distinction between "starting" and "launching" a run has been effaced. All pipeline runs instigated through Dagit now make use of the RunLauncher configured on the Dagster instance, if one is configured. Additionally, run launchers can now support termination of previously launched runs. If you have written your own run launcher, you may want to update it to support termination. Note also that as of 0.7.9, the semantics of RunLauncher.launch_run have changed; this method now takes the run_id of an existing run and should no longer attempt to create the run in the instance.

  • Flexible reexecution: Pipeline re-execution from Dagit is now fully flexible. You may re-execute arbitrary subsets of a pipeline's execution steps, and the re-execution now appears in the interface as a child run of the original execution.

  • Support for historical runs: Snapshots of pipelines and other Dagster objects are now persisted along with pipeline runs, so that historical runs can be loaded for review with the correct execution plans even when pipeline code has changed. This prepares the system to be able to diff pipeline runs and other objects against each other.

  • Step launchers and expanded support for PySpark on EMR and Databricks: We've introduced a new StepLauncher abstraction that uses the resource system to allow individual execution steps to be run in separate processes (and thus on separate execution substrates). This has made extensive improvements to our PySpark support possible, including the option to execute individual PySpark steps on EMR using the EmrPySparkStepLauncher and on Databricks using the DatabricksPySparkStepLauncher. The emr_pyspark example demonstrates how to use a step launcher.

  • Clearer names: What was previously known as the environment dictionary is now called the run_config, and the previous environment_dict argument to APIs such as execute_pipeline is now deprecated. We renamed this argument to focus attention on the configuration of the run being launched or executed, rather than on an ambiguous "environment". We've also renamed the config argument to all use definitions to be config_schema, which should reduce ambiguity between the configuration schema and the value being passed in some particular case. We've also consolidated and improved documentation of the valid types for a config schema.

  • Lakehouse: We're pleased to introduce Lakehouse, an experimental, alternative programming model for data applications, built on top of Dagster core. Lakehouse allows developers to define data applications in terms of data assets, such as database tables or ML models, rather than in terms of the computations that produce those assets. The simple_lakehouse example gives a taste of what it's like to program in Lakehouse. We'd love feedback on whether this model is helpful!

  • Airflow ingest: We've expanded the tooling available to teams with existing Airflow installations that are interested in incrementally adopting Dagster. Previously, we provided only injection tools that allowed developers to write Dagster pipelines and then compile them into Airflow DAGs for execution. We've now added ingestion tools that allow teams to move to Dagster for execution without having to rewrite all of their legacy pipelines in Dagster. In this approach, Airflow DAGs are kept in their own container/environment, compiled into Dagster pipelines, and run via the Dagster orchestrator. See the airflow_ingest example for details!
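
For code of your own that wraps execution APIs, the environment_dict to run_config rename described under "Clearer names" can be handled with a small compatibility shim during migration. This is a sketch only: launch is a hypothetical wrapper, not a Dagster API, and it shows one common way to keep accepting a deprecated keyword while warning about it.

```python
import warnings
from typing import Any, Dict, Optional


def launch(
    run_config: Optional[Dict[str, Any]] = None,
    environment_dict: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical wrapper that accepts both the new and the old argument name."""
    if environment_dict is not None:
        if run_config is not None:
            raise ValueError("Pass run_config or environment_dict, not both")
        warnings.warn(
            "environment_dict is deprecated; use run_config instead",
            DeprecationWarning,
            stacklevel=2,
        )
        run_config = environment_dict
    # A real wrapper would hand run_config to the execution API here.
    return run_config or {}


# New-style call: no warning is emitted.
print(launch(run_config={"solids": {}}))
```

Callers that still pass environment_dict keep working and see a DeprecationWarning pointing at their own call site.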

Breaking Changes

  • dagster

    • The @scheduler and @repository_partitions decorators have been removed. Instances of ScheduleDefinition and PartitionSetDefinition belonging to a repository should be specified using the @repository decorator instead.

    • Support for the Dagster solid selection DSL, previously introduced in Dagit, is now uniform throughout the Python codebase, with the previous solid_subset arguments (--solid-subset in the CLI) being replaced by solid_selection (--solid-selection). In addition to the names of individual solids, this argument now supports selection queries like *solid_name++ (i.e., solid_name, all of its ancestors, its immediate descendants, and their immediate descendants).

    • The built-in Dagster type Path has been removed.

    • PartitionSetDefinition names, including those defined by a PartitionScheduleDefinition, must now be unique within a single repository.

    • Asset keys are now sanitized for non-alphanumeric characters. All characters besides alphanumerics and _ are treated as path delimiters. Asset keys can also be specified using AssetKey, which accepts a list of strings as an explicit path. If you are running 0.7.10 or later and using assets, you may need to migrate your historical event log data for asset keys from previous runs to be attributed correctly. This event_log data migration can be invoked as follows:

      from dagster.core.storage.event_log.migration import migrate_event_log_data
      from dagster import DagsterInstance
      
      migrate_event_log_data(instance=DagsterInstance.get())
      
    • The interface of the Scheduler base class has changed substantially. If you've written a custom scheduler, please get in touch!

    • The partitioned schedule decorators now generate PartitionSetDefinition names using the schedule name, suffixed with _partitions.

    • The repository property on ScheduleExecutionContext is no longer available. If you were using this property to pass to Scheduler instance methods, this interface has changed significantly. Please see the Scheduler class documentation for details.

    • The CLI option --celery-base-priority is no longer available for the command: dagster pipeline backfill. Use the tags option to specify the celery priority (e.g., dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }').

    • The execute_partition_set API has been removed.

    • The deprecated is_optional parameter to Field and OutputDefinition has been removed. Use is_required instead.

    • The deprecated runtime_type property on InputDefinition and OutputDefinition has been removed. Use dagster_type instead.

    • The deprecated has_runtime_type, runtime_type_named, and all_runtime_types methods on PipelineDefinition have been removed. Use has_dagster_type, dagster_type_named, and all_dagster_types instead.

    • The deprecated all_runtime_types method on SolidDefinition and CompositeSolidDefinition has been removed. Use all_dagster_types instead.

    • The deprecated metadata argument to SolidDefinition and @solid has been removed. Use tags instead.

    • The graphviz-based DAG visualization in Dagster core has been removed. Please use Dagit!

  • dagit

    • dagit-cli has been removed, and dagit is now the only console entrypoint.
  • dagster-aws

    • The AWS CLI has been removed.
    • dagster_aws.EmrRunJobFlowSolidDefinition has been removed.
  • dagster-bash

    • This package has been renamed to dagster-shell. The bash_command_solid and bash_script_solid solid factory functions have been renamed to create_shell_command_solid and create_shell_script_solid.
  • dagster-celery

    • The CLI option --celery-base-priority is no longer available for the command: dagster pipeline backfill. Use the tags option to specify the celery priority (e.g., dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }').
  • dagster-dask

    • The config schema for the dagster_dask.dask_executor has changed. The previous config should now be nested under the key local.
  • dagster-gcp

    • The BigQueryClient has been removed. Use bigquery_resource instead.
  • dagster-dbt

    • The dagster-dbt package has been removed. This was inadequate as a reference integration, and will be replaced in 0.8.x.
  • dagster-spark

    • dagster_spark.SparkSolidDefinition has been removed - use create_spark_solid instead.
    • The SparkRDD Dagster type, which only worked with an in-memory engine, has been removed.
  • dagster-twilio

    • The TwilioClient has been removed. Use twilio_resource instead.
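
The asset key sanitization rule listed under the dagster breaking changes above (every character other than alphanumerics and _ acts as a path delimiter) can be sketched with a regular expression. The function split_asset_key is hypothetical. Restricting "alphanumerics" to ASCII and dropping empty path segments are both assumptions, so treat this as an illustration of the rule, not as Dagster's own code.

```python
import re
from typing import List

# Anything that is not an ASCII letter, digit, or underscore is a delimiter.
_DELIMITERS = re.compile(r"[^A-Za-z0-9_]+")


def split_asset_key(raw: str) -> List[str]:
    """Split a raw asset key string into path components.

    Hypothetical helper: empty components (from leading or trailing
    delimiters) are dropped.
    """
    return [part for part in _DELIMITERS.split(raw) if part]


print(split_asset_key("warehouse.sales/daily-orders"))
print(split_asset_key("my_table"))
```

Under this rule a key such as "warehouse.sales" and an explicit path of ["warehouse", "sales"] refer to the same components, which is why historical event log data may need the migration shown above.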

New

  • dagster

    • You may now set asset_key on any Materialization to use the new asset system. You will also need to configure an asset-aware storage, such as Postgres. The longitudinal_pipeline example demonstrates this system.
    • The partitioned schedule decorators now support an optional end_time.
    • Opt-in telemetry now reports the Python version being used.
  • dagit

    • Dagit's GraphQL playground is now available at /graphiql as well as at /graphql.
  • dagster-aws

    • The dagster_aws.S3ComputeLogManager may now be configured to override the S3 endpoint and associated SSL settings.
    • Config string and integer values in the S3 tooling may now be set using either environment variables or literals.
  • dagster-azure

    • We've added the dagster-azure package, with support for Azure Data Lake Storage Gen2; you can use the adls2_system_storage or, for direct access, the adls2_resource resource. (Thanks @sd2k!)
  • dagster-dask

    • Dask clusters are now supported by dagster_dask.dask_executor. For full support, you will need to install extras with pip install dagster-dask[yarn, pbs, kube]. (Thanks @DavidKatz-il!)
  • dagster-databricks

    • We've added the dagster-databricks package, with support for running PySpark steps on Databricks clusters through the databricks_pyspark_step_launcher. (Thanks @sd2k!)
  • dagster-gcp

    • Config string and integer values in the BigQuery, Dataproc, and GCS tooling may now be set using either environment variables or literals.
  • dagster-k8s

    • Added the CeleryK8sRunLauncher to submit execution plan steps to Celery task queues for execution as k8s Jobs.
    • Added the ability to specify resource limits on a per-pipeline and per-step basis for k8s Jobs.
    • Many improvements and bug fixes to the dagster-k8s Helm chart.
  • dagster-pandas

    • Config string and integer values in the dagster-pandas input and output schemas may now be set using either environment variables or literals.
  • dagster-papertrail

    • Config string and integer values in the papertrail_logger may now be set using either environment variables or literals.
  • dagster-pyspark

    • PySpark solids can now run on EMR, using the emr_pyspark_step_launcher, or on Databricks using the new dagster-databricks package. The emr_pyspark example demonstrates how to use a step launcher.
  • dagster-snowflake

    • Config string and integer values in the snowflake_resource may now be set using either environment variables or literals.
  • dagster-spark

    • dagster_spark.create_spark_solid now accepts a required_resource_keys argument, which enables setting up a step launcher for Spark solids, like the emr_pyspark_step_launcher.

Bugfix

  • dagster pipeline execute now sets a non-zero exit code when pipeline execution fails.

0.7.16

Bugfix

  • Enabled NoOpComputeLogManager to be configured as the compute_logs implementation in dagster.yaml
  • Suppressed noisy error messages in logs from skipped steps