You are viewing an unreleased or outdated version of the documentation

Changelog#

1.6.9 (core) / 0.22.8 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes#

  • Fixed issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fix a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow github actions to checkout branch from forked repo for docs changes (ci fix) (thanks hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
  • Improved guides and reference to better running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts. Documentation
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] New Asset Decorator @sling_assets and Resource SlingConnectionResource have been added for the [dagster-embedded-elt.sling](http://dagster-embedded-elt.sling) package. Deprecated build_sling_asset, SlingSourceConnection and SlingTargetConnection.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling kubernetes services if the agent was interrupted during the middle of being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

0.11.8#

New#

  • The @solid decorator can now wrap a function without a context argument, if no context information is required. For example, you can now do:
@solid
def basic_solid():
    return 5

@solid
def solid_with_inputs(x, y):
    return x + y

however, if your solid requires config or resources, then you will receive an error at definition time.

  • It is now simpler to provide structured metadata on events. Events that take a metadata_entries argument may now instead accept a metadata argument, which should allow for a more convenient API. The metadata argument takes a dictionary with string labels as keys and EventMetadata values. Some base types (str, int, float, and JSON-serializable list/dicts) are also accepted as values and will be automatically coerced to the appropriate EventMetadata value. For example:
@solid
def old_metadata_entries_solid(df):
   yield AssetMaterialization(
       "my_asset",
       metadata_entries=[
           EventMetadataEntry.text("users_table", "table name"),
           EventMetadataEntry.int(len(df), "row count"),
           EventMetadataEntry.url("http://mysite/users_table", "data url")
       ]
   )

@solid
def new_metadata_solid(df):
    yield AssetMaterialization(
       "my_asset",
       metadata={
           "table name": "users_table",
           "row count": len(df),
           "data url": EventMetadata.url("http://mysite/users_table")
       }
   )

  • The dagster-daemon process now has a --heartbeat-tolerance argument that allows you to configure how long the process can run before shutting itself down due to a hanging thread. This parameter can be used to troubleshoot failures with the daemon process.
  • When creating a schedule from a partition set using PartitionSetDefinition.create_schedule_definition, the partition_selector function that determines which partition to use for a given schedule tick can now return a list of partitions or a single partition, allowing you to create schedules that create multiple runs for each schedule tick.

Bugfixes#

  • Runs submitted via backfills can now correctly resolve the source run id when loading inputs from previous runs instead of encountering an unexpected KeyError.
  • Using nested Dict and Set types for solid inputs/outputs now works as expected. Previously a structure like Dict[str, Dict[str, Dict[str, SomeClass]]] could result in confusing errors.
  • Dagstermill now correctly loads the config for aliased solids instead of loading from the incorrect place which would result in empty solid_config.
  • Error messages when incomplete run config is supplied are now more accurate and precise.
  • An issue that would cause map and collect steps downstream of other map and collect steps to mysteriously not execute when using multiprocess executors has been resolved.

Documentation#

0.11.7#

New#

  • For pipelines with tags defined in code, display these tags in the Dagit playground.
  • On the Dagit asset list page, use a polling query to regularly refresh the asset list.
  • When viewing the Dagit asset list, persist the user’s preference between the flattened list view and the directory structure view.
  • Added solid_exception on HookContext which returns the actual exception object thrown in a failed solid. See the example “Accessing failure information in a failure hook“ for more details.
  • Added solid_output_values on HookContext which returns the computed output values.
  • Added make_values_resource helper for defining a resource that passes in user-defined values. This is useful when you want multiple solids to share values. See the example for more details.
  • StartupProbes can now be set to disabled in Helm charts. This is useful if you’re running on a version earlier than Kubernetes 1.16.

Bugfixes#

  • Fixed an issue where partial re-execution was not referencing the right source run and failed to load the correct persisted outputs.
  • When running Dagit with --path-prefix, our color-coded favicons denoting the success or failure of a run were not loading properly. This has been fixed.
  • Hooks and tags defined on solid invocations now work correctly when executing a pipeline with a solid subselection
  • Fixed an issue where heartbeats from the dagster-daemon process would not appear on the Status page in dagit until the process had been running for 30 seconds
  • When filtering runs, Dagit now suggests all “status:” values and other auto-completions in a scrolling list
  • Fixed asset catalog where nested directory structure links flipped back to the flat view structure

Community Contributions#

  • [Helm] The Dagit service port is now configurable (thanks @trevenrawr!)
  • [Docs] Cleanup & updating visual aids (thanks @keypointt!)

Experimental#

  • [Dagster-GraphQL] Added an official Python Client for Dagster’s GraphQL API (GH issue #2674). Docs can be found here

Documentation#

  • Fixed a confusingly-worded header on the Solids/Pipelines Testing page

0.11.6#

Breaking Changes#

  • DagsterInstance.get() no longer falls back to an ephemeral instance if DAGSTER_HOME is not set. We don’t expect this to break normal workflows. This change allows our tooling to be more consistent around it’s expectations. If you were relying on getting an ephemeral instance you can use DagsterInstance.ephemeral() directly.
  • Undocumented attributes on HookContext have been removed. step_key and mode_def have been documented as attributes.

New#

  • Added a permanent, linkable panel in the Run view in Dagit to display the raw compute logs.
  • Added more descriptive / actionable error messages throughout the config system.
  • When viewing a partitioned asset in Dagit, display only the most recent materialization for a partition, with a link to view previous materializations in a dialog.
  • When viewing a run in Dagit, individual log line timestamps now have permalinks. When loading a timestamp permalink, the log table will highlight and scroll directly to that line.
  • The default config_schema for all configurable objects - solids, resources, IO managers, composite solids, executors, loggers - is now Any. This means that you can now use configuration without explicitly providing a config_schema. Refer to the docs for more details: https://docs.dagster.io/concepts/configuration/config-schema.
  • When launching an out of process run, resources are no longer initialized in the orchestrating process. This should give a performance boost for those using out of process execution with heavy resources (ie, spark context).
  • input_defs and output_defs on @solid will now flexibly combine data that can be inferred from the function signature that is not declared explicitly via InputDefinition / OutputDefinition. This allows for more concise defining of solids with reduced repetition of information.
  • [Helm] Postgres storage configuration now supports connection string parameter keywords.
  • The Status page in Dagit will now display errors that were surfaced in the dagster-daemon process within the last 5 minutes. Previously, it would only display errors from the last 30 seconds.
  • Hanging sensors and schedule functions will now raise a timeout exception after 60 seconds, instead of crashing the dagster-daemon process.
  • The DockerRunLauncher now accepts a container_kwargs config parameter, allowing you to specify any argument to the run container that can be passed into the Docker containers.run method. See https://docker-py.readthedocs.io/en/stable/containers.html#docker.models.containers.ContainerCollection.run for the full list of available options.
  • Added clearer error messages for when a Partition cannot be found in a Partition Set.
  • The celery_k8s_job_executor now accepts a job_wait_timeout allowing you to override the default of 24 hours.

Bugfixes#

  • Fixed the raw compute logs in Dagit, which were not live updating as the selected step was executing.
  • Fixed broken links in the Backfill table in Dagit when Dagit is started with a --prefix-path argument.
  • Showed failed status of backfills in the Backfill table in Dagit, along with an error stack trace. Previously, the backfill jobs were stuck in a Requested state.
  • Previously, if you passed a non-required Field to the output_config_schema or input_config_schema arguments of @io_manager, the config would still be required. Now, the config is not required.
  • Fixed nested subdirectory views in the Assets catalog, where the view switcher would flip back from the directory view to the flat view when navigating into subdirectories.
  • Fixed an issue where the dagster-daemon process would crash if it experienced a transient connection error while connecting to the Dagster database.
  • Fixed an issue where the dagster-airflow scaffold command would raise an exception if a preset was specified.
  • Fixed an issue where Dagit was not including the error stack trace in the Status page when a repository failed to load.

0.11.5#

New#

  • Resources in a ModeDefinition that are not required by a pipeline no longer require runtime configuration. This should make it easier to share modes or resources among multiple pipelines.
  • Dagstermill solids now support retries when a RetryRequested is yielded from a notebook using dagstermill.yield_event.
  • In Dagit, the asset catalog now supports both a flattened view of all assets as well as a hierarchical directory view.
  • In Dagit, the asset catalog now supports bulk wiping of assets.

Bugfixes#

  • In the Dagit left nav, schedules and sensors accurately reflect the filtered repositories.
  • When executing a pipeline with a subset of solids, the config for solids not included in the subset is correctly made optional in more cases.
  • URLs were sometimes not prefixed correctly when running Dagit using the --path-prefix option, leading to failed GraphQL requests and broken pages. This bug was introduced in 0.11.4, and is now fixed.
  • The update_timestamp column in the runs table is now updated with a UTC timezone, making it consistent with the create_timestamp column.
  • In Dagit, the main content pane now renders correctly on ultra-wide displays.
  • The partition run matrix on the pipeline partition tab now shows step results for composite solids and dynamically mapped solids. Previously, the step status was not shown at all for these solids.
  • Removed dependency constraint of dagster-pandas on pandas. You can now include any version of pandas. (https://github.com/dagster-io/dagster/issues/3350)
  • Removed dependency on requests in dagster. Now only dagit depends on requests.
  • Removed dependency on pyrsistent in dagster.

Documentation#

  • Updated the “Deploying to Airflow” documentation to reflect the current state of the system.

0.11.4#

Community Contributions#

  • Fix typo in --config help message (thanks @pawelad !)

Breaking Changes#

  • Previously, when retrieving the outputs from a run of execute_pipeline, the system would use the io manager that handled each output to perform the retrieval. Now, when using execute_pipeline with the default in-process executor, the system directly captures the outputs of solids for use with the result object returned by execute_pipeline. This may lead to slightly different behavior when retrieving outputs if switching between executors and using custom IO managers.

New#

  • The K8sRunLauncher and CeleryK8sRunLauncher now add a dagster/image tag to pipeline runs to document the image used. The DockerRunLauncher has also been modified to use this tag (previously it used docker/image).
  • In Dagit, the left navigation is now collapsible on smaller viewports. You can use the . key shortcut to toggle visibility.
  • @solid can now decorate async def functions.

Bugfixes#

  • In Dagit, a GraphQL error on partition sets related to missing fragment PartitionGraphFragment has been fixed.
  • The compute log manager now handles base directories containing spaces in the path.
  • Fixed a bug where re-execution was not working if the initial execution failed, and execution was delegated to other machines/process (e.g. using the multiprocess executor)
  • The same solid can now collect over multiple dynamic outputs