Changelog#

1.6.9 (core) / 0.22.9 (libraries)#

New#

  • [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
  • Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
  • Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
  • [dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0
  • [embedded-elt] @sling_assets accepts an optional name parameter for the underlying op
  • [dagster-openai] dagster-openai library is now available.
  • [dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
  • Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
  • [ui] Search now uses a cache that persists across pageloads which should greatly improve search performance for very large orgs.
  • [ui] groups/code locations in the asset graph’s sidebar are now sorted alphabetically.

Bugfixes#

  • Fixed an issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input / output run config.
  • Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
  • Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
  • [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
  • [asset checks] Fixed a bug with asset checks in step launchers.
  • [embedded-elt] Fixed a bug when creating a SlingConnectionResource where a blank keyword argument would be emitted as an environment variable.
  • [dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
  • [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
  • [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.

Community Contributions#

  • [docs] fixed typo in embedded-elt.mdx (thanks @cameronmartin)!
  • [dagster-databricks] log the url for the run of a databricks job (thanks @smats0n)!
  • Fix missing partition property (thanks @christeefy)!
  • Add op_tags to @observable_source_asset decorator (thanks @maxfirman)!
  • [docs] typo in MultiPartitionMapping docs (thanks @dschafer)
  • Allow GitHub Actions to checkout branch from forked repo for docs changes (ci fix) (thanks @hainenber)!

Experimental#

  • [asset checks] UI performance of asset checks related pages has been improved.
  • [dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.

Documentation#

  • Added example of writing compute logs to AWS S3 when customizing agent configuration.
  • "Hello, Dagster" is now "Dagster Quickstart" with the option to use a GitHub Codespace to explore Dagster.
  • Improved the guides and reference to better cover running multiple isolated agents with separate queues on ECS.

Dagster Cloud#

  • Microsoft Teams is now supported for alerts.
  • A send sample alert button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.

1.6.8 (core) / 0.22.8 (libraries)#

Bugfixes#

  • [dagster-embedded-elt] Fixed a bug in the SlingConnectionResource that raised an error when connecting to a database.

Experimental#

  • [asset checks] graph_multi_assets with check_specs now support subsetting.

1.6.7 (core) / 0.22.7 (libraries)#

New#

  • Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
  • dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
  • [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
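
A subprocess could detect the development context with a check like the one below. This is a minimal sketch: the entry above only states that DAGSTER_IS_DEV_CLI is set, so the code tests for presence rather than for a particular value, and the helper name `launched_from_dagster_dev` is hypothetical.

```python
import os


def launched_from_dagster_dev(environ=None):
    """Return True if DAGSTER_IS_DEV_CLI is present in the environment.

    Only presence is checked; the value that `dagster dev` assigns is not
    assumed here.
    """
    env = os.environ if environ is None else environ
    return "DAGSTER_IS_DEV_CLI" in env


if __name__ == "__main__":
    if launched_from_dagster_dev():
        print("launched in a development context")
    else:
        print("not launched by dagster dev")
```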

Bugfixes#

  • [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
  • AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.

Experimental#

  • The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
  • [dagster-embedded-elt] New asset decorator @sling_assets and resource SlingConnectionResource have been added for the dagster-embedded-elt.sling package. Deprecated build_sling_asset, SlingSourceConnection and SlingTargetConnection.
  • Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.

Documentation#

  • Fixed reference documentation for isolated agents in ECS.
  • Corrected an example in the Airbyte Cloud documentation.
  • Added API links to OSS Helm deployment guide.
  • Fixed in-line pragmas showing up in the documentation.

Dagster Cloud#

  • Alerts now support Microsoft Teams.
  • [ECS] Fixed an issue where code locations could be left undeleted.
  • [ECS] ECS agents now support setting multiple replicas per code server.
  • [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
  • [Users] Added a new column “Licensed role” that shows the user's most permissive role.

1.6.6 (core) / 0.22.6 (libraries)#

New#

  • Dagster officially supports Python 3.12.
  • dagster-polars has been added as an integration. Thanks @danielgafni!
  • [dagster-dbt] @dbt_assets now supports loading projects with semantic models.
  • [dagster-dbt] @dbt_assets now supports loading projects with model versions.
  • [dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
  • [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
  • [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.

Bugfixes#

  • Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
  • Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
  • [ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
  • [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
  • [dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!

Experimental#

  • Observable source assets can now yield ObserveResults with no data_version.
  • You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy.
  • [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.

Documentation#

  • Updated docs to reflect newly-added support for Python 3.12.

Dagster Cloud#

  • [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if the agent was interrupted while it was being terminated.

1.6.5 (core) / 0.22.5 (libraries)#

New#

  • Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
  • [dagster-k8s] Include k8s pod debug info in run worker failure messages.
  • [dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
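
Lexicographical order compares partition keys as strings, which can differ from numeric order. The short illustration below uses plain Python and made-up partition keys; it does not call any Dagster API.

```python
# Hypothetical partition keys; lexicographical order is plain string order.
numeric_keys = ["10", "9", "2"]
date_keys = ["2024-01-10", "2024-01-09", "2023-12-31"]

# Strings are compared character by character, so "10" sorts before "2".
print(sorted(numeric_keys))  # ['10', '2', '9']

# Zero-padded ISO dates sort identically in string and chronological order.
print(sorted(date_keys))  # ['2023-12-31', '2024-01-09', '2024-01-10']
```

Zero-padded keys are one way to keep submission order aligned with the intended numeric or chronological order.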

Bugfixes#

  • A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
  • [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
  • [instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
  • [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
  • [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.

Experimental#

  • @observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
  • [auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
  • [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.

Documentation#

  • Fixed an error in our asset checks docs. Thanks @vaharoni!
  • Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
  • Fixed an issue on the Hello Dagster! guide that prevented it from loading.
  • Add specific capabilities of the Airflow integration to the Airflow integration page.
  • Re-arranged sections in the I/O manager concept page to make info about using I/O versus resources more prominent.

0.7.0 "Waiting to Exhale"#

Breaking Changes

There are a substantial number of breaking changes in the 0.7.0 release. Please see 070_MIGRATION.md for instructions regarding migrating old code.

Scheduler

  • The scheduler configuration has been moved from the @schedules decorator to DagsterInstance. Existing schedules that have been running are no longer compatible with current storage. To migrate, remove the scheduler argument on all @schedules decorators:

    instead of:

    @schedules(scheduler=SystemCronScheduler)
    def define_schedules():
      ...
    

    Remove the scheduler argument:

    @schedules
    def define_schedules():
      ...
    

    Next, configure the scheduler on your instance by adding the following to $DAGSTER_HOME/dagster.yaml:

    scheduler:
      module: dagster_cron.cron_scheduler
      class: SystemCronScheduler
    

    Finally, if you had any existing schedules running, delete the existing $DAGSTER_HOME/schedules directory and run dagster schedule wipe && dagster schedule up to re-instantiate schedules in a valid state.

  • The should_execute and environment_dict_fn arguments to ScheduleDefinition now have a required first argument context, representing the ScheduleExecutionContext.
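
As a rough sketch of the new signatures, both callables receive the context as their first argument. The weekday rule and the returned dictionary below are hypothetical, and the snippet deliberately avoids importing Dagster so that it runs standalone.

```python
import datetime


def should_execute(context):
    """Hypothetical should_execute: takes the schedule context first."""
    # Illustrative rule only: skip weekends.
    return datetime.date.today().weekday() < 5


def environment_dict_fn(context):
    """Hypothetical environment_dict_fn: takes the schedule context first."""
    return {"solids": {}}


# In real code these would be passed to ScheduleDefinition as
# should_execute=should_execute and environment_dict_fn=environment_dict_fn.
```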

Config System Changes

  • In the config system, Dict has been renamed to Shape; List to Array; Optional to Noneable; and PermissiveDict to Permissive. The motivation here is to clearly delineate config use cases versus cases where you are using types as the inputs and outputs of solids as well as python typing types (for mypy and friends). We believe this will be clearer to users in addition to simplifying our own implementation and internal abstractions.

    Our recommended fix is not to use Shape and Array, but instead to use our new condensed config specification API. This allows one to use bare dictionaries instead of Shape, lists with one member instead of Array, bare types instead of Field with a single argument, and Python primitive types (int, bool, etc.) instead of the Dagster equivalents. These result in dramatically less verbose config specs in most cases.

    So instead of

    from dagster import Shape, Field, Int, Array, String
    # ... code
    config=Shape({  # Dict prior to change
        'some_int': Field(Int),
        'some_list': Field(Array[String]),  # List prior to change
    })
    

    one can instead write:

    config={'some_int': int, 'some_list': [str]}
    

    No imports and much simpler, cleaner syntax.

  • config_field is no longer a valid argument on solid, SolidDefinition, ExecutorDefinition, executor, LoggerDefinition, logger, ResourceDefinition, resource, system_storage, and SystemStorageDefinition. Use config instead.

  • For composite solids, the config_fn no longer takes a ConfigMappingContext, and the context has been deleted. To upgrade, remove the first argument to config_fn.

    So instead of

    @composite_solid(config={}, config_fn=lambda context, config: {})
    

    one must instead write:

    @composite_solid(config={}, config_fn=lambda config: {})
    
  • Field takes an is_required rather than an is_optional argument. This is to avoid confusion with Python's typing and Dagster's definition of Optional, which indicates None-ability, rather than existence. is_optional is deprecated and will be removed in a future version.
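
The difference between existence and None-ability can be illustrated in plain Python. This is a toy sketch, not Dagster's validation logic; `check_field` and its parameters are hypothetical names chosen for the illustration.

```python
def check_field(config, key, is_required=True, noneable=False):
    """Toy check separating 'must be present' from 'may be None'."""
    if key not in config:
        # Existence: governed by is_required.
        return not is_required
    if config[key] is None:
        # None-ability: governed separately from existence.
        return noneable
    return True


# Required but not Noneable: the key must exist and must not be None.
print(check_field({"port": 5432}, "port"))
print(check_field({}, "port"))
print(check_field({"port": None}, "port"))

# Not required but Noneable: the key may be absent, or present as None.
print(check_field({}, "port", is_required=False, noneable=True))
```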

Required Resources

  • All solids, types, and config functions that use a resource must explicitly list that resource using the argument required_resource_keys. This is to enable efficient resource management during pipeline execution, especially in a multiprocessing or remote execution environment.

  • The @system_storage decorator now requires argument required_resource_keys, which was previously optional.

Dagster Type System Changes

  • dagster.Set and dagster.Tuple can no longer be used within the config system.
  • Dagster types are now instances of DagsterType, rather than a class that inherits from RuntimeType. Instead of dynamically generating a class to create a custom runtime type, just create an instance of a DagsterType. The type checking function is now an argument to the DagsterType, rather than an abstract method that has to be implemented in a subclass.
  • RuntimeType has been renamed to DagsterType, which is now an encouraged API for type creation.
  • Core type check function of DagsterType can now return a naked bool in addition to a TypeCheck object.
  • type_check_fn on DagsterType (formerly type_check and RuntimeType, respectively) now takes a first argument context of type TypeCheckContext in addition to the second argument of value.
  • define_python_dagster_type has been eliminated in favor of PythonObjectDagsterType.
  • dagster_type has been renamed to usable_as_dagster_type.
  • as_dagster_type has been removed and similar capabilities added as make_python_type_usable_as_dagster_type.
  • PythonObjectDagsterType and usable_as_dagster_type no longer take a type_check argument. If a custom type_check is needed, use DagsterType.
  • As a consequence of these changes, if you were previously using dagster_pyspark or dagster_pandas and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type annotations to functions decorated with @solid to indicate that they are input or output types for a solid, you will need to call make_python_type_usable_as_dagster_type from your code in order to map the Python types to the Dagster types, or just use the Dagster types (dagster_pandas.DataFrame instead of pandas.DataFrame) directly.
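
A type check function under the new scheme might look like the sketch below: it takes the context first and the value second, and returns a plain bool. The name `is_positive_int` is hypothetical, and the snippet omits the Dagster import so it runs standalone; in real code the function would be supplied as the type_check_fn when creating a DagsterType instance.

```python
def is_positive_int(context, value):
    """Hypothetical type check: context first, value second, returns a bool."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0
```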

Other

  • We no longer publish base Docker images. Please see the updated deployment docs for an example Dockerfile off of which you can work.
  • step_metadata_fn has been removed from SolidDefinition & @solid.
  • SolidDefinition & @solid now takes tags and enforces that values are strings or are safely encoded as JSON. metadata is deprecated and will be removed in a future version.
  • resource_mapper_fn has been removed from SolidInvocation.

New

  • Dagit now includes a much richer execution view, with a Gantt-style visualization of step execution and a live timeline.

  • Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries are now tested against 3.8. Note that several of our upstream dependencies have yet to publish wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some dependencies from source.

  • dagster/priority tags can now be used to prioritize the order of execution for the built-in in-process and multiprocess engines.

  • dagster-postgres storages can now be configured with separate arguments and environment variables, such as:

    run_storage:
      module: dagster_postgres.run_storage
      class: PostgresRunStorage
      config:
        postgres_db:
          username: test
          password:
            env: ENV_VAR_FOR_PG_PASSWORD
          hostname: localhost
          db_name: test
    
  • Support for RunLaunchers on DagsterInstance allows for execution to be "launched" outside of the Dagit/Dagster process. As one example, this is used by dagster-k8s to submit pipeline execution as a Kubernetes Job.

  • Added support for adding tags to runs initiated from the Playground view in Dagit.

  • Added @monthly_schedule decorator.

  • Added Enum.from_python_enum helper to wrap Python enums for config. (Thanks @kdungs!)

  • [dagster-bash] The Dagster bash solid factory now passes along kwargs to the underlying solid construction, and now has a single Nothing input by default to make it easier to create a sequencing dependency. Also, logs are now buffered by default to make execution less noisy.

  • [dagster-aws] We've improved our EMR support substantially in this release. The dagster_aws.emr library now provides an EmrJobRunner with various utilities for creating EMR clusters, submitting jobs, and waiting for jobs/logs. We also now provide an emr_pyspark_resource, which together with the new @pyspark_solid decorator makes moving pyspark execution from your laptop to EMR as simple as changing modes.

  • [dagster-pandas] Added create_dagster_pandas_dataframe_type, PandasColumn, and Constraint APIs so that users can create custom types which perform column validation, dataframe validation, summary statistics emission, and dataframe serialization/deserialization.

  • [dagster-gcp] GCS is now supported for system storage, as well as being supported with the Dask executor. (Thanks @habibutsu!) Bigquery solids have also been updated to support the new API.

Bugfix

  • Ensured that all implementations of RunStorage clean up pipeline run tags when a run is deleted. Requires a storage migration, using dagster instance migrate.
  • The multiprocess and Celery engines now handle solid subsets correctly.
  • The multiprocess and Celery engines will now correctly emit skip events for steps downstream of failures and other skips.
  • The @solid and @lambda_solid decorators now correctly wrap their decorated functions, in the sense of functools.wraps.
  • Performance improvements in Dagit when working with runs with large configurations.
  • The Helm chart in dagster_k8s has been hardened against various failure modes and is now compatible with Helm 2.
  • SQLite run and event log storages are more robust to concurrent use.
  • Improvements to error messages and to handling of user code errors in input hydration and output materialization logic.
  • Fixed an issue where the Airflow scheduler could hang when attempting to load dagster-airflow pipelines.
  • We now handle our SQLAlchemy connections in a more canonical way (thanks @zzztimbo!).
  • Fixed an issue using S3 system storage with certain custom serialization strategies.
  • Fixed an issue leaking orphan processes from compute logging.
  • Fixed an issue leaking semaphores from Dagit.
  • Setting the raise_error flag in execute_pipeline now actually raises user exceptions instead of a wrapper type.
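
For readers unfamiliar with functools.wraps, the sketch below shows what wrapping preserves. The decorator `traced` is a hypothetical stand-in used only for illustration; it is not the @solid implementation.

```python
import functools


def traced(fn):
    """Hypothetical decorator that preserves the wrapped function's metadata."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@traced
def add_one(num):
    """Add one to num."""
    return num + 1


# Without functools.wraps these would report "wrapper" and None.
print(add_one.__name__)  # add_one
print(add_one.__doc__)  # Add one to num.
```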

Documentation

  • Our docs have been reorganized and expanded (thanks @habibutsu, @vatervonacht, @zzztimbo). We'd love feedback and contributions!

Thank you

Thank you to all of the community contributors to this release! In alphabetical order: @habibutsu, @kdungs, @vatervonacht, @zzztimbo.

0.6.9#

Bugfix

  • Improved handling of SQLite concurrency issues uncovered while using concurrent nodes in Airflow
  • Fixed sqlalchemy warnings (thanks @zzztimbo!)
  • Fixed Airflow integration issue where a Dagster child process triggered a signal handler of a parent Airflow process via a process fork
  • Fixed GCS and AWS intermediate store implementations to be compatible with read/write mode serialization strategies
  • Improve test stability

Documentation

  • Improved descriptions for setting up the cron scheduler (thanks @zzztimbo!)

0.6.8#

New

  • Added the dagster-github library, a community contribution from @Ramshackle-Jamathon and @k-mahoney!

dagster-celery

  • Simplified and improved config handling.
  • An engine event is now emitted when the engine fails to connect to a broker.

Bugfix

  • Fixes a file descriptor leak when running many concurrent dagster-graphql queries (e.g., for backfill).
  • The @pyspark_solid decorator now handles inputs correctly.
  • The handling of solid compute functions that accept kwargs but which are decorated with explicit input definitions has been rationalized.
  • Fixed race conditions in concurrent execution using SQLite event log storage, uncovered by upstream improvements in the Python inotify library we use.

Documentation

  • Improved error messages when using system storages that don't fulfill executor requirements.

0.6.7#

New

  • We are now more permissive when specifying configuration schema in order to make constructing configuration schema more concise.
  • When specifying the value of scalar inputs in config, one can now specify that value directly under the key of the input, rather than having to embed it within a value key.
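
Expressed as Python dictionaries, the two forms of scalar input config might look as follows. The solid name `add_one` and the input name `num` are hypothetical.

```python
# Previous form: the scalar is embedded under a "value" key.
verbose = {"solids": {"add_one": {"inputs": {"num": {"value": 5}}}}}

# Newly accepted form: the scalar is given directly under the input name.
concise = {"solids": {"add_one": {"inputs": {"num": 5}}}}
```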

Breaking

  • The implementation of SQL-based event log storages has been consolidated, which has entailed a schema change. If you have event logs stored in a Postgres- or SQLite-backed event log storage, and you would like to maintain access to these logs, you should run dagster instance migrate. To check what event log storages you are using, run dagster instance info.
  • Type matches on both sides of an InputMapping or OutputMapping are now enforced.

New

  • Dagster is now tested on Python 3.8
  • Added the dagster-celery library, which implements a Celery-based engine for parallel pipeline execution.
  • Added the dagster-k8s library, which includes a Helm chart for a simple Dagit installation on a Kubernetes cluster.

Dagit

  • The Explore UI now allows you to render a subset of a large DAG via a new solid query bar that accepts terms like solid_name+* and +solid_name+. When viewing very large DAGs, nothing is displayed by default and * produces the original behavior.
  • Performance improvements in the Explore UI and config editor for large pipelines.
  • The Explore UI now includes a zoom slider that makes it easier to navigate large DAGs.
  • Dagit pages now render more gracefully in the presence of inconsistent run storage and event logs.
  • Improved handling of GraphQL errors and backend programming errors.
  • Minor display improvements.

dagster-aws

  • A default prefix is now configurable on APIs that use S3.
  • S3 APIs now parametrize region_name and endpoint_url.

dagster-gcp

  • A default prefix is now configurable on APIs that use GCS.

dagster-postgres

  • Performance improvements for Postgres-backed storages.

dagster-pyspark

  • Pyspark sessions may now be configured to be held open after pipeline execution completes, to enable extended test cases.

dagster-spark

  • spark_outputs must now be specified when initializing a SparkSolidDefinition, rather than in config.
  • Added new create_spark_solid helper and new spark_resource.
  • Improved EMR implementation.

Bugfix

  • Fixed an issue retrieving output values using SolidExecutionResult (e.g., in test) for dagster-pyspark solids.
  • Fixes an issue when expanding composite solids in Dagit.
  • Better errors when solid names collide.
  • Config mapping in composite solids now works as expected when the composite solid has no top level config.
  • Compute log filenames are now guaranteed not to exceed the POSIX limit of 255 chars.
  • Fixes an issue when copying and pasting solid names from Dagit.
  • Termination now works as expected in the multiprocessing executor.
  • The multiprocessing executor now executes parallel steps in the expected order.
  • The multiprocessing executor now correctly handles solid subsets.
  • Fixed a bad error condition in dagster_ssh.sftp_solid.
  • Fixed a bad error message giving incorrect log level suggestions.

Documentation

  • Minor fixes and improvements.

Thank you

Thank you to all of the community contributors to this release! In alphabetical order: @cclauss, @deem0n, @irabinovitch, @pseudoPixels, @Ramshackle-Jamathon, @rparrapy, @yamrzou.

0.6.6#

Breaking

  • The selector argument to PipelineDefinition has been removed. This API made it possible to construct a PipelineDefinition in an invalid state. Use PipelineDefinition.build_sub_pipeline instead.

New

  • Added the dagster_prometheus library, which exposes a basic Prometheus resource.
  • Dagster Airflow DAGs may now use GCS instead of S3 for storage.
  • Expanded interface for schedule management in Dagit.

Dagit

  • Performance improvements when loading, displaying, and editing config for large pipelines.
  • Smooth scrolling zoom in the explore tab replaces the previous two-step zoom.
  • No longer depends on internet fonts to run, allowing fully offline dev.
  • Typeahead behavior in search has improved.
  • Invocations of composite solids remain visible in the sidebar when the solid is expanded.
  • The config schema panel now appears when the config editor is first opened.
  • Interface now includes hints for autocompletion in the config editor.
  • Improved display of solid inputs and output in the explore tab.
  • Provides visual feedback while filter results are loading.
  • Better handling of pipelines that aren't present in the currently loaded repo.

Bugfix

  • Dagster Airflow DAGs previously could crash while handling Python errors in DAG logic.
  • Step failures when running Dagster Airflow DAGs were previously not being surfaced as task failures in Airflow.
  • Dagit could previously get into an invalid state when switching pipelines in the context of a solid subselection.
  • frozenlist and frozendict now pass Dagster's parameter type checks for list and dict.
  • The GraphQL playground in Dagit is now working again.

Nits

  • Dagit now prints its pid when it loads.
  • Third-party dependencies have been relaxed to reduce the risk of version conflicts.
  • Improvements to docs and example code.