[ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an AssetSelection.
Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
[dagster-snowflake] dagster-snowflake now requires snowflake-connector-python>=3.4.0.
[embedded-elt] @sling_assets accepts an optional name parameter for the underlying op.
[dagster-openai] dagster-openai library is now available.
[dagster-dbt] Added a new setting on DagsterDbtTranslatorSettings called enable_duplicate_source_asset_keys that allows users to set duplicate asset keys for their dbt sources. Thanks @hello-world-bfree!
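For example, a minimal sketch of opting into the setting (the manifest path and asset function here are illustrative):

from dagster import AssetExecutionContext
from dagster_dbt import (
    DagsterDbtTranslator,
    DagsterDbtTranslatorSettings,
    DbtCliResource,
    dbt_assets,
)

# Allow multiple dbt sources to map to the same Dagster asset key.
translator = DagsterDbtTranslator(
    settings=DagsterDbtTranslatorSettings(enable_duplicate_source_asset_keys=True)
)

@dbt_assets(manifest="target/manifest.json", dagster_dbt_translator=translator)
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()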
Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
[ui] Search now uses a cache that persists across page loads, which should greatly improve search performance for very large organizations.
[ui] Groups and code locations in the asset graph’s sidebar are now sorted alphabetically.
Fixed an issue where the input/output schemas of configurable IOManagers could be ignored when providing explicit input/output run config.
Fixed an issue where enum values could not properly have a default value set in a ConfigurableResource.
Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
[auto-materialize] Fixed an issue introduced in 1.6.7 where updates to ExternalAssets would be ignored when using AutoMaterializePolicies which depended on parent updates.
[asset checks] Fixed a bug with asset checks in step launchers.
[embedded-elt] Fixed a bug where creating a SlingConnectionResource with a blank keyword argument would emit it as an environment variable.
[dagster-dbt] Fixed a bug where emitting events from dbt source freshness would cause an error.
[ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
[ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.
[asset checks] UI performance of asset checks related pages has been improved.
[dagster-dbt] The class DbtArtifacts has been added for managing the behavior of rebuilding the manifest during development but expecting a pre-built one in production.
Microsoft Teams is now supported for alerts. See the documentation for details.
A “send sample alert” button now exists on both the alert policies page and in the alert policies editor, making it easier to debug and configure alerts without having to wait for an event to trigger them.
Added a new run_retries.retry_on_op_or_asset_failures setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
dagster dev now sets the environment variable DAGSTER_IS_DEV_CLI allowing subprocesses to know that they were launched in a development context.
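For example, user code can branch on this variable to pick lighter-weight behavior locally; the resource key names here are illustrative:

import os

# True when this process (or a parent) was launched by `dagster dev`.
is_dev = os.getenv("DAGSTER_IS_DEV_CLI") is not None

# Illustrative: choose a cheaper IO manager during local development.
io_manager_key = "local_io_manager" if is_dev else "prod_io_manager"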
[ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
[ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
AssetSpec previously dropped the metadata and code_version fields, resulting in them not being attached to the corresponding asset. This has been fixed.
The new @multi_observable_source_asset decorator enables defining a set of assets that can be observed together with the same function.
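A rough sketch of the shape of this API, assuming the decorator accepts AssetSpec objects via a specs argument and the body yields one ObserveResult per observed asset:

from dagster import AssetSpec, DataVersion, ObserveResult, multi_observable_source_asset

@multi_observable_source_asset(specs=[AssetSpec("raw_orders"), AssetSpec("raw_customers")])
def observe_raw_tables():
    # Illustrative: report a data version for each external table.
    yield ObserveResult(asset_key="raw_orders", data_version=DataVersion("v1"))
    yield ObserveResult(asset_key="raw_customers", data_version=DataVersion("v2"))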
[dagster-embedded-elt] A new asset decorator @sling_assets and a new resource SlingConnectionResource have been added to the dagster-embedded-elt.sling package (see the sketch below). build_sling_asset, SlingSourceConnection, and SlingTargetConnection have been deprecated.
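A sketch of the new APIs; the connection parameters, replication config, and the replicate call are illustrative, and name= is the optional op name mentioned above:

from dagster_embedded_elt.sling import (
    SlingConnectionResource,
    SlingResource,
    sling_assets,
)

# Illustrative source and target connections.
sling = SlingResource(
    connections=[
        SlingConnectionResource(name="MY_POSTGRES", type="postgres", host="localhost", database="app"),
        SlingConnectionResource(name="MY_DUCKDB", type="duckdb", instance="/tmp/analytics.duckdb"),
    ]
)

replication_config = {
    "source": "MY_POSTGRES",
    "target": "MY_DUCKDB",
    "streams": {"public.orders": None},
}

@sling_assets(replication_config=replication_config, name="sling_orders_sync")
def sling_orders_assets(context, sling: SlingResource):
    yield from sling.replicate(context=context)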
Added support for op-concurrency aware run dequeuing for the QueuedRunCoordinator.
dagster-polars has been added as an integration. Thanks @danielgafni!
[dagster-dbt] @dbt_assets now supports loading projects with semantic models.
[dagster-dbt] @dbt_assets now supports loading projects with model versions.
[dagster-dbt] get_asset_key_for_model now supports retrieving asset keys for seeds and snapshots. Thanks @aksestok!
[dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
[UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.
Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
Fixed an issue with the type annotations on the @asset decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
[ui] On the asset graph, nodes are slightly wider allowing more text to be displayed, and group names are no longer truncated.
[ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
[dagster-k8s] Fixed an issue where setting the security_context field on the k8s_job_executor didn't correctly set the security context on the launched step pods. Thanks @krgn!
Observable source assets can now yield ObserveResults with no data_version.
You can now include FreshnessPolicys on observable source assets. These assets will be considered “Overdue” when the latest value of the “dagster/data_time” metadata entry is older than what’s allowed by the freshness policy.
[ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
[kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling kubernetes services if the agent was interrupted during the middle of being terminated.
Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
[dagster-k8s] Include k8s pod debug info in run worker failure messages.
[dagster-dbt] Events emitted by DbtCliResource now include metadata from the dbt adapter response. This includes fields like rows_affected, query_id from the Snowflake adapter, or bytes_processed from the BigQuery adapter.
A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
[dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the k8s_job_executor.
[instigator-tick-logs] Fixed an issue where invoking context.log.exception in a sensor or schedule did not properly capture exception information.
[asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
[dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.
@observable_source_asset-decorated functions can now return an ObserveResult. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets.
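For example, a minimal sketch in which the observation function fingerprints an external file (the fingerprint and metadata values are illustrative):

from dagster import DataVersion, ObserveResult, observable_source_asset

@observable_source_asset
def raw_export_file():
    # Illustrative: compute a content fingerprint for an external file and
    # report it, along with extra metadata, in a single ObserveResult.
    fingerprint = "5f2c9a"  # e.g. a hash of the file contents
    return ObserveResult(
        data_version=DataVersion(fingerprint),
        metadata={"size_bytes": 1024},
    )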
[auto-materialize] A new AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron class allows you to construct AutoMaterializePolicys which wait for all parents to be updated after the latest tick of a given cron schedule.
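A sketch of attaching the rule to a policy, assuming the rule takes a cron string (the assets here are illustrative):

from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

@asset
def upstream_a(): ...

@asset
def upstream_b(): ...

# Skip materializing until every parent has updated since the latest 9am tick.
policy = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron("0 9 * * *")
)

@asset(auto_materialize_policy=policy, deps=[upstream_a, upstream_b])
def daily_rollup(): ...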
[Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
There are a substantial number of breaking changes in the 0.7.0 release.
Please see 070_MIGRATION.md for instructions regarding migrating old code.
Scheduler
The scheduler configuration has been moved from the @schedules decorator to DagsterInstance.
Existing schedules that have been running are no longer compatible with current storage. To
migrate, remove the scheduler argument on all @schedules decorators:
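For example (a sketch of the decorator change; SystemCronScheduler stands in for whichever scheduler you were passing, and the scheduler itself is now configured on the instance rather than in code):

from dagster import schedules

# Before:
#
#     @schedules(scheduler=SystemCronScheduler)
#     def define_schedules():
#         ...
#
# After: drop the scheduler argument and configure the scheduler on the
# DagsterInstance (e.g. in your instance yaml) instead.
@schedules
def define_schedules():
    return []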
Finally, if you had any existing schedules running, delete the existing $DAGSTER_HOME/schedules
directory and run dagster schedule wipe && dagster schedule up to re-instantiate schedules in a
valid state.
The should_execute and environment_dict_fn arguments to ScheduleDefinition now have a
required first argument, context, representing the ScheduleExecutionContext.
Config System Changes
In the config system, Dict has been renamed to Shape; List to Array; Optional to
Noneable; and PermissiveDict to Permissive. The motivation here is to clearly delineate
config use cases versus cases where you are using types as the inputs and outputs of solids as
well as python typing types (for mypy and friends). We believe this will be clearer to users in
addition to simplifying our own implementation and internal abstractions.
Our recommended fix is not to use Shape and Array, but instead to use our new condensed
config specification API. This allows one to use bare dictionaries instead of Shape, lists with
one member instead of Array, bare types instead of Field with a single argument, and python
primitive types (int, bool etc) instead of the dagster equivalents. These result in
dramatically less verbose config specs in most cases.
So instead of
from dagster import Shape, Field, Int, Array, String
# ... code
config=Shape({  # Dict prior to change
    'some_int': Field(Int),
    'some_list': Field(Array[String]),  # List prior to change
})
one can instead write:
config={'some_int': int, 'some_list': [str]}
No imports and much simpler, cleaner syntax.
config_field is no longer a valid argument on solid, SolidDefinition, ExecutorDefinition,
executor, LoggerDefinition, logger, ResourceDefinition, resource, system_storage, and
SystemStorageDefinition. Use config instead.
For composite solids, the config_fn no longer takes a ConfigMappingContext, and that context
class has been deleted. To upgrade, remove the first argument from your config_fn.
Field takes an is_required rather than an is_optional argument. This is to avoid confusion
with python's typing and dagster's definition of Optional, which indicates None-ability,
rather than existence. is_optional is deprecated and will be removed in a future version.
Required Resources
All solids, types, and config functions that use a resource must explicitly list that
resource using the argument required_resource_keys. This is to enable efficient
resource management during pipeline execution, especially in a multiprocessing or
remote execution environment.
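A minimal sketch, using an illustrative "warehouse" resource key:

from dagster import solid

@solid(required_resource_keys={'warehouse'})
def load_table(context):
    # The resource is only available because it is declared above.
    context.resources.warehouse.execute('SELECT 1')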
The @system_storage decorator now requires argument required_resource_keys, which was
previously optional.
Dagster Type System Changes
dagster.Set and dagster.Tuple can no longer be used within the config system.
Dagster types are now instances of DagsterType, rather than a class that inherits from
RuntimeType. Instead of dynamically generating a class to create a custom runtime type, just
create an instance of a DagsterType. The type checking function is now an argument to the
DagsterType, rather than an abstract method that has to be implemented in
a subclass.
RuntimeType has been renamed to DagsterType, which is now an encouraged API for type creation.
Core type check function of DagsterType can now return a naked bool in addition
to a TypeCheck object.
type_check_fn on DagsterType (formerly type_check on RuntimeType) now takes a first
argument context of type TypeCheckContext, in addition to the second argument, value.
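For example, a sketch of a custom type under the new API:

from dagster import DagsterType, TypeCheck

def positive_int_check(_context, value):
    # May return a bare bool, or a TypeCheck for richer results.
    if isinstance(value, int) and value > 0:
        return True
    return TypeCheck(success=False, description='Expected a positive int')

PositiveInt = DagsterType(name='PositiveInt', type_check_fn=positive_int_check)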
define_python_dagster_type has been eliminated in favor of PythonObjectDagsterType.
dagster_type has been renamed to usable_as_dagster_type.
as_dagster_type has been removed and similar capabilities added as
make_python_type_usable_as_dagster_type.
PythonObjectDagsterType and usable_as_dagster_type no longer take a type_check argument. If
a custom type_check is needed, use DagsterType.
As a consequence of these changes, if you were previously using dagster_pyspark or
dagster_pandas and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type
annotations to functions decorated with @solid to indicate that they are input or output types
for a solid, you will need to call make_python_type_usable_as_dagster_type from your code in
order to map the Python types to the Dagster types, or just use the Dagster types
(dagster_pandas.DataFrame instead of pandas.DataFrame) directly.
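A sketch of the mapping call, using pandas as the example Python type:

import pandas as pd

from dagster import make_python_type_usable_as_dagster_type
from dagster_pandas import DataFrame as DagsterPandasDataFrame

# Make plain pd.DataFrame annotations on solid inputs/outputs resolve to the
# dagster_pandas DataFrame type.
make_python_type_usable_as_dagster_type(pd.DataFrame, DagsterPandasDataFrame)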
Other
We no longer publish base Docker images. Please see the updated deployment docs for an example
Dockerfile off of which you can work.
step_metadata_fn has been removed from SolidDefinition & @solid.
SolidDefinition & @solid now takes tags and enforces that values are strings or
are safely encoded as JSON. metadata is deprecated and will be removed in a future version.
resource_mapper_fn has been removed from SolidInvocation.
New
Dagit now includes a much richer execution view, with a Gantt-style visualization of step
execution and a live timeline.
Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries
are now tested against 3.8. Note that several of our upstream dependencies have yet to publish
wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some
dependencies from source.
dagster/priority tags can now be used to prioritize the order of execution for the built-in
in-process and multiprocess engines.
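For example (the tag value is illustrative; higher values are intended to run earlier):

from dagster import solid

@solid(tags={'dagster/priority': '10'})
def high_priority_step(context):
    context.log.info('scheduled ahead of lower-priority steps when possible')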
dagster-postgres storages can now be configured with separate arguments and environment
variables, such as:
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username: test
      password:
        env: ENV_VAR_FOR_PG_PASSWORD
      hostname: localhost
      db_name: test
Support for RunLaunchers on DagsterInstance allows for execution to be "launched" outside of
the Dagit/Dagster process. As one example, this is used by dagster-k8s to submit pipeline
execution as a Kubernetes Job.
Added support for adding tags to runs initiated from the Playground view in Dagit.
Added @monthly_schedule decorator.
Added Enum.from_python_enum helper to wrap Python enums for config. (Thanks @kdungs!)
[dagster-bash] The Dagster bash solid factory now passes along kwargs to the underlying
solid construction, and now has a single Nothing input by default to make it easier to create a
sequencing dependency. Also, logs are now buffered by default to make execution less noisy.
[dagster-aws] We've improved our EMR support substantially in this release. The
dagster_aws.emr library now provides an EmrJobRunner with various utilities for creating EMR
clusters, submitting jobs, and waiting for jobs/logs. We also now provide an
emr_pyspark_resource, which together with the new @pyspark_solid decorator makes moving
pyspark execution from your laptop to EMR as simple as changing modes.
[dagster-pandas] Added create_dagster_pandas_dataframe_type, PandasColumn, and
Constraint APIs in order for users to create custom types which perform column validation,
dataframe validation, summary statistics emission, and dataframe serialization/deserialization.
[dagster-gcp] GCS is now supported for system storage, as well as being supported with the
Dask executor. (Thanks @habibutsu!) BigQuery solids have also been updated to support the new API.
Bugfix
Ensured that all implementations of RunStorage clean up pipeline run tags when a run
is deleted. Requires a storage migration, using dagster instance migrate.
The multiprocess and Celery engines now handle solid subsets correctly.
The multiprocess and Celery engines will now correctly emit skip events for steps downstream of
failures and other skips.
The @solid and @lambda_solid decorators now correctly wrap their decorated functions, in the
sense of functools.wraps.
Performance improvements in Dagit when working with runs with large configurations.
The Helm chart in dagster_k8s has been hardened against various failure modes and is now
compatible with Helm 2.
SQLite run and event log storages are more robust to concurrent use.
Improvements to error messages and to handling of user code errors in input hydration and output
materialization logic.
Fixed an issue where the Airflow scheduler could hang when attempting to load dagster-airflow
pipelines.
We now handle our SQLAlchemy connections in a more canonical way (thanks @zzztimbo!).
Fixed an issue using S3 system storage with certain custom serialization strategies.
Fixed an issue leaking orphan processes from compute logging.
Fixed an issue leaking semaphores from Dagit.
Setting the raise_error flag in execute_pipeline now actually raises user exceptions instead
of a wrapper type.
Documentation
Our docs have been reorganized and expanded (thanks @habibutsu, @vatervonacht, @zzztimbo). We'd
love feedback and contributions!
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @habibutsu,
@kdungs, @vatervonacht, @zzztimbo.
Added the dagster-github library, a community contribution from @Ramshackle-Jamathon and
@k-mahoney!
dagster-celery
Simplified and improved config handling.
An engine event is now emitted when the engine fails to connect to a broker.
Bugfix
Fixes a file descriptor leak when running many concurrent dagster-graphql queries (e.g., for
backfill).
The @pyspark_solid decorator now handles inputs correctly.
The handling of solid compute functions that accept kwargs but which are decorated with explicit
input definitions has been rationalized.
Fixed race conditions when using SQLite event log storage with concurrent execution, uncovered
by upstream improvements in the Python inotify library we use.
Documentation
Improved error messages when using system storages that don't fulfill executor requirements.
We are now more permissive when specifying configuration schema, in order to make constructing
configuration schema more concise.
When specifying the value of scalar inputs in config, one can now specify that value directly as
the key of the input, rather than having to embed it within a value key.
Breaking
The implementation of SQL-based event log storages has been consolidated,
which has entailed a schema change. If you have event logs stored in a
Postgres- or SQLite-backed event log storage, and you would like to maintain
access to these logs, you should run dagster instance migrate. To check
what event log storages you are using, run dagster instance info.
Type matches on both sides of an InputMapping or OutputMapping are now enforced.
New
Dagster is now tested on Python 3.8
Added the dagster-celery library, which implements a Celery-based engine for parallel pipeline
execution.
Added the dagster-k8s library, which includes a Helm chart for a simple Dagit installation on a
Kubernetes cluster.
Dagit
The Explore UI now allows you to render a subset of a large DAG via a new solid
query bar that accepts terms like solid_name+* and +solid_name+. When viewing
very large DAGs, nothing is displayed by default and * produces the original behavior.
Performance improvements in the Explore UI and config editor for large pipelines.
The Explore UI now includes a zoom slider that makes it easier to navigate large DAGs.
Dagit pages now render more gracefully in the presence of inconsistent run storage and event logs.
Improved handling of GraphQL errors and backend programming errors.
Minor display improvements.
dagster-aws
A default prefix is now configurable on APIs that use S3.
S3 APIs now parametrize region_name and endpoint_url.
dagster-gcp
A default prefix is now configurable on APIs that use GCS.
dagster-postgres
Performance improvements for Postgres-backed storages.
dagster-pyspark
Pyspark sessions may now be configured to be held open after pipeline execution completes, to
enable extended test cases.
dagster-spark
spark_outputs must now be specified when initializing a SparkSolidDefinition, rather than in
config.
Added new create_spark_solid helper and new spark_resource.
Improved EMR implementation.
Bugfix
Fixed an issue retrieving output values using SolidExecutionResult (e.g., in test) for
dagster-pyspark solids.
Fixes an issue when expanding composite solids in Dagit.
Better errors when solid names collide.
Config mapping in composite solids now works as expected when the composite solid has no top
level config.
Compute log filenames are now guaranteed not to exceed the POSIX limit of 255 chars.
Fixes an issue when copying and pasting solid names from Dagit.
Termination now works as expected in the multiprocessing executor.
The multiprocessing executor now executes parallel steps in the expected order.
The multiprocessing executor now correctly handles solid subsets.
Fixed a bad error condition in dagster_ssh.sftp_solid.
Fixed a bad error message giving incorrect log level suggestions.
Documentation
Minor fixes and improvements.
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @cclauss,
@deem0n, @irabinovitch, @pseudoPixels, @Ramshackle-Jamathon, @rparrapy, @yamrzou.
The selector argument to PipelineDefinition has been removed. This API made it possible to
construct a PipelineDefinition in an invalid state. Use PipelineDefinition.build_sub_pipeline
instead.
New
Added the dagster_prometheus library, which exposes a basic Prometheus resource.
Dagster Airflow DAGs may now use GCS instead of S3 for storage.
Expanded interface for schedule management in Dagit.
Dagit
Performance improvements when loading, displaying, and editing config for large pipelines.
Smooth scrolling zoom in the explore tab replaces the previous two-step zoom.
No longer depends on internet fonts to run, allowing fully offline dev.
Typeahead behavior in search has improved.
Invocations of composite solids remain visible in the sidebar when the solid is expanded.
The config schema panel now appears when the config editor is first opened.
Interface now includes hints for autocompletion in the config editor.
Improved display of solid inputs and output in the explore tab.
Provides visual feedback while filter results are loading.
Better handling of pipelines that aren't present in the currently loaded repo.
Bugfix
Dagster Airflow DAGs previously could crash while handling Python errors in DAG logic.
Step failures when running Dagster Airflow DAGs were previously not being surfaced as task
failures in Airflow.
Dagit could previously get into an invalid state when switching pipelines in the context of a
solid subselection.
frozenlist and frozendict now pass Dagster's parameter type checks for list and dict.
The GraphQL playground in Dagit is now working again.
Nits
Dagit now prints its pid when it loads.
Third-party dependencies have been relaxed to reduce the risk of version conflicts.