- [ui] When viewing logs for a run, the date for a single log row is now shown in the tooltip on the timestamp. This helps when viewing a run that takes place over more than one date.
- Added suggestions to the error message when selecting asset keys that do not exist as an upstream asset or in an `AssetSelection`.
- Improved error messages when trying to materialize a subset of a multi-asset which cannot be subset.
- [dagster-snowflake] `dagster-snowflake` now requires `snowflake-connector-python>=3.4.0`.
- [embedded-elt] `@sling_assets` accepts an optional `name` parameter for the underlying op.
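A minimal sketch of the new parameter; the replication config, connection names, and asset function below are hypothetical:

```python
from dagster_embedded_elt.sling import SlingResource, sling_assets

# Hypothetical Sling replication config; see the Sling docs for the full schema.
replication_config = {
    "source": "MY_POSTGRES",
    "target": "MY_SNOWFLAKE",
    "streams": {"public.orders": {"mode": "full-refresh"}},
}

# `name` (new in this release) sets the name of the underlying op.
@sling_assets(replication_config=replication_config, name="orders_sling_sync")
def orders_sling_assets(context, sling: SlingResource):
    yield from sling.replicate(context=context)
```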
- [dagster-openai] The `dagster-openai` library is now available.
- [dagster-dbt] Added a new setting on `DagsterDbtTranslatorSettings` called `enable_duplicate_source_asset_keys` that allows users to set duplicate asset keys for their dbt sources (see the sketch after this list item). Thanks @hello-world-bfree!
- Log messages in the Dagster daemon for unloadable sensors and schedules have been removed.
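A sketch of enabling the `enable_duplicate_source_asset_keys` setting described above, assuming a prebuilt `manifest.json`; the manifest path and asset function name are illustrative:

```python
from dagster_dbt import (
    DagsterDbtTranslator,
    DagsterDbtTranslatorSettings,
    DbtCliResource,
    dbt_assets,
)

# Allow multiple dbt sources to map to the same Dagster asset key.
translator = DagsterDbtTranslator(
    settings=DagsterDbtTranslatorSettings(enable_duplicate_source_asset_keys=True)
)

@dbt_assets(manifest="target/manifest.json", dagster_dbt_translator=translator)
def my_dbt_assets(context, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```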
- [ui] Search now uses a cache that persists across page loads, which should greatly improve search performance for very large orgs.
- [ui] Groups and code locations in the asset graph’s sidebar are now sorted alphabetically.
- Fixed an issue where the input/output schemas of configurable IO managers could be ignored when providing explicit input/output run config.
- Fixed an issue where enum values could not properly have a default value set in a `ConfigurableResource`.
- Fixed an issue where graph-backed assets would sometimes lose user-provided descriptions due to a bug in internal copying.
- [auto-materialize] Fixed an issue introduced in 1.6.7 where updates to `ExternalAsset`s would be ignored when using `AutoMaterializePolicy`s which depended on parent updates.
- [asset checks] Fixed a bug with asset checks in step launchers.
- [embedded-elt] Fixed a bug when creating a `SlingConnectionResource` where a blank keyword argument would be emitted as an environment variable.
- [dagster-dbt] Fixed a bug where emitting events from `dbt source freshness` would cause an error.
- [ui] Fixed a bug where using the “Terminate all runs” button with filters selected would not apply the filters to the action.
- [ui] Fixed an issue where typing a search query into the search box before the search data was fetched would yield “No results” even after the data was fetched.
- [docs] Fixed a typo in embedded-elt.mdx (thanks @cameronmartin)!
- [dagster-databricks] The URL for the run of a Databricks job is now logged (thanks @smats0n)!
- Fixed a missing partition property (thanks @christeefy)!
- Added `op_tags` to the `@observable_source_asset` decorator (thanks @maxfirman)!
- [docs] Fixed a typo in the `MultiPartitionMapping` docs (thanks @dschafer)!
- Allowed GitHub Actions to check out the branch from a forked repo for docs changes (CI fix) (thanks @hainenber)!
- [asset checks] UI performance of asset check-related pages has been improved.
- [dagster-dbt] The class `DbtArtifacts` has been added for managing the behavior of rebuilding the manifest during development while expecting a pre-built one in production.
- Added an example of writing compute logs to AWS S3 when customizing agent configuration.
- "Hello, Dagster" is now "Dagster Quickstart" with the option to use a Github Codespace to explore Dagster.
- Improved guides and reference documentation for running multiple isolated agents with separate queues on ECS.
- Microsoft Teams is now supported for alerts (see the documentation).
- A “send sample alert” button now exists on both the alert policies page and in the alert policies editor to make it easier to debug and configure alerts without having to wait for an event to kick them off.
- [dagster-embedded-elt] Fixed a bug in the `SlingConnectionResource` that raised an error when connecting to a database.
- [asset checks] `graph_multi_asset`s with `check_specs` now support subsetting.
- Added a new `run_retries.retry_on_op_or_asset_failures` setting that can be set to false to make run retries only occur when there is an unexpected failure that crashes the run, allowing run-level retries to co-exist more naturally with op or asset retries. See the docs for more information.
- `dagster dev` now sets the environment variable `DAGSTER_IS_DEV_CLI`, allowing subprocesses to know that they were launched in a development context.
- [ui] The Asset Checks page has been updated to show more information on the page itself rather than in a dialog.
- [ui] Fixed an issue where the UI disallowed creating a dynamic partition if its name contained the “|” pipe character.
- `AssetSpec` previously dropped the `metadata` and `code_version` fields, resulting in them not being attached to the corresponding asset. This has been fixed.
- The new `@multi_observable_source_asset` decorator enables defining a set of assets that can be observed together with the same function.
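A minimal sketch of the new decorator (experimental at the time of this release, so the exact signature may differ); the table names are hypothetical:

```python
from dagster import AssetSpec, DataVersion, ObserveResult, multi_observable_source_asset

@multi_observable_source_asset(specs=[AssetSpec("table_a"), AssetSpec("table_b")])
def external_tables():
    # A single function observes both assets together.
    yield ObserveResult(asset_key="table_a", data_version=DataVersion("v1"))
    yield ObserveResult(asset_key="table_b", data_version=DataVersion("v7"))
```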
- [dagster-embedded-elt] New asset decorator `@sling_assets` and resource `SlingConnectionResource` have been added for the `dagster-embedded-elt.sling` package. Deprecated `build_sling_asset`, `SlingSourceConnection`, and `SlingTargetConnection`.
- Added support for op-concurrency aware run dequeuing for the `QueuedRunCoordinator`.
- Fixed reference documentation for isolated agents in ECS.
- Corrected an example in the Airbyte Cloud documentation.
- Added API links to OSS Helm deployment guide.
- Fixed in-line pragmas showing up in the documentation.
- Alerts now support Microsoft Teams.
- [ECS] Fixed an issue where code locations could be left undeleted.
- [ECS] ECS agents now support setting multiple replicas per code server.
- [Insights] You can now toggle the visibility of a row in the chart by clicking on the dot for the row in the table.
- [Users] Added a new column “Licensed role” that shows the user's most permissive role.
- Dagster officially supports Python 3.12.
- `dagster-polars` has been added as an integration. Thanks @danielgafni!
- [dagster-dbt] `@dbt_assets` now supports loading projects with semantic models.
- [dagster-dbt] `@dbt_assets` now supports loading projects with model versions.
- [dagster-dbt] `get_asset_key_for_model` now supports retrieving asset keys for seeds and snapshots (see the sketch after this list item). Thanks @aksestok!
- [dagster-duckdb] The Dagster DuckDB integration supports DuckDB version 0.10.0.
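For example, a seed's asset key can now be looked up the same way as a model's; `my_dbt_assets` and the seed name below are placeholders:

```python
from dagster_dbt import get_asset_key_for_model

# my_dbt_assets is a @dbt_assets-decorated function defined elsewhere.
seed_key = get_asset_key_for_model([my_dbt_assets], "my_seed")
```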
- [UPath I/O manager] If a non-partitioned asset is updated to have partitions, the file containing the non-partitioned asset data will be deleted when the partitioned asset is materialized, rather than raising an error.
- Fixed an issue where creating a backfill of assets with dynamic partitions and a backfill policy would sometimes fail with an exception.
- Fixed an issue with the type annotations on the `@asset` decorator causing a false positive in Pyright strict mode. Thanks @tylershunt!
- [ui] On the asset graph, nodes are slightly wider, allowing more text to be displayed, and group names are no longer truncated.
- [ui] Fixed an issue where the groups in the asset graph would not update after an asset was switched between groups.
- [dagster-k8s] Fixed an issue where setting the `security_context` field on the `k8s_job_executor` didn't correctly set the security context on the launched step pods. Thanks @krgn!
- Observable source assets can now yield `ObserveResult`s with no `data_version`.
- You can now include `FreshnessPolicy`s on observable source assets. These assets will be considered “Overdue” when the latest value for the “dagster/data_time” metadata value is older than what’s allowed by the freshness policy (see the sketch after this list item).
- [ui] In Dagster Cloud, a new feature flag allows you to enable an overhauled asset overview page with a high-level stakeholder view of the asset’s health, properties, and column schema.
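A sketch combining the two observable-source-asset changes above, assuming the decorator accepts a `freshness_policy` argument; the asset name and metadata are illustrative:

```python
from dagster import FreshnessPolicy, ObserveResult, observable_source_asset

@observable_source_asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=60))
def external_table():
    # Returning an ObserveResult with no data_version is now allowed; the asset
    # is marked "Overdue" based on its "dagster/data_time" metadata value.
    return ObserveResult(metadata={"row_count": 42})
```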
- Updated docs to reflect newly-added support for Python 3.12.
- [kubernetes] Fixed an issue where the Kubernetes agent would sometimes leave dangling Kubernetes services if it was interrupted in the middle of being terminated.
- Within a backfill or within auto-materialize, when submitting runs for partitions of the same assets, runs are now submitted in lexicographical order of partition key, instead of in an unpredictable order.
- [dagster-k8s] Include k8s pod debug info in run worker failure messages.
- [dagster-dbt] Events emitted by `DbtCliResource` now include metadata from the dbt adapter response. This includes fields like `rows_affected` and `query_id` from the Snowflake adapter, or `bytes_processed` from the BigQuery adapter.
- A previous change prevented asset backfills from grouping multiple assets into the same run when using BackfillPolicies under certain conditions. While the backfills would still execute in the proper order, this could lead to more individual runs than necessary. This has been fixed.
- [dagster-k8s] Fixed an issue introduced in the 1.6.4 release where upgrading the Helm chart without upgrading the Dagster version used by user code caused failures in jobs using the `k8s_job_executor`.
- [instigator-tick-logs] Fixed an issue where invoking `context.log.exception` in a sensor or schedule did not properly capture exception information.
- [asset-checks] Fixed an issue where additional dependencies for dbt tests modeled as Dagster asset checks were not properly being deduplicated.
- [dagster-dbt] Fixed an issue where dbt model, seed, or snapshot names with periods were not supported.
- `@observable_source_asset`-decorated functions can now return an `ObserveResult`. This allows including metadata on the observation, in addition to a data version. This is currently only supported for non-partitioned assets. For example:
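A minimal sketch of returning an `ObserveResult` with metadata alongside a data version; the asset name and metadata values are illustrative:

```python
from dagster import DataVersion, ObserveResult, observable_source_asset

@observable_source_asset
def my_table():
    return ObserveResult(
        data_version=DataVersion("v2"),
        metadata={"row_count": 1234},
    )
```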
- [auto-materialize] A new `AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron` class allows you to construct `AutoMaterializePolicy`s which wait for all parents to be updated after the latest tick of a given cron schedule (see the sketch after this list item).
- [Global op/asset concurrency] Ops and assets now take run priority into account when claiming global op/asset concurrency slots.
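A sketch of attaching the new rule to a policy; the asset graph and cron string are illustrative:

```python
from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

# Skip materializing until every parent has updated since the last midnight tick.
wait_for_all_parents = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.skip_on_not_all_parents_updated_since_cron("0 0 * * *")
)

@asset(deps=["parent_a", "parent_b"], auto_materialize_policy=wait_for_all_parents)
def downstream():
    ...
```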
- Fixed an error in our asset checks docs. Thanks @vaharoni!
- Fixed an error in our Dagster Pipes Kubernetes docs. Thanks @cameronmartin!
- Fixed an issue on the Hello Dagster! guide that prevented it from loading.
- Added the specific capabilities of the Airflow integration to the Airflow integration page.
- Rearranged sections in the I/O manager concept page to make info about using I/O managers versus resources more prominent.
- Added an example that demonstrates what a complete repository that takes advantage of many Dagster features might look like. Includes usage of IO managers, modes/resources, unit tests, several cloud service integrations, and more! Check it out at `examples/hacker_news`!
- `retry_number` is now available on `SolidExecutionContext`, allowing you to determine within a solid function how many times the solid has been previously retried.
- Errors that are surfaced during solid execution now have clearer stack traces.
- When using Postgres or MySQL storage, the database mutations that initialize Dagster tables on startup now happen in atomic transactions, rather than individual SQL queries.
- For versions >=0.11.13, when specifying the `--version` flag when installing the Helm chart, the tags for Dagster-provided images in the Helm chart will now default to the current Chart version. For `--version` <0.11.13, the image tags will still need to be updated properly to use the old chart version.
- Removed the `PIPELINE_INIT_FAILURE` event type. A failure that occurs during pipeline initialization will now produce a `PIPELINE_FAILURE` as with all other pipeline failures.
- When viewing run logs in Dagit, in the stdout/stderr log view, switching the filtered step did not work. This has been fixed. Additionally, the filtered step is now present as a URL query parameter.
- The `get_run_status` method on the Python GraphQL client now returns a `PipelineRunStatus` enum instead of the raw string value in order to align with the mypy type annotation. Thanks to Dylan Bienstock for surfacing this bug!
- When a docstring on a solid doesn’t match the reST, Google, or Numpydoc formats, Dagster no longer raises an error.
- Fixed a bug where memoized runs would sometimes fail to execute when specifying a non-default IO manager key.
- Added the `k8s_job_executor`, which executes solids in separate Kubernetes jobs. With the addition of this executor, you can now choose at runtime between single-pod and multi-pod isolation for solids in your run. Previously this was only configurable for the entire deployment: you could either use the `K8sRunLauncher` with the default executors (`in_process` and `multiprocess`) for low isolation, or you could use the `CeleryK8sRunLauncher` with the `celery_k8s_job_executor` for pod-level isolation. Now, your instance can be configured with the `K8sRunLauncher` and you can choose between the default executors or the `k8s_job_executor`.
- The `DagsterGraphQLClient` now allows you to specify whether to use HTTP or HTTPS when connecting to the GraphQL server. In addition, error messages during query execution or connecting to dagit are now clearer. Thanks to @emily-hawkins for raising this issue!
- Added experimental hook invocation functionality. Invoking a hook will call the underlying decorated function. For example:

```python
from dagster import build_hook_context

my_hook(build_hook_context(resources={"foo_resource": "foo"}))
```
- Resources can now be directly invoked as functions. Invoking a resource will call the underlying decorated initialization function.

```python
from dagster import build_init_resource_context, resource

@resource(config_schema=str)
def my_basic_resource(init_context):
    return init_context.resource_config

context = build_init_resource_context(config="foo")
assert my_basic_resource(context) == "foo"
```
- Improved the error message when a pipeline definition is incorrectly invoked as a function.
- `ScheduleDefinition` and `SensorDefinition` now carry over properties from functions decorated by `@sensor` and `@schedule`, i.e. docstrings.
- Fixed a bug with `configured` on resources where the version set on a `ResourceDefinition` was not being passed to the `ResourceDefinition` created by the call to `configured`.
- Previously, if an error was raised in an `IOManager` `handle_output` implementation that was a generator, it would not be wrapped in a `DagsterExecutionHandleOutputError`. Now, it is wrapped.
- Dagit will now gracefully degrade if websockets are not available. Previously, launching runs and viewing the event logs would block on a websocket connection.
- Added an example of run attribution via a custom run coordinator, which reads a user’s email from HTTP headers on the Dagster GraphQL server and attaches the email as a run tag. Custom run coordinators are also now specifiable in the Helm chart, under `queuedRunCoordinator`. See the docs for more information on setup.
- `RetryPolicy` now supports `backoff` and `jitter` settings, to allow for modulating the `delay` as a function of attempt number and randomness.
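For example, the delay between attempts can now grow exponentially with a randomized component, using the `Backoff` and `Jitter` enums:

```python
from dagster import Backoff, Jitter, RetryPolicy

policy = RetryPolicy(
    max_retries=3,
    delay=2,  # base delay in seconds
    backoff=Backoff.EXPONENTIAL,  # scale the delay by attempt number
    jitter=Jitter.PLUS_MINUS,  # randomize each computed delay
)
```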
- [Helm] Added `dagit.enableReadOnly`. When enabled, a separate Dagit instance is deployed in `--read-only` mode. You can use this feature to serve Dagit to users whom you do not want to be able to kick off new runs or make other changes to application state.
- [dagstermill] Dagstermill is now compatible with current versions of papermill (2.x). Previously we required papermill to be pinned to 1.x.
- Added a new metadata type that links to the asset catalog, which can be invoked using `EventMetadata.asset`. For example:
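A sketch of linking to the asset catalog from materialization metadata; the solid and asset keys are hypothetical:

```python
from dagster import AssetKey, AssetMaterialization, EventMetadata, Output, solid

@solid
def emit_table(_context):
    yield AssetMaterialization(
        asset_key=AssetKey("my_table"),
        metadata={
            # Renders in Dagit as a link to the "source_table" catalog entry.
            "source data": EventMetadata.asset(AssetKey("source_table")),
        },
    )
    yield Output(None)
```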
- Added a new log event type `LOGS_CAPTURED`, which explicitly links to the captured stdout/stderr logs for a given step, as determined by the configured `ComputeLogManager` on the Dagster instance. Previously, these links were available on the `STEP_START` event.
- The `network` key on `DockerRunLauncher` config can now be sourced from an environment variable.
- The Workspace section of the Status page in Dagit now shows more metadata about your workspace, including the Python file, Python package, and Docker image of each of your repository locations.
- In Dagit, settings for how executions are viewed now persist across sessions.
- The `get_execution_data` method of `SensorDefinition` and `ScheduleDefinition` has been renamed to `evaluate_tick`. We expect few to no users of the previous name, and are renaming to prepare for improved testing support for schedules and sensors.
- README has been updated to remove typos (thanks @gogi2811).
- Configured API doc examples have been fixed (thanks @jrouly).
- Added documentation on testing sensors using the experimental `build_sensor_context` API. See Testing sensors.
- Some mypy errors encountered when using the built-in Dagster types (e.g., `dagster.Int`) as type annotations on functions decorated with `@solid` have been resolved.
- Fixed an issue where the `K8sRunLauncher` sometimes hung while launching a run due to holding a stale Kubernetes client.
- Fixed an issue with direct solid invocation where default config values would not be applied.
- Fixed a bug where resource dependencies of IO managers were not being initialized during memoization.
- Dagit can once again override pipeline tags that were set on the definition, and UI clarity around the override behavior has been improved.
- Markdown event metadata rendering in Dagit has been repaired.
- Sensors can now set a string cursor using `context.update_cursor(str_value)` that is persisted across evaluations, to save unnecessary computation. This persisted string value is made available on the context as `context.cursor`. Previously, we encouraged cursor-like behavior by exposing `last_run_key` on the sensor context, to keep track of the last time the sensor successfully requested a run. This, however, was not useful for avoiding unnecessary computation when the sensor evaluation did not result in a run request.
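A minimal sketch of cursor usage; `list_updated_files` is a hypothetical helper that returns `(path, mtime)` pairs modified after the given timestamp:

```python
from dagster import RunRequest, sensor

@sensor(pipeline_name="my_pipeline")
def my_file_sensor(context):
    last_mtime = float(context.cursor) if context.cursor else 0.0
    max_mtime = last_mtime
    for path, mtime in list_updated_files(since=last_mtime):  # hypothetical helper
        yield RunRequest(run_key=path)
        max_mtime = max(max_mtime, mtime)
    # Persist the high-water mark even when no runs were requested.
    context.update_cursor(str(max_mtime))
```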
- Dagit may now be run in `--read-only` mode, which will disable mutations in the user interface and on the server. You can use this feature to run instances of Dagit that are visible to users whom you do not want to be able to kick off new runs or make other changes to application state.
- In `dagster-pandas`, the `event_metadata_fn` parameter to the function `create_dagster_pandas_dataframe_type` may now return a dictionary of `EventMetadata` values, keyed by their string labels. This should now be consistent with the parameters accepted by Dagster events, including the `TypeCheck` event.
```python
# old
MyDataFrame = create_dagster_pandas_dataframe_type(
    "MyDataFrame",
    event_metadata_fn=lambda df: [
        EventMetadataEntry.int(len(df), "number of rows"),
        EventMetadataEntry.int(len(df.columns), "number of columns"),
    ],
)

# new
MyDataFrame = create_dagster_pandas_dataframe_type(
    "MyDataFrame",
    event_metadata_fn=lambda df: {
        "number of rows": len(df),
        "number of columns": len(df.columns),
    },
)
```
- dagster-pandas’ `PandasColumn.datetime_column()` now has a new `tz` parameter, allowing you to constrain the column to a specific timezone (thanks @mrdavidlaing!).
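For example (the column and DataFrame type names are illustrative):

```python
from dagster_pandas import PandasColumn, create_dagster_pandas_dataframe_type

TradesDataFrame = create_dagster_pandas_dataframe_type(
    "TradesDataFrame",
    columns=[
        # Constrains the column's timestamps to the UTC timezone.
        PandasColumn.datetime_column("executed_at", tz="UTC"),
    ],
)
```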
- The `DagsterGraphQLClient` now takes in an optional `transport` argument, which may be useful in cases where you need to authenticate your GQL requests:
```python
authed_client = DagsterGraphQLClient(
    "my_dagit_url.com",
    transport=RequestsHTTPTransport(..., auth=<some auth>),
)
```
- Added an `ecr_public_resource` to get login credentials for the AWS ECR Public Gallery. This is useful if any of your pipelines need to push images.
- Failed backfills may now be resumed in Dagit, by putting them back into a “requested” state. These backfill jobs should then be picked up by the backfill daemon, which will then attempt to create and submit runs for any of the outstanding requested partitions. This should help backfill jobs recover from any deployment or framework issues that occurred during the backfill prior to all the runs being launched. This will not, however, attempt to re-execute any of the individual pipeline runs that were successfully launched but resulted in a pipeline failure.
- In the run log viewer in Dagit, links to asset materializations now include the timestamp for that materialization. This will bring you directly to the state of that asset at that specific time.
- The Databricks step launcher now includes a `max_completion_wait_time_seconds` configuration option, which controls how long it will wait for a Databricks job to complete before exiting.
- Solids can now be invoked outside of composition. If your solid has a context argument, the `build_solid_context` function can be used to provide a context to the invocation.

```python
from dagster import build_solid_context, solid

@solid
def basic_solid():
    return "foo"

assert basic_solid() == "foo"

@solid
def add_one(x):
    return x + 1

assert add_one(5) == 6

@solid(required_resource_keys={"foo_resource"})
def solid_reqs_resources(context):
    return context.resources.foo_resource + "bar"

context = build_solid_context(resources={"foo_resource": "foo"})
assert solid_reqs_resources(context) == "foobar"
```
- `build_schedule_context` allows you to build a `ScheduleExecutionContext` using a `DagsterInstance`. This can be used to test schedules.

```python
from dagster import DagsterInstance, build_schedule_context

with DagsterInstance.get() as instance:
    context = build_schedule_context(instance)
    my_schedule.get_execution_data(context)
```
- `build_sensor_context` allows you to build a `SensorExecutionContext` using a `DagsterInstance`. This can be used to test sensors.

```python
from dagster import DagsterInstance, build_sensor_context

with DagsterInstance.get() as instance:
    context = build_sensor_context(instance)
    my_sensor.get_execution_data(context)
```
- `build_input_context` and `build_output_context` allow you to construct `InputContext` and `OutputContext` respectively. This can be used to test IO managers.

```python
from dagster import build_input_context, build_output_context

io_manager = MyIoManager()

io_manager.load_input(build_input_context())
io_manager.handle_output(build_output_context(), val)
```
- Resources can be provided to either of these functions. If you are using context manager resources, then `build_input_context`/`build_output_context` must be used as a context manager.

```python
with build_input_context(resources={"cm_resource": my_cm_resource}) as context:
    io_manager.load_input(context)
```
- `validate_run_config` can be used to validate a run config blob against a pipeline definition and mode. If the run config is invalid for the pipeline and mode, this function will throw an error, and if correct, this function will return a dictionary representing the validated run config that Dagster uses during execution.

```python
validate_run_config(
    {"solids": {"a": {"config": {"foo": "bar"}}}},
    pipeline_contains_a,
)

validate_run_config(pipeline_no_required_config)
```
- The ability to set a `RetryPolicy` has been added. This allows you to declare automatic retry behavior when exceptions occur during solid execution. You can set `retry_policy` on a solid invocation, `@solid` definition, or `@pipeline` definition.

```python
@solid(retry_policy=RetryPolicy(max_retries=3, delay=5))
def fickle_solid():
    ...  # work that may fail intermittently

@pipeline(
    solid_retry_policy=RetryPolicy()  # default policy for all solids in the pipeline
)
def my_pipeline():
    some_solid()
    fickle_solid()
    # overrides the policies set at the pipeline and definition level
    fickle_solid.with_retry_policy(RetryPolicy(max_retries=2))
```
- Previously, asset materializations were not working in dagster-dbt for dbt >= 0.19.0. This has been fixed.
- Previously, using the `dagster/priority` tag directly on pipeline definitions would cause an error. This has been fixed.
- In dagster-pandas, the `create_dagster_pandas_dataframe_type()` function would, in some scenarios, not use the specified `materializer` argument when provided. This has been fixed (thanks @drewsonne!).
- `dagster-graphql --remote` now sends the query and variables as post body data, avoiding URI length limit issues.
- In the Dagit pipeline definition view, we no longer render config nubs for solids that do not need them.
- In the run log viewer in Dagit, truncated row contents (including errors with long stack traces) now have a larger and clearer button to expand the full content in a dialog.
- [dagster-mysql] Fixed a bug where database connections accumulated by `sqlalchemy.Engine` objects would be invalidated after 8 hours of idle time due to MySQL’s default configuration, resulting in an `sqlalchemy.exc.OperationalError` when attempting to view pages in Dagit in long-running deployments.
- In 0.11.9, `context` was made an optional argument on the function decorated by `@solid`. The solids throughout tutorials and snippets that do not need a context argument have been altered to omit that argument and better reflect this change.
- In a previous docs revision, a tutorial section on accessing resources within solids was removed. This has been re-added to the site.
- In Dagit, assets can now be viewed with an `asOf` URL parameter, which shows a snapshot of the asset at the provided timestamp, including parent materializations as of that time.
- [Dagit] Queries and mutations now use HTTP instead of a websocket-based connection.
- A regression in 0.11.8 where composites would fail to render in the right side bar in Dagit has been fixed.
- A dependency conflict in `make dev_install` has been fixed.
- [dagster-python-client] `reload_repository_location` and `submit_pipeline_execution` have been fixed; the underlying GraphQL queries had a missing inline fragment case.
- AWS S3 resources now support named profiles (thanks @deveshi!)
- The Dagit ingress path is now configurable in our Helm charts (thanks @orf!)
- Dagstermill’s use of temporary files is now supported across operating systems (thanks @slamer59!)
- Deploying with Helm documentation has been updated to reflect the correct name for “dagster-user-deployments” (thanks @hebo-yang!)
- Deploying with Helm documentation has been updated to suggest naming your release “dagster” (thanks @orf!)
- Solids documentation has been updated to remove a typo (thanks @dwallace0723!)
- Schedules documentation has been updated to remove a typo (thanks @gdoron!)