[experimental] AutomationCondition.eager() will now only launch runs for missing partitions which become missing after the condition has been added to the asset. This avoids situations in which the eager policy kicks off a large amount of work when added to an asset with many missing historical static/dynamic partitions.
[experimental] Added a new AutomationCondition.asset_matches() condition, which can apply a condition against an arbitrary asset in the graph.
[experimental] Added the ability to specify multiple kinds for an asset with the kinds parameter.
[dagster-github] Added create_pull_request method on GithubClient that enables creating a pull request.
[dagster-github] Added create_ref method on GithubClient that enables creating a new branch.
[dagster-embedded-elt] dlt assets now generate column metadata for child tables.
[dagster-embedded-elt] dlt assets can now fetch row count metadata with dlt.run(...).fetch_row_count() for both partitioned and non-partitioned assets. Thanks @kristianandre!
[dagster-airbyte] relation identifier metadata is now attached to Airbyte assets.
[dagster-embedded-elt] relation identifier metadata is now attached to sling assets.
[dagster-embedded-elt] relation identifier metadata is now attached to dlt assets.
JobDefinition, @job, and define_asset_job now take a run_tags parameter. If run_tags are defined, they will be attached to all runs of the job, and tags will not be. If run_tags is not set, then tags are attached to all runs of the job (status quo behavior). This change enables the separation of definition-level and run-level tags on jobs.
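The rule above can be sketched in plain Python. This is only an illustration of the described behavior, not Dagster's implementation; the helper name tags_attached_to_run is hypothetical, and it assumes "defined" means run_tags was passed at all (is not None).

```python
from typing import Mapping, Optional


def tags_attached_to_run(
    tags: Optional[Mapping[str, str]],
    run_tags: Optional[Mapping[str, str]],
) -> dict:
    """Hypothetical helper mirroring the rule in the note above."""
    if run_tags is not None:
        # run_tags win outright; definition-level tags are not attached.
        return dict(run_tags)
    # Status quo: definition-level tags are attached to every run.
    return dict(tags or {})


definition_tags = {"team": "data-platform"}  # made-up example tags

# run_tags set: only run_tags reach the run.
with_run_tags = tags_attached_to_run(definition_tags, {"priority": "high"})
# run_tags not set: definition-level tags reach the run.
without_run_tags = tags_attached_to_run(definition_tags, None)
```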
The env var DAGSTER_COMPUTE_LOG_TAIL_WAIT_AFTER_FINISH can now be used to pause before capturing logs (thanks @HynekBlaha!)
The kinds parameter is now available on AssetSpec.
OutputContext now exposes the AssetSpec of the asset that is being stored as an output (thanks, @marijncv!)
[experimental] Backfills are incorporated into the Runs page to improve observability and provide a simpler UI. See the GitHub discussion for more details.
[ui] The updated navigation is now enabled for all users. You can revert to the legacy navigation via a feature flag. See GitHub discussion for more.
[ui] Improved performance for loading partition statuses of an asset job.
[dagster-docker] Run containers launched by the DockerRunLauncher now include dagster/job_name and dagster/run_id labels.
[dagster-aws] The ECS launcher now automatically retries transient ECS RunTask failures (like capacity placement failures).
Changed the log volume for global concurrency blocked runs in the run coordinator to be less spammy.
[ui] Asset checks are now visible in the run page header when launched from a schedule.
[ui] Fixed asset group outlines not rendering properly in Safari.
[ui] Reporting a materialization event now removes the asset from the asset health "Execution failures" list and returns the asset to a green / success state.
[ui] When setting an AutomationCondition on an asset, the label of this condition will now be shown in the sidebar on the Asset Details page.
[ui] Previously, filtering runs by Created date would include runs that had been updated after the lower bound of the requested time range. This has been updated so that only runs created after the lower bound will be included.
[ui] When using the new experimental navigation flag, added a fix for the automations page for code locations that have schedules but no sensors.
[ui] Fixed tag wrapping on asset column schema table.
[ui] Restored object counts on the code location list view.
[ui] Padding when displaying warnings on unsupported run coordinators has been corrected (thanks @hainenber!)
[dagster-k8s] Fixed an issue where run termination sometimes did not terminate all step processes when using the k8s_job_executor, if the termination was initiated while it was in the middle of launching a step pod.
AssetSpec now has a with_io_manager_key method that returns an AssetSpec with the appropriate metadata entry to dictate the key for the IO manager used to load it. The deprecation warning for SourceAsset now references this method.
Added a max_runtime_seconds configuration option to run monitoring, allowing you to specify that any run in your Dagster deployment should terminate if it exceeds a certain runtime. Previously, jobs had to be individually tagged with a dagster/max_runtime tag in order to take advantage of this feature. Jobs and runs can still be tagged in order to override this value for an individual run.
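A minimal sketch of the override order described above, in plain Python. The helper effective_max_runtime_seconds is hypothetical and is not part of Dagster; it assumes the dagster/max_runtime tag value is an integer number of seconds stored as a string.

```python
from typing import Mapping, Optional

MAX_RUNTIME_TAG = "dagster/max_runtime"  # tag name taken from the note above


def effective_max_runtime_seconds(
    run_tags: Mapping[str, str],
    deployment_default: Optional[int],
) -> Optional[int]:
    """Hypothetical helper: a per-run tag overrides the deployment-wide value."""
    tag_value = run_tags.get(MAX_RUNTIME_TAG)
    if tag_value is not None:
        # Assumption: the tag holds whole seconds as a string.
        return int(tag_value)
    # No tag on the run: fall back to the deployment-wide setting, if any.
    return deployment_default
```

With a deployment-wide value of 7200, a run tagged with dagster/max_runtime set to "60" would resolve to 60 in this sketch, while an untagged run would resolve to 7200.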
It is now possible to set both tags and a custom execution_fn on a ScheduleDefinition. Schedule tags are intended to annotate the definition and can be used to search and filter in the UI. They will not be attached to run requests emitted from the schedule if a custom execution_fn is provided. If no custom execution_fn is provided, then for back-compatibility the tags will also be automatically attached to run requests emitted from the schedule.
SensorDefinition and all of its variants/decorators now accept a tags parameter. The tags annotate the definition and can be used to search and filter in the UI.
Added the dagster definitions validate command to Dagster CLI. This command validates if Dagster definitions are loadable.
[dagster-databricks] Databricks Pipes now allow running tasks in existing clusters.
Fixed an issue where calling build_op_context in a unit test would sometimes raise a TypeError: signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object Exception on process shutdown.
[dagster-webserver] Fixed an issue where the incorrect sensor/schedule state would appear when using DefaultScheduleStatus.STOPPED / DefaultSensorStatus.STOPPED after performing a reset.
Fixed an issue where users with Launcher permissions for a particular code location were not able to cancel backfills targeting only assets in that code location.
Fixed an issue preventing long-running alerts from being sent when there was a quick subsequent run.
Added --partition-range option to dagster asset materialize CLI. This option only works for assets with single-run Backfill Policies.
Added a new .without() method to AutomationCondition.eager(), AutomationCondition.on_cron(), and AutomationCondition.on_missing() which allows sub-conditions to be removed, e.g. AutomationCondition.eager().without(AutomationCondition.in_latest_time_window()).
Added AutomationCondition.on_missing(), which materializes an asset partition as soon as all of its parent partitions are filled in.
pyproject.toml can now load multiple Python modules as individual Code Locations. Thanks, @bdart!
[ui] If a code location has errors, a button will be shown to view the error on any page in the UI.
[dagster-adls2] The ADLS2PickleIOManager now accepts lease_duration configuration. Thanks, @0xfabioo!
[dagster-embedded-elt] Added an option to fetch row count metadata after running a Sling sync by calling sling.replicate(...).fetch_row_count().
[dagster-fivetran] The dagster-fivetran integration will now automatically pull and attach column schema metadata after each sync.
Fixed an issue which could cause errors when using AutomationCondition.any_downstream_condition() with downstream AutoMaterializePolicy objects.
Fixed an issue where process_config_and_initialize did not properly handle processing nested resource config.
[ui] Fixed an issue that would cause some AutomationCondition evaluations to be labeled DepConditionWrapperCondition instead of the key that they were evaluated against.
[dagster-webserver] Fixed an issue with code locations appearing in a fluctuating, incorrect state in deployments with multiple webserver processes.
[dagster-embedded-elt] Fixed an issue where Sling column lineage did not correctly resolve in the Dagster UI.
[dagster-k8s] The wait_for_pod check now waits until all pods are available, rather than erroneously returning after the first pod becomes available. Thanks @easontm!
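The difference between the old and new wait_for_pod behavior can be pictured as "any pod ready" versus "all pods ready". The sketch below is plain Python with hypothetical function names and made-up pod names; it does not call the Kubernetes API and is not the dagster-k8s code.

```python
from typing import Mapping


def first_pod_ready(pod_ready: Mapping[str, bool]) -> bool:
    # Old behavior as described above: done as soon as one pod is available.
    return any(pod_ready.values())


def all_pods_ready(pod_ready: Mapping[str, bool]) -> bool:
    # New behavior as described above: done only when every pod is available.
    # Assumption: an empty mapping means no pod has been observed yet.
    return bool(pod_ready) and all(pod_ready.values())


pods = {"step-pod-a": True, "step-pod-b": False}  # made-up pod names
```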
The AssetSpec constructor now raises an error if an invalid group name is provided, instead of an error being raised when constructing the Definitions object.
dagster/relation_identifier metadata is now automatically attached to assets which are stored using a DbIOManager.
[ui] Streamlined the code location list view.
[ui] The “group by” selection on the Timeline Overview page is now part of the query parameters, meaning it will be retained when linked to directly or when navigating between pages.
[dagster-dbt] When instantiating DbtCliResource, the project_dir argument will now override the DBT_PROJECT_DIR environment variable if it exists in the local environment (thanks, @marijncv!).
[dagster-embedded-elt] dlt assets now generate rows_loaded metadata (thanks, @kristianandre!).
Fixed a bug where setting asset_selection=[] on RunRequest objects yielded from sensors using asset_selection would select all assets instead of none.
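One common way a bug of this shape arises is a truthiness check that treats an empty list the same as None. The sketch below illustrates that pattern in plain Python; it is an assumption about the general class of bug, not a claim about Dagster's actual code, and the function and asset names are made up.

```python
from typing import List, Optional, Sequence

ALL_ASSETS = ["raw_events", "cleaned_events", "daily_report"]  # made-up asset keys


def resolve_selection_buggy(asset_selection: Optional[Sequence[str]]) -> List[str]:
    # Truthiness check: [] is falsy, so it falls through to "everything".
    if asset_selection:
        return list(asset_selection)
    return list(ALL_ASSETS)


def resolve_selection_fixed(asset_selection: Optional[Sequence[str]]) -> List[str]:
    # Only None means "no selection given"; [] means "select nothing".
    if asset_selection is None:
        return list(ALL_ASSETS)
    return list(asset_selection)
```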
Fixed a bug where the tick status filter for batch-fetched graphql sensors was not being respected.
[examples] Fixed missing assets in assets_dbt_python example.
[dagster-airbyte] Updated the op names generated for Airbyte assets to include the full connection ID, avoiding name collisions.
[dagster-dbt] Fixed issue causing dagster-dbt to be unable to load dbt projects where the adapter did not have a database field set (thanks, @dargmuesli!)
[dagster-dbt] Removed a warning about not being able to load the dbt.adapters.duckdb module when loading dbt assets without that package installed.
You may now wipe specific asset partitions directly from the execution context in user code by calling DagsterInstance.wipe_asset_partitions.
Dagster+ users with a "Viewer" role can now create private catalog views.
Fixed an issue where the default IOManager used by Dagster+ Serverless did not respect setting allow_missing_partitions as metadata on a downstream asset.
Fixed an issue where runs in Dagster+ Serverless that materialized partitioned assets would sometimes fail with an object has no attribute '_base_path' error.
[dagster-graphql] Fixed an issue where the statuses filter argument to the sensorsOrError GraphQL field was sometimes ignored when querying GraphQL for multiple sensors at the same time.
Updated multi-asset sensor definition to be less likely to time out queries against the asset history storage.
Consolidated the CapturedLogManager and ComputeLogManager APIs into a single base class.
[ui] Added an option under user settings to clear client-side IndexedDB caches as an escape hatch for caching-related bugs.
[dagster-aws, dagster-pipes] Added a new PipesECSClient to allow Dagster to interface with ECS tasks.
[dagster-dbt] Increased the default timeout when terminating a run that is running a dbt subprocess to wait 25 seconds for the subprocess to cleanly terminate. Previously, it would only wait 2 seconds.
[dagster-sdf] Increased the default timeout when terminating a run that is running an sdf subprocess to wait 25 seconds for the subprocess to cleanly terminate. Previously, it would only wait 2 seconds.
[dagster-sdf] Added support for caching and asset selection (Thanks, akbog!)
[dagster-dlt] Added support for AutomationCondition using DagsterDltTranslator.get_automation_condition() (Thanks, aksestok!)
[ui] Fixed a bug where in-progress runs from a backfill could not be terminated from the backfill UI.
[ui] Fixed a bug that caused an "Asset must be part of at least one job" error when clicking on an external asset in the asset graph UI
Fixed an issue where viewing run logs with the latest 5.0 release of the watchdog package raised an exception.
[ui] Fixed issue causing the “filter to group” action in the lineage graph to have no effect.
[ui] Fixed case sensitivity when searching for partitions in the launchpad.
[ui] Fixed a bug which would redirect to the events tab for an asset if you loaded the partitions tab directly.
[ui] Fixed issue causing runs to get skipped when paging through the runs list (Thanks, @HynekBlaha!)
[ui] Fixed a bug where the asset catalog list view for a particular group would show all assets.
[dagster-dbt] Fixed a bug where empty newlines in raw dbt logs were not being handled correctly.
[dagster-k8s, dagster-celery-k8s] Correctly set dagster/image label when image is provided from user_defined_k8s_config. (Thanks, @HynekBlaha!)
[dagster-duckdb] Fixed an issue for DuckDB versions older than 1.0.0 where an unsupported configuration option, custom_user_agent, was provided by default
[dagster-k8s] Fixed an issue where Kubernetes Pipes failed to create a pod if the op name contained capital or non-alphanumeric characters.
[dagster-embedded-elt] Fixed an issue where dbt assets downstream of Sling were skipped
[dagster-aws] Direct AWS API arguments in PipesGlueClient.run have been deprecated and will be removed in 1.9.0. The new params argument should be used instead.
The default io_manager on Serverless now supports the allow_missing_partitions configuration option.
Fixed a bug that caused an error when loading the launchpad for a partition, when using Dagster+ with an agent with version below 1.8.2
1.8.3 (core) / 0.24.3 (libraries) (YANKED - This version of Dagster resulted in errors when trying to launch runs that target individual asset partitions)
When different assets within a code location have different PartitionsDefinitions, there will no longer be an implicit asset job __ASSET_JOB_... for each PartitionsDefinition; there will just be one with all the assets. This reduces the time it takes to load code locations with assets with many different PartitionsDefinitions.
[ui] Fixed a collection of broken links pointing to renamed Declarative Automation pages.
[dagster-dbt] Fixed issue preventing usage of MultiPartitionMapping with @dbt_assets (Thanks, @arookieds!)
[dagster-azure] Fixed issue that would cause an error when configuring an AzureBlobComputeLogManager without a secret_key (Thanks, @ion-elgreco and @HynekBlaha!)
The default sqlite and dagster-postgres implementations have been altered to extract the
event step_key field as a column, to enable faster per-step queries. You will need to run
dagster instance migrate to update the schema. You may optionally migrate your historical event
log data to extract the step_key using the migrate_event_log_data function. This will ensure
that your historical event log data will be captured in future step-key based views. This
event_log data migration can be invoked as follows:
from dagster.core.storage.event_log.migration import migrate_event_log_data
from dagster import DagsterInstance
migrate_event_log_data(instance=DagsterInstance.get())
We have made pipeline metadata serializable and persist that along with run information.
While there are no user-facing features to leverage this yet, it does require an instance
migration. Run dagster instance migrate. If you have already run the migration for the
event_log changes above, you do not need to run it again. Any unforeseen errors related to the
new snapshot_id in the runs table or the new snapshots table are related to this migration.
dagster-pandas ColumnTypeConstraint has been removed in favor of ColumnDTypeFnConstraint and
ColumnDTypeInSetConstraint.
New
You can now specify that dagstermill output notebooks be yielded as an output from dagstermill
solids, in addition to being materialized.
You may now set the extension on files created using the FileManager machinery.
dagster-pandas typed PandasColumn constructors now support pandas 1.0 dtypes.
The Dagit Playground has been restructured to make the relationship between Preset, Partition
Sets, Modes, and subsets more clear. All of these buttons have been reconciled and moved to the
left side of the Playground.
Config sections that are required but not filled out in the Dagit playground are now detected
and labeled in orange.
dagster-celery config now supports using env: to load from environment variables.
Bugfix
Fixed a bug where selecting a preset in dagit would not populate tags specified on the pipeline
definition.
Fixed a bug where metadata attached to a raised Failure was not displayed in the error modal in
dagit.
Fixed an issue where reimporting dagstermill and calling dagstermill.get_context() outside of
the parameters cell of a dagstermill notebook could lead to unexpected behavior.
Fixed an issue with connection pooling in dagster-postgres, improving responsiveness when using
the Postgres-backed storages.
Experimental
Added a longitudinal view of runs on the Schedule tab for scheduled, partitioned pipelines.
Includes views of run status, execution time, and materializations across partitions. The UI is
in flux and is currently optimized for daily schedules, but feedback is welcome.
default_value in Field no longer accepts native instances of python enums. Instead
the underlying string representation in the config system must be used.
default_value in Field no longer accepts callables.
The dagster_aws imports have been reorganized; you should now import resources from
dagster_aws.<AWS service name>. dagster_aws provides s3, emr, redshift, and cloudwatch
modules.
The dagster_aws S3 resource no longer attempts to model the underlying boto3 API, and you can
now just use any boto3 S3 API directly on a S3 resource, e.g.
context.resources.s3.list_objects_v2. (#2292)
New
New Playground view in dagit showing an interactive config map
Improved storage and UI for showing schedule attempts
Added the ability to set default values in InputDefinition
Added CLI command dagster pipeline launch to launch runs using a configured RunLauncher
Added ability to specify pipeline run tags using the CLI
Added a pdb utility to SolidExecutionContext to help with debugging, available within a solid
as context.pdb
Added PresetDefinition.with_additional_config to allow for config overrides
Added resource name to log messages generated during resource initialization
Added grouping tags for runs that have been retried / reexecuted.
Bugfix
Fixed a bug where date range partitions with a specified end date were clipping the last day
Fixed an issue where some schedule attempts that failed to start would be marked running forever.
Fixed the @weekly partitioned schedule decorator
Fixed timezone inconsistencies between the runs view and the schedules view
Integers are now accepted as valid values for Float config fields
Fixed an issue when executing dagstermill solids with config that contained quote characters.
dagstermill
The Jupyter kernel to use may now be specified when creating dagster notebooks with the --kernel
flag.
dagster-dbt
dbt_solid now has a Nothing input to allow for sequencing
dagster-k8s
Added get_celery_engine_config to select celery engine, leveraging Celery infrastructure
Documentation
Improvements to the airline and bay bikes demos
Improvements to our dask deployment docs (Thanks jswaney!!)
Added the IntSource type, which lets integers be set from environment variables in config.
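As a rough picture of what an integer "source" means, the sketch below resolves either a literal integer or an {"env": "VAR_NAME"} mapping, following the env: form mentioned elsewhere in these notes. The resolve_int_source helper and the EXAMPLE_WORKER_COUNT variable are hypothetical; this is not how Dagster implements IntSource.

```python
import os
from typing import Mapping, Union

IntSourceValue = Union[int, Mapping[str, str]]


def resolve_int_source(value: IntSourceValue) -> int:
    """Hypothetical resolver: accepts an int or a mapping like {"env": "VAR_NAME"}."""
    if isinstance(value, int):
        # A literal integer is used as-is.
        return value
    env_var = value["env"]
    raw = os.environ.get(env_var)
    if raw is None:
        raise ValueError(f"environment variable {env_var!r} is not set")
    # Environment variables are strings, so convert explicitly.
    return int(raw)


# Set here only so the sketch is runnable on its own.
os.environ["EXAMPLE_WORKER_COUNT"] = "4"
worker_count = resolve_int_source({"env": "EXAMPLE_WORKER_COUNT"})
```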
You may now set tags on pipeline definitions. These will resolve in the following cases:
Loading in the playground view in Dagit will pre-populate the tag container.
Loading partition sets from the preset/config picker will pre-populate the tag container with
the union of pipeline tags and partition tags, with partition tags taking precedence.
Executing from the CLI will generate runs with the pipeline tags.
Executing programmatically using the execute_pipeline api will create a run with the union
of pipeline tags and RunConfig tags, with RunConfig tags taking precedence.
Scheduled runs (both launched and executed) will have the union of pipeline tags and the
schedule tags function, with the schedule tags taking precedence.
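The "union with precedence" rule in the cases above behaves like a dictionary merge in which the later mapping wins on key collisions. A minimal sketch, with a hypothetical merge_tags helper and made-up tag values:

```python
from typing import Mapping


def merge_tags(
    pipeline_tags: Mapping[str, str],
    overriding_tags: Mapping[str, str],
) -> dict:
    """Union of both tag sets; on a key collision the second argument wins."""
    return {**pipeline_tags, **overriding_tags}


pipeline_tags = {"owner": "analytics", "env": "staging"}
partition_tags = {"env": "prod", "partition": "2020-03-01"}

# Partition tags take precedence over pipeline tags for the shared "env" key.
merged = merge_tags(pipeline_tags, partition_tags)
```

The same merge applies to the other cases listed, with RunConfig tags or schedule tags passed as the second argument.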
Output materialization configs may now yield multiple Materializations, and the tutorial has
been updated to reflect this.
We now export the SolidExecutionContext in the public API so that users can correctly type hint
solid compute functions.
Dagit
Pipeline run tags are now preserved when resuming/retrying from Dagit.
Scheduled run stats are now grouped by partition.
A "preparing" section has been added to the execution viewer. This shows steps that are in
progress of starting execution.
Markers emitted by the underlying execution engines are now visualized in the Dagit execution
timeline.
Bugfix
Resume/retry now works as expected in the presence of solids that yield optional outputs.
Fixed an issue where dagster-celery workers were failing to start in the presence of config
values that were None.
Fixed an issue with attempting to set threads_per_worker on Dask distributed clusters.
dagster-postgres
All postgres config may now be set using environment variables in config.
dagster-aws
The s3_resource now exposes a list_objects_v2 method corresponding to the underlying boto3
API. (Thanks, @basilvetas!)
Added the redshift_resource to access Redshift databases.
dagster-k8s
The K8sRunLauncher config now includes the load_kubeconfig and kubeconfig_file options.
Documentation
Fixes and improvements.
Dependencies
dagster-airflow no longer pins its werkzeug dependency.
Community
We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform
development priorities. Telemetry data will motivate projects such as adding features in
frequently-used parts of the CLI and adding more examples in the docs in areas where users
encounter more errors.
We will not see or store solid definitions (including generated context) or pipeline definitions
(including modes and resources). We will not see or store any data that is processed within solids
and pipelines.
If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml:
telemetry:
  enabled: true
Thanks to @basilvetas and @hspak for their contributions!
It is now possible to use Postgres to back schedule storage by configuring
dagster_postgres.PostgresScheduleStorage on the instance.
Added the execute_pipeline_with_mode API to allow executing a pipeline in test with a specific
mode without having to specify RunConfig.
Experimental support for retries in the Celery executor.
It is now possible to set run-level priorities for backfills run using the Celery executor by
passing --celery-base-priority to dagster pipeline backfill.
Added the @weekly schedule decorator.
Deprecations
The dagster-ge library has been removed from this release due to drift from the underlying
Great Expectations implementation.
dagster-pandas
PandasColumn now includes an is_optional flag, replacing the previous
ColumnExistsConstraint.
You can now pass the ignore_missing_values flag to PandasColumn in order to apply column
constraints only to the non-missing rows in a column.
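The effect of ignore_missing_values can be sketched without pandas: drop missing entries first, then check the constraint on what remains. The helpers below are hypothetical stand-ins, not the dagster-pandas implementation, and they treat None and NaN as "missing" by assumption.

```python
import math
from typing import Callable, Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    # Assumption: None and NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_satisfies(
    values: Sequence[Optional[float]],
    constraint: Callable[[float], bool],
    ignore_missing_values: bool = False,
) -> bool:
    """Hypothetical check: optionally skip missing rows before validating."""
    if ignore_missing_values:
        values = [v for v in values if not is_missing(v)]
    return all(constraint(v) for v in values)


column = [1.0, float("nan"), 3.0]


def non_negative(v: float) -> bool:
    return v >= 0
```

In this sketch, the NaN row fails the non_negative check unless ignore_missing_values is set, because comparisons with NaN evaluate to False.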
dagster-k8s
The Helm chart now includes provision for an Ingress and for multiple Celery queues.
It is now possible to configure a Dagit instance to disable executing pipeline runs in a local
subprocess.
Resource initialization, teardown, and associated failure states now emit structured events
visible in Dagit. Structured events for pipeline errors and multiprocess execution have been
consolidated and rationalized.
Support Redis queue provider in dagster-k8s Helm chart.
Support external postgresql in dagster-k8s Helm chart.
Bugfix
Fixed an issue with inaccurate timings on some resource initializations.
Fixed an issue that could cause the multiprocess engine to spin forever.
Fixed an issue with default value resolution when a config value was set using SourceString.
Fixed an issue when loading logs from a pipeline belonging to a different repository in Dagit.
Fixed an issue where the CLI command dagster schedule up would fail in certain scenarios
with the SystemCronScheduler.
Pandas
Column constraints can now be configured to permit NaN values.
Dagstermill
Removed a spurious dependency on sklearn.
Docs
Improvements and fixes to docs.
Restored dagster.readthedocs.io.
Experimental
An initial implementation of solid retries, throwing a RetryRequested exception, was added.
This API is experimental and likely to change.
Other
Renamed property runtime_type to dagster_type in definitions. The following are deprecated
and will be removed in a future version.
InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
CompositeSolidDefinition.all_runtime_types is deprecated. Use
CompositeSolidDefinition.all_dagster_types instead.
SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types
instead.
PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type
instead.
PipelineDefinition.runtime_type_named is deprecated. Use
PipelineDefinition.dagster_type_named instead.
PipelineDefinition.all_runtime_types is deprecated. Use
PipelineDefinition.all_dagster_types instead.
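A rename with a deprecation period like the one listed above is often implemented by keeping the old name as a thin alias that warns and forwards to the new one. The sketch below shows that general pattern on a stand-in class; InputDefinitionSketch is hypothetical and is not the real InputDefinition, and the real deprecation mechanism may differ.

```python
import warnings


class InputDefinitionSketch:
    """Stand-in class for illustration only."""

    def __init__(self, dagster_type: str) -> None:
        self._dagster_type = dagster_type

    @property
    def dagster_type(self) -> str:
        # New, preferred name.
        return self._dagster_type

    @property
    def runtime_type(self) -> str:
        # Old name keeps working but warns, then forwards to the new name.
        warnings.warn(
            "runtime_type is deprecated. Use dagster_type instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.dagster_type
```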