Understanding dynamic table initialization and refresh¶
A dynamic table’s content is defined by a query and automatically updates — called a refresh — when the underlying data changes. This process analyzes the query to keep the table current.
The following sections explain dynamic table refresh in more detail:
Section |
Description |
---|---|
Introduces initialization, or in other words, the initial data population when you create a dynamic table. You can specify when the initial refresh occurs. |
|
An overview of dynamic table refresh. Dynamic tables refresh on a schedule unless manually refreshed. |
|
Dynamic tables support two refresh modes: incremental and full. You can either set the refresh mode to AUTO or set it explicitly. Learn best practices for choosing refresh mode and how to view the refresh mode used. |
|
How data is refreshed when a dynamic table depends on other dynamic tables |
Learn how dynamic tables refresh in relation to their dependencies. |
Understanding the effects of changes to columns in base tables |
Learn about the impact of changes to base tables. |
A list of best practices for dynamic table refresh. |
Understanding dynamic table initialization¶
When you create a dynamic table, its initial refresh takes place either synchronously at creation or at a scheduled time. The initial data population, or initialization, depends on when this initial refresh occurs.
Dynamic tables refresh based on the specified target lag, which sets the maximum allowed delay
between updates to the base tables and the dynamic table’s content. If you set INITIALIZE = ON_CREATE
(default), the table is initialized
immediately. If you set INITIALIZE = ON_SCHEDULE
, initialization happens within the specified target lag timeframe.
For example, consider a dynamic table, DT1
, with a target lag of 30 minutes. The initial data population for DT1
can occur as follows:
If
DT1
is set to refresh synchronously at creation (ON_CREATE
), it initializes at creation.If
DT1
is set to refresh at a scheduled time (ON_SCHEDULE
), it initializes within 30 minutes.
In scenarios with downstream dependencies, refresh behavior depends on the dependencies. For example, if dynamic table DT1
has a
downstream target lag and DT2
, which depends on DT1
, has a 30-minute target lag, DT1
refreshes only when DT2
refreshes.
For DT1
:
If set to refresh synchronously at creation, it initializes immediately. If initialization fails, the creation process stops, providing immediate feedback on any errors.
If set to refresh at a scheduled time, initialization depends on when
DT2
refreshes.
Initialization can take some time, depending on how much data is scanned. To track progress, see Troubleshoot dynamic table creation.
Understanding manual and scheduled refresh options¶
Dynamic tables are refreshed on a schedule that’s determined by the target lag. Every time a dynamic table is read, the data freshness is within the time period defined by the target lag.
You can manually refresh your dynamic tables to get the latest data using the ALTER DYNAMIC TABLE … REFRESH command or Snowsight. For more information, see Manually refresh dynamic tables.
Dynamic table refresh timeouts are controlled by the STATEMENT_TIMEOUT_IN_SECONDS parameter, which sets the maximum allowed duration at the account or warehouse level before a refresh is automatically canceled.
How target lag affects scheduled refreshes¶
Target lag controls the frequency of scheduled refreshes. To manually manage refreshes, set your dynamic table’s target lag to DOWNSTREAM and ensure that all downstream dynamic tables are also set to DOWNSTREAM.
Setting the entire Directed Acyclic Graph (DAG)’s target lag to DOWNSTREAM essentially disables scheduled refreshes because the final dynamic table controls the refresh schedule. If no dynamic table has a time-based target lag, the pipeline is suspended for scheduled refreshes. In this case, manually refreshing the most downstream table automatically refreshes any upstream dependencies.
Setting the target lag to DOWNSTREAM doesn’t specify exact times. Instead, Snowflake picks a refresh cadence to attempt to keep the lag under the target value. For example, a dynamic table with a target lag of 4 hours might refresh every 3.5 hours.
As a workaround, you can use a Task with a CRON schedule. For example:
CREATE TASK my_task
SCHEDULE = 'USING CRON <expr> <time_zone>'
AS
ALTER DYNAMIC TABLE my_dynamic_table REFRESH;
Dynamic table refresh modes¶
Dynamic tables support two refresh modes: incremental and full. You can either set the refresh mode to AUTO or set it explicitly:
AUTO refresh mode: When you use the
AUTO
parameter, Snowflake automatically selects the most cost- and time-effective refresh mode based on query complexity, supported constructs, operators, functions, and expected performance. If incremental refresh isn’t supported or is likely to perform poorly, Snowflake automatically selects full refresh instead.Incremental refresh mode: This mode analyzes the dynamic table’s query and calculates changes since the last refresh. It then merges these changes into the table.
Full refresh mode: This mode executes the dynamic table’s query and completely replaces the previously materialized results.
After you create a dynamic table, you can monitor the table to determine whether incremental or full refreshes are used to update that table.
Best practices for choosing dynamic table refresh modes¶
The best mode for your dynamic tables’ performance depends on data change volume and query complexity. Additionally, testing different refresh modes with a dedicated warehouse helps isolate costs and improve performance tuning based on actual workloads.
AUTO refresh mode: The system attempts to apply incremental refresh by default. When incremental refresh isn’t supported or might not perform well, the dynamic table automatically selects full refresh instead.
We strongly recommend using
AUTO
for most use cases because it allows Snowflake to optimize refresh behavior without manual tuning. For consistent behavior, explicitly set the refresh mode on all production tables.AUTO
behavior might change between Snowflake releases, which can cause unexpected changes in performance if used in production pipelines.
Incremental refresh: Updates the dynamic table with only the changes since the last refresh, making it ideal for large datasets with frequent small updates.
Best for queries compatible with incremental refresh (for example, deterministic functions, simple joins, and basic expressions in
SELECT
,WHERE
, andGROUP BY
). If unsupported features are present, and the refresh mode is set to incremental, Snowflake will fail to create the dynamic table.A key practice for optimizing performance with incremental refresh is to limit change volume to about 5% of the source data and to cluster your data by the grouping keys to reduce processing overhead.
Consider that certain combinations of operations, like aggregations atop many joins, might not run efficiently.
Full refresh: Reprocesses the entire dataset and updates the dynamic table with the complete query result. Use for complex queries or when significant data changes require a complete update.
Useful when incremental refresh isn’t supported due to complex queries, non-deterministic functions, or major changes in the data.
To determine the best mode for your use case, experiment with automatic recommendations and the concrete refresh modes (full and incremental).
Important
Dynamic tables in incremental refresh mode can’t be downstream from dynamic tables with full refresh mode. This is because incremental refresh mode is incompatible with the complete row changes that occur during each refresh of an upstream full refresh table.
To verify the refresh mode for your dynamic tables, see View dynamic table refresh mode.
View dynamic table refresh mode¶
You set the refresh mode when you create a dynamic table. After creation, you can check if your dynamic table uses incremental or full refreshes using one of the following methods, with a role that has the necessary privileges:
Execute the SHOW DYNAMIC TABLES command:
SHOW DYNAMIC TABLES;
In the output:
The
text
column shows the user-specified refresh mode.The
refresh_mode
column shows the actual refresh mode.The
refresh_mode_reason
shows why the actual refresh mode was chosen.
In the navigation menu, select Monitoring » Dynamic Tables, and then select your dynamic table.
You can view the refresh mode for the dynamic table in the object header at the top of the page. For full refreshes, the refresh mode reason is visible when hovering over the mode.
How data is refreshed when a dynamic table depends on other dynamic tables¶
When a dynamic table’s lag is set as a time measure, the automated refresh process schedules refreshes to best meet the target lag times.
In order to keep data consistent in cases when one dynamic table depends on another, the process refreshes all dynamic tables in an account at compatible times. The timing of less frequent refreshes coincides with the timing of more frequent refreshes. If refreshes take too long, the scheduler may skip refreshes to try to stay up to date. However, snapshot isolation is preserved.
For example, suppose that dynamic table DT1
has a target lag of two minutes and queries dynamic table DT2
, which has a target lag of
one minute. The process might determine that DT1
should be refreshed every 96 seconds, and DT2
every 48 seconds. As a result, the
process might apply the following schedule:
Specific Point in Time |
Dynamic Tables Refreshed |
---|---|
2022-12-01 00:00:00 |
DT1, DT2 |
2022-12-01 00:00:48 |
DT2 |
2022-12-01 00:01:36 |
DT1, DT2 |
2022-12-01 00:02:24 |
DT2 |
This means that at any given time, when you query a set of dynamic tables that depend on each other, you are querying the same “snapshot” of the data across these tables.
Note that the target lag of a dynamic table cannot be shorter than the target lag of the dynamic tables it depends on. For example, suppose that:
DT1
queries dynamic tablesDT2
andDT3
.DT2
has a target lag of five minutes.DT3
has a target lag of one minute.
This means that the target lag time for DT1
must not be shorter than five minutes (that is, not shorter than the longer of the lag times
for DT2
and DT3
).
If you set the lag for DT1
to five minutes, the process sets up a refresh schedule with these goals:
Refresh
DT3
often enough to keep its lag below one minute.Refresh
DT1
andDT2
together and often enough to keep their lags below five minutes.Ensure that the refresh for
DT1
andDT2
coincides with a refresh ofDT3
to ensure snapshot isolation.
Important
Dynamic tables in incremental refresh mode can’t be downstream from dynamic tables with full refresh mode. This is because incremental refresh mode is incompatible with the complete row changes that occur during each refresh of an upstream full refresh table.
Understanding the effects of changes to columns in base tables¶
When the underlying objects associated with a dynamic table change, the following behaviors apply:
Change |
Impact |
---|---|
|
None. If a new column is added to the base table or an unused column is deleted, no action occurs and refreshes continue as before. |
|
Reinitialization: The first refresh after recreation is initialization. |
|
Future refreshes of the dynamic table will fail. The dynamic table must be recreated to respond to the change. |
Best practices for dynamic table refresh¶
Use dedicated warehouses for refreshes¶
Dynamic tables require a virtual warehouse to perform refreshes. To get a clear understanding of costs related to your dynamic table pipelines, you should test your dynamic tables using dedicated warehouses such that the virtual warehouse consumption attributed to dynamic tables can be isolated.
For more information, see Understanding cost for dynamic tables.
Use downstream lag¶
Downstream lag indicates that the dynamic table should refresh when other dependent dynamic tables require refreshing. You should use downstream lag as a best practice because of its ease of use and cost effectiveness. Without downstream lag, managing a chain of complex dynamic tables would require individually assigning each table its own target lag and managing the associated constraints, instead of only monitoring the data freshness of the final table.
For more information, see Understanding dynamic table target lag.