Skip to main content

Redshift configurations

Incremental materialization strategies

In dbt-redshift, the following incremental materialization strategies are supported:

  • append (default when unique_key is not defined)
  • merge
  • delete+insert (default when unique_key is defined)

All of these strategies are inherited from dbt-postgres.

Performance optimizations

Using sortkey and distkey

Tables in Amazon Redshift have two powerful optimizations to improve query performance: distkeys and sortkeys. Supplying these values as model-level configurations apply the corresponding settings in the generated CREATE TABLE DDLData Definition Language (DDL) is a group of SQL statements that you can execute to manage database objects, including tables, views, and more.. Note that these settings will have no effect on models set to view or ephemeral models.

  • dist can have a setting of all, even, auto, or the name of a key.
  • sort accepts a list of sort keys, for example: ['reporting_day', 'category']. dbt will build the sort key in the same order the fields are supplied.
  • sort_type can have a setting of interleaved or compound. if no setting is specified, sort_type defaults to compound.

When working with sort keys, it's highly recommended you follow Redshift's best practices on sort key effectiveness and cardinality.

Sort and dist keys should be added to the {{ config(...) }} block in model .sql files, eg:

my_model.sql
-- Example with one sort key
{{ config(materialized='table', sort='reporting_day', dist='unique_id') }}

select ...


-- Example with multiple sort keys
{{ config(materialized='table', sort=['category', 'region', 'reporting_day'], dist='received_at') }}

select ...


-- Example with interleaved sort keys
{{ config(materialized='table',
sort_type='interleaved'
sort=['category', 'region', 'reporting_day'],
dist='unique_id')
}}

select ...

For more information on distkeys and sortkeys, view Amazon's docs:

Late binding views

Redshift supports viewsA view (as opposed to a table) is a defined passthrough SQL query that can be run against a database (or data warehouse). unbound from their dependencies, or late binding views. This DDL option "unbinds" a view from the data it selects from. In practice, this means that if upstream views or tables are dropped with a cascade qualifier, the late-binding view does not get dropped as well.

Using late-binding views in a production deployment of dbt can vastly improve the availability of data in the warehouse, especially for models that are materialized as late-binding views and are queried by end-users, since they won’t be dropped when upstream models are updated. Additionally, late binding views can be used with external tables via Redshift Spectrum.

To materialize a dbt model as a late binding view, use the bind: false configuration option:

my_view.sql
{{ config(materialized='view', bind=False) }}

select *
from source.data

To make all views late-binding, configure your dbt_project.yml file like this:

dbt_project.yml
models:
+bind: false # Materialize all views as late-binding
project_name:
....

Materialized views

The Redshift adapter supports materialized views with the following configuration parameters:

ParameterTypeRequiredDefaultChange Monitoring Support
on_configuration_change<string>noapplyn/a
dist<string>noevendrop/create
sort[<string>]nononedrop/create
sort_type<string>noauto if no sort
compound if sort
drop/create
auto_refresh<boolean>nofalsealter
backup<string>notruen/a
dbt_project.yml
models:
<resource-path>:
+materialized: materialized_view
+on_configuration_change: apply | continue | fail
+dist: all | auto | even | <field-name>
+sort: <field-name> | [<field-name>]
+sort_type: auto | compound | interleaved
+auto_refresh: true | false
+backup: true | false

Many of these parameters correspond to their table counterparts and have been linked above. The parameters unique to materialized views are the auto-refresh and backup functionality, which are covered below.

Learn more about these parameters in Redshift's docs.

Auto-refresh

ParameterTypeRequiredDefaultChange Monitoring Support
auto_refresh<boolean>nofalsealter

Redshift supports automatic refresh configuration for materialized views. By default, a materialized view does not automatically refresh. dbt monitors this parameter for changes and applies them using an ALTER statement.

Learn more information about the parameters in the Redshift docs.

Backup

ParameterTypeRequiredDefaultChange Monitoring Support
backup<boolean>notruen/a

Redshift supports backup configuration of clusters at the object level. This parameter identifies if the materialized view should be backed up as part of the cluster snapshot. By default, a materialized view will be backed up during a cluster snapshot. dbt cannot monitor this parameter as it is not queryable within Redshift. If the value changes, the materialized view will need to go through a --full-refresh to set it.

Learn more about these parameters in Redshift's docs.

Limitations

As with most data platforms, there are limitations associated with materialized views. Some worth noting include:

  • Materialized views cannot reference views, temporary tables, user-defined functions, or late-binding tables.
  • Auto-refresh cannot be used if the materialized view references mutable functions, external schemas, or another materialized view.

Find more information about materialized view limitations in Redshift's docs.

Unit test limitations

Redshift doesn't support unit tests when the SQL in the common table expression (CTE) contains functions such as LISTAGG, MEDIAN, PERCENTILE_CONT, and so on. These functions must be executed against a user-created table. dbt combines given rows to be part of the CTE, which Redshift does not support.

In order to support this pattern in the future, dbt would need to "materialize" the input fixtures as tables, rather than interpolating them as CTEs. If you are interested in this functionality, we'd encourage you to participate in this issue in GitHub: dbt-labs/dbt-core#8499

0