This page provides an overview of CockroachDB multi-region capabilities.
Overview
CockroachDB multi-region capabilities make it easier to run global applications. To use these capabilities effectively, you should understand the following concepts:
- Cluster region is a geographic region that you specify at node start time.
- Database region is a geographic region in which a database operates. You must choose a database region from the list of available cluster regions.
- Survival goal dictates how many simultaneous failure(s) a database can survive.
- Table locality determines how CockroachDB optimizes access to a table's data.
At a high level, the simplest process for running a multi-region cluster is:
- Set region information for each node in the cluster at startup using node startup locality options.
- Add one or more regions to a database, making it a "multi-region" database. One of these regions must be the primary region.
- (Optional) Change table localities (global, regional by table, regional by row). This step is optional because by default the tables in a database will be homed in the database's primary region (as set during Step 1).
- (Optional) Change the database's survival goals (zone or region). This step is optional because by default multi-region databases will be configured to survive zone failures.
These steps describe the simplest case, where you accept all of the default settings. The latter two steps are optional, but table locality and survival goals have a significant impact on performance. Therefore Cockroach Labs recommends that you give these aspects some consideration when you choose a multi-region configuration.
For new clusters using the multi-region SQL abstractions, Cockroach Labs recommends lowering the --max-offset
setting to 250ms
. This setting is especially helpful for lowering the write latency of global tables. Nodes can run with different values for --max-offset
, but only for the purpose of updating the setting across the cluster using a rolling upgrade.
Cluster regions
You define a cluster region at the node level using the region
key and the zone using the zone
key in the node startup locality options.
For example, the following command adds us-east-1
to the list of cluster regions and us-east-1b
to the list of zones:
cockroach start --locality=region=us-east-1,zone=us-east-1b # ... other required flags go here
To show all of a cluster's regions, execute the following SQL statement:
SHOW REGIONS FROM CLUSTER;
Database regions
A database region is a high-level abstraction for a geographic region. Each region is broken into multiple zones. These terms are meant to correspond directly to the region and zone terminology used by cloud providers.
The regions added during node startup become database regions when you add them to a database. To add the first region, use the ALTER DATABASE ... PRIMARY REGION
statement.
While the database has only one region assigned to it, it is considered a "multi-region database." This means that all data in that database is stored within its assigned regions, and CockroachDB optimizes access to the database's data from the primary region. If the default survival goals and table localities meet your needs, there is nothing else you need to do once you have set a database's primary region.
To add another database region, use the ALTER DATABASE ... ADD REGION
statement.
To show all of a database's regions, execute the SHOW REGIONS FROM DATABASE
statement.
If the default survival goals and table localities meet your needs, there is nothing else you need to do once you have set a database's primary region.
Super regions
This is an experimental feature. The interface and output are subject to change.
Super regions allow you to define a set of database regions such that the following schema objects will have all of their replicas stored only in regions that are members of the super region:
- Regional tables whose home region is a member of the super region.
- Any row of a regional by row table whose home region is a member of the super region.
The primary use case for super regions is data domiciling. As mentioned above, data from regional and regional by row tables will be stored only in regions that are members of the super region. Further, if the super region contains 3 or more regions and if you use REGION
survival goals, the data domiciled in the super region will remain available if you lose a region.
To use super regions, keep the following considerations in mind:
- Your cluster must be a multi-region cluster.
- Super regions must be enabled.
- Super regions can only contain database regions that have already been added with
ALTER DATABASE ... ADD REGION
. - Each database region can only belong to one super region. In other words, given two super regions A and B, the set of database regions in A must be disjoint from the set of database regions in B.
- You cannot drop a region that is part of a super region until you either alter the super region to remove it, or drop the super region altogether.
For more information about how to enable and use super regions, see:
Note that super regions take a different approach to data domiciling than ALTER DATABASE ... PLACEMENT RESTRICTED
. Specifically, super regions make it so that all replicas (both voting and non-voting) are placed within the super region, whereas PLACEMENT RESTRICTED
makes it so that there are no non-voting replicas.
For more information about data domiciling using PLACEMENT RESTRICTED
, see Data Domiciling with CockroachDB.
If you want to do data domiciling for databases with region survival goals using the higher-level multi-region abstractions, you must use super regions. Using ALTER DATABASE ... PLACEMENT RESTRICTED
will not work for databases that are set up with region survival goals.
Super regions rely on the underlying replication zone system, which was historically built for performance, not for domiciling. The replication system's top priority is to prevent the loss of data and it may override the zone configurations if necessary to ensure data durability. For more information, see Configure Replication Zones.
If you are using super regions in your cluster, there are additional constraints when using secondary regions:
- If the primary region is in a super region, the secondary region must be a region within the primary's super region.
- If the primary region is not in a super region, the secondary region must not be within a super region.
Secondary regions
Secondary regions allow you to define a database region that will be used for failover in the event your primary region goes down. In other words, the secondary region will act as the primary region if the original primary region fails.
Secondary regions work as follows: when a secondary region is added to a database, a lease preference is added to the tables and indexes in that database to ensure that two voting replicas are moved into the secondary region.
This behavior is an improvement over versions of CockroachDB prior to v22.2. In those versions, when the primary region failed, the leaseholders would be transferred to another database region at random, which could have negative effects on performance.
For more information about how to use secondary regions, see:
If you are using super regions in your cluster, there are additional constraints when using secondary regions:
- If the primary region is in a super region, the secondary region must be a region within the primary's super region.
- If the primary region is not in a super region, the secondary region must not be within a super region.
Survival goals
A survival goal dictates how many simultaneous failure(s) a database can survive. All tables within the same database operate with the same survival goal. Each database can have its own survival goal setting.
The following survival goals are available:
- Zone failure
- Region failure
The zone failure survival goal is the default. You can configure a database to survive region failures at the cost of slower write performance (due to network hops) using the following statement:
ALTER DATABASE <db> SURVIVE REGION FAILURE;
For more information about the survival goals supported by CockroachDB, see the following sections:
Survive zone failures
With the zone level survival goal, the database will remain fully available for reads and writes, even if a zone becomes unavailable. However, the database may not remain fully available if multiple zones in the same region fail. This is the default setting for multi-region databases.
You can configure a database to survive zone failures using the ALTER DATABASE ... SURVIVE ZONE FAILURE
statement.
If your application has performance or availability needs that are different than what the default settings provide, you can explore the other customization options described on this page.
Survive region failures
The region level survival goal has the property that the database will remain fully available for reads and writes, even if an entire region becomes unavailable. This added survival comes at a cost: write latency will be increased by at least as much as the round-trip time to the nearest region. Read performance will be unaffected. In other words, you are adding network hops and making writes slower in exchange for robustness.
You can upgrade a database to survive region failures using the ALTER DATABASE ... SURVIVE REGION FAILURE
statement.
Setting this goal on a database in a cluster with 3 cluster regions will automatically increase the replication factor of the ranges underlying the database from 3 (the default) to 5. This ensures that there will be 5 replicas of each range spread across the 3 regions (2+2+1=5). This is how CockroachDB is able to provide region level resiliency while maintaining good read performance in the leaseholder's region. For writes, CockroachDB will need to coordinate across 2 of the 3 regions, so you will pay additional write latency in exchange for the increased resiliency.
To survive region failures, you must add at least 3 database regions.
Table localities
Table locality determines how CockroachDB optimizes access to the table's data. Every table in a multi-region database has a "table locality setting" that configures one or more home regions at the table or row level. A table or row's home region is where the leaseholder of its ranges is placed, along with a number of voting replicas determined by the survival goal of the database.
By default, all tables in a multi-region database are regional tables—that is, CockroachDB optimizes access to the table's data from a single home region (by default, the database's primary region).
For information about the table localities CockroachDB supports, see the sections:
- Regional tables provide low-latency reads and writes for an entire table from a single region.
- Regional by row tables provide low-latency reads and writes for one or more rows of a table from a single region. Different rows in the table can be optimized for access from different regions.
- Global tables are optimized for low-latency reads from all regions.
Table locality settings are used for optimizing latency under different read and write patterns. If you are optimizing for read and write access to all of your tables from a single region (the primary region), there is nothing else you need to do once you set your database's primary region.
Regional tables
In a regional table, access to the table will be fast in the table's home region and slower in other regions. In other words, CockroachDB optimizes access to data in a regional table from a single region. By default, a regional table's home region is the database's primary region, but that can be changed to use any region in the database. Regional tables work well when your application requires low-latency reads and writes for an entire table from a single region.
For instructions showing how to set a table's locality to REGIONAL BY TABLE
and configure its home region, see ALTER TABLE ... SET LOCALITY
.
By default, all tables in a multi-region database are regional tables that use the database's primary region. Unless you know your application needs different performance characteristics than regional tables provide, there is no need to change this setting.
Regional by row tables
In a regional by row table, individual rows are optimized for access from different home regions. Every row in a regional by row table has a column of type crdb_internal_region
that represents the row's home region. The REGIONAL BY ROW
setting automatically divides a table and all of its indexes into partitions, with each partition optimized for access from a different region. Like regional tables, regional by row tables are optimized for access from a single region. However, that region is specified at the row level instead of applying to the whole table.
Use regional by row tables when your application requires low-latency reads and writes at a row level where individual rows are primarily accessed from a single region. For example, a users table in a global application may need to keep some users' data in specific regions for better performance.
For an example of a table that can benefit from the regional by row setting in a multi-region deployment, see the users
table from the MovR application.
For instructions showing how to set a table's locality to REGIONAL BY ROW
and configure its home regions, see ALTER TABLE ... SET LOCALITY
.
Global tables
A global table is optimized for low-latency reads from every region in the database. This means that any region can effectively act as the home region of the table. The tradeoff is that writes will incur higher latencies from any given region, since writes have to be replicated across every region to make the global low-latency reads possible. Use global tables when your application has a "read-mostly" table of reference data that is rarely updated, and needs to be available to all regions.
For an example of a table that can benefit from the global table locality setting in a multi-region deployment, see the promo_codes
table from the MovR application.
For instructions showing how to set a table's locality to GLOBAL
, see ALTER TABLE ... SET LOCALITY
.
For more information about global tables, including troubleshooting information, see Global Tables.
Additional features
The features listed in this section make working with multi-region clusters easier.
Indexes on REGIONAL BY ROW
tables
In multi-region deployments, most users should use REGIONAL BY ROW
tables instead of explicit index partitioning. When you add an index to a REGIONAL BY ROW
table, it is automatically partitioned on the crdb_region
column. Explicit index partitioning is not required.
While CockroachDB process an ADD REGION
or DROP REGION
statement on a particular database, creating or modifying an index will throw an error. Similarly, all ADD REGION
and DROP REGION
statements will be blocked while an index is being modified on a REGIONAL BY ROW
table within the same database.
This behavior also applies to GIN indexes.
For an example that uses unique indexes but applies to all indexes on REGIONAL BY ROW
tables, see Add a unique index to a REGIONAL BY ROW
table.
Regional by row tables can take advantage of hash-sharded indexes provided the crdb_region
column is not part of the columns in the hash-sharded index.
Zone config extensions
Zone Config Extensions are a customization tool for advanced users to persistently modify the configuration generated by the standard multi-region SQL abstractions on a per-region basis.
For more information, see Zone Config Extensions.
Schema changes in multi-region clusters
To reduce latency while making online schema changes, we recommend specifying a lease_preference
zone configuration on the system
database to a single region and running all subsequent schema changes from a node within that region. For example, if the majority of online schema changes come from machines that are geographically close to us-east1
, run the following:
ALTER DATABASE system CONFIGURE ZONE USING constraints = '{"+region=us-east1": 1}', lease_preferences = '[[+region=us-east1]]';
Run all subsequent schema changes from a node in the specified region.
If you do not intend to run more schema changes from that region, you can safely remove the lease preference from the zone configuration for the system database.
Next steps
See also
- When to Use
ZONE
vs.REGION
Survival Goals - When to Use
REGIONAL
vs.GLOBAL
Tables - Global Tables
- Topology Patterns
- Disaster Recovery
- Develop and Deploy a Global Application
- Low Latency Reads and Writes in a Multi-Region Cluster
- Migrate to Multi-Region SQL
SET SECONDARY REGION
ALTER DATABASE ... DROP SECONDARY REGION
- Zone Config Extensions