As you scale your usage of multi-region clusters, you may need to keep certain subsets of data in specific localities. Keeping specific data on servers in specific geographic locations is also known as data domiciling.
CockroachDB has basic support for data domiciling in multi-region clusters using the ALTER DATABASE ... PLACEMENT RESTRICTED
statement.
Using CockroachDB as part of your approach to data domiciling has several limitations. For more information, see Limitations.
Overview
This page has instructions for data domiciling in multi-region clusters using the ALTER DATABASE ... PLACEMENT RESTRICTED
statement. At a high level, this process involves:
- Controlling the placement of specific row or table data using regional tables with the
REGIONAL BY ROW
andREGIONAL BY TABLE
clauses. - Further restricting where the data in those regional tables is stored using the
ALTER DATABASE ... PLACEMENT RESTRICTED
statement, which constrains the voting and non-voting replicas for a partition or table to be stored in only the home regions associated with those rows or tables.
Before you begin
This page assumes you are already familiar with:
- CockroachDB's multi-region SQL abstractions. If you are not using them, the instructions on this page will not apply.
- The fact that CockroachDB stores your data in a distributed key-value store, which is split into chunks called ranges.
Example
In the following example, you will go through the process of configuring the MovR data set using multi-region SQL statements. Then, as part of implementing a data domiciling strategy, you will apply restricted replica settings using the ALTER DATABASE ... PLACEMENT RESTRICTED
statement. Finally, you will verify that the resulting replica placements are as expected using replication reports.
For the purposes of this example, the data domiciling requirement is to configure a multi-region deployment of the MovR database such that data for EU-based users, vehicles, etc. is being stored on CockroachDB nodes running in EU localities.
If you want to do data domiciling for databases with region survival goals using the higher-level multi-region abstractions, you must use super regions. Using ALTER DATABASE ... PLACEMENT RESTRICTED
will not work for databases that are set up with region survival goals.
Step 1. Start a simulated multi-region cluster
Use the following cockroach demo
command to start the cluster. This particular combination of flags results in a demo cluster of 9 nodes, with 3 nodes in each region. It sets the appropriate node localities and also simulates the network latency that would occur between nodes in these localities. For more information about each flag, see the cockroach demo
documentation, especially for --global
.
$ cockroach demo --global --nodes 9 --no-example-database --insecure
When the cluster starts, you'll see a message like the one shown below, followed by a SQL prompt. Note the URLs for:
- Viewing the DB Console:
http://127.0.0.1:8080
. - Connecting to the database from a SQL shell or a programming language:
postgres://root@127.0.0.1:26257?sslmode=disable
.
#
# Welcome to the CockroachDB demo database!
#
# You are connected to a temporary, in-memory CockroachDB cluster of 9 nodes.
#
# This demo session will attempt to enable enterprise features
# by acquiring a temporary license from Cockroach Labs in the background.
# To disable this behavior, set the environment variable
# COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=true.
#
# Reminder: your changes to data stored in the demo session will not be saved!
#
# Connection parameters:
# (webui) http://127.0.0.1:8080/demologin?password=demo76950&username=demo
# (sql) postgres://demo:demo76950@127.0.0.1:26257?sslmode=require
# (sql/unix) postgres://demo:demo76950@?host=%2Fvar%2Ffolders%2Fc8%2Fb_q93vjj0ybfz0fz0z8vy9zc0000gp%2FT%2Fdemo070856957&port=26257
#
# To display connection parameters for other nodes, use \demo ls.
#
# The user "demo" with password "demo76950" has been created. Use it to access the Web UI!
#
# Server version: CockroachDB CCL v21.1.2 (x86_64-apple-darwin19, built 2021/06/07 18:13:04, go1.15.11) (same version as client)
# Cluster ID: bfd9fc91-69bd-4417-a2f7-66e556bf2cfd
# Organization: Cockroach Demo
#
# Enter \? for a brief introduction.
#
You now have a cluster running across 9 nodes, with 3 nodes each in the following regions:
us-east1
us-west1
europe-west1
You can verify this using the SHOW REGIONS
statement:
SHOW REGIONS;
region | zones | database_names | primary_region_of
---------------+---------+----------------+--------------------
europe-west1 | {b,c,d} | {} | {}
us-east1 | {b,c,d} | {} | {}
us-west1 | {a,b,c} | {} | {}
(3 rows)
Step 2. Apply multi-region SQL abstractions
Execute the following statements to set the database regions. This information is necessary so that CockroachDB can later move data around to optimize access to particular data from particular regions.
ALTER DATABASE movr PRIMARY REGION "europe-west1";
ALTER DATABASE movr ADD REGION "us-east1";
ALTER DATABASE movr ADD REGION "us-west1";
Because the data in promo_codes
is not updated frequently (a.k.a., "read-mostly"), and needs to be available from any region, the right table locality is GLOBAL
.
ALTER TABLE promo_codes SET locality GLOBAL;
Next, alter the user_promo_codes
table to have a foreign key into the global promo_codes
table. This will enable fast reads of the promo_codes.code
column from any region in the cluster.
ALTER TABLE user_promo_codes
ADD CONSTRAINT user_promo_codes_code_fk
FOREIGN KEY (code)
REFERENCES promo_codes (code)
ON UPDATE CASCADE;
All of the tables except promo_codes
contain rows which are partitioned by region, and updated very frequently. For these tables, the right table locality for optimizing access to their data is REGIONAL BY ROW
.
Apply this table locality to the remaining tables. These statements use a CASE
statement to put data for a given city in the right region and can take around 1 minute to complete for each table.
rides
ALTER TABLE rides ADD COLUMN region crdb_internal_region AS ( CASE WHEN city = 'amsterdam' THEN 'europe-west1' WHEN city = 'paris' THEN 'europe-west1' WHEN city = 'rome' THEN 'europe-west1' WHEN city = 'new york' THEN 'us-east1' WHEN city = 'boston' THEN 'us-east1' WHEN city = 'washington dc' THEN 'us-east1' WHEN city = 'san francisco' THEN 'us-west1' WHEN city = 'seattle' THEN 'us-west1' WHEN city = 'los angeles' THEN 'us-west1' END ) STORED; ALTER TABLE rides ALTER COLUMN REGION SET NOT NULL; ALTER TABLE rides SET LOCALITY REGIONAL BY ROW AS "region";
user_promo_codes
ALTER TABLE user_promo_codes ADD COLUMN region crdb_internal_region AS ( CASE WHEN city = 'amsterdam' THEN 'europe-west1' WHEN city = 'paris' THEN 'europe-west1' WHEN city = 'rome' THEN 'europe-west1' WHEN city = 'new york' THEN 'us-east1' WHEN city = 'boston' THEN 'us-east1' WHEN city = 'washington dc' THEN 'us-east1' WHEN city = 'san francisco' THEN 'us-west1' WHEN city = 'seattle' THEN 'us-west1' WHEN city = 'los angeles' THEN 'us-west1' END ) STORED; ALTER TABLE user_promo_codes ALTER COLUMN REGION SET NOT NULL; ALTER TABLE user_promo_codes SET LOCALITY REGIONAL BY ROW AS "region";
users
ALTER TABLE users ADD COLUMN region crdb_internal_region AS ( CASE WHEN city = 'amsterdam' THEN 'europe-west1' WHEN city = 'paris' THEN 'europe-west1' WHEN city = 'rome' THEN 'europe-west1' WHEN city = 'new york' THEN 'us-east1' WHEN city = 'boston' THEN 'us-east1' WHEN city = 'washington dc' THEN 'us-east1' WHEN city = 'san francisco' THEN 'us-west1' WHEN city = 'seattle' THEN 'us-west1' WHEN city = 'los angeles' THEN 'us-west1' END ) STORED; ALTER TABLE users ALTER COLUMN REGION SET NOT NULL; ALTER TABLE users SET LOCALITY REGIONAL BY ROW AS "region";
vehicle_location_histories
ALTER TABLE vehicle_location_histories ADD COLUMN region crdb_internal_region AS ( CASE WHEN city = 'amsterdam' THEN 'europe-west1' WHEN city = 'paris' THEN 'europe-west1' WHEN city = 'rome' THEN 'europe-west1' WHEN city = 'new york' THEN 'us-east1' WHEN city = 'boston' THEN 'us-east1' WHEN city = 'washington dc' THEN 'us-east1' WHEN city = 'san francisco' THEN 'us-west1' WHEN city = 'seattle' THEN 'us-west1' WHEN city = 'los angeles' THEN 'us-west1' END ) STORED; ALTER TABLE vehicle_location_histories ALTER COLUMN REGION SET NOT NULL; ALTER TABLE vehicle_location_histories SET LOCALITY REGIONAL BY ROW AS "region";
vehicles
ALTER TABLE vehicles ADD COLUMN region crdb_internal_region AS ( CASE WHEN city = 'amsterdam' THEN 'europe-west1' WHEN city = 'paris' THEN 'europe-west1' WHEN city = 'rome' THEN 'europe-west1' WHEN city = 'new york' THEN 'us-east1' WHEN city = 'boston' THEN 'us-east1' WHEN city = 'washington dc' THEN 'us-east1' WHEN city = 'san francisco' THEN 'us-west1' WHEN city = 'seattle' THEN 'us-west1' WHEN city = 'los angeles' THEN 'us-west1' END ) STORED; ALTER TABLE vehicles ALTER COLUMN REGION SET NOT NULL; ALTER TABLE vehicles SET LOCALITY REGIONAL BY ROW AS "region";
Step 3. View noncompliant replicas
Next, run a replication report to see which ranges are still not in compliance with your desired domiciling: that data on EU-based entities (users, etc.) does not leave EU-based nodes.
On a small demo cluster like this one, the data movement from the previous step should have finished almost instantly; on larger clusters, the rebalancing process may take longer. For more information about the performance considerations of rebalancing data in multi-region clusters, see Performance considerations.
With the default settings, you should expect some replicas in the cluster to be violating this constraint. This is because non-voting replicas are enabled by default in multi-region clusters to enable stale reads of data in regional tables from outside those tables' home regions. For many use cases, this is preferred, but it keeps you from meeting the domiciling requirements for this example.
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
zone_id | subzone_id | type | config | report_id | violation_start | violating_ranges
----------+------------+------------+------------------------+-----------+-------------------------------+-------------------
52 | 0 | constraint | +region=europe-west1:1 | 1 | 2022-01-19 16:33:49.485535+00 | 10
52 | 0 | constraint | +region=us-east1:1 | 1 | 2022-01-19 16:34:49.930886+00 | 58
52 | 0 | constraint | +region=us-west1:1 | 1 | 2022-01-19 16:34:49.930886+00 | 61
Based on this output, you can see that plenty of replicas are out of compliance (see the violating_ranges
column) for the reason described above: the presence of non-voting replicas in other regions to enable fast stale reads from those regions.
The Replication Reports do not consider non-voting replicas located outside of a table's home region to be in compliance with the constraints on that table.
Next, run the query suggested in the Replication Reports documentation that should show which database and table names contain the violating_ranges
.
WITH
partition_violations
AS (
SELECT
*
FROM
system.replication_constraint_stats
WHERE
violating_ranges > 0
),
report
AS (
SELECT
crdb_internal.zones.zone_id,
crdb_internal.zones.subzone_id,
target,
database_name,
table_name,
index_name,
partition_violations.type,
partition_violations.config,
partition_violations.violation_start,
partition_violations.violating_ranges
FROM
crdb_internal.zones, partition_violations
WHERE
crdb_internal.zones.zone_id
= partition_violations.zone_id
)
SELECT * FROM report;
zone_id | subzone_id | target | database_name | table_name | index_name | type | config | violation_start | violating_ranges
----------+------------+---------------+---------------+------------+------------+------------+------------------------+-------------------------------+-------------------
52 | 0 | DATABASE movr | movr | NULL | NULL | constraint | +region=europe-west1:1 | 2022-01-19 16:33:49.485535+00 | 16
52 | 0 | DATABASE movr | movr | NULL | NULL | constraint | +region=us-west1:1 | 2022-01-19 16:34:49.930886+00 | 78
52 | 0 | DATABASE movr | movr | NULL | NULL | constraint | +region=us-east1:1 | 2022-01-19 16:34:49.930886+00 | 78
This output shows that the movr
database has ranges out of compliance, which you saw previously. Unfortunately, this output does not contain the table or index names due to a current limitation of the replication reports: non-voting replicas are not associated with any tables or indexes by the reports.
Step 4. Apply stricter replica placement settings
To ensure that data on EU-based users, vehicles, etc. from REGIONAL BY ROW
tables is stored only on EU-based nodes in the cluster, you must disable the use of non-voting replicas on all of the regional tables in this database. You can do this using the ALTER DATABASE ... PLACEMENT RESTRICTED
statement.
To use this statement, you must set the enable_multiregion_placement_policy
session setting or the sql.defaults.multiregion_placement_policy.enabled
cluster setting:
SET enable_multiregion_placement_policy=on;
SET
Next, use the ALTER DATABASE ... PLACEMENT RESTRICTED
statement to disable non-voting replicas for regional tables:
ALTER DATABASE movr PLACEMENT RESTRICTED;
ALTER DATABASE PLACEMENT
The restricted replica placement settings should start to apply immediately.
ALTER DATABASE ... PLACEMENT RESTRICTED
does not affect the replica placement for global tables, which are designed to provide fast, up-to-date reads from all database regions.
Use ALTER ROLE ALL SET {sessionvar} = {val}
instead of the sql.defaults.*
cluster settings. This allows you to set a default value for all users for any session variable that applies during login, making the sql.defaults.*
cluster settings redundant.
Step 5. Verify updated replica placement
Now that you have restricted the placement of non-voting replicas for all regional tables, you can run another replication report to see the effects:
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
zone_id | subzone_id | type | config | report_id | violation_start | violating_ranges
----------+------------+------------+--------------------+-----------+-------------------------------+-------------------
57 | 0 | constraint | +region=us-east1:1 | 1 | 2022-01-19 19:09:00.235247+00 | 1
57 | 0 | constraint | +region=us-west1:1 | 1 | 2022-01-19 19:09:00.235247+00 | 1
The output above shows that there are now far fewer replicas that do not meet the data domiciling goal. As described above, ALTER DATABASE ... PLACEMENT RESTRICTED
does not affect the replica placement for GLOBAL
tables, so it's likely that these few replicas are part of a global table.
To verify that the constraint violating replicas are indeed part of a GLOBAL
table, run the replication report query from Step 3 again as shown below. This will display the database and table names of these replicas.
WITH
partition_violations
AS (
SELECT
*
FROM
system.replication_constraint_stats
WHERE
violating_ranges > 0
),
report
AS (
SELECT
crdb_internal.zones.zone_id,
crdb_internal.zones.subzone_id,
target,
database_name,
table_name,
index_name,
partition_violations.type,
partition_violations.config,
partition_violations.violation_start,
partition_violations.violating_ranges
FROM
crdb_internal.zones, partition_violations
WHERE
crdb_internal.zones.zone_id
= partition_violations.zone_id
)
SELECT * FROM report;
zone_id | subzone_id | target | database_name | table_name | index_name | type | config | violation_start | violating_ranges
----------+------------+-------------------------------+---------------+-------------+------------+------------+--------------------+-------------------------------+-------------------
57 | 0 | TABLE movr.public.promo_codes | movr | promo_codes | NULL | constraint | +region=us-east1:1 | 2022-01-19 19:09:00.235247+00 | 1
57 | 0 | TABLE movr.public.promo_codes | movr | promo_codes | NULL | constraint | +region=us-west1:1 | 2022-01-19 19:09:00.235247+00 | 1
As expected, these replicas are part of the promo_codes
table, which was configured to use the GLOBAL
table locality in Step 2.
Now that you have verified that the system is configured to meet the domiciling requirement, it's a good idea to run these replication reports on a regular basis (via automation of some kind) to ensure that the requirement continues to be met.
The steps above are necessary but not sufficient to accomplish a data domiciling solution using CockroachDB. Be sure to review the limitations of CockroachDB for data domiciling and design your total solution with those limitations in mind.
Limitations
Using CockroachDB as part of your approach to data domiciling has several limitations:
- When columns are indexed, a subset of data from the indexed columns may appear in meta ranges or other system tables. CockroachDB synchronizes these system ranges and system tables across nodes. This synchronization does not respect any multi-region settings applied via either the multi-region SQL statements, or the low-level zone configs mechanism.
- Zone configs can be used for data placement but these features were historically built for performance, not for domiciling. The replication system's top priority is to prevent the loss of data and it may override the zone configurations if necessary to ensure data durability. For more information, see Configure Replication Zones.
- If your log files are kept in the region where they were generated, there is some cross-region leakage (like the system tables described previously), but the majority of user data that makes it into the logs is going to be homed in that region. If that's not strong enough, you can use the log redaction functionality to strip all raw data from the logs. You can also limit your log retention entirely.
- If you start a node with a
--locality
flag that says the node is in region A, but the node is actually running in some region B, data domiciling based on the inferred node placement will not work. A CockroachDB node only knows its locality based on the text supplied to the--locality
flag; it can not ensure that it is actually running in that physical location.
See also
- How to Choose a Multi-region Configuration
- Migrate to Multi-Region SQL
- Multi-Region Overview
- Low Latency Reads and Writes in a Multi-Region Cluster
- Multi-Region Capabilities Overview
- Reads and Writes in CockroachDB
- When to Use
REGIONAL
vs.GLOBAL
Tables - When to Use
ZONE
vs.REGION
Survival Goals ADD REGION
- Secondary regions
- Zone Config Extensions
ALTER DATABASE ... SET SECONDARY REGION
ALTER DATABASE ... DROP SECONDARY REGION