BACKUP

On this page Carat arrow pointing down

CockroachDB's BACKUP statement allows you to create full or incremental backups of your cluster's schema and data that are consistent as of a given timestamp.

You can backup a full cluster, which includes:

You can also backup:

  • An individual database, which includes all of its tables and views.
  • An individual table, which includes its indexes and views.

    BACKUP only backs up entire tables; it does not support backing up subsets of a table.

Because CockroachDB is designed with high fault tolerance, these backups are designed primarily for disaster recovery (i.e., if your cluster loses a majority of its nodes) through RESTORE. Isolated issues (such as small-scale node outages) do not require any intervention. You can check that backups in external storage are valid by using a backup validation command.

To view the contents of an backup created with the BACKUP statement, use SHOW BACKUP.

Tip:

You can create schedules for periodic backups in CockroachDB. We recommend using scheduled backups to automate daily backups of your cluster.

Warning:

The BACKUP ... TO and RESTORE ... FROM syntax is deprecated as of v22.1 and will be removed in a future release.

We recommend using the BACKUP ... INTO {collectionURI} syntax, which creates or adds to a backup collection in your storage location. For restoring backups, we recommend using RESTORE FROM {backup} IN {collectionURI} with {backup} being LATEST or a specific subdirectory.

For guidance on the syntax for backups and restores, see the BACKUP and RESTORE examples.

Considerations

Storage considerations

  • HTTP storage is not supported for BACKUP and RESTORE.
  • Modifying backup files in the storage location could invalidate a backup, and therefore, prevent a restore. In v22.1 and later, we recommend enabling object locking in your cloud storage bucket.
  • While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).

Required privileges

Note:

Starting in v22.2, CockroachDB introduces a new system-level privilege model that provides finer control over a user's privilege to work with the database, including taking and managing backups.

There is continued support for the legacy privilege model for backups in v22.2, however it will be removed in a future release of CockroachDB. We recommend implementing the new privilege model that follows in this section for all new and existing backups.

New in v22.2: You can grant the BACKUP privilege to a user or role depending on the type of backup:

Backup Privilege
Cluster Grant a user the BACKUP system-level privilege. For example, GRANT SYSTEM BACKUP TO user;.
Database Grant a user the BACKUP privilege on the target database. For example, GRANT BACKUP ON DATABASE test_db TO user;.
Table Grant a user the BACKUP privilege at the table level. This gives the user the privilege to back up the schema and all user-defined types that are associated with the table. For example, GRANT BACKUP ON TABLE test_db.table TO user;.

The listed privileges do not cascade to objects lower in the schema tree. For example, if you are granted database-level BACKUP privileges, this does not give you the privilege to back up a table. If you need the BACKUP privilege on a database to apply to all newly created tables in that database, use DEFAULT PRIVILEGES. You can add BACKUP to the user or role's default privileges with ALTER DEFAULT PRIVILEGES.

Note:

You can grant the BACKUP privilege to a user or role without the SELECT privilege on a table. As a result, these users will be able to take backups, but they will not be able to run a SELECT query on that data directly. However, these users could still read this data indirectly, by restoring it from any backups they produce.

Members of the admin role can run all three types of backups (cluster, database, and table) without the need to grant a specific BACKUP privilege. However, we recommend using the BACKUP privilege model to create users or roles and grant them BACKUP privileges as necessary for stronger access control.

See GRANT for detail on granting privileges to a role or user.

Required privileges using the legacy privilege model

The following details the legacy privilege model that CockroachDB supports in v22.2 and earlier. Support for this privilege model will be removed in a future release of CockroachDB:

  • Full cluster backups can only be run by members of the admin role. By default, the root user belongs to the admin role.
  • For all other backups, the user must have read access on all objects being backed up. Database backups require CONNECT privileges, and table backups require SELECT privileges. Backups of user-defined schemas, or backups containing user-defined types, require USAGE privileges.

See the Required privileges section for the updated privilege model.

Destination privileges

New in v22.2: You can grant a user the EXTERNALIOIMPLICITACCESS system-level privilege.

Either the EXTERNALIOIMPLICITACCESS system-level privilege or the admin role is required for the following scenarios:

No special privilege is required for:

  • Interacting with an Amazon S3 and Google Cloud Storage resource using SPECIFIED credentials. Azure Storage is always SPECIFIED by default.
  • Using Userfile storage.

We recommend using cloud storage for CockroachDB operations. You also need to ensure that the permissions at your storage destination are configured for the bulk operation. See Storage Permissions for a list of the necessary permissions that each bulk operation requires.

Warning:

While Cockroach Labs actively tests Amazon S3, Google Cloud Storage, and Azure Storage, we do not test S3-compatible services (e.g., MinIO, Red Hat Ceph).

Synopsis

BACKUP TABLE table_pattern , DATABASE database_name , INTO subdirectory LATEST IN collectionURI ( localityURI , ) AS OF SYSTEM TIME timestamp WITH backup_options , OPTIONS ( backup_options , )

Parameters

CockroachDB stores full backups in a backup collection. Each full backup in a collection may also have incremental backups. For more detail on this, see Backup collections.

Parameter Description
targets Back up the listed targets.
subdirectory The name of the specific backup (e.g., 2021/03/23-213101.37) in the collection to which you want to add an incremental backup. To view available backup subdirectories, use SHOW BACKUPS IN collectionURI. If the backup subdirectory is not provided, incremental backups will be stored in the default /incrementals directory at the root of the collection URI. See the Create incremental backups example.

Warning: If you use an arbitrary STRING as the subdirectory, a new full backup will be created, but it will never be shown in SHOW BACKUPS IN. We do not recommend using arbitrary strings as subdirectory names.
LATEST Append an incremental backup to the latest completed full backup's subdirectory.
collectionURI The URI where you want to store the backup. (Or, the default locality for a locality-aware backup.)

For information about this URL structure, see Backup File URLs.
localityURI The URI containing the COCKROACH_LOCALITY parameter for a non-default locality that is part of a single locality-aware backup.
timestamp Back up data as it existed as of timestamp. The timestamp must be more recent than your data's garbage collection TTL (which is controlled by the gc.ttlseconds replication zone variable).
backup_options Control the backup behavior with a comma-separated list of these options.

Targets

Target Description
N/A Backup the cluster. For an example of a full cluster backup, see Backup a cluster.
DATABASE {database_name} [, ...] The name of the database(s) you want to backup (i.e., create backups of all tables and views in the database). For an example of backing up a database, see Backup a database.
TABLE {table_name} [, ...] The name of the table(s) or view(s) you want to backup. For an example of backing up a table or view, see Backup a table or view.

Options

Option Value Description
revision_history BOOL / None Create a backup with full revision history, which records every change made to the cluster within the garbage collection period leading up to and including the given timestamp.

You can specify a backup with revision history without any value e.g., WITH revision_history. Or, you can explicitly define WITH revision_history = 'true' / 'false'. revision_history defaults to true when used with BACKUP or CREATE SCHEDULE FOR BACKUP. A value is required when using ALTER BACKUP SCHEDULE.
encryption_passphrase STRING The passphrase used to encrypt the files (BACKUP manifest and data files) that the BACKUP statement generates. This same passphrase is needed to decrypt the file when it is used to restore and to list the contents of the backup when using SHOW BACKUP. There is no practical limit on the length of the passphrase.
detached BOOL / None When a backup runs in detached mode, it will execute asynchronously. The job ID will be returned after the backup job creation completes. Note that with detached specified, further job information and the job completion status will not be returned. For more on the differences between the returned job data, see the example. To check on the job status, use the SHOW JOBS statement. Backups running on a schedule have the detached option applied implicitly.

To run a backup within a transaction, use the detached option.
kms STRING The key management service (KMS) URI (or a comma-separated list of URIs) used to encrypt the files (BACKUP manifest and data files) that the BACKUP statement generates. This same KMS URI is needed to decrypt the file when it is used to restore and to list the contents of the backup when using SHOW BACKUP.

Currently, AWS KMS and Google Cloud KMS are supported.
incremental_location STRING Create an incremental backup in a different location than the default incremental backup location.

WITH incremental_location = 'explicit_incrementals_URI'

See Incremental backups with explicitly specified destinations for usage.

Backup file URLs

CockroachDB uses the URL provided to construct a secure API call to the service you specify. The URL structure depends on the type of file storage you are using. For more information, see the following:

New in v22.2: You can create an external connection to represent an external storage or sink URI. This allows you to specify the external connection's name in statements rather than the provider-specific URI. For detail on using external connections, see the CREATE EXTERNAL CONNECTION page.

Tip:

Backups support cloud object locking and Amazon S3 storage classes. For more detail, see Additional cloud storage feature support.

Functional details

Object dependencies

Dependent objects must be backed up at the same time as the objects they depend on.

Object Depends On
Table with foreign key constraints The table it REFERENCES; however, this dependency can be removed during the restore.
Table with a sequence The sequence it uses; however, this dependency can be removed during the restore.
Views The tables used in the view's SELECT statement.
Note:

To exclude a table's row data from a backup, use the exclude_data_from_backup parameter with CREATE TABLE or ALTER TABLE.

For more detail, see the Exclude a table's data from backups example.

Users and privileges

The system.users table stores your users and their passwords. To restore your users and privilege grants, do a cluster backup and restore the cluster to a fresh cluster with no user data. You can also backup the system.users table, and then use this procedure.

Performance

The BACKUP process minimizes its impact to the cluster's performance by distributing work to all nodes. Each node backs up only a specific subset of the data it stores (those for which it serves writes), with no two nodes backing up the same data.

BACKUP, like any read, cannot export a range if the range contains an unresolved intent. While you typically will want bulk, background jobs like BACKUP to have as little impact on your foreground traffic as possible, it's more important for backups to actually complete (which maintains your recovery point objective (RPO)). Unlike a normal read transaction that will block until any uncommitted writes it encounters are resolved, BACKUP will block only for a configurable duration before invoking priority to ensure it can complete on-time.

We recommend always starting backups with a specific timestamp at least 10 seconds in the past. For example:

> BACKUP...AS OF SYSTEM TIME '-10s';

This improves performance by decreasing the likelihood that the BACKUP will be retried because it contends with other statements/transactions. However, because AS OF SYSTEM TIME returns historical data, your reads might be stale. Taking backups with AS OF SYSTEM TIME '-10s' is a good best practice to reduce the number of still-running transactions you may encounter, since the backup will take priority and will force still-running transactions to restart after the backup is finished.

BACKUP will initially ask individual ranges to backup but to skip if they encounter an intent. Any range that is skipped is placed at the end of the queue. When BACKUP has completed its initial pass and is revisiting ranges, it will ask any range that did not resolve within the given time limit (default 1 minute) to attempt to resolve any intents that it encounters and to not skip. Additionally, the backup's transaction priority is then set to high, which causes other transactions to abort until the intents are resolved and the backup is finished.

A backup job will pause instead of entering a failed state if it continues to encounter transient errors once it has retried a maximum number of times. Once the backup has paused, you can either resume or cancel it.

See the Backup and Restore Monitoring page for detail on monitoring backup and restore jobs and a list of the available metrics.

Backup performance configuration

Cluster settings provide a means to tune a CockroachDB cluster. The following cluster settings are helpful for configuring backup files and performance:

bulkio.backup.file_size

Set a target for the amount of backup data written to each backup file. This is the maximum target size the backup will reach, but it is possible files of a smaller size are created during the backup job.

Note that if you lower bulkio.backup.file_size below the default, it will cause the backup job to create many small SST files, which could impact a restore job’s performance because it will need to keep track of so many small files.

Default: 128 MiB

cloudstorage.azure.concurrent_upload_buffers

Improve the speed of backups to Azure Storage by increasing cloudstorage.azure.concurrent_upload_buffers to 3. This setting configures the number of concurrent buffers that are used during file uploads to Azure Storage. Note that the higher this setting the more data that is held in memory, which can increase the risk of OOMs if there is not sufficient memory on each node.

Default: 1

For a complete list, including all cluster settings related to backups, see the Cluster Settings page.

Viewing and controlling backups jobs

After CockroachDB successfully initiates a backup, it registers the backup as a job, and you can do the following:

Action SQL Statement
View the backup status SHOW JOBS
Pause the backup PAUSE JOB
Resume the backup RESUME JOB
Cancel the backup CANCEL JOB

You can also visit the Jobs page of the DB Console to view job details. The BACKUP statement will return when the backup is finished or if it encounters an error.

Note:

The presence of the BACKUP MANIFEST file in the backup subdirectory is an indicator that the backup job completed successfully.

Examples

Per our guidance in the Performance section, we recommend starting backups from a time at least 10 seconds in the past using AS OF SYSTEM TIME.

The following examples make use of:

Also, note the following features for connecting and authenticating to cloud storage:

  • New in v22.2: External connections, which allow you to represent an external storage or sink URI. You can then specify the external connection's name in statements rather than the provider-specific URI. For detail on using external connections, see the CREATE EXTERNAL CONNECTION page.
  • New in v22.2: Assume role authentication, which allows you to limit the control specific users have over your storage buckets. See Assume role authentication for more information.
Note:

The BACKUP ... TO syntax is deprecated as of v22.1 and will be removed in a future release.

We recommend using the BACKUP ... INTO {collectionURI} syntax as per the following examples.

Backup a cluster

To take a full backup of a cluster:

icon/buttons/copy
> BACKUP INTO \
's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

Backup a database

To take a full backup of a single database:

icon/buttons/copy
> BACKUP DATABASE bank \
INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

To take a full backup of multiple databases:

icon/buttons/copy
> BACKUP DATABASE bank, employees \
INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

Backup a table or view

To take a full backup of a single table or view:

icon/buttons/copy
> BACKUP bank.customers \
INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

To take a full backup of multiple tables:

icon/buttons/copy
> BACKUP bank.customers, bank.accounts \
INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

Backup all tables in a schema

To back up all tables in a specified schema, use a wildcard with the schema name:

icon/buttons/copy
> BACKUP test_schema.*
INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';

Alternatively, use a fully qualified name: database.schema.*.

With this syntax, schemas will be resolved before databases. test_object.* will resolve to a schema of test_object within the set current database before matching to a database of test_object.

If a database and schema have the same name, such as bank.bank, running BACKUP bank.* will result in the schema resolving first. All the tables within that schema will be backed up. However, if this were to be run from a different database that does not have a bank schema, all tables in the bank database will be backed up.

See Name Resolution for more details on how naming hierarchy and name resolution work in CockroachDB.

Create incremental backups

When a BACKUP statement specifies an existing subdirectory in the collection, explicitly or via the LATEST keyword, an incremental backup will be added to the default /incrementals directory at the root of the collection storage location.

To take an incremental backup using the LATEST keyword:

icon/buttons/copy
> BACKUP INTO LATEST IN \
    's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
    AS OF SYSTEM TIME '-10s';

To store the backup in an existing subdirectory in the collection:

icon/buttons/copy
BACKUP INTO {'subdirectory'} IN 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s';
Note:

If you intend to take a full backup, we recommend running BACKUP INTO {collectionURI} without specifying a subdirectory.

To explicitly control where you store your incremental backups, use the incremental_location option. For more detail, see this example demonstrating the incremental_location option.

Run a backup asynchronously

Use the DETACHED option to execute the backup job asynchronously:

icon/buttons/copy
> BACKUP INTO \
's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}' \
AS OF SYSTEM TIME '-10s'
WITH DETACHED;

The job ID is returned after the backup job creation completes:

        job_id
----------------------
  592786066399264769
(1 row)

Without the DETACHED option, BACKUP will block the SQL connection until the job completes. Once finished, the job status and more detailed job data is returned:

job_id             |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
652471804772712449 | succeeded |                  1 |   50 |             0 |  4911
(1 row)

Back up with an S3 storage class

To associate your backup objects with a specific storage class in your Amazon S3 bucket, use the S3_STORAGE_CLASS parameter with the class. For example, the following S3 connection URI specifies the INTELLIGENT_TIERING storage class:

icon/buttons/copy
BACKUP DATABASE movr INTO 's3://{BUCKET NAME}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}&S3_STORAGE_CLASS=INTELLIGENT_TIERING' AS OF SYSTEM TIME '-10s';

Use the parameter to set one of these storage classes listed in Amazon's documentation. For more general usage information, see Amazon's Using Amazon S3 storage classes documentation.

Warning:

Incremental backups are not compatible with the S3 Glacier Flexible Retrieval or Glacier Deep Archive storage classes. Incremental backups require ad-hoc reading of previous backups. The Glacier Flexible Retrieval or Glacier Deep Archive storage classes do not allow immediate access to S3 objects without first restoring the objects. See Amazon's documentation on Restoring an archived object for more detail.

Advanced examples

For examples of advanced BACKUP and RESTORE use cases, see:

See also


Yes No
On this page

Yes No