CockroachDB provides backup validation tools to check that backups you have in storage are restorable. Although a successful restore completely validates a backup, the validation tools offer a faster alternative and return an error message if a backup is not valid.
You can validate a cluster, database, or table backup with one of the following SHOW BACKUP
or RESTORE
statements. The three options result in increasing levels of backup validation:
SHOW BACKUP ... WITH check_files
: Check that all files belonging to a backup are in the expected location in storage. See Validate backup files for an example.RESTORE ... WITH schema_only
: Restore the schema from the backup to verify that it is valid without restoring any user table data. See Validate a backup is restorable for an example.RESTORE ... WITH schema_only, verify_backup_table_data
: Run aschema_only
restore and have the restore read all user table data from external storage, verify checksums, and discard the user table data before writing it to disk. To useverify_backup_table_data
, you must includeschema_only
in the statement. See Validate backup table data is restorable for an example.
The options that give the most validation coverage will increase the runtime of the check. That is, verify_backup_table_data
will take a longer time to validate a backup compared to check_files
or schema_only
alone. Despite that, each of these validation options provide a quicker way to validate a backup over running a "regular" restore.
Supported products
The feature described on this page is available in CockroachDB Dedicated, CockroachDB Serverless, and CockroachDB Self-Hosted clusters when you are running customer-owned backups. For a full list of features, see Backup and restore product support.
Validate backup files
Using SHOW BACKUP
with the check_files
option, you can check that all SST and metadata files that belong to a backup are present in the storage location.
Take a backup that we'll use for each of the examples on this page:
BACKUP DATABASE movr INTO "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" AS OF SYSTEM TIME "-10s";
To find a specific backup to validate in the storage location, show the stored backups in the storage location:
SHOW BACKUPS IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}";
path ------------------------ 2022/09/19-134123.64 2022/09/19-134559.68 (2 rows)
Use
SHOW BACKUP ... check_files
with a backup for validation:SHOW BACKUP "2022/09/19-134123.64" IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" WITH check_files;
This will return the following output after validating that the backup files are correct and present:
database_name | parent_schema_name | object_name | object_type | backup_type | start_time | end_time | size_bytes | rows | is_full_cluster | file_bytes ----------------+--------------------+----------------------------+-------------+-------------+------------+----------------------------+------------+-------+-----------------+------------- NULL | NULL | movr | database | full | NULL | 2022-09-19 13:41:23.645189 | NULL | NULL | f | NULL movr | NULL | public | schema | full | NULL | 2022-09-19 13:41:23.645189 | NULL | NULL | f | NULL movr | public | users | table | full | NULL | 2022-09-19 13:41:23.645189 | 31155 | 340 | f | 16598 movr | public | vehicles | table | full | NULL | 2022-09-19 13:41:23.645189 | 22282 | 113 | f | 12459 movr | public | rides | table | full | NULL | 2022-09-19 13:41:23.645189 | 261950 | 902 | f | 135831 movr | public | vehicle_location_histories | table | full | NULL | 2022-09-19 13:41:23.645189 | 742557 | 10850 | f | 318583 movr | public | promo_codes | table | full | NULL | 2022-09-19 13:41:23.645189 | 228320 | 1034 | f | 118376 movr | public | user_promo_codes | table | full | NULL | 2022-09-19 13:41:23.645189 | 9320 | 111 | f | 4832
The output will return
file_bytes
along with the columns you receive fromSHOW BACKUP
withoutcheck_files
. Thefile_bytes
column indicates the estimated bytes in external storage for a particular table object. For more detail on the output columns, see theSHOW BACKUP
Response table.If
SHOW BACKUP ... check_files
cannot read from a file, it will return an error message similar to the following:ERROR: The following files are missing from the backup: s3:/bucket-name/2022/09/19-134123.64/data/797981063156727810.sst
SHOW BACKUP ... check_files
will return up to ten file paths for incorrect or missing files.
Validate a backup is restorable
To validate that a backup is restorable, you can run RESTORE
with the schema_only
option, which will complete a restore without restoring any user table data. This process is significantly faster than running a regular restore for the purposes of validation.
A schema_only
restore produces close to complete validation coverage on backups. However, this restore type does not read or write from any of the SST files, which store the backed-up user table data. You can use SHOW BACKUP ... WITH check_files
in addition to a schema_only
restore to check that these SST files are present for a restore operation. Or, you can use schema_only
in combination with verify_backup_table_data
. See Validate backup table data is restorable.
Run RESTORE
with the schema_only
option, specifying either LATEST
or the specific backup you would like to restore:
RESTORE DATABASE movr FROM "2022/09/19-134123.64" IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" WITH schema_only;
job_id | status | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
797982663104856065 | succeeded | 1 | 0 | 0 | 0
(1 row)
You can also use the new_db_name
option to restore a database to a different name. For example, new_db_name = test_movr
.
To verify that the table schemas are in place, check the tables:
SHOW TABLES FROM movr;
schema_name | table_name | type | owner | estimated_row_count | locality
--------------+----------------------------+-------+-------+---------------------+-----------
public | promo_codes | table | root | 0 | NULL
public | rides | table | root | 0 | NULL
public | user_promo_codes | table | root | 0 | NULL
public | users | table | root | 0 | NULL
public | vehicle_location_histories | table | root | 0 | NULL
public | vehicles | table | root | 0 | NULL
(6 rows)
You'll find the tables in place with no user table data.
SELECT * FROM rides;
id | city | vehicle_city | rider_id | vehicle_id | start_address | end_address | start_time | end_time | revenue
-----+------+--------------+----------+------------+---------------+-------------+------------+----------+----------
(0 rows)
Once you have successfully validated the restore, you can revert the cluster to its pre-restore state by dropping the table or database:
DROP DATABASE movr CASCADE;
Cluster-level backup validation
It is important to note that full cluster restores with schema_only
will write the system tables to disk. This provides important coverage for validation at the cluster level. Writing the system tables should not have a notable impact on the runtime of this process.
Once you have successfully validated a cluster-level restore, the restored system data cannot be reverted. However, you can drop the databases and tables as per the previous command.
Validate backup table data is restorable
A restore with the verify_backup_table_data
option will perform a schema_only
restore and the following:
- Read all user data from the storage location.
- Checksum the user data to ensure it is not corrupt.
- Discard the user data before it is written to disk.
In comparison to schema_only
restores, verify_backup_table_data
also validates the backup by reading and checksumming the user data.
It is necessary to include schema_only
when you run a restore with verify_backup_table_data
:
RESTORE DATABASE movr FROM LATEST IN "s3://bucket?AWS_ACCESS_KEY_ID={Access Key ID}&AWS_SECRET_ACCESS_KEY={Secret Access Key}" WITH schema_only, verify_backup_table_data;
Similarly, to just schema_only
restores, you'll find the table schemas restored. If a file is not present or unreadable in the backup, you'll receive an error.