The instructions on this page require updates. We currently recommend using AWS Database Migration Service (DMS) to migrate data from PostgreSQL to CockroachDB. You can also migrate from CSV.
This page has instructions for migrating data from PostgreSQL to CockroachDB using IMPORT
's support for reading pg_dump
files.
The examples pull real data from Amazon S3. They use the employees data set that is also used in the MySQL docs. The data was imported to PostgreSQL using pgloader, and then modified for use here as explained below.
For best practices for optimizing import performance in CockroachDB, see Import Performance Best Practices.
Pre-migration considerations
Primary keys
PostgreSQL and CockroachDB have different best practices surrounding primary keys on tables. While it's common to see sequences and auto-incrementing primary keys in PostgreSQL, these features can cause hotspots within your cluster when reading or writing large amounts of data. Cockroach Labs recommends that you use multi-column primary keys or the UUID
datatype for primary key columns.
If you are working with a table that must be indexed on sequential keys, consider using hash-sharded indexes. We recommend doing thorough performance testing with and without hash-sharded indexes to see which works best for your application.
For further information, see Unique ID best practices and 3 Basic Rules for Choosing Indexes.
Step 1. Dump the PostgreSQL database
There are several ways to dump data from PostgreSQL to be imported into CockroachDB:
The import will fail if the dump file contains functions or type definitions. In addition to calling pg_dump
as shown below, you may need to edit the dump file to remove functions and data types.
Also, note that CockroachDB's IMPORT
does not support automatically importing data from PostgreSQL's non-public schemas. As a workaround, you can edit the dump file to change the table and schema names in the CREATE TABLE
statements.
Dump the entire database
Most users will want to import their entire PostgreSQL database all at once, as shown below in Import a full database dump.
To dump the entire database, run the pg_dump
command shown below.
$ pg_dump employees > /tmp/employees-full.sql
For this data set, the PostgreSQL dump file required the following edits, which have already been performed on the files used in the examples below:
The type of the
employees.gender
column in theCREATE TABLE
statement had to be changed fromemployees.employees_gender
toSTRING
since PostgreSQL represented the employee's gender using aCREATE TYPE
statement that is not supported by CockroachDB.A
CREATE TYPE employee ...
statement needed to be removed.
If you only want to import one table from a database dump, see Import a table from a full database dump below.
Dump one table at a time
To dump the employees
table from a PostgreSQL database also named employees
, run the pg_dump
command shown below. You can import this table using the instructions in Import a table from a table dump below.
$ pg_dump -t employees employees > /tmp/employees.sql
For this data set, the PostgreSQL dump file required the following edits, which have already been performed on the files used in the examples below.
- The type of the
employees.gender
column in theCREATE TABLE
statement had to be changed fromemployees.employees_gender
toSTRING
since PostgreSQL represented the employee's gender using aCREATE TYPE
statement that is not supported by CockroachDB.
Step 2. Host the files where the cluster can access them
Each node in the CockroachDB cluster needs to have access to the files being imported. There are several ways for the cluster to access the data; for more information on the types of storage IMPORT
can pull from, see the following:
We strongly recommend using cloud storage such as Amazon S3 or Google Cloud to host the data files you want to import.
Step 3. Import the PostgreSQL dump file
You can choose from several variants of the IMPORT
statement, depending on whether you want to import a full database or a single table:
- Import a full database dump
- Import a table from a full database dump
- Import a table from a table dump
Note that all of the IMPORT
statements in this section pull real data from Amazon S3 and will kick off background import jobs that you can monitor with SHOW JOBS
.
Import a full database dump
This example assumes you dumped the entire database.
The IMPORT
statement below reads the data and DDL statements (including existing foreign key relationships) from the full database dump file.
> IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz' WITH ignore_unsupported_statements;
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
381845110403104769 | succeeded | 1 | 300024 | 0 | 0 | 11534293
(1 row)
Import a table from a full database dump
This example assumes you dumped the entire database.
IMPORT
can import one table's data from a full database dump. It reads the data and applies any CREATE TABLE
statements from the file.
> CREATE DATABASE IF NOT EXISTS employees;
> USE employees;
> IMPORT TABLE employees FROM PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz' WITH ignore_unsupported_statements;
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
383839294913871873 | succeeded | 1 | 300024 | 0 | 0 | 11534293
(1 row)
Import a table from a table dump
The simplest way to import a table dump is to run IMPORT
. It reads the table data and any CREATE TABLE
statements from the file:
> CREATE DATABASE IF NOT EXISTS employees;
> USE employees;
> IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees.sql.gz' WITH ignore_unsupported_statements;
job_id | status | fraction_completed | rows | index_entries | system_records | bytes
--------------------+-----------+--------------------+--------+---------------+----------------+----------
383855569817436161 | succeeded | 1 | 300024 | 0 | 0 | 11534293
(1 row)
Configuration Options
The following options are available to IMPORT PGDUMP
:
Max row size
The max_row_size
option is used to override limits on line size. Default: 0.5MB
. This setting may need to be tweaked if your PostgreSQL dump file has extremely long lines, for example as part of a COPY
statement.
Example usage:
> IMPORT TABLE employees FROM PGDUMP ('s3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456') WITH max_row_size = '5MB';
Row limit
The row_limit
option determines the number of rows to import. If you are importing one table, setting row_limit = 'n'
will import the first n rows of the table. If you are importing an entire database, this option will import the first n rows from each table in the dump file. It is useful for finding errors quickly before executing a more time- and resource-consuming import.
Example usage:
> IMPORT TABLE employees FROM PGDUMP 's3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456' WITH row_limit = '10';
Ignore unsupported statements
The ignore_unsupported_statements
option specifies whether the import will ignore unsupported statements in the PGDUMP
file. Default: false
.
If ignore_unsupported_statements
is omitted, the import will fail if it encounters a statement that is unsupported by CockroachDB. Use ignore_unsupported_statements
with log_ignored_statements
to log unsupported statements.
Example usage:
> IMPORT TABLE employees FROM PGDUMP's3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456' WITH ignore_unsupported_statements;
Log unsupported statements
The log_ignored_statements
option is used with the ignore_unsupported_statements
option to log unsupported statements in the PGDUMP
file to specified a destination file.
Example usage:
> IMPORT TABLE employees FROM PGDUMP 's3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456' WITH ignore_unsupported_statements, log_ignored_statements='userfile://defaultdb.public.userfiles_root/unsupported-statements.log';
Skip foreign keys
By default, IMPORT PGDUMP
supports foreign keys. Default: false. Add the skip_foreign_keys
option to speed up data import by ignoring foreign key constraints in the dump file's DDL. It will also enable you to import individual tables that would otherwise fail due to dependencies on other tables.
The most common dependency issues are caused by unsatisfied foreign key relationships. You can avoid these issues by adding the skip_foreign_keys
option to your IMPORT
statement as needed. For more information, see the list of import options.
For example, if you get the error message pq: there is no unique constraint matching given keys for referenced table tablename
, use IMPORT ... WITH skip_foreign_keys
.
Example usage:
> IMPORT TABLE employees FROM PGDUMP ('s3://your-external-storage/employees.sql?AWS_ACCESS_KEY_ID=123&AWS_SECRET_ACCESS_KEY=456') WITH skip_foreign_keys;
Foreign key constraints can be added by using ALTER TABLE ... ADD CONSTRAINT
commands after importing the data.