When you create an Enterprise changefeed, you can include the initial_scan = 'only'
option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a succeeded
status. As a result, you can create a changefeed with initial_scan = 'only'
to export data out of your database.
New in v23.1: You can also schedule a changefeed to use a changefeed initial scan for exporting data on a regular cadence.
The benefits of using changefeeds for this use case instead of export, include:
- Changefeeds are jobs, which can be paused, resumed, cancelled, scheduled, and altered.
- There is observability into a changefeed job using
SHOW CHANGEFEED JOBS
and the Changefeeds Dashboard in the DB Console. - Changefeed jobs have built-in checkpointing and retries.
- Changefeed sinks provide additional endpoints for your data.
- You can use the
format=csv
option withinitial_scan= 'only'
to emit messages in CSV format.
Changefeeds emit the same CSV format as EXPORT
. In v22.1, changefeeds emitted CSV data that wrapped some values in single quotes, which were not wrapped when exporting data with the EXPORT
statement.
Examples
Export data with a changefeed
To create a changefeed that will only complete an initial scan of a table(s), run the following:
CREATE CHANGEFEED FOR TABLE movr.users INTO '{scheme}://{host}:{port}?{query_parameters}' WITH initial_scan = 'only', format=csv;
Or, use CDC queries to filter the data that your changefeed emits:
CREATE CHANGEFEED INTO '{scheme}://{host}:{port}?{query_parameters}'
WITH initial_scan = 'only', format=csv AS SELECT name, city FROM movr.users;
The job will return a job ID once it has started. You can use SHOW CHANGEFEED JOBS
to check on the status:
SHOW CHANGEFEED JOB {job ID};
When the scan has completed you will find the output shows succeeded
in the status
field.
Create a scheduled changefeed to export filtered data
This example creates a nightly export of some filtered table data with a scheduled changefeed that will run just after midnight every night. The changefeed uses CDC queries to query the table and filter the data it will send to the sink:
CREATE SCHEDULE sf_skateboard FOR CHANGEFEED INTO 'external://cloud-sink' WITH format=csv
AS SELECT current_location AS sf_address, id, type, status FROM vehicles
WHERE city = 'san francisco' AND type = 'skateboard'
RECURRING '1 0 * * *' WITH SCHEDULE OPTIONS on_execution_failure=retry, on_previous_running=start;
The schedule options control the schedule's behavior:
- If it runs into a failure,
on_execution_failure=retry
will ensure that the schedule retries the changefeed immediately. - If the previous scheduled changefeed is still running,
on_previous_running=start
will start a new changefeed at the defined cadence.