Secondary Indexes

On this page Carat arrow pointing down
Warning:
Cockroach Labs will stop providing Assistance Support for v21.2 on May 16, 2023. Prior to that date, upgrade to a more recent version to continue receiving support. For more details, see the Release Support Policy.

Indexes are logical objects in a cluster that help CockroachDB queries find data more efficiently. When you create an index, CockroachDB creates a copy of the columns selected for the index, and then sorts the rows of data by indexed column values, without sorting the values in the table itself.

CockroachDB automatically creates an index on the table's primary key columns. This index is called the primary index. The primary index helps CockroachDB more efficiently scan rows, as sorted by the table's primary key columns, but it does not help find values as identified by any other columns.

Secondary indexes (i.e., all indexes that are not the primary index) improve the performance of queries that identify rows with columns that are not in a table's primary key. CockroachDB automatically creates secondary indexes for columns with a UNIQUE constraint.

This page provides best-practice guidance on creating secondary indexes, with a simple example based on Cockroach Labs' fictional vehicle-sharing company, MovR.

Before you begin

Before reading this page, do the following:

Create a secondary index

To add a secondary index to a table do one of the following:

  • Add an INDEX clause to the end of a CREATE TABLE statement.

    INDEX clauses generally take the following form:

    INDEX {index_name} ({column_names});
    
    Parameter Description
    {index_name} The name of the index.
    {column_names} The name of the column to index, or a comma-separated list of names of the columns to index.
  • Use a CREATE INDEX statement.

    CREATE INDEX statements generally take the following form:

    CREATE INDEX {index_name} ON {table_name} ({column_names});
    
    Parameter Description
    {index_name} The name of the index.
    {table_name} The name of the table.
    {column_names} The name of the column to index, or a comma-separated list of names of the columns to index.

For an example, see Example.

Note:
  • If you do not specify a name for an index, CockroachDB will generate a name.
  • The notation for referring to an index is {table_name}@{index_name}.

Best practices

Here are some best practices for creating and using indexes:

  • Index all columns that you plan to use for sorting or filtering data.

    An index that stores all the columns needed by a query is also known as a covering index for that query. When a query has a covering index, CockroachDB can use that index directly instead of doing a join with the primary key, which is likely to be slower.

    Columns listed in a filtering WHERE clause with the equality operators (= or IN) should come first in the index, before those referenced with inequality operators (<, >).

    Columns with a higher cardinality (higher number of distinct values) should be placed in the index before columns with a lower cardinality. If the cardinality of the columns you wish to add to the index are similar, test multiple column arrangements in a non-production environment to determine the most performant arrangement.

  • Avoid indexing on sequential values. Writes to indexes with sequential keys can result in range hot spots that negatively affect performance. Instead, use randomly generated unique IDs or multi-column keys.

    If you are working with a table that must be indexed on sequential keys, use hash-sharded indexes. For details about the mechanics and performance improvements of hash-sharded indexes in CockroachDB, see our Hash Sharded Indexes Unlock Linear Scaling for Sequential Workloads blog post.

  • Avoid creating secondary indexes that you do not need. Secondary indexes can slow down write performance and take up node memory.

    Queries can benefit from an index even if they only filter a prefix of its columns. For example, if you create an index of columns (A, B, C), queries filtering (A) or (A, B) can still use the index. However, queries that do not filter (A) will not benefit from the index. This feature also lets you avoid using single-column indexes. Instead, use the column as the first column in a multiple-column index, which is useful to more queries.

    ALTER PRIMARY KEY creates a secondary index from the old primary key. If you need to change a primary key, and you do not plan to filter queries on the old primary key column(s), do not use ALTER PRIMARY KEY. Instead, use DROP CONSTRAINT ... PRIMARY KEY/ADD CONSTRAINT ... PRIMARY KEY, which does not create a secondary index.

  • Limit creation and deletion of secondary indexes to off-peak hours. Performance impacts are likely if done during peak business hours.

  • Drop unused indexes whenever possible.

    To understand usage statistics for an index, query the crdb_internal.index_usage_statistics table.

    To find which indexes are being used in a database, query the crdb_internal.index_usage_statistics table, which will show the total reads and time the primary and secondary indexes were last read.

    icon/buttons/copy
    SELECT * FROM crdb_internal.index_usage_statistics;
    

    To get more detailed information about the table and index names, run a join query against crdb_internal.index_usage_statistics and crdb_internal.table_indexes.

    icon/buttons/copy
    SELECT ti.descriptor_name as table_name, ti.index_name, total_reads, last_read
    FROM crdb_internal.index_usage_statistics AS us
    JOIN crdb_internal.table_indexes ti
    ON us.index_id = ti.index_id
    AND us.table_id = ti.descriptor_id
    ORDER BY total_reads ASC;
    
              table_name         |                  index_name                   | total_reads |           last_read
    -----------------------------+-----------------------------------------------+-------------+--------------------------------
    vehicle_location_histories | primary                                       |           1 | 2021-09-28 22:59:03.324398+00
    rides                      | rides_auto_index_fk_city_ref_users            |           1 | 2021-09-28 22:59:01.500962+00
    rides                      | rides_auto_index_fk_vehicle_city_ref_vehicles |           1 | 2021-09-28 22:59:02.470526+00
    user_promo_codes           | primary                                       |         456 | 2021-09-29 00:01:17.063418+00
    promo_codes                | primary                                       |         910 | 2021-09-29 00:01:17.062319+00
    vehicles                   | primary                                       |        3591 | 2021-09-29 00:01:18.261658+00
    users                      | primary                                       |        5401 | 2021-09-29 00:01:18.260198+00
    rides                      | primary                                       |       45658 | 2021-09-29 00:01:18.258208+00
    vehicles                   | vehicles_auto_index_fk_city_ref_users         |       87119 | 2021-09-29 00:01:19.071476+00
    (9 rows)
    

  • Use a STORING clause to store columns of data that you want returned by common queries, but that you do not plan to use in query filters. Note that the synonym COVERING is also supported.

    The STORING clause specifies columns that are not part of the index key but should be stored in the index, without being sorted. If a column is specified in a query, and the column is neither indexed nor stored in an index, CockroachDB will perform a full scan of the table, which can result in poor performance. For an example, see below.

  • Review the specialized indexes that CockroachDB supports, and decide if you need to create a specialized index instead of a standard index.

  • Do not create indexes as the root user. Instead, create indexes as a different user, with fewer privileges, following authorization best practices. This will likely be the same user that created the table to which the index belongs.

  • As a general best practice, we discourage the use of client libraries to execute database schema changes. Instead, use a database schema migration tool, or the CockroachDB SQL client.

  • Review the limitations of online schema changes in CockroachDB. Note that CockroachDB has limited support for schema changes within the same explicit transaction.

    We recommend doing schema changes outside explicit transactions, where possible. When a database schema management tool manages transactions on your behalf, we recommend only including one schema change operation per transaction.

Warning:

We strongly recommend adding size limits to all indexed columns, which includes columns in primary keys.

Values exceeding 1 MiB can lead to storage layer write amplification and cause significant performance degradation or even crashes due to OOMs (out of memory errors).

To add a size limit using CREATE TABLE:

icon/buttons/copy
CREATE TABLE name (first STRING(100), last STRING(100));

To add a size limit using ALTER TABLE ... ALTER COLUMN:

icon/buttons/copy
SET enable_experimental_alter_column_type_general = true;
icon/buttons/copy
ALTER TABLE name ALTER first TYPE STRING(99);

Example

Suppose you want the MovR application to display all of the bikes available to the users of the MovR platform.

Recall that the vehicles table that you created in Create a Table stores rows of data for each vehicle registered with MovR. Your application will need to read any data about vehicles into the application's persistence layer from this table. To display available bikes, the reads will need to filter on the available and type columns.

Open max_init.sql, and, under the CREATE TABLE statement for the vehicles table, add a CREATE INDEX statement for an index on the type and available columns of the vehicles table:

icon/buttons/copy
CREATE INDEX type_available_idx ON movr.vehicles (type, available));

This statement creates a secondary index named type_available_idx, on the vehicles table.

The MovR app might also need to display the vehicle's location and ID, but the app will not be filtering or sorting on those values. If any of the columns referenced in or returned by a query are not in a primary or secondary index key, CockroachDB will need to perform a full scan of the table to find the value. Full table scans can be costly, and should be avoided whenever possible.

To help avoid unnecessary full table scans, add a STORING clause to the index:

icon/buttons/copy
CREATE INDEX type_available_idx ON movr.vehicles (type, available) STORING (last_location);

The index will now store the values in last_location, which will improve the performance of reads from the vehicles table that return type, available, id, and last_location values and do not filter or sort on the last_location column.

The max_init.sql file should now look similar to the following:

icon/buttons/copy
CREATE TABLE movr.max_schema.users (
    first_name STRING,
    last_name STRING,
    email STRING UNIQUE,
    CONSTRAINT "primary" PRIMARY KEY (first_name, last_name)
  );

CREATE TYPE movr.max_schema.vtype AS ENUM ('bike', 'scooter', 'skateboard');

CREATE TABLE movr.max_schema.vehicles (
      id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
      type movr.max_schema.vtype,
      creation_time TIMESTAMPTZ DEFAULT now(),
      available BOOL,
      last_location STRING
  );

CREATE INDEX type_available_idx ON movr.max_schema.vehicles (type, available) STORING (last_location);

CREATE TABLE movr.max_schema.rides (
      id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
      vehicle_id UUID REFERENCES movr.max_schema.vehicles(id),
      start_address STRING,
      end_address STRING,
      start_time TIMESTAMPTZ DEFAULT now(),
      end_time TIMESTAMPTZ
  );

If you executed this file when following the Create a Table example, then all of these objects already exist. To clear the database and re-initialize the schemas, first execute the statements in the dbinit.sql file as the root user:

icon/buttons/copy
$ cockroach sql \
--certs-dir={certs-directory} \
--user=root \
-f dbinit.sql

Then, execute the statements in the max_init.sql and abbey_init.sql files:

icon/buttons/copy
$ cockroach sql \
--certs-dir={certs-directory} \
--user=max \
--database=movr \
-f max_init.sql
icon/buttons/copy
$ cockroach sql \
--certs-dir={certs-directory} \
--user=abbey \
--database=movr \
-f abbey_init.sql

After the statements have been executed, you can see the new index in the CockroachDB SQL shell.

Open the SQL shell to your cluster:

icon/buttons/copy
$ cockroach sql \
--certs-dir={certs-directory} \
--user=max \
--database=movr

To view the indexes in the vehicles table, issue a SHOW INDEXES statement:

icon/buttons/copy
> SHOW INDEXES FROM movr.max_schema.vehicles;
  table_name | index_name | non_unique | seq_in_index |  column_name  | direction | storing | implicit
-------------+------------+------------+--------------+---------------+-----------+---------+-----------
  vehicles   | primary            |   false    |            1 | id            | ASC       |  false  |  false
  vehicles   | primary            |   false    |            2 | type          | N/A       |  true   |  false
  vehicles   | primary            |   false    |            3 | creation_time | N/A       |  true   |  false
  vehicles   | primary            |   false    |            4 | available     | N/A       |  true   |  false
  vehicles   | primary            |   false    |            5 | last_location | N/A       |  true   |  false
  vehicles   | type_available_idx |    true    |            1 | type          | ASC       |  false  |  false
  vehicles   | type_available_idx |    true    |            2 | available     | ASC       |  false  |  false
  vehicles   | type_available_idx |    true    |            3 | last_location | N/A       |  true   |  false
  vehicles   | type_available_idx |    true    |            4 | id            | ASC       |  false  |   true
(9 rows)

The output from this SHOW statement displays the names and columns of the two indexes on the table (i.e., primary and type_available_idx).

Note that the last_location column's storing value is true in the type_available_idx index, and is therefore not sorted. Also note that the primary key column id is implicit in the index, meaning the id column is implicitly indexed in type_available_idx.

To see an index definition, use a SHOW CREATE statement on the table that contains the index:

icon/buttons/copy
> SHOW CREATE TABLE movr.max_schema.vehicles;
         table_name        |                                create_statement
---------------------------+----------------------------------------------------------------------------------
  movr.max_schema.vehicles | CREATE TABLE max_schema.vehicles (
                           |     id UUID NOT NULL DEFAULT gen_random_uuid(),
                           |     type max_schema.vtype NULL,
                           |     creation_time TIMESTAMPTZ NULL DEFAULT now():::TIMESTAMPTZ,
                           |     available BOOL NULL,
                           |     last_location STRING NULL,
                           |     CONSTRAINT "primary" PRIMARY KEY (id ASC),
                           |     INDEX type_available_idx (available ASC, type ASC) STORING (last_location),
                           |     FAMILY "primary" (id, type, creation_time, available, last_location)
                           | )
(1 row)

After creating a database, a user-defined schema, some tables, and secondary indexes, the database schema should be ready for your application to write and read data.

It's likely that you will need to update your database schema at some point. For an overview on how to update a database schema, see Change and Remove Objects in a Database Schema. We also recommend reading about how online schema changes work in CockroachDB.

What's next?

You might also be interested in the following pages:


Yes No
On this page

Yes No