Client side retry handling example

On this page Carat arrow pointing down

When a transaction is unable to complete due to contention with another concurrent or recent transaction attempting to write to the same data, CockroachDB will automatically attempt to retry the failed transaction without involving the client (i.e., silently). If the automatic retry is not possible or fails, a transaction retry erroris emitted to the client.

This page presents an example of an application's transaction retry logic, as well as a manner by which that logic can be tested and verified against your application's needs.

Client-side retry handling example

The Python-like pseudocode below shows how to implement an application-level retry loop; it does not require your driver or ORM to implement advanced retry handling logic, so it can be used from any programming language or environment. In particular, your retry loop must:

  • Raise an error if the max_retries limit is reached
  • Retry on 40001 error codes
  • COMMIT at the end of the try block
  • Implement exponential backoff logic as shown below for best performance
while true:
    n++
    if n == max_retries:
        throw Error("did not succeed within N retries")
    try:
        # add logic here to run all your statements
        conn.exec('COMMIT')
        break
    catch error:
        if error.code != "40001":
            throw error
        else:
            # This is a retry error, so we roll back the current transaction
            # and sleep for a bit before retrying. The sleep time increases
            # for each failed transaction.  Adapted from
            # https://colintemple.com/2017/03/java-exponential-backoff/
            conn.exec('ROLLBACK');
            sleep_ms = int(((2**n) * 100) + rand( 100 - 1 ) + 1)
            sleep(sleep_ms) # Assumes your sleep() takes milliseconds

Testing transaction retry logic

To test your application's transaction retry logic, use the inject_retry_errors_enabled session variable. When inject_retry_errors_enabled is set to true, any statement (with the exception of SET statements) executed in the session inside of an explicit transaction will return a transaction retry error with the message restart transaction: TransactionRetryWithProtoRefreshError: injected by `inject_retry_errors_enabled` session variable.

If the client retries the transaction using the special cockroach_restart SAVEPOINT name, after the 3rd retry, the transaction will proceed as normal. Otherwise, the errors will continue until the client issues a SET inject_retry_errors_enabled=false statement.

For example, suppose you've written a wrapper function with some retry logic named run_transaction that you want to use to execute statements across a psycopg2 connection. In this example, run_transaction takes a SQL-executing function op(conn) and attempts to run the function, retrying on serialization failures (exposed in pscyopg2 as the SerializationFailure exception class) with exponential backoff, until reaching a maximum number of tries:

def run_transaction(conn, op, max_retries=3):    
    """
    Execute the operation *op(conn)* retrying serialization failures.

    If the database returns an error asking to retry the transaction, retry it
    *max_retries* times before giving up (and propagate it).
    """
    # leaving this block the transaction will commit or rollback
    # (if leaving with an exception)
    with conn:
        for retry in range(1, max_retries + 1):
            try:
                op(conn)
                # If we reach this point, we were able to commit, so we break
                # from the retry loop.
                return

            except SerializationFailure as e:
                # This is a retry error, so we roll back the current
                # transaction and sleep for a bit before retrying. The
                # sleep time increases for each failed transaction.
                logging.debug("got error: %s", e)
                conn.rollback()
                sleep_ms = (2 ** retry) * 0.1 * (random.random() + 0.5)
                logging.debug("Sleeping %s seconds", sleep_ms)
                time.sleep(sleep_ms)

            except psycopg2.Error as e:
                logging.debug("got error: %s", e)
                raise e

        raise ValueError(f"Transaction did not succeed after {max_retries} retries")

You can add a quick test to this function using the inject_retry_errors_enabled session variable.

def run_transaction(conn, op=None, max_retries=3):
    """
    Execute the operation *op(conn)* retrying serialization failure.
    If no op is specified, the function runs a test, using the
    inject_retry_errors_enabled session variable to inject errors.

    If the database returns an error asking to retry the transaction, retry it
    *max_retries* times before giving up (and propagate it).
    """
    # leaving this block the transaction will commit or rollback
    # (if leaving with an exception)
    with conn:
        for retry in range(1, max_retries + 1):
            try:
                if not op:
                    with conn.cursor() as cur:
                        if retry == 1:
                            cur.execute("SET inject_retry_errors_enabled = 'true'")
                        if retry == max_retries:
                            cur.execute("SET inject_retry_errors_enabled = 'false'")
                        cur.execute("SELECT now()")
                        logging.debug("status message: %s", cur.statusmessage)
                else:
                    op(conn)
                # If we reach this point, we were able to commit, so we break
                # from the retry loop.
                return

            except SerializationFailure as e:
                # This is a retry error, so we roll back the current
                # transaction and sleep for a bit before retrying. The
                # sleep time increases for each failed transaction.
                logging.debug("got error: %s", e)
                conn.rollback()
                sleep_ms = (2 ** retry) * 0.1 * (random.random() + 0.5)
                logging.debug("Sleeping %s seconds", sleep_ms)
                time.sleep(sleep_ms)

            except psycopg2.Error as e:
                logging.debug("got error: %s", e)
                raise e

        raise ValueError(f"Transaction did not succeed after {max_retries} retries")

Calling run_transaction without an op input sets inject_retry_errors_enabled as true until the final retry attempt, before which the inject_retry_errors_enabled is set back to false. For all attempts except the last one, CockroachDB will inject a retryable serialization error for the client to handle. If the client cannot handle the error properly, the retry logic isn't working properly.

See also


Yes No
On this page

Yes No