node-berkeleydb/docs/gsg_txn/CXX/architectrecovery.html


								<?xml version="1.0" encoding="UTF-8" standalone="no"?>

								<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

								<html xmlns="http://www.w3.org/1999/xhtml">

								  <head>

								    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

								    <title>Designing Your Application for Recovery</title>

								    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />

								    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />

								    <link rel="start" href="index.html" title="Getting Started with Berkeley DB Transaction Processing" />

								    <link rel="up" href="filemanagement.html" title="Chapter 5. Managing DB Files" />

								    <link rel="prev" href="recovery.html" title="Recovery Procedures" />

								    <link rel="next" href="hotfailover.html" title="Using Hot Failovers" />

								  </head>

								  <body>

								    <div xmlns="" class="navheader">

								      <div class="libver">

								        <p>Library Version 12.1.6.2</p>

								      </div>

								      <table width="100%" summary="Navigation header">

								        <tr>

								          <th colspan="3" align="center">Designing Your Application for Recovery</th>

								        </tr>

								        <tr>

								          <td width="20%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>

								          <th width="60%" align="center">Chapter 5. Managing DB Files</th>

								          <td width="20%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>

								        </tr>

								      </table>

								      <hr />

								    </div>

								    <div class="sect1" lang="en" xml:lang="en">

								      <div class="titlepage">

								        <div>

								          <div>

								            <h2 class="title" style="clear: both"><a id="architectrecovery"></a>Designing Your Application for Recovery</h2>

								          </div>

								        </div>

								      </div>

								      <div class="toc">

								        <dl>

								          <dt>

								            <span class="sect2">

								              <a href="architectrecovery.html#multithreadrecovery">Recovery for Multi-Threaded Applications</a>

								            </span>

								          </dt>

								          <dt>

								            <span class="sect2">

								              <a href="architectrecovery.html#multiprocessrecovery">Recovery in Multi-Process Applications</a>

								            </span>

								          </dt>

								        </dl>

								      </div>

								      <p>

								            When building your DB application, you should consider how you will run recovery. If you are building a

								            single threaded, single process application, it is fairly simple to run recovery when your application first

								            opens its environment. In this case, you need only decide if you want to run recovery every time you open

								            your application (recommended) or only some of the time, presumably triggered by a start up option

								            controlled by your application's user.

								        </p>

								      <p>

								            However, for multi-threaded and multi-process applications, you need to carefully consider how you will

								            design your application's startup code so as to run recovery only when it makes sense to do so.

								        </p>

								      <div class="sect2" lang="en" xml:lang="en">

								        <div class="titlepage">

								          <div>

								            <div>

								              <h3 class="title"><a id="multithreadrecovery"></a>Recovery for Multi-Threaded Applications</h3>

								            </div>

								          </div>

								        </div>

								        <p>

								                If your application uses only one environment handle, then handling recovery for a multi-threaded

								                application is no more difficult than for a single threaded application. You simply open the environment

								                in the application's main thread, and then pass that handle to each of the threads that will be

								                performing DB operations. We illustrate this with our final example in this book (see

								                <a class="xref" href="txnexample_c.html" title="Transaction Example">Transaction Example</a>


								                for more information).

								            </p>

								        <p>

								                Alternatively, you can have each worker thread open its own environment handle. However, in this case,

								                designing for recovery is a bit more complicated.

								            </p>

								        <p>

								                Generally, when a thread performing database operations fails

								                or hangs, it is frequently best to simply

								                restart the application and run recovery upon application

								                startup as normal. However, not all applications can afford

								                to restart because a single thread has misbehaved.

								             </p>

								        <p>

								                If you are attempting to continue operations in the face of a misbehaving thread,

								                then at a minimum recovery must be run if a thread performing database operations fails or hangs.

								            </p>

								        <p>

								                Remember that recovery clears the environment of all

								                outstanding locks, including any that might be outstanding

								                from an aborted thread. If these locks are not cleared,

								                other threads performing database operations can back up

								                behind the locks obtained but never cleared by the failed

								                thread. The result will be an application that hangs

								                indefinitely.

								            </p>

								        <p>

								                To run recovery under these circumstances:

								            </p>

								        <div class="orderedlist">

								          <ol type="1">

								            <li>

								              <p>

								                        Suspend or shutdown all other threads performing

								                        database operations.

								                    </p>

								            </li>

								            <li>

								              <p>

								                        Discarding any open environment handles. Note that

								                        attempting to gracefully close these handles may be

								                        asking for trouble; the close can fail if the

								                        environment is already in need of recovery. For

								                        this reason, it is best and easiest to simply discard the handle.

								                    </p>

								            </li>

								            <li>

								              <p>

								                        Open new handles, running recovery as you open

								                        them.

								                        See <a class="xref" href="recovery.html#normalrecovery" title="Normal Recovery">Normal Recovery</a> for more information.

								                    </p>

								            </li>

								            <li>

								              <p>

								                        Restart all your database threads.

								                    </p>

								            </li>

								          </ol>

								        </div>

								        <p>

								                A traditional way to handle this activity is to spawn a watcher thread that is responsible for making

								                sure all is well with your threads, and performing the above actions if not.

								            </p>

								        <p>

								                However, in the case where each worker thread opens and maintains its own environment handle, recovery

								                is complicated for two reasons:

								            </p>

								        <div class="orderedlist">

								          <ol type="1">

								            <li>

								              <p>

								                        For some applications and workloads, it might be

								                        worthwhile to give your database threads the

								                        ability to gracefully finalize any on-going

								                        transactions. If this is the case, your

								                        code must be capable of signaling each thread

								                        to halt DB activities and close its

								                        environment. If you simply run recovery against the

								                        environment, your database threads will

								                        detect this and fail in the midst of performing their

								                        database operations.

								                    </p>

								            </li>

								            <li>

								              <p>

								                        Your code must be capable of ensuring only one

								                        thread runs recovery before allowing all other

								                        threads to open their respective environment

								                        handles. Recovery should be single threaded because when

								                        recovery is run against an environment, it is

								                        deleted and then recreated. This will cause all

								                        other processes and threads to "fail" when they

								                        attempt operations against the newly recovered

								                        environment. If all threads run recovery

								                        when they start up, then it is likely that some

								                        threads will fail because the environment that they

								                        are using has been recovered. This will cause the thread to have to re-execute its own recovery

								                        path. At best, this is inefficient and at worst it could cause your application to fall into an

								                        endless recovery pattern.

								                    </p>

								            </li>

								          </ol>

								        </div>

								      </div>

								      <div class="sect2" lang="en" xml:lang="en">

								        <div class="titlepage">

								          <div>

								            <div>

								              <h3 class="title"><a id="multiprocessrecovery"></a>Recovery in Multi-Process Applications</h3>

								            </div>

								          </div>

								        </div>

								        <p>

								                Frequently, DB applications use multiple processes to interact with the databases. For example, you may

								                have a long-running process, such as some kind of server, and then a series of administrative tools that

								                you use to inspect and administer the underlying databases. Or, in some web-based architectures, different

								                services are run as independent processes that are managed by the server.

								            </p>

								        <p>

								                In any case, recovery for a multi-process environment is complicated for two reasons:

								            </p>

								        <div class="orderedlist">

								          <ol type="1">

								            <li>

								              <p>

								                        In the event that recovery must be run, you might

								                        want to notify processes interacting with the environment

								                        that recovery is about to occur and give them a

								                        chance to gracefully terminate. Whether it is

								                        worthwhile for you to do this is entirely dependent

								                        upon the nature of your application. Some

								                        long-running applications with multiple processes

								                        performing meaningful work might want to do this.

								                        Other applications with processes performing database

								                        operations that are likely to be harmed by error conditions in other

								                        processes will likely find it to be not worth the

								                        effort. For this latter group, the chances of

								                        performing a graceful shutdown may be low anyway.

								                    </p>

								            </li>

								            <li>

								              <p>

								                        Unlike single process scenarios, it can quickly become wasteful for every process interacting

								                        with the databases to run recovery when it starts up. This is partly because recovery

								                        <span class="emphasis"><em>does</em></span> take some amount of time to run, but mostly you want to

								                        avoid a situation where your server must

								                        reopen all its environment handles just because you fire up a command line database

								                        administrative utility that always runs recovery.

								                    </p>

								            </li>

								          </ol>

								        </div>

								        <p>

								                DB offers you two methods by which you can manage recovery for multi-process DB applications.

								                Each has different strengths and weaknesses, and they are described in the next sections.

								            </p>

								        <div class="sect3" lang="en" xml:lang="en">

								          <div class="titlepage">

								            <div>

								              <div>

								                <h4 class="title"><a id="mp_recover_effects"></a>Effects of Multi-Process Recovery</h4>

								              </div>

								            </div>

								          </div>

								          <p>

								                    Before continuing, it is worth noting that the following sections describe recovery processes than

								                    can result in one process running recovery while other processes are currently actively performing

								                    database operations.

								                </p>

								          <p>

								                    When this happens, the current database operation will

								                    abnormally fail, indicating a DB_RUNRECOVERY condition.

								                    This means that your application should immediately abandon any database operations that it may have

								                    on-going, discard any environment handles it has opened, and obtain and open new handles.

								                </p>

								          <p>

								                    The net effect of this is that any writes performed by unresolved transactions will be lost.

								                    For persistent applications (servers, for example), the services it provides will also be

								                    unavailable for the amount of time that it takes to complete a recovery and for all participating

								                    processes to reopen their environment handles.

								                </p>

								        </div>

								        <div class="sect3" lang="en" xml:lang="en">

								          <div class="titlepage">

								            <div>

								              <div>

								                <h4 class="title"><a id="db_register"></a>Process Registration</h4>

								              </div>

								            </div>

								          </div>

								          <p>

								                    One way to handle multi-process recovery is for every process to "register" its environment. In

								                    doing so, the process gains the ability to see if any other applications are using the

								                    environment and, if so, whether they have suffered an abnormal termination. If an abnormal

								                    termination is detected, the process runs recovery; otherwise, it does not.

								                </p>

								          <p>

								                    Note that using process registration also ensures that

								                    recovery is serialized across applications. That is,

								                    only one process at a time has a chance to run

								                    recovery. Generally this means that the first process

								                    to start up will run recovery, and all other processes

								                    will silently not run recovery because it is not

								                    needed.

								                </p>

								          <p>

								                    To cause your application to register its environment, you specify

								                        <span>

								                            the <code class="literal">DB_REGISTER</code> flag when you open your environment.

								                            You may also specify <code class="literal">DB_RECOVER</code>. However, it is an error to specify

								                            <code class="literal">DB_RECOVER_FATAL</code> when using

								                            the <code class="literal">DB_REGISTER</code> flag.

								                        </span>


								                        If during the open, DB determines that recovery must be run, it will automatically run the correct

								                        type of recovery for you, so long as you specify normal recovery

								                        on your environment open. If you do not specify normal recovery, and you register your environment,

								                        then no recovery is run if the registration process identifies a need for it. In this case,

								                        the environment open simply fails by

								                            <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>


								                </p>

								          <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">

								            <h3 class="title">Note</h3>

								            <p>

								                        If you do not specify normal recovery when you open your first registered environment

								                        in the application, then that application will fail the environment open by

								                            <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>


								                        This is because the first process to register must create an internal

								                        registration file, and recovery is forced when that file is created. To

								                        avoid an abnormal termination of the environment open, specify recovery on

								                        the environment open for at least the first process starting in your

								                        application.

								                    </p>

								          </div>

								          <p>

								                    In addition, if you specify <code class="literal">DB_ENV_FAILCHK</code>

								                    when you register your environment, then a fail check is performed on

								                    environment open (fail checks are described in the next section). If, during the

								                    fail check process, an abnormal termination is detected for any of the processes

								                    involved in the application, DB releases any read locks held by the dead

								                    process and performs transaction aborts as necessary. This is done in an attempt

								                    to clean up the environment.

								                </p>

								          <p>

								                    In this situation, if a general cleanup of the

								                    environment is not possible and normal recovery is not specified on environment

								                    open, then the open will abort,

								                    <span>returning <code class="literal">DB_RUNRECOVERY</code>.</span>


								                    However, if this situation occurs and recovery was specified, then the appropriate type of recovery

								                    (normal or fatal) is run so as to bring the environment back to a healthy state.

								                </p>

								          <p>

								                    Be aware that there are some limitations/requirements if you want your various processes to

								                    coordinate recovery using registration:

								                </p>

								          <div class="orderedlist">

								            <ol type="1">

								              <li>

								                <p>

								                            There can be only one environment handle per

								                            environment per process. In the case of multi-threaded

								                            processes, the environment handle must be shared across threads.

								                        </p>

								              </li>

								              <li>

								                <p>

								                            All processes sharing the environment must use registration. If registration is

								                            not uniformly used across all participating processes, then you can see inconsistent results

								                            in terms of your application's ability to recognize that recovery must be run.

								                        </p>

								              </li>

								            </ol>

								          </div>

								        </div>

								        <div class="sect3" lang="en" xml:lang="en">

								          <div class="titlepage">

								            <div>

								              <div>

								                <h4 class="title"><a id="failchk"></a>Failure Checking</h4>

								              </div>

								            </div>

								          </div>

								          <p>

								                    For very large and robust multi-process applications, the most common way to ensure all the

								                    processes are working as intended is to make use of a watchdog process. To assist a watchdog

								                    process, DB offers a failure checking mechanism.

								                </p>

								          <p>

								                    When a thread of control fails with open environment handles, the result is that there may be

								                    resources left locked or corrupted. Other threads of control may encountered these unavailable resources

								                    quickly or not at all, depending on data access patterns.

								                </p>

								          <p>

								                    In any case, the DB failure checking mechanism allows a watchdog to detect if an environment is

								                    unusable as a result of a thread of control failure. It should be called periodically

								                    (for example, once a minute) from the watchdog process. If the environment is deemed unusable, then

								                    the watchdog process is notified that recovery should be run. It is then up to the watchdog to

								                    actually run recovery. It is also the watchdog's responsibility to decide what to do about currently

								                    running processes before running recovery. The watchdog could, for example, attempt to

								                    gracefully shutdown or kill all relevant processes before running recovery.

								                </p>

								          <p>

								                    Note that failure checking need not be run from a separate process, although conceptually that is

								                    how the mechanism is meant to be used. This same mechanism could be used in a multi-threaded

								                    application that wants to have a watchdog thread.

								                </p>

								          <p>

								                    To use failure checking you must:

								                </p>

								          <div class="orderedlist">

								            <ol type="1">

								              <li>

								                <p>

								                            <span>

								                                Provide an <code class="function">is_alive()</code> call back using the


								                                <code class="methodname">Dbenv::set_isalive()</code>

								                                method.

								                            </span>


								                                DB uses this method to determine whether a specified process and thread

								                                is alive when the failure checking is performed.

								                        </p>

								              </li>

								              <li>

								                <p>

								                            Possibly provide a


								                                <span>

								                                <code class="literal">thread_id</code> callback

								                                </span>


								                            that uniquely identifies a process

								                            and thread of control. This

								                                <span>callback</span>


								                            is only necessary if the standard process and thread

								                            identification functions for your platform are not sufficient to for use by failure

								                            checking. This is rarely necessary and is usually because the thread and/or process ids

								                            used by your system cannot fit into an unsigned integer.

								                        </p>

								                <p>

								                            You provide this callback using the


								                                <code class="methodname">DbEnv::set_thread_id()</code>

								                            method. See the API reference for this method for more information on when setting a thread

								                            id callback might be necessary.

								                        </p>

								              </li>

								              <li>

								                <p>

								                            Call the


								                                <code class="methodname">DbEnv::failchk()</code>


								                            method periodically. You can do this either periodically (once per minute, for example), or

								                            whenever a thread of control exits for your application.

								                        </p>

								                <p>

								                            If this method determines that a thread of control exited holding read locks, those locks

								                            are automatically released. If the thread of control exited with an unresolved transaction,

								                            that transaction is aborted. If any other problems exist beyond these such that the

								                            environment must be recovered, the method will

								                                <span>return <code class="literal">DB_RUNRECOVERY</code>.</span>


								                        </p>

								              </li>

								            </ol>

								          </div>

								        </div>

								      </div>

								    </div>

								    <div class="navfooter">

								      <hr />

								      <table width="100%" summary="Navigation footer">

								        <tr>

								          <td width="40%" align="left"><a accesskey="p" href="recovery.html">Prev</a> </td>

								          <td width="20%" align="center">

								            <a accesskey="u" href="filemanagement.html">Up</a>

								          </td>

								          <td width="40%" align="right"> <a accesskey="n" href="hotfailover.html">Next</a></td>

								        </tr>

								        <tr>

								          <td width="40%" align="left" valign="top">Recovery Procedures </td>

								          <td width="20%" align="center">

								            <a accesskey="h" href="index.html">Home</a>

								          </td>

								          <td width="40%" align="right" valign="top"> Using Hot Failovers</td>

								        </tr>

								      </table>

								    </div>

								  </body>

								</html>