Continuity in the IT world means
that if we lose a teammate (hopefully only temporarily), work and life in
our environment can still go on.

The most important thing to do is
to automate everything that you can. Alerts, backups, index investigation, data
integrity checks and other maintenance processes must be automated. Make sure
that failure notices go to a team rather than an individual, but also make sure
one individual is responsible for responding to the messages. On top of that,
make sure that whatever is sending the messages (SQL Server Agent, for example)
is itself up and running. The second thing is to document the jobs, the location
of the backups, and the tolerances for performance and capacity issues. In other
words, create a runbook. If you document your normal tolerances and share your
past stress-causing experiences in the runbook, you help the rest of the team
handle situations when a team member is missing.
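
One way to make documented tolerances usable by anyone on the team is to keep
them as data next to the runbook, so a reading can be checked against them
without asking the person who wrote them down. The sketch below is only an
illustration: the metric names, ranges, and actions are hypothetical, and how
the readings are collected is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """One runbook entry: the normal range for a metric and what to do outside it."""
    metric: str
    low: float
    high: float
    action: str


# Hypothetical values for illustration only; replace with your own baselines.
RUNBOOK_TOLERANCES = [
    Tolerance("data_drive_used_pct", 0, 80, "Extend the volume or archive old backups"),
    Tolerance("nightly_backup_minutes", 5, 90, "Check the backup target and blocking jobs"),
]


def out_of_tolerance(readings, tolerances):
    """Return (metric, value, action) for each reading outside its documented range.

    Readings for metrics that have no runbook entry are ignored.
    """
    by_metric = {t.metric: t for t in tolerances}
    problems = []
    for metric, value in readings.items():
        t = by_metric.get(metric)
        if t is not None and not (t.low <= value <= t.high):
            problems.append((metric, value, t.action))
    return problems
```

Calling `out_of_tolerance({"data_drive_used_pct": 91}, RUNBOOK_TOLERANCES)`
returns the documented action for that metric, which is the part a stand-in
team member would otherwise have to guess.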

We create runbooks that share these
tolerances, but the runbooks have to be kept up to date. Many times, teams
treat runbooks as a historical record, but they should be live documents, updated
for every change in the environment, including any new jobs that are added.

Another key point is to ensure that
alerts don’t go to only one team member. They should go to a local or group
email address or a shared file, or be emailed to every team member. When you have
secret alerts, you have secret processes that no one else knows how to handle, and
the information can be lost forever if that team member leaves your business.

You should create rules for job
creation, job documentation and alert responses. Every so often we see
businesses with a DBA who wants to be the hero and is always
first on the trail for a response. It is great that they are so engaged, but
single-threaded information can be dangerous for your team if that person is on
vacation or out sick. It can also be a security issue, because information could
continue to be sent to this person after they have left the business
permanently, or to a dead email address.
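
The two routing risks described above (alerts that reach only one person, and
alerts still addressed to someone who is no longer on the team) can be checked
with a small audit. This is a minimal sketch that assumes you can export each
alert's recipient list into a dictionary; the addresses, alert names, and the
`audit_alert_routing` helper are all hypothetical.

```python
# Hypothetical example data; in practice, export this from your alerting setup.
GROUP_ADDRESSES = {"dba-team@example.com"}
CURRENT_TEAM = {"ana@example.com", "raj@example.com"}
ALERT_RECIPIENTS = {
    "backup_failed": ["dba-team@example.com"],
    "disk_space_low": ["ana@example.com"],
    "integrity_check_failed": [
        "ana@example.com",
        "raj@example.com",
        "former.dba@example.com",
    ],
}


def audit_alert_routing(alert_recipients, group_addresses, current_team):
    """Flag alerts that are single-threaded or that have stale recipients.

    "single_threaded": alerts that reach neither a group address nor every
                       current team member.
    "stale": alerts with a recipient that is neither a group address nor a
             current team member (for example, someone who has left).
    """
    single_threaded = []
    stale = []
    for alert, recipients in alert_recipients.items():
        recipients = set(recipients)
        reaches_group = bool(recipients & group_addresses)
        reaches_everyone = bool(current_team) and current_team <= recipients
        if not (reaches_group or reaches_everyone):
            single_threaded.append(alert)
        if recipients - group_addresses - current_team:
            stale.append(alert)
    return {"single_threaded": sorted(single_threaded), "stale": sorted(stale)}
```

With the example data, `disk_space_low` is flagged as single-threaded and
`integrity_check_failed` is flagged as having a stale recipient.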

We suggest that you have a meeting
soon, today or tomorrow. Review the runbooks and review the jobs that are running.
Interview each other about what each of you knows about your system
and record it all. Record the meeting itself in case something doesn’t make it
into the runbook. Version the runbook, consider it a live document, and create
rules for updating it.

Next, have someone test and validate
the information in the runbook. Have them provide feedback, and set a hard
deadline for getting the information out of each person’s head and into the
document. (Note: this is a great task for the newest team member, for lots of
reasons.)

Then the final, living document
should be stored in a shared location that is monitored and has controlled access.
Quarterly, or at least twice a year depending on how volatile your environment is,
someone should be assigned to review and validate the information in the
document again.
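
A review schedule like this is easy to let slip, so it can help to record when
each part of the runbook was last validated and check that against the agreed
interval. The sketch below assumes a hypothetical mapping of section names to
last-review dates; 90 days stands in for quarterly, and about 182 days for
twice a year.

```python
from datetime import date, timedelta


def sections_due_for_review(last_reviewed, today, interval_days=90):
    """Return the runbook sections whose last review is at least interval_days old."""
    interval = timedelta(days=interval_days)
    return sorted(
        section
        for section, reviewed_on in last_reviewed.items()
        if today - reviewed_on >= interval
    )


# Hypothetical section names and dates, for illustration only.
LAST_REVIEWED = {
    "backup locations": date(2024, 1, 1),
    "agent jobs": date(2024, 3, 20),
}
```

Calling `sections_due_for_review(LAST_REVIEWED, date.today())` on a schedule,
and sending the result to the team address, keeps the review from depending on
one person's memory.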