Business continuity management is the wider business risk domain into which IT services fit. It also covers outages and disruptions unrelated to IT, whether caused by events such as the 2003 SARS outbreak or by local incidents like road closures. The first phase of a business continuity incident may well be termed a ‘disaster’, and a disaster recovery phase will be concerned with establishing a safe environment and getting the infrastructure operating again. The IT services management role and the business continuity management role may well be held by the same individual if IT services cannot easily and reliably be separated from the business processes they support.
In many business processes, all risks except those relating to IT services are low, so a focus on IT is warranted. It is important, however, that the whole business process is monitored to verify that this is the case.
Many IT functions have not adopted an ‘IT services’ model and retain their traditional focus on infrastructure, applications and systems. Such a function will know that ‘a system is down’ but not the full set of services that the system provides. This view rates all services on a system equally and offers no way to assign them different priorities or service levels. If a telecommunications failure hits a system, which of its services are affected? The mapping of services to systems (and facilities) is seldom complete.
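To make the service-to-system mapping concrete, the sketch below records, for each service, the systems and facilities it depends on, and then answers the question posed above: which services does a given failure affect? It is a minimal illustration only. The service names, component names and the 1-to-3 priority scale are all hypothetical, and a real mapping would typically live in a configuration management database rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A business-facing IT service and the components it depends on."""
    name: str
    priority: int  # 1 = most critical; this scale is a hypothetical choice
    depends_on: set[str] = field(default_factory=set)  # systems and facilities


def affected_services(failed: str, services: list[Service]) -> list[Service]:
    """Return every service that depends on the failed component, most critical first."""
    hit = [s for s in services if failed in s.depends_on]
    return sorted(hit, key=lambda s: s.priority)


def unmapped_components(inventory: set[str], services: list[Service]) -> set[str]:
    """Components that no service is recorded as using: a hint that the mapping is incomplete."""
    used = set().union(*(s.depends_on for s in services))
    return inventory - used


# Hypothetical example data
services = [
    Service("customer-portal", 2, {"web-cluster", "crm-db", "wan-link"}),
    Service("payments", 1, {"crm-db", "wan-link"}),
    Service("internal-wiki", 3, {"web-cluster"}),
]
inventory = {"web-cluster", "crm-db", "wan-link", "fax-server"}

for s in affected_services("wan-link", services):
    print(s.priority, s.name)
# 1 payments
# 2 customer-portal

print(unmapped_components(inventory, services))  # {'fax-server'}
```

The second function reflects the point that the mapping is seldom complete: any component in the inventory that no service claims is either redundant or, more likely, a gap in the map.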
When a comprehensive review of IT services is carried out, service continuity vulnerabilities will be identified. The appropriate actions to deal with these risks will include duplicating resources, increasing the fault tolerance or resilience of the infrastructure, and establishing secondary facilities, hot sites and the like. It will also be important to work out priorities so that, when resources are scarce rather than absent, the most critical services are still delivered. If, for example, telecommunications bandwidth is reduced, does every application get a reduction, or is the data traffic for accounting more important than the voice traffic for customer service? On-the-spot decisions are not good enough.
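The difference between agreeing priorities in advance and cutting everything evenly can be sketched in a few lines. The example compares two hypothetical policies for sharing reduced bandwidth: a strict-priority allocation, which serves traffic classes in a pre-agreed order, and an across-the-board proportional cut. The traffic classes, their priorities and the megabit figures are invented for illustration; whether accounting data really outranks customer-service voice is a business decision, not something the code settles.

```python
def allocate_by_priority(capacity: float, demands: list[tuple[str, int, float]]) -> dict[str, float]:
    """Strict priority: serve classes in priority order (1 = highest) until capacity runs out.

    demands holds (traffic_class, priority, required_mbps) tuples.
    """
    allocation: dict[str, float] = {}
    remaining = capacity
    for name, _priority, need in sorted(demands, key=lambda d: d[1]):
        granted = min(need, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation


def allocate_proportionally(capacity: float, demands: list[tuple[str, int, float]]) -> dict[str, float]:
    """Across-the-board cut: every class receives the same fraction of its demand."""
    total = sum(need for _, _, need in demands)
    factor = min(1.0, capacity / total) if total else 0.0
    return {name: need * factor for name, _, need in demands}


# Hypothetical demands (Mbps) after a link degrades from 120 to 60 Mbps
demands = [
    ("customer-voice", 2, 40.0),
    ("accounting-data", 1, 30.0),
    ("backup", 3, 50.0),
]

print(allocate_by_priority(60.0, demands))
# {'accounting-data': 30.0, 'customer-voice': 30.0, 'backup': 0.0}
print(allocate_proportionally(60.0, demands))
# {'customer-voice': 20.0, 'accounting-data': 15.0, 'backup': 25.0}
```

Strict priority is only one possible policy; weighted shares or guaranteed minimums per class are common alternatives. The point is that whichever rule is chosen, it is chosen and documented before the outage rather than improvised during it.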
Much of the work involved in managing IT service risk is planning. In principle, every threat to essential services needs to be identified, assessed and treated. However, such threats may be so numerous, and their impacts spread across so many services, that an alternative approach is often more effective. A scenario can stand for a single threat or a collection of threats, and a small set of alternative scenarios can then be devised that together represent all of the important threats.
The planning exercise does two major things: it establishes which facilities need to be enhanced, and it devises action plans for crises. Enhancements must clear trade-off and risk appetite hurdles, and those hurdles should be made explicit. Facilities can be enhanced through replication, hardening or the creation of alternatives, while the action plans need to be tested and rehearsed. All staff involved in IT service delivery will need to know their roles under particular scenarios, and this can be achieved only through regular, systematic testing.
— extract from Beating IT Risks, Chapter 1: Thriving on risk —