Our framework consists of the following measures.
1 Operational resiliency starts with an integrated program (business, operations, fraud/cyber and technology).
Preparation and adoption of an integrated contingency playbook is one of the most effective yet overlooked resiliency measures that can provide immediate gains. An integrated response team prioritizes key services, along with a set of scenarios, failure points and corresponding resiliency strategies, such as a response matrix for potential issues, decision trees to pursue different recovery options and a validated escalation tree with primary and back-up personnel.
2 Banks must modernize their digital telemetry capabilities.
Modernized telemetry instruments critical solution stacks with alerting and monitoring capabilities that can quickly pinpoint an outage's root cause and accelerate its resolution.
3 Advanced process automation and real-time operational reporting are essential.
An automated operational reporting dashboard is highly effective in strengthening the resiliency posture. Banks can use it to quickly narrow their focus to a targeted set of impacted customers or transactions (also see example below).
4 A robust event management process can streamline client engagement.
For example, banks with wires operations can set up a war room to manage outages in wires services. We have helped clients set up such war rooms with pre-defined process-overwrite approvals, manual risk measurement models, wires-entry alternatives, etc. We have also helped prepare pre-defined communication templates and guidelines that a bank can leverage to quickly initiate customer updates for different scenarios based on issue root cause, time-to-recovery and cut-off times.
5 DR / BCP readiness can be strengthened by performing additional due diligence.
Banks should define decision frameworks to help teams assess and address potential component failover options during an outage. This can include an inventory of DR set-up activities for application components, as well as details such as types of configuration, points of contact, risks and mitigations.
6 A reassessment of vendor/partner capabilities can help fortify resiliency.
Supplier contingency and resiliency evaluations can help banks assess and address concerns in areas such as change and incident management, technology and platform integration, SLA definitions and adherence, as well as communications.
Finally, we recommend that banks rigorously apply objective measures of resiliency, such as NPS, a reflection of customer satisfaction, and a risk index, a reflection of the frequency and length of outages. Some of our clients have experienced a significant boost in resiliency just by instituting metrics/KPIs that the entire organization (business, operations and technology) tracks on a monthly basis to measure progress against their different initiatives and objectives.
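As a rough illustration of how such monthly measures could be computed, consider the minimal sketch below. The NPS calculation follows the standard definition (percentage of promoters rating 9-10 minus percentage of detractors rating 0-6). The risk index formula is not specified here, so the weighted sum of outage count and total outage hours is purely an assumption, as are the names `Outage`, `net_promoter_score`, `risk_index` and both weights.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Outage:
    """One service outage (hypothetical record layout)."""
    start: datetime
    end: datetime

    @property
    def hours(self) -> float:
        return (self.end - self.start).total_seconds() / 3600


def net_promoter_score(ratings: list[int]) -> float:
    """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one survey rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


def risk_index(outages: list[Outage],
               frequency_weight: float = 1.0,
               duration_weight: float = 1.0) -> float:
    """Assumed formula: weighted outage count plus weighted total outage hours.

    Both weights are illustrative; a bank would calibrate its own.
    """
    total_hours = sum(o.hours for o in outages)
    return frequency_weight * len(outages) + duration_weight * total_hours


# Example month: ten survey responses and two outages (30 and 90 minutes).
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 2, 10]
outages = [
    Outage(datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 30)),
    Outage(datetime(2024, 1, 18, 14, 0), datetime(2024, 1, 18, 15, 30)),
]
monthly_nps = net_promoter_score(ratings)
monthly_risk = risk_index(outages)
```

Tracking both values month over month gives business, operations and technology teams a shared view: a rising NPS alongside a falling risk index would indicate progress.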
Proof is in the pudding
We recently helped one of our global banking clients boost its resiliency by leveraging the aforementioned framework. This client, a leading full-service bank in North America, had embarked on a sweeping digital initiative with an aspiration to become the best bank in its peer group. One of its key priorities was a significant improvement in operational resiliency to address a growing number of outages across several of its critical service areas.
We started by establishing an end-to-end service team, drawing key individuals from business, operations and technology areas across leadership and operational levels. Next, we engaged this team on a common definition of the associated process-flow contextualized for the bank. With the team and reference process-flow established, we started the deployment of each of the six dimensions of the framework:
- Collaborated with our client teams in the development and deployment of an integrated playbook. This playbook spanned the bank’s internal functions and vendor partners, and included a scenario-based response framework, validation and establishment of escalation teams, and focused response options. We accelerated this playbook development by aggregating and refreshing the existing set of artifacts/job-aids distributed across different functions and teams.
- Enabled rapid assessment of the end-to-end technology stack and components with a standardized telemetry tool implementation. Additionally, we helped develop the appropriate monitoring and alerting measures, along with an objective dashboard for ongoing monitoring.
- Helped the team assess all manual interventions needed to respond to an outage and develop surgical automation capabilities. This included development of utilities as well as leveraging robotic process automation. We also developed a set of real-time reports drawing from key operational databases.
- Assisted in the establishment of a robust war room equipped with a set of escalation decision trees needed to respond to different types of outages. Additionally, we worked on a robust customer engagement process, wherein the bank can leverage pre-defined and proactive communication protocols to keep customers informed throughout an outage and the associated recovery. This step immediately boosted customer satisfaction.
- Enabled the bank to review and strengthen its disaster recovery, business continuity planning and supplier governance processes. We developed a decision framework that allowed business, operations and technology stakeholders to jointly review different options for critical components during an outage.
Leveraging this framework, we helped our client meet and exceed its operational resiliency objectives. The progress was immediate and evident in the objective measures that we deployed: steady NPS growth and a decline in the risk index over a nine-month period (see below).