
Rapid digitization, coupled with COVID-related challenges, has exposed critical operational resiliency gaps across the banking sector, which have led to an increase in service outages at some leading financial institutions (FIs). The pressure is mounting on FIs to improve operational resiliency or risk long-term business impact, according to what our clients are telling us and what we hear from leading operational risk groups such as ORX, the largest operational risk association in the financial services industry.

In addition to exposing banks to regulatory challenges and potential financial and reputational harm, these outages also reinforce concern among bank customers (from retail consumers to mom-and-pop outfits to large commercial enterprises) that FIs are not doing enough to stabilize their operations and services during these uncertain times.

While banks recognize this need, many have encountered the following issues that prevent them from improving their operational resiliency. Overall, banks:

  • Often focus on only a few traditional areas, such as business continuity planning (BCP) and disaster recovery (DR), while neglecting critical activities such as managing an event response in a coordinated fashion and identifying and addressing root causes.
  • Typically lack coherent and strong business-centric measures for operational resiliency, such as a resiliency risk index, or customer satisfaction impact index.
  • Are inconsistent, across the organization, in how they approach and enforce operational resiliency, relying on separate teams that operate with different playbooks that are not well integrated and synchronized.
  • Usually lack real-time access to the operational data needed for sound decision-making during an outage, such as which high-value transactions are stuck or which clients’ transactions are affected.
  • Are not well organized when it comes to managing customer communications around outages.

To be operationally resilient, banks need contingency measures that span business, operations and technology functions. These measures include well-orchestrated outage response playbooks, support models and enhanced event management capability.

As a result of our work with leading global banks, we have developed a robust framework and tool kit that banks can adopt to boost their operational resiliency and start to reinvigorate their Net Promoter Score (NPS).


Our framework consists of the following measures.

1    Operational resiliency starts with an integrated program (business, operations, fraud/cyber and technology).

Preparation and adoption of an integrated contingency playbook is one of the most effective yet overlooked resiliency measures that can provide immediate gains. An integrated response team prioritizes key services, along with a set of scenarios, failure points and corresponding resiliency strategies, such as a response matrix for potential issues, decision trees to pursue different recovery options and a validated escalation tree with primary and back-up personnel.
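A validated escalation tree with primary and back-up personnel can be kept as structured data rather than as a static document, so that it can be checked before an outage. The following is a minimal sketch under stated assumptions: the class names, roles and contact names are all hypothetical and are not taken from any specific bank's playbook.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    name: str
    available: bool = True  # assumed flag, e.g. fed from an on-call roster


@dataclass(frozen=True)
class EscalationLevel:
    role: str
    primary: Contact
    backup: Contact

    def on_call(self) -> Optional[Contact]:
        """Return the primary if reachable, else the back-up, else None."""
        if self.primary.available:
            return self.primary
        if self.backup.available:
            return self.backup
        return None


def resolve_escalation(levels: list[EscalationLevel]) -> list[tuple[str, str]]:
    """Walk the tree top-down and list who should be paged at each level."""
    paged = []
    for level in levels:
        contact = level.on_call()
        paged.append((level.role, contact.name if contact else "UNSTAFFED"))
    return paged


# Hypothetical tree for a wires outage; roles and names are illustrative only.
WIRES_TREE = [
    EscalationLevel("Incident lead", Contact("A. Primary", available=False), Contact("B. Backup")),
    EscalationLevel("Operations", Contact("C. Primary"), Contact("D. Backup")),
]
```

Calling resolve_escalation(WIRES_TREE) returns one (role, name) pair per level. Any level reported as "UNSTAFFED" points to a gap in the tree that should be closed before an actual event.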

2    Banks must modernize their digital telemetry capabilities.

Modern telemetry instruments critical solution stacks with alerting and monitoring capabilities that can quickly pinpoint an outage’s root cause and help accelerate its resolution.

3    Advanced process automation and real-time operational reporting are essential.

An automated operational reporting dashboard strengthens a bank’s resiliency posture by letting teams quickly narrow their focus to a targeted set of impacted customers or transactions (see also the client example below).
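The kind of query that sits behind such a dashboard might look like the sketch below, which narrows a transaction feed down to stuck high-value items and the clients they belong to. The field names, status values, amount threshold and age threshold are all assumptions made for illustration; a real implementation would read from the bank's own operational databases.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    client: str
    amount: float
    status: str            # assumed values: "PENDING" or "SETTLED"
    submitted_at: datetime


def stuck_high_value(txns, now, min_amount=1_000_000, max_age=timedelta(minutes=30)):
    """Return pending transactions of at least min_amount that are older than max_age, largest first."""
    stuck = [
        t for t in txns
        if t.status == "PENDING"
        and t.amount >= min_amount
        and now - t.submitted_at > max_age
    ]
    return sorted(stuck, key=lambda t: t.amount, reverse=True)


def impacted_clients(stuck):
    """Return the distinct clients with at least one stuck transaction, sorted by name."""
    return sorted({t.client for t in stuck})
```

Sorting by amount puts the largest exposures at the top of the report, and the client list can feed directly into customer communications.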

4    A robust event management process can streamline client engagement.

For example, banks with wires operations can set up a war room to manage outages in wires services. We have helped clients set up such war rooms with pre-defined process-overwrite approvals, manual risk measurement models, wires-entry alternatives, etc. We have also helped prepare pre-defined communication templates and guidelines that a bank can use to quickly initiate customer updates for different scenarios based on issue root cause, time-to-recovery and cut-off times.

5    DR / BCP readiness can be strengthened by performing additional due diligence.

Banks should define decision frameworks to help teams assess and address potential component failover options during an outage. This can include an inventory of DR set-up activities for application components, as well as details such as types of configuration, points of contact, risks and mitigations.

6    A reassessment of vendor/partner capabilities can help fortify resiliency.

Supplier contingency and resiliency evaluations can help banks assess and address concerns in areas such as change and incident management, technology and platform integration, SLA definitions and adherence, as well as communications.

Finally, we recommend that banks rigorously apply objective measures of resiliency, such as NPS (a reflection of customer satisfaction) and a risk index (a reflection of the frequency and length of outages). Some of our clients have experienced a significant boost in resiliency just by instituting metrics/KPIs that the entire organization (business, operations and technology) tracks on a monthly basis to measure progress against their different initiatives and objectives.
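To make the two measures concrete, the sketch below computes both for one reporting month. The NPS calculation follows the standard definition (percentage of promoters scoring 9 or 10 minus percentage of detractors scoring 0 to 6). The risk index formula is purely an assumption for illustration: the article does not define it beyond frequency and length of outages, so the weighted sum here is a hypothetical stand-in that each bank would replace with its own definition.

```python
def net_promoter_score(scores):
    """Standard NPS on 0-10 survey scores: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one survey score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def risk_index(outage_minutes, per_outage_weight=1.0, per_hour_weight=1.0):
    """Hypothetical index: weighted outage count plus weighted total outage hours.

    outage_minutes is a list with the duration of each outage in the period.
    The weights are assumptions and would need to be calibrated by the bank.
    """
    frequency = len(outage_minutes)
    total_hours = sum(outage_minutes) / 60.0
    return per_outage_weight * frequency + per_hour_weight * total_hours
```

Tracked monthly, a rising NPS alongside a falling risk index would indicate that resiliency initiatives are taking effect.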

Proof is in the pudding

We recently helped one of our global banking clients boost its resiliency by leveraging the aforementioned framework. This client, a leading full-service bank in North America, had embarked on a sweeping digital initiative with an aspiration to become the best bank in its peer group. One of the key priority areas included significant operational resiliency improvement to address a growing number of outages across several of its critical service areas.

We started by establishing an end-to-end service team, drawing key individuals from business, operations and technology areas across leadership and operational levels. Next, we engaged this team on a common definition of the associated process-flow contextualized for the bank. With the team and reference process-flow established, we started the deployment of each of the six dimensions of the framework:

  • Collaborated with our client teams in the development and deployment of an integrated playbook. This playbook spanned the bank’s internal functions and vendor partners, and included a scenario-based response framework, validation and establishment of escalation teams, and focused response options. We accelerated this playbook development by aggregating and refreshing the existing set of artifacts/job-aids distributed across different functions and teams.
  • Enabled rapid assessment of the end-to-end technology stack and components with a standardized telemetry tool implementation. Additionally, we helped develop the appropriate monitoring and alerting measures, along with an objective dashboard for ongoing monitoring.
  • Helped the team assess all manual interventions needed to respond to an outage and develop surgical automation capabilities. This included development of utilities as well as leveraging robotic process automation. We also developed a set of real-time reports drawing on key operational databases.
  • Assisted in the establishment of a robust war room equipped with a set of escalation decision trees needed to respond to different types of outages. Additionally, we worked on a robust customer engagement process, wherein the bank can leverage pre-defined and proactive communication protocols to keep customers informed throughout an outage and associated recovery. This step helped immediately boost customer satisfaction.
  • Enabled the bank to review and strengthen its disaster recovery, business continuity planning and supplier governance processes. We developed a decision framework that allowed business, operations and technology stakeholders to jointly review different options for critical components during an outage.

Leveraging this framework, we helped our client meet and exceed its operational resiliency objectives. The progress was immediate and evident in the objective measures that we deployed: steady NPS growth and a decline in the risk index over a nine-month period (see below).


This article was written by Amit Anand, Vice President within Cognizant Consulting’s Banking and Financial Services Practice; and Abhishek Roy, Director within Cognizant Consulting’s Banking and Financial Services Practice.
