With the explosive growth of business ecosystems powered by open application programming interfaces (APIs), organizations must plug platform-unknown vulnerabilities before they can be exploited. The rapid growth in the number of new APIs makes this even more challenging. The ProgrammableWeb directory, for example, lists 20,000 APIs. Here is how machine learning can be used to help secure them.
As businesses bend and reshape to the forces of all things digital, a wholesale reinvention is occurring in how we build, deploy and operate technology. Microservices, Agile development, DevOps and cloud everything-as-a-service are critical elements of the digital evolution. However, perhaps underappreciated in application discussions is the use of APIs as the building blocks of new software architectures.
However, if APIs are the keystone of digital innovation, securing them is of paramount importance. In recent years, API security has seen well-publicized incidents such as the IRS “Get Transcript” API attack and platform-wide vulnerabilities such as Heartbleed and Shellshock. API vulnerabilities allow an attacker to bypass key controls such as Privileged Access Management (PAM) because the attack operates at the point of program execution. Despite an obsessive focus on prevention technologies at the network, host and application layers, API security implementations still lag significantly in the detection and protection stages of the security lifecycle.
Enter API Behavioral Security
As a rule, traditional security detection solutions, from endpoint protection to intrusion prevention systems (IPS) to web application firewalls (WAF), look for recognized patterns of attack. Sandboxing and next-generation approaches that apply machine learning (ML) are making detection more dynamic, but most application and API-related monitoring today relies on deterministic forms of detection. This deterministic model, while not infallible, can be effective when context and correlation are applied from user behaviors, asset profiles, and general threat intelligence using indicators of compromise (IOCs) and IP reputation.
API security has a more difficult threat detection challenge because an API resides on one or more layers and is removed from the source of the data request. An API query often has limited information about a specific user identity, historical activity, source IP address or location, host vulnerabilities, etc. As such, cutting-edge API behavioral security (ABS) solutions employ ML to constantly watch for nondeterministic forms of attack (i.e., no signatures), and only alert or block activity when the unexpected happens. Simply authenticating a user and applying entity access control is clearly inadequate. Fundamentally, if your organization’s API security relies only on permission-based protection, bypassing that API perimeter could expose the entire data store. This risk is magnified further because most APIs are remotely accessible.
So how can ABS detect well-crafted malicious requests eliciting valid responses without patterns or signatures?
The key to detecting nondeterministic attacks with machine learning is training on observed behavior. ML-infused ABS technology needs to be trained to recognize legitimate traffic. Once trained on application traffic specific to your environment, the ABS technology can then flag traffic that deviates from that learned baseline, even when each request is well-formed and elicits a valid response.
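To make the idea of training on legitimate traffic concrete, here is a minimal sketch in Python. It is not how any particular ABS product works: it assumes, purely for illustration, that each observed call is reduced to an endpoint and a response size, and it substitutes a simple z-score test for a real ML model. The class name BaselineModel, the threshold value and the sample endpoints are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


class BaselineModel:
    """Learns per-endpoint response-size statistics from legitimate traffic.

    Illustrative stand-in for an ABS model; a real product would use far
    richer features than response size alone.
    """

    def __init__(self, threshold=3.0):
        self.threshold = threshold          # allowed deviation, in std devs
        self._samples = defaultdict(list)   # endpoint -> observed sizes
        self._stats = {}                    # endpoint -> (mean, std dev)

    def train(self, records):
        """records: iterable of (endpoint, response_bytes) known to be legitimate."""
        for endpoint, response_bytes in records:
            self._samples[endpoint].append(response_bytes)
        for endpoint, sizes in self._samples.items():
            # Fall back to 1.0 so a constant-size endpoint does not divide by zero.
            self._stats[endpoint] = (mean(sizes), pstdev(sizes) or 1.0)

    def is_anomalous(self, endpoint, response_bytes):
        """True when the call falls outside the learned baseline."""
        if endpoint not in self._stats:
            return True  # endpoint never seen during training
        mu, sigma = self._stats[endpoint]
        return abs(response_bytes - mu) / sigma > self.threshold


model = BaselineModel()
model.train([("/accounts", size) for size in (500, 510, 490, 505, 495)])

typical = model.is_anomalous("/accounts", 502)      # close to the baseline
bulk_dump = model.is_anomalous("/accounts", 50000)  # valid call, unusual volume
unseen = model.is_anomalous("/admin", 200)          # endpoint not in training
```

Note that the second request would pass authentication and return a valid response; it is flagged only because its volume differs sharply from what the model saw during training.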
However, a key challenge is to expose the ABS to a sufficient variety of traffic during the training phase. For this to work effectively, ABS solutions are integrated with APIs very early in their development lifecycle, during the dev, build and test phases. In this way, even outlier use cases can be used as training samples, as the ABS solution builds its learned model by studying the inbound and outbound traffic, queries and responses.
This learning continues, of course, when the APIs are in production, but the training during the pre-production phase infuses the nondeterministic algorithms with an essential understanding of the API’s expected behavior.
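One way to picture learning that starts in pre-production and continues in production is an incrementally updated statistic. The sketch below uses Welford's online algorithm to keep a running mean and variance without storing past samples. The choice of algorithm, the class name RunningStats and the sample values are assumptions made for illustration, not a description of a specific ABS implementation.

```python
class RunningStats:
    """Running mean and population variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()

# Pre-production: seed the baseline from dev, build and test traffic.
for value in (2, 4, 4, 4):
    stats.update(value)

# Production: keep refining the same baseline as live traffic arrives.
for value in (5, 5, 7, 9):
    stats.update(value)
```

Because each update touches only three numbers, the same object can be seeded during testing and then carried into production unchanged.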
Two of the most frequently used ABS configurations are in-band and out-of-band. With the former, the ABS solution acts as a proxy, inspecting all traffic before it is passed on to the API endpoint; this configuration can add some latency. With the latter, an ABS solution actively interrogates API management solutions for copies of inbound and outbound traffic; this has led to many ongoing integrations between leading API gateways and ABS solutions.
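The difference between the two configurations can be sketched in a few lines of Python. Everything here is hypothetical: requests are plain dictionaries, the detector is a toy rule standing in for a trained model, and the function names in_band and out_of_band are invented for the example.

```python
from typing import Callable, Dict, Iterable, List

Request = Dict[str, str]
Response = Dict[str, object]
Detector = Callable[[Request], bool]


def in_band(request: Request,
            backend: Callable[[Request], Response],
            detector: Detector) -> Response:
    """Proxy mode: inspect first, then forward or block.

    The detector sits on the request path, so its running time is added
    to every call.
    """
    if detector(request):
        return {"status": 403, "body": "blocked"}
    return backend(request)


def out_of_band(traffic_copies: Iterable[Request],
                detector: Detector) -> List[Request]:
    """Monitor mode: analyze copies of traffic and return what to alert on.

    The live request has already been served, so this adds no latency
    but cannot block the call it is looking at.
    """
    return [request for request in traffic_copies if detector(request)]


def toy_detector(request: Request) -> bool:
    """Stand-in for a trained model; flags one path prefix for illustration."""
    return request["path"].startswith("/admin")


def toy_backend(request: Request) -> Response:
    """Stand-in for the real API endpoint."""
    return {"status": 200, "body": "ok"}


allowed = in_band({"path": "/accounts"}, toy_backend, toy_detector)
blocked = in_band({"path": "/admin/export"}, toy_backend, toy_detector)
alerts = out_of_band(
    [{"path": "/accounts"}, {"path": "/admin/export"}], toy_detector
)
```

The trade-off shows up in the return values: the in-band function decides the response itself, while the out-of-band function can only report after the fact.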
Simply put, API gateways need to be augmented with ML-powered solutions like ABS to protect APIs that are important conduits to mission-critical data. The efficacy of an ABS solution lies in the ability of its ML algorithms to keep learning and, hence, to keep growing in potency. ABS-powered solutions give API security a much-needed next-generation uplift to protect against unknown attacks.
This article was written by Tom Le, CTO, and Sudhakar Kamalanathan, Principal Architect of the Cognizant Security Practice.