How to Effectively Implement & Monitor Cloud Infrastructure


As organisations migrate more and more of their computing to the cloud, they become increasingly susceptible to malicious attacks. When it comes to the cloud, there's a difference between what a cloud provider sees and what an attacker sees.

A cloud provider’s perspective

  • Cloud is ever present, ever accessible
  • Provides a wide range of computing services
  • Enables rapid development and deployment
  • Cloud consumption is increasing rapidly

An attacker’s perspective

  • Can be continuously and relentlessly attacked
  • A wide surface area to attack
  • Easy to make mistakes and configuration errors
  • Makes a super attractive target

Most attacks are not new: malware, password brute forcing, credential theft, DDoS, and SQL injection (SQLi) are all common in legacy and on-premise systems. Alongside these, new types of attack are emerging in cloud environments, such as password spraying, crypto miners, harvesting of secrets and subscription keys, and file-less attacks.

For instance, password spraying takes one password and tries it against many accounts, while password brute forcing takes one account and throws many passwords at it. There have been many reports of attacks along the supply chain and on misconfigured cloud infrastructure.
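
The distinction above comes down to the order in which the (account, password) space is traversed. A minimal illustrative sketch (not attack code, and with made-up inputs):

```python
# Illustrative only: the two techniques differ purely in iteration order.

def brute_force_attempts(account, passwords):
    """One account, many passwords -- quickly trips per-account lockouts."""
    return [(account, pw) for pw in passwords]

def password_spray_attempts(accounts, password):
    """One password, many accounts -- stays under per-account lockout thresholds,
    which is why spraying is harder to detect with per-account counters alone."""
    return [(acct, password) for acct in accounts]
```

This is why detection for spraying must aggregate failures across accounts (per source IP or per password pattern) rather than per account.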

When we think about attacks on the cloud, we can group them as such:

Tenant level (Any organisation that puts their infrastructure in the cloud)

  • User elevated to tenant admin
  • Multi factor authentication changed

Subscription level

  • External accounts added to subscription
  • Stale accounts with access to subscription
  • Attack detection service not configured properly

Virtual machine level
  • Known hacker/malicious tool/process found
  • Account password hash accessed
  • Anti-malware service disabled
  • Brute force login attack detected
  • Communication with a malicious IP
  • TOR IP detected
  • File-less attack technique detected
  • Outgoing DDoS attacks

Service level
  • Malicious key vault access — keys enumerated
  • Anonymous storage accessed
  • Activity from unfamiliar location
  • SQL injection detected
  • Hadoop YARN exploit
  • Open management ports on Kubernetes nodes
  • Authentication disabled for App/Web services

User level
  • A potentially malicious URL click was detected
  • Unusual volume of external file sharing
  • Password spray login attack

We should think about all of these areas that need to be secured. Besides securing cloud infrastructure, it is also important to put a good monitoring mechanism in place to respond to any kind of incident. But the problem is: are SOCs (Security Operations Centres) really prepared?

There are many challenges surrounding the implementation of a cloud monitoring system that prevent SOCs from keeping up to date.

  • Most cloud platforms are organised around tenants and subscription models, which creates new boundaries
  • Many cloud services = many attack types, and these attacks are becoming more sophisticated
  • Since cloud environments are still relatively new, gaining familiarity with this new technology involves a steep learning curve
  • If you have an on-premise SOC and you want to create a hybrid environment, it makes detection and investigation complex
  • Cloud infrastructure and services are far more dynamic in nature. Organisations keep running new services while cloud providers rapidly release new features, and DevOps and SRE teams make frequent changes to their production systems. It takes a huge amount of effort to keep SOCs up to date with all of this.

If our servers are on-premise, we have control over the network. If an incident happens, we can perform actions like blocking an IP or taking down the machine. However, we may not have the same flexibility in the cloud. Monitoring requires establishing partnerships between SOC analysts, cloud resource owners, subscription owners, and cloud service providers. SOC analysts may even need intervention from cloud resource owners in order to obtain access to conduct investigations or to implement remediation steps.

In order to implement an effective cloud monitoring system, we have to identify the logs and events that the aforementioned attacks generate. We can divide event types into four categories:

  • Control plane logs — e.g. create, update, and delete operations on cloud resources
  • Data plane logs — e.g. events logged as part of cloud resource usage, such as Windows events inside a VM or SQL audit logs
  • Identity logs — when you design cloud infrastructure, you need to define the identity architecture. It should be possible to map every action to an identity, e.g. AuthN/AuthZ events, AAD (Azure Active Directory) logs
  • Baked-in alerts — e.g. ready-to-consume security alerts from services such as ASC (Azure Security Center) or a CASB (cloud access security broker)
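
One practical way to apply these four categories is to tag every incoming event with its category as it enters the pipeline. A minimal sketch, where the source names are hypothetical and would need to match your provider's actual log streams:

```python
# Route raw events into the four categories by their source stream.
# Source names here are assumptions -- adapt them to your provider.

CATEGORY_BY_SOURCE = {
    "activity_log": "control_plane",    # create/update/delete of cloud resources
    "vm_windows_events": "data_plane",  # events from inside a resource
    "sql_audit": "data_plane",
    "aad_signin": "identity",           # AuthN/AuthZ events
    "asc_alert": "baked_alert",         # ready-to-consume security alerts
}

def categorise(event: dict) -> str:
    """Return the event category, or 'unknown' for unmapped sources."""
    return CATEGORY_BY_SOURCE.get(event.get("source"), "unknown")
```

Unmapped sources surfacing as "unknown" is useful in itself: it flags new log streams that the SOC has not yet onboarded.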

It’s very beneficial to have a common raw events repository and an alert/log template that can help in log analytics. Additionally, it’s also better to include these data as a common template:

Event ID, Event name, Subscription ID, Resource name, Resource ID, Event time, Data centre, Meta data, Prod or dev, Owner ID, User ID, Success or failure.

This helps you build custom monitoring scenarios and helps your SOC run investigations.
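
The common template above can be captured as a simple record type, with each raw provider event normalised into it on ingestion. The field and raw-key names below are assumptions for illustration; align them with whatever your event repository actually emits:

```python
from dataclasses import dataclass

@dataclass
class CommonEvent:
    # Fields follow the common template suggested in the article.
    event_id: str
    event_name: str
    subscription_id: str
    resource_name: str
    resource_id: str
    event_time: str     # ISO 8601
    data_centre: str
    metadata: dict
    environment: str    # "prod" or "dev"
    owner_id: str
    user_id: str
    success: bool

def normalise(raw: dict) -> CommonEvent:
    """Map a hypothetical raw provider event onto the common template."""
    return CommonEvent(
        event_id=raw["id"],
        event_name=raw["operationName"],
        subscription_id=raw.get("subscriptionId", ""),
        resource_name=raw.get("resourceName", ""),
        resource_id=raw.get("resourceId", ""),
        event_time=raw["eventTimestamp"],
        data_centre=raw.get("region", ""),
        metadata=raw.get("properties", {}),
        environment=raw.get("tags", {}).get("env", "prod"),
        owner_id=raw.get("tags", {}).get("owner", ""),
        user_id=raw.get("caller", ""),
        success=raw.get("status") == "Succeeded",
    )
```

With every source normalised this way, a single query can correlate, say, control plane and identity events by `user_id` and `event_time`.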

Some alerts and logs are false positives, which can generate a lot of load for the SOC. To prevent overloading, we can configure thresholds so that such alerts are first redirected to the resource owner. If the resource owner decides that a certain alert or log needs investigation, they can then escalate it to the SOC.
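
A minimal sketch of this triage idea follows. The severity labels, volume limit, and queue names are illustrative assumptions, not a provider feature:

```python
from collections import Counter

volume = Counter()
VOLUME_LIMIT = 50  # alerts per name per review window; tune to your environment

def route(alert: dict, escalated_by_owner: bool = False) -> str:
    """Decide which queue an alert lands in first."""
    volume[alert["name"]] += 1
    if escalated_by_owner or alert["severity"] == "high":
        return "soc"                  # confirmed or severe: straight to the SOC
    if volume[alert["name"]] > VOLUME_LIMIT:
        return "resource_owner"       # noisy alert: let the owner triage first
    return "resource_owner" if alert["severity"] == "low" else "soc"
```

The key design point is the `escalated_by_owner` path: the resource owner acts as a first-line filter, and only what they confirm flows back into the SOC queue.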

The design and architecture of SIEM (security information and event management) systems is evolving too. If your on-premise infrastructure already has a SIEM set up, it's better to start by bringing cloud events into that on-premise SIEM; most cloud providers have connectors to popular SIEMs that make integration seamless. Over time, you can consider moving to a cloud-based SIEM and forwarding on-premise events to it. The last approach is to combine cloud and on-premise data into one big data platform, which provides more flexibility and a better user experience.

There are various mechanisms to fetch events:

  • REST API calls
  • Connectors by SIEM vendors
  • Conversion to standard Syslog format
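
The third mechanism can be sketched as follows: serialising a normalised event into an RFC 5424-style syslog line so an on-premise SIEM can ingest it. The facility/severity choice, hostname, app name, and SIEM endpoint are all assumptions:

```python
import json
import socket
from datetime import datetime, timezone

def to_syslog(event: dict, hostname: str = "cloud-collector",
              app: str = "cloudmon") -> str:
    """Render an event as an RFC 5424-style syslog line with a JSON payload."""
    pri = 4 * 8 + 6  # facility=auth (4), severity=informational (6) -> <38>
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    msg = json.dumps(event, separators=(",", ":"))
    return f"<{pri}>1 {ts} {hostname} {app} - - - {msg}"

def send_udp(line: str, host: str = "siem.internal", port: int = 514) -> None:
    """Fire-and-forget UDP delivery; real deployments should prefer TLS syslog."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(line.encode(), (host, port))
```

Embedding the event as JSON in the syslog message keeps the transport standard while preserving the structured fields for parsing on the SIEM side.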

Skilling up your analysts and engineers is the key to success. Start by providing training on cloud concepts like IaaS, PaaS, and SaaS. You can begin with IaaS, as it is the closest to on-premise, before moving on to PaaS, which is more complex. Try to avoid over-specialising, embrace flexibility, find people who understand data, and keep learning.

To be successful in implementing a proper monitoring system, we have to configure it right. We can apply tools like the CIS benchmark for Azure to achieve this. Prioritisation is also critical: we have limited resources but hundreds of use cases. Threat modelling can help prioritise monitoring scenarios and cut down noise.
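
A simple way to operationalise that prioritisation is a likelihood-times-impact score over candidate monitoring scenarios, a common threat-modelling heuristic. The scenario names and scores below are illustrative, not a recommendation:

```python
# Rank candidate monitoring scenarios by likelihood x impact and keep the top N.
# Scores (1-5) would come from your own threat-modelling exercise.

scenarios = [
    {"name": "brute force login", "likelihood": 5, "impact": 3},
    {"name": "key vault enumeration", "likelihood": 2, "impact": 5},
    {"name": "open management ports", "likelihood": 4, "impact": 4},
    {"name": "TOR IP access", "likelihood": 3, "impact": 2},
]

def prioritise(items: list, top_n: int = 3) -> list:
    """Return the names of the top_n highest-risk scenarios."""
    ranked = sorted(items, key=lambda s: s["likelihood"] * s["impact"],
                    reverse=True)
    return [s["name"] for s in ranked[:top_n]]
```

Even a crude scoring like this forces the team to make the trade-off explicit instead of onboarding alerts in the order vendors ship them.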

And last but not least, we cannot forget the importance of constantly scaling up team skills, designing the right SIEM architecture, and establishing a mechanism to keep up with new features in the cloud.

Prasad Wanigasinghe
