CSA CCM LOG-13
Failures and Anomalies Reporting

Proper logging and monitoring are critical for detecting security issues in cloud environments. Organizations need defined processes for identifying and reporting failures and anomalies in their monitoring systems, and accountable parties should be notified immediately when issues are detected.

Where did this come from?

This control comes from the CSA Cloud Controls Matrix v4.0.10 - 2023-09-26. You can download the full matrix from the Cloud Security Alliance website. The matrix provides a comprehensive set of security controls tailored for cloud computing. For more background, check out the AWS Cloud Adoption Framework - Logging and Monitoring.

Who should care?

  • Cloud security engineers responsible for designing logging and monitoring controls
  • Incident responders who need visibility into anomalies and failures
  • Compliance officers ensuring the organization has proper detection processes in place
  • Developers instrumenting applications to feed security event data into monitoring systems

What is the risk?

Failing to detect and respond to logging and monitoring anomalies can allow attackers to operate undetected in the environment. They may be able to escalate privileges, access sensitive data, and maintain long-term persistence.

The impact depends on which systems are affected and how long issues persist before detection. A monitoring outage for a single low-value asset may not be severe. However, an outage spanning multiple critical systems poses high risk.

What's the care factor?

Organizations should treat the ability to detect logging and monitoring failures as a high priority. It directly impacts the reliability of security alerting. Proper incident response depends on having confidence in the completeness and accuracy of event data.

However, lower severity assets may not require the same level of alerting on monitoring issues. Risk-based decisions can help prioritize implementation.

When is it relevant?

This control applies to any systems that generate security event logs or are in scope for security monitoring. It is most relevant for:

  • Critical assets with high availability and integrity requirements
  • Systems processing highly sensitive data
  • Assets exposed to untrusted networks
  • Security infrastructure like firewalls, IDS/IPS, and access control systems

It may be less relevant for isolated test/dev systems or resources that don't process important data.

What are the trade-offs?

Implementing comprehensive failure and anomaly reporting requires effort to:

  • Define detection processes and integrate them with monitoring pipelines
  • Tune and customize alerts to reduce false positives
  • Train staff to interpret and respond to notifications
  • Maintain the reporting workflows over time

Organizations have to balance the level of anomaly detection with the noise generated for staff. Too many alerts can cause fatigue and complacency.
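One common way to manage that noise is deduplicating repeat alerts. The sketch below is illustrative rather than prescribed by the control: it assumes a hypothetical cooldown window (here 5 minutes) and suppresses re-notifications for the same source/condition pair until the window elapses.

```python
import time


class AlertDeduplicator:
    """Suppress repeat alerts for the same (source, condition) pair
    within a cooldown window, to reduce notification fatigue."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown = cooldown_seconds
        self.last_sent = {}  # (source, condition) -> timestamp of last notification

    def should_notify(self, source, condition, now=None):
        now = time.time() if now is None else now
        key = (source, condition)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still in cooldown: suppress the duplicate
        self.last_sent[key] = now
        return True


dedup = AlertDeduplicator(cooldown_seconds=300)
print(dedup.should_notify("fw-01", "log_shipping_failed", now=1000))  # True
print(dedup.should_notify("fw-01", "log_shipping_failed", now=1100))  # False (suppressed)
print(dedup.should_notify("fw-01", "log_shipping_failed", now=1400))  # True (cooldown elapsed)
```

Most SIEM and paging platforms offer an equivalent built-in grouping or throttling feature; the point is to make suppression deliberate and auditable rather than letting staff mute alerts ad hoc.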

How to make it happen?

  1. Identify critical security systems that require anomaly reporting (firewalls, IDS/IPS, FIM, AV, access control, etc.)
  2. For each system, define a list of failure and anomaly conditions to monitor. Examples:
    • Unexpected service outages
    • Inability to ship logs to a SIEM
    • Storage thresholds exceeded
    • Backup failures
    • Unauthorized configuration changes
  3. Configure the log sources to generate events when these conditions occur. Feed the events into a centralized SIEM or monitoring platform.
  4. Create targeted alerts that notify the appropriate teams (SecOps, IT, etc.) when anomalies are detected. Use high severity notifications for critical systems.
  5. Document the alert response procedures. Train staff on the required investigation and mitigation steps for each alert type.
  6. Implement automated responses where possible. For example, page the on-call staff if a critical failure occurs after hours.
  7. Regularly test the anomaly reporting by triggering alerts and verifying the notification and response workflows.
  8. Tune the alerts over time to ensure a high signal-to-noise ratio. Suppress noisy/redundant alarms.
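The "inability to ship logs to a SIEM" condition from step 2 can often be detected with a simple heartbeat check: flag any source whose most recent event is older than an expected interval. The sketch below is a minimal illustration; the source names, the 15-minute threshold, and the idea that the platform exposes per-source last-seen timestamps are all assumptions to adapt to your tooling.

```python
from datetime import datetime, timedelta, timezone

# Threshold before a silent source is treated as an anomaly (tune per source).
MAX_SILENCE = timedelta(minutes=15)


def find_silent_sources(last_event_times, now):
    """Return log sources whose most recent event is older than MAX_SILENCE."""
    return sorted(
        source for source, seen in last_event_times.items()
        if now - seen > MAX_SILENCE
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "fw-01": now - timedelta(minutes=2),    # healthy: shipped recently
    "ids-02": now - timedelta(minutes=45),  # stopped shipping -> anomaly
}
print(find_silent_sources(last_seen, now))  # ['ids-02']
```

Running a check like this on a schedule, and wiring its output into the alerting from step 4, turns "the logs quietly stopped" from an invisible failure into a reportable event.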

What are some gotchas?

  • Ensure logging systems have sufficient storage capacity to retain events for the required timeframe. Monitoring for "disk full" errors is key.
  • Account for network and permission dependencies required for log sources to send data to analysis platforms. For example, firewalls need to allow SIEM log collectors to pull data over the required ports.
  • Some alerts may require specific permissions to generate, such as AWS CloudTrail logging for S3 buckets. You may need to attach additional IAM policies.
  • Be thoughtful about who can suppress or modify alerts. Implement strong change control and reviews to avoid blind spots.
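For the storage-capacity gotcha above, a minimal disk-usage check might look like the following sketch. The warning threshold and the monitored path are assumptions to tune per host; in practice this logic usually lives in an existing agent (e.g. a metrics collector) rather than a standalone script.

```python
import shutil

WARN_PCT = 80  # assumed alert threshold; tune per environment


def usage_percent(used, total):
    """Percentage of capacity consumed."""
    return used / total * 100


def storage_alert(path, warn_pct=WARN_PCT):
    """Return a warning string if disk usage at `path` exceeds warn_pct, else None."""
    u = shutil.disk_usage(path)
    pct = usage_percent(u.used, u.total)
    return f"{path}: {pct:.1f}% used (threshold {warn_pct}%)" if pct >= warn_pct else None


# Example: check the partition holding log data (path is an assumption).
alert = storage_alert("/")
if alert:
    print("ALERT:", alert)
```

Alerting at a threshold well below 100% matters because a full disk does not just degrade the logging system; it silently drops the very events you would need to investigate the outage.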

What are the alternatives?

  • Outsource elements of logging and monitoring to an MSSP. Let them manage the underlying detection and reporting infrastructure.
  • Use a 3rd party Cloud Access Security Broker (CASB) to handle log aggregation and anomaly detection across multiple cloud platforms and SaaS apps.
  • Engage a Managed Detection and Response (MDR) provider to act as an extension of the internal SecOps team. Let their analysts monitor for advanced threats.
