Aakif Shaikh, CISSP, CEH, CHFI, CISA, GWAPT

Summary

The web content discusses the implementation of a custom dashboard for Cloud Custodian, an open-source serverless tool for cloud management, to visualize policy health, compliance status, and other metrics by integrating with Sumo Logic.

Abstract

The article outlines an approach to compensate for the lack of a built-in dashboard in Cloud Custodian, a tool used for over four years by organizations to manage cloud resources. It details the high-level architecture of Cloud Custodian in conjunction with Sumo Logic, which enables the ingestion of Custodian logs, querying for non-compliant items, and the creation of visual dashboards. The process involves deploying policies in YAML, setting up Lambda functions, CloudWatch Log Groups, and Event Rule scheduling. The output, in GZ format, is sent to an S3 bucket and then ingested into Sumo Logic via a hosted collector. The article emphasizes the cost-effectiveness of running Cloud Custodian and the flexibility of the solution, which allows for the monitoring of various metrics such as the number of AWS accounts, policy health checks, and the status of resources related to security and cost savings. The author also provides examples of Sumo Logic queries and screenshots of illustrative dashboards, highlighting the ability to track historical data and perform inventory management across multiple AWS accounts.

Opinions

  • The author suggests that the absence of a native dashboard in Cloud Custodian is a significant shortcoming, necessitating integration with third-party tools like Sumo Logic for better visualization and monitoring.
  • There is an appreciation for Cloud Custodian's serverless nature and its cost-efficiency, with an example given that running 200 policies can cost less than $100 per month.
  • The author expresses the importance of a naming convention for policies to facilitate the identification and categorization of policies (e.g., cost-saving, security-related).
  • The article conveys a positive view of the customization capabilities provided by the integration of Cloud Custodian with Sumo Logic, allowing organizations to tailor dashboards to their specific needs and environments.
  • The author emphasizes the utility of historical data comparison in understanding trends and the effectiveness of policies over time.
  • There is a recognition of the complexity involved in managing cloud resources across multiple accounts, which the proposed solution aims to simplify through centralized dashboards and inventory tracking.

Dashboard for Cloud Custodian

An alternate method to get the visuals and build your own dashboards

After using Cloud Custodian for over four years, we can all agree that it is missing one prominent feature: a dashboard. Cloud Custodian has no front end or GUI where you can easily navigate the findings, get a single-pane-of-glass view of all your accounts across public cloud providers, check policy health, display charts, and tell the story to management. Because of this shortcoming, users have to integrate with native or third-party tools. We know how powerful Cloud Custodian is, with all its execution modes, filters, and actions. Because it is serverless, running Cloud Custodian is also very cheap. Every organization's environment and configuration is different, but as a rough figure, the monthly cost of running approximately 200 policies is less than $100, depending on how frequently you run them.

Cloud Custodian is an open-source, Python-based serverless tool

In this story, I will go through the high-level architecture of the Cloud Custodian and Sumo Logic setup, which enables us to ingest the Custodian logs and write queries to look for non-compliant items, check policy health, and draw pretty dashboards.

Example: Identify publicly accessible AWS Redshift clusters
policies:
- name: redshift-cluster-publicly-accessible
  resource: aws.redshift
  comments: |
    Find Redshift clusters that are publicly accessible. This is a
    notify-only policy. The policy runs once every 24 hours.
  filters:
    - "tag:redshift-publicly-accessible-exempt": absent
    - PubliclyAccessible: true
  mode:
    type: periodic
    schedule: "rate(24 hours)"
    execution-options:
      output_dir: s3://s3bucket/cclogs/{account_id}/
    runtime: python3.8
  # Caution: "delete" removes matching clusters; replace it with a notify
  # action if the policy is meant to be notify-only, as the comments state.
  actions:
    - type: delete
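
To make the filter logic concrete, here is a minimal Python sketch of what the two filters in this policy check. Custodian combines the filters in a list with a logical AND by default, so a cluster is flagged only when the exemption tag is absent and the cluster is publicly accessible. The function name `is_non_compliant` and the sample clusters are hypothetical; the record shape is assumed to follow the Redshift describe-clusters output.

```python
from typing import Any, Dict, List

EXEMPT_TAG = "redshift-publicly-accessible-exempt"


def is_non_compliant(cluster: Dict[str, Any]) -> bool:
    """Mirror the policy's two filters, which are combined with AND."""
    tag_keys = {tag["Key"] for tag in cluster.get("Tags", [])}
    exempt_tag_absent = EXEMPT_TAG not in tag_keys
    publicly_accessible = cluster.get("PubliclyAccessible") is True
    return exempt_tag_absent and publicly_accessible


# Hypothetical cluster records, for illustration only.
clusters: List[Dict[str, Any]] = [
    {"ClusterIdentifier": "public-db", "PubliclyAccessible": True, "Tags": []},
    {
        "ClusterIdentifier": "exempt-db",
        "PubliclyAccessible": True,
        "Tags": [{"Key": EXEMPT_TAG, "Value": "approved"}],
    },
    {"ClusterIdentifier": "private-db", "PubliclyAccessible": False, "Tags": []},
]

flagged = [c["ClusterIdentifier"] for c in clusters if is_non_compliant(c)]
print(flagged)
```

Only the first cluster is flagged: the second carries the exemption tag and the third is not public.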

Different Components

Depending on your implementation, the basic components of Cloud Custodian include a Lambda function, CloudWatch Log Groups, and CloudWatch Event Rules. First, you write a policy in YAML, such as the example above that identifies publicly accessible Redshift clusters. The real magic happens when you deploy the policy to the AWS account. Custodian creates the Lambda function, which includes the policy. It then creates the CloudWatch Log Group, which is where you can check the log streams. Every time the policy runs, it creates a new log stream containing timestamps and debugging messages. You can also see whether any resources matched the filters and were identified as non-compliant. Lastly, it creates the CloudWatch Event Rule, which is where you can check how often the policy will run. The rule includes the event rule name, status, event schedule, and target. I have a separate story that discusses how to solve the quota problem for CloudWatch Event Rules when deploying Cloud Custodian policies.
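
As a rough illustration of the three components, the sketch below derives the names you would typically look for in the AWS console for a given policy. It assumes Custodian's default `custodian-` function prefix, Lambda's standard `/aws/lambda/<function>` log group naming, and an event rule that shares the function's name. The helper `expected_resources` is hypothetical, and a custom prefix in your deployment would change the result.

```python
from typing import Dict

# Assumption: the default prefix Custodian uses for policy Lambda functions.
DEFAULT_PREFIX = "custodian-"


def expected_resources(policy_name: str, prefix: str = DEFAULT_PREFIX) -> Dict[str, str]:
    """Return the resource names a periodic-mode policy is expected to create."""
    function_name = f"{prefix}{policy_name}"
    return {
        "lambda_function": function_name,
        # Lambda writes its logs to /aws/lambda/<function name>.
        "log_group": f"/aws/lambda/{function_name}",
        # Assumption: the CloudWatch Event Rule shares the function's name.
        "event_rule": function_name,
    }


names = expected_resources("redshift-cluster-publicly-accessible")
for component, name in names.items():
    print(f"{component}: {name}")
```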

Architecture

A high-level architecture includes the Lambda function where Custodian and the policy reside. The CloudWatch Event Rule triggers the policy to execute. Custodian looks for resources that match the filters and produces its output in GZ format (3 GZ files). This output is sent to the S3 bucket defined in the policy. The IAM role used by Custodian must have access to that S3 bucket in order to drop the files there. You must also have a hosted collector set up for that AWS account to ingest the Custodian output logs from S3 into Sumo Logic (a SIEM solution).
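
The sketch below shows, under stated assumptions, what one of these output objects looks like from the consumer's side: it splits an S3 key into the same eight path segments that the Sumo Logic queries in this story parse, and decompresses a `resources.json.gz` payload. The key, the payload, and the function names are hypothetical; the seventh segment is assumed to be the hour of the run (the queries label it `_min`).

```python
import gzip
import json
from typing import Dict, List

FIELDS = ["prefix", "account_id", "policy_name", "year", "month", "day", "hour", "file_name"]


def parse_output_key(key: str) -> Dict[str, str]:
    """Split a Custodian output key into its eight path segments."""
    parts = key.split("/")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected key layout: {key}")
    return dict(zip(FIELDS, parts))


def load_resources(blob: bytes) -> List[dict]:
    """Decompress a resources.json.gz payload and return the matched resources."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical key and payload, for illustration only.
key = "cclogs/123456789012/redshift-cluster-publicly-accessible/2021/06/01/10/resources.json.gz"
blob = gzip.compress(json.dumps([{"ClusterIdentifier": "demo-cluster"}]).encode("utf-8"))

info = parse_output_key(key)
resources = load_resources(blob)
print(info["account_id"], info["policy_name"], len(resources))
```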

Cloud Custodian output is ingested from s3 into Sumo Logic

Sumo Logic

Sumo Logic is a cloud-based SIEM (Security Information and Event Management) solution. A hosted collector must be configured with an S3 source. This means the hosted collector takes the data from the S3 bucket and ingests it into Sumo Logic. Refer to the Sumo Logic support page for instructions on how to create the collector and source.

Sumo Logic — S3 Source for Hosted Collector

We have separate stories that explain the required components and their configuration: Ingesting Cloud Custodian Logs into Sumo Logic (Part 1) and Ingesting Cloud Custodian Logs into Sumo Logic (Part 2). Another story covers Cloud Custodian Policy Health Checks.

Dashboard

We created the dashboard below to give high-level counts of various things: (1) total number of AWS accounts, (2) count of low-tier and high-tier accounts, (3) count of active and suspended accounts, (4) total number of CIS Benchmark policies, (5) total number of cost-saving policies (separated into action vs. notify), (6) total number of security-related policies, and so on.

Sumo Logic Dashboard — Illustration Purposes Only

To get these counts, it is very important that you have a policy that counts the resources. In this scenario, we use a policy that counts Lambda functions. We have also adopted a simplified naming convention that lets us identify (i) whether the policy is CSP (cost-saving policy), Sec (security-related), or misc (miscellaneous); (ii) whether the policy is notify-only (indicated as -n-) or takes action (indicated as -na-); and (iii) whether it acts on existing or newly created resources. The policy structure below illustrates this naming convention.

Policy Structure
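
A naming convention like this is easy to decode in code. The following is a minimal sketch that reads the first two parts of a policy name, assuming the prefixes and markers described above (`csp`, `sec`, `misc`, `cis`, and `n` / `na`). The function name `classify_policy` and the second sample policy name are hypothetical, and the marker for existing versus newly created resources is left out because its exact form is not spelled out here.

```python
from typing import Dict

CATEGORIES = {
    "csp": "cost-saving",
    "sec": "security",
    "misc": "miscellaneous",
    "cis": "cis-benchmark",
}
MODES = {"n": "notify", "na": "action"}


def classify_policy(name: str) -> Dict[str, str]:
    """Decode the category prefix and the notify/action marker of a policy name."""
    parts = name.lower().split("-")
    category = CATEGORIES.get(parts[0], "unknown")
    mode = MODES.get(parts[1], "unknown") if len(parts) > 1 else "unknown"
    return {"category": category, "mode": mode}


print(classify_policy("sec-n-lambda-function-count"))
print(classify_policy("csp-na-ebs-unattached-volumes"))  # hypothetical policy name
```

Names that do not follow the convention come back as `unknown`, which makes them easy to spot and rename.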

In the query below, you have to enter your own _sourceCategory and _sourceName. The policy that counts the Lambda functions is named “sec-n-lambda-function-count”. We use a regex to extract FunctionName and keep only the values that match “cis-”, because the CIS benchmark policies start with “cis-”.

_sourceCategory="aws/cc/sourcecategory" AND _sourceName=*CustodianLogs/*/policyname/*/*/*/*/resources.json.gz
| parse field=_sourceName "*/*/*/*/*/*/*/*" as clogs, account_id, policies_name, year, month, date, _min, crunlog nodrop
| parse regex "\"FunctionName\":\s\"(?<FunctionName>.+?)\"" multi nodrop
| where FunctionName matches "*cis-*"
| count(FunctionName) group by FunctionName
| fields -_count
| count
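
For readers less familiar with the Sumo Logic query language, the same logic can be sketched in Python: extract every FunctionName with the same regex, keep the ones containing “cis-”, de-duplicate, and count. This is only an illustration of what the query computes, not a replacement for it; the function name and the sample log lines are hypothetical, and the match here is case-sensitive.

```python
import re
from typing import Iterable

# Same pattern as the "parse regex" step in the query.
FUNCTION_NAME_RE = re.compile(r'"FunctionName":\s"(?P<FunctionName>.+?)"')


def count_distinct_cis_functions(log_lines: Iterable[str]) -> int:
    """Count distinct Lambda function names that contain "cis-"."""
    names = set()
    for line in log_lines:
        for match in FUNCTION_NAME_RE.finditer(line):
            name = match.group("FunctionName")
            if "cis-" in name:  # equivalent of: where FunctionName matches "*cis-*"
                names.add(name)
    # The set plays the role of "count ... group by FunctionName | count".
    return len(names)


# Hypothetical lines from a resources.json file, for illustration only.
sample_lines = [
    '"FunctionName": "custodian-cis-s3-logging",',
    '"FunctionName": "custodian-sec-n-lambda-function-count",',
    '"FunctionName": "custodian-cis-s3-logging",',
    '"FunctionName": "custodian-cis-iam-mfa",',
]
total_cis = count_distinct_cis_functions(sample_lines)
print(total_cis)
```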

The screenshot below from the Sumo Logic dashboard shows (1) the total number of policies related to missing tags for existing resources (covers all existing resources), (2) the total number of policies related to missing tags for newly created resources (in the past 30 days), and (3) the count of policies related to encryption: for example, the number of encryption policies that have guard rails, the number that are notify-only, and the number that cover CIS benchmarks (related to encryption). It is important to note that you have to write an individual policy for each resource type in order to count the resources.

Sumo Logic Dashboard — Illustration Purposes Only

Encryption Related Policies — Dashboard

This dashboard contains all the resources covered by encryption-related policies. Some policies look for resources where encryption is not enabled and only notify; others have guard rails and take action. This gives you a quick way to identify all non-compliant items.

Sumo Logic Dashboard — Illustration Purposes Only

Below is a sample Sumo Logic query for drawing a dashboard like the one above. Replace the source category, source, source name, collector, and policy name in the query with your own values.

Sumo Logic Query
_sourceCategory="source-category" and
_source="resources_file_sourcename" and _collector="collectorname"
AND _sourceName=*cclogs/*/sec-n-redshift-cluster-not-encrypted/*/*/*/*/resources.json.gz
| parse field=_sourceName "*/*/*/*/*/*/*/*" as clogs, account_id, policies_name, year, month, date, _min, crunlog nodrop
| parse regex "\"ClusterIdentifier\":\s\"(?<ClusterIdentifier>.+?)\"" multi nodrop
| count (ClusterIdentifier) group by ClusterIdentifier, account_id
| fields -_count
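
As a hedged illustration of what this query produces, the Python sketch below builds the same table of cluster identifiers per account: the account ID comes from the second segment of the source name, and the cluster identifiers come from the same regex. The function name and the sample records are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Same pattern as the "parse regex" step in the query.
CLUSTER_RE = re.compile(r'"ClusterIdentifier":\s"(?P<cluster>.+?)"')


def clusters_by_account(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group distinct cluster identifiers by account.

    Each record is a (source_name, message) pair, where source_name follows
    the eight-segment layout parsed in the query.
    """
    found = defaultdict(set)
    for source_name, message in records:
        account_id = source_name.split("/")[1]
        for match in CLUSTER_RE.finditer(message):
            found[account_id].add(match.group("cluster"))
    return {account: sorted(ids) for account, ids in sorted(found.items())}


# Hypothetical records, for illustration only.
policy = "sec-n-redshift-cluster-not-encrypted"
records = [
    (f"cclogs/111111111111/{policy}/2021/06/01/10/resources.json.gz", '"ClusterIdentifier": "analytics",'),
    (f"cclogs/111111111111/{policy}/2021/06/02/10/resources.json.gz", '"ClusterIdentifier": "analytics",'),
    (f"cclogs/222222222222/{policy}/2021/06/01/10/resources.json.gz", '"ClusterIdentifier": "reporting",'),
]
table = clusters_by_account(records)
print(table)
```

A cluster that appears in several daily runs is listed once per account, which matches the group-by in the query.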

Publicly Accessible Resources — Dashboard

The screenshot below shows the dashboard for resources that are exposed to the world. You have to write each query individually in Sumo Logic and then add it to the dashboard.

Sumo Logic Dashboard — Illustration Purposes Only

Comparing Historical Data

Example #1: In the example below, we compare historical data to understand how many AMIs existed or were created in the last 4 weeks across all of your AWS accounts (hundreds).

*** Sumo Logic Query ***
_sourceCategory="YourSourceCategory" and
_source="cloudcustodianresourcefilename" and _collector="YourCollector" AND _sourceName=*CustodianLogs/*/policyname/*/*/*/*/resources.json.gz
| parse field=_sourceName "*/*/*/*/*/*/*/*" as clogs, account_id, policies_name, year, month, date, _min, crunlog nodrop
| parse regex "\"ImageId\":\s\"(?<ImageId>.+?)\"" multi nodrop
| count (ImageId) group by ImageId
| count| compare timeshift 1w 4

The screenshot below shows the count of AMIs for each of the last 4 weeks. The data shown is for illustration purposes only. We have manually edited it to show the differences (historical values).

Historical comparison of data (last 30 days)

Example #2: In the example below, we compare historical data to understand how many old EBS volume snapshots were deleted in the last 4 weeks across all of your AWS accounts (hundreds).

*** Sumo Logic Query ***
_sourceCategory="YourSourceCategory" and
_source="cloudcustodianresourcefilename" and _collector="YourCollector" AND _sourceName=*CustodianLogs/*/policyname/*/*/*/*/resources.json.gz
| parse field=_sourceName "*/*/*/*/*/*/*/*" as clogs, account_id, policies_name, year, month, date, _min, crunlog nodrop
| parse regex "\"SnapshotId\":\s\"(?<SnapshotId>.+?)\"" multi nodrop
| count (SnapshotId) group by SnapshotId
| count| compare timeshift 1w 4

The screenshot below shows the count of old EBS volume snapshots that were deleted in each of the last 4 weeks. The data shown is for illustration purposes only. We have manually edited it to show the differences (historical values).

Historical comparison of data (last 30 days)
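
The week-over-week comparison in both examples can be approximated in Python as follows. This is a simplified sketch, not how Sumo Logic implements its compare operator: it buckets observations into 7-day windows counted back from a reference date and returns the number of distinct IDs in the current window and in each of the previous four. The function name and the sample observations are hypothetical.

```python
from datetime import date
from typing import Iterable, List, Tuple


def weekly_distinct_counts(
    observations: Iterable[Tuple[date, str]], today: date, weeks: int = 4
) -> List[int]:
    """Return distinct-ID counts per 7-day window.

    Index 0 is the current window; index k is the window k weeks earlier.
    """
    buckets = [set() for _ in range(weeks + 1)]
    for seen_on, resource_id in observations:
        age_days = (today - seen_on).days
        if 0 <= age_days < 7 * (weeks + 1):
            buckets[age_days // 7].add(resource_id)
    return [len(bucket) for bucket in buckets]


# Hypothetical observations (run date, ImageId), for illustration only.
today = date(2021, 6, 28)
observations = [
    (date(2021, 6, 27), "ami-1"),
    (date(2021, 6, 26), "ami-2"),
    (date(2021, 6, 27), "ami-1"),  # duplicate within the same week
    (date(2021, 6, 20), "ami-1"),  # one week earlier
    (date(2021, 6, 1), "ami-3"),   # three weeks earlier
]
counts = weekly_distinct_counts(observations, today)
print(counts)
```

Swapping `ImageId` values for `SnapshotId` values gives the equivalent of the second example.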

AWS Resources Inventory — Dashboard

We have a separate story that discusses the problem and the solution: How to tag at resource and account level in AWS? The screenshot below from Sumo Logic gives you the count of all AWS resources. You can draw a dashboard for each account or for all AWS accounts together (hundreds of accounts). You just need to adjust your query in Sumo Logic.

Screenshot from Sumo Logic — Inventory Dashboard

Other Stories

Cloud Custodian Policy Health Checks

Ingesting Cloud Custodian Logs into Sumo Logic

Cloud Custodian [GZ] Output Files

Upgrade your Cloud Custodian to the latest version

