Set up and use sensitive data classifications

Cloud Security (InsightCloudSec) offers an integrated sensitive data discovery capability that gives you a unified approach to managing sensitive data risks across your environments. This capability combines Insights with the existing risk scoring and prioritization model that was introduced with Layered Context and is found throughout Cloud Security (InsightCloudSec). Currently, this capability supports sensitive data classification using resource tags and can also leverage findings from third-party tools like Amazon Macie.

Feature support

Cloud Security (InsightCloudSec) currently supports sensitive data classifications for the following resource types from Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure commercial cloud accounts:

Cloud Security (InsightCloudSec) resource type	AWS equivalent	Azure equivalent	GCP equivalent
Cloud Dataset	N/A	N/A	BigQuery
Database	N/A	SQL Database/Dedicated SQL Pool	Cloud SQL Database
Storage Container	S3 Bucket	Blob Storage Container	Cloud Storage
Storage Account	N/A	Storage Account	N/A

Prerequisites

Before Cloud Security (InsightCloudSec) can begin reporting on sensitive data classification, ensure you have the following:

Cloud Security (InsightCloudSec) Domain Admin permissions or the Sensitive Data Classification entitlement
At least one supported cloud account connected to Cloud Security (InsightCloudSec)

Set up sensitive data classification

Cloud Security (InsightCloudSec) consumes a combination of finding metadata from CSPs and resource tagging to surface data classifications and instances of sensitive data in your environments. After Cloud Security (InsightCloudSec) begins processing the tags and classifications, the following Query Filters and Insight will report on the classification status for supported resources:

Resource with Sensitive Data Classifications (Query Filter) - Checks if a resource has a sensitive data classification.
Resource without Sensitive Data Classifications (Query Filter) - Checks if a resource has sensitive data but no criteria (tagging).
Resource With Missing Data Classification (Insight) - Uses the Resource without Sensitive Data Classifications Query Filter to determine if a given resource has any data classification present. If no data classification is found, a finding is added.

If you use a data security service from a CSP, explore Cloud-based classification. Otherwise, review Manual classification for details on the tagging format.

Manual classification

While Cloud Security (InsightCloudSec) automatically parses sensitive data findings from CSPs, resource tags used for manual classification must follow a specific key-value pair format. For a full list of values, review Data Classification (Settings). Rapid7 recommends using continuous integration (CI) tools, Infrastructure-as-Code (IaC), and source control so that your deployment templates are tagged automatically and the tags are tracked over time. This is especially important if you do not use a CSP’s data security service (like Amazon Macie) or if you want to override the classification for a resource. If you have the appropriate Cloud Security (InsightCloudSec) and CSP permissions, you can add tags directly from the Resource Properties panel, or you can add them using a CSP’s console or API and let Cloud Security (InsightCloudSec) harvest them. You can also use the Bot Factory to quickly scope resources and tag them in the prescribed format.
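As one illustration of adding tags through a CSP's API, the following Python sketch builds a classification tag set for an Amazon S3 bucket and applies it with the boto3 `put_bucket_tagging` call. The helper names `build_tag_set` and `tag_bucket` are hypothetical, and the sketch assumes boto3 is installed and AWS credentials are already configured. It is a starting point under those assumptions, not a Rapid7-provided tool.

```python
from typing import Dict, List


def build_tag_set(classification: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"pii": "any"} style classifications into an S3 TagSet list."""
    return [
        {"Key": key, "Value": value}
        for key, value in sorted(classification.items())
    ]


def tag_bucket(bucket_name: str, classification: Dict[str, str]) -> None:
    """Apply the classification tags to an S3 bucket.

    put_bucket_tagging replaces the bucket's entire tag set, so merge any
    existing tags into `classification` first if the bucket already has tags.
    """
    import boto3  # imported lazily so build_tag_set works without the AWS SDK

    s3 = boto3.client("s3")
    s3.put_bucket_tagging(
        Bucket=bucket_name,
        Tagging={"TagSet": build_tag_set(classification)},
    )
```

For example, `tag_bucket("example-bucket", {"data_sensitivity": "high", "pii": "any"})` would tag a bucket named `example-bucket` (a placeholder) as containing high-sensitivity PII. The tags become visible in Cloud Security (InsightCloudSec) after they are harvested.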

Supported key-value pairs

The following table outlines the key-value pairs that are recognized as valid classifications. Review Data Classification (Settings) for more information on the full list of values. For assistance with auditing and tracking usage of these tags, we recommend you use the Tag Explorer.

⚠️

Manual sensitivity overrides

Manually setting the sensitivity will override Cloud-based sensitivity settings.

Key	Description	Values
`data_sensitivity`	Overall sensitivity for the resource. If `data_sensitivity` is not `false`, it needs to be paired with at least one category of found sensitive data (for example: `pii` or `phi`)	`false`, `high`, `medium`, `low`
`pii`	Data is classified as personally identifiable information (PII). If `pii` is not `any`, it has a default sensitivity so it does not need to be paired with `data_sensitivity`. If `pii` is `any`, it needs to be paired with a `data_sensitivity` severity (for example: `high`, `medium`, `low`)	`any` or anything PII-related, such as `ip_address`, `marital_status`, `national_identification_number`
`phi`	Data is classified as protected health information (PHI). If `phi` is not `any`, it has a default sensitivity so it does not need to be paired with `data_sensitivity`. If `phi` is `any`, it needs to be paired with a `data_sensitivity` severity (for example: `high`, `medium`, `low`)	`any` or anything PHI-related, such as `blood_type`, `fda_code`, `health_insurance_number`
`credential`	Data is classified as a credential. For example, an access token or password. If `credential` is not `any`, it has a default sensitivity so it does not need to be paired with `data_sensitivity`. If `credential` is `any`, it needs to be paired with a `data_sensitivity` severity (for example: `high`, `medium`, `low`)	`any` or anything credential-related, such as `json_web_token`, `password`, `openssh_private_key`
`financial`	Data is classified as financial. For example, a credit card or bank account number. If `financial` is not `any`, it has a default sensitivity so it does not need to be paired with `data_sensitivity`. If `financial` is `any`, it needs to be paired with a `data_sensitivity` severity (for example: `high`, `medium`, `low`)	`any` or anything financial-related, such as `bank_account_number`, `credit_card_expiration`, `iban_number`

ℹ️

Want to audit and track your sensitive data tags?

Use the Tag Explorer to audit and track usage of these tags.

Example classifications

Valid:

data_sensitivity: false
- Resource will be marked as not having sensitive data.
phi: blood_type
- Resource will be marked as having sensitive blood type data with a severity determined by Cloud Security (InsightCloudSec).
data_sensitivity: high, pii: any
- Resource will be marked as having sensitive PII data with a high severity.
data_sensitivity: medium, pii: any, phi: blood_type-fda_code
- Resource will be marked as having sensitive blood type and FDA code data with a severity determined by Cloud Security (InsightCloudSec). Multiple sub-types are properly delimited with a hyphen.

Invalid:

data_sensitivity: high
- Invalid because Cloud Security (InsightCloudSec) cannot infer the category or type of sensitive data. This classification will result in an Insight finding.
pii: any
- Invalid because Cloud Security (InsightCloudSec) cannot infer a severity from `any`. This classification will result in an Insight finding.
data_sensitivity: high, credentials: creit_card
- Invalid because Cloud Security (InsightCloudSec) cannot recognize the type (credit_card is misspelled). This classification will result in an Insight finding.
pii: ssn,marital_status
- Invalid because multiple sub-types of the same category should be hyphen delimited (for example: pii: ssn-marital_status).
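The pairing rules and examples above can be sketched as a small Python validator that you could run before applying tags. This is a minimal sketch, not the product's own logic: the function name `validate_classification` is hypothetical, and `SAMPLE_SUBTYPES` contains only the values mentioned on this page (including `ssn` from the example above), so it would need to be extended from Data Classification (Settings) before real use. The source does not say how `data_sensitivity: false` combined with a category is treated, so the sketch does not flag that case.

```python
from typing import Dict, List

SEVERITIES = {"high", "medium", "low"}

# Sample values taken from this page only; the full list lives in
# Data Classification (Settings). Extend these sets before real use.
SAMPLE_SUBTYPES = {
    "pii": {"ip_address", "marital_status", "national_identification_number", "ssn"},
    "phi": {"blood_type", "fda_code", "health_insurance_number"},
    "credential": {"json_web_token", "password", "openssh_private_key"},
    "financial": {"bank_account_number", "credit_card_expiration", "iban_number"},
}


def validate_classification(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the tags look valid."""
    problems: List[str] = []
    sensitivity = tags.get("data_sensitivity")
    # Only recognized category keys count; a key such as "credentials" is ignored.
    categories = {k: v for k, v in tags.items() if k in SAMPLE_SUBTYPES}

    if sensitivity is not None and sensitivity != "false" and sensitivity not in SEVERITIES:
        problems.append(f"unknown data_sensitivity value: {sensitivity!r}")

    # A severity on its own does not say which kind of data was found.
    if sensitivity in SEVERITIES and not categories:
        problems.append("data_sensitivity needs at least one category such as pii or phi")

    for key, value in categories.items():
        if value == "any":
            # "any" carries no default sensitivity, so a severity is required.
            if sensitivity not in SEVERITIES:
                problems.append(f"{key}: any needs a data_sensitivity severity")
            continue
        # Multiple sub-types of one category are hyphen delimited.
        for subtype in value.split("-"):
            if subtype not in SAMPLE_SUBTYPES[key]:
                problems.append(f"{key}: unrecognized type {subtype!r}")
    return problems
```

Because sub-types are split only on hyphens, a comma-delimited value such as `ssn,marital_status` is treated as a single unrecognized type and reported as a problem.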

Cloud-based classification

If you currently use a CSP’s data security service, Cloud Security (InsightCloudSec) can visualize and assess the risk of your classifications with no manual intervention. Explore the following sections for details on how Cloud Security (InsightCloudSec) reports on each CSP’s service and classifications.

AWS

AWS data sensitivity classifications require the Amazon Macie service. For more information on how Amazon Macie finds and determines data sensitivity, explore Amazon’s Macie documentation: https://docs.aws.amazon.com/macie/latest/user/data-classification.html . Cloud Security (InsightCloudSec) uses the following Query Filter and Insight to help you audit which accounts do not have Macie turned on:

Cloud Account Without Macie Enabled (AWS) (Query Filter) - Checks if the Macie service is turned on for a given cloud account in the selected regions.
Cloud Account without Macie Enabled (AWS) (Insight) - Uses the Cloud Account Without Macie Enabled (AWS) Query Filter to determine if Macie is turned on for a given cloud account (in any region). If Macie is not turned on for any region inside the cloud account, a finding is added.

If the Insight is adding findings, follow the Recommended Remediation Steps found in the Insight Details, which is accessed by opening the Insight from the Insights Library.

After ensuring Macie is turned on for all relevant accounts, Cloud Security (InsightCloudSec) will report on findings related to sensitive data and surface the classification category, type, sensitivity, and number of sensitive objects per resource. Currently, Cloud Security (InsightCloudSec) only supports automated sensitive data discovery findings from Macie. Explore the Amazon Macie documentation for more information on automated sensitive data discovery: https://docs.aws.amazon.com/macie/latest/user/discovery-asdd.html

ℹ️

Amazon Macie findings are also supported as a resource

Cloud Security (InsightCloudSec) supports harvesting Amazon Macie data findings as a resource type in the inventory.

Azure

Azure data sensitivity classifications require the Microsoft Defender for Cloud service (Defender CSPM or Defender for Storage plans) with Sensitive Data Discovery turned on. For more information on how Microsoft Defender for Cloud finds and determines data sensitivity, explore the Microsoft Defender for Cloud documentation: https://learn.microsoft.com/en-us/azure/defender-for-cloud/concept-data-security-posture#sensitive-data-discovery . Cloud Security (InsightCloudSec) uses the following Query Filter and Insights to help you audit which accounts do not have Sensitive Data Discovery turned on:

Cloud Account Sensitive Data Discovery Status (Query Filter) - Checks if the Defender CSPM and Defender for Storage plans have Sensitive Data Discovery turned on.
Cloud Account without Defender CSPM Sensitive Data Discovery Enabled (Insight) - Uses the Cloud Account Sensitive Data Discovery Status Query Filter to determine if the cloud account or tenant has Defender CSPM Sensitive Data Discovery turned on. If Defender CSPM Sensitive Data Discovery is not turned on inside the cloud account or organization, a finding is added.
Cloud Account without Defender for Storage Sensitive Data Discovery Enabled (Insight) - Uses the Cloud Account Sensitive Data Discovery Status Query Filter to determine if the cloud account or tenant has Defender for Storage Sensitive Data Discovery turned on. If Defender for Storage Sensitive Data Discovery is not turned on inside the cloud account or organization, a finding is added.

If the Insights are adding findings, follow the Recommended Remediation Steps found in the Insight Details, which is accessed by opening the Insight from the Insights Library.

After ensuring Sensitive Data Discovery is turned on for all relevant accounts, Cloud Security (InsightCloudSec) will report on findings related to sensitive data and surface the classification category, type, and sensitivity per resource.

GCP

GCP data sensitivity classifications require the Sensitive Data Protection (formerly known as Data Loss Prevention) service. For more information on how GCP Sensitive Data Protection finds and determines data sensitivity, explore GCP’s documentation: https://cloud.google.com/sensitive-data-protection/docs . Cloud Security (InsightCloudSec) uses the following Query Filter and Insight to help you audit which accounts do not have Sensitive Data Protection turned on:

Cloud Account Without Sensitive Data Protection Enabled (Query Filter) - Checks if the Sensitive Data Protection service is turned on for a given cloud account or organization.
Cloud Account Without Sensitive Data Protection in Use (Insight) - Uses the Cloud Account Without Sensitive Data Protection Enabled Query Filter to determine if the Sensitive Data Protection service is turned on for a given cloud account or organization. If Sensitive Data Protection is not turned on inside the cloud account or organization, a finding is added.

If the Insight is adding findings, follow the Recommended Remediation Steps found in the Insight Details, which is accessed by opening the Insight from the Insights Library.

After ensuring Sensitive Data Protection is turned on for all relevant accounts, Cloud Security (InsightCloudSec) will report on findings related to sensitive data and surface the classification category, type, and sensitivity per resource.

Interact with sensitive data classifications

Now that your sensitive data classifications are represented in Cloud Security (InsightCloudSec), you can begin to interact with the available reporting throughout Cloud Security (InsightCloudSec):

Infrastructure as Code (IaC)

The IaC feature in Cloud Security (InsightCloudSec) can be used to validate data classifications and prevent deployments for Terraform and CloudFormation templates with missing data classifications. To do this, you’ll need to:

Create a Custom Insight Pack featuring the Resource With Missing Data Classification Insight.
Update your deployment templates for the prescribed tagging format.
Create an IaC scan configuration using the Custom Insight Pack.
Integrate mimics (IaC scanning tool) with your CI/CD pipeline.
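To illustrate what a missing-classification check looks like for a Terraform template, the following Python sketch inspects the JSON form of a Terraform plan and lists resources that carry none of the classification tag keys. It is an independent pre-check and does not represent how mimics or the Insight works. The function names are hypothetical, and `CHECKED_TYPES` is an assumed scope limited to `aws_s3_bucket` that you would adjust to your own templates.

```python
import json
from typing import Any, Dict, Iterator, List

CLASSIFICATION_KEYS = {"data_sensitivity", "pii", "phi", "credential", "financial"}

# Assumed scope for this sketch; add the Terraform resource types you deploy.
CHECKED_TYPES = {"aws_s3_bucket"}


def iter_resources(module: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every resource in a module and, recursively, its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def unclassified_resources(plan: Dict[str, Any]) -> List[str]:
    """Return addresses of checked resources with no classification tag key."""
    root = plan.get("planned_values", {}).get("root_module", {})
    missing: List[str] = []
    for resource in iter_resources(root):
        if resource.get("type") not in CHECKED_TYPES:
            continue
        tags = (resource.get("values") or {}).get("tags") or {}
        if not CLASSIFICATION_KEYS & set(tags):
            missing.append(resource.get("address", "<unknown>"))
    return missing


def check_plan_file(path: str) -> List[str]:
    """Load a plan exported with `terraform show -json` and check it."""
    with open(path) as handle:
        return unclassified_resources(json.load(handle))
```

A CI step could export the plan with `terraform show -json plan.out > plan.json`, call `check_plan_file("plan.json")`, and fail the build if the returned list is not empty. The file names here are placeholders.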

Sample IaC workflow for data classification

To create a Custom Insight Pack:

Log in to Cloud Security (InsightCloudSec).
Go to Security > Insights > Custom Packs.
Click + Create Custom Pack.
Enter a Pack Name and Description.
Click OK.
Optionally, set up a subscription for the pack.
Go to Security > Insights > Library.
Search for the Resource With Missing Data Classification Insight.
Select the checkbox next to it and click Add to Custom Pack.
Search for the new Custom Pack you created and select it.
Click OK.

To create an IaC Scan Configuration:

Go to Security > Infrastructure as Code > Configurations.
Click + New Configuration.
Enter a Name.
Optionally, enter a Description.
On the Insight Settings tab, search for and select the Custom Pack you just created.
Optionally, on the Notifications tab, update the Slack and Email notification fields.
- This requires the Slack and Email integrations respectively.
Click Apply.

With a Custom Insight Pack and Scan Configuration configured, you can then begin automatically scanning templates for compliance. For more details on mimics and using IaC in Cloud Security (InsightCloudSec), explore the documentation. The following image shows an example of mimics scanning templates:

Layered Context and Risk

Layered Context is one of the primary reporting mechanisms for data sensitivity classifications and provides the quickest way to view the extent of the sensitivity for a given resource. Data sensitivity can also affect the risk score for a resource. If you open Layered Context, you’ll see the Sensitive Data column displaying one of a few possible statuses:

Sensitive - You or the related cloud service have determined the data for this resource is sensitive.
Not Sensitive - You have determined the data for this resource is not sensitive.
Not Classified - Cloud Security (InsightCloudSec) has not identified any data classification for this resource from any of the CSP services or classification through tagging.
N/A - The resource is not supported by data classification.

If a resource has been marked as sensitive, you can hover your cursor to display a pop-up summary of the data found on the resource. Click Sensitive to open the Resource Properties panel directly to the Sensitive Data tab.

To filter Layered Context based on sensitivity:

Log in to Cloud Security (InsightCloudSec).
Go to Security > Layered Context.
Click Add Filter.
Click Clear All Filters.
Click Add Filter.
Configure a filter: Data Classification is Sensitive.
Click Apply.

Resource Properties

Supported resources have access to a Sensitive Data tab in the Resource Properties panel. There are two sub-tabs on the Sensitive Data tab:

Data Overview - An overview of the classification type, count, sensitivity, and source of data found on the resource. The classifications are processed and presented by Cloud Security (InsightCloudSec), but the data can come from a CSP or manual classification (the source will be Rapid7 in this case).
Data Findings - A list of the data findings from CSP services associated with the resource. This tab currently only supports displaying Amazon Macie findings.

The Resource Properties panel can be accessed throughout Cloud Security (InsightCloudSec), including from the following locations:

Inventory > Resources
Security > Layered Context
Security > Attack Paths

If a resource is deemed sensitive, Cloud Security (InsightCloudSec) exposes the sensitivity summary information and automatically generates an overall sensitivity (called Resource data sensitivity) for the resource that matches the highest severity sensitivity found. From this tab, you can also view the Insight findings related to data sensitivity for the selected resource. For additional properties details and actions, explore Resources.
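The idea of an overall sensitivity that matches the highest severity found can be sketched in a few lines of Python. The function name and the numeric ranking are assumptions made for illustration. The product computes this value for you, so the sketch only shows the rule.

```python
from typing import Iterable, Optional

# Assumed ordering from lowest to highest severity.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def resource_data_sensitivity(severities: Iterable[str]) -> Optional[str]:
    """Return the highest severity among the findings, or None if there are none."""
    known = [severity for severity in severities if severity in SEVERITY_RANK]
    if not known:
        return None
    return max(known, key=SEVERITY_RANK.__getitem__)
```

For a resource with one `low` PII finding and one `high` credential finding, this rule yields `high`.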

ℹ️

Want to override the sensitivity?

You can override the sensitivity rating using manual classification.

Tag Explorer

As recommended in the Manual Classification section, you may want to leverage the Tag Explorer to audit data classification tag usage throughout your environments. Review the Tag Explorer documentation for details on creating a tag configuration. The following are some example tag configurations:

Audit data sensitivity tag usage in general:
- Tag keys: data_sensitivity
- Options: Missing all tags
- Resource types: Storage Container, Dataset, Database, Storage Account
Audit financial databases:
- Tag keys: data_sensitivity, financial
- Options: Contains all tags
- Resource types: Database
Audit storage buckets for PII:
- Tag keys: data_sensitivity, pii
- Options: Contains all tags
- Resource types: Storage Container
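The example configurations above can be reasoned about with a short Python sketch that applies the same key, option, and resource type selections to a list of resources. This assumes that "Missing all tags" matches resources that have none of the listed keys and that "Contains all tags" matches resources that have every listed key. The inventory, resource names, and function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

Resource = Dict[str, object]


def missing_all_tags(tags: Dict[str, str], keys: Iterable[str]) -> bool:
    """True when the resource has none of the given tag keys (assumed semantics)."""
    return not any(key in tags for key in keys)


def contains_all_tags(tags: Dict[str, str], keys: Iterable[str]) -> bool:
    """True when the resource has every one of the given tag keys (assumed semantics)."""
    return all(key in tags for key in keys)


def audit(
    inventory: List[Resource],
    keys: List[str],
    option: Callable[[Dict[str, str], Iterable[str]], bool],
    resource_types: Set[str],
) -> List[str]:
    """Return the names of resources of the given types that match the option."""
    return [
        str(resource["name"])
        for resource in inventory
        if resource["type"] in resource_types and option(resource["tags"], keys)
    ]


# Hypothetical inventory used only to exercise the helpers.
inventory: List[Resource] = [
    {"name": "orders-db", "type": "Database",
     "tags": {"data_sensitivity": "high", "financial": "any"}},
    {"name": "logs-bucket", "type": "Storage Container",
     "tags": {"team": "platform"}},
    {"name": "crm-bucket", "type": "Storage Container",
     "tags": {"data_sensitivity": "medium", "pii": "any"}},
]
```

With this inventory, auditing `data_sensitivity` with `missing_all_tags` surfaces the untagged `logs-bucket`, while auditing `data_sensitivity` and `financial` with `contains_all_tags` on the Database type surfaces `orders-db`.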

Attack Paths

Resources with sensitive data that are on Attack Paths are especially problematic and should be mitigated as soon as possible. The following data sensitivity-related Attack Paths are available:

Attack Path Name	Supported CSPs	Description
Publicly Exposed Compute Instance with access to Cloud Trail Data	AWS	An attacker who gains access to the instance can access and manipulate/steal sensitive information, gain access to other cloud resources or disrupt business operations. It can also be used to further pivot within the customer’s cloud footprint due to exposing data about additional cloud accounts and resources.
Publicly Exposed Compute Instance with access to a Bucket Containing PII Data	AWS	When a compute instance has access to PII data stored in an S3 bucket, it can read and potentially manipulate this data thereby posing significant security risks.

For more information on using Attack Paths, explore the documentation.

Cloud Summary (Risk Overview)

For a high-level overview of your environment’s risk, including publicly available sensitive data, regularly review Cloud Summary > Risk Overview. For details on using the Cloud Summary, explore the documentation.

Data Classification (Settings)

For detailed information on which external sensitive data classifications Cloud Security (InsightCloudSec) processes, go to Settings > Data Classification. This page assists with exploring the types of data and clouds that Cloud Security (InsightCloudSec) supports.