Set up and use sensitive data classifications

InsightCloudSec offers an integrated sensitive data discovery capability that provides a unified approach to managing sensitive data risks across your environments. This capability combines Insights with the risk scoring and prioritization model that was introduced with Layered Context and is used throughout InsightCloudSec. It supports sensitive data classification using resource tags and can also leverage findings from third-party tools like Amazon Macie.

Feature support

InsightCloudSec currently supports sensitive data classifications for the following resource types from Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure commercial cloud accounts:

InsightCloudSec resource type | AWS equivalent | Azure equivalent | GCP equivalent
Cloud Dataset | N/A | N/A | BigQuery
Database | N/A | SQL Database / Dedicated SQL Pool | Cloud SQL Database
Storage Container | S3 Bucket | Blob Storage Container | Cloud Storage
Storage Account | N/A | Storage Account | N/A
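
If you script tagging or reporting against supported resources, you can keep the table above as a small lookup. The following Python sketch is hypothetical helper code, not an InsightCloudSec API. The native type names are the labels used in the table, not official CSP resource identifiers.

```python
from typing import Optional

# Hypothetical lookup built from the Feature support table:
# (cloud, native resource label) -> InsightCloudSec resource type.
SUPPORTED_TYPES = {
    ("gcp", "BigQuery"): "Cloud Dataset",
    ("azure", "SQL Database"): "Database",
    ("azure", "Dedicated SQL Pool"): "Database",
    ("gcp", "Cloud SQL Database"): "Database",
    ("aws", "S3 Bucket"): "Storage Container",
    ("azure", "Blob Storage Container"): "Storage Container",
    ("gcp", "Cloud Storage"): "Storage Container",
    ("azure", "Storage Account"): "Storage Account",
}


def insightcloudsec_type(cloud: str, native_type: str) -> Optional[str]:
    """Return the InsightCloudSec resource type, or None if classification is not supported."""
    return SUPPORTED_TYPES.get((cloud.lower(), native_type))
```

Any resource that returns None falls outside sensitive data classification.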

Prerequisites

Before InsightCloudSec can begin reporting on sensitive data classification, ensure you have the following:

  • InsightCloudSec Domain Admin permissions or the Sensitive Data Classification entitlement
  • At least one supported cloud account connected to InsightCloudSec

Set up sensitive data classification

InsightCloudSec consumes a combination of finding metadata from cloud service providers (CSPs) and resource tagging to surface data classifications and instances of sensitive data in your environments. After InsightCloudSec begins processing the tags and classifications, the following Query Filters and Insight report on the classification status for supported resources:

  • Resource with Sensitive Data Classifications (Query Filter) - Checks if a resource has a sensitive data classification.
  • Resource without Sensitive Data Classifications (Query Filter) - Checks if a resource has sensitive data but no criteria (tagging).
  • Resource With Missing Data Classification (Insight) - Uses the Resource without Sensitive Data Classifications Query Filter to determine if a given resource has any data classification present. If no data classification is found, a finding is added.

If you use a data security service from a CSP, explore Cloud-based classification. Otherwise, review Manual classification for details on the tagging format.

Manual classification

While InsightCloudSec automatically parses sensitive data findings from CSPs, resource tags for manual classification must follow a specific key-value pair format. For a full list of values, review Data Classification (Settings).

Rapid7 recommends using continuous integration (CI) tools, Infrastructure as Code (IaC), and source control to ensure your deployment templates are tagged appropriately and tracked over time. This is especially important if you do not use a CSP's data security service (like Amazon Macie) or want to override the classification for a resource. If you have the appropriate InsightCloudSec and CSP permissions, you can add tags directly from the Resource Properties panel, or you can add them using a CSP's console or API and InsightCloudSec will harvest them. You can also use the BotFactory to quickly scope resources and apply tags in the prescribed format.
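
As one example of adding tags through a CSP API, the following Python sketch uses boto3 to add classification tags to an S3 bucket. It assumes boto3 is installed and that your credentials allow s3:GetBucketTagging and s3:PutBucketTagging. Because put_bucket_tagging replaces a bucket's entire tag set, the sketch merges the new keys into the existing tags first.

```python
from typing import Dict, List


def merge_tag_set(existing: List[Dict[str, str]], classification: Dict[str, str]) -> List[Dict[str, str]]:
    """Merge classification tags into an S3 TagSet; classification values win on key conflicts."""
    merged = {tag["Key"]: tag["Value"] for tag in existing}
    merged.update(classification)
    return [{"Key": key, "Value": value} for key, value in sorted(merged.items())]


def tag_bucket(bucket: str, classification: Dict[str, str]) -> None:
    """Add classification tags to an S3 bucket while keeping its other tags."""
    import boto3  # imported lazily so merge_tag_set works without boto3 installed
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        existing = s3.get_bucket_tagging(Bucket=bucket)["TagSet"]
    except ClientError as err:
        # A bucket without any tags returns the NoSuchTagSet error code.
        if err.response["Error"]["Code"] != "NoSuchTagSet":
            raise
        existing = []
    # put_bucket_tagging replaces the whole tag set, so send the merged list.
    s3.put_bucket_tagging(Bucket=bucket, Tagging={"TagSet": merge_tag_set(existing, classification)})
```

For example, tag_bucket("example-finance-bucket", {"data_sensitivity": "high", "financial": "any"}) would mark a hypothetical bucket as holding high-sensitivity financial data. In IaC-managed environments, set the same tags in the template instead so they are not overwritten on the next deployment.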

Supported key-value pairs

The following table outlines the key-value pairs that are recognized as valid classifications. Review Data Classification (Settings) for more information on the full list of values. For assistance with auditing and tracking usage of these tags, we recommend you use the Tag Explorer.

Manual sensitivity overrides

Manually setting the sensitivity will override Cloud-based sensitivity settings.

Key | Description | Values
data_sensitivity | Overall sensitivity for the resource. If data_sensitivity is not false, it must be paired with at least one category of sensitive data (for example, pii or phi). | false, high, medium, low
pii | Data is classified as personally identifiable information (PII). If pii is set to a specific type rather than any, that type has a default sensitivity, so it does not need to be paired with data_sensitivity. If pii is any, it must be paired with a data_sensitivity severity (high, medium, or low). | any, or a PII type such as ip_address, marital_status, or national_identification_number
phi | Data is classified as protected health information (PHI). If phi is set to a specific type rather than any, that type has a default sensitivity, so it does not need to be paired with data_sensitivity. If phi is any, it must be paired with a data_sensitivity severity (high, medium, or low). | any, or a PHI type such as blood_type, fda_code, or health_insurance_number
credential | Data is classified as a credential, such as an access token or password. If credential is set to a specific type rather than any, that type has a default sensitivity, so it does not need to be paired with data_sensitivity. If credential is any, it must be paired with a data_sensitivity severity (high, medium, or low). | any, or a credential type such as json_web_token, password, or openssh_private_key
financial | Data is classified as financial, such as a credit card or bank account number. If financial is set to a specific type rather than any, that type has a default sensitivity, so it does not need to be paired with data_sensitivity. If financial is any, it must be paired with a data_sensitivity severity (high, medium, or low). | any, or a financial type such as bank_account_number, credit_card_expiration, or iban_number


Example classifications

Valid:

  • data_sensitivity: false
    • Resource will be marked as not having sensitive data.
  • phi: blood_type
    • Resource will be marked as having sensitive blood type data with a severity determined by InsightCloudSec.
  • data_sensitivity: high, pii: any
    • Resource will be marked as having sensitive PII data with a high severity.
  • data_sensitivity: medium, pii: any, phi: blood_type-fda_code
    • Resource will be marked as having sensitive PII data with a medium severity, plus sensitive blood type and FDA code data with a severity determined by InsightCloudSec. The multiple PHI sub-types are correctly delimited with a hyphen.

Invalid:

  • data_sensitivity: high
    • Invalid because InsightCloudSec cannot infer the category or type of sensitive data. This classification will result in an Insight finding.
  • pii: any
    • Invalid because InsightCloudSec cannot infer a severity from any. This classification will result in an Insight finding.
  • data_sensitivity: high, financial: creit_card
    • Invalid because InsightCloudSec cannot recognize the type (credit_card is misspelled). This classification will result in an Insight finding.
  • pii: ssn,marital_status
    • Invalid because multiple sub-types of the same category should be hyphen delimited (for example: pii: ssn-marital_status).
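
To catch formatting mistakes before tags reach InsightCloudSec (for example, in a CI check), you can encode the rules from the key-value table in a small checker. The following Python sketch is hypothetical and is not how InsightCloudSec validates tags. Its sub-type allowlist contains only the example values from the table, so replace it with the full list from Settings > Data Classification. Tag keys other than data_sensitivity and the four category keys are ignored, because resources usually carry unrelated tags too.

```python
from typing import Dict, List

SEVERITIES = {"high", "medium", "low"}

# Illustrative subset only: the example values from the key-value table.
# The authoritative list lives in Settings > Data Classification.
KNOWN_SUBTYPES = {
    "pii": {"ip_address", "marital_status", "national_identification_number"},
    "phi": {"blood_type", "fda_code", "health_insurance_number"},
    "credential": {"json_web_token", "password", "openssh_private_key"},
    "financial": {"bank_account_number", "credit_card_expiration", "iban_number"},
}


def validate_classification(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a resource's tags; an empty list means valid."""
    problems = []
    sensitivity = tags.get("data_sensitivity")
    if sensitivity is not None and sensitivity not in SEVERITIES | {"false"}:
        problems.append(f"data_sensitivity: unknown value {sensitivity}")

    # Only the category keys matter here; unrelated tags (owner, env, ...) are ignored.
    categories = {key: value for key, value in tags.items() if key in KNOWN_SUBTYPES}
    if sensitivity in SEVERITIES and not categories:
        problems.append("data_sensitivity: needs at least one category such as pii or phi")

    for category, value in categories.items():
        if "," in value:
            problems.append(f"{category}: delimit multiple sub-types with '-', not ','")
        elif value == "any":
            # "any" carries no default sensitivity, so a severity must be supplied.
            if sensitivity not in SEVERITIES:
                problems.append(f"{category}: any must be paired with data_sensitivity high, medium, or low")
        else:
            for subtype in value.split("-"):
                if subtype not in KNOWN_SUBTYPES[category]:
                    problems.append(f"{category}: unrecognized sub-type {subtype}")
    return problems
```

For example, validate_classification({"data_sensitivity": "high"}) reports a missing category, while validate_classification({"data_sensitivity": "high", "pii": "any"}) returns an empty list.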

Cloud-based classification

If you currently use a CSP's data security service, InsightCloudSec can visualize and assess the risk of your classifications with no manual intervention. Explore the following sections for details on how InsightCloudSec reports on each CSP's service and classifications.

AWS

AWS data sensitivity classifications require the Amazon Macie service. For more information on how Amazon Macie finds and determines data sensitivity, explore Amazon's Macie documentation: https://docs.aws.amazon.com/macie/latest/user/data-classification.html. InsightCloudSec uses the following Query Filter and Insight to help you audit which accounts do not have Macie turned on:

  • Cloud Account Without Macie Enabled (AWS) (Query Filter) - Checks if the Macie service is turned on for a given cloud account in the selected regions.
  • Cloud Account without Macie Enabled (AWS) (Insight) - Uses the Cloud Account Without Macie Enabled (AWS) Query Filter to determine if Macie is turned on for a given cloud account in any region. If Macie is not turned on in any region of the cloud account, a finding is added.

If the Insight is adding findings, follow the Recommended Remediation Steps in the Insight Details, which you can view by opening the Insight from the Insights Library.
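
To spot-check Macie coverage outside InsightCloudSec, you can query each region yourself. The following Python sketch approximates the check and is not the Query Filter's implementation. It assumes boto3 and permission to call the macie2 GetMacieSession operation, and it treats both an API error and a PAUSED status as "not enabled." account_needs_finding follows one reading of the Insight description: an account is flagged only when Macie is not turned on in any region.

```python
from typing import Dict, Iterable


def macie_status_by_region(regions: Iterable[str]) -> Dict[str, bool]:
    """Return {region: True if GetMacieSession reports status ENABLED}."""
    import boto3  # imported lazily so account_needs_finding works without boto3
    from botocore.exceptions import ClientError

    session = boto3.Session()
    status: Dict[str, bool] = {}
    for region in regions:
        client = session.client("macie2", region_name=region)
        try:
            status[region] = client.get_macie_session()["status"] == "ENABLED"
        except ClientError:
            # Assumption: any API error (for example, Macie never enabled in this region)
            # counts as "not enabled". In real use, inspect the error code so that
            # permission problems are not mistaken for a disabled service.
            status[region] = False
    return status


def account_needs_finding(status_by_region: Dict[str, bool]) -> bool:
    """Flag the account only when Macie is not enabled in any checked region."""
    return not any(status_by_region.values())
```

For example, account_needs_finding(macie_status_by_region(["us-east-1", "eu-west-1"])) returns True only if neither region reports Macie as enabled.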

After ensuring Macie is turned on for all relevant accounts, InsightCloudSec will report on findings related to sensitive data and surface the classification category, type, sensitivity, and number of sensitive objects per resource. Currently, InsightCloudSec only supports automated sensitive data discovery findings from Macie. Explore the Amazon Macie documentation for more information on automated sensitive data discovery: https://docs.aws.amazon.com/macie/latest/user/discovery-asdd.html
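
To see how Macie findings roll up into the per-resource category, type, sensitivity, and object count described above, the following Python sketch summarizes raw Macie finding JSON by bucket. It is an illustration, not InsightCloudSec's processing logic. The field names (classificationDetails.originType, resourcesAffected, severity.description, sensitiveData detections) reflect our understanding of the Macie finding schema, so verify them against the Macie documentation before relying on the sketch.

```python
from collections import Counter
from typing import Dict, Iterable

SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3}


def summarize_macie_findings(findings: Iterable[dict]) -> Dict[str, dict]:
    """Group Macie sensitive data findings (finding JSON dicts) by S3 bucket."""
    buckets: Dict[str, dict] = {}
    for finding in findings:
        details = finding.get("classificationDetails", {})
        # Keep only automated sensitive data discovery results, the type supported above.
        if details.get("originType") != "AUTOMATED_SENSITIVE_DATA_DISCOVERY":
            continue
        affected = finding["resourcesAffected"]
        entry = buckets.setdefault(
            affected["s3Bucket"]["name"],
            {"object_keys": set(), "max_severity": "Low", "detections": Counter()},
        )
        # Count distinct object keys so repeated findings for one object are not double-counted.
        entry["object_keys"].add(affected["s3Object"]["key"])
        severity = finding["severity"]["description"]
        if SEVERITY_RANK[severity] > SEVERITY_RANK[entry["max_severity"]]:
            entry["max_severity"] = severity
        for sensitive in details.get("result", {}).get("sensitiveData", []):
            for detection in sensitive.get("detections", []):
                entry["detections"][(sensitive["category"], detection["type"])] += detection["count"]
    return {
        bucket: {
            "sensitive_objects": len(entry["object_keys"]),
            "max_severity": entry["max_severity"],
            "detections": dict(entry["detections"]),
        }
        for bucket, entry in buckets.items()
    }
```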

AWS Macie findings also supported as a resource

InsightCloudSec supports harvesting AWS Macie data findings as a resource type in the inventory.

Azure

Azure data sensitivity classifications require the Microsoft Defender for Cloud service (Defender CSPM or Defender for Storage plans) with Sensitive Data Discovery turned on. For more information on how Microsoft Defender for Cloud finds and determines data sensitivity, explore the Microsoft Defender for Cloud documentation: https://learn.microsoft.com/en-us/azure/defender-for-cloud/concept-data-security-posture#sensitive-data-discovery. InsightCloudSec uses the following Query Filter and Insights to help you audit which accounts do not have Sensitive Data Discovery turned on:

  • Cloud Account Sensitive Data Discovery Status (Query Filter) - Checks if the Defender CSPM and Defender for Storage plans have Sensitive Data Discovery turned on.
  • Cloud Account without Defender CSPM Sensitive Data Discovery Enabled (Insight) - Uses the Cloud Account Sensitive Data Discovery Status Query Filter to determine if the cloud account or tenant has Defender CSPM Sensitive Data Discovery turned on. If Defender CSPM Sensitive Data Discovery is not turned on in the cloud account or tenant, a finding is added.
  • Cloud Account without Defender for Storage Sensitive Data Discovery Enabled (Insight) - Uses the Cloud Account Sensitive Data Discovery Status Query Filter to determine if the cloud account or tenant has Defender for Storage Sensitive Data Discovery turned on. If Defender for Storage Sensitive Data Discovery is not turned on in the cloud account or tenant, a finding is added.

If the Insights are adding findings, follow the Recommended Remediation Steps in the Insight Details, which you can view by opening each Insight from the Insights Library.
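
To check a subscription yourself, you can read the Defender plan settings from the Microsoft.Security pricings REST API (GET .../subscriptions/{subscriptionId}/providers/Microsoft.Security/pricings). The following Python sketch only parses that response and is not how the Query Filter works. The plan names CloudPosture (Defender CSPM) and StorageAccounts (Defender for Storage), the SensitiveDataDiscovery extension name, and the isEnabled string values are our understanding of the API and should be verified against Microsoft's reference.

```python
from typing import Dict, List

# Assumed pricing (plan) names; verify against the pricings REST reference.
PLANS = {"CloudPosture": "Defender CSPM", "StorageAccounts": "Defender for Storage"}


def discovery_enabled(pricing: dict) -> bool:
    """True if a Microsoft.Security/pricings resource has SensitiveDataDiscovery turned on."""
    properties = pricing.get("properties", {})
    if properties.get("pricingTier") != "Standard":
        return False  # Free tier: the plan and its extensions are off
    for extension in properties.get("extensions") or []:
        if extension.get("name") == "SensitiveDataDiscovery":
            # isEnabled is expected as the string "True"/"False"; str() also tolerates booleans.
            return str(extension.get("isEnabled")).lower() == "true"
    return False


def plans_missing_discovery(pricings: List[dict]) -> List[str]:
    """List the plans that would produce one of the two Insight findings above."""
    by_name: Dict[str, dict] = {p.get("name", ""): p for p in pricings}
    return [label for name, label in PLANS.items() if not discovery_enabled(by_name.get(name, {}))]
```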

After ensuring Sensitive Data Discovery is turned on for all relevant accounts, InsightCloudSec will report on findings related to sensitive data and surface the classification category, type, and sensitivity per resource.

GCP

GCP data sensitivity classifications require the Sensitive Data Protection (formerly known as Data Loss Prevention) service. For more information on how GCP Sensitive Data Protection finds and determines data sensitivity, explore GCP's documentation: https://cloud.google.com/sensitive-data-protection/docs. InsightCloudSec uses the following Query Filter and Insight to help you audit which accounts do not have Sensitive Data Protection turned on:

  • Cloud Account Without Sensitive Data Protection Enabled (Query Filter) - Checks if the Sensitive Data Protection service is turned on for a given cloud account or organization.
  • Cloud Account Without Sensitive Data Protection in Use (Insight) - Uses the Cloud Account Without Sensitive Data Protection Enabled Query Filter to determine if the Sensitive Data Protection service is turned on for a given cloud account or organization. If Sensitive Data Protection is not turned on inside the cloud account or organization, a finding is added.

If the Insight is adding findings, follow the Recommended Remediation Steps in the Insight Details, which you can view by opening the Insight from the Insights Library.

After ensuring Sensitive Data Protection is turned on for all relevant accounts, InsightCloudSec will report on findings related to sensitive data and surface the classification category, type, and sensitivity per resource.
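
Sensitive Data Protection profiles individual BigQuery tables, while the supported InsightCloudSec resource is the dataset (Cloud Dataset). The following Python sketch shows one way to roll table data profiles up to a dataset-level sensitivity. It is an illustration, not InsightCloudSec's logic. The profile fields (datasetProjectId, datasetId, sensitivityScore.score) and the mapping of SENSITIVITY_MODERATE to medium are assumptions.

```python
from typing import Dict, Iterable

# Assumed mapping from Sensitive Data Protection sensitivity levels to the
# low/medium/high scale used in this guide; unknown/unspecified levels are skipped.
SCORE_TO_SENSITIVITY = {
    "SENSITIVITY_LOW": "low",
    "SENSITIVITY_MODERATE": "medium",
    "SENSITIVITY_HIGH": "high",
}
RANK = {"low": 1, "medium": 2, "high": 3}


def dataset_sensitivity(table_profiles: Iterable[dict]) -> Dict[str, str]:
    """Roll BigQuery table data profiles up to the highest sensitivity per dataset."""
    result: Dict[str, str] = {}
    for profile in table_profiles:
        level = SCORE_TO_SENSITIVITY.get(profile.get("sensitivityScore", {}).get("score"))
        if level is None:
            continue
        dataset = f'{profile["datasetProjectId"]}.{profile["datasetId"]}'
        if dataset not in result or RANK[level] > RANK[result[dataset]]:
            result[dataset] = level
    return result
```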

Interact with sensitive data classifications

Now that your sensitive data classifications are represented in InsightCloudSec, you can begin to interact with the available reporting throughout InsightCloudSec:

Infrastructure as Code (IaC)

The IaC feature in InsightCloudSec can be used to validate data classifications and prevent deployments for Terraform and CloudFormation templates with missing data classifications. To do this, you'll need to:

  1. Create a Custom Insight Pack featuring the Resource With Missing Data Classification Insight.
  2. Update your deployment templates for the prescribed tagging format.
  3. Create an IaC scan configuration using the Custom Insight Pack.
  4. Integrate mimics (the InsightCloudSec IaC scanning tool) with your CI/CD pipeline.
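
Before a template reaches mimics, you may also want a quick local check in a pre-commit hook. The following Python sketch is hypothetical and does not replace mimics or the Resource With Missing Data Classification Insight. It scans a CloudFormation template (already loaded into a dict) for S3 buckets whose Tags contain none of the classification keys. It checks only that a key is present, not that the values are valid.

```python
from typing import List

# Tag keys that count as a data classification (from the key-value table).
CLASSIFICATION_KEYS = {"data_sensitivity", "pii", "phi", "credential", "financial"}
# Hypothetical scope: only S3 buckets are checked in this sketch.
CHECKED_TYPES = {"AWS::S3::Bucket"}


def unclassified_resources(template: dict) -> List[str]:
    """Return logical IDs of checked resources that carry no classification tag key."""
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in CHECKED_TYPES:
            continue
        tags = resource.get("Properties", {}).get("Tags", [])
        keys = {tag.get("Key") for tag in tags if isinstance(tag, dict)}
        if not keys & CLASSIFICATION_KEYS:
            missing.append(logical_id)
    return missing
```

For a JSON template, you could load it with json.load and fail the build when the returned list is not empty.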

Sample IaC workflow for data classification

To create a Custom Insight Pack:

  1. Log in to InsightCloudSec.
  2. Go to Security > Insights > Custom Packs.
  3. Click + Create Custom Pack.
  4. Enter a Pack Name and Description.
  5. Click OK.
  6. Optionally, set up a subscription for the pack.
  7. Go to Security > Insights > Library.
  8. Search for the Resource With Missing Data Classification Insight.
  9. Select the checkbox next to it and click Add to Custom Pack.
  10. Search for the new Custom Pack you created and select it.
  11. Click OK.

To create an IaC Scan Configuration:

  1. Go to Security > Infrastructure as Code > Configurations.
  2. Click + New Configuration.
  3. Enter a Name.
  4. Optionally, enter a Description.
  5. On the Insight Settings tab, search for and select the Custom Pack you just created.
  6. Optionally, on the Notifications tab, update the Slack and Email notification fields.
    • This requires the Slack and Email integrations respectively.
  7. Click Apply.

With a Custom Insight Pack and Scan Configuration in place, you can begin automatically scanning templates for compliance. For more details on mimics and using IaC in InsightCloudSec, explore the documentation. The following image shows an example of mimics scanning templates:

Mimics scan example

Layered Context and Risk

Layered Context is one of the primary reporting mechanisms for data sensitivity classifications and provides the quickest way to view the extent of sensitive data on a given resource. Data sensitivity can also affect a resource's risk score. In Layered Context, the Sensitive Data column displays one of the following statuses:

  • Sensitive - You or the related cloud service have determined the data for this resource is sensitive.
  • Not Sensitive - You have determined the data for this resource is not sensitive.
  • Not Classified - InsightCloudSec has not identified any data classification for this resource from any of the CSP services or classification through tagging.
  • N/A - The resource type is not supported by data classification.
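
The following Python sketch approximates how the four statuses relate to tags and CSP results. It is a simplified model written for illustration, not InsightCloudSec's implementation. It assumes the tags have already been validated and reduces CSP results to a single csp_found_sensitive flag (a hypothetical parameter).

```python
from typing import Dict

SUPPORTED = {"Cloud Dataset", "Database", "Storage Container", "Storage Account"}
CATEGORY_KEYS = ("pii", "phi", "credential", "financial")


def sensitive_data_status(resource_type: str, tags: Dict[str, str], csp_found_sensitive: bool = False) -> str:
    """Approximate the Sensitive Data column for one resource (assumes tags are already valid)."""
    if resource_type not in SUPPORTED:
        return "N/A"
    if tags.get("data_sensitivity") == "false":
        return "Not Sensitive"  # a manual classification overrides cloud-based results
    if csp_found_sensitive or any(key in tags for key in CATEGORY_KEYS):
        return "Sensitive"
    return "Not Classified"
```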

If a resource is marked as Sensitive, hover over the status to display a pop-up summary of the data found on the resource. Click Sensitive to open the Resource Properties panel directly to the Sensitive Data tab.

To filter Layered Context based on sensitivity:

  1. Log in to InsightCloudSec.
  2. Go to Security > Layered Context.
  3. Click Add Filter.
  4. Click Clear All Filters.
  5. Click Add Filter.
  6. Configure a filter: Data Classification is Sensitive.
  7. Click Apply.

Resource Properties

Supported resources have access to a Sensitive Data tab in the Resource Properties panel. There are two sub-tabs on the Sensitive Data tab:

  • Data Overview - An overview of the classification type, count, sensitivity, and source of data found on the resource. InsightCloudSec processes and presents the classifications, but the data can come from a CSP or from manual classification (for manual classification, the source is shown as Rapid7).
  • Data Findings - A list of the data findings from CSP services associated with the resource. This tab currently only supports displaying Amazon Macie findings.

The Resource Properties panel can be accessed throughout InsightCloudSec, including from the following locations:

  • Inventory > Resources
  • Security > Layered Context
  • Security > Attack Paths

If a resource is deemed sensitive, InsightCloudSec displays the sensitivity summary and automatically generates an overall sensitivity for the resource (called Resource data sensitivity) that matches the highest sensitivity found. From this tab, you can also view the Insight findings related to data sensitivity for the selected resource. For additional property details and actions, explore Resources.
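
The "highest sensitivity found" rule can be sketched as a max over (source, sensitivity) pairs. The following Python example is an illustration only. It assumes that any manual classification (source Rapid7) replaces CSP-derived values entirely, which is our reading of the manual override described in Manual classification. Source labels other than Rapid7, such as "Amazon Macie" in the examples, are hypothetical.

```python
from typing import Iterable, Optional, Tuple

RANK = {"low": 1, "medium": 2, "high": 3}
MANUAL_SOURCE = "Rapid7"  # source label shown for manual classifications


def resource_data_sensitivity(classifications: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the highest sensitivity among (source, sensitivity) pairs.

    Assumption: if any manual (Rapid7) classification exists, it replaces the
    CSP-derived ones entirely.
    """
    pairs = list(classifications)
    manual = [level for source, level in pairs if source == MANUAL_SOURCE]
    candidates = manual or [level for _, level in pairs]
    if not candidates:
        return None
    return max(candidates, key=RANK.__getitem__)
```

For example, resource_data_sensitivity([("Amazon Macie", "medium"), ("Amazon Macie", "high")]) returns "high".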

Want to override the sensitivity?

You can override the sensitivity rating using manual classification.

Tag Explorer

As recommended in the Manual Classification section, you may want to leverage the Tag Explorer to audit data classification tag usage throughout your environments. Review the Tag Explorer documentation for details on creating a tag configuration. The following are some example tag configurations:

  • Audit data sensitivity tag usage in general:
    • Tag keys: data_sensitivity
    • Options: Missing all tags
    • Resource types: Storage Container, Dataset, Database, Storage Account
  • Audit financial databases:
    • Tag keys: data_sensitivity, financial
    • Options: Contains all tags
    • Resource types: Database
  • Audit storage buckets for PII:
    • Tag keys: data_sensitivity, pii
    • Options: Contains all tags
    • Resource types: Storage Container
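
To reason about what each example configuration returns, the following Python sketch models the two options as we read them: "Contains all tags" matches resources that have every listed key, and "Missing all tags" matches resources that have none of them. This is an approximation for illustration, not the Tag Explorer implementation. The resource dictionaries are hypothetical.

```python
from typing import Dict, Iterable, List


def matches_option(tags: Dict[str, str], keys: Iterable[str], option: str) -> bool:
    """Approximate the two Tag Explorer options used in the examples above."""
    wanted = set(keys)
    present = wanted & set(tags)
    if option == "Contains all tags":
        return present == wanted
    if option == "Missing all tags":
        return not present
    raise ValueError(f"unsupported option: {option}")


def audit(resources: List[dict], keys: Iterable[str], option: str, resource_types: Iterable[str]) -> List[str]:
    """Return names of resources of the given types that match the tag option."""
    keys = list(keys)
    types = set(resource_types)
    return [r["name"] for r in resources if r["type"] in types and matches_option(r["tags"], keys, option)]
```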

Attack Paths

Resources with sensitive data that are on Attack Paths are especially problematic and should be mitigated as soon as possible. The following data sensitivity-related Attack Paths are available:

Attack Path name | Supported CSPs | Description
Publicly Exposed Compute Instance with access to Cloud Trail Data | AWS | An attacker who gains access to the instance can access, manipulate, or steal sensitive information, gain access to other cloud resources, or disrupt business operations. Because the data exposes additional cloud accounts and resources, the instance can also be used to pivot further within the customer's cloud footprint.
Publicly Exposed Compute Instance with access to a Bucket Containing PII Data | AWS | When a compute instance has access to PII data stored in an S3 bucket, it can read and potentially manipulate this data, which poses significant security risks.

For more information on using Attack Paths, explore the documentation.

Cloud Summary (Risk Overview)

For a high-level overview of your environment's risk, including publicly available sensitive data, regularly review Cloud Summary > Risk Overview. For details on using the Cloud Summary, explore the documentation.

Data Classification (Settings)

For detailed information on which external sensitive data classifications InsightCloudSec processes, go to Settings > Data Classification. Use this page to explore the data types and clouds that InsightCloudSec supports.