Data Groups

Data Groups simplify resource filtering, Insight analysis, and Bot configuration. This feature allows administrators to build out reusable data definitions—collections of strings—that can be used and reused when creating and updating Insights and Bots. The Data Groups can be associated with any number of the hundreds of filters in the product.

When you edit a Data Group, all Insights and Bots that use that group will automatically use the updated group next time they run; you needn’t repeat your edits across multiple Insights and Bots.

Example Use Case

Suppose you want to specify a list of trusted accounts and disallow certain kinds of activity from all other accounts. You might set up Bots configured with these Insights:

  • Resource With Cross Account Access to Unknown Account
  • Network Peers Connected to Unknown Accounts
  • Service Role Trusting Unknown Account
  • Cloud Role Trusting Unknown Account

You’ll configure all of these Bots with the same set of account numbers. Manually entering and updating the list of allowed accounts in all of these Bots is a tedious and error-prone process; Data Groups can make correctly managing these reused inputs fast and easy, even for sets of tens of thousands of strings.

A Data Group can contain up to 4MB of strings, allowing you to manage tens of thousands of entries, defining the behavior of many Insights and Bots, in a single list. The admin creating or editing a Data Group is responsible for ensuring the integrity of the collection, e.g., a list of accounts must contain only valid account numbers; Data Groups will not validate the entered lists.
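Because Data Groups do not validate their contents, it can help to check a list before you enter or upload it. The following is a minimal sketch, not part of the product: it assumes account numbers are 12-digit strings (the AWS account ID format) and reads "4MB" as 4 * 1024 * 1024 bytes of UTF-8 text. The function names are hypothetical; adjust both assumptions for your cloud provider and instance.

```python
import re

# Assumption: account numbers are 12-digit strings (the AWS account ID format).
ACCOUNT_ID_PATTERN = re.compile(r"[0-9]{12}")

# Assumption: "4MB" means 4 * 1024 * 1024 bytes of UTF-8 text,
# counting both values and descriptions.
MAX_GROUP_BYTES = 4 * 1024 * 1024


def find_invalid_account_ids(values):
    """Return the values that are not 12-digit account numbers, in input order."""
    return [v for v in values if not ACCOUNT_ID_PATTERN.fullmatch(v)]


def estimated_size_bytes(entries):
    """Estimate the size of a value -> description mapping as UTF-8 bytes."""
    return sum(
        len(value.encode("utf-8")) + len((description or "").encode("utf-8"))
        for value, description in entries.items()
    )


def check_group(entries):
    """Raise ValueError if the entries look invalid or too large."""
    invalid = find_invalid_account_ids(entries)
    if invalid:
        raise ValueError("Invalid account numbers: {}".format(invalid))
    size = estimated_size_bytes(entries)
    if size > MAX_GROUP_BYTES:
        raise ValueError("Estimated size {} bytes exceeds the limit".format(size))
```

Running `check_group` on the value-to-description mapping before each edit catches malformed entries locally, since the product will accept them as entered.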

Accessing Data Groups

Go to Cloud > Data Groups to get started.

Manage Data Groups

  1. Go to Cloud > Data Groups.
  2. To add a new group, click New Group and complete the details.
  3. To delete a group, select the group and click the Delete icon.
  4. To edit a group, click the group to open the data group edit window.
    • To rename the group, click the Edit icon.
    • To add a new item, enter the value (the input string, e.g., an account number) and a description, then click Add.
    • To delete an item, click the Delete icon.

Using Data Groups

The process described below demonstrates using Data Groups to filter Resources. The process is the same when working with Insights or Bots. For entering and editing large lists, refer also to the API documentation for Data Groups.

Create a New Data Group

  1. Go to Inventory > Resources and select Query Filters.
  2. Use the search bar to find the name of the filter of interest, e.g., Resource Trusting Unknown Account.
  3. Click Apply and enter any tags, names, or other strings to configure the Query Filter.
  4. Select Create to create a new Data Group containing these inputs.
  5. Use the Create modal that opens to name the new Data Group. Data Groups must have unique names.

Import a CSV

Alternatively, a Data Group can be created by importing a CSV file:

  1. Go to Cloud > Data Groups and click New Group.
  2. Select Choose a file to import your CSV.
  3. The CSV should be formatted using two columns for value-description pairs (not key-value pairs).
    • Each value has a max character limit of 255.
    • Each value must be unique within a data group; duplicate values are ignored.
    • Empty values are automatically excluded.
    • The description is not required and does not have a character limit.
  4. After selecting the CSV and naming the new data group, choose Create to complete importing. Data Groups are required to have unique names.
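The CSV rules above can be applied before upload. The sketch below is illustrative only: it builds two-column CSV text from (value, description) pairs, skipping empty and duplicate values and rejecting values over 255 characters. The function name `build_import_csv` is hypothetical, and the sketch assumes the importer expects no header row, which the steps above do not state.

```python
import csv
import io

MAX_VALUE_LENGTH = 255  # per-value character limit stated in the steps above


def build_import_csv(pairs):
    """Return CSV text with one value,description row per unique, non-empty value.

    `pairs` is an iterable of (value, description) tuples; description may be None.
    Assumption: the importer expects no header row.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for value, description in pairs:
        if not value:
            continue  # empty values would be excluded on import anyway
        if value in seen:
            continue  # duplicate values would be ignored on import anyway
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(
                "Value exceeds {} characters: {!r}".format(MAX_VALUE_LENGTH, value[:40])
            )
        seen.add(value)
        writer.writerow([value, description or ""])
    return buffer.getvalue()
```

Writing the returned text to a `.csv` file gives you a file to select in step 2. Filtering locally makes it explicit which rows were dropped, rather than relying on the import to discard them silently.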

Using an Existing Data Group

In the Resource Query Filters pane, you can select an existing Data Group.

Using Data Groups Programmatically

Many use cases for Data Groups, such as maintaining lists of hundreds or thousands of allowed accounts, are best handled programmatically. To support these use cases, we’ve written the module provided below, which uses our REST API to create, populate, and update Data Group contents. You can import these functions directly as a module in your management scripts, or copy and paste them as snippets.

"""
This collection of functions can help you programmatically manipulate Data
Collections in InsightCloudSec using our REST API. These functions are intended
to be copied and pasted into your code, or can be imported and used directly in
your tools.
"""
from collections.abc import Mapping
import json
import os

import requests

# Assumes the presence of these environment variables:
# INSIGHTCLOUDSEC_API_USER_USERNAME -- user we want to authenticate as
# INSIGHTCLOUDSEC_API_USER_PASSWORD -- user's password
# INSIGHTCLOUDSEC_BASE_URL -- base url for the InsightCloudSec instance to run against
# INSIGHTCLOUDSEC_API_KEY -- API Key to interact with the InsightCloudSec REST API
USERNAME, PASSWORD, BASE_URL, API_KEY = (
    os.environ['INSIGHTCLOUDSEC_API_USER_USERNAME'],
    os.environ['INSIGHTCLOUDSEC_API_USER_PASSWORD'],
    os.environ['INSIGHTCLOUDSEC_BASE_URL'],
    os.environ['INSIGHTCLOUDSEC_API_KEY'],
)

BASE_HEADERS = {
    'Content-Type': 'application/json;charset=UTF-8',
    'Accept': 'application/json',
    'Api-Key': API_KEY,
}


def create_data_group(group_name, group_data=None):
    """
    Create a new data group with name `group_name`, optionally populated with
    the values in `group_data`.

    `group_data` should be one of 2 things:

    - a dictionary mapping Data Group values to descriptions. Using `None` as
      a description will result in the description being set to the empty
      string.
    - an iterable of strings, which will be inserted as values with no
      descriptions.

    For example, the following is a valid input:

    ```
    {
        'value one': 'description for value one',
        'value two': None,  # description will be set to the empty string
        'value three': 'description for value three'
    }
    ```

    Or you can simply pass a list like `['first value', 'second value']`,
    which is equivalent to passing
    `{'first value': None, 'second value': None}`
    """
    data = {'group_name': group_name,
            'group_data': normalize_group(group_data)}
    return requests.post(
        url=requests.compat.urljoin(BASE_URL, '/v2/datacollections/'),
        headers=BASE_HEADERS,
        data=json.dumps(data)
    )


def update_data_group(group_id, group_data):
    """
    Update the existing data group with integer ID `group_id` using the data
    in `group_data`.

    `group_data` should be a dictionary mapping values to descriptions. Any
    new value: description pairs will be inserted into the data group; any
    whose value already exists in the Data Group will be used to update the
    existing description.

    Descriptions may be `None`. A description equaling `None` in a new entry
    will result in an empty description. For existing entries, a description
    equaling `None` will result in no changes to an existing description;
    setting a description to the empty string must be done explicitly.

    For example:

    ```
    {
        # set the description for an existing value 'value one', or create a
        # new value with that description
        'value one': 'description for value one',
        # leave the description for an existing value 'value two' unchanged, or
        # create a value 'value two' with no description
        'value two': None,
        # empty the description for an existing value 'value three', or create
        # a new value with an empty description
        'value three': ''
    }
    ```

    Note that this operation does not remove any entries from the data group.
    """
    url = requests.compat.urljoin(
        BASE_URL,
        requests.compat.urljoin('/v2/datacollections/', str(group_id))
    )
    result = requests.post(
        url=url,
        headers=BASE_HEADERS,
        data=json.dumps({'collection_data': normalize_group(group_data)})
    )
    return result


def delete_data_group_values(group_id, values_to_delete):
    """
    Delete all entries with values in the iterable `values_to_delete`.

    Note that this is a 2-phase operation: this first checks that the values
    exist and gets their IDs within the group, then sends the request to
    delete them. This means that calling this method concurrently with other
    data group manipulation could have unexpected results.
    """
    # materialize the iterable, since it is traversed more than once below
    values_to_delete = list(values_to_delete)
    url = requests.compat.urljoin(
        BASE_URL,
        requests.compat.urljoin('/v2/datacollections/', str(group_id))
    )

    # phase 1: grab existing entries
    group_result = requests.get(
        url=url,
        headers=BASE_HEADERS
    )
    existing_values_to_ids = {
        datum['value']: int(datum['id'])
        for datum in group_result.json()['collection']['data']
    }

    # pre-deletion check: we should only try to delete entries that exist
    missing = set(values_to_delete) - set(existing_values_to_ids)
    if missing:
        raise ValueError(
            'Some values to be deleted not in existing data '
            'collection: {}'.format(missing)
        )

    # phase 2: delete specified entries
    return requests.delete(
        url=url,
        headers=BASE_HEADERS,
        data=json.dumps({
            'data_ids': [existing_values_to_ids[value]
                         for value in values_to_delete]
        })
    )


def normalize_group(group_data):
    """
    Return `group_data` as a dictionary mapping values to descriptions.
    Mappings are returned unchanged, `None` becomes an empty dictionary, and
    any other iterable becomes a dictionary whose descriptions are all `None`.
    """
    if group_data is None:
        return {}
    if isinstance(group_data, Mapping):
        return group_data
    return {datum: None for datum in group_data}