Data Collections
Data Collections simplify resource filtering, Insight analysis, and Bot configuration. Administrators can build reusable data definitions (collections of strings) and use them when creating and updating Insights and Bots. A Data Collection can be associated with any number of the hundreds of filters in the product.
When you edit a Data Collection, all Insights and Bots that use that collection will automatically use the updated collection next time they run; you needn't repeat your edits across multiple Insights and Bots.
Example Use Case
Suppose you want to specify a list of trusted accounts and disallow certain kinds of activity from all other accounts. You might set up Bots configured with these Insights:
Resource With Cross Account Access to Unknown Account
Network Peers Connected to Unknown Accounts
Service Role Trusting Unknown Account
Cloud Role Trusting Unknown Account
You'll configure all of these Bots with the same set of account numbers. Manually entering and updating the list of allowed accounts in all of these Bots is a tedious and error-prone process; Data Collections can make correctly managing these reused inputs fast and easy, even for sets of tens of thousands of strings.
A Data Collection can contain up to 4MB of strings, allowing you to manage tens of thousands of entries, defining the behavior of many Insights and Bots, in a single list. The admin creating or editing a Data Collection is responsible for ensuring the integrity of the collection, e.g., a list of accounts must contain only valid account numbers; Data Collections will not validate the entered lists.
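Because Data Collections do not validate entries, a small pre-check before uploading a list may help. The following is a minimal sketch, not part of the product: it assumes the entries are AWS-style 12-digit account IDs (other providers use other formats) and that the 4MB limit means 4 * 1024 * 1024 bytes of UTF-8 text, which the documentation does not spell out. The names `find_invalid_accounts` and `approx_size_bytes` are hypothetical.

```python
import re

# Assumption: AWS-style account IDs, exactly 12 digits.
ACCOUNT_ID = re.compile(r"\d{12}")
# Assumption: the 4MB limit is 4 * 1024 * 1024 bytes.
MAX_BYTES = 4 * 1024 * 1024


def find_invalid_accounts(entries):
    """Return the values in `entries` that are not 12-digit account IDs."""
    return [value for value in entries if not ACCOUNT_ID.fullmatch(value)]


def approx_size_bytes(entries):
    """Roughly estimate the size of a value -> description mapping in bytes."""
    return sum(
        len(value.encode("utf-8")) + len((description or "").encode("utf-8"))
        for value, description in entries.items()
    )


accounts = {
    "123456789012": "production",
    "210987654321": None,
    "12345": "typo, too short",
}
invalid = find_invalid_accounts(accounts)
size = approx_size_bytes(accounts)
print("invalid entries:", invalid)
print("within size limit:", size <= MAX_BYTES)
```

Running a check like this before each edit keeps a single bad entry from silently changing the behavior of every Insight and Bot that uses the collection.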
Accessing Data Collections
Go to Cloud > Data Collections to get started.
Manage Data Collections
- Go to Cloud > Data Collections.
- To add a new collection, click New Collection and complete the details.
- To delete a collection, select the collection and click the Delete icon.
- To edit a collection, click it to open the edit window.
- To rename the collection, click the Edit icon.
- To add a new item, enter the value (the input string, e.g., an account number) and a description, then click Add.
- To delete an item, click the Delete icon.
Using Data Collections
The process described below demonstrates using Data Collections to filter Resources. The process is the same when working with Insights or Bots. For entering and editing large lists, refer also to the API documentation for Data Collections.
Create a New Data Collection
- Go to Inventory > Resources and select Query Filters.
- Use the search bar to find the name of the filter of interest, e.g., Resource Trusting Unknown Account.
- Click Apply and enter any tags, names, or other strings to configure the Query Filter.
- Select Create to create a new Data Collection containing these inputs.
- Use the Create modal that opens to name the new Data Collection. Data Collections must have unique names.
Import a CSV
Alternatively, Data Collections can be created by importing a CSV file:
- Go to Cloud > Data Collections and click New Collection.
- Select Choose a file to import your CSV.
- The CSV should be formatted using two columns for value-description pairs (not key-value pairs).
- Each value has a max character limit of 255.
- Each value must be unique within a data collection; duplicate values are ignored.
- Empty values are automatically excluded.
- The description is not required and does not have a character limit.
- After selecting the CSV and naming the new data collection, choose "Create" to complete importing. Data Collections are required to have unique names.
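The formatting rules above can be applied before import with a short script. This is a minimal sketch under stated assumptions: the helper name `build_import_csv` is hypothetical, the output has no header row (the documentation does not say whether one is expected), and surrounding whitespace is stripped from values as a design choice rather than a documented requirement.

```python
import csv
import io

MAX_VALUE_LENGTH = 255  # per-value character limit stated in the list above


def build_import_csv(pairs):
    """Return CSV text with one value,description row per unique value.

    Empty values are skipped, repeated values are skipped after their first
    occurrence, and values longer than MAX_VALUE_LENGTH raise ValueError.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for value, description in pairs:
        value = value.strip()
        if not value or value in seen:
            continue
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(
                "value exceeds {} characters: {!r}...".format(MAX_VALUE_LENGTH, value[:20])
            )
        seen.add(value)
        # Descriptions are optional; write an empty second column when absent.
        writer.writerow([value, description or ""])
    return buffer.getvalue()


rows = [
    ("123456789012", "production"),
    ("", "empty value, skipped"),
    ("123456789012", "duplicate, skipped"),
    ("210987654321", None),
]
csv_text = build_import_csv(rows)
print(csv_text)
```

Dropping duplicates and empty values locally mirrors what the import does anyway, but raising on over-long values surfaces a problem that would otherwise only appear at import time.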
Using an Existing Data Collection
In the Resource Query Filters pane, you can select an existing Data Collection.
Using Data Collections Programmatically
Many use cases for Data Collections, such as maintaining lists of hundreds or thousands of allowed accounts, are best handled programmatically. To support these use cases, we've written the module provided below, which uses our REST API to create, populate, and update Data Collection contents. These functions can be imported directly and used as a module in your management scripts, or copied and pasted as snippets.
```python
"""
This collection of functions can help you programmatically manipulate Data
Collections in InsightCloudSec using our REST API.

These functions are intended to be copied and pasted into your code, or can be
imported and used directly in your tools.
"""

from collections.abc import Mapping  # `collections.Mapping` was removed in Python 3.10
import json
import os

import requests


# Assumes the presence of these environment variables:
#   INSIGHTCLOUDSEC_API_USER_USERNAME -- user we want to authenticate as
#   INSIGHTCLOUDSEC_API_USER_PASSWORD -- user's password
#   INSIGHTCLOUDSEC_BASE_URL -- base url for the InsightCloudSec instance to run against
#   INSIGHTCLOUDSEC_API_KEY -- API Key to interact with the InsightCloudSec REST API
USERNAME, PASSWORD, BASE_URL, API_KEY = (os.environ['INSIGHTCLOUDSEC_API_USER_USERNAME'],
                                         os.environ['INSIGHTCLOUDSEC_API_USER_PASSWORD'],
                                         os.environ['INSIGHTCLOUDSEC_BASE_URL'],
                                         os.environ['INSIGHTCLOUDSEC_API_KEY'])
BASE_HEADERS = {'Content-Type': 'application/json;charset=UTF-8',
                'Accept': 'application/json',
                'Api-Key': API_KEY}


def create_data_collection(collection_name, collection_data=None):
    """
    Create a new data collection with name `collection_name`, optionally
    populated with the values in the dictionary `collection_data`.

    `collection_data` should be one of 2 things:
    - a dictionary mapping Data Collection values to descriptions. Using `None`
      as a description will result in the description being set to the empty
      string.
    - an iterable of strings, which will be inserted as values with no
      descriptions.

    For example, the following is a valid input:

        {
            'value one': 'description for value one',
            'value two': None,  # description will be set to the empty string
            'value three': 'description for value three'
        }

    Or you can simply pass a list like `['first value', 'second value']`, which
    is equivalent to passing `{'first value': None, 'second value': None}`
    """

    data = {'collection_name': collection_name,
            'collection_data': normalize_collection(collection_data)}

    return requests.post(
        url=requests.compat.urljoin(BASE_URL, '/v2/datacollections/'),
        headers=BASE_HEADERS,
        data=json.dumps(data)
    )


def update_data_collection(collection_id, collection_data):
    """
    Update the existing data collection with integer ID `collection_id` using
    the data in `collection_data`.

    `collection_data` should be a dictionary mapping values to descriptions.
    Any new value: description pairs will be inserted into the data
    collection; any whose value already exists in the Data Collection will be
    used to update the existing description.

    Descriptions may be `None`. A description equaling `None` in a new entry
    will result in an empty description. For existing entries, a description
    equaling `None` will result in no changes to an existing description;
    setting a description to the empty string must be done explicitly. For
    example:

        {
            # set the description for an existing value 'value one', or create a
            # new value with that description
            'value one': 'description for value one',
            # leave the description for an existing value 'value two' unchanged, or
            # create a value 'value two' with no description
            'value two': None,
            # empty the description for an existing value 'value three', or create
            # a new value with an empty description
            'value three': ''
        }

    Note that this operation does not remove any entries from the data
    collection.
    """
    url = requests.compat.urljoin(
        BASE_URL,
        requests.compat.urljoin('/v2/datacollections/', str(collection_id))
    )

    result = requests.post(
        url=url,
        headers=BASE_HEADERS,
        data=json.dumps({'collection_data': normalize_collection(collection_data)})
    )
    return result


def delete_data_collection_values(collection_id, values_to_delete):
    """
    Delete all entries with values in the iterable `values_to_delete`.

    Note that this is a 2-phase operation: this first checks that the values
    exist and gets their IDs within the collection, then sends the request to
    delete them. This means that calling this method concurrently with other
    data collection manipulation could have unexpected results.
    """
    url = requests.compat.urljoin(
        BASE_URL,
        requests.compat.urljoin('/v2/datacollections/', str(collection_id))
    )

    # phase 1: grab existing entries
    collection_result = requests.get(
        url=url, headers=BASE_HEADERS
    )
    existing_values_to_ids = {
        datum['value']: int(datum['id'])
        for datum in collection_result.json()['collection']['data']
    }

    # pre-deletion check: we should only try to delete entries that exist.
    # `<=` (subset) rather than `<` (proper subset), so that deleting every
    # value in the collection is allowed.
    if not set(values_to_delete) <= set(existing_values_to_ids):
        raise ValueError(
            'Some values to be deleted not in existing data '
            'collection: {}'.format(set(values_to_delete) - set(existing_values_to_ids))
        )
    # phase 2: delete specified entries
    return requests.delete(
        url=url,
        headers=BASE_HEADERS,
        data=json.dumps({
            'data_ids': [existing_values_to_ids[value] for value in values_to_delete]
        })
    )


def normalize_collection(collection_data):
    """Return `collection_data` as a mapping of values to descriptions."""
    if collection_data is None:
        # no initial data was supplied (the default for create_data_collection)
        return {}
    if isinstance(collection_data, Mapping):
        return collection_data
    return {datum: None for datum in collection_data}
```
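Because updating a collection never removes entries and deletion is a separate call, keeping a collection in step with a source-of-truth list takes two requests. The sketch below shows one way to work out what each request should contain. `plan_sync` is a hypothetical helper, not part of the module above or the REST API, and it makes no network calls. It follows the module's documented convention that a `None` description leaves an existing description unchanged.

```python
def plan_sync(existing, desired):
    """Split the difference between two value -> description mappings.

    Returns (to_upsert, to_delete):
    - to_upsert: entries in `desired` that are new, or whose non-None
      description differs from the one in `existing`
    - to_delete: sorted values present in `existing` but absent from `desired`
    """
    to_upsert = {
        value: description
        for value, description in desired.items()
        if value not in existing
        or (description is not None and existing[value] != description)
    }
    to_delete = sorted(set(existing) - set(desired))
    return to_upsert, to_delete


# Example data; in practice `existing` would come from a GET on the collection.
existing = {
    "111111111111": "old note",
    "222222222222": "",
    "333333333333": "retired",
}
desired = {
    "111111111111": "new note",
    "222222222222": None,  # None: keep whatever description is already there
    "444444444444": "added",
}
to_upsert, to_delete = plan_sync(existing, desired)
print(to_upsert)
print(to_delete)
```

The two results could then be passed to `update_data_collection` and `delete_data_collection_values` respectively. Sending only the changed entries keeps request bodies small for collections with tens of thousands of values.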