Log Data Collection and Storage

After you configure your foundational event sources, the Collector automatically begins to normalize and analyze data as it is ingested. Your data populates various pages, dashboards, widgets, and key performance indicators (KPIs) after events are processed. InsightIDR does not ingest or analyze historical data from event sources.

When completing an environment audit and preparing to deploy InsightIDR in your environment, keep the following in mind:

Log Data Storage Retention

InsightIDR stores your logs for 13 months so they are available for log search, visualization, and investigations. By default, you have 3 months of “hot” storage and 10 months of “cold” storage. Hot storage data is immediately available in log search. Cold storage data is less frequently accessed, so search results may take longer to display. The wait time when searching cold storage data depends on how old the data is and the volume of logs you are requesting. See our Cold Storage Logs documentation for instructions on how to import logs from cold storage.
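The default retention split described above can be sketched in a few lines of Python. This is only an illustration of the 3-month hot and 10-month cold defaults; the function names and the calendar-month age calculation are assumptions, not how InsightIDR computes storage tiers internally.

```python
from datetime import date

HOT_MONTHS = 3    # default hot storage, per the text above
COLD_MONTHS = 10  # default cold storage, per the text above


def months_between(older: date, newer: date) -> int:
    """Whole calendar months elapsed from `older` to `newer`."""
    months = (newer.year - older.year) * 12 + (newer.month - older.month)
    if newer.day < older.day:
        months -= 1  # the final month is not yet complete
    return months


def storage_tier(log_date: date, today: date) -> str:
    """Classify a log as 'hot', 'cold', or 'expired' by its age in months."""
    age = months_between(log_date, today)
    if age < HOT_MONTHS:
        return "hot"
    if age < HOT_MONTHS + COLD_MONTHS:
        return "cold"
    return "expired"


today = date(2024, 6, 15)
print(storage_tier(date(2024, 5, 1), today))   # hot
print(storage_tier(date(2023, 12, 1), today))  # cold
print(storage_tier(date(2023, 4, 1), today))   # expired
```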

Rapid7 does not store your data past the 13-month retention period. If you do not upgrade your data retention plan, log data older than 13 months will no longer be accessible. If you need a longer retention time, contact your Customer Success Manager to tailor a plan for your business and compliance needs. You can also enable S3 Archiving in InsightIDR to store log data in your own private AWS S3 bucket. InsightIDR does not have any other limits on the amount of data that you can store. Read more about Data Storage here: https://www.rapid7.com/globalassets/_pdfs/product-and-service-briefs/rapid7-insightidr-log-storage-and-retention-brief.pdf/

Import Logs from Cold Storage before you change your data retention plan

If you are increasing your Hot Storage data retention and want to move your Cold Storage data to your new Hot Retention plan, you must import your data from Cold Storage before you change your retention plan. If you increase your plan first, you may need to contact Rapid7 Support for assistance with importing your data from cold storage. For instructions on how to import your logs from Cold Storage, see the Cold Storage Logs documentation.

Raw Data Processing

Before InsightIDR parses and normalizes data for user attribution, it populates the Events Processed KPI on your homepage. You can click this KPI to view more granular data and to query the collected log data.

Raw data is then parsed and normalized into user attribution data.

Credential Storage

Credentials are not stored in AWS. The Collector eliminates raw logs that are unnecessary to your environment and removes sensitive data from them. InsightIDR does not retain information such as personally identifiable information, medical records, or employee, organization, or asset names.

Normalize Logs

InsightIDR transforms, or normalizes, raw data into JSON in order to provide additional context around user behavior, compromised credentials, and other potentially malicious activity.

Normalization converts log data from multiple sources into a common JSON format and extracts standard information, such as hostnames, timestamps, error levels, and more. It also allows you to run advanced queries on your endpoint logs and enhance your data visualization. See Log Search for more information.
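As a rough illustration of what normalization means, the sketch below turns one raw log line into a JSON object with common fields. The log line, the regular expression, and the field names are all hypothetical; InsightIDR's actual parsers and JSON schema are not shown in this document.

```python
import json
import re

# Hypothetical raw log line; real event sources use many different formats.
RAW = "2024-06-15T10:32:07Z fw01 WARN connection denied src=10.0.0.5 dst=203.0.113.9"

# Assumed layout: timestamp, hostname, level, then a free-form message.
PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<level>[A-Z]+) (?P<message>.*)"
)


def normalize(line: str) -> dict:
    """Extract common fields plus any key=value pairs from one log line."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    event = match.groupdict()
    # Promote key=value pairs found in the message to their own fields.
    event.update(dict(re.findall(r"(\w+)=(\S+)", event["message"])))
    return event


print(json.dumps(normalize(RAW), indent=2))
```

Once every source is reduced to the same keys, a single query can filter on, for example, hostname or level regardless of which device produced the log.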

After normalization, InsightIDR correlates data between a single asset and a user in a process called “User Attribution.”

Unfiltered Logs

When sending data from your environment to InsightIDR, you have the option of sending unfiltered logs, which include all available information with nothing omitted. By default, InsightIDR applies a filter to firewall logs, keeping only events related to user attribution and discarding the rest.

See Filtered Event Sources for more information.

To remove this filter, check the Unfiltered Logs box when configuring an event source. Note that sending unfiltered logs will increase the amount of data you send to InsightIDR.

Prepare for Log Collection

Because most event source applications send data as syslog, you must configure each appliance to send data to the Collector on a unique TCP or UDP port.

See Collector Requirements for specific Collector port information. Otherwise, see Ports Used by InsightIDR.
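To make the "one port per event source" idea concrete, here is a minimal sketch that sends a syslog message over UDP using Python's standard logging module. The address 127.0.0.1 and port 5514 are placeholders, and build_syslog_logger is a hypothetical helper; substitute your Collector's address and the port you assigned to the event source.

```python
import logging
import logging.handlers
import socket


def build_syslog_logger(name: str, host: str, port: int,
                        use_tcp: bool = False) -> logging.Logger:
    """Create a logger that forwards records as syslog to host:port."""
    socktype = socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM
    handler = logging.handlers.SysLogHandler(address=(host, port),
                                             socktype=socktype)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Placeholder values (assumptions): replace with your Collector's address
# and the unique port configured for this event source.
firewall_log = build_syslog_logger("firewall", "127.0.0.1", 5514)
firewall_log.info("connection denied src=10.0.0.5 dst=203.0.113.9")
```

A second application would call the same helper with a different port, so the Collector can tell the two sources apart.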

Timezones

When configuring an event source, it is important that you select the time zone that matches the time zone of the application sending the data. This is often the time zone of the application's physical location, or UTC.

If the time zones do not match, InsightIDR may apply the wrong timestamp, which will appear as mislabeled logs in log search or alerts.
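The effect of a mismatched time zone can be shown with a short sketch. It assumes a device that writes local time at UTC-4 without any zone marker; to_utc is a hypothetical helper and does not represent InsightIDR's own parsing.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_timestamp: str, utc_offset_hours: float) -> datetime:
    """Interpret a timestamp that carries no zone info using a configured offset."""
    local = datetime.strptime(naive_timestamp, "%Y-%m-%d %H:%M:%S")
    configured = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=configured).astimezone(timezone.utc)


raw = "2024-06-15 09:00:00"  # written by a device whose clock is at UTC-4
correct = to_utc(raw, -4)    # event source configured to match the device
wrong = to_utc(raw, 0)       # event source mistakenly left at UTC

print(correct.isoformat())   # 2024-06-15T13:00:00+00:00
print(wrong.isoformat())     # 2024-06-15T09:00:00+00:00
print(correct - wrong)       # 4:00:00
```

The same raw line ends up four hours apart depending on the configured zone, which is why the event appears mislabeled in search results.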

To check the timestamp of your logs:

  1. Select the Data Collection page from the left menu and select the Event Sources tab.
  2. Find your event source and click the View raw log link.
  3. If you need to correct the time zone or discover your logs do not have a time zone, click the Edit link on the running event source.
  4. Choose the correct time zone from the "Timezone" dropdown.
  5. Click the Save button.

Logs without a timestamp

If you discover that your logs do not have a timestamp, you should reconfigure your application to send logs in a format that includes a timestamp, such as syslog.
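For reference, this sketch builds a syslog line in the RFC 5424 layout, which carries an explicit UTC timestamp. The hostname, application name, and message are made-up values, and rfc5424_line is a hypothetical helper; check your application's documentation for the syslog variant it actually emits.

```python
from datetime import datetime, timezone


def rfc5424_line(message: str, hostname: str, app: str, when: datetime,
                 facility: int = 1, severity: int = 6) -> str:
    """Format one RFC 5424 syslog line; `when` must be timezone-aware."""
    pri = facility * 8 + severity  # PRI value defined by the syslog RFCs
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # The three "-" fields are nil values for PROCID, MSGID, and STRUCTURED-DATA.
    return f"<{pri}>1 {timestamp} {hostname} {app} - - - {message}"


line = rfc5424_line(
    "user login succeeded",
    hostname="app01",
    app="myapp",
    when=datetime(2024, 6, 15, 13, 0, 0, tzinfo=timezone.utc),
)
print(line)  # <14>1 2024-06-15T13:00:00Z app01 myapp - - - user login succeeded
```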

S3 Archiving

S3 Archiving sends your InsightIDR log data to your own AWS S3 bucket. This ensures that your log entries are preserved as a backup.

See S3 Archiving for more information.

Collected Data by Event Source Category

InsightIDR uses multiple event sources to collect the data it needs to protect your environment and help you quickly detect and respond to malicious activity on your network. The following table displays what categorical information is collected by specific event sources:

Collected Data | Event Source(s)
User Details | Microsoft Active Directory, LDAP server logs, Rapid7 Metasploit, virus scanner, VPN, and Endpoint Monitor
Asset Details | Microsoft Active Directory security logs, DHCP server logs, Nexpose, and Endpoint Monitor
IP Address History | Microsoft Active Directory security logs and DHCP server logs
Location | VPN server logs, cloud services (for example, AWS and Box.com), and Microsoft ActiveSync
Services | DNS server logs, firewall, web proxy, cloud services (Box.com, Okta, Salesforce), and Microsoft ActiveSync servers
Incidents | Microsoft Active Directory security logs, DHCP server logs, Endpoint Monitor, VPN servers (IP address ranges), DNS server logs, firewall, and web proxy
Threats | DNS server logs, firewall, and web proxy

Active Directory

The Collector pulls the following fields from Active Directory event sources:

  • Timestamp
  • Action
  • Source User
  • Source Account
  • Target User
  • Target Account
  • Group
  • Group Scope
  • Group Domain

Advanced Malware Detection

The Collector pulls the following fields from Advanced Malware Detection event sources:

  • Timestamp
  • Asset
  • Secondary Asset
  • Destination User
  • Source User
  • Source Address
  • Destination Address
  • Alert Name
  • Source Port
  • Destination Port
  • Device Address
  • Protocol
  • Signature Name
  • Severity
  • GEOIP Organization
  • GEOIP Country Code
  • GEOIP Country Name
  • GEOIP City
  • GEOIP Region

Asset Authentication

The Collector pulls the following fields from Asset Authentication event sources:

  • Timestamp
  • Source Asset
  • Destination Asset
  • Source Asset Address
  • Destination Asset Address
  • Destination User
  • Destination Account
  • Destination Domain
  • Destination Account SID
  • Login Type
  • Result
  • New Authentication
  • New Source Authentication
  • New Source for Account
  • Service

Cloud Service Administrator Activity

The Collector pulls the following fields from Cloud Service Administrator Activity event sources:

  • Timestamp
  • Service
  • Action
  • Source User
  • Source Account
  • Target User
  • Target Account

DNS

The Collector pulls the following fields from DNS event sources:

  • Timestamp
  • Asset
  • User
  • Source Address
  • Query
  • Public Suffix
  • Top Private Domain
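The Public Suffix and Top Private Domain fields are both derived from the Query value. The sketch below shows the general idea with a tiny hard-coded suffix set; KNOWN_SUFFIXES and split_query are hypothetical, and a real implementation would consult the full Public Suffix List rather than this illustrative subset.

```python
# Tiny illustrative subset; a real implementation would use the full
# Public Suffix List.
KNOWN_SUFFIXES = {"com", "org", "net", "co.uk"}


def split_query(query: str) -> tuple[str, str]:
    """Return (public_suffix, top_private_domain) for a DNS query name."""
    labels = query.rstrip(".").lower().split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            if i == 0:
                raise ValueError(f"{query!r} is itself a public suffix")
            # The top private domain is the suffix plus one more label.
            return suffix, ".".join(labels[i - 1:])
    raise ValueError(f"no known public suffix in {query!r}")


print(split_query("www.rapid7.com"))      # ('com', 'rapid7.com')
print(split_query("mail.example.co.uk"))  # ('co.uk', 'example.co.uk')
```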

File Access Activity

The Collector pulls the following fields from File Access Activity event sources:

  • Timestamp
  • User
  • Account
  • Account Domain
  • Source Address
  • Service
  • Target Address
  • File Path
  • File Name
  • File Extension
  • File Share
  • Access Type

Firewall Activity

The Collector pulls the following fields from Firewall Activity event sources:

  • Timestamp
  • Asset
  • User
  • Source Address
  • Source Port
  • Destination Address
  • Destination Port
  • Connection Status
  • Direction
  • GEOIP Organization
  • GEOIP Country Code
  • GEOIP Country Name
  • GEOIP City
  • GEOIP Region

Host to IP Observations

The Collector pulls the following fields from Host to IP Observation event sources:

  • Timestamp
  • Action
  • HostID
  • IP
  • Observation Status

IDS Alerts

The Collector pulls the following fields from IDS Alert event sources:

  • Timestamp
  • Asset
  • User
  • Signature
  • Source IP
  • Destination
  • Description
  • Severity
  • Protocol
  • Generator ID
  • Source Port
  • Destination Port

Ingress Authentication (OWA/ActiveSync)

The Collector pulls the following fields from Ingress Authentication (OWA/ActiveSync) event sources:

  • Timestamp
  • User
  • Account
  • Result
  • Source IP
  • Service
  • GEOIP Organization
  • GEOIP Country Code
  • GEOIP Country Name
  • GEOIP City
  • GEOIP Region

Raw Logs (Generic Syslog and Windows Event Log)

The Collector pulls the following fields from Raw Log (Generic Syslog and Windows Event Log) event sources:

  • Timestamp
  • Host Name
  • Event Code
  • Description
  • Package Name
  • Target User Name
  • Workstation
  • Status

SSO Authentication

The Collector pulls the following fields from SSO Authentication event sources:

  • Timestamp
  • User
  • Account
  • Source IP
  • Service
  • SSO Provider
  • GEOIP Organization
  • GEOIP Country Code
  • GEOIP Country Name
  • GEOIP City
  • GEOIP Region

Virus Alert

The Collector pulls the following fields from Virus Alert event sources:

  • Timestamp
  • Asset
  • User
  • Account Domain
  • Risk
  • Action