Setting Up DataHub

DataHub acts like a centralized collector that helps you prevent sensitive data from leaving your environment. If your organization wants to protect private data, such as personally identifiable information (PII), you can set up filtering rules. For example, if your organization has credit card information, user names, email addresses, or even specific entries, such as INFO level messages, that you want to obfuscate, DataHub can help you.

DataHub applies routing rules to aggregate data, and then scrubs and removes sensitive information so it is not sent to InsightOps. The information will not be exposed, and you will still be able to run analytical functions on it.
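The scrubbing idea described above can be illustrated with a small sketch. This is not DataHub's implementation; the patterns, placeholder strings, and the function name scrub are assumptions made up for this example, and real filtering rules are configured in the product itself.

```python
import re

# Hypothetical patterns for illustration only; they are not DataHub's rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def scrub(entry: str) -> str:
    """Replace email addresses and card-like digit runs with fixed placeholders."""
    entry = EMAIL_RE.sub("[EMAIL]", entry)
    entry = CARD_RE.sub("[CARD]", entry)
    return entry
```

Because the rest of each entry is left intact, fields such as status codes or timestamps remain available for analysis after the sensitive values are replaced.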

To set up DataHub, you will need to:

  1. Download and install DataHub.
  2. Create routing rules.
  3. Configure your clients to send data to your InsightOps account via the DataHub.

Install on Linux

DataHub can run on a server instance inside your cloud or in an on-premises environment. Currently, DataHub is only supported on Linux systems.

To set up DataHub on Linux:

  1. From the left menu in InsightOps, go to Data Collection.
  2. Click Manage DataHub to open the “Manage DataHub Routes” page.
  3. Click the Set up DataHub on Linux button. A side panel appears and displays the instructions for installing DataHub on Linux. Use the provided link in the instructions to download the installer.
  4. On your Ubuntu or Debian system, run sudo apt-get update; sudo apt-get upgrade to update and upgrade your system.
  5. Run sudo apt-get install openjdk-8-jre to install openjdk-8-jre if your system does not have it.
  6. Run sudo dpkg -i DataHub_2.0.deb to install the DataHub component.
  7. Run sudo vi /etc/datahub/datahub.config to open the datahub.config file in the Vi editor.
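  8. Add the following parameter to your datahub.config file: "api_key": "Read/Write API Key". Replace the placeholder value with your own read/write key. To get your read/write key, go to Settings > API Keys.
  9. In the Vi editor, type :wq and press Enter to save your changes and exit.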

Extra step for non-EU accounts

If your account is not in the EU region, you also need to modify the datahubLocal.config file.

In the settings shown below, replace REGION with the code for your region. The following regions are available:

  • US: United States
  • EU: Europe
  • CA: Canada
  • AU: Australia
  • AP: Japan

    {
      "settings": {
        "api_url": "https://REGION.rest.logs.insight.rapid7.com",
        "api_url_log": "https://REGION.rest.logs.insight.rapid7.com/management/logs",
        "host": "REGION.data.logs.insight.rapid7.com",
        "api_url_logset": "https://REGION.rest.logs.insight.rapid7.com/management/logsets",
        "api_url_rule": "https://REGION.rest.logs.insight.rapid7.com/datahub/rules"
      }
    }
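If you manage several DataHub hosts, substituting REGION by hand is easy to get wrong. The following is a minimal sketch that builds the same settings block for a given region code. The function name build_settings is hypothetical, and lowercasing the region code in the hostnames is an assumption, not something this page states.

```python
# Region codes listed in the section above.
REGIONS = {
    "us": "United States",
    "eu": "Europe",
    "ca": "Canada",
    "au": "Australia",
    "ap": "Japan",
}


def build_settings(region: str) -> dict:
    """Return the settings block with REGION replaced by the given region code."""
    code = region.lower()  # assumption: hostnames use the lowercase code
    if code not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    rest = f"https://{code}.rest.logs.insight.rapid7.com"
    return {
        "settings": {
            "api_url": rest,
            "api_url_log": f"{rest}/management/logs",
            "host": f"{code}.data.logs.insight.rapid7.com",
            "api_url_logset": f"{rest}/management/logsets",
            "api_url_rule": f"{rest}/datahub/rules",
        }
    }
```

Passing the result to json.dumps(..., indent=2) from the standard library produces text you can compare against the contents of datahubLocal.config.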

Create Routing Rules

Routing rules allow you to find log entries that match certain patterns or values and route them to specific logs and log sets. You configure a routing rule by defining the pattern you want to match and the logs you want to send that data to.

After you set up DataHub, you can access the Manage Routing Rules area to create and view rules. When you first set up DataHub, there is one rule, called “Default.” This rule routes all log data collected by DataHub to a log called “Default.”

To create a routing rule:

  1. From the left menu in InsightOps, go to Manage Data Routes.
  2. Click the Add route button.
  3. When the Add Route panel appears, enter a name for the rule.
  4. Use the following parameters to create your rule:
    • Route Name: Any string that names the route.
    • Route Description: Any string that describes the route.
    • Syslog Hostname: Optional. Use this to route entries to specific logs based on hostname. Enter the hostname, such as the name of a mail server.
    • Syslog Tag: Optional. Use this to route entries to specific logs based on process. Enter the process name, such as sshd.
    • Matching Pattern: The pattern DataHub uses to select the log entries that are placed into the defined log. To forward all logs from a specific host, use “.*”.
    • Log Name and Destination Log Set: The log and log set where you want the matching entries to be stored.
  5. Create the route. It appears in the route table after it is created.
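To make the role of each parameter concrete, here is a rough model of how a route could be evaluated against an incoming entry. It is a sketch under stated assumptions: the Route class, the pick_destination function, the first-match-wins ordering, and the use of regular-expression search are all hypothetical and are not taken from DataHub itself.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical in-memory model of a routing rule; fields mirror the form."""
    name: str
    pattern: str
    log_name: str
    log_set: str
    hostname: Optional[str] = None  # None means "any host"
    tag: Optional[str] = None       # None means "any process"

    def matches(self, hostname: str, tag: str, message: str) -> bool:
        # Optional filters are checked first, then the matching pattern.
        if self.hostname is not None and self.hostname != hostname:
            return False
        if self.tag is not None and self.tag != tag:
            return False
        return re.search(self.pattern, message) is not None


def pick_destination(routes: List[Route], hostname: str, tag: str, message: str) -> str:
    """Return the log name of the first matching route, else the "Default" log."""
    for route in routes:
        if route.matches(hostname, tag, message):
            return route.log_name
    return "Default"
```

In this model, a route with the hostname mail01 and the pattern “.*” sends every entry from that host to its log, while entries that match no route fall through to the “Default” log.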

Configure Clients

InsightOps agents, as well as syslog clients, can communicate through DataHub. For the agent, you need to supply the IP address or hostname and the port used by DataHub.

For syslog clients, you need to specify the IP address or hostname of the DataHub server in the client's syslog configuration file (/etc/rsyslog.conf in the steps below).

To configure syslog clients:

  1. On the DataHub server, run /sbin/ifconfig as root to find the IP address of the server.
  2. Add the following line to the /etc/rsyslog.conf file of each client, replacing <IPADDRESS_TO_DATAHUB> with the address from step 1: *.* @@<IPADDRESS_TO_DATAHUB>:10000
  3. Run sudo service rsyslog restart to restart the rsyslog daemon.
  4. Run logger -t test from InsightOps to test the integration.
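If the logger command is not available on a client, a short script can send a comparable test entry. The sketch below opens a TCP connection, which is the transport that “@@” selects in rsyslog, and writes one line tagged test. The function names are hypothetical, and the minimal line format (priority, tag, and message, with no timestamp or hostname) is an assumption; this page does not specify what DataHub requires.

```python
import socket

DATAHUB_PORT = 10000  # port used in the rsyslog line above


def format_syslog_line(tag: str, message: str, facility: int = 1, severity: int = 6) -> str:
    """Build a minimal syslog-style line; PRI = facility * 8 + severity."""
    pri = facility * 8 + severity  # defaults: user (1) and info (6) give 14
    return f"<{pri}>{tag}: {message}\n"


def send_test_entry(host: str, message: str, tag: str = "test", port: int = DATAHUB_PORT) -> None:
    """Send one newline-terminated entry to the given host over TCP."""
    line = format_syslog_line(tag, message)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))
```

Calling send_test_entry with the DataHub server address from step 1 and a short message should produce an entry you can look for in InsightOps, provided the assumed line format is accepted.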

View Health and Debugging Logs

By default, DataHub will log extra data to your InsightOps account.

The following data will be added:

  • datahub.log: The DataHub log, which you can use to debug any issues with the running service. It can also help you keep track of the version of DataHub you are currently running. You can access the log on the DataHub server at /var/log/datahub/datahub.log.
  • Heartbeat.log: The Heartbeat log is a health check sent to DataHub every 5 minutes. It will post a simple message to your log to let you know that DataHub is currently logging. With this feature, you can use Inactivity Alerting to let you know when DataHub is no longer sending logs to InsightOps.
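The reasoning behind an inactivity alert on the heartbeat can be sketched as a simple time comparison. This is only an illustration: the function name is_inactive and the tolerance of two missed heartbeats are assumptions chosen for the example, and Inactivity Alerting itself is configured in InsightOps rather than in code.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=5)  # interval stated above


def is_inactive(last_heartbeat: datetime, now: datetime, missed_allowed: int = 2) -> bool:
    """True when more than `missed_allowed` heartbeat intervals passed with no entry."""
    return now - last_heartbeat > HEARTBEAT_INTERVAL * missed_allowed
```

Allowing one or two missed heartbeats before alerting avoids false alarms caused by short network delays, at the cost of a slightly later notification.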