Setting Up DataHub
DataHub acts like a centralized collector that helps you prevent sensitive data from leaving your environment. If your organization wants to protect private data, such as personally identifiable information (PII), you can set up filtering rules. For example, if your organization has credit card information, user names, email addresses, or even specific entries, such as INFO level messages, that you want to hide, DataHub can help you.
DataHub applies routing rules to aggregate data and removes sensitive information so it is not sent to InsightOps. The information will not be exposed, and you will still be able to run analytical functions on it.
To set up DataHub, you will need to:
- Download and install DataHub.
- Create routing rules.
- Configure your clients to send data to your InsightOps account via the DataHub.
Install on Linux
DataHub can run on a server instance inside your cloud or in an on-premises environment. Currently, DataHub is only supported on Linux systems.
To set up DataHub on Linux:
- From the left menu in InsightOps, go to Data Collection.
- Click Manage DataHub to open the Manage DataHub Routes page.
- Click the Set up DataHub on Linux button. A side panel appears and displays the instructions for installing DataHub on Linux. Use the provided link in the instructions to download the installer.
- On your Ubuntu or Debian system, run the following command to update and upgrade your system:
sudo apt-get update; sudo apt-get upgrade
- If your system does not have openjdk-8-jre, run the following command to install it:
sudo apt-get install openjdk-8-jre
- Download the DataHub component, then run the following command to install it:
sudo dpkg -i DataHub_2.0.4.deb
- Run the following command to open the datahub.config file in the Vi editor:
sudo vi /etc/datahub/datahub.config
- Add the following parameter to your datahub.config file, replacing the placeholder value with your read/write API key. To get your read/write key, go to Settings > API Keys.
"api_key": "Read/Write API Key"
- In Vi, type the following command and press Enter to save your changes and exit:
:wq
Extra step for non-EU accounts
If your account is not in the EU region, make the following modifications in the datahubLocal.config file.
Replace REGION with your respective region. The following regions are available:
- US: United States
- EU: Europe
- CA: Canada
- AU: Australia
- AP: Japan
{
  "settings": {
    "api_url": "https://REGION.rest.logs.insight.rapid7.com",
    "api_url_log": "https://REGION.rest.logs.insight.rapid7.com/management/logs",
    "host": "REGION.data.logs.insight.rapid7.com",
    "api_url_logset": "https://REGION.rest.logs.insight.rapid7.com/management/logsets",
    "api_url_rule": "https://REGION.rest.logs.insight.rapid7.com/datahub/rules"
  }
}
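As a rough illustration of the REGION substitution, the following Python sketch builds the same settings object for a given region code. The function name build_settings is hypothetical, and lowercasing the region code is an assumption made here (DNS hostnames are case-insensitive); the keys and URL shapes are copied from the template above.

```python
import json

# Region codes from the list above. Lowercasing them is an assumption of this
# sketch; hostnames are case-insensitive, so either case should resolve.
REGIONS = {
    "us": "United States",
    "eu": "Europe",
    "ca": "Canada",
    "au": "Australia",
    "ap": "Japan",
}


def build_settings(region):
    """Return the settings object with REGION replaced by the given code."""
    code = region.lower()
    if code not in REGIONS:
        raise ValueError(f"Unknown region: {region!r}")
    rest = f"https://{code}.rest.logs.insight.rapid7.com"
    return {
        "settings": {
            "api_url": rest,
            "api_url_log": f"{rest}/management/logs",
            "host": f"{code}.data.logs.insight.rapid7.com",
            "api_url_logset": f"{rest}/management/logsets",
            "api_url_rule": f"{rest}/datahub/rules",
        }
    }


if __name__ == "__main__":
    # Print the block to paste into datahubLocal.config for a US account.
    print(json.dumps(build_settings("US"), indent=2))
```

Generating the block this way avoids missing one of the five REGION placeholders when editing by hand.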
Create Routing Rules
Routing rules allow you to find log entries that match certain patterns or values and route them to specific logs and log sets. You configure a routing rule by defining the pattern you want to match and the logs you want to send that data to.
After you set up DataHub, you will be able to access the Manage Routing Rules area to create and view rules. When you first set up DataHub, there will be one rule called Default. This rule routes all log data collected by DataHub to a log called Default.
To create a routing rule:
- From the left menu in InsightOps, go to Manage Data Routes.
- Click Add route.

- When the Add Route panel appears, enter a name for the rule.
- Use the following parameters to create your rule:
  - Route Name: Any string that names the route.
  - Route Description: Any string that describes the route.
  - Syslog Hostname: If needed, you can break out routes to specific logs based on hostname. Enter the hostname, such as the name of a mail server.
  - Syslog Tag: If needed, you can break out routes to specific logs based on process. Enter the process name, such as sshd.
  - Matching Pattern: The pattern DataHub uses to select the log entries to place into the specified log. To forward all logs from a specific host, use ".*".
  - Log Name and Destination Log Set: Specify the log and log set that should receive the matching log entries.
- Create the route. It will appear in the route table after it is created.
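To make the route parameters more concrete, here is a minimal Python sketch of how a hostname, a tag, and a matching pattern could combine to pick a destination log. It is not DataHub's implementation: the Route class, the pick_route function, the first-match-wins ordering, and the use of Python's re.search semantics are all assumptions made for illustration. Only the parameter names and the fallback log called Default come from the text above.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """Hypothetical stand-in for one routing rule."""
    name: str
    pattern: str                    # Matching Pattern (a regular expression)
    log_name: str                   # destination log
    hostname: Optional[str] = None  # Syslog Hostname filter, if any
    tag: Optional[str] = None       # Syslog Tag filter, if any


def pick_route(routes, hostname, tag, message, default_log="Default"):
    """Return the log name of the first route that accepts the entry."""
    for route in routes:
        if route.hostname and route.hostname != hostname:
            continue  # entry came from a different host
        if route.tag and route.tag != tag:
            continue  # entry came from a different process
        if re.search(route.pattern, message):
            return route.log_name
    # No route matched: fall back to the Default log.
    return default_log


routes = [
    Route("ssh failures", r"Failed password", "ssh-failures", tag="sshd"),
    Route("all mail", r".*", "mail-all", hostname="mail01"),
]
```

With these two example routes, an sshd entry containing "Failed password" would land in ssh-failures, every entry from mail01 would land in mail-all, and anything else would fall through to Default.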
Configure Clients
InsightOps agents, as well as syslog clients, can communicate through DataHub. For the agent, you need to supply the IP address or hostname and the port used by DataHub.
For syslog clients, you need to specify the IP address or hostname of the DataHub server in the client's syslog configuration file, such as /etc/rsyslog.conf.
To configure syslog clients:
- On the DataHub server, run the following command to find its IP address:
/sbin/ifconfig
- Add the following line to the /etc/rsyslog.conf file of each client:
*.* @@<IPADDRESS_TO_DATAHUB>:10000
- Run the following command to restart the rsyslog daemon:
sudo service rsyslog restart
- Run the following command to test the integration:
logger -t test from InsightOps
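If you want to check connectivity without going through rsyslog, the following Python sketch sends one test line directly to DataHub over TCP on port 10000, the port used in the forwarding line above. It assumes DataHub accepts newline-terminated, RFC 3164-style syslog lines over TCP, which is what the @@ forwarding syntax normally produces; the function names are hypothetical.

```python
import socket
from datetime import datetime

DATAHUB_PORT = 10000  # port taken from the rsyslog forwarding line


def format_syslog_line(tag, message, hostname=None, pri=14, now=None):
    """Build one RFC 3164-style syslog line terminated by a newline.

    pri=14 is facility "user" (1) with severity "info" (6): 1 * 8 + 6.
    """
    hostname = hostname or socket.gethostname()
    now = now or datetime.now()
    # RFC 3164 timestamps pad single-digit days with a space, e.g. "Jan  5".
    timestamp = f"{now.strftime('%b')} {now.day:2d} {now.strftime('%H:%M:%S')}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}\n"


def send_test_message(datahub_host, port=DATAHUB_PORT,
                      tag="test", message="from InsightOps"):
    """Open a TCP connection to DataHub and send a single test line."""
    line = format_syslog_line(tag, message)
    with socket.create_connection((datahub_host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))
    return line
```

Calling send_test_message with the IP address of your DataHub server should produce an entry similar to the one from the logger command. A connection error here points to a network or firewall problem rather than an rsyslog misconfiguration.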
View Health and Debugging Logs
By default, DataHub will log extra data to your InsightOps account.
The following data will be added:
- datahub.log: The DataHub log, which you can use to debug any issues with a running DataHub. It can also help you keep up to date with your current version of DataHub. You can access the log at /var/log/datahub/datahub.log.
- Heartbeat.log: The Heartbeat log is a health check sent to DataHub every 5 minutes. It posts a simple message to your log to let you know that DataHub is currently logging. With this feature, you can use Inactivity Alerting to let you know when DataHub is no longer sending logs to InsightOps.
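The idea behind alerting on a missing heartbeat can be sketched in a few lines of Python. This is not how Inactivity Alerting is implemented; it only illustrates the check. The 5-minute interval comes from the description above, while the function name and the choice to wait for two missed beats before flagging are assumptions.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=5)  # interval stated above


def heartbeat_overdue(last_heartbeat, now, missed_beats=2):
    """Return True when more than `missed_beats` intervals have elapsed.

    Waiting for more than one missed beat (an assumption of this sketch)
    avoids flagging a heartbeat that is only slightly delayed.
    """
    return now - last_heartbeat > HEARTBEAT_INTERVAL * missed_beats


last = datetime(2024, 1, 1, 12, 0)
print(heartbeat_overdue(last, datetime(2024, 1, 1, 12, 9)))   # within 10 minutes
print(heartbeat_overdue(last, datetime(2024, 1, 1, 12, 11)))  # past 10 minutes
```

When you configure an inactivity alert on the Heartbeat log, a window somewhat longer than one 5-minute interval serves the same purpose.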