Configuring data warehousing settings

Database support

Currently, only PostgreSQL 9.4 or higher databases are supported as a warehousing target.

You can configure the Security Console to export data into an external data warehouse. You can use this feature to obtain a richer set of data for integration with your own internal reporting systems, such as Business Intelligence tools. The export performs an extract, transform, and load (ETL) process into the target warehouse using a dimensional model.

You can view the schema for this model here.

Scheduling

The frequency of the ETL process to the external warehouse should be configured with your reporting needs in mind. The export frequency determines the granularity of data points available for trending in the historical fact tables. Due to the amount of data that can be exported, the warehousing process may take some time to complete. The recommended schedule setting is every 1 week. Schedule this export during non-critical scanning windows to minimize impact.

Configuration

Data warehousing can be configured by a Global Administrator. Before configuring the Security Console settings, ensure that the destination warehouse database server has been configured (for more information, see Deploying and Configuring the Warehouse). To configure data warehouse export settings:

  1. Click Manage data warehousing under Data Warehousing on the Administration page.
  2. Enter database server settings on the Database page.
    • Enable export - Indicates whether exporting is currently enabled.
    • Data model - Indicates the type of warehouse schema to be used. Dimensional is recommended and the Legacy model is deprecated.
    • Server address - The IP or host name of the target warehouse.
    • Server port - The port the target warehouse is accepting external connections on.
    • Database name - The name of the database to export the model into.
    • User/Password - The credentials of the user to perform the export as. This user must have write access to the database.
    • Encrypt data in transit - If enabled, an SSL connection to the target database is used during the ETL process. This ensures that all data transmitted to the warehouse is encrypted in transit (Note: The warehouse is not encrypted at rest by default). The recommended setting is enabled.
    • Validate server identity - If enabled, verifies the server identity when Encrypt data in transit is also enabled. If the server certificate is unsigned and this option is enabled, the export process will not function properly. Disabling this setting allows self-signed certificates to be trusted, but no longer protects against man-in-the-middle (MitM) attacks. The recommended setting is enabled.
  3. Test the connection using the Test Connection button. This attempts to establish a connection with the target warehouse database. Any errors are displayed, and you can adjust the settings or the target warehouse database until the connection succeeds. (An independent connectivity check using the psql client is sketched after these steps.)
  4. Go to the Schedule page to configure the export frequency.
    • Select a date and time to start the export process.
    • Select an interval at which to repeat this process. The recommended setting is every 1 week. If you want to run it more frequently, we recommend running it no more often than every 24 hours.
  5. Click Save.
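After saving, it can also be useful to verify connectivity and permissions from outside the Security Console. The following is a minimal sketch using the standard psql client; the server address, database name, and user are placeholders for your own values:

    # Placeholder values; substitute your warehouse address, database, and user.
    psql "host=warehouse.example.com port=5432 dbname=warehouse user=warehouse_etl sslmode=require" \
      -c "SELECT current_user, version();"

Remember that the export user needs write access to the database (see the User/Password setting above), so confirming that user's privileges is also worthwhile.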

Upgrading from the Legacy Model

The following are recommended if you have an existing data warehouse configuration in place:

  1. Change the Data model from Legacy to Dimensional: This changes the structure of the output schema to an easier-to-use and more comprehensively supported data model. You will need to update any consumers of the warehouse to use the new model. When the new model is used, the existing schema elements remain untouched, but they will no longer be updated during future ETL processes.
  2. Enable encryption and identity validation: These settings were not previously supported, meaning data was not encrypted in transit during the export process. Encrypting data in transit is recommended going forward, so you are encouraged to enable the Encrypt data in transit and Validate server identity settings. These changes require a reconfiguration of SSL in the destination warehouse database. See the Deploying and Configuring the Warehouse section for more information.

Schema Changes

The dimensional warehouse schema is guaranteed to be backwards compatible when changes are made. The ETL process performed by the Security Console may periodically add additional data elements to the schema, but this will not cause any reports or queries against this schema to break in the future. The following will not be subject to change in the schema:

  • Available tables, columns, and functions, including their names
  • Column and function data types

The following changes may be made in future iterations of the ETL process:

  • Additional columns are added to an existing table
  • New tables or functions are added

When changes are made to the model, applying a product upgrade and performing a new ETL process will upgrade the model in the target warehouse.
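Because new columns and tables may appear after an upgrade, it can be useful to check what a warehouse table currently exposes before updating reports. The following is a minimal sketch using the standard information_schema catalog; the table name fact_asset is an assumption used only for illustration:

    -- List the columns currently exposed by a warehouse table after an upgrade.
    -- 'fact_asset' is a hypothetical table name; substitute a table from the published schema.
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'fact_asset'
    ORDER BY ordinal_position;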

Reporting on the Warehouse

After the export process completes, the data warehouse is immediately available for reporting using direct connections, business intelligence tools, or custom tools, scripts, and off-the-shelf software. During the export (ETL) process, numerous DDL and DML queries are executed that manipulate the state of the warehouse, so the warehouse should not be accessed while an export is in progress.
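A typical reporting query against the dimensional model joins a dimension table to a fact table. The sketch below is illustrative only; the table and column names are assumptions, so consult the published schema for the actual model:

    -- Illustrative only: table and column names are assumptions, not the documented schema.
    SELECT da.host_name, fa.vulnerabilities, fa.riskscore
    FROM fact_asset AS fa
    JOIN dim_asset AS da ON da.asset_id = fa.asset_id
    ORDER BY fa.riskscore DESC
    LIMIT 25;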

Deploying and Configuring the Warehouse

The data warehouse is a host running a PostgreSQL 9.4 or later database server. Management and configuration of the data warehouse server must be performed manually. The warehouse must be configured to accept external connections on the PostgreSQL database port and to allow ingress network traffic from the Security Console. Configuration of the warehouse for optimum performance varies based on the number of simultaneous connections needed, as well as the disk speed and available RAM.
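Accepting connections from the Security Console typically involves setting listen_addresses in postgresql.conf and adding a pg_hba.conf entry for the console's address. The following is a sketch only; the console address, database name, and user are placeholders:

    # postgresql.conf: listen on the interface the Security Console will connect to
    listen_addresses = '*'

    # pg_hba.conf: allow the Security Console (placeholder address 10.0.0.5) to connect
    # to the warehouse database over SSL using password authentication
    hostssl  warehouse  warehouse_etl  10.0.0.5/32  md5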

Hardware Requirements:

  • 2 GHz+ processor (Quad-core processor recommended)
  • 32 GB RAM (minimum), 72 GB+ RAM (recommended)
  • 1 TB HDD (minimum), 2 TB+ HDD (recommended)
  • 100 Mbps network interface (minimum), 1 Gbps (recommended)

Install and configure a new data warehouse

  1. Install PostgreSQL 9.4 or later, ensuring all available patches are applied.
  2. Configure postgresql.conf with the following recommended minimum settings (adjust according to your hardware and connection requirements):

All PostgreSQL versions:

  Setting                       | 32 GB RAM | 72 GB RAM
  ------------------------------|-----------|----------
  max_connections               | 10        | 20
  shared_buffers                | 8 GB      | 18 GB
  work_mem                      | 419 MB    | 471 MB
  maintenance_work_mem          | 2 GB      | 2 GB
  checkpoint_segments           | 128       | 256
  effective_cache_size          | 24 GB     | 54 GB
  checkpoint_completion_target  | 0.9       | 0.9
  wal_buffers                   | 16 MB     | 32 MB
  autovacuum                    | off       | off

PostgreSQL 9.6+:

  Setting                          | 32 GB or 72 GB RAM
  ---------------------------------|-------------------------
  min_parallel_relation_size       | 8 MB
  force_parallel_mode              | on
  max_worker_processes             | number of CPU cores * 2
  max_parallel_workers_per_gather  | number of CPU cores / 2

  3. To enable SSL (and encryption of data in transit), acquire a certificate and enable the following in the postgresql.conf file (a self-signed certificate example is sketched after these steps):
    • ssl = on
    • ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
    • ssl_cert_file = 'server.crt'
    • ssl_key_file = 'server.key'
    • password_encryption = on
  4. Launch the PostgreSQL process.
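If you do not have a certificate issued by a trusted certificate authority, a self-signed certificate can be generated for testing. With a self-signed certificate, the Validate server identity option in the Security Console will not work unless the certificate is trusted, as described above. The following is a minimal sketch using openssl with a placeholder hostname; place the resulting files in the PostgreSQL data directory:

    # Generate a self-signed certificate and key (placeholder hostname), then
    # restrict the key's permissions so PostgreSQL will accept it.
    openssl req -new -x509 -days 365 -nodes -text \
      -out server.crt -keyout server.key -subj "/CN=warehouse.example.com"
    chmod 600 server.key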

Troubleshooting steps for Single-User Mode

If the console goes into maintenance mode with the following transaction ID wraparound error, the solution is to log in to the database in single-user mode.

WHERE seng_id = ?]; ERROR: database is not accepting commands to avoid wraparound data loss in database "nexpose" Hint: Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared transactions, or drop stale replication slots.; nested exception is org.postgresql.util.PSQLException: ERROR: database is not accepting commands to avoid wraparound data loss in database "nexpose" Hint: Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared transactions, or drop stale replication slots. at org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105) ~[spring-jdbc-4.2.4.RELEASE.jar:4.2.4.RELEASE]

During this procedure you might encounter errors related to the PostgreSQL process ID (PID) file. If the database is already running, you will not be able to start a single-user session. Stop the database first, then log in as a single user to perform the vacuum process. Complete the following steps to resolve the issue.

  1. To stop the database and clear the PID error, enter the following command into the console:
    sudo -u nxpgsql /opt/rapid7/nexpose/nsc/nxpgsql/bin/pg_ctl -D /opt/rapid7/nexpose/nsc/nxpgsql/nxpdata/ stop

  2. To verify that the process has stopped, enter the following command:
    ps -eaf | grep nxpgsql

  3. Change to the directory containing the PostgreSQL binaries:
    cd /opt/rapid7/nexpose/nsc/nxpgsql/pgsql/bin

  4. To start the database in single-user mode, enter the following command:
    sudo -u nxpgsql ./postgres --single -D /opt/rapid7/nexpose/nsc/nxpgsql/nxpdata/ nexpose

  5. To run the vacuum process in the database, enter the command:
    VACUUM (FULL,ANALYZE,VERBOSE);

  6. After the vacuum process completes, restart the service. The database will come back online and the console will function properly.
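If the service to restart is the bundled PostgreSQL instance, one way to start it again, mirroring the stop command from step 1 and assuming the same installation paths, is to exit the single-user backend (for example with Ctrl+D) and run:

    sudo -u nxpgsql /opt/rapid7/nexpose/nsc/nxpgsql/bin/pg_ctl -D /opt/rapid7/nexpose/nsc/nxpgsql/nxpdata/ start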