Troubleshoot a Workflow
Workflow failures in InsightConnect can originate in many places, ranging from an error in the workflow to a temporary outage in a service that the workflow relies on. Putting the error in context is a necessary step to determine the source of the failure.
Troubleshooting Overview
InsightConnect workflows can be triggered in 3 ways:
- Through the Test button in the builder
- Manually on the Active Workflows page
- In response to the workflow's trigger
In the latter two cases (by a workflow's trigger or manually from the Active Workflows page), a job will be created on the Jobs page. However, when testing a workflow from the builder, a job record is not created.
The data available for troubleshooting is the same regardless of how the workflow is triggered. However, workflow tests that run from the builder are not persisted, so you must keep the page open during troubleshooting to reference the data.
Testing a Workflow from the Builder
A number of checks are automatically performed on a workflow before it can be activated. These checks ensure that a workflow is internally consistent and that all steps are fully configured. If a step fails a check, it turns orange in the builder interface.
You can test a workflow in the builder while these internal checks are failing. If a misconfigured step is executed, it will result in a test failure.
If you're using the test functionality in the builder, ensure that all steps are green before following the troubleshooting steps for the scenario that matches your issue:
- The workflow was working in the past, but now every job is failing
- The workflow isn't being triggered (no jobs being created)
- The workflow is generating a failed job, either continually or intermittently
The workflow was working in the past, but now every job is failing
If the workflow hasn't changed, but it has started failing persistently, the problem is likely outside the workflow. Take the following steps:
- Check that the orchestrator is healthy under Settings > Orchestrators. If the orchestrator is unhealthy, see the Troubleshoot an Orchestrator documentation for further instructions.
- Check the credentials the workflow is using. Run a connection test under Settings > Plugins & Tools for all connections that the workflow utilizes.
- Follow the instructions for troubleshooting a failing workflow.
The workflow isn't being triggered (no jobs being created)
If a workflow isn't triggering at all, the problem is most likely with the trigger. The troubleshooting steps vary depending on the type of trigger.
InsightIDR and InsightVM Platform Triggers
InsightIDR and InsightVM triggers are configured in their respective products. The first thing to check is that the trigger is properly configured in those interfaces. This does not include the InsightVM plugin trigger. For the InsightVM plugin trigger, follow the instructions in the Plugin Triggers section to troubleshoot.
InsightIDR Workflow Visibility
A custom workflow with an InsightIDR trigger only appears in InsightIDR if it is active in InsightConnect. If you've created an InsightIDR workflow that isn't showing up, ensure it is active.
If a workflow is triggering but the expected data is missing, it means that InsightIDR or InsightVM isn't sending the data to InsightConnect. It is likely that the alert triggering the workflow is misconfigured.
Plugin Triggers (including InsightVM plugin-based triggers)
Plugin-based triggers run on the orchestrator. They connect to a remote system or service and monitor for specific conditions. When the conditions are met, they trigger the workflow.
Steps to troubleshoot a plugin-based trigger:
Access to Orchestrator Required
You need access to the orchestrator (using SSH or the virtual machine console) to read the logs of a plugin-based trigger. This access is required once you have verified the trigger's configuration. Familiarity with Linux and Docker is strongly recommended.
- Check the health of the orchestrator under Settings > Orchestrators. If the orchestrator isn't showing as "healthy", there is an issue with the orchestrator itself. Use the Troubleshoot an Orchestrator documentation to get the orchestrator into the healthy state.
- Run a connection test on the plugin using the three-dot menu on the Plugins & Tools page under Settings. Resolve any errors indicated.
- Verify that the trigger configuration in the workflow is correct. The error is often due to the configuration causing the trigger to monitor the wrong data (such as the wrong mailbox) or discard the event you expect it to trigger on.
- Connect to the orchestrator (using SSH or the virtual machine console).
- The trigger runs as a Docker container on the orchestrator. Run the following command to see the active trigger containers:
sudo docker ps -a | grep trigger
- Identify the trigger based on the data displayed (the wider the window, the easier this will be). If you are unable to identify the trigger, try deactivating the workflow, wait about 2 minutes, and then reactivate it. This will restart the trigger. You'll see this reflected in the container's uptime when listing the active triggers.
- Use the container ID (the first column in the output) to examine the logs for the trigger:
sudo docker logs <container id>
The container logs are the diagnostic output from the trigger. You can add the --follow flag to watch them in real time:
sudo docker logs --follow <container id>
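If you run these commands often, they can be wrapped in small helper functions. The following is a minimal sketch, not part of the product: the function names, the DOCKER_CMD override, and the error keyword pattern are all assumptions made for illustration, and the strings a given trigger actually logs may differ. It assumes, as the grep command above does, that the word "trigger" appears somewhere in the container's listing.

```shell
#!/bin/sh
# DOCKER_CMD is a hypothetical override for trying the helpers elsewhere;
# by default they run the same "sudo docker" commands as the steps above.
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

# List every container (running or stopped) whose row mentions "trigger",
# printing only ID, image, name, and status so the rows fit a narrow terminal.
# Returns non-zero when no matching container exists.
list_trigger_containers() {
  $DOCKER_CMD ps -a --format '{{.ID}}  {{.Image}}  {{.Names}}  {{.Status}}' |
    grep -i trigger
}

# Print recent log lines for one container that look like errors.
# $1 = container ID, $2 = how far back to look (default: 1h).
# The keyword pattern is a guess, not a list of known trigger messages.
recent_trigger_errors() {
  container_id="$1"
  since="${2:-1h}"
  # docker logs replays both stdout and stderr, so merge them before filtering.
  $DOCKER_CMD logs --since "$since" "$container_id" 2>&1 |
    grep -iE 'error|exception|traceback|timed out|timeout|denied|unauthorized'
}
```

For example, after defining the functions in your shell, `list_trigger_containers` shows the candidate containers and `recent_trigger_errors <container id> 30m` narrows the last 30 minutes of that container's log to suspicious lines. An empty result only means that none of the assumed keywords matched, not that the trigger is healthy.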
Logs are complicated - Support can help!
Looking at raw trigger logs can be a challenging task. If you're not comfortable looking at the logs, or just need a second set of eyes on them, don't hesitate to reach out to Rapid7 Support.
If the trigger container is not present, refer to the Troubleshoot an Orchestrator documentation and contact Rapid7 Support.
The workflow is generating a failed job, either continually or intermittently
Workflows are built with certain assumptions: the trigger contains certain data, the workflow can look up data in other systems and services, and so on. When a workflow generates failed jobs, it means these assumptions are not being met. Usually, but not always, this means the workflow must be modified to account for the condition that's causing it to fail.
Not All Failed Steps Are Fatal
Some workflows have steps that are designed to fail occasionally. These steps are set to "continue on failure" and do not terminate the workflow. These workflows have been specifically designed to accommodate that failed step. When looking for the step that caused the job failure, ensure that you're looking at the last failing step in the job.
- Under Jobs, select the job that failed.
- Select the All Outputs tab.
- Find the failed step in the job.
- Examine the Input tab of the step output. Verify that the input is both what you expect and what the plugin expects. A common problem is a missing or malformed input.
- If the input is incorrect, examine the workflow and determine where the input is coming from. Incorrect input usually means that a previous step is returning unexpected data.
- Examine the Logs tab of the step output. This log contains diagnostic information related to the failure, such as error messages returned from the underlying service.
- If the log indicates permission issues, run a connection test on the Settings > Plugins & Tools page and verify that the credentials are correct. If the connection test succeeds but the step is still failing due to permissions, it is likely the plugin doesn't have the correct permissions to perform the action.
- If the log indicates timeout issues, the underlying service may be experiencing a disruption, or the orchestrator may be having difficulty connecting to it.
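The last two checks amount to a small decision rule, which can be sketched as a shell function for log text you have copied out of the Logs tab into a file. This is an illustration only: the function name and the keyword patterns are assumptions, and real plugin logs may word permission and timeout errors differently.

```shell
#!/bin/sh
# Read step log text on stdin and print a suggested next step.
# The keyword patterns below are illustrative guesses, not known log formats.
suggest_next_step() {
  log_text="$(cat)"
  if printf '%s\n' "$log_text" | grep -qiE 'permission|forbidden|unauthorized|access denied'; then
    echo "permissions: run a connection test under Settings > Plugins & Tools"
  elif printf '%s\n' "$log_text" | grep -qiE 'timed out|timeout'; then
    echo "timeout: check the underlying service and the orchestrator's connection to it"
  else
    echo "unknown: capture the Input and Logs tabs and contact Rapid7 Support"
  fi
}
```

With the log saved to a hypothetical file named step_log.txt, `suggest_next_step < step_log.txt` prints one of the three suggestions. Permission keywords are checked first, so a log that mentions both kinds of problem is reported as a permissions issue.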
If you do not see any errors, or there are errors you cannot fix, please take a screenshot of your Inputs and Logs and reach out to Rapid7 Support.