Chaos testing for scenarios

You can add disruption events from chaos engineering platforms to your Controller scenarios. These events provide simulated attacks on your services and environments.

About chaos testing

Chaos testing enables you to see how your system responds under stress. By incorporating simulated attacks from chaos engineering platforms into your load tests, you can determine the impact of potential failures on your applications.

The simulated attacks can be on different technologies and components, for example, databases, web servers, CPU, or memory. To trigger the attacks, you add disruption events to your scenario. When you run the scenario, the Controller microservice component, EventsHandlerMicroservice, sends a request to the chaos platform to execute the predefined attacks. This enables the attacks to be launched on the system in parallel to running the scenario.

An attack generally impacts the regular workflow, limiting response or reducing performance. For example, the web server may work slower than usual, and there may be fewer successful transactions. You can compare the results of running your scenario without attacks against the results of running it when the system is stressed.

Controller works with two chaos engineering platforms, Gremlin and Steadybit.

Gremlin

Controller works with the Gremlin chaos engineering platform on the cloud. In Gremlin, you define scenarios containing multiple attacks, and add the Gremlin scenarios to your Controller scenarios.

The Gremlin agent must run on the AUT machines. Controller then uses integrated Gremlin APIs to coordinate the chaos testing.

Steadybit

Controller works with Steadybit on public cloud, or on-premises from a private cloud platform. In Steadybit, you define experiments containing multiple attacks, and add the Steadybit experiments to your scenarios.

The Steadybit agent must run on the AUT machines. Controller then uses integrated Steadybit APIs to coordinate the chaos testing.

When working with Steadybit, it is recommended to define multiple attacks in one experiment, rather than running multiple shorter experiments one after the other.

Add disruption events to scenarios

You connect the disruption events from the chaos engineering platform to your scenarios, to run in parallel with the scripts.

You can have both Gremlin and Steadybit events running in the same scenario.

To incorporate disruption events:

Set up your chaos engineering platform and configure attacks for the application you are testing.
Test the connection between the agent and the chaos platform before running the scenario in Controller.
Open the scenario. In the Controller Design tab, in the Scenario Schedule pane, click the Disruption Events button.
In the displayed Disruption Events dialog box, click Add Event.
In the displayed Add Event dialog box, select the relevant chaos engineering provider from the Type list.

Define the relevant details to connect to the selected provider:

Platform	Description
Gremlin	Enter the API Key and Team ID
Steadybit	Enter the API Key. If working with a private cloud platform, select the On-prem checkbox and enter the URL (IP address, plus port if relevant).

Once successfully connected, a Gremlin or Steadybit icon is displayed next to the Type field, and the table is populated with the relevant disruption events (attack scenarios or experiments defined in the selected provider).

The table displays:

the name and type of the event.
the team or email of the event creator.

Select the disruption events you want to use, and click Add.
Tip:
- You can use Search to find events by name. The list is filtered by your search term.
- For Gremlin events, you can filter by Attack Type or Technology Type.
- For Steadybit events, you can hover over an experiment name to see a description (if a description was added in Steadybit). You can also hover over an experiment type, to see a list of all actions it contains.
The selected event scenarios are displayed in the Disruption Events dialog box.
- Select the events you want to run for the current scenario, and define the required start time for each event.
  
  The start time is relative to the overall test scenario start time. The end time automatically adjusts when you modify the start time, based on the length of the event.
- The scheduling graph shows where each selected event is due to start and end, against the global Vusers schedule for the Controller scenario.
  
  Select an event from the legend to bold the event lines in the graph.
Note: A warning is displayed if:
- you select duplicate events in the list (events running the same attacks).
- events overlap, which may result in one event disrupting the other.
- the end time for an event is after the load test scenario run time. In this case, the event is forcefully stopped when the scenario ends.
If you leave your configuration as defined, the events still run despite the warning.
Configure continue on error.

In the Disruption Events dialog box, click Disruption Events Settings, then select or clear the checkbox for Continue scenario on disruption event error, according to the required behavior:
- Select the checkbox if the scenario should continue running if a scheduled disruption event fails.
- Clear the checkbox if the scenario should be forcefully stopped if a scheduled disruption event fails.
Your selection applies to all disruption events that are active during the scenario run.

Save your changes.
Click Save in the Disruption Events dialog box.
During the scenario run, the actual start and end times for each event are displayed as bars on the scenario graphs, enabling you to see how the chaos event impacts the monitored machine.

Click on the bars to highlight them and display the event name.

Note: In the graph settings, you can display or hide the event bars for each individual graph. You can also display or hide them for all graphs in the global graph settings.

The Output window displays a notification each time a disruption event starts or stops. It also reports any errors in the event run. For details, see Output Messages window.
After the scenario run, view the results in Analysis. The Chaos Events graph is displayed automatically, showing data for the disruption events.

The event start and end times are also shown on the Scenario Schedule in the Summary Report.
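
The scheduling rules in this procedure (an end time derived from the start time and the event length, plus the three warning conditions) can be illustrated with a short sketch. This is not Controller's implementation: the DisruptionEvent class, the schedule_warnings function, and the sample event names are hypothetical, and treating events with the same name as duplicates is a simplifying assumption (Controller checks for events running the same attacks).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DisruptionEvent:
    """Hypothetical model of one scheduled event (not a Controller type)."""
    name: str
    start_s: int      # offset from the scenario start, in seconds
    duration_s: int   # length of the event as defined in the chaos platform

    @property
    def end_s(self) -> int:
        # The end time follows the start time: end = start + duration.
        return self.start_s + self.duration_s


def schedule_warnings(events, scenario_run_s):
    """Return warning strings for duplicates, overlaps, and late end times."""
    warnings = []
    for a, b in combinations(events, 2):
        # Assumption: same name means same attacks.
        if a.name == b.name:
            warnings.append(f"duplicate: {a.name}")
        # Two intervals overlap when each starts before the other ends.
        if a.start_s < b.end_s and b.start_s < a.end_s:
            warnings.append(f"overlap: {a.name} / {b.name}")
    for e in events:
        # Such an event would be stopped when the scenario ends.
        if e.end_s > scenario_run_s:
            warnings.append(f"runs past scenario end: {e.name}")
    return warnings


events = [
    DisruptionEvent("cpu-stress", start_s=60, duration_s=120),
    DisruptionEvent("db-latency", start_s=150, duration_s=90),
    DisruptionEvent("net-blackhole", start_s=540, duration_s=120),
]
for warning in schedule_warnings(events, scenario_run_s=600):
    print(warning)
```

With a 600-second scenario, this sample reports one overlap (cpu-stress and db-latency) and one event that runs past the scenario end (net-blackhole). As in Controller, the warnings are informational only and nothing in the sketch prevents the schedule from being used.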

Disruption event logs

You can view the logs for the disruption events:

Microservice-related logs: <installdir>\Logs\EventsHandlerSrv\events_handler.log
Controller-related logs: %TEMP%\lr_disruption_events_err.log

Logging levels are set in these files:

%LR_PATH%\bin\disruption\*log
%LR_PATH%\bin\disruption\*config

The log levels, in order of increasing priority, are as follows:

.*config files	.*json files	Recommended values
All	Trace
Debug	Debug	For debugging
Info	Information
Warn	Warning	For regular workflow
Error	Error
Fatal	Critical
Off	None
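
As a rough illustration of how the two naming schemes in the table line up, the following sketch maps each .config level name to its .json equivalent and shows how a threshold filters messages. The names LEVELS, CONFIG_TO_JSON, and is_logged are hypothetical, and the threshold behavior (a message is written only if its level is at or above the configured level) is the usual convention for level-based logging, assumed here rather than taken from the Controller configuration files.

```python
# Level names from the table, lowest priority first:
# (.config file name, .json file name)
LEVELS = [
    ("All", "Trace"),
    ("Debug", "Debug"),
    ("Info", "Information"),
    ("Warn", "Warning"),
    ("Error", "Error"),
    ("Fatal", "Critical"),
    ("Off", "None"),
]

# Look up the .json name for a given .config name.
CONFIG_TO_JSON = dict(LEVELS)


def is_logged(message_level: str, configured_level: str) -> bool:
    """Return True if a message passes the configured threshold.

    Both arguments use the .config names. A message is kept when its
    level is at or above the configured level.
    """
    order = [config_name for config_name, _ in LEVELS]
    return order.index(message_level) >= order.index(configured_level)


print(CONFIG_TO_JSON["Warn"])        # .json name for the regular-workflow level
print(is_logged("Error", "Warn"))    # Error is above Warn
print(is_logged("Debug", "Warn"))    # Debug is below Warn
```

Under these assumptions, setting the level to Warn keeps warnings, errors, and fatal messages but drops debug and informational output, which matches the table's recommendation of Warn for the regular workflow and Debug for debugging.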

For more information, see the official documentation for Apache log4net and for .NET Core.