Evaluate GenAI responses
Use the Evaluate GenAI response step to assess AI-generated text using configurable evaluation methods, metrics, and pass levels.
Evaluate GenAI response steps require an Aviator license.
Overview
Because AI responses are non-deterministic, validating their quality requires metric-based evaluation instead of exact string matching.
Evaluate GenAI response steps let you customize the evaluation using multiple quality dimensions. For each metric, you set a minimum expected level on a scale from 1 (Fair) to 4 (Excellent). The step passes only if all selected metrics meet their configured minimum level.
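The pass rule can be pictured as a simple all-metrics check. The following is a conceptual sketch only, not product code; the metric names, scores, and function are hypothetical and stand in for the product's internal scoring from 1 (Fair) to 4 (Excellent):

```python
def step_passes(scores, minimum_levels):
    """Return True only if every selected metric meets its configured minimum level."""
    return all(scores[metric] >= level for metric, level in minimum_levels.items())

# Hypothetical scores assigned during a run:
scores = {"Semantic Alignment": 4, "Key Element Coverage": 2}

# Minimum expected levels you configured for the step:
minimums = {"Semantic Alignment": 3, "Key Element Coverage": 3}

step_passes(scores, minimums)  # False: Key Element Coverage scored below its minimum of 3
```

One metric falling short is enough to fail the step, even if every other metric scores Excellent.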
Currently, the AI response that you want to evaluate must first be stored in an input or output parameter.
The following evaluation methods are available:
| Evaluation method | Purpose | Metrics |
|---|---|---|
| Evaluate compared to a baseline response | Compare the generated response against a baseline response that you provide. | Semantic Alignment, Key Element Coverage, Format Alignment, Avoidance of Deviations |
| Evaluate based on context you provide | Evaluate the quality of the response based on context you provide, such as the original prompt or additional background information. | Relevance and Accuracy, Completeness, Absence of Hallucination |
| General metrics | Assess the overall quality of the generated response without providing reference text. | Fluency & Grammar, Coherence, Bias & Fairness, Safety & Compliance |
Tip: Semantic alignment evaluates meaning and intent, not exact wording. A response that conveys the right information in different words can still pass the semantic alignment metric.
Create an Evaluate GenAI response step
Create the Evaluate step in the Editor and configure it in the step editing pane.
To create an Evaluate GenAI response step:

1. Make sure that the text you want to evaluate is stored in a parameter before the Evaluate step runs.

   For example: Trigger AI generation in your application, then store the generated text in an output parameter. For details, see Store a value in an output parameter. Use this parameter in the Evaluate step.

   For more complex cases, such as multi-line text, use a function to retrieve the value from the application and store it in an output parameter. For details, see Custom functions.

2. In the script Editor, click the Add step button and select Evaluate GenAI. The editor adds the step template. Alternatively, you can type the command manually.

   Syntax: `Evaluate GenAI response in @<parameter name>`

3. In the Editor or the step editing pane, specify the parameter to evaluate. For example, if your parameter is named GeneratedText: `Evaluate GenAI response in @GeneratedText`

   Note: The Current screen option is currently unavailable.

4. In the step editing pane, configure one or more evaluation methods and metrics. For each method you configure, provide any required information or guidelines, and select one or more metrics. For details, see Configure evaluation methods.

5. If you want the test run on the current application to stop when the Evaluate step fails, select Stop test run upon failure. An indicator is added to the step. If you don't select this option, the step failure is reported but the run continues. (Default: the run does not stop)
After the script runs, the report for this step shows the evaluation status and metric results, including scores and explanations per metric.
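The Stop test run upon failure behavior described above can be sketched as follows. This is a conceptual illustration only, not product code; the step structure and field names are hypothetical:

```python
def run_steps(steps):
    """Run steps in order. A failing step aborts the run only if it is
    marked stop_on_failure; otherwise its failure is reported and the
    run continues with the next step."""
    failures = []
    for step in steps:
        if not step["passed"]:
            failures.append(step["name"])
            if step.get("stop_on_failure"):
                break  # remaining steps on this application do not run
    return failures
```

For example, if an Evaluate step with Stop test run upon failure fails mid-script, the steps after it are skipped; without the option, the same failure only appears in the report.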
Configure evaluation methods
Configure one or more evaluation methods for an Evaluate GenAI response step. This determines which evaluation criteria the script uses for your AI-generated text.
To configure evaluation methods:

1. In the step editing pane of an Evaluate GenAI response step, click an evaluation method.

2. Depending on the method you configure, enter the following information:

   | Evaluation method | Required information |
   |---|---|
   | Evaluate compared to a baseline response | In the Baseline response field, enter the response to use for comparison. To reference a script parameter, use the `@<parameter name>` syntax. |
   | Evaluate based on context you provide | Provide at least one of the following: in The input provided to the AI in the application, enter the prompt or request used to generate the response; in Additional context, enter background information to use as context for the evaluation. To reference a script parameter, use the `@<parameter name>` syntax. |
   | General metrics | No additional information required. |

3. Select one or more evaluation metrics to check.

   If you configure an evaluation method but do not select any metrics, the details you configured are saved for future use, but the evaluation method is ignored during the script run.

4. For each selected metric, set a Minimum expected level from 1 (Fair) to 4 (Excellent). The step passes only if all selected metrics meet their configured minimum level.

5. (Optional) If the results were not satisfactory when you ran the script with the initial configuration, add Additional guidelines to refine the evaluation. To reference a script parameter, use the `@<parameter name>` syntax.

6. Click Save. The evaluation method dialog box closes.

The step editing pane shows which metrics you selected and the minimum expected level configured for each metric.
To modify configured evaluation methods:

- To edit the details of a configured evaluation method, click the More options button and select Edit.
- To remove a specific metric from a configured method, click the x on the metric's chip.
- To remove all configuration details for an evaluation method, click the More options button and select Reset.