Batch Testing is a comprehensive testing framework that evaluates and validates the accuracy of an AI Agent’s intent detection. It enables users to systematically test their AI Agent’s ability to understand user requests across multiple conversation types, including dialogs, FAQs, Knowledge (Search AI), and conversation intents. It also supports different model configurations and provides comprehensive performance metrics for both development and production environments. Unlike traditional testing approaches, Batch Testing replicates the complete DialogGPT runtime pipeline, providing authentic performance insights that mirror real user interactions.

Key Features

  • End-to-End Pipeline Testing: Processes each utterance through the full retrieval and LLM workflow, mirroring real-world behavior to uncover issues static testing might miss.
  • Model Configuration Flexibility: Supports testing across different combinations of embedding models and LLMs to identify the most effective configuration for your app.
  • Granular Performance Insights: Measures accuracy, precision, recall, and F1 score across all conversation types, including Dialogs, FAQs, Knowledge, and Conversation Intents.
  • Lifecycle Support: Enables batch testing for both in-development and published apps, including Standard and Multi-App Routing, allowing validation at any stage of the deployment lifecycle.

Supported Conversation Types

  • Single Intent
  • Multi Intent
  • Small Talk
  • Conversation Intent
  • No Intent
  • Ambiguous Intent
  • Answer Generation

How Batch Testing Works

Batch Testing replicates actual runtime behavior by chaining retrieval and LLM calls, ensuring each test case goes through the complete conversation pipeline:
  1. Query Rephrasing (if enabled).
  2. Chunk Qualification from Dialogs, FAQs, and Search Index.
  3. Semantic Similarity Matching based on configured thresholds.
  4. LLM Processing for intent identification and fulfillment type determination.
This approach provides dynamic testing that mirrors real user interactions, enabling accurate performance evaluation across different model configurations.
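The four stages above can be sketched as a chain of functions. Everything below is illustrative: the real engine uses embedding-based retrieval and an LLM, which are stubbed here with simple token overlap and a trivial rule, and all function names are hypothetical.

```python
# Illustrative sketch of the batch-test pipeline stages (all names hypothetical).

def rephrase(query: str, enabled: bool = False) -> str:
    # Stage 1: query rephrasing (pass-through when disabled).
    return query.strip().lower() if enabled else query.strip()

def qualify_chunks(query: str, index: list[str], threshold: float = 0.3) -> list[str]:
    # Stages 2-3: retrieve candidate chunks from the index and keep those
    # whose similarity to the query clears the configured threshold.
    # Token overlap stands in for embedding similarity here.
    def similarity(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0
    return [c for c in index if similarity(query, c) >= threshold]

def detect_intent(query: str, chunks: list[str]) -> str:
    # Stage 4: LLM processing, stubbed as a trivial rule.
    return chunks[0] if chunks else "No Intent"

def run_test_case(utterance: str, expected: str, index: list[str]) -> bool:
    # One test case through the full chain: rephrase, qualify, detect, compare.
    query = rephrase(utterance)
    detected = detect_intent(query, qualify_chunks(query, index))
    return detected == expected
```

Each test case flows through the same chain a live user query would, which is what distinguishes this from static intent matching.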

Validate Specific Conversational Intent Types

The Batch Testing framework enables you to explicitly validate specific Conversational Intent Types — including Hold, Restart, Refuse, End, Agent Transfer, and Repeat — within the Conversation Intent fulfillment category. This helps you test and verify how each conversational action is recognized and processed. During execution, the batch testing engine performs granular validation by comparing expected and detected conversational intent types. Test results display both values to help identify mismatches and ensure accurate dialog handling.

Define Expected Intents

Specify expected intents using the following format:
  • Hold: ConversationIntent-Hold
  • Restart: ConversationIntent-Restart
  • Refuse: ConversationIntent-Refuse
  • End: ConversationIntent-End
  • Agent Transfer: ConversationIntent-AgentTransfer
  • Repeat: ConversationIntent-Repeat
  • JSON/CSV Upload: When importing test cases via JSON or CSV, define the expected intent using the format above. Download the sample CSV or JSON templates when creating a test suite.
  • Quick Entry: The Expected Intent dropdown lists the predefined conversational intent types. These options appear when the fulfillment type is set to Conversation Intent.
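When preparing an upload file, the expected intent column carries the `ConversationIntent-*` values shown above. A minimal sketch in Python follows; the column headers used here are placeholders, so rely on the sample CSV template downloaded from the test-suite dialog for the exact format.

```python
import csv
import io

# Illustrative test-case rows; the header names are assumptions,
# not the documented template columns.
rows = [
    {"utterance": "Please wait a moment",
     "fulfillment_type": "Conversation Intent",
     "expected_intent": "ConversationIntent-Hold"},
    {"utterance": "Transfer me to a human",
     "fulfillment_type": "Conversation Intent",
     "expected_intent": "ConversationIntent-AgentTransfer"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["utterance", "fulfillment_type", "expected_intent"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

The same rows could equally be serialized as JSON; only the expected-intent values need to match the formats in the table.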

Access

Go to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.

Step 1 — Create a Test Suite

To conduct a batch test, create a test suite. Each test suite comprises multiple test cases, each containing key fields such as user utterance, expected intent, linked app, and fulfillment type.

Upload a File

This method adds multiple test cases simultaneously. Download the sample CSV or JSON file formats while creating the test suite.
For Multi-App Routing, you must enter the linked app name in addition to the utterance, fulfillment category, and intent.
  1. Go to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
  2. Click +New Test Suite.
  3. Enter the test Name and Description.
  4. Click Upload File, select the file, and click Add to Suite.
  5. Click Create Suite. The created test is displayed.

Quick Entry

Add one test case at a time using a form. The form includes mandatory fields: User Utterance, Fulfillment Type, and Expected Intent.
The Expected Intent field behaves differently depending on the fulfillment type:
  • Answer Generation: Expected intent is automatically set to Answer Generation.
  • Multi Intent: Add up to five intents and reorder them in execution order.
  • Ambiguous Intent: Add a minimum of two and a maximum of five intents.
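These per-type constraints can be expressed as a simple pre-entry check. The validator below is a sketch under assumed type names and is not part of the product:

```python
def validate_intents(fulfillment_type: str, intents: list[str]) -> bool:
    """Check expected-intent counts against the Quick Entry rules (sketch)."""
    n = len(intents)
    if fulfillment_type == "Answer Generation":
        # Expected intent is fixed to Answer Generation.
        return intents == ["Answer Generation"]
    if fulfillment_type == "Multi Intent":
        return 1 <= n <= 5   # up to five intents, listed in execution order
    if fulfillment_type == "Ambiguous Intent":
        return 2 <= n <= 5   # minimum of two, maximum of five
    return n == 1            # a single expected intent otherwise
```

A check like this catches malformed test cases before a suite is created, rather than at run time.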
  1. Go to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
  2. Click +New Test Suite.
  3. Enter the test Name and Description.
  4. Click Quick Entry.
  5. Based on your app type:
    • Standard App — Enter the User Utterance, then select the Fulfillment Category and Expected Intent.
    • Multi-App Routing — Enter the User Utterance, then select the Fulfillment Category, Linked App, and Expected Intent.
  6. Click Save and add another for the next test cases, or click Add to Suite.
  7. Click Create Suite. The created test is displayed.

Step 2 — Run Test Suite

After creating a test suite, run it through the complete retrieval and LLM pipeline to simulate live interactions using a set of model configurations. You can run a test suite against both in-development and published versions, and add notes to record the purpose of the test run.
The embedding model cannot be changed. For testing purposes, the DialogGPT embedding model is used.
  1. Go to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing.
  2. Click Run Test Suite for the required suite.
  3. Select the App Version, Orchestration Model, Prompt, and add Notes if required.
  4. Click Run Test to start batch test execution.
  5. Once complete, the results are displayed.

Step 3 — Results and Analysis

The Results and Analysis stage evaluates performance using standardized intent detection metrics, presenting all batch test results conducted so far. Compare different combinations of embedding and language models and make data-driven decisions using key metrics.
  • Accuracy: Overall correctness of intent detection.
  • Precision: Ratio of correctly identified intents to total identified.
  • Recall: Ratio of correctly identified intents to total expected.
  • F1 Score: Harmonic mean of precision and recall.
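These metrics reduce to standard formulas over matches between expected and detected intents. The per-intent sketch below is illustrative; the function name and inputs are assumptions, not the product's internals:

```python
def intent_metrics(expected: list[str], detected: list[str], intent: str):
    """Overall accuracy plus per-intent precision, recall, and F1 (sketch)."""
    pairs = list(zip(expected, detected))
    tp = sum(e == intent and d == intent for e, d in pairs)  # true positives
    fp = sum(e != intent and d == intent for e, d in pairs)  # false positives
    fn = sum(e == intent and d != intent for e, d in pairs)  # false negatives
    accuracy = sum(e == d for e, d in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

For example, if one of two expected "A" utterances is detected as "A" and the rest match, precision for "A" is 1.0 while recall is 0.5, and F1 sits between them as their harmonic mean.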
  1. Go to Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing and click the test suite to view tests.
  2. Click the summary icon to view the result. You can download the report as a CSV file or delete results.
  3. The test result is displayed.
  4. Click Configure View to add or remove displayed metrics. Select the metric and click Apply.
  5. Click any Intent to view intent details.
  6. Click any Test Case to view test case details.
  7. Click Conversation Orchestration to view the request and response payloads.