Batch Testing evaluates your app’s ability to correctly identify expected intents and entities from a set of utterances. It provides statistical analysis of ML model performance.
Go to: Automation AI > Virtual Assistant > Testing > Regression Testing > Batch Testing
Batch Testing supports the Zero-shot Model for intent detection. Ensure the Zero-shot ML Model feature is enabled and the ML Network Type is set to Zero-shot model.
Build a test suite of representative utterances first, then train against failures.
Update test suites regularly for high-usage utterances.
Publish only after thorough testing.
Keep intent names short (3–5 words), and avoid special characters and stop words.
Batch tests do not consider conversation context, so some results reported as False Negatives may be True Positives in live sessions.
The “count” in batch results refers to unique assertion statements, not CSV rows. Consecutive rows with the same utterance and different entity values count as one assertion.
Batch testing scores original input, not spell-corrected input.
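The assertion-counting rule above can be sketched in Python. This is only an illustration, not the platform's implementation: the function name `count_assertions` is hypothetical, and the sketch assumes a row continues the previous assertion when its `input` cell is empty or repeats the previous row's utterance.

```python
import csv
import io


def count_assertions(csv_text: str) -> int:
    """Count assertions in a batch-test CSV.

    Assumption (not confirmed by the product docs): a row belongs to the
    previous assertion when its `input` cell is empty or repeats the
    utterance of the row directly before it.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    previous = None
    for row in reader:
        utterance = (row.get("input") or "").strip()
        # Only a new, non-empty utterance starts a new assertion.
        if utterance and utterance != previous:
            count += 1
            previous = utterance
    return count


sample = """input,intent,parentIntent,entityName,entityValue,entityOrder
Send 200 dollars to Leonardo,Transfer Funds,,TransferAmount,200 USD,
,,,PayeeName,Leonardo,TransferAmount>PayeeName
What is the balance in my checking account,Show Balance,Transfer Funds,,,
Show my past 20 transactions,Show Account Statement,,HistorySize,20,
"""

print(count_assertions(sample))  # 4 data rows, 3 assertions
```

Under these assumptions, the two rows describing the Leonardo transfer collapse into a single assertion, which is why the reported count can be lower than the number of CSV rows.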
input,intent,parentIntent,entityName,entityValue,entityOrder
Send 200 dollars to Leonardo,Transfer Funds,,TransferAmount,200 USD,
,,,PayeeName,Leonardo,TransferAmount>PayeeName
What is the balance in my checking account,Show Balance,Transfer Funds,,,
Show my past 20 transactions,Show Account Statement,,HistorySize,20,
CSV columns:
| Column | Type | Description |
|---|---|---|
| input | String | User utterance. Max 3000 characters. |
| intent | String | Expected intent. Prefix with "trait:" for traits. |
| parentIntent | String (Optional) | Parent intent for sub-intents or contextual Small Talk. |
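Before uploading a suite, it can help to check the file against the layout described above. The following Python sketch is a hypothetical pre-upload check, not a platform tool: `validate_suite` is an invented name, and it verifies only what this page states, namely the six-column header from the sample and the 3000-character limit on `input`.

```python
import csv
import io

# Header taken from the sample CSV on this page.
EXPECTED_HEADER = [
    "input", "intent", "parentIntent",
    "entityName", "entityValue", "entityOrder",
]
MAX_INPUT_CHARS = 3000  # limit stated for the `input` column


def validate_suite(csv_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means none found."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header does not match the expected six columns"]

    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:  # ignore completely blank lines
            continue
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_no}: expected 6 fields, found {len(row)}"
            )
            continue
        if len(row[0]) > MAX_INPUT_CHARS:
            problems.append(
                f"line {line_no}: input exceeds {MAX_INPUT_CHARS} characters"
            )
    return problems


suite = (
    "input,intent,parentIntent,entityName,entityValue,entityOrder\n"
    "Show my past 20 transactions,Show Account Statement,,HistorySize,20,\n"
)
print(validate_suite(suite))
```

The check is deliberately conservative: it does not guess at rules for the entity columns, which are not described in the table above.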
Click the test suite name in the Batch Testing window.
Select In Development or Published.
Click Run Test Suite.
New test suites automatically trigger runs for both In-Development and Published versions.
Add notes: Click the Notes icon during or after a run to record the purpose or changes. Max 1024 characters.
When the expected intent differs from the winning intent, the Elimination Reason column shows why. Ranking and Resolver (R&R) policy reasons take precedence; otherwise, the scores from each engine are shown (FM: [score], ML: [score], FAQ: [score]).
| Reason | Description |
|---|---|
| belowDependencyThreshold | Score below minimum dependency threshold. |
| verbMatchOnly | Only verb matched in a single-word match. |
| entityMatchOnly | Only entity (number, date, etc.) matched. |
| foundDefinitive | Definitive match found, possible matches discarded. |
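When reviewing exported results in bulk, the Elimination Reason values can be sorted into policy reasons and engine scores. The Python sketch below is an assumption-laden illustration: `parse_elimination_reason` is a hypothetical helper, the reason set contains only the four names listed above (the product may define more), and the exact text of the score string, including whether the brackets are literal, is assumed rather than confirmed.

```python
import re

# Only the reasons listed in the table above; the full set may be larger.
POLICY_REASONS = {
    "belowDependencyThreshold",
    "verbMatchOnly",
    "entityMatchOnly",
    "foundDefinitive",
}

# Assumed shape: "FM: 0.42, ML: 0.87, FAQ: 0.10", brackets optional.
SCORE_PATTERN = re.compile(r"\b(FM|ML|FAQ):\s*\[?(-?\d+(?:\.\d+)?)\]?")


def parse_elimination_reason(cell: str) -> dict:
    """Classify one Elimination Reason cell as policy, scores, or unknown."""
    text = cell.strip()
    if text in POLICY_REASONS:
        return {"kind": "policy", "reason": text}
    scores = {
        engine: float(value)
        for engine, value in SCORE_PATTERN.findall(text)
    }
    if scores:
        return {"kind": "scores", "scores": scores}
    return {"kind": "unknown", "raw": text}


print(parse_elimination_reason("verbMatchOnly"))
print(parse_elimination_reason("FM: 0.42, ML: 0.87, FAQ: 0.1"))
```

Anything that matches neither shape is returned as "unknown" with the raw text preserved, so unexpected values are surfaced rather than silently dropped.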