The Health and Monitoring dashboard provides goal-driven insights into your app’s NLP model performance. It analyzes training data, test coverage, and test results to surface actionable recommendations.

Go to: Testing > Regression Testing > Health & Monitoring

Dashboard Sections

| Section | Description |
|---|---|
| NLP | Aggregates batch test results. Shows performance metrics and test coverage for Dialog Intents, FAQs, Small Talk, Traits, and Entities. |
| Flow | Summarizes conversation flow coverage. Shows transition coverage and intent summary from conversation test suites. |

NLP Metrics

These metrics appear as aggregate values in the Bot Health summary and as individual scores in the intent-type panels (Dialog Intents, FAQs, Small Talk, Traits).

| Metric | Description |
|---|---|
| Accuracy | Proportion of test utterances for which the identified intent is correct. |
| F1 Score | Harmonic mean of Precision and Recall. |
| Precision Score | Ratio of true positives to all predicted positives (TP / (TP + FP)). |
| Recall Score | Ratio of true positives to all actual positives (TP / (TP + FN)). |
| Total Test Coverage % | Average test coverage across Dialog Intents, FAQs, Small Talk, Traits, and Entities. |
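These scores follow the standard precision/recall formulas. As a minimal sketch of how they can be computed from TP/FP/FN counts (the function names are illustrative, not part of the platform):

```python
def precision(tp: int, fp: int) -> float:
    # Ratio of true positives to all predicted positives.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    # Ratio of true positives to all actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Example: a batch test with 80 TP, 10 FP, and 10 FN results.
print(round(precision(80, 10), 3))    # 0.889
print(round(recall(80, 10), 3))       # 0.889
print(round(f1_score(80, 10, 10), 3)) # 0.889
```

When precision and recall are equal, as in this example, F1 equals both; in general F1 penalizes an imbalance between the two.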

Test Cases Detailed Analysis

Click View Test Cases in the NLP section to open the detailed analysis window. It shows results for Intents, Entities, and Traits.

Intents Tab

| Column | Description |
|---|---|
| Test Cases | Test case name. |
| Intent Type | Dialog Intent, FAQ, or Small Talk. |
| Expected Intent | Intent expected from the utterance. |
| Matched Intent | Intent actually matched. |
| Result Type | True Positive, False Positive, or False Negative. |
| Tags | Follow-up labels assigned by the analyst. |

Entities Tab

| Column | Description |
|---|---|
| Utterances | The user utterance in the test case. |
| Entity Name | Entity name mapped to the test case. |
| Expected Value | Entity value expected. |
| Matched Value | Entity value actually matched. |
| Entity Result | True (matched) or False (not matched). |
| Tags | Follow-up labels. |

Traits Tab

| Column | Description |
|---|---|
| Test Cases | Trait test case name. |
| Intent Type | Displays “Trait”. |
| Trait Name | Name of the trait analyzed. |
| Expected Trait | Trait expected from the utterance. |
| Matched Trait | Trait actually matched. |
| Trait Result | True Positive, False Positive, or False Negative. |
| Tags | Follow-up labels. |

Tags

Tags are labels for intent, entity, and trait test results that indicate follow-up actions.
| Tag | Meaning |
|---|---|
| Add Negative Pattern | A negative pattern should be added. |
| NeedNLPHelp | Requires explicit NLP support. |
| Needs Negative Pattern | Needs a negative pattern to work as expected. |
| Needs Training | The app needs training for this intent, entity, or trait. |
| New Intent | A new intent was detected during execution. |

NLP Analysis (per Test Case)

Click the NLP Analysis tab in the test case detail view to see the historic NLP analysis captured at the time of execution. It shows the qualified (definitive and probable) and disqualified intents for:
  • Traits (if applicable)
  • ML engine
  • FM engine
  • KG engine
  • Trait Rule (if applicable)
  • Ranking and Resolver
This differs from Utterance Testing, which shows current analysis based on the latest training data. NLP Analysis here is the snapshot from when the test ran.

NLP Performance Metrics (per Intent Type)

| Result Type | Description |
|---|---|
| True Positive (TP) | Utterances that correctly matched the expected intent. |
| False Positive (FP) | Utterances that matched an unexpected intent. |
| False Negative (FN) | Utterances that did not match the expected intent. |
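Following the conventions in the table above, each result type can be derived by comparing a test case's expected intent with the intent actually matched. A hedged sketch (the field names and the treatment of a no-match as `None` are assumptions, not the platform's API):

```python
from typing import Optional

def classify_result(expected: str, matched: Optional[str]) -> str:
    """Classify a single intent test result.

    Assumed semantics, per the result-type table:
    - True Positive: the matched intent equals the expected intent.
    - False Negative: no intent was matched at all.
    - False Positive: a different (unexpected) intent was matched.
    """
    if matched == expected:
        return "True Positive"
    if matched is None:
        return "False Negative"
    return "False Positive"

print(classify_result("BookFlight", "BookFlight"))    # True Positive
print(classify_result("BookFlight", "CancelFlight"))  # False Positive
print(classify_result("BookFlight", None))            # False Negative
```

Note that in multi-class evaluation more generally, a misclassification can count as both an FP for the matched intent and an FN for the expected one; the sketch above uses the simpler per-test-case convention the table implies.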

Performance Metrics Table

The details window provides a drill-down view for intents, entities, and traits:
| Metric | Intent | Entity | Trait |
|---|---|---|---|
| Expected Intent/Value | Yes | Yes | Yes |
| Matched Intent/Value | Yes | Yes | Yes |
| Parent Intent | Yes | No | Yes |
| Task State (Configured/Published) | Yes | No | Yes |
| Result Type | Yes | No | Yes |
| ML / FM / RR Scores | Yes | No | Yes |
| Entity Name | No | Yes | No |
| Result (True/False) | No | Yes | No |
| Identified by (NLU engine) | No | Yes | No |
| Identified using (entity type reference) | No | Yes | No |
| Confidence Score | No | Yes | No |

Dialog Intent Summary

Test Coverage

Shows the count and percentage of covered vs. uncovered intents. An intent is covered if it has at least one test case in the selected suite(s). Use View details to find uncovered intents and add test cases.
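The covered/uncovered split described above reduces to a simple set check: an intent is covered when at least one test case references it. A rough sketch (the data shapes are illustrative assumptions, not the platform's API):

```python
def coverage(intents: list[str], test_cases: dict[str, str]) -> tuple[int, int, float]:
    """Return (covered, uncovered, percent covered) for a set of intents.

    `test_cases` maps a test case name to the intent it exercises
    (an assumed shape for illustration).
    """
    tested = set(test_cases.values())
    covered = [i for i in intents if i in tested]
    pct = 100.0 * len(covered) / len(intents) if intents else 0.0
    return len(covered), len(intents) - len(covered), pct

intents = ["BookFlight", "CancelFlight", "CheckStatus", "Refund"]
cases = {"tc1": "BookFlight", "tc2": "CancelFlight", "tc3": "BookFlight"}
print(coverage(intents, cases))  # (2, 2, 50.0)
```

Here "CheckStatus" and "Refund" would be the uncovered intents surfaced by View details.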

Recommendations

Training recommendations appear when errors or warnings are triggered during execution. Click View Recommendations to see the summary and corrective actions.

Intent Details Window

Click View Details in any summary panel (Dialog Intents, FAQs, Small Talk) to open a drill-down of performance metrics and recommendations. Training Data Summary columns:
  • Intent — Intent name (Dialog Intent, FAQ, or Small Talk).
  • Utterances — Count of training utterances; N/A where not applicable.
  • Test Cases — Count of test cases in the selected suite(s).
  • True Positive (TP) — Count of TP results.
  • False Negative (FN) — Count of FN results.
  • False Positive (FP) — Count of FP results.
  • Covered In — Names of the test suites covering the intent.
  • F1 / Accuracy / Precision / Recall — Recommendation scores.
  • Recommendations — Count of recommendations with a link to corrective actions (Dialog Intents only).
  • Group — Small Talk group (Small Talk only).
  • Path — Knowledge Graph node path (FAQs only).
  • Alt Question — Count of alternate questions (FAQs only).
View Intents Not Covered: Click the three-dot menu on the panel to list intents not covered in batch testing. Add them to your training data to improve coverage.

FAQ Summary

Shows recommendation scores for FAQs from the latest batch test. Click View Recommendations to review the last report. Click Knowledge Graph to navigate to KG Analysis.

Small Talk Summary

Shows recommendation scores for Small Talk interactions from the latest batch test. Click Small Talk to view group names, user utterances, and bot responses.

Trait and Entity Summary

Shows recommendation scores for traits and entities from the latest batch test. Use Test Coverage and Test Results Analysis in each panel for details.

Utterance Testing from Health and Monitoring

Click the go to utterance testing (magic wand) icon on the Test Cases Detailed Analysis page to open the Utterance Testing window. From there you can retrain the app based on test failures. See Utterance Testing.