Knowledge Extraction

Knowledge Extraction pulls FAQ content from external sources (web pages, PDFs, and CSV files) and lets you review it and add it to your Knowledge Graph (KG). Workflow:

Extract — Pull Q&A pairs from a supported source (PDF, URL, or CSV).
Edit — Review and edit extracted questions and answers.
Move — Drag extracted content into KG nodes. If no KG exists, one is created automatically.

Two ways to add extracted content:

Add to Knowledge Graph — Moves selected questions to the root node.
Add to Specific Term — Drag and drop to a specific node (requires an existing KG).

Extract from a URL

Go to Automation AI > Knowledge AI > FAQs > ⋯ > Manage Extracts.
Click Extract from URL.
Enter a Name and the URL, then click Proceed.
After extraction completes, click Review & Add to add questions to the KG.

Extract from a File

File size limit: 5 MB. Supported formats: PDF, CSV.
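You can catch the two most common upload rejections locally before you start. The following is a minimal sketch, not part of the platform: `check_upload` is a hypothetical helper, and it assumes "5 MB" means 5 * 1024 * 1024 bytes (the platform may count 5,000,000 bytes instead).

```python
from pathlib import Path

# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes; the platform may use 5,000,000.
MAX_BYTES = 5 * 1024 * 1024
SUPPORTED_SUFFIXES = {".pdf", ".csv"}


def check_upload(path: str, size_bytes: int) -> list[str]:
    """Return the problems found; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(path).suffix.lower()  # case-insensitive: "FAQ.CSV" counts as CSV
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes > {MAX_BYTES}")
    return problems
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`.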

Go to Automation AI > Knowledge AI > FAQs > ⋯ > Manage Extracts.
Click Extract from file.
Click Browse and select your file.
Click Proceed.
For PDFs, you can optionally annotate before extraction. See Annotate & Extract.
After extraction, click Review & Add.

Annotate & Extract (PDF only)

Use this when your PDF is not in an XO Platform-compatible format. Annotating teaches the KG engine where the questions and answers are located.

Select a PDF (new, or previously extracted with no questions added to the KG yet).
Click Annotate & Extract.

The PDF loads in the Annotation Tool. Select text and tag it:

Tag	Effect
Heading	Marks the question. Content between two consecutive headings is treated as the answer.
Header	Ignored during extraction. Trains the model to recognize and skip headers.
Footer	Ignored during extraction. Trains the model to recognize and skip footers.
Exclude	Not used for extraction.
Ignore Page	Entire page is skipped.
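The tag rules above can be modeled as a small segmentation pass over the annotated pages. This sketch only illustrates the table; it is not the KG engine's implementation. The `Page` structure and `segment` function are hypothetical, and two behaviors are assumptions the table does not state: text before the first Heading is dropped, and the last Heading's answer runs to the end of the document.

```python
from dataclasses import dataclass

# Tags whose text never reaches a question or an answer.
SKIPPED_TAGS = {"Header", "Footer", "Exclude"}


@dataclass
class Page:
    blocks: list           # (tag, text) tuples; tag is None for untagged body text
    ignored: bool = False  # True models the "Ignore Page" tag


def segment(pages):
    """Turn annotated pages into (question, answer) pairs."""
    pairs = []
    question, answer_parts = None, []
    for page in pages:
        if page.ignored:
            continue  # "Ignore Page": every block on the page is skipped
        for tag, text in page.blocks:
            if tag in SKIPPED_TAGS:
                continue
            if tag == "Heading":
                # A new Heading closes the previous question's answer.
                if question is not None:
                    pairs.append((question, " ".join(answer_parts)))
                question, answer_parts = text, []
            elif question is not None:
                answer_parts.append(text)  # assumption: text before any Heading is dropped
    if question is not None:
        pairs.append((question, " ".join(answer_parts)))  # assumption: last answer runs to the end
    return pairs


pages = [
    Page([("Header", "ACME FAQ"), ("Heading", "How do I reset my password?"),
          (None, "Open Settings."), (None, "Click Reset."), ("Footer", "Page 1")]),
    Page([("Heading", "Draft notes")], ignored=True),
    Page([("Heading", "Can I change my email?"), ("Exclude", "Internal note"),
          (None, "Yes, under Profile.")]),
]
print(segment(pages))
```

Running it yields two pairs: the password question with "Open Settings. Click Reset." and the email question with "Yes, under Profile."; the header, footer, excluded note, and ignored page contribute nothing.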

Annotate a few pages, then click Extract to review the results. If they are unsatisfactory, re-annotate and extract again.
After extraction, click Review & Add to add questions to the KG.

You can re-annotate only if no questions from the file have been added to the KG yet. If questions were already added, create a copy of the annotated document to work with.

Edit Extracted Content

Go to Automation AI > Knowledge AI > FAQs > ⋯ > Manage Extracts.
Click a successful extract.
Hover over a Q&A pair and click Edit.
Make changes and click Save.

Add Extracted Content to the KG

From Manage Extracts:

Go to Manage Extracts and open a successful extract.
Drag and drop Q&A pairs to the target node (child nodes expand during drag).
Select multiple pairs to move them in bulk.

From the Knowledge Graph:

Select the target node in the KG.
Click Add from Extraction.
Select a successful extract.
Check the Q&A pairs to add and click Add.

Once a Q&A pair is moved to the KG, it cannot be moved again. If the question is later modified or deleted from the KG, you can re-add it from the extract.
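This rule behaves like a small state machine for each Q&A pair. The sketch below is hypothetical (the platform does not expose these names); it encodes only the two statements above: a pair that sits unchanged in the KG cannot be added again, and editing or deleting its KG copy makes it addable once more.

```python
from enum import Enum, auto


class PairState(Enum):
    NOT_ADDED = auto()  # still only in the extract
    IN_KG = auto()      # moved to the KG and unchanged there
    MODIFIED = auto()   # its KG copy was edited afterwards
    DELETED = auto()    # its KG copy was deleted afterwards


class Extract:
    """Hypothetical tracker for the Q&A pairs of one extract."""

    def __init__(self, questions):
        self.state = {q: PairState.NOT_ADDED for q in questions}

    def can_add(self, question):
        # A pair already sitting unchanged in the KG cannot be moved again.
        return self.state[question] is not PairState.IN_KG

    def add_to_kg(self, question):
        if not self.can_add(question):
            raise ValueError(f"already in the KG: {question!r}")
        self.state[question] = PairState.IN_KG

    def kg_changed(self, question, deleted=False):
        # Editing or deleting the KG copy makes the pair re-addable from the extract.
        self.state[question] = PairState.DELETED if deleted else PairState.MODIFIED
```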

Supported Formats

CSV

Column 1: question; Column 2: answer.
No headers allowed. Other columns are ignored.
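A file in this format can be produced and checked with Python's standard `csv` module. This is a sketch under assumptions: the platform's parser is assumed to follow standard CSV quoting (so answers containing commas or line breaks must be quoted, which `csv.writer` does automatically), and rows with fewer than two columns are assumed to be skipped. `write_faq_csv` and `read_faq_csv` are illustrative names.

```python
import csv
import io


def write_faq_csv(pairs):
    """Serialize (question, answer) pairs as headerless two-column CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # No header row: the first line must already be a question.
    for question, answer in pairs:
        writer.writerow([question, answer])  # column 1: question, column 2: answer
    return buf.getvalue()


def read_faq_csv(text):
    """Read pairs back as the format describes: first two columns, no header row."""
    pairs = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # assumption: rows without an answer column are skipped
        pairs.append((row[0], row[1]))  # any further columns are ignored
    return pairs
```

For example, `write_faq_csv([("What is the refund window?", "30 days, with receipt.")])` returns `'What is the refund window?,"30 days, with receipt."\n'`: the answer is quoted because it contains a comma.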

PDF

With a table of contents: The extraction service uses the ToC to derive the heading hierarchy (heading | subheading | sub-subheading).
Without a table of contents: The service uses a pre-trained ML model to detect headings by their font style or size.
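Deriving the hierarchy from a ToC amounts to keeping a stack of ancestor titles while walking the outline. A minimal sketch, assuming ToC entries arrive as `(level, title)` pairs with level 1 at the top; PyMuPDF's `Document.get_toc()`, for example, returns `[level, title, page]` lists, so you would drop the page number first. `heading_paths` is a hypothetical name, and the three-level cutoff is an assumption based on the three levels named above. This does not cover the ML-based path used when there is no ToC.

```python
def heading_paths(toc, max_depth=3):
    """Map flat (level, title) ToC entries to 'heading | subheading | ...' paths."""
    stack = []  # titles of the current ancestors, one per level
    paths = []
    for level, title in toc:
        if level > max_depth:
            continue  # assumption: entries deeper than max_depth are skipped
        del stack[level - 1:]  # drop the previous sibling and anything deeper
        stack.append(title)
        paths.append(" | ".join(stack))
    return paths


toc = [(1, "Accounts"), (2, "Passwords"), (3, "Reset"), (2, "Email"), (1, "Billing")]
print(heading_paths(toc))
# ['Accounts', 'Accounts | Passwords', 'Accounts | Passwords | Reset', 'Accounts | Email', 'Billing']
```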

Web Pages

Supported page layouts:

Linear Q&A pairs.
Questions with hyperlinks pointing to answers on the same page.
Questions with hyperlinks pointing to answers on a different page.

Extraction fails for a question when:

The question text spans multiple HTML tags.
The answer tag is not a child or sibling of the question in the DOM.
The question has no hyperlink to its answer (for hyperlink-based layouts).
The linked answer page does not repeat the question above the answer.
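The first two conditions can be checked against a page's DOM before you extract. This sketch uses Python's `xml.etree.ElementTree`, so it only accepts well-formed (XHTML-style) markup; real pages usually need a proper HTML parser. The function names are hypothetical, "child" is read as "direct child", and because the platform's way of locating question and answer elements is not documented, they are passed in explicitly.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    # ElementTree has no parent pointers, so build a child -> parent lookup.
    return {child: parent for parent in root.iter() for child in parent}


def text_holders(el):
    """Elements under el (including el) that directly hold non-blank question text."""
    parents = parent_map(el)
    holders = set()
    for node in el.iter():
        if (node.text or "").strip():
            holders.add(node)
        # A child's tail text belongs to its parent, e.g. " my email?" after </b>.
        if node is not el and (node.tail or "").strip():
            holders.add(parents[node])
    return holders


def is_child_or_sibling(root, question_el, answer_el):
    if any(child is answer_el for child in question_el):
        return True  # direct child
    parent = parent_map(root).get(question_el)
    return parent is not None and answer_el is not question_el and any(
        child is answer_el for child in parent
    )


def question_problems(root, question_el, answer_el):
    """List the reasons, from the conditions above, that this pair would fail."""
    problems = []
    if len(text_holders(question_el)) > 1:
        problems.append("question text spans multiple tags")
    if not is_child_or_sibling(root, question_el, answer_el):
        problems.append("answer is neither a child nor a sibling of the question")
    return problems


page = ET.fromstring(
    "<div>"
    "<h3>How do I reset my password?</h3><p>Open Settings.</p>"
    "<h3>How do I <b>change</b> my email?</h3><p>Use Profile.</p>"
    "<section><h3>Refunds?</h3></section><p>Within 30 days.</p>"
    "</div>"
)
questions, answers = page.findall(".//h3"), page.findall("p")
for q, a in zip(questions, answers):
    print(question_problems(page, q, a))
```

The first pair passes, the second fails because "change" sits in its own `<b>` tag, and the third fails because its answer is a sibling of the enclosing `<section>`, not of the question.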

Extraction fails for the entire page when:

The page mixes more than one of the supported layouts listed above.
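The whole-page rule can be pictured as classifying every question by the link it carries and rejecting the page when more than one layout appears. This is a hypothetical heuristic, not the platform's detector. It assumes the element holding each question is already known, treats an `href` starting with `#` as a same-page link and any other `href` as a link to a different page, and treats a question without a link as part of a linear layout.

```python
import xml.etree.ElementTree as ET


def link_kind(question_el):
    """Classify one question element by the hyperlink it carries, if any."""
    link = question_el if question_el.tag == "a" else question_el.find(".//a")
    href = link.get("href", "") if link is not None else ""
    if not href:
        return "linear"
    return "same-page" if href.startswith("#") else "other-page"


def page_layout(question_els):
    """Return the single layout used on the page, or None if extraction would fail."""
    kinds = {link_kind(q) for q in question_els}
    if len(kinds) != 1:
        return None  # layouts are mixed (or no questions were found)
    return kinds.pop()


mixed = ET.fromstring(
    '<ul><li><a href="#q1">Reset password?</a></li>'
    '<li><a href="/faq/email">Change email?</a></li></ul>'
)
print(page_layout(mixed.findall("li")))  # None: same-page and other-page links are mixed
```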
