PDF Processing Configuration Strategies

Overview

Wabee AI offers several strategies for splitting PDF files uploaded to an agent in the chat window, so users can tune how their agents process and analyze document content. This document describes each available strategy and its use cases.

Where to Configure

The file split strategy is set in the agent settings when updating an agent configuration. The available options are Semantic, Title (the default), and Form-based split.

Figure: Configuring the PDF file split strategy.
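For readers who keep agent settings in their own scripts or config files, the sketch below shows one way to validate a strategy value before applying it. Only the three strategy names and the `title` default come from this page; the function name `normalize_strategy` and the constants are hypothetical and are not part of any Wabee AI API.

```python
from typing import Optional

# Strategy names as listed on this page; "title" is the documented default.
VALID_STRATEGIES = ("semantic", "title", "form")
DEFAULT_STRATEGY = "title"


def normalize_strategy(name: Optional[str]) -> str:
    """Return a valid strategy name, falling back to the default when unset.

    Hypothetical helper for illustration; not a Wabee AI function.
    """
    if name is None or not name.strip():
        return DEFAULT_STRATEGY
    key = name.strip().lower()
    if key not in VALID_STRATEGIES:
        raise ValueError(
            f"Unknown file split strategy {name!r}; "
            f"expected one of {', '.join(VALID_STRATEGIES)}"
        )
    return key
```

An unset value falls back to `title`, and an unrecognized name raises an error rather than being silently accepted.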

Available Strategies

1. Semantic Chunking

Strategy Name: semantic

Description

Semantic chunking divides the document into coherent sections based on the meaning of the content. This strategy uses natural language processing techniques to identify logical breaks in the text.
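To make the idea concrete, here is a minimal sketch of chunking by meaning. It assumes a toy similarity measure (word overlap between neighboring sentences) as a stand-in for whatever language model the platform actually uses, which this page does not specify. The names `semantic_chunks` and `jaccard` and the `threshold` parameter are illustrative only.

```python
import re
from typing import List, Set


def _words(sentence: str) -> Set[str]:
    """Lowercased word set of a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity in [0, 1]; a toy stand-in for embeddings."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def semantic_chunks(text: str, threshold: float = 0.1) -> List[str]:
    """Start a new chunk wherever adjacent sentences share too few words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(_words(prev), _words(cur)) < threshold:
            chunks.append([cur])  # topic shift: open a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]
```

Given two sentences about cats followed by two about invoices, this sketch returns two chunks, one per topic. A production system would typically replace the word-overlap score with embedding similarity.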

Use Cases

  • Long-form articles or reports
  • Academic papers
  • Legal documents

Benefits

  • Preserves context within chunks
  • Improves relevance of information retrieval
  • Enhances the quality of AI-generated responses

2. Form-based Chunking

Strategy Name: form

Description

Form-based chunking is designed specifically for documents with structured layouts, such as forms, invoices, or templated reports. This strategy identifies and extracts information based on the visual structure of the document.
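As a rough illustration of field-value extraction, the sketch below pulls `Label: value` pairs out of text lines. This is a simplification: it assumes the form has already been converted to text with one field per line, whereas the strategy described above works from the visual layout. The function name `extract_fields` is hypothetical.

```python
from typing import Dict, List


def extract_fields(lines: List[str]) -> Dict[str, str]:
    """Collect 'Label: value' pairs, skipping lines that do not fit the pattern."""
    fields: Dict[str, str] = {}
    for line in lines:
        # Split on the first colon only, so values such as "10:30" stay intact.
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            fields[label.strip()] = value.strip()
    return fields
```

For example, the lines `Invoice No: 1042` and `Total: 99.50 USD` yield the pairs `Invoice No` / `1042` and `Total` / `99.50 USD`, while a free-text line such as `Thank you` is ignored.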

Use Cases

  • Invoices and receipts
  • Application forms
  • Structured reports with consistent layouts

Benefits

  • Accurately captures field-value pairs
  • Preserves tabular data structure
  • Ideal for documents with repetitive layouts

3. Title-based Chunking (Default)

Strategy Name: title (default)

Description

Title-based chunking splits the document into sections based on identified titles or headings. This strategy is effective for well-structured documents with clear section demarcations.
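The following sketch shows the general shape of splitting on headings. It assumes a simple text heuristic for spotting a title (a short line that starts with a capital letter or digit and does not end in punctuation). How the platform actually identifies titles is not described here, so `looks_like_heading` and `title_chunks` are illustrative names only.

```python
from typing import List, Tuple


def looks_like_heading(line: str) -> bool:
    """Heuristic: short, starts with a capital or digit, no trailing punctuation."""
    s = line.strip()
    return (
        bool(s)
        and len(s) <= 60
        and s[-1] not in ".!?:;,"
        and (s[0].isupper() or s[0].isdigit())
    )


def title_chunks(text: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs, one per detected heading."""
    chunks: List[Tuple[str, str]] = []
    title, body = "", []
    for line in text.splitlines():
        if looks_like_heading(line):
            if title or body:
                chunks.append((title, " ".join(body)))  # close the previous section
            title, body = line.strip(), []
        elif line.strip():
            body.append(line.strip())
    if title or body:
        chunks.append((title, " ".join(body)))  # close the final section
    return chunks
```

Each chunk keeps its heading alongside its body text, which is what lets the document structure survive the split. Text that appears before the first heading is returned with an empty title.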

Use Cases

  • Technical documentation
  • Business reports
  • User manuals

Benefits

  • Maintains document structure
  • Facilitates easy navigation of content
  • Suitable for a wide range of document types

Choosing the Right Strategy

When configuring an agent, consider the following factors to select the most appropriate file split strategy:

  1. Document Type: Match the strategy to the typical structure of your documents.
  2. Content Complexity: For documents with varied content, semantic chunking might be more effective.
  3. Information Extraction Needs: If you need to extract specific fields, form-based chunking could be ideal.
  4. Processing Speed: Title-based chunking is generally faster than the other strategies and is suitable for most use cases.
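The factors above can be read as a simple decision order, sketched below. This is one interpretation of the guidance rather than logic the platform applies; the function `suggest_strategy` and its parameters are hypothetical.

```python
def suggest_strategy(
    needs_field_extraction: bool = False,
    consistent_layout: bool = False,
    varied_content: bool = False,
) -> str:
    """Map the selection factors to a strategy name (illustrative only)."""
    if needs_field_extraction or consistent_layout:
        return "form"  # forms, invoices, templated reports
    if varied_content:
        return "semantic"  # long-form or mixed content
    return "title"  # documented default; generally fast and broadly suitable
```

With no flags set the sketch returns `title`, mirroring the default. Field extraction takes priority over content variety, on the assumption that structured data is the harder requirement to meet with the other strategies.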