
Building Data-Driven Applications with LlamaIndex

Data is not always simple. Many real-world documents, such as research papers and financial reports, mix unstructured text with structured data in tables. Ingesting such heterogeneous documents presents an additional challenge: we need not only to extract the text but also to identify, parse, and process the tables embedded within it. Sometimes you get tables, sometimes you get text, and sometimes you have to deal with a mix of both.
LlamaIndex provides UnstructuredElementNodeParser to tackle such documents, which contain both free-form text and tables or other structured elements. It leverages the Unstructured library to analyze the document layout and delineate text sections from tables.
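To make the idea of delineating text from tables concrete, here is a minimal sketch that uses only Python's standard library. It is not how Unstructured or LlamaIndex implement the separation; the TextTableSplitter class and the sample HTML are hypothetical, written purely for illustration, and the sketch ignores nested tables and other layout subtleties.

```python
from html.parser import HTMLParser


class TextTableSplitter(HTMLParser):
    """Hypothetical helper: collects free text and tables as separate elements."""

    def __init__(self):
        super().__init__()
        self.elements = []      # ordered ("text", str) or ("table", rows) tuples
        self._in_table = False  # True while between <table> and </table>
        self._rows = []         # rows of the table currently being read
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table = True
            self._rows = []
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self.elements.append(("table", self._rows))
            self._in_table = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, skip empty runs
        if not text:
            return
        if self._cell is not None:
            self._cell.append(text)               # text inside a table cell
        elif not self._in_table:
            self.elements.append(("text", text))  # free-form text


# Illustrative sample document (made up for this sketch).
html = """
<h1>Quarterly report</h1>
<p>Revenue grew in both regions.</p>
<table>
  <tr><th>Region</th><th>Revenue</th></tr>
  <tr><td>North</td><td>120</td></tr>
  <tr><td>South</td><td>95</td></tr>
</table>
<p>Figures are in thousands.</p>
"""

splitter = TextTableSplitter()
splitter.feed(html)
splitter.close()
elements = splitter.elements

for kind, content in elements:
    print(kind, content)
```

The sketch keeps text and tables as distinct elements in document order, so each kind can be processed differently downstream. UnstructuredElementNodeParser relies on the Unstructured library for this step rather than on hand-written parsing like the above.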
This parser works exclusively on HTML files and can extract two types of nodes: