Package org.nuxeo.ecm.core.convert.plugins.text.extractors
Converter plugins that provide document transformation and text extraction.
Class Summary

Class - Description
BaseOfficeXMLTextConverter - Base class that contains a SAX-based text extractor fallback.
DOCX2TextConverter - Docx to text converter: parses the Open XML text document to read its content.
FullTextConverter - Converter that tries to find a way to extract full text content according to the input mime-type.
Html2TextConverter - Extracts the text content of HTML documents while trying to respect the paragraph structure.
MD2TextConverter - Markdown to text converter.
MSOffice2TextConverter
OOo2TextConverter - Based on the Apache JackRabbit OOo converter.
OOoXmlContentHandler
OpenXmlContentHandler
PDF2TextConverter
PDF2TextConverter.PatchedPDFTextStripper
PPTX2TextConverter - Pptx to text converter: parses the Open XML presentation document to read its content.
RFC822ToTextConverter
RTF2TextConverter
UnclosableZipInputStream - Wrapper used because some consumers (such as a SAX parser) tend to close the stream.
XL2TextConverter
XLX2TextConverter
XML2TextConverter
Xml2TextHandler
XmlZip2TextConverter - XML zip to text converter: parses the XML zip entries to read their content.
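The description of UnclosableZipInputStream hints at a common problem: a consumer that closes its input after reading one zip entry also closes the whole archive, so the remaining entries can no longer be read. The following is a minimal sketch of that wrapper pattern using only the JDK. It is not the Nuxeo implementation; the names NonClosingZipInputStream, reallyClose, consumeAndClose, readAllEntries and buildZip are hypothetical, and consumeAndClose merely stands in for a parser that closes its stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class UnclosableZipSketch {

    /** Hypothetical wrapper: close() is a no-op so a consumer cannot close the archive. */
    static class NonClosingZipInputStream extends ZipInputStream {
        NonClosingZipInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Deliberately ignored; the owner calls reallyClose() when finished.
        }

        void reallyClose() throws IOException {
            super.close();
        }
    }

    /** Stand-in for a consumer (such as a SAX parser) that closes its input after reading. */
    static String consumeAndClose(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        in.close(); // would end iteration if the stream were a plain ZipInputStream
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Reads every entry of the archive, handing each one to the closing consumer. */
    static Map<String, String> readAllEntries(byte[] zipBytes) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        NonClosingZipInputStream zin =
                new NonClosingZipInputStream(new ByteArrayInputStream(zipBytes));
        try {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                result.put(entry.getName(), consumeAndClose(zin));
            }
        } finally {
            zin.reallyClose();
        }
        return result;
    }

    /** Builds a small in-memory zip archive for the demonstration. */
    static byte[] buildZip(Map<String, String> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                zout.putNextEntry(new ZipEntry(e.getKey()));
                zout.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("content.xml", "<doc>hello</doc>");
        entries.put("styles.xml", "<styles/>");
        System.out.println(readAllEntries(buildZip(entries)));
    }
}
```

With a plain ZipInputStream in place of the wrapper, the second call to getNextEntry would fail because the consumer has already closed the stream. Making close() a no-op leaves the lifetime of the archive with the code that opened it.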