DefaultFulltextParser (Nuxeo ECM Projects 9.10-SNAPSHOT API)

java.lang.Object
- org.nuxeo.ecm.core.storage.DefaultFulltextParser

All Implemented Interfaces:

FulltextParser
```
public class DefaultFulltextParser
extends Object
implements FulltextParser
```
Default fulltext parser. It splits text into words on whitespace and punctuation, then normalizes each word to lowercase.
The regular expression used for splitting can be configured through the system property "org.nuxeo.fulltext.wordsplit". The default is "[\\s\\p{Punct}]+" (shown as a Java string literal).
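
The split-and-lowercase behavior described above can be illustrated with a minimal standalone sketch. It is not the Nuxeo source: the class name `WordSplitSketch` and the helper `normalize` are hypothetical, and the use of `Locale.ROOT` for lowercasing is an assumption made to keep the output locale-independent. Only the property name and the default regexp are taken from this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Standalone sketch; not the Nuxeo implementation. */
class WordSplitSketch {

    // Property name and default regexp as documented on this page.
    static final String WORD_SPLIT_PROP = "org.nuxeo.fulltext.wordsplit";
    static final String WORD_SPLIT_DEF = "[\\s\\p{Punct}]+";

    // Compiled once, honoring a system property override if one is set.
    static final Pattern WORD_SPLIT_PATTERN =
            Pattern.compile(System.getProperty(WORD_SPLIT_PROP, WORD_SPLIT_DEF));

    /** Hypothetical helper: splits, lowercases, and joins words with single spaces. */
    static String normalize(String s) {
        List<String> words = new ArrayList<>();
        for (String word : WORD_SPLIT_PATTERN.split(s)) {
            // split() yields an empty first token when the input starts with a delimiter
            if (!word.isEmpty()) {
                words.add(word.toLowerCase(Locale.ROOT)); // Locale.ROOT is an assumption
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello, World! Full-Text search."));
    }
}
```

Because the pattern is read from a system property, a different split rule could presumably be supplied at JVM startup with `-Dorg.nuxeo.fulltext.wordsplit=...`; how and when the real class reads the property is not stated on this page.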

Since:

5.9.5

Field Summary

Fields
Modifier and Type	Field and Description
`protected static int`	`HTML_MAGIC_OFFSET`
`protected static String`	`TEXT_HTML`
`static String`	`WORD_SPLIT_DEF`
`protected static Pattern`	`WORD_SPLIT_PATTERN`
`static String`	`WORD_SPLIT_PROP`

Constructor Summary

Constructors
Constructor and Description
`DefaultFulltextParser()`

Method Summary

Modifier and Type	Method and Description
`String`	`parse(String s, String path)` Parses one property value to normalize the fulltext for the database.
`void`	`parse(String s, String path, List<String> strings)` Parses one property value to normalize the fulltext for the database.
`String`	`parse(String s, String path, String mimeType, DocumentLocation documentLocation)` Parses one property value to normalize the fulltext for the database.
`void`	`parse(String s, String path, String mimeType, DocumentLocation documentLocation, List<String> strings)` Parses one property value to normalize the fulltext for the database.
`protected String`	`preprocessField(String s, String path, String mimeType)` Preprocesses one field at the given path.
`protected String`	`removeHtml(String s)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - WORD_SPLIT_PROP
```
public static final String WORD_SPLIT_PROP
```
    See Also:
    
    Constant Field Values
  - WORD_SPLIT_DEF
```
public static final String WORD_SPLIT_DEF
```
    See Also:
    
    Constant Field Values
  - WORD_SPLIT_PATTERN
```
protected static final Pattern WORD_SPLIT_PATTERN
```
  - HTML_MAGIC_OFFSET
```
protected static final int HTML_MAGIC_OFFSET
```
    See Also:
    
    Constant Field Values
  - TEXT_HTML
```
protected static final String TEXT_HTML
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - DefaultFulltextParser
```
public DefaultFulltextParser()
```
- Method Detail
  - parse
```
public String parse(String s,
                    String path)
```
    Description copied from interface: FulltextParser
    
    Parses one property value to normalize the fulltext for the database.
    The passed path may be null if the passed string is not coming from a specific path, for instance when it was extracted from binary data.
    
    Specified by:
    
    parse in interface FulltextParser
    
    Parameters:
    
    s - the string to be parsed and normalized
    
    path - the abstracted path for the property (where all complex indexes have been replaced by *), or null
    
    Returns:
    
    the normalized words as a single space-separated string
  - parse
```
public void parse(String s,
                  String path,
                  List<String> strings)
```
    Description copied from interface: FulltextParser
    
    Parses one property value to normalize the fulltext for the database.
    Like FulltextParser.parse(String, String) but uses the passed list to accumulate words.
    
    Specified by:
    
    parse in interface FulltextParser
    
    Parameters:
    
    s - the string to be parsed and normalized
    
    path - the abstracted path for the property (where all complex indexes have been replaced by *), or null
    
    strings - the list into which normalized words should be accumulated
  - parse
```
public String parse(String s,
                    String path,
                    String mimeType,
                    DocumentLocation documentLocation)
```
    Description copied from interface: FulltextParser
    
    Parses one property value to normalize the fulltext for the database.
    The passed path may be null if the passed string is not coming from a specific path, for instance when it was extracted from binary data.
    
    Specified by:
    
    parse in interface FulltextParser
    
    Parameters:
    
    s - the string to be parsed and normalized
    
    path - the abstracted path for the property (where all complex indexes have been replaced by *), or null
    
    mimeType - the MIME type of the string to be parsed and normalized; may be null
    
    documentLocation - the location of the document from which the property value string was extracted; may be null
    
    Returns:
    
    the normalized words as a single space-separated string
  - parse
```
public void parse(String s,
                  String path,
                  String mimeType,
                  DocumentLocation documentLocation,
                  List<String> strings)
```
    Parses one property value to normalize the fulltext for the database.
    Like FulltextParser.parse(String, String) but uses the passed list to accumulate words.
    The default implementation normalizes text to lowercase and removes punctuation. The documentLocation parameter is currently unused by this implementation but is available to subclasses that need it.
    Subclasses may override this method to customize normalization.
    
    Specified by:
    
    parse in interface FulltextParser
    
    Parameters:
    
    s - the string to be parsed and normalized
    
    path - the abstracted path for the property (where all complex indexes have been replaced by *), or null
    
    mimeType - the MIME type of the string to be parsed and normalized; may be null
    
    documentLocation - the location of the document from which the property value string was extracted; may be null
    
    strings - the list into which normalized words should be accumulated
  - preprocessField
```
protected String preprocessField(String s,
                                 String path,
                                 String mimeType)
```
    Preprocesses one field at the given path.
    The path is unused for now.
  - removeHtml
```
protected String removeHtml(String s)
```
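
The `parse` overloads that take a `List<String>` accumulate words into a caller-supplied list, which lets several property values share one word list. The following standalone sketch illustrates that pattern only. The class `AccumulatingParseSketch` and its two simplified `parse` methods are hypothetical stand-ins (they omit the `path`, `mimeType`, and `documentLocation` parameters) and assume the default split regexp; having the String-returning variant delegate to the accumulating one is likewise an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Standalone sketch of the list-accumulating style; not the Nuxeo implementation. */
class AccumulatingParseSketch {

    // Default split regexp as documented for WORD_SPLIT_DEF.
    static final Pattern WORD_SPLIT = Pattern.compile("[\\s\\p{Punct}]+");

    /** Hypothetical stand-in for the void overloads: appends normalized words to the list. */
    static void parse(String s, List<String> strings) {
        for (String word : WORD_SPLIT.split(s)) {
            if (!word.isEmpty()) { // skip the empty token produced by a leading delimiter
                strings.add(word.toLowerCase(Locale.ROOT));
            }
        }
    }

    /** Hypothetical stand-in for the String overloads, delegating to the accumulator. */
    static String parse(String s) {
        List<String> strings = new ArrayList<>();
        parse(s, strings);
        return String.join(" ", strings);
    }

    public static void main(String[] args) {
        // Two property values feed the same list.
        List<String> words = new ArrayList<>();
        parse("Quarterly Report", words);
        parse("Sales: up 10%", words);
        System.out.println(words);
    }
}
```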
