public class DefaultFulltextParser extends Object implements FulltextParser
The regexp used can be configured using the system property "org.nuxeo.fulltext.wordsplit". The default is "[\\s\\p{Punct}]+".
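The configuration described above can be sketched as follows. This is a minimal standalone illustration, not the Nuxeo implementation: the class `WordSplitSketch` and its methods `loadPattern` and `split` are hypothetical names, and only the property name and default regexp are taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of the Nuxeo API.
public class WordSplitSketch {
    // Property name and default regexp as given in the class description.
    public static final String PROP = "org.nuxeo.fulltext.wordsplit";
    public static final String DEFAULT_REGEX = "[\\s\\p{Punct}]+";

    // Reads the regexp from the system property, falling back to the default.
    public static Pattern loadPattern() {
        return Pattern.compile(System.getProperty(PROP, DEFAULT_REGEX));
    }

    // Splits text on the pattern and drops empty fragments
    // (Pattern.split can yield a leading empty string).
    public static List<String> split(Pattern pattern, String text) {
        List<String> words = new ArrayList<>();
        for (String word : pattern.split(text)) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split(loadPattern(), "Hello, world! foo-bar"));
    }
}
```

With the default regexp, the sample string splits into the words Hello, world, foo and bar. A different regexp could be supplied as a JVM system property, for example `-Dorg.nuxeo.fulltext.wordsplit='\s+'` to split on whitespace only; when exactly the parser reads the property is not stated here.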
Modifier and Type | Field and Description |
---|---|
static String | WORD_SPLIT_DEF |
protected static Pattern | WORD_SPLIT_PATTERN |
static String | WORD_SPLIT_PROP |
Constructor and Description |
---|
DefaultFulltextParser() |
Modifier and Type | Method and Description |
---|---|
String | parse(String s, String path) Parses one property value to normalize the fulltext for the database. |
void | parse(String s, String path, List<String> strings) Parses one property value to normalize the fulltext for the database. |
protected String | preprocessField(String s, String path) Preprocesses one field at the given path. |
protected String | removeHtml(String s) |
public static final String WORD_SPLIT_PROP
public static final String WORD_SPLIT_DEF
protected static final Pattern WORD_SPLIT_PATTERN
public DefaultFulltextParser()
public String parse(String s, String path)
Description copied from interface: FulltextParser
Parses one property value to normalize the fulltext for the database.
The passed path may be null if the passed string is not coming from a specific path, for instance when it was extracted from binary data.
Specified by: parse in interface FulltextParser
Parameters:
s - the string to be parsed and normalized
path - the abstracted path for the property (where all complex indexes have been replaced by *), or null
public void parse(String s, String path, List<String> strings)
Like FulltextParser.parse(String, String) but uses the passed list to accumulate words.
The default implementation normalizes text to lowercase and removes punctuation.
This can be subclassed.
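The normalization described above (lowercasing and removing punctuation, with words accumulated into a list) could look roughly like the sketch below. It is a standalone illustration under stated assumptions, not the actual class: `ParseSketch` is a hypothetical name, the `path` argument is accepted but ignored, and joining the words with a single space in the String-returning variant is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch: mirrors the documented behavior, not the Nuxeo source.
public class ParseSketch {
    // Default word-split regexp from the class description.
    static final Pattern WORD_SPLIT = Pattern.compile("[\\s\\p{Punct}]+");

    // Accumulating variant: appends each lowercased word to the passed list.
    public static void parse(String s, String path, List<String> strings) {
        if (s == null) {
            return;
        }
        for (String word : WORD_SPLIT.split(s)) {
            if (!word.isEmpty()) {
                strings.add(word.toLowerCase(Locale.ROOT));
            }
        }
    }

    // String variant: delegates to the accumulating variant.
    // Joining with a single space is an assumption of this sketch.
    public static String parse(String s, String path) {
        List<String> strings = new ArrayList<>();
        parse(s, path, strings);
        return String.join(" ", strings);
    }
}
```

Under these assumptions, `ParseSketch.parse("Hello, World! It's 2015.", null)` returns `hello world it s 2015`: the apostrophe counts as punctuation, so "It's" becomes two words.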
Specified by: parse in interface FulltextParser
Parameters:
s - the string to be parsed and normalized
path - the abstracted path for the property (where all complex indexes have been replaced by *), or null
strings - the list into which normalized words should be accumulated
protected String preprocessField(String s, String path)
Preprocesses one field at the given path. The path is unused for now.
protected String removeHtml(String s)
Copyright © 2015 Nuxeo SA. All rights reserved.