Nuxeo ECM Projects 5.4.3-SNAPSHOT
java.lang.Object
  org.nuxeo.common.utils.FullTextUtils
public class FullTextUtils
Functions related to simple fulltext parsing. They don't try to be exhaustive but they work for simple cases.
Field Summary | |
---|---|
static int | MIN_SIZE |
static String | STOP_WORDS |
static Set<String> | stopWords |
static String | UNACCENTED |
static Pattern | wordPattern |
Method Summary | |
---|---|
static Set<String> | parseFullText(String string, boolean removeDiacritics) Extracts the words from a string for simple fulltext indexing. |
static String | parseWord(String string, boolean removeDiacritics) Parses a word and returns a simplified lowercase form. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final Pattern wordPattern
public static final int MIN_SIZE
public static final String STOP_WORDS
public static final Set<String> stopWords
public static final String UNACCENTED
Method Detail |
---|
public static Set<String> parseFullText(String string, boolean removeDiacritics)
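Extracts the words from a string for simple fulltext indexing. Initial order is kept, but duplicate words are removed.
It omits short words and stop words, does pseudo-stemming, and removes accents when removeDiacritics is set.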
Parameters:
string - the string
removeDiacritics - if the diacritics must be removed
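The ordering and de-duplication behaviour described above can be illustrated with a small standalone sketch. This is not the Nuxeo implementation: the class FullTextSketch, the method extractWords, the minimum size of 3, the three stop words and the word pattern are all assumptions made for the example, since this page does not show the values of MIN_SIZE, STOP_WORDS or wordPattern. Diacritics removal and pseudo-stemming are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: NOT the Nuxeo implementation of FullTextUtils.
class FullTextSketch {
    // Assumed values; the real MIN_SIZE, STOP_WORDS and wordPattern are not shown on this page.
    static final int ASSUMED_MIN_SIZE = 3;
    static final Set<String> ASSUMED_STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for"));
    static final Pattern ASSUMED_WORD_PATTERN = Pattern.compile("[\\p{L}\\p{Nd}]+");

    static Set<String> extractWords(String text) {
        // LinkedHashSet keeps first-seen order and ignores repeated additions.
        Set<String> words = new LinkedHashSet<>();
        Matcher m = ASSUMED_WORD_PATTERN.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (word.length() < ASSUMED_MIN_SIZE || ASSUMED_STOP_WORDS.contains(word)) {
                continue; // omit short words and stop words
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [quick, fox, dog]
        System.out.println(extractWords("The quick fox and the QUICK dog"));
    }
}
```

A LinkedHashSet is a natural fit for a "keep order, drop duplicates" contract because it iterates in insertion order and never stores the same word twice.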
public static String parseWord(String string, boolean removeDiacritics)
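Parses a word and returns a simplified lowercase form.
Parameters:
string - the word
removeDiacritics - if the diacritics must be removed
Returns:
the simplified word, or null if it was removed as a stop word or a short word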
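Because parseWord may return null, callers need a null check before using the result. The sketch below imitates that contract in a standalone class so the behaviour can be tried without the Nuxeo jar. It is not the Nuxeo implementation: ParseWordSketch, simplifyWord, the minimum size of 3 and the stop-word list are hypothetical, and java.text.Normalizer is used here as a stand-in technique for removing diacritics (the UNACCENTED field suggests the library may do this differently).

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: NOT the Nuxeo implementation of FullTextUtils.parseWord.
class ParseWordSketch {
    // Assumed values; the real MIN_SIZE and STOP_WORDS are not shown on this page.
    static final int ASSUMED_MIN_SIZE = 3;
    static final Set<String> ASSUMED_STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for"));

    static String simplifyWord(String word, boolean removeDiacritics) {
        String w = word.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // Decompose accented letters (NFD), then drop the combining marks.
            w = Normalizer.normalize(w, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        }
        // Mirror the documented contract: null means the word was dropped.
        if (w.length() < ASSUMED_MIN_SIZE || ASSUMED_STOP_WORDS.contains(w)) {
            return null;
        }
        return w;
    }

    public static void main(String[] args) {
        String[] inputs = {"Caf\u00e9", "The", "\u00e0"};
        for (String input : inputs) {
            String simplified = simplifyWord(input, true);
            // Callers must handle the null case explicitly.
            System.out.println(simplified == null ? "(dropped)" : simplified);
        }
    }
}
```

Returning null keeps the signature simple, at the cost of pushing a null check onto every caller; an Optional<String> would be an alternative design in newer code.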