Nuxeo Enterprise Platform 5.4
java.lang.Object
  org.nuxeo.common.utils.FullTextUtils
public class FullTextUtils
Functions related to simple fulltext parsing. They don't try to be exhaustive but they work for simple cases.
Field Summary | |
---|---|
static int | MIN_SIZE |
static java.lang.String | STOP_WORDS |
static java.util.Set<java.lang.String> | stopWords |
static java.lang.String | UNACCENTED |
static java.util.regex.Pattern | wordPattern |
Method Summary | |
---|---|
static java.util.Set<java.lang.String> | parseFullText(java.lang.String string, boolean removeDiacritics): Extracts the words from a string for simple fulltext indexing. |
static java.lang.String | parseWord(java.lang.String string, boolean removeDiacritics): Parses a word and returns a simplified lowercase form. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.util.regex.Pattern wordPattern
public static final int MIN_SIZE
public static final java.lang.String STOP_WORDS
public static final java.util.Set<java.lang.String> stopWords
public static final java.lang.String UNACCENTED
Method Detail |
---|
public static java.util.Set<java.lang.String> parseFullText(java.lang.String string, boolean removeDiacritics)
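Extracts the words from a string for simple fulltext indexing.
Initial order is kept, but duplicate words are removed.
It omits short words and stop words, removes accents (when removeDiacritics is set), and does pseudo-stemming.
Parameters:
string - the string
removeDiacritics - if the diacritics must be removed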
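The extraction behaviour described above (initial order kept, duplicates removed, short words and stop words omitted) can be illustrated with a small standalone sketch. This is not the Nuxeo implementation: the class `FullTextSketch`, the method `extractWords`, the word pattern, the minimum size of 3 and the three stop words are all assumptions chosen for illustration, since this page does not list the actual values of `MIN_SIZE`, `STOP_WORDS` or `wordPattern`. Accent removal and pseudo-stemming are left out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullTextSketch {
    // Assumed values: the real MIN_SIZE and STOP_WORDS are not given on this page.
    static final int MIN_SIZE = 3;
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for"));
    // Assumed word pattern: runs of Unicode letters or digits.
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    static Set<String> extractWords(String text) {
        // LinkedHashSet keeps first-seen order and silently drops duplicates.
        Set<String> words = new LinkedHashSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String w = m.group().toLowerCase(Locale.ROOT);
            if (w.length() < MIN_SIZE || STOP_WORDS.contains(w)) {
                continue; // skip short words and stop words
            }
            words.add(w);
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [quick, fox, dog]
        System.out.println(extractWords("The quick fox and the QUICK dog"));
    }
}
```

A `LinkedHashSet` is one natural way to get "ordered, no duplicates" in a single structure; whether `parseFullText` uses it internally is not stated on this page.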
public static java.lang.String parseWord(java.lang.String string, boolean removeDiacritics)
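Parses a word and returns a simplified lowercase form.
Parameters:
string - the word
removeDiacritics - if the diacritics must be removed
Returns:
the simplified lowercase form, or null if it was removed as a stop word or a short word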
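As a rough illustration of the per-word contract (lowercase, optional diacritics removal, `null` for short words and stop words), here is a minimal standalone sketch. It is an assumption-laden stand-in rather than the library's code: `WordSketch` and `simplify` are hypothetical names, the minimum size and stop-word list are invented, `java.text.Normalizer` is swapped in for whatever the `UNACCENTED` field is used for, and stripping a trailing "s" is only a placeholder for the pseudo-stemming mentioned above.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class WordSketch {
    // Assumed values: not taken from the Nuxeo sources.
    static final int MIN_SIZE = 3;
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for"));

    static String simplify(String word, boolean removeDiacritics) {
        if (word.length() < MIN_SIZE) {
            return null; // short word
        }
        String w = word.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // Decompose accented letters, then drop the combining marks.
            w = Normalizer.normalize(w, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
        }
        // Placeholder pseudo-stemming: drop a trailing plural "s".
        if (w.length() > 3 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1);
        }
        return STOP_WORDS.contains(w) ? null : w;
    }

    public static void main(String[] args) {
        // "\u00c9t\u00e9s" is "Etes" with acute accents on both e's; prints ete
        System.out.println(simplify("\u00c9t\u00e9s", true));
        // prints null, because "the" is in the assumed stop-word list
        System.out.println(simplify("The", true));
    }
}
```

Returning `null` rather than an empty string lets a caller distinguish "word was filtered out" from any real token, which matches the return contract documented for `parseWord`.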