public class FullTextUtils extends Object
Modifier and Type | Field and Description |
---|---|
static int | MIN_SIZE |
static String | STOP_WORDS |
static Set<String> | stopWords |
static String | UNACCENTED |
static Pattern | wordPattern |
Modifier and Type | Method and Description |
---|---|
static Set<String> | parseFullText(String string, boolean removeDiacritics) Extracts the words from a string for simple fulltext indexing. |
static String | parseWord(String string, boolean removeDiacritics) Parses a word and returns a simplified lowercase form. |
public static final Pattern wordPattern
public static final int MIN_SIZE
public static final String STOP_WORDS
public static final String UNACCENTED
public static Set<String> parseFullText(String string, boolean removeDiacritics)
Initial order is kept, but duplicate words are removed.
It omits short or stop words, removes accents and does pseudo-stemming.
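The behavior described above (keep first-seen order, drop duplicates, skip short and stop words, optionally strip accents, apply a light stemming step) can be illustrated with a minimal standalone sketch. This is not the Nuxeo implementation: the class name `FullTextSketch`, the minimum size of 3, the tiny stop-word list, the word pattern, and the "drop a trailing s" stemming rule are all assumptions chosen for illustration, and the real `MIN_SIZE`, `STOP_WORDS`, `wordPattern` and stemming logic may differ.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullTextSketch {
    // Assumed values for illustration only; not the actual FullTextUtils constants.
    static final int MIN_SIZE = 3;
    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("the", "and", "for", "with"));
    // Assumed word pattern: runs of letters and decimal digits.
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // Returns a simplified lowercase form, or null for short or stop words.
    static String parseWord(String word, boolean removeDiacritics) {
        String w = word.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // Decompose accented characters, then drop the combining marks.
            w = Normalizer.normalize(w, Normalizer.Form.NFD)
                          .replaceAll("\\p{M}", "");
        }
        if (w.length() < MIN_SIZE || STOP.contains(w)) {
            return null;
        }
        // Hypothetical pseudo-stemming: strip a single trailing "s".
        if (w.endsWith("s") && w.length() > MIN_SIZE) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Collects simplified words; LinkedHashSet keeps first-seen order
    // and silently ignores duplicates.
    static Set<String> parseFullText(String text, boolean removeDiacritics) {
        Set<String> words = new LinkedHashSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String w = parseWord(m.group(), removeDiacritics);
            if (w != null) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [cafe, cat]
        System.out.println(parseFullText("The cafés and the Café for cats", true));
    }
}
```

In this sketch a `LinkedHashSet` is what provides "initial order is kept, but duplicate words are removed"; returning `null` from `parseWord` lets the caller skip rejected words without a separate filtering pass.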
Parameters:
string - the string
removeDiacritics - if the diacritics must be removed

public static String parseWord(String string, boolean removeDiacritics)
Parameters:
string - the word
removeDiacritics - if the diacritics must be removed
Returns:
the simplified lowercase form of the word, or null if it was removed as a stop word or a short word

Copyright © 2011 Nuxeo SA. All Rights Reserved.