public class FullTextUtils extends Object
| Modifier and Type | Field and Description |
|---|---|
| static int | MIN_SIZE |
| static String | STOP_WORDS |
| static Set<String> | stopWords |
| static String | UNACCENTED |
| static Pattern | wordPattern |
| Modifier and Type | Method and Description |
|---|---|
| static Set<String> | parseFullText(String string, boolean removeDiacritics) Extracts the words from a string for simple fulltext indexing. |
| static String | parseWord(String string, boolean removeDiacritics) Parses a word and returns a simplified lowercase form. |
public static final Pattern wordPattern
public static final int MIN_SIZE
public static final String STOP_WORDS
public static final String UNACCENTED
public static Set<String> parseFullText(String string, boolean removeDiacritics)
Initial order is kept, but duplicate words are removed.
It omits short or stop words, removes accents and does pseudo-stemming.
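The behavior described above can be illustrated with a small standalone sketch. It is not the Nuxeo implementation: the class name `FullTextSketch`, the minimum word length of 3, the stop-word list, and the word pattern are all assumptions chosen for the example, and the pseudo-stemming step is left out because its rules are not documented here.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for FullTextUtils, for illustration only.
class FullTextSketch {

    // Assumed values; the real MIN_SIZE, STOP_WORDS and wordPattern may differ.
    static final int MIN_SIZE = 3;
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for", "with"));
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Returns a simplified lowercase form, or null for a short word or stop word. */
    static String parseWord(String word, boolean removeDiacritics) {
        String w = word.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // Decompose accented letters, then drop the combining marks.
            w = Normalizer.normalize(w, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if (w.length() < MIN_SIZE || STOP_WORDS.contains(w)) {
            return null;
        }
        return w;
    }

    /** Extracts distinct words, keeping the order of first appearance. */
    static Set<String> parseFullText(String text, boolean removeDiacritics) {
        Set<String> words = new LinkedHashSet<>();
        if (text == null) {
            return words;
        }
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String w = parseWord(m.group(), removeDiacritics);
            if (w != null) {
                words.add(w); // LinkedHashSet silently ignores duplicates
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café"; escapes avoid source-encoding issues.
        String text = "The caf\u00e9 and the Caf\u00e9 for r\u00e9sum\u00e9";
        System.out.println(parseFullText(text, true)); // prints [cafe, resume]
    }
}
```

A `LinkedHashSet` is used in the sketch because it gives both properties stated above at once: insertion order is preserved and repeated words are dropped.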
Parameters:
string - the string
removeDiacritics - if the diacritics must be removed

public static String parseWord(String string, boolean removeDiacritics)
Parameters:
string - the word
removeDiacritics - if the diacritics must be removed
Returns:
null if it was removed as a stop word or a short word

Copyright © 2015 Nuxeo SA. All rights reserved.