Package org.nuxeo.ecm.core.storage
Class FulltextExtractorWork
java.lang.Object
  org.nuxeo.ecm.core.work.AbstractWork
    org.nuxeo.ecm.core.storage.FulltextExtractorWork
- All Implemented Interfaces:
- Serializable, Work
public class FulltextExtractorWork extends AbstractWork
Work task that does fulltext extraction from the string properties and the blobs of the given document, saving them into the fulltext table.
- Since:
- 5.7 for the original implementation; as of 10.3, the extraction and the update are done in the same Work
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.nuxeo.ecm.core.work.api.Work
Work.Progress, Work.State
-
-
Field Summary
Fields (modifier and type, field name, then description where one exists):
protected static String ANY2TEXT_CONVERTER
protected static String CATEGORY
protected List<DocumentRef> docsToUpdate
protected DocumentModel document
static String FULLTEXT_DEFAULT_INDEX
protected FulltextConfiguration fulltextConfiguration
protected static int HTML_MAGIC_OFFSET
static String SYSPROP_FULLTEXT_BINARY
static String SYSPROP_FULLTEXT_JOBID
static String SYSPROP_FULLTEXT_SIMPLE
protected static String TEXT_HTML
protected static String TITLE
protected boolean updateBinaryText
    If true, update the binary text from the document.
protected boolean updateSimpleText
    If true, update the simple text from the document.
protected boolean useJobId
-
Fields inherited from class org.nuxeo.ecm.core.work.AbstractWork
callerThread, completionTime, docId, docIds, FAILURE_EXCEPTION, FAILURE_MSG, GLOBAL_DLQ_COUNT_REGISTRY_NAME, id, isTree, loginContext, originatingUsername, progress, RANDOM, repositoryName, schedulePath, schedulingTime, session, startTime, state, status, suspended, suspending, traceContext, WORK_FAILED_EVENT, WORK_INSTANCE
-
-
Constructor Summary
Constructors:
FulltextExtractorWork(String repositoryName, String docId, boolean updateSimpleText, boolean updateBinaryText, boolean useJobId)
-
Method Summary
All methods are concrete instance methods (modifier and type, method, then description where one exists):
protected String blobToText(Blob blob)
    Converts the blob to text by calling a converter.
protected void extractAndUpdate()
protected void extractAndUpdateBinaryText()
protected void extractAndUpdateSimpleText()
protected void findDocsToUpdate()
String getCategory()
    Gets the category for this work.
protected String getFulltextPropertyName(String name, String indexName)
int getRetryCount()
    Gets the number of times that this Work instance can be retried in case of concurrent update exceptions.
String getTitle()
    Gets a human-readable name for this work instance.
protected void initFulltextConfiguration()
protected String limitStringSize(String string, int maxSize)
protected String removeEntities(String string)
protected String removeHtml(String string)
protected String stringToText(String string)
void work()
    This method should implement the actual work done by the Work instance.
-
Methods inherited from class org.nuxeo.ecm.core.work.AbstractWork
appendWorkToDeadLetterQueue, buildWorkFailureEventProps, cleanUp, closeSession, commitOrRollbackTransaction, equals, getCompletionTime, getDocument, getDocuments, getId, getOriginatingUsername, getPartitionKey, getProgress, getSchedulePath, getSchedulingTime, getSpanFromContext, getStartTime, getStatus, getWorkInstanceState, hashCode, initSession, initSession, isDocumentTree, isSuspending, isWorkInstanceSuspended, newDocumentLocation, openSystemSession, openUserSession, run, runWorkWithTransaction, setCompletionTime, setDocument, setDocument, setDocuments, setOriginatingUsername, setProgress, setSchedulePath, setStartTime, setStatus, setWorkInstanceState, setWorkInstanceSuspending, startTransaction, suspended, toString, workFailed
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.nuxeo.ecm.core.work.api.Work
isCoalescing, isGroupJoin, isIdempotent, onGroupJoinCompletion
-
-
-
-
Field Detail
-
SYSPROP_FULLTEXT_SIMPLE
public static final String SYSPROP_FULLTEXT_SIMPLE
- See Also:
- Constant Field Values
-
SYSPROP_FULLTEXT_BINARY
public static final String SYSPROP_FULLTEXT_BINARY
- See Also:
- Constant Field Values
-
SYSPROP_FULLTEXT_JOBID
public static final String SYSPROP_FULLTEXT_JOBID
- See Also:
- Constant Field Values
-
FULLTEXT_DEFAULT_INDEX
public static final String FULLTEXT_DEFAULT_INDEX
- See Also:
- Constant Field Values
-
CATEGORY
protected static final String CATEGORY
- See Also:
- Constant Field Values
-
TITLE
protected static final String TITLE
- See Also:
- Constant Field Values
-
ANY2TEXT_CONVERTER
protected static final String ANY2TEXT_CONVERTER
- See Also:
- Constant Field Values
-
HTML_MAGIC_OFFSET
protected static final int HTML_MAGIC_OFFSET
- See Also:
- Constant Field Values
-
TEXT_HTML
protected static final String TEXT_HTML
- See Also:
- Constant Field Values
-
fulltextConfiguration
protected transient FulltextConfiguration fulltextConfiguration
-
document
protected transient DocumentModel document
-
docsToUpdate
protected transient List<DocumentRef> docsToUpdate
-
updateSimpleText
protected final boolean updateSimpleText
If true, update the simple text from the document.
-
updateBinaryText
protected final boolean updateBinaryText
If true, update the binary text from the document.
-
useJobId
protected final boolean useJobId
-
-
Constructor Detail
-
FulltextExtractorWork
public FulltextExtractorWork(String repositoryName, String docId, boolean updateSimpleText, boolean updateBinaryText, boolean useJobId)
-
-
Method Detail
-
getCategory
public String getCategory()
Description copied from interface: Work
Gets the category for this work. Used to choose an execution queue.
- Specified by:
- getCategory in interface Work
- Overrides:
- getCategory in class AbstractWork
- Returns:
- the category, or null for the default
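The category-to-queue selection described above can be pictured with a minimal sketch. It is not the WorkManager implementation: the class `CategoryQueueSketch`, the queue names, and the mapping are all hypothetical, chosen only to show a null category falling back to a default queue.

```java
import java.util.Map;

/** Hypothetical stand-in, not Nuxeo code: resolves a queue id from a work category. */
final class CategoryQueueSketch {
    // Assumed name of the fallback queue, for illustration only.
    static final String DEFAULT_QUEUE = "default";

    // Assumed category-to-queue mapping; real deployments configure this elsewhere.
    static final Map<String, String> QUEUES = Map.of("exampleCategory", "exampleQueue");

    /** Returns the queue mapped to the category, or the default queue for null or unmapped categories. */
    static String queueFor(String category) {
        if (category == null) {
            return DEFAULT_QUEUE; // a null category selects the default queue
        }
        return QUEUES.getOrDefault(category, DEFAULT_QUEUE);
    }

    public static void main(String[] args) {
        System.out.println(queueFor("exampleCategory"));
        System.out.println(queueFor(null));
    }
}
```

The explicit null check comes first because maps created with `Map.of` reject null keys on lookup.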
-
getTitle
public String getTitle()
Description copied from interface: Work
Gets a human-readable name for this work instance.
- Returns:
- a human-readable name
-
getRetryCount
public int getRetryCount()
Description copied from class: AbstractWork
Gets the number of times that this Work instance can be retried in case of concurrent update exceptions.
- Overrides:
- getRetryCount in class AbstractWork
- Returns:
- 0 for no retry, or more if some retries are possible
- See Also:
- AbstractWork.work()
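As a rough illustration of how a retry count of this kind is typically consumed, the sketch below reruns a task after a simulated concurrent-update failure. Everything in it is an assumption made for the example: the `Task` interface is a hypothetical stand-in for a Work instance, and `java.util.ConcurrentModificationException` stands in for whatever exception the real framework raises on a concurrent update.

```java
import java.util.ConcurrentModificationException;

/** Hypothetical sketch of a retry loop driven by a retry count; not the AbstractWork implementation. */
final class RetrySketch {
    /** Minimal stand-in for a Work instance. */
    interface Task {
        int getRetryCount();
        void work();
    }

    /** Runs the task, retrying on simulated conflicts, and returns the number of attempts made. */
    static int runWithRetries(Task task) {
        int attempts = 0;
        int retriesLeft = task.getRetryCount();
        while (true) {
            attempts++;
            try {
                task.work();
                return attempts;
            } catch (ConcurrentModificationException e) {
                // Stand-in for a concurrent update exception: rethrow once retries are exhausted.
                if (retriesLeft-- <= 0) {
                    throw e;
                }
            }
        }
    }

    /** Builds a task that fails the given number of times before succeeding. */
    static Task failingFirst(int failures, int retryCount) {
        int[] remaining = {failures};
        return new Task() {
            public int getRetryCount() { return retryCount; }
            public void work() {
                if (remaining[0]-- > 0) {
                    throw new ConcurrentModificationException("simulated conflict");
                }
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(runWithRetries(failingFirst(2, 2)));
    }
}
```

With a retry count of 0 the first failure propagates immediately, which matches the "0 for no retry" return value documented above.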
-
work
public void work()
Description copied from interface: Work
This method should implement the actual work done by the Work instance.
It should periodically update its progress through Work.setProgress(org.nuxeo.ecm.core.work.api.Work.Progress).
To allow for suspension by the WorkManager, it should periodically call Work.isSuspending(), and if true call Work.suspended() and return early with saved state data.
Clean up can be implemented by Work.cleanUp(boolean, Exception).
- Specified by:
- work in interface Work
- Specified by:
- work in class AbstractWork
- See Also:
- Work.isSuspending(), Work.suspended(), Work.cleanUp(boolean, java.lang.Exception)
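The cooperative suspension contract described for work() can be sketched as follows. This is a self-contained illustration with hypothetical names, not Nuxeo code: `SuspendableWorkSketch` only mimics the shape of isSuspending(), suspended() and setProgress(), and the `suspendAfter` parameter simulates a manager requesting suspension partway through.

```java
import java.util.List;

/** Hypothetical stand-in showing a work loop that reports progress and honors suspension. */
final class SuspendableWorkSketch {
    private final List<String> items;
    private final int suspendAfter;   // simulation only: request suspension after this many items
    private volatile boolean suspending;
    private boolean suspended;
    private int nextIndex;            // saved state used to resume later
    private int processed;
    private int progressCurrent;
    private int progressTotal;

    SuspendableWorkSketch(List<String> items, int suspendAfter) {
        this.items = items;
        this.suspendAfter = suspendAfter;
    }

    void requestSuspend() { suspending = true; }
    boolean isSuspending() { return suspending; }
    void suspended() { suspended = true; }
    boolean isSuspended() { return suspended; }
    int processedCount() { return processed; }
    void resume() { suspending = false; suspended = false; }

    void setProgress(int current, int total) {
        progressCurrent = current;
        progressTotal = total;
    }

    /** Processes items from the saved position, stopping early if suspension is requested. */
    void work() {
        for (int i = nextIndex; i < items.size(); i++) {
            if (isSuspending()) {
                nextIndex = i;   // save state so a later run can continue from here
                suspended();
                return;
            }
            processed++;         // placeholder for the real per-item work
            if (processed == suspendAfter) {
                requestSuspend(); // simulate the manager asking for suspension
            }
            setProgress(i + 1, items.size());
        }
        nextIndex = items.size();
    }

    public static void main(String[] args) {
        SuspendableWorkSketch w = new SuspendableWorkSketch(List.of("a", "b", "c", "d"), 2);
        w.work();
        System.out.println(w.processedCount() + " processed, suspended=" + w.isSuspended());
        w.resume();
        w.work();
        System.out.println(w.processedCount() + " processed, suspended=" + w.isSuspended());
    }
}
```

Checking the suspension flag at the top of each iteration keeps the saved index pointing at the first unprocessed item, so a resumed run neither skips nor repeats work.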
-
initFulltextConfiguration
protected void initFulltextConfiguration()
-
findDocsToUpdate
protected void findDocsToUpdate()
-
extractAndUpdate
protected void extractAndUpdate()
-
extractAndUpdateSimpleText
protected void extractAndUpdateSimpleText()
-
extractAndUpdateBinaryText
protected void extractAndUpdateBinaryText()
-
stringToText
protected String stringToText(String string)
-
removeHtml
protected String removeHtml(String string)
-
removeEntities
protected String removeEntities(String string)
-
blobToText
protected String blobToText(Blob blob)
Converts the blob to text by calling a converter.
-
limitStringSize
protected String limitStringSize(String string, int maxSize)
-
getFulltextPropertyName
protected String getFulltextPropertyName(String name, String indexName)
-
-