Class LimitedTextContentExtractor
java.lang.Object
com.atlassian.confluence.search.v2.extractor.BaseAttachmentContentExtractor
com.atlassian.confluence.impl.search.v2.extractor.LimitedTextContentExtractor
- All Implemented Interfaces:
Extractor2
A subclass of BaseAttachmentContentExtractor which places a limit on how many bytes of the input stream are read into memory. This prevents it from potentially reading in huge attachment streams that trigger memory starvation.
This may have the side-effect of some content not being indexed if it is to be found "beyond" the limit, but that's preferable to an OOME.
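The trade-off described above can be illustrated with a minimal sketch. It is not the Confluence implementation: the class name `LimitedReadSketch`, the helper `readLimited`, and the `maxBytes` parameter are hypothetical, and UTF-8 decoding is an assumption. It only shows the general idea of stopping after a fixed number of bytes instead of buffering the whole stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Confluence API.
final class LimitedReadSketch {

    // Reads at most maxBytes from the stream and decodes them as UTF-8 (an assumption).
    // Anything beyond the limit is never read, so it cannot end up in the result.
    static String readLimited(InputStream in, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int remaining = maxBytes;
        try {
            while (remaining > 0) {
                // Never request more than the remaining budget.
                int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Note: cutting at a byte boundary may split a multi-byte character at the end.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLimited(in, 5)); // prints "hello"
    }
}
```

With a limit of 5, the text " world" is never read, which mirrors the side-effect of content "beyond" the limit not being indexed.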
The default value was changed from a fixed 10 MB to be in line with the value set for Attachments:
- Since:
- 7.17
Constructor Summary
LimitedTextContentExtractor()
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected String | extractText(InputStream is, SearchableAttachment attachment) | |
| protected boolean | shouldExtractFrom(String fileName, String contentType) | Extract text from mime types like 'text/*', 'application/xml*' and 'application/*+xml' |

Methods inherited from class com.atlassian.confluence.search.v2.extractor.BaseAttachmentContentExtractor
extractFields, extractText, extractText
-
Constructor Details
-
LimitedTextContentExtractor
public LimitedTextContentExtractor()
-
Method Details
-
shouldExtractFrom
Extract text from mime types like 'text/*', 'application/xml*' and 'application/*+xml'
- Specified by:
shouldExtractFrom in class BaseAttachmentContentExtractor
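The three patterns named in the description can be read as a prefix and suffix check on the content type. The sketch below is one such reading. It is an assumption, not the actual method body: the class `MimeTypeMatchSketch` and the method `looksLikeText` are hypothetical, the `fileName` argument is ignored, and stripping parameters such as `; charset=utf-8` is an added convenience that the source does not mention.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of the Confluence API.
final class MimeTypeMatchSketch {

    // Returns true for 'text/*', 'application/xml*' and 'application/*+xml'.
    static boolean looksLikeText(String contentType) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.toLowerCase(Locale.ROOT);
        // Assumption: drop any parameters after ';' before matching.
        int semicolon = type.indexOf(';');
        String base = (semicolon >= 0 ? type.substring(0, semicolon) : type).trim();

        return base.startsWith("text/")
                || base.startsWith("application/xml")
                || (base.startsWith("application/") && base.endsWith("+xml"));
    }

    public static void main(String[] args) {
        System.out.println(looksLikeText("text/plain"));           // prints "true"
        System.out.println(looksLikeText("application/atom+xml")); // prints "true"
        System.out.println(looksLikeText("application/pdf"));      // prints "false"
    }
}
```

Under this reading, `image/svg+xml` does not match, because the `+xml` suffix pattern is limited to the `application/` type.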
-
extractText
- Specified by:
extractText in class BaseAttachmentContentExtractor
- Parameters:
is - a stream containing the attachment contents
attachment - contains useful attachment metadata, e.g. filename
- Returns:
- a String with a textual representation of the attachment's contents
-