Class LimitedTextContentExtractor
- java.lang.Object
  - com.atlassian.confluence.search.v2.extractor.BaseAttachmentContentExtractor
    - com.atlassian.confluence.impl.search.v2.extractor.LimitedTextContentExtractor
- All Implemented Interfaces:
Extractor2
public class LimitedTextContentExtractor extends BaseAttachmentContentExtractor
A subclass of BaseAttachmentContentExtractor that places a limit on how many bytes of the input stream are read into memory. This prevents huge attachment streams from being read in full and triggering memory starvation. As a side effect, content found "beyond" the limit may not be indexed, but that is preferable to an OOME.
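The byte limit described above can be pictured with a minimal sketch. This is not the Confluence implementation: the class name `BoundedReadSketch`, the method `readAtMost`, and the `maxBytes` parameter are hypothetical names chosen for illustration, and UTF-8 decoding is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Confluence API.
class BoundedReadSketch {

    // Reads at most maxBytes from the stream and decodes them as UTF-8 (assumed charset).
    // Anything beyond the limit is never loaded into memory, so it is simply not returned.
    static String readAtMost(InputStream is, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int remaining = maxBytes;
        try {
            while (remaining > 0) {
                int n = is.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Note: cutting at a byte boundary may split a multi-byte character at the very end.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(
                "hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAtMost(in, 5)); // prints "hello"
    }
}
```

The trade-off is the one the class description names: text past the limit is silently left out of the result, in exchange for a hard upper bound on memory use per attachment.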
The default value was changed from a fixed 10 MB to be in line with the value set for Attachments:
- Since:
- 7.17
- See Also:
AttachmentExtractedTextExtractor
Constructor Summary
Constructors
Constructor | Description
LimitedTextContentExtractor() |
Method Summary
Modifier and Type | Method | Description
protected String | extractText(InputStream is, SearchableAttachment attachment) |
protected boolean | shouldExtractFrom(String fileName, String contentType) | Extract text from mime types like 'text/*', 'application/xml*' and 'application/*+xml'
Methods inherited from class com.atlassian.confluence.search.v2.extractor.BaseAttachmentContentExtractor
extractFields, extractText, extractText
Method Detail
shouldExtractFrom
protected boolean shouldExtractFrom(String fileName, String contentType)
Extract text from mime types like 'text/*', 'application/xml*' and 'application/*+xml'
- Specified by:
shouldExtractFrom in class BaseAttachmentContentExtractor
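The three MIME type patterns named above ('text/*', 'application/xml*' and 'application/*+xml') can be read as a prefix test, a second prefix test, and a prefix-plus-suffix test. The sketch below shows one way to express that. It is an assumption about how such patterns could be matched, not the actual Confluence code: the class `MimeTypeMatchSketch` and the method `looksTextual` are hypothetical, and the handling of case, parameters such as `; charset=...`, and the file name is a guess (the file name is ignored here).

```java
import java.util.Locale;

// Hypothetical illustration only; not part of the Confluence API.
class MimeTypeMatchSketch {

    // Returns true for content types matching 'text/*', 'application/xml*'
    // or 'application/*+xml'. Case-insensitive; parameters after ';' are ignored.
    static boolean looksTextual(String contentType) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.toLowerCase(Locale.ROOT).trim();
        int semicolon = type.indexOf(';');
        if (semicolon >= 0) {
            type = type.substring(0, semicolon).trim();
        }
        return type.startsWith("text/")
                || type.startsWith("application/xml")
                || (type.startsWith("application/") && type.endsWith("+xml"));
    }

    public static void main(String[] args) {
        System.out.println(looksTextual("text/plain"));            // prints "true"
        System.out.println(looksTextual("application/atom+xml"));  // prints "true"
        System.out.println(looksTextual("application/pdf"));       // prints "false"
    }
}
```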
extractText
protected String extractText(InputStream is, SearchableAttachment attachment)
- Specified by:
extractText in class BaseAttachmentContentExtractor
- Parameters:
is - a stream containing the attachment contents
attachment - contains useful attachment metadata, e.g. filename
- Returns:
a String with a textual representation of the attachment's contents