public class LimitedTextContentExtractor extends BaseAttachmentContentExtractor
An extension of BaseAttachmentContentExtractor that limits how many bytes of the input stream are read into memory. This prevents huge attachment streams from being read in full and causing memory starvation.
As a side effect, content located beyond the limit is not indexed, but that is preferable to an OutOfMemoryError (OOME).
The default limit was changed from a fixed 10 MB to be in line with the value set for attachments; see AttachmentExtractedTextExtractor.
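The byte limit described above can be illustrated with a minimal sketch. This is not the class's actual implementation: `BoundedReadSketch`, `readLimited`, the `maxBytes` parameter, and the UTF-8 decoding are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the Confluence API. */
final class BoundedReadSketch {

    private BoundedReadSketch() {}

    /**
     * Reads at most maxBytes bytes from the stream and decodes them as UTF-8
     * (the charset is an assumption). Bytes beyond the limit are never read,
     * so memory use stays bounded regardless of the attachment size.
     */
    static String readLimited(InputStream is, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int remaining = maxBytes;
        try {
            while (remaining > 0) {
                // Never request more than the remaining budget.
                int read = is.read(buffer, 0, Math.min(buffer.length, remaining));
                if (read == -1) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, read);
                remaining -= read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A multi-byte character cut at the limit decodes to a replacement character.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In this sketch, calling `BoundedReadSketch.readLimited(stream, 5)` on a stream holding `hello world` returns only `hello`; the remainder is simply not extracted, which mirrors the indexing trade-off noted above.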
Constructor and Description |
---|
LimitedTextContentExtractor() |
Modifier and Type | Method and Description |
---|---|
protected String | extractText(InputStream is, com.atlassian.bonnie.search.SearchableAttachment attachment) |
protected boolean | shouldExtractFrom(String fileName, String contentType) Extract text from MIME types like 'text/*', 'application/xml*' and 'application/*+xml' |
Methods inherited from class BaseAttachmentContentExtractor: extractFields, extractText, extractText
protected boolean shouldExtractFrom(String fileName, String contentType)
shouldExtractFrom in class BaseAttachmentContentExtractor
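The method summary says text is extracted from MIME types like 'text/*', 'application/xml*' and 'application/*+xml'. A rough sketch of such a check follows. `MimeTypeMatchSketch` and `looksLikeText` are hypothetical names, and the case-insensitive comparison and stripping of parameters such as `; charset=...` are assumptions rather than documented behaviour. The `fileName` argument of the real method is not modelled here.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the Confluence API. */
final class MimeTypeMatchSketch {

    private MimeTypeMatchSketch() {}

    /** Returns true for text/*, application/xml* and application/*+xml. */
    static boolean looksLikeText(String contentType) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.trim().toLowerCase(Locale.ROOT);
        // Assumption: drop parameters such as "; charset=UTF-8" before matching.
        int semicolon = type.indexOf(';');
        String base = (semicolon >= 0 ? type.substring(0, semicolon) : type).trim();
        return base.startsWith("text/")
                || base.startsWith("application/xml")
                || (base.startsWith("application/") && base.endsWith("+xml"));
    }
}
```

Under these assumptions, `text/plain`, `application/xml` and `application/xhtml+xml` are accepted, while `application/pdf` and `image/svg+xml` are rejected.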
protected String extractText(InputStream is, com.atlassian.bonnie.search.SearchableAttachment attachment)
extractText in class BaseAttachmentContentExtractor
is - a stream containing the attachment contents
attachment - contains useful attachment metadata, e.g. filename
Copyright © 2003–2023 Atlassian. All rights reserved.