com.atlassian.confluence.search.v2.extractor.BaseAttachmentContentExtractor

com.atlassian.confluence.impl.search.v2.extractor.LimitedTextContentExtractor

All Implemented Interfaces: Extractor2

public class LimitedTextContentExtractor extends BaseAttachmentContentExtractor

A subclass of BaseAttachmentContentExtractor which places a limit on how many bytes of the input stream are read into memory. This prevents it from potentially reading in huge attachment streams that trigger memory starvation.

As a side effect, content that lies beyond the limit may not be indexed, but that is preferable to an OutOfMemoryError (OOME).
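
The byte limit described above can be illustrated with a minimal sketch. This is not the Confluence implementation: the class name `BoundedTextReader`, the method `readLimited`, the `maxBytes` parameter, and the choice of UTF-8 decoding are all assumptions made for illustration. It only shows the general idea of reading a bounded prefix of a stream instead of the whole stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Confluence API.
final class BoundedTextReader {

    private BoundedTextReader() {
    }

    /**
     * Reads at most maxBytes bytes from the stream and decodes them as UTF-8.
     * Bytes beyond the limit are never read, so memory use stays bounded.
     * Assumption: UTF-8 input. A multi-byte character that straddles the
     * limit is decoded as a replacement character.
     */
    static String readLimited(InputStream is, int maxBytes) throws IOException {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        // InputStream.readNBytes(int) is available from Java 11 onwards.
        byte[] head = is.readNBytes(maxBytes);
        return new String(head, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "hello world".getBytes(StandardCharsets.UTF_8));
        // Only the first five bytes are read and decoded.
        System.out.println(readLimited(in, 5));
    }
}
```

With a limit of 5, the text after "hello" is never read. This mirrors the trade-off noted above: content past the limit is simply not available for indexing.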

The default limit was changed from a fixed 10 MB to be in line with the value set for Attachments:

Since:

7.17

See Also:

AttachmentExtractedTextExtractor

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| LimitedTextContentExtractor() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected String | extractText(InputStream is, SearchableAttachment attachment) | |
| protected boolean | shouldExtractFrom(String fileName, String contentType) | Extracts text from MIME types such as 'text/*', 'application/xml*' and 'application/*+xml' |
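
The MIME type patterns listed for shouldExtractFrom can be read as a simple predicate. The sketch below is an assumption about how such matching might look, not the actual Confluence logic: the class `TextualMimeTypes` and the method `looksTextual` are hypothetical, the real method also receives a file name that is ignored here, and stripping parameters such as `; charset=utf-8` is a choice made for this example.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Confluence API.
final class TextualMimeTypes {

    private TextualMimeTypes() {
    }

    /**
     * Returns true when the content type matches one of the patterns
     * 'text/*', 'application/xml*' or 'application/*+xml'.
     */
    static boolean looksTextual(String contentType) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.toLowerCase(Locale.ROOT).trim();
        // Assumption: parameters after ';' are ignored when matching.
        int semicolon = type.indexOf(';');
        String base = (semicolon >= 0 ? type.substring(0, semicolon) : type).trim();

        return base.startsWith("text/")
                || base.startsWith("application/xml")
                || (base.startsWith("application/") && base.endsWith("+xml"));
    }

    public static void main(String[] args) {
        System.out.println(looksTextual("text/plain"));
        System.out.println(looksTextual("application/pdf"));
    }
}
```

Under these assumptions, `text/plain`, `application/xml` and `application/atom+xml` are accepted, while types such as `image/png` or `application/pdf` are rejected.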

Methods inherited from class com.atlassian.confluence.search.v2.extractor.BaseAttachmentContentExtractor
extractFields, extractText, extractText

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details

- LimitedTextContentExtractor
  
  public LimitedTextContentExtractor()

Method Details

- shouldExtractFrom
  
  protected boolean shouldExtractFrom(String fileName, String contentType)
  
  Extracts text from MIME types such as 'text/*', 'application/xml*' and 'application/*+xml'
  
  Specified by:
  
  shouldExtractFrom in class BaseAttachmentContentExtractor
- extractText
  
  protected String extractText(InputStream is, SearchableAttachment attachment)
  
  Specified by:
  
  extractText in class BaseAttachmentContentExtractor
  
  Parameters:
  
  is - a stream containing the attachment contents
  
  attachment - contains useful attachment metadata, e.g. filename
  
  Returns:
  
  a String with a textual representation of the attachment's contents
