com.atlassian.bonnie.search.extractor
Class BaseAttachmentContentExtractor
java.lang.Object
com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor
- All Implemented Interfaces:
- Extractor
- Direct Known Subclasses:
- DefaultTextContentExtractor, MsExcelContentExtractor, MsPowerpointContentExtractor, MsWordContentExtractor, PdfContentExtractor
- public abstract class BaseAttachmentContentExtractor
- extends java.lang.Object
- implements Extractor
Field Summary
static org.apache.log4j.Category log
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
log
public static final org.apache.log4j.Category log
BaseAttachmentContentExtractor
public BaseAttachmentContentExtractor()
addFields
public void addFields(org.apache.lucene.document.Document document,
java.lang.StringBuffer defaultSearchableText,
Searchable searchable)
- Specified by:
addFields in interface Extractor
shouldExtractFrom
protected boolean shouldExtractFrom(java.lang.String fileName,
java.lang.String contentType)
getMatchingContentTypes
protected java.lang.String[] getMatchingContentTypes()
getMatchingFileExtensions
protected java.lang.String[] getMatchingFileExtensions()
extractText
protected abstract java.lang.String extractText(java.io.InputStream is,
SearchableAttachment attachment)
throws ExtractorException
- Package access for unit testing only. Do not call this method directly; use addFields(Document, StringBuffer, Searchable) instead.
- Parameters:
is - a stream containing the attachment contents
attachment - contains useful attachment metadata, e.g. filename
- Returns:
- a String with a textual representation of the attachment's contents
- Throws:
ExtractorException - if the attachment content cannot be converted into text; wraps the original exception.
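The extractText contract described above (read the stream, return the text, wrap any failure in an ExtractorException) might be implemented by a subclass roughly as follows. This is a minimal sketch, not code from the library: SketchBaseExtractor, SketchAttachment and SketchExtractorException are hypothetical stand-ins for BaseAttachmentContentExtractor, SearchableAttachment and ExtractorException so that the example compiles on its own. The fileName() accessor, the empty-array defaults, and the format of the content type and file extension entries are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for SearchableAttachment; the real accessor names are not documented here.
interface SketchAttachment {
    String fileName();
}

// Hypothetical stand-in for ExtractorException: a wrapper around the original exception.
class SketchExtractorException extends Exception {
    SketchExtractorException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for BaseAttachmentContentExtractor. Only the hook
// signatures listed on this page are mirrored; the defaults are assumptions.
abstract class SketchBaseExtractor {
    protected String[] getMatchingContentTypes() {
        return new String[0];
    }

    protected String[] getMatchingFileExtensions() {
        return new String[0];
    }

    protected abstract String extractText(InputStream is, SketchAttachment attachment)
            throws SketchExtractorException;
}

// Example subclass: treats the attachment as UTF-8 encoded plain text.
class Utf8TextExtractor extends SketchBaseExtractor {
    @Override
    protected String[] getMatchingContentTypes() {
        return new String[] {"text/plain"}; // assumed entry format
    }

    @Override
    protected String[] getMatchingFileExtensions() {
        return new String[] {"txt"}; // assumed entry format (no leading dot)
    }

    @Override
    protected String extractText(InputStream is, SketchAttachment attachment)
            throws SketchExtractorException {
        try {
            // Copy the whole stream into memory, then decode it as UTF-8.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = is.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Wrap the original exception, as the Throws clause describes.
            throw new SketchExtractorException(
                    "Could not extract text from " + attachment.fileName(), e);
        }
    }
}

class Utf8TextExtractorDemo {
    // Helper that runs the extractor over an in-memory string.
    static String extractFromString(String content, String fileName) {
        InputStream is = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        try {
            return new Utf8TextExtractor().extractText(is, () -> fileName);
        } catch (SketchExtractorException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractFromString("hello attachment", "notes.txt"));
    }
}
```

In the real class, callers would go through addFields rather than invoking extractText themselves; the demo helper calls it directly only because the stand-in base class does not model addFields.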
Copyright © 2006-2009 Atlassian Software Systems Pty Ltd. All Rights Reserved.