Class HTMLSearchableTextExtractor
- java.lang.Object
  - com.atlassian.confluence.search.lucene.extractor.HTMLSearchableTextExtractor

public final class HTMLSearchableTextExtractor extends Object

A utility class that takes a String formatted as HTML and removes all tags and attributes, leaving only the text nodes and CDATA content intact. When link tags are stripped, key attributes (such as content-title) replace the stripped tag rather than the tag being removed entirely. Inline elements are simply stripped, whereas the start of a block element such as 'p' is replaced with a newline. The tag stripper also knows which elements in the Confluence schema should be removed entirely for indexing.
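
The inline-versus-block rule described above can be illustrated with a small standalone sketch. This is not the Confluence implementation: the class name TagStripSketch, the method strip, and the short list of block elements are all hypothetical, and the regex approach is a simplification chosen to keep the example self-contained. It does not handle CDATA sections, link attributes such as content-title, or schema elements that are dropped entirely.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Confluence API.
final class TagStripSketch {

    // Assumed, incomplete list of block-level elements whose start tag becomes a newline.
    private static final Pattern BLOCK_START = Pattern.compile(
            "<(p|div|h[1-6]|li|tr|br)(\\s[^>]*)?/?>", Pattern.CASE_INSENSITIVE);

    // Any remaining tag (inline start tags and all end tags) is removed outright.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    private TagStripSketch() {
    }

    static String strip(String html) {
        // Step 1: replace the start of each block element with a newline.
        String withBreaks = BLOCK_START.matcher(html).replaceAll("\n");
        // Step 2: strip every other tag, leaving only the text between tags.
        String textOnly = ANY_TAG.matcher(withBreaks).replaceAll("");
        // Step 3: trim the leading newline produced by the first block element.
        return textOnly.trim();
    }
}
```

Under these assumptions, TagStripSketch.strip("<p>Hello <b>world</b></p><p>Bye</p>") returns "Hello world", a newline, then "Bye": the inline 'b' element disappears and each 'p' start becomes a line break. Regular expressions are not a robust way to process HTML, so production code should call stripTags itself.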

Constructor Summary
- HTMLSearchableTextExtractor()

Method Summary
- static String stripTags(String htmlSource)
- static String stripTags(String pageTitle, String htmlSource)
- static String stripTags(String htmlSource, String[] elementsToIgnore)
- static String stripTags(String pageTitle, String htmlSource, String[] elementsToIgnore)

Method Detail

stripTags
public static String stripTags(String htmlSource) throws SAXException
Throws:
- SAXException

stripTags
public static String stripTags(String htmlSource, String[] elementsToIgnore) throws SAXException
Throws:
- SAXException

stripTags
public static String stripTags(String pageTitle, String htmlSource) throws SAXException
Throws:
- SAXException

stripTags
public static String stripTags(String pageTitle, String htmlSource, String[] elementsToIgnore) throws SAXException
Throws:
- SAXException