com.atlassian.confluence.search.lucene.extractor
Class HTMLSearchableTextExtractor
java.lang.Object
com.atlassian.confluence.search.lucene.extractor.HTMLSearchableTextExtractor
public final class HTMLSearchableTextExtractor
- extends java.lang.Object
A utility class that takes an HTML-formatted String and removes all tags and attributes, leaving only the text
nodes and CDATA content intact. When link tags are stripped, key attributes (such as content-title) replace the
tag instead of the tag being removed entirely. Inline elements are simply stripped, whereas the start of a block
element such as 'p' is replaced with a newline.
The tag stripper also knows which elements in the Confluence schema should be removed entirely for indexing.
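The behaviour described above can be illustrated with a minimal, self-contained sketch built on the JDK's SAX parser. This is not the Confluence implementation: the class name TagStripSketch, the BLOCK element list, the unprefixed content-title attribute lookup, and the synthetic root wrapper element are all assumptions made for illustration. Like the real methods, the sketch reports malformed input as an org.xml.sax.SAXException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative sketch only; NOT the Confluence HTMLSearchableTextExtractor. */
final class TagStripSketch {
    // Assumption: a small, hypothetical set of block-level elements.
    private static final Set<String> BLOCK =
            new HashSet<>(Arrays.asList("p", "div", "li", "h1", "h2", "h3"));

    static String stripTags(String html, String... elementsToIgnore) throws SAXException {
        final Set<String> ignored = new HashSet<>(Arrays.asList(elementsToIgnore));
        final StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private int ignoreDepth = 0; // > 0 while inside an ignored element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (ignoreDepth > 0 || ignored.contains(qName)) {
                    ignoreDepth++;
                    return;
                }
                if (BLOCK.contains(qName)) {
                    out.append('\n'); // the start of a block element becomes a newline
                }
                // Assumption: a key attribute named "content-title" stands in for its tag.
                String title = atts.getValue("content-title");
                if (title != null) {
                    out.append(title);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (ignoreDepth > 0) {
                    ignoreDepth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (ignoreDepth == 0) {
                    out.append(ch, start, length); // text nodes and CDATA content are kept
                }
            }
        };

        try {
            // Wrap the fragment in a synthetic root so multiple top-level elements parse.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader("<root>" + html + "</root>")), handler);
        } catch (ParserConfigurationException | IOException e) {
            throw new SAXException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(stripTags("<p>Hello <b>world</b></p><p>Bye</p>"));
        System.out.println(stripTags("<p>a<script>x()</script>b</p>", "script"));
    }
}
```

The sketch requires well-formed input because it relies on an XML parser. Elements named in elementsToIgnore are dropped together with everything nested inside them, which mirrors the "removed entirely" wording above.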
Method Summary
static java.lang.String | stripTags(java.lang.String htmlSource)
static java.lang.String | stripTags(java.lang.String pageTitle, java.lang.String htmlSource)
static java.lang.String | stripTags(java.lang.String htmlSource, java.lang.String[] elementsToIgnore)
static java.lang.String | stripTags(java.lang.String pageTitle, java.lang.String htmlSource, java.lang.String[] elementsToIgnore)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
HTMLSearchableTextExtractor
public HTMLSearchableTextExtractor()
Method Detail
stripTags
public static java.lang.String stripTags(java.lang.String htmlSource)
throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
stripTags
public static java.lang.String stripTags(java.lang.String htmlSource,
java.lang.String[] elementsToIgnore)
throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
stripTags
public static java.lang.String stripTags(java.lang.String pageTitle,
java.lang.String htmlSource)
throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
stripTags
public static java.lang.String stripTags(java.lang.String pageTitle,
java.lang.String htmlSource,
java.lang.String[] elementsToIgnore)
throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
Copyright © 2003-2014 Atlassian. All Rights Reserved.