HTMLSearchableTextExtractor (Atlassian Confluence 5.9.1 API)

java.lang.Object
- com.atlassian.confluence.search.lucene.extractor.HTMLSearchableTextExtractor

```
public final class HTMLSearchableTextExtractor
extends Object
```
A utility class that takes a String formatted as HTML and removes all tags and attributes, leaving only the text nodes and CDATA content intact. When link tags are stripped, key attributes (such as content-title) replace the stripped tags instead of the tag being removed entirely. Inline elements are simply stripped, whereas the start of a block element such as 'p' is replaced with a newline.
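The rules above can be illustrated with a small, self-contained sketch built on the JDK's SAX parser. This is not Confluence's implementation. The following are all assumptions made for illustration:

- the class name `TagStripSketch`;
- the short list of block elements;
- `ri:content-title` as the key link attribute (modeled on Confluence storage format page links).

The sketch also expects well-formed XHTML, because it uses an XML parser rather than an HTML one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the described stripping rules; not Confluence's implementation.
final class TagStripSketch {
    // Assumed, deliberately incomplete list of block elements.
    private static final Set<String> BLOCK_ELEMENTS =
            Set.of("p", "div", "h1", "h2", "h3", "li", "pre", "table", "tr");

    // Assumed key link attribute, modeled on Confluence storage format page links.
    private static final String LINK_TITLE_ATTRIBUTE = "ri:content-title";

    static String strip(String xhtml) throws SAXException {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The start of a block element becomes a newline; inline tags are dropped silently.
                if (BLOCK_ELEMENTS.contains(qName)) {
                    out.append('\n');
                }
                // A key link attribute (here the page title) replaces the stripped tag.
                String title = attrs.getValue(LINK_TITLE_ATTRIBUTE);
                if (title != null) {
                    out.append(title);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text nodes and CDATA section content both arrive here and are kept verbatim.
                out.append(ch, start, length);
            }
        };
        try {
            // Wrap the fragment so that several top-level elements still form one XML document.
            String document = "<root>" + xhtml + "</root>";
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(document)), handler);
        } catch (IOException | ParserConfigurationException e) {
            throw new SAXException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws SAXException {
        String html = "<p>Hello <b>world</b></p>"
                + "<p>See <ac:link><ri:page ri:content-title=\"Home\"/></ac:link></p>";
        System.out.println(strip(html));
    }
}
```

Running `main` prints an empty line, then `Hello world`, then `See Home`. Each `p` start contributes a newline, the `b` tag vanishes, and the link is replaced by its title.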

The tag stripper also knows which elements in the Confluence schema should be removed entirely for indexing.
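Removing an element entirely means its descendants' text must be dropped as well. A SAX handler can track this with a nesting counter. The sketch below is hypothetical. `IgnoringStripSketch.strip(String, String[])` only mirrors the shape of the `elementsToIgnore` parameter; it is not the Confluence API. The `ac:structured-macro` element in the usage is an arbitrary example, not a statement about which elements Confluence actually drops. To stay focused, the sketch omits the block-newline and link rules.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: drops ignored elements together with everything nested inside them.
final class IgnoringStripSketch {
    static String strip(String xhtml, String[] elementsToIgnore) throws SAXException {
        Set<String> ignored = new HashSet<>(Arrays.asList(elementsToIgnore));
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // Count of open elements inside the outermost ignored element; 0 means "not ignoring".
            private int ignoreDepth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (ignoreDepth > 0 || ignored.contains(qName)) {
                    ignoreDepth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (ignoreDepth > 0) {
                    ignoreDepth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text, including CDATA content, is kept only outside ignored subtrees.
                if (ignoreDepth == 0) {
                    out.append(ch, start, length);
                }
            }
        };
        try {
            // Wrap the fragment so that several top-level elements still form one XML document.
            String document = "<root>" + xhtml + "</root>";
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(document)), handler);
        } catch (IOException | ParserConfigurationException e) {
            throw new SAXException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws SAXException {
        String html = "<p>Keep </p>"
                + "<ac:structured-macro ac:name=\"code\"><ac:plain-text-body>"
                + "<![CDATA[int x = 1;]]></ac:plain-text-body></ac:structured-macro>"
                + "<p>this</p>";
        System.out.println(strip(html, new String[] {"ac:structured-macro"}));
    }
}
```

Running `main` prints `Keep this`. The macro body, including its CDATA, is discarded along with the element. A counter is used rather than a boolean flag so that an ignored element nested inside another ignored element does not end the skipping too early.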

Constructor Summary

Constructors
Constructor and Description

HTMLSearchableTextExtractor()


Method Summary

Modifier and Type	Method and Description
`static String`	`stripTags(String htmlSource)`
`static String`	`stripTags(String pageTitle, String htmlSource)`
`static String`	`stripTags(String htmlSource, String[] elementsToIgnore)`
`static String`	`stripTags(String pageTitle, String htmlSource, String[] elementsToIgnore)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail
- HTMLSearchableTextExtractor
```
public HTMLSearchableTextExtractor()
```

Method Detail

stripTags

public static String stripTags(String htmlSource)
                        throws SAXException

Throws: `SAXException`

stripTags

public static String stripTags(String htmlSource,
                               String[] elementsToIgnore)
                        throws SAXException

Throws: `SAXException`

stripTags

public static String stripTags(String pageTitle,
                               String htmlSource)
                        throws SAXException

Throws: `SAXException`

stripTags

public static String stripTags(String pageTitle,
                               String htmlSource,
                               String[] elementsToIgnore)
                        throws SAXException

Throws: `SAXException`
