HTMLSearchableTextExtractor (Atlassian Confluence 4.0.4 API)


com.atlassian.confluence.search.lucene.extractor
Class HTMLSearchableTextExtractor

java.lang.Object
  com.atlassian.confluence.search.lucene.extractor.HTMLSearchableTextExtractor

public final class HTMLSearchableTextExtractor
extends Object

A utility class that takes a String formatted as HTML and removes all tags and attributes, leaving only the text nodes and CDATA content intact. Inline elements are simply stripped; the start of a block element such as 'p' is replaced with a newline.

The tag stripper also knows which elements in the Confluence schema should be removed entirely for indexing.
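
The behaviour described above can be illustrated with a minimal sketch built on the JDK's SAX parser. This is not Confluence's implementation: the class name `TagStripSketch`, the `strip` method, the `BLOCK` element list, and the synthetic `<root>` wrapper are all assumptions made for illustration, and the sketch only accepts well-formed markup. It also does not model the removal of Confluence-specific elements.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TagStripSketch {
    // Hypothetical list of block elements; the real class may use a different set.
    private static final Set<String> BLOCK =
            Set.of("p", "div", "h1", "h2", "h3", "li", "tr");

    /** Strips tags from well-formed markup, emitting a newline at each block start. */
    static String strip(String xhtml) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The start of a block element becomes a newline;
                // inline elements contribute nothing.
                if (BLOCK.contains(qName.toLowerCase())) {
                    out.append('\n');
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text nodes and CDATA content are both delivered here.
                out.append(ch, start, length);
            }
        };
        try {
            // Wrap the input in a synthetic root so multi-element fragments parse.
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader("<root>" + xhtml + "</root>")),
                    handler);
        } catch (Exception e) {
            // Covers parser configuration, SAX, and I/O failures in this sketch.
            throw new IllegalArgumentException("Could not parse markup", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>world</b></p><p>Second</p>"));
    }
}
```

Attributes are dropped because the handler never reads `atts`, and the sketch wraps parser failures in an unchecked exception purely to keep the example short; the documented `stripTags` method declares `SAXException` instead.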

Constructor Summary
`HTMLSearchableTextExtractor()`

Method Summary
`static String`	`stripTags(String htmlSource)`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

HTMLSearchableTextExtractor

public HTMLSearchableTextExtractor()

Method Detail

stripTags

public static String stripTags(String htmlSource)
                        throws SAXException

Throws:
SAXException