net.jxta.search.util
Class Html

java.lang.Object
  |
  +--net.jxta.search.util.Html

public class Html
extends java.lang.Object

Static utilities for cleaning up HTML text. The cleanup has three parts:

  • Closing dangerous opened tags.
  • Removing dangerous dangling close-tags.
  • Removing irrelevant tags and the content between them.
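
The three parts above can be illustrated with a minimal sketch. It is not the code behind this class: the class name HtmlCleanupSketch, the method cleanup, the set of "dangerous" tags, and the choice of script and style as "irrelevant" elements are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and tag lists are assumptions, not part of net.jxta.search.util.Html.
class HtmlCleanupSketch {
    // Assumption: tags whose imbalance could break the layout of an enclosing page.
    private static final Set<String> DANGEROUS = Set.of("table", "tr", "td", "b", "i", "font");

    // Assumption: elements that are dropped together with everything between open and close.
    private static final Pattern IRRELEVANT =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");

    // Naive tag matcher: group 1 is "/" for a close tag, group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    static String cleanup(String html) {
        // Part 3: remove irrelevant tags and the content between them.
        String input = IRRELEVANT.matcher(html).replaceAll("");

        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // dangerous tags currently open
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start()); // copy text before this tag
            last = m.end();
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            if (!DANGEROUS.contains(name)) {
                out.append(m.group());          // harmless tag, keep as-is
            } else if (!closing) {
                open.push(name);                // remember the dangerous open tag
                out.append(m.group());
            } else if (open.remove(name)) {
                out.append(m.group());          // close tag matches an earlier open tag
            }
            // Part 2: a dangerous close tag with no matching open tag is dropped.
        }
        out.append(input, last, input.length());

        // Part 1: close dangerous tags that are still open, innermost first.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanup("<b>bold"));
        System.out.println(cleanup("text</td> more"));
        System.out.println(cleanup("a<script>x()</script>b"));
    }
}
```

The sketch tracks only a fixed set of tag names and uses regular expressions in place of a real HTML parser, so it does not handle comments, self-closing forms of the tracked tags, or a ">" inside an attribute value.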


    Constructor Summary
    Html()
               
     
    Method Summary
    static java.lang.String getTagAttrib(java.lang.String id, java.lang.String source)
               
    static java.lang.String highlightTerms(java.lang.String terms, java.lang.String data)
               
    static void main(java.lang.String[] args)
               
    static java.lang.String modifyLinks(java.lang.String html, java.net.URL base)
               
    static java.lang.String relativeLink(java.lang.String base, java.lang.String link)
               
    static java.lang.String stripHtmlTags(java.lang.String data)
               
    static java.lang.String stripTags(java.lang.String data)
              Removes all HTML tags from the supplied string.
    static void testHtmlClass()
               
    static java.lang.String verify(java.lang.String html)
               
     
    Methods inherited from class java.lang.Object
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
     

    Constructor Detail

    Html

    public Html()
    Method Detail

    stripHtmlTags

    public static java.lang.String stripHtmlTags(java.lang.String data)

    stripTags

    public static java.lang.String stripTags(java.lang.String data)
    Removes all HTML tags from the supplied string.

    Example:
      Before stripping: Amazon sells other items too such as Movies and more.
      After stripping: Amazon sells other items too such as Movies and more.

    The method should be able to handle text that has no HTML markup embedded in it.

    FIXME: Does this method provably handle unbalanced tags correctly?

    Parameters:
    data - String containing HTML and text.
    Returns:
    String with just the text, all HTML tags stripped out.
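
As a rough illustration of the behavior described for stripTags, the following sketch removes tags with a regular expression. It is an assumption about one possible approach, not the implementation in this class; the class name StripTagsSketch and the markup in the sample input (the href value and the b element) are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementation of net.jxta.search.util.Html.stripTags.
class StripTagsSketch {
    // Naive approximation of a tag: everything from '<' up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String data) {
        if (data == null) {
            return null; // assumption: null input is passed through unchanged
        }
        return TAG.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample markup is made up for illustration.
        String before = "Amazon sells other items too such as "
                + "<a href=\"/movies\">Movies</a> and <b>more</b>.";
        System.out.println(stripTags(before));
        // Text without markup is returned unchanged.
        System.out.println(stripTags("plain text, no markup"));
    }
}
```

This sketch leaves a lone "<" with no later ">" in place and treats any "<...>" span as a tag, so it does not settle the FIXME above about unbalanced tags.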

    modifyLinks

    public static java.lang.String modifyLinks(java.lang.String html,
                                               java.net.URL base)

    relativeLink

    public static java.lang.String relativeLink(java.lang.String base,
                                                java.lang.String link)

    highlightTerms

    public static java.lang.String highlightTerms(java.lang.String terms,
                                                  java.lang.String data)

    getTagAttrib

    public static java.lang.String getTagAttrib(java.lang.String id,
                                                java.lang.String source)

    verify

    public static java.lang.String verify(java.lang.String html)

    testHtmlClass

    public static void testHtmlClass()

    main

    public static void main(java.lang.String[] args)
                     throws java.io.IOException