gnu.xml.pipeline
Class LinkFilter

java.lang.Object
  extended by gnu.xml.pipeline.EventFilter
      extended by gnu.xml.pipeline.LinkFilter
All Implemented Interfaces:
ContentHandler, DeclHandler, DTDHandler, EventConsumer, LexicalHandler

public class LinkFilter
extends EventFilter

Pipeline filter to remember XHTML links found in a document, so they can later be crawled. Fragments are not counted, and duplicates are ignored. Callers are responsible for filtering out URLs they aren't interested in. Events are passed through unmodified.

Input MUST include a setDocumentLocator() call, as it's used to resolve relative links in the absence of a "base" element. Input MUST also include namespace identifiers, since it is the XHTML namespace identifier which is used to identify the relevant elements.
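The behavior described above (namespace-based element matching, resolution against the locator's system ID or a "base" element, fragment stripping, duplicate suppression) can be sketched with only the JDK's SAX classes. This is a stand-in written for illustration, not the LinkFilter source: the class name LinkCollectorSketch, the helper collect(), and the choice to look only at "href" attributes are assumptions.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; NOT the gnu.xml.pipeline.LinkFilter code.
class LinkCollectorSketch extends DefaultHandler {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    private final Set<String> links = new LinkedHashSet<>(); // a set ignores duplicates
    private Locator locator;
    private URI base;

    @Override
    public void setDocumentLocator(Locator l) {
        locator = l;
    }

    @Override
    public void startDocument() throws SAXException {
        // Without a locator there is nothing to resolve relative links against.
        if (locator == null || locator.getSystemId() == null) {
            throw new SAXException("no Locator with a system ID was provided");
        }
        base = URI.create(locator.getSystemId());
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!XHTML_NS.equals(uri)) {
            return; // elements are recognized by namespace, not by prefix
        }
        String href = atts.getValue("", "href"); // assumption: only "href" is examined
        if (href == null) {
            return;
        }
        try {
            if ("base".equals(localName)) {
                base = base.resolve(href); // a "base" element overrides the locator
                return;
            }
            String resolved = base.resolve(href).toString();
            int hash = resolved.indexOf('#');
            links.add(hash < 0 ? resolved : resolved.substring(0, hash)); // drop fragment
        } catch (IllegalArgumentException e) {
            // Malformed URI reference: skipped in this sketch.
        }
    }

    @Override
    public void endDocument() {
        base = null; // forget base URI information
    }

    Enumeration<String> getLinks() {
        return Collections.enumeration(links);
    }

    // Convenience helper (hypothetical): parse a string and return the collected links.
    static List<String> collect(String xhtml, String systemId) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // namespace URIs must be reported
            InputSource in = new InputSource(new StringReader(xhtml));
            in.setSystemId(systemId);        // becomes the locator's system ID
            LinkCollectorSketch handler = new LinkCollectorSketch();
            factory.newSAXParser().parse(in, handler);
            return Collections.list(handler.getLinks());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, two anchors that point at the same page with different fragments yield a single entry, and an element outside the XHTML namespace is skipped even if it carries an "href" attribute.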

FIXME: handle xml:base attribute ... in association with a stack of base URIs. Similarly, recognize/support XLink data.

Version:
$Date: 2001/10/25 07:11:55 $
Author:
David Brownell

Field Summary
 
Fields inherited from class gnu.xml.pipeline.EventFilter
DECL_HANDLER, FEATURE_URI, LEXICAL_HANDLER, PROPERTY_URI
 
Constructor Summary
LinkFilter()
          Constructs a new event filter, which collects links in a private data structure for later enumeration.
LinkFilter(EventConsumer next)
          Constructs a new event filter, which collects links in a private data structure for later enumeration and passes all events, unmodified, to the next consumer.
 
Method Summary
 void endDocument()
          Forgets about any base URI information that may be recorded.
 Enumeration getLinks()
          Returns an enumeration of the links found since the filter was constructed, or since removeAllLinks() was called.
 void removeAllLinks()
          Removes records about all links reported to the event stream, as if the filter were newly created.
 void startDocument()
          Reports an error if no Locator has been made available.
 void startElement(String uri, String localName, String qName, Attributes atts)
          Collects URIs for (X)HTML content from elements which hold them.
 
Methods inherited from class gnu.xml.pipeline.EventFilter
attributeDecl, bind, chainTo, characters, comment, elementDecl, endCDATA, endDTD, endElement, endEntity, endPrefixMapping, externalEntityDecl, getContentHandler, getDocumentLocator, getDTDHandler, getErrorHandler, getNext, getProperty, ignorableWhitespace, internalEntityDecl, notationDecl, processingInstruction, setContentHandler, setDocumentLocator, setDTDHandler, setErrorHandler, setProperty, skippedEntity, startCDATA, startDTD, startEntity, startPrefixMapping, unparsedEntityDecl
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LinkFilter

public LinkFilter()
Constructs a new event filter, which collects links in a private data structure for later enumeration.


LinkFilter

public LinkFilter(EventConsumer next)
Constructs a new event filter, which collects links in a private data structure for later enumeration and passes all events, unmodified, to the next consumer.

Method Detail

getLinks

public Enumeration getLinks()
Returns an enumeration of the links found since the filter was constructed, or since removeAllLinks() was called.

Returns:
enumeration of strings.
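
Because the filter leaves URL selection to the caller, code that consumes the enumeration typically applies its own policy. The following is a minimal sketch of one such policy, assuming the caller wants only http/https links on a single host; the class LinkSelectionSketch and the method sameHostHttpLinks are hypothetical names, and the method accepts any Enumeration of strings such as the one returned by getLinks().

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical caller-side helper; not part of gnu.xml.pipeline.
class LinkSelectionSketch {

    /** Keeps only http/https links whose host matches the given host. */
    static List<String> sameHostHttpLinks(Enumeration<?> links, String host) {
        List<String> kept = new ArrayList<>();
        while (links.hasMoreElements()) {
            // The enumeration is documented to hold strings.
            String link = (String) links.nextElement();
            URI parsed;
            try {
                parsed = new URI(link);
            } catch (URISyntaxException e) {
                continue; // skip anything that is not a well-formed URI
            }
            String scheme = parsed.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // getHost() is null for opaque URIs such as mailto:, which fail this check.
            if (web && host.equalsIgnoreCase(parsed.getHost())) {
                kept.add(link);
            }
        }
        return kept;
    }
}
```

With the real class, a call would look like sameHostHttpLinks(filter.getLinks(), "example.org"), where filter is a LinkFilter that has already processed a document.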

removeAllLinks

public void removeAllLinks()
Removes records about all links reported to the event stream, as if the filter were newly created.


startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes atts)
                  throws SAXException
Collects URIs for (X)HTML content from elements which hold them.

Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class EventFilter
Throws:
SAXException

startDocument

public void startDocument()
                   throws SAXException
Reports an error if no Locator has been made available.

Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class EventFilter
Throws:
SAXException

endDocument

public void endDocument()
                 throws SAXException
Forgets about any base URI information that may be recorded. Applications will often want to call removeAllLinks(), likely after examining the links which were reported.

Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class EventFilter
Throws:
SAXException


Source code is under GPL (with library exception) in the JAXP project at http://www.gnu.org/software/classpathx/jaxp
This documentation was derived from that source code on 2007-02-12.