com.ibm.icu.text
Class RuleBasedBreakIterator_New

java.lang.Object
  extended by com.ibm.icu.text.BreakIterator
      extended by com.ibm.icu.text.RuleBasedBreakIterator
          extended by com.ibm.icu.text.RuleBasedBreakIterator_New
All Implemented Interfaces:
Cloneable

public class RuleBasedBreakIterator_New
extends RuleBasedBreakIterator

Rule-based break iterator implementation. This is a port of the C++ class RuleBasedBreakIterator from ICU4C. A note on future plans: once a new DictionaryBasedBreakIterator implementation is completed, the archaic implementation class RuleBasedBreakIterator_Old can be removed entirely, and this class can be renamed to simply RuleBasedBreakIterator.


Field Summary
static boolean fTrace
          Debugging flag.
 
Fields inherited from class com.ibm.icu.text.RuleBasedBreakIterator
WORD_IDEO, WORD_IDEO_LIMIT, WORD_KANA, WORD_KANA_LIMIT, WORD_LETTER, WORD_LETTER_LIMIT, WORD_NONE, WORD_NONE_LIMIT, WORD_NUMBER, WORD_NUMBER_LIMIT
 
Fields inherited from class com.ibm.icu.text.BreakIterator
DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD
 
Method Summary
protected static void checkOffset(int offset, CharacterIterator text)
          Throw IllegalArgumentException unless begin <= offset < end.
 Object clone()
          Clones this iterator.
 int current()
          Returns the current iteration position.
 void dump()
          Dump the contents of the state table and character classes for this break iterator.
 boolean equals(Object that)
          Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text.
 int first()
          Sets the current iteration position to the beginning of the text.
 int following(int offset)
          Sets the iterator to refer to the first boundary position following the specified position.
static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is)
          Create a break iterator from a precompiled set of rules.
 int getRuleStatus()
          Deprecated. This is a draft API and might change in a future release of ICU.
 int getRuleStatusVec(int[] fillInArray)
          Deprecated. This is a draft API and might change in a future release of ICU.
 CharacterIterator getText()
          Return a CharacterIterator over the text being analyzed.
 int hashCode()
          Compute a hash code for this BreakIterator.
 boolean isBoundary(int offset)
          Returns true if the specified position is a boundary position.
 int last()
          Sets the current iteration position to the end of the text.
 int next()
          Advances the iterator to the next boundary position.
 int next(int n)
          Advances the iterator either forward or backward the specified number of steps.
 int preceding(int offset)
          Sets the iterator to refer to the last boundary position before the specified position.
 int previous()
          Moves the iterator backwards, to the last boundary preceding this one.
 void setText(CharacterIterator newText)
          Set the iterator to analyze a new piece of text.
 String toString()
          Returns the description (rules) used to create this iterator.
 
Methods inherited from class com.ibm.icu.text.BreakIterator
getAvailableLocales, getAvailableULocales, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, registerInstance, registerInstance, setText, unregister
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

fTrace

public static boolean fTrace
Debugging flag. Trace operation of state machine when true.

Method Detail

dump

public void dump()
Dump the contents of the state table and character classes for this break iterator. For debugging only.


clone

public Object clone()
Clones this iterator.

Overrides:
clone in class RuleBasedBreakIterator
Returns:
A newly constructed RuleBasedBreakIterator with the same behavior as this one.
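The sketch below shows what "same behavior" means in practice: the clone starts at the same position but then moves independently. It uses the JDK's java.text.BreakIterator, which has the same clone(), first(), next() and current() methods, so it runs without ICU4J; it is an assumption that the ICU class behaves identically. CloneDemo is a hypothetical name.

```java
import java.text.BreakIterator;

class CloneDemo {
    // Advances only the original iterator, then reports both positions.
    static int[] positionsAfterAdvancingOriginal(String text) {
        BreakIterator original = BreakIterator.getWordInstance();
        original.setText(text);
        original.first();

        // clone() is declared to return Object, so a cast is required.
        BreakIterator copy = (BreakIterator) original.clone();

        original.next(); // moves the original only
        return new int[] { original.current(), copy.current() };
    }

    public static void main(String[] args) {
        int[] p = positionsAfterAdvancingOriginal("one two three");
        System.out.println("original=" + p[0] + ", clone=" + p[1]);
    }
}
```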

equals

public boolean equals(Object that)
Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text.

Overrides:
equals in class RuleBasedBreakIterator

toString

public String toString()
Returns the description (rules) used to create this iterator. (In ICU4C, the equivalent function is RuleBasedBreakIterator::getRules().)

Overrides:
toString in class RuleBasedBreakIterator

hashCode

public int hashCode()
Compute a hash code for this BreakIterator.

Overrides:
hashCode in class RuleBasedBreakIterator
Returns:
A hash code

getInstanceFromCompiledRules

public static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is)
                                                           throws IOException
Create a break iterator from a precompiled set of rules.

Throws:
IOException

first

public int first()
Sets the current iteration position to the beginning of the text (i.e., the CharacterIterator's starting offset).

Overrides:
first in class RuleBasedBreakIterator
Returns:
The offset of the beginning of the text.

last

public int last()
Sets the current iteration position to the end of the text (i.e., the CharacterIterator's ending offset).

Overrides:
last in class RuleBasedBreakIterator
Returns:
The text's past-the-end offset.
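Because first() and last() report the CharacterIterator's own begin and end offsets, they are not necessarily 0 and the string length. The sketch below illustrates this with a StringCharacterIterator restricted to a sub-range. It uses the JDK's java.text.BreakIterator as a stand-in with the same method names (an assumption that ICU behaves the same way); FirstLastDemo is a hypothetical name.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;

class FirstLastDemo {
    // Returns {first(), last()} for text limited to the range [begin, end).
    static int[] firstAndLast(String text, int begin, int end) {
        BreakIterator bi = BreakIterator.getWordInstance();
        // StringCharacterIterator(text, begin, end, pos) exposes only [begin, end).
        bi.setText(new StringCharacterIterator(text, begin, end, begin));
        return new int[] { bi.first(), bi.last() };
    }

    public static void main(String[] args) {
        // Skip the "xx " prefix: the iterator only sees "one two".
        int[] r = firstAndLast("xx one two", 3, 10);
        System.out.println("first=" + r[0] + ", last=" + r[1]);
    }
}
```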

next

public int next(int n)
Advances the iterator either forward or backward the specified number of steps. Negative values move backward, and positive values move forward. This is equivalent to repeatedly calling next() or previous().

Overrides:
next in class RuleBasedBreakIterator
Parameters:
n - The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
Returns:
The character offset of the boundary position n boundaries away from the current one.
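The equivalence with repeated next() or previous() calls can be checked directly. The sketch below uses the JDK's java.text.BreakIterator, which provides the same next(int) method, as a stand-in for the ICU class; NextNDemo and its helpers are hypothetical names.

```java
import java.text.BreakIterator;

class NextNDemo {
    private static BreakIterator wordIterator(String text) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        return bi;
    }

    // One call to next(n) from the start of the text.
    static int jump(String text, int n) {
        BreakIterator bi = wordIterator(text);
        bi.first();
        return bi.next(n);
    }

    // The same movement expressed as n individual next() calls.
    static int step(String text, int n) {
        BreakIterator bi = wordIterator(text);
        int pos = bi.first();
        for (int i = 0; i < n; i++) {
            pos = bi.next();
        }
        return pos;
    }

    // A negative count moves backward, here starting from the end of the text.
    static int jumpBackFromEnd(String text, int n) {
        BreakIterator bi = wordIterator(text);
        bi.last();
        return bi.next(-n);
    }

    public static void main(String[] args) {
        String text = "one two three";
        System.out.println(jump(text, 3) + " " + step(text, 3) + " " + jumpBackFromEnd(text, 2));
    }
}
```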

next

public int next()
Advances the iterator to the next boundary position.

Overrides:
next in class RuleBasedBreakIterator
Returns:
The position of the first boundary after this one.
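The usual forward loop starts at first() and calls next() until it returns DONE; each pair of consecutive boundaries delimits one segment. This sketch uses the JDK's java.text.BreakIterator, which shares this calling pattern, as a stand-in for the ICU class; ForwardIteration is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

class ForwardIteration {
    // Collects every boundary by calling next() until it returns DONE.
    static List<Integer> boundaries(String text) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = bi.first(); pos != BreakIterator.DONE; pos = bi.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "one two three";
        List<Integer> b = boundaries(text);
        // Consecutive boundaries delimit the segments (words and the spaces between them).
        for (int i = 0; i + 1 < b.size(); i++) {
            System.out.println("[" + text.substring(b.get(i), b.get(i + 1)) + "]");
        }
    }
}
```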

previous

public int previous()
Moves the iterator backwards, to the last boundary preceding this one.

Overrides:
previous in class RuleBasedBreakIterator
Returns:
The position of the last boundary preceding this one.
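Iterating backward mirrors the forward loop: start at last() and call previous() until it returns DONE. The sketch below uses the JDK's java.text.BreakIterator as a stand-in with the same methods; BackwardIteration is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

class BackwardIteration {
    // Starts at last() and walks back with previous() until it returns DONE.
    static List<Integer> boundariesReversed(String text) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = bi.last(); pos != BreakIterator.DONE; pos = bi.previous()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundariesReversed("one two three"));
    }
}
```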

following

public int following(int offset)
Sets the iterator to refer to the first boundary position following the specified position.

Overrides:
following in class RuleBasedBreakIterator
Parameters:
offset - The position from which to begin searching for a break position.
Returns:
The position of the first break after the specified offset.
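The result is strictly greater than the offset, even when the offset is itself a boundary. The sketch below demonstrates this with the JDK's java.text.BreakIterator, which has the same following(int) method, as a stand-in for the ICU class; FollowingDemo is a hypothetical name. For "one two three" the word boundaries are 0, 3, 4, 7, 8 and 13.

```java
import java.text.BreakIterator;

class FollowingDemo {
    static int following(String text, int offset) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        return bi.following(offset);
    }

    public static void main(String[] args) {
        String text = "one two three";
        // Offset 3 is a boundary, yet following(3) moves on to the next one.
        System.out.println("following(3) = " + following(text, 3));
        // Offset 5 is inside "two"; the next boundary is the end of that word.
        System.out.println("following(5) = " + following(text, 5));
    }
}
```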

preceding

public int preceding(int offset)
Sets the iterator to refer to the last boundary position before the specified position.

Overrides:
preceding in class RuleBasedBreakIterator
Parameters:
offset - The position from which to begin searching for a break.
Returns:
The position of the last boundary before the starting position.
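As with following(), the result is strictly less than the offset, even when the offset is a boundary. The sketch uses the JDK's java.text.BreakIterator, which has the same preceding(int) method, as a stand-in for the ICU class; PrecedingDemo is a hypothetical name.

```java
import java.text.BreakIterator;

class PrecedingDemo {
    static int preceding(String text, int offset) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        return bi.preceding(offset);
    }

    public static void main(String[] args) {
        String text = "one two three";
        // Offset 5 is inside "two"; the last boundary before it is the start of that word.
        System.out.println("preceding(5) = " + preceding(text, 5));
        // Offset 4 is a boundary, so preceding(4) returns the one before it.
        System.out.println("preceding(4) = " + preceding(text, 4));
    }
}
```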

checkOffset

protected static final void checkOffset(int offset,
                                        CharacterIterator text)
Throw IllegalArgumentException unless begin <= offset < end.
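The precondition can be restated as a small helper. The sketch below is hypothetical: OffsetCheck and accepts are illustrative names, and the check follows the bound exactly as documented here (offset at least the iterator's begin index and strictly less than its end index). It is not copied from ICU's source.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

class OffsetCheck {
    // Hypothetical restatement of the documented rule: begin <= offset < end.
    static void checkOffset(int offset, CharacterIterator text) {
        if (offset < text.getBeginIndex() || offset >= text.getEndIndex()) {
            throw new IllegalArgumentException("offset out of bounds: " + offset);
        }
    }

    // Convenience wrapper that reports whether checkOffset would accept the offset.
    static boolean accepts(int offset, CharacterIterator text) {
        try {
            checkOffset(offset, text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CharacterIterator text = new StringCharacterIterator("abc");
        for (int offset = -1; offset <= 3; offset++) {
            System.out.println(offset + " -> " + accepts(offset, text));
        }
    }
}
```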


isBoundary

public boolean isBoundary(int offset)
Returns true if the specified position is a boundary position. As a side effect, leaves the iterator pointing to the first boundary position at or after "offset".

Overrides:
isBoundary in class RuleBasedBreakIterator
Parameters:
offset - the offset to check.
Returns:
True if "offset" is a boundary position.
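The side effect is visible through current(). The sketch below uses the JDK's java.text.BreakIterator, whose isBoundary(int) follows the same contract, as a stand-in for the ICU class; IsBoundaryDemo and probe are hypothetical names. For "one two three" the word boundaries are 0, 3, 4, 7, 8 and 13.

```java
import java.text.BreakIterator;

class IsBoundaryDemo {
    // Returns {1 if offset is a boundary else 0, position left behind by isBoundary}.
    static int[] probe(String text, int offset) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        boolean boundary = bi.isBoundary(offset);
        return new int[] { boundary ? 1 : 0, bi.current() };
    }

    public static void main(String[] args) {
        for (int offset : new int[] { 4, 5 }) {
            int[] r = probe("one two three", offset);
            System.out.println("isBoundary(" + offset + ") = " + (r[0] == 1)
                    + ", current() = " + r[1]);
        }
    }
}
```

For offset 5, which falls inside "two", the iterator is left on the next boundary (the end of that word) rather than on the offset itself.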

current

public int current()
Returns the current iteration position.

Overrides:
current in class RuleBasedBreakIterator
Returns:
The current iteration position.

getRuleStatus

public int getRuleStatus()
Deprecated. This is a draft API and might change in a future release of ICU.

Return the status tag from the break rule that determined the most recently returned break position. The values appear in the rule source within braces, for example {123}. For rules that do not specify a status, a default value of 0 is returned. If more than one rule applies, the numerically largest of the possible status values is returned.

Of the standard types of ICU break iterators, only the word break iterator provides status values. The values are defined in class RuleBasedBreakIterator and allow distinguishing between words that contain alphabetic letters, "words" that appear to be numbers, punctuation and spaces, words containing ideographic characters, and more. Call getRuleStatus after obtaining a boundary position from next(), previous(), or any other break iterator function that returns a boundary position.

Overrides:
getRuleStatus in class RuleBasedBreakIterator
Returns:
the status from the break rule that determined the most recently returned break position.
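A common way to use the status is to test which half-open range [WORD_x, WORD_x_LIMIT) it falls into. The sketch below hardcodes the numeric values of the RuleBasedBreakIterator word-status constants listed in the field summary so that it compiles without ICU4J; WordStatusClassifier, classify and its labels are hypothetical, and production code should compare against the named constants.

```java
class WordStatusClassifier {
    // Literal values of ICU's word-status constants (WORD_NONE = 0,
    // WORD_NUMBER = 100, WORD_LETTER = 200, WORD_KANA = 300, WORD_IDEO = 400,
    // WORD_IDEO_LIMIT = 500). Hardcoded only so this sketch has no ICU4J
    // dependency; real code should use the RuleBasedBreakIterator constants.
    static String classify(int status) {
        if (status >= 0 && status < 100) return "none";          // spaces, punctuation
        if (status >= 100 && status < 200) return "number";
        if (status >= 200 && status < 300) return "letter";
        if (status >= 300 && status < 400) return "kana";
        if (status >= 400 && status < 500) return "ideographic";
        return "other";                                          // outside the standard ranges
    }

    public static void main(String[] args) {
        // With ICU4J the status would come from the iterator (not compiled here):
        //   int end = wordIterator.next();
        //   String kind = classify(wordIterator.getRuleStatus());
        for (int s : new int[] { 0, 100, 200, 300, 400 }) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```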

getRuleStatusVec

public int getRuleStatusVec(int[] fillInArray)
Deprecated. This is a draft API and might change in a future release of ICU.

Get the status (tag) values from the break rule(s) that determined the most recently returned break position. The values appear in the rule source within braces, for example {123}. The default status value for rules that do not explicitly provide one is zero.

The status values used by the standard ICU break rules are defined as public constants in class RuleBasedBreakIterator.

If the size of the output array is insufficient to hold the data, the output will be truncated to the available length. No exception will be thrown.

Overrides:
getRuleStatusVec in class RuleBasedBreakIterator
Parameters:
fillInArray - an array to be filled in with the status values.
Returns:
The number of rule status values from rules that determined the most recent boundary returned by the break iterator. In the event that the array is too small, the return value is the total number of status values that were available, not the reduced number that were actually returned.
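Because the return value is the total count rather than the number copied, a caller can detect truncation and retry with a larger array. The sketch below models that documented contract without ICU4J: the "available" array is a hypothetical stand-in for the iterator's internal status values, and StatusVecModel, fillIn and readAll are illustrative names, not ICU API.

```java
import java.util.Arrays;

class StatusVecModel {
    // Models the documented contract: copy what fits, never throw,
    // and return the total count even when the output was truncated.
    static int fillIn(int[] available, int[] fillInArray) {
        int n = Math.min(available.length, fillInArray.length);
        System.arraycopy(available, 0, fillInArray, 0, n);
        return available.length;
    }

    // Caller pattern: retry with a larger array when the count exceeds the capacity.
    static int[] readAll(int[] available) {
        int[] buf = new int[1];
        int count = fillIn(available, buf);
        if (count > buf.length) {
            buf = new int[count];
            count = fillIn(available, buf);
        }
        return Arrays.copyOf(buf, count);
    }

    public static void main(String[] args) {
        int[] small = new int[2];
        int total = fillIn(new int[] { 100, 200, 300 }, small);
        System.out.println(total + " available, copied " + Arrays.toString(small));
        System.out.println(Arrays.toString(readAll(new int[] { 100, 200, 300 })));
    }
}
```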

getText

public CharacterIterator getText()
Return a CharacterIterator over the text being analyzed. This method returns the actual CharacterIterator used internally. Changing the state of this iterator can have undefined consequences; if you need to change it, clone it first.

Overrides:
getText in class RuleBasedBreakIterator
Returns:
An iterator over the text being analyzed.
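The "clone it first" advice looks like this in practice: the copy can be walked freely while the break iterator keeps its position. The sketch uses the JDK's java.text.BreakIterator, whose getText() likewise returns the iterator's own CharacterIterator, as a stand-in for the ICU class; GetTextDemo and readTextSafely are hypothetical names.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;

class GetTextDemo {
    static final class Result {
        final String copiedText;
        final int iteratorPosition;

        Result(String copiedText, int iteratorPosition) {
            this.copiedText = copiedText;
            this.iteratorPosition = iteratorPosition;
        }
    }

    static Result readTextSafely(String text) {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText(text);
        bi.first();
        bi.next(); // the break iterator now sits on the first boundary after the start

        // Clone before moving: the object returned by getText() is shared state.
        CharacterIterator copy = (CharacterIterator) bi.getText().clone();
        StringBuilder sb = new StringBuilder();
        for (char c = copy.first(); c != CharacterIterator.DONE; c = copy.next()) {
            sb.append(c);
        }
        // The copy was walked to its end, but the break iterator did not move.
        return new Result(sb.toString(), bi.current());
    }

    public static void main(String[] args) {
        Result r = readTextSafely("one two three");
        System.out.println("\"" + r.copiedText + "\", position still " + r.iteratorPosition);
    }
}
```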

setText

public void setText(CharacterIterator newText)
Set the iterator to analyze a new piece of text. This function resets the current iteration position to the beginning of the text.

Overrides:
setText in class RuleBasedBreakIterator
Parameters:
newText - An iterator over the text to analyze.
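The reset can be observed through current(). The sketch below uses the JDK's java.text.BreakIterator, whose setText(CharacterIterator) also resets the position to first(), as a stand-in for the ICU class; SetTextDemo is a hypothetical name.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;

class SetTextDemo {
    // Returns {position before setText, position right after setText}.
    static int[] positionsAroundSetText() {
        BreakIterator bi = BreakIterator.getWordInstance();
        bi.setText("one two three");
        bi.last();                               // move to the end of the first text
        int before = bi.current();

        bi.setText(new StringCharacterIterator("four five"));
        int after = bi.current();                // reset to the beginning of the new text
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] p = positionsAroundSetText();
        System.out.println("before=" + p[0] + ", after=" + p[1]);
    }
}
```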


Copyright (c) 2004 IBM Corporation and others.