com.ibm.icu.text
Class RuleBasedBreakIterator
java.lang.Object
com.ibm.icu.text.BreakIterator
com.ibm.icu.text.RuleBasedBreakIterator
- All Implemented Interfaces:
- Cloneable
- Direct Known Subclasses:
- RuleBasedBreakIterator_New, RuleBasedBreakIterator_Old
- public class RuleBasedBreakIterator
- extends BreakIterator
A subclass of BreakIterator whose behavior is specified using a list of rules.
- Status:
- Stable ICU 2.0.
Field Summary |
static int |
WORD_IDEO
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_IDEO_LIMIT
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_KANA
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_KANA_LIMIT
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_LETTER
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_LETTER_LIMIT
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_NONE
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_NONE_LIMIT
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_NUMBER
Deprecated. This is a draft API and might change in a future release of ICU. |
static int |
WORD_NUMBER_LIMIT
Deprecated. This is a draft API and might change in a future release of ICU. |
Constructor Summary |
protected |
RuleBasedBreakIterator()
This default constructor is used when creating derived classes
of RuleBasedBreakIterator. |
|
RuleBasedBreakIterator(String description)
Constructs a RuleBasedBreakIterator_Old according to the description
provided. |
Method Summary |
Object |
clone()
Clones this iterator. |
int |
current()
Returns the current iteration position. |
boolean |
equals(Object that)
Returns true if both BreakIterators are of the same class, have the same
rules, and iterate over the same text. |
int |
first()
Sets the current iteration position to the beginning of the text.
|
int |
following(int offset)
Sets the iterator to refer to the first boundary position following
the specified position. |
static RuleBasedBreakIterator |
getInstanceFromCompiledRules(InputStream is)
Deprecated. This is a draft API and might change in a future release of ICU. |
int |
getRuleStatus()
Deprecated. This is a draft API and might change in a future release of ICU. |
int |
getRuleStatusVec(int[] fillInArray)
Deprecated. This is a draft API and might change in a future release of ICU. |
CharacterIterator |
getText()
Return a CharacterIterator over the text being analyzed. |
int |
hashCode()
Compute a hashcode for this BreakIterator |
boolean |
isBoundary(int offset)
Returns true if the specified position is a boundary position. |
int |
last()
Sets the current iteration position to the end of the text.
|
int |
next()
Advances the iterator to the next boundary position. |
int |
next(int n)
Advances the iterator either forward or backward the specified number of steps.
|
int |
preceding(int offset)
Sets the iterator to refer to the last boundary position before the
specified position. |
int |
previous()
Advances the iterator backwards, to the last boundary preceding this one. |
void |
setText(CharacterIterator newText)
Set the iterator to analyze a new piece of text. |
String |
toString()
Returns the description used to create this iterator |
Methods inherited from class com.ibm.icu.text.BreakIterator |
getAvailableLocales, getAvailableULocales, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, registerInstance, registerInstance, setText, unregister |
WORD_NONE
public static final int WORD_NONE
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for "words" that do not fit into any of the other categories.
Includes spaces and most punctuation.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_NONE_LIMIT
public static final int WORD_NONE_LIMIT
- Deprecated. This is a draft API and might change in a future release of ICU.
- Upper bound for tags for uncategorized words.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_NUMBER
public static final int WORD_NUMBER
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words that appear to be numbers, lower limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_NUMBER_LIMIT
public static final int WORD_NUMBER_LIMIT
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words that appear to be numbers, upper limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_LETTER
public static final int WORD_LETTER
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words that contain letters, excluding
hiragana, katakana or ideographic characters, lower limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_LETTER_LIMIT
public static final int WORD_LETTER_LIMIT
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words containing letters, upper limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_KANA
public static final int WORD_KANA
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words containing kana characters, lower limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_KANA_LIMIT
public static final int WORD_KANA_LIMIT
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words containing kana characters, upper limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_IDEO
public static final int WORD_IDEO
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words containing ideographic characters, lower limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
WORD_IDEO_LIMIT
public static final int WORD_IDEO_LIMIT
- Deprecated. This is a draft API and might change in a future release of ICU.
- Tag value for words containing ideographic characters, upper limit.
- See Also:
- Constant Field Values
- Status:
- Draft ICU 3.0.
RuleBasedBreakIterator
public RuleBasedBreakIterator(String description)
- Constructs a RuleBasedBreakIterator_Old according to the description
provided. If the description is malformed, throws an
IllegalArgumentException. Normally, instead of constructing a
RuleBasedBreakIterator_Old directly, you'll use the factory methods
on BreakIterator to create one indirectly from a description
in the framework's resource files. You'd use this when you want
special behavior not provided by the built-in iterators.
- Status:
- Stable ICU 2.0.
RuleBasedBreakIterator
protected RuleBasedBreakIterator()
- This default constructor is used when creating derived classes
of RuleBasedBreakIterator. Not intended for use by normal
clients of break iterators.
- Status:
- Internal. This API is Internal Only and can change at any time.
getInstanceFromCompiledRules
public static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is)
throws IOException
- Deprecated. This is a draft API and might change in a future release of ICU.
- Get a break iterator based on a set of pre-compiled break rules.
- Parameters:
is
- An input stream that supplies the compiled rule data. The
format of the rule data on the stream is that of a rule data file
produced by the ICU4C tool "genbrk".
- Returns:
- A RuleBasedBreakIterator based on the supplied break rules.
- Throws:
IOException
- Status:
- Draft ICU 3.0.
clone
public Object clone()
- Clones this iterator.
- Overrides:
clone
in class BreakIterator
- Returns:
- A newly-constructed RuleBasedBreakIterator with the same
behavior as this one.
- Status:
- Stable ICU 2.0.
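A minimal sketch of what "the same behavior" means for a clone. It assumes the JDK's java.text.BreakIterator as a stand-in so that it runs without ICU4J on the classpath; the class name CloneDemo is hypothetical. The copy starts at the same position over the same text, and moving it afterwards should leave the original untouched.

```java
import java.text.BreakIterator;
import java.util.Locale;

class CloneDemo {
    // Returns {original position, clone position} after moving only the clone.
    static int[] positions(String text) {
        BreakIterator original = BreakIterator.getWordInstance(Locale.US);
        original.setText(text);
        original.first();

        // clone() returns Object, so a cast is needed.
        BreakIterator copy = (BreakIterator) original.clone();
        copy.last(); // moves the clone only

        return new int[] { original.current(), copy.current() };
    }

    public static void main(String[] args) {
        int[] p = positions("Hello world");
        System.out.println(p[0] + " " + p[1]);
    }
}
```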
equals
public boolean equals(Object that)
- Returns true if both BreakIterators are of the same class, have the same
rules, and iterate over the same text.
- Status:
- Stable ICU 2.0.
toString
public String toString()
- Returns the description used to create this iterator
- Status:
- Stable ICU 2.0.
hashCode
public int hashCode()
- Compute a hashcode for this BreakIterator
- Returns:
- A hash code
- Status:
- Stable ICU 2.0.
first
public int first()
- Sets the current iteration position to the beginning of the text.
(i.e., the CharacterIterator's starting offset).
- Specified by:
first
in class BreakIterator
- Returns:
- The offset of the beginning of the text.
- Status:
- Stable ICU 2.0.
last
public int last()
- Sets the current iteration position to the end of the text.
(i.e., the CharacterIterator's ending offset).
- Specified by:
last
in class BreakIterator
- Returns:
- The text's past-the-end offset.
- Status:
- Stable ICU 2.0.
next
public int next(int n)
- Advances the iterator either forward or backward the specified number of steps.
Negative values move backward, and positive values move forward. This is
equivalent to repeatedly calling next() or previous().
- Specified by:
next
in class BreakIterator
- Parameters:
n
- The number of steps to move. The sign indicates the direction
(negative is backwards, and positive is forwards).
- Returns:
- The character offset of the boundary position n boundaries away from
the current one.
- Status:
- Stable ICU 2.0.
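The equivalence with repeated next() or previous() calls can be sketched as follows. This assumes the JDK's java.text.BreakIterator, which declares the same next(int) method, as a stand-in for the ICU4J class; StepDemo is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.Locale;

class StepDemo {
    // Moves two boundaries forward from the start, then one boundary back.
    static int[] steps(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        it.first();
        int forwardTwo = it.next(2);  // like calling next() twice
        int backOne = it.next(-1);    // like calling previous() once
        return new int[] { forwardTwo, backOne };
    }

    public static void main(String[] args) {
        int[] s = steps("Hello world");
        System.out.println(s[0] + " " + s[1]);
    }
}
```

For "Hello world" the word boundaries are 0, 5, 6 and 11, so the two calls land on 6 and then 5.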
next
public int next()
- Advances the iterator to the next boundary position.
- Specified by:
next
in class BreakIterator
- Returns:
- The position of the first boundary after this one.
- Status:
- Stable ICU 2.0.
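A typical forward scan combines first() and next(). The sketch below assumes the JDK's java.text.BreakIterator as a stand-in (same method names and a DONE sentinel) so it runs without ICU4J; ForwardBoundaries is a hypothetical class name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ForwardBoundaries {
    // Collects every boundary offset, starting at first() and
    // stopping when next() reports DONE.
    static List<Integer> collect(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(collect("Hello world")); // prints [0, 5, 6, 11]
    }
}
```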
previous
public int previous()
- Advances the iterator backwards, to the last boundary preceding this one.
- Specified by:
previous
in class BreakIterator
- Returns:
- The position of the last boundary position preceding this one.
- Status:
- Stable ICU 2.0.
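Backward iteration mirrors the forward loop: start at last() and call previous() until it reports DONE. This sketch again assumes the JDK's java.text.BreakIterator as a stand-in; BackwardBoundaries is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BackwardBoundaries {
    // Collects boundary offsets from the end of the text back to the start.
    static List<Integer> collect(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.last(); pos != BreakIterator.DONE; pos = it.previous()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(collect("Hello world")); // prints [11, 6, 5, 0]
    }
}
```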
following
public int following(int offset)
- Sets the iterator to refer to the first boundary position following
the specified position.
- Specified by:
following
in class BreakIterator
- Parameters:
offset
- The position from which to begin searching for a break position.
- Returns:
- The position of the first break after the specified offset.
- Status:
- Stable ICU 2.0.
preceding
public int preceding(int offset)
- Sets the iterator to refer to the last boundary position before the
specified position.
- Overrides:
preceding
in class BreakIterator
- Parameters:
offset
- The position to begin searching for a break from.
- Returns:
- The position of the last boundary before the starting position.
- Status:
- Stable ICU 2.0.
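following() and preceding() are often used together to find the boundaries on either side of an arbitrary offset. A minimal sketch, assuming the JDK's java.text.BreakIterator as a stand-in for the ICU4J class; EnclosingSpan is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.Locale;

class EnclosingSpan {
    // Returns {last boundary before offset, first boundary after offset}.
    static int[] around(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        int before = it.preceding(offset); // strictly before the offset
        int after = it.following(offset);  // strictly after the offset
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] span = around("Hello world", 7);
        // Offset 7 falls inside "world", which spans [6, 11).
        System.out.println("Hello world".substring(span[0], span[1]));
    }
}
```

Both searches are strict: when the offset is itself a boundary, neither call returns it.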
isBoundary
public boolean isBoundary(int offset)
- Returns true if the specified position is a boundary position. As a side
effect, leaves the iterator pointing to the first boundary position at
or after "offset".
- Overrides:
isBoundary
in class BreakIterator
- Parameters:
offset
- the offset to check.
- Returns:
- True if "offset" is a boundary position.
- Status:
- Stable ICU 2.0.
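The side effect is easy to overlook, so here is a small sketch of it. It assumes the JDK's java.text.BreakIterator as a stand-in, which behaves the same way for an offset in the middle of a word; BoundaryCheck is a hypothetical name.

```java
import java.text.BreakIterator;
import java.util.Locale;

class BoundaryCheck {
    static boolean isBoundary(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        return it.isBoundary(offset);
    }

    // Shows where the iterator ends up after the check.
    static int positionAfterCheck(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        it.isBoundary(offset);   // result ignored; the call still moves the iterator
        return it.current();
    }

    public static void main(String[] args) {
        System.out.println(isBoundary("Hello world", 3));         // inside "Hello"
        System.out.println(positionAfterCheck("Hello world", 3)); // first boundary at or after 3
    }
}
```

Callers that need to keep their place should save current() before calling isBoundary() and restore it afterwards, for example with following() or preceding().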
current
public int current()
- Returns the current iteration position.
- Specified by:
current
in class BreakIterator
- Returns:
- The current iteration position.
- Status:
- Stable ICU 2.0.
getRuleStatus
public int getRuleStatus()
- Deprecated. This is a draft API and might change in a future release of ICU.
- Return the status tag from the break rule that determined the most recently
returned break position. The values appear in the rule source
within brackets, {123}, for example. For rules that do not specify a
status, a default value of 0 is returned. If more than one rule applies,
the numerically largest of the possible status values is returned.
The values used by the standard ICU break rules are defined as
constants in this class, and allow distinguishing between words
that contain alphabetic letters, "words" that appear to be numbers,
punctuation and spaces, words containing ideographic characters, and
more. Call getRuleStatus after obtaining a boundary position from
next(), previous(), or any other break iterator function that returns
a boundary position.
- Returns:
- the status from the break rule that determined the most recently
returned break position.
- Status:
- Draft ICU 3.0.
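The WORD_* constants come in lower-limit and upper-limit pairs, so a status value is normally classified by range rather than by equality. The sketch below is self-contained and does not call ICU4J. The numeric values (0, 100, 200, 300, 400, 500) and the half-open interpretation [WORD_X, WORD_X_LIMIT) are assumptions made for illustration, so check the Constant Field Values page before relying on them. WordTagKind and describe are hypothetical names.

```java
class WordTagKind {
    // Assumed values for illustration; the Constant Field Values page is authoritative.
    static final int WORD_NONE = 0,     WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300,   WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400,   WORD_IDEO_LIMIT = 500;

    // Maps a status value, such as one returned by getRuleStatus(), to a category name.
    static String describe(int status) {
        if (status >= WORD_NONE && status < WORD_NONE_LIMIT) return "none";
        if (status >= WORD_NUMBER && status < WORD_NUMBER_LIMIT) return "number";
        if (status >= WORD_LETTER && status < WORD_LETTER_LIMIT) return "letter";
        if (status >= WORD_KANA && status < WORD_KANA_LIMIT) return "kana";
        if (status >= WORD_IDEO && status < WORD_IDEO_LIMIT) return "ideographic";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describe(200)); // prints letter
    }
}
```

In real code the argument would be the value returned by getRuleStatus() immediately after a call such as next().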
getRuleStatusVec
public int getRuleStatusVec(int[] fillInArray)
- Deprecated. This is a draft API and might change in a future release of ICU.
- Get the status (tag) values from the break rule(s) that determined the most
recently returned break position. The values appear in the rule source
within brackets, {123}, for example. The default status value for rules
that do not explicitly provide one is zero.
The values used by the standard ICU rules are defined as constants in
this class.
If the size of the output array is insufficient to hold the data,
the output will be truncated to the available length. No exception
will be thrown.
- Parameters:
fillInArray
- an array to be filled in with the status values.
- Returns:
- The number of rule status values from rules that determined
the most recent boundary returned by the break iterator.
In the event that the array is too small, the return value
is the total number of status values that were available,
not the reduced number that were actually returned.
- Status:
- Draft ICU 3.0.
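Because the return value is the total number of available status values rather than the number copied, a caller can detect truncation and retry with a larger array. The sketch below illustrates that calling pattern only: fillStatusVec is a hypothetical stand-in that mimics the contract described above, not the ICU4J method itself, and StatusVecPattern is likewise a made-up name.

```java
import java.util.Arrays;

class StatusVecPattern {
    // Hypothetical stand-in for getRuleStatusVec: copies as many values as fit
    // and returns the total number available, without throwing on truncation.
    static int fillStatusVec(int[] available, int[] fillInArray) {
        int copied = Math.min(available.length, fillInArray.length);
        System.arraycopy(available, 0, fillInArray, 0, copied);
        return available.length;
    }

    // Caller pattern: try a small buffer first, grow it if the result was truncated.
    static int[] readAll(int[] available) {
        int[] buffer = new int[1];
        int total = fillStatusVec(available, buffer);
        if (total > buffer.length) {
            buffer = new int[total];
            total = fillStatusVec(available, buffer);
        }
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readAll(new int[] { 100, 200 })));
    }
}
```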
getText
public CharacterIterator getText()
- Return a CharacterIterator over the text being analyzed. This version
of this method returns the actual CharacterIterator we're using internally.
Changing the state of this iterator can have undefined consequences. If
you need to change it, clone it first.
- Specified by:
getText
in class BreakIterator
- Returns:
- An iterator over the text being analyzed.
- Status:
- Stable ICU 2.0.
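The "clone it first" advice can be sketched like this, assuming the JDK's java.text.BreakIterator as a stand-in for the ICU4J class; SafeTextAccess is a hypothetical name. The clone can be repositioned freely while the break iterator keeps its own position.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

class SafeTextAccess {
    // Moves a cloned CharacterIterator and reports the break iterator's position afterwards.
    static int currentAfterScanningCopy(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        it.first();
        it.next(); // now at the end of the first word

        // Work on a clone rather than on the iterator returned by getText().
        CharacterIterator copy = (CharacterIterator) it.getText().clone();
        copy.setIndex(copy.getEndIndex());

        return it.current(); // unaffected by moving the clone
    }

    public static void main(String[] args) {
        System.out.println(currentAfterScanningCopy("Hello world"));
    }
}
```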
setText
public void setText(CharacterIterator newText)
- Set the iterator to analyze a new piece of text. This function resets
the current iteration position to the beginning of the text.
- Specified by:
setText
in class BreakIterator
- Parameters:
newText
- An iterator over the text to analyze.
- Status:
- Stable ICU 2.0.
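One iterator can be reused across several texts, since setText() resets the position. A minimal sketch, assuming the JDK's java.text.BreakIterator and StringCharacterIterator as stand-ins; ReuseIterator is a hypothetical name.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.Locale;

class ReuseIterator {
    // Moves to the end of the first text, swaps in a second text,
    // and returns the position the iterator reports afterwards.
    static int positionAfterReset(String first, String second) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(new StringCharacterIterator(first));
        it.last();
        it.setText(new StringCharacterIterator(second));
        return it.current();
    }

    public static void main(String[] args) {
        System.out.println(positionAfterReset("Hello world", "Another text"));
    }
}
```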
Copyright (c) 2004 IBM Corporation and others.