com.ibm.icu.text
Class RuleBasedBreakIterator_Old.Builder

java.lang.Object
  extended by com.ibm.icu.text.RuleBasedBreakIterator_Old.Builder
Direct Known Subclasses:
DictionaryBasedBreakIterator.Builder
Enclosing class:
RuleBasedBreakIterator_Old

protected class RuleBasedBreakIterator_Old.Builder
extends Object

The Builder class has the job of constructing a RuleBasedBreakIterator_Old from a textual description. A Builder is constructed by RuleBasedBreakIterator_Old's constructor, which uses it to construct the iterator itself and then throws it away.

The construction logic is separated out into its own class for two primary reasons:

It would be preferable for Builder to be an independent class rather than an inner class, because that would shorten the source file considerably. However, making Builder an inner class of RuleBasedBreakIterator_Old gives it direct access to RuleBasedBreakIterator_Old's private members. That saves us from having to provide some kind of "back door" into RuleBasedBreakIterator_Old for the Builder, which other classes could then also use.
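The inner-class arrangement can be illustrated with a small sketch. OuterIterator, Builder.build, and the semicolon-separated rule format are hypothetical stand-ins, not the actual RuleBasedBreakIterator_Old members; the point is only that a non-static inner class can assign the enclosing instance's private fields without any public setter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outer class standing in for RuleBasedBreakIterator_Old.
// Field and method names are illustrative, not the real ICU members.
class OuterIterator {
    // Private state that only this class and its inner classes can touch.
    private final List<String> rules = new ArrayList<>();
    private boolean built;

    OuterIterator(String description) {
        // The constructor creates a Builder, uses it once, then drops it.
        new Builder().build(description);
    }

    int ruleCount() { return rules.size(); }
    boolean isBuilt() { return built; }

    // Non-static inner class: it can read and write OuterIterator's private
    // fields directly, so no public "back door" setter is needed.
    protected class Builder {
        // Temporary data that lives only as long as the Builder does.
        private final List<String> scratch = new ArrayList<>();

        void build(String description) {
            for (String rule : description.split(";")) {
                String trimmed = rule.trim();
                if (!trimmed.isEmpty()) {
                    scratch.add(trimmed);
                }
            }
            rules.addAll(scratch); // writes the enclosing instance's private field
            built = true;
        }
    }
}
```

Once the constructor returns, nothing references the Builder, so its scratch list becomes eligible for garbage collection, matching the "construct, use, throw away" lifecycle described for the real Builder.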


Field Summary
protected static int ALL_FLAGS
          A bit mask representing the union of END_STATE_FLAG, DONT_LOOP_FLAG, and LOOKAHEAD_STATE_FLAG.
protected  Vector categories
          A temporary holding place used for calculating the character categories.
protected  boolean clearLoopingStates
          A flag that is used to indicate when the list of looping states can be reset.
protected  Vector decisionPointList
          A list of all the states that have to be filled in with transitions to the next state that is created.
protected  Stack decisionPointStack
          A stack for holding decision point lists.
protected static int DONT_LOOP_FLAG
          A bit mask used to indicate a bit in the table's flags column that marks a state from which the builder should not add loops back to any looping states.
protected static int END_STATE_FLAG
          A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.
protected  Hashtable expressions
          A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time.
protected  UnicodeSet ignoreChars
          A temporary holding place for the list of ignore characters.
protected static int LOOKAHEAD_STATE_FLAG
          A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.
protected  Vector loopingStates
          A list of states that loop back on themselves.
protected  Vector mergeList
          A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination.
protected  Vector statesToBackfill
          Looping states actually have to be backfilled later in the process than everything else.
protected  Vector tempStateTable
          A temporary holding place where the forward state table is built.
 
Constructor Summary
RuleBasedBreakIterator_Old.Builder()
          No special construction is required for the Builder.
 
Method Summary
 void buildBreakIterator()
          This is the main function for setting up the BreakIterator's tables.
protected  void buildCharCategories(Vector tempRuleList)
          This function builds the character category table.
protected  void debugPrintTempStateTable()
           
protected  void debugPrintVector(String label, Vector v)
           
protected  void debugPrintVectorOfVectors(String label1, String label2, Vector v)
           
protected  void error(String message, int position, String context)
          Throws an IllegalArgumentException representing a syntax error in the rule description.
protected  void handleSpecialSubstitution(String replace, String replaceWith, int startPos, String description)
          This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions.
protected  void mungeExpressionList(Hashtable expressions)
           
protected  String processSubstitution(String substitutionRule, String description, int startPos)
          This function performs variable-name substitutions.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

categories

protected Vector categories
A temporary holding place used for calculating the character categories. This object contains UnicodeSet objects.


expressions

protected Hashtable expressions
A table used to map parts of regexp text to lists of character categories, rather than having to figure them out from scratch each time.
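The expressions table works as a memoization cache: the first lookup of a given piece of expression text computes its category list, and later lookups reuse the stored result. A minimal sketch of that idea follows. It assumes a plain "[chars]" expression syntax with no ranges or escapes and a precomputed char-to-category map (both hypothetical simplifications), and it uses a HashMap rather than the legacy Hashtable the field is declared as.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class ExpressionCache {
    // Hypothetical char -> category assignment produced by an earlier pass.
    private final Map<Character, Integer> charCategory;
    // Plays the role of the expressions table: expression text -> category list.
    private final Map<String, List<Integer>> expressions = new HashMap<>();
    int computeCount = 0; // how many times the slow path actually ran

    ExpressionCache(Map<Character, Integer> charCategory) {
        this.charCategory = charCategory;
    }

    List<Integer> categoriesFor(String bracketExpr) {
        // Compute once per distinct expression text, then reuse.
        return expressions.computeIfAbsent(bracketExpr, this::compute);
    }

    private List<Integer> compute(String bracketExpr) {
        computeCount++;
        // Assumes a plain "[chars]" expression with no ranges or escapes.
        String body = bracketExpr.substring(1, bracketExpr.length() - 1);
        TreeSet<Integer> cats = new TreeSet<>();
        for (char c : body.toCharArray()) {
            Integer cat = charCategory.get(c);
            if (cat != null) {
                cats.add(cat);
            }
        }
        return new ArrayList<>(cats);
    }
}
```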


ignoreChars

protected UnicodeSet ignoreChars
A temporary holding place for the list of ignore characters.


tempStateTable

protected Vector tempStateTable
A temporary holding place where the forward state table is built.


decisionPointList

protected Vector decisionPointList
A list of all the states that have to be filled in with transitions to the next state that is created. Used when building the state table from the regular expressions.


decisionPointStack

protected Stack decisionPointStack
A stack for holding decision point lists. This is used to handle nested parentheses and braces in regexps.
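To show how a stack of decision point lists can handle nesting, here is a toy construction over a hypothetical grammar of literal characters, parentheses, and | alternation inside parentheses. Each literal creates a new state, and every state in the current decision point list gets a transition to it. '(' saves the current list, '|' restarts from the saved list, and ')' makes the ends of all alternatives the new list. This illustrates the technique only; it is not the builder's actual parsing code, and the edge-string output format is invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DecisionPointSketch {
    // Result of the toy construction: edges as "from-char->to" strings,
    // plus the decision points still open at the end of the expression.
    static final class Result {
        final List<String> edges = new ArrayList<>();
        List<Integer> openStates = new ArrayList<>();
    }

    // One saved context per open parenthesis.
    private static final class Frame {
        final List<Integer> entry;                     // decision points when '(' was seen
        final List<Integer> exits = new ArrayList<>(); // ends of finished alternatives
        Frame(List<Integer> entry) { this.entry = entry; }
    }

    static Result build(String regex) {
        Result r = new Result();
        int stateCount = 1; // state 0 is the start state
        List<Integer> decisionPoints = new ArrayList<>();
        decisionPoints.add(0);
        Deque<Frame> stack = new ArrayDeque<>();

        for (char c : regex.toCharArray()) {
            if (c == '(') {
                stack.push(new Frame(new ArrayList<>(decisionPoints)));
            } else if (c == '|') {
                Frame f = stack.peek();
                if (f == null) throw new IllegalArgumentException("'|' outside parentheses");
                f.exits.addAll(decisionPoints);
                decisionPoints = new ArrayList<>(f.entry); // next alternative restarts here
            } else if (c == ')') {
                if (stack.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                Frame f = stack.pop();
                f.exits.addAll(decisionPoints);
                decisionPoints = new ArrayList<>(f.exits); // all alternatives converge
            } else {
                int next = stateCount++;
                for (int s : decisionPoints) {
                    r.edges.add(s + "-" + c + "->" + next); // fill in pending transitions
                }
                decisionPoints = new ArrayList<>();
                decisionPoints.add(next);
            }
        }
        if (!stack.isEmpty()) throw new IllegalArgumentException("unbalanced '('");
        r.openStates = decisionPoints;
        return r;
    }
}
```

For "(a|b)c", both branch states 1 and 2 are in the decision point list when ')' is reached, so both receive the transition on c to state 3.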


loopingStates

protected Vector loopingStates
A list of states that loop back on themselves. Used to handle the .*? construct.


statesToBackfill

protected Vector statesToBackfill
Looping states actually have to be backfilled later in the process than everything else. This is where the list of states to backfill is accumulated. This is also used to handle the .*? construct.


mergeList

protected Vector mergeList
A list mapping pairs of state numbers for states that are to be combined to the state number of the state representing their combination. Used in the process of making the state table deterministic to prevent infinite recursion.
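The role of mergeList can be illustrated by merging two states whose transitions lead back into the pair being merged. In the sketch below, the class and method names are hypothetical and states use single characters instead of category numbers. The combined state's number is recorded before recursing into the targets, so a cycle such as p -a-> p combined with q -a-> q resolves to the combined state instead of recursing forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeSketch {
    // Each state maps an input symbol (a char here) to a single target state.
    final List<Map<Character, Integer>> states = new ArrayList<>();
    // Plays the role of mergeList: "a,b" -> number of the combined state.
    final Map<String, Integer> mergeList = new HashMap<>();

    int addState() {
        states.add(new HashMap<>());
        return states.size() - 1;
    }

    // Assumes a and b are original (not combined) states.
    int merge(int a, int b) {
        if (a == b) return a;
        String key = Math.min(a, b) + "," + Math.max(a, b); // order-independent key
        Integer known = mergeList.get(key);
        if (known != null) return known; // already combined, or combination in progress
        int m = addState();
        mergeList.put(key, m); // record BEFORE recursing to break cycles
        Map<Character, Integer> result = new HashMap<>(states.get(a));
        for (Map.Entry<Character, Integer> e : states.get(b).entrySet()) {
            Integer other = result.get(e.getKey());
            // Where both states have a transition on the same symbol, merge the targets too.
            result.put(e.getKey(), other == null ? e.getValue() : merge(other, e.getValue()));
        }
        states.get(m).putAll(result);
        return m;
    }
}
```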


clearLoopingStates

protected boolean clearLoopingStates
A flag that is used to indicate when the list of looping states can be reset.


END_STATE_FLAG

protected static final int END_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as an accepting state.

See Also:
Constant Field Values

DONT_LOOP_FLAG

protected static final int DONT_LOOP_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state from which the builder should not add loops back to any looping states.

See Also:
Constant Field Values

LOOKAHEAD_STATE_FLAG

protected static final int LOOKAHEAD_STATE_FLAG
A bit mask used to indicate a bit in the table's flags column that marks a state as a lookahead state.

See Also:
Constant Field Values

ALL_FLAGS

protected static final int ALL_FLAGS
A bit mask representing the union of the mask values listed above (END_STATE_FLAG, DONT_LOOP_FLAG, and LOOKAHEAD_STATE_FLAG). Used for clearing or masking off the flag bits.

See Also:
Constant Field Values
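The flag constants follow the usual bit-mask pattern: test a flag with &, set it with |, and use ALL_FLAGS (or its complement) to separate the flag bits from the rest of a value. The values below are assumptions chosen for illustration; the real ones are on the Constant Field Values page. The helper method names are likewise hypothetical.

```java
class StateFlags {
    // Illustrative values only; the real constants may differ.
    static final int END_STATE_FLAG       = 0x8000;
    static final int DONT_LOOP_FLAG       = 0x4000;
    static final int LOOKAHEAD_STATE_FLAG = 0x2000;
    static final int ALL_FLAGS = END_STATE_FLAG | DONT_LOOP_FLAG | LOOKAHEAD_STATE_FLAG;

    // True if the accepting-state bit is set.
    static boolean isAccepting(int flags) { return (flags & END_STATE_FLAG) != 0; }

    // Returns flags with the given flag bit(s) turned on.
    static int withFlag(int flags, int flag) { return flags | flag; }

    // Clears every flag bit, leaving any other data in the value.
    static int clearFlags(int value) { return value & ~ALL_FLAGS; }

    // Masks off everything except the flag bits.
    static int flagsOnly(int value) { return value & ALL_FLAGS; }
}
```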
Constructor Detail

RuleBasedBreakIterator_Old.Builder

public RuleBasedBreakIterator_Old.Builder()
No special construction is required for the Builder.

Method Detail

buildBreakIterator

public void buildBreakIterator()
This is the main function for setting up the BreakIterator's tables. It delegates the different parts of the job to other functions.


processSubstitution

protected String processSubstitution(String substitutionRule,
                                     String description,
                                     int startPos)
This function performs variable-name substitutions. First it does syntax checking on the variable-name definition. If it's syntactically valid, it then goes through the remainder of the description and does a simple find-and-replace of the variable name with its text. (The variable text must be enclosed in either [] or () for this to work.)
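A simplified version of the check-then-replace behavior described above might look like this. The name=text rule shape and the $digit variable name are assumptions for illustration; only the requirement that the text be enclosed in [] or () and the plain find-and-replace over the rest of the description come from the documentation.

```java
class SubstitutionSketch {
    // Validates a hypothetical "name=text" rule, then replaces every
    // occurrence of name in the remainder of the description with text.
    static String substitute(String substitutionRule, String remainder) {
        int eq = substitutionRule.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("missing variable name or '=': " + substitutionRule);
        }
        String name = substitutionRule.substring(0, eq).trim();
        String text = substitutionRule.substring(eq + 1).trim();
        // Syntax check: the variable text must be enclosed in [] or ().
        boolean bracketed = text.length() >= 2
                && ((text.startsWith("[") && text.endsWith("]"))
                    || (text.startsWith("(") && text.endsWith(")")));
        if (name.isEmpty() || !bracketed) {
            throw new IllegalArgumentException(
                    "variable text must be enclosed in [] or (): " + substitutionRule);
        }
        // Simple literal find-and-replace, no regex interpretation.
        return remainder.replace(name, text);
    }
}
```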


handleSpecialSubstitution

protected void handleSpecialSubstitution(String replace,
                                         String replaceWith,
                                         int startPos,
                                         String description)
This function defines a protocol for handling substitution names that are "special," i.e., that have some property beyond just being substitutions. At the RuleBasedBreakIterator_Old level, we have one special substitution name, IGNORE_VAR. Subclasses can override this function to add more. Any special processing that has to go on beyond that which is done by the normal substitution-processing code is done here.
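The override protocol described above amounts to a template-method hook. In this sketch the signature is reduced to two parameters, the IGNORE_VAR spelling "_ignore_" is assumed, and DictionaryRuleBuilder with its "_dictionary_" name is a hypothetical stand-in for a subclass such as DictionaryBasedBreakIterator.Builder. The subclass calls super first so the base class's special name keeps working.

```java
import java.util.ArrayList;
import java.util.List;

class BaseRuleBuilder {
    // Assumed spelling of the special name; illustrative only.
    static final String IGNORE_VAR = "_ignore_";
    final List<String> ignoreSets = new ArrayList<>();

    // Hook called for each substitution; the base class knows one special name.
    protected void handleSpecialSubstitution(String replace, String replaceWith) {
        if (replace.equals(IGNORE_VAR)) {
            ignoreSets.add(replaceWith); // extra bookkeeping beyond plain substitution
        }
    }
}

class DictionaryRuleBuilder extends BaseRuleBuilder {
    static final String DICTIONARY_VAR = "_dictionary_"; // hypothetical second special name
    final List<String> dictionarySets = new ArrayList<>();

    @Override
    protected void handleSpecialSubstitution(String replace, String replaceWith) {
        super.handleSpecialSubstitution(replace, replaceWith); // keep base behavior
        if (replace.equals(DICTIONARY_VAR)) {
            dictionarySets.add(replaceWith);
        }
    }
}
```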


buildCharCategories

protected void buildCharCategories(Vector tempRuleList)
This function builds the character category table. On entry, tempRuleList is a vector of break rules that has had variable names substituted. On exit, the charCategoryTable data member has been initialized to hold the character category table, and tempRuleList's rules have been munged to contain character category numbers everywhere a literal character or a [] expression originally occurred.
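The central idea of a character category table is that two characters can share a category when every [] expression or literal in the rules either contains both of them or contains neither. One standard way to compute such a partition, sketched here with plain strings in place of UnicodeSet and with hypothetical names, is to give each character a membership signature (one bit per set) and number the distinct signatures. This is not necessarily how buildCharCategories itself is implemented.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class CategorySketch {
    // Maps each character to a category number; characters that belong to
    // exactly the same sets end up in the same category.
    static Map<Character, Integer> categorize(List<String> sets) {
        // Collect every character mentioned in any set, in sorted order.
        SortedSet<Character> all = new TreeSet<>();
        for (String s : sets) {
            for (char c : s.toCharArray()) {
                all.add(c);
            }
        }
        Map<BitSet, Integer> signatureToCategory = new HashMap<>();
        Map<Character, Integer> result = new TreeMap<>();
        for (char c : all) {
            // Bit i is set when character c is a member of sets.get(i).
            BitSet sig = new BitSet();
            for (int i = 0; i < sets.size(); i++) {
                if (sets.get(i).indexOf(c) >= 0) {
                    sig.set(i);
                }
            }
            Integer cat = signatureToCategory.get(sig);
            if (cat == null) {
                cat = signatureToCategory.size(); // next unused category number
                signatureToCategory.put(sig, cat);
            }
            result.put(c, cat);
        }
        return result;
    }
}
```

With the sets "abc" and "bcd", b and c always appear together, so they share category 1, while a and d each get a category of their own. A rule's [abc] could then be rewritten in terms of categories 0 and 1.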


mungeExpressionList

protected void mungeExpressionList(Hashtable expressions)

error

protected void error(String message,
                     int position,
                     String context)
Throws an IllegalArgumentException representing a syntax error in the rule description. The exception's message contains some debugging information.

Parameters:
message - A message describing the problem
position - The position in the description where the problem was discovered
context - The string containing the error
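A rough sketch of an error helper of this kind is shown below. The message layout and the <<HERE>> marker are invented for the example; the only behavior taken from the description is that a syntax error becomes an IllegalArgumentException whose message includes debugging information about where the problem was found.

```java
class RuleErrors {
    // Always throws; the message shows up to 10 characters on either side
    // of the offending position, with a marker at the position itself.
    static void error(String message, int position, String context) {
        int pos = Math.max(0, Math.min(position, context.length())); // clamp into range
        int start = Math.max(0, pos - 10);
        int end = Math.min(context.length(), pos + 10);
        String excerpt = context.substring(start, pos) + "<<HERE>>" + context.substring(pos, end);
        throw new IllegalArgumentException(
                message + " at position " + position + ": \"" + excerpt + "\"");
    }
}
```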

debugPrintVector

protected void debugPrintVector(String label,
                                Vector v)

debugPrintVectorOfVectors

protected void debugPrintVectorOfVectors(String label1,
                                         String label2,
                                         Vector v)

debugPrintTempStateTable

protected void debugPrintTempStateTable()


Copyright (c) 2004 IBM Corporation and others.