morfologik.fsa
Class FSA

java.lang.Object
  extended by morfologik.fsa.FSA
All Implemented Interfaces:
java.lang.Iterable<java.nio.ByteBuffer>
Direct Known Subclasses:
CFSA, CFSA2, ConstantArcSizeFSA, FSA5

public abstract class FSA
extends java.lang.Object
implements java.lang.Iterable<java.nio.ByteBuffer>

This is the top-level abstract class for handling finite state automata. These automata are arc-based, a design described in Jan Daciuk's Incremental Construction of Finite-State Automata and Transducers, and Their Use in the Natural Language Processing (PhD thesis, Technical University of Gdansk).

Concrete subclasses (implementations) offer different tradeoffs and features, for example traversal speed vs. memory size.

See Also:
FSABuilder

Constructor Summary
FSA()
Method Summary
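abstract  int getArc(int node, byte label)
          Returns the identifier of an arc leaving node and labeled with label, or 0 if there is no such arc.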
 int getArcCount(int node)
          Calculates the number of arcs of a given node.
abstract  byte getArcLabel(int arc)
          Returns the label associated with a given arc.
abstract  int getEndNode(int arc)
          Returns the end node pointed to by a given arc.
abstract  int getFirstArc(int node)
          Returns the identifier of the first arc leaving node, or 0 if the node has no outgoing arcs.
abstract  java.util.Set<FSAFlags> getFlags()
          Returns a set of flags for this FSA instance.
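abstract  int getNextArc(int arc)
          Returns the identifier of the next arc leaving the same node as arc, or 0 if there are no more arcs.
 int getRightLanguageCount(int node)
          Returns the number of sequences reachable from the given state (requires FSAFlags.NUMBERS).
abstract  int getRootNode()
          Returns the identifier of the root node of this automaton.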
 java.lang.Iterable<java.nio.ByteBuffer> getSequences()
          An alias of calling iterator() directly (FSA is also Iterable).
 java.lang.Iterable<java.nio.ByteBuffer> getSequences(int node)
          Returns an Iterable over all binary sequences starting at the given FSA state (node) and ending in final nodes.
abstract  boolean isArcFinal(int arc)
          Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
abstract  boolean isArcTerminal(int arc)
          Returns true if this arc does not have a terminating node (getEndNode(int) will throw an exception).
 java.util.Iterator<java.nio.ByteBuffer> iterator()
          Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes.
static <T extends FSA> T read(java.io.InputStream in)
          A factory for reading automata in any of the supported versions.
 <T extends StateVisitor> T visitAllStates(T v)
          Visit all states.
 <T extends StateVisitor> T visitInPostOrder(T v)
          Same as visitInPostOrder(StateVisitor, int), starting from the root node of the automaton.
 <T extends StateVisitor> T visitInPostOrder(T v, int node)
          Visits all states reachable from node in postorder.
 <T extends StateVisitor> T visitInPreOrder(T v)
          Same as visitInPreOrder(StateVisitor, int), starting from the root node of the automaton.
 <T extends StateVisitor> T visitInPreOrder(T v, int node)
          Visits all states in preorder.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

FSA

public FSA()
Method Detail

getRootNode

public abstract int getRootNode()
Returns:
Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).

getFirstArc

public abstract int getFirstArc(int node)
Returns:
Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.

getNextArc

public abstract int getNextArc(int arc)
Returns:
Returns the identifier of the next arc leaving the same node as arc. Zero is returned if no more arcs are available for that node.

getArc

public abstract int getArc(int node,
                           byte label)
Returns:
Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.

getArcLabel

public abstract byte getArcLabel(int arc)
Returns the label associated with a given arc.


isArcFinal

public abstract boolean isArcFinal(int arc)
Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.


isArcTerminal

public abstract boolean isArcTerminal(int arc)
Returns true if this arc does not have a terminating node (getEndNode(int) will throw an exception). Implies isArcFinal(int).


getEndNode

public abstract int getEndNode(int arc)
Returns the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation; calling this method on them throws a runtime exception.
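The arc accessors above are enough to test whether a byte sequence is stored in the automaton: follow getArc from the root, and when the input is exhausted ask whether the last arc is final. The sketch below is not written against morfologik; it uses a hypothetical array-backed class named Toy that only mirrors the method names of FSA, with arcs and nodes numbered from 1 and 0 meaning "none". Finality lives on arcs (as in this arc-based design), and a terminal arc stops the walk because it has no end node.

```java
import java.nio.charset.StandardCharsets;

final class ArcLookupSketch {
    /** Hypothetical toy automaton storing {"ab", "ac", "b"}; it only mirrors FSA's method names. */
    static final class Toy {
        private final byte[] label = {0, 'a', 'b', 'b', 'c'};            // indexed by arc id (0 unused)
        private final int[] target = {0, 2, -1, -1, -1};                 // -1: terminal arc, no end node
        private final boolean[] fin = {false, false, true, true, true};  // arc closes a stored sequence
        private final int[] next = {0, 2, 0, 4, 0};                      // next arc leaving the same node
        private final int[] first = {0, 1, 3};                           // first arc per node id

        int getRootNode() { return 1; }
        int getFirstArc(int node) { return first[node]; }
        int getNextArc(int arc) { return next[arc]; }
        byte getArcLabel(int arc) { return label[arc]; }
        boolean isArcFinal(int arc) { return fin[arc]; }
        boolean isArcTerminal(int arc) { return target[arc] == -1; }

        int getEndNode(int arc) {
            if (isArcTerminal(arc)) throw new IllegalStateException("terminal arc has no end node");
            return target[arc];
        }

        /** Linear scan over the node's arcs; returns 0 if no arc carries the label. */
        int getArc(int node, byte lbl) {
            for (int a = getFirstArc(node); a != 0; a = getNextArc(a)) {
                if (getArcLabel(a) == lbl) return a;
            }
            return 0;
        }
    }

    static boolean contains(Toy fsa, byte[] seq) {
        int node = fsa.getRootNode();
        if (node == 0 || seq.length == 0) return false;          // empty automaton, or empty input (toy choice)
        for (int i = 0; i < seq.length; i++) {
            int arc = fsa.getArc(node, seq[i]);
            if (arc == 0) return false;                          // no arc with this label
            if (i == seq.length - 1) return fsa.isArcFinal(arc); // input consumed: stored only if arc is final
            if (fsa.isArcTerminal(arc)) return false;            // input left over, but nowhere to go
            node = fsa.getEndNode(arc);
        }
        return false;
    }

    static boolean contains(Toy fsa, String s) {
        return contains(fsa, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Toy fsa = new Toy();
        for (String s : new String[] {"ab", "ac", "b", "a", "abc"}) {
            System.out.println(s + " -> " + contains(fsa, s));
        }
    }
}
```

Note that "a" is rejected even though an arc labeled 'a' exists: it is only a prefix, so its arc is not final.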


getFlags

public abstract java.util.Set<FSAFlags> getFlags()
Returns a set of flags for this FSA instance.


getArcCount

public int getArcCount(int node)
Calculates the number of arcs of a given node. Unless the count itself is really needed, use the following idiom for looping through all arcs instead:
 for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
     // process arc here
 }
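To make the cost concrete, the sketch below counts arcs with the loop idiom and, separately, processes each arc in the same single pass. Whether getArcCount(int) is itself implemented this way is not stated here; the point is only that a count requires a full pass over the node's arcs. Toy is a hypothetical array-backed stand-in exposing just getFirstArc, getNextArc and getArcLabel, not a morfologik class.

```java
final class ArcCountSketch {
    /** Hypothetical toy automaton: node 1 has arcs 'a', 'b'; node 2 has arcs 'b', 'c'. 0 means "no arc". */
    static final class Toy {
        private final byte[] label = {0, 'a', 'b', 'b', 'c'}; // indexed by arc id (0 unused)
        private final int[] next = {0, 2, 0, 4, 0};           // next arc leaving the same node
        private final int[] first = {0, 1, 3};                // first arc per node id

        int getFirstArc(int node) { return first[node]; }
        int getNextArc(int arc) { return next[arc]; }
        byte getArcLabel(int arc) { return label[arc]; }
    }

    /** Counting requires walking every arc of the node once. */
    static int arcCount(Toy fsa, int node) {
        int count = 0;
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
            count++;
        }
        return count;
    }

    /** The idiom used directly: each arc is processed as it is reached, no count needed. */
    static String arcLabels(Toy fsa, int node) {
        StringBuilder sb = new StringBuilder();
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
            sb.append((char) fsa.getArcLabel(arc));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Toy fsa = new Toy();
        System.out.println(arcCount(fsa, 1) + " arcs: " + arcLabels(fsa, 1)); // 2 arcs: ab
    }
}
```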
 


getRightLanguageCount

public int getRightLanguageCount(int node)
Returns:
Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. In other words, this is the size of the right language of the state.
Throws:
java.lang.UnsupportedOperationException - If the automaton was not compiled with FSAFlags.NUMBERS. The value can then be computed by manually counting the sequences returned by getSequences(int).
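When the NUMBERS flag is absent, the right-language size can also be computed without materializing sequences: each arc contributes 1 if it is final, plus the count of its end node if it is not terminal. The sketch below memoizes per node, since a minimal automaton shares sub-states between paths. Toy is a hypothetical array-backed stand-in mirroring FSA's accessor names, not a morfologik class, and this is not claimed to be how getRightLanguageCount(int) computes its value.

```java
import java.util.HashMap;
import java.util.Map;

final class RightLanguageSketch {
    /** Hypothetical toy automaton storing {"ab", "ac", "b"}; arcs and nodes numbered from 1, 0 means "none". */
    static final class Toy {
        private final int[] target = {0, 2, -1, -1, -1};                // per arc id; -1: terminal arc
        private final boolean[] fin = {false, false, true, true, true}; // per arc id
        private final int[] next = {0, 2, 0, 4, 0};                     // next arc leaving the same node
        private final int[] first = {0, 1, 3};                          // first arc per node id

        int getRootNode() { return 1; }
        int getFirstArc(int node) { return first[node]; }
        int getNextArc(int arc) { return next[arc]; }
        boolean isArcFinal(int arc) { return fin[arc]; }
        boolean isArcTerminal(int arc) { return target[arc] == -1; }
        int getEndNode(int arc) { return target[arc]; }
    }

    /** Number of sequences accepted starting from node; memo caches shared sub-states. */
    static int rightLanguageCount(Toy fsa, int node, Map<Integer, Integer> memo) {
        Integer cached = memo.get(node);
        if (cached != null) return cached;
        int count = 0;
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
            if (fsa.isArcFinal(arc)) count++; // a stored sequence ends right after this arc
            if (!fsa.isArcTerminal(arc)) {
                count += rightLanguageCount(fsa, fsa.getEndNode(arc), memo); // longer sequences
            }
        }
        memo.put(node, count);
        return count;
    }

    static int rightLanguageCount(Toy fsa, int node) {
        return rightLanguageCount(fsa, node, new HashMap<Integer, Integer>());
    }

    public static void main(String[] args) {
        Toy fsa = new Toy();
        System.out.println(rightLanguageCount(fsa, fsa.getRootNode())); // 3
    }
}
```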

getSequences

public java.lang.Iterable<java.nio.ByteBuffer> getSequences(int node)
Returns an Iterable over all binary sequences starting at the given FSA state (node) and ending in final nodes. This corresponds to the set of suffixes that follow the prefix leading to node, across all sequences stored in the automaton.

The iterator returns a ByteBuffer whose contents change on each call to Iterator.next(). To keep the contents between calls to Iterator.next(), one must copy the buffer to some other location.

Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.
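The reuse rule above is easy to violate by collecting the returned buffers into a list. The sketch below uses only the JDK: reusingIterator is a hypothetical stand-in that refills one shared ByteBuffer on every next(), the way the documentation describes. Because the buffer is array-backed with its content starting at index 0, copying the range between position and limit of array() yields the current sequence; keeping the references instead leaves a list whose elements are all the same object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ReusedBufferSketch {
    /** Hypothetical stand-in for the sequence iterator: one shared buffer, refilled on each next(). */
    static Iterator<ByteBuffer> reusingIterator(List<String> words) {
        final ByteBuffer shared = ByteBuffer.allocate(64);
        final Iterator<String> it = words.iterator();
        return new Iterator<ByteBuffer>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public ByteBuffer next() {
                shared.clear();
                shared.put(it.next().getBytes(StandardCharsets.UTF_8));
                shared.flip(); // position 0, limit = length: content starts at array index 0
                return shared;
            }
        };
    }

    /** Copies the current contents; relies on the buffer being array-backed with content at index 0. */
    static byte[] copyOf(ByteBuffer bb) {
        return Arrays.copyOfRange(bb.array(), bb.position(), bb.limit());
    }

    /** Correct: copy before the next call to next() overwrites the buffer. */
    static List<String> collectCopies(Iterator<ByteBuffer> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) out.add(new String(copyOf(it.next()), StandardCharsets.UTF_8));
        return out;
    }

    /** Anti-pattern: every element is the same buffer, holding only the last sequence. */
    static List<ByteBuffer> collectReferences(Iterator<ByteBuffer> it) {
        List<ByteBuffer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "ac", "b");
        System.out.println(collectCopies(reusingIterator(words))); // [ab, ac, b]
    }
}
```

For a buffer without the index-0 guarantee, allocating a byte[] of size remaining() and filling it through bb.duplicate().get(...) is the general JDK alternative.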

See Also:
Iterable

getSequences

public final java.lang.Iterable<java.nio.ByteBuffer> getSequences()
An alias of calling iterator() directly (FSA is also Iterable).


iterator

public final java.util.Iterator<java.nio.ByteBuffer> iterator()
Returns an iterator over all binary sequences starting from the initial FSA state (node) and ending in final nodes. The iterator returns a ByteBuffer whose contents change on each call to Iterator.next(). To keep the contents between calls to Iterator.next(), one must copy the buffer to some other location.

Important. It is guaranteed that the returned byte buffer is backed by a byte array and that the content of the byte buffer starts at the array's index 0.

Specified by:
iterator in interface java.lang.Iterable<java.nio.ByteBuffer>
See Also:
Iterable

visitAllStates

public <T extends StateVisitor> T visitAllStates(T v)
Visit all states. The order of visiting is undefined. This method may be faster than traversing the automaton in postorder or preorder since it can scan states linearly. Returning false from StateVisitor.accept(int) immediately terminates the traversal.


visitInPostOrder

public <T extends StateVisitor> T visitInPostOrder(T v)
Same as visitInPostOrder(StateVisitor, int), starting from the root node of the automaton.


visitInPostOrder

public <T extends StateVisitor> T visitInPostOrder(T v,
                                                   int node)
Visits all states reachable from node in postorder. Returning false from StateVisitor.accept(int) immediately terminates the traversal.


visitInPreOrder

public <T extends StateVisitor> T visitInPreOrder(T v)
Same as visitInPreOrder(StateVisitor, int), starting from the root node of the automaton.


visitInPreOrder

public <T extends StateVisitor> T visitInPreOrder(T v,
                                                  int node)
Visits all states in preorder. Returning false from StateVisitor.accept(int) skips traversal of all sub-states of a given state.
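The two orders differ in what a false return from accept means: in postorder it stops the whole traversal at once, in preorder it only prunes the sub-states of that state. The sketch below shows both over a hypothetical array-backed Toy automaton; the nested StateVisitor interface is a stand-in with the same accept(int) shape as the documented one, not the morfologik type, and the visited set (so shared sub-states are reported once) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TraversalSketch {
    /** Hypothetical stand-in for StateVisitor. */
    interface StateVisitor {
        boolean accept(int state);
    }

    /** Hypothetical toy automaton for {"ab", "ac", "b"}: node 1 (root) -a-> node 2; other arcs are terminal. */
    static final class Toy {
        private final int[] target = {0, 2, -1, -1, -1}; // per arc id; -1: terminal arc
        private final int[] next = {0, 2, 0, 4, 0};      // next arc leaving the same node
        private final int[] first = {0, 1, 3};           // first arc per node id

        int getRootNode() { return 1; }
        int getFirstArc(int node) { return first[node]; }
        int getNextArc(int arc) { return next[arc]; }
        boolean isArcTerminal(int arc) { return target[arc] == -1; }
        int getEndNode(int arc) { return target[arc]; } // only called on non-terminal arcs here
    }

    static <T extends StateVisitor> T visitInPostOrder(Toy fsa, T v, int node) {
        postOrder(fsa, v, node, new HashSet<Integer>());
        return v;
    }

    /** Returns false as soon as the visitor rejects a state, so every caller unwinds immediately. */
    private static boolean postOrder(Toy fsa, StateVisitor v, int node, Set<Integer> visited) {
        if (!visited.add(node)) return true; // shared sub-state already handled
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
            if (!fsa.isArcTerminal(arc) && !postOrder(fsa, v, fsa.getEndNode(arc), visited)) {
                return false;
            }
        }
        return v.accept(node); // sub-states first, then the state itself
    }

    static <T extends StateVisitor> T visitInPreOrder(Toy fsa, T v, int node) {
        preOrder(fsa, v, node, new HashSet<Integer>());
        return v;
    }

    private static void preOrder(Toy fsa, StateVisitor v, int node, Set<Integer> visited) {
        if (!visited.add(node)) return;
        if (!v.accept(node)) return; // false: skip this state's sub-states only
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
            if (!fsa.isArcTerminal(arc)) preOrder(fsa, v, fsa.getEndNode(arc), visited);
        }
    }

    /** Records every state it sees; rejects the state equal to stopAt (0: accept everything). */
    static final class Recorder implements StateVisitor {
        final List<Integer> seen = new ArrayList<>();
        private final int stopAt;
        Recorder(int stopAt) { this.stopAt = stopAt; }
        @Override public boolean accept(int state) { seen.add(state); return state != stopAt; }
    }

    public static void main(String[] args) {
        Toy fsa = new Toy();
        System.out.println(visitInPostOrder(fsa, new Recorder(0), fsa.getRootNode()).seen); // [2, 1]
        System.out.println(visitInPreOrder(fsa, new Recorder(0), fsa.getRootNode()).seen);  // [1, 2]
    }
}
```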


read

public static <T extends FSA> T read(java.io.InputStream in)
                          throws java.io.IOException
A factory for reading automata in any of the supported versions. If possible, explicit constructors should be used.

Throws:
java.io.IOException
See Also:
FSA5.FSA5(InputStream)