morfologik.fsa
Class FSA5

java.lang.Object
  extended by morfologik.fsa.FSA
      extended by morfologik.fsa.FSA5
All Implemented Interfaces:
java.lang.Iterable<java.nio.ByteBuffer>

public final class FSA5
extends FSA

FSA binary format implementation for version 5.

Version 5 indicates the dictionary was built with these flags: FSAFlags.FLEXIBLE, FSAFlags.STOPBIT and FSAFlags.NEXTBIT. The internal representation of the FSA must therefore follow this description (please note this format describes only a single transition (arc), not the entire dictionary file).

 ---- this node header present only if automaton was compiled with NUMBERS option.
 Byte
        +-+-+-+-+-+-+-+-+\
      0 | | | | | | | | | \  LSB
        +-+-+-+-+-+-+-+-+  +
      1 | | | | | | | | |  |      number of strings recognized
        +-+-+-+-+-+-+-+-+  +----- by the automaton starting
        : : : : : : : : :  |      from this node.
        +-+-+-+-+-+-+-+-+  +
  ctl-1 | | | | | | | | | /  MSB
        +-+-+-+-+-+-+-+-+/
        
 ---- remaining part of the node
 
 Byte
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | +------ label
       +-+-+-+-+-+-+-+-+/
 
                  +------------- node pointed to is next
                  | +----------- the last arc of the node
                  | | +--------- the arc is final
                  | | |
             +-----------+
             |    | | |  |
         ___+___  | | |  |
        /       \ | | |  |
       MSB           LSB |
        7 6 5 4 3 2 1 0  |
       +-+-+-+-+-+-+-+-+ |
     1 | | | | | | | | | \ \
       +-+-+-+-+-+-+-+-+  \ \  LSB
       +-+-+-+-+-+-+-+-+     +
     2 | | | | | | | | |     |
       +-+-+-+-+-+-+-+-+     |
     3 | | | | | | | | |     +----- target node address (in bytes)
       +-+-+-+-+-+-+-+-+     |      (not present except for the byte
       : : : : : : : : :     |       with flags if the node pointed to
       +-+-+-+-+-+-+-+-+     +       is next)
   gtl | | | | | | | | |    /  MSB
       +-+-+-+-+-+-+-+-+   /
 gtl+1                           (gtl = gotoLength)
 


Field Summary
static int ADDRESS_OFFSET
          An offset in the arc structure, where the address and flags field begins.
 byte annotation
          Annotation character.
 byte[] arcs
          An array of bytes with the internal representation of the automaton.
static int BIT_FINAL_ARC
          Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
static int BIT_LAST_ARC
          Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
static int BIT_TARGET_NEXT
          Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
static byte DEFAULT_ANNOTATION
          Default annotation byte.
static byte DEFAULT_FILLER
          Default filler byte.
 byte filler
          Filler character.
 int gtl
          Number of bytes each address takes in full, expanded form (goto length).
 int nodeDataLength
          The length of the node header structure (if the automaton was compiled with NUMBERS option).
static byte VERSION
          Automaton version as in the file header.
 
Constructor Summary
FSA5(java.io.InputStream fsaStream)
          Read and wrap a binary automaton in FSA version 5.
 
Method Summary
 int getArc(int node, byte label)
          
 byte getArcLabel(int arc)
          Return the label associated with a given arc.
 int getEndNode(int arc)
          Return the end node pointed to by a given arc.
 int getFirstArc(int node)
          
 java.util.Set<FSAFlags> getFlags()
          Returns a set of flags for this FSA instance.
 int getNextArc(int arc)
          
 int getRightLanguageCount(int node)
          Returns the number encoded at the given node.
 int getRootNode()
          Returns the start node of this automaton.
 boolean isArcFinal(int arc)
          Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
 boolean isArcLast(int arc)
          Returns true if this arc has LAST bit set.
 boolean isArcTerminal(int arc)
          Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception).
 boolean isNextSet(int arc)
           
 
Methods inherited from class morfologik.fsa.FSA
getArcCount, getSequences, getSequences, iterator, read, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_FILLER

public static final byte DEFAULT_FILLER
Default filler byte.

See Also:
Constant Field Values

DEFAULT_ANNOTATION

public static final byte DEFAULT_ANNOTATION
Default annotation byte.

See Also:
Constant Field Values

VERSION

public static final byte VERSION
Automaton version as in the file header.

See Also:
Constant Field Values

BIT_FINAL_ARC

public static final int BIT_FINAL_ARC
Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.

See Also:
Constant Field Values

BIT_LAST_ARC

public static final int BIT_LAST_ARC
Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.

See Also:
Constant Field Values

BIT_TARGET_NEXT

public static final int BIT_TARGET_NEXT
Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).

See Also:
Constant Field Values

ADDRESS_OFFSET

public static final int ADDRESS_OFFSET
An offset in the arc structure, where the address and flags field begins. In version 5 of FSA automata, this value is constant (1, skip label).

See Also:
Constant Field Values

arcs

public final byte[] arcs
An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.


nodeDataLength

public final int nodeDataLength
The length of the node header structure (if the automaton was compiled with NUMBERS option). Otherwise zero.


gtl

public final int gtl
Number of bytes each address takes in full, expanded form (goto length).


filler

public final byte filler
Filler character.


annotation

public final byte annotation
Annotation character.

Constructor Detail

FSA5

public FSA5(java.io.InputStream fsaStream)
     throws java.io.IOException
Read and wrap a binary automaton in FSA version 5.

Throws:
java.io.IOException
Method Detail

getRootNode

public int getRootNode()
Returns the start node of this automaton.

Specified by:
getRootNode in class FSA
Returns:
Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).

getFirstArc

public final int getFirstArc(int node)

Specified by:
getFirstArc in class FSA
Returns:
Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.

getNextArc

public final int getNextArc(int arc)

Specified by:
getNextArc in class FSA
Returns:
Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.

getArc

public int getArc(int node,
                  byte label)

Specified by:
getArc in class FSA
Returns:
Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.

getEndNode

public int getEndNode(int arc)
Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.

Specified by:
getEndNode in class FSA

getArcLabel

public byte getArcLabel(int arc)
Return the label associated with a given arc.

Specified by:
getArcLabel in class FSA

isArcFinal

public boolean isArcFinal(int arc)
Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.

Specified by:
isArcFinal in class FSA

isArcTerminal

public boolean isArcTerminal(int arc)
Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).

Specified by:
isArcTerminal in class FSA

getRightLanguageCount

public int getRightLanguageCount(int node)
Returns the number encoded at the given node. The number equals the count of the set of suffixes reachable from node (called its right language).

Overrides:
getRightLanguageCount in class FSA
Returns:
Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. The size of the right language of the state, in other words.

getFlags

public java.util.Set<FSAFlags> getFlags()
Returns a set of flags for this FSA instance.

For this automaton version, an additional FSAFlags.NUMBERS flag may be set to indicate the automaton contains extra fields for each node.

Specified by:
getFlags in class FSA

isArcLast

public boolean isArcLast(int arc)
Returns true if this arc has LAST bit set.

See Also:
BIT_LAST_ARC

isNextSet

public boolean isNextSet(int arc)
See Also:
BIT_TARGET_NEXT