morfologik.fsa
Class CFSA2

java.lang.Object
  extended by morfologik.fsa.FSA
      extended by morfologik.fsa.CFSA2
All Implemented Interfaces:
java.lang.Iterable<java.nio.ByteBuffer>

public final class CFSA2
extends FSA

CFSA (Compact Finite State Automaton) binary format implementation, version 2:

The encoding of automaton body is as follows.

 ---- CFSA header
 Byte                            Description 
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | +------ '\'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     1 | | | | | | | | | +------ 'f'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     2 | | | | | | | | | +------ 's'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     3 | | | | | | | | | +------ 'a'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     4 | | | | | | | | | +------ version (fixed 0xc6)
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     5 | | | | | | | | | +----\
       +-+-+-+-+-+-+-+-+/      \ flags [MSB first]
       +-+-+-+-+-+-+-+-+\      /
     6 | | | | | | | | | +----/
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     7 | | | | | | | | | +------ label lookup table size
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
  8-32 | | | | | | | | | +------ label value lookup table 
       : : : : : : : : : |
       +-+-+-+-+-+-+-+-+/
 
 ---- Start of a node; only if automaton was compiled with NUMBERS option.
 
 Byte
        +-+-+-+-+-+-+-+-+\
      0 | | | | | | | | | \  
        +-+-+-+-+-+-+-+-+  +
      1 | | | | | | | | |  |      number of strings recognized
        +-+-+-+-+-+-+-+-+  +----- by the automaton starting
        : : : : : : : : :  |      from this node. v-coding
        +-+-+-+-+-+-+-+-+  +
        | | | | | | | | | /  
        +-+-+-+-+-+-+-+-+/

 ---- A vector of this node's arcs. An arc's layout depends on the combination of flags.
 
 1) NEXT bit set, mapped arc label. 
 
        +----------------------- node pointed to is next
        | +--------------------- the last arc of the node
        | | +------------------- this arc leads to a final state (acceptor)
        | | |  _______+--------- arc's label; indexed if M > 0, otherwise explicit label follows
        | | | / | | | |
       +-+-+-+-+-+-+-+-+\
     0 |N|L|F|M|M|M|M|M| +------ flags + (M) index of the mapped label.
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     1 | | | | | | | | | +------ optional label if M == 0
       +-+-+-+-+-+-+-+-+/
       : : : : : : : : :
       +-+-+-+-+-+-+-+-+\
       |A|A|A|A|A|A|A|A| +------ v-coded goto address
       +-+-+-+-+-+-+-+-+/
 


Field Summary
 byte[] arcs
          An array of bytes with the internal representation of the automaton.
static int BIT_FINAL_ARC
          The arc corresponds to the last character of a sequence available when building the automaton (acceptor transition).
static int BIT_LAST_ARC
          The arc is the last one from the current node's arcs list.
static int BIT_TARGET_NEXT
          The target node of this arc follows the last arc of the current state (no goto field).
 byte[] labelMapping
          Label mapping for M-indexed labels.
static byte VERSION
          Automaton header version value.
 
Constructor Summary
CFSA2(java.io.InputStream in)
          Reads an automaton from a byte stream.
 
Method Summary
 int getArc(int node, byte label)
          
 byte getArcLabel(int arc)
          Return the label associated with a given arc.
 int getEndNode(int arc)
          Return the end node pointed to by a given arc.
 int getFirstArc(int node)
          
 java.util.Set<FSAFlags> getFlags()
          Returns a set of flags for this FSA instance.
 int getNextArc(int arc)
          
 int getRightLanguageCount(int node)
          
 int getRootNode()
          
 boolean isArcFinal(int arc)
          Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
 boolean isArcLast(int arc)
          Returns true if this arc has NEXT bit set.
 boolean isArcTerminal(int arc)
          Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception).
 boolean isNextSet(int arc)
           
 
Methods inherited from class morfologik.fsa.FSA
getArcCount, getSequences, getSequences, iterator, read, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

VERSION

public static final byte VERSION
Automaton header version value.

See Also:
Constant Field Values

BIT_TARGET_NEXT

public static final int BIT_TARGET_NEXT
The target node of this arc follows the last arc of the current state (no goto field).

See Also:
Constant Field Values

BIT_LAST_ARC

public static final int BIT_LAST_ARC
The arc is the last one from the current node's arcs list.

See Also:
Constant Field Values

BIT_FINAL_ARC

public static final int BIT_FINAL_ARC
The arc corresponds to the last character of a sequence available when building the automaton (acceptor transition).

See Also:
Constant Field Values

arcs

public byte[] arcs
An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.


labelMapping

public final byte[] labelMapping
Label mapping for M-indexed labels.

Constructor Detail

CFSA2

public CFSA2(java.io.InputStream in)
      throws java.io.IOException
Reads an automaton from a byte stream.

Throws:
java.io.IOException
Method Detail

getRootNode

public int getRootNode()

Specified by:
getRootNode in class FSA
Returns:
Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).

getFirstArc

public final int getFirstArc(int node)

Specified by:
getFirstArc in class FSA
Returns:
Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.

getNextArc

public final int getNextArc(int arc)

Specified by:
getNextArc in class FSA
Returns:
Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.

getArc

public int getArc(int node,
                  byte label)

Specified by:
getArc in class FSA
Returns:
Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.

getEndNode

public int getEndNode(int arc)
Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.

Specified by:
getEndNode in class FSA

getArcLabel

public byte getArcLabel(int arc)
Return the label associated with a given arc.

Specified by:
getArcLabel in class FSA

getRightLanguageCount

public int getRightLanguageCount(int node)

Overrides:
getRightLanguageCount in class FSA
Returns:
Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. The size of the right language of the state, in other words.

isArcFinal

public boolean isArcFinal(int arc)
Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.

Specified by:
isArcFinal in class FSA

isArcTerminal

public boolean isArcTerminal(int arc)
Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).

Specified by:
isArcTerminal in class FSA

isArcLast

public boolean isArcLast(int arc)
Returns true if this arc has NEXT bit set.

See Also:
BIT_LAST_ARC

isNextSet

public boolean isNextSet(int arc)
See Also:
BIT_TARGET_NEXT

getFlags

public java.util.Set<FSAFlags> getFlags()
Returns a set of flags for this FSA instance.

Specified by:
getFlags in class FSA