This is documentation for calculating the highest scoring hypothesis in a lattice given a language model.
All of the tools are located in the directory
|
|
This section describes the format of lattices. Lattices for multiple utterances can be stored in the same file.
At the beginning of the lattice for each utterance, include the line
FSM-ID: <unique-ID>
The ID is arbitrary; it is used to help out later processing. At the end of the lattice for each utterance, put the line
END
To list an alternative for the nth word in a sentence, use the line:
<n-1> <n> <raw-word> [<expanded-word>] [<cost>] [<tag>]
The fields must be separated by exactly one tab. Multiple words in <expanded-word> should be separated by spaces. The last three fields are optional. If <expanded-word> is missing or empty, it is assumed to be the same as <raw-word>. To specify an empty <expanded-word>, use the token <sil>. <cost> should be a log probability, base 10. <tag> is our internal tagging, e.g., ASWD.
Alternatives must be listed for words in the order they occur in the sentence, i.e., alternatives for the (n+1)st word must follow those of the nth word.
The following is a sample file (/home/ws99/sfc/pub/data/h034b/h034b.alt):
FSM-ID: a034c1
0	1	NATO	NATO	-0.1	ASWD
0	1	NATO	N. A. T. O.	-0.4	LSEQ
1	2	LIVES	LIVES	0	ASWD
2	3	###	<sil>	0	SLNT
3	4	ON
4	5	AND
5	6	ON
END
FSM-ID: a034c2
0	1	NATO	NATO	-0.1	ASWD
0	1	NATO	N. A. T. O.	-0.4	LSEQ
1	2	LIVES	LIVES	0	ASWD
2	3	###	<sil>	0	SLNT
3	4	###	<sil>
END
Another sample file can be found in
|
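The format described above can be read with a short script. The following is a minimal sketch in Python, not one of the tools documented here; the names Arc and parse_lattices are hypothetical, and since the text does not say what a missing <cost> means, the sketch simply records it as None.

```python
from typing import Dict, List, NamedTuple, Optional


class Arc(NamedTuple):
    start: int               # <n-1>
    end: int                 # <n>
    raw: str                 # <raw-word>
    expanded: List[str]      # words of <expanded-word>; empty list for <sil>
    cost: Optional[float]    # log10 probability; None if the field is absent
    tag: Optional[str]       # internal tag such as ASWD; None if absent


def parse_lattices(text: str) -> Dict[str, List[Arc]]:
    """Parse the contents of a lattice file into {utterance ID: [Arc, ...]}."""
    lattices: Dict[str, List[Arc]] = {}
    current: Optional[List[Arc]] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("FSM-ID:"):
            current = []
            lattices[line[len("FSM-ID:"):].strip()] = current
            continue
        if line.strip() == "END":
            current = None
            continue
        if current is None:
            raise ValueError(f"alternative outside FSM-ID/END block: {line!r}")
        fields = line.split("\t")  # fields are separated by exactly one tab
        if len(fields) < 3:
            raise ValueError(f"expected at least three fields: {line!r}")
        start, end, raw = int(fields[0]), int(fields[1]), fields[2]
        # A missing or empty <expanded-word> defaults to <raw-word>.
        expanded_field = fields[3] if len(fields) > 3 and fields[3] else raw
        expanded = [] if expanded_field == "<sil>" else expanded_field.split(" ")
        cost = float(fields[4]) if len(fields) > 4 and fields[4] else None
        tag = fields[5] if len(fields) > 5 and fields[5] else None
        # Alternatives for a later word must not precede those of an earlier one.
        if current and start < current[-1].start:
            raise ValueError(f"alternatives out of order: {line!r}")
        current.append(Arc(start, end, raw, expanded, cost, tag))
    return lattices
```

Splitting on a single tab rather than on arbitrary whitespace is what keeps a multi-word <expanded-word> such as N. A. T. O. together as one field.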
To calculate the best hypothesis in a lattice given a language model, first the lattice file must be converted into a format that the lattice rescorer tool can read. Use the command:
e034c.pl -aux <aux-file> <in-lattice> > <out-lattice>
The file <aux-file> is created to help align the best hypothesis with the tokens in the original raw text later on. For example, the command
e034c.pl -aux h034b.aux e034e.alt > h034b.stxt
creates the auxiliary file h034b.aux and the converted lattice h034b.stxt.
To calculate the best hypothesis in a lattice given a language model, use the command
Lattice -lattype fsm-list -lmfile <LM-file> -unkpen 1e-10 -langwgt 1 \
-outtype nbest -outfile <out-file> -1best.3 <lattice-file>
For example, the command
Lattice -lattype fsm-list -lmfile f034c.dmp -unkpen 1e-10 -langwgt 1 \
-outtype nbest -outfile h034b.nbest -1best.3 h034b.stxt
creates the file h034b.nbest containing the best hypothesis
for each utterance.
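To make the notion of a highest scoring hypothesis concrete, the following Python sketch picks the best path through a lattice using only the per-alternative <cost> values (log probabilities, base 10, so scores add along a path and higher is better). It is not what the Lattice tool does: Lattice also adds language model scores, which this sketch ignores entirely. The function name best_path and the tuple layout are hypothetical, and treating a missing cost as 0.0 in the usage lines is an assumption.

```python
from typing import Dict, List, Tuple

# One alternative: (start state, end state, expanded words, log10 cost).
Arc = Tuple[int, int, List[str], float]


def best_path(arcs: List[Arc]) -> Tuple[float, List[str]]:
    """Return (total log10 score, words) of the highest scoring path.

    Only the per-arc costs are summed; no language model score is added.
    Assumes every arc goes from a lower to a higher state number, as in
    the <n-1> <n> format.
    """
    if not arcs:
        raise ValueError("empty lattice")
    first = min(a[0] for a in arcs)
    last = max(a[1] for a in arcs)
    # best[state] = best (score, words) found so far for reaching that state.
    best: Dict[int, Tuple[float, List[str]]] = {first: (0.0, [])}
    # Visiting arcs by start state guarantees a state is final before it is used.
    for start, end, words, cost in sorted(arcs, key=lambda a: a[0]):
        if start not in best:
            continue  # start state cannot be reached from the beginning
        score = best[start][0] + cost
        if end not in best or score > best[end][0]:
            best[end] = (score, best[start][1] + words)
    if last not in best:
        raise ValueError("no complete path through the lattice")
    return best[last]


# First four word positions of utterance a034c1 from the sample file;
# the missing cost on the ON alternative is taken to be 0.0 (an assumption).
arcs = [
    (0, 1, ["NATO"], -0.1),
    (0, 1, ["N.", "A.", "T.", "O."], -0.4),
    (1, 2, ["LIVES"], 0.0),
    (2, 3, [], 0.0),          # <sil> contributes no words
    (3, 4, ["ON"], 0.0),
]
score, words = best_path(arcs)
```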
To create a file containing the alignment between the best hypothesis and the tokens in the original raw text, use the command:
f034e.pl <best-hyp-file> | f034f.pl <aux-file> > <out-file>
For example, the command
f034e.pl h034b.nbest | f034f.pl h034b.aux > h034b.align
creates the file h034b.align containing the alignment.
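The three steps above (convert, rescore, align) can be collected in one place. The following Python sketch only builds the command lines as strings, using the tool names and options exactly as they appear above; it does not run anything. The function name pipeline_commands and its parameters are hypothetical, and the file suffixes follow the examples in this document.

```python
import shlex
from typing import List


def pipeline_commands(alt_file: str, lm_file: str, stem: str) -> List[str]:
    """Build the shell commands for converting, rescoring, and aligning.

    alt_file is the input lattice, lm_file the language model file, and
    stem the common prefix of the output files (an assumed convention).
    """
    aux, stxt = f"{stem}.aux", f"{stem}.stxt"
    nbest, align = f"{stem}.nbest", f"{stem}.align"
    q = shlex.quote  # protects file names containing spaces or shell characters
    return [
        # Step 1: convert the lattice and write the auxiliary file.
        f"e034c.pl -aux {q(aux)} {q(alt_file)} > {q(stxt)}",
        # Step 2: find the best hypothesis given the language model.
        f"Lattice -lattype fsm-list -lmfile {q(lm_file)} -unkpen 1e-10 -langwgt 1 "
        f"-outtype nbest -outfile {q(nbest)} -1best.3 {q(stxt)}",
        # Step 3: align the best hypothesis with the original raw tokens.
        f"f034e.pl {q(nbest)} | f034f.pl {q(aux)} > {q(align)}",
    ]


for command in pipeline_commands("h034b.alt", "f034c.dmp", "h034b"):
    print(command)
```

Each string can be pasted into a shell or passed to subprocess.run(command, shell=True, check=True); check=True stops the sequence at the first failing step.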