| Workshop 2004 | Saturday, November 7, 2009 |
2. Type "make" to make and install programs.
Help for each tool is available by calling the program with no arguments. The following help files are copied from the usage file for each program.
Options between pairs of source filenames
-h src_n Horizontal concatenation
INPUT=[INPUT, src_n]
-v src_n Vertical concatenation
INPUT=[INPUT; src_n]
Options that precede all source filenames Default
-c j [i]
The -c option specifies the number of input frames that should be
concatenated prior to performing the specified transforms. The input
to all specified transforms is a matrix A0 with the same number of
rows as the inputfile, but nContext times as many columns: each row is the
concatenation of nContext consecutive frames of the inputfile. The -c
option may only be specified once; it may be specified anywhere on the
command line.
The -m and -n options specify the transform stages. ORDER MATTERS.
If the number of -m and -n options on the command line is K, then the
transform is computed in K stages, numbered 1,...,k,...,K. The kth -m
or -n option specifies the computation of the kth transform stage.
The output file contains the result of the Kth transform stage.
The -m option specifies a linear transform, of the form Ak=Ai*B+b.
The required argument "filename" specifies a file containing the
matrix B and the offset vector b, in the format specified under LINEAR TRANSFORM SPECIFICATION
below. The number of rows of B must match the number of columns in
Ai; the number of columns of B must match the number of columns of b.
The optional argument "i" specifies the index of the input stage,
which must be at least 0 and at most k-1; if not specified, i is set
equal to k-1.
The -n option specifies a nonlinear transform, of the form
Ak=funcname(Ai,Aj). The required argument "funcname" specifies a
nonlinear function name, chosen from the list given below under
NONLINEAR TRANSFORM SPECIFICATION. The optional argument "i" specifies
the index of the first input stage. The optional argument "j" specifies
the index of the second input stage. If either "i" or "j" is omitted,
the default is k-1. If "funcname" is a two-argument function, the sizes
of the matrices Ai and Aj must be equal. If "funcname" is a one-argument
function, Aj is ignored.
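As a rough illustration of the default-index rule, the following Python
sketch resolves the input stages for a list of -m and -n options. The
function name resolve_stages and the tuple layout are hypothetical,
chosen only for this example; this is not VTransform's own parser.

```python
def resolve_stages(options):
    """Resolve input-stage indices for (kind, name, indices) options.

    kind is "m" (linear) or "n" (nonlinear). indices holds the stage
    numbers given explicitly on the command line (possibly none).
    Any omitted index defaults to k-1, the previous stage.
    """
    stages = []
    for k, (kind, name, indices) in enumerate(options, start=1):
        n_inputs = 1 if kind == "m" else 2
        # Fill omitted indices with the previous stage, k-1.
        inputs = list(indices) + [k - 1] * (n_inputs - len(indices))
        for i in inputs:
            if not 0 <= i <= k - 1:
                raise ValueError(f"stage {k}: input index {i} out of range")
        stages.append((k, kind, name, tuple(inputs)))
    return stages


# Fully explicit indices versus the same options relying on defaults.
explicit = [("m", "Bb.txt", [0]), ("n", "sin", [1]), ("m", "Cc.txt", [0]),
            ("n", "tanh", [3]), ("n", "subtract", [2, 4])]
defaults = [("m", "Bb.txt", [0]), ("n", "sin", []), ("m", "Cc.txt", [0]),
            ("n", "tanh", []), ("n", "subtract", [2])]

for stage in resolve_stages(defaults):
    print(stage)
```

Both option lists resolve to the same stage graph, because every omitted
index falls back to the stage immediately before it. For a one-argument
function the second resolved index is simply unused.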
A5 = sin(A0*B + b) - tanh(A0*C + c)
Applying this transform consists of the following steps. First,
create the matrix files Bb.txt and Cc.txt, containing the matrices B
and C and the row vectors b and c in the format given below (under
LINEAR TRANSFORM SPECIFICATION). Then call:
VTransform currently understands the following functions. If you would
like to add a new nonlinear function, it is very easy; look for the
subroutine "FunctionToDVectors," and add an appropriate entry to the
table.
STDIN is an MLF. First label on each line should be a phoneme; other
labels are copied directly from input to output without change.
STDOUT is an MLF. Each line in STDOUT contains two repetitions of the
same landmark time, followed by a complete description of the
distinctive features plausibly implemented at the landmark, followed
by any additional labels present in the input. Distinctive features
are separated by , rather than space, so that the entire DF
transcription is on the first level of the MLF hierarchy; in order to
split the DF transcription, pipe the output through sed 's/,/ /g'.
A "landmark" is an instant in time at which at least one manner
feature changes value. The manner features are currently silence,
consonantal, continuant, sonorant, and syllabic. No landmark is
transcribed unless the feature is specified for both the phone before
and the phone after the landmark, with different values.
The files feattab.txt and splitphones.txt specify the way in which
landmarks should be computed:
VTransform: Long-term linear and nonlinear feature transforms.
SUMMARY
VTransform performs the following functions:
The general idea of VTransform is to enable fast HTK-based computation of
a wide range of transforms that would otherwise have to be done in
MATLAB.
SYNTAX
VTransform [opts] src1 [-h/v src2 [...]] tgt
Option order: standard HTK options are processed first,
then file concatenation options (-h and -v),
then transform options (-c, -d, -m and -n).
Transform options are processed in command-line order.
Output of the file concatenation options is called stage_0;
each consecutive transform option defines a new stage.
Example: to concatenate multiple input files, multiply inputs by the
matrix in mat.txt, then apply a sigmoid transform, type
VTransform -A -T 1 -m mat.txt -n sigmoid -S script.scp
... where script.scp contains lines of the form
src1.htk -h src2.htk -h src3.htk -h src4.htk tgt.htk
-c j [i] Next stage concatenates
j consecutive frames from stage_i. 1 [previous]
-d j [i] Next stage is created by discarding
j-1 of every j frames from stage_i. 1 [previous]
-e n Error-Exit if two horizontally Inf
concatenated source files differ in
frame count by more than n frames,
or if two vertically concatenated
source files differ in feature
dimension by more than n dimensions
-m file [i] Next stage is linear transform of
i'th stage (default: previous stage)
Format of the matrix file:
DIMS;OFFSET;MATRIX
Example: 3x3 identity transform:
4 3
0 0 0
1 0 0
0 1 0
0 0 1
-n func [i] Next stage is func(stage_i). [previous stage]
Supported functions include
sin, cos, tan, atan, exp, fabs,
floor, ceil, tanh, sinh, cosh,
log, log10, asin, acos, square,
cube, sqrt, cubert, halfwave, sign,
step, inv, sigmoid,
sigmoid_deriv, tanh_deriv
-n func j [i] Next stage=func(stage_j,stage_i). [previous stage]
Supported functions: add,
subtract, multiply, divide, pow
-p settings Determines how to align input files of
different lengths. 'fs'
'settings' is any combination of:
f = align first frames
s = symmetric. Cut the same number
of frames from beginning and end.
z = zero the missing frames
r = pad missing frames by repeating
first and last frames of the shorter file.
-A Print command line arguments
-R Print RCS version information
-S f Use script file f
-T n Set trace level to n (meaningful: 1,3,7,15,31)
The -C, -S, -T options work as in any HTK tool. If a
script file is specified, it should contain filenames in input/output
pairs, similar to the syntax of a script file for the HCopy tool. If
-S is specified, inputfile and outputfile need not be specified.
USAGE EXAMPLE
Suppose you wish to concatenate 21 frames of the input parameter file
into each row of matrix A0, and then apply the function
A5 = sin(A0*B + b) - tanh(A0*C + c) to create the output parameter
matrix A5, with B, b, C and c stored in the files Bb.txt and Cc.txt:
VTransform -S TRAIN.scp -c 21 -m Bb.txt 0 -n sin 1
-m Cc.txt 0 -n tanh 3 -n subtract 2 4
or equivalently (taking advantage of default stage indices):
VTransform -S TRAIN.scp -c 21 -m Bb.txt 0 -n sin
-m Cc.txt 0 -n tanh -n subtract 2
Either of these two commands performs the following steps:
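Stage by stage, the computation can be sketched in Python as follows.
This is only an illustration: the helper names linear and elementwise
are hypothetical, the small matrices stand in for the contents of Bb.txt
and Cc.txt, and the 21-frame concatenation that produces A0 is assumed
to have been done already.

```python
import math


def linear(a, b_mat, b_vec):
    """Compute a*B + b for lists of rows; b is added to every row."""
    return [[sum(row[r] * b_mat[r][c] for r in range(len(b_mat))) + b_vec[c]
             for c in range(len(b_vec))]
            for row in a]


def elementwise(func, *mats):
    """Apply func to matching elements of one or two equal-sized matrices."""
    return [[func(*vals) for vals in zip(*rows)] for rows in zip(*mats)]


# Toy stand-ins (assumed values): A0 has 2 rows (frames) and 2 columns.
A = {0: [[0.0, 1.0], [2.0, 3.0]]}
B, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]    # would come from Bb.txt
C, c = [[0.5, 0.0], [0.0, 0.5]], [0.1, 0.1]    # would come from Cc.txt

A[1] = linear(A[0], B, b)                            # -m Bb.txt 0
A[2] = elementwise(math.sin, A[1])                   # -n sin 1
A[3] = linear(A[0], C, c)                            # -m Cc.txt 0
A[4] = elementwise(math.tanh, A[3])                  # -n tanh 3
A[5] = elementwise(lambda x, y: x - y, A[2], A[4])   # -n subtract 2 4

print(A[5])
```

Stages 1 and 3 both read stage 0, which is why the second -m option needs
an explicit index 0 in either form of the command.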
LINEAR TRANSFORM SPECIFICATION
Matrices are loaded from text files (e.g. Bb.txt, Cc.txt) with the
following format. The first line must contain the number of rows
(including the offset vector) and the number of columns. The second
line contains the offset vector. Remaining lines contain the rows of
the transform matrix. Comments are not allowed. A non-square
transform matrix will result in a change in the dimension of the
feature vector. Here is a short example matrix file, for converting
from a 5-dimensional input feature vector to a 4-dimensional output
feature vector. The offset vector is here set to zero:
6 4
0 0 0 0
1 1 1 1
1 1 1 -1
1 1 -1 -1
1 -1 -1 -1
1 1 -0.5 0.5
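The file layout above can be read with a few lines of Python. The sketch
below is an illustration only (parse_matrix_file and apply_linear are
hypothetical names, not part of VTransform); it parses the example file
and applies the transform to one 5-dimensional frame.

```python
def parse_matrix_file(text):
    """Parse DIMS / OFFSET / MATRIX text into (offset, matrix)."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    n_rows, n_cols = (int(v) for v in lines[0])
    rows = [[float(v) for v in ln] for ln in lines[1:]]
    # The declared row count includes the offset vector.
    if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
        raise ValueError("matrix file does not match its declared dimensions")
    return rows[0], rows[1:]   # offset vector b, then the rows of B


def apply_linear(frame, offset, matrix):
    """Return frame*B + b for a single input frame (a list of floats)."""
    if len(frame) != len(matrix):
        raise ValueError("frame length must equal the number of rows of B")
    return [offset[c] + sum(x * row[c] for x, row in zip(frame, matrix))
            for c in range(len(offset))]


EXAMPLE = """6 4
0 0 0 0
1 1 1 1
1 1 1 -1
1 1 -1 -1
1 -1 -1 -1
1 1 -0.5 0.5
"""

offset, matrix = parse_matrix_file(EXAMPLE)
# 5-dimensional input frame, 4-dimensional output frame.
out = apply_linear([1.0, 0.0, 0.0, 0.0, 2.0], offset, matrix)
print(out)
```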
NONLINEAR TRANSFORM SPECIFICATION
Nonlinear transforms are applied element-wise to each dimension of the
feature vector: the idea is that you can implement an arbitrary
N-dimensional nonlinear transform by using N-dimensional linear
transforms in sequence with 1-dimensional nonlinear transforms.
Nonlinear transforms are specified by name on the command line. Some
nonlinear transforms take two input arguments; some take only one.
The inputs to a two-argument function must be matrices of the same
size; all operations are applied to each scalar pair of elements. If
a one-argument function is given two arguments, the second argument is
simply ignored.
One-Argument Functions______________
sin, cos, tan, atan, exp, fabs, floor, ceil, tanh, sinh, cosh
- call the C functions of these names
log, log10
- take the log or log10 of any input feature greater than
MINLARG; otherwise log is MINEARG, log10 is MINEARG/log(10)
asin, acos
- return asin or acos if input is between -1 and 1; otherwise
clip the output to the range asin:(-PI/2,PI/2) or acos:(0,PI).
square, cube
- compute the square or cube of the feature
sqrt
- take the odd-symmetric pseudo-square-root:
sqrt(x) = sign(x) * abs(x)^0.5
cubert
- cube root
halfwave
- halfwave rectify: y = (x>0) ? x : 0;
sign
- return the sign of x, or 0 if x==0
step
- return 1 if x>=0, otherwise 0
inv
- return 1/x
sigmoid
- sigmoid function, sigmoid(x) = 1 / (1+exp(-x))
sigmoid_deriv
- derivative of the sigmoid
sigmoid_deriv(x) = 1 / (2+exp(x)+exp(-x))
tanh_deriv
- derivative of the hyperbolic tangent
tanh_deriv(x) = 1 - tanh(x)^2
Two-Argument Functions______________
add, subtract, multiply, divide, pow
- element-wise scalar operations x+y, x-y, x*y, x/y, x^y
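For readers who want to check the less familiar definitions, here is a
Python sketch of several of the one-argument functions. It follows the
formulas listed above; the numeric values given for MINLARG and MINEARG
are assumptions based on the usual HTK constants and may differ from the
values compiled into VTransform.

```python
import math

# Assumed stand-ins for the MINLARG and MINEARG constants named above.
MINLARG = 2.45e-308
MINEARG = -708.3


def safe_log(x):
    """log(x) for x greater than MINLARG, otherwise the floor MINEARG."""
    return math.log(x) if x > MINLARG else MINEARG


def odd_sqrt(x):
    """Odd-symmetric pseudo-square-root: sign(x) * abs(x)**0.5."""
    return math.copysign(math.sqrt(abs(x)), x)


def halfwave(x):
    """Half-wave rectification: keep positive values, zero the rest."""
    return x if x > 0 else 0.0


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def step(x):
    """Return 1 if x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_deriv(x):
    """Derivative of the sigmoid, written as 1 / (2 + e^x + e^-x)."""
    return 1.0 / (2.0 + math.exp(x) + math.exp(-x))


def tanh_deriv(x):
    return 1.0 - math.tanh(x) ** 2
```

The closed form used for sigmoid_deriv is algebraically the same as
sigmoid(x) * (1 - sigmoid(x)), which is easy to confirm numerically.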
VExtract
USAGE: VExtract [opts] src1 ...
Suggested usage:
(1) Create listings of the training and testing files:
ls traindata/*.htk > train.scp;
ls testdata/*.htk > test.scp;
(2) Suppose that $p1 and $p2 should be labeled +1, while
$p3 should be labeled -1. Assume that file $mlf is
an HTK-format master label file. Compute the statistics,
and extract the training tokens, using
VExtract -T 1 -b -p /$p1/ /+1/ -p /$p2/ /+1/ -p /$p3/ /-1/
-f train.stats -o train.toks -I $mlf -S train.scp
(3) Finally, extract the test tokens, but normalize using
normalizing statistics from the training tokens:
VExtract -T 1 -b -q corpus -p /$p1/ /+1/ -p /$p2/ /+1/ -p /$p3/ /-1/
-g train.stats -o test.toks -I $mlf -S test.scp
If both the -s and -p options are specified, a line in the MLF must
match at least one of the -p options, AND all of the -s options.
Strings specified by -s and -p must match exactly with some substring
in the MLF line. If -n is specified, at most n tokens per class per
file will be output. If -m is specified, at most m tokens will be
output for the entire corpus. If '-q corpus' is specified (the default),
the same number of tokens will be output for each class, even if there
aren't enough examples to satisfy 'm'. If '-q file' is specified,
the same rule holds for each file individually. Only by specifying
'-q none' can you turn off token-count-locking entirely.
Tokens are taken from uniformly spaced positions spanning each file.
If -T 1 is specified, VExtract will print (to stderr) a specification
of the frame number and filename from which each token vector was
extracted.
The MLF read by VExtract is not as flexible as a usual HTK MLF.
Lines starting with a quotation mark are read as filenames. Lines
starting with a digit are read as segment descriptors (start end
label). Other lines are ignored.
If -f is specified, a statsfile will be generated, and all output
tokens will be normalized by the rule (x-mu)/sd. If -g is specified,
the same type of normalization is applied, but mu and sd are read
from the first two lines of the specified statsfile, rather than computed.
If -g and -f are both specified, -g takes precedence, but a new statsfile is
also written. The statsfile is in svmlight format. The first three lines are mean, SD,
and number of tokens in each dimension. If -h is specified, the next
lines are labeled by the histogram threshold, with values equal to
the histogram count. Histogram bins contain the number of tokens in each
bin AFTER normalization, thus it is usually reasonable to specify thresholds
in the range of roughly -2:0.2:2 (meaning -2 standard deviations, up to +2 sd).
Following the global stats, stats are given separately for each class.
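The normalization rule and the b:s:e threshold notation can be sketched
in Python as follows. This is an illustration under stated assumptions:
the function names are hypothetical, and the use of the population
standard deviation (dividing by n) is a guess, since the text above does
not say which estimator VExtract uses.

```python
import math


def mean_sd(tokens):
    """Per-dimension mean and standard deviation of a list of tokens."""
    n = len(tokens)
    dims = range(len(tokens[0]))
    mu = [sum(t[d] for t in tokens) / n for d in dims]
    # Population SD (divide by n); assumed, not confirmed by the source.
    sd = [math.sqrt(sum((t[d] - mu[d]) ** 2 for t in tokens) / n)
          for d in dims]
    return mu, sd


def normalize(token, mu, sd):
    """Apply (x - mu) / sd to each dimension of one token."""
    return [(x - m) / s for x, m, s in zip(token, mu, sd)]


def expand_thresholds(spec):
    """Expand a 'b:s:e' specification into b, b+s, b+2s, ..., e."""
    b, s, e = (float(v) for v in spec.split(":"))
    count = int(round((e - b) / s)) + 1
    return [b + k * s for k in range(count)]


train = [[1.0, 10.0], [3.0, 30.0]]
mu, sd = mean_sd(train)              # what -f would store
test_token = normalize([3.0, 30.0], mu, sd)   # what -g would apply
thresholds = expand_thresholds("-2:0.2:2")
```

Computing mu and sd once on the training tokens and reusing them for the
test tokens mirrors the -f and -g usage in steps (2) and (3) above.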
Option
-a outfile Append toks to outfile in svmlight format
-b Patterns only match if whitespace-bounded or comma-bounded
-f statsfile Print mean and sd of extracted toks to statsfile
-g statsfile Normalize outputs using stats in statsfile
-h /th1,th2,th3/ Compute a histogram with given thresholds
-h /b:s:e/ Compute a histogram with thresholds b,b+s,b+2s,...,e
-m n Read at most n tokens per class, total
-n n Read at most n toks per class per inputfile
-o outfile Write toks to outfile in svmlight format
-p /s/ /L/ String s is a sufficient marker of class L
-q locktype Locking type: can be file (default), corpus, or none
-s /s/ /L/ String s is a necessary marker of class L
-s /s/ String s is a necessary marker of all classes
-t /t1,t2,t3,t4/ Concatenate frames t+t1,t+t2,t+t3,t+t4
-t /b:s:e/ Concatenate frames t+b,t+b+s,t+b+2*s,...,t+e
-A Print command line arguments
-I MLF Read transcriptions from master label file MLF
-R Print RCS version information
-S f Use script file f
-T n Set trace level to n (meaningful: 1,3,7,15,31)
VApplySvms
Apply SVMs reasonably efficiently to a large database of files in HTK format.
USAGE: VApplySvms [opts] src tgt
Apply all SVMs specified on the command line (in consecutive -s
options). For each SVM, compute the nonlinear discriminant corresponding
to every frame or row in the input file. Output a vector of the
corresponding SVM discriminants.
If src is in HTK format, tgt is also in HTK format. If src is in
svm_light format, tgt is also in svm_light format, with the label
of each row in tgt equal to the corresponding label in src.
The -t option specifies that each output row should be computed by
concatenating several frames of the input (at offsets
t1,t2,... relative to the row number of the output), and then
applying all SVMs to the resulting super-row. Row numbers that refer
before row 1 are filled by repeating row 1; likewise for the last
row. This is similar to the concatenation performed by VExtract.
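The concatenation rule can be sketched in Python as shown below. The
function name concat_frames is hypothetical, and the sketch uses 0-based
row indices where the text above counts rows from 1.

```python
def concat_frames(frames, offsets):
    """Build one super-row per frame by joining the frames at t+offset.

    Indices that fall before the first row or after the last row are
    clamped, which repeats the first or last row as described above.
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for off in offsets:
            idx = min(max(t + off, 0), last)
            row.extend(frames[idx])
        out.append(row)
    return out


frames = [[1.0], [2.0], [3.0]]
# Offsets -1, 0, 1 correspond to the option -t /-1:1:1/.
stacked = concat_frames(frames, [-1, 0, 1])
print(stacked)
```

The output has the same number of rows as the input, and each row is
three times as wide, so an SVM trained on three-frame tokens can be
applied to every row directly.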
The -g option specifies a statsfile (e.g., created using VExtract)
that will be used to normalize inputs after concatenation.
WARNING: ORDER MATTERS. -g and -t must be specified BEFORE -s.
In this way, it is possible to specify many different -s files
with different -g and -t options; you simply need to observe
the sequence.
Example: suppose that you want to apply two SVMs to
every MFC file in the directory mydata. Both foo.svm
and bar.svm should be applied to every three-frame sequence
in the input data. Data should be normalized, prior to applying
the SVMs, by the normalization files foo.stats and bar.stats.
The file train.toks contains training vectors that have been
correctly concatenated together, but not normalized.
(1) Use train.toks to learn parameters
of the posterior PDF:
VApplySvms -T 1 -g foo.stats -s foo.svm -g bar.stats
-s bar.svm -d discrim.stats -p posterior
-h /-2:0.125:2/ -i 1 train.toks train.out
(2) Create a script file ``mydata.scp'' with pairs
of the form ``mydata/src.MFC mydata/tgt.prob''
(3) For every mydata/src.MFC: concatenate three consecutive
frames, normalize by foo.stats, apply the SVM in foo.svm,
calculate a sigmoid posterior probability estimate (using
info in discrim.stats), and write to the first column in
mydata/tgt.prob. Repeat same steps with bar.svm, and write
resulting probabilities to second column in mydata/tgt.prob:
VApplySvms -T 1 -t /-1:1:1/ -g foo.stats -s foo.svm -g bar.stats
-s bar.svm -e discrim.stats -i 1 -p posterior -S mydata.scp
Recommendation: always use -T 1 to get a list of files being
processed. If you really want a detailed trace, use -T 15 or -T 31.
-T 15 or -T 31 is currently the only way to see the normalized
posterior histogram.
SVM definitions are assumed to be roughly in SVM-light v5.00 format.
The first word in the file should contain the characters svm or
SVM. A line of the form `number number:number number:number ...' is
assumed to be a support vector; the first number is its alpha. A
line of the form `number # remainder' is assumed to be a parameter.
Other lines are ignored. All parameters must be specified before the
first support vector is read. Parameter definitions with the
following remainders are read; others are ignored: 'kernel parameter
-d', 'kernel parameter -g', 'kernel parameter -s', 'kernel parameter
-r', 'highest feature index', 'number of support vectors plus 1',
'threshold b', 'kernel type'.
'kernel type': the following kernel types are understood.
0=linear K(x,y)=x'*y
1=polynomial K(x,y)=(s*x'*y+r)^d
2=rbf K(x,y)=exp(-g*|x-y|^2)
3=tanh K(x,y)=tanh(s*x'*y+r)
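The four kernels translate directly into Python. In the sketch below the
function names are hypothetical and the vectors are dense lists, whereas
SVM-light files store sparse index:value pairs. Subtracting the threshold
b in the discriminant follows the SVM-light convention; that VApplySvms
does the same is an assumption of this example.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def kernel(ktype, x, y, d=1, g=1.0, s=1.0, r=0.0):
    """Evaluate K(x, y) for kernel types 0 to 3 as listed above."""
    if ktype == 0:                       # linear
        return dot(x, y)
    if ktype == 1:                       # polynomial
        return (s * dot(x, y) + r) ** d
    if ktype == 2:                       # rbf
        return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))
    if ktype == 3:                       # tanh
        return math.tanh(s * dot(x, y) + r)
    raise ValueError(f"unknown kernel type {ktype}")


def discriminant(x, support_vectors, b, ktype, **params):
    """Sum of alpha * K(sv, x) over all support vectors, minus b.

    support_vectors is a list of (alpha, vector) pairs.
    """
    return sum(alpha * kernel(ktype, sv, x, **params)
               for alpha, sv in support_vectors) - b


svs = [(0.5, [1.0, 0.0]), (-0.5, [0.0, 1.0])]
score = discriminant([1.0, 0.0], svs, 0.25, 0)
```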
If -p is specified, the output file will contain probability estimates
rather than raw SVM discriminant outputs. Probability estimates are
based on the discrimstats file, which may be computed (if -d is specified)
or read from an external file (if -e is specified). If -d is specified,
one of the class labels in the input must be '1' or '+1'; all other classes
will be contrasted with class 1, regardless of their label. If -e is
specified, the file 'discrimstats' must have 3*N lines for some N; the
first N are global stats, the next N are for class 1, the next N are for
class ~1 (class not-one). The first three lines in each class must be
mu, sd, and number of tokens; remaining lines give the histograms.
Histogram entries must be integers -- they are presumed to represent
the count in each bin. Histogram entries are normalized differently
depending on whether you ask for ``likelihood'' or ``posterior'' histogram
estimates. If -i is specified, ``mincount'' is added to each histogram
bin after reading or writing discrimstats, but before normalizing to
create a likelihood or posterior histogram. -i 1 is recommended.
The type specified by -p can be one of:
Option Output in col n Output in col n+N Estimator
-p sigmoid p(class 1|d[n]) p(class ~1|d[n]) 1/(1+exp(-a*(d[n]-b)))
-p posterior p(class 1|d[n]) p(class ~1|d[n]) histogram
-p gaussian p(d[n]|class 1) p(d[n]|class ~1) Gaussian
-p likelihood p(d[n]|class 1) p(d[n]|class ~1) histogram
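The estimators in this table can be sketched in Python as follows. Only
the sigmoid formula is taken from the table; the Gaussian is the standard
normal density with the class mean and SD, and the per-bin histogram
posterior with mincount smoothing is an assumed normalization, since the
text does not spell out how VApplySvms normalizes the counts. All
function names are hypothetical.

```python
import math


def sigmoid_posterior(d, a, b):
    """p(class 1 | d) = 1 / (1 + exp(-a*(d - b))) for given a and b."""
    return 1.0 / (1.0 + math.exp(-a * (d - b)))


def gaussian_likelihood(d, mu, sd):
    """Normal density p(d | class) with the class mean mu and SD sd."""
    z = (d - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def histogram_posterior(count_1, count_not1, mincount=1):
    """Per-bin p(class 1 | bin) after adding mincount to each bin.

    Assumed form: smoothed class-1 count divided by the smoothed total.
    """
    c1 = count_1 + mincount
    c0 = count_not1 + mincount
    return c1 / (c1 + c0)


p1 = sigmoid_posterior(0.7, a=2.0, b=0.7)
p_not1 = 1.0 - p1   # the value that would go in column n+N
```

With mincount greater than zero, an empty bin yields a posterior of 0.5
instead of a division by zero, which is one reason -i 1 is a sensible
default.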
Options Meaning
-d discrimstats Save stats of SVM discriminants in discrimstats
-e discrimstats Load stats of SVM discriminants from discrimstats
-g statsfile Normalize input features by mean and SD from statsfile
-h /th1,th2,th3/ Compute a histogram with given thresholds
-h /b:s:e/ Compute a histogram with thresholds b,b+s,b+2s,...,e
-i mincount Smooth the histogram by adding mincount to every bin
-p type Write probabilities, not discriminants.
-t /t1,t2,t3,t4/ Concatenate frames t+t1,t+t2,t+t3,t+t4
-t /b:s:e/ Concatenate frames t+b,t+b+s,t+b+2*s,...,t+e
-s path SVM definition file
-A Print command line arguments
-R Print RCS version information
-S f Use script file f
-T n Set trace level to n (meaningful: 1,3,7,15,31)
PHN2FEAT.PL
PHN2FEAT.PL: Convert the phone labels in an MLF into landmark and distinctive feature labels.
Usage:
phn2feat.pl feattab.txt splitphones.txt landmarkmargin < inputfile > outputfile
Example:
phn2feat.pl feattab.txt splitphones.txt 200000 < TRAIN.mlf > TRAIN_lm.mlf
This script reads phoneme labels (labeled using TIMIT, Switchboard, or
any similar phoneme label set), estimates the positions of
manner-change landmarks, and outputs a transcription specifying two
types of distinctive feature events:
Landmarks separate segments. The distinctive features of a segment
are presumed to be in flux within $landmarkmargin 100ns-units of a
phoneme boundary -- thus segment start times in the outputfile are
$landmarkmargin 100ns-units later than segment start times in the
inputfile, and conversely for segment end times.
tcl -sonorant,-continuant,+blade,+anterior,-voice
t -sonorant,+continuant,-strident,+blade,+anterior,-voice
sh -sonorant,+continuant,+strident,+blade,-anterior,-voice
n +sonorant,-continuant,+blade,+anterior
r +sonorant,+continuant,+blade,-anterior
er +sonorant,+continuant,+syllabic,+blade,-anterior
aa +sonorant,+continuant,+syllabic,+low,+tenselow,-front,-high
If one of the two phonemes creating a landmark is a vowel or glide,
then the place and voicing features of the landmark are the features
of the less-sonorant of the two phonemes. A landmark separating two
+consonantal regions (e.g., a stop-fricative boundary) has NO place
or voicing features.
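The landmark rule (a manner feature specified for both phones, with
different values) can be illustrated with a short Python sketch. The
table format is inferred from the sample rows above and may not match
feattab.txt exactly; parse_features and is_landmark are hypothetical
names, not functions of phn2feat.pl.

```python
MANNER = ("silence", "consonantal", "continuant", "sonorant", "syllabic")


def parse_features(spec):
    """Turn '+sonorant,-continuant' into {'sonorant': '+', ...}."""
    return {item[1:]: item[0] for item in spec.split(",")}


def is_landmark(before, after):
    """True if a manner feature is specified in both phones and differs."""
    return any(f in before and f in after and before[f] != after[f]
               for f in MANNER)


FEATTAB = {
    "tcl": parse_features("-sonorant,-continuant,+blade,+anterior,-voice"),
    "t": parse_features(
        "-sonorant,+continuant,-strident,+blade,+anterior,-voice"),
    "r": parse_features("+sonorant,+continuant,+blade,-anterior"),
    "er": parse_features(
        "+sonorant,+continuant,+syllabic,+blade,-anterior"),
}

# continuant changes from - to + between tcl and t.
print(is_landmark(FEATTAB["tcl"], FEATTAB["t"]))
# syllabic is specified only for er, so it cannot trigger a landmark.
print(is_landmark(FEATTAB["r"], FEATTAB["er"]))
```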
NIST2MLF.PL
NIST2MLF: Convert SPHERE-format transcripts to MLF-format.
Usage:
nist2mlf.pl SCP_file RATE ext1 ext2 ... > output.mlf
Example:
nist2mlf.pl TRAIN.scp 16000 phn wrd > TRAIN.mlf
This script reads in multiple NIST-format transcription files, and
merges them into an HTK-format MLF file, which is written to STDOUT.
Each line in SCP_file is assumed to contain two filenames: the
filename of the input NIST-format files (extension ignored), and the
equivalent HTK lab-filename (extension ignored). Thus, for example,
if the file TRAIN.scp contains the line
d:/timit/TIMIT/TRAIN/DR1/FAEM0/SI1392.WAV data/FAEM0SI1392.mfc
then nist2mlf.pl, called as in the example above, will search for
the files d:/timit/TIMIT/TRAIN/DR1/FAEM0/SI1392.phn and .wrd (case is
ignored) and will write the results to an MLF entry headed by
"*/FAEM0SI1392.lab"
For each such filename, nist2mlf.pl looks for filenames in the same
directory but with the extensions
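The filename handling described above, together with the time conversion
that the RATE argument suggests, can be sketched in Python. This is an
illustration with hypothetical function names. It assumes that the
NIST-format transcripts give times as sample indices (as TIMIT .phn and
.wrd files do) and that the MLF uses HTK's 100 ns time units.

```python
import os


def samples_to_htk(sample_index, rate):
    """Convert a sample index to HTK time units of 100 ns."""
    return round(sample_index * 10_000_000 / rate)


def mlf_header(htk_filename):
    """Build the MLF entry name from the second filename on an SCP line."""
    base = os.path.splitext(os.path.basename(htk_filename))[0]
    return f'"*/{base}.lab"'


def transcript_paths(nist_filename, extensions):
    """Replace the extension of the first SCP filename with each ext."""
    stem = os.path.splitext(nist_filename)[0]
    return [f"{stem}.{ext}" for ext in extensions]


scp_line = "d:/timit/TIMIT/TRAIN/DR1/FAEM0/SI1392.WAV data/FAEM0SI1392.mfc"
nist_name, htk_name = scp_line.split()
paths = transcript_paths(nist_name, ["phn", "wrd"])
header = mlf_header(htk_name)
one_second = samples_to_htk(16000, 16000)
```

Unlike nist2mlf.pl, this sketch does not ignore case when locating the
transcript files; it only shows how the names are derived.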
The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
Telephone: (410) 516-4237
Fax: (410) 516-5050
E-mail: clsp@clsp.jhu.edu