Workshop 2004
Monday, November 23, 2009

Periodic Vector Toolkit

A small set of C tools for transforming and classifying feature vectors stored in HTK binary format.


Programs

Installation

1. Edit the Makefile. You will at least need to change $(CPU), $(INSTALL_ROOT) and $(hbin).

2. Type "make" to build and install the programs.

Help for each tool is printed by running the program with no arguments. The help text below is copied from each program's usage message.

VTransform: Long-term linear and nonlinear feature transforms.

SUMMARY

VTransform performs the following functions:
  1. Create "long-term time-frequency features": each output feature vector is the concatenation of C input feature vectors.
  2. Transform the resulting feature vector using an arbitrary sequence of linear and nonlinear feature transforms.
The goal of VTransform is to enable fast, HTK-based computation of a wide range of transforms that would otherwise have to be done in MATLAB.

SYNTAX

VTransform [opts] src1 [-h/v src2 [...]] tgt

 Option order: standard HTK options are processed first,
   then file concatenation options (-h and -v),
   then transform options (-c, -d, -m and -n).
   Transform options are processed in command-line order.

 Output of the file concatenation options is called stage_0;
   each consecutive transform option defines a new stage.

 Example: to concatenate multiple input files, multiply inputs by the
   matrix in mat.txt, then apply a sigmoid transform, type

 VTransform -A -T 1 -m mat.txt -n sigmoid -S script.scp
... where script.scp contains lines of the form
src1.htk -h src2.htk -h src3.htk -h src4.htk tgt.htk

 Options between pairs of source filenames:

 -h src_n         Horizontal concatenation: INPUT=[INPUT, src_n]
 -v src_n         Vertical concatenation:   INPUT=[INPUT; src_n]

 Options that precede all source filenames (defaults in brackets):

 -c j [i]         Next stage concatenates j consecutive frames from
                  stage_i.  [j=1, i=previous stage]
 -d j [i]         Next stage is created by discarding j-1 of every j
                  frames from stage_i.  [j=1, i=previous stage]
 -e n             Error-exit if two horizontally concatenated source files
                  differ in frame count by more than n frames, or if two
                  vertically concatenated source files differ in feature
                  dimension by more than n dimensions.  [n=Inf]
 -m file [i]      Next stage is a linear transform of stage_i.
                  [i=previous stage]
                  Format of the matrix file: DIMS; OFFSET; MATRIX.
                  Example, 3x3 identity transform:
                    4 3
                    0 0 0
                    1 0 0
                    0 1 0
                    0 0 1
 -n func [i]      Next stage is func(stage_i).  [i=previous stage]
                  Supported functions include sin, cos, tan, atan, exp,
                  fabs, floor, ceil, tanh, sinh, cosh, log, log10, asin,
                  acos, square, cube, sqrt, cubert, halfwave, sign, step,
                  inv, sigmoid, sigmoid_deriv, tanh_deriv
 -n func j [i]    Next stage is func(stage_j, stage_i).  [i=previous stage]
                  Supported functions: add, subtract, multiply, divide, pow
 -p settings      Determines how to align input files of different
                  lengths.  [fs]  settings is any combination of:
                    f = align first frames
                    s = symmetric: cut the same number of frames from
                        the beginning and end
                    z = zero the missing frames
                    r = pad missing frames by repeating the first and
                        last frames of the shorter file
 -A               Print command line arguments
 -R               Print RCS version information
 -S f             Use script file f
 -T n             Set trace level to n (meaningful: 1,3,7,15,31)

The -C, -S, -T options work as in any HTK tool. If a script file is specified, it should contain filenames in input/output pairs, similar to the syntax of a script file for the HCopy tool. If -S is specified, inputfile and outputfile need not be specified.

The -c option specifies the number of input frames that should be concatenated before the specified transforms are applied. The input to all specified transforms is a matrix A0 with the same number of rows as the inputfile, but nContext times as many columns, where nContext is the argument of -c: each row is the concatenation of nContext consecutive frames of the inputfile. The -c option may be specified only once, anywhere on the command line.

The -m and -n options specify the transform stages. ORDER MATTERS. If the number of -m and -n options on the command line is K, then the transform is computed in K stages, numbered 1,...,k,...,K. The kth -m or -n option specifies the computation of the kth transform stage. The output file contains the result of the Kth transform stage.

The -m option specifies a linear transform of the form Ak=Ai*B+b. The required argument "filename" specifies a file containing the matrix B and the offset vector b, in the format described under LINEAR TRANSFORM SPECIFICATION below. The number of rows of B must match the number of columns of Ai; the number of columns of B must match the number of columns of b. The optional argument "i" specifies the index of the input stage, which must be at least 0 and at most k-1; if it is not specified, i is set to k-1.
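The row-by-row arithmetic of a -m stage can be sketched in C as follows. This is a minimal illustration, assuming B is stored row-major with n_in rows and n_out columns; the function name linear_stage_row is hypothetical and not taken from the VTransform source.

```c
#include <stddef.h>

/* Compute one output row y = x*B + b for a -m stage (hypothetical helper).
 * x: one row of the input stage A_i, length n_in
 * B: n_in x n_out matrix, stored row-major
 * b: offset row vector, length n_out
 * y: the corresponding row of A_k, length n_out */
void linear_stage_row(const double *x, size_t n_in,
                      const double *B, const double *b,
                      size_t n_out, double *y)
{
    for (size_t j = 0; j < n_out; j++) {
        double acc = b[j];                    /* start from the offset */
        for (size_t i = 0; i < n_in; i++)
            acc += x[i] * B[i * n_out + j];   /* row i, column j of B */
        y[j] = acc;
    }
}
```

Applying this function to every row of A_i yields A_k; a non-square B changes the feature dimension from n_in to n_out.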

The -n option specifies a nonlinear transform of the form Ak=funcname(Ai,Aj). The required argument "funcname" specifies a nonlinear function name, chosen from the list under NONLINEAR TRANSFORM SPECIFICATION below. The optional argument "i" specifies the index of the first input stage, and the optional argument "j" the index of the second input stage. If either "i" or "j" is omitted, the default is k-1. If "funcname" is a two-argument function, the matrices Ai and Aj must be the same size. If "funcname" is a one-argument function, Aj is ignored.

USAGE EXAMPLE

Suppose you wish to concatenate 21 frames of the input parameter file into each row of matrix A0, and then apply the following function to create the output parameter matrix A5:

A5 = sin(A0*B + b) - tanh(A0*C + c)

Applying this transform consists of the following steps. First, create the matrix files Bb.txt and Cc.txt, containing the matrices B and C and the row vectors b and c in the format given under LINEAR TRANSFORM SPECIFICATION below. Then call:

VTransform -S TRAIN.scp -c 21 -m Bb.txt 0 -n sin 1 \
     -m Cc.txt 0 -n tanh 3 -n subtract 2 4
or equivalently (taking advantage of default stage indices):

VTransform -S TRAIN.scp -c 21 -m Bb.txt 0 -n sin \
     -m Cc.txt 0 -n tanh -n subtract 2
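Either of these two commands performs the following steps:

  A0 = concatenation of 21 consecutive input frames   (-c 21)
  A1 = A0*B + b                                        (-m Bb.txt 0)
  A2 = sin(A1)                                         (-n sin 1)
  A3 = A0*C + c                                        (-m Cc.txt 0)
  A4 = tanh(A3)                                        (-n tanh 3)
  A5 = A2 - A4, written to the output file             (-n subtract 2 4)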

LINEAR TRANSFORM SPECIFICATION

Matrices are loaded from text files (e.g. Bb.txt, Cc.txt) with the following format. The first line must contain the number of rows (including the offset vector) and the number of columns. The second line contains the offset vector. Remaining lines contain the rows of the transform matrix. Comments are not allowed. A non-square transform matrix will result in a change in the dimension of the feature vector. Here is a short example matrix file, for converting from a 5-dimensional input feature vector to a 4-dimensional output feature vector. The offset vector is here set to zero:

6 4
0 0 0 0 
1 1 1 1
1 1 1 -1
1 1 -1 -1 
1 -1 -1 -1
1 1 -0.5 0.5
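A reader for this format might look like the following C sketch. It treats the file as a whitespace-separated stream of numbers (so line breaks are not checked), and the Transform type and read_transform function are hypothetical names, not part of the toolkit.

```c
#include <stdio.h>
#include <stdlib.h>

/* A transform read from a VTransform matrix file (hypothetical type). */
typedef struct {
    int n_in;       /* rows of B = input dimension */
    int n_out;      /* columns of B and b = output dimension */
    double *b;      /* offset row vector, length n_out */
    double *B;      /* n_in x n_out matrix, row-major */
} Transform;

/* Read "ROWS COLS", then the offset row, then ROWS-1 matrix rows.
   ROWS counts the offset line.  Returns 0 on success, -1 on error. */
int read_transform(FILE *fp, Transform *t)
{
    int rows, cols;
    if (fscanf(fp, "%d %d", &rows, &cols) != 2 || rows < 2 || cols < 1)
        return -1;
    t->n_in = rows - 1;
    t->n_out = cols;
    t->b = malloc(sizeof(double) * (size_t)cols);
    t->B = malloc(sizeof(double) * (size_t)t->n_in * (size_t)cols);
    if (t->b == NULL || t->B == NULL)
        goto fail;
    for (int j = 0; j < cols; j++)               /* second line: offset b */
        if (fscanf(fp, "%lf", &t->b[j]) != 1) goto fail;
    for (int i = 0; i < t->n_in * cols; i++)     /* remaining lines: B */
        if (fscanf(fp, "%lf", &t->B[i]) != 1) goto fail;
    return 0;
fail:
    free(t->b);
    free(t->B);
    t->b = t->B = NULL;
    return -1;
}
```

Reading the 6x4 example above gives n_in = 5 and n_out = 4; the caller is responsible for freeing t.b and t.B.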

NONLINEAR TRANSFORM SPECIFICATION

Nonlinear transforms are applied element-wise to each dimension of the feature vector: the idea is that you can implement an arbitrary N-dimensional nonlinear transform by using N-dimensional linear transforms in sequence with 1-dimensional nonlinear transforms. Nonlinear transforms are specified by name on the command line. Some nonlinear transforms take two input arguments; some take only one. The inputs to a two-argument function must be matrices of the same size; all operations are applied to each scalar pair of elements. If a one-argument function is given two arguments, the second argument is simply ignored.

VTransform currently understands the following functions. If you would like to add a new nonlinear function, it is very easy; look for the subroutine "FunctionToDVectors," and add an appropriate entry to the table.

One-Argument Functions______________

sin, cos, tan, atan, exp, fabs, floor, ceil, tanh, sinh, cosh 
   - call the C functions of these names
log, log10 
   - take the log or log10 of any input feature greater than
     MINLARG; otherwise log is MINEARG, log10 is MINEARG/log(10)
asin, acos
   - return asin or acos if input is between -1 and 1; otherwise
     clip the output to the range asin:[-PI/2,PI/2] or acos:[0,PI].
square, cube 
   - compute the square or cube of the feature
sqrt 
   - take the odd-symmetric pseudo-square-root:
     sqrt(x) = sign(x) * abs(x)^0.5
cubert 
   - cube root
halfwave
   - halfwave rectify: y = (x>0) ? x : 0;
sign
   - return the sign of x, or 0 if x==0
step
   - return 1 if x>=0, otherwise 0
inv
   - return 1/x
sigmoid 
   - sigmoid function, sigmoid(x) = 1 / (1+exp(-x))
sigmoid_deriv
   - derivative of the sigmoid
     sigmoid_deriv(x) = 1 / (2+exp(x)+exp(-x))
tanh_deriv
   - derivative of the hyperbolic tangent
     tanh_deriv(x) = 1 - tanh(x)^2

Two-Argument Functions______________

add, subtract, multiply, divide, pow
   - element-wise scalar operations x+y, x-y, x*y, x/y, x^y

VExtract

USAGE: VExtract [opts] src1 ...

 Suggested usage:

 (1) Create listings of the training and testing files:

 ls traindata/*.htk > train.scp;
 ls testdata/*.htk > test.scp;

 (2) Suppose that $p1 and $p2 should be labeled +1, while 
     $p3 should be labeled -1.  Assume that file $mlf is
     an HTK-format master label file.  Compute the statistics,
     and extract the training tokens, using 

 VExtract -T 1 -b -p /$p1/ /+1/ -p /$p2/ /+1/ -p /$p3/ /-1/ \
   -f train.stats -o train.toks -I $mlf -S train.scp

 (3) Finally, extract the test tokens, but normalize them using
     the statistics computed from the training tokens:

 VExtract -T 1 -b -q corpus -p /$p1/ /+1/ -p /$p2/ /+1/ -p /$p3/ /-1/ \
   -g train.stats -o test.toks -I $mlf -S test.scp

 If both the -s and -p options are specified, a line in the MLF must
 match at least one of the -p options, AND all of the -s options.
 Strings specified by -s and -p must match exactly with some substring
 in the MLF line.  If -n is specified, at most n tokens per class per
 file will be output.  If -m is specified, at most m tokens will be 
 output for the entire corpus.  If '-q corpus' is specified (the default),
 the same number of tokens will be output for each class, even if there 
 aren't enough examples to satisfy 'm'.  If '-q file' is specified,
 the same rule holds for each file individually.  Only by specifying
 '-q none' can you turn off token-count-locking entirely.
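The matching rule for -p, -s, and -b can be sketched in C as below. This is a simplified illustration under stated assumptions: it treats whitespace, commas, and the ends of the line as boundaries for -b, ignores the per-class form -s /s/ /L/, and requires at least one -p pattern; pattern_matches and line_in_class are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if c may delimit a pattern under -b (whitespace, comma, or end). */
static bool is_boundary(char c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n' || c == ',';
}

/* Does pat occur as a substring of line?  If bounded (the -b option),
   some occurrence must be delimited on both sides. */
bool pattern_matches(const char *line, const char *pat, bool bounded)
{
    size_t len = strlen(pat);
    if (len == 0)
        return true;
    for (const char *p = strstr(line, pat); p != NULL; p = strstr(p + 1, pat)) {
        if (!bounded)
            return true;
        bool left_ok = (p == line) || is_boundary(p[-1]);
        if (left_ok && is_boundary(p[len]))
            return true;
    }
    return false;
}

/* A line belongs to the class if it matches every -s pattern and at
   least one -p pattern. */
bool line_in_class(const char *line,
                   const char *const *p_pats, size_t n_p,
                   const char *const *s_pats, size_t n_s, bool bounded)
{
    for (size_t i = 0; i < n_s; i++)
        if (!pattern_matches(line, s_pats[i], bounded))
            return false;
    for (size_t i = 0; i < n_p; i++)
        if (pattern_matches(line, p_pats[i], bounded))
            return true;
    return false;
}
```

Under -b, the pattern "aa" matches the label "aa" but not "aah"; without -b, both match.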

 Tokens are taken from uniformly spaced positions spanning each file.
 If -T 1 is specified, VExtract will print (to stderr) a specification
 of the frame number and filename from which each token vector was
 extracted.
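One plausible way to pick uniformly spaced positions is to take the centre of each of n equal sub-intervals of a segment. The exact spacing VExtract uses is not stated here, so the following C function (uniform_positions, a hypothetical name) is only a sketch of the idea.

```c
#include <stddef.h>

/* Choose up to n frame indices spread uniformly over [start, start+len),
   taking the centre of each of n equal sub-intervals.  n is capped at len,
   so no frame is chosen twice.  Returns the number of indices written. */
size_t uniform_positions(long start, long len, size_t n, long *out)
{
    if (len <= 0)
        return 0;
    if (n > (size_t)len)
        n = (size_t)len;
    for (size_t i = 0; i < n; i++)
        out[i] = start + (long)(((2 * i + 1) * (size_t)len) / (2 * n));
    return n;
}
```

For a 10-frame segment and n = 5, this picks frames 1, 3, 5, 7, and 9 relative to the segment start.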

 The MLF read by VExtract is not as flexible as a usual HTK MLF.
 Lines starting with a quotation mark are read as filenames.  Lines
 starting with a digit are read as segment descriptors (start end
 label).  Other lines are ignored.
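The three line types can be distinguished with a few lines of C. This is a sketch assuming the segment times are integers in HTK 100ns units and only the first label word is needed; parse_mlf_line and MlfLineType are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>

typedef enum { MLF_FILENAME, MLF_SEGMENT, MLF_IGNORED } MlfLineType;

/* Classify one MLF line.  For segment lines, start/end (HTK 100ns units)
   and the first label word are extracted; label must hold 64 bytes.
   A digit-initial line that does not parse is treated as ignored. */
MlfLineType parse_mlf_line(const char *line, long *start, long *end, char *label)
{
    if (line[0] == '"')
        return MLF_FILENAME;
    if (isdigit((unsigned char)line[0]) &&
        sscanf(line, "%ld %ld %63s", start, end, label) == 3)
        return MLF_SEGMENT;
    return MLF_IGNORED;
}
```

Header lines such as `#!MLF!#` and the terminating `.` fall into the ignored category.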

 If -f is specified, a statsfile will be generated, and all output
 tokens will be normalized by the rule (x-mu)/sd.  If -g is specified,
 the same type of normalization is applied, but mu and sd are read
 from the first two lines of the specified statsfile, rather than computed.
 If -g and -f are both specified, -g takes precedence, but a new statsfile is
 also written.  The statsfile is in svmlight format: the first three lines
 are the mean, SD, and number of tokens in each dimension.  If -h is specified, the next
 lines are labeled by the histogram threshold, with values equal to
 the histogram count.  Histogram bins contain the number of tokens in each
 bin AFTER normalization, thus it is usually reasonable to specify thresholds
 in the range of roughly -2:0.2:2 (meaning -2 standard deviations, up to +2 sd).
 Following the global stats, stats are given separately for each class.

 Option                                                     

 -a outfile       Append toks to outfile in svmlight format
 -b               Patterns only match if whitespace-bounded or comma-bounded
 -f statsfile     Print mean and sd of extracted toks to statsfile
 -g statsfile     Normalize outputs using stats in statsfile
 -h /th1,th2,th3/ Compute a histogram with given thresholds
 -h /b:s:e/       Compute a histogram with thresholds b,b+s,b+2s,...,e
 -m n             Read at most n tokens per class, total
 -n n             Read at most n toks per class per inputfile 
 -o outfile       Write toks to outfile in svmlight format
 -p /s/ /L/       String s is a sufficient marker of class L
 -q locktype      Locking type: can be file (default), corpus, or none
 -s /s/ /L/       String s is a necessary marker of class L
 -s /s/           String s is a necessary marker of all classes
 -t /t1,t2,t3,t4/ Concatenate frames t+t1,t+t2,t+t3,t+t4        
 -t /b:s:e/       Concatenate frames t+b,t+b+s,t+b+2*s,...,t+e
 -A               Print command line arguments
 -I MLF           Read transcriptions from master label file MLF
 -R               Print RCS version information
 -S f             Use script file f
 -T n             Set trace level to n (meaningful: 1,3,7,15,31)
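The /b:s:e/ and /t1,t2,.../ arguments of -t and -h can be expanded with strtod. The C sketch below is a hypothetical parser (parse_list_arg is not the toolkit's own function) that assumes a positive step s in the range form and adds a small tolerance so that the endpoint e is not lost to floating-point rounding.

```c
#include <stdlib.h>

/* Parse "/b:s:e/" or "/t1,t2,.../" into out[] (capacity max).
   Returns the number of values, or -1 on a syntax error or overflow. */
int parse_list_arg(const char *arg, double *out, int max)
{
    char *end;
    const char *p;
    if (arg[0] != '/') return -1;
    p = arg + 1;
    double v = strtod(p, &end);
    if (end == p) return -1;

    if (*end == ':') {                          /* range form b:s:e */
        double b = v;
        p = end + 1;
        double s = strtod(p, &end);
        if (end == p || *end != ':' || s <= 0.0) return -1;
        p = end + 1;
        double e = strtod(p, &end);
        if (end == p || *end != '/') return -1;
        int n = 0;
        for (int k = 0; ; k++) {
            double t = b + k * s;               /* b, b+s, b+2s, ... */
            if (t > e + 1e-9 * s) break;        /* tolerate rounding */
            if (n == max) return -1;
            out[n++] = t;
        }
        return n;
    }

    int n = 0;                                  /* list form t1,t2,... */
    for (;;) {
        if (n == max) return -1;
        out[n++] = v;
        if (*end == '/') return n;
        if (*end != ',') return -1;
        p = end + 1;
        v = strtod(p, &end);
        if (end == p) return -1;
    }
}
```

For example, /-1:1:1/ expands to -1, 0, 1 (a three-frame window), and /-2:0.125:2/ expands to 33 histogram thresholds.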

VApplySvms

Apply SVMs reasonably efficiently to a large database of files in HTK format.
USAGE: VApplySvms [opts] src tgt

 Apply all SVMs specified on the command line (in consecutive -s 
 options).  For each SVM, compute the nonlinear discriminant corresponding
 to every frame or row in the input file.  Output a vector of the 
 corresponding SVM discriminants.

 If src is in HTK format, tgt is also in HTK format.  If src is in 
 svm_light format, tgt is also in svm_light format, with the label
 of each row in tgt equal to the corresponding label in src.

 The -t option specifies that each output row should be computed by
 concatenating several frames of the input (at offsets
 t1,t2,... relative to the row number of the output), and then
 applying all SVMs to the resulting super-row.  Rows that would fall
 before row 1 are filled by repeating row 1; rows that would fall after
 the last row are filled by repeating the last row.  This is similar to
 the concatenation performed by VExtract.
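The edge rule can be sketched in C as follows, using 0-based rows and an integer offset list such as -1, 0, 1 for -t /-1:1:1/. concat_offsets is a hypothetical helper, not VApplySvms' own routine, and it assumes the input has at least one row.

```c
#include <stddef.h>

/* Build the super-row for output row t by concatenating input rows
   t+off[0], t+off[1], ...  Offsets before row 0 reuse row 0; offsets past
   the last row reuse the last row.
   in: n_rows x dim, row-major (n_rows >= 1); out: n_off * dim values. */
void concat_offsets(const double *in, long n_rows, size_t dim,
                    const int *off, size_t n_off, long t, double *out)
{
    for (size_t k = 0; k < n_off; k++) {
        long r = t + off[k];
        if (r < 0) r = 0;                       /* clamp at the first row */
        if (r > n_rows - 1) r = n_rows - 1;     /* clamp at the last row */
        const double *src = in + (size_t)r * dim;
        for (size_t j = 0; j < dim; j++)
            out[k * dim + j] = src[j];
    }
}
```

Because the offsets are clamped rather than skipped, every output row has the same dimension (n_off * dim), so the output file has as many rows as the input.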

 The -g option specifies a statsfile (e.g., created using VExtract)
 that will be used to normalize inputs after concatenation.  

 WARNING: ORDER MATTERS. -g and -t must be specified BEFORE -s.
   In this way, it is possible to specify many different -s files
   with different -g and -t options; you simply need to observe
   the sequence.
 Example: suppose that you want to apply two SVMs to 
 every MFC file in the directory mydata.  Both foo.svm
 and bar.svm should be applied to every three-frame sequence
 in the input data. Data should be normalized, prior to applying 
 the SVMs, by the normalization files foo.stats and bar.stats.
 The file train.toks contains training vectors that have been
 correctly concatenated together, but not normalized.

 (1) Use train.toks to learn the parameters
   of the posterior PDF:

   VApplySvms -T 1 -g foo.stats -s foo.svm -g bar.stats \
           -s bar.svm -d discrim.stats -p posterior \
           -h /-2:0.125:2/ -i 1 train.toks train.out
   
 (2) Create a script file ``mydata.scp'' with pairs
      of the form ``mydata/src.MFC mydata/tgt.prob''

 (3) For every mydata/src.MFC: concatenate three consecutive
   frames, normalize by foo.stats, apply the SVM in foo.svm,
   calculate a histogram-based posterior probability estimate (using
   info in discrim.stats), and write to the first column in 
   mydata/tgt.prob.  Repeat same steps with bar.svm, and write
   resulting probabilities to second column in mydata/tgt.prob:

   VApplySvms -T 1 -t /-1:1:1/ -g foo.stats -s foo.svm -g bar.stats \
          -s bar.svm -e discrim.stats -i 1 -p posterior -S mydata.scp

 Recommendation: always use -T 1 to get a list of files being 
 processed.  If you really want a detailed trace, use -T 15 or -T 31.
 -T 15 or -T 31 is currently the only way to see the normalized 
 posterior histogram.

 SVM definitions are assumed to be roughly in SVM-light v5.00 format.
 The first word in the file should contain the characters svm or
 SVM. A line of the form `number number:number number:number ...'  is
 assumed to be a support vector; the first number is its alpha.  A
 line of the form `number # remainder' is assumed to be a parameter.
 Other lines are ignored.  All parameters must be specified before the
 first support vector is read.  Parameter definitions with the
 following remainders are read; others are ignored: 'kernel parameter
 -d', 'kernel parameter -g', 'kernel parameter -s', 'kernel parameter
 -r', 'highest feature index', 'number of support vectors plus 1',
 'threshold b', 'kernel type'.

 'kernel type': the following kernel types are understood.
     0=linear      K(x,y)=x'*y
     1=polynomial  K(x,y)=(s*x'*y+r)^d
     2=rbf         K(x,y)=exp(-g*|x-y|^2)
     3=tanh        K(x,y)=tanh(s*x'*y+r)

 If -p is specified, the output file will contain probability estimates
 rather than raw SVM discriminant outputs.  Probability estimates are
 based on the discrimstats file, which may be computed (if -d is specified)
 or read from an external file (if -e is specified).  If -d is specified,
 one of the class labels in the input must be '1' or '+1'; all other classes
 will be contrasted with class 1, regardless of their label.  If -e is 
 specified, the file 'discrimstats' must have 3*N lines for some N; the 
 first N are global stats, the next N are for class 1, the next N are for 
 class ~1 (class not-one).  The first three lines in each class must be 
 mu, sd, and number of tokens; remaining lines give the histograms.
 Histogram entries must be integers -- they are presumed to represent
 the count in each bin.  Histogram entries are normalized differently
 depending on whether you ask for ``likelihood'' or ``posterior'' histogram
 estimates.  If -i is specified, ``mincount'' is added to each histogram
 bin after reading or writing discrimstats, but before normalizing to
 create a likelihood or posterior histogram. -i 1 is recommended.
 The type specified by -p can be one of:

 Option        Output in col n  Output in col n+N   Estimator
 -p sigmoid    p(class 1|d[n])  p(class ~1|d[n])    1/(1+exp(-a*(d[n]-b)))
 -p posterior  p(class 1|d[n])  p(class ~1|d[n])    histogram
 -p gaussian   p(d[n]|class 1)  p(d[n]|class ~1)    Gaussian
 -p likelihood p(d[n]|class 1)  p(d[n]|class ~1)    histogram

 Options          Meaning 

 -d discrimstats  Save stats of SVM discriminants in discrimstats
 -e discrimstats  Load stats of SVM discriminants from discrimstats
 -g statsfile     Normalize input features by mean and SD from statsfile
 -h /th1,th2,th3/ Compute a histogram with given thresholds
 -h /b:s:e/       Compute a histogram with thresholds b,b+s,b+2s,...,e
 -i mincount      Smooth the histogram by adding mincount to every bin
 -p type          Write probabilities, not discriminants.
 -t /t1,t2,t3,t4/ Concatenate frames t+t1,t+t2,t+t3,t+t4        
 -t /b:s:e/       Concatenate frames t+b,t+b+s,t+b+2*s,...,t+e
 -s path          SVM definition file
 -A               Print command line arguments
 -R               Print RCS version information
 -S f             Use script file f
 -T n             Set trace level to n (meaningful: 1,3,7,15,31)

PHN2FEAT.PL

PHN2FEAT.PL: Convert the phone labels in an MLF into landmark and distinctive feature labels.
Usage:
  phn2feat.pl feattab.txt splitphones.txt landmarkmargin < inputfile > outputfile
Example: 
  phn2feat.pl feattab.txt splitphones.txt 200000 < TRAIN.mlf > TRAIN_lm.mlf
This script reads phoneme labels (labeled using TIMIT, Switchboard, or any similar phoneme label set), estimates the positions of manner-change landmarks, and outputs a transcription specifying two types of distinctive feature events:

Landmarks separate segments. The distinctive features of a segment are presumed to be in flux within $landmarkmargin 100ns-units of a phoneme boundary -- thus segment start times in the outputfile are $landmarkmargin 100ns-units later than segment start times in the inputfile, and conversely for segment end times.
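The margin rule amounts to a small adjustment of each segment's times, sketched in C below. apply_landmark_margin is a hypothetical name, and collapsing a segment shorter than twice the margin to its midpoint is an assumption; the text does not say how such segments are handled.

```c
/* Shrink a segment by the landmark margin (all values in HTK 100ns units):
   the start moves later and the end moves earlier.  A segment shorter than
   twice the margin is collapsed to its midpoint (assumption). */
void apply_landmark_margin(long *start, long *end, long margin)
{
    long s = *start + margin, e = *end - margin;
    if (s > e)
        s = e = *start + (*end - *start) / 2;
    *start = s;
    *end = e;
}
```

With the example margin of 200000 (20 ms), a segment from 1000000 to 3000000 becomes 1200000 to 2800000.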

STDIN is an MLF. The first label on each line should be a phoneme; other labels are copied directly from input to output without change. STDOUT is an MLF. Each line in STDOUT contains two repetitions of the same landmark time, followed by a complete description of the distinctive features plausibly implemented at the landmark, followed by any additional labels present in the input. Distinctive features are separated by commas rather than spaces, so that the entire DF transcription is on the first level of the MLF hierarchy; to split the DF transcription, pipe the output through sed 's/,/ /g'.

A "landmark" is an instant in time at which at least one manner feature changes value. The manner features are currently silence, consonantal, continuant, sonorant, and syllabic. No landmark is transcribed unless the feature is specified for both the phone before and the phone after the landmark, with different values.

The files feattab.txt and splitphones.txt specify the way in which landmarks should be computed:

NIST2MLF.PL

NIST2MLF: Convert SPHERE-format transcripts to MLF-format.
 
Usage:
  nist2mlf.pl SCP_file RATE ext1 ext2 ... > output.mlf
Example: 
  nist2mlf.pl TRAIN.scp 16000 phn wrd > TRAIN.mlf
 
This script reads in multiple NIST-format transcription files, and
merges them into an HTK-format MLF file, which is written to STDOUT.

Each line in SCP_file is assumed to contain two filenames: the
filename of the input NIST-format files (extension ignored), and the
equivalent HTK lab-filename (extension ignored).  Thus, for example,
if the file TRAIN.scp contains the line

d:/timit/TIMIT/TRAIN/DR1/FAEM0/SI1392.WAV data/FAEM0SI1392.mfc

then nist2mlf.pl, called as in the example above, will search for
the files d:/timit/TIMIT/TRAIN/DR1/FAEM0/SI1392.phn and .wrd (case is
ignored) and will write the results to an MLF entry headed by
"*/FAEM0SI1392.lab"

For each such filename, nist2mlf.pl looks for files in the same
directory but with the extensions ext1, ext2, etcetera.
NIST-format transcriptions are read, and times are converted using the
global variables $MLFPERIOD and $PHNPERIOD defined in the program
(default values: 100ns and 1/16 ms.  Change these if using
Switchboard).
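The time conversion is a single scaling. With TIMIT's 16 kHz sampling, one sample lasts 1/16 ms, which is 625 HTK units of 100ns. The C sketch below (sample_to_htk is a hypothetical name) takes the sampling rate as a parameter instead of the $MLFPERIOD and $PHNPERIOD globals, and uses integer arithmetic to avoid rounding drift.

```c
/* Convert a non-negative NIST sample index to HTK time units of 100ns
   (10^7 units per second), rounding to the nearest unit.
   rate_hz is the sampling rate, e.g. 16000 for TIMIT. */
long long sample_to_htk(long long sample, long long rate_hz)
{
    return (sample * 10000000LL + rate_hz / 2) / rate_hz;
}
```

For 8 kHz Switchboard audio, one sample corresponds to 1250 units, which is why the default periods must be changed for that corpus.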

Extensions must be listed in order of increasing span.  The start time
of any label in an  file is rounded to the nearest start time of
any label in level .  End times of all labels beyond level
 are entirely discarded.  Each transcription file is assumed to
be sorted by start times.  This algorithm works fine for combining
TIMIT phn, wrd, and text files.

The Center for Language and Speech Processing
The Johns Hopkins University
3400 North Charles Street, Barton Hall
Baltimore, MD 21218
*Telephone: (410) 516-4237 *Fax: (410) 516-5050 *E-mail: clsp@clsp.jhu.edu