athanasios katsamanis

SailAlign

SailAlign is an open-source software toolkit for robust long speech-text alignment. It implements an adaptive, iterative speech recognition and text alignment scheme that can process very long (and possibly noisy) audio and is robust to transcription errors. It is written mainly as a Perl library, but its functionality also depends on freely available software, namely HTK, SRILM and sclite.
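As a rough illustration of the text alignment half of such a scheme, the Python sketch below compares a reference transcript with a hypothetical recognizer output and keeps only the long runs of matching words, which could then serve as reliable anchor regions. It is not taken from SailAlign: the function name `find_anchors`, the `min_run` threshold and the example sentences are assumptions made for illustration, and Python's standard `difflib` stands in for whatever alignment tool the toolkit actually uses.

```python
from difflib import SequenceMatcher


def find_anchors(transcript_words, hypothesis_words, min_run=3):
    """Return (transcript_start, hypothesis_start, length) for every run of
    at least `min_run` consecutive words shared by both word sequences.

    Hypothetical helper for illustration only; not part of SailAlign.
    """
    matcher = SequenceMatcher(a=transcript_words, b=hypothesis_words, autojunk=False)
    return [
        (block.a, block.b, block.size)
        for block in matcher.get_matching_blocks()
        if block.size >= min_run
    ]


# Reference transcript and a made-up recognizer hypothesis with a few errors.
transcript = "the quick brown fox jumps over the lazy dog near the river".split()
hypothesis = "the quick brown fox jumped over the lazy dog by a river".split()

anchors = find_anchors(transcript, hypothesis)
for t_start, h_start, length in anchors:
    # Print where each anchor starts in both sequences and the matched words.
    print(t_start, h_start, " ".join(transcript[t_start:t_start + length]))
```

In an iterative scheme, the regions between such anchors would be the ones handed back to the recognizer for another pass; how SailAlign selects and refines these regions is described in the paper listed under Publications.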

Author

SailAlign's author is Nassos Katsamanis.

Usage

Detailed usage examples are included in the distribution. You may also want to download a tutorial explaining the main usage scenario.

Installing

You may find detailed installation instructions in the README file included in the distribution.

Dependencies

SailAlign does not implement its own speech recognition engine or language modeling algorithms. Instead, I have built interfaces to external, freely available software. Currently, interfaces to the following engines are available:

HTK, a commonly used speech recognition engine.
SRILM, a toolkit written in C++ that provides various methods for language modeling.

Apart from HTK, precompiled versions of the prerequisite binaries are included with the distribution.

Downloads

To obtain SailAlign, please contact Nassos Katsamanis by email.

Publications

A. Katsamanis, M. Black, P. Georgiou, L. Goldstein and S. Narayanan,
SailAlign: Robust long speech-text alignment,
in Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research, Jan. 2011.

If you use SailAlign in your research, please cite this paper, which is the most up-to-date reference to SailAlign's functionality.

Kishore Prahallad, Alan W. Black, Segmentation of Monologues in Audio Books for Building Synthetic Voices,
accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.

Release History

+ v1.4.0 was released in November 2013. Support for Greek has been added. The software now also supports non-Latin encodings, but not Unicode yet.

+ v1.3.0 was released in September 2013. The versioning convention has changed. Support for Spanish has been added. The Voice Activity Detection module has been made optional and, as a result, the software can now also run on OS X. Source code is available on GitHub: https://github.com/nassosoassos/sail_align

+ v1.10 was released in June 2011. Out-of-vocabulary words are now treated properly, so that phonetic alignment no longer fails. A few minor bugs have also been corrected.

+ The first open-source version of the toolkit was distributed in Jan. 2011.

Licensing

SailAlign is Copyright © 2011-2013 by Nassos Katsamanis. SailAlign is distributed under the GNU General Public License (GPL). If you are interested in alternative licensing options (e.g. dual licensing) or consulting help, please contact Nassos Katsamanis by email.

The SRILM binaries, which are included in the distribution for convenience, are licensed under the SRILM Research Community License.

Acknowledgments

Financial support for this software has been partly provided by NSF. This is gratefully acknowledged. We are also grateful to Isidoros Rodomagoulakis for providing us with the Greek acoustic models.

Last Update: November 11th, 2013
For comments or questions contact Nassos Katsamanis.