Aurora 2 Database
Aurora 2 is based on TIdigits downsampled to 8 kHz. There are three test sets, each with its own noise types:
- Set A: Subway, Babble, Car, Exhibition Hall
- Set B: Restaurant, Street, Airport, Train Station
- Set C: Subway, Street (with different frequency characteristics)
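For scripting over the test sets, the list above can be written down as a small lookup table. This is only a sketch: the names `AURORA2_TEST_SETS` and `shared_noises` are hypothetical and not part of any Aurora tooling. It shows which noise types Set C shares with Sets A and B.

```python
# Noise types per Aurora 2 test set, copied from the list above.
AURORA2_TEST_SETS = {
    "A": ["Subway", "Babble", "Car", "Exhibition Hall"],
    "B": ["Restaurant", "Street", "Airport", "Train Station"],
    "C": ["Subway", "Street"],  # different frequency characteristics
}


def shared_noises(set_x, set_y):
    """Return the noise types present in both test sets, in set_x order."""
    other = set(AURORA2_TEST_SETS[set_y])
    return [noise for noise in AURORA2_TEST_SETS[set_x] if noise in other]


if __name__ == "__main__":
    print(shared_noises("C", "A"))  # noise that Set C shares with Set A
    print(shared_noises("C", "B"))  # noise that Set C shares with Set B
    print(shared_noises("A", "B"))  # Sets A and B have no noise in common
```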
The Aurora 2 database has two training conditions:
- Training on clean data
- Multi-condition training on noisy data
There are 8440 training sentences and 1001 test sentences per test set.
We used three front-end configurations for our baseline experiments
(click on each one to see the baseline results):
Aurora 3 Database
Five European languages constitute the Aurora 3 database:
- Finnish
- Spanish
- German
- Danish
- Italian
All the data were recorded in three noise conditions: quiet, low noise (low), and high noise (high).
There are three experimental setups:
- Well-Matched (WM)
  - 70% of all utterances in quiet, low, and high conditions are used for training, and the remaining 30% for testing
- Medium Mismatched (MM)
  - 100% of hands-free recordings from quiet and low are used for training, and 100% of hands-free recordings from high for testing
- High Mismatched (HM)
  - 70% of close-talking recordings from all noise conditions are used for training, and 30% of hands-free recordings from low and high for testing
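The three setups can be sketched as filters over utterance metadata. This is a minimal illustration under stated assumptions, not the official partitioning: the record layout `(utterance_id, microphone, noise)`, the function names, and the seeded random 70/30 split are all hypothetical. Real experiments should use the partition lists shipped with the database.

```python
import random

# Hypothetical record layout: (utterance_id, microphone, noise)
# microphone is "close-talking" or "hands-free"; noise is "quiet", "low" or "high".


def split_70_30(items, seed=0):
    """Shuffle deterministically, then split into 70% and 30% parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(0.7 * len(items)))
    return items[:cut], items[cut:]


def well_matched(utts, seed=0):
    """WM: 70% of all utterances for training, the remaining 30% for testing."""
    return split_70_30(utts, seed)


def medium_mismatched(utts):
    """MM: hands-free quiet+low for training, hands-free high for testing."""
    hands_free = [u for u in utts if u[1] == "hands-free"]
    train = [u for u in hands_free if u[2] in ("quiet", "low")]
    test = [u for u in hands_free if u[2] == "high"]
    return train, test


def high_mismatched(utts, seed=0):
    """HM: 70% of close-talking for training, 30% of hands-free low+high for testing."""
    close = [u for u in utts if u[1] == "close-talking"]
    noisy_hf = [u for u in utts if u[1] == "hands-free" and u[2] in ("low", "high")]
    train, _ = split_70_30(close, seed)
    _, test = split_70_30(noisy_hf, seed)
    return train, test


def toy_corpus(per_cell=10):
    """Build a toy corpus with per_cell utterances per (microphone, noise) pair."""
    utts = []
    for mic in ("close-talking", "hands-free"):
        for noise in ("quiet", "low", "high"):
            for i in range(per_cell):
                utts.append((f"{mic}-{noise}-{i}", mic, noise))
    return utts
```

On the toy corpus of 60 utterances, `well_matched` yields 42 training and 18 test items, `medium_mismatched` yields 20 and 10, and `high_mismatched` yields 21 and 6. The HM training and test sets never share a microphone, which is what makes that setup the hardest.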
Click here to see the Aurora 3 baseline results.
Aurora 4 Database
The Aurora 4 database is based on the WSJ0 collection and contains 7138 training utterances.
Six different noises have been artificially added: Car, Babble, Restaurant, Street, Airport, and Train Station.
Training data sets are of three kinds: clean, multicondition, and noisy.
There are 14 test sets in two sizes: small (166 utterances) and large (330 utterances).
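The text does not say how the 14 test sets arise. A common reading, assumed here rather than taken from this page, is seven acoustic conditions (clean plus the six noises) recorded with two microphone types. The sketch below enumerates that layout; the microphone labels and all names in it are hypothetical.

```python
from itertools import product

# The six added noises, in the order listed above.
NOISES = ["Car", "Babble", "Restaurant", "Street", "Airport", "Train Station"]
CONDITIONS = ["Clean"] + NOISES          # 7 acoustic conditions (assumed layout)
MICROPHONES = ["primary", "secondary"]   # assumed labels, not taken from the text
SIZES = {"small": 166, "large": 330}     # utterances per test set, from the text


def enumerate_test_sets():
    """List the test sets as microphone x condition pairs, numbered from 1."""
    return [
        {"index": i, "microphone": mic, "condition": cond}
        for i, (mic, cond) in enumerate(product(MICROPHONES, CONDITIONS), start=1)
    ]


if __name__ == "__main__":
    sets = enumerate_test_sets()
    print(len(sets))  # number of test sets under this assumption
    for size, n_utts in SIZES.items():
        # Total utterances to decode across all test sets at this size.
        print(size, len(sets) * n_utts)
```

Under this assumption, a full evaluation decodes 14 x 166 = 2324 utterances with the small sets and 14 x 330 = 4620 with the large sets.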
More information about the Aurora 4 database and the imported lattices can be found here.
Click here to see the Aurora 4 baseline results.