SUPER-SID LISTENING QUIZ

 

Human auditory perception, and in particular its ability to recognize familiar voices, has played an important motivational role in the development of algorithms for automatic speaker detection and identification. The speech signal carries information on many levels about who is speaking. These include speaker-specific acoustics (the pronunciation of sounds and words), prosody (the use of melodic patterns), choice of vocabulary, idiosyncrasies, and more. The following quiz, which consists of three sections, is designed to illustrate the difficulties of comparing voices when different types of information are emphasized in the signal. Please follow the instructions below, complete the form, and print out this page when you are finished. Your comments on the experiment are welcome!


Group/Name: 

Test and adjust your wavefile player here

START (Headphones recommended)


TEST SECTION 1 - CLEAN SPEECH

By clicking on the link items you will hear voice samples (telephone-quality recordings). In each row of the table, choose the one of the five Speaker recordings that matches the speaker in the Test sample. You may listen to the samples repeatedly and in any order.
 
Test 1.1 Speaker 1.1.A Speaker 1.1.B Speaker 1.1.C Speaker 1.1.D Speaker 1.1.E
Test 1.2 Speaker 1.2.A Speaker 1.2.B Speaker 1.2.C Speaker 1.2.D Speaker 1.2.E
Test 1.3 Speaker 1.3.A Speaker 1.3.B Speaker 1.3.C Speaker 1.3.D Speaker 1.3.E
Test 1.4 Speaker 1.4.A Speaker 1.4.B Speaker 1.4.C Speaker 1.4.D Speaker 1.4.E

Do you have any comments on this part?


TEST SECTION 2 - SHUFFLED SOUNDS

By clicking on the test items you will hear sequences of short speech segments played in random order. The segment shuffling removes the word structure, the prosody, and thus the meaning from the sentences, leaving only some short-time acoustic information for the listener. This type of short-time acoustic information is used in today's most popular text-independent speaker recognition systems. Again, try to match one of the five speakers to the test sample in each row.
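For readers curious how such stimuli can be produced, the following is a minimal sketch of segment shuffling. It is not the script used to create the quiz samples: the function name `shuffle_segments`, the 100 ms segment length, and the choice to drop the trailing partial segment are all assumptions made for illustration.

```python
import numpy as np

def shuffle_segments(signal, sample_rate, segment_ms=100.0, seed=0):
    """Cut a mono signal into equal-length segments and return them in random order.

    segment_ms and seed are illustrative defaults, not values from the quiz.
    """
    signal = np.asarray(signal, dtype=float)
    # Segment length in samples (at least one sample)
    seg_len = max(1, int(round(sample_rate * segment_ms / 1000.0)))
    n_full = len(signal) // seg_len
    # Assumption: the trailing partial segment is dropped so all pieces match in length
    segments = signal[:n_full * seg_len].reshape(n_full, seg_len)
    # Random permutation of segment order; samples inside a segment stay in order
    order = np.random.default_rng(seed).permutation(n_full)
    return segments[order].ravel()
```

Because each segment is kept intact, the short-time acoustics within a segment survive, while anything that spans several segments (words, melody, meaning) is destroyed.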
 
 
 
Test 2.1 Speaker 2.1.A Speaker 2.1.B Speaker 2.1.C Speaker 2.1.D Speaker 2.1.E
Test 2.2 Speaker 2.2.A Speaker 2.2.B Speaker 2.2.C Speaker 2.2.D Speaker 2.2.E
Test 2.3 Speaker 2.3.A Speaker 2.3.B Speaker 2.3.C Speaker 2.3.D Speaker 2.3.E
Test 2.4 Speaker 2.4.A Speaker 2.4.B Speaker 2.4.C Speaker 2.4.D Speaker 2.4.E
Test 2.5 Speaker 2.5.A Speaker 2.5.B Speaker 2.5.C Speaker 2.5.D Speaker 2.5.E

What clues did you use to make your decisions? Any comments on this part?


TEST SECTION 3 - SPEECH MELODY

Get ready for the tough stuff! This speech was filtered using an adaptive inverse Linear Predictive Coding (LPC) filter, which removes nearly all short-time spectral structure (i.e. the articulation information), so that mainly the fundamental tone (pitch) and the loudness contour dominate the residual. These two are basic components of prosody, a source known to carry relatively complex information, including speaking style, emotion, sentence mode, language type and more. Prosody will be, among other topics, a subject of study by the Super-SID team this summer. Try to identify the two samples in each row that belong to the same speaker, but be aware that your ear will be robbed of its usual acoustic cues, so focus on alternatives!
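The idea of inverse LPC filtering can be sketched as follows. This is an illustrative reconstruction, not the filter used to prepare the quiz samples: the function names, the autocorrelation method, the predictor order of 12, and the 240-sample frame (30 ms at an assumed 8 kHz telephone sampling rate) are all assumptions. Each frame gets its own predictor, which is what makes the filter adaptive, and the prediction error (the residual) is what remains after the spectral envelope has been removed.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: coefficients a[0..order-1] such that
    x[n] is predicted by sum_k a[k] * x[n-1-k]."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(windowed[:len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    if r[0] <= 0.0:
        return np.zeros(order)  # silent frame: nothing to predict
    # Toeplitz normal equations R a = r[1:], lightly regularised for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])

def lpc_residual(signal, order=12, frame_len=240):
    """Frame-adaptive inverse filtering: return the LPC prediction error."""
    signal = np.asarray(signal, dtype=float)
    residual = np.zeros_like(signal)
    for start in range(0, len(signal), frame_len):
        stop = min(start + frame_len, len(signal))
        if stop - start <= order:
            # Frame too short to estimate a predictor: pass it through unchanged
            residual[start:stop] = signal[start:stop]
            continue
        a = lpc_coefficients(signal[start:stop], order)
        for n in range(start, stop):
            # Most recent sample first; fewer samples are available near n = 0
            past = signal[max(0, n - order):n][::-1]
            residual[n] = signal[n] - np.dot(a[:len(past)], past)
    return residual
```

For voiced speech, the residual is roughly a pulse train at the pitch period whose amplitude follows the loudness, which is why melody and rhythm remain audible while the vowel and consonant qualities largely disappear.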
 
Test 3.1 Speaker 3.1.A Speaker 3.1.B Speaker 3.1.C Speaker 3.1.D Speaker 3.1.E
Test 3.2 Speaker 3.2.A Speaker 3.2.B Speaker 3.2.C Speaker 3.2.D Speaker 3.2.E
Test 3.3 Speaker 3.3.A Speaker 3.3.B Speaker 3.3.C Speaker 3.3.D Speaker 3.3.E
Test 3.4 Speaker 3.4.A Speaker 3.4.B Speaker 3.4.C Speaker 3.4.D Speaker 3.4.E
Test 3.5 Speaker 3.5.A Speaker 3.5.B Speaker 3.5.C Speaker 3.5.D Speaker 3.5.E

Comments on this part:


END

Please PRINT out this page. Use the answer key distributed by the lab supervisor to check your answers.
For more information about the experiment, the algorithms and scripts used to create the audio samples, please contact the author.