Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech"

dc.contributor EPSRC - Engineering and Physical Sciences Research Council
dc.contributor Wester, Mirjam
dc.creator Wester, Mirjam
dc.creator Watts, Oliver
dc.creator Henter, Gustav Eje
dc.date 2016-03-03T17:08:21Z
dc.date.accessioned 2023-02-17T20:52:04Z
dc.date.available 2023-02-17T20:52:04Z
dc.identifier Wester, Mirjam; Watts, Oliver; Henter, Gustav Eje. (2016). Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech", [dataset]. University of Edinburgh, School of Informatics, Centre for Speech Technology Research. https://doi.org/10.7488/ds/1352.
dc.identifier https://hdl.handle.net/10283/1935
dc.identifier https://doi.org/10.7488/ds/1352
dc.description Current speech synthesis methods typically operate on isolated sentences and lack convincing prosody when generating longer segments of speech. Similarly, prevailing TTS evaluation paradigms, such as intelligibility (transcription word error rate) or mean opinion score (MOS), only score sentences in isolation, even though overall comprehension is arguably more important for speech-based communication. In an effort to develop more ecologically relevant evaluation techniques that go beyond isolated sentences, we investigated comprehension of natural and synthetic speech dialogues. Specifically, we tested listener comprehension of long segments of spontaneous and engaging conversational speech (three 10-minute radio interviews with comedians). Interviews were reproduced either as natural speech, synthesised from carefully prepared transcripts, or synthesised using durations obtained by forced alignment against the natural speech, all in a balanced design. Comprehension was measured using multiple-choice questions. A significant difference was found between the comprehension/retention of natural speech (74% correct responses) and synthetic speech with force-aligned durations (61% correct responses). However, no significant difference was observed between natural speech and regular synthetic speech (70% correct responses). Effective evaluation of comprehension remains elusive.
dc.description The dataset is described in the readme.txt file.
dc.format application/zip
dc.format application/zip
dc.format text/plain
dc.language eng
dc.publisher University of Edinburgh, School of Informatics, Centre for Speech Technology Research
dc.relation "Evaluating comprehension of natural and synthetic conversational speech" presented at Speech Prosody 2016, Boston, USA
dc.rights Creative Commons Attribution 4.0 International Public License
dc.subject Speech Synthesis Evaluation
dc.subject Comprehension
dc.subject Conversational Speech
dc.subject Statistical Parametric Speech Synthesis
dc.subject Mathematical and Computer Sciences::Speech and Natural Language Processing
dc.title Listening test materials for "Evaluating comprehension of natural and synthetic conversational speech"
dc.type dataset


Files in this item

File Size Format
audio.zip 417.9 MB application/zip
readme.txt 4.504 KB text/plain
wester_comprehension_data.zip 80.00 KB application/zip
