In recent years, voice-AI systems have improved significantly in intelligibility and naturalness, but talking to a machine still feels remarkably different from talking to a fellow human. In this paper, we explore one dimension of this difference: the occurrence of disfluency in machine speech and how it may affect human listeners’ processing and memory of linguistic information. We conducted a human–machine conversation task in Mandarin Chinese using a humanoid social robot (Furhat), varying the type of machine speech (pre-recorded natural speech vs. synthesized speech, fluent vs. disfluent). During the task, the human interlocutor was tested on how well they remembered the information presented by the robot. The results showed that disfluent speech (information surrounded by the fillers “um”/“uh”) did not benefit memory retention in either pre-recorded or synthesized speech. We discuss the implications of these findings and possible directions for future work.
| Name | Lecture Notes in Computer Science |
| Conference | 24th International Conference on Human-Computer Interaction (HCII2022) |
| Period | 26/06/22 → 1/07/22 |
- Human-robot interaction
- Humanoid robot
- Spoken disfluency