Public speech recognition transcripts as a configuring parameter
Displaying a written transcript of what a human said (i.e., producing an "automatic speech recognition transcript") is a common feature of smartphone voice assistants: the utterance produced by a human speaker (e.g., a question) is displayed on the screen while the voice assistant responds to it verbally. Although rarely, this feature also exists on some "social" robots, which transcribe human interactants' speech on a screen or tablet. We argue that this informational configuration has pragmatic consequences for the interaction, both for human participants and for the embodied conversational agent. Based on a corpus of co-present interactions with a humanoid robot, we attempt to show that this transcript is a contextual feature that can heavily impact the actions humans ascribe to the robot: that is, the way in which humans respond to the robot's behavior as constituting a specific type of action (rather than another) and as constituting an adequate response to their own previous turn.
Damien Rudaz, Christian Licoppe
Computing technology, computer technology
Damien Rudaz, Christian Licoppe. Public speech recognition transcripts as a configuring parameter [EB/OL]. (2025-04-06) [2025-06-10]. https://arxiv.org/abs/2504.04488.