Gale M. Lucas, Jill Boberg, David Traum, Ron Artstein, Jon Gratch, Alesia Gainer, Emmanuel Johnson, Anton Leuski, and Mikio Nakano. The role of social dialogue and errors in robots. Proceedings of the 5th International Conference on Human Agent Interaction, pages 431–433. Bielefeld, Germany, October 2017. (Poster)

Abstract: Social robots establish rapport with human users. This work explores the extent to which rapport-building can benefit (or harm) conversations with robots, and under what circumstances this occurs. For example, previous work has shown that agents that make conversational errors are less capable of influencing people than agents that do not make errors [1]. Some work has shown this effect with robots, but prior research has not considered additional factors such as the level of rapport between the person and the robot. We predicted that building rapport through a social dialogue (such as an ice-breaker) could mitigate the detrimental effect of a robot’s errors on influence. Our study used a Nao robot programmed to persuade users to agree with its rankings on two “survival tasks” (e.g., lunar survival task). We manipulated both errors and social dialogue: the robot either exhibited errors in the second survival task or not, and users either engaged in an ice-breaker with the robot between the two survival tasks or completed a control task. Replicating previous research, errors tended to reduce the robot’s influence in the second survival task. Contrary to our prediction, results revealed that the ice-breaker did not mitigate the effect of errors, and if anything, errors were more harmful after the ice-breaker (intended to build rapport) than in the control condition. This backfiring of attempted rapport-building may be due to a contrast effect, suggesting that the design of social robots should avoid introducing dialogues of incongruent quality.

Eugenia Hee, Ron Artstein, Su Lei, Cristian Cepeda, and David Traum. Assessing differences in multimodal grounding with embodied and disembodied agents. 5th European and 8th Nordic Symposium on Multimodal Communication. Bielefeld, Germany, October 2017.

Abstract: Establishing common ground is an essential part of any collaboration process and can be critical in the success of the desired task at hand. With the increased introduction of artificial agents into society, understanding the way that we interact with both embodied and disembodied versions of these agents becomes even more critical. While people are getting more comfortable with using machines for accessing information and providing services, it is less clear to what degree people strive for common ground with these machines and provide feedback related to their reactions to the provided information. We look at the question of how people provide grounding-related feedback when in conversation with a robot and a virtual human in a variety of tasks and modalities. We examine several different types of activities, including first-contact social dialogue, and several item-ranking tasks, in which participants can reveal their own rankings and rationales and potentially influence others. We also examine several kinds of feedback, including positive and negative signals of understanding and agreement. Finally, we examine verbal utterances and non-verbal signals for these functions. We look at whether different tasks or agent types influence the amount and modalities of different kinds of feedback behaviors. We also look at whether feedback patterns are correlated with different amounts of influence that the agents exert on humans.

Anton Leuski and Ron Artstein. Lessons in dialogue system deployment. Proceedings of the SIGDIAL 2017 Conference: the 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 352–355. Saarbrücken, Germany, August 2017. (Demo paper)

Abstract: We analyze deployment of an interactive dialogue system in an environment where deep technical expertise might not be readily available. The initial version was created using a collection of research tools. We summarize a number of challenges with its deployment at two museums and describe a new system that simplifies the installation and user interface; reduces reliance on third-party software; and provides a robust data collection mechanism.

Jacqueline Brixey, Rens Hoegen, Wei Lan, Joshua Rusow, Karan Singla, Xusen Yin, Ron Artstein, and Anton Leuski. SHIHbot: A Facebook chatbot for sexual health information on HIV/AIDS. Proceedings of the SIGDIAL 2017 Conference: the 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 370–373. Saarbrücken, Germany, August 2017. (Demo paper)

Abstract: We present the implementation of an autonomous chatbot, SHIHbot, deployed on Facebook, which answers a wide variety of sexual health questions on HIV/AIDS. The chatbot’s response database is compiled from professional medical and public health resources in order to provide reliable information to users. The system’s backend is NPCEditor, a response selection platform trained on linked questions and answers; to our knowledge this is the first retrieval-based chatbot deployed on a large public social network.

Matthew Marge, Claire Bonial, Ashley Foots, Cory Hayes, Cassidy Henry, Kimberly A. Pollard, Ron Artstein, Clare R. Voss, and David Traum. Exploring variation of natural human commands to a robot in a collaborative navigation task. Proceedings of the First Workshop on Language Grounding for Robotics, pages 58–66. Vancouver, British Columbia, Canada, August 2017. (Poster)

Abstract: Robot-directed communication is variable, and may change based on human perception of robot capabilities. To collect training data for a dialogue system and to investigate possible communication changes over time, we developed a Wizard-of-Oz study that (a) simulates a robot’s limited understanding, and (b) collects dialogues where human participants build a progressively better mental model of the robot’s understanding. With ten participants, we collected ten hours of human-robot dialogue. We analyzed the structure of instructions that participants gave to a remote robot before it responded. Our findings show a general initial preference for including metric information (e.g., move forward 3 feet) over landmarks (e.g., move to the desk) in motion commands, but this decreased over time, suggesting changes in perception.

Cassidy Henry, Pooja Moolchandani, Kimberly A. Pollard, Claire Bonial, Ashley Foots, Ron Artstein, Cory Hayes, Clare R. Voss, David Traum, and Matthew Marge. Towards efficient human-robot dialogue collection: Moving Fido into the virtual world. WiNLP workshop. Vancouver, British Columbia, Canada, July 2017. (Poster)

Abstract: Our research aims to develop a natural dialogue interface between robots and humans. We describe two focused efforts to increase data collection efficiency towards this end: creation of an annotated corpus of interaction data, and a robot simulation, allowing greater flexibility in when and where we can run experiments.

Ron Artstein. Inter-annotator agreement. In Handbook of Linguistic Annotation, edited by Nancy Ide and James Pustejovsky, pages 297–313. Springer, Dordrecht, 2017.

Abstract: This chapter touches upon several issues in the calculation and assessment of inter-annotator agreement. It gives an introduction to the theory behind agreement coefficients and examples of their application to linguistic annotation tasks. Specific examples explore variation in annotator performance due to heterogeneous data, complex labels, item difficulty, and annotator differences, showing how global agreement coefficients may mask these sources of variation, and how detailed agreement studies can give insight into both the annotation process and the nature of the underlying data. The chapter also reviews recent work on using machine learning to exploit the variation among annotators and learn detailed models from which accurate labels can be inferred. I therefore advocate an approach where agreement studies are not used merely as a means to accept or reject a particular annotation scheme, but as a tool for exploring patterns in the data that are being annotated.
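
Illustration (not taken from the chapter): a minimal sketch of one widely used agreement coefficient, Cohen's kappa for two annotators, to show what "correcting observed agreement for chance" amounts to. The function name and the toy labels are made up for this example; the chapter itself may present other coefficients or other formulations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected (chance) agreement, from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): the annotators agree on 3 of 4 items.
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "no", "no", "no"]
print(cohen_kappa(annotator_1, annotator_2))
```

A single global value like this says nothing about which items or labels drive the disagreement, which is the kind of masking the abstract describes.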

Bethany Lycan and Ron Artstein. Direct and mediated interaction with a Holocaust survivor. In International Workshop on Spoken Dialogue Systems Technology. Farmington, Pennsylvania, June 2017. (Short paper/poster)

Abstract: The New Dimensions in Testimony dialogue system was placed in two museums under two distinct conditions: docent-led group interaction, and free interaction with visitors. Analysis of the resulting conversations shows that docent-led interactions have a lower vocabulary and a higher proportion of user utterances that directly relate to the system’s subject matter, while free interaction is more personal in nature. Under docent-led interaction the system gives a higher proportion of direct appropriate responses, but overall correct system behavior is about the same in both conditions because the free interaction condition has more instances where the correct system behavior is to avoid a direct response.

Ron Artstein, David Traum, Jill Boberg, Alesia Gainer, Jonathan Gratch, Emmanuel Johnson, Anton Leuski, and Mikio Nakano. Listen to my body: Does making friends help influence people? In Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference, pages 430–435. Marco Island, Florida, May 2017.

Abstract: We investigate the effect of relational dialogue on creating rapport and exerting social influence in human-robot conversation, by comparing interactions with and without a relational component, and with different agent types. Human participants interact with two agents – a Nao robot and a virtual human – in four dialogue scenarios: one involving building familiarity, and three involving sharing information and persuasion in item-ranking tasks. Results show that both agents influence human decision-making; people prefer interacting with the robot, feel higher rapport with the robot, and believe the robot has more influence; and that objective influence of the agent on the person is increased by building familiarity, but is not significantly different between the agents.

Simon S. Woo, Elsi Kaiser, Ron Artstein, and Jelena Mirkovic. Life-experience passwords (LEPs). In Annual Computer Security Applications Conference (ACSAC), pages 113–126. Los Angeles, December 2016.

Abstract: Passwords are widely used for user authentication, but they are often difficult for a user to recall, easily cracked by automated programs and heavily reused. Security questions are also used for secondary authentication. They are more memorable than passwords, but are very easily guessed. We propose a new authentication mechanism, called “life-experience passwords (LEPs),” which outperforms passwords and security questions, both at recall and at security. Each LEP consists of several facts about a user-chosen past experience, such as a trip, a graduation, a wedding, etc. At LEP creation, the system extracts these facts from the user’s input and transforms them into questions and answers. At authentication, the system prompts the user with questions and matches her answers with those stored by the system.

In this paper we propose two LEP designs, and evaluate them via user studies. We further compare LEPs to passwords, and find that: (1) LEPs are 30–47 bits stronger than an ideal, randomized, 8-character password, (2) LEPs are up to 3× more memorable, and (3) LEPs are reused half as often as passwords. While both LEPs and security questions use personal experiences for authentication, LEPs use several questions, which are closely tailored to each user. This increases LEP security against guessing attacks. In our evaluation, only 0.7% of LEPs were guessed by friends, while prior research found that friends could guess 17–25% of security questions. LEPs also contained a very small amount of sensitive or fake information. All these qualities make LEPs a promising, new authentication approach.
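
Illustration (not taken from the paper): a rough sketch of how the strength of an ideal random password can be expressed in bits, to give a sense of scale for the "30–47 bits stronger" comparison. The 95-symbol alphabet (printable ASCII) is an assumption made here; the paper may define its baseline differently.

```python
import math


def random_password_bits(length, alphabet_size):
    """Bits of entropy of a password drawn uniformly at random."""
    return length * math.log2(alphabet_size)


# Assumption: 8 characters drawn from the 95 printable ASCII characters.
baseline = random_password_bits(8, 95)
print(f"8-character random password: about {baseline:.1f} bits")
```

Under this assumption the baseline is roughly 52.6 bits, and each additional bit doubles the number of guesses an attacker would need on average.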

Ron Artstein, David Traum, Jill Boberg, Alesia Gainer, Jonathan Gratch, Emmanuel Johnson, Anton Leuski, and Mikio Nakano. Niki and Julie: A robot and virtual human for studying multimodal social interaction. In 18th ACM International Conference on Multimodal Interaction (ICMI), pages 402–403. Tokyo, November 2016. (Demo paper)

Abstract: We demonstrate two agents, a robot and a virtual human, which can be used for studying factors that impact social influence. The agents engage in dialogue scenarios that build familiarity, share information, and attempt to influence a human participant. The scenarios are variants of the classical “survival task,” where members of a team rank the importance of a number of items (e.g., items that might help one survive a crash in the desert). These are ranked individually and then re-ranked following a team discussion, and the difference in ranking provides an objective measure of social influence. Survival tasks have been used in psychology, virtual human research, and human-robot interaction. Our agents are operated in a “Wizard-of-Oz” fashion, where a hidden human operator chooses the agents’ dialogue actions while interacting with an experiment participant.
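
Illustration (not taken from the paper): one possible way to turn "the difference in ranking" into a number. The sketch measures how much closer a participant's ranking moves toward the agent's ranking after the discussion. The distance measure (sum of absolute rank differences), the function names, and the item names are all assumptions made for this example; the abstract does not specify the measure used in the studies.

```python
def rank_distance(ranking_a, ranking_b):
    """Sum over items of the absolute difference in rank position."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - position_b[item]) for i, item in enumerate(ranking_a))


def influence(initial, final, agent):
    """Positive when the participant's final ranking is closer to the agent's."""
    return rank_distance(initial, agent) - rank_distance(final, agent)


# Hypothetical desert-survival items, most important first.
agent_ranking = ["water", "mirror", "map"]
initial_ranking = ["map", "mirror", "water"]
final_ranking = ["water", "map", "mirror"]
print(influence(initial_ranking, final_ranking, agent_ranking))
```

A value of zero would mean the discussion left the participant no closer to the agent's ranking than before.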

Albert Rizzo, Stefan Scherer, David DeVault, Jonathan Gratch, Ron Artstein, Arno Hartholt, Gale Lucas, Stacy Marsella, Fabrizio Morbini, Angela Nazarian, Giota Stratou, David Traum, Rachel Wood, Jill Boberg, and Louis-Philippe Morency. Detection and computational analysis of psychological signals using a virtual human interviewing agent. Journal of Pain Management 9(3): 311–321, 2016.

Abstract: It has long been recognized that facial expressions, body posture/gestures and vocal parameters play an important role in human communication and the implicit signalling of emotion. Recent advances in low cost computer vision and behavioral sensing technologies can now be applied to the process of making meaningful inferences as to user state when a person interacts with a computational device. Effective use of this additive information could serve to promote human interaction with virtual human (VH) agents that may enhance diagnostic assessment. This paper will focus on our current research in these areas within the DARPA-funded “Detection and Computational Analysis of Psychological Signals” project, with specific attention to the SimSensei application use case. SimSensei is a virtual human interaction platform that is able to sense and interpret real-time audiovisual behavioral signals from users interacting with the system. It is specifically designed for health care support and leverages years of virtual human research and development at USC-ICT. The platform enables an engaging face-to-face interaction where the virtual human automatically reacts to the state and inferred intent of the user through analysis of behavioral signals gleaned from facial expressions, body gestures and vocal parameters. Akin to how non-verbal behavioral signals have an impact on human to human interaction and communication, SimSensei aims to capture and infer from user non-verbal communication to improve engagement between a VH and a user. The system can also quantify and interpret sensed behavioral signals longitudinally that can be used to inform diagnostic assessment within a clinical context.

Matthew Marge, Claire Bonial, Kimberly A. Pollard, Ron Artstein, Brendan Byrne, Susan G. Hill, Clare Voss, and David Traum. Assessing agreement in human-robot dialogue strategies: A tale of two wizards. In Intelligent Virtual Agents: 16th International Conference, IVA 2016, Los Angeles, CA, USA, September 20–23, 2016 Proceedings (Lecture Notes in Artificial Intelligence 10011), pages 484–488. Springer, Heidelberg, October 2016. (Poster)

Abstract: The Wizard-of-Oz (WOz) method is a common experimental technique in virtual agent and human-robot dialogue research for eliciting natural communicative behavior from human partners when full autonomy is not yet possible. For the first phase of our research reported here, wizards play the role of dialogue manager, acting as a robot’s dialogue processing. We describe a novel step within WOz methodology that incorporates two wizards and control sessions: the wizards function much like corpus annotators, being asked to make independent judgments on how the robot should respond when receiving the same verbal commands in separate trials. We show that inter-wizard discussion after the control sessions and the resolution with a reconciled protocol for the follow-on pilot sessions successfully impacts wizard behaviors and significantly aligns their strategies. We conclude that, without control sessions, we would have been unlikely to achieve both the natural diversity of expression that comes with multiple wizards and a better protocol for modeling an automated system.

Vasily Konovalov, Oren Melamud, Ron Artstein, and Ido Dagan. Collecting better training data using biased agent policies in negotiation dialogues. In Proceedings of WOCHAT, the Second Workshop on Chatbots and Conversational Agent Technologies. Los Angeles, September 2016.

Abstract: When naturally occurring data is characterized by a highly skewed class distribution, supervised learning often benefits from reducing this skew. Human-agent dialogue data is commonly highly skewed when using standard agent policies. Hence, we suggest that agent policies need to be reconsidered in the context of training data collection. Specifically, in this work we implemented biased agent policies that are optimized for data collection in the negotiation domain. Empirical evaluations show that our method is successful in collecting a reasonably balanced corpus in the highly skewed Job-Candidate domain. Furthermore, using this balanced corpus to train a negotiation intent classifier yields notable performance improvements relative to naturally distributed data.

Satheesh Ravi and Ron Artstein. Language portability for dialogue systems: Translating a question-answering system from English into Tamil. Proceedings of the SIGDIAL 2016 Conference: the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 111–116. Los Angeles, September 2016. (Short paper/poster)

Abstract: A training and test set for a dialogue system in the form of linked questions and responses is translated from English into Tamil. Accuracy of identifying an appropriate response in Tamil is 79%, compared to the English accuracy of 89%, suggesting that translation can be useful to start up a dialogue system. Machine translation of Tamil inputs into English also results in 79% accuracy. However, machine translation of the English training data into Tamil results in a drop in accuracy to 54% when tested on manually authored Tamil, indicating that there is still a large gap before machine translated dialogue systems can interact with human users.

Ron Artstein, Alesia Gainer, Kallirroi Georgila, Anton Leuski, Ari Shapiro, and David Traum. New Dimensions in Testimony demonstration. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pages 32–36. Association for Computational Linguistics, San Diego, California, June 2016.

Abstract: New Dimensions in Testimony is a prototype dialogue system that allows users to conduct a conversation with a real person who is not available for conversation in real time. Users talk to a persistent representation of Holocaust survivor Pinchas Gutter on a screen, while a dialogue agent selects appropriate responses to user utterances from a set of pre-recorded video statements, simulating a live conversation. The technology is similar to existing conversational agents, but to our knowledge this is the first system to portray a real person. The demonstration will show the system on a range of screens (from mobile phones to large TVs), and allow users to have individual conversations with Mr. Gutter.

Olga Uryupina, Ron Artstein, Antonella Bristot, Federica Cavicchio, Kepa Rodriguez, and Massimo Poesio. ARRAU: linguistically-motivated annotation of anaphoric descriptions. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 2058–2062. Portorož, Slovenia, May 2016.

Abstract: This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research in complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.

Vasily Konovalov, Ron Artstein, Oren Melamud, and Ido Dagan. The Negochat corpus of human-agent negotiation dialogues. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 3141–3145. Portorož, Slovenia, May 2016.

Abstract: Annotated in-domain corpora are crucial to the successful development of dialogue systems of automated agents, and in particular for developing natural language understanding (NLU) components of such systems. Unfortunately, such important resources are scarce. In this work, we introduce an annotated natural language human-agent dialogue corpus in the negotiation domain. The corpus was collected using Amazon Mechanical Turk following the ‘Wizard-Of-Oz’ approach, where a ‘wizard’ human translates the participants’ natural language utterances in real time into a semantic language. Once dialogue collection was completed, utterances were annotated with intent labels by two independent annotators, achieving high inter-annotator agreement. Our initial experiments with an SVM classifier show that automatically inferring such labels from the utterances is far from trivial. We make our corpus publicly available to serve as an aid in the development of dialogue systems for negotiation agents, and suggest that analogous corpora can be created following our methodology and using our available source code. To the best of our knowledge this is the first publicly available negotiation dialogue corpus.

Ron Artstein and Kenneth Silver. Ethics for a combined human-machine dialogue agent. In Ethical and Moral Considerations in Non-Human Agents: Papers from the AAAI Spring Symposium, pages 184–189. Stanford, California, March 2016.

Abstract: We discuss philosophical and ethical issues that arise from a dialogue system intended to portray a real person, using recordings of the person together with a machine agent that selects recordings during a synchronous conversation with a user. System output may count as actions of the speaker if the speaker intends to communicate with users and the outputs represent what the speaker would have chosen to say in context; in such cases the system can justifiably be said to be holding a conversation that is offset in time. The autonomous agent may at times misrepresent the speaker’s intentions, and such failures are analogous to good-faith misunderstandings. The user may or may not need to be informed that the speaker is not organically present, depending on the application.

David Traum, Andrew Jones, Kia Hays, Heather Maio, Oleg Alexander, Ron Artstein, Paul Debevec, Alesia Gainer, Kallirroi Georgila, Kathleen Haase, Karen Jungblut, Anton Leuski, Stephen Smith, and William Swartout. New Dimensions in Testimony: Digitally Preserving a Holocaust Survivor’s Interactive Storytelling. In Interactive Storytelling: 8th International Conference on Interactive Digital Storytelling, ICIDS 2015, Copenhagen, Denmark, November 30–December 4, 2015, Proceedings (Lecture Notes in Computer Science 9445), pages 269–281. Springer, Heidelberg, December 2015. Best paper award

Abstract: We describe a digital system that allows people to have an interactive conversation with a human storyteller (a Holocaust survivor) who has recorded a number of dialogue contributions, including many compelling narratives of his experiences and thoughts. The goal is to preserve as much as possible of the experience of face-to-face interaction. The survivor’s stories, answers to common questions, and testimony are recorded in high fidelity, and then delivered interactively to an audience as responses to spoken questions. People can ask questions and receive answers on a broad range of topics including the survivor’s experiences before, after and during the war, his attitudes and philosophy. Evaluation results show that most user questions can be addressed by the system, and that audiences are highly engaged with the resulting interaction.

David Traum, Kallirroi Georgila, Ron Artstein, and Anton Leuski. Evaluating spoken dialogue processing for time-offset interaction. Proceedings of the SIGDIAL 2015 Conference: the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 199–208. Prague, Czech Republic, September 2015. Best paper award

Abstract: This paper presents the first evaluation of a fully automated prototype system for time-offset interaction, that is, conversation between a live person and recordings of someone who is not temporally co-present. Speech recognition reaches word error rates as low as 5% with general-purpose language models and 19% with domain-specific models, and language understanding can identify appropriate direct responses to 60–66% of user utterances while keeping errors to 10–16% (the remainder being indirect, or off-topic responses). This is sufficient to enable a natural flow and relatively open-ended conversations, with a collection of under 2000 recorded statements.
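
Illustration (not taken from the paper): a minimal sketch of how a word error rate is conventionally computed, as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the example sentences are made up; the paper's own scoring setup (tokenization, normalization) is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("where were you born", "where were you"))
```

Because insertions count as errors, the rate can exceed 100% when the recognizer outputs many more words than the reference contains.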

Ron Artstein, Anton Leuski, Heather Maio, Tomer Mor-Barak, Carla Gordon, and David Traum. How many utterances are needed to support time-offset interaction? In Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference, pages 144–149. Hollywood, Florida, May 2015.

Abstract: Time-offset interaction is a new technology that enables conversational interaction with a person who is not present, using pre-recorded video statements. Statements were recorded by Pinchas Gutter, a Holocaust survivor, talking about his personal experiences before, during and after the Holocaust. Participants interacted with the statements through a “Wizard of Oz” system, where live operators select an appropriate reaction to each utterance in real time; unanswered questions were analyzed to identify gaps, and additional statements were recorded to fill the gaps. Even though participant questions were completely unconstrained, the recorded statements from the first round directly addressed at least 58% of the questions; this number rises to 95% with the second round of recording, when tested on newly elicited utterances. This demonstrates the feasibility for a system to address unseen questions and sustain short conversations when the topic is well defined. The statements have been put into an automated system using existing language understanding technology, to create a preliminary working system of time-offset interaction, allowing a live conversation with a real human who is not present for the conversation in real time.

Simon S. Woo, Jelena Mirkovic, Ron Artstein, and Elsi Kaiser. Life-experience passwords (LEPs). In Who are you?! Adventures in Authentication: WAY Workshop. Menlo Park, California, July 2014.

Abstract: User-supplied textual passwords are extensively used today for user authentication. However, these passwords have serious deficiencies in the way they interact with humans’ natural ability to form memories. Strong passwords that are hard to crack are also often hard for humans to remember, while memorable passwords are easily brute-forced or guessed. We propose a novel password design – life-experience passwords (LEPs). We explain how to use users’ existing episodic memories about defining life events to create memorable and hard-to-guess passwords and discuss challenges involved in design and use of LEPs.

Jonathan Gratch, Ron Artstein, Gale Lucas, Giota Stratou, Stefan Scherer, Angela Nazarian, Rachel Wood, Jill Boberg, David DeVault, Stacy Marsella, David Traum, Skip Rizzo, and Louis-Philippe Morency. The Distress Analysis Interview Corpus of Human and Computer Interviews. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pages 3123–3128. Reykjavik, Iceland, May 2014.

Abstract: The Distress Analysis Interview Corpus (DAIC) contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder. The interviews are conducted by humans, human-controlled agents and autonomous agents, and the participants include both distressed and non-distressed individuals. Data collected include audio and video recordings and extensive questionnaire responses; parts of the corpus have been transcribed and annotated for a variety of verbal and non-verbal features. The corpus has been used to support the creation of an automated interviewer agent, and for research on the automatic identification of psychological distress.

David DeVault, Ron Artstein, Grace Benn, Teresa Dey, Ed Fast, Alesia Gainer, Kallirroi Georgila, Jon Gratch, Arno Hartholt, Margaux Lhommet, Gale Lucas, Stacy Marsella, Fabrizio Morbini, Angela Nazarian, Stefan Scherer, Giota Stratou, Apar Suri, David Traum, Rachel Wood, Yuyu Xu, Albert Rizzo, and Louis-Philippe Morency. SimSensei kiosk: A virtual human interviewer for healthcare decision support. Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), pages 1061–1068, Paris, May 2014. Nominated for best paper award

Abstract: We present SimSensei Kiosk, an implemented virtual human interviewer designed to create an engaging face-to-face interaction where the user feels comfortable talking and sharing information. SimSensei Kiosk is also designed to create interactional situations favorable to the automatic assessment of distress indicators, defined as verbal and nonverbal behaviors correlated with depression, anxiety or post-traumatic stress disorder (PTSD). In this paper, we summarize the design methodology, performed over the past two years, which is based on three main development cycles: (1) analysis of face-to-face human interactions to identify potential distress indicators, dialogue policies and virtual human gestures, (2) development and analysis of a Wizard-of-Oz prototype system where two human operators were deciding the spoken and gestural responses, and (3) development of a fully automatic virtual interviewer able to engage users in 15–25 minute interactions. We show the potential of our fully automatic virtual human interviewer in a user study, and situate its performance in relation to the Wizard-of-Oz prototype.

Ron Artstein, David Traum, Oleg Alexander, Anton Leuski, Andrew Jones, Kallirroi Georgila, Paul Debevec, William Swartout, Heather Maio, and Stephen Smith. Time-offset interaction with a Holocaust survivor. In IUI ’14: Proceedings of the 19th international conference on Intelligent User Interfaces, pages 163–168, Haifa, Israel, February 2014.

Abstract: Time-offset interaction is a new technology that allows for two-way communication with a person who is not available for conversation in real time: a large set of statements are prepared in advance, and users access these statements through natural conversation that mimics face-to-face interaction. Conversational reactions to user questions are retrieved through a statistical classifier, using technology that is similar to previous interactive systems with synthetic characters; however, all of the retrieved utterances are genuine statements by a real person. Recordings of answers, listening and idle behaviors, and blending techniques are used to create a persistent visual image of the person throughout the interaction. A proof-of-concept has been implemented using the likeness of Pinchas Gutter, a Holocaust survivor, enabling short conversations about his family, his religious views, and resistance. This proof-of-concept has been shown to dozens of people, from school children to Holocaust scholars, with many commenting on the impact of the experience and potential for this kind of interface.

William Swartout, Ron Artstein, Eric Forbell, Susan Foutz, H. Chad Lane, Belinda Lange, Jacquelyn Morie, Dan Noren, Skip Rizzo, and David Traum. Virtual humans for learning. AI Magazine 34(4): 13–30, 2013.

Abstract: Virtual humans are computer-generated characters designed to look and behave like real people. Studies have shown that virtual humans can mimic many of the social effects that one finds in human-human interactions such as creating rapport, and people respond to virtual humans in ways that are similar to how they respond to real people. We believe that virtual humans represent a new metaphor for interacting with computers, one in which working with a computer becomes much like interacting with a person and this can bring social elements to the interaction that are not easily supported with conventional interfaces. We present two systems that embody these ideas. The first, the Twins, are virtual docents in the Museum of Science, Boston, designed to engage visitors and raise their awareness and knowledge of science. The second, SimCoach, uses an empathetic virtual human to provide veterans and their families with information about PTSD and depression.

Lauren Faust and Ron Artstein. People hesitate more, talk less to virtual interviewers than to human interviewers. In Semdial 2013 DialDam: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pages 35–43, Amsterdam, December 2013.

Abstract: In a series of screening interviews for psychological distress, conducted separately by a human interviewer and by an animated virtual character controlled by a human, participants talked substantially less and produced twice as many filled pauses when talking to the virtual character. This contrasts with earlier findings, where people were less disfluent when talking to a computer dialogue system. The results suggest that the characteristics of computer-directed speech vary depending on the type of dialogue system used.

David DeVault, Kallirroi Georgila, Ron Artstein, Fabrizio Morbini, David Traum, Stefan Scherer, Albert (Skip) Rizzo and Louis-Philippe Morency. Verbal indicators of psychological distress in interactive dialogue with a virtual human. In Proceedings of the SIGDIAL 2013 Conference: the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 193–202. Metz, France, August 2013.

Abstract: We explore the presence of indicators of psychological distress in the linguistic behavior of subjects in a corpus of semi-structured virtual human interviews. At the level of aggregate dialogue-level features, we identify several significant differences between subjects with depression and PTSD when compared to non-distressed subjects. At a more fine-grained level, we show that significant differences can also be found among features that represent subject behavior during specific moments in the dialogues. Finally, we present statistical classification results that suggest the potential for automatic assessment of psychological distress in individual interactions with a virtual human dialogue system.

Fabrizio Morbini, Kartik Audhkhasi, Kenji Sagae, Ron Artstein, Doğan Can, Panayiotis Georgiou, Shri Narayanan, Anton Leuski and David Traum. Which ASR should I choose for my dialogue system? In Proceedings of the SIGDIAL 2013 Conference: the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 394–403. Metz, France, August 2013.

Abstract: We present an analysis of several publicly available automatic speech recognizers (ASRs) in terms of their suitability for use in different types of dialogue systems. We focus in particular on cloud based ASRs that recently have become available to the community. We include features of ASR systems and desiderata and requirements for different dialogue systems, taking into account the dialogue genre, type of user, and other features. We then present speech recognition results for six different dialogue systems. The most interesting result is that different ASR systems perform best on the data sets. We also show that there is an improvement over a previous generation of recognizers on some of these data sets. We also investigate language understanding (NLU) on the ASR output, and explore the relationship between ASR and NLU performance.

Fabrizio Morbini, Kartik Audhkhasi, Ron Artstein, Maarten Van Segbroeck, Kenji Sagae, Panayiotis Georgiou, David R. Traum, and Shri Narayanan. A reranking approach for recognition and classification of speech input in conversational dialogue systems. In Fourth IEEE Workshop on Spoken Language Technology (SLT). Miami Beach, Florida, December 2012.

Abstract: We address the challenge of interpreting spoken input in a conversational dialogue system with an approach that aims to exploit the close relationship between the tasks of speech recognition and language understanding through joint modeling of these two tasks. Instead of using a standard pipeline approach where the output of a speech recognizer is the input of a language understanding module, we merge multiple speech recognition and utterance classification hypotheses into one list to be processed by a joint reranking model. We obtain substantially improved performance in language understanding in experiments with thousands of user utterances collected from a deployed spoken dialogue system.

Sunghyun Park, Gelareh Mohammadi, Ron Artstein, and Louis-Philippe Morency. Crowdsourcing micro-level multimedia annotations: The challenges of evaluation and interface. To appear in International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM). Nara, Japan, October 2012.

Abstract: This paper presents a new evaluation procedure and tool for crowdsourcing micro-level multimedia annotations and shows that such annotations can achieve a quality comparable to that of expert annotations. We propose a new evaluation procedure, called MM-Eval (Micro-level Multimedia Evaluation), which compares fine time-aligned annotations using Krippendorff’s alpha metric and introduce two new metrics to evaluate the types of disagreement between coders. We also introduce OCTAB (Online Crowdsourcing Tool for Annotations of Behaviors), a web-based annotation tool that allows precise and convenient multimedia behavior annotations, directly from Amazon Mechanical Turk interface. With an experiment using the above tool and evaluation procedure, we show that a majority vote among annotations from 3 crowdsource workers leads to a quality comparable to that of local expert annotations.
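The majority vote mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each crowd worker labels the same sequence of time-aligned frames with a discrete label; the function name `majority_vote` and the toy data are hypothetical and are not taken from MM-Eval or OCTAB.

```python
from collections import Counter

def majority_vote(annotations):
    """Combine per-frame labels from several annotators by majority vote.

    `annotations` is a list of equal-length label sequences, one per annotator.
    (Hypothetical helper for illustration; not the authors' implementation.)
    """
    if not annotations:
        raise ValueError("need at least one annotator")
    length = len(annotations[0])
    if any(len(seq) != length for seq in annotations):
        raise ValueError("all annotators must label the same frames")
    combined = []
    for frame_labels in zip(*annotations):
        # most_common(1) yields the label with the highest count for this frame
        label, _count = Counter(frame_labels).most_common(1)[0]
        combined.append(label)
    return combined

# Toy data: three workers marking a behavior as present (1) or absent (0) per frame.
workers = [
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
]
consensus = majority_vote(workers)
```

With an odd number of workers and binary labels there are no ties, which may be one reason to prefer three workers over two or four.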

David Traum, Priti Aggarwal, Ron Artstein, Susan Foutz, Jillian Gerten, Athanasios Katsamanis, Anton Leuski, Dan Noren, and William Swartout. Ada and Grace: Direct interaction with museum visitors. In Intelligent Virtual Agents: 12th International Conference, IVA 2012, Santa Cruz, CA, USA, September 12–14, 2012 Proceedings (Lecture Notes in Artificial Intelligence 7502), pages 245–251. Springer, Heidelberg, September 2012.

Abstract: We report on our efforts to prepare Ada and Grace, virtual guides in the Museum of Science, Boston, to interact directly with museum visitors, including children. We outline the challenges in extending the exhibit to support this usage, mostly relating to the processing of speech from a broad population, especially child speech. We also present the summative evaluation, showing success in all the intended impacts of the exhibit: that children ages 7–14 will increase their awareness of, engagement in, interest in, positive attitude about, and knowledge of computer science and technology.

Xuchen Yao, Emma Tosch, Grace Chen, Elnaz Nouri, Ron Artstein, Anton Leuski, Kenji Sagae, and David Traum. Creating conversational characters using question generation tools. Dialogue and Discourse 3(2): 125–146, 2012.

Abstract: This article describes a new tool for extracting question-answer pairs from text articles, and reports three experiments which investigate how suitable this technique is for supplying knowledge to conversational characters. Experiment 1 demonstrates the feasibility of our method by creating characters for 14 distinct topics and evaluating them using hand-authored questions. Experiment 2 evaluates three of these characters using questions collected from naive participants, showing that the generated characters provide full or partial answers to about half of the questions asked. Experiment 3 adds automatically extracted knowledge to an existing, hand-authored character, demonstrating that augmented characters can answer questions about new topics but with some degradation of the ability to answer questions about topics that the original character was trained to answer. Overall, the results show that question generation is a promising method for creating or augmenting a question answering conversational character using an existing text.

William Yang Wang, Ron Artstein, Anton Leuski, and David Traum. Improving spoken dialogue understanding using phonetic mixture models. In Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches, edited by Chutima Boonthum-Denecke, Philip M. McCarthy, and Travis A. Lamkin, chapter 15, pages 225–238. IGI Global, Hershey, Pennsylvania, 2012.

Abstract: Reasoning about sound similarities improves the performance of a Natural Language Understanding component that interprets speech recognizer output: the authors observed a 5% to 7% reduction in errors when they augmented the word strings with a phonetic representation, derived from the words by means of a dictionary. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.
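A minimal sketch of the dictionary-based augmentation described in the abstract above: each recognized word is kept as a word feature and also expanded into phone features looked up in a pronunciation dictionary. The tiny lexicon, the feature prefixes `w:` and `p:`, and the function name are assumptions made for illustration; the mixture models themselves are not shown.

```python
# Toy pronunciation dictionary (hypothetical entries, for illustration only).
PRONUNCIATIONS = {
    "where": ["W", "EH", "R"],
    "wear": ["W", "EH", "R"],
    "is": ["IH", "Z"],
    "he": ["HH", "IY"],
}

def augment_with_phones(words, lexicon=PRONUNCIATIONS):
    """Return word features plus phone features derived from the lexicon."""
    features = []
    for word in words:
        key = word.lower()
        features.append("w:" + key)
        # Words missing from the lexicon contribute only their word feature.
        for phone in lexicon.get(key, []):
            features.append("p:" + phone)
    return features

# A misrecognition ("wear" for "where") changes the word features
# but leaves the phone features identical.
a = augment_with_phones(["where", "is", "he"])
b = augment_with_phones(["wear", "is", "he"])
same_phones = [f for f in a if f.startswith("p:")] == [f for f in b if f.startswith("p:")]
```

Because the phones come from a dictionary rather than from the recognizer, a sketch like this needs only the recognizer's word output.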

Sin-Hwa Kang, Jonathan Gratch, Candy Sidner, Ron Artstein, Lixing Huang, and Louis-Philippe Morency. Towards building a virtual counselor: Modeling nonverbal behavior during intimate self-disclosure. In Eleventh International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Valencia, Spain, June 2012.

Abstract: Nonverbal behavior is considered critical for indicating intimacy and is important when designing a social virtual agent such as a counselor. One key research question is how to properly express intimate self-disclosure. In this paper we present an extensive study of human nonverbal behavior during intimate self-disclosure. This is an important milestone in creating a virtual counselor. A study of video interactions between human participants demonstrated that people display more head tilts and pauses when they revealed highly intimate information about themselves; they presented more head nods and eye gazes during less intimate sharing. An implementation of these behaviors in a virtual agent suggests that people tend to perceive head tilts, pauses and gaze aversion by the agent as conveying intimate self-disclosure. These findings are important for future research with virtual counselors and other social agents.

Priti Aggarwal, Ron Artstein, Jillian Gerten, Athanasios Katsamanis, Shrikanth Narayanan, Angela Nazarian, and David Traum. The Twins corpus of museum visitor questions. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pages 2355–2361. Istanbul, Turkey, May 2012.

Abstract: The Twins corpus is a collection of utterances spoken in interactions with two virtual characters who serve as guides at the Museum of Science in Boston. The corpus contains about 200,000 spoken utterances from museum visitors (primarily children) as well as from trained handlers who work at the museum. In addition to speech recordings, the corpus contains the outputs of speech recognition performed at the time of utterance as well as the system interpretation of the utterances. Parts of the corpus have been manually transcribed and annotated for question interpretation. The corpus has been used for improving performance of the museum characters and for a variety of research projects, such as phonetic-based Natural Language Understanding, creation of conversational characters from text resources, dialogue policy learning, and research on patterns of user interaction. It has the potential to be used for research on children’s speech and on language used when talking to a virtual human.

Elnaz Nouri, Ron Artstein, Anton Leuski and David Traum. Augmenting Conversational Characters with Generated Question-Answer Pairs. In Question Generation: Papers from the AAAI Fall Symposium, pages 49–52. Arlington, Virginia, November 2011.

Abstract: We take a conversational character trained on a set of linked question-answer pairs authored by hand, and augment its training data by adding sets of question-answer pairs which are generated automatically from texts on different topics. The augmented characters can answer questions about the new topics, at the cost of some performance loss on questions about the topics that the original character was trained to answer.

Priti Aggarwal, Kevin Feeley, Fabrizio Morbini, Ron Artstein, Anton Leuski, David Traum, and Julia Kim. Interactive characters for cultural training of small military units. In Intelligent Virtual Agents: 11th International Conference, IVA 2011, Reykjavik, Iceland, September 15–17, 2011 Proceedings (Lecture Notes in Artificial Intelligence 6895), pages 426–427. Springer, Heidelberg, 2011. (Poster)

Abstract: CHAOS, the Combat Hunter Action and Observation Simulation, is an immersive simulation training environment which gives small military units the experience of interacting with local Afghan villagers during a patrol. It is a physical build-out of a housing compound in a mock Afghan village, with several life-size reactive and interactive animated Pashto-speaking virtual characters. The exercise requires an infantry squad to locate and interview a character named Omar, communicating through a live human interpreter and attending to proper protocol regarding Omar’s family. Character animation and behavior is based on extensive interviews with Afghan experts to provide a realistic setting of the intended locale. The system combines virtual human technology, story engineering, and physical set building to provide a compelling training environment that can handle a full squad, requiring trainees to integrate tasks such as working with an interpreter, dealing with non-English speakers from another culture, and assessing information and disposition to make decisions in a mission context.

Sin-Hwa Kang, Candy Sidner, Jonathan Gratch, Ron Artstein, Lixing Huang, and Louis-Philippe Morency. Modeling nonverbal behavior of a virtual counselor during intimate self-disclosure. In Intelligent Virtual Agents: 11th International Conference, IVA 2011, Reykjavik, Iceland, September 15–17, 2011 Proceedings (Lecture Notes in Artificial Intelligence 6895), pages 455–457. Springer, Heidelberg, 2011. (Poster)

Abstract: Humans often share personal information with others in order to create social connections. Sharing personal information is especially important in counseling interactions. Research studying the relationship between intimate self-disclosure and human behavior critically informs the development of virtual agents that create rapport with human interaction partners. One significant example of this application is using virtual agents as counselors in psychotherapeutic situations. The capability of expressing different intimacy levels is key to a successful virtual counselor to reciprocally induce disclosure in clients. Nonverbal behavior is considered critical for indicating intimacy and is important when designing a social virtual agent such as a counselor. One key research question is how to properly express intimate self-disclosure. In this study, our main goal is to find which types of interviewees’ nonverbal behavior are associated with different intimacy levels of verbal self-disclosure. Thus, we investigated humans’ nonverbal behavior associated with self-disclosure in an interview setting (with intimate topics).

Ron Artstein, Michael Rushforth, Sudeep Gandhe, David Traum and Aram Donigian. Limits of Simple Dialogue Acts for Tactical Questioning Dialogues. Proceedings of the 7th IJCAI workshop on knowledge and reasoning in practical dialogue systems, pages 1–8. Barcelona, Spain, July 2011.

Abstract: A set of dialogue acts, generated automatically by applying a dialogue act scheme to a domain representation designed for easy scenario authoring, covers approximately 72%–76% of user utterances spoken in live interaction with a tactical questioning simulation trainer. The domain is represented as facts of the form <object, attribute, value> and conversational actions of the form <character, action>. User utterances from the corpus that fall outside the scope of the scheme include questions about temporal relations, relations between facts and relations between objects, questions about reason and evidence, assertions by the user, conditional offers, attempts to set the topic of conversation, and compound utterances. These utterance types constitute the limits of the simple dialogue act scheme.
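The generation step described in the abstract above can be sketched roughly as follows, assuming a small inventory of act types applied to each fact and each conversational action. The act labels (`whq`, `ynq`, `assert`) and the example domain entries are hypothetical and are not the scheme used in the paper.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """A domain fact of the form <object, attribute, value>."""
    obj: str
    attribute: str
    value: str

def generate_dialogue_acts(facts, actions):
    """Enumerate dialogue acts from facts and <character, action> pairs.

    The act inventory below is an assumption made for illustration.
    """
    acts = []
    for fact in facts:
        acts.append(("whq", fact.obj, fact.attribute))               # ask for the value
        acts.append(("ynq", fact.obj, fact.attribute, fact.value))   # ask to confirm the value
        acts.append(("assert", fact.obj, fact.attribute, fact.value))  # state the fact
    for character, action in actions:
        acts.append((action, character))
    return acts

# Hypothetical domain: one fact and one conversational action.
facts = [Fact("suspect", "location", "market")]
actions = [("player", "greeting")]
acts = generate_dialogue_acts(facts, actions)
```

A scheme of this shape has no slot for temporal relations, relations between facts, or compound utterances, which matches the kinds of out-of-scope utterances the abstract lists.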

Ron Artstein. Error Return Plots. Proceedings of the SIGDIAL 2011 Conference: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 319–324. Portland, Oregon, June 2011. (Poster)

Abstract: Error-return plots show the rate of error (misunderstanding) against the rate of non-return (non-understanding) for Natural Language Processing systems. They are a useful visual tool for judging system performance when other measures such as recall/precision and detection-error tradeoff are less informative, specifically when a system is judged on the correctness of its responses, but may elect to not return a response.
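The points of such a plot could be computed along the following lines. This sketch assumes each utterance has a confidence score and a correctness judgment, that the system withholds a response when confidence falls below a threshold, and that both rates are taken as fractions of all utterances; the function name and the toy data are hypothetical.

```python
def error_return_points(scored):
    """Return (non_return_rate, error_rate) pairs, one per confidence threshold.

    `scored` is a list of (confidence, is_correct) pairs, one per utterance.
    Both rates are fractions of all utterances (an assumption of this sketch).
    """
    total = len(scored)
    if total == 0:
        raise ValueError("need at least one scored utterance")
    # One threshold per distinct confidence, plus one that returns nothing.
    thresholds = sorted({conf for conf, _ in scored})
    thresholds.append(float("inf"))
    points = []
    for threshold in thresholds:
        returned = [(conf, ok) for conf, ok in scored if conf >= threshold]
        errors = sum(1 for _, ok in returned if not ok)
        points.append(((total - len(returned)) / total, errors / total))
    return points

# Toy data: five utterances with classifier confidence and correctness.
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
points = error_return_points(scored)
```

Plotting the second coordinate against the first gives a curve that runs from "always respond" on the left to "never respond" on the right.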

Kallirroi Georgila, Ron Artstein, Angela Nazarian, Michael Rushforth, David Traum, and Katia Sycara. An annotation scheme for cross-cultural argumentation and persuasion dialogues. Proceedings of the SIGDIAL 2011 Conference: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 272–278. Portland, Oregon, June 2011. (Poster)

Abstract: We present a novel annotation scheme for cross-cultural argumentation and persuasion dialogues. This scheme is an adaptation of existing coding schemes on negotiation, following a review of literature on cross-cultural differences in negotiation styles. The scheme has been refined through application to coding both two-party and multi-party negotiation dialogues in three different domains, and is general enough to be applicable to different domains with minor or no modifications at all. Dialogues annotated with the scheme have been used to successfully learn culture-specific dialogue policies for argumentation and persuasion.

William Yang Wang, Ron Artstein, Anton Leuski, and David Traum. Improving spoken dialogue understanding using phonetic mixture models. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, pages 329–334. Palm Beach, Florida, May 2011. Finalist for best paper award

Abstract: Augmenting word tokens with a phonetic representation, derived from a dictionary, improves the performance of a Natural Language Understanding component that interprets speech recognizer output: we observed a 5% to 7% reduction in errors across a wide range of response return rates. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.

Grace Chen, Emma Tosch, Ron Artstein, Anton Leuski, and David Traum. Evaluating conversational characters created through question generation. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, pages 343–344. Palm Beach, Florida, May 2011. (Poster) Best poster award

Abstract: Question generation tools can be used to extract a question-answer database from text articles. We investigate how suitable this technique is for giving domain-specific knowledge to conversational characters. We tested these characters by collecting questions and answers from naive participants, running the questions through the character, and comparing the system responses to the participant answers. Characters gave a full or partial answer to 53% of the user questions which had an answer available in the source text, and 43% of all questions asked. Performance was better for questions asked after the user had read the source text, and also varied by question type: the best results were answers to who questions, while answers to yes/no questions were among the poorer performers. The results show that question generation is a promising method for creating a question answering conversational character from an existing text.

Julia Campbell, Mark Core, Ron Artstein, Lindsay Armstrong, Arno Hartholt, Cyrus Wilson, Kallirroi Georgila, Fabrizio Morbini, Edward Haynes, Dave Gomboc, Mike Birch, Jonathan Bobrow, H. Chad Lane, Jillian Gerten, Anton Leuski, David Traum, Matthew Trimmer, Rich DiNinni, Matthew Bosack, Timothy Jones, Richard E. Clark, and Kenneth A. Yates. Developing INOTS to support interpersonal skills practice. 2011 IEEE Aerospace Conference, Big Sky, Montana, March 2011.

Abstract: The Immersive Naval Officer Training System (INOTS) is a blended learning environment that merges traditional classroom instruction with a mixed reality training setting. INOTS supports the instruction, practice and assessment of interpersonal communication skills. The goal of INOTS is to provide a consistent training experience to supplement interpersonal skills instruction for Naval officer candidates without sacrificing trainee throughput and instructor control. We developed an instructional design from cognitive task analysis interviews with experts to serve as a framework for system development. We also leveraged commercial student response technology and research technologies including natural language recognition, virtual humans, realistic graphics, intelligent tutoring and automated instructor support tools. In this paper, we describe our methodologies for developing a blended learning environment, and our challenges adding mixed reality and virtual human technologies to a traditional classroom to support interpersonal skills training.

Antonio Roque, Kallirroi Georgila, Ron Artstein, Kenji Sagae, and David Traum. Natural language processing for joint fire observer training. 27th Army Science Conference, Orlando, Florida, December 2010.

Abstract: We describe recent research to enhance a training system which interprets Call for Fire (CFF) radio artillery requests. The research explores the feasibility of extending the system to also understand calls for Close Air Support (CAS). This work includes automated analysis of complex language behavior in CAS missions, evaluation of speech recognition performance, and simulation of speech recognition errors.

Jenny Brusk, Ron Artstein, and David Traum. Don’t tell anyone! Two experiments on gossip conversations. Proceedings of the SIGdial 2010 Conference, pages 193–200. Tokyo, Japan, September 2010.

Abstract: The purpose of this study is to get a working definition that matches people’s intuitive notion of gossip and is sufficiently precise for computational implementation. We conducted two experiments investigating what type of conversations people intuitively understand and interpret as gossip, and whether they could identify three proposed constituents of gossip conversations: third person focus, pejorative evaluation and substantiating behavior. The results show that (1) conversations are very likely to be considered gossip if all elements are present, no intimate relationships exist between the participants, and the person in focus is unambiguous. (2) Conversations that have at most one gossip element are not considered gossip. (3) Conversations that lack one or two elements or have an ambiguous element lead to inconsistent judgments.

William Swartout, David Traum, Ron Artstein, Dan Noren, Paul Debevec, Kerry Bronnenkant, Josh Williams, Anton Leuski, Shrikanth Narayanan, Diane Piepol, Chad Lane, Jacquelyn Morie, Priti Aggarwal, Matt Liewer, Jen-Yuan Chiang, Jillian Gerten, Selina Chu, and Kyle White. Ada and Grace: Toward realistic and engaging virtual museum guides. In Intelligent Virtual Agents: 10th International Conference, IVA 2010, Philadelphia, PA, USA, September 20–22, 2010 Proceedings (Lecture Notes in Artificial Intelligence 6356), pages 286–300. Springer, Heidelberg, 2010.

Abstract: To increase the interest and engagement of middle school students in science and technology, the InterFaces project has created virtual museum guides that are in use at the Museum of Science, Boston. The characters use natural language interaction and have near photoreal appearance to increase engagement. The paper presents an evaluation of natural language performance and presents reports from museum staff on visitor reaction.

Xuchen Yao, Pravin Bhutada, Kallirroi Georgila, Kenji Sagae, Ron Artstein, and David Traum. Practical evaluation of speech recognizers for virtual human dialogue systems. LREC 2010, Valletta, Malta, May 2010. (Poster)

Abstract: We perform a large-scale evaluation of multiple off-the-shelf speech recognizers across diverse domains for virtual human dialogue systems. Our evaluation is aimed at speech recognition consumers and potential consumers with limited experience with readily available recognizers. We focus on practical factors to determine what levels of performance can be expected from different available recognizers in various projects featuring different types of conversational utterances. Our results show that there is no single recognizer that outperforms all other recognizers in all domains. The performance of each recognizer may vary significantly depending on the domain, the size and perplexity of the corpus, the out-of-vocabulary rate, and whether acoustic and language model adaptation has been used or not. We expect that our evaluation will prove useful to other speech recognition consumers, especially in the dialogue community, and will shed some light on the key problem in spoken dialogue systems of selecting the most suitable available speech recognition system for a particular application, and what impact training will have.

Michael Rushforth, Sudeep Gandhe, Ron Artstein, Antonio Roque, Sarrah Ali, Nicolle Whitman, and David Traum. Varying personality in spoken dialogue with a virtual human. In Intelligent Virtual Agents: 9th International Conference, IVA 2009, Amsterdam, The Netherlands, September 14-16, 2009 Proceedings (Lecture Notes in Artificial Intelligence 5773), pages 541–542. Springer, Heidelberg, 2009. (Poster)

Abstract: This poster reports the results of two experiments to test a personality framework for virtual characters. We use the Tactical Questioning dialogue system architecture (TACQ) as a testbed for this effort. Characters built using the TACQ architecture can be used by trainees to practice their questioning skills by engaging in a role-play with a virtual human. The architecture supports advanced behavior in a questioning setting, including deceptive behavior, simple negotiations about whether to answer, tracking subdialogues for offers/threats, grounding behavior, and maintenance of the affective state of the virtual human. Trainees can use different questioning tactics in their sessions. In order for the questioning training to be effective, trainees should have experience of interacting with virtual humans with different personalities, who react in different ways to the same questioning tactics.

Sudeep Gandhe, Nicolle Whitman, David Traum and Ron Artstein. An integrated authoring tool for tactical questioning dialogue systems. In 6th Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Pasadena, California, July 2009.

Abstract: We present an integrated authoring tool for rapid prototyping of dialogue systems for virtual humans taking part in tactical questioning simulations. The tool helps domain experts, who may have little or no knowledge of linguistics or computer science, to build virtual characters that can play the role of the interviewee. Working in a top-down fashion, the authoring process begins with specifying a domain of knowledge for the character; the authoring tool generates all relevant dialogue acts and allows authors to assign the language that will be used to refer to the domain elements. The authoring tool can also be used to manipulate some aspects of the dialogue strategies employed by the virtual characters, and it also supports re-using some of the authored content across different characters.

Ron Artstein, Sudeep Gandhe, Michael Rushforth and David Traum. Viability of a Simple Dialogue Act Scheme for a Tactical Questioning Dialogue System. In DiaHolmia 2009: Proceedings of the 13th Workshop on the Semantics and Pragmatics of Dialogue, pages 43–50. Stockholm, Sweden, June 2009.

Abstract: User utterances in a spoken dialogue system for tactical questioning simulation were matched to a set of dialogue acts generated automatically from a representation of facts as <object, attribute, value> triples and actions as <character, action> pairs. The representation currently covers about 50% of user utterances, and we show that a few extensions can increase coverage to 80% or more. This demonstrates the viability of simple schemes for representing question-answering dialogues in implemented systems.

Ron Artstein, Sudeep Gandhe, Jillian Gerten, Anton Leuski and David Traum. Semi-formal evaluation of conversational characters. In Languages: From Formal to Natural. Essays Dedicated to Nissim Francez on the Occasion of His 65th Birthday (Lecture Notes in Computer Science 5533), edited by Orna Grumberg, Michael Kaminski, Shmuel Katz and Shuly Wintner, pages 22–35. Springer, Heidelberg, 2009.

Abstract: Conversational dialogue systems cannot be evaluated in a fully formal manner, because dialogue is heavily dependent on context and current dialogue theory is not precise enough to specify a target output ahead of time. Instead, we evaluate dialogue systems in a semi-formal manner, using human judges to rate the coherence of a conversational character and correlating these judgments with measures extracted from within the system. We present a series of three evaluations of a single conversational character over the course of a year, demonstrating how this kind of evaluation helps bring about an improvement in overall dialogue coherence.

Ron Artstein, Jacob Cannon, Sudeep Gandhe, Jillian Gerten, Joe Henderer, Anton Leuski and David Traum. Coherence of off-topic responses for a virtual character. 26th Army Science Conference, Orlando, Florida, December 2008.

Abstract: We demonstrate three classes of off-topic responses which allow a virtual question-answering character to handle cases where it does not understand the user’s input: ask for clarification, indicate misunderstanding, and move on with the conversation. While falling short of full dialogue management, a combination of such responses together with prompts to change the topic can improve overall dialogue coherence.

Sudeep Gandhe, David DeVault, Antonio Roque, Bilyana Martinovski, Ron Artstein, Anton Leuski, Jillian Gerten, and David Traum. From domain specification to virtual humans: An integrated approach to authoring tactical questioning characters. Interspeech 2008, Brisbane, Australia, September 2008.

Abstract: We present a new approach for rapidly developing dialogue capabilities for virtual humans. Starting from domain specification, an integrated authoring interface automatically generates dialogue acts with all possible contents. These dialogue acts are linked to example utterances in order to provide training data for natural language understanding and generation. The virtual human dialogue system contains a dialogue manager following the information-state approach, using finite-state machines and SCXML to manage local coherence, as well as explicit modeling of emotions and compliance level and a grounding component based on evidence of understanding. Using the authoring tools, we design and implement a version of the virtual human Hassan and compare to previous architectures for the character.

David DeVault, David Traum and Ron Artstein. Making grammar-based generation easier to deploy in dialogue systems. Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, pages 198–207. Columbus, Ohio, June 2008.

Abstract: We present a development pipeline and associated algorithms designed to make grammar-based generation easier to deploy in implemented dialogue systems. Our approach realizes a practical trade-off between the capabilities of a system’s generation component and the authoring and maintenance burdens imposed on the generation content author for a deployed system. To evaluate our approach, we performed a human rating study with system builders who work on a common large-scale spoken dialogue system. Our results demonstrate the viability of our approach and illustrate authoring/performance trade-offs between hand-authored text, our grammar-based approach, and a competing shallow statistical NLG technique.

David DeVault, David Traum and Ron Artstein. Practical grammar-based NLG from examples. Proceedings of the Fifth International Natural Language Generation Conference, pages 77–85. Salt Fork, Ohio, June 2008.

Abstract: We present a technique that opens up grammar-based generation to a wider range of practical applications by dramatically reducing the development costs and linguistic expertise that are required. Our method infers the grammatical resources needed for generation from a set of declarative examples that link surface expressions directly to the application’s available semantic representations. The same examples further serve to optimize a run-time search strategy that generates the best output that can be found within an application-specific time frame. Our method offers substantially lower development costs than hand-crafted grammars for application-specific NLG, while maintaining high output quality and diversity.

Massimo Poesio and Ron Artstein. Anaphoric annotation in the ARRAU corpus. LREC 2008, Marrakech, Morocco, May 2008.

Abstract: Arrau is a new corpus annotated for anaphoric relations, with information about agreement and explicit representation of multiple antecedents for ambiguous anaphoric expressions and discourse antecedents for expressions which refer to abstract entities such as events, actions and plans. The corpus contains texts from different genres: task-oriented dialogues from the Trains-91 and Trains-93 corpus, narratives from the English Pear Stories corpus, newspaper articles from the Wall Street Journal portion of the Penn Treebank, and mixed text from the Gnome corpus.

Ron Artstein, Sudeep Gandhe, Anton Leuski and David Traum. Field Testing of an interactive question-answering character. Proceedings of the ELRA workshop on evaluation, pages 36–40. Marrakech, Morocco, May 2008.

Abstract: We tested a life-size embodied question-answering character at a convention where he responded to questions from the audience. The character’s responses were then rated for coherence. The ratings, combined with speech transcripts, speech recognition results and the character’s responses, allowed us to identify where the character needs to improve, namely in speech recognition and providing off-topic responses.

Ron Artstein and Massimo Poesio. Identifying reference to abstract objects in dialogue. brandial 2006 proceedings, Potsdam, Germany, September 2006.

Abstract: In two experiments, many annotators marked antecedents for discourse deixis as unconstrained regions of text. The experiments show that annotators do converge on the identity of these text regions, though much of what they do can be captured by a simple model. Demonstrative pronouns are more likely than definite descriptions to be marked with discourse antecedents. We suggest that our methodology is suitable for the systematic study of discourse deixis.

Massimo Poesio, Patrick Sturt, Ron Artstein, and Ruth Filik. Underspecification and Anaphora: Theoretical Issues and Preliminary Evidence. Discourse Processes 42(2): 157-175, 2006.

Distributed as Technical report CSM-438, University of Essex Department of Computer Science, October 2005.

Abstract: Much experimental work in psycholinguistics suggests that fully specified syntactic and semantic interpretations are obtained incrementally. The finding that interpretation takes place incrementally is very robust and underlies our own view of sentence processing as well; however, most of this work tends to test very simple interpretive judgments, and using materials which have very clean-cut interpretations, which makes the view expressed above more questionable when applied to semantic interpretation. This article discusses a class of anaphoric expressions that do not appear to have a clear antecedent, using both corpus analysis and psychological experiments. We argue that these cases of anaphora are similar to cases of lexical polysemy, and propose an explicit semantic representation for such cases.

Ron Artstein and Massimo Poesio. Inter-coder agreement for computational linguistics (survey article). Computational Linguistics 34(4): 555-596, 2008.

Abstract: This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
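
Illustration (not part of the article): a minimal sketch of two of the coefficients named in the abstract, Scott’s pi and Cohen’s kappa, for the simplest case of two annotators and nominal categories. Both have the form (A_o - A_e) / (1 - A_e), where A_o is observed agreement and A_e is agreement expected by chance; they differ only in how A_e is estimated. The function names and the toy labels are assumptions made for this example, and weighted, alpha-like coefficients are not covered here.

```python
from collections import Counter


def observed_agreement(coder_a, coder_b):
    """Proportion of items on which the two coders chose the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(coder_a, coder_b)) / len(coder_a)


def scott_pi(coder_a, coder_b):
    """Scott's pi: chance agreement from one pooled label distribution."""
    n = len(coder_a)
    a_o = observed_agreement(coder_a, coder_b)
    pooled = Counter(coder_a) + Counter(coder_b)  # 2n judgments in total
    a_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    # Undefined (division by zero) if every judgment uses the same label.
    return (a_o - a_e) / (1 - a_e)


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa: chance agreement from each coder's own distribution."""
    n = len(coder_a)
    a_o = observed_agreement(coder_a, coder_b)
    count_a, count_b = Counter(coder_a), Counter(coder_b)
    labels = count_a.keys() | count_b.keys()
    a_e = sum((count_a[k] / n) * (count_b[k] / n) for k in labels)
    # Undefined (division by zero) if both coders use a single shared label.
    return (a_o - a_e) / (1 - a_e)


if __name__ == "__main__":
    # Hypothetical toy annotations: four items, two labels.
    a = ["a", "a", "b", "b"]
    b = ["a", "b", "b", "b"]
    print("pi    =", scott_pi(a, b))
    print("kappa =", cohen_kappa(a, b))
```

In this toy example the two coders use the labels with different frequencies, so the two chance estimates differ and kappa comes out slightly higher than pi.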

Ron Artstein and Nissim Francez. Plurality and temporal modification. Linguistics and Philosophy 29(3): 251-276, 2006.

Abstract: A semantics with plural entities and plural times accounts for cumulative relations between plural arguments and temporal expressions. The semantics equips nominal, verbal and sentential meanings with temporal context variables and treats temporal modifiers as temporal generalized quantifiers; cumulative conjunction, however, takes place at types lower than generalized quantifiers. The mediation of temporal context variables allows cumulative relations to percolate between an argument in a main clause and one in a temporal clause, in apparent violation of locality restrictions. Plural times form a semilattice structure imposed on the set of intervals; no interaction is observed between this and the internal temporal structure of intervals.

Ron Artstein and Massimo Poesio. Bias decreases in proportion to the number of annotators. In Gerhard Jaeger, Paola Monachesi, Gerald Penn, James Rogers, and Shuly Wintner (eds.), Proceedings of FG-MoL 2005, pages 141-150. Edinburgh, August 2005.

Abstract: The effect of the individual biases of corpus annotators on the value of reliability coefficients is inversely proportional to the number of annotators (less one). As the number of annotators increases, the effect of their individual preferences becomes more similar to random noise. This suggests using multiple annotators as a means to control individual biases.

Massimo Poesio and Ron Artstein. Annotating (anaphoric) ambiguity. Corpus linguistics, Birmingham, England, July 2005.

Abstract: We report the results of a preliminary study attempting to identify ambiguous expressions in spoken language dialogues. In this study we developed methods for marking explicit ambiguity, and generalized previous proposals by Passonneau concerning a distance metric for anaphora to be used with the α coefficient to allow for ambiguous annotations.

Massimo Poesio and Ron Artstein. The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 76-83. Ann Arbor, June 2005.

Abstract: We report the results of a study of the reliability of anaphoric annotation which (i) involved a substantial number of naive subjects, (ii) used Krippendorff’s α instead of κ to measure agreement, as recently proposed by Passonneau, and (iii) allowed annotators to mark anaphoric expressions as ambiguous.

Ron Artstein. Quantificational arguments in temporal adjunct clauses. Linguistics and Philosophy 28(5): 541-597, 2005.

Abstract: Quantificational arguments can take scope outside of temporal adjunct clauses, in an apparent violation of locality restrictions: the sentence few secretaries cried after each executive resigned allows the quantificational NP each executive to take scope above few secretaries. I show how this scope relation is the result of local operations: the adjunct clause is a temporal generalized quantifier, which takes scope over the main clause (Pratt and Francez 2001), and within the adjunct clause, the quantificational argument takes scope above the implicit determiner which forms the temporal generalized quantifier. The paper explores various relations among quantificational arguments across clause boundaries, including temporal clauses that are modified internally by a temporal adverbial and temporal clauses with embedded sentential complements.

Ron Artstein. Coordination of parts of words. Lingua 115(4): 359-393, 2005.

Abstract: Coordination of parts of words, as in ortho and periodontists, has to be interpreted at the level of the word parts because the above NP can felicitously describe a pair of one orthodontist and one periodontist. This paper develops a theory of denotations for arbitrary word parts, in which the coordinate word parts denote their own sound, and the rest of the word is a function from sounds to word meanings. This yields the correct interpretation for number in coordinate constructions. The paper also explores phonological constraints on coordinate structures, and shows how certain ungrammatical structures that can be interpreted by the semantics are ruled out on phonological grounds.

Ron Artstein. Focus below the word level. Natural Language Semantics 12(1): 1-22, 2004.

Abstract: Intonational focus can be observed on parts of words that appear to lack intrinsic meaning, and triggers alternatives that are similar in form. In order to provide a unified treatment of focus above and below the word level (they do, after all, behave the same in most respects), I develop a theory of denotations for arbitrary word parts in which focused word parts denote their own sound and the unfocused parts are functions from sounds to word meanings. This allows focus theories to generalize below the word level; any differences with focus above the word level are located in the semantics of word parts. The paper also explores phonological constraints on focus placement, and shows that the focusability of a word part depends solely on its prosodic status, not on any semantic factors.

Ron Artstein. A focus semantics for echo questions. In Ágnes Bende-Farkas and Arndt Riester (eds.), Workshop on Information Structure in Context, pp. 98-107. IMS, University of Stuttgart, 2002.

Abstract: Echo questions are interpreted through focus semantics. Echo questions must be entailed by previous discourse; focus is therefore not needed to mark givenness, and instead it is used to compute the question denotation: the questioned element, marked with a pitch accent, is a focus constituent, and the alternative set of the echo question is its question denotation, i.e. the set of possible answers. The focus strategy exempts echo questions from locality restrictions (“islands”), allows echo questions on parts of words, and allows second-order echo questions which denote sets of questions.

Ron Artstein. Person, animacy and null subjects. In Tina Cambier-Langeveld, Anikó Lipták, Michael Redford and Erik Jan van der Torre (eds.), Proceedings of Console VII, pp. 1-15. SOLE, Leiden, 1999.

Abstract: Licensing of null subjects can be contingent on person and animacy specification. For example, Hebrew allows null subjects if they are first or second person, but not if they are third person. This follows from a general typology that is based on the universal person/animacy hierarchy: if a subject of a certain person or animacy specification may be null, then every subject higher on the hierarchy may be null as well. The above typology, in turn, follows from the general way abstract hierarchies interact in the grammar: elements that appear on the high end of one hierarchy and the low end of another give rise to marked configurations. The mechanism of alignment in Optimality Theory gives a formalization of these universal properties of hierarchies.

Ron Artstein. The incompatibility of underspecification and markedness in Optimality Theory. In Ron Artstein and Madeline Holler (eds.), RuLing Papers 1: Working Papers from Rutgers University, pp. 7-13. Rutgers University Department of Linguistics, New Brunswick, NJ, 1998.

Abstract: Underspecification in the underlying representation cannot give rise to marked structure on the surface, because Optimality Theory grammars force an output to be equally or less marked than the input. Underspecification can still account for alternations involving unmarked structure, but it is only useful when such alternations exist along with forms that do not alternate. The evidence for the existence of such grammatical systems is not very convincing, casting doubts about the usefulness of underspecification in general.

Ron Artstein. Group events as means for representing collectivity. In Benjamin Bruening (ed.), MITWPL 31: Proceedings of the Eighth Student Conference in Linguistics, pp. 41-51. MIT Working Papers in Linguistics, Cambridge, MA, 1997.

Abstract: In this paper I argue in favor of the introduction of "group" events into a framework of event semantics; these mirror the "group" individuals introduced by Landman (1989), and give the domain of events a structure similar to that of the domain of individuals. Group events are used in order to capture collectivity effects that cannot be represented through the domain of individuals, as in the case of predicate conjunction. An attempt to extend the notion of group events and to use them for counting with adverbials such as three times proves at the very least troublesome.

