Outcomes Research in Review

Can the Use of Siri, Alexa, and Google Assistant for Medical Information Result in Patient Harm?


 


Outcomes differed significantly by conversational assistant (χ2 = 132.2, df = 4, P < 0.001). Alexa failed at most tasks (125/394 [91.9%]), resulting in significantly more attempts made but significantly fewer instances in which responses could lead to harm. Siri had the highest task completion rate (365 [77.6%]), in part because it typically displayed a list of web pages in its response that provided at least some information to the participant. However, because of this, it also had the highest likelihood of causing harm for the tasks tested (27 [20.9%]). Median user satisfaction with the 3 conversational assistants was neutral, but with significant differences among them. Participants were least satisfied with Alexa and most satisfied with Siri, and stated they were most likely to follow the recommendations provided by Siri.

Qualitatively, most participants said they would use conversational assistants for medical information, but many felt the technology was not yet up to the task. When asked about their trust in the results provided by the conversational assistants, participants said they trusted Siri the most because it provided links to multiple websites in response to their queries, allowing them to choose the response that most closely matched their assumptions. They also appreciated that Siri displayed its speech recognition results, which gave them more confidence in its responses and allowed them to modify their query if needed. Many participants expressed frustration with the systems, particularly with Alexa.

Conclusion. Reliance on conversational assistants for actionable medical information represents a safety risk for patients and consumers. Patients should be cautioned not to use these technologies for answers to medical questions they intend to act on without further consultation with a health care provider.

Commentary

Roughly 9 in 10 American adults use the Internet,1 with the ability to easily access information through a variety of devices, including smartphones, tablets, and laptop computers. This ease of access has played an important role in shifting how individuals obtain health information and interact with their health care providers.2,3 Online health information can increase patients’ knowledge of, competence with, and engagement in health care decision-making strategies. Online health information seeking can also complement and work in synergy with provider-patient interactions. However, online health information is difficult to regulate, a problem complicated further by the wide range of health information literacy among patients. Inaccurate or misleading health information can lead patients to make detrimental or even dangerous health decisions. These benefits and concerns apply similarly to conversational assistants like Siri (Apple), Alexa (Amazon), and Google Assistant, which are increasingly being used by patients and consumers to access medical- and health-related information. Because these technologies are voice-activated, they appear to address some health literacy limitations. However, they still have important limitations and pose safety risks,4 especially as conversational assistants come to be perceived as a trustworthy parallel to clinical assessment and counseling systems.5

There has been little systematic research exploring the potential risks of these platforms or characterizing their error types and error rates. This study aimed to determine the capabilities of widely used, general-purpose conversational assistants in responding to a broad range of medical questions posed by laypersons in their own words, and to systematically evaluate the potential harm that could result from patients or consumers acting on the resulting recommendations. The study authors found that, when asked questions about situations requiring medical expertise, the conversational assistants failed more than half of the time and led study participants to report that they would take actions that could have resulted in harm or death. Further, the authors characterized several failure modes, including misrecognition of participant queries, participant misunderstanding of the tasks and of the conversational assistants’ responses, and participants’ limited understanding of what kinds of queries the assistants could handle. This misalignment between users’ expectations that the assistants could follow a conversation and the assistants’ actual capabilities led to frustrating experiences for some participants.
