Evolution of XAIA
XAIA was created to provide AI-assisted, self-administered psychological support within VR settings. Upon activation, users select from nine immersive natural landscapes, including water-based, terrestrial, and celestial environments. In each setting, the user is welcomed by a robot known as “XAIA,” who acts as the therapist.
While testing with GPT-4, we observed that its replies did not consistently align with the best practices of psychotherapy. For instance, the LLM was too quick to offer recommendations and did not invest time in building rapport. We therefore implemented a structured protocol to guide GPT-4 to respond in a manner similar to a human therapist (Fig. 2).
Initially, we gathered transcripts of CBT patient-therapist interactions conducted by a proficient psychotherapist to enhance the program’s adherence to the style and rhythm of an experienced human therapist. From these, we identified recurring exchanges and encoded these patterns into GPT-4’s system prompts. For instance, a guideline was established: “Demonstrate empathy and understanding to validate [first name]’s emotions.” Along with this guideline were sample responses, such as: “It must be difficult facing such circumstances” and “I comprehend how this has led to your feelings of [emotion].” Similarly, we included system prompts to reframe cognitive distortions, recognize automatic negative thoughts, approach discussions without judgment, and avoid technical terminology or condescending language, among a list of over seventy other psychotherapy best practices.
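The encoding step above can be illustrated with a minimal sketch. This is not the study's actual code: the guideline and sample responses are quoted from the text, but the data structure, the `build_system_prompt` helper, and the message format (modeled on a typical chat-completion API) are assumptions for illustration.

```python
# Hypothetical sketch of encoding psychotherapy best practices into a
# system prompt. Only the guideline and sample-response wording comes from
# the paper; the surrounding scaffolding is illustrative.

GUIDELINES = [
    {
        "rule": "Demonstrate empathy and understanding to validate {first_name}'s emotions.",
        "examples": [
            "It must be difficult facing such circumstances.",
            "I comprehend how this has led to your feelings of [emotion].",
        ],
    },
    # ...the study encoded over seventy such best-practice entries.
]

def build_system_prompt(first_name: str) -> str:
    """Assemble the guideline list into a single system message."""
    lines = ["You are a compassionate, non-judgmental, helpful therapist."]
    for g in GUIDELINES:
        lines.append("Guideline: " + g["rule"].format(first_name=first_name))
        for example in g["examples"]:
            lines.append("  Sample response: " + example)
    return "\n".join(lines)

# A chat-style message list seeded with the assembled system prompt.
messages = [{"role": "system", "content": build_system_prompt("Alex")}]
```

In this sketch each guideline pairs a rule with exemplar phrasings, so refining the protocol amounts to editing data rather than code.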
XAIA was programmed to gauge how engaged users were, and to reassess its line of questioning if the system detected waning engagement. XAIA was also instructed to use the VR environment therapeutically, such as introducing an interactive breathing exercise within the virtual space when the AI deemed it appropriate.
If at any point the user expressed indications of suicidal thoughts, they were directed to seek crisis intervention and immediate support and were given information for emergency services. If the user mentioned medical issues beyond the scope of talk therapy, XAIA was programmed to recommend care from a medical professional.
In collaboration with an expert psychotherapist and an experienced psychiatrist, we continuously updated and refined the system prompts to elicit the idealized responses of a compassionate, non-judgmental, and helpful therapist. The system was then methodically evaluated by licensed mental health professionals assuming the roles of patients across a wide range of clinical scenarios (e.g., discussing anxiety, depression, work-life balance, relationship issues, trust issues, post-traumatic stress, grief, self-compassion, emotional regulation, and social isolation, among other reasons people seek talk therapy). Their detailed feedback allowed further refinement and expansion of the system prompts and model responses.
Subsequently, we expanded our user base, enabling interactions with XAIA within a supervised environment. Each transcript was reviewed by a mental health expert, who identified areas of potential improvement. This iterative process of prompt adjustment, evaluation, and expert review was repeated over a hundred times, continuing until feedback consistently indicated significant improvement and critiques became increasingly specific and infrequent.
Participants in the Study
We aimed to recruit up to 20 adult participants, with the objective of ending recruitment once thematic saturation was achieved. We recruited via IRB-approved social media posts and direct recruitment from clinicians.
Participants were required to speak English and to score between 5 and 19 on the Patient Health Questionnaire or between 5 and 14 on the Generalized Anxiety Disorder 7-Item Scale, representing mild-to-moderate depression or anxiety, respectively. We excluded individuals with motion sickness, facial or head deformities, a seizure within the past year, pregnancy, or legal deafness or blindness. The study protocol was approved by the Cedars-Sinai Medical Center Institutional Review Board (IRB STUDY00002753), which rated the study as minimal risk requiring verbal consent. Our study was conducted in accordance with the International Conference on Harmonization Guidelines for Good Clinical Practice and the Declaration of Helsinki.
Consenting patients visited Cedars-Sinai to participate in a single therapy session. During their visit, participants were briefed on use of the headset (Meta Quest 2) and then engaged privately with XAIA for up to 30 minutes. A licensed mental health professional remained available if needed. Participants then engaged in a post-session interview led by a qualitative researcher.
Two qualitative researchers led inductive thematic analyses to derive themes from the transcribed interviews. They created codes and labels from the unstructured data through iterative passes, with each subsequent pass refining and aggregating the codes into themes supported by direct quotes. This process occurred after each visit and continued until no new codes were identified, indicating thematic saturation. We quantitatively tracked saturation by graphing the emergence rate of new second-level pattern codes by interview, generating a cumulative count of novel second-level codes over time (Supplementary Fig. 1). The Supplementary Information provides further details regarding the qualitative analyses.
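The saturation metric described above (a cumulative count of novel second-level codes by interview) can be sketched in a few lines. This is an illustrative reconstruction, not the study's analysis code; the example code labels and counts are hypothetical.

```python
# Illustrative sketch of tracking thematic saturation: count how many
# previously unseen second-level codes each successive interview adds.

def cumulative_novel_codes(codes_per_interview):
    """Return the running total of distinct codes after each interview."""
    seen, cumulative = set(), []
    for codes in codes_per_interview:
        seen.update(codes)
        cumulative.append(len(seen))
    return cumulative

# Hypothetical code sets from three consecutive interviews.
interviews = [
    {"rapport", "privacy"},      # interview 1: two new codes
    {"rapport", "embodiment"},   # interview 2: one new code
    {"privacy"},                 # interview 3: no new codes
]
print(cumulative_novel_codes(interviews))  # → [2, 3, 3]
```

When the curve plateaus, as in the final interview here, no new codes are emerging and saturation has been reached.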
All original text was authored by the associated individuals as a full-length publication. After editorial guidance to reformat the paper as a concise communication, we used ChatGPT-4 to help condense the text to fit the shortened structure.