Evaluating the Usability of AI Chatbots

Introduction

As AI chatbots become embedded into everyday workflows, their usability depends not only on speed but also on visibility, interaction patterns, and perceived response quality.

This study evaluated the usability of three major AI chatbots: ChatGPT, Microsoft Copilot, and Google Gemini.

Through controlled usability testing and interviews with 15 participants, I examined how interface design features influence:

⏱️ Task efficiency

📲 Interaction behavior

🔎 Feature discoverability

☑️ User satisfaction

This paper was written for submission to the International Journal of Recent Trends in Human Computer Interaction (IJHCI).

▶️ Research Problem:

Most existing chatbot research focuses on:

⏱️ Task efficiency

📍 Response accuracy

📈 Performance metrics

But less research examines:

❇️ How interface design elements (icons, visibility, accessibility) influence actual user behavior and satisfaction.

Additionally, traditional usability frameworks were developed for graphical user interfaces, not conversational AI.

▶️ Research Question:

How do interface visibility, accessibility, and response quality influence user satisfaction and behavior across AI chatbot platforms?

Methodology

▶️ Participants:

15 adults (ages 21–65) with varied experience with AI chatbots

After removing one extreme outlier, the final analysis included 14 participants.

▶️ Study Design:

Within-subjects repeated measures design

I used Latin Square counterbalancing to control for order effects

Each participant completed two tasks on each of the three chatbots:

  • Solve a personal or work-related problem
  • Copy the chatbot’s response
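The counterbalancing scheme above can be sketched in a few lines. The rotation below is illustrative only; it shows how each chatbot can appear in each position exactly once across participant orders, not the study's actual assignment sheet:

```python
# Illustrative sketch: a 3x3 Latin square of chatbot orders,
# so each chatbot appears in each position exactly once.
# (Placeholder logic, not the study's actual assignment.)

CHATBOTS = ["ChatGPT", "Copilot", "Gemini"]

def latin_square(items):
    """Each row is one participant order; row i is the list rotated by i."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square(CHATBOTS)
for i, order in enumerate(orders, start=1):
    print(f"Order {i}: {' -> '.join(order)}")
```

Note that with an odd number of conditions, a simple rotation balances position but not immediate carryover; studies often combine forward and reversed orders to address that.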

▶️ Data Collected:

🔢 Quantitative

⏱️ Task Completion Times in Seconds

📈 Descriptive Statistics of Completion Times (in Seconds), Computed in JASP

🔠 Qualitative

🕹️ User Interactions with Icons vs. Keyboard & Mouse to Submit & Copy

⭐ Participants’ Average Rating for Each Chatbot

  • ChatGPT - 4.4 out of 5
  • Gemini - 3.9 out of 5
  • Copilot - 3.6 out of 5
🗣️ Participants’ Verbal Feedback on Each Chatbot, Categorized
  • 12 out of 14 participants mentioned satisfaction with ChatGPT's response quality and/or length
  • 3 out of 14 participants mentioned satisfaction with Copilot's response quality and/or length
  • 2 out of 14 participants mentioned satisfaction with Gemini's response quality and/or length

Findings

1️⃣ No Significant Task Efficiency Differences:

Repeated Measures ANOVA showed:

  • No significant difference in task completion time between chatbots
  • Efficiency alone did not explain user preference

This challenges the assumption that speed = usability.
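For readers curious about the mechanics, a one-way repeated-measures F statistic partitions variance into condition, subject, and error components. The sketch below uses fabricated placeholder times, not the study's data:

```python
# Sketch of a one-way repeated-measures ANOVA F statistic.
# The times below are fabricated placeholders, NOT the study's data.

def rm_anova_f(data):
    """data: dict of condition -> list of per-participant scores
    (same participant order in every list).
    Returns (F, df_between, df_error)."""
    conds = list(data.values())
    k, n = len(conds), len(conds[0])
    grand = sum(sum(c) for c in conds) / (k * n)
    cond_means = [sum(c) / n for c in conds]
    subj_means = [sum(c[i] for c in conds) / k for i in range(n)]
    # Partition total variance into condition, subject, and error parts.
    ss_total = sum((x - grand) ** 2 for c in conds for x in c)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error

times = {  # seconds; placeholder values only
    "ChatGPT": [41, 55, 38, 60, 47],
    "Copilot": [44, 58, 40, 57, 50],
    "Gemini":  [43, 52, 42, 61, 49],
}
print(rm_anova_f(times))
```

A tool like JASP (as used in this study) also reports the p-value and sphericity corrections; this sketch produces only the F statistic and degrees of freedom.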

2️⃣ Users Avoided Interface Icons:

▶️ While Submitting Prompts:

64–79% of participants pressed Enter/Return instead of clicking the submit icon

▶️ While Copying Responses:

  • 79% manually copied text (via keyboard or mouse) in ChatGPT
  • 100% manually copied text in Copilot and Gemini

Despite visible icons, users overwhelmingly preferred keyboard shortcuts and manual selection.

3️⃣ Icon Visibility Affected Behavior:

▶️ Interface differences:
  • Gemini nests Copy under a hidden menu
  • Copilot and Gemini icons only appear on hover
  • ChatGPT icons are always visible

Users interacted with visible features slightly more often, suggesting:

🌟 Discoverability directly influences feature adoption.

4️⃣ Satisfaction Differences Emerged:

Even though efficiency was similar:

  • 85.7% of participants preferred ChatGPT’s responses
  • ChatGPT received the highest ease-of-use ratings

This suggests:

🌟 Perceived response quality influences usability ratings more than speed alone.

Recommendations

Based on findings, AI chatbot interfaces should:

🔹 Make high-frequency actions always visible

Avoid hover-only or nested menus for critical actions.

🔹 Support both novices and power users

Keyboard shortcuts are heavily used; they should be preserved and made discoverable.

🔹 Prioritize response quality

Users equate better responses with better usability.

🔹 Increase transparency

Consistent icon placement and system feedback build trust.

Next Steps

With a larger sample size and more complex tasks, future research could:

  • Test chatbot usability in professional workflows
  • Study long-term use patterns
  • Examine discoverability interventions
