Inclusiveness Matters: A Large-Scale Analysis of User Feedback

Too Long; Didn't Read

This study examines user feedback about popular apps from Reddit, the Google Play Store, and Twitter, and derives a two-layer taxonomy of inclusiveness concerns with six major categories, including fairness, technology, and demography. It also evaluates how effectively large language models can automate the identification of inclusiveness-related user feedback.


Authors:

(1) Nowshin Nawar Arony;

(2) Ze Shi Li;

(3) Bowen Xu;

(4) Daniela Damian.

Abstract & Introduction

Motivation

Related Work

Methodology

A Taxonomy of Inclusiveness

Inclusiveness Concerns in Different Types of Apps

Inclusiveness Across Different Sources of User Feedback

Automated Identification of Inclusiveness User Feedback

Discussion

Conclusion & References

Abstract

In an era of rapidly expanding software usage, catering to the diverse needs of users from various backgrounds has become a critical challenge. Inclusiveness, representing a core human value, is frequently overlooked during software development, leading to user dissatisfaction. Users often engage in discourse on online platforms where they voice their concerns. In this study, we leverage user feedback from three popular online sources, Reddit, Google Play Store, and Twitter, for 50 of the most popular apps in the world to reveal the inclusiveness-related concerns of end users. Using a Socio-Technical Grounded Theory approach, we analyzed 23,107 posts across the three sources and identified 1,211 inclusiveness-related posts. We organize our empirical results in a taxonomy for inclusiveness comprising 6 major categories: Fairness, Technology, Privacy, Demography, Usability, and Other Human Values. To explore automated support for identifying inclusiveness-related posts, we experimented with five state-of-the-art pre-trained large language models (LLMs) and found that these models' effectiveness is high yet varies depending on the data source. GPT-2 performed best on Reddit, BERT on the Google Play Store, and BART on Twitter. Our study provides an in-depth view of inclusiveness-related user feedback from the most popular apps and online sources. We provide implications and recommendations that can be used to bridge the gap between user expectations and software, so that software developers can respond to the varied and evolving needs of a wide spectrum of users.


Index Terms—inclusion, diversity, user feedback, human aspects, deep learning

1 INTRODUCTION

As software usage continues to grow worldwide, an increasingly diverse user base is engaging with applications. This diverse group includes individuals of various genders, regions, cultures, socio-economic backgrounds, political beliefs, physical and cognitive abilities, values, and educational backgrounds, among many others. However, software is often built for the “average user” [1] and fails to address diverse user needs. For instance, Twitter (currently known as X), a widely used social networking app with over 390 million global users [2], released an image cropping algorithm that automatically cropped images. It focused on important parts, such as faces and text, to optimize space on the main feed and allow multiple pictures in a single tweet. However, users soon noticed that the algorithm could only detect white faces and cropped out the faces of Black people [3]. The topic quickly began trending as thousands of users joined the discussion. Similarly, numerous other incidents have emerged from online user feedback [4], highlighting the lack of inclusiveness in software.


In fact, the feedback provided by users on online platforms (e.g., app reviews) has grown significantly in both volume and importance to software organizations. From such feedback, software companies can not only identify areas of product improvement but also learn about the inclusiveness aspects of their software. In this space, Crowd-based Requirements Engineering (CrowdRE) has become a popular area of study for identifying product-relevant information from large volumes of user feedback on various online platforms such as app stores, social media, and forums. A growing body of research on “end user human aspects” has attempted to address and understand aspects like gender and accessibility using CrowdRE sources such as app reviews [5], [6]. Khalajzadeh et al. [7] studied user feedback from the Google Play Store and developer discussions from GitHub to understand conversations related to human aspects in 12 open-source apps. The authors found inclusiveness-related discussions in both sources (31 from Google Play Store and 31 from GitHub). While insightful, open-source applications represent only a portion of the many applications used in our society and, therefore, can yield limited user feedback and, more importantly, a lack of representation of diverse opinions.


Therefore, there is a need for a more extensive exploration of inclusiveness based on a larger, and therefore more diverse, user base. Such an analysis may reveal variations in inclusiveness concerns depending on the type of software, insights that can help companies focus on their users’ specific needs. Furthermore, with the increasing number of user feedback platforms (e.g., social media), diverse users may prefer different mediums because they engage with particular online platforms to different degrees [8]. Thus, exploring a variety of feedback sources can reveal more insights about inclusiveness. Finally, the growing amount of user feedback, while useful, demands significant manual effort from software organizations, so automating the identification of inclusiveness-related user concerns is worthwhile as a way to reduce the manual analysis load. Our study aims to fill this gap through a large-scale analysis of user feedback for 50 of the most popular apps with millions of users, from Google Play Store, Reddit, and Twitter.


Guided by the following research questions, we employed a Socio-Technical Grounded Theory (STGT) approach [9] to analyze the user feedback we collected from these multiple sources and software apps.


RQ1 What are the different types of inclusiveness-related user feedback found on online sources?


RQ2 How does inclusiveness-related user feedback differ for different types of apps?


RQ3 How does inclusiveness-related user feedback differ across different sources of feedback?


RQ4 How effective are pre-trained large language models in automatically identifying inclusiveness-related user feedback?


We collected over 10 million posts and examined the inclusiveness-related user feedback both through qualitative analysis and by experimenting with large language models to identify such feedback automatically. Our study provides the following contributions:


• We propose a two-layer taxonomy of inclusiveness based on user feedback from 50 of the most popular apps across a variety of software types. Comprising categories of inclusiveness concerns such as fairness, technology, privacy, demography, usability, and other human values, the taxonomy advances the preliminary, limited understanding of users' inclusiveness-related concerns developed in previous research from only open-source projects.


• Insights into the different inclusiveness concerns in different application types and feedback sources.


• A manually annotated dataset of inclusiveness-related user feedback that can facilitate future research and practice.


• Insights and empirical results from using large language models to automatically identify inclusiveness-related user feedback from multiple sources, which companies and practitioners can leverage to address the inclusiveness concerns of their end users.


This paper is available on arXiv under a CC 4.0 license.