
Early concept testing

User research based on low-fidelity prototypes was conducted during product development to help the product team discover the best solution. The research investigated how UI features affected users' feelings and the perceived product quality of a smart equaliser.

My Role

 

  1. Research design, planning, and execution: I defined the experimental design, prepared the research material, and set up a factorial survey with open-ended questions for a remote study.

  2. Data analysis: I cleaned, structured, and visualised the resulting data, performed hypothesis testing using analysis of variance on the aligned rank-transformed data, and analysed the open-ended responses using thematic analysis.

  3. Reporting: presented research insights and design recommendations to product stakeholders.

Results

  1. Assessed the impact of UX features on users' feelings and perceived product quality.

  2. Helped define functional requirements.

  3. Provided clear design recommendations.

  4. Defined the direction for future development of the software tool.


The figure shows one of the AI-assisted wireframe prototypes used as a stimulus to elicit participants' responses in the survey.

Project Overview

Music Tribe is a multinational company with twelve reputable brands, including Behringer, MIDAS, TC-Electronic, LabGruppen, and others. One of our brands had just begun developing a new product. The ASPAI team was asked to create a machine-learning model for this product to assist users in applying equalisation when processing sounds. The assistive feature would analyse an audio input and suggest sound-processing settings to help users improve tonal balance and enhance the details of their sound. AI-augmented audio effects of this type are a relatively recent technology. As a result, we did not know how different design choices and interaction methods would affect users' feelings, trust, and perceived product quality. Assessing these effects was one of the main motivations for this user research.

     Additionally, when our machine learning engineers started working on this project, they raised questions and asked the product team for explicit requirements. The product team for whom we were developing the technology could not provide a clear set of requirements because they lacked the capacity and expertise to investigate these aspects of the product and could not answer the engineers' questions with certainty without making too many assumptions. The answers to these questions would directly impact various aspects of usability, the explainability of the model, and the level of control the users have to correct the AI model's suggestions.

      Not answering those questions in time started to become a bottleneck. The engineers were making critical decisions based on assumptions about the type of problems the machine learning technology should solve and how best to solve them for the user. The decisions made at this stage would directly impact the user experience (i.e., the level of control over the model, the ability to correct the AI output, and the feedback provided by the model). To help define the requirements and understand user perceptions, preferences, and needs for such a technology, I ran three user studies: two surveys (one factorial, one traditional) and a usability study using our competitors' tools. On this page of the portfolio, I present the factorial survey's design and results.


I designed and conducted a survey inviting our customers to experience and assess four AI-augmented user interfaces that gave users varying degrees of control. Through video prototypes, users explored different interaction methods and were provided with contextual information that explained the AI model's suggestions. I measured how these factors affected users' perceptions of and attitudes toward the technology. The results were used to define critical aspects of the user experience and the functional requirements. The research presented on this page is one of three complementary studies I designed to provide insights and recommendations to the project stakeholders and help drive the development of the product. The other two studies were designed to reinforce one another; they comprised a large-scale survey and a hands-on usability study using similar competitors' products as proxies to learn from.


Why?

Because the product we were developing could be used by different customer segments with varied skill levels and types of activities, it was essential to identify whether users had different needs for this technology and to be able to trace user preferences to particular user groups based on their occupation (e.g., musician, engineer) and level of expertise (e.g., novice, semi-professional, professional). Answering these questions was important to us because the AI technology we were developing would serve different market segments and could potentially, in the future, be embedded in other products. Moreover, the team had several assumptions about the requirements of these customer segments that had not been tested empirically. For example, they believed that novice users would prefer simpler interfaces with fewer controls (e.g., only a button and a slider) and would value explanations of the AI's decisions more than experienced users would. So, empathising with users from various demographic groups and gaining a better understanding of how the technology could serve them was crucial. Moreover, Music Tribe is predominantly a hardware company, so the richness of the hardware interfaces (e.g., controls and displays) available on some of our products is limited. As a result, assessing the impact of different interfaces on the user experience was important to us.


Research questions

This survey aimed to answer the following questions:

  • Do contextual information and the ability to correct the AI's outputs affect the users' feelings, level of trust, and perceived usefulness of the system?

  • Which interaction methods do users prefer and why?

  • What are participants’ perceptions and reasons for preferring one product over another?

  • What do they like/dislike about the different interaction methods?

  • Do subjects' feelings, trust, and perceived product qualities vary with their age, skill type, and skill level?

Hypotheses

H1: Providing contextual information will lead to a better understanding of how the AI system works and to higher trust and acceptance.

H2: The users' ability to correct and overwrite the AI's suggestions will lead to higher perceived usefulness and satisfaction compared to the lack of such ability.

H3: Contextual information will improve less experienced users' confidence and sense of adequacy in applying EQ.

Dependent variables

I quantitatively assessed UX metrics using Likert scales; the metrics fall into three major categories: feelings, trust, and perceived product quality.

Feelings

  • Feeling more in control

  • Feeling less inadequate

  • Feeling more productive

  • Feeling more confident

Trust

  • Trust

  • Suspicion

Perceived product quality

  • Perceived usefulness

  • Perceived ease of use

  • Influence on the user's decisions about how to process sound

  • Perceived flexibility

I also used open-ended questions asking the participants to explain their preference for one interface over another. 
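To make the measurement model concrete, here is a minimal sketch of how Likert items could be grouped into the three constructs above and aggregated per response; the item names and the scoring scheme (simple item means, with suspicion reverse-coded beforehand) are assumptions for illustration, not the study's exact instrument.

```python
import pandas as pd

# Hypothetical grouping of Likert items into the three measured constructs;
# the column names are illustrative, not the actual survey wording.
CONSTRUCTS = {
    "feelings": ["in_control", "adequacy", "productivity", "confidence"],
    "trust": ["trust", "suspicion_reversed"],  # suspicion reverse-coded
    "product_quality": ["usefulness", "ease_of_use",
                        "decision_influence", "flexibility"],
}

def construct_scores(responses: pd.DataFrame) -> pd.DataFrame:
    """Average the Likert items belonging to each construct, per response row."""
    return pd.DataFrame(
        {name: responses[items].mean(axis=1) for name, items in CONSTRUCTS.items()}
    )
```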

Prototype design

In collaboration with the product stakeholders, we decided that, in this context, we should focus on measuring the impact of three of the Human-AI interaction guidelines proposed in previous research (see Amershi et al., 2019). I used Adobe XD to create four video prototypes describing interaction with the AI-Equaliser. I manipulated three UX features in the design of the prototypes:

  1. Level of control the user has over the automation,

  2. Ability to correct and dismiss the AI's recommendations,

  3. Provision of contextually relevant information that helps the user understand how the AI derives its suggestions.

Experimental conditions and their assigned properties

Below are screenshots of the four prototypes I designed with Adobe XD to measure the impact of the three experimental variables (i.e., level of control, ability to correct, and display of relevant contextual information) on users' feelings, trust, and perceived product quality.

Strength Match

Tone Match

Semantic Space

Semantic Graph

The list below describes the user experience aspects that we manipulated in each of the four prototypes:

  1. Strength Match provides only a high-level slider that lets the user control the strength of the equalisation curve applied to the sound, but it does not give access to the low-level parameters, other contextual information, or a way to manually adjust the equaliser settings.

  2. Tone Match provides low-level control of the equaliser parameters so that users can manually adjust and correct the AI's suggestions. It also provides a graphical representation of the reference signal that the machine learning model is trying to match, which helps the user understand what the AI is trying to achieve (i.e., the objective function). This visual feedback explains the system's behaviour at the signal level rather than the perceptual level.

  3. Semantic Space provides low-level control of the equaliser parameters so that users can manually adjust recommendations and correct the AI suggestions. However, the user interface does not provide a graphical representation of the reference signal that would help the user understand what the AI model tries to achieve.

  4. Semantic Graph provides low-level control of the equaliser parameters so that users can manually adjust and correct the AI's suggestions. It also provides a graphical representation of the frequency bands with labels naming the instrument-specific and perceptually relevant regions. The visual feedback in this prototype explains the behaviour of the machine learning model in a perceptually relevant way, using familiar semantic terms that most experienced users should be able to understand and relate to attributes of the sound (see the illustrative sketch after this list).
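As an illustration of the kind of perceptual labelling the Semantic Graph relies on, here is a minimal sketch of a mapping from frequency regions to semantic terms; the band boundaries and labels are hypothetical examples, not the prototype's actual vocabulary.

```python
# Hypothetical frequency-region-to-semantics mapping; the boundaries and
# terms below are illustrative, not the prototype's actual labels.
SEMANTIC_REGIONS = [
    ((20, 120), "rumble"),
    ((120, 400), "body"),
    ((400, 2000), "boxiness"),
    ((2000, 6000), "presence"),
    ((6000, 20000), "air"),
]

def label_for(freq_hz: float) -> str:
    """Return the semantic label a Semantic Graph-style UI might show
    next to a given frequency on the EQ display."""
    for (lo, hi), label in SEMANTIC_REGIONS:
        if lo <= freq_hz < hi:
            return label
    return "out of range"

print(label_for(3500.0))  # -> "presence"
```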


60 Participants

Below is a breakdown of the respondents' distribution based on their demographics and behavioural attributes.

Gender:

  • 2 Female,

  • 57 Male,

  • 1 Rather not say

Skill Type:

  • 10 Creatives

  • 32 Musicians

  • 18 Engineers

Skill Level:

  • 20 Amateur

  • 24 Advanced

  • 16 Professional


One within-subjects factor

User Interface (subjects experienced all conditions)

 

Two between-subjects factors

Subject Skill

Levels: Creative, Musician, Engineer

Subject level of expertise

Levels: Amateur, Advanced, Professional

 

Mixed-model, repeated-measures analysis of variance on aligned rank-transformed data


Study procedure


The study procedure is shown from left to right. A screening survey was presented to the participants to ensure they had some experience with the type of music production software described in the video prototypes. Participants who passed the screening test (marked as blue profile icons) were directed to the factorial survey task. The presentation order of the prototypes was randomised to minimise order effects.

The list below explains the procedures of the study:

  1. Welcome, Introduction & Consent pages

  2. Participant screening: to take part in the survey, users had to engage in music-making or sound production

  3. Each participant was shown a video prototype and then directed to the survey questions assessing the dependent variables. 

  4. After that, participants were directed to the next prototype and the same set of questions until they had seen all four. The presentation order of the prototypes (SmartEQ1, SmartEQ2, ...) was randomised, as sketched below.

  5. Each participant was then asked to indicate their preferred prototype and to explain, through a series of open-ended questions, the reasons for preferring one prototype over another.

  6. Post-survey, participants were asked to fill out a questionnaire about their demographic information and attitudes towards AI. This information enabled us to compare groups of users based on their demographics and attitudes towards AI and to see whether these might have influenced their answers.

The average duration of the survey was 25 minutes.
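As a minimal illustration of the randomisation step, the sketch below shuffles the presentation order of the four prototypes independently for each participant; the seeding scheme is a hypothetical way to keep the assignment reproducible, not the survey platform's actual mechanism.

```python
import random

PROTOTYPES = ["Strength Match", "Tone Match", "Semantic Space", "Semantic Graph"]

def presentation_order(participant_id: int) -> list[str]:
    """Return a per-participant random order of the four prototypes.
    Seeding with the participant ID (an assumption) makes it reproducible."""
    rng = random.Random(participant_id)
    order = PROTOTYPES.copy()
    rng.shuffle(order)
    return order

print(presentation_order(7))  # e.g., the order shown to participant 7
```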

Data analysis

To measure the impact on the UX metrics, I performed a multifactor analysis of the non-parametric Likert responses and tested for statistical significance using the Aligned Rank Transform. The Aligned Rank Transform relies on a preprocessing step that “aligns” the data before applying averaged ranks; after this transformation, a standard factorial ANOVA procedure can be used for hypothesis testing. The plots below show the distribution of the data and the results of the hypothesis tests.
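To make the alignment step concrete, here is a minimal sketch for a single main effect in a two-factor design, following Wobbrock et al.'s Aligned Rank Transform. The column names are hypothetical, and for brevity the sketch ignores the repeated-measures structure that the full mixed-model analysis (e.g., the ARTool package in R) accounts for.

```python
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

def aligned_ranks(df, response, effect, factors):
    """Align the response for one main effect, then rank: strip all effects
    via the cell means, add back only the effect of interest."""
    cell_mean = df.groupby(factors)[response].transform("mean")
    effect_mean = df.groupby(effect)[response].transform("mean")
    grand_mean = df[response].mean()
    aligned = df[response] - cell_mean + (effect_mean - grand_mean)
    return stats.rankdata(aligned)

# Hypothetical long-format data: one row per participant x prototype,
# with columns "trust" (Likert score), "ui", and "skill".
df = pd.read_csv("likert_responses.csv")
df["rank_ui"] = aligned_ranks(df, "trust", "ui", ["ui", "skill"])

# Full-factorial ANOVA on the ranks; interpret only the "ui" effect here.
model = ols("rank_ui ~ C(ui) * C(skill)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```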

The figure below is an interactive slideshow of the plots showing the data gathered from the Likert questions. Please click the graph to see the results of the hypothesis tests I performed on the aligned rank-transformed data. More interpretations of the findings will be added to the portfolio soon.

As shown in the figures below, users strongly preferred the Semantic Graph, followed by the Tone Match prototype. The magnitude of this effect was not correlated with users' skills. The survey's qualitative results show that participants preferred the Semantic Graph interface because it is more transparent and intuitive and provides semantic explanations, bridging the gap between hearing and thinking. They found the interface simple, intuitive, and educational, providing rich visual feedback and contextual information.

      The participants preferred the Tone Match prototype because it would be well suited to referencing other songs, was easy to control, and offered informative visual feedback, letting the user spectrally align tracks. What participants liked about the Semantic Space prototype was that it enables morphing between settings. The least preferred prototype, Strength Match, lacked transparency and did not provide a sufficient level of control over the sound-processing settings; however, it might be a good solution for complete novices.

The figure shows the percentage of users who preferred each of the four prototypes tested in this study, grouped by the participants' type of skill, i.e., Creative, Engineer, or Musician.

The figure shows the percentage of users who preferred each of the four prototypes tested in this study, grouped by the participants' level of expertise, i.e., Amateur, Advanced, or Professional.

Outcomes

1) Beyond AI model performance, UX design can significantly improve user acceptance and perceived value.

 

2) Reducing the level of control, the contextual information that helps the user interpret the AI's outputs, or the ability to rectify errors can negatively impact the users’ feelings, trust, and perceived usefulness of the product.

 

3) We observed that these effects do not depend on the users’ skill or level of expertise.


Design Recommendations

  • Removing the users' ability to control the EQ has a negative effect. The survey findings show that, in most cases, it harms users' feelings of control and adequacy, as well as the perceived usefulness and quality of the product.

  • Provide sufficient contextual information. Participants' responses suggest that visual feedback helps the user understand the AI model's decisions, builds trust, decreases suspicion towards the AI, improves perceived usefulness, increases users' confidence in the recommendations, and influences their behaviour.

  • Intuitive contextual information and AI model explanations. The analysis of the data from the open questions suggests that users place more value on semantic explanations that use terms closer to human perception and that link the physical attributes of the sound to its perceived qualities. So, higher-level semantic explanations are preferable to signal-level ones, though this might be context-dependent.

  • Design for user customisation of AI suggestions. Letting the user refine suggestions is important because it is difficult for the machine learning model to anticipate users' intentions and stylistic preferences, or to take into account the broader musical context (i.e., how the sound will fit in the music piece). So, we should not assume that the model's output will be acceptable to the user; hence, the user should be given the option to adjust the parameter settings until they achieve the desired results (see the sketch after this list).

  • The adaptability of the model to the user's style is important. The findings from one of the survey questions suggest that the model's adaptability matters to most users. Adaptability, whether through reinforcement learning or another mechanism (style transfer, constraining the model, etc.), could help personalise the model's outputs and account for the stylistic and aesthetic preferences of the user.

  • Matching relevant social norms. If users are accustomed to conventional EQ filters, then the AI should express its output as familiar frequency, gain, and Q values. Based on the open-ended questions and the findings of another UX study we conducted, in which professional users were asked to use and evaluate three of our competitors' AI-assisted smart equalisers, we found that drastically changing the interaction method and the user interface can negatively affect user acceptance. So, in most cases, it is better to augment the existing user interface or workflow with AI features to avoid alienating users.
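To illustrate the customisation recommendation, here is a minimal sketch of how an AI suggestion could be represented as ordinary EQ band parameters that the user can inspect and override; the data model and names are hypothetical, not the product's actual design.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EQBand:
    """One parametric EQ band, expressed in terms users already know."""
    freq_hz: float
    gain_db: float
    q: float

# Hypothetical AI suggestion: a list of ordinary bands the user can inspect.
ai_suggestion = [
    EQBand(freq_hz=120.0, gain_db=-2.5, q=1.0),
    EQBand(freq_hz=3200.0, gain_db=3.0, q=0.7),
]

def apply_user_override(bands, index, **changes):
    """Return a new curve with one band adjusted: the AI output is a
    starting point, never a locked-in result."""
    edited = list(bands)
    edited[index] = replace(edited[index], **changes)
    return edited

# The user tames the AI's presence boost without losing the rest.
final_curve = apply_user_override(ai_suggestion, 1, gain_db=1.5)
```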

Learnings


The results provide insights into the implications of different designs for users' feelings and perceived product quality. They also highlight the importance of assessing user perceptions and acceptance before investing in engineering. A factorial survey using low-fidelity prototypes early in product development proved a very effective and low-cost way to elicit product requirements and assess how different design decisions impact users' perception and acceptance of the technology. Compared to interviews and focus groups, this research protocol allowed us to reach a relatively large sample, and it took me less than a month to plan, prepare the research material, run the survey, and analyse the data. Two limitations of this research are the lack of in-person interaction with the users, which would have allowed more detailed follow-up questions, and the lack of realistic interactions with an implemented system.

     To account for these limitations, I also remotely conducted a usability study asking expert users to test competitors' AI-assisted equalisers. Then, through a combination of Likert-type and open-ended questions, I gathered insights about the performance, usability, and overall user experience of our competitors' products. What is exciting and gratifying about conducting parallel studies with different research protocols, each with its own strengths and weaknesses, is when the findings converge, making the conclusions and product recommendations more robust and trustworthy. And when the findings of the complementary studies diverge, we gain a more complete picture of the factors affecting the research questions under investigation, which helps us make informed decisions, minimise risk, and ensure the success of our products and services.
