Channel AI - Research
My Role
- Design: I conducted a literature review on human-AI interaction and defined a design framework to improve information architecture, usability, user acceptance, and trust.
- User research: I defined the research questions and methods and conducted interviews to determine how well the Channel AI supported existing workflows and to assess user acceptance.
- Testing: I took part in design reviews, provided feedback on usability, and conducted subjective listening tests.
Results
- Researched, adapted, and applied design principles to guide the design of the Channel AI.
- Evaluated the design with users using qualitative methods.
- Defined the direction for future development of the software tool.

The figure shows the mixing desk in which the Channel AI, the system our team developed, is embedded.
The project
Early in 2019, I joined the AI team as an AI researcher specializing in product design and user experience research. I worked on the design of the Channel AI, an assistive technology of the Midas Heritage-D, a professional mixing console commonly used in high-profile venues to support live performances worldwide. My team developed an embedded AI system that ran on the console and offered a range of capabilities to support mixing engineers, including instrument recognition and the generation of parameter-setting suggestions for gain levels, gating, compression, and equalization, specific to the input signal and the instrument type.

The figure shows the Heritage D96 graphical user interface. The toolbar on the right of the user interface shows the Channel AI.
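To make the idea concrete, below is a minimal, hypothetical sketch of what such a signal-adaptive suggestion step could look like in code. Every name, default value, and heuristic here is an illustrative assumption, not the actual Channel AI implementation:

```python
# A minimal, hypothetical sketch of a signal-adaptive suggestion pipeline;
# the real Channel AI implementation is not public and will differ.
import numpy as np

def suggest_channel_settings(signal: np.ndarray, instrument: str) -> dict:
    """Propose starting-point settings for one channel.

    `instrument` would come from the instrument-recognition model;
    here it simply selects per-instrument defaults (placeholder values).
    """
    rms_db = 20 * np.log10(np.sqrt(np.mean(signal ** 2)) + 1e-12)
    defaults = {
        "vocal":  {"comp_ratio": 3.0, "gate_db": -45.0},
        "kick":   {"comp_ratio": 4.0, "gate_db": -35.0},
        "guitar": {"comp_ratio": 2.5, "gate_db": -50.0},
    }
    d = defaults.get(instrument, {"comp_ratio": 2.0, "gate_db": -60.0})
    return {
        "instrument": instrument,
        "gain_db": -18.0 - rms_db,          # aim for roughly -18 dBFS RMS
        "gate_threshold_db": d["gate_db"],
        "comp_ratio": d["comp_ratio"],
    }

# Example call on a synthetic signal
noise = np.random.default_rng(0).normal(0, 0.1, 48_000)
print(suggest_channel_settings(noise, "vocal"))
```

The design intent this sketch tries to capture is that the system only ever returns editable starting points; whether and how to apply them remains the engineer's decision.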
Personas
Trust calibration, i.e., the degree to which a person trusts another agent, is an extremely important variable that should be taken into account when designing human-AI interaction.
We identified four main application domains where users would likely utilise our technology: live performance, broadcasting, theatre, and church. What differentiates these categories is that live performance, broadcasting, and theatre mixing engineers are most likely domain experts, while church sound operators are less likely to be. The list below describes the expected characteristics and motivations of a typical user profile. These are likely valid for the first three personas (live performance, broadcasting, and theatre) but less likely to apply to the church sound operator, who may not fit features 2–7 in the list below.
The target audience profile and motivations:
- Wants to achieve the best possible sounding results.
- Is very quality-conscious.
- Has a lot of experience.
- Is a domain expert.
- Has well-defined, established workflows.
- Is hesitant to change those workflows.
- Is well-versed in the tools they use.

The Channel AI is used mainly by highly skilled mixing engineers in very high-profile venues where the stakes are high for the engineers, the organizers, and the artists. Users are therefore quality-conscious and conservative about incorporating new tools into their workflow, so we considered it essential to find a good balance in human-machine cooperation in terms of user control and the type and level of automation. The system had to be carefully designed to co-exist harmoniously with existing user workflows without getting in the way.
The interaction method we adopted in the design of the Channel AI should mitigate the following risks, which we believe could impede the adoption and utilization of the system:
1) The automation making undesired, suboptimal, or irreversible decisions.
2) The system removing engineers' authority and control to do their jobs in the way they see fit.
3) The system forcing users to radically change their existing workflows.

Literature review: Human-AI interaction
Distrust of AI recommendations can result from a lack of understanding of the reasoning that underpins a model's outputs.
Trust calibration is a variable we considered when designing the Channel AI. My research showed that trust is critical to enabling two agents to cooperate effectively, so human-machine cooperation must be carefully planned to allow an appropriate level of trust to develop. A mismatch between the system's capabilities and the user's level of trust can lead to under-trusting or over-trusting the system. If trust exceeds the capabilities of the automated system, it can lead to misuse, i.e., delegating tasks that the system cannot perform; if trust falls short of the system's capabilities, it can lead to disuse, i.e., underutilization of its features. According to the literature, other essential factors for developing trust between the user and the system include feedback provision, an indication of how confident the system is about the validity of its outputs, and the user's understanding of how and why an AI reaches a particular conclusion. Providing appropriate feedback is essential to facilitate user understanding, justify the system's outputs, and enable user control.
After reviewing a large body of literature on human-AI interaction and automation, I collated a set of design principles and liaised with the product manager to apply them to guide the Channel AI design.
The design principles
To aid the interaction design process of the AI system, we identified a set of design principles (derived from Amershi et al., 2019) and applied them to further optimise the human-AI interaction and evaluate our design. The original paper proposes a total of 18 design principles, grouped into four categories: Initially, During Interaction, When Wrong, and Over Time. We utilized only the first 11 principles, since the remaining seven apply only to AI systems implementing interactive machine learning techniques such as reinforcement learning. We used these principles to ensure appropriate trust calibration (principles 1, 2, and 5), feedback provision (principles 3 and 4), and maximisation of user control with minimal disruption to current workflows (principles 6–11). Applying these principles at both the conceptualisation and evaluation phases proved very rewarding, as it led to a major redesign of the interface and the workflow. A more detailed explanation of the principles used and how they map to the user interface is given in the video below.
The video explains which UX guidelines have been applied to the design of the Channel AI.
System Analysis Framework
Most complex music information retrieval and intelligent sound processing systems consist of many layers. Each layer can exhibit a different level of automation, ranging from No Automation through Assistance, Partial Automation, Conditional Automation, and High Automation to Total Automation (SAE, 2021). To understand where the Channel AI stands regarding human-machine cooperation, and to help our team identify risks and plan future system development, we performed a system analysis using the stage model suggested by Parasuraman et al. (2000). The model consists of four functions that can be performed by a human or an intelligent automation system: Information Acquisition, Information Analysis, Decision and Action Selection, and Action Implementation. We combined the model suggested by Parasuraman et al. with levels of automation inspired by the international standard in autonomous driving to determine what level and type of automation each component of the Channel AI performs, as shown in the figure below.

System analysis showing the different levels of automation of the Channel-AI features across the four functions.
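As an illustration of how such an analysis can be made actionable, the sketch below encodes a Parasuraman-style stage analysis as data and flags features that heavily automate the late stages, where wrong machine decisions are costliest. The feature names and level assignments are hypothetical placeholders, not the contents of the actual figure:

```python
# Hypothetical sketch of a Parasuraman-style stage analysis; the features
# and level assignments below are placeholders, not the actual figure data.
# Automation levels follow the SAE-inspired ordering used above:
# 0 = No Automation, 1 = Assistance, 2 = Partial, 3 = Conditional,
# 4 = High, 5 = Total Automation.
# The four stages: information_acquisition, information_analysis,
# decision_selection, action_implementation.

analysis = {
    "instrument_recognition": {"information_acquisition": 4,
                               "information_analysis": 4,
                               "decision_selection": 0,
                               "action_implementation": 0},
    "eq_preset_suggestion":   {"information_acquisition": 4,
                               "information_analysis": 4,
                               "decision_selection": 2,
                               "action_implementation": 1},
}

def heavily_automated(analysis: dict, stage: str = "decision_selection",
                      threshold: int = 2) -> list:
    """Flag features that automate a late stage at or above `threshold` -
    the stages where suboptimal machine decisions are costliest."""
    return [f for f, levels in analysis.items() if levels[stage] >= threshold]

print(heavily_automated(analysis))  # -> ['eq_preset_suggestion']
```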
Video Demo
The video below provides a detailed explanation of how the design framework was used to design the user interface of the Channel AI. This excerpt is taken from the video that accompanied the conference publication I co-authored with my former manager, Alessandro Palladini, titled Towards a Human-Centric Design Framework for AI-Assisted Music Production, presented at the International Conference on New Interfaces for Musical Expression in July 2020.
Devising perceptually relevant semantics and timbral terms for naming presets and frequency bands
The terms used to name contextual information in our plugins had to make sense to the users who interact with our algorithms. There were two important questions I had to address so that we could create instrument-specific presets adapted to the input signal:
1) How to divide and name each frequency-spectrum band in a way that makes sense to users psychoacoustically.
2) How to name the presets using timbral terms that describe each preset's effect on the sound.

FAST-EQ screenshot showing an example of the semantic and timbral terms.
Dividing the frequency spectrum into bands
Through practice, audio engineers have observed that specific bands of the frequency spectrum play a distinct role in how we perceive sound. To formalise and communicate these effects, engineers have proposed a few empirically defined divisions of the frequency spectrum. One possible division is based on the octave increments commonly used on graphic equalisers, shown in the figure below. The ranges of these areas and their effect on the sound vary depending on the instrument type, because each instrument's frequency range ultimately depends on its physical and acoustic properties.

Figure shows the division of the frequency spectrum based on octave increments
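For reference, these octave increments can be generated programmatically: each centre frequency doubles, and the band edges lie half an octave (a factor of √2) either side of the centre. A minimal sketch:

```python
import math

# Generate octave-band edges: centre frequencies double each band, and
# band edges sit half an octave (a factor of sqrt(2)) either side.
# Nominal graphic-EQ labels (31.5 Hz ... 16 kHz) are rounded versions
# of these exact base-2 centres.
def octave_bands(f_start: float = 31.25, n_bands: int = 10):
    """Return (lower_edge, centre, upper_edge) in Hz for each octave band."""
    bands = []
    for k in range(n_bands):
        centre = f_start * 2 ** k
        bands.append((centre / math.sqrt(2), centre, centre * math.sqrt(2)))
    return bands

for lo, centre, hi in octave_bands():
    print(f"{lo:8.1f} Hz < {centre:8.1f} Hz < {hi:8.1f} Hz")
```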
Based on this understanding, and following the EQ cheat sheets commonly used by sound practitioners, several frequency ranges were defined for each of the nine instruments for which presets are generated, as shown in Table 1. The last row of the table shows the timbral terms for the frequencies or ranges that engineers tend to boost or cut in order to accentuate or attenuate a specific perceptual attribute of the sound.
Table 1 shows the bands, ranges, AR order used, and timbral terms per instrument.

The timbral terms used in the context of the presets
To decide on the preset names and their effect on the sound, I drew on research findings from psychoacoustics and music psychology. Several studies have identified timbral terms used to describe audio qualities that are usually unrelated to pitch and loudness, and have shown that people often use similar linguistic terms to express these perceptual attributes of sound. Pearce et al. provide an excellent summary of the research findings on the correspondence between perceptual attributes and timbral terms. I drew on this literature to define the names and the effect of each preset presented below, using the timbral terms provided in (ibid: p. 11) to characterise the intended effect of the presets on the input signal after equalization.
Table shows the intended effect of a given preset on the balance between the different areas of the frequency spectrum.

Borrowing the timbral terms shown in the third column of the table above, I defined the names of the presets listed in the table below.
Table shows the final names of the presets.
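To make the mapping concrete, a preset can be represented as a named set of per-band gain moves. The sketch below is illustrative only: the preset names, band ranges, and gain values are placeholders, not the presets we shipped:

```python
# Illustrative only: preset names borrow common timbral terms from the
# literature; band ranges and gain values are placeholders.
vocal_presets = {
    # timbral term -> {(band_low_hz, band_high_hz): gain_db}; + boosts, - cuts
    "warm":       {(200, 500): +2.0, (2_000, 5_000): -1.0},
    "bright":     {(5_000, 10_000): +3.0},
    "less-boomy": {(100, 250): -3.0},   # cut to attenuate boominess
}

def describe(name: str, preset: dict) -> str:
    """Render a preset as the EQ moves an engineer would read."""
    moves = ", ".join(
        f"{'boost' if gain > 0 else 'cut'} {lo}-{hi} Hz by {abs(gain):.1f} dB"
        for (lo, hi), gain in preset.items())
    return f"'{name}': {moves}"

for name, preset in vocal_presets.items():
    print(describe(name, preset))
```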

Evaluation
Before defining our research questions, test methods, and tasks, we needed to identify the most important features of our product and make assumptions about the value we expected these features to offer to the end user. The table below shows the central values and goals of the Channel AI; these formed the basis for formulating our research questions.

Study Design
Before the interview, we instructed the engineers to bring a multitrack recording they had recently mixed and felt comfortable with. Each mixing engineer was then given a demonstration of all of the features of the Channel AI, during which we ensured that the participants understood the functionality of the features and answered any questions they had about the system. After the demonstration, we asked the engineer to mix the music tracks utilising the Channel AI, making clear that they could diverge from the system's suggestions whenever they wanted until they achieved a good mix. After the mixing session, we conducted a semi-structured interview to elicit information about the engineer's experience of mixing with the Channel AI and to gather feedback on the workflow and the parameter-setting suggestions offered by the adaptive presets.

Usability study testing procedure
Results
Our findings provided many insights into the performance of the system and the effectiveness of the preset suggestions, which were very useful for improving the generated preset values. They also highlighted issues that can arise when designing automated systems to support professional audio practitioners. Although the domain experts valued the system's assistance and did not identify any significant usability issues beyond having to inspect the resulting preset suggestions, they were skeptical when asked about their willingness to adopt intelligent automated systems into their workflows. The interviews also suggest that usability and set-up time are significant considerations when designing tools for live mixing engineers. These findings led us to conclude that striking the right balance between automation and user control is paramount if automated music production systems are to be adopted by domain experts.
Objective metrics & Subjective evaluation of algorithms
When I joined the team, they had an ad-hoc way of assessing the quality of our algorithms. Most of the testing happened through informal, not particularly rigorous listening tests performed by the team developing the algorithms, and external users were only rarely asked to try the algorithms and comment on their experience. These user sessions were often run in an unstructured, non-methodical manner, so the feedback we received was useful for confirming acceptability but limited in providing the insights we needed to improve the algorithms. Moreover, the team did not document user responses or quantify this information rigorously, nor store that knowledge in a repository for future reference. Finally, we lacked ground-truth data that would allow objective evaluation of the machine learning models that generated parameter settings for processing the input sounds.
Research plan - Evaluation of ML models' performance
I devised a detailed research plan to allow evaluation of our algorithms; here is a copy of the plan I proposed: Evaluation of smart instrument processing [pdf]. After a brief literature review, the most suitable approach for subjective evaluation proved to be unmoderated remote listening tests using two methods: AB tests and MUSHRA-type tests. The image below shows a diagram of sensory evaluation methods, from which I chose the most suitable approach. The tests would allow us to compare our algorithms' performance against our competitors'.

An overview of primary sensory evaluation methods, focusing on listening-only test (LOT) methods. The diagram includes both audio-specific and generic sensory evaluation methods.
Zacharov, N. (Ed.). (2018). Sensory evaluation of sound. CRC Press.
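For analysing the ratings such tests produce, a typical first step is per-condition means with confidence intervals, after the listener post-screening recommended by ITU-R BS.1534. A minimal sketch, assuming the results are exported as a CSV with listener, condition, and score columns (the export format and column names are assumptions):

```python
# Minimal MUSHRA analysis sketch; assumes a CSV export with columns
# listener, condition, score (0-100) - the column names are assumptions.
import pandas as pd

def summarise(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    # Post-screening per ITU-R BS.1534-3: exclude listeners who rate the
    # hidden reference below 90 on more than 15% of trials.
    ref = df[df.condition == "reference"]
    bad = ref.groupby("listener").score.apply(lambda s: (s < 90).mean() > 0.15)
    df = df[~df.listener.isin(bad[bad].index)]
    # Per-condition mean with a 95% confidence interval (normal approximation)
    stats = df.groupby("condition").score.agg(["mean", "sem", "count"])
    stats["ci95"] = 1.96 * stats["sem"]
    return stats

# print(summarise("mushra_results.csv"))
```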
WebMUSHRA
WebMUSHRA appeared to be the most suitable tool for conducting this type of experiment. I liaised with the platforms team to host the tool on an Azure server, through which we could administer the tests remotely. The screenshot below shows the interface of the tool after it was configured for a MUSHRA-type test.

WebMUSHRA interface used for the listening tests.
Ground truth data collection
To collect ground truth data from expert users, I proposed a controlled study with expert and semi-professional users. The aim was to ask participants to process the sound, then use the processed sound as ground truth from which we could devise distance-based metrics for assessing the accuracy of the parameter generation models. We recruited a total of five professional users and produced fifty-eight audio samples for each of them (290 in total), which the engineers used to evaluate the performance of two models: one for equalising voice and one for equalising guitar sounds.
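Given such ground truth, model accuracy can be scored with a simple distance-based metric over the generated parameters. A minimal sketch, assuming both the model's suggestion and the engineers' settings are expressed as per-band EQ gains in dB (this representation, and all the values below, are illustrative assumptions):

```python
import numpy as np

def eq_distance(model_gains_db: np.ndarray, engineer_gains_db: np.ndarray) -> float:
    """RMS error (in dB) between model-suggested and engineer-chosen
    per-band EQ gains - lower means closer to the ground truth."""
    return float(np.sqrt(np.mean((model_gains_db - engineer_gains_db) ** 2)))

# Example: compare one model suggestion against five engineers' settings
model = np.array([+2.0, 0.0, -1.5, +3.0])          # four-band suggestion
engineers = np.array([[+1.5, +0.5, -2.0, +2.5],
                      [+2.5,  0.0, -1.0, +3.5],
                      [+1.0, +1.0, -1.5, +2.0],
                      [+2.0, -0.5, -2.5, +3.0],
                      [+3.0,  0.0, -1.0, +4.0]])
scores = [eq_distance(model, e) for e in engineers]
print(f"mean distance to ground truth: {np.mean(scores):.2f} dB")
```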
Learnings
AI-infused technologies that are designed for expert users and deployed in critical application domains must be carefully designed to strike a good balance between automation and user control and to instil a sense of trust in the user. User psychology and attitudes should not be ignored! Finding the right balance between automation and user control, the right way to display AI recommendations, and a careful way to augment existing user workflows is critical to achieving user acceptance and maximising the adoption potential of a given technology. In this context, it took a lot of design and engineering ingenuity, UX craftsmanship, and research to create a product that users will love.

Reflecting on my experience working on this project, I realise how essential it is to prioritise deep user engagement and robust research, including research into customer psychology, acceptance, and desirability. It is important to put the team's vision to the test and be eager to hear the truth, even if it runs contrary to our initial gut feelings and ideas. Engagement with users helps ensure that a feature, product, or service will offer real value to the end user and that it is presented and marketed correctly. Finding the right balance and the optimal ways to do this is hard and context-dependent; however, in my experience, it can be done cheaply, it does not have to be time-consuming, and the entire product team benefits from user research in a multitude of ways. Hence, I believe appropriate engagement with users, user and market data, and stakeholders is key to the success of any product or business in the marketplace.