Use Cases

Aequitas has incorporated six use cases (covering three different domains) that serve two main purposes. First, at the outset of the project, the use cases provided context and concrete requirements for the Aequitas framework. Second, towards the later stages of the project, the use case owners will use the data and context of their use case to validate the functionality of the Aequitas framework as a test of its features.

Below we detail the use cases and the experimentation done in the context of the validation. Please note that the use case titles below are hyperlinks to a dedicated page for that particular use case.

Domain: Recruitment

Use case HR1: Bias-free AI-assisted recruiting system

Context

This case study analyzes a specific software used by the Adecco Group in the candidate selection process. The software supports the recruitment process by recommending the best candidates for a given job position and, conversely, suggesting the most suitable positions for candidates.

At the core of the software is a recommendation engine powered by AI, which extracts relevant information from candidates’ resumes and standardizes it against structured data stored in the internal database. By comparing the candidates’ attributes with the requirements of the job position, the software generates a ranked list of candidates based on their compatibility. The results are then presented to the recruiter.

Following this, the shortlisted candidates are personally evaluated by the recruiters and, if deemed suitable, proposed to the relevant companies.

Additionally, the software provides access to a dataset composed of anonymized data, collected over a specific period (e.g., a snapshot of a single day’s operation). This dataset includes the matches proposed by the software, enriched with candidates’ information such as age, gender, geographic location, education, and the requirements of the specific job positions.

Goal

The aim of this case study is to target the cognitive and structural biases that might be associated with existing AI-assisted hiring systems. The analysis of the tool will make it possible to detect and assess possible biased outcomes resulting from the algorithm/software and from the human expert in the selection of candidates.

Known biases and unfairness

In the context of the Adecco Group’s use of automated recruiting software, particularly for Adecco Formazione, several biases and potential unfairness may arise. First, algorithmic bias can occur if the training data embedded in the software system is not representative of the diverse applicant pool, leading to skewed selection processes that favor certain demographics over others based on age, gender, geographic location, and educational background. Additionally, if the software disproportionately recommends candidates from specific regions or with certain educational credentials, it may inadvertently marginalize equally or more qualified candidates from other backgrounds.

Method

AEQUITAS assessed the algorithm already in use for candidate selection. The algorithm assessment can be run both with synthetic data (suitably generated through the synthesizer of Task 7.2) and with real data from the Adecco Group (once corrected for bias, where necessary). The dimensions to be validated are 1) the diagnosis of bias in the algorithms and 2) their repair, if needed. Further validation concerns the bias that may exist in the interpretation of results; accordingly, the algorithm outputs will be analysed in relation to the corresponding human decisions to carry out this validation.
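As a rough illustration of these two dimensions, the sketch below diagnoses a demographic-parity gap in a binary shortlisting model and then post-processes it to reduce that gap. It uses the open-source fairlearn library as a stand-in for the AEQUITAS detection and reparation components; the data and column names are hypothetical.

```python
# Sketch of the two validation dimensions: 1) bias diagnosis, 2) repair.
# fairlearn is used as a stand-in library; data and names are hypothetical.
from fairlearn.metrics import demographic_parity_difference
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.linear_model import LogisticRegression

def diagnose_and_repair(X_train, y_train, sex_train, X_test, y_test, sex_test):
    """X: candidate features, y: shortlist decision, sex: sensitive attribute."""
    base = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # 1) Diagnosis: how far the selection rates differ between groups.
    gap = demographic_parity_difference(
        y_test, base.predict(X_test), sensitive_features=sex_test
    )
    print(f"Demographic parity difference before repair: {gap:.3f}")

    # 2) Repair (if needed): post-process the scores to equalize selection rates.
    repaired = ThresholdOptimizer(
        estimator=base, constraints="demographic_parity", prefit=True
    )
    repaired.fit(X_train, y_train, sensitive_features=sex_train)
    y_rep = repaired.predict(X_test, sensitive_features=sex_test)
    gap_rep = demographic_parity_difference(
        y_test, y_rep, sensitive_features=sex_test
    )
    print(f"Demographic parity difference after repair: {gap_rep:.3f}")
```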

ADECCO Data and Analysis

Data Collection

The two datasets come from two matching algorithms: direct matching and reverse matching. The direct matching dataset contains the best 10 candidates for each job position. The data are sorted by candidate identifier (ascending) and match score (descending).

The reverse matching dataset contains the best 5 offers (job positions published on the Adecco website) and the best 5 orders (job positions managed internally, without being published on the Adecco website) for each candidate. The data are sorted by job identifier (ascending) and match score (descending). Note that the match rank is calculated separately for each job position type (orders vs. offers).
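For illustration, the sketch below shows how the match ranks described above could be recomputed and how a first demographic check of the shortlists might look. File and column names (job_id, candidate_id, job_type, match_score, gender) are assumptions, not the actual ADECCO schema.

```python
# Sketch: recompute per-job match ranks and inspect shortlist composition.
# File and column names are illustrative assumptions.
import pandas as pd

direct = pd.read_csv("direct_matching.csv")     # hypothetical file name

# Rank candidates within each job position by descending match score,
# mirroring the "best 10 candidates per job position" structure.
direct["match_rank"] = (
    direct.groupby("job_id")["match_score"]
    .rank(method="first", ascending=False)
    .astype(int)
)
top10 = direct[direct["match_rank"] <= 10]

# Compare the gender distribution of the shortlists with that of the full
# candidate pool as a first, coarse bias-diagnosis signal.
print(top10["gender"].value_counts(normalize=True))
print(direct["gender"].value_counts(normalize=True))

# Reverse matching: rank separately per candidate and per job position type
# (orders vs. offers), as described above.
reverse = pd.read_csv("reverse_matching.csv")   # hypothetical file name
reverse["match_rank"] = (
    reverse.groupby(["candidate_id", "job_type"])["match_score"]
    .rank(method="first", ascending=False)
    .astype(int)
)
```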

Experimentation and results

Experiments conducted within the AEQUITAS framework and experimentation environment, leading to the best solution for ADECCO, can be found at the following links.

Use case HR2: Assess and repair job-matching AI-assisted recruiting tool to mitigate gender and other bias

Context

The subject of fairness in the hiring process is crucial to guarantee a diverse workforce that evenly represents all demographics. Assessing the absence of biases in a set of data collected from a real-world example can not only immediately help a company improve its hiring processes and policies, but also create fair sources for future AI training [6]. Therefore, this use case, by providing a set of data collected during the candidate selection process of a large engineering company in Italy, aims to contribute to these objectives, allowing the creation of a bias-free AI-assisted recruiting system.

The Akkodis use case focuses on the analysis of the dataset created during the company’s recruitment process. This dataset contains information about all candidates and employees that went through the hiring process, including profiles imported from external databases, such as the ones provided by universities.

The information collected for each candidate includes data related to the candidate’s demographic identity (such as gender and age), geographical location (such as region of residence), whether they belong to a protected category according to national law, their work and study background, technical skills, their current status inside the company, and the results of the evaluations they received during the hiring process.

In this way, we can represent and analyze the actual profiles taken into consideration by a large engineering company in Italy, in order to evaluate and improve the fairness of the recruitment process in the STEM field.

Goal

The process of finding new candidates in line with the currently available positions in the company is complex and involves many steps performed by different professional profiles, spanning from HR representatives, through commercial staff (Business Managers, BM), to technical experts.

This process guarantees an efficient selection, evaluating a candidate from different perspectives and taking into consideration their full value as a possible employee.

However, as much as involving different points of view throughout the process can help create a fair environment, it can also increase the cognitive biases that may manifest at each step.

By analyzing this dataset, we aim to identify and possibly correct any bias introduced during the hiring process. The analysis will provide an overview of the data, uncovering patterns that may indicate unfairness, such as disparities in candidate evaluation based on gender, age, or other sensitive categories.

The way the dataset is structured (an entry for each step of the candidate’s hiring process) will also provide insights into where these improper patterns actually occur in the hiring process, raising awareness of cognitive biases and promoting more objective decision making.

Eventually, our goal is to improve the fairness and inclusivity of the hiring environment, ensuring that all candidates are evaluated based on their skills and potential rather than extraneous factors.

Recruitment Process

The Akkodis recruitment process is composed of several phases:

  1. Initial phase:

    • Client contacts the company with a specific need.

    • The Commercial staff (the Business Managers team) identify a client’s need as an opportunity the company may pursue to increase the technical staff.

  2. Requirements analysis:

    • Understand the type of professional roles needed.

  3. Search phase:

    • Search for a possible candidate in major search engines and professional social networks such as Monster, LinkedIn, AlmaLaurea (to name a few), as well as the database of spontaneous applications collected on the Akkodis website and internal referrals from colleagues.

    • First round of quick telephone interviews conducted by our HR or recruiters. These interviews aim to determine the candidates’ actual availability for an introductory interview.

  4. Introductory interview (HR):

    • Interview to determine/confirm the candidate’s characteristics, delve into their professional background, and understand their economic aspirations.

  5. Technical Interview (Internal Technical Expert):

    • Interview conducted by an internal expert or someone with relevant skills, or an internal resource already allocated at the client’s site. Candidates who successfully pass this initial technical evaluation ideally undergo a qualification meeting at the client’s site.

  6. Qualification Meeting (Client):

    • Second technical evaluation, reviewed by the client and usually limited to a small number of candidates. Note that the technical interview and the QM do not necessarily both take place; often, only the latter is conducted.

  7. Hiring phase:

    • The client ultimately determines the suitability of the consultant to be hired.

    • In turnkey projects the QM is absent, and only the technical interview is conducted.

  8. Onboarding phase:

    • Set of administrative and technical procedures necessary to prepare the new resource’s entry into the company.

Known biases and unfairness

The primary factor that introduces unfairness in this type of recruitment process is the cognitive bias of the people involved. Cognitive biases can lead to subjective judgments that may exclude valuable candidates due to incorrect preconceived notions about specific demographic groups. These biases can manifest at various stages of the process, from the initial search phase to the final hiring decision.

In particular, as has been observed even in recent studies [2], despite the continuous efforts by companies to promote the advantages of a diverse workforce, these types of biases can still present themselves. Taking into account the nature of the data features in the use case’s dataset, the most affected categories are gender, age, country of origin, and protected category. Any of these factors may lead the recruiter to believe that a candidate is more or less fit for a position without considering their actual skills.

By assessing the fairness of the dataset, the company will be able to raise awareness among all stakeholders about the potential for bias and its implications. Additionally, the company will be able to take corrective actions where necessary.
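Because the dataset records an entry for each step of the hiring process, a first fairness assessment can compare outcomes per stage and per sensitive group. The sketch below illustrates the idea with hypothetical column names (stage, outcome, gender); it does not reflect the actual Akkodis schema or the AEQUITAS tooling.

```python
# Sketch: locate the stages of the hiring funnel where outcomes diverge
# across a sensitive attribute. Column and file names are assumptions.
import pandas as pd

df = pd.read_csv("akkodis_recruitment.csv")     # hypothetical file name

# Pass rate per stage and gender: share of entries at each stage whose
# recorded outcome is positive.
pass_rates = (
    df.assign(passed=df["outcome"].eq("positive"))
      .groupby(["stage", "gender"])["passed"]
      .mean()
      .unstack("gender")
)
print(pass_rates)

# A large gap between columns at a given stage flags where cognitive bias
# may be entering the process and merits closer human review.
```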

Method

Data Collection

As described in the previous chapter, the Akkodis dataset [8] contains data collected during the company’s recruitment process, specifically from 2019 to 2023.

The data are entered into the Akkodis system by the TA team when searching for potential candidates. More data relating to a specific candidate are added during each phase of the hiring process, recording the outcome of the corresponding interview.

The dataset was then created by exporting the data present in the Akkodis system into an analyzable format.

Dataset Structure and Pre-Processing

The dataset consists of 40 columns and 21,377 entries.

The data has been carefully anonymized. In particular, the name (and the surname) of each candidate has been replaced with a hash code (ID), and names of previous companies where the candidate worked have been removed. Furthermore, the field Citizenship was removed as it presented a high risk of re-identification.
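A minimal sketch of the kind of anonymization described above, using a salted hash to derive the candidate ID and dropping high-risk columns. Apart from Citizenship, the file and column names are illustrative assumptions, and the project may have used a different scheme.

```python
# Sketch: pseudonymize candidate names and drop re-identifying columns.
# File and column names (except Citizenship) are illustrative assumptions.
import hashlib
import pandas as pd

def pseudonymize(value: str, salt: str = "project-secret") -> str:
    """Return a stable, non-reversible identifier for a candidate name."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

df = pd.read_csv("akkodis_raw.csv")             # hypothetical file name
df["ID"] = (df["name"] + " " + df["surname"]).map(pseudonymize)
df = df.drop(columns=["name", "surname", "previous_companies", "Citizenship"])
```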

No other pre-processing steps were applied to the data. Further details of the data and analysis can be found here: Akkodis Dataset Analysis.

Experimentation and results

Experiments conducted within the AEQUITAS framework and experimentation environment, leading to the best solution for AKKODIS, can be found at the following links.

References

Domain: Society and economics

Use case S1: AI assisted identification of child abuse and neglect in hospital with implications for socio-economic disadvantaged and racial bias reduction

Context

In recent decades, the number of child abuse and neglect cases has soared. The broad adoption of electronic health records in clinical settings offers a new avenue for addressing this. Detecting and assessing risks of child abuse and neglect within hospital settings could prevent and reduce bias against ethnic and socioeconomically disadvantaged communities, as well as raise the overall safety of children.

Two main factors guide a doctor during the diagnosis: 1) the visit and 2) the medical history (anamnesis). In the case of pediatric patients, the latter is even more delicate, as it is the parent who interprets and reports the child’s disease and represents the child’s rights. Furthermore, today’s migration flows and increasingly deficient parenting make anamnesis ever more complex: language barriers, parenting skills, and cultural differences all become accelerators of bias in the entire process. It should also be noted that, in this process, the parent often uses strategies of mystification to avoid being seen as a potential abuser. As a result, hospitals currently rely mainly on experts acting as cultural mediators when dealing with potential child abuse; these experts are not continuously available at the hospital, contributing to delays in diagnosis, affecting children and parents, and elevating overall costs. An AI decision support system can only give concrete, effective and rapid help if fair outcomes compliant with the EGTAI are produced.

Goal

To develop an innovative AI system following AEQUITAS methodologies to detect and assess the risk of child abuse and neglect within hospital settings, prioritizing the prevention and reduction of bias against ethnic and socioeconomically disadvantaged communities. The possibility of racial bias in AI systems reinforces our challenge of addressing racism through an AI system and finding the optimal form of human-machine collaboration.

Known biases and unfairness

In the context of leveraging AI to detect and assess risks of child abuse and neglect within hospital settings, several state-of-the-art AI biases are noteworthy. The integration of AI in healthcare, especially in sensitive areas such as pediatric anamnesis, introduces complex ethical challenges. Key among these is the potential for racial and socioeconomic biases, which can arise from skewed training data that does not adequately represent diverse populations. This can lead to discriminatory outcomes against ethnic minorities and economically disadvantaged groups, exacerbating existing disparities in healthcare.

Moreover, language barriers and cultural differences present in scenarios influenced by migration and varied parenting norms can further complicate the effectiveness of AI systems. These factors may lead AI algorithms to misinterpret or overlook crucial contextual information provided during medical histories, thus risking inaccurate assessments. The reliance on parents’ accounts can also lead to biased data input, as some parents might downplay or misrepresent symptoms and situations due to fear of stigmatization or legal repercussions.

Method

Data from the AOUBO pediatric emergency room management system will be used to design a new AI system to support doctors in identifying “at-risk” cases given a specific medical history. This case study will benefit from the AEQUITAS socio-technical approach of including and assessing the often disadvantaged background of children and their parent(s). Being aware that the accompanying parent is, in the vast majority of cases, the mother (who might be without a driving license, living in a less serviced urban area, or, in some cases, in need of her husband’s permission to move autonomously) is fundamental to having a full picture of the patient. This is a typical case where intersectionality plays a role, as gender and race lines of inequality interact and multiply possible negative discriminatory effects. This relevant socio-economic background is crucial in two directions: 1) it is relevant when designing and developing the AI system in support of doctors’ decisions, because it adds a novel layer of information upon which the decision is made, and 2) it is relevant to doctors’ evaluation of the possible protocols. The technical solutions will avoid the risk of repeating, amplifying and contributing to gender stereotypes in the overall evaluation of the patient history. The case study develops a predictive system from scratch, following the project’s methodologies and techniques (WP5), and compares its results with those suggested by human experts to assess their fairness.

Experimentation and results

Note

The experiments for this use case have become more challenging as they progressed, due to data scarcity in this scenario. Drawing meaningful conclusions from such a limited dataset would be ethically unacceptable and would pose significant risks. Therefore, the exercises currently being conducted on this use case concern: 1) the application of the TAIRA methodology, including the Question-0 methodology; 2) experimentation with GenAI for the automatic translation of the “notebooks containing guidelines for the identification of abuse and mistreatment” owned by the hospital.

Use case S2: Unfair Inequality in Education

Context

Academic performance in primary school is a good predictor of an individual’s future income and well-being [11]. Anticipating low academic performance levels is relevant to implementing corrective policies at early ages, and anticipating high academic performance is also relevant to applying incentive mechanisms to achieve excellence [12].

To achieve this, it is necessary to have predictive models that are good in terms of accuracy. However, it is equally important for these models to generate fair predictions [13]: they should provide consistent predictions across different groups, ensuring, for instance, that excellence prizes are awarded without disparity related to students’ social background or their parents’ educational levels.

This use case exploits an extensive dataset covering students from the Las Palmas de Gran Canaria and Santa Cruz de Tenerife provinces, Canary Islands, Spain. This dataset extends beyond basic student information, encompassing data on their families, teachers, and academic achievements. While the raw data primarily consisted of questionnaire responses, we propose a refined dataset that has been preprocessed to ease the training of AI algorithms specifically designed to address challenging tasks, such as improving student performance and reducing drop-out rates. This focus on educational equality and fairness allows researchers to develop and test AI solutions that promote equal opportunities within the educational system.

The Agencia Canaria de Calidad Universitaria y Evaluación Educativa (ACCUEE) is a public institution in the Canary Islands that aims to monitor the quality of education services in the region. To serve this purpose, it collects data concerning students, curricula, and schools.

Concerning students, the collected data include information on their academic performance in Mathematics, Spanish Language (the local native language), and English, plus questionnaire responses from students, families, teachers, and school principals, which attempt to capture the socio-economic and cultural background of the students, as well as the situation of their schools. The data is collected from students in the 3rd and 6th grades of primary education, and in the 4th grade of secondary education. The collection process was repeated over four academic years, from 2015–16 to 2018–19.

We believe the dataset may be exploited to ease the life of students, hence improving the overall quality of the education system. The goal would be to identify the key predictors of academic performance, with a focus on factors that may lead to poor performance or drop-out. Identifying these factors is a fundamental step towards developing interventions that can help students in need, hence promoting fair access to education and ultimately enhancing student success. Interventions here may include: personalized academic support such as tutoring and mentoring, academic counselling to help students develop effective study habits, allocation of essential educational resources on a per-need basis, and remedial coursework or enrichment programs to strengthen skills.

Goal

To link the outcomes of predictive models and their degree of bias to the socioeconomic concept of inequality of opportunities [14] reflected in the data and real life. Inequality of opportunity in educational achievement is measured as the inequality that is explained by factors that are beyond the control of the individual, such as the socioeconomic status of the parents, the cultural environment at home, the immigrant status, the state of health at birth, the neighborhood of birth, etc. (in the terminology of the AI-fairness literature these are the sensitive variables).

In particular, we analyze how certain sensitive variables (referred to as circumstances in economics) that explain a relevant percentage of the inequality in educational achievement end up generating unfair predictions, insofar as they influence the predictive models directly or indirectly (i.e., through other predictors).
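A minimal sketch of this analysis, in the spirit of ex-ante inequality-of-opportunity measurement: regress the model’s predictions on the circumstance variables and read the explained share of variance as a signal of unfair inequality. The file name, column names, and the set of circumstances are illustrative assumptions.

```python
# Sketch: estimate the share of variation in model predictions explained by
# circumstance (sensitive) variables. Names are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("students.csv")                # hypothetical file name
circumstances = ["parental_education", "household_income", "immigrant_status"]

X = pd.get_dummies(df[circumstances], drop_first=True)
y = df["predicted_score"]                       # predictions of the model under audit

reg = LinearRegression().fit(X, y)
r2 = reg.score(X, y)
print(f"Share of prediction variance explained by circumstances: {r2:.2%}")
# A high share indicates that circumstances drive the predictions, directly
# or indirectly through correlated predictors, signalling unfair inequality.
```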

Known biases and unfairness

  1. Halo Effect: From teachers towards students, inferring skills, abilities, or attributes of a person based on a first impression (e.g., socioeconomic background, race, gender, etc.). Specifically, when marking is based on behaviours or characteristics of a student unrelated to the elements to be assessed within a subject.

  2. Confirmation Bias: Both by teachers and students. The tendency to give rigor and truthfulness to data, ideas, or reasoning that align with what we believe to be true. Conversely, ignoring data, ideas, or explanations that cast doubt on our beliefs.

  3. Status Quo Bias: Resistance to educational innovation by teachers. Preferring the environment and situation to remain stable. More value is placed on potential losses from change or innovation than on potential future gains.

  4. Sampling Bias: Some students may not attend school on the day of the assessments, or may not answer the questionnaires. Usually, these students come from more disadvantaged contexts, implying samples with an underrepresentation of lower-class households or low-performing students.

  5. Selection Bias: Students and teachers are not randomly assigned to schools. Rather, more motivated parents and teachers select better schools, generating self-selection problems (endogeneity). As a consequence, schools’ value-added estimates can be biased, because better performance can be driven by the presence of better students and teachers.

Method

Data Collection

The raw data collected by the ACCUEE comes in tabular form: a table with 83,857 rows and 554 columns. Each row refers to a single student at a given grade and academic year. Primary education data for the A.Y. 2015–16 and 2018–19 is gathered through a comprehensive census of the entire population. For other grades and academic years, the data is collected through sampling. Longitudinal data is also included: students in 3rd grade (primary school) during A.Y. 2015–16 are sampled again in their 6th grade.

The columns of the table represent relevant features collected for each student, grouped into six categories:

  1. Identifiers (8 columns): various sorts of identifiers involving the student (e.g., at the school or ACCUEE level) and their academic situation (school, grade, academic year).

  2. Performance features (6 columns): the student’s scores in Mathematics, Spanish Language, and English.

  3. Students’ answers (154 columns) to a questionnaire concerning their own experiences with the school system (including, but not limited to, their access to resources, their relationship with teachers, and their satisfaction level).

  4. Principals’ answers (138 columns) to a questionnaire concerning the school the student is enrolled in.

  5. Families’ answers (91 columns) to a questionnaire concerning the socio-economic conditions of the student.

  6. Teachers’ answers (158 columns) to a questionnaire concerning their workload, satisfaction, and methodology.

Questionnaires were standardized; consequently, the features obtained from them are either categorical or numerical. However, the data is not clean as missing values are present, and they are not evenly distributed across the features. Given the high dimensionality of the dataset, preprocessing is necessary to make it more manageable and suitable for the tasks we intend to address.
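A first-pass inspection along these lines might look as follows; the file name is an assumption, and the 50% missingness threshold is only one example of a possible reduction step.

```python
# Sketch: first-pass inspection of the raw ACCUEE table before preprocessing.
# The file name and the missingness threshold are illustrative assumptions.
import pandas as pd

raw = pd.read_csv("accuee_raw.csv", low_memory=False)   # hypothetical file name
print(raw.shape)                                        # rows x columns

# Missing values are unevenly distributed: list the most incomplete columns.
missing_share = raw.isna().mean().sort_values(ascending=False)
print(missing_share.head(20))

# One simple reduction step: drop columns missing in more than half the rows.
reduced = raw.loc[:, missing_share <= 0.5]
print(reduced.shape)
```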

Dataset Structure and Pre-Processing

The dataset has been released and is publicly available, adding to the fairness benchmarks available in the field. It can be found here: https://zenodo.org/records/11171863.

For more information about the data and preprocessing, see https://zenodo.org/records/11171863 and https://ceur-ws.org/Vol-3808/paper17.pdf.

Experimentation and results

Experiments conducted within the AEQUITAS framework and experimentation environment, leading to the best solution for ULL, can be found here:

References

Domain: Healthcare

Use case HC1: AI assisted identification of dermatological disease for diversity and inclusion in dermatology

Context

There are many areas where AI can assist dermatological experts, such as computer-aided detection/diagnosis, disease prediction, image segmentation, etc. [18]. The most successful AI applications to dermatology involve processing images and making automated decisions based on images of skin patches, e.g., distinguishing between images portraying healthy skin from images containing dermatological conditions [28].

Dermatology is among the areas which can benefit from data-driven models, as the first step of identifying skin diseases typically consists of visual inspection (possibly followed by further analyses) and AI approaches are well-suited to classify images—if provided with sufficient training data.

One of the greatest success stories of AI is image classification and image manipulation, in particular through data-driven approaches such as ML and Deep Learning (DL) [4]. Computer vision boosted by DL has been employed in a variety of medical contexts, including dermatology, covering several tasks from disease classification using clinical images to disease classification using dermatopathology images [3] [5] [10]. One of the biggest limitations of the widespread adoption of DL techniques is their data-hungry nature. Generally speaking, DL models base their success on the availability of large, annotated data sets, e.g., thousands of different images containing various examples of healthy skin and dermatological diseases.

AI can also come to the rescue in removing this obstacle: in recent years, great strides have been made toward synthetic medical image generation through DL approaches [29] [26], in particular using DL models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models [7]. However, the majority of these data augmentation techniques (with few exceptions, e.g., [2] [9]) do not target skin images, but rather focus on MR images, PET and CT scans, radiography, etc. [17]. The synthetic generation of clinical skin images with pathology, instead, aims at generating realistic and diverse images depicting various skin conditions and pathological patterns [15] [20]. The goal is to capture the complexity and visual characteristics of different skin conditions, including dermatological diseases, lesions, and abnormalities [1].

To achieve this, researchers employ various strategies, including the incorporation of domain knowledge, data augmentation techniques, and conditioning methods that guide the generation process based on additional information or attributes. Existing approaches in the field have predominantly relied on GANs or VAEs. These methods have proven effective in generating high-quality samples and learning latent representations; however, they tend to require several thousand training images to learn the features of skin with and without pathological conditions. In this use case, we circumvent this issue and propose to generate realistic skin images with a diffusion model and a very scarce training set, i.e., a few hundred pictures. We validate our approach using real images taken from a public hospital in Italy (IRCCS Azienda Ospedaliero-Universitaria di Bologna); the code used to implement the approach and run the experiments is publicly available (https://github.com/aequitas-aod/experiment-gen-skin-images). To the best of our knowledge, very few approaches in the literature employ diffusion models in this context and demonstrate their suitability even with a very small training set.
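To give a flavour of the approach, the sketch below shows the core noise-prediction training step of a DDPM-style diffusion model, using the Hugging Face diffusers and PyTorch libraries. It is a simplified illustration, not the released implementation (see the repository linked above); image loading, augmentation, and the sampling loop are omitted, and all hyperparameters are illustrative.

```python
# Minimal sketch of diffusion-model training on small skin-patch batches.
# Hyperparameters are illustrative; this is not the released implementation.
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

model = UNet2DModel(sample_size=64, in_channels=3, out_channels=3)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(clean_images: torch.Tensor) -> torch.Tensor:
    """One noise-prediction step on a batch of skin patches in [-1, 1]."""
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (clean_images.shape[0],), device=clean_images.device,
    )
    noisy = scheduler.add_noise(clean_images, noise, timesteps)
    noise_pred = model(noisy, timesteps).sample
    loss = F.mse_loss(noise_pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```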

Goal

The goal of this use case is to develop a synthetic image generator that can generate synthetic images of skin with pathologies in a way that captures the complexity and visual characteristics of different skin conditions, including dermatological disease, lesions, and abnormalities [1].

Method

Data, Analysis, Pre-processing

Raw data

The initial raw dataset comprises a collection of 2495 images, belonging to 187 different medical cases, captured by doctors in hospitals using cameras or mobile phone cameras. Each photograph focuses on a specific patient, with the corresponding label indicating the diagnosed skin disease for that individual. At this stage, the data is in an unprocessed and sensitive format, frequently encompassing personal details, including identifiable facial features or recognizable characteristics. Additionally, although the photos primarily capture a specific body part related to the patient’s condition, they frequently include extraneous background elements or foreground clutter that are unrelated to the main focus. Considering these factors, an initial round of data refinement was necessary, primarily focusing on data cleaning, which involved removing irrelevant regions such as background elements from numerous photos, and ensuring data anonymization. The initial step of the data refinement process involved a manual labeling procedure, consisting of the following steps:

– Patch extraction: for each image, multiple patches of varying sizes were extracted. The guiding principle for this procedure was to extract patches that were relevant, specifically targeting the largest skin-only patch that contained a significant portion of diseased skin. Throughout this process, special care was taken to exclude personally identifiable regions of the body, including facial features and other sensitive areas like skin marks.

– Mask labelling: for each of the previously extracted patches, a binary mask was created. The purpose is to indicate which areas of the skin within the patch exhibited the presence of the disease.

Fixed-size patches generation

Although the data has undergone significant cleaning, its variable-sized nature prevents its direct utilization by ML models. As a result, a second round of refinement was implemented, this time employing an automatic approach. The primary objective at this stage is to extract fixed-size patches from the existing variable-sized ones, ensuring that diseased skin regions are adequately represented.
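A minimal sketch of such an automatic extraction step, assuming NumPy arrays for the image and its binary disease mask; the patch size and the minimum share of diseased pixels are illustrative assumptions.

```python
# Sketch: slide a window over a variable-sized patch and keep fixed-size
# crops where the mask shows enough diseased skin. Parameters are assumptions.
import numpy as np

def extract_fixed_patches(image: np.ndarray, mask: np.ndarray,
                          size: int = 64, min_disease_ratio: float = 0.2):
    """Yield (patch, patch_mask) pairs of spatial shape (size, size)."""
    h, w = mask.shape
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            m = mask[top:top + size, left:left + size]
            if m.mean() >= min_disease_ratio:      # enough diseased pixels
                yield image[top:top + size, left:left + size], m
```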

Coloured mask generation

Even after implementing the aforementioned refinement steps, a significant portion of the extracted patches still exhibited undesirable properties, such as poor lighting or blurriness. To address this, an automatic filtering process was applied to eliminate patches with low contrast, a common feature of both poorly exposed and blurry images. For each of these patches, a binary mask was generated based on the provided labelling. In addition, inspired by the approach presented in [9], we extract a coloured mask as well. This coloured mask was obtained by filling the 0 and 1 regions of the binary mask with the dominant colour found in the corresponding part of the original image.

Following these procedures, a total of 8,204 patches were successfully extracted from the initial set of 284 variable-sized images; however, 1,118 patches were discarded due to inadequate contrast. Despite these refinement steps, the dataset cannot be considered entirely clean. Upon closer examination, a significant number of images still suffered from poor exposure, and the quality of the masks was often suboptimal. These factors must be taken into account when applying any type of model to this dataset.
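The contrast filter and the coloured-mask construction can be sketched as follows; the contrast threshold is an illustrative assumption, and the dominant colour is approximated here by the mean colour of each region.

```python
# Sketch: low-contrast filtering and coloured-mask construction.
# Threshold and the mean-colour approximation are illustrative assumptions.
import numpy as np

def has_enough_contrast(patch: np.ndarray, threshold: float = 20.0) -> bool:
    """Discard poorly exposed or blurry patches with a low grey-level spread."""
    grey = patch.mean(axis=-1)
    return (grey.max() - grey.min()) >= threshold

def coloured_mask(patch: np.ndarray, binary_mask: np.ndarray) -> np.ndarray:
    """Fill the 0 and 1 regions of the binary mask with the dominant colour
    (approximated by the mean) of the corresponding region in the patch."""
    out = np.zeros_like(patch)
    for value in (0, 1):
        region = binary_mask == value
        if region.any():
            out[region] = patch[region].mean(axis=0)
    return out
```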

For more information, please refer to https://link.springer.com/chapter/10.1007/978-3-031-63592-2_5.

Experimentation and results

Experiments conducted within the AEQUITAS framework and experimentation environment, leading to the best solution for HC1, can be found at the following links.

References

Use case HC2: Bias-aware prediction of ECG healthcare outcomes

Context

Electrocardiograms (ECGs) are the gold standard in medicine for an overall and fine-grained assessment of the condition of the heart. The deflections in the so-called PQRST complex, which represents a single heartbeat, are connected to the sequence of contractions of the heart muscle. Deviations in those deflections are directly tied to cardiovascular diseases like (ventricular) arrhythmia, myocarditis, myocardial fibrosis, or even inherited or acquired defects. The classification of ECG traces as symptomatic or normal is typically done by experts. AI algorithms trained on existing data can be used to automate this process. However, these training data may suffer from various forms of bias, which in turn are entrained into the resulting AI algorithm. To prevent the algorithm from disadvantaging particular groups by generating false positives or false negatives, we propose to investigate if and how such AI algorithms are affected by bias. To this end, we will determine whether bias exists (or existed) in the original data sets and mitigate that bias by generating synthetic, non-biased data. Additionally, instead of trying to remove bias from the data, we will deliberately introduce bias to determine whether the algorithm can handle it and still generalizes well. Developing methods to generate synthetic ECG data will be part of this process.

Goal

To build a bias-aware algorithm for the classification of ECG traces as normal or symptomatic.

Method

We will first investigate methods suited to generating synthetic ECG data with or without deliberately introduced bias. Once synthetic ECG data with one or more deliberately introduced (additional) biases are available, the data will be used to determine whether a Philips proprietary solution is sensitive to bias. This solution evaluates ECG traces to classify, beat by beat, whether the beats are normal or affected by disease. Due to bias in the original training data, these evaluations can be biased as well. To this end, PRE will use the Aequitas experimentation platform on premises to validate this use case.
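As a simple illustration of deliberately introduced bias, the sketch below generates toy PQRST-like beats from Gaussian deflections and systematically lowers the R-wave amplitude for one subgroup; it is not the Philips solution or the AEQUITAS synthesizer, and all wave shapes and parameters are illustrative assumptions.

```python
# Sketch: toy synthetic PQRST-like beats with a deliberately biased subgroup.
# Shapes and parameters are illustrative, not a validated ECG simulator.
import numpy as np

def synthetic_beat(fs: int = 250, r_amplitude: float = 1.0) -> np.ndarray:
    """One beat built from Gaussian deflections for the P, Q, R, S, T waves."""
    t = np.linspace(0.0, 1.0, fs)
    waves = [                          # (centre, width, amplitude)
        (0.20, 0.025, 0.15),           # P
        (0.37, 0.010, -0.10),          # Q
        (0.40, 0.012, r_amplitude),    # R
        (0.43, 0.010, -0.20),          # S
        (0.65, 0.040, 0.30),           # T
    ]
    return sum(a * np.exp(-((t - c) ** 2) / (2 * w ** 2)) for c, w, a in waves)

rng = np.random.default_rng(0)
beats, groups = [], []
for _ in range(1000):
    group = rng.choice(["A", "B"])
    # Deliberately introduced bias: group "B" gets a systematically lower
    # R-wave amplitude, so a classifier's sensitivity to this can be probed.
    amp = 1.0 if group == "A" else 0.7
    beats.append(synthetic_beat(r_amplitude=amp + rng.normal(0, 0.05)))
    groups.append(group)
```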