How Your Data Affects AI

Having previously established the benefits your data grants AI, this page shows the negative influences AI can be subject to, including, but not limited to, those arising from the user data used to train it.

Bias in AI
User data can mislead AI models through bias in the data used for training, improper collection practices, and bias in the people responsible for validating the model and data. These factors can also be introduced through intentional, antagonistic attacks. Check out the NIST video to the right to learn more.
Data Collection
Web scraping and web crawling automate the collection of the large quantities of data needed to train an AI model, but they limit control over, and consideration of, what data is used. Data collected this way may not accurately represent the population of interest, creating bias.
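
As a rough sketch of how this goes wrong, consider a crawler that draws training documents from whichever sites are easiest to reach. The Python below is purely illustrative; the site names, group labels, and proportions are invented assumptions, not real measurements.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical population of interest: documents split 50/50 between two groups.
TRUE_POPULATION = {"group_A": 0.5, "group_B": 0.5}

# The crawler only reaches a few large, easy-to-scrape sites, and those sites
# over-represent group A (all proportions are invented for illustration).
CRAWLED_SITES = {
    "big_forum.example": {"group_A": 0.8, "group_B": 0.2},
    "news_site.example": {"group_A": 0.7, "group_B": 0.3},
}

def crawl(docs_per_site=5000):
    """Simulate scraping the same number of documents from each crawled site."""
    sample = []
    for mix in CRAWLED_SITES.values():
        groups, weights = list(mix), list(mix.values())
        sample.extend(random.choices(groups, weights=weights, k=docs_per_site))
    return sample

sample = Counter(crawl())
total = sum(sample.values())
print("True population:", TRUE_POPULATION)
print("Scraped sample :", {g: round(c / total, 2) for g, c in sample.items()})
# The scraped sample comes out roughly 75% group A even though the real
# population is 50/50, so a model trained on it inherits that skew.
```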
Security
The process of training AI is vulnerable to attack at several stages, through which bias can be introduced.

Through data poisoning, an attacker alters the data used to train the model in order to disrupt how the model learns. In "Protecting AI: We Built the Brains, but What About Helmets," Mark Campbell uses the example of an attacker labeling images of pigeons as passenger airliners in the data used to train a model to recognize airliners.
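
A minimal sketch of that style of label-flipping attack, assuming the attacker can modify the training set before the model sees it. The dataset, labels, and flip rate below are hypothetical and exist only to show the mechanism.

```python
import random

random.seed(1)

def poison_labels(dataset, source_label, target_label, flip_rate=0.5):
    """Return a copy of the dataset in which a fraction of `source_label`
    examples have been relabeled as `target_label` (label flipping)."""
    poisoned = []
    for features, label in dataset:
        if label == source_label and random.random() < flip_rate:
            label = target_label  # the attacker rewrites the label
        poisoned.append((features, label))
    return poisoned

# Toy training set: (image identifier, label). Image contents are placeholders.
clean_data = [(f"img_{i}", "pigeon" if i % 2 else "airliner") for i in range(10)]
poisoned_data = poison_labels(clean_data, "pigeon", "airliner")

for (name, before), (_, after) in zip(clean_data, poisoned_data):
    if before != after:
        print(f"{name}: {before} -> {after}")
# A model trained on poisoned_data is taught that some pigeons "are"
# airliners, degrading its ability to recognize real airliners.
```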

Through data biasing, the selection criteria for the model's training data are altered so that it learns with bias. Campbell uses the example of skewing the training data to omit all Airbus aircraft. The model would show no evident issues in most cases, but it would fail to properly identify Airbus planes.
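
Data biasing can be sketched the same way, except nothing in the remaining data is mislabeled; the attacker simply narrows the selection criteria so one class never appears. The manufacturer field and examples below are hypothetical.

```python
def bias_selection(dataset, excluded_manufacturer="Airbus"):
    """Keep only the examples that do NOT come from the excluded manufacturer,
    silently limiting what the model can ever learn about."""
    return [(features, label) for features, label in dataset
            if features.get("manufacturer") != excluded_manufacturer]

training_data = [
    ({"manufacturer": "Boeing", "type": "747"}, "airliner"),
    ({"manufacturer": "Airbus", "type": "A320"}, "airliner"),
    ({"manufacturer": "Boeing", "type": "737"}, "airliner"),
    ({"manufacturer": "Airbus", "type": "A380"}, "airliner"),
]

biased_data = bias_selection(training_data)
print(len(training_data), "examples before filtering,", len(biased_data), "after")
# Routine evaluation on Boeing aircraft still looks fine, so the bias is easy
# to miss, but the model has never seen an Airbus and will misidentify them.
```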
Examples of Bias in AI
Using biased arrest data to train AI tools that allocate police patrols to districts, such as PredPol, creates a feedback loop. When data containing more arrests in minority neighborhoods is used to train such a tool, the algorithm directs more policing to those areas, resulting in even more arrests. Similar results arise from training on other kinds of biased data.
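
The loop can be made concrete with a small simulation. Every number below is invented for illustration; the point is only the structure: patrols follow past arrests, and arrests follow patrols.

```python
# Assume the true crime rate is identical in both districts; only the
# historical arrest counts differ because district A was policed more heavily.
arrests = {"district_A": 120, "district_B": 60}
TOTAL_PATROLS = 100
ARRESTS_PER_PATROL = 0.5  # arrests recorded per patrol, same in both districts

for round_number in range(1, 6):
    total_arrests = sum(arrests.values())
    # Allocate patrols in proportion to past arrest counts (the "model").
    patrols = {d: TOTAL_PATROLS * n / total_arrests for d, n in arrests.items()}
    # More patrols in a district lead to more arrests recorded there.
    for district in arrests:
        arrests[district] += patrols[district] * ARRESTS_PER_PATROL
    share_a = patrols["district_A"] / TOTAL_PATROLS
    print(f"round {round_number}: {share_a:.0%} of patrols sent to district_A")
# District A keeps receiving about two-thirds of all patrols in every round,
# even though the underlying crime rates were assumed equal: the model's own
# output generates the arrest data it is retrained on, so the initial skew
# never washes out.
```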

At the University of Texas, a predictive algorithm based on witness reports was built using the same model as PredPol. The AI tool in question underpredicted crime locations by 80% in a district with few reports, and overpredicted by 20% in a district with many reports.
Source: MIT Technology Review
Starting in 2014, a series of Amazon machine learning tools was designed to select job applicants by observing patterns in resumes submitted to the company over a 10-year period. Most of these resumes came from men, leading the AI to conclude that male candidates were preferable and to penalize resumes that included the word "women's".

The project was later restarted with 500 computer models focused on specific job functions and locations. These models learned to recognize around 50,000 terms from past candidates' resumes, from web-crawled data, and from each other. The models gave little weight to commonly listed skills such as coding, instead favoring candidates who used verbs more common on male engineers' resumes, such as "executed" and "captured". Similar problems led to unqualified people being recommended for a variety of jobs, and the project was shut down.
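
A toy version of how a resume screener can absorb this kind of proxy bias: if the historical hiring data skews male, any term correlated with gender picks up a negative association even though gender itself is never a feature. The resumes, terms, and hire outcomes below are entirely made up.

```python
from collections import defaultdict

# Hypothetical historical data: (terms appearing on a resume, whether hired).
past_resumes = [
    ({"executed", "captured", "coding"}, True),
    ({"executed", "coding"}, True),
    ({"executed", "captured", "coding"}, True),
    ({"coding", "women's"}, False),
    ({"coding", "women's"}, False),
    ({"coding"}, True),
]

# "Train" the simplest possible screener: for each term, the fraction of
# past resumes containing it that resulted in a hire.
seen = defaultdict(int)
hired = defaultdict(int)
for terms, was_hired in past_resumes:
    for term in terms:
        seen[term] += 1
        hired[term] += was_hired

scores = {term: hired[term] / seen[term] for term in seen}
for term, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{term:10s} historical hire rate: {score:.0%}")
# "executed" and "captured" look like strong positive signals, "coding" adds
# nothing beyond the base rate because nearly everyone lists it, and "women's"
# is penalized outright -- purely because the training data was already skewed.
```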

Source: Reuters

In February 2024, Google's AI image generation tool Gemini was found to have been tuned to depict a greater range of ethnicities and characteristics. Because this tuning placed no limits on which ranges of people applied to which cultural or historical contexts, it produced stereotypically diverse depictions regardless of the context provided by users.
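
As a hypothetical illustration of context-blind tuning (not Google's actual, unpublished implementation), imagine a prompt-rewriting step that appends a diversity instruction to every image request without checking whether the request already specifies a historical or cultural context:

```python
DIVERSITY_SUFFIX = ", depicting people of a wide range of ethnicities and genders"

def rewrite_prompt_context_blind(user_prompt: str) -> str:
    """Append the diversity instruction to every prompt, unconditionally."""
    return user_prompt + DIVERSITY_SUFFIX

def rewrite_prompt_context_aware(user_prompt: str) -> str:
    """Leave prompts alone when they pin down a specific historical context
    (a crude keyword check, purely for illustration)."""
    historical_markers = ("medieval", "1800s", "founding", "viking")
    if any(marker in user_prompt.lower() for marker in historical_markers):
        return user_prompt
    return user_prompt + DIVERSITY_SUFFIX

print(rewrite_prompt_context_blind("a portrait of a medieval European king"))
# The context-blind version injects the instruction even here, which is how
# depictions can end up ignoring the context the user actually provided.
```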

The chatbot portion of Gemini was also tuned to respond more cautiously, causing it to avoid answering obvious questions directly whenever morality was involved. In this case, a bias among those tuning or validating the model toward safe responses that would not draw criticism to Google may have led to inaccuracies.

Source: Vox

Citations
1. IBM Data and AI Team. “Shedding Light on AI Bias with Real World Examples.” IBM Blog, 16 Oct. 2023, www.ibm.com/blog/shedding-light-on-ai-bias-with-real-world-examples/.
2. Heaven, Will Douglas. “Predictive Policing Is Still Racist - Whatever Data It Uses.” MIT Technology Review, 5 Feb. 2021, www.technologyreview.com/2021/02/05/1017560/predictive-policing-racist-algorithmic-bias-data-crime-predpol/.
3. Dastin, Jeffrey. “Insight - Amazon Scraps Secret AI Recruiting Tool That Showed Bias against Women.” Reuters, www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G/. Accessed 10 May 2024.
4. Hern, Alex, and Dan Milmo. “Google Chief Admits ‘Biased’ AI Tool’s Photo Diversity Offended Users.” The Guardian, Guardian News and Media, 28 Feb. 2024, www.theguardian.com/technology/2024/feb/28/google-chief-ai-tools-photo-diversity-offended-users.