9.1: Case Study- OpenAI’s Data Labelling

    OpenAI, the company behind the enormously popular AI chatbot ChatGPT, has made great strides in the past 12 months across many forms of artificial intelligence. However, some of these achievements have raised significant ethical concerns about the exploitation of human labour and the handling of harmful content. This case study explores the findings of an excellent piece of investigative journalism published in Time magazine in early 2023.

    Read the article: OpenAI Used Kenyan Workers on Less Than $2 Per Hour

    GPT-3 was designed to demonstrate exceptional linguistic abilities, stringing together sentences in a strikingly human-like manner. It was trained on hundreds of billions of words scraped from the internet, a vast corpus of human language that I’ve written about in other posts. This method endowed GPT-3 with impressive language-processing skills, but it also became the model’s greatest weakness, as the internet’s toxicity and bias were absorbed into its training data and reproduced in its output.

    To tackle these challenges, OpenAI aimed to construct an AI-powered safety mechanism, akin to the systems deployed by social media companies like Facebook to detect and remove hate speech and other forms of toxic language. The premise was straightforward: feed an AI with labelled examples of violence, hate speech, and abuse, and this tool could learn to identify and eliminate these forms of toxicity.
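The supervised approach described above can be illustrated with a toy bag-of-words Naive Bayes classifier. This is purely a sketch: the two labels, the tiny hand-written dataset, and the model choice are hypothetical stand-ins for illustration, not OpenAI’s actual system, which operates on far larger datasets with a much richer labelling taxonomy.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy dataset of (snippet, human-assigned label) pairs.
# Real moderation systems are trained on vastly larger labelled corpora.
TRAINING_DATA = [
    ("I will hurt you", "toxic"),
    ("you are worthless and stupid", "toxic"),
    ("I hate you so much", "toxic"),
    ("have a wonderful day", "safe"),
    ("thank you for your help", "safe"),
    ("what a lovely morning", "safe"),
]

def train(examples):
    """Fit a bag-of-words Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> per-word frequencies
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Return the most probable label, using Laplace (add-one) smoothing."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior for the label plus log likelihood of each word.
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_DATA)
print(classify("you are stupid", *model))    # → toxic
print(classify("have a lovely day", *model)) # → safe
```

The key point the case study turns on is that every row in the training data had to be read and labelled by a human being; the quality of the classifier depends entirely on that labour.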

    In November 2021, OpenAI began the process of creating this safety system. They sent tens of thousands of snippets of text to an outsourcing firm in Kenya, Sama. The text was pulled from various internet sources, including extremely harmful content describing graphic situations of abuse, murder, and self-harm. Sama, a San Francisco-based company, employs workers in Kenya, Uganda, and India to label data for Silicon Valley clients like Google, Meta, and Microsoft. While it brands itself as an “ethical AI” company and boasts of lifting over 50,000 people out of poverty, there are concerning elements surrounding its operations.

    Sama’s data labellers, who were contracted to work on behalf of OpenAI, earned a take-home wage of approximately $1.32 to $2 per hour, depending on seniority and performance. This rate was for work that involved labouring over harmful, potentially traumatising content. To understand the full extent of the trauma experienced by these workers, read the original article in Time magazine.

    The case of OpenAI’s development of GPT-3 and its associated safety mechanism serves as an instructive example of the ethical challenges that permeate the AI industry. As technology companies continue to pursue advancements in AI, it is critical to scrutinise the labour practices that underlie these developments and to ensure that the quest for “ethical AI” does not overlook the wellbeing and fair treatment of the human workforce powering it.

    9.1: Case Study- OpenAI’s Data Labelling is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by LibreTexts.
