The Bias of Google Translate

Issue 6, Volume 113

By Stefanie Chen 

All you want to do is translate a message. Maybe it’s your fault you’ve never learned how to read Turkish, but you’re pretty sure that your mom isn’t congratulating a random guy on becoming a doctor. Unlike English, many languages don’t draw a clear gender distinction between pronouns. For example, Turkish uses “o” as a pronoun for both genders. So when you input your mom’s message, why does Google Translate spit out, “He has finally become a doctor!”?

Google Translate typically returns a single translation, so for languages in which one pronoun is used for both genders, it chooses one of the pronouns instead of returning “he/she.” The selection depends heavily on the context of the sentence, and that selection process is rooted in gender bias.

Google Translate uses a Neural Machine Translation (NMT) system, an artificial neural network capable of deep learning that accumulates knowledge by imitating human behavior in millions of examples. While gender bias is not specifically programmed into the translation system, the existing translations that the neural network has received and relies on do not always represent both genders in a given context. Political, social, and cultural biases are often embedded within the examples provided to Google’s NMT, resulting in translations that reflect these viewpoints.
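To see how a system can pick up bias without anyone programming it in, consider a deliberately simplified sketch. It is not how Google’s NMT works internally (a real neural network learns far subtler patterns than a tally), and the example pairs and function names below are invented for illustration. The point is only that a system that imitates its examples will repeat whatever imbalance those examples contain.

```python
from collections import Counter, defaultdict

# Invented toy "training data": (occupation, pronoun a human translator chose).
# These pairs are made up for illustration and are not real Google data.
EXAMPLES = [
    ("doctor", "he"), ("doctor", "he"), ("doctor", "she"),
    ("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
]


def learn_pronoun_counts(examples):
    """Tally how often each pronoun appears with each occupation."""
    counts = defaultdict(Counter)
    for occupation, pronoun in examples:
        counts[occupation][pronoun] += 1
    return counts


def pick_pronoun(counts, occupation):
    """Return the pronoun seen most often with this occupation."""
    return counts[occupation].most_common(1)[0][0]


counts = learn_pronoun_counts(EXAMPLES)
print(pick_pronoun(counts, "doctor"))  # he
print(pick_pronoun(counts, "nurse"))   # she
```

Nothing in this code mentions gender roles, yet the output follows the stereotype because the examples do.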

The most famous example of this bias is the translation of gender-neutral pronouns from Turkish to English. Google Translate renders the Turkish phrase for “he/she is a doctor” as “he is a doctor,” while the phrase for “he/she is a nurse” becomes “she is a nurse.” According to a study by Cornell University in 2016, the occupations of female historians and male nurses do not exist to Google Translate. For the same reason, descriptive phrases such as “he/she is beautiful” and “he/she is clever” are also translated based on stereotypes, leading to a “she” who is beautiful and a “he” who is clever. In a 2017 experiment, Nikhil Sonnad entered into Google Translate a Turkish poem made up of jobs and adjectives commonly associated with a certain gender. Unsurprisingly, the gender-neutral Turkish “o” turned into the English “he” when associated with words such as “entrepreneur,” “hardworking,” and “president.” On the other hand, Google Translate ascribed “she” to words such as “nanny,” “lazy,” and “teacher.” Words with negative connotations tended to be assigned to women, while those used positively tended to go to men.

This pattern mirrors our society, where men are typically viewed as more hardworking and ambitious, going out to work nine-to-five jobs to support their families. In contrast, women tend to be viewed in a negative light as lazy and incompetent. Google Translate seems to reflect these stereotypes, as its deep learning technology emulates human biases and reaffirms the realities of our prejudice-laden society.

English has few gendered nouns, but many other languages do, so an English noun may be translated into either gender depending on Google Translate’s gender-biased algorithm. A 2021 study by Marcus Tomalin and other researchers randomly selected 17.2 million English-German sentence pairs from an existing collection of English-German text. The study found vast gender-specific imbalances in how human translators rendered the word “engineer.” It was translated into the German masculine form, “der Ingenieur,” 75 times more often than its feminine counterpart. A similar pattern is apparent with the word “doctor,” which was translated into the masculine form 853 times and into the feminine form only 35 times. Overall, occupations that human translators in the training dataset treated as masculine or feminine were then translated into those same forms by Google Translate when no additional context was given.
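To put the “doctor” figures in perspective, a few lines of arithmetic help. The two counts come from the study described above; the function names are my own, chosen only for this illustration.

```python
def masculine_share(masculine, feminine):
    """Fraction of all translations that used the masculine form."""
    return masculine / (masculine + feminine)


def masculine_ratio(masculine, feminine):
    """How many masculine translations there are per feminine one."""
    return masculine / feminine


# Counts for "doctor" as reported in the study cited above.
DOCTOR_MASCULINE = 853
DOCTOR_FEMININE = 35

share = masculine_share(DOCTOR_MASCULINE, DOCTOR_FEMININE)
ratio = masculine_ratio(DOCTOR_MASCULINE, DOCTOR_FEMININE)

print(f"{share:.1%}")  # 96.1%
print(f"{ratio:.1f}")  # 24.4
```

In other words, roughly 96 percent of the human translations of “doctor” in the sample were masculine, or about 24 masculine translations for every feminine one.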

In recent years, however, Google has been making improvements to the biased translations produced by its NMT system. In line with its AI Principles, Google has developed an approach that involves detecting gender-neutral submissions, generating gender-specific translations, and checking for accuracy. Even though this implementation has led to a 90 percent decrease in biased English translations of Hungarian, Finnish, and Persian, it has not completely resolved the issue because of low detection rates, with up to 40 percent of eligible cases being glossed over. So while Google does have methods to counter biased translation, they aren’t reliable enough to guarantee an unbiased translation each time.
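The three steps of detecting, generating, and checking can be sketched in a few lines. This is only a toy illustration under heavy assumptions, not Google’s actual implementation: the word list, the sentence template, the fallback pronoun, and every function name are hypothetical, and a real system would use trained models for each step.

```python
# Hypothetical list of gender-neutral source pronouns; "o" is the Turkish
# pronoun discussed in this article.
NEUTRAL_PRONOUNS = {"o"}


def detect_gender_neutral(source_tokens):
    """Step 1: does the source sentence contain a gender-neutral pronoun?"""
    return any(token.lower() in NEUTRAL_PRONOUNS for token in source_tokens)


def generate_gendered(template):
    """Step 2: produce one feminine and one masculine English translation."""
    return [template.format(pronoun=p) for p in ("She", "He")]


def check_consistent(candidates):
    """Step 3 (assumed check): variants should differ only in the pronoun."""
    remainders = {candidate.split(" ", 1)[1] for candidate in candidates}
    return len(remainders) == 1


def translate(source_tokens, template, default_pronoun="He"):
    """Return both gendered translations when possible, else a single one."""
    if detect_gender_neutral(source_tokens):
        candidates = generate_gendered(template)
        if check_consistent(candidates):
            return candidates
    # Fallback: a single translation, mimicking the older one-answer behavior.
    return [template.format(pronoun=default_pronoun)]


# "O bir doktor" is Turkish for "he/she is a doctor".
print(translate(["O", "bir", "doktor"], "{pronoun} is a doctor"))
# ['She is a doctor', 'He is a doctor']
```

The sketch also shows where the weak point lies: if the first step fails to flag a sentence, the system silently falls back to a single, possibly biased, answer.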

Google has also implemented its approach in only a handful of languages, including Turkish, Spanish, Finnish, Hungarian, and Persian. While the rate of biased translations has decreased with the new method, this limited set still leaves considerable gender bias in the many other languages on Google Translate. Therefore, Google Translate needs to implement this new approach in the other languages that have trouble with gender-neutral pronouns. Another commonly suggested solution is to use “they” as the default when translating from gender-neutral languages. While this idea could work, one of its main downsides is that it reduces the clarity of a translation. “He/she” is clearly singular, but to readers who are not familiar with the source language, it may be unclear whether “they” is singular or plural. However, as this drawback is not especially serious, casual users of the service may find value in translations that use “they” rather than gender-specific pronouns.

While they may seem like nothing more than wrong translations, these biased pronoun choices reflect the roles and connotations our society associates with gender. If this bias isn’t resolved, or at least reduced, it will continue to reaffirm the normalization of gender inequality. Given the effectiveness of Google Translate’s new approach in a handful of languages, the best next step is to apply it to a wider range of languages.