You might remember a spate of news stories last year about Google Translate spitting out ominous chunks of religious prophecy when presented with nonsense words and phrases to translate. Clickbait sites suggested it might be a conspiracy, but no, it was just Google’s machine learning systems getting confused and falling back on the data they were trained on: religious texts.
But as the head of Google Translate, Macduff Hughes, told The Verge recently, machine learning is what makes Google’s ever-useful translation tools really sing. Free, easy, and instantaneous translation is one of those perks of 21st-century living that many of us take for granted, but it wouldn’t be possible without AI.
Something curious happened last year with Google Translate: people discovered that if you input nonsense words, it would spit out snippets of religious text. […] Why did it happen?
Usually it’s because the language you’re translating to had a lot of religious text in the training data. For every language pair we have, we train using whatever we can find on the world wide web. So the typical behavior of these models is that if it gets gibberish in, it picks out something that’s common in the training data on the target side, and for many of these low-resource languages — where there’s not a lot of text translated on the web for us to draw on — what is produced often happens to be religious.
For some languages, the first translated material we found was translations of the Bible. We take whatever we can get, and that’s usually fine, but when gibberish goes in, this is often the result. If the underlying translation data had been legal documents, the model would have produced legalese; if it had been aircraft flight instruction manuals, it would have produced aircraft flight instructions.