The Algorithm Has an Accent

On who trained the system everyone is now selling

May 15, 2026

I’ve been spending a lot of time moving through Southeast Asia recently, which means I now belong to more group chats than is probably advisable.

Community chats. Expat chats. Women’s groups. Solo traveller groups. The kind of places you join when you’ve landed somewhere new and need to know where to buy oat milk, which dentist won’t traumatise you, and whether anyone wants dinner on a Tuesday.

At first, the AI conversations in those groups were fairly familiar. People asking what tools others were using. Swapping prompts. Sharing little productivity hacks. The usual early-adopter hum.

But lately, something has shifted.

The conversation has moved from using AI to selling it.

People who discovered ChatGPT six months ago are now packaging themselves as AI consultants. Selling courses. Offering automation systems. Teaching small businesses how to ‘get ahead’. And increasingly, the pitch is moving outward. Toward local business owners. Retailers. Tour operators. Cafe owners. People being told, with great confidence, that AI is the future and they cannot afford to be left behind.

I don’t think most of the people doing this are malicious.

That’s not really what unsettles me.

The sellers are the symptom. The deeper question is what exactly is being sold when a system trained overwhelmingly in English, shaped by Western ideas of clarity and professionalism, and evaluated mostly by English-speaking users, is handed to people whose language, humour, cadence and reasoning were barely represented inside it.

Because AI doesn’t arrive without a voice.

It arrives sounding like somewhere.

And then it teaches everyone else to sound a little more like that, too.

The system has a voice

The AI tools being marketed as global, universal, accessible to everyone, are not, in their bones, neutral technologies. They have a voice. And the voice has a passport.

According to OpenAI’s own disclosures, around 92.65% of GPT-3’s training data, measured by word count, was in English. Llama 2 was around 89.7% English in pre-training. Even Meta’s more recent Llama 3 has non-English data making up just over 5% of its dataset. The picture has improved slightly with newer models, but the foundation of every major commercial LLM is overwhelmingly anglophone. A 2024 paper looking at the Llama-2 family of models, by Wendler and colleagues at EPFL, found something quite specific. These systems don’t just speak English well. In some models, they appear to route meaning through English first. Even when the prompt is in another language, the internal reasoning steps run through English on the way to producing the output.

That’s the technical detail. What it actually means, in practice, is this. When a small business owner in a multilingual market asks an AI to draft a customer email in her own language, the system isn’t really thinking in that language. It’s thinking in English, and producing her language on the way out. The grammar will be correct. The vocabulary will be appropriate. But the cadence will be subtly off. The structures of politeness, the layers of formality, the kinds of metaphor that land in local marketing, the way disagreement or apology gets signalled in business writing - those get smoothed into something the system was trained to recognise as good. Which is to say, something that reads as English wearing other clothes.

For most users, this won’t be visible. They’ll get an output, it’ll look fine, they’ll use it. But research is starting to show what happens at scale, and the numbers are stark. A 2026 benchmark study found a consistent 13.8 to 16.7 percentage point performance gap between English and low-resource languages like Kazakh and Mongolian, with models maintaining surface-level fluency while producing significantly less accurate content. A Gates Foundation study on African languages found gaps of 12 to 20 percentage points between English and the average across 11 African languages. The IrokoBench evaluation, which tested 16 African languages across the major commercial LLMs, found an average performance gap of around 45%. Another 2025 paper found that even when models technically produce output in another language, they default to Western entities, references and reasoning patterns regardless of the prompt language.

The system isn’t malicious. It’s just that the world it was built in is a particular world, and that world is now being exported as the default.

Who decides what counts as good?

When an AI tool produces a piece of writing that feels clear, professional, persuasive, well-structured, those qualities aren’t neutral. They are inherited from the corpus the system was trained on. Clarity in English business writing is not the same as clarity in Vietnamese business writing. Professionalism in American marketing has a tone that wouldn’t read as professional in Indonesian. Persuasion in academic English uses structures that are different from persuasion in Mandarin or Bahasa.

But the system has a default. And the default has weight.

When a small business owner asks an AI to make her copy ‘more professional,’ what the system does is shift her copy toward the version of professional it was trained to recognise. Which is, more often than not, closer to an American idea of professional. She’ll learn quickly which prompts produce outputs her customers respond to, and she’ll keep prompting in that direction. Her writing will start to sound a particular way. Other small business owners using the same tools will end up sounding similar to her. Their customers will start to expect that sound. And over time, slowly and without anyone deciding it should happen, the cultural cadence of how small businesses speak to their customers shifts.

That isn’t preservation of voice. That’s homogenisation.

It isn’t catastrophic. It isn’t urgent in the way that AI safety conversations tend to be urgent. But it is happening, in real time, in millions of small interactions, in dozens of languages, and the people most likely to be reshaped by it are the people whose languages and cultures were already underrepresented in the data that shaped the system to begin with.

Who counts as the user?

There’s a question I keep coming back to that I think doesn’t get asked often enough. When an AI system gets evaluated, refined, made better, who is doing the evaluating?

The teams running reinforcement learning, the people writing the safety evaluations, the researchers calibrating tone and helpfulness, the testers deciding whether a response feels right - those people are overwhelmingly English-speaking, mostly Western, often based in San Francisco. A 2025 survey of nearly 300 LLM safety publications found that English safety research dominated the entire field across publication volume, topical coverage, methodological reporting, and conference visibility. More than half of English-only safety papers didn’t even mention that English was the language being studied. The assumption of universality is so embedded that it has stopped being visible to the people inside it.

The consequence is structural. The system improves at sounding good to people who already speak its native language. It improves more slowly, or not at all, at understanding everyone else. So when these tools get sold globally as universal, what’s being sold is a system that genuinely is better at understanding some users than others. And the users it’s better at understanding are the ones it least needed to be built for.

One Gates Foundation-backed study put this pointedly. LLMs are least reliable for the language speakers who have the most to gain. Of the world’s poor, 66% live in Africa, 66% of those don’t have access to the internet, and 80% don’t speak English. The people the technology is least built for are the same people most often told they need to adopt it to participate.

The promise of AI as democratisation has a real kernel inside it. I have written about this and I still believe it. People do access work they couldn’t before. People do find audiences they couldn’t have reached. People do build something.

But the door that opens leads into a room that was already furnished, in a style that was already chosen, by people who weren’t asked to consider whether anyone else might want different furniture.


What the group chat sellers don’t hear

This brings me back to the group chats, and to the people in them trying to package AI tools for local markets they may not fully understand yet.

I don’t think that makes them villains. In many ways, they’re responding to the same pressure everyone else is responding to: move fast, adapt quickly, monetise what you know before the ground shifts again.

But that pressure can make it easier not to ask what kind of system you are helping distribute.

What’s harder to see is that a system built in a particular language, by a particular population, calibrated by a particular set of evaluators, is being rolled out, globally, as a default. The people selling courses in co-working spaces across Southeast Asia are one small expression of that distribution. So are the enterprise contracts, consumer apps, agencies, education programmes and local entrepreneurs being trained by them. The cultural cost of it hasn’t yet started to register.

There’s also a related caution on who it is we trust to teach. Plenty of people working in and around AI right now are doing it credibly. People who have studied these systems, who understand what they’re trained on, where they fail, who they fail for. People who help companies implement them thoughtfully. I follow a lot of them and learn from them constantly.

That isn’t the version of expertise I’m cautious about.

What I’m cautious about is the version that has become available to anyone with a little familiarity, a persuasive enough pitch, and a market hungry for certainty.

We don’t accept that standard in other fields where the consequences land on someone else. AI seems to be operating, for the moment, in a kind of grace period where that standard hasn’t yet been applied.

And the cost of that grace period lands, predictably, on the people being taught to trust a tool whose voice was never quite theirs.

Permission

If you’ve used AI in a language that isn’t English and felt the output is technically correct but somehow off, that isn’t your judgement failing. That’s the system.

If your writing is starting to sound a little more even, a little more rounded, a little less you, that isn’t laziness. That’s the cadence of the training data, asserting itself through your sentences.

If you’ve watched the way people in your country write to each other quietly shift, the way emails sound similar, the way marketing copy across totally different businesses has the same beats, you aren’t imagining it. The system is doing what it was built to do, in the voice it was built to do it in.

And if you’ve been wondering whether the global enthusiasm for these tools is making space for everyone, or quietly making everyone fit into a smaller space, that question deserves to be asked out loud.

Because these systems don’t just translate language. They export norms - what clarity sounds like, what professionalism sounds like, what persuasion sounds like, and what intelligence is supposed to sound like.

The algorithm has an accent. It’s worth knowing whose, before you start sounding like it.

Note: In this piece, I write from a systems and communication perspective, not as a linguist. But if you are interested in the topic, as I am, I’ve found The Strategic Linguist’s work especially useful for thinking more deeply about the power of language. I highly recommend subscribing to her work.

The Strategic Linguist

It’s kinda nice to read the things I want to read written by one of my favs! This is such a great piece, Jade. And I’m so jealous of your travelling. It gives you such a different perspective seeing this on the ground vs knowing it’s happening but you don’t have proof.

You’re bringing up all the right questions and all the right angles to a technology that requires deeper understanding of its impacts not just what it is. Thank you for fighting the fight on this side of the conversation.

The more people outside of linguistics who understand this the better. I’m grateful for all your insights and research that unpacks the colonialisation we’re seeing in the digital age with language.


Sofi

Great piece. I've noticed this a lot when I travel too. I think the point that AI doesn’t just translate language, but carries cultural assumptions about what sounds clear, professional and persuasive, is really important. It makes the risk feel less about bad outputs and more about the reshaping of local voice and communication norms. I liked this angle.


Jade The Hooman
