
Building Truly Bilingual AI: What Most Vendors Get Wrong

Offering French as a language option is not bilingual AI. Genuine bilingual capability requires specific technical decisions, operational design, and cultural understanding — particularly for the federal public sector, where language obligations are legal requirements, not product features.

Remolda Team · February 28, 2026 · 8 min read

The Toggle Problem

When government and enterprise clients ask AI vendors about bilingual capability, they almost always get a variant of the same answer: yes, the system supports French and English. The demo will include a language selector. The interface will render in both official languages. The model has been trained on multilingual data.

This is not bilingual AI. It is multilingual AI with a French option — and the distinction matters enormously, particularly for federal departments operating under the Official Languages Act.

Genuine bilingual capability is not a feature you add to an AI system. It is an architectural decision that shapes model selection, training data, evaluation methodology, content governance, and operational design. Most vendors have not made those decisions. They have built an English-primary system with French translation layered on top, and they are marketing the translation layer as bilingualism.

What the Official Languages Act Actually Requires

For federal departments, bilingualism is not a preference; it is a legal obligation. The Official Languages Act requires that services be available in both official languages and be of equal quality in each. Equal quality is the operative phrase. A system that produces fluent, accurate, contextually appropriate responses in English and stilted, literal, or error-prone responses in French does not meet the standard.

The Treasury Board's digital standards extend this into AI systems. AI tools deployed to serve the Canadian public or bilingual federal employees must meet the same language quality standards as any other government service. That means evaluation against both official languages using rigorous quality criteria, not just a check that French output is grammatically coherent.

Departments that deploy vendor AI systems without scrutinising French-language performance are creating compliance exposure. When the Auditor General or the Commissioner of Official Languages reviews digital services, and AI services are increasingly in scope, language quality can come under examination. "The vendor said it was bilingual" is not a defensible answer.

The Technical Reality of Bilingual AI

The foundational issue is training data composition. The dominant large language models are trained on corpora that are overwhelmingly English. French-language data is present in these corpora, but in a far smaller proportion than French's actual share of usage in bilingual Canadian contexts.

The practical consequence is measurable performance degradation in French. Complex reasoning tasks, nuanced tone calibration, domain-specific terminology, and idiomatic expression — all of these are handled with less reliability in French than in English by systems trained on English-dominant data. The gap is smaller than it was two years ago, and smaller still for major commercial models that have invested specifically in French capability. But it has not closed, and it is not uniform across task types.

For Canadian government contexts, the problem is compounded by the specific nature of Canadian French. Federal government communications use a standard Canadian French register with particular conventions around terminology, formality, and institutional language. A model trained primarily on European French, or on Québécois French from popular media, may produce outputs that are grammatically correct but tonally or terminologically wrong for a government service context.

Model selection for Canadian bilingual AI should include explicit evaluation against Canadian French benchmarks, not just generic French benchmarks. This is a specific technical requirement that most procurement processes don't specify and most vendors don't volunteer.

Evaluation Is Not Optional

A bilingual AI system that has not been rigorously evaluated in French is an unknown system. You do not know how it performs, what its failure modes are, or whether it meets the quality standard required.

Proper bilingual evaluation requires test sets in both languages covering the full range of tasks the system will perform. It requires native French evaluators — ideally bilingual evaluators who can compare outputs across languages — not automated metrics alone. It requires evaluation against Canadian government terminology and conventions, not general French language quality.
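
As a rough illustration of what paired evaluation can look like in practice, the sketch below compares human evaluator ratings for the same tasks in English and French and flags any task where French trails English by more than a chosen margin. The `Rating` structure, the 1 to 5 rubric, the task names, and the `max_gap` threshold are all assumptions made for this example, not a prescribed standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    task: str       # hypothetical task name, e.g. "letter_drafting"
    language: str   # "en" or "fr"
    score: float    # human evaluator rating on an assumed 1-5 rubric


def parity_gaps(ratings, max_gap=0.3):
    """Return {task: (en_mean, fr_mean, flagged)}.

    A task is flagged when the English mean exceeds the French mean by more
    than max_gap, or when the task was rated in only one language.
    """
    by_task = defaultdict(lambda: defaultdict(list))
    for r in ratings:
        by_task[r.task][r.language].append(r.score)

    report = {}
    for task, scores in by_task.items():
        en, fr = scores["en"], scores["fr"]
        if not en or not fr:
            # Missing coverage in one language is itself a finding.
            report[task] = (None, None, True)
            continue
        en_mean, fr_mean = mean(en), mean(fr)
        report[task] = (en_mean, fr_mean, en_mean - fr_mean > max_gap)
    return report
```

Reporting per task rather than as a single average matters here, because the French gap is not uniform across task types and an overall mean can hide a weak category.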

It also requires ongoing evaluation. Language model performance can shift with model updates, fine-tuning changes, or shifts in the underlying data pipeline. A system that evaluated well at deployment may perform differently six months later. Bilingual quality monitoring needs to be built into operations, not treated as a one-time assessment at launch.
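
One way to make that monitoring concrete is to keep the launch evaluation as a baseline and compare each later evaluation round against it. The sketch below is a minimal version under assumed inputs: the `Snapshot` fields, the round labels, and the `tolerance` value are hypothetical, and the scores are taken to be mean evaluator ratings per language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    label: str      # hypothetical round label, e.g. "launch" or "month_6"
    en_mean: float  # mean English rating for the round
    fr_mean: float  # mean French rating for the round


def drift_alerts(snapshots, tolerance=0.2):
    """Return labels of rounds where French quality slipped versus the baseline.

    The first snapshot is treated as the baseline. A later round is flagged
    when the French mean dropped by more than tolerance, or when the
    English-French gap widened by more than tolerance.
    """
    baseline, *later = snapshots  # raises ValueError if snapshots is empty
    base_gap = baseline.en_mean - baseline.fr_mean
    alerts = []
    for snap in later:
        fr_drop = baseline.fr_mean - snap.fr_mean
        gap_growth = (snap.en_mean - snap.fr_mean) - base_gap
        if fr_drop > tolerance or gap_growth > tolerance:
            alerts.append(snap.label)
    return alerts
```

Checking the gap as well as the absolute French score catches the case where a model update improves English while leaving French where it was.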

The Operational and Cultural Dimensions

Technical quality is necessary but not sufficient. Bilingual AI also requires operational design choices that most implementations skip.

Who reviews French-language outputs for quality and compliance? In many AI deployments, human review processes are established for English outputs but not explicitly extended to French. This creates an asymmetry where quality assurance is applied more rigorously to one official language than the other — which is precisely the problem the Official Languages Act is designed to prevent.
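
The asymmetry described above is easy to measure if review activity is logged. The sketch below assumes a hypothetical log of `(language, was_reviewed)` pairs and simply computes the share of outputs that received human review in each language.

```python
from collections import Counter


def review_rates(outputs):
    """Return {language: fraction of outputs that were human-reviewed}.

    outputs is an iterable of (language, was_reviewed) pairs; the log format
    is an assumption made for this example.
    """
    total, reviewed = Counter(), Counter()
    for language, was_reviewed in outputs:
        total[language] += 1
        reviewed[language] += bool(was_reviewed)
    return {lang: reviewed[lang] / total[lang] for lang in total}
```

A persistent difference between the two rates is a signal that quality assurance is not being applied equally, whatever the stated process says.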

What happens when the system produces a response that is technically in French but uses terminology that is wrong for the Canadian federal context — for instance, using a Europeanised term where Canadian administrative language has a specific equivalent? There needs to be a feedback and correction process that is not dependent on users knowing that the output is subtly wrong.
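
Part of that correction process can be automated with a terminology check that runs before a human ever sees the output. The sketch below is illustrative only: the two glossary entries are common examples of European usage versus Canadian usage, and a real deployment would load its glossary from the department's own terminology authority (a resource such as TERMIUM Plus is one possible source) rather than hard-coding it.

```python
import re

# Illustrative entries only. Keys are terms to flag; values are the forms
# assumed to be preferred in a Canadian federal context.
GLOSSARY = {
    "e-mail": "courriel",
    "week-end": "fin de semaine",
}


def flag_terms(text, glossary=GLOSSARY):
    """Return (found_term, preferred_term, position) for each glossary hit."""
    hits = []
    for term, preferred in glossary.items():
        # Match whole terms only, ignoring case.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        for match in pattern.finditer(text):
            hits.append((match.group(0), preferred, match.start()))
    return sorted(hits, key=lambda hit: hit[2])
```

A check like this only catches known substitutions and exact forms (it would miss a plural such as "e-mails"), so it complements human review by native French evaluators rather than replacing it.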

Are the staff who manage and maintain the system able to work in both official languages? An AI system configured and maintained by English-dominant technical staff will drift toward English-primary performance over time as prompts, configurations, and training data are updated without equivalent attention to French quality.

What Genuine Bilingual AI Looks Like

Organisations building AI systems that genuinely meet Canadian bilingual requirements make specific choices. They select models with demonstrated Canadian French capability, not just multilingual capability. They build evaluation processes that test French performance with the same rigour as English. They establish content governance that maintains quality in both languages. They assign explicit ownership of French-language quality to someone with authority and accountability.

They also engage French-speaking stakeholders during design, not just during translation review. The difference between a system that feels right in both languages and one that feels like a translation is often found in decisions made early in the design process — decisions about tone, vocabulary, and framing that are much harder to retrofit than to build correctly from the start.

The federal public sector has obligations that most private sector AI deployments do not. Meeting those obligations requires treating French capability as a first-class requirement, not a localisation checkbox. Vendors who tell you otherwise are telling you what you want to hear.
