We speak to Paul Frommer, who created the Na’vi language for ‘Avatar’, about making fictional tongues feel authentic.
Above: 'ReCore' screenshot courtesy of Microsoft.
In the lead-up to ReCore's release in 2016, the game's official Twitter account held a contest to win a special-edition copy of the game. The challenge was simple: Be the first person to translate a message at the end of a video, written in the game's robotic, alien language. I came pretty close to winning, though I was ultimately beaten to the punch. The "alien language" turned out to be nothing more than a simple letter-to-symbol substitution cipher, in which each symbol in the alien alphabet corresponded directly to a letter in English.
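For a sense of how little machinery a cipher like that needs, here is a minimal Python sketch. It is not ReCore's actual alphabet: the glyphs are just the first 26 characters of the Unicode Runic block, picked as hypothetical stand-ins, and the names `to_alien` and `to_english` are made up for illustration.

```python
import string

LETTERS = string.ascii_lowercase
# Hypothetical "alien" symbols: 26 consecutive characters from the Runic block.
GLYPHS = [chr(0x16A0 + i) for i in range(26)]

# One symbol per letter, and the reverse lookup for decoding.
ENCODE = dict(zip(LETTERS, GLYPHS))
DECODE = {glyph: letter for letter, glyph in ENCODE.items()}

def to_alien(text):
    """Swap each English letter for its symbol; leave spaces and punctuation alone."""
    return "".join(ENCODE.get(ch, ch) for ch in text.lower())

def to_english(text):
    """Swap each symbol back for its English letter."""
    return "".join(DECODE.get(ch, ch) for ch in text)

message = to_alien("Be the first to translate this")
print(message)
print(to_english(message))
```

Because every letter always maps to the same symbol, word lengths and spacing survive untouched, which is exactly what makes these messages crackable by a contest entrant with a pencil.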
Substitution ciphers have been used as puzzles for years, with puzzle books and newspapers featuring a plethora of variations on the simple concept for readers to sink their intellectual teeth into. By using a cipher to create their language, developers kill two birds with one stone: They get their freaky alien language and they create a puzzle for the player to solve. It's a simple, very video-gamey approach to creating an alien language, but surely utter rubbish as far as realism goes, right?
To find out, I contacted Paul Frommer, a retired professor of business communication at USC Marshall and, more pertinently to this piece, the man who created the Na'vi language for James Cameron's Avatar. A quick rundown of Paul's impressive resume: He has a degree in mathematics, a doctorate in linguistics, he speaks multiple languages, he's a retired professor (as we've covered), and he was a US Peace Corps volunteer. Basically, his life was pretty fascinating long before he started working on Hollywood blockbusters.
Above: 'Avatar', official trailer. It's still the biggest-grossing movie of all time.
"You need to understand the purpose of the language, the kind of environment it's going to be spoken in," Paul explains as I ask him about the process of developing a constructed language from scratch. "You begin by thinking about what kind of sounds the language will have, what kind of sound presentation the language will have when you hear it spoken; what will it sound like? What sounds are in the language and then, just as importantly, what sounds are not in the language."
"Sounds in Na'vi that have attracted some attention are called ejectives, and I wound up spelling them with an x: 'Px', 'Tx', 'Kx'. They sound like popping consonants which are found in human languages; they're found in Ethiopia and a lot of Native American languages."
Paul pronounces each of those ejectives beautifully—watch this video to hear what they sound like. Adding unique sounds that don't often occur in human tongues helps to differentiate your new language, making it seem more alien. And interestingly, this effect can also be achieved by excluding common sounds, those that the listener would expect to hear.
"B, D and G—'Buh', 'Duh' and 'Guh'—are not found in the Na'vi language, and neither is 'Sh', or 'Shuh', nor 'Ch', 'Chuh'," Paul tells me. "These are not random exclusions. These are classes of sounds that are [purposefully] excluded from the language." These sounds are common in Western languages, so excluding them instantly makes the speech pattern sound foreign to audiences in relevant territories, like the US and UK.
We discuss more ins and outs of language creation, covering phonotactic constraints, syntax, and a whole bunch of beyond-me details that I occasionally "mmhmm" at so as to make clear the Skype connection hasn't dropped. It's all really interesting stuff, relayed with palpable passion by Paul, but perhaps not so relevant to the specific reason I'm calling: To learn how this stuff filters into video games, and whether or not it's ever done well.
Before my call with Paul, I send him some information on Al Bhed, the constructed language of Final Fantasy X.
"I actually had a chance to look at the Al Bhed cipher, and it's a very simplistic substitution cipher," he tells me. "The Al Bhed cipher is in fact even easier than other examples of its kind, because they map vowels onto vowels and consonants onto consonants, so the vowels are kind of a permutation of each other. Now I'm sure the reason they did that is so that what they came up with was more pronounceable than it might otherwise be."
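Paul's point about vowels can be checked mechanically. The Python sketch below assumes the Al Bhed letter table as it is commonly documented by fans; the table is not reproduced in this article, so treat `AL_BHED` as an assumption. It also counts y as a vowel, which the table appears to do.

```python
import string

ENGLISH = string.ascii_lowercase
# Assumed fan-documented table: the letter under each English letter is its Al Bhed form.
AL_BHED = "ypltavkrezgmshubxncdijfqow"

TO_AL_BHED = str.maketrans(ENGLISH, AL_BHED)
TO_ENGLISH = str.maketrans(AL_BHED, ENGLISH)

VOWELS = set("aeiouy")  # assumption: y is treated as a vowel

def encode(text):
    return text.lower().translate(TO_AL_BHED)

def decode(text):
    return text.lower().translate(TO_ENGLISH)

def vowels_map_to_vowels():
    """True if every vowel maps to a vowel and every consonant to a consonant."""
    return all((src in VOWELS) == (dst in VOWELS)
               for src, dst in zip(ENGLISH, AL_BHED))

print(vowels_map_to_vowels())
print(encode("you"), decode(encode("you")))
```

Under that assumed table the check passes: the six vowels are shuffled among themselves, so every encoded word keeps its vowels in the same slots as the English original.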
One of the most remarkable things about Al Bhed, and indeed any language created using a substitution cipher, is that it is primarily a written language, with the spoken form being crafted afterwards. This is in stark contrast with how languages usually work, as Paul explains.
"Written language is always secondary to spoken language. There are many human languages that are only spoken and did not develop a written form. Another way to put this is that spoken language is the basic property of every human being, your inheritance. If you're human, you have a spoken language. But that's not true of written language. Written language is really a product of culture, it's a product of development and education."
There are some advantages to creating a language with a cipher over the more traditional and laborious method. Cipher languages have a word for every word in the language they are derived from, which is to say that there is automatically an Al Bhed equivalent for every word in the English language. If you craft your language from scratch, you must invent each word individually. With a cipher, shit doesn't hit the proverbial fan until you try to speak your new language.
"You get crazy consonant clusters [in Al Bhed] that are totally unpronounceable," says Paul. "I actually worked out a spelling of 'cdnahkdrc'. But then they give you a pronunciation, which is very interesting. What they've done is they've attached an inherent vowel to every consonant. So it's pronounced, 'ku-denah-ha-kuk-de-ra-ku'. What's happening here is that you're taking a one syllable word and you're pronouncing it with eight syllables. So in terms of a spoken language, this is going to be rather unwieldy."
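The syllable blow-up Paul describes can be sketched in a few lines of Python. This is one plausible reading of the rule, not the game's actual pronunciation table: a consonant that sits directly before a written vowel borrows it, and every other consonant gets an inherent vowel. `INHERENT` is a hypothetical placeholder, since the game assigns its own vowel to each consonant.

```python
VOWELS = set("aeiouy")
INHERENT = "u"  # hypothetical placeholder for the per-consonant inherent vowel

def syllabify(word):
    """Split a word into syllables under the assumed inherent-vowel rule."""
    syllables = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS:
            syllables.append(ch)             # a bare vowel is its own syllable
            i += 1
        elif nxt and nxt in VOWELS:
            syllables.append(ch + nxt)       # consonant borrows the written vowel
            i += 2
        else:
            syllables.append(ch + INHERENT)  # consonant gets an inherent vowel
            i += 1
    return syllables

chunks = syllabify("cdnahkdrc")
print(chunks, len(chunks))
```

Under that reading, 'cdnahkdrc' comes out at eight syllables, matching Paul's count, because eight of its nine letters are consonants and only one of them has a written vowel to lean on.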
But the main advantage of the substitution cipher is that it's a prebuilt puzzle, a game to put within your game. Cipher puzzles were used for our entertainment long before the age of video games.
"When I was a kid, and we're talking about a very long time ago, probably when I was about ten years old, I enjoyed decoding these exact kind of ciphers," Paul reveals. It's really no surprise then that ciphers made their way into the entertainment products of today, given their history. By allowing the player to translate the alien language, you draw them deeper into the world and give them that all-important sense of accomplishment that all gamers crave.
But while Paul has a fondness for the games of his childhood, he is vocal about the shortcomings of using ciphers to create a language. With that in mind, I asked him the obvious question: Can you do it better?
"It really depends on what your goal is. Do you want to create a language that's a genuine, realistic, linguistically viable language? Or do you want to come up with a puzzle? If it's a puzzle, there are lots of ingenious ways you can do things. If you're talking about a plausible language that's going to stand up to scrutiny then you've really got to follow the route that we've discussed."
Plenty of games do go down a more linguistically authentic route when coming up with their languages. For example, Far Cry Primal's developers brought in a pair of linguists—Andrew Byrd and his wife Brenna, who we interviewed here—to help develop a Proto-Indo-European language for their game. In that case, rather than developing a new, alien language, the linguists were tasked with recreating, to the best of their abilities, a dead language. Paul is into that concept, albeit with a caveat: "That process of trying to reconstruct a language, to go back in time is a valid one. Although to go back that far, to cave man times, you know, tens of thousands of years, that's going to be largely fanciful I think."
Knowing the amount of work that goes into creating a realistic alien language for a video game (or anything else), it's no wonder that many developers choose to take the easy way out. The universal translator, the substitution cipher language or just having everyone speak English: These are much more attractive options for a studio that doesn't want to bog itself down in unnecessary minutiae.
But really, will the average gamer be able to tell the difference between a cheap knockoff and the real McCoy when it comes to a newly constructed language?
"Chances are, the average viewer or player isn't going to be aware of anything about the structure of the language," says Paul. "But there's always going to be a very small but ardent and vocal group of fans who will examine everything with a fine-toothed comb. Then you've really got to make something that will stand up to scrutiny because if not, then people are going to analyze it, tear it apart and write about it—sometimes in very stark and not terribly flattering terms. That's just the world we live in."
Still, if you want to fool, or at least satisfy, that vocal minority of people who can tell a constructed language from a random mishmash of sounds, you're going to have to put some serious effort in. Even your average listener will know how poor your language is if you just cobble a bunch of sounds together without putting in the legwork, even if they aren't quite sure why they know it's garbage.
"If you create just gibberish, it's going to wind up sounding artificial, and it's going to wind up not sounding like someone's real language," says Paul. "Even if people are not understanding something, if from one scene to another people are talking about similar things, similar concepts and similar objects in the environment then I think viewers will unconsciously hear the same kinds of sounds. They'll hear the same kinds of words and sense that there's something real going on here, that there is some consistency.
"The problem with an English substitution cipher is that it's going to match the number of letters and the spacing of English so precisely that someone is going to say, 'That looks awfully like the English sentence.' If you want to come up with something in a written form that's going to be convincing, you have to be more imaginative than simply a letter-to-letter cipher. One thing I thought of, off the top of my head, is to simply write a sentence backwards, or maybe have the direction of writing be right to left, rather than left to right. Or to vary the order of words in an unpredictable way."
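The giveaway Paul describes, and one of his suggested fixes, can be illustrated with a short Python sketch. ROT13 stands in here for any letter-to-letter cipher, and `shape` is a made-up helper that records the length of each word; both are assumptions for illustration, not anything taken from a real game.

```python
import string

LOWER = string.ascii_lowercase
# ROT13 as a stand-in for any one-to-one letter substitution.
KEY = str.maketrans(LOWER, LOWER[13:] + LOWER[:13])

def shape(text):
    """The 'silhouette' of a sentence: the length of each word, in order."""
    return [len(word) for word in text.split()]

def substitute(text):
    return text.lower().translate(KEY)

def substitute_reversed(text):
    """Substitute, then write the whole sentence backwards."""
    return substitute(text)[::-1]

plain = "we come in peace"
print(shape(plain))
print(shape(substitute(plain)))
print(shape(substitute_reversed(plain)))
```

The plain cipher keeps the English silhouette exactly, while writing the sentence backwards at least flips it. Shuffling word order unpredictably, Paul's other suggestion, would disguise it further, at the cost of making the puzzle harder to solve.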
Ultimately, like many aspects of game design, it's a trade-off. If you want a realistic language that stands up to scrutiny, you're going to have to put in time and money. If you want your language to be translatable, to be a game within your game, then you're going to have to sacrifice some of its authenticity.
The developers of Far Cry Primal chose to craft something realistic. It's worth the trade-off here, sacrificing the potential enjoyment that the player gets from solving the language puzzle to create a more believable game world. Conversely, the more arcade-y ReCore wouldn't benefit from the same treatment nearly as much. After all, it's hard to get nitpicky about a language in a world where the main character has rocket boots and some enemies are chronically weak to the color blue.