An amateur linguist loses control of the language he invented (2012) (newyorker.com)

272 points by godarderik 11 years ago | 106 comments

godarderik 11 years ago |

Reading this story brings to mind the history of algorithms in the field of machine translation. Early attempts at the problem tried to explicitly define the rules for converting between tongues using meticulously laid out systems of vocabulary and syntax. This approach proved untenable, in part due to the complex and ever-changing nature of language. Modern systems such as Google Translate make use of machine learning algorithms that are fed large amounts of source material and computationally discern relationships between them.

I wonder if a similar approach could be taken with language construction. Instead of spending 25+ years fleshing out a language in painstaking detail, computer programs could be devised that, using large amounts of input, determine the most "efficient" means of expressing information. The approach would not only be far less labor intensive, it could also accommodate the rapidly evolving nature of language, for example by adding to its "dictionary" in response to new phenomena in need of naming.

andreasvc 11 years ago | |

Interlingua was constructed this way, at least its vocabulary. IMHO they made the mistake of making the grammar naturalistic, which made it very easy to read for people who already spoke a Romance language; writing, on the other hand, was made difficult by this.

You could perhaps use a typological database with grammatical features of the world's languages and somehow select an "optimal" combination from it, but that's a far cry from letting a computer determine the most efficient means of expressing information; we have no idea how to define information/meaning, so that's still an impossible dream. I don't think the problem is that designing languages is hard per se, it's that people can't be bothered to agree on one and learn it.

cgio 11 years ago | |

Maybe it could use Minimum Message Length http://en.wikipedia.org/wiki/Minimum_message_length

mchaver 11 years ago | |

It sounds like an experiment worth testing out and could lead to some interesting results. On the other hand, I imagine most conlangers enjoy devising the details of their language.

erikb 11 years ago | |

In fact it doesn't just sound like an experiment worth doing. It sounds like something that somebody somewhere might have already done.

voronoff 11 years ago | | |

To be glib, it has been done. We call it language.

Seriously, though, take a look at the link I posted in https://news.ycombinator.com/item?id=8180924

One of the techniques used is to computationally create a space of possible ways to partition semantic domains on a plane whose dimensions are simplicity and informativeness, in order to look at where in the possible space real languages lie. While it's not been done (to my knowledge) for a whole language, it's a potential direction to go.

voronoff 11 years ago |

For anyone who is interested in what an ideal language would look like, particularly with respect to brevity vs. informativeness, I'd highly suggest looking into Terry Regier's work: http://lclab.berkeley.edu/

I worked in his lab on one of many projects showing that most human languages use a near optimal trade-off in various semantic domains (so far - color, kinship, containers, and spatial relations). His work also includes some of the best evidence for some language dependent forces in cognition interacting with some universal ones.

MichaelDickens 11 years ago |

Ithkuil seems like what a language should be: as the article said, it is both precise and concise. It looks the way Esperanto ought to have looked. I find Quijada's effort deeply impressive.

I don't know much about designing human languages, but I know how hard it is to design a decent programming language (see http://colinm.org/language_checklist.html), and building a serious human language seems orders of magnitude more difficult. I've never seen an attempt that really intrigued me until I found Ithkuil.

ejr 11 years ago |

If anyone wants to hear what Ithkuil sounds like: https://upload.wikimedia.org/wikipedia/commons/c/c9/Ithkuil_...

From https://en.wikipedia.org/wiki/Ithkuil

SideburnsOfDoom 11 years ago | |

Oh gods! It sounds like a tongue-twister played backwards. Clearly not designed for ease of pronunciation or for singing songs.

lloeki 11 years ago | |

> Ithkuil does not use the concept of zero

Interesting. How is one supposed to talk about math without a concept of zero?

hrvbr 11 years ago | | |

It's likely not missing the concept of zero but rather the symbol in the numbering system. The 0 is not necessary for representing integers (if there's a new symbol for 10). I'm pretty sure it is necessary to write the decimal part of a real number.
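The point about integers can be made concrete with what is usually called bijective numeration: positional notation whose digits run from 1 to 10 instead of 0 to 9. The sketch below is only an illustration of that idea, not a description of Ithkuil's actual number system; the digit string (with `A` standing in for the "new symbol for 10") and the function names are assumptions made up for the example.

```python
# Zero-free ("bijective") base-10 notation: digits stand for 1..10, never 0.
# DIGITS is an assumed symbol set; "A" plays the role of a single symbol for ten.
DIGITS = "123456789A"


def to_bijective(n: int) -> str:
    """Write a positive integer without using a zero digit."""
    if n < 1:
        raise ValueError("only positive integers have a zero-free numeral")
    out = []
    while n > 0:
        # Shift by one so remainders 0..9 map onto digit values 1..10.
        n, r = divmod(n - 1, 10)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def from_bijective(s: str) -> int:
    """Inverse of to_bijective."""
    value = 0
    for ch in s:
        value = value * 10 + DIGITS.index(ch) + 1
    return value


if __name__ == "__main__":
    for n in (9, 10, 11, 20, 100):
        print(n, "->", to_bijective(n))
```

Every positive integer gets exactly one numeral this way (100 comes out as `9A`, i.e. 9 tens plus ten), but the number zero itself has no numeral, which is the gap the question above is pointing at.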

ejr 11 years ago | | |

It's a fascinating problem. It makes me wonder, without zero, what paths mathematics would have taken. There have been civilisations that used math extensively without zero and I hope Ithkuil-fluent mathematicians some day would continue exploring this.

tomkinstinch 11 years ago |

The same thing happened to Blissymbols[1], as documented by radiolab[2].

1. https://en.wikipedia.org/wiki/Blissymbols

2. http://www.radiolab.org/story/257194-man-became-bliss/

MBCook 11 years ago | |

That was a great episode.

This has also happened with the language Lojban[1], which was 'forked' from Loglan[2] when the creator started making copyright complaints, so that the community could maintain control.

Such an odd concept that someone could 'own' a language, but I guess if you created it I can see why you would want to.

[1] http://en.wikipedia.org/wiki/Lojban [2] http://en.wikipedia.org/wiki/Loglan

mchaver 11 years ago | | |

I think it's a great lesson for any creator. Just because you have invented something does not necessarily mean you have the right to dictate, or are capable of dictating, how people use it.

I seem to remember Umberto Eco mentioning that he does not offer his own interpretations of his novels for a similar reason, but I can't find the quote.

GeneralMayhem 11 years ago | | |

It's a less odd concept if you compare it to Elvish rather than English.

tokenadult 11 years ago |

This is attracting some reader interest here, so I should probably mention, for other Hacker News participants deeply interested in human languages, a definitive analysis of Esperanto[1] explaining why Esperanto has not caught on with more speakers.

[1] http://www.xibalba.demon.co.uk/jbr/ranto/

ilaksh 11 years ago |

Ithkuil is definitely one of the most amazing pieces of work I have ever come across. I have been using the name as my email address for many years, and another variant of it he had called 'ilaksh' as my screen name (note I didn't have anything to do with the creation of Ithkuil/Ilaksh, I'm just a fan). I think not only other conlangers but also anyone interested in fields like linguistics, computer programming, knowledge representation, etc. can be inspired by what Quijada did.

I did get a few somewhat weird emails that I think were in Russian some years ago, but I think they figured out pretty quick that it wasn't the right email address to reach Quijada.

jqm 11 years ago |

Losing control of a language seems to be standard procedure.

If this invented language were to catch on, it likely wouldn't be more than a generation or two before kids who grew up speaking it would start saying the Ithkuil equivalent of things like "yo dog, that's the rad shizaz!". Then, several generations thereafter, grandmothers would be regularly using the word "shizaz" and they would have to put it in the dictionary. That's just the way it goes and is probably the reason we don't all speak the same language in the first place.

That being said, I've always been fascinated by the idea of a systematically created universal language and think the world would be a much better place with one....if that were possible.

This was a neat article.

Terr_ 11 years ago |

I think there's some research out there that suggests all natural languages have about the same information density, when you factor how two people in conversation will add error-correction or extra context to frame an idea.

IMO this suggests the bottleneck is something about our brains on a biological rather than linguistic level.

godarderik 11 years ago | |

According to a study published several years ago, mainstream languages seem to operate on an information density/speed tradeoff [1]. The authors found that languages that are spoken faster seem to encode less information per syllable than those uttered at a slower pace.

This does seem to suggest that biology may be the limiting factor in controlling the rate at which humans convey information. Indeed, the language mentioned in the article seems almost laughably cryptic and dense. However, I feel that the limitation of the mentioned study results from the fact that it treats information on a relatively limiting per-syllable basis. Quijada seems to suggest that an artificially constructed language has the ability to incorporate all the implicit meanings of a phrase that are left unsaid in normal conversation.

Ultimately, while Quijada's project seems quite unlikely to catch on among those who are not fringe pseudoscientists, it poses interesting philosophical questions about the nature of speech and communication and perhaps earns its title as a "conceptual-art project."

[1] http://rosettaproject.org/blog/02012/mar/1/language-speed-vs...

notahacker 11 years ago | | |

The article seems to support the information density / speed tradeoff, in hinting several times that the language's inventor puts at least as much cognitive effort into agglutinating syllables to form a word in his language as he would into joining words to make a sentence in a second language.

hueving 11 years ago | | |

>The authors found that languages that are spoken faster seem to encode more information per syllable than those uttered at a slower pace.

I think you mean the inverse.

gabemart 11 years ago |

I found this article fascinating and satisfying.

I'm curious about the desire to reduce ambiguity, which seemed to be emphasized as a motivation for the creation of Ithkuil and some of the other languages mentioned.

Is it desirable to completely eliminate ambiguity? I can see why it would be desirable in a scientific paper or a public political debate. But in everyday interactions, (intentional) ambiguity plays many important roles.

In my experience, politeness is bolstered by some level of ambiguity. Rather than explicitly state your needs, desires or opinions, you imply them at some level of abstraction, allowing other participants in the conversation to accept or decline more easily. Imagine Jessica who has brought two friends who don't know each other to see a play. They chit-chat a little afterwards, then Jessica goes home early leaving two virtual strangers to have a drink together. It's not hard to imagine the conversation going like this:

A: "Did you enjoy the play?"

B: "It was very interesting. I thought the stage dressing was a little unconventional."

A: "Yes, I noticed that too. Very creative. I was intrigued by the style of the narration. It really let the audience write the story for themselves."

B: "It certainly didn't constrain the imagination did it? I couldn't help noticing that many of the actors took a somewhat avant-garde interpretation of the source material."

A: "Yes, as if they didn't want it to seem like they were 'acting', so to speak?"

B: It was awful wasn't it!?

A: Thank god! Yes, worst thing I've ever seen!

Ambiguity allows subtle social cues (not so subtle in my example!) that avoid direct confrontation when it might be uncomfortable. If one person loved the play and the other hated it, they each might want to avoid offending the other.

Intentional ambiguity plays an important role in other social interactions like dating or friendship-making. Correct use of ambiguity protects feelings, demonstrates subtlety and good judgement, and avoids non-productive conflict.

In artistic expression too, ambiguity is often intentional or even necessary to the effectiveness of the work. Consider a poem like "My Papa's Waltz" [1]. Does it describe happy memories of the narrator's father, or dark memories of childhood abuse [2]? Can it describe both? Is there something in between? The ambiguity isn't a byproduct of imprecise language. The ambiguity is the meaning. To resolve it is to remove the point of the work. The poem cannot be effectively communicated in any medium that does not allow for the existence of ambiguity.

[1] http://www.poetryfoundation.org/poem/172103

[2] 'Yet, this poem has an intriguing ambiguity that elicits startlingly different interpretations. Kennedy calls it a scene of "comedy" and "persistent love", and Balakian, in part, labels it a "comic romp" (62). In contrast, Ciardi sees it as a "poem of terror"' - from http://www.mrbauld.com/exrthkwtz.html

JoeAltmaier 11 years ago |

"Among the Wakashan Indians of the Pacific Northwest, a grammatically correct sentence can’t be formed without providing what linguists refer to as “evidentiality,” inflecting the verb to indicate whether you are speaking from direct experience, inference, conjecture, or hearsay"

This is amazing. But I can't grasp the difference between inference and conjecture - they are both 'figuring out' what happened rather than knowing or hearing?

lotsofmangos 11 years ago |

I wonder how well Ithkuil can be represented in Iain Banks' Marain script. http://trevor-hopkins.com/banks/a-few-notes-on-marain.html

gamegoblin 11 years ago | |

Ithkuil has well over double the number of phonemes that Marain has, so the answer would probably be "not so well".

lotsofmangos 11 years ago | | |

Rotation and reflection of the basic set extend the phonemes and can link together similar sounding ones in Marain, so I would have thought it would be achievable.

arsalanb 11 years ago |

>"Languages are something of a mess. They evolve over centuries through an unplanned, democratic process..."

I'm in awe of the creator of any language, because creating a good language isn't easy. This is true of both programming languages and otherwise. However, it goes without saying that adoption is a vital component of any language, and with mass adoption comes evolution.

People will often make changes in languages and make their own dialects (based on things they can perhaps relate to on a deeper level, etc.). This isn't a bad thing. To me it only signifies growth and expansion of the language.

pohl 11 years ago |

I really enjoyed this article when it was new. Not long ago, when I was learning Octopress, my first post was Hello World in Rust and Ithkuil. (I just wanted to make sure code formatting was working.) I have no idea how correct the translation is. I just googled around until I found someone else's.

http://screaming.org/blog/2014/07/12/ettawil-cutx/

wyager 11 years ago |

Can someone list a few popular constructed languages (maybe comparing them to programming languages)? I'd only heard of Lojban and Esperanto before reading this.

doublec 11 years ago | |

Try http://www.reddit.com/r/conlangs for discussion and lists of languages. There are many subreddits for specific languages. For example, Toki Pona [1] and Lojban [2].

[1] http://www.reddit.com/r/tokipona [2] http://www.reddit.com/r/lojban

mchaver 11 years ago | |

Toki Pona is a minimalist language with a bit of a following. It has 120 root words and tries to build all concepts based on those.

Loglan is a predecessor and inspiration to Lojban.

Slovio is the Slavic version of Esperanto.

Dothraki, Elvish (Quenya, Sindarin), Klingon, Na'vi are constructed languages from popular novels/movies.

dunham 11 years ago | | |

One novel thing about Tolkien's languages: he constructed a root language (like proto-indo-european) and etymologies for them. (See http://en.wikipedia.org/wiki/The_Etymologies_(Tolkien) )

iLoch 11 years ago | | |

> Slovio is the Slavic version of Esperanto.

Someone doesn't understand the point of Esperanto.

StavrosK 11 years ago |

TFA:

> A sentence like “On the contrary, I think it may turn out that this rugged mountain range trails off at some point” becomes simply “Tram-mļöi hhâsmařpţuktôx.”

Wikipedia:

> Romanization: Oumpeá äx’ääļuktëx.

> Translation: "On the contrary, I think it may turn out that this rugged mountain range trails off at some point."

yongjik 11 years ago |

That was an interesting read, but the reporter's breathless assertions frequently got in the way of appreciating Quijada and his idea.

I mean, things like:

> A sentence like “On the contrary, I think it may turn out that this rugged mountain range trails off at some point” becomes simply “Tram-mļöi hhâsmařpţuktôx.”

Simply?

We could have used the LZW algorithm and the sentence could probably become even shorter, just a "simple" sequence of random-ish bytes. If you increase the number of allowed symbols, of course you need fewer symbols to convey the same information. If you allow for a limitless set of words that are dynamically generated from combining many roots, of course the number of words decreases... sometimes down to 1, as in polysynthetic languages. This is Information Theory 101.
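As a rough illustration of the symbol-count argument: to tell apart N equally likely messages with an alphabet of k symbols, you need the smallest length L such that k^L >= N, so L shrinks as k grows. The function name and the sample figures below are made up for the example and say nothing about Ithkuil's real inventory.

```python
def symbols_needed(num_messages: int, alphabet_size: int) -> int:
    """Smallest length L with alphabet_size ** L >= num_messages.

    Integer arithmetic is used instead of math.log to avoid rounding
    surprises at exact powers.
    """
    if num_messages < 1 or alphabet_size < 2:
        raise ValueError("need at least 1 message and an alphabet of 2+ symbols")
    length, capacity = 0, 1
    while capacity < num_messages:
        capacity *= alphabet_size
        length += 1
    return length


if __name__ == "__main__":
    # Hypothetical: one million distinct messages, alphabets of growing size.
    for k in (2, 26, 1000):
        print(f"alphabet of {k:>4} symbols -> {symbols_needed(1_000_000, k)} symbols per message")
```

The same million messages need 20 binary symbols, 5 letters from a 26-letter alphabet, or 2 symbols from a 1000-symbol inventory; the information carried is identical, only the packaging changes.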

moconnor 11 years ago | |

The English text is 97 ASCII encoded bytes.

Compressed with zlib: 86 bytes.

Compressed with lzma: 98 bytes.

The Ithkuil representation is just 30 UTF-8 encoded bytes.

Compressed with zlib: 39 bytes.

Compressed with lzma: 47 bytes.

(Measured using Python's zlib/pylzma modules to avoid e.g. file header overhead)

It's hard to achieve this kind of compression without an external dictionary. What Quijada has created with Ithkuil is, in part, a dictionary for the space of human thought and concepts, something I wouldn't have expected to work in the way the article describes it.
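For anyone who wants to repeat this kind of measurement, here is a minimal sketch using only Python's standard library. It is an assumption-laden stand-in for the method described above: it uses the stdlib `lzma` module rather than pylzma, and the default `lzma.compress` output includes container overhead, so the LZMA figures will not match the ones quoted. The two sentences are the ones quoted from the article, without a trailing period.

```python
import lzma
import unicodedata
import zlib

ENGLISH = ("On the contrary, I think it may turn out that this rugged "
           "mountain range trails off at some point")
ITHKUIL = "Tram-mļöi hhâsmařpţuktôx"


def sizes(text: str) -> dict:
    """Return raw and compressed byte counts for the UTF-8 encoding of text."""
    # Normalise to NFC so accented letters are single precomposed code points.
    data = unicodedata.normalize("NFC", text).encode("utf-8")
    return {
        "raw": len(data),
        "zlib": len(zlib.compress(data, 9)),   # zlib-wrapped deflate
        "lzma": len(lzma.compress(data)),      # default container, adds overhead
    }


if __name__ == "__main__":
    for label, text in (("English", ENGLISH), ("Ithkuil", ITHKUIL)):
        print(label, sizes(text))
```

On inputs this short, general-purpose compressors have almost nothing to work with, which is why the "compressed" Ithkuil comes out larger than the original bytes.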

Dylan16807 11 years ago | | |

Actually, using zlib format gets you an unnecessary 2 byte header and 4 byte footer, so the proper sizes are 80 and 33.

I'm having trouble figuring out what's going on with lzma because the spec is lying about the header, so I won't attempt to guess the correct number there.
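The zlib overhead is easy to check: the zlib container adds a 2-byte header and a 4-byte Adler-32 checksum around the deflate stream, and Python's `zlib` module can emit the bare stream when given a negative `wbits` value. A small sketch (the helper names and the sample string are just for illustration):

```python
import zlib


def zlib_size(data: bytes, level: int = 9) -> int:
    """Size of the zlib-wrapped stream (2-byte header + deflate + 4-byte Adler-32)."""
    return len(zlib.compress(data, level))


def raw_deflate_size(data: bytes, level: int = 9) -> int:
    """Size of the bare deflate stream; negative wbits drops the zlib wrapper."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return len(comp.compress(data) + comp.flush())


if __name__ == "__main__":
    sample = "Losing control of a language seems to be standard procedure.".encode("utf-8")
    print("zlib:", zlib_size(sample), "raw deflate:", raw_deflate_size(sample))
```

With the same compression level the two sizes differ by exactly 6 bytes, which is the gap between 86 and 80 (and 39 and 33) in the numbers above.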

based2 11 years ago |

http://www.reddit.com/r/linguistics/comments/2dlsgl/utopian_...

mariusz79 11 years ago |

While it looks like an impossible language to use in everyday life, I'm wondering if it could be used for science and technology. Just imagine having all scientific papers in it :)

alxndr 11 years ago |

I'm amazed that the article doesn't mention Lojban at all.

thisjepisje 11 years ago |

Off topic: Are the drop caps supposed to be lower than the line of text to which they belong? It looks kind of silly IMO.

lawlessone 11 years ago |

Any font files for this? Would be interesting to use.

stuaxo 11 years ago |

If there was a site that summarised New Yorker articles in 2 pages I would be there in a flash.

Fastidious 11 years ago | |

That would be atrocious! You need to savour and enjoy the reading, just the same as you enjoy a nice drink, or a good cup of coffee, or take the time to make coitus a never-ending engagement.

Just enjoy it.

StavrosK 11 years ago | | |

Sometimes I need a quick snack, for the calories.

JohnTHaller 11 years ago | |

You can't say that on Hacker News. The downvote brigade will nail you every time for disagreeing with them. You can not comment on the verbosity of articles or unrelated extra paragraphs and asides that don't serve the overall narrative in The New Yorker, The Atlantic, etc.

QuantumGood 11 years ago |

Seems this could contribute to accelerating artificial intelligence towards the possibility of the singularity.

_9hey 11 years ago |

another hacker news TL;DR article

knieveltech 11 years ago | |

Too bad. It's a pretty good read.

JohnTHaller 11 years ago | | |

It's a very interesting article. But it's done in the old journalism/academic paper style where it takes 5 pages to get to the point and has huge multi-paragraph asides that the reader is often uninterested in. I already know the history of Esperanto... most people won't even care about it. I don't care at all that George Soros learned it as his first language, it's unrelated nonsense. Tell me about the topic of the article. If you want interested people to be able to learn more about Esperanto, link to a side article. We can do this today.

selimthegrim 11 years ago |

Two things struck me about this article in hindsight when I read it.

-- Whose pot did the Croats, Bosnians and Slovenes piss in to not make it into this super Slavic union?

-- China Miéville wrote a book[0] along very similar thought lines which won the Locus Award.

Also, Garkavenko appears not to have taken the obvious side [1] in Ukraine's present conflict, given how he is described in Foer's article.

[0] https://en.wikipedia.org/wiki/Embassytown

[1] http://maidantranslations.com/2014/06/24/russian-volunteers-...