Emoji and Deep Learning (getdango.com)
I wish they had explained more details, such as which two-dimensional non-linear projection they're using for their map.
I also don't see a full explanation of how they get representations of sequences of emoji. They explain how their RNN handles sequences of input words, but the result of that is a vector that they compare to their emoji-embedding space. Does the emoji-embedding space contain embeddings of specific sequences as well?
We ended up glossing over emoji sequences here (it's a difficult balance, making these concepts as accessible as possible). In the app we can use beam search to predict combos, just as you would in sequence-to-sequence learning. That isn't demoed on the live website, though.
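For readers unfamiliar with beam search, here is a minimal Python sketch of how it could extend a single-emoji prediction into combos. The `next_logprobs` callable, the `<end>` token, and the toy probabilities are all hypothetical stand-ins for the model's next-emoji distribution; Dango's actual decoder isn't described in this thread.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "<end>"  # hypothetical token meaning "the combo stops here"

# Hypothetical model interface: emoji chosen so far -> log-probs of the next emoji.
NextFn = Callable[[Tuple[str, ...]], Dict[str, float]]

def beam_search(next_logprobs: NextFn, beam_width: int = 3,
                max_len: int = 4) -> List[Tuple[Tuple[str, ...], float]]:
    """Return finished emoji combos ranked by total log-probability."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, lp in next_logprobs(seq).items():
                if token == END:
                    finished.append((seq, score + lp))  # combo ends here
                else:
                    candidates.append((seq + (token,), score + lp))
        # Keep only the best `beam_width` partial combos for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # combos still open at max_len count as finished
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished

def toy_model(seq: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical toy distribution, for illustration only."""
    table = {
        (): {"🍕": 0.6, "🎉": 0.4},
        ("🍕",): {"😋": 0.7, END: 0.3},
        ("🎉",): {"🎊": 0.8, END: 0.2},
    }
    return {t: math.log(p) for t, p in table.get(seq, {END: 1.0}).items()}

best_combo, best_score = beam_search(toy_model, beam_width=2, max_len=3)[0]
```

With this toy table the best combo is 🍕😋 (probability 0.6 × 0.7 = 0.42). A wider beam trades speed for a better chance of finding the highest-scoring combo.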
White arm: http://i.imgur.com/KTNky0O.png An obvious connection to sports, and sunglasses (like saying "cool" in this context).
Black arm: http://i.imgur.com/uXtSRfc.png A policeman searching something, and a location marker (search location?).
This is definitely a concern, and something we've thought about but not yet fully solved. The neural net is trained on real-world data, which unfortunately includes various types of questionable, racist, sexist, etc. content. We already blacklist emoji combinations that are too often triggered in racist ways. However, such a system is very difficult to audit completely.
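As a rough illustration of a combination blacklist, the Python sketch below drops any suggested combo that contains a banned combination as a contiguous run. Matching on contiguous runs is an assumption, and "E1", "E2", "E3" are placeholder tokens, not real blacklist entries; the thread doesn't say how Dango's blacklist is actually applied. As noted above, a static list like this is hard to audit completely.

```python
from typing import Iterable, List, Sequence, Set, Tuple

def contains_run(combo: Sequence[str], banned: Tuple[str, ...]) -> bool:
    """True if `banned` appears as a contiguous run inside `combo`."""
    n = len(banned)
    if n == 0:
        return False  # an empty entry should never ban everything
    return any(tuple(combo[i:i + n]) == banned
               for i in range(len(combo) - n + 1))

def filter_suggestions(suggestions: Iterable[Sequence[str]],
                       blacklist: Set[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Drop any suggested combo that contains a blacklisted combination."""
    return [tuple(c) for c in suggestions
            if not any(contains_run(c, b) for b in blacklist)]

kept = filter_suggestions([("E1", "E2", "E3"), ("E2", "E1"), ("E3",)],
                          {("E1", "E2")})
```

Here only ("E1", "E2", "E3") is removed, because the order of the banned pair matters.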
Your example comparing different skin tone modifiers is a good one that we hadn't thought of. I've made a note of it so we can try to improve.
Although in this particular case it's actually just a bug: Dango gets confused by any skin tone modifier character, since they're not supported on Android (our target platform). Try putting in a "white" arm and you'll see the same results. They're actually just our "Dango is confused by this input" results.
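One possible fix, sketched in Python: normalize input by stripping the five Unicode emoji skin tone modifiers (U+1F3FB through U+1F3FF) before lookup, so a toned arm maps to the base arm the model was trained on. This assumes the model's vocabulary is keyed on base emoji; it is not a description of Dango's actual fix.

```python
# Unicode emoji skin tone modifiers (Fitzpatrick types 1-2 through 6): U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}

def strip_skin_tones(text: str) -> str:
    """Remove skin tone modifier code points, leaving the base emoji."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)

# "\U0001F4AA" is FLEXED BICEPS; "\U0001F3FF" is the darkest skin tone modifier.
normalized = strip_skin_tones("\U0001F4AA\U0001F3FF")
```

A design note: stripping loses the user's chosen tone, so an app would likely strip only for the model lookup while keeping the original text untouched.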
We should fix the bug, of course!
Who are these people that type a sentence (with a single meaning, clear-cut enough for Dango to detect), and then want to add a redundant pictorial representation of the same words they just typed?
However, Dango's training data includes people using emoji to augment rather than repeat their sentence. So if there are two different interpretations and an emoji could disambiguate, the ideal is that Dango has seen people use that phrase both ways, and that it suggests both possibilities so you can pick the one you meant. In many cases this works now; in many others we still have work to do.
It also makes suggestions based on messages sent to you, so if there are a couple of different possible replies it can show you all of them (although this feature still needs work).
But yeah, our main focus is suggestions. You can use Dango alongside the normal emoji keyboard, of course! It can just sit there "ambiently", showing you emoji you might not know about.
Have you given any thought to integrating this with some sort of Bluetooth thimble-like button (Makey Makey?) on each finger for untethered typing?
I've written more about this line of reasoning here[1] if you're interested. Feel free to ping me on twitter if there's any way I can help. Congrats on this awesome project!
What does this mean?
The issue with multiple meanings is this: if you strongly predict an ambiguous emoji (say, the prayer emoji), how do you then infer which concept the sentence contains (e.g. was the person saying "thanks", "high five", or "please")?
[I'm also a Dango dev]
In fact, I made a website that tracks the live count of emoji used on Venmo:
The source is here for anyone interested:
Unfortunately in the app we can't give you emoji that your phone doesn't support so we don't always show all the results.
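A minimal Python sketch of that kind of filtering: walk the model's ranking in order and keep the first `k` emoji the device can render, so lower-ranked but displayable emoji fill the gaps. How the `supported` set is built is an assumption here; on Android, `Paint.hasGlyph` (API level 23+) is one way to test whether a glyph renders.

```python
from typing import Iterable, List, Set

def top_supported(ranked: Iterable[str], supported: Set[str], k: int) -> List[str]:
    """Keep the first k emoji from the model's ranking that the device supports."""
    if k <= 0:
        return []
    out: List[str] = []
    for emoji in ranked:
        if emoji in supported:
            out.append(emoji)
            if len(out) == k:
                break
    return out
```

This preserves the model's ordering; it just skips anything the phone would show as a blank box.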
Well, that's disappointing. Is it country-restricted?
We can get an actually supported one up officially if there's interest. Email me at xavier@whirlscape.com!
One reason we're interested in visual communication with Dango, though, is that regular text input is pretty good already. Chorded keyboards exist and are way faster, but people mostly can't be bothered to use them. QWERTY is just good enough. But the field is wide open for rich communication with images; nothing out there is particularly good yet.
Replace thimble-keys with OpenBCI and you already have it.
Urbit.org sounds like a good fit for the immutable append-only content-addressable private keylog (now would be the time for a portmanteau generator). I would love to help in any way to make this happen.
And then the t-SNE projection shown in the article is based on this same layer (one before prediction)?
And yes, we do the t-SNE on that pre-projection space. That's why we can visualize the targets (emoji) in it. We can also t-SNE the word embeddings themselves — the input to the RNN — which is also kind of interesting. It automatically learns all kinds of structures there. Chris Olah has a good post on word embeddings if you're interested: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...
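To make "the targets (emoji) live in that space" concrete: in a standard linear output layer, each emoji has a weight vector, and its score is the dot product of that vector with the RNN's output, followed by a softmax. The Python sketch below assumes exactly that setup (no bias term, toy 2-D vectors); it's an illustration, not Dango's code. Feeding both the RNN outputs and these emoji vectors to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE` would give a map like the one in the article.

```python
import math
from typing import Dict, List, Sequence, Tuple

def rank_emoji(hidden: Sequence[float],
               emoji_vectors: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Score each emoji by dot product with the RNN output, then softmax."""
    logits = {e: sum(h * w for h, w in zip(hidden, vec))
              for e, vec in emoji_vectors.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {e: math.exp(z - m) for e, z in logits.items()}
    total = sum(exps.values())
    return sorted(((e, x / total) for e, x in exps.items()),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors; real ones would be learned and much higher-dimensional.
ranking = rank_emoji([1.0, 0.0], {"😂": [2.0, 0.0], "🙏": [0.0, 2.0], "🎉": [-1.0, 0.0]})
```

Because scoring is a dot product, emoji whose vectors point in the same direction as a sentence's output vector rank highest, which is why sentences and emoji can share one map.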
So there's a good chance we could get it to work! We've not focused on that possibility… yet.