There is no point antagonising people by guessing information about them wrongly - particularly if it's something they've become sensitised to by it occurring frequently.
If you need to know someone's gender (and largely, you don't), then ask them.
Except, of course, that I am male. My name is used for both genders. The thing completely failed on a few other ambiguous names I tried. I'll second AndrewDucker's opinion—just don't.
The numbers are honest enough to admit that the result is crap in this case - this type of statistical openness should be encouraged.
{"name":"maria","gender":"female","probability":"1.00","count":700}
{ "name": "kay", "gender": "female", "probability": "0.93", "count": 57, "country_id": "US", "language_id": "en" }
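One way to act on the numbers in responses like these is to treat a guess as unknown unless both the probability and the sample count clear a threshold. A minimal sketch, assuming the JSON shape shown above (note that `probability` arrives as a string). The function name `confident_gender` and the cutoffs 0.95 and 100 are arbitrary illustrations, not anything the API prescribes.

```python
import json

def confident_gender(raw, min_probability=0.95, min_count=100):
    """Return the guessed gender, or None when the evidence is too weak.

    Both thresholds are illustrative assumptions, not API recommendations.
    """
    data = json.loads(raw)
    gender = data.get("gender")
    if gender is None:  # the API had no guess at all
        return None
    probability = float(data.get("probability", 0))  # sent as a string
    count = int(data.get("count", 0))
    if probability < min_probability or count < min_count:
        return None
    return gender

maria = '{"name":"maria","gender":"female","probability":"1.00","count":700}'
kay = ('{"name": "kay", "gender": "female", "probability": "0.93", '
       '"count": 57, "country_id": "US", "language_id": "en"}')

print(confident_gender(maria))  # female
print(confident_gender(kay))    # None
```

With these cutoffs "maria" passes and "kay" falls back to unknown, at which point you can ask the user or stay neutral.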
If any service needs to know gender (and I'm having a hard time thinking of times you NEED to know gender - dating sites?) - why not just ask? Surely, in a situation where you're reliant on having accurate gender information, guessing from $firstname and getting it wrong is worse than asking.
Male homepage: Die Hard, Star Wars, Bridget Jones.
Female homepage: Bridget Jones, Twilight, Star Wars.
Both males and females are shown primarily movies they are more likely to be interested in and your bounce rate goes down.
This is Hacker News. Such enlightened thought is frowned on by our new brogrammer overlords. Here's your beer.
If you wanted to deride the fact that many folks here won't spend multiples of effort on special, experimental, no-right-answers-and-likely-to-be-criticized-for-it-if-you-even-try cases that affect minuscule fractions of their potential user base, well...get in line behind the IE5 advocates, I guess.
Someone recommends using a free form entry for gender. No amount of normalization will fix the "ham sandwich" entries (except that we know they are nearly all male), so you'd trade the integrity of a small percentage of your data for the appearance of "making an effort" for the vanishingly small percentage. Net fail.
Just to be clear, my primary feeling here is that -- in the hypothetical case where gender matters -- you're best served by keeping it simple: (female | male | other/it's complicated | prefer not to answer). This should serve all cases equally.
Do you simply add some extra genders? Male-to-female transsexual, female-to-male transsexual, intersex? No matter how many categories you add, you'll always annoy someone for missing them out. Do 'genderqueer' and 'genderfluid' count as the same category, or different ones?
Maybe just add a free text form for people to input their gender? But then it's impossible to normalise if you want to do any analysis.
Maybe we should just be enlightened and ignore gender altogether? But sometimes knowing your user's gender is really important, and it seems weird to discard this data because some people don't fit. Maybe the best compromise is simply to have 3 categories - male/female/other - though even then you'll get complaints. "Who are you calling 'other'?"
Anyone have any other thoughts?
PS: I seem to see way more people in tech complaining about brogrammers than actual brogrammers.
A better approach, in the absence of more complex models, would be to use Laplace's sunrise formula.
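For reference, Laplace's sunrise formula (the rule of succession) estimates the probability as (s + 1) / (n + 2) after s matching observations out of n, so a small sample never yields a flat 0 or 1. A minimal sketch follows. Mapping the API's `count` to n and `probability * count` to s is an assumption about how those fields are derived, and the function name is hypothetical.

```python
def rule_of_succession(successes, trials):
    """Laplace's estimate: (s + 1) / (n + 2)."""
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return (successes + 1) / (trials + 2)

# 700 of 700 samples female: close to certain, but not exactly 1.00.
print(round(rule_of_succession(700, 700), 4))  # 0.9986

# A single sample is far weaker evidence than a raw 1.00 suggests.
print(round(rule_of_succession(1, 1), 3))      # 0.667

# No data at all: the estimate sits at 50/50.
print(rule_of_succession(0, 0))                # 0.5
```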
She isn't the only one either, there are hundreds of them that took their name from a Catholic saint.
http://api.genderize.io/?name=eloi&language_id=ca
http://api.genderize.io/?name=tomeu&language_id=ca
http://api.genderize.io/?name=rigoberta&language_id=es
http://api.genderize.io/?name=presentaci%C3%B3n&language_id=...
Credit for distinguishing between names in languages, though! Joan returns female in English, but male in Catalan.
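Query URLs like the ones above can be built with the standard library so that accented names are percent-encoded correctly. A small sketch, assuming only the `name` and `language_id` parameters that appear in those URLs; the helper `genderize_url` is a made-up name, and nothing here actually calls the service.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.genderize.io/"

def genderize_url(name, language_id=None):
    """Build a query URL; language_id is optional."""
    params = {"name": name}
    if language_id:
        params["language_id"] = language_id
    return BASE_URL + "?" + urlencode(params)  # UTF-8 percent-encoding

print(genderize_url("eloi", "ca"))
# http://api.genderize.io/?name=eloi&language_id=ca

print(genderize_url("presentación"))
# http://api.genderize.io/?name=presentaci%C3%B3n

# Same name, two languages: compare the two responses yourself.
print(genderize_url("joan", "en"))
print(genderize_url("joan", "ca"))
```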
A lot of complaints, excluding the binary gender complaints, totally forget that languages like Portuguese and French have male / female differences for nouns and other language constructs.
Let's say I have to build a phrase that includes the user's profession, like engineer, and I don't know the gender up front: in Portuguese it would be "engenheiro" for a male or "engenheira" for a female. It does have a lot of practical uses. And with a big enough training set, the decision of which form to use for that user is in your hands.
Another strategy is to use gender-neutral terms until you find out the gender, as asking directly might be considered rude in some cultures.
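A rough sketch of that fallback: pick the gendered noun when the gender is known and a neutral form otherwise. The "engenheiro" / "engenheira" pair comes from the example above; the neutral form "engenheiro(a)", the table layout, and the function name are placeholders chosen for illustration only.

```python
# Hypothetical lookup table; the "neutral" entry is an assumed placeholder.
PROFESSION_FORMS = {
    "engineer": {
        "male": "engenheiro",
        "female": "engenheira",
        "neutral": "engenheiro(a)",
    },
}

def profession_label(profession, gender=None):
    """Return the gendered form if known, else the neutral one."""
    forms = PROFESSION_FORMS[profession]
    return forms.get(gender, forms["neutral"])

print(profession_label("engineer", "female"))  # engenheira
print(profession_label("engineer", "male"))    # engenheiro
print(profession_label("engineer"))            # engenheiro(a)
```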
"Benedikt-SSON is definitely male while Katrín Jakobs-DÓTIR is female"
(Hey, I swear I wrote that before taking a look at http://en.wikipedia.org/wiki/Icelandic_name)
Yeah, how about no.
It also seems accurate:
Pat = about 50/50 David = All man Jessica = All woman
Also, wrt "binary gender identity" complaints, are we all college freshmen here?
* my own name (Nord) sucks and gave a gender of null. Spent my whole life being called Nerd, Nora, etc. I'm not flipping out.
We aren't, which is exactly why it's a problem.
I fail to see how this API needs to accommodate transpeople in its 0.1 release.
In UI, provide a form with "Male, Female, Other". If they click other, reveal an optional text field where they can enter what they wish. Store.
Normalize the synonyms of male and female to lower case "male" and "female" when doing analysis. You don't have to get 100% normalisation. But you'll probably get 80-90%.
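The normalisation step might look something like this. It is only a sketch: the synonym sets are short illustrative guesses rather than a vetted list, and anything unrecognised is bucketed as "other" for analysis while the stored free text stays untouched.

```python
# Illustrative synonym sets; extend them from what users actually type.
MALE_SYNONYMS = {"male", "m", "man", "boy"}
FEMALE_SYNONYMS = {"female", "f", "woman", "girl"}

def normalize_gender(entry):
    """Map a free-text entry to "male", "female" or "other"."""
    value = entry.strip().lower()
    if value in MALE_SYNONYMS:
        return "male"
    if value in FEMALE_SYNONYMS:
        return "female"
    return "other"

for raw in [" Male ", "WOMAN", "ham sandwich", ""]:
    print(repr(raw), "->", normalize_gender(raw))
```

Entries such as "ham sandwich" land in "other" here, so the analysis only ever sees three buckets.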
Contemplate what you are actually using the data for, listen to users.
To be honest, if you manage to not put transgender as a sexual orientation, you'll be doing better than most people.
In lieu of questioning who the user is, may I suggest see what the user does instead? Behavior, in the end, is the real key to unlocking demographic potential. What purchases are made, what items were clicked on, where they had browsed and what titles on pages attracted them on the site can tell far more than a simple field.
This, of course, takes time and testing plus accumulation of data. But since a fair number of HN users also like to exalt the status of "Big Data", there's a more productive use for it.
Don't try to finagle who the user is or isn't. Just find out what they want.
also, "the hypothetical case where gender matters" is a glorious illustration of straight privilege
Gender has nothing to do with orientation. Please don't propagate such normative misunderstandings.
If I may restate for clarity: in most cases of software implementation, a user's gender is not important data (obvious exceptions include medical and related fields).
Generally, gender should not be requested. Where requested, it should not be required. Where required, one should have no compunction against answering randomly.
That's prerogative, not privilege.
I fail to see how this API needs to accommodate transpeople in its 0.1 release.
You have failed at empathy. Speaking to the fluidity of human gender, "Other" is the majority of the spectrum and defaulting to binary is just as naïve as defaulting to ASCII as expected input in an application/API written in 2013.
Restricting yourself that early in the release cycle (and I'm still dubious of the merits of this project) doesn't bode well for its future.
Edit: I just read your comment history and, if I'm not mistaken, you're already biased. Or would you care to elaborate what you wrote here? https://news.ycombinator.com/item?id=6451454
I agree 100% about the fluidity of human gender, but rather than lecture people via a form/api etc. it is probably simplest to have words that most people use like male and female and something ("other", "trans", "enlightened", whatever) for the 3rd option.
>> I just read your comment history and, if I'm not mistaken, you're already biased. Or would you care to elaborate what you wrote here? https://news.ycombinator.com/item?id=6451454
First of all that was a joke in the spirit of the Hangover 2. Might not have been that funny, but was an attempt at humor based on what it was responding to.
Secondly, I believe you are "biased" :) because, though it sits chronologically next to this comment, that's probably a coincidence: if you go through my comment history, it might be the only one touching on the subject (I think, no guarantees).
edit - I did make a prison rape joke a year+ ago (https://news.ycombinator.com/item?id=4148572) but I actually heard from people that it was hysterical* because of the play on "backbone".
* hysterical is a sexist word, I know.
it is probably simplest to have words that most people use like male and female and something...
The simplest is a text field with: "What prefix would you like us to use?" The End. There are no assumptions, no assignment of labels, not one bit of imposing your cultural norms on anyone else. The hardest part of getting over biases is acknowledging that you have them.
Learn some sensitivity, please.
A text field adds time to type out (which can lose customers, alienate handicapped etc.), all to accommodate an exception, rather than a rule.
I don't care if you have a slider, dropdown, circle, whatever, but for usability, a gender option should have poles that require 0 or 1 clicks to get to (though a text-field for further elucidation is okay). Continuing down this path, the further step is saying a shoe-size option insults amputees and lymphedema victims and should be a text-field...
Edit - Another fact is that the person filling out this form might not be the person described by the form (say for a CRM tool) in which case it matters more to KISS.