HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No – 88 (danunparsed.com)

206 points by sambellll 4 hours ago | 43 comments

dvt 1 hour ago |

An alarming number of people don't understand that LLMs work via purely stochastic processes, so I'm happy to see in-depth pieces like this. I'm looking for a job and maybe this is why it's so hard to get a callback these days: resumes are just dumped in some LLM black hole and no one really knows how it works. The author says:

> temperature 0.1 — low, supposedly nudging the model toward deterministic outputs

This is not correct (and is briefly touched on later in the piece when he sets temperature to 0). Temperature is not some kind of "deterministic" switch; rather, it affects the sampling distribution, which becomes more "spiky" at low values but is still very much a distribution.
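A minimal sketch of the point above, assuming the usual softmax-with-temperature formulation; the logits are hypothetical values chosen for illustration, not taken from the article or from any real model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for `logits` at a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[f'{p:.6f}' for p in probs]}")
```

At T=0.1 almost all of the mass sits on the first token, but the other two still have nonzero probability, so repeated sampling can occasionally pick them.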

aesthesia 52 minutes ago | |

A distribution with all probability mass on one outcome is deterministic, so in principle, setting temperature to 0 _should_ result in deterministic outputs. There are a few reasons it might not, but I don't think any of these apply when running a local model like the author did.

317070 20 minutes ago | | |

> so in principle, setting temperature to 0 _should_ result in deterministic outputs

It is a common misconception, but it is not true even in principle. If I have 2 or more logits which are equal to the maximum of my logits, I will sample uniformly at random from them at any temperature, even zero. Sampling from softmax([1, 0, 1]) is still stochastic at temperature 0, because the limit is to sample uniformly from the first and last elements.
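The limit described above can be checked numerically. This is only a sketch of the mathematical limit of the softmax; whether a real inference stack samples on an exact tie or simply takes the first maximal index is implementation-specific:

```python
import math

def softmax_t(logits, temperature):
    """Softmax of `logits` divided by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tied_logits = [1.0, 0.0, 1.0]  # the example from the comment
for t in (1.0, 0.1, 0.01):
    print(f"T={t}: {softmax_t(tied_logits, t)}")
```

As the temperature shrinks, the middle entry vanishes while the two tied entries each stay at one half, so the distribution never collapses onto a single outcome.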

Anyway: "GPUs don't do deterministic matrix multiplications" is the biggest source of randomness in LLMs. GPUs perform the sums inside matrix multiplications in an arbitrary order, and because floating-point addition is not associative, this has a huge impact on the logits coming out of the neural network.
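The underlying effect is easy to reproduce on a CPU with plain double-precision floats. The sketch below only shows that regrouping a sum changes the result; it does not model any particular GPU kernel:

```python
# Floating-point addition is not associative: the same three numbers,
# added with a different grouping, give slightly different results.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)

print(repr(grouped_left))
print(repr(grouped_right))
print(grouped_left == grouped_right)
```

A matrix multiplication performs a great many such additions per output element, so a reduction order that varies between runs can shift the resulting logits slightly, which matters when two candidate tokens are nearly tied.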

easygenes 29 minutes ago | | |

There are. If the kernels are nondeterministic (e.g. timing issues), there are minor changes between runs, on a single system, even with greedy decoding enabled (typically what temperature=0 achieves).

valzam 43 minutes ago | | |

I mean the easiest explanation would be that the model harness doesn't always take the most likely token but does top-k sampling or similar. Higher temperature just means that probabilities get more and more equalized, boosting the chance that an unlikely token gets picked. But even with temp 0 you could have 0.8 T1, 0.19 T2, ... and sometimes sample T2.
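A rough sketch of that scenario, assuming a hypothetical harness that samples from the top-k tokens rather than taking the argmax. Only the two probabilities named in the comment are used; `sample_top_k` and the seed are illustrative, not part of any real tool:

```python
import random
from collections import Counter

def sample_top_k(probs, k, rng):
    """Keep the k most likely tokens and sample one in proportion to its weight."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [p for _, p in top]
    # random.choices treats weights as relative, so no renormalization is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]

probs = {"T1": 0.8, "T2": 0.19}  # values quoted in the comment
rng = random.Random(0)           # fixed seed so the run is repeatable
counts = Counter(sample_top_k(probs, 2, rng) for _ in range(10_000))
print(counts)
```

T1 wins most draws, but T2 still turns up in a sizeable minority of them, which is the behaviour the comment describes.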

IshKebab 28 minutes ago | | |

Setting the temperature to 0 should give deterministic results but that's not any better - it's just hiding the huge variance by only taking one sample.

bluechair 1 hour ago | |

Willing to be corrected but I believe this type of automated resume filtering is illegal. Not saying it never happens but my understanding is it is not typical.

thayne 1 hour ago | | |

I would expect that to depend on jurisdiction.

I don't know for sure, but I would be surprised if it was illegal in my particular US state. You might be able to argue the AI has inherent biases that introduce illegal discrimination in the hiring process, but my understanding is winning a case like that would be very difficult, especially since most employers are very cagey about their hiring process and why they made a decision.

small_scombrus 1 hour ago | | |

They don't need to actually filter/blackhole to have virtually the same effect.

Show someone a list of resumes with an "applicant score*" and they'll naturally ignore the ones with a low ranking.

*scores are generated with AI, mistakes may be made, use only as a guide and verify results

ivan_gammel 1 hour ago | | |

In situations where you get hundreds of applications for one open position (the real market now), whatever reduces your pool to a size a human can handle works. You can preserve some diversity metrics in the process. This particular filtering is rather primitive, but an LLM as a first filter can definitely do the job. You may spend less on tokens than your HR's hourly rate, and it will be fairer than just dumping 50% of unread CVs in the trash.

dgellow 14 minutes ago | | |

Illegal where?

make3 9 minutes ago | |

A spikier distribution is exactly what makes the output closer to deterministic. That's not the point, though. Even with greedy (deterministic) decoding, it is still a black box that reacts to its inputs in unpredictable ways. Switching one word around might lead to different scores, for example.

Aurornis 35 minutes ago |

> The default model is gemma3:4b

That’s a tiny model. No LLM is going to be a perfect and repeatable judge, but a tiny 4B model is like plugging an RNG into this system.

This whole exercise feels like someone vibe coded an ATS and got it to the point where the tests were passing because they decided they should have an open source ATS project.

gs17 58 minutes ago |

I'm a little confused, is this an ATS system that anyone actually uses? If not, I'm not sure how it's better than just asking ChatGPT to score your resume out of 100. Why would you want to optimize your resume for a system no one is using to score it?

petesergeant 26 seconds ago | |

(Almost) everyone’s using some kind of ATS, every ATS is adding AI auto-ranking (and has been for 15 years), and almost all HR people feel like they have too many obviously bad CVs to read. Whether or not someone is using this ATS specifically, if you submit several CVs to several places, your CV is going into at least one magical 8-ball.

ryukoposting 1 hour ago |

At this point we might as well adopt that joke where you blindly throw away half the resumes because you don't want to hire unlucky people.

makeavish 1 hour ago |

Hiring and job searching have been so hard, and AI has amplified the existing problems instead of solving any.

sevenzero 53 minutes ago | |

Wdym, can't you just litter your applications with buzzwords and other bs to automatically get a high score in these systems?

jerrythegerbil 1 hour ago |

> I fail 65% of the time. Same exact resume, different luck.

As someone who’s run hiring pipelines for technical roles in the past few years, that’s actually a fantastic number. I objectively hate saying that, but it’s true.

35% chance of elevating a technical individual to the next stage with no effort? I’ve seen as many as 100+ applicants an hour even when including a domain specific screener question. That’s 35 “screened” applicants in an hour. Were valid candidates screened out? Yes. Do you still have a candidate pool 35x larger than you need? Unfortunately, also yes.

The volume of applicants is SO HIGH that your chances of getting moved to the next stage are actually markedly worse if AI isn’t involved. If you didn’t apply immediately (using an AI bot) there are 50+ people ahead of you, and an exhausted technical leader if they ever make it to your resume.

Referral bonuses exist for a reason.

kyralis 1 hour ago | |

Is it? Or is it a 65% chance of a resume getting ignored before a single human sees it, reducing your pipeline's likelihood of catching qualified candidates by the same?

Gates that reduce resume flow-through are only useful if their reduction is correlated with quality. Otherwise they're just dragging out your hiring process or unnecessarily causing you to ultimately lower your hiring bars.

jerrythegerbil 1 hour ago | | |

> Gates that reduce resume flow-through are only useful if their reduction is correlated with quality.

The volume makes it infeasible to review everyone for quality, even at an hour scale. The conclusion and solution is inevitable, though I wish it were different. 35% is actually really good if you’re not coming in through a referral.

The current reality is <1% and the person reviewing you is exhausted.

rkuska 1 hour ago |

This reminds me of my former CTO. He would take a bunch of CVs and randomly throw some of them in a bin. He didn’t want to work with “unlucky” people.

hahahaa 1 hour ago | |

The problem is with this system he only worked with unlucky people.

psalaun 1 hour ago | |

I thought this was only an old urban legend; some people actually use this technique? Especially in a trade supposed to be led by people trained in sciences?

steve_j_choi 1 hour ago |

This could be a good way to self-evaluate one's current position from the company's point of view. You would tweak the prompts and guidelines to match what you expect the company to use and see how you score.

hahahaa 1 hour ago | |

I sort of hope we land on 2 agents, one working for the candidate and one for the employer, doing a screening round. Salary compatibility could be negotiated by a 3rd-party bot that knows both parties' ranges and what would be needed at each end of the range, and figures out yes/no on whether it's worth going ahead. Such a time saver.
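The yes/no step of that idea could be as small as a range-overlap check. This is a toy sketch only: `SalaryRange`, `worth_going_ahead`, and the figures are hypothetical, and a real intermediary would need far more than this:

```python
from typing import NamedTuple

class SalaryRange(NamedTuple):
    low: int
    high: int

def worth_going_ahead(candidate: SalaryRange, employer: SalaryRange) -> bool:
    """Return True if the two ranges overlap, without revealing either range."""
    return max(candidate.low, employer.low) <= min(candidate.high, employer.high)

# Hypothetical figures, for illustration only.
print(worth_going_ahead(SalaryRange(90_000, 120_000), SalaryRange(100_000, 110_000)))
print(worth_going_ahead(SalaryRange(130_000, 150_000), SalaryRange(100_000, 110_000)))
```

Each side learns only whether a deal is possible, not the other party's actual numbers.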

neya 1 hour ago |

I wonder how this is even legal. The only useful job the HR departments are ever required to do, and they decide to automate it? Aside from being a daycare for adults, what exactly does HR accomplish? It's clearly NOT on the side of employees, but this seems like they're clearly NOT on the side of employers, either.

While resumes are being filtered left and right, they just make TikToks on the company's dime [1]. What a sad state of affairs.

[1] https://www.youtube.com/shorts/wSug80Vg5JU

dc3k 1 hour ago |

Disregarding the fact that this thing is completely broken, its grading rubric is ridiculous to begin with (as was mentioned in the article itself, but I must reiterate how completely stupid this is):

> 35 points for open source contributions

> 30 for personal projects

I don't contribute to open source or have personal projects because I don't spend my free time doing what I do 40 hours a week to make a living. My 15 years of work experience is worth a maximum of 25%, so any company using this idiotic system would pass on me immediately. Open source and personal projects are fine, but in no sane world are they worth 65% of a resume's score.
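The arithmetic behind that complaint, as a quick check. It assumes the rubric totals 100 points (as the 90/100 in the title suggests) and that the "maximum of 25%" for work experience means 25 points; whatever makes up the rest of the rubric is not listed in this thread:

```python
TOTAL_POINTS = 100          # assumption: scores are out of 100
OPEN_SOURCE_MAX = 35        # quoted from the rubric
PERSONAL_PROJECTS_MAX = 30  # quoted from the rubric
WORK_EXPERIENCE_MAX = 25    # assumption: "25%" read as 25 points

side_project_share = (OPEN_SOURCE_MAX + PERSONAL_PROJECTS_MAX) / TOTAL_POINTS
unlisted_points = TOTAL_POINTS - (
    OPEN_SOURCE_MAX + PERSONAL_PROJECTS_MAX + WORK_EXPERIENCE_MAX
)
# Best possible score with no open source work and no personal projects.
best_case_without_side_work = TOTAL_POINTS - OPEN_SOURCE_MAX - PERSONAL_PROJECTS_MAX

print(f"Share for open source + personal projects: {side_project_share:.0%}")
print(f"Points in categories not listed here: {unlisted_points}")
print(f"Ceiling without side work: {best_case_without_side_work}/{TOTAL_POINTS}")
```

Under those assumptions, a candidate with full marks everywhere else still cannot score above 35 without open source contributions or personal projects.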

adrianN 1 hour ago | |

They are selecting for people who are fine working in their free time. If you contribute to open source you are more likely to contribute to the company on weekends. If instead you have other hobbies or a family that takes up non-work hours you are more likely to drop your pen after forty hours.

matheusmoreira 52 minutes ago | | |

Maybe they're selecting for intrinsic motivation. People who enjoy programming to the point they do it for fun, not just because it pays.

Free software work doesn't imply we work for free. We work on our projects, the stuff that we actually enjoy working on. Nobody is going to work on corporate products without adequate compensation.

emj 1 hour ago | | |

You might have numbers on that, but after working in a place with a strict no-more-than-40-hours policy, my view is that people overwork for many reasons. Being an open source enthusiast is not one of them.

stevesimmons 53 minutes ago | | |

I'm not sure that follows. I stopped making open source contributions when I switched from mature companies to startups.

Now all my "non-work" time is spent on startup work. And none of that is visible via GitHub.

cyberax 1 hour ago |

Ah... The AI learned the old HR trick: take 50% of resumes and throw them out without looking. Rationale: "we don't need unlucky losers".

quink 1 hour ago |

"A computer can never be held accountable, therefore a computer must never make a management decision."

yieldcrv 55 minutes ago |

This will get patched, as in I'll optimize my resume for this, and so will so many other people that any edge disintegrates.

glouwbug 1 hour ago |

I guess at least HR doesn’t have to read 1,000 resumes. Heck, to be frank, could they make sense of the first 10 resumes?