Competitive Programming with AlphaCode

Competitive Programming with AlphaCode (deepmind.com)

678 points by yigitdemirag 4 years ago | 397 comments

It never ceases to amaze me what you can do with these transformer models. They created millions of potential solutions for each problem, used the provided examples for the problems to filter out 99% of incorrect solutions and then applied some more heuristics and the 10 available submissions to try to find a solution.
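The generate-then-filter step described above can be sketched roughly as follows. This is a toy illustration under my own assumptions, not DeepMind's implementation: the `Candidate` and `Example` types, `filterByExamples`, and the string-reversal task are all made up, and real candidates would be source code run in a sandbox rather than Go closures.

```go
package main

import "fmt"

// Candidate stands in for one sampled program (hypothetical type).
type Candidate func(input string) string

// Example is one input/expected-output pair from a problem statement.
type Example struct{ In, Want string }

// filterByExamples keeps the candidates that pass every example,
// stopping once `limit` survivors have been collected.
func filterByExamples(cands []Candidate, examples []Example, limit int) []Candidate {
	var kept []Candidate
	for _, c := range cands {
		ok := true
		for _, ex := range examples {
			if c(ex.In) != ex.Want {
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, c)
			if len(kept) == limit {
				break
			}
		}
	}
	return kept
}

// reverse is the one correct candidate for the toy task.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	// Toy task: reverse a string. Two of the three candidates are wrong.
	cands := []Candidate{
		func(s string) string { return s },  // wrong: identity
		reverse,                             // correct
		func(s string) string { return "" }, // wrong: always empty
	}
	examples := []Example{{"ab", "ba"}, {"xyz", "zyx"}}
	kept := filterByExamples(cands, examples, 10) // 10 = submission budget
	fmt.Println("candidates left to submit:", len(kept))
}
```

Note that passing the examples does not prove a candidate correct; it only removes the ones that are visibly wrong, which is why the remaining submissions still matter.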

All these approaches just seem like brute-force approaches: Let's just throw our transformer on this problem and see if we can get anything useful out of this.

Whatever it is, you can't deny that these unsupervised models learn some semantic representations, but we have no clue at all what that actually is and how these models learn that. But I'm also very sceptical that you can actually get anywhere close to human (expert) capability in any sufficiently complex domain by using this approach.

noduerme 4 years ago | |

>> filter out 99% of incorrect solutions

And next year they can filter out 99.99%. And the year after that, 99.9999%. So literally, an exponentially greater number of monkey/typewriting units. (An AI produced Shakespeare play coming soon).

>> we have no clue at all what that actually is and how these model learn

This is why I'm super cool-to-cold about the AI/deep learning classes being sold to young people who would otherwise be learning fundamental programming skills. It appears to me like trying to teach someone to ride a horse before they understand what skin, bones, muscles, animals, and horses are.

>>get anywhere close to human (expert) capability in any sufficiently complex domain

You can get close enough to scalp a lot of billionaires, but at the end of the day it's always going to be human coders banging our heads against management, where they ask for shit they can't visualize and it's our job to visualize how their employees/customers will use it. Yes it involves domain specific knowledge, but it also requires, er, having eyeballs and fingers, and understanding how a biological organism uses a silicon-based device. That's kind of the ultimate DS knowledge, after all. Now, lots of coders just copy-pasta a front end, but after all the hooplah here I'd be extremely surprised if in ten years an AI has caught up to your basic web mill in Indonesia when it comes to building a decent website.

TOMDM 4 years ago | | |

Surely if your discriminator gets orders of magnitude better like you're describing, we could train the transformer GAN style, and reduce the dependence on generating so many examples to throw away.

parentheses 4 years ago | | |

i like that you drew a connection with monkeys on typewriters.

briga 4 years ago | |

Another way to frame it is that these models still perform very poorly at the task they're designed to do. Imagine if a real programmer needed to write a solution a hundred times before they were able to achieve (average) performance. You'd probably wonder if it was just blind luck that got them to the solution. You'd also fire them. What these models are very good at doing is plagiarizing content, so part of me wonders if they aren't just copying previous solutions with slight adjustments.

Vetch 4 years ago | | |

> Imagine if a real programmer needed to write a solution a hundred times

To be fair, a lot of creative work requires plenty of trial and error. And since no problem is solved from scratch, once everything is counted, you and the most immediate contributors to your result might have iterated through dozens of possibilities.

My advantage as a human is I can often tell you why I am eliminating this branch of the search space. The catch is my reasoning can be flawed. But we do ok.

> just copying previous solutions with slight adjustments.

It's not just doing that, Copilot can do a workable job providing suggestions for an invented DSL. A better analogy than autocomplete is inpainting missing or corrupted details based on a surrounding context. Except instead of a painting we are probabilistically filling in patterns common in solutions to leetcode style problems. Novelty beyond slight adjustments comes in when constraints are insufficient to pin down a problem to a known combination of concepts. The intelligence of the model is then how appropriate its best guesses are.

The limitations of GPT-3 Codex and AlphaCode seem to be that they're relatively weak at selection and that they require problem spaces with enough data to distill a sketch of them and of how to inpaint well in them. Leetcode style puzzles are constructed to be soluble in a reasonable number of lines, are not open ended and have a trick to them. One can complain that while we're closer to real world utility, we're still restricted to the closed worlds of verbose APIs, games and puzzles.

While lots of commenters seem concerned about jobs, I look forward to having the dataset oliphaunt and ship computer from A Fire Upon the Deep someday soon.

plutonorm 4 years ago | | |

How do you know the inner workings of the mind don't operate in a similar manner? How many different solutions to the problem are constructed within your mind before the correct one 'just arrives'?

faizshah 4 years ago | | |

I was really impressed with a lot of the GPT-3 stuff I had seen people showing so I gave it a spin myself. I was surprised by how repetitive it seemed to be: it would write new sentences but it would repeat the same concepts among similar prompts. I wish I had saved the examples; it was like when a chat bot gets in a loop, but GPT-3 varied the sentence structure. I think that if you look closely at transformer models' outputs you can expect the same sort of thing. It's like in high school when people would copy homework but use different wording.

I also think generally in ML and DL the overarching progress gets hyped but in the background there are murmurs about the limitations in the research community. That's how we end up with people in 2012 saying FSD is a couple years away but in 2022 we know we aren't even close yet. We tend to oversell how capable these systems are.

MattRix 4 years ago | | |

They specifically stated that they tested it on 10 challenges that were newer than their training data, so it couldn’t just be plagiarizing content.

bricemo 4 years ago | |

What do you think then is the difference between going from 50th to 99.9th percentile in their other domains? Is there something materially different between Go, protein folding, or coding? (I don’t know the answer, just curious if anyone else does)

YeGoblynQueenne 4 years ago | | |

>> What do you think then is the difference between going from 50th to 99.9th percentile in their other domains? Is there something materially different between Go, protein folding, or coding?

Yes, it's the size of the search space for each problem. The search space for arbitrary programs in a language with Universal Turing Machine expressivity is infinite. Even worse, for any programming problem there are an infinite number of candidate programs that may or may not solve it and that differ in only minute ways from each other.

For Go and protein structure prediction from sequences the search space is finite, although obviously not small. So there is a huge difference in the complexity of the problems right there.

Btw, I note yet again that AlphaCode performs abysmally badly on the formal benchmark included in the arxiv preprint (see Section 5.4, and table 10). That makes sense because AlphaCode is a very dumb generate-and-test, brute-force search approach that doesn't even try to be smart and tries to make up for the lack of intelligence with an awesome amount of computational resources. Most work in program synthesis is also basically a search through the space of programs, but people in the field have come up with sophisticated techniques to avoid having to search an infinite number of programs- and to avoid having to generate millions of program candidates, like DeepMind actually brags about:

>> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work.

They say that as if generating "orders of magnitude more" programs than previous work is a good thing, but it's not. It means their system is extremely bad at generating correct programs. It is orders of magnitude worse than earlier systems, in fact.

(The arxiv paper linked from the article quantifies this "massive" amount as "millions"; see Section 4.4).

FiberBundle 4 years ago | | |

Well with respect to Go the fundamental difference afaict is that you can apply self-supervised learning, which is an incredibly powerful approach (But note e.g. that even this approach wasn't successful in "solving" Starcraft). Unfortunately it's extremely difficult to frame real-world problems in that setting. I don't know anything about protein-folding and don't know what Deepmind uses to try to solve that problem, so I cannot comment on that.

jahewson 4 years ago | | |

That’s a big question but I’m tempted to answer it with a yes. A protein sequence contains a complete description of the structure of a protein but a coding question contains unknowns and the answers contain subjective variability.

derangedHorse 4 years ago | |

We have a clue as to what it is (these are just functions at the end of the day) but don't know how the model's learned parameters relate to the problem domain. I saw a talk (maybe by Jeff Dean?) a while back that discussed creating models that could explain why certain features weighed more than others. Maybe with more approaches targeted towards understanding, these algorithms could start to seem less and less like a semantically opaque computational exercise, and more in line with how we humans think about things.

mikesabbagh 4 years ago | |

GitHub Copilot scares me every time I write code on my personal PC and get those auto-suggestions. I am happy we don't have it at work yet.

It is clear writing code will soon be something of the past; maybe it is a bad idea to train our children to code. Let's make sure we milk every penny before the party is over!

evouga 4 years ago | | |

Maybe… maybe… tools like Copilot will allow us to work at a higher level of abstraction (like optimizing compilers have allowed us to do).

I say maybe because so far the code that Copilot has generated for me has been impressive for what it is, but riddled with obvious and subtle bugs. It’s like outsourcing my function implementations to a C-student undergraduate intern. I definitely wouldn’t use any of its code without close scrutiny.

AI will make some software engineering tasks more efficient and more accessible but human programmers are not going anywhere any time this side of the Singularity.

doctor_eval 4 years ago |

I sometimes read these and wonder if I need to retrain. At my age, I’ll struggle to get a job at a similar level in a new industry.

And then I remember that the thing I bring to the table is the ability to turn domain knowledge into code.

Being able to do competitive coding challenges is impressive, but a very large segment of software engineering is about eliciting what the squishy humans in management actually want, putting it into code, and discovering as quickly as possible that it’s not what they really wanted after all.

It’s going to take a sufficiently long time for AI to take over management that I don’t think oldies like me need to worry too much.

FemmeAndroid 4 years ago |

This is extremely impressive, but I do think it’s worth noting that these two things were provided:

- a very well defined problem. (One of the things I like about competitive programming and the like is just getting to implement a clearly articulated problem, not something I experience on most days.)

- existing test data.

This is definitely a great accomplishment, but I think those two features of competitive programming are notably different than my experience of daily programming. I don’t mean to suggest these will always be limitations of this kind of technology, though.

msoad 4 years ago |

This seems to have a narrower scope than GitHub Copilot. It generates more lines of code to a more holistic problem vs. GitHub Copilot that works as a "more advanced autocomplete" in code editors. Sure Copilot can synthesize full functions and classes but for me, it's the most useful when it suggests another test case's title or writes repetitive code like this.foo = foo; this.bar = bar etc...

Having used Copilot I can assure you that this technology won't replace you as a programmer but it will make your job easier by doing things that programmers don't like to do as much like writing tests and comments.

gfd 4 years ago |

Relevant blogpost on codeforces.com (the competitive programming site used): https://codeforces.com/blog/entry/99566

Apparently the bot would have a rating of 1300. Although Elo ratings are not comparable between sites, for some perspective, Mark Zuckerberg had a rating of ~1k when he was in college on topcoder: https://www.topcoder.com/members/mzuckerberg

ahgamut 4 years ago |

I find almost every new advance in deep learning is accompanied by contrasting comments: it's either "AI will soon automate programming/<insert task here>", or "let me know when AI can actually do <some-difficult-task>". There are many views on this spectrum, but these two are sure to be present in every comment section.

IIUC, AlphaCode was trained on Github code to solve competitive programming challenges on Codeforces, some of which are "difficult for a human to do". Suppose AlphaCode was trained on Github code that contains the entire set of solutions on Codeforces, is it actually doing anything "difficult"? I don't believe it would be difficult for a human to solve problems on Codeforces when given access to the entirety of Github (indexed and efficiently searchable).

The general question I have been trying to understand is this: is the ML model doing something that we can quantify as "difficult to do (given this particular training set)"? I would like to compute a number that measures how difficult it is for a model to do task X given a large training set Y. If the X is part of the training set, the difficulty should be zero. If X is obtained only by combining elements in the training, maybe it is harder to do. My efforts to answer this question: https://arxiv.org/abs/2109.12075

In recent literature, the RETRO Transformer (https://arxiv.org/pdf/2112.04426.pdf) talks about "quantifying dataset leakage", which is related to what I mentioned in the above paragraph. If many training samples are also in the test set, what is the model actually learning?

Until deep learning methods provide a measurement of "difficulty", it will be difficult to gauge the prowess of any new model that appears on the scene.

37ef_ced3 4 years ago |

The example problem (essentially, is T a subsequence of S with deletions of size N) is a classic problem with no doubt dozens of implementations in AlphaCode's training set.

And yet, what a garbage solution it produces.

To illustrate the difference between intelligence and regurgitation, someone tell me what CoPilot generates for this:

  // A Go function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.

Here is a human solution:

  func swap(x int32) int32 {
      const mask = 1 << 5
      var (
          xor1 = (x>>11 ^ x) & mask
          xor2 = xor1 << 11
      )
      return x ^ xor1 ^ xor2
  }

CoPilot cannot reason numerically like this (understand "seventeenth bit" and "sixth bit" and generate the right code for that combination). It needs to understand the size of the gap between the bits, i.e., 11, and that's too hard.
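A quick way to check the solution above is to compare it with a plain bit-by-bit version. The sketch below repeats `swap` unchanged; `swapNaive` and the test values are my own additions for illustration, assuming "sixth bit" means bit index 5 and "seventeenth bit" means bit index 16, counting from the least significant bit.

```go
package main

import "fmt"

// swap is the XOR-based solution quoted above, unchanged.
func swap(x int32) int32 {
	const mask = 1 << 5
	var (
		xor1 = (x>>11 ^ x) & mask // bit 5 holds (bit5 XOR bit16)
		xor2 = xor1 << 11         // the same difference, moved to bit 16
	)
	return x ^ xor1 ^ xor2 // flips both bits only when they differ
}

// swapNaive (hypothetical reference) reads both bits, clears them,
// and writes each one back into the other's position.
func swapNaive(x int32) int32 {
	b5 := (x >> 5) & 1   // sixth bit
	b16 := (x >> 16) & 1 // seventeenth bit
	return x&^(1<<5|1<<16) | b16<<5 | b5<<16
}

func main() {
	for _, x := range []int32{0, 1 << 5, 1 << 16, -1, 0x12345678} {
		fmt.Printf("%#x agrees: %v\n", uint32(x), swap(x) == swapNaive(x))
	}
}
```

The XOR version works because swapping two bits is a no-op when they are equal and a flip of both when they differ, so a single masked XOR captures everything that needs to change.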

jakey_bakey 4 years ago |

At the risk of sounding relentlessly skeptical - surely by training the code on GitHub data you're not actually creating an AI to solve problems, but creating an extremely obfuscated database of coding puzzle solutions?

ogogmad 4 years ago | |

We validated our performance using competitions hosted on Codeforces, a popular platform which hosts regular competitions that attract tens of thousands of participants from around the world who come to test their coding skills. We selected for evaluation 10 recent contests, each newer than our training data. AlphaCode placed at about the level of the median competitor, marking the first time an AI code generation system has reached a competitive level of performance in programming competitions.

[edit] Is "10 recent contests" a large enough sample size to prove whatever point is being made?

YeGoblynQueenne 4 years ago | | |

The test against human contestants doesn't tell us anything because we have no objective measure of the ability of those human coders (they're just the median in some unknown distribution of skill).

There are more objective measures of performance, like a good, old-fashioned, benchmark dataset. For such an evaluation, see table 10 in the arxiv preprint (page 21 of the pdf), listing the results against the APPS dataset of programming tasks. The best performing variant of AlphaCode solves 25% of the simplest ("introductory") APPS tasks and less than 10% of the intermediary ("interview") and more advanced ones ("competition").

So it's not very good.

Note also that the article above doesn't report the results on APPS. Because they're not that good.

solididiot 4 years ago | |

Does it need to solve original problems? Most of the code we write is dealing with the same problems in a slightly different context each time.

As others say in the comments, it might be a case where we meet in the middle: us writing some form of tests for AI-produced code to pass.

qualudeheart 4 years ago | |

That’s been a common objection to Copilot and other recent program synthesis papers.

The models regurgitate solutions to problems already encountered in the training set. This is very common with Leetcode problems and seems to still happen with harder competitive programming problems.

I think someone else in this thread even pointed out an example of AlphaCode doing the same thing.

hmate9 4 years ago |

Between this and OpenAI's Github Copilot "programming" will slowly start dying probably. What I mean by that is that sure, you have to learn how to program, but our time will be spent much more on just the design part and writing detailed documentation/specs and then we just have one of these AIs generate the code.

It's the next step. Binary code < assembly < C < Python < AlphaCode

Historically it's always been about abstracting and writing less code to do more.

mirrorlake 4 years ago |

I've been wondering this for a while:

In the future, code-writing AI could be tasked with generating the most reliable and/or optimized code to pass your unit tests. Human programmers will decide what we want the software to do, make sure that we find all the edge cases and define as many unit tests as possible, and let the AI write significant portions of the product. Not only that, but you could include benchmarks that pit AI against itself to improve runtime or memory performance. Programmers can spend more time thinking about what they want the final product to do, rather than getting mired in mundane details, and be guaranteed that portions of software will perform extremely well.

Is this a naive fantasy on my part, or actually possible?

qayxc 4 years ago | |

> Is this a naive fantasy on my part, or actually possible?

Possible, yes, desirable, no.

The issue I have with all these end-to-end models is that they're a massive regression. Practitioners fought tooth and nail to get programmers to acknowledge correctness and security aspects.

Mathematicians and computer scientists developed theorem solvers to tackle the correctness part. Practitioners proposed methodologies like BDD and "Clean Code" to help with stability and reliability (in terms of actually matching requirements now and in the future).

AI systems throw all this out of the window by just throwing a black box onto the wall and scraping up whatever sticks. Unit tests will never be proof for correctness - they can only show the presence of errors, not their absence.
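A tiny example of the point about unit tests, using a made-up function and a made-up test suite (neither comes from AlphaCode or any real project): every test passes, yet the implementation is wrong for inputs the suite never exercises.

```go
package main

import "fmt"

// isLeapYear is deliberately incomplete: it ignores the century rules
// (years divisible by 100 are leap years only if divisible by 400).
func isLeapYear(year int) bool {
	return year%4 == 0
}

func main() {
	// A plausible-looking suite; the buggy implementation passes all of it.
	tests := map[int]bool{2020: true, 2023: false, 2024: true, 2000: true}
	passed := true
	for year, want := range tests {
		if isLeapYear(year) != want {
			passed = false
		}
	}
	fmt.Println("suite passed:", passed)

	// 1900 was not a leap year, but no test covers it.
	fmt.Println("isLeapYear(1900):", isLeapYear(1900))
}
```

A generator that only optimizes for "passes the suite" has no reason to prefer the correct rule over this one.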

You'd only shift the burden from implementation (i.e. the program) to the tests. What you actually want is a theorem prover that proves the functional correctness in conjunction with integration tests that demonstrate the runtime behaviour if need be (i.e. profiling) and references that link implementation to requirements.

The danger lies in the fact that we already have a hard time getting security issues and bugs under control with software that we should be able to understand (i.e. fellow humans wrote and designed it). Imagine trying to locate and fix a bug in software that was synthesised by some elaborate black box that emitted inscrutable code in absence of any documentation and without references to requirements.

algon33 4 years ago |

How surprising did you guys find this? I'd have said there was a 20% chance of this performing at the median+ level if I was asked to predict things beforehand.

agentultra 4 years ago |

This is kind of neat. I wonder if it will one day be possible for it to find programs that maintain invariant properties we state in proofs. This would allow us to feel confident that even though it's generating huge programs that do weird things a human might not think of... well that it's still correct for the stated properties we care about, ie: that it's not doing anything underhanded.

qualudeheart 4 years ago |

Calling it now: If current language models can solve competitive programming at an average human level, we’re only a decade or less off from competitive programming being as solved as Go or Chess.

DeepMind or OpenAI will do it. If not them, it will be a Chinese research group on par with them.

I’ll be considering a new career. It will still be in computer science but it won’t be writing a lot of code. There’ll be several new career paths made possible by this technology as greater worker productivity makes possible greater specialization.

d0mine 4 years ago |

It reminds me that median reputation on StackOverflow is 1. All AlphaSO would have to do is to register to receive median reputation on SO ;) (kidding aside AlphaCode sounds like magic)

Inventing relational DBs hasn't replaced programmers, we just write custom DB engines less often. Inventing electronic spreadsheets hasn't deprecated programmers, it just means that we don't need programmers for corresponding tasks (where spreadsheets work well).

AI won't replace programmers until it grows to replace the humanity as a whole.

falcor84 4 years ago | |

>AI won't replace programmers until it grows to replace the humanity as a whole.

Yes, but after seeing this progress in the former, my estimate of the time remaining until the latter has just shortened significantly.

d0mine 4 years ago | | |

Given close to zero chances of a safe AI, I'm optimistic that AI is a much tougher problem and we are not significantly closer to the solution than e.g., in the 60s when computer vision was a summer project.

There is progress in certain domains (such as image recognition) but (outside specialized tasks) gigantic language models look like no more than impressive BS generators.

qualudeheart 4 years ago | |

I don’t even think the “will AI replace human programmers” question is that interesting anymore. My prediction is that a full replacement won’t happen until we achieve general artificial intelligence, and have it treat programming as it would any other problem.

Elsewhere ITT I’ve claimed that to fully automate programming you also need a model of the external world that’s on par with a human’s.

Otherwise you can’t work a job because you don’t know how to do the many other tasks that aren’t coding.

You need to understand what the business goals are and how your program solves them.

londons_explore 4 years ago |

> AlphaCode placed at about the level of the median competitor,

In many programming contests, a large number of people can't solve the problem at all, and drop out without submitting anything. Frequently that means the median scoring solution is a blank file.
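To make that concrete, here is a sketch with invented scores (not real Codeforces data): when more than half of the entrants solve nothing, the median score is zero even though the top of the field scores well.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of scores, averaging the two middle
// values when the count is even. It returns 0 for an empty slice.
func median(scores []float64) float64 {
	s := append([]float64(nil), scores...) // copy so the input is not reordered
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	// Hypothetical contest: six of ten entrants solve nothing.
	scores := []float64{0, 0, 0, 0, 0, 0, 500, 1000, 1500, 3000}
	fmt.Println("median score:", median(scores))
}
```

Whether this applies depends on how the ranking treats people who never submit, which the quoted statement does not say.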

Therefore, without further information, this statement shouldn't be taken to be as impressive as it sounds.

aidenn0 4 years ago |

> Creating solutions to unforeseen problems is second nature in human intelligence

If this is true then a lot of the people I know lack human intelligence...

blt 4 years ago |

I am always surprised by the amount of skepticism towards deep learning on HN. When I joined the field around 10 years ago, image classification was considered a grand challenge problem (e.g. https://xkcd.com/1425/). 5 years ago, only singularity enthusiast types were envisioning things like GPT-3 and Copilot in the short term.

I think many people are uncomfortable with the idea that their own "intelligent" behavior is not that different from pattern recognition.

I do not enjoy running deep learning experiments. Doing resource-hungry empirical work is not why I got into CS. But I still believe it is very powerful.

mwattsun 4 years ago |

Seems to me that this accelerates the trend towards a more declarative style of programming where you tell the computer what you want to do, not how to do it

BoardsOfCanada 4 years ago |

Do I understand it correctly that it generated (in the end) ten solutions that then were examined by humans and one picked? Still absolutely amazing though.

thomasahle 4 years ago | |

No human examination was done.

But it generated 10 solutions which it ran against the example inputs, and picked the one that passed.

Actually I'm not sure if it ran the solutions against the example inputs or the real inputs.

aliceryhl 4 years ago | | |

They used the real inputs. The example inputs were used to filter out which candidates to submit for the 10 tries.

aliceryhl 4 years ago | |

No, they gave the algorithm 10 tries and tested all of them, and said that it was solved if any one of them worked.

erwincoumans 4 years ago |

It would be interesting if a future 'AlphaZeroCode' with access to a compiler and debugger could learn to code, generating data using self-play. Haven't read the paper yet, but it seems like an impressive milestone.

mrsuprawsm 4 years ago |

Does this mean that we can all stop grinding leetcode now?

rabbits77 4 years ago |

What I always find missing from these Deep Learning showcase examples is an honest comparison to existing work. It isn’t like computers haven’t been able to generate code before.

Maybe the novelty here is working from the English language specification, but I am dubious just how useful that really is. Specifications are themselves hard to write well too.

And what if the “specification” was some Lisp code testing a certain goal, is this any better than existing Genetic Programming?

Maybe it is better but in my mind it is kind of suspicious that no comparison is made.

I love Deep Learning but nobody does the field any favors by over promising and exaggerating results.

EGreg 4 years ago |

To me, coding in imperative languages is one of the hardest things to produce an AI for with current approaches (CNNs, MCTS and various backpropagation). Something like Cyc would seem to be a lot more promising…

And yet, I am starting to see (with GitHub’s Copilot, and now this) a sort of “GPT-4 for code”. I do see many problems with this, including:

1. It doesn’t actually “invent” solutions on its own like AlphaZero, it just uses and remixes from a huge body of work that humans put together,

2. It isn’t really ever sure if it solved the problem, unless it can run against a well-defined test suite, because it could have subtle problems in both the test suite and the solution if it generated both

This is a bit like readyplayer.me trying to find the closest combination of noses and lips to match a photo (do you know any open source alternatives to that site btw?)

But this isn’t really “solving” anything in an imperative language.

Then again, perhaps human logic is just an approximation using operations on low-dimensional vectors, able to capture simple “explainable” models, while AI classifiers and adversarial training produce far bigger vectors that help model the “messiness” of the real world and also find simpler patterns as a side effect.

In this case, maybe our goal shouldn’t be to get solutions in the form of imperative language or logic, but rather to unleash the computer on “fuzzy” inputs and outputs where things are “mostly correct 99.999% of the time”. The only area where this could fail is when some intelligent adversarial network exploits weaknesses in that 0.001% and makes it more common. But for natural phenomena it should be good enough!

qualudeheart 4 years ago | |

Can you write more about how Cyc would help? The idea behind Cyc is cool but I don’t think I’ve seen anyone discuss using it for program synthesis.

timetotea 4 years ago |

If you want some video explanation https://youtu.be/Qr_PCqxznB0

errcorrectcode 4 years ago |

And this is how we reach the technological singularity and how programmers become as out of demand as piano tuners: self-programming systems.

AI will eat any and all knowledge work because there's very little special a human can do that a machine won't be able to do eventually, and much faster and better. It won't be tomorrow, but the sands are inevitably shifting this way.

prideout 4 years ago |

It is obvious to me that computer programming is an interesting AI goal, but at the same time I wonder if I'm biased, because I'm a programmer. The authors of AlphaCode might be biased in this same way.

I guess this makes sense though, from a practical point of view. Verifying correctness would be difficult in other intellectual disciplines like physics and higher mathematics.

thomasahle 4 years ago | |

Just make it output a proof together with the program.

qayxc 4 years ago | | |

That won't work because the systems aren't trained on proofs and proper theorem provers don't work that way either.

udev 4 years ago |

I am thinking whether this result can create a type of loop that can self-optimize.

We have AI to generate reasonable code from text problem description.

Now what if the problem description text is to generate such a system in the first place?

Would it be possible to close the loop, so to speak, so that over many iterations:

- text description is improved

- output code is improved

Would it be possible to create something that converges to something better?

machiaweliczny 4 years ago | |

I am actually trying this. Basically by asking the AI questions and teaching it to generate code / google when it doesn't know something. The other process checks if the code is valid and either asks it to get more context or executes the code and feeds the result back to a file :)

machiaweliczny 4 years ago | | |

I think one can make the problem "differentiable" via some heuristics. If you have an NN trained to rate code quality, with some understanding of what should be used for a given type of problem, memory and speed, and it can classify the problem into a group and then rate a solution, it should be able to guide the process (in competitive programming).

indiv0 4 years ago | | |

Do you have a blog or a github or something? This sounds really neat.

knowmad 4 years ago |

I agree with most of the comments I've read in this thread. Writing code to solve a well defined narrowly scoped problem isn't that hard or valuable. It's determining what the problem actually is and how software could be used to solve it that is challenging and valuable.

I would really like to see more effort in the AI/ML code generation space being put into things like code review, and system observation. It seems significantly more useful to use these tools to augment human software engineers rather than trying to tackle the daunting and improbable task of completely replacing them.

*Note: as a human software engineer I am biased

thomasahle 4 years ago |

Next they can train it on kaggle, and we'll start getting closer to the singularity

tasubotadas 4 years ago |

I just hope that this shows how useless competitive programming is, given that it can be replaced by a Transformer model.

Additionally, people should REALLY rethink their coding interviews if they can be solved by a program.

derelicto 4 years ago |

Hey, honest question: how does one get into competitive programming? I imagine it goes far beyond just leetcoding but honestly i don't even know where to start.

throwaway5752 4 years ago |

Most people here are programmers (or otherwise involved in the production of software). We shouldn't look at RPA and other job automation trends dispassionately. SaaS valuations aren't where they are (and accounting doesn't treat engineering salary as cost of goods sold) because investors believe that they will require armies of very well paid developers in perpetuity.

countvonbalzac 4 years ago | |

what?

a-dub 4 years ago |

> In our preprint, we detail AlphaCode, which uses transformer-based language models to generate code at an unprecedented scale, and then smartly filters to a small set of promising programs

if you're using a large corpus of code chunks from working programs as symbols in your alphabet, i wonder how much entropy there actually is in the space of syntactically correct solution candidates.

deepbream 4 years ago |

This result is well worth a meme.

https://opensea.io/assets/0x495f947276749ce646f68ac8c2484200...

nsikorr 4 years ago |

I suspect these code generating AIs will bring the singularity at some point in the future. Even if we don’t manage to create an artificial general intelligence, they will. I imagine they will learn to code on super human levels through self play just like AlphaGo and AlphaZero did. This will be awesome.

xibalba 4 years ago |

Between developments like this (and Copilot; is there a generally accepted word for this class of things, e.g. "AI Coders"?) and the move toward fully remote, I predict the mean software engineering salary in the United States will be lower in 10 years (in real dollars) than it is today.

evouga 4 years ago | |

I think this is a safe bet, but I would make it with or without the presence of AI Coders. We're clearly in the middle of Tech Bubble 2.0 and it's sure to pop in the next 10 years (and probably much sooner, given the recent crypto and NASDAQ rumblings).

hnfong 4 years ago | | |

People have been talking about tech bubbles for years. There might be a small financial bubble due to the money printing in recent years, but I'm not seeing a big bust coming like dot-com. Tech compensation is probably more influenced by the discrepancy between locations. Once people figure out how to properly handle remote workers and remote teams (which is happening due to Covid), global compensation levels will probably even out.

dantodor 4 years ago |

Great. Now the only thing remaining is POs being able to come up with a clear spec and I'm out of a job

thorwwaskeas 4 years ago |

Since they used the tests this is not something you can do if you don't have a rich battery of tests.

Perhaps many problems are something like finite automata, and the program discovers the structure of the automaton and also an algorithm for better performance.

YeGoblynQueenne 4 years ago |

>> AlphaCode ranked within the top 54% in real-world programming competitions, an advancement that demonstrates the potential of deep learning models for tasks that require critical thinking.

Critical thinking? Oh, wow. That sounds amazing!

Let's read further on...

>> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work. Then we filter, cluster, and rerank those solutions to a small set of 10 candidate programs that we submit for external assessment.

Ah. That doesn't sound like "critical thinking", or any thinking. It sounds like massive brute-force guessing.

A quick look at the arxiv preprint linked from the article reveals that the "massive" amount of programs generated is in the millions (see Section 4.4). These are "filtered" by testing them against program input-output (I/O) examples given in the problem descriptions. This "filtering" still leaves a few thousand candidate programs that are further reduced by clustering to "only" 10 (which are finally submitted).

So it's a generate-and-test approach rather than anything to do with reasoning (as claimed elsewhere in the article) let alone "thinking". But why do such massive numbers of programs need to be generated? And why are there still thousands of candidate programs left after "filtering" on I/O examples?

The reason is that the generation step is constrained by the natural-language problem descriptions, but those are not enough to generate appropriate solutions because the generating language model doesn't understand what the problem descriptions mean; so the system must generate millions of solutions hoping to "get lucky". Most of those don't pass the I/O tests so they must be discarded. But there are only very few I/O tests for each problem so there are many programs that can pass them, and still not satisfy the problem spec. In the end, clustering is needed to reduce the overwhelming number of pretty much randomly generated programs to a small number. This is a method of generating programs that's not much more precise than drawing numbers at random from a hat.
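The generate, filter, cluster pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepMind's implementation: candidates are modelled as plain functions from input text to output text, the names `safe_run` and `select_submissions` are made up for this sketch, and clustering by identical behaviour on extra probe inputs is an assumed reading of "clustering".

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a candidate program maps input text to output text.
Program = Callable[[str], str]


def safe_run(prog: Program, text: str) -> str:
    """Run a candidate, mapping any crash to a sentinel so it can still be compared."""
    try:
        return prog(text)
    except Exception:
        return "<error>"


def select_submissions(
    candidates: Sequence[Program],
    examples: Sequence[Tuple[str, str]],
    probe_inputs: Sequence[str],
    limit: int = 10,
) -> List[Program]:
    # Step 1: discard candidates that fail any I/O example from the statement.
    survivors = [
        p for p in candidates
        if all(safe_run(p, given) == expected for given, expected in examples)
    ]
    # Step 2: group survivors that behave identically on extra probe inputs.
    clusters = defaultdict(list)
    for p in survivors:
        clusters[tuple(safe_run(p, x) for x in probe_inputs)].append(p)
    # Step 3: keep one representative per cluster, largest clusters first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:limit]]
```

With a single example such as ("2", "4"), a doubling program, a squaring program and a program that always prints 4 all survive step 1, which is the point made above: a handful of I/O examples cannot separate correct programs from lucky ones.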

Inevitably, the results don't seem to be particularly accurate, hence the evaluation against programs written by participants in coding competitions, which is not any objective measure of program correctness. Table 10 in the arxiv preprint lists results on a more formal benchmark, the APPS dataset, where it's clear that the results are extremely poor (the best performing AlphaCode variant solves 20% of the "introductory" level problems, though outperforming earlier approaches).

Overall, pretty underwhelming and a bit surprising to see such lackluster results from DeepMind.

mcast 4 years ago |

The year is 2025, Google et al. are now conducting technical on-site interviews purely with AI tools and no human bias behind the camera (aside from GPT-3's quirky emotions). The interview starts with a LC hard, you're given 20 minutes -- good luck!

jakey_bakey 4 years ago | |

I think Amazon already tried this and it had surprisingly racist results

softwaredoug 4 years ago |

I think CoPilot, etc will be revolutionary tools AND I think human coders are needed. Specifically I love CoPilot for the task of "well specified algorithm to solve problem with well-defined inputs and outputs". The kind of problem you could describe as a coding challenge.

BUT, our jobs have a lot more complexity

- Local constraints - We almost always work in a large, complex existing code base with specific constraints

- Correctness is hard - writing lots of code is usually not the hard part, it's proving it correct against amorphous requirements, communicated in a variety of human social contexts, and bookmarked.

- Precision is extremely important - Even if 99% of the time, CoPilot can spit out a correct solution, the 1% of the time it doesn't creates a bevy of problems

Are those insurmountable problems? We'll see I suppose, but we begin to verge on general AI if we can gather and understand half a dozen modalities of social context to build a correct solution.

Not to mention much of the skill needed in our jobs has much more to do with soft skills, and the bridge between the technical and the non technical, and less to do with hardcore heads-down coding.

Exciting times!

jdrc 4 years ago |

I think it would be interesting to train a system end-to-end with assembly code instead of various programming languages. This would make it a much more generic compiler

wilde 4 years ago |

Oh sweet! When can we skip the bullshit puzzle phone screens?

errcorrectcode 4 years ago | |

Ali Group CAPTCHA's or Android unlock?

alasdair_ 4 years ago |

The interesting stuff happens once AlphaCode gets used to improve the code of AlphaCode.

jdrc 4 years ago |

"And so in 2022 the species programmus programmicus went extinct"

NicoJuicy 4 years ago |

I would stop programming if all we needed to write was unit tests :p

FartyMcFarter 4 years ago | |

To compensate, lots of people would start programming if that happened though. Many scientists would be interested in solving their field's problems so easily - certainly maths would benefit from it.

rmujica 4 years ago | | |

wasn't this the motivation for Prolog?

pedrobtz 4 years ago |

What about finding bugs, zero-day exploits?

zmmmmm 4 years ago |

Has nobody yet asked it to write itself?

pretendscholar 4 years ago |

I am a little bitter that it is trained on stuff that I gave away for free and will be used by a billion dollar company to make more money. I contributed the majority of that code before it was even owned by Microsoft.

Permit 4 years ago | |

Can you elaborate and give some history? What code did you contribute, and how did it end up being used by Microsoft and then DeepMind?

arendtio 4 years ago | | |

> We pre-train our model on selected public GitHub code and fine-tune it on our relatively small competitive programming dataset.

But since the code was 'selected' you don't know if your code was used. However, they seem to have used Python and C++, so my code is probably not part of it.

visarga 4 years ago | |

Paying it forward, it will help others in turn.

pretendscholar 4 years ago | | |

Yes it will help the already powerful players disproportionately.

ensan 4 years ago |

Wake me up when an AI creates an operating system on the same level of functionality as early-years Linux.

errcorrectcode 4 years ago | |

That will happen faster than you can conceive because you won't be aware of the progress until it is announced.

And, have you tried polling? I hear it keeps the CPU warm in winter. Interrupts are so ... this just in, Nike's stock jump 3% ... Where was I? Did I save my task context properly? Did I reenable interrupts?

jonas_kgomo 4 years ago |

Genuine question, what are the reasons to be a software engineer without much ML knowledge in 2022. Seems like a wake up call for developers

jonas_kgomo 4 years ago | |

7 months ago, I asked natfriedman the same question, to which he responded: "We think that software development is entering its third wave of productivity change. The first was the creation of tools like compilers, debuggers, garbage collectors, and languages that made developers more productive. The second was open source where a global community of developers came together to build on each other's work. The third revolution will be the use of AI in coding. The problems we spend our days solving may change. But there will always be problems for humans to solve."

https://news.ycombinator.com/item?id=27676266&p=2

eulers_secret 4 years ago | |

> what are the reasons to be a software engineer without much ML knowledge in 2022.

I'm not quite sure what you're asking, but my reason is that I do not enjoy working on/with ML. I'd personally rather quit the industry.

But I work in embedded/driver development. I do not worry about ML models replacing me yet, but if I were just gluing together API calls I would be a bit worried and try to specialize.

slingnow 4 years ago | |

Genuine question: what are the reasons to be a carpenter without much robotics / automation knowledge in 2022. Seems like a wakeup call for carpenters.

qualudeheart 4 years ago | |

Find something that’s hard and interesting. Someone will probably have a business trying to solve it and will hire you.

0xdeadbeefbabe 4 years ago | |

I hope you are right, but just to answer the question: all those other AI winters.

jonas_kgomo 4 years ago | | |

That's a good meditation. I think the winters were more driven by research dichotomy; for example, Marvin Minsky's critique of the perceptron really slowed the research by 10 years. Advances made thus far have so much commercial relevance that the companies invested don't look like they are gonna stop soon. But it's a valid point. Looks like there is more upside in being in subsets of computing like quantum computing, web3, metaverse etc. than in being a regular front-end engineer

for _ in range(int(input())):
    a = list(input())
    b = list(input())
    while a and b:
        if a[-1] == b[-1]:
            a.pop()
            b.pop()
        else:
            a.pop()
            if a:
                a.pop()
    print("NO" if b else "YES")
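To try the snippet above without feeding it stdin, the same loop can be wrapped in a function. This is a minimal sketch: the name `can_obtain` and the sample strings in the usage note are assumptions made for illustration, and the logic is copied from the snippet rather than checked against any problem statement.

```python
def can_obtain(s: str, t: str) -> bool:
    """Same right-to-left loop as the snippet, returning a bool instead of printing."""
    a = list(s)
    b = list(t)
    while a and b:
        if a[-1] == b[-1]:
            # Last characters agree: consume one from each string.
            a.pop()
            b.pop()
        else:
            # Mismatch: drop the last character of `a` and, if present, the one before it.
            a.pop()
            if a:
                a.pop()
    # Success means every character of `t` was matched.
    return not b
```

For example, `can_obtain("ababa", "ba")` returns True and `can_obtain("ababa", "bb")` returns False.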

from collections import defaultdict

def backspace(s1,s2):
    h = defaultdict(lambda:0)
    for x in s1:
        h[x] = h[x] + 1
    for x in s2:
        h[x] = h[x] - 1
    j = 0
    maxj = len(s2) - 1
    for x in s1:
        if x != s2[j]:
            h[x] -= 1
        elif j < maxj:
            j += 1
        else:
            break
    return j == maxj and all(y >= 0 for y in h.values())

def random_backspace(s1):
    res = []
    for x in s1:
        if randint(0,1) == 0:
            res.append(x)
    return "".join(res)

def backspaceTest(s1):
    return all(backspace(s1,random_backspace(s1)) for _ in range(100))

# A function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.
def swap_bits(x):
    # Get the value of the sixth bit.
    bit6 = x & (1 << 5)
    # Get the value of the seventeenth bit.
    bit17 = x & (1 << 16)
    # Swap the values of the sixth and seventeenth bit.
    bit6 = bit6 >> 5
    bit17 = bit17 >> 16
    # Combine the swapped values of the sixth and seventeenth bit.
    x = x ^ (bit6 << 16)
    x = x ^ (bit17 << 5)
    return x

def swap_six_seventeen(x):
    # Get the binary representation of the integer.
    binary = bin(x)[2:]
    # Add zeros to the beginning of the binary representation.
    binary = '0' * (32 - len(binary)) + binary
    # Swap the sixth and seventeenth bit.
    binary = binary[:5] + binary[17] + binary[5:17] + binary[18:]
    # Convert the binary back to an integer.
    return int(binary, 2)

bin(swap_bits(0b_1_0000000000_0_00000))
'0b10000000000100000'
bin(swap_bits(0b_0_0000000000_1_00000))
'0b10000000000100000'
bin(swap_bits(0b_1_0000000000_1_00000))
'0b0'
bin(swap_bits(0b_0_0000000000_0_00000))
'0b0'

# https://stackoverflow.com/a/20918545/1763356
def reverse_mask(x):
    x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
    x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
    x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
    x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
    x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
    return x

# My ver
def reverse_format(x):
    return int(f"{x:032b}"[::-1], 2)

package main

import (
    "fmt"
    "math"
)

func main() {
    var a int32 = 0b1010101010101010101010101010101010101010101010101010101010101010
    fmt.Printf("%b\n", a)
    fmt.Printf("%b\n", swapBits(a, 6, 17))
}

func swapBits(a int32, i int, j int) int32 {
    // convert to binary
    bin := fmt.Sprintf("%b", a)
    // get the bits
    bit1 := bin[i-1 : i]
    bit2 := bin[j-1 : j]
    // swap the bits
    bin = bin[:i-1] + bit2 + bin[i:]
    bin = bin[:j-1] + bit1 + bin[j:]
    // convert back to int
    return int32(bin2int(bin))
}

func bin2int(bin string) int64 {
    var sum int64
    for i, v := range bin {
        if v == '1' {
            sum += int64(math.Pow(2, float64(len(bin)-i-1)))
        }
    }
    return sum
}

unsigned int swapbits(unsigned int a)
{
    bool bit6 = a & (1 << 5);
    bool bit17 = a & (1 << 16);
    if (bit6 == bit17) return a; //bits are the same, do nothing
    return (a ^ (1 << 5) ^ (1 << 16)); // flip both 6th and 17th bits
}
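The same compare-then-flip idea as the C function above can be written in Python, which makes it easy to run against the four inputs from the earlier transcript. This is a sketch only: the name `swap_bits_6_17` is made up here, bits are counted from 1 starting at the least significant bit, and the input is assumed to be a non-negative integer.

```python
def swap_bits_6_17(x: int) -> int:
    """Swap the sixth and seventeenth bits, counting from 1 at the least significant bit."""
    low, high = 5, 16  # zero-based positions of the two bits
    if ((x >> low) & 1) != ((x >> high) & 1):
        # The two bits differ, so flipping both of them swaps their values.
        x ^= (1 << low) | (1 << high)
    # If the bits are equal, swapping changes nothing.
    return x
```

When only one of the two bits is set, the result has only the other one set; when both or neither are set, the value comes back unchanged.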

swap:                                   # @swap
        mov     ecx, edi
        shr     ecx, 11
        and     ecx, 32
        mov     eax, edi
        and     eax, -65569
        or      eax, ecx
        and     edi, 32
        shl     edi, 11
        or      eax, edi
        ret

swap:
        mov     eax, edi
        mov     edx, edi
        and     edi, -65569
        sal     eax, 11
        shr     edx, 11
        and     eax, 65536
        and     edx, 32
        or      eax, edx
        or      eax, edi
        ret

/* only works on little-endian! */
typedef union {
    struct {
        unsigned bit1: 1;  unsigned bit2: 1;  unsigned bit3: 1;  unsigned bit4: 1;
        unsigned bit5: 1;  unsigned bit6: 1;  unsigned bit7: 1;  unsigned bit8: 1;
        unsigned bit9: 1;  unsigned bit10: 1; unsigned bit11: 1; unsigned bit12: 1;
        unsigned bit13: 1; unsigned bit14: 1; unsigned bit15: 1; unsigned bit16: 1;
        unsigned bit17: 1; unsigned bit18: 1; unsigned bit19: 1; unsigned bit20: 1;
        unsigned bit21: 1; unsigned bit22: 1; unsigned bit23: 1; unsigned bit24: 1;
        unsigned bit25: 1; unsigned bit26: 1; unsigned bit27: 1; unsigned bit28: 1;
        unsigned bit29: 1; unsigned bit30: 1; unsigned bit31: 1; unsigned bit32: 1;
    };
    unsigned int n;
} mybits;

unsigned int swap(unsigned int n)
{
    mybits foo;
    foo.n = n;
    unsigned tmp = foo.bit6;
    foo.bit6 = foo.bit17;
    foo.bit17 = tmp;
    return foo.n;
}

Model           Filtered From (k)   Attempts (k)   Introductory n@k   Interview n@k   Competition n@k
GPT-Neo 2.7B    N/A                 1              3.90%              0.57%           0.00%
GPT-Neo 2.7B    N/A                 5              5.50%              0.80%           0.00%
Codex 12B       N/A                 1              4.14%              0.14%           0.02%
Codex 12B       N/A                 5              9.65%              0.51%           0.09%
Codex 12B       N/A                 1000           25.02%             3.70%           3.23%
Codex 12B       1000                1              22.78%             2.64%           3.04%
Codex 12B       1000                5              24.52%             3.23%           3.08%
AlphaCode 1B    N/A                 1000           17.67%             5.24%           7.06%
AlphaCode 1B    1000                5              14.36%             5.63%           4.58%
AlphaCode 1B    10000               5              18.18%             8.21%           6.65%
AlphaCode 1B    50000               5              20.36%             9.66%           7.75%