I Spent 24 Hours with GitHub Copilot Workspaces (every.to)

136 points by dshipper 2 years ago | 72 comments

extr 2 years ago |

I've noticed the same issue with AI coding, where you start to write requirements and then realize that you yourself don't have a perfect idea of what exactly this feature should be, or how it should work. It's easy to say the answer should be to simply think harder, or enter a dialogue with the AI about missing details, but if you try that you'll find yourself supplying an enormous amount of context you didn't expect to have to communicate. Context not even directly related to the code at hand, but about the broader business or industry, past lessons learned, something the CEO said to you last week about the feature, etc.

It's this kind of thing that makes me think tackling big feature requests is still an AGI-complete problem. Perhaps if it gets good enough at pure coding you can iterate your way to success.

ossobuco 2 years ago | |

> but if you try that you'll find yourself supplying an enormous amount of context you didn't expect to have to communicate. Context not even directly related to the code at hand, but about the broader business or industry, past lessons learned, something the CEO said to you last week about the feature, etc.

Basically you go from programmer to product manager, except you also get to micromanage a non-sentient programmer

lordswork 2 years ago | | |

What prevents an AI agent from becoming the product manager as well, and communicating with you (the customer) to clarify requirements?

duxup 2 years ago | |

I don't know if we're talking about exactly the same thing but this is my side story:

Even with small requests to an AI, I find myself accidentally including words or phrases that seem to signal to it, "Oh, he wants this as a function that does all the things very manually."

So I get some fairly capable, but very verbose and often inflexible code.

Yet that's not what I was asking for; something in the context set the AI off in that direction. In reality, I'm not sure what I want and I'm open to anything.

Often I suddenly realize, "Wait, there's gotta be something built into this language that does this, or part of this..." and often there is, and it's far more reliable and a better way to do it. Somehow the AI skipped that and gave me a different answer.
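A hypothetical illustration of that gap, using "count orders per item" as a stand-in task (the function names and data are invented for the example): the first version is the "do all the things very manually" style, the second leans on Python's built-in `collections.Counter`.

```python
from collections import Counter


def count_orders_manual(orders):
    # Verbose, hand-rolled version: check for the key, then increment or initialize.
    counts = {}
    for item in orders:
        if item in counts:
            counts[item] = counts[item] + 1
        else:
            counts[item] = 1
    return counts


def count_orders_builtin(orders):
    # Same result using the standard library; Counter is a dict subclass.
    return Counter(orders)


print(count_orders_manual(["grapes", "apples", "grapes"]))
print(count_orders_builtin(["grapes", "apples", "grapes"]))
```

Both return the same counts; the built-in version is shorter and leaves less room for mistakes, which is the kind of answer the commenter expected the AI to reach for.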

It strikes me as similar to customers who come to me with "I want an email that's sent on Tuesdays that are single-digit calendar dates and where this field contains the letter Q and ..." But when I ask them what they're trying to accomplish, I find all that specificity isn't needed: they really mean they order all their grapes on Tuesdays at the beginning of the month, and they just want a list of their grape orders every few weeks.

extr 2 years ago | | |

Yeah this is a similar phenomenon. AI is not so good at recognizing that you're looking for the "general" solution to the problem, one that will holistically fit in with the rest of the codebase/objective, and what has been provided as an example is really just a special case.

I think part of the problem is that instruction fine-tuning is not done on full codebases, just on shorter problems that fit into reasonable (8K, 32K) context windows. By nature these problems are more specific, so the models are biased in that direction from the start.

steve1977 2 years ago | | |

And that is why the I in AI is still misleading.

sdesol 2 years ago | |

> then realize that you yourself don't have a perfect idea of what exactly this feature should be

I talked about this the last time Copilot Workspaces reached the front page, two days ago: I don't think the value is in the code generation, but rather in the ability to capture our thought process. In my opinion CW is currently a bottleneck, and I think the code generation will have to get pretty good before we see the value in writing everything down versus just coding as we always have.

HanClinto 2 years ago | | |

Agreed.

The most compelling part of the demo showcased in this post is the way that the tool built the bulleted list of success criteria -- that's so often a tedious and overlooked part of writing user stories, but its importance shouldn't be understated -- the fact that it bakes that step into the workflow feels like the most valuable piece of the puzzle here.

layer8 2 years ago | |

The same is true on the code level, which can be viewed as a more detailed specification.

Part of the fun of software development is exploring the solution space by implementing, and gaining a deeper understanding in the process, as well as coming up with the corresponding design decisions.

It seems that with current AI, in order to steer it and evaluate its output, you would have to build that deeper understanding up front without doing the work, which seems difficult.

mamcx 2 years ago | |

By the time you're clear on what the requirements are, you've already solved the program.

Programming is the task of finding the real requirements!

jprete 2 years ago | |

To me this looks similar to rubber-ducking or technical writing. All three involve mentally modeling the perspective of someone who may not share your knowledge or assumptions.

Swizec 2 years ago | |

> It's this kind of thing that makes me think tackling big feature requests is still an AGI-complete problem. Perhaps if it gets good enough at pure coding you can iterate your way to success.

I think you’ve just invented product managers. This used to be part of a software engineer’s job, back when inputting code into a computer was so labor-intensive that you’d write your program and then hand it off to another human to translate into machine code.

Then we invented compilers, and now programming can take up a whole person’s day, so programmers stopped having time to do product management. That became a full-time job: supplying 4+ programmers with enough work to stay busy.

If we can replace those 4 programmers with AI, software engineers will once more turn back into product managers.

The best product managers I’ve worked with have some combination of a comp sci and business background. The CS background helps a lot.

And some of the best software engineers I’ve worked with are basically their product manager’s right hand. Partnering smoothly in developing requirements, communicating technical feasibility, and deeply understanding their customers. They could be product managers but choose not to.

willsmith72 2 years ago | |

and that's because software engineering is <50% "writing code"

TDD is a great way to show exactly how much you understand what you're about to build. You make all the decisions about edge cases and various conditions ahead of time, before even getting to the code.
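A minimal pytest-style sketch of that idea, reusing the over-specified "Tuesdays that are single-digit calendar dates" rule from earlier in the thread as a hypothetical requirement (`is_report_day` and `report_days` are made-up names). Writing the tests first forces a decision the vague version of the spec never mentions: in a month that starts on a Monday or a Tuesday, two Tuesdays qualify.

```python
import calendar
import datetime


# Hypothetical requirement: a report goes out on Tuesdays whose day of the
# month is a single digit (1-9).
def is_report_day(day: datetime.date) -> bool:
    return day.weekday() == 1 and day.day <= 9  # weekday(): Monday == 0, Tuesday == 1


def report_days(year: int, month: int) -> list[datetime.date]:
    _, days_in_month = calendar.monthrange(year, month)
    return [
        datetime.date(year, month, d)
        for d in range(1, days_in_month + 1)
        if is_report_day(datetime.date(year, month, d))
    ]


# Tests written before the implementation, each pinning down one decision.
def test_first_tuesday_counts():
    assert is_report_day(datetime.date(2024, 5, 7))


def test_other_weekdays_do_not_count():
    assert not is_report_day(datetime.date(2024, 5, 8))


def test_two_digit_tuesday_does_not_count():
    assert not is_report_day(datetime.date(2024, 5, 14))


def test_some_months_have_two_report_days():
    # October 2024 starts on a Tuesday, so both the 1st and the 8th qualify.
    assert report_days(2024, 10) == [datetime.date(2024, 10, 1), datetime.date(2024, 10, 8)]
```

The last test is the one worth arguing about with the customer: if they really meant "once a month", the rule itself has to change, not the code.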

chasd00 2 years ago | |

It sounds sort of like pseudocode: analyzing the requirements and needs to the point where all that's left is typing it out in whatever programming language you're using. I can see an LLM being pretty good at that, but then that's just a higher-level version of a compiler going from a programming language to what the machine understands. You start with very well-structured human language, the LLM turns that into something the compiler understands, and then that is turned into something the machine understands.

It sounds like using an LLM to write code requires so much careful preparation and wording ahead of time that it's basically like writing in a very high-level programming language itself.

extr 2 years ago | | |

Yeah, this is my experience as well. Once I've fully fleshed out the requirements to the point that there is zero ambiguity in what I want, I've basically written a pseudocode implementation already and the AI is just saving me some typing.

sottol 2 years ago |

The main thing that makes me skeptical is still what happens to a codebase when you do this longer-term. And not just the codebase, but also the company, when nobody understands the code any longer. But maybe neither is a problem.

A couple questions:

* Will the codebase turn into a mess over time by having the AI apply changes over changes over changes? Do we even care? Or do we want a human to still be able to follow what is going on?

* Will you just be able to ask the AI to refactor it all and clean it up? Then it wouldn't be a problem, I presume.

* Are product-based tech companies/startups still defensible if anyone can basically recreate the product with some English?

* I don't know Copilot Workspaces: are the prompts that generate and change the code kept somewhere? IMO they're part of the codebase now.

siliconc0w 2 years ago |

This debunking video (https://www.youtube.com/watch?v=tNmgmwEtoWE) of Devin really made me question its usefulness. It created a file in the repo and spent a lot of time debugging its own unnecessary code, rather than reading the README to understand that the code it needed already existed and just had to be run with different inputs.

It's not clear we're even near a point where it can independently and meaningfully contribute to an existing codebase, rather than to these greenfield demos. It feels similar to the self-driving AI hype, where level 5 is still pretty far from being realized (Waymo is closest, but AIUI it still uses a lot of remote human intervention).

whamlastxmas 2 years ago | |

Waymo is definitely not the closest

HanClinto 2 years ago |

When reading posts such as these, it occurs to me that AI is increasing the rate / lowering the bar for developers to make the jump to leader / architect.

Look at the lessons that the author has learned here:

* More specificity == better

* The importance of clear bulleted delivery items / criteria-for-success

* Unspecified details around a general goal is a ripe area for disappointment

All of these are things that a product owner / team leader learns in their first few projects (and so often must re-learn as the years go by).

AI is lowering barriers and promoting more developers to this role earlier. But everything that we learned about good Agile development in the past will still apply to the future.

ec109685 2 years ago | |

This isn’t architect-level planning. Even a junior developer should be able to work from vague requirements and build a mental model like that.

morbicer 2 years ago | | |

Exactly. Companies relying on architects to make all decisions and provide detailed specs are doomed. Architects in general often suck. Empower any rank to create designs and make technological decisions. They will grow. If you can't trust them, you have a problem.

frereit 2 years ago |

I'm honestly surprised at the relatively positive reception to this. While there isn't any problem with the code shown, the same effect could probably have been achieved with a few well-thought-out shortcuts in any IDE (delete the outerHTML of the svg tag, add the new tag, add attributes). The only "more complex" output shown is the specification that CW produces, which literally contains an error in the first line ("Sp<logo>ral").

Moving on to the complex task, the author simply hand-waves "this isn't good yet but surely it will be". No evidence is given as to _why_ there should be any expectation of LLMs getting there.

And the perceived benefit of discovering that their idea of the more complex task was not thought out enough did not come from the LLM; it came from the author themselves. They may as well have spoken to ELIZA or a rubber duck.

What am I missing?

tymscar 2 years ago | |

You're missing the Kool-Aid. I do wonder if people who cut this sort of tech too much slack are just doing it because they’re scared of going against the grain. Sort of a vicious cycle.

doug_durham 2 years ago | |

This is a pretty reductive argument. I'm not quite sure what "a few well thought out IDE shortcuts" are. I've never experienced an IDE that allows any kind of sophisticated "shortcut" that will write arbitrary code.

throwaway71271 2 years ago |

Copilot is so strange for me, I use it, but it deeply conflicts with the way I code.

As I type the code, I get a feeling for whether I like it. I also pretend to use it even when it's unfinished, kind of like playing a game. Even if I spent a lot of time thinking about what I am going to write, until it exists and I play with the code, I don't know if it's good.

Now Copilot writes so much of the code that, even if it's exactly what I was going to type, I've kind of lost the intuition, and I hate it.

So I just enable it when I do things that I don't consider programming anymore.

I still think it is absolutely amazing tech though, and I know it will get better and better, and at some point it will be hard to not use it, but I really enjoy playing with the code as I write it.

anotherpaulg 2 years ago |

My open source tool aider [0] has long offered an "AI pair programming" workflow that is similar but not identical to Copilot Workspaces.

Aider is more of a collaborative chat, where you work with the LLM interactively asking for a sequence of changes to your git repo. The changes can be non-trivial, modifying a group of files in a coordinated way.

Workspaces seems more agentic. You need to do a bunch of up-front work to (fully) specify the requirements. Even with a perfectly formulated request, agents often go down wrong paths and waste a lot of time and token costs doing the wrong thing.

That's also not how I code personally. My process is usually more iterative.

Another big difference compared to Workspaces is that aider is primarily a CLI tool, although I just released an experimental browser UI [1] yesterday, making it more approachable for folks who are not fully comfortable on the command line.

[0] https://github.com/paul-gauthier/aider

[1] https://aider.chat/2024/05/02/browser.html

throwaway918274 2 years ago |

Nobody is running faster towards the cliff of their own destruction than programmers.

isoprophlex 2 years ago | |

Good. I'm looking forward to my future career as a woodworker.

layer8 2 years ago | | |

How will you make a living though?

ike2792 2 years ago |

I might be a curmudgeon, but I think that even teaching CS in Python is too new-fangled and high-level for CS students. Learning the hard way with C/C++ (or for a more modern flair Go or Rust) and understanding how to handle pointers and memory allocation makes it a lot easier to debug things when the higher level languages and frameworks have issues. A class or two on coding with AI would be great at the undergrad level, but not basing an entire curriculum on it.

FrustratedMonky 2 years ago | |

Agree.

And I'm not joking: I think there should be engineering classes taught with slide rules, to get students to learn the old-school ability to work with orders of magnitude in their heads.

Of course students have to learn new things too. But I do think we are really losing some of the basic skills and ways of thinking that you get with the old methods.

Like tracking down some pointer errors, it takes time, it's a difficult struggle, but you do learn a lot about how things work.

Have classes with 'new' tech, then have classes that require 'old' tech. Exams without calculators, or make an Assembly language class mandatory.

tbeseda 2 years ago |

My experience was similar[0] and my conclusions line up with the author here. Summed up: thinking about the problem is the hard part. I can think faster than I can code, but I can code faster than I can write out (in a detailed enough way to achieve my goal with Copilot Workspace) the spec.

[0] https://tbeseda.com/blog/previewing-github-copilot-workspace...

vundercind 2 years ago |

… am I wrong for thinking the actual play Workspaces is making is in corporate spyware, and the rest is mostly secondary as far as what may get businesses to pay for it?

kbenson 2 years ago | |

I don't know, but I think you and I have vastly different base assumptions.

It's a huge legal liability to have statements about how data won't be used and then use it, when you're a company that might compete in similar spaces, and Microsoft competes almost everywhere.

While I trusted GitHub when they were independent, I trust this feature from MS-owned GitHub more than I would have trusted them, because the liability that misuse opens them up to is so much greater. If I were building a product and could prove that some MS department used my info in an unauthorized way to build a product, I could sue that product out of existence. Someone always talks, so MS can't assume it will never be known, and they know that.

Marsymars 2 years ago | | |

> It's a huge legal liability to have statements about how data won't be used and then use it, when you're a company that might compete in similar spaces, and Microsoft competes almost everywhere.

Almost everywhere in tech, but almost nowhere outside of tech. I work for a large non-tech conglomerate, and as far as I'm aware, we don't compete with any MS products/services.

jmole 2 years ago | | |

Yeah, but look at this through the lens of enshittification.

Microsoft will sell "Copilot enterprise" to companies that can afford to negotiate. But every individual out there on a normal subscription gets data mined.

OpenAI is similar - you can't negotiate a "no-logs" deal with them unless you are a player the size of say, Epic (the health industry giant).

vundercind 2 years ago | | |

I mean “here’s some telemetry (spying) data on your employees, in a nice little dashboard”

kulor 2 years ago |

There was an impressive demo at AWS Summit London of their CodeWhisperer and Q products taking a similar route to CW. Provide a user story and it would create a PR.

I could see "AI workspace driven development" being the future, at the very least for cutting through the smaller tickets of work and generally improving developer workflows.

HanClinto 2 years ago | |

It feels like CW is taking a step further left -- it takes a description of the problem and the codebase and creates a detailed user story (with bulleted points for success criteria and everything).

That feels like the right way to go -- almost baking an "agile done right" workflow into its engine.

Kon5ole 2 years ago |

I think AI copilots are great for coding. The IDE and compiler are a second source of truth so you can quickly eliminate AI generated nonsense and figure out what kind of problems it is good at solving.
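A sketch of the cheapest form of that second source of truth, assuming the generated code is Python (`is_valid_python` is a hypothetical helper name): reject anything that doesn't even parse. This only catches syntax errors; calls to functions that don't exist or type mismatches still need a type checker, a compiler for a typed language, or the tests.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Cheap first filter for AI-generated code: does it parse at all?"""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError covers e.g. null bytes on older Pythons
        return False
    return True


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b)\n    return a + b\n"  # missing colon
print(is_valid_python(good), is_valid_python(bad))
```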

To me the effect seems similar to going from assembly language to C or from C to Java or Visual Basic. It's a new level of abstraction that saves massive amounts of time.

I think the amount of work for software developers will increase, just like it did back then. Many software projects are never started because they would be too expensive. If they can be done by half the number of people in half the time using AI tools, they might get a "go" instead.

bengale 2 years ago |

I think the disconnect with these tools is that their endgame is not to be a developer tool; it’s to take developers out of the loop.

This is a tool for product owners, it’s just too early for them to use it by itself.

aantix 2 years ago |

The author is a software engineer and his last name is "Shipper".

Talk about high expectations!

This guy ships code.

ozten 2 years ago |

No mention of the cost per completed task.

With a similar system, CrewAI, I ran their hello world example and it cost $4 against GPT-4.

There is a trade-off between my time plus the per-feature cost, versus just coding it up myself with LLM assistance, which has a fixed cost of $20 per month.
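A back-of-the-envelope sketch of that trade-off. Only the $4 (one CrewAI hello-world run against GPT-4, not a real feature) and the $20/month come from the comment; the hours saved and the hourly rate are placeholders, and both function names are hypothetical.

```python
AGENT_COST_PER_TASK_USD = 4.0      # from the comment: one hello-world run
ASSISTANT_FLAT_MONTHLY_USD = 20.0  # from the comment: flat-fee LLM assistance


def break_even_tasks(flat_monthly_usd: float, per_task_usd: float) -> float:
    """Tasks per month at which per-task agent spend equals the flat subscription."""
    return flat_monthly_usd / per_task_usd


def agent_worth_it(per_task_usd: float, hours_saved: float, hourly_rate_usd: float) -> bool:
    """True if the developer time an agent run saves (compared with coding it yourself
    using the flat-fee assistant, whose marginal cost per task is ~0) is worth more
    than the run itself."""
    return hours_saved * hourly_rate_usd > per_task_usd


print(break_even_tasks(ASSISTANT_FLAT_MONTHLY_USD, AGENT_COST_PER_TASK_USD))  # 5.0
print(agent_worth_it(AGENT_COST_PER_TASK_USD, hours_saved=0.5, hourly_rate_usd=60.0))
```

At these figures, five agent runs already cost as much as a month of the flat-fee assistant, so the agent has to save real time on each task to come out ahead.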

Fin_Code 2 years ago |

I'm still not sure how this is different from the VS Code plugin. It seems to function in about the same way; it just uses a slightly different context reference. But that scope can easily lead to incorrect code targeting.

HanClinto 2 years ago | |

I haven't played with CW yet, but based on the screenshots and whatnot, it feels like CW adds another layer of requirements-gathering to its workflow (along with clear bullet points for what the terms of success look like) that regular Copilot doesn't have.

justinclift 2 years ago |

> CW took two to three minutes to return.

Hmmm, wonder if there's cheaply sourced labour of the human variety in that loop then?

akiselev 2 years ago |

Any way to get access? All the AI product waitlists are killing me and I was stuck for months on the last GHNext waitlist.

andrewstuart 2 years ago |

I tried GitHub Copilot in VS Code. It was immensely frustrating.

The main problem was context. It didn’t seem to know what files to use for our discussion, didn’t listen when I told it, didn’t remember when I told it, had no effective way that I could bring files in and out of the discussion.

All this led to a deeply frustrating session of interaction, and frankly I hated it. It was easier to use the ChatGPT web UI and copy and paste in and out.

I found GitHub Copilot better in JetBrains IDEs. It mostly seemed to know what I was asking about, though it’s a very long way from being good at managing context.

It’s surprising that after the amount of development they’ve put into copilot it still is so bad at what I’d consider to be barest minimum functionality to integrate into an IDE.

intended 2 years ago |

ChatGPT will happily tell you how to build ocean liners in landlocked deserts, or how to ice skate up a hill.

marc_ranieri 2 years ago |

It's pretty much like having an assistant translate hieroglyphs into the alphabet...

ianbutler 2 years ago |

To answer the question posed at the end of the article, whether something like this is the future of programming: I think in a lot of ways, yes. It reduces the iteration time for making a new feature and handles a lot of the project management too. As AI gets smarter, it makes sense to design workflows around how its capabilities can complement ours as developers, and not just force it into existing workflows.

We're working on something similar to workspaces: https://www.bismuthos.com

We provide a workspace to build Python backends. Chat on the left, code and visual editors on the right. However, we also handle deployments, data storage (we have a blob store), serving (we built a home grown function runtime) and logging.

The experience is tightly integrated with our copilot, and the idea is to get ideas off the ground as quickly as possible with as little DevOps hassle as possible. Right now the focus is on building something new, but we're in the process of making it easier for existing projects to integrate with us too.

Feel free to drop by our (very) new discord too: https://discord.gg/E5Yn3vaM