The Mythical Non-Roboticist (generalrobots.substack.com)
Got a controls problem? forward predict using the magic sensor.
Got a planning problem? just sense the world as a few matrices and plug it into an ILP or MDP.
What did the user mean? Ask the box.
etc etc. Distilling the world into the kind of input our computers require is immensely difficult, but once that's done "my" problem (being a planning expert) is super easy. I'm often left holding the bag when things go wrong because "my" part is built last (the planning stack), and has the most visible "breaks" (the plan is bad). But 90% of the time it's traceable back to perception, or to a violated assumption about the world.
TFA is spot on - it's just not clear how to sense the world to make "programming" robotics a thing. In the way you'd "program" your computer to make lines appear on a screen or packets fly across the internet, we'd love to "program" a robot to pick up an object and put it away, but even a specious attempt to define generally what "object" and "put away" mean is still 100s of PhD theses away. So it's like we invent the entire ecosystem from scratch each time we build a new robot.
It’s also made me draw parallels between the experiences with actual people, especially others in my household. With young children who are at the early parts of “doing household chores” of development there is basically constant refinement on what “clean the floor”, “put things away”, etc. _really_ means. I know my wife and I have different definitions on these things too. Our ability to be clear and exhaustive enough upfront on the definitions to have a complete perception and set of assumptions is basically non-existent. We’re all only human! But our willingness to engage in fixing that with humans is also high. If my kids repeatedly miss a section under some chairs when vacuuming we talk about it and know it will improve. When my Roomba does it it sucks and can’t do its job properly. Even thinking about hiring professional trades people to come do handiwork it’s rarely perfect the first time. Not because they’re bad, just because being absolutely precise about things upfront can be so difficult.
Like last night on Twitter I saw an opening for Robotic Behavior Coordinator at Figure. I know for sure, having analyzed this problem with "nothing else" to do for 20 years, I would crush it with humility, and humanity would profit in orders of magnitude.
But they are not set up to hand me control of the rounding error of $40M I'd like [and would pay forward], *nor would their teams listen to me, due to human nature and academ-uenza*.
Such is our loss.
(as you ~say, "reinventing the ecosystem from scratch...")
> humanity would profit in orders of magnitude
Have my Y-C idea now.
here we gooooo ..!.. ;)
Is this part still true? There are widely available APIs (and even running at home on consumer level hardware to some extent) that can pick an object out of an image, describe what it might be useful for and where it could go.
Imagine the frustration if the robot kept returning to you saying "I cannot put this away". You'd get rid of the robot quickly. Reasoning at that level is so difficult.
But then imagine it was just a towel all along - oops, your perception system screwed up and now you put the towel in the dishwasher. Maybe this happens 1/1,000,000 times, but that person posts pictures on the internet and your company stock tanks.
Occlusions, short-lived tracks, misassociations, low frame rate + high-rate-of-change features (e.g. flashing lights) are all still very challenging when you get down to brass tacks.
It's a lot easier to get started on something interesting and maybe even useful than it was even 10 years ago.
A lot of the "ah we can just use X API" falls apart pretty fast when you do risk analysis on a real system. Lots of these APIs do a decent job most of the time under somewhat ideal conditions; beyond that things get hairy.
You have to do it in real time, from a video feed, and make sure that you're tracking the same unique instance of that object between frames.
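To make the "same unique instance between frames" part concrete, here is a minimal sketch of one common approach: greedy association of new detections to existing tracks by bounding-box overlap (IoU). The function names, the box format `(x1, y1, x2, y2)`, and the `min_iou` threshold are all assumptions for illustration, and a real tracker would also need motion prediction, track creation and deletion, and handling for occlusions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match detections in the new frame to existing tracks.

    tracks: dict of track_id -> last known box (hypothetical format).
    detections: list of boxes from the current frame.
    Returns dict of detection index -> track_id; unmatched detections are absent.
    """
    candidates = sorted(
        ((iou(box, det), track_id, det_index)
         for track_id, box in tracks.items()
         for det_index, det in enumerate(detections)),
        reverse=True,
    )
    matches = {}
    used_tracks = set()
    for score, track_id, det_index in candidates:
        if score < min_iou:
            break  # everything after this overlaps even less
        if track_id in used_tracks or det_index in matches:
            continue
        matches[det_index] = track_id
        used_tracks.add(track_id)
    return matches


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]
matches = associate(tracks, detections)
```

The third detection overlaps nothing and stays unmatched. Deciding whether that is a new object, a false positive, or an old object reappearing after an occlusion is exactly where the hard part starts.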
If they can put ImageNet on a SOC, they can do it. [probably too big/watt]
Better yet: ImageNet bones on SOC, cacheable "Immediate Situation" fed by [the obvious logic programming that everyone glances past :) ]
I would add supply chain, however.
Assumption: Apple's supply chain is gold standard [~max iterative tech envelope push & max known demand]
Hypothesis: This is swiftly re-creatable for any [max believable & max useful] product. "Detroit, waiting".
I've been working on robotics pretty much my whole career and people usually miss how complicated it can get even for simple things once you consider what can go wrong AND it's a meeting place for a multitude of areas: hardware, software, mechanical, electrical, machine learning, computer vision, control, driver, database, etc. An issue can hide in between any of those for months before it shows up with bells and whistles.
What is sometimes difficult to get across to people is that building robots is not only difficult per se, but the base of comparison is unusually unfair: if you build an e-commerce website you benchmark it against other e-commerce websites, maybe Amazon, maybe ebay; for robots usually the benchmark is against people, the most adaptable and fault tolerant machine that exists, every robot will suck compared to a human doing the same task, but that's what we compare it to every time.
Yes! I am not a roboticist (or at least a good one in any sense) but I was having a similar discussion regarding enabling non-technical users do data analysis. Once they start doing anything more complicated than `SELECT COUNT(*) FROM blah WHERE foo=otherblah` it's going to get real real quick. You can't just give them some cheap point and click stuff because their questions will immediately overrun the extent of what's practicable. Asking interesting questions of data is roughly as difficult as phrasing the questions in SQL (or any other formal query language) and anyone who can do the first can do the latter easily enough.
(or the point and click stuff _is_ really powerful but it's some proprietary non-googleable voodoo that requires a month long training course that costs $5K/week to get a certificate and become middlingly powerful)
It will be the same in any branch of programming you look at.
I like it that we have a name for this now. Let's keep calling it the "low-code fallacy", because I'm tired of explaining over and over the same idea that semicolons and for loops are not what makes programming hard.
Rethink Robotics went bust because they couldn't solve this usability problem. It's a problem at a much higher level than the author is talking about. If you're driving your robot with positional data, that's easy to understand, but a huge pain to set up. Usually, you have very rigid tooling and feeders, so that everything is where it is supposed to be. If it's not, you shut down and call for a human.
What you'd often like to do is an assembly task like this:
- Reach into bin and pull out a part.
- Manipulate part until part is in standard orientation.
- Place part against assembly so that holes align.
- Put in first bolt, leave loose.
- Put in other bolts, leave loose.
- Tighten all bolts to specified torque.
Each of those is a hard but possible robotic task at present. Doing all of those together is even harder. Designing a system where the end user can specify a task at that level of abstraction does not seem to have been done yet.
Somebody will probably crack that problem in the next five years.
So, to what level of granularity do you have to specify a system task in order for it to do the thing you want it to do, at the level of accuracy that you wanted to operate in?
That all depends on how accurately you can specify what you want to do,
which means you need a sense of all of the systems that interact with, and impede, the successful completion of the task by that set of systems.
We can build abstraction layers, we can build filters, but at some point somebody has to map a set of actions to a set of inputs and outputs in order to sequentially build this set of tasks, which rolls out into the function of a physical manifestation of some sort.
Add to that the complexities of mobile actuation, complex environments, and just the general state of power, computing, routing, etc., and you have a 15-body problem simply to have anything that someone would look at as a benefit to humanity.
Only a couple of disciplines can totally encapsulate all that, and none of them are available to study anymore: primarily cybernetics, and all of the interactions necessary to fully build a human-machine symbiotic system.
I like that! Although...Physics [so gpu] is enough to do it, when supplied with an optimized way to "know" momentary_[intent/\status] as a reduced ongoing string of equations.
We all had to buy roombas to program. The final exam was getting it to traverse a maze. It seemed so simple! They even gave us the exact dimensions and layout ahead of time. Just hard-code the path, right? Spin the wheels so many rotations, turn 90 degrees, spin some more.
Except the real world is messy, and tiny errors add up quickly. One of the wheels hits a bump, or slips a little on the tile, and suddenly you're way off course. Without some kind of feedback loop to self-correct, everything falls apart.
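A toy simulation shows how quickly this adds up. This is only a sketch under made-up assumptions: a constant heading slip per step, a perfect heading sensor, and an untuned proportional gain. None of the numbers come from a real Roomba.

```python
import math


def drive(steps, slip_per_step, correct_heading):
    """Simulate driving 'straight' along the x axis.

    slip_per_step: heading error in radians injected every step (wheel slip, bumps).
    correct_heading: if True, a proportional controller steers back toward heading 0.
    Returns the final sideways (y) error in metres.
    """
    x = y = heading = 0.0
    step_len = 0.01  # metres travelled per step (assumed)
    kp = 0.5         # proportional gain (assumed, untuned)
    for _ in range(steps):
        heading += slip_per_step        # disturbance from the messy real world
        if correct_heading:
            heading -= kp * heading     # feedback from an idealised heading sensor
        x += step_len * math.cos(heading)
        y += step_len * math.sin(heading)
    return y


open_loop_error = drive(1000, 0.001, correct_heading=False)
closed_loop_error = drive(1000, 0.001, correct_heading=True)
print(f"open loop: {open_loop_error:.3f} m off, closed loop: {closed_loop_error:.3f} m off")
```

With the hard-coded path the heading error accumulates without bound, so the sideways error grows to metres. With even a crude correction the heading error stays small and the robot ends up roughly on course.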
My excitement for robotics died quickly. I much prefer the perfectly constrained environment of a CPU.
I am still excited for robots though, but haven't worked on one in quite a while.
We were expected to assist in machining parts, building control libraries from scratch, working out algorithms from scratch for path generation, etc.
The goal was to shear a sheep: https://www.youtube.com/watch?v=6ZAh2zv7TMM
This is definitely applicable outside of robotics. For example, I work on a large-scale LLM training framework and tend to think this way when thinking about design decisions.
Lol, I work in the field of test automation and this is exactly how no/low code frameworks get pushed as well. And, it rarely does play out in a way that people think it will.
In fact, having read the entire article, I feel like a lot of it can be applied more broadly. Basically any time people go "X sure is complex, we should make a simple to use framework for non-X folks to use". Not that it will always fail, but I have seen it happen enough to recognize a pattern.
What could it do to stand out to you? What could it demonstrate in 15-20 seconds for you to think "OK this is different"?
It is not impossible. Off the top of my head, Robot Framework does fit those criteria. But I'd argue that Robot Framework isn't really low code, but rather a coded framework in a low code trench coat.
Neural network people: watch this space, I have a shotgun.
Many people seem to long for a magical technology that you could just pour over things and they will work out in the ways you wanted, while miraculously sensing the ways you didn't.
Those with the edge on the new tech will always be those who have a good understanding of its limitations, because once a new thing comes around they immediately see the possibilities.
“Oh yeah, if you try to move the robot without calling enable() it segfaults. That's a safety feature… I guess? But also if you call it twice, that also segfaults. Just call it exactly once, ever.”
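For contrast, here is a rough sketch of what a less hostile version of that interface could look like. The `Robot` class, its methods, and the exception name are entirely hypothetical and not taken from any real vendor library. The idea is just that `enable()` is idempotent and a premature `move()` fails with a readable error instead of a crash.

```python
class RobotNotEnabledError(RuntimeError):
    """Raised when a motion command arrives before enable() has been called."""


class Robot:
    """Hypothetical robot wrapper, for illustration only."""

    def __init__(self):
        self._enabled = False
        self.position = 0.0

    def enable(self):
        # Idempotent: calling this twice is harmless.
        self._enabled = True

    def move(self, delta):
        if not self._enabled:
            # Fail loudly and explain the fix, rather than segfaulting.
            raise RobotNotEnabledError("call enable() before move()")
        self.position += delta


robot = Robot()
try:
    robot.move(1.0)
    early_move_rejected = False
except RobotNotEnabledError:
    early_move_rejected = True

robot.enable()
robot.enable()  # second call is a no-op
robot.move(1.0)
```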
From the article: Design your APIs for someone as smart as you, but less tolerant of stupid bullshit.
One of the most painful parts of doing this professionally is that the people that work at a few of our vendors are incredibly smart and are selling us hardware that we can’t really get anywhere else, but they’re generally Electrical Engineers or Optical Engineers or Physicists and don’t even realize that the APIs they’re providing are bad. You file a bug, they tell you you’re holding it wrong, you point out the footgun in the API, and they come back and ask what that even means.
…It’s not until you debug their closed source library using Ghidra and tell them they missed a mutex in a specific function that they start treating you as anything more than a moron.
Anyway </rant>
I’ve been through exactly this scenario in two very popular robotics frameworks.
Countless hours were spent by well intentioned framework developers to abstract away the underlying PID loop from the end users. Then countless more hours were spent by the end users to work around this by implementing a PID loop on top of the abstraction.
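For anyone who hasn't met one, the thing being abstracted away is small. Below is a minimal textbook-style PID sketch driving a made-up first-order plant. The gains, the plant model, and the time step are assumptions chosen only so the example settles. It is not how either framework implements it, and it leaves out things real controllers need, like integral windup limits and derivative filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the control command for one time step."""
        error = setpoint - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, setpoint=1.0, dt=0.01, steps=2000):
    """Drive a toy plant (velocity with linear drag) toward the setpoint."""
    velocity = 0.0
    for _ in range(steps):
        command = pid.update(setpoint, velocity, dt)
        velocity += (command - 0.5 * velocity) * dt  # assumed plant dynamics
    return velocity


final_velocity = simulate(PID(kp=2.0, ki=1.0, kd=0.05))
print(f"velocity after 20 simulated seconds: {final_velocity:.4f}")
```

The loop itself is a dozen lines. The hard part is choosing gains for a plant you only partly understand, which is presumably why users end up wanting direct access to it.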
That sort of one step implementation seems to be a sweet spot for llm
Problem is a lack of available examples for training?
I think, really, the emperor has had no clothes for quite some time. But -- now that we are here, the optimal path is towards open standards.
All historical business acumen points straight to black-box profit bubble. The enormity of what "Useful Robotics" will bring about has got to transcend that.
Having worked in both industries, I concur that robotics is much, much messier, as the system has to engage via hardware with the super-messy physical world as opposed to the comparatively modestly messy world of business transactions, data analysis, or whatever. But if we stop trying to solve for "programming for nonprogrammers" and assume that anyone who uses a language or API is a programmer (because once you start programming, that's what you become, irrespective of what's in your job title), we can remove a whole lot of wasted effort from the industry.
I feel like this has been the problem plaguing the ROS navigation stack since move_base and now nav2. They design the API for people a few standard deviations smarter than everyone else on the planet. Billions of parameters that affect each other in unpredictable ways and you're supposed to read the thesis on each one.
Or do what most everyone else does and use the defaults and hope for the best, lmao. You either make an API that the average user will understand or it'll inevitably be used as a black box.
Robotics is dead. Long live robotics.
Maybe this new ML wave will bring about a more generally useful robot, it certainly feels like it will at least open up a ton of new avenues for R&D.
And per Murphy’s law, it happened for the first observed time in a relatively high-stakes situation while there were a lot of eyes on it. Naturally.
What about a factory robot, welding together a part of a car?
As soon as it gets practical it stops being robotics.
I've entertained the idea of entering that space as a software engineer. No real experience in robotics though.
Projects are usually complex in part due to having a lot of moving parts (hw, sw, mechanical), iterating (bad) designs/components is not practical due to support reasons, so you may be stuck with a known bad stack.
And copying a previous comment from me on another thread: Robotics is very niche and the market is dominated by early stage startups (since most of them go out of business a few years in), so salaries are average unless you are working specific jobs for FAANG (which is a small pool). Job hopping usually means moving elsewhere, since working close to the hardware makes it much easier, which in turn means having a good picture of what is a competitive salary sometimes is not obvious.
Overall I would say that if you are optimizing for money / career mobility robotics is not great and you can do better some place else.
- one year (someone is building this)
- five years (no one knows how to solve this problem but a lot of people are working on it and y'know, eventually you get lucky)
- ten years (this isn't forbidden by the laws of physics but it's bloody impossible as far as anyone knows)
Nothing in your list has really changed in the last 5 years. What makes you think we are significantly closer now?
NB: I'm not saying we aren't making strides in robotics. A lot of these problems are really tough though; smart people have been working hard on them for the last 4+ decades, and making some headway. We are definitely enjoying the benefits of that work, but I don't have any reason to think we're "nearly there"
What I do think is much improved in the last decade or so is the infrastructure and vendor ecosystem - you can get a lot done now with commodity and near-commodity components, there is less need to start "from scratch" to do useful things. But the hard problems are still hard.
Vision. Computer vision keeps getting better. Depth sensors are widely available. Interpretation of 3D scenes kind of works. A decade ago, the state of the art was aligning an IC over the right spot on a board and putting it in place.
> What I do think is much improved in the last decade is the infrastructure and vendor ecosystem - you can get a lot done now with commodity and near-commodity components, there is less need to start "from scratch" to do useful things.
Very true. Motors with sensors and motor controllers alone used to be expensive, exotic items. I once talked to a sales rep from a small industrial motor company that had just started making controllers. He told me that they'd done that because the motor and the controller cost about the same to make but the controller had 10x the markup.
Shoulda teamed up with the Nintendo folks, probably.
>> but please believe, I would not risk ostracism on this (my favorite) forum if I were not [approaching] 100% sure.
Problems like "how do we build better automated surveillance robots? it's so inconvenient to have to actually have a human remotely piloting the kill-bots"
Which is the other, equally shiny part of the coin.
Elder care, anyone? They're as cool as you and me (+30yrs) :)
Otherwise the Teslas would have indeed full self driving mode, using only cameras.
The costs of doing so are hugely dependent on the application. It is not, for example, an attractive strategy for an image-guided missile, though it's probably fine for an autonomous vacuum cleaner.
The earlier "when compared to humans" statement definitely sounds pretty accurate to me, worded as "multi-purpose robots currently always are less robust than humans at the same set of tasks" (or similar)
Specialization has tradeoffs. Humans are very optimized generalists but very few of us become specialist at more than one thing. Even in that case a specialized machine/robot can be far faster, depending on the task of course.
Of course humans have a lot of trade offs for their abilities as generalists... taking years to mature, requiring sleep, poor integration with computer systems are just some of them.
"They have a better chance at proving or disproving this than we do."
1 and 2 make perfect sense and are easy to demonstrate but 3 seems to me to be incredibly difficult.
I haven't found an easy way to advertise convincingly to somebody who (quite reasonably) grants you a limited amount of attention that custom things won't be a nightmare. It's the kind of thing you only tend to see when you get down in the weeds, and hence people will tend to make assumptions based upon surface details.
This is a problem I'm struggling with.
I think robot/cucumber could require less code if they were better abstractions (and would be more loved), but I find it hard to illustrate that an abstraction is going to be good or bad, particularly to people with limited attention and particularly to people who don't necessarily have the skills to recognize a good abstraction.
I'd say, have a highly emphasized set of examples of the things people are most likely to want to customize.
You probably don't want to put the examples inline with your basic description, but link them there.
(And semicolons are ugly and I avoid them, wherever I can get away with it, but no, are probably not the reason)
I also agree that 0.1 + 0.2 != 0.3 is another thing that makes programming hard. This is intrinsic complexity, because it is a fundamental limitation in how all computers work. The way around this is -- you guessed it -- better programming languages, that help you "fall into the pit of success". Perhaps floating point equality comparisons should even be a compiler error. Again, low-code goes the opposite direction, by simply pretending this kind of fundamental complexity doesn't exist. You are given no power to avoid it biting you nor to figure out what's going on when it does. Low-code's entire premise is that you shouldn't need to understand how computers work in order to program them, but of course understanding how floating-point numbers are represented is exactly how you avoid this issue.
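A quick sketch of the issue and two common ways around it, using only the Python standard library. Here `math.isclose` is a tolerance-based comparison, and `decimal.Decimal` is a base-10 type that plays a role loosely similar to fixed-precision decimal types in databases. Which one is appropriate depends on the application.

```python
import math
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the rounded sum does not land on the same double as 0.3.
naive_equal = (0.1 + 0.2 == 0.3)

# Comparing with a tolerance sidesteps the representation error.
tolerant_equal = math.isclose(0.1 + 0.2, 0.3)

# Decimal arithmetic built from strings is exact for these base-10 values.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(naive_equal, tolerant_equal, decimal_equal)  # False True True
```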
The SQL `numeric` makes the right choice here, putting the problem right at the front so you can't ignore it.
That said, I completely agree with your main point. Modern software development is almost completely made of unnecessary complexity.
Once you know how many degrees of freedom are truly needed to solve a problem, you start removing unnecessary parts in the design to lower cost and assembly complexity.
Thus, once your cool new C-3PO has perfected the art of making toast, it's only a matter of time until you re-engineer it into looking like a toaster.
The best illustration of this subtle difference is how I'm contemplating snow and ice management. I have the solid state idea of installing quartz IR lights around the building to control the ice and snow. I also have been working on using de-icing and pre-icing liquids with hopes of getting some droids to take over the physical part of applying the liquids and brushing away the snow.
I have settled on doing both with the building controller acting as the overall manager of the process.
I looked at the posetree.py that the author wrote and linked to, and it looks like as good a place as any for me to start.
Form factor is critical in assigning human names and communicating use. I find that when organizing a solution to a problem, adopting a form factor too early is a hindrance.
The question is: how much information is lost in the process? How many layers of complexity we would add to a machine ensemble to be able to operate together at a satisfactory level? The machine learning corollary of understanding the whole picture of the problem/solution space and that leading to simpler solutions (because you don't have to optimize further) applies here. At the end of the day, cost, complexity and practicality will have the final word.
This idea co-evolved in "AI"
I disagree, at least with this as evidence for your 5 year timeline - computer vision has been improving, yes, but nothing earth shattering in the last 5 years that I've seen. We've seen good incremental improvements over 30 years here but they don't seem to be approaching "good enough" yet, at least not in a way that would give me confidence we're at an inflection point. Most of the most recent interesting improvements have been in areas that don't push the boundaries - they make it easier to get closer to state of the art performance with less - fewer sensors, less dimensional & depth info, etc. But state of the art with expensive multiple sensor setups isn't good enough anyway, so getting closer to it isn't going to solve everything.
Same with the 3D scene stuff: people have been plugging away at that for 30 years, and while I think some of the recent stuff is pretty cool, it still has a long way to go. Whenever you start throwing real-world constraints in, the limitations show up fast.
Which gets us, for example, cost-effective robotic weeding, and sorting of recyclables. When each sensor only needs about a smartphone's worth of processing capacity, and cameras are cheap, they can be applied in bulk to mundane tasks.
However it doesn’t really speak to your contention. This is an example of doing less than state of the art perception for much cheaper, but to meet your goal (5 years or otherwise) we need to significantly improve the state of the art.
I totally and completely disagree. Sure, "computer vision" industrial cameras doing edge detection haven't changed much, but the computer vision my phone can do is many orders of magnitude better today than it was 5 years ago.
There's tools now that can take a short video of your bookcase and identify every book. That's serious progress!
Edit: This is the example I was referencing https://simonwillison.net/2024/Feb/21/gemini-pro-video/
Breaking down video into tokens for large language models and asking for structured data out. That's ground breaking compared to any non-LLM style machine vision.
I just don’t think it moves the needle significantly in this particular area. For example, structured data out of a single camera is way better than it was 5+ years ago, but it isn’t as good as a dedicated multi sensor setup (ie state of the art for robotics) and that in turn isn’t good enough for the problems in GP post - which was the point.
Eh, they're better than they were, but there's nothing that can meet the needs of generalizable robotics.
Every depth camera on the market does badly in some common situations. Even the ones that cost as much as a house.
> A decade ago, the state of the art was aligning an IC over the right spot on a board and putting it in place.
Are you sure you don't mean 3-4 decades ago?
And if that feels too expensive and space-intensive for mere toast, just think of how much worse a robot would be!
A language that was built around the philosophy of constructivist math in order to allow arbitrary precision arithmetic would basically treat every number as a function that takes a desired precision and returns an approximation to within that precision, or something very similar to that. All numbers are constructed up to the precision they're needed, when they're needed. But it would still not be able to evaluate whether (Pi / 2) * 2 == Pi exactly in finite time -- you could only ask if they were equal up to some number of digits (arbitrarily large, but at a computational cost). If you calculate some complex value involving exponentials and cosines and transcendentals using floating point, you can just store the result and pass it off to others to use. If you do it with arbitrary precision, you never can, unless you know ahead of time the precision that they're going to need. There are no numbers: only functions. You could probably even come up with a number that suddenly fails at the 900th digit, which works perfectly fine until someone compares it to a transcendental in a completely different part of the software and it blows up.
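A toy version of "every number is a function from a desired precision to an approximation" makes the trade-off easier to see. This is only a sketch: the helper names (`const`, `scale`, `sqrt2`, `equal_up_to`) are made up for illustration, it uses sqrt(2) rather than Pi to keep the code short, and it supports just enough arithmetic to mirror the (Pi / 2) * 2 == Pi example.

```python
from fractions import Fraction


def const(q):
    """An exactly known rational: any requested precision is satisfied."""
    q = Fraction(q)
    return lambda eps: q


def scale(a, k):
    """Multiply the number `a` by a nonzero rational constant k."""
    k = Fraction(k)
    # To be within eps after scaling, ask `a` for precision eps / |k|.
    return lambda eps: k * a(eps / abs(k))


def sqrt2(eps):
    """Approximate sqrt(2) to within eps by bisection on [1, 2]."""
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo


def equal_up_to(a, b, eps):
    """True means 'indistinguishable at tolerance eps', never 'exactly equal'."""
    return abs(a(eps / 4) - b(eps / 4)) <= eps / 2


halved_then_doubled = scale(scale(sqrt2, Fraction(1, 2)), 2)
same_to_30_digits = equal_up_to(halved_then_doubled, sqrt2, Fraction(1, 10**30))

# A value that agrees with sqrt(2) at a coarse tolerance but not at a finer one.
impostor = const(Fraction(14142, 10000))
looks_equal_coarse = equal_up_to(sqrt2, impostor, Fraction(1, 100))
looks_equal_fine = equal_up_to(sqrt2, impostor, Fraction(1, 10**6))
```

Every comparison has to name a tolerance and recompute the approximation, and `impostor` passes the coarse check while failing the fine one, which is the "fails at the 900th digit" problem in miniature.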
This does not sound like it's simplifying anything. Genuinely, a healthily-sized floating point is the simplest way to represent non-integer math; this is why Excel, many programming languages, and most science and engineering software use it as their only (non-integer) number format. It's actually hard to come up with a situation where arbitrary precision is actually what the users need; if it really seems like you do need it, then you might actually want a symbolic math package like MATLAB or Mathematica/Wolfram Alpha or something.
There are autonomous forklifts, but a humanoid robot that could sit in a normal forklift, regulations aside, would be almost an insta buy in logistics.
I'm designing for a future that is as far out as I can both see and achieve, on the scale of an 8-unit apartment building.