Codex is now in the ChatGPT mobile app(openai.com) |
Codex is now in the ChatGPT mobile app(openai.com) |
In an ideal world the would allocate 50% of compute to find errors in that rewrite and publish how bad Claude is, but that would undermine confidence in slop in general so that is not going to happen.
It integrates with your issue tracker and makes the tracker the UI for the LLM. It also clones the repo for every ticket, and can set up fixtures/etc. I can work on multiple items at a time, which is fantastic because otherwise you have to wait for the LLMs a lot.
The main issue is reliability, so I think the corporations are going to take a much more gradual, piecemeal approach, and probably end up with something like Claw within a year.
Both of the Codex apps are very good.
I tried this out and it works significantly better than Claude's remote control in fact the first few times I tried Claude's remote control it didn't even work and to this day is very buggy.
Other than those limitations, the connection has been very stable for me, definitely more reliable than alternatives like happy.engineering or Omnara. What’s been buggy for you specifically?
Edit: actually it gets worse -- you can't start any tasks any longer in your mobile, you are required to sign in from your desktop/workstation. You can't "sign in" from your CLI.
Whoever made this, paper cuts I wish onto you. It sucks.
Edit2: actually you can ssh, so I presume that allows you to do the CLI -- and you still can do mobile first tasks, but it's not intuitive at all. Mobile first tasks you still can't pick the model, and I haven't tested the workstation connection one. That said, indeed paper cuts.
Now I can run scripts as hoc or stop my loss making stock market script without going through the hoops
you need a model server - ollama/llama.cpp/lm studio
Do you mean supporting oai-compatible api URLs in copilot? If so then you need either VS Code Insiders, or a VS Code extension I believe?
> Stay connected to active work from anywhere
... (and anytime because it's on your phone). No thanks.
`codex remote-control`
but it seems to be limited/sandboxed to a specific directory.
In my case, I run all coding agents under a specific user, `agent`, and when I access it over remote from my phone, I seem to be limited to the working directory `/var/lib/agent/Documents/Code` with no obvious way to change this.
It won't let me go to my existing project directories (including the directory where I started `codex remote-control`).
I haven't tried symbolic links or anything complicated yet, as this is mostly just a novelty to me.
Packing a Linux mini-pc in my rucksack, connected to display glasses, and voice-to-text with handy. Voice to text gets injected into a remote (Docker) codex session, running a hot reload web stack. I prompt to implement various features in an existing code base, where codex understands the structure and requirements. If a feature is done, I take a moment to inspect the results on the display glasses, then move onto the next feature or keep iterating. It's not perfect, but I was able to implement a couple of not too complex features while walking my local national park. The display glasses have a built-in 4-microphone array, and solid speakers. No need for a bulky headset or earbuds. Glasses come with monochromatic dimming, you can easily switch between dimming and see through.
If this comes with Linux integration, I will certainly give it a try.
So the whole idea is not to make work more efficient. It's just to make you work more, all the time, while waiting for your coffee, while in your commute.
Ask yourselves: is that the society we want?
Or ask Codex to create image that explains xyz.
But a person can use subagents, if they want, to filter that down. This burns tokens in a big hurry, but I think subagents can be arbitrary local commands (eg, a local LLM).
Or, you know: Just slow down. :) It doesn't always have to be a race, does it?
A UI like Jira/Trello to stage features and see (agentic)team status. A Figma-like UX to actually build out the app/interface/features. A system that aids human review. There's tons of paradigms to explore and improve upon.
Feels like a testament to the value in taking time and doing it properly.
Now if only codex got its 1M token context window back.
---
Edit: Hmmm. Maybe I spoke too soon. Sigh. Definitely _more_ reliable by far overall, but still have queued messages with responses on my phone that don't show up on my computer, and responses that don't show up on my phone.
Edit 2: New threads created from my phone seem to have a little stall-out, but ones that are underway are behaving reasonably well.
But I will still consider to release it anyways
1) Keep getting investors to give them money.
2) Convince the right people that OpenAI is "critical to national security" so that when 1 runs out, they can get bailed out by the government.
Everything else is just set dressing.
Neither does OpenAI.
they added some new stuff, like remote control to wherever the desktop codex app is running, but these companies need to work much more on their press releases.
In those scenarios, the goal is not "work at any time" but to "be anywhere at any time", or, rather, to "be able to work from anywhere, doing anything".
Sort of....I guess.
I’m not a swe but damn, I’d hate to be one.
Once you've used these coding agents a lot, you develop a pretty intuitive feel for how they work, what they're capable of, what they're good at, and where their weaknesses are. Hopefully, you're already pretty familiar with the code base you're working on. Combining the two, this means you can get quite far essentially "vibe coding" (i.e. not looking at the actual code) on a new branch.
So if you have some idea or some issue you want to fix on the go, you just iterate with the agent for a bit (presumably no more than a couple hours) until the agent outputs an implementation. Here, I do claim there is some "skill" (which is a function of your codebase familiarity, general SWE ability, and facility with AI agents), and if you're good, this implementation will be halfway decent a high percentage of the time. Then when you're back at your desktop, you can review the changes carefully/do some proper testing/debugging etc. But you've saved a good chunk of time- an initial draft is already waiting for you.
I unsubscribed from Claude after the performance regressions around the time of the Opus 4.7 update made it unusable. Been using Codex since then, and I've definitely missed being able to make these drafts. So I'm looking forward to trying this out.
Process (all on my phone)
* Create new repo on github
* Tell Chatgpt the project and ask for a readme and agents file
* Manual upload the files to github
* Go to Codex and tell it to review the code and carry out steps in readme
* Connect project to Vercel
* If needed, create a DB
* Ask ChatGPT for the schema and run the sql
I have done this kind of work for years and now I can create things like this on the way back from a meeting. It's broken my business model by the way.
Here is one of the apps, for mental health - pretty much all done on my phone.
https://you-are-ok.vercel.app/
So having Codex in app removes one little barrier and I will take that.
Actions were not working on the android app. Known issue for a long time
https://community.openai.com/t/gpts-custom-api-calls-not-wor...
I did the same solution described above - Bookmarked the custom GPT, added to home screen and would run it in browser.
It seems to be working now. So I can start building out custom GPTs, share with my clients and not have a big training piece to onboard them.
Edit: https://chatgpt.com/g/g-697edc35e2888191a331217cd0483a67-men...
Take a pic of a menu and create a group meal/degustation for your budget.
I have Claude Code with access to Azure environment (via CLI) where app components are deployed and also to the code base repo. I paste an error message or explain the symptom. CC works through various configuration checks and network tests etc across the Azure resource list and also the application logic and surfaces the root cause of the error precisely. Easily 1-2 days of effort if I had done all those myself (this is an inherited code base) -- would have had to learn a few of those skills along the way or may not have thought about some of the checks if i were on my own -- all done in about 45 mins with basic human-in-the-loop guidance.
Of course learning it the hard way would have meant deeper understanding and first-hand exprience for me. But there is no guarantee I wouldnt have given up mid way frustrated or other priorities prevented me from pursuing this in full.
(The refactor's been to support Jujutsu VCS.)
Case in point, I have a Rust project with a target/ directory with about 10GB. Compile times from scratch takes about 10 minutes. (I do not love this)
With this mobile app I need to upload the code to the cloud, right? Or does OpenAI expects me to compile huge projects on my phone?
So basically, it is like you are typing on your terminal on your computer from your phone.
I'm also completely fine if it gives me hold mustic while it's working.
Would make my walks much more productive.
You must be kidding me.
https://x.com/karpathy/status/1886192184808149383
Forgetting code exists is by definition not suitable for serious work. However, OP said in the following paragraph, that this would be a first draft, and that the code would actually be reviewed and tested properly before being integrated.
At which point it is by definition no longer vibe coding, because you do care about the code! It's just an AI assisted workflow, but now we call all of those vibe coding for some reason. (Naming things is hard!)
If vibe coding means not caring about the code, then a literal translation of the term would be "not caring about coding" coding.
How do you think the world has worked for the past thirty years? AI has just caught up with human skill is all.
A key point is that after the "vibe" session you should also have a lot of tests written. So they can easily refactoring the code afterwards if there are major aspects you don't like when you get back to your desktop.
Imagine saying that you don't need to look at the roads or have no hands on the wheel whilst driving because someone-else said that the car can 'drive' itself; therefore, no need for anyone (including taxi drivers) to learn how to drive.
Just because a machine can generate plausible looking code does not mean you don't need to look at it or not know how it works or why it doesn't work.
Wouldn't you be doing the exact same thing had you been sitting at your computer when you had the idea?
Perhaps the person who wrote that had the mindset of "when I am away from my work, I want to be disconnected and present with the world around me, this updates now makes it so that I now have an excuse to carry work with me"
Maybe they're in a toxic/abusive work relationship where taking breaks is already difficult and this might lead to justifying working from your phone as "expected"
My question to you is: what is wrong with moving a little slower? Is time to prompt an optimization of a real bottleneck?
At least, that's how I code through my phone. But it does require some forethought in establishing your automated workflows. I'm at the point where my entire dev system has established templates for CI/CD so I can preview work in staging and production is still a manual step (obviously).
Of course I am aware that the caveat here is that all my interaction is part of training, but I’m fine with that. Even Qwen Cli discontinued the free plan.
I was initially quite excited, but I’ve found the results are less than great compared to being at a keyboard.
Something about the smaller screen size and/or lack of keyboard causes me to direct the agent less, which in turn creates more tech debt/code churn/etc.
Maybe I’m just showing my age, and I should practice voice dictation or something more, but my thoughts flow faster and more clearly on a keyboard (less ums).
Don't get me wrong, I still use Codex (and sometimes Claude Code) remotely every day, and am overall excited for this release, it's just that the benefit wasn't as high as I had initially hoped.
Part of this is due to the models getting better (no need to prod along with "continue"), and part of this is the nature of how I use my phone (short bursts of attention).
But again, maybe I'm just old and prefer big screens with a keyboard.
They might just not have cut a new build yet, today. It 'works' on master, but the mobile app thinks that your build is outdated (v0.0.0) if you build from master without overriding version, so probably easiest to wait until they cut a build if they haven't.
Woah, hadn't seen this before!
Off-topic, how long compile times do people have for codex-rs in openai/codex? Even my very beefy computer takes like 30 minutes to compile in release mode, makes me wonder why it's so slow and how this TUI got so large. But then I remember, agents like to write a lot of code, compilers get slower when they have to compile a lot of code :)
Mobile remote connection works, pushed the PR earlier today.
Claude on the other hand has been jank all around from the UX to the UI to the AI itself that it's baffling how it's more popular here on HN: https://i.imgur.com/jYawPDY.png
Sadly this remote control feature doesn't seem to be for Mac to Mac yet? I love the MacBook Neo as a "thin client" for AI and keep the MacBook Pro at home/hotel, and it would be nice to share Codex desktop sessions (without SSH → resume link)
Oh there is a "Control other devices" option in Settings → Connections, but it wasn't working in the build at the time of the original post
Edit: Running into issues setting it up on Windows. There's no "/remote-control" command in the CLI, so I installed the Windows Codex app. Then I updated the iOS app which now has the "Codex" feature in the sidebar, which should allow remote access to the Windows machine's instance - except it doesn't connect. The iOS app shows my desktop's hostname, so it knows there's an instance there, but refuses to connect. Issues like this would persuade a lot of folks to switch back to Claude.
My experience today with the new Codex remote control has been that it doesn't connect at all.
If I were to do this Codex flow I would want to have it setup in a dev container, most people are probably not going to do that, so we are going to see a lot of vulnerabilities introduced / computers "breaking". Breaking in quotes because the computer is not actually broken, but to a novice it might appear it is, when in reality it's just out of disk space or the agent executed a setting, it shouldn't have. Unfortunately, if the computer is out of disk space, a novice won't be able to spawn their coding agent to fix it, so their next logical course of action is Geek Squad/IT? I don't even know.
This is a non-issue problem raised by anti-ai zealots, much like data center water use is overblown. Headlines and lies.
And novices ARE absolutely using these things, I have a handful of friends with 0 CS background using claude to write apps, automation, etc
My method of walking to work is back (going for an 8 hour walk , voice dictating the whole way)
Who is this good for? My company, Anthropic/Openai, but not me? Maybe if I was investing in a side project I’d feel differently.
At any rate, the thought of bringing the agent out of my work machine and into my pocket or bed with me is terrifying. I hope it doesn’t come to that.
Those two are killing features.
But, for whatever reason, no one uses Google Jules.
I don't want my phone to have the ability to execute things on my computer. Much less with a LLM in-between!
only surprising product that feels non google for me is Google Stitch https://stitch.withgoogle.com/
You can run your local LLM and just connect the docker containers. I'm paranoid of being disconnected from the LLM, so I never run any of this on the same machine, so orchestrating a docker-compose file that provides the necessary services is important.
I'm still trying to find a good remote file system to loop into the setup for improved switching between cli and these web containers.
I can do some tasks on mobile, especially if they are follow up and steering only, greatly increasing productivity as you can keep working whilst in transit, etc.
This specific feature is more akin to Remote Control in Claude. You could already kick off Codex Cloud tasks (although it's just a little more fiddly to do so).
If you can move to Codex Cloud (or "Claude Code for the Web"), I think it's the superior approach. Start it there, and just pick it up from the PR if necessary.
And here I thought AI was gonna automate the world and we were gonna work less.
Turns out you’re gonna work 24/7 no matter where you are!
I want to code from my mobile device when my laptop is off or unavailable, pushing PRs directly to GitHub. Codex mobile only works with a desktop machine, at which point, I'll just use that machine, what's the point.
Codex is far less frustrating and manages context better. It's also costing me about 1/3rd as much as Opus 4.7 on CC.
Claude was more autonomous and still is a little, but I think GPT 5.5 closed that gap a lot. Claude is far better at front end design. I think it's still better at big picture planning.
Codex is far better at code review and catching bugs that actually matter. I think it's better at following directions, although I think that regressed a bit with 5.5 (flip side of the autonomy I mentioned earlier). A lot of CC users claim to not like Codex's personality (or lack of), but personally I prefer it.
Very fast
(side note I like that they rate limit messages and not tokens. at least it does not just stop mid reply)
I'm using paid on TypeScript and it's genuinely terrific. Subjectively I think it has the edge over Opus.
I'd be surprised if OpenAI is hamstringing the free version. That would seem crazy from a GTM PoV. If anything the labs seem to throttle the heavy paid users.
In my experience, although the build is a little slow, it's that LTO step that takes a million years.
Also, your worst case was... having to dig for some cables and peripherals, which of course you have around, because you are into computers and self hosting.
(Have them cover their own token costs, hehe).
This is just like everyone who says, “An iPad is not suitable for serious work.”
By which they (and you) generally mean, “What I do is serious work. What you do is unserious work.”
I think I do serious work – I mean they pay me for it? And I have only copy/pasted and just run whatever code’s been generated by AI for the past 12 months or so. Whenever I can I just let the AI run it itself.
Sad to learn that I’ve been so unserious all this time.
> Naming things is hard!
Indeed.
Thanks!
I also put together this ridiculous thing[1] because I missed the font and color scheme of Claude.
[0] https://gist.githubusercontent.com/dmd/91e9ca98b2c252a185e8e...
I can go through a 5-hour limit with a $20/mo Plus subscription in a few minutes with 5.5 Extra High. This causes me to reserve the latest/best rev for the harder problems.
5.5 really does seem to be very superior to 5.4, but it's also very expensive to run: The gas gauge moves fast. It's not very clearly defined whether 5.5 will cost less to get a problem solved quickly, or if a bunch of automatic iterations of 5.4 will solve it less-expensively. Both are often frustrating to me on the $20 plan.
(Also: Are you sure you're seeing it right? 5.5 has been in the wild for less than a month, so far. https://openai.com/index/introducing-gpt-5-5/ )
MAYBE the 50% overall is true, but the double usage during a 5 hour window i just dont see it at all. I've maxed 3 5 hour windows since this happened, 0% chance it was double as much as normal, i ate up about 4-5% of my weekly total each time(this was ~10% each time pre announcements). wish i could give token numbers but its obscured i just know it was around 120k 4.6 with some delegation to sonnet subagents.
So SURE its almost certainly more allotted weekly, but if those totals are consistent for 5 hour blocks, you gotta split your daily usage into at least 3 sessions with 5 hours between them to even hit that weekly limit. its unreal how much they have burned their good reputation in a 2 month stretch, i am positive its also being astroturfed with bots more than happy to advance the narrative.
the internet is annoying, these tools are overall cool, just wish anthropic would go back to being semi predictable.
There are also many people running 4.5 with specific parameters that claim to be having luck.
I'm not entirely clear on the mechanism by which memories make it into context, so it's possible some of it isn't all the time, but it does seem to be working reasonably.
Again, it's not as good as Claude when it comes to writing "not like an AI". But it's significantly better than it was.
I think you just need to type more rather than feeling constricted, as it's actually a form of liberation, to produce (or have an AI produce, whatever) something from wherever you are rather than needing to sit down on a laptop where you're gonna be waiting around anyway.
What tunnel setup do you use by the way? I'm on Android so it's kind of annoying all the LLM remote coding apps are iOS only.
It isn’t so much that I feel restricted, I guess it’s that mobile wasn’t as big of a game changer as it was ~6 months ago.
My bandwidth feels more restricted by my own cognitive capacity (usually due to do context switching), rather than the limits of the model itself, and the mobile interface makes that worse.
I’ve recently found myself reserving larger tasks for “keyboard time” and reverting my thinking back to notes (in mobile), which I’ll then formulate to the LLM at some future time.
> What tunnel setup do you use by the way?
I “vibecoded” an agentic runtime that operates my machine generally (including TUIs like Codex/Claude Code), which I connect through a custom proxy and mobile app (both also vibecoded).
I previously tried Cloudflare Tunnels and an SSH setup, but it all felt a bit hacky.
Unfortunately the app is iOS only, but I could open source it and you’d probably be able to make an Android clone quickly (:
Where it starts to become a pain is when the task demands a lot of formatting, symbols/punctuation, uncommon words, non-linear writing/editing, or referencing of outside information. The more I have to multitask, and the less I can just stay in a flow and churn out effectively a stream of consciousness, the more constraining a mobile device is going to feel. But for lots of things it's surprisingly great; sometimes I'll intentionally do the heavy lifting on a longer document from my phone and then handle editing/formatting/proofreading from my laptop.
Anyway, I set up Tailscale and aRDP a few months ago (as well as Termius, but have gravitated more toward aRDP in practice), and it's been a pretty substantial efficiency boost. On one hand, I've sort of experienced the same thing as the parent — not necessarily longer, but more complex prompts often have me putting down the phone and grabbing my laptop. On the other hand, lots of prompts are totally fine from mobile. There are also entire categories of tasks where every few hours I just need to sanity check the current diff, latest commits, and Codex output, then resend some variation of "please continue" from my prompt history and maybe answer some follow-up questions; mobile is perfect for that.
Does the tunnel setup feel clunky to use? That's the main thing that stops me.
I think you may be able to optimize your workflow more by drafting your prompt in ChatGPT first; get it to expand out the intent for you. Doing that has made phone coding a lot more tolerable for me.
I like to think that I've given phone coding a fair shot (and I continue to do it), but I agree with the other poster that there's something about the lack of a keyboard that really gets to me :) I wish I knew what it was.
Most of those commits since the last few months are thanks to Codex reviews (but the code is not AI generated): 5.5 since it came out, and 5.4 etc before that, almost always on Extra High because it's for a framework that underlies the other stuff I do so I want make to sure everything's correct.
Sometimes I have to run multiple passes on the same task: I rarely continue any session beyond 4-5 prompts to avoid "bloat" or accumulate "stale context", so sometimes Codex finds different stuff in subsequent reviews of the same file/subsystem.
The project is modular enough where each file can be considered standalone with only 1-2 dependencies, and I already used to write a lot of comments everywhere (something some people laughed at), so maybe that helps the AI along?
I'm taking this, along with my own experience, to mean that the GPTs are cheaper to use for refactors of an existing body of work than they are for creating a new one.
(And perhaps part of that is in the name? These "LLM" contraptions are very good at translation, after all. And tokens seem to relate more to concepts than to specific phrases or words.)
Another thing I didn't mention is copying to the clipboard, which kind of sucks on mobile in general, but is particularly a hassle within RDP. If I'm going to need to copy a bunch of terminal output, snippets of files from VS Code, maybe some browser console errors, etc., I generally don't bother attempting to put that prompt together from my phone.
Tailscale is fairly polished and seamless to use for creating the actual tunnel to the dev machine. The RDP part may be a bit hacky, but it does everything I need and works well enough that at this point I haven't invested time in trying out alternatives. Using a full Linux desktop from a 6" smartphone is inherently going to be clunky, but the flip side is it's 100% batteries-included. You'll never have to rely on some app to reimplement end-to-end support for your entire dev workflow, because it's already a direct interface to your actual dev box.
aRDP deserves a lot of credit for how practical this is. It's clear that a lot of care was taken to map mobile interaction metaphors to desktop UIs in a way that was as natural as could reasonably be done. For what it is, the UI/UX is surprisingly smooth.
I also tested the new ChatGPT feature. Not a full RDP replacement, but it'll be a super handy companion UI after Plan mode is fully supported.