How Uber Deals with Large iOS App Size (eng.uber.com)
https://news.ycombinator.com/item?id=25376346
Uber is a global app, so the other 9/10s of the code is for all the features and functionality you'll never see outside your region since there is currently no way to split up binaries by region.
It's like graffiti.. the app is so big already, that the devs don't give a damn about optimizing.. why bother if they are just A/B-feature tests? 30mb for some unoptimized screens for example
And then there wouldn't be anything to hog-up 1/3 of a GiB on every customer's phone, and it would always be up-to-date. Just don't ever lose internet access.
Does the app use SMS when the internet connection is lost?
https://en.m.wikipedia.org/wiki/Progressive_web_application#...
It's not exactly a web app but it could be made that way with WebRTC.
That their "app" is large is irrelevant to the scam.
It is weird that something as exploitative as Uber can't even stay in business in the long term. Eventually profit does matter and the weak stocks will be culled in the next panic. Uber will be one of them.
If your app is serving hundreds of cities with specific per-city customizations and all code and assets are in a single binary, life gets tough.
The localization files (50MB) -> All the strings files are double the size (unpacked), because of useless comments. There's 25MB already.
In the asset catalog -> half an MB for an upscaled (!) Visa card. Other images where JPEG or HEIF would be a better choice. Probably 10-20 MB in total.
Strip all ICC/Gamma from the PNGS -> another 10mb.
pngcrush the images -> about 40%
And then of course the binary itself which is probably full of unused information.
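To make the first item in that list concrete, here is a rough sketch of stripping comments from .strings content and measuring the saving. It assumes the text has already been decoded (real .strings files are often UTF-16, and shipped ones may be binary plists), and the function names and sample keys are made up for illustration.

```python
import re

# Matches a quoted string (kept as-is) or a comment (dropped).
# Strings are listed first so "//" or "/*" inside a value is never
# mistaken for a comment.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'   # quoted key or value, with escapes
    r'|/\*.*?\*/'          # block comment
    r'|//[^\n]*',          # line comment
    re.DOTALL,
)


def strip_strings_comments(text: str) -> str:
    """Remove comments from .strings content, leaving quoted text untouched."""
    def keep_or_drop(match: re.Match) -> str:
        token = match.group(0)
        return token if token.startswith('"') else ""

    stripped = _TOKEN.sub(keep_or_drop, text)
    # Drop the blank lines left behind by removed comments.
    lines = [line.rstrip() for line in stripped.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


def savings_ratio(text: str) -> float:
    """Fraction of UTF-8 bytes saved by stripping comments."""
    before = len(text.encode("utf-8"))
    after = len(strip_strings_comments(text).encode("utf-8"))
    return 1 - after / before if before else 0.0


# Hypothetical sample, not taken from any real app.
SAMPLE = '''/* Title shown on the pickup screen */
"pickup.title" = "Where to?";

/* Help link; see http://example.com // still inside the comment */
"pickup.url" = "http://example.com/a";
'''

if __name__ == "__main__":
    print(strip_strings_comments(SAMPLE), end="")
    print(f"saved {savings_ratio(SAMPLE):.0%} of the bytes")
```

Whether the real saving is close to the 50% claimed above depends entirely on how comment-heavy the files are.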
Few app developers have the time/bandwidth to do these things and it would be a very inefficient use of resources to have everyone do it over and over again.
The image optimizations and precalculation should be done by Apple. But the dev could use a lossy format for certain files. That’s up to the dev, not Apple.
Maybe it’s because I used to write some J2ME games. Or some games when Apple only allowed 15 MB, I think. I had to optimize the hell out of my assets. Still, I think it’s ludicrous that certain apps are almost half a GB in size.
Especially with A/B tests, because they are just temporary.
https://www.infoq.com/news/2020/04/uber-piranha-unreachable-...
I also know it first-hand as just last week I've been doing a mega-cleanup of years-old A/B test flags in our own code.
This is why some global apps have different apps for different countries. It's a trade off. Would you rather have a single fat Uber app, or have to download Uber India when you arrive there?
If I have to deal with the airport’s wifi... I don’t want to depend on downloading a 100mb binary over it.
Maybe they could have a local and global version available, but that’s already making it more complicated.
We use this for devices which don't have NFC. If the device doesn't support it, then there is no reason to download the module for identification via passport NFC scans.
When I traveled to India (pre-COVID), it was great that I could just open the app & get where I needed -- even if I wanted to travel via tuk tuk.
It was more annoying when I got to Ireland, had to figure out the app to use was MyTaxi and get it all working (including payments), particularly as a foreigner.
Do you have some examples? Genuinely curious. To my knowledge, most of the major FAANG apps are single-binary.
As an engineer? The first option too.
Unfortunately the way Apple and Google set up their walled gardens makes this impossible. I guess Apple would prefer if the Uber app dropped all of that and just made everybody use ApplePay instead.
When I'm opening Uber at 1am in the cold to get a ride home, this is not the time to download a payment SDK update.
I'd like to see them open up this possibility in a controlled way one day. Something like a review process for feature modules that could be updated in a similar process to full apps.
https://mobile.twitter.com/stantwinb/status/1336890442768547...
I can’t seem to find the old link, but I’m pretty sure it was on hacker news and somebody posted a nice collation of the posts.
Any discussion about software distribution will inevitably result in an argument about dynamic linking vs. static linking.
Hopefully, swift ABI stability will reduce that. The new bytecode stuff will help to reduce bloat, as well, but he notes that a lot of SDKs are used. In my experience, SDKs and dependencies often won’t work, compiled with bytecode. Hopefully, that’s changing.
That code repetition thing also happens when a lot of dependencies are used; which often reinvent the wheel. That just comes with the territory. It can be addressed by using highly granular dependencies, but that sort of flies in the face of why we use dependencies. One advantage that Uber has, is they are an 800-lb gorilla. They could contract for specific configurations of dependencies.
I’m not a big Uber user (but I’ve used it a few times). I think it’s a fairly well-done app, as a user.
This post is next level though, deeep optimization. All of it is just increasing the ceiling though, there is some limit on number of features Uber can offer in one app.. and eventually that limit will be reached, doesn’t sound like they are willing to accept being over the limit either. Wonder what space saving techniques are left in the box?
We parse Obj-C and Swift runtime metadata to determine size contributions of individual types and functions in your app. We use this analysis to post PR comments with granular size diffs to help devs write smaller, better code.
I tried it out on the Uber app and immediately noticed a disproportionate impact from their code-gen dependency injection framework, Needle. The codegen is responsible for over 30k classes in the app binary, and contributes over 10mb! In general codegen is a common problem with Swift binary sizes, and the fewer reference types generated the better, it even helps with startup time!
We’ve written a blog post with case studies about how 7 of the most popular iOS apps could reduce their size: https://medium.com/swlh/how-7-ios-apps-could-save-you-500mb-...
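For anyone wondering what a "granular size diff" might look like in practice, here is a minimal sketch. It is not Emerge's implementation; it assumes you already have a per-type size map (in bytes) for the base and head builds, and the type names and numbers below are invented.

```python
def size_diff(before: dict[str, int], after: dict[str, int],
              top: int = 10) -> list[tuple[str, int]]:
    """Return the largest per-type size changes (bytes) between two builds."""
    names = before.keys() | after.keys()
    deltas = {name: after.get(name, 0) - before.get(name, 0) for name in names}
    changed = [(name, delta) for name, delta in deltas.items() if delta != 0]
    # Biggest absolute change first; name as a tie-breaker for stable output.
    changed.sort(key=lambda item: (-abs(item[1]), item[0]))
    return changed[:top]


def format_pr_comment(before: dict[str, int], after: dict[str, int]) -> str:
    """Render a plain-text summary suitable for posting on a pull request."""
    total = sum(after.values()) - sum(before.values())
    lines = [f"Total size change: {total:+,} bytes"]
    lines += [f"  {name}: {delta:+,} bytes" for name, delta in size_diff(before, after)]
    return "\n".join(lines)


# Hypothetical per-type sizes for two builds.
BASE = {"RideRequestView": 12_000, "LegacyPromoBanner": 4_000, "PaymentStore": 8_000}
HEAD = {"RideRequestView": 12_500, "PaymentStore": 8_000, "DarkModeTheme": 35_000}

if __name__ == "__main__":
    print(format_pr_comment(BASE, HEAD))
```

The hard part, which this skips, is producing the per-type size map from the binary in the first place.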
I read this as: Lyft installed Dark mode for 35mb. I can only imagine what my JavaScript modules are doing behind the scenes.
Launch HN: Emerge (YC W21) – Monitor and reduce iOS app size - https://news.ycombinator.com/item?id=26014180 - Feb 2021 (44 comments)
Currently they only deliver the binary for the device's CPU, and only the assets for the device's asset class. There's then some tech targeted at game devs for on-demand assets for things like game levels that you don't need all of on device at one time.
I suspect the limitations of this are around the binary not being subject to this, but maybe it could be. I can see a couple of options, one is some way of extending the asset classes to code features, so that the App Store doesn't have to download iPad screens for iPhones, etc. Perhaps this could be extended with either App Store account region or locale so that, Uber in this example could not include the Venmo SDK outside of the US where no one has heard of Venmo.
Or perhaps Apple could extend the on-demand assets to allow for some sort of plugin system, perhaps backed by Swift Packages, such that apps can on-demand decide they need the Venmo SDK because they're in the US, and download just that. I don't think we want a generalised package manager here, I don't envision that SDK coming from Venmo directly, but allowing an app author to upload all their separate packages if they want to.
With feature heavy, international apps such as Uber I'd expect this to dramatically improve things. I'm not sure whether this benefit would translate to that much demand across the whole App Store though as I think this matters more to a very few big apps. Apple is at that optimisation point in the iOS lifecycle though so perhaps it's worth it to them.
> The choice of Swift as our primary programming language, our fast-paced development environment and feature additions, layered software and its dependencies, and statically linked platform libraries result in large app binaries
but can somebody familiar with iOS development explain what makes app bundles so big? Actual CPU instructions or config can't contribute this significantly. The entire Bible is about 4.5mb. If you're writing an app by yourself you almost certainly didn't write that much text in the source code. A sibling comment links to https://news.ycombinator.com/item?id=25376346 which says that they have a lot of screens but even something like "PayTM (15+ screens)" is still just textual source code and config that I don't follow how it gets beyond kilobytes. The App Store places them at 309mb, so ~68 bibles.
I understand when games are large because they typically ship with images and videos included in the binary for game assets. But for a normal application where does the size come from?
Is it dependencies? (And how did _they_ get so big?) That weird intro video they have on the loading screen? Are they shipping bitmaps of the cities they have markets in?
I wonder if Uber is planning to do anything about that? The technique described in the article (whole program instructions outlining optimisation) is a band aid style solution, merely delaying the inevitable: the code produced by numerous teams independent of each other will inevitably cross first the download size limit threshold, and later maintainability threshold.
Includes, among other things: forcing Apple to increase cellular download limits, 45 seconds for letters to appear in Xcode, 12 seconds to call main, rewriting the linker and so on.
Usually, sure. But sometimes there is a lot to do, and if I may, Uber is not your usual app. At the point where you're being very choosy about the access modifiers on your classes, you probably thought about icon assets already.
Someone elsewhere in thread linked to a partial list of concerns the app needs to cover, many of which are location-specific. You might say "well split the app by geography", but that just trades one set of problems for another, and that new set of problems could well be worse for the business overall. Paying a team of people to do this junk may be a whole lot cheaper than suffering a reduction in customer engagement when they fly to a new country and don't have the right app anymore.
You'd think, but many of the most popular apps accidentally ship these all the time. I think another comment mentioned that much of the code size seems to be coming from a code generation framework.
Made me chuckle. Maybe the authors should look at getting an ACM subscription.
What?
Linux has 30 million lines of C!
I'm speechless. I cannot fathom how & why.
It's a variant of the "I didn't have time to write you a short ____ so I wrote a long one instead" adage.
I would guess (but only guess) that this article erred on the side of overstating size.
Why? If you have 100+ engineers at any given time, shipping features over a period of a few years, you'll hit 1M in no time.
It sounds like a lot, but it really isn't when you consider the amount of people working on it.
Now whether or not you can build the same thing with less LoC, probably. But it's not like it was built from the ground up with every piece of functionality planned out from day 1, so there will be inefficiencies.
Comparing it to Linux is pointless. Platforms should be relatively stable, products are ever changing and the shelf-life of the code is sometimes measured in weeks/months.
Apple has dropped limits for large app downloads on cellular. They now put up a dialog to tell the user how big the download is and if they wish to defer to a WiFi download.
I checked the size of the Uber app and it's about 300MB. Uber Driver is 232MB and Uber Eats is 228MB.
I think adding machine outlining into LLVM Pass pipeline is still doable with LLVM plugin (with new PassManager)...worst case just come up with a custom LLVM/Clang
Compiling that down to 200 MB isn't too shabby!
If your app has larger images, don't waste user bandwidth and optimize your assets!
> Overall, 5 rounds of outlining builds in 66 minutes — a 45-minutes addition to the baseline.
* It usually does not reduce the size of the file in transit, as most files are compressed for distribution, and even if they are not most http servers will use transparent gzip compression
* It does not actually reduce the size of files at rest since APFS (and HFS+ before it) support transparent decompression. The layer this is handled at is sufficiently low level most people do not even realize it is happening (stat(2) returns the uncompressed size, you need to look at extended attributes to see the real on disk size). Admittedly this does not handle binaries that are drag installed on macOS. You can find out more details here: https://github.com/RJVB/afsctool
* UPX slows down app launch because you now have to decompress the entire executable before you launch it, which means you need to read the entire compressed executable from disk before you launch it
* UPX greatly increases the memory overhead of running an application. Because you decompressed it in memory, all the executable pages are dirty memory that needs to be kept in memory or written to swap. That means you immediately loaded everything into memory instead of just the pages you needed. Normally a binary's pages are brought in from disk as necessary, and because of that they are unmodified clean memory. The built-in compression support compresses smaller blocks of the binary and thus can still bring them in individually (technically this reduces the compression, but the trade-off of being able to keep demand paging working is more than worth it).
* UPX makes the system perform worse under memory pressure. The fact that it decompresses the pages in userspace means that from the kernel's perspective they are dirty. If the kernel needs to evict them due to memory pressure it needs to swap them out. Uncompressed files (or those compressed with the built-in filesystem compression) are clean memory, which means that under memory pressure the kernel can just throw them out and then reload them from the file later, with no need to write out the pages.
In the past there were legitimate reasons for tools like UPX, and there may still be on other operating systems, but it simply does not make sense on Darwin platforms.
Build times in tens of minutes seem terrible.
Glad to see it near the top - saved me a search.
Just curious.
App size can be measured in many ways like download size, install size, binary size, thinned size. I wrote about the most important ones here: https://docs.emergetools.com/docs/what-is-app-size
(I do understand that source code isn't what ships in the binary, but for the sake of argument let's say they're 1:1 in size.)
Also, the Uber app has a LOT more features than you would expect at a glance, due to extensive customization of the experience (i.e. feature flagging) along many vectors, and so it wouldn't surprise me at all if this ends up adding up to a lot of code.
Edit: Linked post from sibling commenter bhupy outlines this in detail.
The binary size is also in the same ballpark as what the entire Windows 98 needed for installation.
I'm glad Uber is doing something about this, but in my opinion Apple should tackle this across their entire ecosystem at the toolchain level, devices with less than 64GB of storage can quickly run out of space with just a handful of applications installed.
Unfortunately it's in Apple's interest that people buy devices with more storage, so I don't expect them to invest much effort in this.
This point comes up in a lot of discussions about non-trivial software. My theory is that it's of the same nature as underestimating development complexity when planning your own work as an engineer. Project after project, everyone (me included) keeps forgetting that they _will_ spend 80% of their time (and code) dealing with small issues and edge cases in their product.
Who hasn't thought at some point that they can write a Twitter clone in a weekend, or been fascinated by the number of simple bugs in someone else's product, thinking that they are obviously just bad engineers?
(1) if it takes a long time for people to update their apps, that's a crap experience that people are having on Apple devices, which goes directly against the grain of Apple's whole value proposition ("use our stuff and your life will be great!")
(2) For technical reasons, it's in Apple's interest to reduce app image sizes; less strain on infrastructure, easier to scale, etc. (300MB * 1.2M (# of app store reviews) = 360 terabytes transiting their networks whenever Uber pushes an update. All that has to be load balanced, CDN'd, etc.)
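The back-of-the-envelope figure in (2) checks out if you use decimal units. A small sketch, keeping the comment's own assumption that the review count is a stand-in for the number of devices that fetch the update, and ignoring any savings from compression or delta updates:

```python
def update_traffic_tb(app_size_mb: float, devices: int) -> float:
    """Total transfer in terabytes if every device downloads the full app.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    return app_size_mb * devices / 1_000_000


if __name__ == "__main__":
    # 300 MB app, 1.2M devices (the comment's proxy: App Store review count).
    print(f"{update_traffic_tb(300, 1_200_000):.0f} TB per update")
```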
One worth calling out (and recently written about) is server-driven UI: https://artem-tyurin.medium.com/screenflow-an-unfinished-att...
The more they can make the app a "thin-client" (effectively just taking configuration from the server on how to display components w/in a Screen), the more product code they can pull from the app.
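A toy illustration of the thin-client idea, not anything from Uber's codebase: the app bundles a fixed set of component renderers, and the server only sends a description of which ones to show. The component types, property names, and payload shape here are all assumptions.

```python
import json


# Renderers bundled in the app; the server can only refer to these by name.
def render_title(props: dict) -> str:
    return f"# {props['text']}"


def render_fee_notice(props: dict) -> str:
    return f"Note: {props['message']}"


def render_button(props: dict) -> str:
    return f"[ {props['label']} ] -> {props['action']}"


RENDERERS = {
    "title": render_title,
    "fee_notice": render_fee_notice,
    "button": render_button,
}


def render_screen(payload: str) -> list[str]:
    """Turn a server-sent screen description into rendered lines."""
    screen = json.loads(payload)
    rendered = []
    for component in screen["components"]:
        renderer = RENDERERS.get(component["type"])
        if renderer is None:
            # An older app build may not know a newer component type:
            # skip it rather than crash.
            continue
        rendered.append(renderer(component.get("props", {})))
    return rendered


# Hypothetical payload; "carousel" stands in for a type this build lacks.
PAYLOAD = json.dumps({
    "components": [
        {"type": "title", "props": {"text": "Airport pickup"}},
        {"type": "carousel", "props": {}},
        {"type": "fee_notice", "props": {"message": "A pickup fee applies here."}},
        {"type": "button", "props": {"label": "Confirm", "action": "confirm_pickup"}},
    ]
})

if __name__ == "__main__":
    print("\n".join(render_screen(PAYLOAD)))
```

The layout moves to the server, but every renderer still has to ship in the binary, so this trims per-screen product code rather than the component library itself.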
I mean I imagine no one person or team completely understands the entire app. Different people/teams are responsible for different portions of the app. Each team only needs to understand their modules and the few other modules they interact with.
It still boggles my mind that there could be ~100M meaningful instructions in a program.
WRT Apple enabling this: I imagine developers could get into a bit of a versioning hellscape there if they were to decouple updates of different modules in their app (do you know if your app is working with FeatureA v1, v2, v3, or not at all? How about FeatureB?). If Apple were to do this it would look something like app extensions do today (separate binary stored within the IPA, possibly thinned out and rehydrated on device), probably with very little control over what's loaded (similar to how they did rollouts: this percent on this date and nothing else).
Why? I have no idea.
Google Pay is another one. They have a dedicated app in Singapore.
It seems like a lot of them went to single apps when they realized they could download data packs within the app. Stuff like Rick Steves guided tour apps used to be separate per city, but now it's a single app where you download the data for a certain city.
But I think you're right that all the major FAANG apps are single-binary.
Worked on iOS app size at a FAANG for a couple of years -- this is untrue. At the very least there are different binaries for watch vs iPhone architectures.
That pretty much doomed all these "local Uber competitors". Nobody would start looking for a local competitor, set up an account, get 2FA, register their payment method and then have the privilege of getting a "starting up" screen telling them how and where they can get a ride. Instead they just open Uber and get where they want.
Did it? How much of Uber's usage in a random big city is from travelers based elsewhere vs locals? And it seems like others have succeeded in some narrower regions (Grab, Didi, etc.).
Uber lost in pretty much every market where they had local competition. Are there any counterexamples?
The way this should work is when you set up payment option X, it downloads the relevant payment info & then you're set from then on without any other modules unless you add a new payment option. Likely pre-bundle the generally "global" options (Apple Pay/Android Pay since those are platform-native & credit cards since those are likely small implementations).
The real reason is that you will always have drop-off because the download phase is split in two (on the other hand you'll have an increase in installs because the app size is smaller). That would need App Store integration with the loadable modules so that you could say "Install these payment features of the app". This may not be a win because, again, it requires the user to do more work. Simple for everyday users will often win the day even if inefficient vs more optimal options that achieve that optimality by pushing complication to the user.
I was traveling recently. I landed at a new destination and checked the internet speed. Mobile via my tablet just outside the airport was theoretically 50KB/second via Google speed check. However, actually downloading a file from US servers was 5-15KB/second because the latency (3000ms+) was so high that packets were constantly being dropped.
That's at best, 75 seconds waiting for a download. At worst it's 16 minutes.
On the other hand, I was able to get Uber on my phone and though it was painfully slow, it found a driver in under 30 seconds.
Yelp, for example, is what you might call a "straightforward CRUD app" (to Yelp engineers, I know it's probably legit complicated and hard), and that is 292MB on the App Store.
It's probably to do with how the framework handles lifecycle management and combining static assets like text and image with business logic that lives in Controllers.
These are code. Swift is a safe language with more runtime checks than other "zero-abstraction" languages. It also supports "value" semantics and can deploy monomorphization for generics (although with no guarantee). All of this means you can have functions with slightly different view models duplicated many times throughout the binary.
Not to mention the language itself needs to generate a lot of retains / releases for refcounting purposes (the blog post also pointed this out).
All in all, Swift as a language is not particularly optimized for small binary sizes, and there are a lot of trade-offs made to improve usability rather than binary size. That being said, there are more opportunities on the compiler side to reduce binary size that are not exploited right now.
That sounds so weird to me... What do you mean cars don't have seat belts?!
I can confirm the OP's experience albeit in a different country (Mexico). In some regions, I haven't been in a single street taxi with working seat belts. Indeed, I've been in some private cars without them as well, or with more physical space for passengers than there were seat belts available.
Uber vehicles, though, have always provided those safety features. Some drivers work as both street taxi and Uber drivers (as they frequently cross into or live in regions without Uber but drive people to places that have it), so that quality assurance can trickle down in some cases. It honestly goes beyond seat belts though; a Uber car is more likely to have AC, electric windows, etc. than your average taxi, even in ridiculously hot parts of the country.
Maybe because it was in Chennai, a southern city, and not in Mumbai or Delhi.
Unless India? [1]
[1] https://www.theguardian.com/technology/2017/jun/08/uber-exec...
The fact that Uber devotes engineering resources to serving the tiny 1% slice of users who care about having an app that works seamlessly across dozens of countries, at the expense of the much larger number of users with limited space on their outdated devices, is really emblematic of the Valley's out-of-whack priorities.
An app for travel should absolutely prioritize UX and ease of use as you travel, however far away your destination is.
For some, Taxi works well enough and is (perceived as) licensed/trusted/reliable so they don't have a need to use Uber. Others are a bit of Luddites, or mistrustful. But I guess friction of Taxi / benefits of Uber just aren't high enough :-/
(Personally, I've only ever used Uber on specific travels; for 99% of my transactions, Taxi has been easier/more reliable. Don't get me wrong, I think the Taxi licensing/medallion model is outdated, the drivers have worked incentives, and cars aren't maintained as well as they could be. But I still normally don't find a benefit in Ubering).
Finally, FWIW, even traveling within country, I've noticed significantly different screens/features/options in, say, Ottawa or Toronto airports and vicinity. So I think overall a lot more people benefit from this monolithic model than may be immediately apparent.
I would imagine that travellers are an outsized percentage of high spenders, even if they are a small portion of users.
Not all customers are equal. And if you build your app or site without knowing that you may well chase the wrong features.
Map apps let you look at any city in the world without needing all of that data inside the core app, and if you have enough data to use the Uber app wouldn't you almost certainly also be fine to have it download in the background the required info (coordinates of where pickup is or isn't allowed, specific instructions message to display, etc.) the same as it receives information about local pricing, location of available cars nearby and so on?
Then there are 100 guys with outdated devices who use UberX once or twice a month to get back home from the pub, probably splitting the ride with their friends.
I bring in more money than the latter group of people.
Uber Pool is no longer available due to covid; but hopefully it will come back as vaccination rates go up.
For a significant majority of users, if it doesn't work well out of the gate, it's broken.
As long as the airport rules can be stored as data, not code, then iOS would be fine with pulling them.. but at that point you have a data blob saying effectively "in country X, you can use payment Y", and you still need code that knows "payment Y means enable this section of the app and use this library"... and you can't download that library at runtime on iOS.
So my guess is that the code implementing regional rules has to be bundled with the app due to Apple's restrictions, and the associated data that could be dynamically downloaded is too small to not just bundle it too.
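The data-versus-code split described here can be sketched roughly like this. The region table plays the role of the downloadable blob and the handler table plays the role of code that must already be in the binary; the method names, regions, and handlers are made up for the example.

```python
# Code shipped inside the binary: one handler per payment method.
# On iOS this table cannot grow at runtime; new entries need an app update.
BUNDLED_PAYMENT_HANDLERS = {
    "card": lambda amount: f"charged {amount:.2f} to card",
    "apple_pay": lambda amount: f"charged {amount:.2f} via Apple Pay",
    "venmo": lambda amount: f"charged {amount:.2f} via Venmo",
}

# Data that could be fetched at runtime: which methods each region allows.
# Hypothetical values for illustration only.
REGION_CONFIG = {
    "US": ["card", "apple_pay", "venmo"],
    "IN": ["card", "paytm"],
}


def available_payments(region: str,
                       config: dict[str, list[str]],
                       handlers: dict) -> list[str]:
    """Methods the server allows in this region AND the binary can execute."""
    return [method for method in config.get(region, []) if method in handlers]


if __name__ == "__main__":
    for region in ("US", "IN"):
        print(region, available_payments(region, REGION_CONFIG, BUNDLED_PAYMENT_HANDLERS))
```

In this sketch the server can list "paytm" for IN as much as it likes, but the method never shows up because no handler for it was bundled, which is the constraint the comment is pointing at.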
You need to think about the larger picture.
Thinking of my experience with Uber at airports, it's generally stuff like "pickups from this terminal are only available at the west exit" (show on map - so needs coordinates and message text), or "There is a £10 airport pickup fee for all private hire vehicles here which will be added to your bill", or whatever along those lines.
But perhaps your point still stands even if it's KBs rather than MBs.
A quick summary like "extra fee for airport pickup: X, pickup location restricted to: Y, etc." should be pretty comparable.
And for the specific UX, sure but isn't that just a single feature for airports in general, and the app can pull the up to date airlines/terminals data for the specific airport when needed?
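To put a rough number on how small such a per-airport summary could be, here is a sketch that serializes one rule to JSON and counts the bytes. The field names, airport code, and coordinates are assumptions; only the fee and the exit message echo the examples given above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AirportPickupRule:
    airport_code: str
    pickup_fee: float
    currency: str
    pickup_points: list  # (latitude, longitude) pairs where pickups are allowed
    message: str


def encoded_size(rule: AirportPickupRule) -> int:
    """Size in bytes of the rule when sent as UTF-8 JSON."""
    return len(json.dumps(asdict(rule)).encode("utf-8"))


# Hypothetical rule; the coordinates are placeholders, not a real pickup point.
EXAMPLE_RULE = AirportPickupRule(
    airport_code="LHR",
    pickup_fee=10.0,
    currency="GBP",
    pickup_points=[(51.4706, -0.4619)],
    message="Pickups from this terminal are only available at the west exit.",
)

if __name__ == "__main__":
    print(f"{encoded_size(EXAMPLE_RULE)} bytes")
```

A rule like this comes in well under a kilobyte, so even thousands of airports would amount to data on the order of a megabyte or less, before compression.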