Fly.io makes infrastructure easy for developers (blog.chiselstrike.com)

185 points by penberg 3 years ago | 121 comments

lbriner 3 years ago |

Sadly, we have all seen these promises, "X makes Y much easier" but you cannot make complex things easy without removing lots of functionality.

Kubernetes for the basics is actually pretty easy despite what people say, I got a fairly simple cluster running with little pain but...it doesn't take long before you want multiple clusters, or vlan peering, or customising the DNS or.... and that is when it becomes complex.

What will fly.io do? Probably what everyone else does, starts simple, becomes popular and then caves in to the deluge of feature requests until they end up with an Azure/AWS/GCP clone. If it stays simple, lots of people will say that you will outgrow it quickly and need something else, if you increase functionality, you lose your USP of making infrastructure easy.

I think perhaps the abstractions are the problem, if you are abstracting at the same level as everyone else e.g. docker images, orchestration etc. then I don't understand how it can ever work differently.

To make my point, the very first comment below (above?) is about container format, a really fundamental thing that noobs are not likely to know about, they will just immediately have some kind of error.

mrkurt 3 years ago | |

You definitely can't make complex things simple just by removing features.

What we did, instead, was build low-level primitives, then build opinionated PaaS-like magic on top of those.

If you're running a Phoenix app, `fly launch` gets you going, then `fly deploy` gets you updated.

If you want to skip the PaaS layer and do something more intense, you can use our Machines API (or use Terraform to power it) and run basically anything you want: https://fly.io/docs/reference/machines/

We are very, very different than k8s. In some ways, we're lower level with more powerful primitives.

We probably won't build an AWS clone. I don't think devs want that. I also don't think devs want a constrained PaaS that only works for specific kinds of apps.

I think what devs want is an easy way to deploy boring apps right now, and underlying infrastructure they can use to solve bigger problems over time.

knicholes 3 years ago | | |

I'm a dev. That's what I want. I want something publicly visible that I can get running in the first sprint so I can show everyone what my idea is looking like as it progresses. I want setting up the DB and certificates to be easy. I want a static IP to point my domain to. I want it all to just work if I have the source code and I tell it to deploy. I want secrets to be included so I don't have to stand up Vault or something. I don't want to set up my own Jenkins. I don't want to deal with resizing volumes in k8s. I don't want to dick with networking rules/configurations that allow my frontend to talk to my backend and my backend to my DB.

I also don't want to set up my own log aggregator, grafana, and prometheus/alert manager, but for a quick "show everyone your app", I don't need those. I can add that harder crap later when the app shows promise and I actually need to debug performance.

chrisabrams 3 years ago | |

> What will fly.io do? Probably what everyone else does, starts simple, becomes popular and then caves

No, mrkurt will not cave, I can guarantee you that. Fly will be a platform that says no to feature requests that don't make sense for their customer base.

I have no affiliation with Fly, other than I've used it on and off since the beginning of the platform's existence. They're a veteran team that knows how to build platforms. I definitely trust them to go in the right direction with their roadmap, and all my new projects go on Fly.

pier25 3 years ago | | |

> No, mrkurt will not cave, I can guarantee you that.

I concur.

Fly is not your typical startup with dreams of becoming the next big corp monster.

They are just a bunch of talented people with a vision having fun making cool stuff.

Thaxll 3 years ago | | |

You should look at the Multi-region PostgreSQL feature they did, not their best work imo.

https://fly.io/docs/getting-started/multi-region-databases/

Dma54rhs 3 years ago | | |

Have other people invested money in them? If that's the case, sooner or later they don't call the shots, but rather who owns the capital and wants it to grow.

ignoramous 3 years ago | |

> it doesn't take long before you want multiple clusters, or vlan peering, or customising the DNS

Then, Fly is not for such an application. Just not yet. I mean, we wouldn't buy a snowboard and complain we couldn't go skiing. Different tools.

The point really is, for the kind of thing Fly is capable of (and, to an extent, other NewCloud services like railway.app, render.com, replit.com, convex.dev, workers.dev, deno.com, pages.dev, vercel.com, temporal.io etc), you're better off NOT using AWS/GCP/Azure. I certainly have found it to be true for whatever toy apps I build.

paulgb 3 years ago | |

> Sadly, we have all seen these promises, "X makes Y much easier" but you cannot make complex things easy without removing lots of functionality.

There's certainly a limit, but it makes me so sad that developers see the current state of orchestration and say “welp, it's a complex problem, guess this is as good as it gets” (not you specifically, but it's a common sentiment on HN.)

Sure, there will always be use cases that require getting down to a lower level, but there's definitely space for reducing complexity for common use cases.

diceduckmonk 3 years ago | |

> "X makes Y much easier" but you cannot make complex things easy without removing lots of functionality.

The question is whether customers need or want that "functionality".

tootie 3 years ago | |

Yeah it's neat but idk who the target market for this is. I can run a custom-designed site with all sorts of features using WordPress.com or Shopify or Hubspot. Once I need a real platform and backend services, you'll need a team that can spin up Docker images in AWS or the like and the full gamut of DIY platform tools. Platforms like fly.io appeal to semi-mature orgs but they will either die or outgrow it pretty quickly.

ngrilly 3 years ago | |

It sounds like you’re commenting in general terms without having looked at what Fly.io is actually doing. Yes, choosing the right abstractions is the problem, and what makes Fly.io really interesting is that they chose different ones. It’s really well explained in their docs and blog posts.

ipaddr 3 years ago | | |

Having to read through docs and blog posts to understand what the unique abstractions are doesn't help those asking. I've read through docs, blog posts, etc. It's for developers who don't want to set up and maintain an environment. Other providers offer this. You end up paying far more than a five-dollar droplet in most cases and a lot more than the free Oracle tier, but much less than big cloud like AWS. They offer a free tier and a community that helps with onboarding. The goal is ecosystem lock-in and they provide enough to win over a certain group in the middle. The fear is the freebies given today will be paid for by the lock-in effects tomorrow.

mroche 3 years ago |

Spent last night migrating a Discord bot from Heroku to Fly as a result of the upcoming closure of the Heroku free dynos. Overall fairly painless, though I opted to provide my own images rather than use a source-to-image pipeline and discovered one little quirk: images require a Docker V2 formatted manifest. I use Buildah and Podman in my workflows which default to the OCI format. This was simple enough to solve once I figured it out, but I only found one or two forum posts on it and spent a lot of time trying to figure out why the deployment couldn't find an image that I manually pushed to registry.fly.io.

bongobingo1 3 years ago | |

For others, you can adjust the format with `podman build --format=docker` (or set `BUILDAH_FORMAT=docker`) or possibly you can push it separately?

https://docs.podman.io/en/latest/markdown/podman-manifest-pu...

Wonder why they don't use OCI format.

mroche 3 years ago | | |

Yeah, I use Buildah for creating my images, so the environment variable or specifying commit --format docker would work. Before I did that I just re-pushed the image itself using the v2s2 format, so:

    # podman inspect quay.io/my/image:latest | jq ".[].ManifestType"
    "application/vnd.oci.image.manifest.v1+json"
    # podman push --format v2s2 quay.io/my/image:latest registry.fly.io/my-app:latest
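
For anyone who wants to script that check before pushing, here is a minimal sketch in Python. It assumes the `podman inspect` output has the shape the `jq` expression above implies (a JSON array of objects with a `ManifestType` field). The Docker v2s2 media type string and the function names are not from the thread; the media type is the standard Docker manifest v2 schema 2 identifier, and the helpers are hypothetical.

```python
import json

# Media types for the two manifest formats discussed above.
OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
# Standard Docker manifest v2, schema 2 ("v2s2") media type (not quoted in the thread).
DOCKER_V2S2_MANIFEST = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_types(inspect_output: str) -> list[str]:
    """Return the ManifestType of each image in `podman inspect` JSON output."""
    images = json.loads(inspect_output)
    return [image.get("ManifestType", "") for image in images]


def needs_docker_format(inspect_output: str) -> bool:
    """True if any inspected image is not already in Docker v2s2 format."""
    return any(t != DOCKER_V2S2_MANIFEST for t in manifest_types(inspect_output))
```

If `needs_docker_format` returns true, re-pushing with `--format v2s2` as shown above is one way to convert.
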

TurningCanadian 3 years ago |

There are a bunch of rough edges.

Your server doesn't have a static IP for outgoing requests, so to use it with RDS, you can't just open up a port on the RDS side. (They want you to set up your own proxy) https://community.fly.io/t/set-get-outgoing-ip-address-for-w...

The NFS kernel module isn't installed, so you can't use EFS. (They suggest some 3rd party userland tool)

They expect you to set up VPN access with Wireguard for any connections to your containers. You can't just SCP your files to a volume. It's so much more hassle than connecting with kubectl cp or scp, especially if you're hoping to script things.

All that said, I'm happy to see competition in the "we'll run your docker image" space.

tptacek 3 years ago | |

This all seems fair. Just a note: we don't generally expect our users to "set up" WireGuard --- we bake it into `flyctl`, our CLI, which automatically brings up WireGuard sessions as needed. A month or so ago I merged a PR to add sftp support to our SSH server; sftp probably works now with "native" WireGuard, and when I get unburied I'll work out an sftp client interface for `flyctl` itself.

It's a rough edge, to be sure! I just wouldn't want to leave it as "we think the status quo is the right way to handle getting files to and from instances".

arkadiyt 3 years ago |

I like Fly but can't migrate my apps to them because they don't support cron jobs - they've been promising it for more than a year now: https://community.fly.io/t/recurring-scheduled-tasks-like-cr...

albertgoeswoof 3 years ago | |

I run cron jobs on fly, using a process. Eg with a rails app running Puma:

  [processes]
    web = "bundle exec puma -C config/puma.rb"
    worker = "config/cron_entrypoint.sh"

And the shell script looks like this:

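  #!/bin/sh

  # Cron jobs start with a minimal environment, so dump the container's
  # variables (minus no_proxy) to /etc/environment where they can be picked up.
  printenv | grep -v "no_proxy" >> /etc/environment

  # Run cron in the foreground so the process keeps running.
  cron -f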

Although I would say that cron isn’t a great solution for containerized apps on most platforms; it seems like scheduled processes need a rethink for today’s infra.

tonyhb 3 years ago | | |

We're building serverless scheduled processes, queues, and event-driven systems at https://www.inngest.com/. It's early days, and we agree — they've needed a refresh to adapt to modern practices.

shortdiv 3 years ago | |

We just released scheduled machines that will help with this. It's similar to the scheduler add-on in heroku and supports general intervals (hourly, daily, weekly, monthly). Feature announcement here -> https://community.fly.io/t/new-feature-scheduled-machines/73...

pier25 3 years ago |

I've been using Fly for 2 years now. Overall I'm very happy and I recommend it to everyone.

If you just want to run a container in multiple regions with anycast, Fly is really the best option out there IMHO. Nothing comes close.

There are some rough edges for certain use cases but they keep polishing the service and the DX keeps getting better.

Personally the only features I'm missing today are:

- PG backups/snapshots. AFAIK these are coming in the form of virtual disk snapshots.

- Scale apps from zero to say 100 VMs like Cloud Run does. There's some autoscaling right now and the machines API, but it still needs more polish, especially for certain use cases like concurrent CPU-heavy tasks (video encoding, etc). AFAIK some form of this is also coming in the next months.

aaronbrethorst 3 years ago |

"Although everybody talks about how Kubernetes, for example, is complex and hard, most companies out there use it"

I am very skeptical of this claim.

tyingq 3 years ago | |

Me too, though "Most of the Fortune 500" or similar is probably true. It tends to find its way in somewhere if the org is large enough.

mrcwinn 3 years ago |

I absolutely love Fly. Cannot recommend them enough for the right use cases.

Kydlaw 3 years ago | |

Could someone elaborate on the right "use cases"?

Just curious about other people's definitions, as I imagined Fly.io as a more self-served BaaS (mostly for web applications)

janee 3 years ago | | |

My guess would be something like:

- startups, small teams, no devops.

- horizontally scalable monoliths.

- no complex infra needs.

Haven't used fly.io but strongly considering migrating over from Heroku and above is roughly what I would say is a good fit for Heroku so hopefully it's the same on fly.io

kasey_junk 3 years ago | | |

The big win is web applications that are read heavy and have geographically dispersed users.

tschuehly 3 years ago |

The thing about all these tools is that starting out with a simple VPS is much more cost effective. https://contabo.com has 4 cores for 5€ and you can scale up to 10 cores and 60 GB RAM for 27€ a month. And if you need dedicated cores, 12 cores with 96 GB is 130€ vs 8 dedicated cores with 64 GB RAM for 550€.

Choose a docker image and just docker-compose up your application.

If you outgrow that, you might as well switch to Kubernetes and AWS/GCP/Azure
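
To put the quoted prices on a common footing, here is a rough per-core calculation in Python. It uses only the figures from the comment above; the labels are descriptive, and the comparison ignores the different RAM sizes, so treat it as a back-of-the-envelope sketch, not a benchmark.

```python
def euros_per_core(monthly_price_eur: float, cores: int) -> float:
    """Monthly price divided by core count, rounded to cents."""
    return round(monthly_price_eur / cores, 2)


# (label, monthly price in EUR, cores), as quoted in the comment above.
quotes = [
    ("shared, 4 cores", 5, 4),
    ("shared, 10 cores / 60 GB", 27, 10),
    ("dedicated, 12 cores / 96 GB", 130, 12),
    ("dedicated, 8 cores / 64 GB (the 550 EUR offer)", 550, 8),
]

for label, price, cores in quotes:
    print(f"{label}: {euros_per_core(price, cores)} EUR per core per month")
```
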

kasey_junk 3 years ago | |

If you don’t need edge compute (which you do if you have customers dispersed geographically) then what you say is true.

But if you do, no amount of Kubernetes on the old-school cloud providers is going to get you there. You will encounter the hard problems Fly solves for.

akrymski 3 years ago | | |

Do most web apps need edge compute? Don't most backends talk to a central DB server anyway?

Unless data is geographically sharded as well, is there really a benefit? Collaborative apps perhaps?

netcyrax 3 years ago | |

Is Contabo reliable? Their prices are extremely low (way lower than Hetzner).

zinclozenge 3 years ago | |

How are they offering such low prices? Overprovisioning users?

dangerbird2 3 years ago | | |

One possibility is that they have fewer regions than something like AWS, so they can put their data centers somewhere where they get favorable electricity/cooling costs.

shafyy 3 years ago |

Has somebody used Render and Fly and can comment on the differences? I switched from Heroku to Render, and really like the good UX and docs. Never tried Fly though.

mrkurt 3 years ago | |

Render is very good. People seem to prefer us (Fly) if they like using a CLI, and they prefer render if they'd rather have a full in browser experience.

If you're trying all the things, https://railway.app/ is also a good option.

Daegalus 3 years ago |

I tried Fly recently for the first time to run a few tiny servers for the challenges on protohackers.com

It has been pretty great and painless. I have my own Docker servers and so on, but I don't have a local registry set up, so getting images moved around and all that hassle was going to be annoying.

Made a free fly.io account, did the `flyctl deploy` after making the config file and it just worked moments later. Really a nice flow.

Not sure yet if I would use it for anything else, but this was nice and easy so it's definitely on my list of things to check for new projects.

odiroot 3 years ago |

I moved my private projects from Heroku to Fly.io. It wasn't as easy as advertised, but oh well, it works now.

There's one thing that's straight out frustrating: no support for easy environment variables management. Yes you can add secrets but it's hard to read them back. Not everything is a secret, e.g. log level.

tptacek 3 years ago | |

You can set environment variables directly in your `fly.toml`.

https://fly.io/docs/reference/configuration/#the-env-variabl...

You can't read secrets back once you've set them. That's because we can't read secrets back once we've set them --- at least, our API can't. The API has write-only access to our secret storage.
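
On the app side, a non-secret setting like the log level mentioned above is just an ordinary environment variable once it is set in `fly.toml`. A minimal Python sketch of reading it with a fallback, assuming a hypothetical `LOG_LEVEL` variable name and a hypothetical set of allowed values:

```python
import os
from typing import Mapping, Optional

# Hypothetical set of accepted values; adjust to whatever the app's logger supports.
VALID_LEVELS = {"debug", "info", "warning", "error"}


def log_level(environ: Optional[Mapping[str, str]] = None, default: str = "info") -> str:
    """Read LOG_LEVEL from the environment, falling back to a default."""
    env = os.environ if environ is None else environ
    level = env.get("LOG_LEVEL", default).strip().lower()
    # Unknown values fall back to the default instead of crashing at startup.
    return level if level in VALID_LEVELS else default
```
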

odiroot 3 years ago | | |

I'm fully aware of this but it doesn't help. I'd have to commit every change to variables into Git.

Environment is not code. And Heroku had it figured out.

And mind you, Fly.io is trying to sell itself as natural Heroku competitor.

diceduckmonk 3 years ago |

It seems Fly.io originally focused on Elixir, but has identified the product-market need for app server + Postgres.

We didn't know about Fly.io when we chose GCP for this setup.

The initial setup on GCP was exceedingly painful. We used CloudRun for our app server, with the value prop being that "it just works". It didn't. Our container failed to start with zero logs from our servers. Stackdriver was of no help. Eventually we found a Stackoverflow thread revealing that CloudRun didn't like Docker images built from Macs. As always, GCP's official docs and resources are incoherent. GCP docs address a hundred things you don't care about, and the signal-to-noise percentage is in the low teens, if we're being generous. We had to chase down half a dozen bureaucratic things to get our CloudRun app to see and talk to CloudSQL. Apparently with Fly.io, you just run a command to provision Postgres, and pass in an environment variable to your app.

We consoled ourselves that GCP was difficult to set up but now it's set-and-forget. This is also a lie. This week we saw elevated and unexplained 5xx errors. First was CloudRun randomly disconnecting from CloudSQL. While AWS measures reliability in terms of 9s, GCP DevRel's response to this bug was that this is a distributed system and it is therefore acceptable for things to fail at a reliably human-reproducible 1%+ error rate. Yesterday we saw botnet traffic scanning for vulnerabilities on our app. This happens if you're on the web; it's not inherently GCP's fault. We have GCP's Cloud Load Balancer set up but it's not very smart. We were able to manually block specific IP addresses but it's nowhere near as meaningful as Cloudflare. Not a fan of Cloudflare the company but their products address a need. The botnet somehow knocked over our "Serverless VPC connection" to CloudSQL. Basically, that is a proxy server that you are forced to set up because CloudRun can't actually talk to CloudSQL. All the auto-scaling claims of GCP's serverless are diminished if we are forced to introduce a single point of failure like this in the loop. That serverless VPC connection requires a minimum of 2+ VMs, so the scale-to-zero of CloudRun is no more.

Our experience with GCP is constantly having to come up with workarounds and addressing their accidental complexity. This should not be the customer's problem. For example, CloudSQL doesn't have an interface to query your databases. If you use a private IP for security, you can't even use GCP's command line tools to access this. We found out that GCE VMs are automatically networked to talk to CloudSQL. We ended up creating a "bastion" GCE VM instance and set up Postgres CLI tools in order to do ad-hoc queries of our DB state. For this, we just needed the cheapest VM but GCP makes even this difficult. As for Stackdriver, it's still an annoyingly painful UI.

vasilakisfil 3 years ago |

I skimmed through their docs but couldn't find whether they:

* support abstract apps, not only HTTP/web apps like Heroku; let's say I want to deploy a SIP app

* support HTTP/2 and potentially HTTP/3

If they do support these two I would say it's enough to be considered a Heroku killer.

mrkurt 3 years ago | |

You can route arbitrary TCP or UDP services through our load balancing layer just fine. I'm not sure what SIP actually needs, but it might work. We don't currently have a way to route TCP connections directly to individual VMs, so stuff like WebRTC doesn't work.

We do support HTTP/2.

turtlebits 3 years ago |

Coming from a devops background, I kind of wish they didn't support arbitrary docker containers, as I found the experience fairly painful (going through one of their guides), and soured me from using Fly at all. Maybe it was the app's fault, but troubleshooting was not intuitive at all, especially on the networking side.

TillE 3 years ago | |

Unless you're doing something that is explicitly supported with flyctl's automatic setup, I found writing a basic Dockerfile was far far easier than messing with buildpacks.

digianarchist 3 years ago | |

You'd prefer if they only supported build-packs/nix-packs? I can see the appeal from a simplicity standpoint but supporting containers expands the usecase significantly.

site-packages1 3 years ago |

I had a good experience doing the test app with fly, I’d definitely consider it for something internet routable and non-complex like a bot or something. Very excited for when I can run more complicated workloads such as workloads started with Docker Compose, without having to implement all the Docker Compose functionality myself.

bergenty 3 years ago |

I’ve been seeing fly.io all over this site, definitely have to check them out going by their blog posts.

andy800 3 years ago |

I think Fly.io would be a great platform for my apps but sadly they do not support the JVM. Sure, I could Dockerify my apps but at that point a simple VPS isn't much different.

mhh__ 3 years ago |

Fly made infrastructure really hard for me because their hypervisor's CPUID emulation broke my program.

Yes, fly employees, I will file a bug somewhere - or email me.

mrkurt 3 years ago | |

That's interesting. I don't think we've run across that before, definitely send us more info!

itake 3 years ago |

Does fly recommend any good tools for measuring client latency (which is their big selling point)? I know they offer graphs for server latency.

mrkurt 3 years ago | |

So far, people seem to get the most out of the `performance.timing` API and a metrics package. There are very expensive real user monitoring tools to do some of this, but that browser API makes it really easy to just wire up an app and collect whatever you care about directly.
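
As a rough illustration of the "wire up an app and collect whatever you care about" approach, here is a server-side sketch in Python. It assumes the browser posts a JSON-serialized `performance.timing` object to some endpoint of your own (the endpoint, the function name, and the choice of metrics are all hypothetical); the field names are the ones the Navigation Timing API defines.

```python
import json


def timing_metrics(payload: str) -> dict[str, int]:
    """Derive durations in milliseconds from a JSON-serialized `performance.timing` object.

    Timing fields are 0 until the corresponding event has happened, so the
    browser should send the payload only after the load event has finished.
    """
    t = json.loads(payload)
    return {
        "dns_ms": t["domainLookupEnd"] - t["domainLookupStart"],
        "connect_ms": t["connectEnd"] - t["connectStart"],
        # Time from sending the request to receiving the first response byte.
        "ttfb_ms": t["responseStart"] - t["requestStart"],
        # Whole navigation, from start to the end of the load event.
        "page_load_ms": t["loadEventEnd"] - t["navigationStart"],
    }
```

The resulting numbers can then be fed into whatever metrics package the app already uses, tagged by region if that is what you care about.
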

itake 3 years ago | | |

Unfortunately, many of my users are in Asia. It is challenging to audit their user experience.

rickette 3 years ago |

Anyone have experience running Java/JVM apps on Fly?

rurban 3 years ago | |

Their free tier is limited to 256MB, so you'll need a bigger one.

ransom1538 3 years ago |

Ah. Full circle back to heroku.

jillesvangurp 3 years ago |

Docker containers are a good deployment unit. At this point producing them and publishing them to a container registry is pretty straightforward. That's not something that requires devops people. Any reasonably senior backend developer ought to know how to do this or can learn how to do it in a few hours.

The rest is just deploying and running containers. There are lots of ways to do that. I loved using Google Cloud Run a few years ago. Stupidly easy to get started with and flexible enough for many things. With some service discovery on top, it's perfect for a lot of stuff. Add some managed middleware & databases to the mix and you essentially have a close to zero ops CI/CD capable environment. No devops needed for this either. When I did this for the first time, I was up and running with our dockerized app in about 15 minutes. Most of that was just waiting for builds to finish.

I'm CTO of a company currently and I've gotten sidetracked with enough lengthy and super expensive devops type stuff in past projects that I'm purposely avoiding certain things, not because I can't do them but because I don't think they are worth spending any time on for us right now. So, no terraform, no kubernetes, no microservices. I just don't have the time or patience for that stuff. We run a monolith. So, there's not a lot I actually need from my infrastructure. I need it to be fast, secure, and resilient and be able to run my monolith. But I don't need to have things like service discovery, complicated network setup (bog standard vpc is fine), and all the other stuff that devops people obsess about.

We use a load-balancer, I clicked one together in the Google UI. It's fine. Ten minute job. Doesn't need terraform scripting. We have two of them. And we have a couple buckets and our monolith behind that. I could grab the gcloud command that recreates this thing and put it somewhere. But I have more urgent things to do.

For deployment we use simple gcloud commands from github actions to update vms with new instance templates to tell them to run the latest container that our build produced. We started with cloud run but our monolith has a few worker threads that we don't want killed so we moved it to proper vms. Very easy to do in Google Cloud.

Our deploy command does a rolling restart. We have health checks, logging, monitoring, alerting, etc. Could be better but it works. Initial provisioning of the environment was manual and we scripted together all the commands that are part of our deploy process for automation. We added a managed redis, database, and elasticsearch to this. None of that was particularly hard or worth automating to me. Yes, it's bit of a snowflake. But not that complicated and I documented it. So, we can do it again in a few hours if we ever need to.
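
For readers unfamiliar with the term, the rolling restart with health checks mentioned above can be sketched roughly as follows. This is a generic illustration in Python with hypothetical callbacks (`restart`, `is_healthy`) and made-up defaults; it is not the gcloud-based deploy script described in this comment.

```python
import time
from typing import Callable, Iterable


def rolling_restart(
    instances: Iterable[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_checks: int = 30,
    wait_seconds: float = 2.0,
) -> list[str]:
    """Restart instances one at a time, halting at the first that stays unhealthy.

    Returns the instances that were restarted and passed their health check.
    """
    done: list[str] = []
    for instance in instances:
        restart(instance)
        for _ in range(max_checks):
            if is_healthy(instance):
                break
            time.sleep(wait_seconds)
        else:
            # Never became healthy: stop here so the remaining instances
            # keep serving the previous version.
            raise RuntimeError(f"{instance} unhealthy after restart; completed: {done}")
        done.append(instance)
    return done
```

Because only one instance is down at a time, a bad build takes out at most one VM before the rollout stops.
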

The dirty little secret of a lot of devops is that it's a lot of over-engineered YAGNI stuff that is super labor intensive to set up and maintain, and you end up using it a lot less often than people think.

This is why freelance devops engineers are so in demand: this stuff just requires a lot of manual work! Companies need these people full time and usually more than just one. The devops alone can add up to hundreds of thousands of dollars/euros per year.

It's a lot of manual work that probably should be automated. However, hiring a lot of people at great expense to automate things that are cheap and not that complicated is not always the best use of resources. I've seen companies that spend an order of magnitude more on devops salaries than on the actual hosting bills. If you think about it, it's kind of weird to be spending so much for so little gain. And most of these companies are not particularly big, nor do they experience enormous scaling issues.

Scarbutt 3 years ago |

Without PITR for Postgres I don’t think so.

melony 3 years ago |

There is nothing easy about them, cheaper yes (only for the lower instances), but not easier. Until they have first-class, evergreen, support for Heroku buildpacks, they are a subpar replacement.

They are also missing a proper managed database with point-in-time backups through the web UI like those offered by most proper PaaS services.

mrkurt 3 years ago | |

We aren't going to support Heroku buildpacks. We are working to make more frameworks easy to launch with minimal configurations. Which means – hiring people to work specifically on those frameworks + docs + builds.

We did bake nixpacks into our CLI recently, they seem better for our particular environment than buildpacks. Railway.app did a great job with these: https://nixpacks.com/docs/getting-started

We're working on managed databases, but we're not doing them like Heroku did. We just launched a preview of managed Redis with Upstash: https://fly.io/docs/reference/redis/

This seems like the future of managed databases on a platform like ours. There are companies that build very good managed database services. We're getting to the size where these people will work with us. Getting well managed DBs onto the platform is basically what I'm spending all my time on these days.

Incidentally, we're a lot cheaper than Heroku because we run our own infrastructure.

samwillis 3 years ago | | |

I’m sure you already have something in the pipeline for fully managed Postgres on Fly (I Really want something Heroku Postgres like, but on Fly). If not, have you reached out to Crunchy Data? They seem to have a solid product.

If you happen to have a service beta testing anything I would be interested in joining it.

maxioatic 3 years ago | | |

Do you have plans for Rust (and specifically web frameworks like actix-web)? I see Rust on your main site under "Use the Tech You Love" but there's no dedicated docs.

pyentropy 3 years ago | |

> point-in-time backups through the web UI

Is `fly postgres create --name restoredDb --snapshot-id backupId` so hard that it's a deal breaker?

> support for Heroku buildpacks

I haven't tried it but there's some buildpack support: https://fly.io/docs/reference/configuration/#builder

melony 3 years ago | | |

It's not a managed database, if things break at 3am, you have to fix it yourself. The buildpacks are an afterthought using third-parties. A real evergreen solution will track the official Heroku solution (since fly.io doesn't even bother to document their buildpacks) to the nearest day. The whole point of the P in PaaS is that somebody else does the DevOps. If I have to do so much DevOps, then there's no point in adopting Fly. I am getting the impression that users here only care about their shiny blogposts and content marketing, versus the actual experience as a business user.

pier25 3 years ago | |

It's not Fly's fault if Heroku decided to remove their free tier. They do have buildpacks for the most popular stacks. Eg: You can deploy a Node app really with just `fly launch`.

As for the managed DBs, one of their founders was from Compose, so yeah they know how these things work. But AFAIK Fly doesn't have much interest in DBs, their focus is really in VMs.

Karupan 3 years ago | |

Looks like you just want Heroku. And I’ve never seen the Fly team mention that they are aiming to be a Heroku replacement. So maybe it’s just not for you?