Reclaiming the lost art of Linux server administration (pietrorea.com)
Not a day goes by without a recruiter telling me, "We are urgently looking for an experienced Linux sysadmin. Are you interested?"
I will steal this.
As for the term "DevOps", I am never sure what people mean when they use it. You seem to be using it in contrast to a traditional Linux sysadmin. What exactly does DevOps mean in your definition?
Some seem to think a "DevOps" is a developer who knows how to administer servers (or vice versa): a modern term for a general IT person who can do anything, from programming to firewall administration to repairing the printer.
Another definition is more specific. In this case DevOps means working with CI/CD tools, programming "infrastructure as code" (Terraform, Ansible, etc.) and doing all things "agile". This job is mostly cloud-focused.
Or the absurd prices for stuff that basically does not make sense.
Or for practices that would otherwise be absolutely unlawful but that people let be because cloud providers are just too big to fight.
For business related services I use root servers hosted by e.g. Hetzner. I don't want to deal with hardware maintenance nor the 24/7 power bill.
For private stuff (pictures, videos, movies) I have a cheap old desktop machine at home with lots of storage running Ubuntu. Easy to administer, and I can switch it off if not needed. Data is mirrored and snapshotted.
For long-term backup I encrypt my data and upload it to Amazon Glacier Deep Archive (around $1/TB per month!)
That said the cloud in general is great and you can do some things today for cheap that weren't possible for most companies 10 years ago. For some use cases it's the best choice.
In general a lot of workloads can be served orders of magnitude cheaper than 10 years ago.
Any good resources or practices on making your server safe? Preferably not the kernel-level tricks.
Also on automated deployment,
so that I can commit and have it deployed on the server.
I thought about using GitHub Actions, so that when I push, the server receives an HTTP ping, clones the repo, and sets up the app.
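If you go the "HTTP ping" route, the receiving end should check that the ping really came from GitHub before it pulls anything. GitHub signs webhook payloads with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header. Here is a minimal sketch of just the check; the function name and the secret value are made up for illustration, and the actual clone/setup step is left out.

```python
import hashlib
import hmac


def is_valid_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Return True if header_value matches the HMAC-SHA256 of body.

    GitHub sends the header as "sha256=<hex digest>". The secret is whatever
    you configured on the webhook (a hypothetical value is used below).
    """
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header_value)


if __name__ == "__main__":
    secret = b"example-secret"  # hypothetical, use your own
    body = b'{"ref": "refs/heads/main"}'
    good = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(is_valid_signature(secret, body, good))
    print(is_valid_signature(secret, body, "sha256=00"))
```

Only run the deploy step when the check passes, and compute the digest over the raw request body, not a re-serialized copy of the JSON.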
Don't use "here documents" or "here strings" for passwords. Even in bash versions as recent as 2020, they create a temporary file within `/tmp/` with the secret inside. If the timing is unlucky, it will get written to disk and therefore leave permanent traces even after reboot. Only shredding will securely delete the data.
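One alternative, sketched under the assumption that the secret is already held in memory by a script: hand it to the child process over a pipe on stdin, so it never passes through a file in /tmp. The child command here is a stand-in one-liner that only reports the length of what it received, and the function name is hypothetical.

```python
import subprocess
import sys
from typing import List


def run_with_secret(cmd: List[str], secret: str) -> str:
    """Run cmd, feeding secret on stdin through an anonymous pipe."""
    result = subprocess.run(
        cmd,
        input=secret,          # written to the child's stdin via a pipe
        capture_output=True,
        text=True,
        check=True,            # raise if the child exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in child: reads stdin and prints only the number of characters.
    child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
    print(run_with_secret(child, "hunter2"))
```

This only helps if the receiving program can read its secret from stdin; it does nothing about secrets passed as command-line arguments, which are visible in the process list.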
In my opinion: when you have a choice, get to know all the options (within reason). I have Apache as my default, purely because nginx didn't exist for many years. When nginx turned up, I gave it a while to calm down and now I deploy it quite often. I deploy something like 75% Apache and 25% nginx.
I tend toward Apache out of inertia, but I quite like the clean, easy setup for a simplish site with nginx. This is with Debian/Ubuntu style defaults, which do not favour nginx.
It's like manufacturing tires without knowing how an engine works. Don't you want to know how torque and horsepower affect acceleration and velocity? How else will you know what forces will be applied to the tires and thus how to design for said forces?
The total flexibility of such a server (compared to un/managed services) is a (great) bonus (not only at the beginning).
Everything else is per-application configuration files and running commands from the software's setup documentation.
Plus, I would run a server with a DE simply because I want to be able to look into databases with a GUI and edit config files with a nice text editor.
Or, the way things are going, systemd, systemd[1], systemd[2], systemd[3], systemd[4] and systemd[5].
[1] https://www.freedesktop.org/software/systemd/man/journalctl....
[2] https://www.freedesktop.org/software/systemd/man/systemd-jou...
[3] https://www.freedesktop.org/software/systemd/man/systemd-mac...
[4] https://www.freedesktop.org/software/systemd/man/systemd-log...
[5] https://www.freedesktop.org/software/systemd/man/systemd-nsp...
There's much less of a margin for error now.
I think this is ideal, but I've yet to be able to do this or see a solid example.
At my old job I had to do exactly this, and it was really hard to get things right.
I'm much more seasoned now, but I still don't think I could do it lol
Interviewer: That's nice, but how much AWS experience? :(
it would up the status of the industry overnight if everyone was at this level...
When it comes to companies I mostly support cloud these days, but when it comes to me and my family I accept every downside and host almost all of our digital lives in a 42U rack in a gutted closet in our house with static IPs and business fiber.
I know where our data lives and no one can access it without a warrant and my explicit knowledge. I also save myself several hundred a month in third-party cloud provider fees to host the same services, and I can reboot, upgrade, or repair anything whenever I want, but in general it is no more maintenance than cloud servers. I also never end up with exciting bills when experiments are forgotten about.
You pretty much get all the pros and cons of home ownership. For me it is mostly pros. Also keeps me dogfooding all the same practices I recommend to my clients.
IIRC we ended up using it as a disposable replica for some non-real time but heavy operations.
API access for managing configuration, version updates/rollbacks, and ACL.
A solution for unlimited scheduled snapshots without affecting performance.
Close to immediate replacement of identical setup within seconds of failure.
API-managed VPC/VPN built in.
No underlying OS management.
(Probably forgot a few...) I get that going bare metal is a good solution for some, but comparing costs this way without a lot of caveats is meaningless.
My wife is quite concerned about her being left with a web of home automation, hosted emails etc. I understand her and I am trying to find a way out.
My current idea is to document how to de-automatize the home and how to deal with emails and fiber access (the main things to worry about).
Any ideas are very much welcome
Technically option 3 should be the best since it also engineers around generations (options 1 and 2 would roughly be locked to a narrower age group). But it can also be a double-edged sword: what if they don’t like tech? Or what if you overdo it with trying to make them interested in tech and self-hosting and it backfires?
So yeah, no real solution yet. But I’d subscribe to that newsletter if there was one
Maybe some kind of script(s) that could be run that just do all the de-automation?
People who are experts in cars can own very expensive cars and tools to tune them.
People who have been working in music can have very expensive instruments and expensive headphones, microphones, sequencers, etc.
We seem to be looking down on experienced "computer experts" and wanting to take their tools away. It's been grinding my gears lately.
> You pretty much get all the pros and cons of home ownership.
And that's the heart of the matter. Everyone here is arguing in circles based on their feelings about cloud vs their feelings on bare metal (and from what I can tell, pride in their own abilities), but at the end of the day it's a cost-benefit tradeoff. Everyone picks that tradeoff for themselves. As my immigrant parents are getting older, I'm thinking of getting them a business internet line and a SIP phone in their house so that if they need me in an emergency (health or otherwise) they can reach me quickly/reliably. It's something I'm weighing based on the cost of the service/infrastructure/maintenance against my parents' technical and social capabilities (which are limited as older immigrants from a non-Western country.)
It's all about tradeoffs. As computing professionals we have the knowledge and the skills to uniquely take advantage of these tradeoffs in our personal lives. Much like I know quite a few cabinetmakers who do actually make furniture for their house with their skills, their tools, and the shop they work in. That doesn't stop most people from buying furniture from Ikea or most businesses from using cloud hosting.
My closet of RPis is quiet.
Your main point still stands though, not everyone can or wants to do that.
Then there's a set of about six or seven $4 per month KVM VMs at geographically distributed offsite third-party hosting companies for things like secondary authoritative nameservers and such.
And an offsite backup system that mirrors everything on the system in colocation to a medium sized disk array that lives in the corner of a closet in a family member's basement.
I don't think systems administration is a lost art. People just wanted to do things more easily, for less money and with fewer staff, and no one can blame them or the industry as a whole. The art is still right there; Linux hasn't changed that much in 20+ years, and I still have my first discs. It also takes some experience to "feel" your way through systems administration problems, working from the bottom up of course. Feeling like it's a network problem still has a high rate of success when troubleshooting for me!
I try every day to teach some of these skills to people I work with. I just call it devops, or SRE, or some cyber cloud support position someone makes up; they are all systems admins/engineers to me still. I enjoy watching people learn and apply that knowledge to future problems, getting to the root cause or close to it, and the satisfaction that comes from fixing the issue from start to finish on your own. Looking things up is not cheating in systems administration!
[0] https://raspberrypi.stackexchange.com/questions/135610/conne...
[ 150.076220] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:468 dev_watchdog+0x308/0x30c
[ 150.076255] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 1 timed out
Essentially Ethernet chucks a wobbly until reboot and it's only possible to connect via Wi-Fi.
I run my stuff from home too, though it is smaller scale than yours currently. Off-site & soft-offline backups are on encrypted volumes on servers & VMs elsewhere.
I get my static IP by using the smallest VPS from Vultr with a WireGuard tunnel forwarding http/s traffic to a Docker container running Nginx Proxy Manager.
For those wanting to learn, I highly recommend joining r/homelab and r/selfhosted. Those communities have a lot in common and you can learn a lot.
Most modern routers have an option to integrate with dynamic dns providers.
Knowing how to keep your server running and understanding internals is a great skill, but that doesn't mean that progress should stop.
Standalone servers are great, but this greatness comes at a price. It takes time to maintain a server, and it takes time to configure additional services. But at the same time they bring you joy (and frustrations) and much more knowledge and a deeper understanding of what goes on under the hood.
I'm gonna run my private cloud merging 3 different un-backed up physical computers, and migrate services off Google.
That's my second free 42U rack, but the other was mostly used as shelf space. I've got a third rack rusting in my backyard, which I bought locally for $200, originally intended to run my former employer's test infra, and which I brought back home after they laid us off.
I have venting into my ceiling which connects outside, but I am mostly using consumer gear like Raspberry Pis and Intel NUCs, and the rack fully fills the doorway, so volume and heat are not a real issue.
As far as you know?
Your data is exposed to The Internet so someone could be accessing it.
Who will keep maintaining your infra when you die, say you get hit by the infamous bus later today?
I however mostly detest how almost every third party SaaS sells out our metrics and data for profit, long term consequences be damned. The less I empower profit maximizing machine learning to manipulate me and my family the better.
I'm pretty sure most people sysadmin'ing their Linux servers are actually doing it with rented dedicated servers. TFA btw specifically mentions: "don't manage physical hardware". Big companies like Hetzner and OVH have hundreds of thousands of servers and they're not the only players in that space.
They don't take care of "everything" but they take care of hardware failure, redundant power sources, Internet connectivity, etc.
Just to give an idea: 200 EUR / month gets you an EPYC 3rd gen (Milan) with shitloads of cores and shitloads of ECC RAM and fat bandwidth.
And even then, it's not "dedicated server vs the cloud": you can very well have a dedicated server and slap a CDN like Cloudflare on your webapp. It's not as if Cloudflare was somehow only available to people using an "entire cloud stack" (whatever that means). It's the same for cloud storage / cloud backups etc.
I guess my point is: being a sysadmin for your own server(s) doesn't imply owning your own hardware and it doesn't imply either "using zero cloud services".
Running your own server is an investment that doesn't make sense for everyone. If you can get it, it is better than you might imagine. Being in full control, the master of your own destiny, is so liberating and empowering. It feels like the difference between constantly ordering Lyft/Uber/riding with friends, vs. owning your own car.
Not to mention, again, my hardware resources are so much better. This one server can run multiple profitable SaaS apps / businesses and still have room for experimental projects and market tests. Couldn't be happier with my decision to get off the cloud.
Most businesses have nightly cronjobs generating some kind of report that is then emailed to stakeholders. Why on Earth would you run a dedicated Linux box for that anymore? Glue a nightly trigger to AWS Lambda, send the report via AWS SES, and it's free. Literally, because it fits quite easily within the free plan. No $5/month VPS box, no patching, no firewalling, no phone calls from execs at 6 AM wondering why their report isn't in their inbox and you track it down to the server being down when the cronjob was supposed to fire.
With that said, if you come to me and tell me that you want to add a new business feature to stream video for our customers off AWS, I'll first ask you why you didn't tell me you won the lottery, then I'll berate you for being stupid enough to spend your lottery winnings on the AWS bill.
Pick the right tool for the job.
You are forgetting the future cost of losing knowledge and control over your infrastructure.
Thankfully one person's cloud is another person's on prem infrastructure so sysadmin skills will always be in demand.
From my perspective in enterprise computing, I now see people taking 2 paths. One where they become super deep sysadmins and work on infra teams supporting large scale deployments (cloud or not) and the other being folks who write code and think of infra as an abstraction upon which they can request services for their code.
Both are noble paths and I just hope folks find the path which brings them the most joy.
However, every team I've been on recently has spent a lot of time struggling with gluing their AWS stuff together, diagnosing bugs etc. It didn't seem to save a heck of a lot of time at all.
I couldn't figure out AWS. But I could figure out how to host sites on a linux VPS.
So what's the story here - is serverless something that only makes sense at a certain scale? Because with tools like Caddy the 'old fashioned' way of doing it seems really, really easy.
Numerous times there's something weird going on and you're stuck trying to guess and retry based on largely useless logs until it somehow works better but you never really know what the root cause truly was.
Meanwhile on my own server I'll ssh in and have complete visibility, I can trace and dump network traffic, bpftrace userspace and kernel code, attach debuggers, there's no limit to visibility.
Yes lambda/serverless saves you a day or three in initial setup but you'll pay that time back with 10-100x interest as soon as you need to debug anything.
Your competitors would salivate at this statement, fyi. Speed is a competitive advantage. AWS is not "let's rent a big ball of EC2 servers and call it a day", and anyone who treats it like that is going to get eaten alive. If you have not looked at -- for example -- Dynamo, you should. If you have not looked at SQS, you should. The ability to have predictable, scalable services for your engineers to use and deploy against is like dumping kerosene onto a fire, it unlocks abilities and velocity that more traditional software dev shops just can't compete against.
I wonder how you folks manage to work with AWS and not hate it.
However, when I spin up my own side projects, it is sooo much easier to just go into the command line and spin something up directly. It does make me wonder whether some small amount of expertise can really change things. By the time you're orchestrating AWS services, Docker containers, Kubernetes and more, would it have been so bad to run a 10-line bash script on a few cheap VMs to set yourself up?
Even typing that, I realize how much time managed services save you when you need them. Change management is really what those services offer you, even if a momentary setup is easier by hand.
Last week my choice was vindicated: I ran into a critical hardware issue on my linux instance which required a complete OS reinstallation. Wiped my server clean, and was back up and running in an hour. I feel much more secure in the fact that I KNOW I can spin up a completely functional version of my app on any Linux server in the world in less than an hour, rather than relying on opaque cloud backup/load balancers/serverless configs which could fail in unexpected ways, and are usually locked in to a particular vendor. As for a few hours downtime here and there, my business is designed to handle it.
Only thing I dislike is YML, which I think is yucky!
That was our 'perfect world'. Reality was different and we still have a lot of servers running stuff, but what we did push into K8s really reduced our operations workload and we're pretty happy about that.
After I discovered https://efs2.sh I switched over to this simple config management solution, which simply executes commands and scripts over ssh. It is so much simpler and faster (both in regards to creation and execution) than Ansible.
Get your hands a little dirtier installing a lightweight desktop environment like LXDE, programming language of choice & an IDE. Install VNC and you then have a cloud desktop you can code in from anywhere at the same time that it runs your personal website.
Cost: ~$5 per month and a bunch of good experience. Or just do it once as an exercise and cancel after a month.
This "cattle not pets" mentality doesn't make sense for everything and is highly inefficient if the OS itself seamlessly supports immutable workloads and configuration.
You're wise to keep staff around who understand the low level stuff, in addition to the shiny new abstraction based tools.
You'll only find those jobs at one of the handful of cloud companies. Nobody will know how to do anything for themselves anymore and all this experience and knowledge will be lost.
There are no more actual administrators. Just users paying rent.
Rent to AWS actually drives demand up quite a lot since the bills are huge and very few people understand what is under the hood and how it can be optimized.
I doubt very much things will change in the near future. In the far one... who knows.
Edit: car mechanics with their own shop make significantly more money than me and it seems to only get better for them as cars become more complex.
I've always been a fan of "standing on the shoulders of giants" and it's served me very well to have this mindset. I'm fine to dive deep when I have to but diving deep just to dive deep.... not so much.
Semi-recently I needed a simple blog for a friends/family thing; I spun up WordPress and MySQL containers and was done. Over a decade ago I used to set up and manage WordPress installs, but it's not a skill I need.
I find this article a little odd, since they talk about server admin but then also about writing a setup script for your server, which is more in the "cattle" category for me and less in the "pet" category that I would consider "server administration".
I sometimes wonder whether we need another metaphor, something like a dairy cow, where you only have one, but when it fails you can shoot it and plug in another very quickly and simply (e.g. using a script).
This rings true to me. On Azure anyway. Like the rest of tech you gotta keep up on the hamster wheel! Example: they canned Azure Container Services because of k8s - just imagine if you tightly integrated with that and now you need to rewrite.
Also not mentioned in the article is cost. Hetzner is loved on HN for this reason.
That said, k8s is probably a stable and competitive enough platform that it makes a good tradeoff. By using it you invest in ops skills rather than specifically sysadmin skills, and I believe k8s skills will be longer lasting and less faddish than proprietary vendor cloud skills.
Does anyone have a good source of learning that is comprehensive and practical? I’m talking about a good guided book/tutorial on how to administer a server properly and what things one should know how to fix, not just how to set up Wordpress.
http://www.linuxcommand.org/tlcl.php/
From there picking up configuration management should be pretty straightforward.
That advice can cause substantial headache on Ubuntu/Debian, where the Almquist shell is /bin/sh. This does not implement much of bash and will fail spectacularly on the simplest of scripts. This is also an issue on systems using Busybox.
A useful approach to scripting is to grasp the POSIX shell first, then facets of bash and Korn as they are needed.
-"As a practical goal, you should be able to recreate your host with a single Bash script."
This already exists as a portable package:
https://relax-and-recover.org/
-"For my default database, I picked MySQL."
SQLite appears to have a better SQL implementation, and is far easier for quickly creating a schema (set of tables and indexes).
The hardware is the cheapest part. Then you have to pay for electricity, manage backups, fix RAID problems, and have a good internet connection. You have to pay attention to how the server is doing. And if you're serving a business, you have to be available to debug any issue. That is a lot of time you could be spending on the actual project.
But definitely most devs should have a small home server for trying unimportant things. Nothing complicated, just keep the standard hardware config. There are second-hand servers available for $50. Install some Linux and have it running 24/7. It's quite fun experimenting and hosting simple things.
This means I can run my own servers and the only thing they do is run rke2.
I can take out a node and upgrade the base OS without any issues.
And I still get all the benefits of a high-quality cluster OS (k8s).
I love it.
And yes, in my opinion it's easier and more streamlined to install storage software (OpenEBS) on my rke2 cluster and back up those persistent volumes than to do backups of my hard drives.
And my expectation is that, while it already works very well, it will only get more stable and easier.
We are back in timesharing days, only using SSH and Web instead.
I guess I can at least switch my cloud shell colours to green to feel at home.
There are plenty of books around. And there are literally thousands of people worldwide practicing this "lost" art daily.
Starting from small corps up to the major cloud providers. (Someone has to support those computers running the "serverless" things.)
My word of advice: start with the "philosophy". One program doing only one task but extremely well, "everything is a file" etc.
Understand why people are unhappy with SystemD. :-) Find out how kernel schedulers impact databases' IO. Write a boring program in C - a network server which forks on accept4. Dip your toe in Perl 5 - there is lots of it in *nix and BSD. It is still the most stable and efficient way of writing CGI scripts ... Find out why Ksh is faster than Bash.
It is a truly exciting world, and the best news is that it "fits" the modern world of JS and async programming etc. like a glove.
I wouldn't call it "lost" - it is just dozens of levels of abstraction down, efficient, boring and complex. But powerful and unforgiving to typos :-)
I am glad someone actually is reading about all that.
Either you need to have pets on tin, or cattle via cloud, but that never was the case. I worked at a hosting company around 2007 that was an early IaaS provider. We PXE-booted Xen nodes that automatically connected to our management layer, allowing customers to provision virtual machines. Most of our own fleet was cattle well before this was meme-worthy.
Today, you could bootstrap a k8s cluster with almost no effort on tin. You'll quickly have autoscaling cattle and a distributed cron. Sure, you'd probably pet etcd and maybe the API servers. Running a database, API, and small management layer is well within the responsibilities of a professional system administrator. If this is beyond your org's or team's capabilities, you probably should use the cloud provider.
P.S. Not having a team that can run production services without outsourcing the database is fine. We all have different specialisms.
The storage layer is a bit more complex if you want to roll PVC.
You shouldn't bootstrap a $1m team to defeat a $500k cloud bill.
I do not even need to trust my friend as duplicity encrypts all data against a yubikey held pgp keychain before it leaves.
The backup NAS could phone home from anywhere with internet access.
Here is my WireGuard server (cheap VPS) and client (my home servers) config:

#
# Client (the actual self-host local server)
#
[Interface]
## This Desktop/client's private key ##
PrivateKey = redacted

## Client ip address ##
Address = 10.10.123.2/24

[Peer]
## Ubuntu 20.04 server public key ##
PublicKey = redacted

## set ACL ##
AllowedIPs = 0.0.0.0/0

## Your Ubuntu 20.04 LTS server's public IPv4/IPv6 address and port ##
Endpoint = redacted:12345

## Keep connection alive ##
PersistentKeepalive = 15

#
# Server (in the Wireguard context, exposed to the Internet)
#
[Interface]
## My VPN server private IP address ##
Address = 10.10.123.1/24

## My VPN server port ##
ListenPort = 12345

## VPN server's private key i.e. /etc/wireguard/privatekey ##
PrivateKey = redacted

# Add lines for more ports if desired
PostUp = iptables -i eth0 -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.10.123.2
PostDown = iptables -i eth0 -t nat -D PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.10.123.2

[Peer]
## Desktop/client VPN public key ##
PublicKey = redacted

## client VPN IP address (note the /32 subnet) ##
AllowedIPs = 10.10.123.2/32
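
For the "add lines for more ports" part, a small generator can print matching PostUp/PostDown pairs so the two never drift apart. This is only a sketch: the interface name and target address are copied from the config above, the port list is an example, and the function name is made up. The iptables options are the same as in the config, just written in the more conventional order.

```python
def forward_rules(ports, target="10.10.123.2", iface="eth0"):
    """Build wg-quick PostUp/PostDown lines that DNAT TCP ports to target."""
    lines = []
    # -A appends the rule when the tunnel comes up, -D deletes it on teardown.
    for action, flag in (("PostUp", "-A"), ("PostDown", "-D")):
        for port in ports:
            lines.append(
                f"{action} = iptables -t nat {flag} PREROUTING -i {iface} "
                f"-p tcp --dport {port} -j DNAT --to-destination {target}"
            )
    return lines


if __name__ == "__main__":
    # Example: forward both HTTP and HTTPS to the home server.
    for line in forward_rules([80, 443]):
        print(line)
```

Paste the printed lines into the server's [Interface] section in place of the two hand-written ones.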
Enter big-three cloud egress pricing. Designed to make sure that you have to go all-in.
Their prices have spiked because of electricity costs; the entry level used to be around 20 euros. They come with 1 Gbit.
And when some component of the server fails, your app is unavailable until you can repair it. So you need another server for redundancy. And a load balancer. And a UPS. And a second internet connection.
If your app is at all critical, you need to replicate all of this at a disaster recovery site. And buy/run/administer DR software.
And hardware has a limited lifespan, so the $3000 was never a one-time investment.
I think there is often still a case to be made for self-hosting but the numbers are not as rosy as they seem at first glance.
This sounds expensive if you're talking about one server vs. a year of AWS charges, but it is a tiny bump if it turns out you need to buy a dozen servers to replace a large AWS bill.
Plus, I think most people underestimate how reliable server-grade hardware is. Most of it gets retired because it's functionally obsolete, not because a power supply or whatever fails. Which brings up the point that the vast majority of failures with server-grade hardware are in replaceable components like power supplies, disks, SFPs, etc. Three or four years out, those parts are frequently available on the secondary markets for pocket change.
Also "if some component fails or the app is critical" has a lot of nuance, I agree with your sentiment but you should know:
1) Component failures in hardware are much rarer than you think
2) Component failures in hardware can be mitigated (dead RAM, dead PSU, dead hard disk, even dead CPUs in some cases: all mitigated). The only true failure of a machine is an unmitigated failure due to not configuring memory mirroring or something, or a motherboard failure (which is extremely uncommon)
3) The next step after "single server" isn't "build a datacenter", it's buying a couple more servers and renting half a rack from your local datacenter, they'll have redundant power, redundant cooling and redundant networking. They'll even help you get set up if it's 2-3 machines with their own hardware techs.
I do this last one at a larger scale in Bahnhof.
Also, $3000 will get you about 3-5 years out of hardware, at which point, yeah, you should think about upgrading, if for no other reason than that it's going to be slower.
So you have some downtime. Big deal. If this happens once every few years and you need a day to repair it, your uptime is still better than AWS.
Not everyone hosts a realtime API millions of users depend on every second of the day.
If you are trying to go commercial you might have a different attitude but for those of us who do this mostly for fun and for some donations on the side, over complicating our setups to ensure we add a 10th of a percent to our uptime stats just isn't worth it.
Most applications don't actually have a four 9s uptime requirement. Like, how many otherwise healthy businesses closed up shop during the cloud provider outages we've seen in the last year because they didn't have their stuff implemented and deployed such that it would remain fully functional when these issues happen?
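To put numbers on "four 9s": a 99.99% target allows roughly 52.6 minutes of downtime per year, while a full day of downtime once every three years (an assumed reading of "once every few years") works out to about 99.909%, so roughly three nines. A quick sketch of the arithmetic, with made-up function names:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def allowed_downtime_minutes(target_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - target_percent / 100)


def availability_percent(downtime_minutes: float, period_years: float) -> float:
    """Availability implied by a total downtime spread over period_years."""
    total = MINUTES_PER_YEAR * period_years
    return 100 * (1 - downtime_minutes / total)


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.99):.1f} min/year at four nines")
    # Assumed scenario: one full day (1440 minutes) down every three years.
    print(f"{availability_percent(24 * 60, 3):.3f}% for one day per three years")
```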
https://news.ycombinator.com/item?id=13198157
In one meeting we had a typical discussion with the ops guys:
- "why wouldn't we optimise our hardware utilisation by doing things a, b, and c."
- "hardware is crap cheap these days. If you need more capacity, just throw more servers at that"
- "is $24k a month in new servers crap cheap by your measure?"
- "compared to how much money these servers will make in the same month, it is crap cheap. It is just a little less than the annual cost of a mid-tier software dev in the Russian office. We attribute only a 12% increase in our revenue to algorithmic improvements and almost 80% to the additional traffic we handle. A new server pays for itself the same month, and you and the other devs pay off only after 2 years"
Performance is a complex, many-faceted thing. It has hidden costs that are hard to quantify.
Customers leave in disgust because the site is slow.
No amount of “throwing more cores at it” will help if there’s a single threaded bottleneck somewhere.
Superlinear algorithms will get progressively worse, easily outpacing processor speed improvements. Notably, this is a recent thing: single-threaded throughput was improving exponentially for decades, so many admins internalised the concept that simply moving an app with a “merely quadratic” scaling problem to new hardware will always fix the problem. Now… this does nothing.
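A rough illustration with assumed numbers: if the data grows 10x and the new box is 1.5x faster per thread, a linear job takes about 6.7x as long as it does today, while a quadratic one takes about 67x as long. The function name and the figures are made up for the example.

```python
def relative_runtime(growth: float, exponent: float, hw_speedup: float) -> float:
    """Runtime relative to today after the data grows and hardware improves.

    Assumes work scales as n**exponent and hardware gives a flat speedup.
    """
    return growth ** exponent / hw_speedup


if __name__ == "__main__":
    # Assumed numbers: data grows 10x, new hardware is 1.5x faster per thread.
    print(f"linear:    {relative_runtime(10, 1, 1.5):.1f}x")
    print(f"quadratic: {relative_runtime(10, 2, 1.5):.1f}x")
```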
I’ve turned up at many sites as a consultant at eyewatering daily rates to fix slow apps. Invariably they were missing trivial things like database indexes or caching. Not Redis or anything fancy like that! Just cache control headers on static content.
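For the cache control header point, here is a minimal sketch using only the Python standard library. The suffix list, the one-day max-age, and the "no-cache" fallback are assumptions to tune for your own site; in practice you would set the same header in your web server or framework config rather than in a toy server like this.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical list of file types treated as static assets.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value: cacheable for static assets, revalidate otherwise."""
    if path.split("?", 1)[0].lower().endswith(STATIC_SUFFIXES):
        return "public, max-age=86400"  # one day; an assumed, tunable value
    return "no-cache"


class CachingHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Add the header to every response just before the headers are flushed.
        self.send_header("Cache-Control", cache_control_for(self.path))
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost (the port is arbitrary)."""
    ThreadingHTTPServer(("127.0.0.1", port), CachingHandler).serve_forever()
```

Call serve() from a directory with some static files and check the response headers with your browser's network tab or curl -I.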
Invariably, doing the right thing from the beginning would have been cheaper.
Listen to Casey explain it: https://youtu.be/pgoetgxecw8
You need to have efficiency in your heart and soul or you can’t honestly call yourself an engineer.
Learn your craft properly so you can do more with less — including less developer time!
Basically, I think cloud provides a loooooot of details that you have to now take on yourself if you self-host (at least if you want to do it "legitimately and professionally" as a reliable service). It's not clearly a win-win.
That all said, I recently canceled my cloud9 dev account at amazon because the resources I needed were getting too expensive, and am self-hosting my new dev env in a VM and accessing it from anywhere via Tailscale, so that's been nice.
So yes, for those of us who have done Systems Administration as a lifestyle/career, you do all of those things and it's part of the fun. I started doing OS upgrades, monitoring, firewalls, and home backups of my own Linux servers some time in high school. Over-utilization of bandwidth isn't really a "problem" unless you're doing something weird like streaming video; a 1Gbps circuit can support thousands upon thousands of requests per second.
How do you handle traffic spikes, especially from the networking point of view? What kind of connection do you have? How do you make your service equally fast for all customers around the world (say you have a successful SaaS)? How do you prevent a local blackout from taking down your service? Where do you store your backups, in case your building gets flooded or your machine blows up? What would you do in case a malicious process takes over the machine? These are some things that are managed in a cloud environment.
I understand investing in a datacenter rack where you own your hardware, if you have the skills, but running it in a home office cannot support a successful business nowadays IMO.
You need to disassociate yourself from the start-up mindset when you DIY a side project app or site. Having said that, there are ways to cache and improve your write performance and maintain HA on a budget. The only thing that's hard to replicate in self-hosting is a high performance global presence.
To be fair, I'm not 100% off the cloud. Backups are on an hourly snapshot thru Restic https://restic.net/ and stored in Google Cloud Storage off-prem in case of catastrophes. Also, my Postgres database is hosted in Cloud SQL because frankly I'm not feeling experienced enough to try hosting a database myself right now.
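A rough sketch of what an hourly Restic job like that can look like. The bucket name, paths and retention counts are placeholders I picked for the example, not the setup described above, and restic also needs GOOGLE_PROJECT_ID and GOOGLE_APPLICATION_CREDENTIALS set to reach Google Cloud Storage:

```shell
#!/bin/sh
# Hourly restic backup sketch. Bucket, paths and retention counts are
# placeholders (assumptions), not anyone's real configuration.
RESTIC_REPOSITORY="${RESTIC_REPOSITORY:-gs:example-backup-bucket:/server1}"
RESTIC_PASSWORD_FILE="${RESTIC_PASSWORD_FILE:-/etc/restic/password}"
export RESTIC_REPOSITORY RESTIC_PASSWORD_FILE

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: backup_and_prune DIR...
backup_and_prune() {
    # Take a snapshot of the given directories.
    run restic backup "$@" || return 1
    # Thin out old snapshots: keep 24 hourly, 7 daily, 4 weekly.
    run restic forget --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --prune
}
```

Call backup_and_prune /etc /srv/www at the bottom of the script and run it from cron once an hour. Try it with DRY_RUN=1 first to see what it would do.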
It's really not as unrealistic as most people seem to think. People have been building online businesses for years without the cloud. Believing it's suddenly not possible is just their marketing going to work for them making them new customers imo.
I don't know about GP but managing your own server doesn't mean you cannot use a CDN with your webapp.
I have a homelab too but getting “enterprise grade” service from comcast seems to be my biggest barrier to scaling without leaning on aws.
Comcast doesn't actually change your public IP address between DHCP renewals and thus it's effectively static. The only time that it'll change is when the modem is powered off for an amount of time, or the upstream DOCSIS concentrator is powered off for maintenance or otherwise.
Especially when someone else is making money decisions. Of course, in horror stories admins have to run their servers in broom closets or in sheds because the business owner is too cheap to get a proper space for something that the whole company is running on.
This is such a tired expression. It basically means nothing in the industry, and exactly because of comments like yours.
Exactly who are you to say what my infrastructure desires are? Software is personal and people ignore this completely.
Measure twice, cut once. Fail fast is a load of nonsense and burns a lot of money for no good reason.
At least in my experience, my hobby of maintaining my own home server helped out immensely in my path in the industry due to knowing what tools are available when working on multi-faceted software designs.
You don't even know if Sys Admin is doing any backups at all.
From the perspective of a person who does not know anything about administering systems, tweaking stuff in AWS is, I would say, a lot easier than setting up a server properly.
So people that know nothing about administering systems pay more because they don't have the knowledge.
If you have the knowledge then yes, it is cheaper to run your own server, but what is obvious or easy for one person is not really true for someone else.
... and adding "cloud administration".
What is it with people doing completely one-sided analysis even when they have experienced the thing themselves? Is cloud administration less time consuming than system administration? That's not my experience, so I'm quite interested in how it got so.
This is of course the highly subjective opinion of a greybeard Unix admin.
While cloud may have a lot of advantages, I don't think it's trivial to run or manage. The AWS dashboard is simply overwhelming. Trying to decide between different, but overlapping services is time consuming. And while you can rebuild somewhat easily, you're also almost certainly going to have to do that as you learn about Amazon's little quirks. Your general RDBMS experience doesn't map to DynamoDB very well and you'll be in for a rough time when you learn you can't just add a new index or whatever.
Then you have all these provider-specific APIs. My experience with both Amazon and Google is that their services will return errors that they claim should be impossible, so you get to have fun debugging that in a service that you don't manage. Your application will invariably add a bunch of handling for exceptional cases and accumulate your best guess about how this impossible situation came about.
Then you have the constantly shifting devops orchestration tooling space and "best practices". I've lost track of the number of times I've needed to pull my Terraform state file and manually edit this gigantic JSON file because some plugin updated an internal struct in an incompatible way.
I'm sure there are people that get an environment up and running in one of the IaaS platforms using just the web console. I've never seen it managed that way at any company I've been at. Instead, the devs own the IaC stuff. It's certainly easier to be in multiple regions that way, but I have a hard time believing any time or money is really saved. Sure, no dedicated ops people, but now your expensive devs have to deal with it and probably be on call. Moreover, all that archaic time-sucking Unix admin knowledge acquisition everyone is worried about is just replaced by time-sucking knowledge acquisition of a proprietary service and all its quirks.
Maybe we're talking about different levels of "cloud"? I can buy that Heroku is easier than AWS.
The amount of money set aflame is astounding.
You don’t go cloud to save money. You go cloud to get flexible and reduce capital expense. It’s like leasing a building vs buying. More about tax and accounting.
The cloud is also too damn expensive.
People are getting ripped off big time and don't want to be embarrassed by the truth, plain and simple.
Other way around. With enough scale you should be able to make hosting your own datacenter work.
The problem is that the people you hire tend to go off buying too much Enterprise-class shit and empire building, and the whole thing winds up costing 10 times as much as it should because they want stuff to play with, stuff for the resume, and a vendor to share risk with and to blame.
Only thing Amazon did to build out their internal IT ops exceptionally cheaply and eventually sell it as the AWS cloud service was to focus on "frugality" and fire anyone who said expensive words like "SAN". And they were ordered in no uncertain terms to get out of the way of software development and weren't allowed to block changes the way that ITIL and CRBs used to.
I didn't realize how difficult that would be to replicate anywhere else and foolishly sold all my AMZN stock options thinking that AWS would quickly get out competed by everyone being able to replicate it by just focusing on cheap horizontal scalability.
These days there is some more inherent stickiness to it all since at small scales you can be geographically replicated fairly easily (although lots of people still run in a single region / single AZ -- which indicates that a lot of businesses can tolerate outages so that level of complexity or cost isn't necessary -- but in any head-to-head comparison the "but what if we got our shit together and got geographically distributed?" objection would be raised).
I did not know about it until I googled it right now. I have spent days/even two weeks figuring out how to set up Nginx and for all I know I did it terribly wrong. I paired it with other tools that I do not even remember. But I would be starting from scratch again if I needed to set another one up.
So a lot might come down to that. I was on a team that transitioned from an owned server to cloud after one of the test servers went down one day and, after a week of trying, nobody knew how to fix it. We realized at that point that if a server caused a production error, we were utterly screwed, as someone who had left had set it up and nobody had a clue where to begin fixing it beyond reading endless tutorials and whatever came up in Google searches.
The server infrastructure was cobbled together in the first place and for a period was theoretically maintained by people who didn't even know the names of all the parts.
At least with cloud, there is an answer of sorts that can be had from the support team.
- Small API that is used by a couple people everyday? Lambda.
- Need to store some data? DynamoDB.
- Need to store some files? S3.
- Cron? Step Functions.
- Need services to communicate with each other? SQS.
Those end up being either free or very cheap.
If you have high traffic, serverless is actually really expensive. It is only worth it if you have high scale but unpredictable / bursty traffic.
> However, every team I've been on recently has spent a lot of time struggling with gluing their AWS stuff together, diagnosing bugs etc. It didn't seem to save a heck of a lot of time at all.
I understand AWS so others don't need to and I'm making the money of my life. All 3 big cloud providers have really terrible developer experience and I feel really sorry for folks who just want to get their shit done.
I used to think some other player would come in and offer something much easier and simpler to take over this space, but I'm not really seeing any serious contender. Just low code / no code platforms and managed k8s stuff.
But what if the cost is $.0001 per request? It becomes a very convenient way to make all of my personal projects permanently accessible by hosting on S3 + Lambda.
Even in large workloads it makes sense. Much of AWS is migrating from instances to AWS Lambda. There are some workloads where persistent instances make sense, but a lot of common use cases are perfect for Lambda or similar serverless technologies.
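To put rough numbers on the per-request price mentioned above, here is a back-of-the-envelope sketch. The $.0001 per request comes from the comment; the $5/month VPS is a hypothetical comparison point, and real serverless bills also depend on duration and memory, which this ignores:

```shell
#!/bin/sh
# Monthly cost at a flat per-request price.
# usage: request_cost REQUESTS_PER_MONTH PRICE_PER_REQUEST
request_cost() {
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.2f\n", n * p }'
}

# Requests per month at which a fixed-price server costs the same.
# usage: break_even_requests SERVER_PRICE_PER_MONTH PRICE_PER_REQUEST
break_even_requests() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.0f\n", s / p }'
}

request_cost 10000 0.0001       # small personal project → 1.00
request_cost 50000000 0.0001    # busy service → 5000.00
break_even_requests 5 0.0001    # hypothetical $5/month VPS → 50000
```

So at that price a hobby project costs pocket change, while anything with steady traffic crosses the fixed-server price quickly.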
They can't be doing one-off undocumented config, package, and network/firewall changes which make it impossible to setup another server reliably. At $company I moved us to Terraform+Packer (to get them used to immutable deploys, but still just an EC2 instance) then Pulumi+Docker+Fargate so we could fix our deployment velocity. The CTO was constantly afraid everything would break; mostly cause it actually would break all the time. Now basically anyone can deploy even if they're not a SysAdmin.
That's not to say you can't automate a Pet Server, but it's a lot more likely for someone to "just once" make some changes and now you don't trust your automation. In our case we had SaltStack and we were blocked by the CTO from running it unless it was off-hours/weekend.
I'm completely missing. I have searched around and I have some solutions, but in the back of my head, some people have something else.
The part before Ansible or Puppet kicks in.
It was so trivial to terminate and restart dozens of servers at any given time since unless there was a mistake in the cloud-init, we could bootstrap our entire infrastructure from scratch within an hour.
It was amazing, never had to deal with something missing on a server or a config being wrong in a special case. Dozens of hosts just purring along with 0 downtime since the moment anything became unhealthy, hosts would start auto-booting and terminate the old instance.
Senior sysadmins are really hard to come by today, not to mention someone who wants to do architecture also.
My hunch is that the 5000 onprem pet servers are not going away any day soon, because a massive amount of it is legacy systems that take a long time to migrate to cloud, if ever. Also the work stress is just ridiculous. So much stuff to do, even with automation. Only reason I still do this is that I like the "old school" tech stack vs. cloud IaaS/PaaS alternatives.
I am not so sure... I am a well seasoned sysadmin, been doing server, network, architecture. I consider myself a solid Linux/network expert and have managed datacenters. When I look for a new/more exciting job, or for a pay raise, all I see are "cloud, AWS, devops". I never see "old school" sysadmin jobs, e.g., as you say, "we have a room full of Linux boxes and we manage them with Ansible/scripts/etc., but we design and maintain them ourselves, come join our team".
You don't need pet servers. Puppet or Ansible make your baremetal cattle.
The worst thing I’ve had to deal with recently is debugging some faulty RAM sticks and NVMe failure. Obviously hardware quirks are still at play and there’s not much that can be done there, but in terms of making life easier on the software side, NixOS and reproducible config definitely helps over traditional distros.
It was probably the most difficult thing I tried to, unsuccessfully, use on my desktop. I imagine learning how to use Vim/Emacs as a complete beginner would probably be several orders of magnitude easier than learning the Nix DSL and the Nix way of doing things. And from what I've read about the experience of other people who do use NixOS and talk about both the good AND the bad, using it seems like an unhealthy relationship.
Not to mention that the Nix package manager feels slow as hell and reminds me of my unpleasant hours spent using rpm and dnf.
In terms of the DSL, I’m really surprised that people find it a problem as much as they do. When I first tried Nix I moved my first nonprod server over to it that very afternoon. It helped looking at the syntax to start with as “hmm this is a bit like JSON” and to worry about things like lazy evaluation later on. I guess it also helped that I was familiar with Jsonnet beforehand.
My first few servers I got by with just a basic understanding of the language and copying/pasting examples from the website and using https://search.nixos.org/options
Using it on a desktop, now that is a whole other experience. I personally quite enjoy it for my desktop but there is definitely more of a learning experience and you might not find the benefits of Nix worth it anyway as your desktop is an always changing environment.
A few years ago I participated in a Splunk deployment and the cloud solution utterly dwarfed an in-house enterprise solution, in regards to cost. Even in the event that cost was irrelevant, certain sectors (financial institution(s)) are going to have a difficult time pivoting to a cloud-based solution and relinquishing control over the underlying infrastructure.
Yes, I know that isn't what DevOps is supposed to be, but we all know how Agile turned out, management has a magic touch to distort such concepts.
wait, what? definitely not in eastern eu
it seems like there's one mechanic per a few kms
but maybe due to the fact that average car is relatively old
At least for Debian and Ubuntu, that's why we start bash scripts with #!/bin/bash, of course.
Your point is valid for Busybox, though.
That will also fail spectacularly, as bash does not behave the same when called as /bin/bash as it does when it is /bin/sh.
I have principally noticed that aliases are not expanded in scripts unless a shopt is issued, which violates POSIX.
Forcing POSIXLY_CORRECT might also help.
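The difference is easy to reproduce. A small sketch, assuming bash is installed; the temp file names are arbitrary. It runs the same two-line alias script in default mode and in POSIX mode, then a variant with the shopt at the top:

```shell
#!/bin/sh
# Writes two tiny bash scripts to a temp dir and runs them, to show when
# bash expands aliases in a non-interactive script.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/plain.sh" <<'EOF'
alias p=printf
p 'hello\n'
EOF

cat > "$tmp/shopt.sh" <<'EOF'
shopt -s expand_aliases
alias p=printf
p 'hello\n'
EOF

# Default bash mode: the alias is defined but not expanded, so this fails
# with "p: command not found".
bash "$tmp/plain.sh" || echo "plain.sh failed with status $?"

# Same script in POSIX mode: aliases are expanded and it prints hello.
bash --posix "$tmp/plain.sh"

# Default mode again, with the shopt at the top: prints hello.
bash "$tmp/shopt.sh"
```

Setting POSIXLY_CORRECT in the environment before bash starts should have the same effect as --posix here.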
That's not really a problem as long as you use #!/bin/bash shebang, and there is nothing wrong in doing that.
It's frustrating that most google search results and shell script search results on SO almost always mean bash and sh.
#!/bin/bash
There, I fixed your "what shell is /bin/sh" problem.
#!/bin/mksh
Android doesn't allow GPL code in userland, and the installed base is massive.
I disagree with this. A single bash script configuring an entire host can be overly complex and very difficult to follow. As someone who has created complex bash scripts, I can say this will become very time consuming and prevent you from making many changes without significant effort. I'd suggest familiarizing yourself with tools like cloud-init and Ansible.
Booting with rear, a (somewhat "klunky") interface will allow you to restore your captured backup, apparently pristine.
It is similar in effect to the HP-UX Ignite tool that I used on that platform.
Oracle has published a number of articles and blogs on or relating to it, and it is well worth study. The invocation of tar that preserves the SELinux contexts is particularly worth capturing.
3 could have been a solution but my kids are not interested in development (if they would, it would have been my first solution)
Ship logs to other places. Clicking in the GUI outside of testing things out is a mistake; use Terraform or CDK or CloudFormation.
I've managed very large fleets of servers and it's dead simple if you have a good setup.
https://community.hetzner.com/tutorials/k3s-glusterfs-loadba...
I don't know about others but once you become used to package managers like pacman and apk, it's a jarring experience to use apt and dnf, especially dnf. And Nix feels just as slow, if not slower, than all of them.
I have to install hundreds of MBs of metadata before I can install a package with dnf and the fastestmirror=true has essentially been useless in the history of my years of usage of dnf. The end result is me downloading hundreds of megs at speeds less than 1MBps.
> In terms of the DSL, I’m really surprised that people find it a problem as much as they do. When I first tried Nix I moved my first nonprod server over to it that very afternoon.
I find it much easier to use a combination of Ansible, python/shell scripts, and dotfiles on multiple git forges. This setup may not be as declarative as NixOS but it is much easier to grok and use. I have no motivation to learn an obscure DSL to manage my system.
That's once again an oversimplification, performance is excellent for the intended workload.
You can provision some of these things from cloud providers, but your infra is going to go to shit unless you actually understand what they're really providing you and how to use it. If the only thing you can do is upload a docker image to a cloud provider and click the "create server" button, then that's not really infra work at all. It's like Wix for sysadmins.
If, on a test run or during onboarding, this application finds just one security hole (e.g. a public S3 bucket), you will have the customer for life because he's afraid he will make a mistake again.
But once you get through the door at a new job the massive difference between somebody who understands what is under the hood and somebody who just got an AWS certification with no prior knowledge becomes apparent to all. Quick position and paycheck raises follow.
Server hardware is fun but it's not trivial to manage, buy or run.
So when someone talks about how they've managed servers for 2 decades, own a house where they can install a 42U rack, and how much better it is than a hosted solution, a lot of people rightly point out that this is hardly feasible for most people.
> It's 2022 and we're about to rediscover something we know for 40 years already: mainframes are freaking expensive.
My interpretation: AWS is freaking expensive, AWS is in the center performing all computations; AWS is the mainframe we can well do without.
Some servers I know don't have vim. The Traefik Docker image runs ash, not bash. The Tomcat image doesn't have vim. Etc. /bin/sh is there. No worry about assumptions. No bashisms, no fish, no zsh.
So for me, if there's not even bash then I've also surely not accounted for other peculiarities on the system.
Because the old school systems just chug along with a minimum of administration, while those newfangled cloud thingamajigs need tons of adminning...?
I mean, that's one interpretation of your (admittedly, anecdotal -- but that's what I see too. Lots of anecdotes sums up to:) data. A valid one, AFAICS.
I chose to work with the CTO/Team to figure out a solution everyone could live with. I even chose a more annoying solution (Packer) initially just to make sure people felt comfortable and avoid changing things anymore than I had to.
Got any tips in this regard?
Ansible, Puppet, Python, Terraform, OpenStack, Pulumi spring to mind. Still need to cover a PXE server. (Maybe with Ansible)
Maybe I am naive, but I think the chances of this happening on a cloud based service like Dropbox are going to be a lot lower.
https://infosecwriteups.com/how-i-discovered-thousands-of-op...
was that intentional? my current understanding of MIT license is a "do anything, i don't care, just don't blame me".
if i were you and wanted to use "dont use our resources" part, i would set it as AGPL or even SSPL if you are brave enough.
I think it might have something to do with the NAT forgetting about my UDP "connection", but haven't found the culprit yet.
Sadly, not possible on a shared residential connection. I tried the setup where the server was also the router, and that was much less convenient in case of failures than having a dedicated router.
But yes, the abundance of routers is definitely a problem here, with an OpenWRT and a FritzBox in the mix. Perhaps I should try to get IPv6 running and forget about NAT woes.
You aren't administering Android devices.
Stop obsessing about writing portable scripts. Write scripts for the targets that you are going to run them on.
Stop ignoring standards. They exist for important reasons. Invent your existentialism in some other realm.
There are people who dig trenches with spoons. It is not efficient and no one sane would be accounting for them.
> Stop ignoring standards. They exist for important reasons.
No.
> Invent your existentialism in some other realm.
I do. My target is Linux servers. Look at the topic: "Reclaiming the lost art of Linux server administration", not "Finding another inefficient way to administer a phone".
That ship sailed when they froze that standard in time circa 2008.
People will always choose convenience over correctness. Fighting against that is fighting against human nature and it's a battle lost before it's begun.
No shame in it, I myself have one, but you shouldn't be concerned with this scenario even if it happens.
Life is too short to waste on ruining your life over planning on how to ruin your life even further after a rare life ruining event has occurred. I doubt you'd be working, or having any real coherent thoughts, for many years in that scenario.
So you'd lose your friends and potentially your livelihood and any ability to financially support yourself, or any surviving friends or family, just because you don't want to use cloud backups or a professional backup service with the correct contingencies in place?
One could easily imagine a less dramatic scenario where your office is close to your home and you go home with your backups in your backpack. However, a flood hits your town and you accidentally leave your backpack behind at evacuation time. Your office and home are flooded, and your backups and actual data set are destroyed. No one dies, but you've lost your company and not just physical assets. Seems a little unnecessary?
I actually have lived in towns where floods, lightning storms and tornadoes have caused situations where this scenario could've easily played out.
I also feel the need to point out there's no need to suggest people need a psychiatrist when planning for a worst case scenario. Some people are just prepared for the worst, whether it's just how they're born or they're paid/trained to do so.
2) Debian
When I wanted to set up NixOS, I wanted to use it to manage a bunch of NixOS virtual machines (so, nixops), and then run a couple of services. Here are a few points from my anecdotal experience:
1. Learn an entirely new language that has absolutely horrendous documentation. It used to be (I've been told it got better but I abandoned Nix before this) extremely difficult to debug as well, since oftentimes you would get an error in some library file when the error was obviously in your own file.
2. Try packaging your own stuff. Again, barely any documentation. One time, I wrote a simple Rails application in a day. It took me more than three days to figure out how to deploy it, and that involved just figuring it out. Rails adds a PID file to the directory with your Ruby code in it, but that directory in a Nix derivation (fancy [which makes it harder] name for something like package from my understanding) is immutable. Good luck packaging complicated projects that someone else hasn't packaged yet.
3. Nixops was so incredibly outdated that it caused my server to say it was using an "insecure" library from Python 2.7 and my auto updates started failing. On paper, nixops looked nice (manage NixOS virtual machines/VPSes with Nix). But it broke my updates. I eventually wrote my own replacement for Nixops, which worked, but I still don't trust the rest of NixOS to not break. The tool might have some merit on a different Linux distribution, though. (I'd also note that I believe that nixops fixed their insecurity bug, but there were a myriad of other issues I don't remember that I had with nixops)
4. Setting up pre-packaged services sounds easy. They were, somewhat. But permission errors were pervasive. systemd tmpfiles were finicky at best and gave path traversal errors that were extremely hard to debug.
Eventually, I got a new server, installed Arch on it, and use docker-compose for everything. It takes me maybe 20 minutes to set up a new service on a good day, instead of four hours of Googling for some obscure error from NixOS. While not perfectly "reproducible," neither was NixOS, because I could simply not trust a virtual machine on nixops to boot successfully—there would always be more required even if it worked once.
I might be open to NixOS on a desktop, though.
I run NixOS, but I recognize that user experiences might be very different depending on the packages you use. NixPkgs is huge, enormous. Some stuff is really well maintained, whereas other stuff isn't.
Furthermore, declarative systems make a difficulty tradeoff. Lots of stuff is much easier, but problems are sometimes harder to debug as there's an extra abstraction layer.
IMHO, these days the best distributions out there are either very simple imperative ones with binaries (Arch, Alpine, etc.) or Nix-like (NixOS and GuixSD). Getting experience in those two extremes is really valuable.
I guess people who use Twitter don't like seeing bad things written about them on Twitter. Though the solution is to disown Twitter and all who use it.
That's also the thing the serverless/API-all-the-things/cloud promoters don't get. This interconnected web of services is incredibly fragile. It's to the point that every day is a new failure. One day github is down. Next day your CI process breaks because Docker shits the bed. Next day your E2E service is hosed. Following day you hit an API limit and need to go dump more money into the firepit. Everything is broken all the time.
This is cattle:
* PXE boot server(s)
* Image contains masterless puppet bootstrap.
* Server(s) asks git - "give me the bootstrap for my mac address"
* Server(s) gets a list of classes to apply.
* Server(s) applies classes.
Done.
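The "ask git for my bootstrap" step above can be sketched roughly like this. The repo layout (nodes/<mac>.classes with one Puppet class per line, plus a modules directory) is an assumption invented for the example, not the poster's actual setup:

```shell
#!/bin/sh
# Sketch of looking up and applying classes by MAC address.
# The layout nodes/<mac>.classes is hypothetical.

# Normalise a MAC address to lowercase with dashes: AA:BB:.. -> aa-bb-..
normalise_mac() {
    printf '%s\n' "$1" | tr 'A-Z:' 'a-z-'
}

# Print the classes for this machine, one per line.
# usage: classes_for_mac REPO_DIR MAC
classes_for_mac() {
    file="$1/nodes/$(normalise_mac "$2").classes"
    if [ ! -f "$file" ]; then
        echo "no bootstrap entry for $2" >&2
        return 1
    fi
    # Skip blank lines and comments.
    grep -v -e '^[[:space:]]*$' -e '^#' "$file"
}

# Apply each class with masterless puppet (printed only unless APPLY=1).
# usage: apply_classes REPO_DIR MAC
apply_classes() {
    classes=$(classes_for_mac "$1" "$2") || return 1
    for class in $classes; do
        if [ "${APPLY:-0}" = "1" ]; then
            puppet apply --modulepath "$1/modules" -e "include $class"
        else
            echo "would run: puppet apply --modulepath $1/modules -e 'include $class'"
        fi
    done
}
```

On a freshly PXE-booted box this would be called after cloning the repo, e.g. apply_classes /srv/bootstrap "$(cat /sys/class/net/eth0/address)", where the checkout path and interface name are again just examples.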
I'm looking for a solution to make unattended installation possible.
Edit: Can use Ansible to make a PXE server. (Egg or chicken thing)
Also I said _not_ buy a rack, just put it on a shelf.
> <erno> hm. I've lost a machine.. literally _lost_. it responds to ping, it works completely, I just can't figure out where in my apartment it is.
Heaps of Pis in rafters will quickly turn into a cable spaghetti hell tied into ugly knots.
Introducing the... "RafterRack™" ;)
Idle power consumption of an R730 with no drives is typically about 85W, though higher-end CPUs drive that up a bit. The older R720s are a bit higher in the 100W range. When the extended warranties expire, high-spec 2nd hand equipment floods the market.
20% load (9 VMs with 11GHz usage), 196W.
Historical trends for Last Week:
Avg: 203 W | 693 BTU/hr
Max: 279 W | 952 BTU/hr
Min: 150 W | 512 BTU/hr
Historical Peaks:
465 W
Cumulative Reading:
Since Mon Jul 13 11:32:33 2020 2723.893 kWh
The last few days:
- we have a commercial ssh bastion that requires direct ssh login and then input of the IP address. So no DNS is available for sshing. I wrote a bash autocomplete script that resolves DNS host entries and uses expect to automatically input the IP in the bastion. This is especially important to be able to run Ansible.
- our GitLab has its ssh filtered because "security". So you can only use a security token over HTTP. This prevents us from being able to use git submodules, as the token would be committed in clear text to the parent repo. So I wrote a makefile to compose the project, basically a poor man's git submodules.
You might find all the above trivial, and it is. But it keeps the work flowing, and devops more junior than I have no clue about makefiles, expect, etc.
That means no memory related security bugs, which is a huge plus.
Yep. This seems like the obvious setup to me:
1) make the usual case as economical as possible (and ownership and the associated control will probably help here, unless you have to lease the expertise too)
2) outsource the exceptional case (ownership is less likely to matter here, and will matter for less time even if it does)
I worked with a large customer to help build this for them. OpenShift running on-prem, but they had some failover equipment that would ansible them an OpenShift cluster on aws. Depending on the nature of the failure it did take a little time to fail over, but 15 to 30 mins of downtime in the event of a catastrophic failure is often worth it to save hundreds of millions per year.
Everything has been bootstrapped going back to around 1999.
This. And there are millions of dollars of cloud marketing materials and programs that are at least partly to blame.
They found a larger market in being the full infrastructure, so they started down the road to cloud appliances making it easy for any average Joe to spin up infra.
This was in the EC2 Classic era before VPC was a thing. So your active/passive setup description is actually using AWS appropriately based on its original ideas.
Your AWS bill is directly proportional to the number of developers, because they all spin up cloud toys they are fond of.
UK <Apr2022 = £50.96 ($68 USD), £0.2022 price cap rate
UK >=Apr2022 = £76.16 ($102 USD) £0.3022 likely new price cap rate
Thus the reason r/homelab users in Europe tend not to have Craigslist blade servers of any sort!
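The two UK figures divide out to roughly 252 kWh a month (50.96 / 0.2022 ≈ 252), which is what a constant 350 W draw over a 30-day month would use. The 350 W and the 30-day month are my inference from the numbers, not something stated above. A quick sketch of the arithmetic:

```shell
#!/bin/sh
# kWh used in a 30-day month by a constant load.
# usage: monthly_kwh WATTS
monthly_kwh() {
    awk -v w="$1" 'BEGIN { printf "%.1f\n", w * 24 * 30 / 1000 }'
}

# Cost of that energy at a given unit price (currency per kWh).
# usage: monthly_cost WATTS PRICE_PER_KWH
monthly_cost() {
    awk -v w="$1" -v p="$2" 'BEGIN { printf "%.2f\n", w * 24 * 30 / 1000 * p }'
}

monthly_kwh 350            # → 252.0
monthly_cost 350 0.2022    # → 50.95
monthly_cost 350 0.3022    # → 76.15
```

That lands within a penny of the quoted £50.96 and £76.16. Plugging in a measured wattage, such as the 203 W weekly average posted earlier, gives the figure for a specific box.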
Knowing when to switch into and out of this mode, and what impact it has, is a more advanced subject that should not burden those learning the Bourne family.
It is better to start with Almquist, or another pure POSIX implementation, with documentation specific to standard adherence.
More advanced shell features should wait.
I've never seen use of aliases in a bash script...? They are generally for CLI convenience.
When bash is not in POSIX mode, it violates the standard.
$ ll /bin/sh
lrwxrwxrwx. 1 root root 4 Nov 24 08:40 /bin/sh -> bash
$ cat s1
#!/bin/sh
alias p=printf
p hello\\n
$ cat s2
#!/bin/bash
alias p=printf
p world\\n
$ ./s1
hello
$ ./s2
./s2: line 3: p: command not found
What I'm against isn't the cloud existing, it's your 24x7 uptime guarantee during the equivalent of Hurricane Katrina at the cost of over engineering.
If you're an engineer in that scenario, you're either running, helping clear debris, or giving first aid/supplies. Your computer is nonsense at that point and yes it's a sign of a problem if you are clinging onto your devices in such a disaster scenario.
EDIT: Just re-read, I might have gotten you confused with someone on a different thread entirely about Kubernetes... Maybe I typed on the wrong tab, oops. Your comment is about data not uptime.
And how are you currently preparing your company against a solar flare?
> How many times have you known this to happen?
That's irrelevant, if it can happen, it will happen, and it's too late once you've lost your data. Maybe for some businesses, data isn't important, but I don't know of many?
That's the line the clouds, MSPs, and service contract providers sell you on, but it's never true. When you need help you're getting some minimum wage slave who hired on 6 months ago, and is struggling to juggle the dozens of other clients competing with you for his time, too. When you don't need help, they get to take your money for doing nothing.
Anything your provider can do for you, you can do at less cost. Cut out the middle man and you're far better off.
All that knowledge is now in-house, and you have access to it at a moment’s notice.
I think it's also worth considering that many outfits wouldn't get good value from the 24/365 coverage you propose and don't care to pay for it.
To pay them around 70K, you need to budget for 120K. It will go into social security, taxes, training, management overhead, the HR team, and other cost centers such as a margin to account for sickness, absenteeism, people who are trying to "find themselves", personal crises, etc.
If some cloud setup can halve your sysadmin labor, then a company with 20 sysadmins can go to 10 sysadmins, but a company with 1, 2 or 3 sysadmins still needs all of them.
____________
That's how I've seen it working in datacenter:
cheapest junior admins (450$/month) were having night shifts
and if something broke, then they were calling an engineer
For 1 bare metal server? You do know these things run on electricity, right?
The main difference between what we've got and a proper setup is on the order of about €5k of fixed costs, all of which have a lifespan of ~decades. The savings from owning your own hardware might shock you.
Also, if you think that the "professional" hosts do any better, consider that that Hetzner box you linked is a desktop machine shoved onto a rack shelf.
Without a nuanced understanding of standards, extensions, and platform availability, new bash users will get large amounts of shell usage that doesn't work.
To avoid that frustration, learn POSIX. That works everywhere that matters.
Another major user of a pdksh descendant is Android (mksh), with a truly massive install base.
Some of the bash problem, besides portability, is GPLv3. That was a major factor for Apple. I don't want my script portability linked to corporate patent issues. For this and other reasons, I don't use bash-specific features, ever.
You should feel free to continue to write portable scripts.
That is horrible advice for anyone who is just starting out though and they should stick with #!/bin/bash and not worry about it until they actually want to consider platforms other than Linux. Portability is the wrong thing for people who are learning to focus on.
It is also bad advice for most system administrators at most businesses since they won't have those alternative *nixes either, and it's a best practice to minimize how many different operating systems you have to care about.
Almquist is in Ubuntu for two reasons: speed and standards compliance.
These may not be important to you, but they are to a great many people, and the horrible advice in this case is to disregard these factors.
The same isn't true for less standard kinds of service. The more standardized something is, the easier it is to decide whom to hire, to troubleshoot, and to learn how to configure your options. The less standardized it is, the harder all of those things become. VMs are very standard, email servers are less so, but not by a huge margin. Web accessible disk space and on-demand interpreters are completely non-standardized and a hell to do anything with.
Also, some services do need more upkeep than others. Email is one extreme that requires constant care, file storage and web servers demand much less attention.
Infinitely, and if you look at it from a startup lens it only makes sense. One needs to point only at the recent log4j incident. This is obviously a gigantic black swan event, but even just ongoing security patching at the OS level can be a full-time gig. There is absolutely no substitution for being able to ship code to a platform that just runs it and scales it for you.
Andy Jassy had a great slide a few years back at re:Invent, when talking about Lambda -- "in the future, 100% of the code that you write will be business logic". If you really think about that: how many times have you had to write some kind of database sharding logic, or cache invalidation, or maintaining encrypted environment variables, whatever. That idea that you can toss that -- and what that gives to teams, not having to spend massive timesinks and budgets and hiring and all of that on -- effectively -- solved problems, you really start to understand how you can move faster.
The present reality of Lambda is quite different though. Even though the code of the function itself is more or less "business logic" (although this is a really meaningless term when we're talking about known programming languages and computers), the scaffolding around it with Terraform/CloudFormation/Serverless/etc. is substantial, riddled with quirks and is really time-consuming to figure out and update. I don't think I spend less time on this accidental complexity now when we have most of our logic in Lambda, compared to the times when we were just running Flask apps in a VM.
This is not to mention how hard one has to fight to overcome the limitations of the runtime, e.g. adding some "warmer" scripts to reduce cold-start latency (no, provisioned concurrency doesn't help and is ridiculously expensive). And then comes the bill after you accidentally created an invocation loop between two functions.
The scale-to-X and scale-to-zero features of Lambda, along with the guaranteed interface to your lambda with predictable input and output requirements, is incredibly empowering for an engineering team. I can absolutely guarantee that we have spent far, far, far less time maintaining our infrastructure than what we would need to be doing if we had a big-buncha-EC2 setup.
Imagine that the environment issues get taken care of, because Amazon has teams and teams and teams of engineers who are working on just that. Cloudflare has the zero-cold-start isolates. All these platforms are heavily invested in making your development and deployment experience as easy as it can be. Concentrate on writing your code, and you'll reap the benefits.
That's just a cronjob. I know some people don't like doing it that way, but that's on them. I've seen this work for years in production with minimal trouble.
How much business logic is there across all businesses?
I'm aware of a joke at Google about how it could reduce 90% of the employees without any impact on the business.
Immensely more than tech logic, that's for sure.
Just implementing and updating something like a single country's tax code produces more lines of code than the entire Linux kernel, that devs like to brag about :-)
Oh boy, if that's what your server admin guys told you before you went full cloud then I'm sorry, I've got some bad news for you.
This.
People think "professional" and act like it hides the fact that it's all the same people running stuff on a shoestring budget.
That $50/mo boxes at Hetzner tend to stay up for years is a testament to the fact that there isn't any black magic or huge cost involved in keeping them running.
People who start out with linux should use that. Most companies, particularly HNey startups, should use that and nothing bad will happen.
Most people don't have to care about portability, shouldn't use /bin/sh, and shouldn't learn about Almquist. Your advice does no favors to the bulk of people who just need to get shit done.
If they do wind up working in an environment where it is important, they can teach themselves the differences at that point, or else ask someone like us to review their code (and there's an automated linter out there).
You aren't the Main Character, most people don't need to care about the same things that you do.
A bash script that runs as /bin/sh might not run as /bin/bash, until you set POSIXLY_CORRECT.
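For anyone who wants to see the mode difference for themselves, here is a minimal sketch. It assumes bash is installed; the helper name `run_alias_script` and the alias name `greet` are made up for illustration. The helper writes the same two-line alias script to a temp file and runs it under whatever bash invocation you pass in.

```shell
#!/bin/sh
# Run a tiny alias-using script under the given bash invocation.
# Usage: run_alias_script bash [options...]
run_alias_script() {
    script=$(mktemp)
    cat > "$script" <<'EOF'
alias greet=printf
greet 'hello\n'
EOF
    # If the alias is not expanded, bash exits non-zero ("command not found").
    "$@" "$script" 2>/dev/null || echo "alias not expanded"
    rm -f "$script"
}

run_alias_script bash                          # default mode: alias not expanded
run_alias_script bash --posix                  # POSIX mode: hello
run_alias_script env POSIXLY_CORRECT=1 bash    # POSIX mode via environment: hello
run_alias_script bash -O expand_aliases        # default mode, opted in: hello
```

The last line shows the other way out: a script that wants aliases under #!/bin/bash can turn on the expand_aliases shell option instead of switching to POSIX mode.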
Easiest solution there is to move it to a better shell, usually a Korn variant.
If you want to offshore all of your Windows servers, all I can say is "go for it!"
Still, the mitigation was like 4 hours to hunt and disable, and 2 more hours when a full patch came around. Not too difficult.
It was only time consuming for those who worked in Cloud providers where all this crap is centralized and understaffed. In real world scenarios there were tons of teams completely unaffected.
[bunch of stuff I don't need]
Exactly. Imagine paying for all that when all you need is bare metal.
Now imagine paying for all that just because you've read on the Internet that it's best practice and that's what the big guys do.
Way back the best practice was what Microsoft, Oracle or Cisco wanted you to buy. Now it's what Amazon wants you to buy.
Buy what you need.
Having all that "best practice" service is great if it works well, but when it becomes a checkbox on the purchase order then it can cause far more problems than it solves.
I have found that a lot of the push to outsource hosting is simply an attempt to deflect responsibility for problems rather than in expectation of actually providing more reliability.
For every company or startup that thinks it needs 100% uptime, the reality is that it can not only get away with much less but in practice will end up with much less anyway: the extra effort and moving parts (load balancing, distributed databases, etc.) will typically result in something failing even if the underlying hardware is indeed 100% up. Somewhat surprisingly to quite a few people here, such companies manage to survive and thrive despite that (the recent AWS outages took out a lot of services and products, and they still seem to be around somehow).
Ugh. I hate that phrase. The translation into plain English is almost always "What I read in some blog" or "Because I want to" or "It's what our sales rep told us." Even from C-levels who should know better.
Yes.
The opposite is also true though: Imagine not wanting to pay for that and needing it!
There's a reason why most homes connect to the power utility companies. Yes, we can run generators ourselves. Does it make sense to do that? Not usually.
Same thing with this server. If it makes sense for your use-case, outstanding. In many cases, people are better off offloading this to another company and focusing on their strengths.
The minute gens/solar/wind/batteries combo becomes less expensive than public utility I'll switch. For now it makes no financial sense.
With the clouds it is the other way around. My dedicated servers running my software kick the shit out of AWS performance-wise for a fraction of the price. And no, I do not spend my days "managing" them. I can order a new dedicated server and the same shell script will reinstall all prerequisites, restore data from backup and have it running in a few minutes (plus however long it takes to import the database from backup). Where needed I also have up-to-date standby servers.
Other than running this script to test restoration once a month my management overhead is zero.
See... anyone can pick <random thing>, and describe how people don't do it. Of course, you cite a generator, others use solar cells, right?
And that's fine. People should make choices like that. I was objecting to saying people paid X for RDS and others paid Y for one bare metal server. Those are not in the same category, there's no point comparing those prices without a lot more information.
I don't care where those snapshots are stored or how much space they take. In case I need to restore, my IaaS provider gives me a 2-click option: one click to restore and a second to confirm. I sit and watch the progress. I also don't care about hardware replacement or anything connected to it. I have to do VPS OS updates, but that is it.
I do my own data backups on different VPS of course just in case my provider has some issue, but from convenience perspective that IaaS solution is delivering more than I would ask for.
They tend to be the things that you don't need until you absolutely need them right now (or yesterday).
It’s easy to trivialize $20-30k a month when it’s someone else’s money and it’s less work for you.
What I have a problem with is:
- the premium over bare metal is just silly
- maximum vertical scaling being a rather small fraction of what you could get with bare metal
- when you pay for a hot standby you can't use it as a read only replica (true for AWS and GCP, idk about Azure and others)
Though it seems to require you to have 3 instances rather than just letting you read from the standby... I don't quite get the rationale for that.
I'm not sure what you mean here. At least for MySQL you can have an instance configured as replica + read-only and used for reads. Aurora makes that automatic / transparent too with a separate read endpoint.
The fact that a hot standby is usually in some sort of read-replica state prior to failing over is a technical detail that AWS sort of tries to abstract away I think.
>> 1. A solution for unlimited scheduled snapshots without affecting performance.
You can very comfortably have instant and virtually unlimited snapshots with zfs/jails (only occupying space when files change). Very easy to automate with cron and a shellscript.
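A rough sketch of what such a cron-driven snapshot script could look like. Everything specific here is an assumption: the dataset name `tank/jails/web`, the `auto-` prefix, the retention count, and the script path in the crontab comment are all hypothetical. It only uses the standard `zfs snapshot`, `zfs list`, and `zfs destroy` subcommands.

```shell
#!/bin/sh
# Hypothetical dataset and retention count; adjust to your own pool layout.
DATASET=${DATASET:-tank/jails/web}
KEEP=${KEEP:-48}

# Build a snapshot name such as tank/jails/web@auto-20220130-0300.
snapshot_name() {
    printf '%s@auto-%s\n' "$1" "$(date +%Y%m%d-%H%M)"
}

# Read snapshot names (oldest first) on stdin; print all but the newest $1.
snapshots_to_prune() {
    awk -v keep="$1" '{ name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Take one snapshot, then destroy "auto-" snapshots beyond the retention count.
rotate_snapshots() {
    zfs snapshot "$(snapshot_name "$DATASET")" || return 1
    zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" |
        grep '@auto-' |
        snapshots_to_prune "$KEEP" |
        while read -r snap; do
            zfs destroy "$snap"
        done
}

# Only touch the pool when asked to, e.g. from a crontab entry like:
#   0 * * * * /usr/local/sbin/zfs-rotate.sh run
if [ "${1:-}" = "run" ]; then
    rotate_snapshots
fi
```

Keeping the pruning logic in a plain filter makes it easy to check what would be destroyed before letting the script loose on real data.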
>> 2. API access for managing configuration, version updates/rollbacks, and ACL.
>> 3. Close to immediate replacement of identical setup within seconds of failure.
There are a lot of choices for configuration management (saltstack, chef, ansible, ..). I run a shellscript in a cron job that takes temporary snapshots of the jail's filesystem, copies them to a directory, and makes an off-site backup. A rollback is as simple as stopping the server, renaming a directory, and restarting it. It's probably more than a couple of seconds, but not by much. I think I'm uncomfortable exposing an API with root access to my systems to the internet, but I'm not sure how these systems work. I don't think it would be hard to set it up with flask if you wanted it though.
>> 4. No underlying OS management.
I don't know what this is, but I'm curious and looking it up :D.
In most of the posts I'm reading here, people have really beefy rigs. But you could do this on the cheap with a 2000s era laptop if you wanted (that was my first server).
Yes, but that both requires scripting it manually and remains local to the server. Compare that to scheduled RDS backups, which go to S3 with all its consistency guarantees.
> There are a lot of choices for configuration management (saltstack, chef, ansible, ..)
Sure, those are an improvement over doing things manually. But for the recovery they can do only so much. Basically think how fast can you restore service if your rack goes up in flames.
> I don't know what this is, but I'm curious and looking it up :D
It means - who deals with kernel, SSL, storage, etc. updates, who updates the firmware, who monitors SMART alerts. How much time do you spend on that machine which is not 100% related to the database behaviour.
I wasn't recommending everyone use RDS. If your use case is ok with a laptop-level reliability, go for it! You simply can't compare the cost of RDS to a monthly cost of a colo server - they're massively different things.
EDIT: I do use cloud services for some stuff. My point isn’t being anti-cloud, just that nothing is perfect.
For instance, there are LVM2 snapshots. Maybe those do affect performance. If the cost difference is big enough though, couldn't you just account for that in the budget?
I agree that literal "bare metal" sucks, but self-hosted with cloud characteristics (containers, virtualization) is not totally obsolete.
When you're using the cloud you're paying someone else a very good margin to do all of those things for you.
If you do them yourself you can save quite a lot. Hardware is surprisingly cheap these days, even with the chip shortage factored in, if compared to cloud offerings.
No access to the underlying OS can actually be a problem. I had a situation where following a DB running out of disk space it ended up stuck in “modifying” for 12 hours, presumably until an AWS operator manually fixed it. Being able to SSH in to fix it ourselves would’ve been much quicker.
"Identical setup within seconds of failure" being the exception as you need to deploy from backup which can take minutes/hours (depending on backup size) even if you have the spare hardware. Fortunately, that's the least-needed feature in your list.
It's called software, you put that on top of a server, it's not some kind of magic.
Sure, you're paying the cost of some DBAs and SREs with that price. Still, it seems way above what it costs.
Factor them all in, and "the cloud" is 1000s to millions of times more expensive than bare metal. I've seen people pay $100k/month for services which could run on a $140/month bare metal server failover pair.
Meanwhile, people still spend enormous resources managing "the cloud", writing code to deploy to the cloud, dealing with edge cases in the cloud.
There are no savings, time wise, management wise, or money wise, with the cloud.
You are paying for ignorance.
The costs were very clearly spelled out and always lower than SaaS.
Why do people here think that cloud companies are charities? The cost of their hardware leasing, personnel, electricity, building management, insurance... it is all paid by the customer.
Plus a beefy profit margin. All paid by you.
It looks sensible or nonsensical depending on who you are and what your needs are. Not everybody needs cloud, not everybody can self host.
https://aws.amazon.com/blogs/database/readable-standby-insta...
People on my team would rather be developing software than doing sys admin stuff
I use AWS, DO and Hetzner (bare metal) for (different) cases where each makes sense.
In the past few years, maintenance burden and uptime between AWS and Hetzner hosted stuff was comparable, with the cost being an order of magnitude less for Hetzner (as an added benefit, that machine is more beefy but I wasn't even comparing that).
> People on my team would rather be developing software than doing sys admin stuff.
I am not surprised software developers would prefer doing software development :-)
No, they often don’t understand the alternatives available because the wolves are guarding the henhouse (software devs who don’t want to work with the pace of on-prem IT).
> People on my team would rather be developing software than doing sys admin stuff
No shit, that’s why software devs aren’t sysadmins.
But of course I am running build tools and bunch of other dev tools locally that are not even installed on our servers.
But in all reality, laptops are really, really powerful these days.
My laptop at the moment is a HPE something ... HP 250 G6 Notebook PC according to dmidecode. I used to have a five year old Dell 17" i7 based beast but it ... broke. I whipped out the SSD, shrank the root fs a bit (with gparted) and used a clonezilla disc and an external USB link to get my system onto a Samsung EVO M.2 thingie from the SSD. This laptop was an employee cast-off.
As is probably apparent from the above, I use Linux. I have some decent apps at my disposal but in general I don't need much hardware. Decent: RAM >= 8GB and SSD storage are key. I don't play games much. I've always specified 17" screens in the past for my laptops but now I have to use a 15" jobbie, that's becoming less of a hard requirement.
I am a Managing Director (small business - IT consultancy) but I do spend rather a lot of my time doing sysadmin and network admin stuff. I also do business apps. I once wrote a Finite Capacity Plan for a factory in Excel with rather a lot of VBA. Before you take the piss, bear in mind I used the term finite and not infinite. That meant that quite a few people had employment in Plymouth (Devon not MA) in the 1990s. I won't bore you with my more modern failures 8)
Anyway as you say, modern laptops are phenomenally powerful. Mine have wobbly windows 8) I can't be arsed with MS Windows anymore for my own gear - it gets in the way. Me and the wife said goodbye at Windows 7.