Running out of disk space in production (alt-romes.github.io)
They fill up their mobile apps with junk data just to make the APK/IPA bigger, so that if they need to push an urgent update, no users are left unable to update because their phones are full to the brim.
I know of two Italian banks that do it: Unicredit and Intesa. The latter made the news when a user found out that one of the filler files was a recording of a burp [1].
[1] https://www.ilfattoquotidiano.it/2024/12/20/intesa-san-paolo... (in Italian)
Whoever gave them that idea was doing a bad deed.
dd if=/dev/zero of=sparse_file.img bs=1M count=0 seek=1024
If you add conv=sparse to the dd command with a smaller block size, it will sparsify what you copy too; use the wrong cp flags and those files will explode back to their full size. That is a much harder problem to deal with than the filesystem layers, because the allocated size (what stat's block count and du report) will usually look much smaller than the file's apparent size.
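To see the difference between apparent and allocated size, here is a small sketch. It assumes GNU coreutils and a filesystem that supports sparse files (ext4, XFS, tmpfs); `sparse_demo` is a hypothetical helper name. It creates a sparse file with the same `dd ... count=0 seek=N` trick as above, then copies it with a plain `dd` (no `conv=sparse`), which writes every zero back out.

```shell
#!/usr/bin/env bash
# Sketch: apparent vs allocated size of a sparse file, and what a naive copy does.
# Assumes GNU coreutils (dd status=none, du --apparent-size); sparse_demo is a
# hypothetical helper name used only for illustration.

# sparse_demo SIZE_MIB
# Prints: "<apparent KiB> <allocated KiB of sparse file> <allocated KiB of copy>"
sparse_demo() {
  local size_mib=$1 dir apparent sparse_alloc copy_alloc
  dir=$(mktemp -d)

  # count=0 seek=N: extend the file to N MiB without writing any data blocks.
  dd if=/dev/zero of="$dir/sparse.img" bs=1M count=0 seek="$size_mib" status=none

  # A plain dd (no conv=sparse) reads the holes back as zeros and writes
  # every block, so the copy occupies its full size on disk.
  dd if="$dir/sparse.img" of="$dir/copy.img" bs=1M status=none

  apparent=$(du -k --apparent-size "$dir/sparse.img" | cut -f1)
  sparse_alloc=$(du -k "$dir/sparse.img" | cut -f1)
  copy_alloc=$(du -k "$dir/copy.img" | cut -f1)

  rm -rf -- "$dir"
  echo "$apparent $sparse_alloc $copy_alloc"
}

# Usage: prints the three sizes in KiB for an 8 MiB sparse file.
sparse_demo 8
```

Note that on a filesystem with transparent compression (ZFS, btrfs), even the zero-filled copy may take almost no space, which is another reason to alert on what the filesystem reports rather than on apparent sizes.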
dd if=/dev/urandom of=/home/myrandomfile bs=1 count=N
Shit like that just wastes space that the SSD could use for wear levelling...
It also serves to leave some space unused to help the wear-levelling on the SSDs that make up the RAID array serving as the PV¹ for LVM. I'm not 100% sure this is still needed², but I haven't looked into it enough, so until I do I'll keep the habit.
--------
[1] if there are multiple PVs, from different drives/arrays, in the VG, then you might need to manually skip a bit on each one, because LVM will naturally fill one before using the next. Just allocate a small LV specifically on each and don't use it. You can remove one or all of them and add the extents to the full LV if/when needed. Giving it a useful name also reminds you why that bit of space is carved out.
[2] drives already reserve some spare area (over-provisioning) by default, IIRC
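Footnote [1]'s trick could look roughly like this. It is a sketch that only prints LVM2 commands so they can be reviewed before running; the VG name `vg0`, the `reserve_*` LV names, the sizes and the PV paths are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the "small unused LV on each PV" trick. Prints the LVM2 commands
# rather than running them; vg0, the reserve_* names, sizes and PV paths are
# hypothetical placeholders.

# plan_reserves VG SIZE PV... : one reserve LV pinned to each listed PV.
plan_reserves() {
  local vg=$1 size=$2 pv
  shift 2
  for pv in "$@"; do
    # Listing a PV after the VG restricts lvcreate's allocation to that PV.
    echo "lvcreate -n reserve_$(basename "$pv") -L $size $vg $pv"
  done
}

# plan_release VG RESERVE_LV SIZE TARGET_LV PV : hand a reserve's extents to
# the LV that is running out of space.
plan_release() {
  local vg=$1 reserve=$2 size=$3 target=$4 pv=$5
  echo "lvremove $vg/$reserve"
  # -r (--resizefs) grows the filesystem together with the LV; naming the PV
  # makes lvextend take the freed extents from that drive.
  echo "lvextend -r -L +$size $vg/$target $pv"
}

plan_reserves vg0 2G /dev/sda2 /dev/sdb2
plan_release vg0 reserve_sda2 2G data /dev/sda2
```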
Usually something like "expand if there is less than 5% left", with monitoring that triggers at 4% free space, so there is still a warning when the automatic resize has hit its limit.
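A minimal sketch of that two-threshold policy, using the 5% and 4% figures from the comment. `disk_action` is a hypothetical helper; the actual growing and paging would be done by whatever tooling you already have.

```shell
#!/usr/bin/env bash
# Sketch of the two-threshold policy: grow the volume below 5% free, and also
# page a human below 4% free (i.e. when auto-grow could not keep up or is
# capped). disk_action is a hypothetical helper name.

# disk_action AVAIL_KIB SIZE_KIB -> prints "ok", "expand" or "expand+alert"
disk_action() {
  local avail=$1 size=$2
  # Integer comparison of avail/size against 4% and 5% (avail*100 vs size*N).
  if [ $((avail * 100)) -lt $((size * 4)) ]; then
    echo "expand+alert"
  elif [ $((avail * 100)) -lt $((size * 5)) ]; then
    echo "expand"
  else
    echo "ok"
  fi
}

# Usage with the filesystem holding "/": df -Pk reports sizes in KiB,
# column 2 is the total and column 4 is the space available.
set -- $(df -Pk / | awk 'NR == 2 { print $4, $2 }')
disk_action "$1" "$2"
```

If auto-grow works, free space climbs back above 5% before the 4% alert can fire, so the alert only reaches a human when something is actually wrong.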
carving space per PV like that is pointless
> It also serves to leave some space unused to help out the wear-levelling on the SSDs on which the RAID array that is the PV¹ for LVM. I'm, not 100% sure this is needed any more² but I've not looked into that sufficiently so until I do I'll keep the habit.
YMMV, but most distros set up a cron job or systemd timer that runs fstrim periodically (util-linux's fstrim.timer runs weekly). So it shouldn't be needed, as any free space will be returned to the SSD.
> [1] if there are multiple PVs, from different drives/arrays, in the VG, then you might need to manually skip a bit on each one because LVM will naturally fill one before using the next. Just allocate a small LV specially on each and don't use it. You can remove one/all of them and add the extents to the fill LV if/when needed. Giving it a useful name also reminds you why that bit of space is carved out.
Other options are telling LVM this LV is striped (so it uses space from both drives equally), or manually allocating from the drive with more free space when expanding/adding an LV.
ZFS has a "reservation" mechanism that's handy:
> The minimum amount of space guaranteed to a dataset, not including its descendants. When the amount of space used is below this value, the dataset is treated as if it were taking up the amount of space specified by refreservation. The refreservation reservation is accounted for in the parent datasets' space used, and counts against the parent datasets' quotas and reservations.
* https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...
Quotas prevent users/groups/directories (ZFS datasets) from using too much space, but reservations ensure that particular areas always have a minimum amount set aside for them.
* https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...
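A reservation can also act as ballast on ZFS: an empty dataset with a `refreservation` holds space in the pool until you drop it. The sketch below assumes that approach; `tank/ballast` and `10G` are placeholders, and with `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch: use an empty ZFS dataset with a refreservation as ballast instead of
# a ballast file. "tank/ballast" and 10G are hypothetical placeholders.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# ballast_create DATASET SIZE : reserve SIZE in the pool via an empty,
# unmounted dataset so nothing ever gets written into it.
ballast_create() {
  run zfs create -o mountpoint=none -o refreservation="$2" "$1"
}

# ballast_release DATASET : in an emergency, hand the space back to the pool.
ballast_release() {
  run zfs set refreservation=none "$1"
}

# Usage:
#   ballast_create tank/ballast 10G
#   ballast_release tank/ballast
```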
Addendum: there's also the built-in compression functionality:
> When set to on (the default), indicates that the current default compression algorithm should be used. The default balances compression and decompression speed, with compression ratio and is expected to work well on a wide variety of workloads. Unlike all other settings for this property, on does not select a fixed compression type. As new compression algorithms are added to ZFS and enabled on a pool, the default compression algorithm may change. The current default compression algorithm is either lzjb or, if the lz4_compress feature is enabled, lz4.
* https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops...
I knew I didn’t invent the concept, as there are so many systems that cannot recover if the disk is totally full (many systems need to write something in order to carry out an instruction to remove things gracefully).
The latest thing I found with this issue is Unreal Engine’s Horde build system. It’s so tightly coupled with caches, object files and database references that a manual clean-up is extremely difficult and likely to leave an unstable system. But you can configure it to keep fewer build artefacts around, and then it will clear itself out gracefully; it just needs to be able to write to the disk to do it.
Now that I think about it, I don’t do this for inodes, but you can run out of those too and end up in a weird “out of disk” situation despite having lots of usable capacity left.
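Inode exhaustion is easy to watch alongside block usage. A sketch, assuming a `df` that supports `-P` and `-i` (GNU and BSD do); `inode_use_pct` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report inode usage (IUse%) for the filesystem holding a path, so it
# can be alerted on alongside block usage. -P keeps each filesystem on one
# line. inode_use_pct is a hypothetical helper name.

# inode_use_pct PATH -> prints a percentage without the "%", or "-" where the
# filesystem has no fixed inode table (btrfs, for example, reports "-").
inode_use_pct() {
  df -Pi -- "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

inode_use_pct /
```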
Would it be more pragmatic to allocate a swap file instead? Something that provides a theoretical benefit in the short term vs a static reservation.
Disc Space Insurance File
fallocate -l 8G /tmp/DELETE_IF_OUT_OF_SPACE.img
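A slightly fuller sketch of the same idea, assuming util-linux `fallocate` with a `dd` fallback. Unlike the sparse `dd ... count=0 seek=...` file, `fallocate` actually reserves blocks. The file has to live on the filesystem you want to protect: on many distributions `/tmp` is a tmpfs, where it would consume RAM rather than disk and disappear on reboot. The helper names and the path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a "delete me if out of space" ballast file. make_ballast and
# release_ballast are hypothetical helpers; put the file on the filesystem
# you want to protect.

# make_ballast PATH MIB : allocate MIB MiB of real (non-sparse) blocks.
make_ballast() {
  local path=$1 mib=$2
  # fallocate reserves blocks without writing data; fall back to writing
  # zeros where the filesystem does not support it.
  fallocate -l "$((mib * 1024 * 1024))" "$path" 2>/dev/null ||
    dd if=/dev/zero of="$path" bs=1M count="$mib" status=none
}

# release_ballast PATH : free the space in an emergency.
release_ballast() {
  rm -f -- "$1"
}

# Usage (hypothetical location, 8 GiB):
#   make_ballast /var/DELETE_IF_OUT_OF_SPACE.img 8192
#   release_ballast /var/DELETE_IF_OUT_OF_SPACE.img
```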
https://gist.github.com/klaushardt/9a5f6b0b078d28a23fd968f75...
Except that one time when .NET decided that the incoming POST was over some magic limit and, instead of processing it in memory as before, wrote it to disk, crashing the whole pod. Fun times.
Also my Unraid NAS has two drives in "WARNING! 98% USED" alert state. One has 200GB of free space, the other 330GB. Percentages in integers don't work when the starting number is too big :)
surely you don't need a fire extinguisher in your kitchen, if you have a smoke detector?
a "warning alarm" is a terrible concept, in general. it's a perfect way to lead to alert fatigue.
over time, you're likely to have someone silence the alarm because there's some host sitting at 57% disk usage for totally normal reasons and they're tired of getting spammed about it.
even well-tuned alert rules (ones that predict growth over time rather than only looking at the current value) tend to be targeted towards catching relatively "slow" leaks of disk usage.
there is always the possibility for a "fast" disk space consumer to fill up the disk more quickly than your alerting system can bring it to your attention and you can fix it. at the extreme end, for example, a standard EBS volume has a throughput of 125 MB/s. something that saturates that limit will fill up 10 GB of free space in 80 seconds.
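The arithmetic behind the 80-second figure, taking decimal units (1 GB = 1000 MB):

```latex
\[
t_{\text{full}} = \frac{\text{free space}}{\text{write rate}}
                = \frac{10\ \text{GB}}{125\ \text{MB/s}}
                = \frac{10\,000\ \text{MB}}{125\ \text{MB/s}}
                = 80\ \text{s}
\]
```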
And of course there's nothing to say that both of these things can't be done simultaneously.
Defence in depth is a good idea: proper alarms, and a secondary measure in case they don't have the intended effect.
We have a script that basically expands a volume slowly as demand grows, up to a limit. So we don't have to think about stuff like "does the logs partition need to be 1 or 10GB"; it will expand up to the sane limit, and if it hits that, we get a disk usage alert before it fills, so we can either see what's going on (an app shat in its logs) or take a look at the one app in 10 that needs some special tuning there.
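The script itself isn't shown, but a capped step-growth policy might look something like this sketch. It only prints the action; the LV name, the step, the cap and the "less than 10% free" trigger are hypothetical choices, and `plan_grow` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch of "grow the volume in steps as it fills, up to a cap". Prints the
# action instead of running it. The LV name, step, cap and the 10% trigger
# are hypothetical; plan_grow is a made-up helper name.

# plan_grow LV SIZE_GIB FREE_GIB STEP_GIB CAP_GIB
plan_grow() {
  local lv=$1 size=$2 free=$3 step=$4 cap=$5 grow
  if [ $((free * 10)) -ge "$size" ]; then
    # At least 10% free: nothing to do.
    echo "ok"
  elif [ "$size" -ge "$cap" ]; then
    # Already at the sane limit: this is where a human should take a look.
    echo "alert: $lv at cap of ${cap}G with ${free}G free"
  else
    grow=$step
    # Never grow past the cap.
    if [ $((size + grow)) -gt "$cap" ]; then
      grow=$((cap - size))
    fi
    # -r grows the filesystem together with the LV.
    echo "lvextend -r -L +${grow}G $lv"
  fi
}

# Usage: a 10G logs LV with no free space left, 5G steps, 40G cap.
plan_grow vg0/logs 10 0 5 40
```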
FTFY ;)
The authorization can probably be done somehow in nginx as well.
https://nginx.org/en/docs/http/ngx_http_auth_request_module....
I recently came across gdu (1) and have installed/used it on every machine since then.
du -hs -- * .??* 2> /dev/null | sort -h | tail -$LINES
There's also baobab when a GUI might help.
Even more confusing can be cases where a file is opened, deleted or renamed without being closed, and then a different file is created under the original path. To quote the man page, "lsof reports only the path by which the file was opened, not its possibly different final path."
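A related trap is space held by files that were deleted while still open: `du` no longer sees them, but `df` still counts them until the process closes them. `lsof +L1` lists such files; without lsof, the same information can be read from `/proc`. A Linux-only sketch (`deleted_open_files` is a hypothetical helper name; you will only see processes you are allowed to inspect, so run it as root for the full picture):

```shell
#!/usr/bin/env bash
# Sketch: list file descriptors that still point at deleted files. Their space
# is only freed once the owning process closes them (or is restarted).
# Linux-only (/proc); deleted_open_files is a hypothetical helper name.

deleted_open_files() {
  local fd target
  for fd in /proc/[0-9]*/fd/*; do
    # Skip descriptors that vanished or that we are not allowed to read.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case $target in
      *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
    esac
  done
}

deleted_open_files
```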
> Note: this was written fully by me, human.
> It’s difficult to reason under pressure. Experience, that I didn’t have here, would have helped.
But maybe the European Hetzner servers still have really big limits even for small ones.
But still, if people keep downloading, that could add up.
I don't think there is a cheaper CDN.
The author ended up doing this for /nix under pressure, but it's very much standard best practice in any unix/linux box, especially one with only 40GB.
It's always lupu... I mean NTP or disk space.
X-Accel-Redirect (nginx's sendfile handoff), if your Haskell stack supports it, is the way: it is zero-copy and will dramatically help in many cases.
Modifying the body is one of the cases where it doesn’t work.
5. Implement infrastructure monitoring.
Assuming you're on something like Ubuntu, the monit program is brilliant.
It's open source and self hosted, configured using plain text files, and can run scripts when thresholds are met.
I personally have it configured to hit a Slack webhook for a monitoring channel. Instant notifications for free!
And this is why I tried Plausible once and never looked back.
To get basic but effective analytics, use GoAccess and point it at the Caddy or Nginx logs. It’s written in C and thus uses barely any memory. With a few hundred visits per day, the logs are currently 10 MB per day. Caddy will automatically rotate the logs once they go above 100 MB.
Even though it only happened once, I still set up monitoring for inode exhaustion.
And you can tell by the fact that the filler data is called "burp.mp3" and things like that.
Seems like the sort of thing that only makes sense in a "I know my cheapskate boss won't have larger drives ready to go (or be willing to pay to expand it in a cloud scenario), and he insists that the alarm not go off until 95%, but it'll be my fault if we have a bad incident we can't recover quickly from, so I'm gonna give myself some headroom by padding things a bit" extra-paranoid scenario.
And a single large dump to disk, like some daemon suddenly bugging out and writing incessantly to logs, will render all that moot anyway.
With dedupe, compression and sparse files, you simply don’t track utilization by the client’s view, which is what du shows.
The concrete implementation is what matters and, as this case demonstrates, it is what you should alert on.
Inodes, blocks, extents etc. are what matters, not the user’s view of data size.
Even with rrdtool you could set reasonable alerts, but someone exploding a sparse file with a non-sparse copy makes the heuristics harder.
rsync, ssh etc. will do that by default.
I use Proxmox as the hypervisor, and the ZFS resize part is supported on the GUI and it's trivial to use. Let me know if you need more details.
My early days of computing got easier when I had a second computer to look up the issues of the first computer.
In fact, since enlarging live ext* filesystems has been very reliable² for quite some time and is quick, I tend to leave a lot of space unallocated initially and grow volumes as needed. There used to be a potential problem with that: a filesystem grown piecemeal ends up fragmented across the breadth of a traditional drive's head travel, meaning slower seeks, but the difference is barely detectable in almost all cases³, and with solid-state drives this is even more of a non-issue.
> And most importantly 10% […] nowadays it's 50-100GB at least.
It doesn't have to be 10%. And the space isn't lost: it can be quickly brought into service when needed, that is the point, and if there is more than one volume in the group then I'm not allocating space separately to every filesystem as would be needed with the files approach. It is all relative. My /home at home isn't nearly 50GB in total⁴, nor is / anywhere I'm responsible for even if /var/log and friends are kept in the same filesystem, but if I'm close to as little as 50GB free on a volume hosting media files then I consider it very full, and I either need to cull some content or think about enlarging the volume, or the whole array if there isn't much slack space available, very soon.
--------
[1] The root-only-reserved blocks on ext* filesystems, though that doesn't help if a root process has overrun, or files as already mentioned above.
[2] Reducing them is still a process I'd handle with care, it can be resource intensive, has to move a lot more around so there is more that could go wrong, and I've just not done it enough to be as comfortable with the process as I am with enlarging.
[3] You'd have to work hard to spread things far and randomly enough to make a significant difference.
[4] though it might be if I wasn't storing 3d print files on the media array instead of in /home
> And most importantly 10% of the drive in ~2010 were 6-12GB, nowadays it's 50-100GB at least.
Back then you were paying about $2 per gigabyte. Right now SSDs are 1/15th as expensive. If we use the prices from last year they're 1/30th, and if we also factor in inflation it's around 1/50th.
So while I would say to use a lower percentage as space increases, 50-100GB is no problem at all.
Only if you fill the drive up to 95-99% and do this often. Otherwise it's just cargo cult.
> So while I would say to use a lower percentage as space increases
If your drive is over-provisioned (e.g. 960GB instead of 1024GB) then it's not needed. If not, and you fill your drive to the brim and just want to be sure, then you need the size of the biggest write you would do plus some leeway: e.g. if you often write 20GB video files for whatever reason, then 30-40GB would be more than enough. Leaving 100GB of a 1TB drive free is like buying sneakers but not wearing them because they would wear out.
> If your drive is over-provisioned (eg 960GB instead of 1024GB) then it's not needed.
I disagree. That much space isn't a ton when it comes to absorbing the wear of background writes. And normal use ends up with garbage sectors sprinkled around inflating your data size, which makes write amplification get really bad as you approach 100% utilization and have to GC more and more. 6% extra is in the range where more will meaningfully help.
> Leaving 100GB of 1TB drive is like buying a sneakers but not wearing them because they would wear.
50GB is like $4 of space the last time most people bought an SSD. Babying the drive with $4 is very far from refusing to use it at all. The same for 100GB on a 4TB drive.
Of course, it doesn't matter for desktop use, as the drive's built-in spare area is enough; but if you have 24/7 write-heavy loads, making sure it's all trimmed will noticeably extend the drive's lifetime.
Let's set a fixed threshold -- 100GB, say -- and play out both methods.
Method A: One or more ballast files are created, totalling 100GB. The machine runs out of storage and grinds to a halt. Hopefully someone notices soon or gets a generic alert that it has ceased, remembers that there's ballast files, and deletes one or more of them. They then poke it with a stick and get it going again, and set forth to resolve whatever was causing the no-storage condition (adding disk, cleaning trash, or whatever).
Method B: A specific alert that triggers with <100GB of free space. Someone sees this alert, understands what it means (because it is descriptive instead of generic), and logs in to resolve the low-storage condition (however that is done -- same as Method A). There is no stick-poking.
Method C: The control. We do nothing, and run out of space. Panic ensues. Articles are written.
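Method B needs very little machinery. A minimal sketch, assuming a POSIX-style `df` and something like cron to run it; `check_free` is a hypothetical helper and `ALERT_WEBHOOK_URL` a hypothetical variable (when unset, the message is simply printed). The threshold is taken in GiB.

```shell
#!/usr/bin/env bash
# Sketch of Method B: a descriptive alert when free space on a filesystem drops
# below a fixed threshold. check_free and ALERT_WEBHOOK_URL are hypothetical;
# when ALERT_WEBHOOK_URL is unset the message is just printed.

# check_free PATH MIN_FREE_GIB -> returns 0 if enough space, else alerts and returns 1
check_free() {
  local path=$1 min_gib=$2 avail_kib msg
  # Column 4 of `df -Pk` is the space available to unprivileged users, in KiB.
  avail_kib=$(df -Pk -- "$path" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kib" -ge $((min_gib * 1024 * 1024)) ]; then
    return 0
  fi
  msg="LOW DISK on $(uname -n): $path has $((avail_kib / 1024 / 1024)) GiB free (threshold ${min_gib} GiB)"
  if [ -n "${ALERT_WEBHOOK_URL:-}" ]; then
    # Slack-style incoming webhook: POST a JSON body with a "text" field.
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data "{\"text\": \"$msg\"}" "$ALERT_WEBHOOK_URL"
  else
    echo "$msg"
  fi
  return 1
}

# Usage, e.g. from cron every few minutes:
#   check_free / 100
```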
---
Both A and B methods have an equal number of alerts for each low-disk condition (<100GB). Both methods work, in that they can form the impetus to free up some space.
But Method A relies on a system to crash, while Method B does not rely upon a crash at all.
I think that the lack of crash makes Method B rather superior all on its own.
(Method C sucks.)
Alerting on an unexpectedly high rate-of-change, as some others have suggested, also seems good for some workloads.
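Rate-of-change alerting can start as simply as comparing two free-space samples and extrapolating linearly. A sketch; `eta_to_full` is a hypothetical helper, and a real setup would keep more samples to smooth out bursts.

```shell
#!/usr/bin/env bash
# Sketch of rate-of-change alerting: from two free-space samples taken
# INTERVAL seconds apart, estimate how long until the disk is full.
# eta_to_full is a hypothetical helper name.

# eta_to_full AVAIL_BEFORE_KIB AVAIL_NOW_KIB INTERVAL_S -> seconds, or "never"
eta_to_full() {
  local before=$1 now=$2 interval=$3 consumed
  consumed=$((before - now))
  if [ "$consumed" -le 0 ]; then
    # Free space is stable or growing.
    echo never
  else
    # Linear extrapolation: remaining / (consumed per second).
    echo $((now * interval / consumed))
  fi
}

# Usage: sample "/" a minute apart and alert if it would be full within the hour.
#   a=$(df -Pk / | awk 'NR == 2 { print $4 }'); sleep 60
#   b=$(df -Pk / | awk 'NR == 2 { print $4 }')
#   eta=$(eta_to_full "$a" "$b" 60)
#   [ "$eta" != never ] && [ "$eta" -lt 3600 ] && echo "disk full in ~${eta}s"
```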
But yes, without the actual use-case it's just speculations.
NB: the QVO drives I mentioned in the comments a year ago are still running, but I do make sure they are never more than 80% full.
openssl enc -aes-256-ctr -pbkdf2 -pass pass:"$(date '+%s')" < /dev/zero | dd of=/home/myrandomfile bs=1M count=1024 iflag=fullblock  # fullblock: pipes can return short reads, so count=1024 would otherwise not guarantee 1 GiB
Almost all current CPUs have native AES instructions, so you'll be able to produce pseudorandom junk really fast. Even my old system will produce it at about 3Gb/s, much faster than urandom can go.
The easiest alternative, I guess, is to pipe through head. It still grumbles, but it does work:
openssl enc -aes-256-ctr -pbkdf2 -pass pass:"$(date '+%s')" < /dev/zero | head -c 10M > foo
head -c 1G /dev/urandom > /home/myrandomfile
And not have to remember dd's bizarre snowflake command syntax.
$ sudo truncate --size 1G /emergency-space
$ sudo shred /emergency-space
I find it widely available, even in tiny distros.
Most current desktops (smaller than your usual server) won't have any problem with the GP's command. Yours is still better, of course.
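The truncate-then-shred pair works because `truncate` alone only creates a sparse file with no blocks reserved, while `shred` overwrites its full length and so allocates them. A sketch for checking that a ballast file really holds its space, assuming GNU coreutils; `is_fully_allocated` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that a ballast file really holds its space, i.e. that its
# allocated size is not smaller than its apparent size. Assumes GNU coreutils;
# is_fully_allocated is a hypothetical helper name.

is_fully_allocated() {
  local apparent alloc
  apparent=$(du -k --apparent-size -- "$1" | cut -f1)
  alloc=$(du -k -- "$1" | cut -f1)
  [ "$alloc" -ge "$apparent" ]
}

# Usage:
#   truncate --size 1G /emergency-space
#   is_fully_allocated /emergency-space || echo "still sparse"
#   shred -n 1 /emergency-space   # one pass of random data allocates the blocks
#   is_fully_allocated /emergency-space && echo "space is reserved"
```

Random data also has the advantage of surviving filesystem compression, where a zero-filled ballast file might take almost no space.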