Hot code reloading with Erlang(medium.com) |
What's your opinion? Ditch Docker and put the Erlang VM on the host OS? Ditch hot code loading and swap containers the usual way? Some middle ground?
> if you can avoid the whole procedure (which will be called relup from now on) and do simple rolling upgrades by restarting VMs and booting new applications, I would recommend you do so.
Erlang grew out of challenges faced by the telecom industry, such as: what do you do when blue-green isn't an option? Think of an in-use packet switch that is the only point of contact between two networks. There is no way to take the switch down for maintenance without some interruption in service, which gets messy when dealing with timeouts. In his thesis, Armstrong gives another example [2]:
> Usually in a sequential system, if we wish to change the code, we stop the system, change the code and re-start the program. In certain real-time control systems, we might never be able to turn off the system in order to change the code and so these systems have to be designed so that the code can be changed without stopping the system. An example of such a system is the X2000 satellite control system developed by NASA.
This power comes at a cost, though. LYSE again:
> It is said that divisions of Ericsson that do use relups spend as much time testing them as they do testing their applications themselves. They are a tool to be used when working with products that can imperatively never be shut down.
The point being, hot code reloading is an additional feature that can come in handy, but for most of HN's audience it probably won't be relevant; the cost outweighs the benefit compared with just doing a blue-green deploy.
[1] http://learnyousomeerlang.com/relups#the-hiccups-of-appups-a...
[2] http://www.erlang.org/download/armstrong_thesis_2003.pdf
In other words, you can compare Erlang's virtual machine to a container itself, and everything old is new again!
Typical use cases include several gigabytes of in-memory state that takes a long time to read in and get hot when redeploying, or a large number of long-running TCP connections.
For most other uses, we just do rolling upgrades in Erlang as everyone else is doing. It is somewhat simpler to get to work, and immutable architecture is to a certain extent easier to manipulate.
We're using Erlang as the primary language environment for our IoT product for a lot of reasons but one big one is: Hot code loading and a very robust release upgrade environment with a lot of control over the process (including restarting everything inside the VM if that's what we wish to do).
For our product, a digital light switch / dimmer, high uptime guarantees are a very important requirement, and Erlang has it all plus many other wonderful features.
You can do a hot code load in Ruby using the Kernel#load() call. It won't alter functionality currently on the call stack, but it will change the functionality of everything not on the call stack. With some sympathetic design, you can achieve hot code loading for high availability in Ruby.
$ cat hi.rb
def method
puts "hi"
end
method
load("hello.rb")
method
$ cat hello.rb
def method
puts "hello"
end
$ ruby hi.rb
hi
hello
You must engineer your application to execute the load method and that's it.
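To illustrate the call-stack point above, here is a minimal sketch. The `Worker` module, the `worker.rb` file name, and the `write_version` and `demo` helpers are hypothetical names made up for this example. It reloads a file while one of its methods is still running: the in-flight call finishes with the old body, and the next call picks up the new one.

```ruby
require "tmpdir"

# Hypothetical helper: writes a version of worker.rb whose Worker.run
# reports the given label. The label is interpolated when the file is written.
def write_version(path, label)
  File.write(path, <<~RUBY)
    module Worker
      # Runs the optional callback mid-call, then reports which version finished.
      def self.run(callback = nil)
        callback&.call
        "#{label} finished"
      end
    end
  RUBY
end

def demo
  results = []
  Dir.mktmpdir do |dir|
    path = File.join(dir, "worker.rb")
    write_version(path, "v1")
    load(path) # Kernel#load re-evaluates the file on every call, unlike require

    # While v1's run is on the call stack, swap in v2 and reload.
    results << Worker.run(-> { write_version(path, "v2"); load(path) })

    # A fresh call uses the reloaded definition.
    results << Worker.run
  end
  results
end

p demo # => ["v1 finished", "v2 finished"]
```

This only shows that method bodies get swapped between calls. It says nothing about migrating state that lives in existing objects, which is where the "sympathetic design" would have to come in.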
However I wonder if this is really equivalent to what Erlang does. I remember http://rvirding.blogspot.it/2008/01/virdings-first-rule-of-p...
Except you can clone the car into a controlled environment, and test the whole procedure, before doing the actual replacing.
Someone wrote a module for elixir that uses inotify (and similar) to -I think- watch .beam files for modification and perform the required hot-reloads automatically.
I would be reluctant to run this in production, and I can see situations (even in development) where this could trigger unwanted code purging and would be disastrous, but it's a pretty neat thing to have and -it seems- a must for Web Dev people.
I'm asking because I don't have enough context to know why you want to do what you're asking to do.
Working for 4 years in an Erlang environment where hotloading is the norm makes me wish for it everywhere! Why do I have to reboot to fix kernel bugs in TCP? :(
[1] the load balancers I have access to where we host had more downtime than our hosts, so not actually helpful
"Bit rot"? The only defense for the bit rot I'm aware of is ECC RAM.
Anyway. AFAIK (and I'm no Erlang expert, so there's probably something pertinent that I don't know) unless there's a resource leak in core Erlang code, resource leaks can be fixed by restarting the leaking application, or killing the leaking process. [0]
[0] Erlang software is often broken up into Applications. [1] An application is a collection of code with a well-known entry point that (ideally) does a particular thing. An application can depend on other applications and the services that they provide. Applications can be started and stopped independently of all others in the system, but -in order to keep running- dependent applications need to be designed to handle the temporary absence of an application that they depend on.
[1] http://learnyousomeerlang.com/building-applications-with-otp
Yeah it could be. Frankly, I'd likely reach for Erlang Releases before I reached for this when updating software in production.
However, for a large variety of dev work, this automatic module reloading thingie works pretty well. :)
> You could pretty easily use code:soft_purge/1 prior to loading to avoid killing lingering processes though...
Mmm. Okay. So, I'm not 100% on how this works, so please bear with me and my inaccurate terminology. :(
In any given Erlang system, there can be two versions of a module running, the "current" one, and the "old" one, right?
So, if you call code:soft_purge/1 when there is no "old" code loaded, it should return true, yes? (In addition to returning true when there's no process running the "old" code.) [0]
So, would this be a way to write an auto-loader that doesn't purge in-use code?
* code:soft_purge(?MODULE)
* if false, wait a while then retry
* if true, code:load_file(?MODULE)
I guess maybe you'd want to build up a list of all the modules that have been modified, and wait until code:soft_purge/1 returns true for all of them before loading the modules. (maybe.)
You also -obviously- want an override that allows for the purging of in-use code.
[0] Testing indicates that it does, but it's often good to double-check. :)
The exact strategy for reloading (wait for all at once, load whatever is ready, how long to wait, etc.) is left as an exercise for the reader. For dev, I use a function in the shell that loads everything that changed (no soft purge). For prod, I have a function that goes in order and checks soft purge, then loads (if the 2nd module doesn't soft purge, it will have already loaded the first module, but it will stop before trying the 3rd).
With most things in gen_servers, there's not a lot of opportunity for lingering code, but sometimes it happens.
I'm pretty sure that I can reload the inet module: [0]
Eshell V7.0 (abort with ^G)
1> l(inet).
{error,sticky_directory}
=ERROR REPORT==== 6-Dec-2015::03:37:00 ===
Can't load module 'inet' that resides in sticky dir
2> code:which(inet).
"/usr/lib/erlang/lib/kernel-4.0/ebin/inet.beam"
3> code:unstick_dir("/usr/lib/erlang/lib/kernel-4.0/ebin/").
ok
4> l(inet).
{module,inet}
5>
Not that this is a good idea, mind, but I'm fairly certain that it's doable. :)
(Also note that you can reload the mnesia module without hassle. Its ebin directory is not marked as sticky. :) )
> I have to use DNS for load balancing [because the load balancers fall over often].
Oh lord. That's a terrible situation to be in.
[0] Which is part of the kernel application, one of the applications that can't be hot-upgraded without restarting the emulator. [1]
[1] http://www.erlang.org/doc/system_principles/upgrade.html
>> I have to use DNS for load balancing [because the load balancers fall over often].
>Oh lord. That's a terrible situation to be in.
It's not terrible, it's just not great. Our hosting environment is generally very reliable, so if I don't screw things up, my systems won't fall over. It's just that their load balancers are crap, and the suggested upgrade path was to run a load balancer in a VM appliance. That makes me think I should maybe just run CARP myself on the hosts (or something) and skip a layer, but I'll probably never get around to that, because it doesn't come up that often :)
Oh yeah. I know. I would expect for some-to-many minor things (like the TCP bug you mentioned), this wouldn't be an insane way to do a hot upgrade. Guess one would need to make a close study of the diffs.
Regardless, I mentioned the fact that this isn't a good idea for any novice who might stumble across this comment months or years hence and get the wrong idea. :)
> ...maybe I should just run CARP myself on the hosts...
Oh, CARP is so cool. It's a bit of a pity that ucarp doesn't support IPv6. But, how does CARP replace a load balancer? Isn't it used for single host availability, or is my understanding too narrow?
I have two servers. If I get two CARP-able IPs, make each server primary on one, and put them both in DNS, I have load balancing and failover. If I just use one IP, at least I have failover. Either way, I need each server to be able to handle full load, so I could have hot-warm instead of hot-hot.
(With more than two servers, I'd need to figure something else out, probably two boxes running simple load balancing in front of the rest of the cluster.)