Use Haskell for shell scripting (haskellforall.com)
Some related projects:
- Joey Hess recently released a nice Haskell-to-sh compiler. I like this approach as the resulting sh scripts are runnable on pretty much every *nix. https://joeyh.name/blog/entry/shell_monad/
- Chris Done also released a lib to do shell stuff from Haskell, which builds on the conduit library http://chrisdone.com/posts/shell-conduit
- Chris also wrote a shell in Haskell https://github.com/chrisdone/hell
- Then there is Shelly by Greg Weber https://github.com/yesodweb/Shelly.hs
There are probably more...
    inshell
        :: Text        -- Shell command
        -> Shell Text  -- Standard input to feed command
        -> Shell Text  -- Standard output produced by command
I made one intentional simplification in the API, which was to not provide a way to capture standard error. It's definitely possible to provide such a utility, but I wanted to simplify things as much as possible in the first release before the slow onslaught of feature cruft begins. If there were such a utility, it would have this type:
    both
        :: Text        -- Shell command
        -> Shell Text  -- Standard input to feed command
        -> Shell (Either Text Text)
... and you could selectively listen to just stderr or stdout by taking advantage of the fact that pattern match failures short-circuit downstream commands:
    Left txt <- both  -- only read stderr
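The short-circuiting trick can be tried without `turtle` at all. The following is only a sketch: a plain list stands in for `Shell (Either Text Text)`, and the sample data in `both` is made up for illustration. In the list monad a failed pattern match in `do` notation yields the empty list, which is the same behaviour the comment above relies on.

```haskell
-- Hypothetical stand-in: a list plays the role of `Shell (Either Text Text)`.
-- Left is a stderr line, Right is a stdout line (sample data, not real output).
both :: [Either String String]
both = [Right "out 1", Left "err 1", Right "out 2"]

-- Keep only stderr: a failed `Left` match calls `fail`, which is [] for lists,
-- so that element is skipped and nothing downstream runs for it.
stderrOnly :: [String]
stderrOnly = do
    Left txt <- both
    return txt

-- Keep only stdout, same idea with the other constructor.
stdoutOnly :: [String]
stdoutOnly = do
    Right txt <- both
    return txt

main :: IO ()
main = do
    mapM_ putStrLn stderrOnly
    mapM_ putStrLn stdoutOnly
```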
There is one more shell library that I know of: `process-streaming`. I actually didn't know about `shell_monad` (that's the one most similar in spirit to what I wrote).
The main reason I rolled my own library is that this was written with the specific audience of people who didn't know any Haskell, but were comfortable with Python or Bash. My actual goal is to convince people internally at Twitter to use Haskell instead of Python for large scripts. I reviewed all those libraries (with the exception of shell_monad) to see if I felt comfortable marketing them to non-Haskell programmers and none of them felt like the right level of abstraction to me. I almost ended up going with Shelly, but in the process of polishing shelly for internal usage I found myself continually wrapping things with better names, different types, and providing missing features to get a single import umbrella, so I just stopped and asked: "why not just do this as a cohesive single library instead?". Also, `shelly` does not provide any `IO`-only commands: everything has to be wrapped in the `Sh` monad.
As for the other libraries, `shell-conduit` was too complex for new users in my opinion and `hell` is not embedded within Haskell (it's a separate language), and I wanted to keep the features of Haskell. I still need some more time to review `shell_monad` to see if I made a mistake by ignoring it.
For example, the pwd function returns a FilePath type rather than a String:
    Prelude Turtle> :type pwd
    pwd :: IO Turtle.FilePath
The datefile function is also typed:
    Prelude Turtle> :type datefile
    datefile :: Turtle.FilePath -> IO UTCTime
So this really does seem to structure the data passed between commands, instead of the "stringly typing" unix shells have historically been known for.
    data Root
        = RootPosix
        | RootWindowsVolume Char
        | RootWindowsCurrentVolume
    data FilePath = FilePath
        { pathRoot :: Maybe Root
        , pathDirectories :: [String]
        , pathBasename :: Maybe String
        , pathExtensions :: [String]
        }
Path carries String-like information; it can even be easily converted to and from strings. Yet, it's a strong type that won't let you write something like 'path </> file_contents' (although, with overloaded strings, you can do 'path </> "file_name"').
UTCTime is not String-like.
Getting into more tricky stuff, what's the equivalent of <() in bash?
This doesn't really demonstrate anything that shell scripts are actually written for: orchestrating and composing other processes, and job control.
If you wanted to leverage type checking for safety, it would be more interesting to typecheck the streams input and output by pipes.
    main = do
        cd "/tmp"
        mkdir "test"
        output "test/foo" "Hello, world!"  -- Write "Hello, world!" to "test/foo"
        stdout (input "test/foo")          -- Stream "test/foo" to stdout
        rm "test/foo"
        rmdir "test"
        sleep 1
        die "Urk!"
Clearly doesn't (it creates a directory, writes in a file, removes that file and that directory all in one go without anything indicated by the function main). Is it because it's the main function of the program, or am I missing something?
Hamming has this great set of lectures on how he became a world renowned scientist and in one of the lectures he explains why Ada failed and other languages succeeded. The difference was that Ada was designed logically and most successful languages were designed psychologically. Even when government contracts mandated Ada, people still wrote in Fortran and hand-translated to Ada. You can watch the videos and take from it what you will.
A minimal bash file is `#!/bin/bash`. A minimal turtle file is already way too long and logical.
The set of videos: https://www.youtube.com/playlist?list=PL2FF649D0C4407B30.
Everyone should do this!
I don't doubt that Haskell is a better scripting language than the shell, but you can't assume /usr/bin/env runhaskell is going to return anything on random Linux servers. Perl and Python, maybe, but Haskell isn't there yet.
Well, yes and no. You can get reasonable compatibility with different Unix flavours if you stick to sh. Your script is not going to work on BSDs once you start using bash specific features, though.
Fun fact: on FreeBSD bash does not live in /bin/bash, it's in /usr/local/bin/bash. Every time you write a shebang with /bin/bash hardcoded you're making your script harder to use there.
Perl is everywhere almost by default and it's more compatible as it has just one implementation, without the sh/bash/csh/ksh/tcsh/zsh madness. I'd say it's a good idea to use Perl instead of shell script for anything more complicated than a few lines of code if it's meant to be portable. (And I'm not a Perl programmer at all.)
I hope Haskell can gain traction in this area, if only because options are always nice to have, and competition forces everyone to bring their best game.
To me, magic bits of shell scripts which turtle would need to improve upon were it to replace said scripts are not the loop constructs, conditionals, or even the type system (even though it's completely lacking in bash), it is the ability to use pipes to link processes concurrently.
There's also nothing stopping you from using forkIO to spark off a separate thread, and doing IO in multiple threads concurrently.
Haskell's IO manager allows multiple threads to do concurrent IO in what looks like an imperative, one-instruction-after-the-other manner, instead of the async callbacks you might expect from other languages.
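To make the `forkIO` remark concrete, here is a minimal sketch using only `base`. The function name `sumInBackground` and the workload are made up for illustration; the point is that both threads are written as straight-line code and an `MVar` hands the result back, with no callbacks involved.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical helper: run a computation on a separate thread and wait for it.
sumInBackground :: IO Int
sumInBackground = do
    done <- newEmptyMVar
    -- The forked thread computes the sum and hands it over through the MVar.
    _ <- forkIO (putMVar done $! sum [1 .. 100000 :: Int])
    -- The original thread carries on in the meantime.
    putStrLn "main thread keeps going"
    -- takeMVar blocks only this thread until the worker has delivered.
    takeMVar done

main :: IO ()
main = sumInBackground >>= print
```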
And here is a short example:
#!/usr/bin/env ipython3
#
# 1. echo "#!/usr/bin/env ipython3" > scriptname.ipy # creates new ipy-file
#
# 2. chmod +x scriptname.ipy # make it executable
#
# 3. starting with line 2, write normal python or do some of
# the ! magic of ipython, so that you can use shell commands
# within python and even assign their output to a variable via
# var = !cmd1 | cmd2 | cmd3 # enjoy ;)
#
# 4. run via ./scriptname.ipy - if it fails with recognizing % and !
# but parses raw python fine, please check again for the .ipy suffix which must be there!
#
# ugly example, please go and find more in the wild
files = !ls *.* | grep "y"
for file in files:
    !echo $file | grep "p"
# sorry for this nonsense example ;)
# it's even possible to access the output of a command by outputvariable.s, .p or .n
# see file:///usr/share/doc/ipython-doc/html/interactive/reference.html#system-shell-access
Better take a look here, it's more complete:
https://blog.safaribooksonline.com/2014/02/12/using-shell-co...
0: http://gibiansky.github.io/IHaskell/
1: https://registry.hub.docker.com/u/gregweber/ihaskell/
I'm always on the lookout for new languages I can script with (or at least get closer to rapid prototyping) for easier learning, testing, problem solving, etc. I've got templates that I run against linters, style checkers, etc for many languages and it will be helpful to have even more options.
stdout (input "test/foo")
instead of:
output stdout (input "test/foo")
which would be expected considering the previous line.
POSIX shell is everywhere - your current Linux and OS X machines, old UNIX workstations, home routers, servers... Just drop in a file and it will probably run just fine, unless the author screwed something up completely. POSIX shell scripts are the perfect bootstrap mechanisms that will run almost anywhere regardless of architecture.
Haskell, on the other hand, is rarely present in an operating system - if you absolutely, positively need a higher-level language for "shell scripting", then you have a much higher chance of finding a Perl interpreter, or even Python. Heck, even getting ghc and its basic ecosystem running has always proved to be a huge burden to me. Try sticking a `cabal install` in your CI flow, you'll see your job times increase by hours.
Third, there's just the KISS aspect of it - if you're writing something that has logic so simple it can be stuck in a shell file, why not just write it in a shell file? You don't need category theory to get a few files installed...
Nothing tricky about it.
Here's a pattern that comes up fairly frequently for me:
    foo | fgrep -v -f <(cut -f 2 info.csv) | bar
It uses the second column in info.csv as fixed strings to match inside lines in the output of foo, and filters them out, with the remaining lines going to bar.
All 4 processes (foo, bar, fgrep, cut) run concurrently. Likely fgrep will block on cut sooner or later, but the point is that multiple communicating concurrent processes are set up using a fairly easy to use DSL.
That's what a shell is, to me.
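For comparison, the filtering logic of that pipeline can be sketched in plain Haskell. This is not a replacement for the concurrent process setup being described: it is a single process, stdin plays the role of `foo` and stdout the role of `bar`. The helper names are made up, and `secondColumn` only approximates `cut -f 2` (it skips lines that have no tab, where `cut` would print the whole line).

```haskell
import Data.List (isInfixOf)

-- Split one line on tab characters (the default delimiter of `cut`).
splitOnTab :: String -> [String]
splitOnTab s = case break (== '\t') s of
    (field, [])       -> [field]
    (field, _ : rest) -> field : splitOnTab rest

-- Second tab-separated field of every line that has one (roughly `cut -f 2`).
secondColumn :: [String] -> [String]
secondColumn rows = [field | (_ : field : _) <- map splitOnTab rows]

-- Drop every line containing any of the fixed strings (roughly `fgrep -v -f`).
dropMatching :: [String] -> [String] -> [String]
dropMatching needles = filter (\line -> not (any (`isInfixOf` line) needles))

main :: IO ()
main = do
    info <- readFile "info.csv"
    -- Lazily stream stdin to stdout, minus the lines that match.
    interact (unlines . dropMatching (secondColumn (lines info)) . lines)
```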
This can be done with the "bracket" function, which works roughly like a context manager in Python:
    import Control.Exception
    import System.Directory
    withDirectory :: FilePath -> IO a -> IO a
    withDirectory path action =
        bracket (getCurrentDirectory <* setCurrentDirectory path)
                setCurrentDirectory
                (const action)
`turtle` provides `fork` for running a command in the background. Example usage:
    example = do
        using (fork commandToForkInAnotherThread)
        theseCommandsStillRunInTheOriginalThread
> How does setting up pipes work?
See the `inproc` and `inshell` commands, which let you convert any shell command into a stream transformation embedded within Haskell.
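The same shape (command plus standard input in, standard output out) can be tried without `turtle` using `readProcess` from the `process` package that ships with GHC. This is only an analogy, not how `inproc` is implemented: `readProcess` is strict and returns the whole output at once instead of a stream, and it assumes a `grep` executable is on the PATH.

```haskell
import System.Process (readProcess)

main :: IO ()
main = do
    -- Feed two lines to `grep b` on its stdin and capture its stdout.
    out <- readProcess "grep" ["b"] "a\nb\n"
    putStr out
```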
> What's the idiom for chdir'ing to a subdirectory such that you pop back out again when you're done (I'd use a subshell with (ch xxx; ...) in bash)?
You can write a combinator for this using `turtle` pretty easily:
    pushd newDir = do
        oldDir <- pwd
        cd newDir
        return (cd oldDir)
... and you use it like this:
    example = do
        popDir <- pushd "/tmp"
        ... do stuff ...
        popDir
> what's the equivalent of <() in bash?
`inproc`/`inshell` which let you read in a command's standard output as a stream
<(foo) in bash creates a fifo, and pipes the output of foo to the fifo. It then replaces the whole <(foo) argument with the path to the fifo. This means that commands that normally expect to read from a file on the command line can instead be wired to read their input from a process. And, of course, both processes run concurrently.
>(foo) does the same thing, except the other way around, for process output.
    inDir newDir action = do
        oldDir <- pwd
        cd newDir
        result <- action
        cd oldDir
        return result
Then you could use it like a Python `with` statement. :)
So you get a sort of cascading scope of lambdas where the result of each lambda is passed into the next. Each lambda depends on the evaluation of the previous one. Normally in Haskell functions are evaluated lazily; this structure forces sequential evaluation.
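The "cascading scope of lambdas" is what `do` notation desugars to. A small sketch with made-up names: `sugared` and `desugared` below are the same program, the second one written with explicit `>>=` and lambdas so the nesting is visible.

```haskell
import Data.Char (toUpper)

-- The familiar imperative-looking form.
sugared :: IO ()
sugared = do
    line <- getLine
    let shout = map toUpper line
    putStrLn shout

-- The same program after desugaring: each lambda receives the result of the
-- previous action, and everything after it lives inside that lambda's scope.
desugared :: IO ()
desugared =
    getLine >>= \line ->
    let shout = map toUpper line in
    putStrLn shout

main :: IO ()
main = desugared
```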
So what are these cd, mkdir, output etc functions? They return an object with a specific type called 'IO'. This type is monadic, but that's irrelevant for now. Haskell as you know has no side effects in the language itself. The IO type basically is a command pattern, it says "execute this I/O with these parameters".
The monadic aspect of IO makes it so that at the end the commands will have accumulated in a list, of which you can get an item if you give it the results of the previous item. So that's what the main function returns, a list of commands with some lazily evaluated Haskell code in between them. Now comes the side effect part. The Haskell runtime system iterates over the list of commands and executes them. The result of each command is used to get the next command in the list.
So that's the core of the magic trick of monadic I/O, you make a lazy list of I/O commands, and have something external to the language execute those I/O commands, giving the results back to the language to get the next I/O command to execute.
The end result in this case is something that just looks and feels completely imperative.
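The "commands as data, executed by something external" picture can be modelled in a few lines. This is a toy model only, under the assumption that two commands are enough to show the idea; GHC does not actually represent `IO` this way, and the `Script` type and its helpers are invented for illustration. `runPure` plays the role of the runtime: it walks the commands and feeds each result back to get the next one.

```haskell
-- Toy model: a script is either finished or one command plus "what comes next".
data Script a
    = Done a                        -- nothing left to do
    | PutLine String (Script a)     -- print a line, then continue
    | GetLine (String -> Script a)  -- read a line, then pick the next command

instance Functor Script where
    fmap f (Done a)      = Done (f a)
    fmap f (PutLine s k) = PutLine s (fmap f k)
    fmap f (GetLine k)   = GetLine (fmap f . k)

instance Applicative Script where
    pure = Done
    sf <*> sa = sf >>= \f -> fmap f sa

instance Monad Script where
    Done a      >>= f = f a
    PutLine s k >>= f = PutLine s (k >>= f)
    GetLine k   >>= f = GetLine (\line -> k line >>= f)

putLine :: String -> Script ()
putLine s = PutLine s (Done ())

readLine :: Script String
readLine = GetLine Done

-- Looks imperative, but only builds a data structure.
greet :: Script ()
greet = do
    putLine "Name?"
    name <- readLine
    putLine ("Hello, " ++ name)

-- The "runtime": executes commands against canned input, collecting output.
runPure :: [String] -> Script a -> ([String], a)
runPure _ (Done a) = ([], a)
runPure inputs (PutLine s k) =
    let (out, a) = runPure inputs k in (s : out, a)
runPure (i : rest) (GetLine k) = runPure rest (k i)
runPure [] (GetLine k) = runPure [] (k "")

main :: IO ()
main = mapM_ putStrLn (fst (runPure ["world"] greet))
```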
but if you were to try and call, say, the "rmdir" function inside another function that didn't have an IO return type, you'd get a compile error. (More specifically, you could technically call the function, you just couldn't return the "IO" value as a result, so it couldn't perform any actions).
:)
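A minimal sketch of that point, with a made-up function name: a pure function can mention an `IO` value, but since its return type is not `IO` the value can never be handed to the runtime, so nothing happens.

```haskell
-- A pure function: its type has no IO in it.
describe :: Int -> String
describe n =
    -- This builds an IO value but never runs it; it is just inert data here.
    let _unused = putStrLn "never printed"
    in "n = " ++ show n

main :: IO ()
main = putStrLn (describe 3)
```

Running it prints only `n = 3`.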
So I disagree with the claim "without anything indicated by the function main", and would amend to "without anything indicated explicitly by the source code, leaving the only indication in the inferred type".
    do
        value1 <- function1 --effectful function
        let value2 = function2(value1) --pure function
        value3 <- function3(value1, value2) --effectful function
        ...
But the notation is a bit more magic for this "no return" case; I prefer the Scala approach where even if you don't care about the return values you'd have to write this as
    for {
        _ ← cd "/tmp" // effectful function
        _ ← mkdir "test" //effectful function
        _ = someCalculation() //pure function
        ...
But the "let" vs no "let" is a pretty strong hint anyway :)
Like everything, you have to learn to balance your IO code and your pure code. A bit like learning when to factor something into a separate class, or leave it in a few statement/methods. If you write everything in IO & do notation, you don't get the main benefits of haskell. But if your code is more than 10 lines long, chances are that there will be useful pure functions in it.
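As a rough illustration of that balance, here is a sketch of a tiny script with a thin `IO` shell around a pure core. The function names are made up, and it assumes every line of standard input is an integer (`read` will throw otherwise).

```haskell
-- Pure: count and sum a list of numbers.
summarize :: [Int] -> (Int, Int)
summarize xs = (length xs, sum xs)

-- Pure: render the summary for display.
render :: (Int, Int) -> String
render (count, total) = show count ++ " numbers, total " ++ show total

-- Effectful shell: read stdin, call the pure core, print the result.
main :: IO ()
main = do
    contents <- getContents
    putStrLn (render (summarize (map read (lines contents))))
```

The two pure functions can be tested or reused without touching any files or streams.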
    def main_function(args):
        data = get_data(args)
        result = do_calculations(data)
        push_results(result, args)
Where the function do_calculations is somewhat pure - no side effects, but I do use local variables that I modify inside the function.
> You are right. In this case effects are not isolated. But in this particular script, there are no interesting things to move into a pure function. It does not mean that it wouldn't be the case in a more complex script.
Well, I thought that the point of Haskell (or one of its points) is that it forces the programmer to declare whatever side effect in the type of the function. But here, there is no way to know that main, on top of printing stuff, also messes with the directory, and there is no type signature indicating it - in this example it's no big deal but I could write something like:
    main = do
        rm "/"
        sleep 1
        die "Oooops my files!"
I don't think it's reasonable to assert that Ada "failed" (e.g. it runs on large passenger airplanes), but in any case that's kinda beside the point, TFA isn't primarily about Haskell evangelism/advocacy.
> language pragma, do notation, liftIO, parser combinators.
Arguably, all of this is within the reach of an intermediate-level Haskell programmer. OverloadedStrings is considered a basic pragma.
I think the use of `liftIO` is a reasonable objection. When I wrote the library I had the choice of automatically pre-wrapping all `IO` commands with `liftIO` for the user (making them all `Shell` commands) by default. However, I decided not to do that for two reasons:
* If you do that you can't use them outside of a `Shell` any longer
* The user has to learn `liftIO` anyway if they want to use `IO` actions not provided by the `turtle` library. I didn't want to teach the user a leaky abstraction
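For readers who have not met `liftIO`, here is a minimal sketch of what it does. It does not use `turtle`: `StateT Int IO` from the `transformers` package (shipped with GHC) stands in for `Shell`, and the function name `countLines` is made up. Plain `IO` actions such as `putStrLn` stay usable on their own and are lifted only where they are used inside the richer monad.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, execStateT, modify)

-- A computation in a monad that is "IO plus something" (here: a counter).
countLines :: StateT Int IO ()
countLines = do
    liftIO (putStrLn "first line")   -- ordinary IO action, lifted
    modify (+ 1)
    liftIO (putStrLn "second line")
    modify (+ 1)

main :: IO ()
main = do
    putStrLn "plain IO still works outside the transformer"
    n <- execStateT countLines 0
    putStrLn ("lines printed inside: " ++ show n)
```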
I don't see any issue with `do` notation. Same thing with parser combinators, which are just strings in the simple case, and the "Patterns" section of the tutorial has a table showing you how to convert regular expression idioms to `Pattern`s:
http://hackage.haskell.org/package/turtle-1.0.0/docs/Turtle-...
The language pragma is sort of a grey area. I decided to keep it because it doesn't take a long time to explain and it significantly increases the usability of the library.
C goes hand-in-hand with UNIX, so clearly no UNIX vendor would have it in their SDK and UNIX developers weren't willing to pay for tools.
As history has shown, the moment UNIX vendors started doing "Home" and "Pro" editions, GCC got lots of help.
As Ada talks at FOSDEM show, it is present everywhere where safety matters and its use has been slowly increasing since the Internet has shown how bad idea is to connect C code to the outside world.
Also, it's not just Unix that's written in C or a descendant - there's also, well, Windows, and a load of embedded RTOSes.
If Ada made you as productive as C with extra benefits or something to that effect, you'd expect Ada to succeed at the marketplace at a scale at least comparable to C's - especially with the government support it had which put C at a disadvantage, not?
I remember something in "Mythical Man-Month" that extolled the virtues of scripting programming for concept exploration, and I've often felt this is one of the major advantages traditionally scriptable languages have over compiled. Once you can run a program without compiling it, iteration tends to go faster.
So sure, some languages require more boilerplate to get started than others, but I've got templates for that, and I happily scripted almost all the exercises in "Thinking in C++" because it just made working them out faster, even in emacs where I can bind the compile key to any command I can dream of.
There are a number of potential approaches for coming close to the ease of shell scripting. One is options records as others have mentioned. For defaults you can have a Default instance (no, that's not boilerplate because you would have had to specify the defaults somewhere anyway). Then there is plenty of room for infix operator combinators to make it easier to change individual options. A second option could be to put options into a string that would get parsed into a record. You could use patterns similar to those used in existing command line argument processors like optparse-applicative. Or, if you don't like that, then maybe a quasiquote could give more power.
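The options-record approach might look roughly like this. It is a sketch with invented names (`GrepOptions`, `defaultGrep`, `toArgs`), and a plain default value stands in for a `Default` instance to keep it dependency-free; the flags it renders (`-r`, `-i`, `-m`) are the usual grep ones.

```haskell
-- Hypothetical options record for a grep wrapper.
data GrepOptions = GrepOptions
    { grepRecursive  :: Bool
    , grepIgnoreCase :: Bool
    , grepMaxCount   :: Maybe Int
    } deriving (Show, Eq)

-- Defaults live in one place; callers override only what they need.
defaultGrep :: GrepOptions
defaultGrep = GrepOptions
    { grepRecursive  = False
    , grepIgnoreCase = False
    , grepMaxCount   = Nothing
    }

-- Render the record as command-line flags.
toArgs :: GrepOptions -> [String]
toArgs opts = concat
    [ ["-r" | grepRecursive opts]
    , ["-i" | grepIgnoreCase opts]
    , maybe [] (\n -> ["-m", show n]) (grepMaxCount opts)
    ]

main :: IO ()
main = print (toArgs defaultGrep { grepRecursive = True, grepMaxCount = Just 50 })
```

The prefixed field names are exactly the kind of boilerplate other commenters object to; the sketch does not hide that cost.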
Do these things require some boilerplate? Yes. We know that is going to be required since Haskell wasn't designed for the convenience that shells were designed for. But that's fine in this case because the potential benefits are huge.
The issue is that, if you want to simulate both "grep" and "grep -r", you need two different functions, or you need to have your "grep" function accept a record of parameters.
Which you need to prefix to avoid clashes.
> If they are exclusive, you can construct a type for them.
Which is going to end up being a record, which:
- is awkward to build (compared to just giving options to a command or arguments to a function)
- will most likely need to be an instance of Default
- which needs to have its fields prefixed to avoid clashes
Starts to sound like an awful amount of boilerplate.
So you might want to use a different language - even for purely/mostly personal use, in which case Haskell would be fine.
[1] https://github.com/ValveSoftware/steam-for-linux/issues/3671
Now if you don't know about haskell and want to write a quick and short lived script, there is 0 value in writing it in haskell. However, if you happen to know a bit of haskell and your script is likely to be used several times, you might find some benefits to this.
- haskell is quick to write and the code can be quite terse. You can create a myscript.hs file and run it with runhaskell. Zero platform complexity overhead.
- you get the benefits of static types which are easier maintenance and refactoring.
- if it evolves in anything more complex, it is easy to move it in a cabal project.
- if you need to do something cpu intensive, you can compile/profile/improve perfs.
Many of us who use Haskell do so (without an academic degree or almost any CT knowledge, by the way) because it offers the best bang for our buck -- less code, more safety, more stuff done and done well! It also runs reasonably fast, unlike similarly terse languages.
Being able to use it in a light-weight manner for one-off scripts is nice too.
Every few years someone writes a retarded install script that wipes your drive, it's like it's inevitable
    fmap (either id id) (both ...)
... which is equivalent to:
    x <- both ...
    return (case x of
        Left txt -> txt
        Right txt -> txt)
That removes the `Either` tag and fuses them into a single stream.
Unfortunately, I don't know how to get the name of the device file associated to the pipe, and I need it in order to pass it as an argument to the reading process :(
    -- Note, the flow is right-to-left, not left-to-right
    inshell "bar" (inshell "fgrep ..." (inshell "foo" empty))
Or you can just embed the entire thing within a single `inshell` command:
    inshell "foo | fgrep ... | bar"
The reason this works is that the type of `inshell` is:
    inshell
        :: Text        -- Command line
        -> Shell Text  -- Standard input to feed
        -> Shell Text  -- Standard output from command
This lets you stream to any shell command's input and read the command's output, also as a stream.
Probably, but using C dialects and certification processes that make C just look like Ada with another syntax.
http://www.misra-c.com/MISRAChome/tabid/181/Default.aspx
http://en.wikipedia.org/wiki/DO-178B
http://www.programmingresearch.com/solutions/medical-devices...
> Also, it's not just Unix that's written in C or a descendant - there's also, well, Windows, and a load of embedded RTOSes.
Windows did not exist when UNIX was created.
MS-DOS was based on CP/M which copied ideas from UNIX into home computers. So while C didn't have a special place in home computers, UNIX was gaining adoption in the enterprise; even Microsoft had their own UNIX, Xenix.
Which they used to cross compile some of their MS-DOS applications.
So it was only natural that when they started developing Windows, they used their in-house languages and both Quick Basic and Quick Pascal were not that up to the task, leaving C as the option.
Embedded RTOS are traditionally POSIX compliant, which leads again to C.
Microsoft is actually moving away from C, this is why they don't care about compliance any longer and speak about C++ and .NET Native.
Even their latest C99 related changes are only related to what ANSI C++11/14 require and a few key open source projects that they wanted to see supported.
Which is kind of funny, because Microsoft was the last C compiler vendor in the home computing space, to add a C++ compiler to their tools, with Microsoft C/C++ 7.0.
> If Ada made you as productive as C with extra benefits or something to that effect, you'd expect Ada to succeed at the marketplace at a scale at least comparable to C's - especially with the government support it had which put C at a disadvantage, not?
Not if people are expected to pay for the compilers.
And it's a user-friendly shell with all the tab completions and history searches, which DOES NOT BELONG IN /bin/sh!
I unfortunately had to switch to Linux a few years ago (after using FreeBSD for almost a decade) and I still miss how consistent and well laid out BSDs seem in comparison.
Even though Haskell doesn't distinguish different classes of IO actions, it still distinguishes IO actions from other kinds of actions (such as stateful actions as per your example), and pure computations and that provides a hell of a lot of bang for buck.
The Idris language has the notion of effect types [1] to make achieving the goal of categorising the kinds of effects being used in a function easier to deal with, but that uses the dependent capabilities of the language.
[1] http://eb.host.cs.st-andrews.ac.uk/drafts/eff-tutorial.pdf
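Plain Haskell can approximate this kind of effect categorisation with type classes, without dependent types. The following is only a sketch of that general technique, not of Idris's effects library: the class names `MonadConsole` and `MonadFS` are invented here, and the `IO` instance of `removeDir` merely prints instead of deleting anything. The constraint list in each signature then states which kinds of effects a function may use.

```haskell
-- Hypothetical capability classes: one per kind of effect.
class Monad m => MonadConsole m where
    say :: String -> m ()

class Monad m => MonadFS m where
    removeDir :: FilePath -> m ()

-- Real IO provides both capabilities (removeDir only pretends, for safety).
instance MonadConsole IO where
    say = putStrLn

instance MonadFS IO where
    removeDir path = putStrLn ("(pretending to remove " ++ path ++ ")")

-- The signature promises: console output only, no filesystem access.
greet :: MonadConsole m => m ()
greet = say "hello"

-- The signature admits: this one may also touch the filesystem.
cleanup :: (MonadConsole m, MonadFS m) => m ()
cleanup = do
    say "cleaning up"
    removeDir "test"

main :: IO ()
main = greet >> cleanup
```

Calling `removeDir` inside `greet` would be rejected by the type checker, because `greet` only asks for `MonadConsole`.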
I use them in every single application I write, or I make my own tighter, more specific ones. They're incredibly useful in real world apps.
Also, the setters are bad ("record {field = val}" looks nice but is of limited use, as it is not a function).
I hope the developers of GHC also clearly see the problem and will someday work on it. Then the language would become much more expressive.
Does that task exist somewhere on the roadmap?
Today, typeclasses and lenses cover 99% of "the record problem" as far as I've experienced.
> is awkward to build (compared to just giving options to a command or arguments to a function)
Idiomatic bash: apt-get install package_name
Idiomatic haskell: apt_get $ AptInstall package_name
What is so awkward about that?
You could extend your argument structure for this, but then you need to specify every argument all the time, or have the user modify a default value. This is definitely awkward compared to straight shell.
In particular, I think they're more useful in applications than libraries and most Haskell code you can find in the wild is library code---so you end up not seeing them much.
    import Prelude hiding ((-))
    data Grep = Grep {isRecursive :: Bool, maxCount :: Maybe Int} --etc
        deriving (Show)
    grep = Grep False Nothing
    -- short aliases
    r :: Grep -> Grep
    r command = command{isRecursive = True}
    m :: Int -> Grep -> Grep
    m num command = command{maxCount = Just num}
    (-) :: a -> (a -> a) -> a
    (-) command flag = flag command
    ourGrep = grep -m 50 -r
    main = print ourGrep -- > Grep {isRecursive = True, maxCount = Just 50}
    -- then we should write a monad which executes that data
    grep "foo"
        & "recursive" <~ True
        & "maxCount" <~ 100
in a typesafe way. It just probably wouldn't be worth the complexity. It also probably couldn't be a straight `IO` action then, which was a design constraint of Gabriel's.
    example = do
        file <- lstree "some/dir"
        True <- liftIO (testfile file)
        grep "Some pattern" (input file)
This is an example of how most of Bash's option-heavy ecosystem is an outgrowth of Bash's limitation as a language (individual commands accumulate flags to work around functionality difficult to implement within the host language). I think having a decent host language decreases the need for so many configuration knobs for every command.
Something like OCaml would be better suited, since polymorphic variants, named and default arguments give a lot more flexibility, though the fact that shell commands happily return different outputs depending on their options would still be an issue.
    find . -exec grep $pattern {} \;
but `grep -r` is easier to write.
...for Unix' insistence on composability, the shell tools are often unnecessarily monolithic, probably because that's the only sane way if the only type you have in interconnect is `string`.
Here, you'd need to either parametrize the return type of ls to get simple strings (which is what you want most of time) or additional metadata, or alternatively to have different ls commands.