Julia Macros for Beginners (jkrumbiegel.com)
pg's article on the topic, "What Made Lisp Different" [0], has aged poorly: points 8 and 9 (a notation for code using trees of symbols, and the whole language always available) are no longer Lisp-specific. The final point, about "inventing a new dialect of Lisp", doesn't hold either. As seen here, Julia is doing just fine without claiming to be another dialect of Lisp, even though many sources say directly that it's Lisp-inspired.
Congrats to Julia people for the macro system and to the author for the article!
There is also a hidden option to get into a Lisp REPL in Julia: `julia --lisp`.
Wtf... what? I just tried it, and it's true. Is this some Easter egg?
After that, I realized that macros aren't always something that needs to be avoided; in the right hands they're immensely powerful.
I've only played a little with Julia macros, but it seems like they learned a lot of Lisp's lessons, so I support it wholly.
[1] I wasn't aware of how Objective-C was built at the time.
Maybe it's better now, but when I looked at macros ~5 years ago, a language update changed the AST produced by the parser and I basically gave up.
I like that Julia offers some macro-like techniques that replace a lot of the cases where one might use a macro for performance reasons.
For example, maybe you want to handle something that looks like:
foo ~~> bar
In Lisp syntax (and recall that this is what the Julia AST is: everything is a head followed by arguments), it might look like:
(~~> foo bar)
; or
(op ~~> foo bar)
But if you change it to, e.g.,
foo ~~> bar + 5
you might get
(+ (~~> foo bar) 5)
or
(~~> foo (progn (+ bar 5)))
I don't remember what you got or which cases were tricky, only that I could never guess what the output of `dump` would be.
Was the language even stable then?
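To see how operator precedence changes the tree the parser produces, you can parse a string and print the result. A minimal sketch using `Meta.parse`, `Meta.show_sexpr` and `dump` from Base; it swaps in the existing operators `*`, `→` and `-->` for `~~>`, since how `~~>` itself was parsed isn't shown in the thread.

```julia
# Same right-hand side, different operator on the left.
ex1 = Meta.parse("foo * bar + 5")   # `*` binds tighter than `+`
ex2 = Meta.parse("foo → bar + 5")   # `→` (arrow precedence) binds looser than `+`
ex3 = Meta.parse("foo --> bar")     # `-->` gets its own head instead of :call

# Print each tree in Lisp-like S-expression form.
for ex in (ex1, ex2, ex3)
    Meta.show_sexpr(ex)
    println()
end

# `dump` spells out every field of the same tree.
dump(ex2)
```

The first two trees have opposite shapes even though the source text differs by one character, which is exactly the kind of thing that is hard to guess without printing it.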
While non-standard evaluation (NSE) enables the dplyr syntax that many people enjoy, for me it's too magical, and I have trouble reasoning about variable names in other people's code.
Consider how ergonomic testing is thanks to macros: https://docs.julialang.org/en/v1/stdlib/Test/
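Because `@test` is a macro, it receives the expression itself rather than only its value, so a failing test can report the original expression alongside what it evaluated to. A minimal sketch with the `Test` standard library; `clamp01` is a hypothetical function under test.

```julia
using Test

# Hypothetical function under test: clamp a real number into [0, 1].
function clamp01(x::Real)
    isnan(x) && throw(ArgumentError("x must not be NaN"))
    return min(max(x, 0), 1)
end

@testset "clamp01" begin
    @test clamp01(0.5) == 0.5
    @test clamp01(-2) == 0
    @test clamp01(3) == 1
    # @test_throws checks that evaluating the expression throws this exception type.
    @test_throws ArgumentError clamp01(NaN)
end
```

If, say, the third assertion failed, the report would quote `clamp01(3) == 1` and show the evaluated comparison, with no hand-written failure message needed.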
Here's an example of passing quasi-JSON to a plotting function: https://www.queryverse.org/VegaLite.jl/stable/userguide/vlpl... . This lets you essentially transliterate a Vega-Lite spec into Julia instead of having to rewrite it as Julia function calls.
Finally, macros that operate on data frames let you write code that looks kind of like SQL and is much more pleasant than working with functions: https://dataframes.juliadata.org/stable/man/querying_framewo...
However, macros are useful for providing nicer syntax and DSLs.
Some examples: https://stackoverflow.com/questions/58137512/why-use-macros-... https://www.juliafordatascience.com/animations-with-plots-jl... https://gist.github.com/MikeInnes/8299575
(define-all i 5 0)
;; creates i1 i2 i3 i4 i5 initialized to 0
That's more or less impossible with functions. The closest you get is either an array/dict with only runtime error checking, or an external codegen program.
I wrote a post [0] about how to do this in Racket. The macro generates ORM code based on a given SQLite DB; in other words, the compiler queries SQLite and generates table-column functions automatically.
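For comparison, the `define-all` example above can be sketched as a Julia macro. This is a hypothetical translation (the name `@define_all` is made up), assuming the prefix is a bare symbol and the count is an integer literal known at expansion time.

```julia
# Hypothetical Julia counterpart of the Racket `define-all` macro:
# `@define_all i 5 0` expands to `i1 = 0; i2 = 0; ...; i5 = 0`.
macro define_all(prefix::Symbol, n::Int, init)
    # One assignment per generated name. `esc` makes the new names belong
    # to the caller's scope instead of being renamed by macro hygiene.
    # Note: the `init` expression is copied into every assignment.
    assignments = [:($(esc(Symbol(prefix, k))) = $(esc(init))) for k in 1:n]
    return Expr(:block, assignments...)
end

@define_all i 5 0

# Inspect the code the macro generates for a smaller case.
println(@macroexpand @define_all i 3 0)
```

The names `i1` through `i5` exist as ordinary variables after expansion, so a typo such as `i6` is an undefined-variable error rather than a bad lookup key in an array or dict.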
More potential benefits are: Better static error messages (can implement a type system using macros, example here[1]), and controlling execution order (can add lazy computation semantics).
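The point about controlling execution order comes down to a macro receiving unevaluated expressions. A minimal sketch; Julia's built-in `||` already short-circuits, so `@or_lazy` and `expensive` are hypothetical names used only to show the mechanism a user-defined macro has and a function lacks.

```julia
# A function receives arguments that have already been evaluated...
or_fn(a, b) = a ? a : b

# ...whereas a macro receives the expressions themselves, so its expansion
# can decide whether the second one is evaluated at all.
macro or_lazy(a, b)
    return :($(esc(a)) ? true : $(esc(b)))
end

# Hypothetical side-effecting function that counts how often it runs.
const calls = Ref(0)
expensive() = (calls[] += 1; false)

or_fn(true, expensive())     # expensive() runs before or_fn's body is entered
@or_lazy true expensive()    # the expansion never reaches expensive()
```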
[0]: http://tech.perpetua.io/2022/01/generating-sqlite-bindings-w...
[1]: https://gist.github.com/srcreigh/f341b2adaa0fe37c241fdf15f37...
Suppose you have
df = tibble(a = c(1, 2))
and you want to use a dplyr verb to modify it:
mutate(df, b = a + 1)
The `a` in the above expression refers to the column in `df`, but this means it's hard to reference a variable in the outer scope named `a`. Furthermore, if you have a string referring to the column name `"a"`, you can't simply write
mutate(df, b = a_var + 1)
Contrast this with DataFramesMeta.jl, which is a dplyr-like library for Julia, written with macros.
using DataFrames, DataFramesMeta
df = DataFrame(a = [1, 2])
@transform df :b = :a .+ 1
Because of the use of Symbols, there is no ambiguity about scopes. To work with a variable referring to column `a`, you can write
a_str = "a"
@transform df :b = $a_str .+ 1
I won't pretend this isn't more complicated or harder to learn. Some of the complexity comes from Julia's focus on performance, which limits non-standard evaluation in subtle ways. But a core strength of Julia's macros is that it's easy to inspect these expressions and understand exactly what's going on, with `@macroexpand` as shown in the blog post.
DataFramesMeta.jl repo: https://github.com/JuliaData/DataFramesMeta.jl
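The `:column` versus bare-identifier rule described above can be illustrated with a toy macro that needs no data frame package. This is a hypothetical mini-DSL, not how DataFramesMeta.jl is implemented: `@with_cols` and `rewrite_cols` are made-up names, and a `Dict` stands in for a data frame.

```julia
# Hypothetical mini-DSL: inside `@with_cols tbl expr`, a quoted symbol `:x`
# means "column x of tbl", while a bare identifier `x` keeps its usual meaning.
rewrite_cols(ex, tbl) = ex                      # literals and bare symbols: unchanged
rewrite_cols(ex::QuoteNode, tbl) =
    ex.value isa Symbol ? :($tbl[$ex]) : ex     # :x  becomes  tbl[:x]
rewrite_cols(ex::Expr, tbl) =
    Expr(ex.head, (rewrite_cols(arg, tbl) for arg in ex.args)...)

macro with_cols(tbl, ex)
    # Escape the whole result so `tbl` and every bare identifier
    # resolve in the caller's scope.
    return esc(rewrite_cols(ex, tbl))
end

tbl = Dict(:a => [1, 2])   # stand-in for a data frame
a = 100                    # an outer variable that happens to be named `a`

@with_cols tbl :a .+ a     # column `a` plus the outer variable `a`

# The expansion is plain Julia code and can be inspected directly.
println(@macroexpand @with_cols tbl :a .+ a)
```

Because the rule is purely syntactic, there is never a question of whether `a` means the column or the variable: the printed expansion reads `tbl[:a] .+ a`.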
To refer to an outer-scope `a`, you can write
mutate(df, b = .env$a + 1)
And if you have a string (contained in a_var) that identifies a variable, you can do
mutate(df, b = .data[[a_var]] + 1)
You could argue these feel clumsy, but I wouldn't say it's "hard" to do either of these things with dplyr.
However, both patterns are yet another special case of how identifiers are resolved in the expression. Aren't `.env` and `.data` both valid variable and column names? So what happens if I have a column named `.data`?
Another example, which is the reason why we chose the `:column` style to refer to columns in `DataFramesMeta.jl` and `DataFrameMacros.jl`:
What happens if you have the expression `mutate(df, b = log(a))`? Both `log` and `a` are symbols, but `log` is not treated as a column. Maybe that's because it's used in a function-like fashion? Maybe because R looks at the values of `log` and `a` in their scope and sees that `log` is a function and `a` isn't?
In Julia DataFrames, it's totally valid to have a column that stores different functions. With dplyr-like syntax rules, it would not be possible to express a call to a function stored in a column, if the rule really is that function-call syntax means a symbol is no longer looked up in the data frame.
In Julia's DataFrameMacros.jl, for example, if you had a column named `:func`, you could write `@transform(df, :b = :func(:a))`, and it would be clear that `:func` resolves to a column.
This particular example might seem like a niche problem, but it's just one of these tradeoffs that you have to make when overloading syntax with a different meaning. I personally like it if there's a small rule set which is then consistently applied. I'd argue that's not always the case with dplyr.
.data[[a_var]]?
But in general, yeah, R plays pretty fast and loose with scopes, and lets you capture expressions as arguments and evaluate them in a scope other than the one they were written in.
Personally, I'll happily give up being able to use those as column names if it means I can avoid typing `:` before every in-data variable, but your comment gave me a better understanding of why it would be bad for someone else or another scenario, perhaps one where short-term ease of use is lower on the list of priorities.
For your second example, it doesn't come up in R because a data frame column cannot be a function. Columns must be vectors (including lists), and you could have a vector where one or all elements are functions, but the column itself cannot be a function (functions are not vectors), so there's no ambiguity there. To call a function stored in your data frame you'd have to access an element of the column, and any access method, e.g. `[[` or `$`, would make the resulting set of characters invalid as the name of an object (without backticks, which would then disambiguate the intent):
library(dplyr)
df <- tibble(x = list(function(x) x + 1))
df %>%
  mutate(y = x[[1]](3))
Separate from dplyr, in R, when you use `(` to call a function, R searches only for functions by that name:
log <- 3
log(1)
# 0
frog <- 3
frog(3)
# Error in frog(3) : could not find function "frog"
log <- function(x) x^2
log(1)
# 1
I agree it's unlikely that a user will name their column `.data`. But it certainly saves developers the effort of thinking about these issues.
The larger concern, really, is that Julia needs to know which things are columns and which things are variables in an expression at parse time in order to generate fast code for a DataFrame. It needs to do this without inspecting the data frame, since the data frame's contents aren't known at parse time.
One option would be to treat all bare symbols as columns. But then you run into issues with things like `missing`, which would have to be escaped or not recognized as a column. It's hard to predict all the problems there, and any escaping rules would definitely have to be more complicated than R's. So we require `:` and take the easy way out, which has the added benefit of helping new users who might otherwise be confused by the variable-column distinction.
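The point about knowing the columns at parse time, without looking at the data, can be sketched in a few lines. In this hypothetical `@colfn` macro, the column names are read off the syntax (every `:name`) during macro expansion, the expression becomes an anonymous function of those columns, and that function is then called on the actual column vectors, so Julia compiles it for their concrete types. This is roughly the idea behind DataFramesMeta.jl generating `source => function => destination` calls, though the real package does much more; `argname`, `rewrite`, `@colfn`, the `col_` prefix and the `Dict` standing in for a data frame are all assumptions of this sketch.

```julia
# Hypothetical naming scheme: column `:x` becomes the argument `col_x`.
# (A real implementation would use `gensym` to avoid clashing with user names.)
argname(c::Symbol) = Symbol("col_", c)

# Replace each `:col` with its argument name and record the column names.
# Only the syntax is inspected; no data is needed at this point.
rewrite(ex, cols) = ex
function rewrite(ex::QuoteNode, cols)
    ex.value isa Symbol || return ex
    ex.value in cols || push!(cols, ex.value)
    return argname(ex.value)
end
rewrite(ex::Expr, cols) = Expr(ex.head, (rewrite(a, cols) for a in ex.args)...)

macro colfn(tbl, ex)
    cols = Symbol[]
    body = rewrite(ex, cols)                     # column names are known here
    fn = Expr(:->, Expr(:tuple, map(argname, cols)...), Expr(:block, body))
    # Pass each column vector to the anonymous function, which is compiled
    # for the concrete vector types it receives.
    colvecs = [:($tbl[$(QuoteNode(c))]) for c in cols]
    return esc(Expr(:call, fn, colvecs...))
end

tbl = Dict(:a => [1, 2], :b => [10.0, 20.0])
@colfn tbl :a .+ :b
```

Note that `missing`, `log` or any other bare identifier inside the expression passes through untouched, which is the simplification that requiring `:` buys.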