Froid: Optimization of Imperative Programs in a Relational Database [pdf]

RMarcus 7 years ago

I've only read part of it, but it seems great so far! I always appreciate the clarity and practicality y'all at the JGL take.

I'm amazed that the implementation was under 1500 LOC! Was that the research prototype or the shipped preview?

Congratulations on the VLDB paper! Hopefully I'll come say "hi" in LA :)

karthiksr 7 years ago

Thank you.

The shipped preview has only a bit more than 1500 LOC.

The VLDB paper was already presented in Rio in August this year, but I'll try to come over to LA anyway :)

maslam 7 years ago

Karthik, I'm no Spark expert, but almost all the advice I read is to avoid UDFs if at all possible. Examples below:

- https://medium.com/teads-engineering/spark-performance-tunin...
- https://www.inovex.de/blog/efficient-udafs-with-pyspark/

karthiksr 7 years ago

Thank you for those pointers.

There are definitely some differences between the kind of UDFs that Spark supports and the kind that Froid handles. For one, Spark UDFs cannot invoke a Spark SQL query in their definition AFAIK, whereas T-SQL functions can. But still, some techniques might be applicable. Definitely worth digging further!

RMarcus 7 years ago

Doh! Guess I should've checked. I didn't make it to Rio last year... Figured I was gonna miss a bunch of good stuff.

maslam 7 years ago

Thank you for the paper - it is well-written and succinct. Karthik, do you think this approach can be applied to Apache Spark as well (given its well-known slowness with UDFs)?

karthiksr 7 years ago

Thank you. Conceptually, the ideas behind Froid follow from relational algebra, so they can be applied to other relational engines as well. However, the details still need to be figured out before making any concrete statement.
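
As a loose illustration of that general idea (not Froid's actual algorithm, and not code from the paper), here is a minimal Python sketch using the standard-library `sqlite3` module. The tables `orders` and `customers` and the function `total_price` are hypothetical. It compares calling an imperative function once per row, where each call runs its own query, against writing the same logic as a subquery inside the calling query, which lets the engine plan everything as one relational expression.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custkey INTEGER PRIMARY KEY);
    CREATE TABLE orders (custkey INTEGER, totalprice REAL);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")


def total_price(custkey: int) -> float:
    """Imperative-style function: issues its own query on every call."""
    row = conn.execute(
        "SELECT SUM(totalprice) FROM orders WHERE custkey = ?", (custkey,)
    ).fetchone()
    return row[0]


# Row-at-a-time evaluation: one extra query per customer row.
keys = [r[0] for r in conn.execute("SELECT custkey FROM customers ORDER BY custkey")]
per_row = [(k, total_price(k)) for k in keys]

# Set-oriented equivalent: the function body expressed as a subquery
# inside the calling query, so the engine sees a single statement.
inlined = conn.execute("""
    SELECT c.custkey,
           (SELECT SUM(o.totalprice) FROM orders o WHERE o.custkey = c.custkey)
    FROM customers c
    ORDER BY c.custkey
""").fetchall()

# Both formulations produce the same rows.
assert per_row == inlined
```

Both versions return the same rows; the difference is that the second hands the whole computation to the query engine in one statement instead of invoking it once per row.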

If you could share any pointers about UDFs and their performance problems in Spark, I would love to investigate more.

prince617 7 years ago

You might want to check out this related work: http://casper.uwplse.org

karthiksr 7 years ago

Thank you. Casper is very interesting work, and I am aware of it. Program synthesis offers an alternative approach to such problems, with different trade-offs and characteristics.

The paper includes a brief discussion on synthesis-based techniques, and the reasoning behind Froid's design choices.

gigatexal 7 years ago

Why does the first example return price as a char? Looking forward to reading the paper fully. I just scanned it.

karthiksr 7 years ago

It returns a formatted string including the price and the currency code, e.g. "5000 USD".
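
For illustration only, here is a minimal Python sketch of a function with that kind of return value. The name `format_price` and its parameters are hypothetical and not taken from the paper. It only shows why the result type is a string rather than a number.

```python
def format_price(amount: float, currency_code: str) -> str:
    """Combine a numeric price and a currency code into one string."""
    # Whole units only, to mirror the "5000 USD" example in the thread.
    return f"{amount:.0f} {currency_code}"


print(format_price(5000, "USD"))  # prints "5000 USD"
```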