Why Swift for TensorFlow? (github.com)
Edit: I wonder if Swift could be replaced with Rust for iOS development?
We believe that Rust supports all the ingredients necessary to implement the techniques in this paper: it has a strong static side, and its traits system supports zero-cost abstractions which can be provably eliminated by the compiler. It has a great pointer aliasing model, a suitable mid-level IR, a vibrant and engaging community, and a great open language evolution process.
A concern with using Rust is that a strong goal of this project is to appeal to the entire TensorFlow community, which is currently pervasively Python based. We love Rust, but it has a steep learning curve that may exclude data scientists and other non-expert programmers who frequently use TensorFlow. The ownership model is really great, but mostly irrelevant to the problems faced by today’s machine learning code implemented in Python.
From my point of view, a weak justification is worse than no justification in cases like this.
Rust is much better suited to this task than Swift from a technical point of view. The far superior platform support for Windows and Linux is reason enough to say so, since very few data scientists will be training models on macOS. However, that's only one of several areas where Swift has shortcomings for a project like this. Swift is great for iOS and macOS development, of course, since it was designed for that. I don't think Swift is a bad language by any means, and with enough effort, it can be reshaped to be good for TensorFlow... the GitHub document just provides zero useful justification for the work required to make it good for TensorFlow.
EDIT: to some of the replies talking about Rust's learning curve, that mostly applies when you start trying to design efficient, interlinked data structures involving ownership. For most applications of machine learning, this simply wouldn't be a problem. The library would provide the data structures, you just have to use them. Rust can provide simple interfaces to complicated things.[3] The compiler's error messages are usually incredibly helpful.
The learning curve of Rust should not be relevant here, compared to Swift, which is also full of idiosyncrasies. Swift and Rust both have a large learning curve for someone coming from Python. This is because they're statically typed languages that are just different from a scripting language. For an application like this, I would say those learning curves are roughly equal at the language level, but as I pointed out in my comments, Swift has an enormous learning curve of requiring many data scientists to either install and learn Linux, or throw out their current computer, buy a Mac, and learn macOS.
My point here is not that Rust is the most suitable language for Tensorflow (although it could be), but rather I'm making the point that Rust is more suitable than Swift for a project like this, and therefore this document is just annoying. It would be better for them to delete this document and just say "we're using Swift because our team has a lot of experience with it and because the creator of Swift is leading this project, so we would lack enthusiasm and momentum if we were using something else, even if it were more suitable."
Julia would be really interesting to see explored further, since it would appeal much better to many existing data scientists who would be transitioning from Python. The times that I've played with Julia, I was amazed at how slow the JIT is for even tiny scripts. LLVM is powerful stuff, but it is painfully slow at everything. It would be nice if Julia offered an alternative backend for rapid development.
[1]: https://github.com/tensorflow/swift/issues/3#issuecomment-38...
[2]: https://github.com/tensorflow/swift/issues/3#issuecomment-38...
[3]: http://kiss3d.org/
If you like the pain of using an unsupported language without all the Xcode, UIBuilder, CoreData, Instruments, Metal Shaders debugging,... goodies, then yes.
They complain about C#/Java having "highly dynamic constructs", but correct me if I'm wrong: isn't Swift also a GC'd OOP language like Java and C#?
I don't think Swift has any inherent objective advantages over C#.
I think it would have been a better decision to go with C# over Swift, as Microsoft has a clear roadmap for the language and it is already supported on Linux/Mac/Windows.
It's a fine enterprise language, but good lord, writing data science and machine learning stuff in it would be a right pain. It's also not super high performance, and when you're doing a lot of maths-heavy operations, high performance is absolutely crucial. I had great difficulty establishing whether SIMD/vectorisation was even supported, and then even more difficulty getting it to work.
Julia would have been a far, far superior choice to Swift.
For a project like this, though, the F# type providers are a bit of a game changer that opens a lot of roads to create a 'best of both worlds' experience. For example, offloading heavy maths to other runtimes while providing a mature stack for everything outside of ML. The F# Type Provider for R (http://bluemountaincapital.github.io/FSharpRProvider/) is an example of this hybrid approach.
I believe Julia looks to be the better choice over Swift, tho.
Its performance is good enough for doing medical digital imaging, as presented by Siemens at FOSDEM 2019.
It is a matter of properly using the features that the language gives us.
Swift makes use of SIL and LLVM bitcode before the final binary is produced.
Likewise C# can be AOT compiled to actual machine code via NGEN, .NET Native, CoreRT and Mono/Xamarin.
Apparently I could use a library called Vectors, buried deep inside some numerical library, but then the runtime wouldn’t recognise the library’s existence despite it being a dependency and installed (and linked and every other thing you have to do to get .NET to do anything). After I fixed that issue it wouldn’t let me construct any arrays or anything.
Suffice to say, on top of C#/F# being painful to deal with at the best of times, attempting to do anything numerical was an absolute shit fight. I’m sure if you’ve got a whole team, you can make anything work, but for me it was not at all worth the effort.
When you consider that I can get fully guaranteed SIMD (not just hoping the compiler chooses to optimise it right) in Julia practically for free, along with nicer syntax, 100% less namespacing hell, equal or greater performance, and far more data science and numerical packages, it’s hard to see what the draws of C# would be.
Roslyn and RyuJIT aren't the same thing. Roslyn is the new compiler infrastructure for MSIL generation, where the original C++ compiler got replaced by bootstrapped VB.NET and C# compilers.
RyuJIT is the new JIT compiler introduced in .NET 4.6, replacing the former JIT64.
I don't disagree that Julia is better suited for data science given the ecosystem, as shown by my other posts in this thread, just that the performance is also there when one wants it.
I do agree with your criticism of the document here, though. It feels very much like Swift happens to check many boxes, but the lack of Windows support is baffling. It's simply table stakes to be able to run, fully supported, on Windows, macOS, and major Linux distributions. That should be the very first thing anyone considers.
But beyond that, I think even with Rust's macro system it could be difficult to make it work for TensorFlow in a way that feels appropriate for Rust programmers _and_ for TensorFlow. This was explored in F# for TensorFlow research[0], and a completely different approach[1] was taken because making a type system suitable for TensorFlow got too unwieldy.
[0]: https://github.com/fsprojects/TensorFlow.FSharp
[1]: https://github.com/fsprojects/TensorFlow.FSharp#live-checkin...
If you're talking about matrix shape compatibility (matching up rows from one with columns from another) I'm hopeful about const generics here: https://github.com/rust-lang/rfcs/blob/master/text/2000-cons...
"Java / C# / Scala (and other OOP languages with pervasive dynamic dispatch): These languages share most of the static analysis problems as Python: their primary abstraction features (classes and interfaces) are built on highly dynamic constructs, which means that static analysis of Tensor operations depends on "best effort" techniques like alias analysis and class hierarchy analysis. Further, because they are pervasively reference-based, it is difficult to reliably disambiguate pointer aliases."
Having done user research on this by speaking to data scientists, I can say that static typing is desired by a nonzero number of those who practice what we would consider to be data science and machine learning. Much like how TypeScript is seen as a revelation by hordes of JavaScript programmers who have never used static types before, the ability to get some level of correctness verification at design time matters.
Do you have any insider knowledge that Chris Lattner had the unilateral power to choose Swift for this project? I would imagine with the importance of TensorFlow at Google, the decision to go in this direction had to be agreed on by a number of people.
> The learning curve of Rust should not be relevant here, compared to Swift, which is also full of idiosyncrasies. Swift and Rust both have a large learning curve for someone coming from Python.
How exactly would Rust-Python interoperability work? Swift for TensorFlow allows any Python library to be called like a native library in Swift. Could you do that in Rust?
Yes, and companies are even doing it in production, Sentry probably being the best-known example.
Who would trade static typing for fast prototyping any time.
Data science is a really nebulous term covering many drastically different domains of CS. Many data scientists I've talked with don't really produce code; they code to produce analysis, which is the actual deliverable. For them, code is ad hoc and disposable, created on demand and left in the dust until rediscovered when the next mission comes.
Some of the code does survive and enter the production stage, and I guess that is where they would seek some assurance from static typing. But I do think they could learn to mitigate most of the pain if they committed themselves to writing some unit tests/functional tests, yet such awareness is rare among the data scientists I know and have worked with.
So all in all, yes static typing MIGHT help, in some way, but I don't think it addresses the underlying pain point as much.
These need not be at odds. Many ML languages like F# or OCaml, by use of type inference, get you type safety without having to type a bunch of stuff and sacrifice faster prototyping. And certainly in F# there is a history of having productive tooling that lets you prototype easily. Simply writing some F# code in an F# script in an IDE, hitting Alt+Enter, and letting it execute in an interactive shell is hugely productive for exploratory tasks. And features like Type Providers build out types for an arbitrary data set that let you guarantee your code is actually correct for the data.
What I've mentioned isn't without its flaws, and eventually someone is going to reach head-scratching problems just as they would in any other environment. I don't think there's an objective way to measure productivity across a wide range of professionals, but I do believe that some subset of them would prefer static types for their work. This is backed by conversations with some of them about problems they encounter.
So gradual typing like in Julia is already a good thing for having the best of both worlds.
This is being explored with "Live Checking" in F#[0], which offers a form of static typing over TensorFlow without actually forcing you to express every complex interaction with data in types.
[0]: https://github.com/fsprojects/TensorFlow.FSharp#live-checkin...
Enter stage left: Julia
Julia is already pretty great, I'd really love to see what cool stuff we could have with a swell in community size and investment!
[1]: https://github.com/JuliaArrays/StaticArrays.jl
[2]: https://github.com/davidavdav/NamedArrays.jl
In brief, it is not the duty of the automatic differentiation package to favour a specific array type – it just works for all of them, which is something that I find fairly magical with Julia.
2) StaticArrays would be a good starting point, but the point of it is to optimize Arrays by unrolling for loops and triggering SIMD (IIRC) and there are performance penalties when your arrays get really large, which they do, in ML. Something LIKE the staticarrays typesystem but without the overoptimization would be welcome.
3) (kind of tangential) I have beef with how GPU is handled as GPUArray in Julia. It really should be handled as a worker node using ClusterManagers-type semantics; you should be async sending tasks to the GPU as if it were a remote agent (which it kind of is, due to PCI bus bandwidth and latency bottlenecks) and waiting for the result to come back as a Future.
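The "GPU as a remote worker" idea in point 3 can be sketched without any GPU at all. What follows is a minimal Python illustration, not Julia's ClusterManagers API: a single-threaded executor stands in for the device queue, `fake_device_matmul` is a hypothetical kernel, and the caller gets a Future back instead of blocking.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for a device kernel: a plain-Python matrix multiply.
def fake_device_matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# One worker thread plays the role of the "remote" device queue
# (an assumption for illustration; no real GPU is involved).
device = ThreadPoolExecutor(max_workers=1)

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# submit() returns immediately with a Future; the host keeps running.
pending = device.submit(fake_device_matmul, a, b)
host_side_work = sum(range(10))  # unrelated host work in the meantime

# result() blocks only at the point where the value is actually needed.
product = pending.result()
device.shutdown()
```

The point of the design is that the transfer and compute latency is hidden behind the Future, so the host only waits when it really needs the answer.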
Nowadays, it's difficult enough to convince people to drop e.g. MATLAB for R or Python for Julia (let's assume that there's some merit to it), despite them having excellent counterparts for almost everything. Swift's success in this domain depends solely on the adoption by developers/researchers/engineers. Unless they're just going to mostly use it internally (as Google is known to).
Which brings me to the last point - why on Earth would they pick Swift (apart from Chris Lattner being involved) when Julia was on the table? It ticks all their boxes and has a more mature ecosystem for all things "data". The provided rationale is hardly convincing.
As someone who uses TF heavily, I would be much more excited about this project if they'd chosen Julia. Swift's tooling isn't great, and I already have a foot in one language with an immature data science ecosystem (Rust).
I'd like to add that, with my limited experience in prototyping some of my ML models, having a static checker to check that your tensors have the right shape is much better than having to run your code.
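To make the shape-checking point concrete, here is a minimal Python sketch of what such a checker does: it propagates shapes through a chain of matrix multiplications without touching any data. `matmul_shape` and `check_chain` are hypothetical helpers written for illustration, not part of TensorFlow or of any real checker.

```python
from functools import reduce

def matmul_shape(left, right):
    """Return the result shape of (m, k) @ (k2, n), or raise if k != k2."""
    (m, k), (k2, n) = left, right
    if k != k2:
        raise ValueError(f"shape mismatch: {left} @ {right}")
    return (m, n)

def check_chain(shapes):
    """Fold matmul_shape over a list of shapes; no tensor data is needed."""
    return reduce(matmul_shape, shapes)

# A batch of 32 inputs with 784 features through two dense layers.
ok = check_chain([(32, 784), (784, 128), (128, 10)])

# A mistyped layer width is caught before any training step runs.
try:
    check_chain([(32, 784), (784, 128), (64, 10)])
    error = None
except ValueError as exc:
    error = str(exc)
```

A real static checker would do this at design time, in the type system or the compiler; the sketch only shows that the check depends on shapes alone, not on the values.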
Their focus is more on the C bindings and allowing other people to build what they want on top of that.
Other language bindings aren't generally going to be used for anything more than inference. First-class actual data science work isn't going to happen in other languages anytime soon (at least outside of Julia and R, which are at least trying to compete in this niche).
_edit_ seems JuliaC does support this sort of thing:
https://juliacomputing.com/blog/2016/02/09/static-julia.html
Actually in this case it does. Swift for TensorFlow includes Python interop out of the box: https://www.tensorflow.org/swift/api_docs/Global-Variables#/...
The supported use case would be to do your ML work in Swift, and then call NumPy etc. from Python.
That might have been the right choice for Apple’s uses of Swift where GC pauses affect the user experience but for most other use cases it’s too much of a cognitive burden IMO.
Because of iOS?
"Swift for TensorFlow rethinks machine learning development ... I imagined, advocated for, coded the initial prototype and many of the subsystems after that; recruited, hired and trained an exceptional engineering team; we drove it to an open source launch and are continuing to build out and iterate on infrastructure."
This was not in process before Chris came, it was a project he suggested and started pushing on?
What is there to be coincidental?
Swift for Tensorflow might work if the scope is to create a client side model definition loader natively for various TF models.
Nobody uses Swift seriously for server-side training; there is no point in doing so except to add Swift to the list of languages that claim to do deep learning but that in reality nobody will consider.
Personally I am running Mojave on a late 2009 MacBook Pro and it still works amazingly well. Transition from Mojave and especially the new Xcode are also way faster than previous iterations. There are caveats, though: as the processor in my computer is too old, I had to hack Homebrew to compile everything from source.
(Also, using the patcher does not hinder my ability to push updates to the App Store or use iMessage, if that is a concern)
It's a superior platform on which to develop this sort of thing, and further along at that. Also easier to use.
In my experience, Julia also has inscrutable scoping rules, a slow REPL, and it's only fast if you don't count the "startup time" of having to precompile everything.
Re: Scoping rules, these are being evaled.
Re: startup time, already better in 1.1, and will soon be marginalized from two ends: better static compilation and better tiered compilation.
See https://github.com/apple/swift/blob/master/docs/Windows.md and https://forums.swift.org/t/windows-nightlies/19174 for more info.
Is TensorFlow "huge" on Linux, Windows, Android? I ask because I also evaluated Swift for my use case (https://www.reddit.com/r/swift/comments/8zb9y1/state_of_swif...) and decided to use Rust instead, mainly because of the lack of solid support on non-Apple platforms. However, after using Rust for months now, I still consider Swift a better balance of performance and ergonomics than Rust (Rust is damn hard sometimes, and you can suddenly hit trouble with above-average complications. I don't see how putting this burden in a library meant to be used by more "regular" folks could work).
Windows is less far along, but recently a contributor got nightly builds started on Azure, and it appears there is serious work on this front.
In any case, it's already possible to run Swift for TensorFlow on Windows using WSL and Docker.
But isn't TensorFlow popular on Windows? Because then building on top of Swift will mean either:
- Putting Swift on a fast track to be decent on Windows, Linux, Android(?)
- Ignoring the Windows users and letting them battle a bad dependency?
On the one hand, we do want full access to the programming model exposed by the component -- its control structures, abstractions, everything else. On the other hand, these are mostly duplicated by our host programming language: it's going to have variable bindings, operators, iteration, conditionals and everything else. Doing an embedding like this is a way to expose most of the component's facilities without introducing a ton of "new syntax" in the form of combinators or having programs where a lot of the critical code is escaped in strings.
This same problem shows up in programming interfaces to RDBMSes. LINQ is a good example of the same embedding technique.
I ask this because the number 1 reason my deep learning research group chose Python was the extensive and interactive scientific plotting ability that's built into Python Jupyter notebooks. While our volume of analysis isn't on the scale of, say, a Google/FB (primarily biomedical image analysis), the ability to easily visually debug the results is much more important for developing robust models.
Another reason now that I think about it, is the number of scientific libraries that I can just "pip install" without much thought (such as scipy/opencv).
The lack of Windows support is addressed in just two lines. Julia being an already established language in the domain of data science does not seem to be especially important to them.
I think the most honest part of the document is:
> because we were more familiar with its [Swift's] internal implementation details
Side note, does anyone know the effort required to get various Python-based libraries running on Swift, i.e. NumPy, SciPy, pandas and so on?
There’s some pretty impressive ML frameworks in Julia and the language can do some really cool things, so I’m hoping that gives it the edge.
Plus, I found TensorFlow exceedingly painful to use, so hopefully something else prevails.
I believe the Swift for TensorFlow team is currently hiring for this.
Some of those features are making their way into the main branch, but at the moment you could not import the TensorFlow library into an iOS project and use it. Swift for TensorFlow needs to be built using a separate toolchain.
I imagine calling Rust from Julia will be much more common than calling Julia from Rust. I know approximately nothing about this, but there are plenty of questions about embedding Julia into C/C++[2][3]. May be similar for Rust.
[0]: https://github.com/dyu/ffi-overhead
[1]: https://github.com/robsmith11/fastfloat
[2]: https://discourse.julialang.org/t/support-with-embedding-jul...
[3]: https://discourse.julialang.org/t/api-reference-for-julia-em...
I think it's been pretty straightforward that Chris joined Google and started this project. You don't have to wonder if that's a coincidence, that's what he's happy to say happened.
> MacBook Pro (Mid 2010 or newer)
So your 2011 should be supported. If it's actually an older machine, compiling from source is an option, if slightly inconvenient.
https://github.com/google/swift-jupyter#rich-output has an example with screenshots.
I'm not familiar with Snap, but I did find https://snapcraft.io/swift
Also, upgrading will be less common than Rust since almost everyone uses the latest release/toolchain. There's not really a reason to use the daily builds unless you're contributing to the Swift project.
Also, a lot of the libraries which are mostly used in iOS don't have any dependencies on the iOS platform: for instance promise or event emitter implementations etc.
IBM is actually supporting a number of open-source Swift projects as well: https://github.com/IBM-Swift.
Between that and painless interop with C/C++, Swift does not feel under-supported on Linux in the least.
It's possible to write production code in brainfuck if that's what you really want to do. Statically checked code is easier to implement correctly, easier to modify, and easier to maintain.
If that were true literally no one would program in python. Static checking is not the end all to uptime and stability. I wrote an elixir program in three days that served as a testbench for a senior's go program (which took him six months to write). This senior believes in static typechecking for everything and doesn't write unit tests. The testbench handles thousands of parallel async requests without a hiccup and even survives operating system resource exhaustion, where the go program falls over and panics.
Erlang is not statically checked (there is a static typechecker, but it's not fully typed). I promise you a well-written Erlang program has much higher stability than a well-written Go program. There is a reason why Kubernetes exists, after all.
There are many cases in which people don't choose the optimal language. But I would say the size of the Python community has more to do with inertia, the breadth of libraries available, and a relatively shallow learning curve than it says about its strengths as a tool for writing good software.
I actually find that Python has some rather serious warts: the whole story around environment/version management is a mess, and the less I have to work with Python in a serious capacity the better.
> I wrote an elixir program ... The testbench handles thousands of parallel async requests without a hiccup and even survives operating system resource exhaustion, where the go program falls over and panics.
Well Erlang is specifically designed for concurrency and stability: if you want to judge your result on those two metrics I hope it is going to perform well.
I never made the claim that static typing is the "end all to uptime and stability" - static typing makes it easier to reason about your code, and to provably eliminate many issues. It's very nice that you implemented a test harness quickly, but come back to me when you've worked on a complex codebase with several other people over an extended period of time.
Also, it's not a high-overhead runtime. The runtime itself is compiled to highly optimized machine code (it can even compile, say, the derivative of f(x) = 5x+3 down to the machine immediate "5" at compile time).
There is a lot of lifting to get that compilation framework into place, so there is a load-time overhead.
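For readers who have not seen how automatic differentiation arrives at the derivative of f(x) = 5x+3, here is a minimal forward-mode sketch using dual numbers. It is plain Python and runs at run time, so it does not demonstrate the compile-time folding described above; the `Dual` class and `derivative` helper are hypothetical illustrations, not any library's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to the input."""
    value: float
    deriv: float = 0.0

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative zero).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def f(x):
    return 5 * x + 3

def derivative(func, x):
    # Seed the input with derivative 1 and read the derivative off the result.
    return func(Dual(float(x), 1.0)).deriv

slope_at_2 = derivative(f, 2.0)
slope_at_minus_7 = derivative(f, -7.0)
```

Both slopes come out as 5.0 regardless of the input, which is the constant that a sufficiently aggressive compiler can fold down to an immediate.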