MXNet – Deep Learning Framework of Choice at AWS (allthingsdistributed.com)

99 points by werner 9 years ago | 56 comments

cs702 9 years ago |

Translation from corporatespeak: "We don't have an internally developed framework that can compete with TensorFlow, which is controlled by Google, so we are throwing our weight behind MXNet."

As others have commented here, there is no evidence that MXNet is that much better (or worse) than the other frameworks.

eva1984 9 years ago | |

Exactly. Among those DL frameworks, I think what TensorFlow gets most right is the tooling support. Metric collection, visualization, and checkpointing are plug-and-play in TensorFlow; in the others, not so much. For example, a summary (a.k.a. metric collection) is just a subgraph of the whole computational graph, which can be evaluated at any time. A simple and neat abstraction indeed.

Those properties combined make TensorFlow the most engineer/practitioner-friendly choice in the market. If AWS hopes to compete with TensorFlow in all seriousness, they need to catch up on support for those seemingly trivial but important details.

werner 9 years ago | |

Amazon has been building technology based on ML&DL for over 20 years and has developed several frameworks. You must have missed the announcement of this open source framework earlier in the year: https://github.com/amznlabs/amazon-dsstne.

cs702 9 years ago | | |

I saw that when it was announced. DSSTNE has failed to capture the hearts and minds of developers. In my experience, it doesn't come up in any conversations about which frameworks to bet on for new product development.

And I'm rooting for Amazon (and Facebook, and Microsoft...). TensorFlow needs competition for the hearts and minds of developers.

nate_martin 9 years ago | | |

This doesn't address the root comment at all. Does Amazon actually think MXNet is the best? Or did they simply choose the next best thing that isn't already backed by another "big four" company (Google -> TensorFlow, Facebook -> Torch)? It's hard to believe this is actually about scalability without any data.

julsimon 9 years ago | | |

Here is a very nice blog article explaining how Amazon is generating recommendations at scale with Apache Spark and Amazon DSSTNE :) https://aws.amazon.com/blogs/big-data/generating-recommendat...

ogrisel 9 years ago | |

At least MXNet is a good one that deserves more publicity and backing (in terms of maintenance effort). I find it better for the community to have AWS back a good existing open source project than to re-invent a very similar wheel one more time.

cs702 9 years ago | | |

I like MXNet, and I think it's great that Amazon is backing it publicly. TensorFlow needs competition for the minds and hearts of developers.

antinucleon 9 years ago | |

There is a huge distributed performance advantage over TensorFlow. You can get a hint from Prof. Carlos Guestrin's keynote talk at Data Science Summit 2016. Also, CMU CS Dean Andrew Moore described MXNet as "the most scalable framework for deep learning I have seen".

wicke 9 years ago | | |

This recent OSDI paper [1] has a direct comparison in Fig 8. It appears there is no particularly pronounced distribution or general performance advantage, and TensorFlow actually outperforms MXNet in this comparison.

1: https://www.usenix.org/system/files/conference/osdi16/osdi16...

[full disclosure, I work on the TensorFlow team]

fpgaminer 9 years ago |

It seems more prevalent now than it used to be, that frameworks/libraries are being used as weapons in a sort of mindshare war between the world's megacorps. Or perhaps I'm misremembering history. And I don't mean just AI; just look at Angular (Google) vs. React (Facebook).

It's a bit of a double-edged sword. As developers, this war gives us free access to well-funded and heavily developed tools. The world has been fundamentally changed by their availability. But at the same time we need to understand that the primary reason they exist is to lock developers into a particular vendor. It's most transparent with Google's TensorFlow, where they were obvious about their intentions to offer TensorFlow services on their cloud platform.

This article more than most exemplifies their desperate attempts. For now it seems to remain mostly that, desperate attempts, with the tools remaining more-or-less platform agnostic. But I foresee a grim future where our best libraries and tools are tied inextricably to a commercial ecosystem.

deepnotderp 9 years ago | |

Then utilize torch.

mastazi 9 years ago | | |

Isn't Torch actively supported by Facebook? https://research.facebook.com/research/torch/

oneshot908 9 years ago |

Using 3-year-old GPUs on a much deeper network than the other guys(tm) to demonstrate awesome scaling efficiency == Intel-level FUD. Note also the absence of overall batch size.

Wonder what would happen to that scaling efficiency if those GPUs were P40s?

See also the absence of equivalent AlexNet numbers to further obscure attempts at comparing this to the other guys(tm).

Can't wait for Intel's response to this.

deepnotderp 9 years ago |

Okay, with all due respect, this is BS. I love MXNet and think it's underappreciated as well. But pretty much its best feature is the memory mirror. (See oneshot908's comment.)

imh 9 years ago |

This reads weirdly. He talks about how MXNet is the best choice without comparing it to other frameworks. That's the whole point of choosing between things. I'm sure they did the legwork to make this decision, and some insight into that choice might help others follow. Without that, my distrust radar is blinking.

AlexCoventry 9 years ago |

From the OP:

  > a Deep Learning AMI, which comes pre-installed with the popular open source
  > deep learning frameworks mentioned earlier; GPU-acceleration through CUDA
  > drivers which are already installed, pre-configured, and ready to rock

You might want to clarify that the negative reviews [0] are from earlier versions which did not include the CUDA drivers. I recently considered this AMI and rejected it for a class [1] because of these reviews.

[0] https://aws.amazon.com/marketplace/reviews/product-reviews?a...

[1] https://www.meetup.com/Cambridge-Artificial-Intelligence-Mee...

mli 9 years ago | |

The deep learning AMI now has both CUDA and CUDNN installed.

eva1984 9 years ago |

> we have concluded that MXNet is the most scalable framework

Without being backed by any benchmarks? This claim is lazy.

bsfjgngdnxy 9 years ago |

>MXNet can consume as little as 4 GB of memory when serving deep networks with as many as 1000 layers.

So perhaps I'm not well versed enough in deep learning, but does this mean that they solved the vanishing gradient problem? How are they managing to do this?

ogrisel 9 years ago | |

For deep convnets, the vanishing gradient problem can mostly be solved by using residual architectures. See: https://arxiv.org/abs/1603.05027

This is kind of related to solving the vanishing gradient issue in RNNs by using additive recurrent architectures like LSTMs and GRUs.

Alternatively it's possible to use concatenative skip connections as in DenseNets: https://arxiv.org/abs/1608.06993

Still, using 1000 layers is useless in practice. State-of-the-art image classification models are in the range of 30-100 layers, with residual connections and varying numbers of channels per layer depending on the depth, so as to keep a tractable total number of trainable parameters. The 1000-layer nets are just interesting as a memory scalability benchmark for DL frameworks and to validate empirically the feasibility of the optimization problem, but are of no practical use otherwise (as far as I know).
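To make the residual argument concrete, here is a toy sketch (not taken from either paper). It assumes scalar linear layers with a small shared weight, which is a deliberate oversimplification; the function names are made up for illustration. In a plain chain the gradient is the product of the weights and collapses toward zero, while the identity path in a residual chain keeps each factor near 1.

```python
def plain_chain_gradient(weights):
    """d(output)/d(input) for plain scalar layers x_{k+1} = w_k * x_k."""
    grad = 1.0
    for w in weights:
        grad *= w  # each layer scales the gradient by its weight
    return grad


def residual_chain_gradient(weights):
    """d(output)/d(input) for residual layers x_{k+1} = x_k + w_k * x_k."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w  # the identity skip path contributes the 1
    return grad


depth = 100
weights = [0.01] * depth  # hypothetical small weights, same for every layer

plain = plain_chain_gradient(weights)
residual = residual_chain_gradient(weights)
print("plain chain gradient:   ", plain)
print("residual chain gradient:", residual)
```

With these made-up numbers the plain gradient is vanishingly small while the residual one stays of order 1. Real networks have nonlinearities and matrices rather than scalars, so treat this only as intuition.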

bsfjgngdnxy 9 years ago | | |

Thank you!

deepnotderp 9 years ago | |

Vanishing gradient isn't the same as memory efficiency. The memory mirror option is what allows this extremely efficient memory usage, at the cost of being only about 30% more compute intensive.
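A rough sketch of the memory/compute trade-off, assuming the "memory mirror" refers to dropping intermediate activations and recomputing them during the backward pass. This is not MXNet code: the scalar tanh layers, the function names, and the segment size are all assumptions chosen for illustration.

```python
import math


def layer(x):
    return math.tanh(x)


def layer_grad(x):
    # Derivative of tanh evaluated at the layer's input x.
    return 1.0 - math.tanh(x) ** 2


def backward_store_all(x, depth):
    """Keep every layer input. Returns (gradient, number of stored inputs)."""
    inputs = []
    for _ in range(depth):
        inputs.append(x)
        x = layer(x)
    grad = 1.0
    for inp in reversed(inputs):
        grad *= layer_grad(inp)
    return grad, len(inputs)


def backward_recompute(x, depth, segment):
    """Keep one input per segment and recompute the rest during backward.

    Returns (gradient, peak number of inputs held at once).
    """
    checkpoints = []
    for i in range(depth):
        if i % segment == 0:
            checkpoints.append(x)
        x = layer(x)
    grad = 1.0
    peak_stored = len(checkpoints)
    for s in reversed(range(len(checkpoints))):
        start = s * segment
        length = min(segment, depth - start)
        # Re-run the forward pass for this segment only.
        inputs = []
        h = checkpoints[s]
        for _ in range(length):
            inputs.append(h)
            h = layer(h)
        peak_stored = max(peak_stored, len(checkpoints) + len(inputs))
        for inp in reversed(inputs):
            grad *= layer_grad(inp)
    return grad, peak_stored


grad_full, stored_full = backward_store_all(0.5, 1000)
grad_ckpt, stored_ckpt = backward_recompute(0.5, 1000, 32)
print(stored_full, stored_ckpt)
```

Both functions return the same gradient, but the second holds far fewer values at once and pays for it with roughly one extra forward pass. If a backward pass costs about twice a forward pass (an assumption, not a measurement), that extra pass is in the same ballpark as the 30% figure above.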

bsfjgngdnxy 9 years ago | | |

Yes, but that's not what I asked about.

mrdrozdov 9 years ago |

Did not realize you could use MXNet declaratively (like Tensorflow/Theano) and imperatively (like Torch/Chainer). Can anyone speak more of their imperative usage of MXNet?

crowwork 9 years ago | |

It means you can declare GPU arrays like those in numpy/torch, operate on them imperatively from the Python side, and mix them with the graph computation, instead of forcing everything to be part of a graph.

billconan 9 years ago | |

Does declaratively mean the use of expression templates in C++?

I learned about them last week, and I don't see much benefit if the goal is good performance.

ogrisel 9 years ago | | |

No, it means writing a program that defines the structure of a computation graph lazily (without executing the nodes when defining the model) so as to reuse that compute graph in a later step of the program.

The computation graph is an in-memory datastructure that can be introspected by the program itself at runtime so as to do symbolic operations (e.g. compute the gradient of one node in the graph with respect to any ancestor input node).

Theano implements this in pure Python and can generate C or CUDA code from string templates (in Python). TensorFlow has a Python API to assemble pre-built operators, which are mainly written in C++ and use the Eigen linear algebra library.
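For anyone who wants to see the idea in miniature, here is a sketch of a lazily defined graph with a symbolic gradient. It is plain Python and does not mirror the API of Theano, TensorFlow, or MXNet; `Node`, `placeholder`, `constant`, `evaluate`, and `gradient` are names invented for this example, and only addition and multiplication are supported.

```python
class Node:
    """A node in a tiny computation graph. Nothing is evaluated at build time."""

    def __init__(self, op, parents=(), name=None, value=None):
        self.op = op
        self.parents = tuple(parents)
        self.name = name
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def placeholder(name):
    return Node("input", name=name)


def constant(value):
    return Node("const", value=value)


def evaluate(node, feed):
    """Run the graph later, with concrete values for the placeholders."""
    if node.op == "input":
        return feed[node.name]
    if node.op == "const":
        return node.value
    a, b = (evaluate(p, feed) for p in node.parents)
    return a + b if node.op == "add" else a * b


def gradient(node, wrt):
    """Build a new graph node representing d(node)/d(wrt)."""
    if node is wrt:
        return constant(1.0)
    if node.op in ("input", "const"):
        return constant(0.0)
    a, b = node.parents
    da, db = gradient(a, wrt), gradient(b, wrt)
    if node.op == "add":
        return da + db
    return da * b + a * db  # product rule


# Define the model structure first; no arithmetic happens here.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y = w * x + b
dy_dw = gradient(y, w)  # another graph, derived by inspecting y

feed = {"x": 3.0, "w": 2.0, "b": 1.0}
print(evaluate(y, feed), evaluate(dy_dw, feed))
```

The point is that `y` and `dy_dw` are data structures the program can walk and transform, and the numbers only appear when `evaluate` is called with a feed.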

turingbook 9 years ago |

Li Mu, the core developer behind MXNet, recently started working for Amazon.

badminton1 9 years ago |

[offtopic] I think presentations with ascending bar charts are sort of cliche.

egeozcan 9 years ago |

> Machine learning (...) is being employed in a range of computing tasks where programming explicit algorithms is infeasible.

I found this comment interesting. Is this really the summary of what machine learning is about?

Analog24 9 years ago | |

Image classification is a classic example of such a task. How exactly would you go about writing an algorithm to tell the difference between a picture of a cat and a picture of a dog?

samcodes 9 years ago | | |

Well, this might be cheating, but I would apply a bunch of different filters for things like edge detection, etc. Then I would come up with a statistical model that, for each feature, gave the likelihood that the image under consideration was a dog. Then I would aggregate all those results into a final likelihood.

Not trying to be sarcastic, I just can't think of any way other than the ML way.

Russell91 9 years ago | |

Yes! Sometimes you know that a solution will take a particular mathematical form, without knowing what the parameters will be. So you can write down a program (function) that can express any solution of that form, and use an optimization algorithm e.g. gradient descent on labeled examples, to figure out which specific instance of your possible solutions works best.
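A minimal sketch of that recipe, assuming the "particular mathematical form" is a straight line y = a*x + b and the labeled examples are synthetic points generated from a = 3, b = -1. The learning rate and step count are arbitrary choices for this toy data, and the function names are made up.

```python
def predict(params, x):
    """The fixed functional form: any (a, b) gives one candidate solution."""
    a, b = params
    return a * x + b


def fit(examples, lr=0.05, steps=2000):
    """Plain gradient descent on mean squared error over labeled examples."""
    a, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in examples) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Synthetic labeled examples drawn from the line y = 3x - 1.
examples = [(x, 3.0 * x - 1.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
a, b = fit(examples)
print(a, b, predict((a, b), 10.0))
```

The program never states that the answer is "3 and -1"; it only encodes the shape of the solution and lets the optimizer pick the instance that best matches the examples. Deep learning frameworks do the same thing with far richer function families.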

blahi 9 years ago |

MXNet is the only deep learning framework that has proper support for R. That's why I use it and it is pretty nice IMO.

fnl 9 years ago | |

Isn't TF available in R as of late, too, from the RStudio guys? Still incomplete?

blahi 9 years ago | | |

Atrocious syntax.

gnipgnip 9 years ago |

Can someone please spell out for us muggles what sets these frameworks (Theano, Tensorflow, Torch, CNTK, Mxnet) apart? They all seem to be essentially doing the same thing underneath.

politician 9 years ago | |

Cloud vendor feature signaling, mostly.

Microsoft wants you to use CNTK on Azure. Amazon wants you to use Mxnet on AWS. Google wants you to use Tensorflow on GCP.

It's irrelevant whether these frameworks can be used outside their home platform by broke college students. That's a red herring. The cloud vendors are looking to sell enterprise contracts, and they need to check all of the boxes.

This strategy makes complete sense from a business perspective, and you really cannot fault them for doing it.