I think the `future of work` for machine learning practitioners will quickly separate into two groups: a very small and elite group that performs research and a much larger group that uses AutoML but whose jobs also deal more with data preparation (which is also getting automated) and ML devops, supporting models in production.
In financial services in particular, there are tons of time series and regression problems on small data such that a neural network (beyond perhaps some super small MLP) would be a ridiculous thing to try.
I think the breakdown of workload you described will only happen in business departments where there is a need for large scale embedding models, enhanced multi-modal search indices, computer vision and natural language applications, and maybe a handful of things that eventually productize reinforcement learning. I could also see this happening in businesses that can benefit from synthetically generated content, like stock photography, essays / news summaries / some fiction, website generators, probably more.
What I described above is a tiny drop in the ocean of applied statistics problems that businesses have to solve.
Throw away all the BS. and, yes, it's obvious.
I suppose OP means there will be two groups: people who use AutoML and people who try to make AutoML better.
"Our results show that random search with early-stopping is a competitive NAS baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on both benchmarks"
ENAS, the specific algorithm that they find does no better than random search, is in this library. My understanding is that the results are pretty generic though, i.e. NAS is very far from a solved problem. (Hyperparameter tuning for "classical" models is another matter. That's commoditized and available as a service at this point, see tpot, DataRobot, etc., etc.)
No Windows support in a Microsoft product. Curious.
This looks very useful for tuning hyper-parameters, and the fact that the tuned algorithm is treated as a black box makes this very flexible.
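To make the black-box point concrete, here is a minimal sketch of a tuner that only ever calls a function and reads back a number. It is plain random search in standard-library Python, not this library's API; `random_search`, `toy_objective`, and the search space values are all made up for illustration.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Minimise `objective` by sampling configurations uniformly from `space`.

    The tuner never looks inside `objective`: it passes in a config dict
    and gets back a score, so the tuned algorithm stays a black box.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical stand-in for "train a model and return its validation loss".
def toy_objective(config):
    return (config["learning_rate"] - 0.1) ** 2 + 0.01 * abs(config["depth"] - 4)


# Illustrative search space; the names and values are assumptions.
space = {
    "learning_rate": [0.001, 0.01, 0.03, 0.1, 0.3],
    "depth": [2, 3, 4, 5, 6, 8],
}

best_config, best_score = random_search(toy_objective, space, n_trials=50)
print(best_config, best_score)
```

Swapping `toy_objective` for anything else that maps a config to a score (a training script, a simulation, a compiler flag benchmark) needs no change to the tuner, which is where the flexibility comes from.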
I think this does everything MLFlow does and more (besides maybe helping with deployment?)
In good old fashioned statistics there's the idea of the jackknife: for the i-th sample run a regression on all the data except i, and store statistics of interest (coefficients, predictions, etc). This gives you an ipso facto sampling distribution for the statistics of interest.
Similar and more common in econometrics is the bootstrap: run your model on something like 1999 resamples of the data (drawn with replacement) and get sampling distributions.
With said sampling distributions, whether from the jackknife or the bootstrap, you're able to test whether your model is valid -- what's the probability that it'll have significant coefficients or an r2/mae/mape score indicating predictive capacity.
Cross-validation (and even scikit-learn is starting to default to five folds not three) is a "lazy" version of this. You don't get a sampling distribution, but at least you can tell when a given model only appears good because it grips the training data with all its might and then doesn't work out-of-sample.
sklearn even offers the same leave-one-out resampling under an ML-y name, `LeaveOneOut` cross-validation.
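For anyone who hasn't seen the jackknife and the bootstrap written out, a minimal sketch in standard-library Python follows. The data are synthetic (true slope 2.0 plus noise), the statistic is the slope of a one-variable least-squares fit, and the helper names are made up for illustration; a real analysis would use whatever model and statistic you actually care about.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def jackknife_slopes(xs, ys):
    """Refit once per sample, each time leaving that one sample out."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


def bootstrap_slopes(xs, ys, n_resamples=1999, seed=0):
    """Refit on resamples of the original size, drawn with replacement.

    A resample made of one repeated point would make the slope undefined;
    with 40 distinct x values that is vanishingly unlikely, so it is not
    handled here.
    """
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Synthetic data, for illustration only.
data_rng = random.Random(42)
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + data_rng.gauss(0, 0.5) for x in xs]

full_slope = ols_slope(xs, ys)
jack = jackknife_slopes(xs, ys)
boot = sorted(bootstrap_slopes(xs, ys))

# Percentile interval read straight off the bootstrap distribution.
ci_low = boot[int(0.025 * len(boot))]
ci_high = boot[int(0.975 * len(boot))]
print(full_slope, ci_low, ci_high)
```

The list `boot` is the sampling distribution in question: if an interval like `(ci_low, ci_high)` comfortably excludes zero, the coefficient is doing something. The same loop works for r2, MAE, or any other statistic you can compute from a refit.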
Are people migrating from scikit to tensorflow in production for non-deep learning use cases?
At least that's the behaviour of the platform[1] I am working on.
[1]: https://github.com/polyaxon/polyaxon#hyperparameters-tuning
BTW, I think all autoML solutions forget about end users. They all require too much engineering knowledge from the user. I think it would be nice to have an autoML solution that can be used by citizen data scientists.
[1]: https://nni.readthedocs.io/en/latest/sklearn_examples.html
In contrast, when we wrote bespoke GPU code for the graph, we saw a ~25x performance increase over relying on CPU plus MKL. I am being deliberately vague here and I cannot give further detail.
> possibly the world's first or second (full-time) CUDA programmer, with 14 filed patents, and the world's fastest implementations of molecular Dynamics (CUDA ports of Folding@Home and AMBER).
DNNs require an architecture search, i.e. the building blocks are full layers, the depth of the network, the optimizer, etc.
scikit-learn searches a parameter space, i.e. the algorithm's parameters are much, much simpler and fewer.
So to sum up, DNN search involves big building blocks, while scikit-learn search (or for that matter any "classical ML" algorithm) is more of a parameter search.
[The actual scikit-learn search would also include preprocessing steps, which can be seen as a separate block.
Also, note that DNN search is much more expensive than scikit-learn search (100X).]
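A rough sketch of the difference between the two kinds of search space, in standard-library Python. Every name and value below is an illustrative assumption rather than something taken from scikit-learn or any NAS tool, and the counts only describe these toy spaces; they say nothing about the cost of evaluating a single candidate.

```python
import random
from math import prod

# Flat parameter space for a "classical" model (illustrative values).
param_space = {
    "n_estimators": [100, 300, 1000],
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 20],
}

# Architecture space: each layer is itself a building block with choices,
# and the number of layers is part of the search.
layer_choices = {
    "kind": ["dense", "conv", "lstm"],
    "width": [32, 64, 128, 256],
    "activation": ["relu", "tanh"],
}
depth_choices = [1, 2, 3, 4]
optimizer_choices = ["sgd", "adam"]


def sample_params(rng):
    """Pick one value per parameter."""
    return {name: rng.choice(values) for name, values in param_space.items()}


def sample_architecture(rng):
    """Pick a depth, then one block per layer, then an optimizer."""
    depth = rng.choice(depth_choices)
    layers = [
        {name: rng.choice(values) for name, values in layer_choices.items()}
        for _ in range(depth)
    ]
    return {"layers": layers, "optimizer": rng.choice(optimizer_choices)}


# Size of each toy space.
n_param_configs = prod(len(v) for v in param_space.values())
per_layer = prod(len(v) for v in layer_choices.values())
n_architectures = len(optimizer_choices) * sum(per_layer ** d for d in depth_choices)

rng = random.Random(0)
print(sample_params(rng))
print(sample_architecture(rng))
print(n_param_configs, n_architectures)
```

The parameter space grows as a product of a few short lists, while the architecture space grows exponentially with depth because every extra layer multiplies in another full set of block choices.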
The tools included in the repository are very broadly applicable and only a few of them are specifically targeted at neural architecture search.
[1] https://www.kdnuggets.com/2016/08/winning-automl-challenge-a... [2] https://openreview.net/forum?id=ByfyHh05tQ
Because I think that's insane. It's one thing if you don't care about speed and you care more about time-to-market. It's another thing if you're complaining about things being too slow but you're not willing to learn about anything that would let you do anything about it. I run into far more of the latter.
For example, consider needing to train hundreds of unique small models every day, based on new customer inputs affecting causality effects for that day (I had to do this for ad forecasting in a past job).
Generating embeddings via pre-trained models essentially produced gibberish and performed far worse than custom feature engineering + simple logistic models.
Of course if all you have are numbers without context, there isn't a lot you can do to improve the situation.
>>> import autosklearn.classification
>>> automl = autosklearn.classification.AutoSklearnClassifier()
>>> automl.fit(X_train, y_train)
>>> y_hat = automl.predict(X_test)
[1] https://automl.github.io/auto-sklearn/stable/
This is the approach of a project I am currently working on. (and am now explicitly making clear in the README!)
I agree that the final model should be a randomforest/xgboost/lightgbm for typical tabular data.
Yes, this is the part that sounds like parody to me. At least, as a working statistician, I can tell you that the concept of AutoML could not apply to the vast majority of things I work on.
It walks through an example with arsenic data in wells and a problem of estimating how distance, education and some other factors relate to a person’s willingness to travel to a clean well for water.
The work in that example is deciding how to standardize the input features, how to rescale so that regression coefficients are interpretable in meaningful human units, how to interpret statistics of the fitted model to decide whether adding a feature is helping or hurting (since this cannot be deduced from raw accuracy metrics alone), how to interpret deviance residual plots for outlier analysis, etc.
All those things have nothing to do with changing the architecture of the model, except possibly including or excluding features, and in that example there were no hyperparameters to tune. The inference problem would not make sense for hyperparameter tuning on raw accuracy outputs anyway, since the goal was not optimizing prediction but rather understanding the impact of features that have semantic meaning in the context of possible policy choices that could be adopted.
By way of contrast, applying an automated subset selection algorithm to automatically choose the features would be a naive idea with likely bad results in that case, and setting up an optimization framework that would optimize over possible transformations or standardizations of the inputs seems equally dubious compared with expert, context-aware human judgment.
And this is a very trivial example. If you modify a problem like this to address causal inference goals, or add some type of cost optimization on top of it, it becomes more and more complex, but exactly in a way that a tool like AutoML can’t help with.
In other words, making an AutoML that can truly apply to all types of estimation or inference problems is no easier than solving strong AI computer vision and natural language problems entirely, since you need contextual reasoning and creative proposals for inventing features and sleuthing the goodness of fit of a certain model architecture in light of the human-level inference goal you’re trying to reach.
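To illustrate just the rescaling point from above, here is a small sketch in standard-library Python. The data are synthetic, the variable names are made up, and a plain least-squares slope stands in for whatever model the wells example actually fits; the only thing it shows is that changing a predictor's units changes the coefficient's units and nothing else.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data: an outcome that falls by 0.002 per meter of distance.
rng = random.Random(0)
dist_m = [rng.uniform(0, 300) for _ in range(200)]
outcome = [1.0 - 0.002 * d + rng.gauss(0, 0.1) for d in dist_m]

# Same predictor expressed in units of 100 meters.
dist_100m = [d / 100 for d in dist_m]

slope_per_m = ols_slope(dist_m, outcome)        # tiny number, hard to read
slope_per_100m = ols_slope(dist_100m, outcome)  # 100 times larger, same fit

print(slope_per_m, slope_per_100m)
```

Both fits describe the identical line, so no accuracy metric can prefer one over the other. Choosing "per 100 meters" is a judgment about what a reader can interpret, which is exactly the kind of decision an optimizer has no signal for.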
* Model selection, hyper-parameter optimization, and model search
* Neural architecture search
* Meta learning and transfer learning
* Automatic feature extraction / construction
* Demonstrations (demos) of working AutoML systems
* Automatic generation of workflows / workflow reuse
* Automatic problem "ingestion" (from raw data and miscellaneous formats)
* Automatic feature transformation to match algorithm requirements
* Automatic detection and handling of skewed data and/or missing values
* Automatic acquisition of new data (active learning, experimental design)
* Automatic report writing (providing insight on automatic data analysis)
* Automatic selection of evaluation metrics / validation procedures
* Automatic selection of algorithms under time/space/power constraints
* Automatic prediction post-processing and calibration
* Automatic leakage detection
* Automatic inference and differentiation
* User interfaces and human-in-the-loop approaches for AutoML
[1] https://sites.google.com/site/automl2018icml/
I agree with you from the view of the current state of the art methods and the current state of the AutoML / fundamental ML research communities. Current methods are very limited, but I cannot think of a reason why a sufficiently general search space of architectures/pipelines could not produce something like a GAN or a WaveNet.
I do not think that designing algorithms as novel as the ones you listed is currently a goal of AutoML, as that is not something we have an attack for. However, I do think that with increasing capabilities, the field of AutoML will seek to automate every step of the machine learning pipeline - including the design of algorithms. E.g., once/if there are attacks to apply NAS for yielding truly novel architectures, I think NAS researchers will be happy to do just that -- wouldn't you call that AutoML then?
But that would require enormous computing resources!
Can AutoML produce something as novel as a GAN, CapsNet, WaveNet, Transformer, Neural ODE, etc? Is that even considered to be one of its goals?
In my opinion, there's a clear separation between a group of people trying to improve AutoML so that it's more useful in doing all those tasks on the list, and a group of people trying to invent next gen ML algorithms or DL architectures.