The State of Machine Learning Frameworks(thegradient.pub) |
The State of Machine Learning Frameworks(thegradient.pub) |
Once your model is running, and if/when you start hitting performance bottlenecks, then you consider migrating your model to TensorFlow.
Ok, admittedly, there are a couple reasons. The fact that most papers don't mention the framework they use is a big one. So if users of one framework disproportionately mentioned that framework in their paper, it would be overrepresented.
I did cover this concern though, in the Appendix. Check out the "Biased Sample" section.(https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...)
Basically, some conferences have encouraged researchers to submit code. Instead of checking the papers, I checked their code instead. The results are pretty much the same. So I think that mentions in top conferences probably correlates well with uses in code.
PyTorch is easy to use and modify, but Chainer, and by extension cupy (a separate awesome project!) are really, really easy to work with.
Additionally I think tensorflow opt in by default for eager execution is fine maybe good even. Many models are relatively simple and I doubt the gains for rewriting them to utilize the execution graph will be worth it when with the keras frontend you can just dump the h5py model and run it from there which many companies already do.
Rewriting will only be an issue for sufficiently complex models and at that point I imagine competent ML professionals will have baked the time for that into the estimate of the engineering costs.
[0] https://trends.google.com/trends/explore?date=2017-01-11%202...
I work on a team that does the latter and lately DS have been handing off PyTorch models that we cant scale or make performant because Torchscript doesnt really work with any realistic code complexity and authors include all sorts of random python libraries. So we can't load models in C++ or get them under 50ms.
So the framework divide very much feels like dynamic vs statically typed languages. People that dont have real production demands love dynamic languages for the productivity.
I’m working in a Go code base and I’m thinking of using it instead of creating a separate service in Python.
Since I am just keeping up with deep learning in particular and AI in general for my own interests, I will likely switch over to PyTorch because there is no risk involved and learning something new is fun. This is a big change since I have years of TF experience and perhaps four or five evenings spent with PyTorch.
pytorch bug tracker: https://github.com/pytorch/pytorch/issues/755
I much prefer PyTorch, effectively all graph frameworks are there. Very nice to see TPU support with 1.3 as well.
My point is researchers using a framework (matlab) does not mean it’s used heavily in industries or even all industries.
Personally I think all these deep learning frameworks just haven't had as much time to mature, I have a feeling once they do that the one that dominates academia will eventually dominate industry.
Or even Karma & Jest,
Facebook seems to be late to market, but learns from Google's mistakes, to create simpler and more elegant tools.
Constraining design by end to end use cases is a remarkably robust and useful process.
PyTorch is way better at having clean engineering abstractions than TensorFlow, but still falls short when things like “forward” or maintaining your own training loop and gradient metadata are necessary concepts for a practitioner’s end to end workflow.
[0]: https://blog.keras.io/user-experience-design-for-apis.html
https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...
As a summary, though:
PyTorch has become dominant in research because of its API (both its stability + having eager mode).
TF has become dominant in industry because A. it came out several years before PyTorch and industry is slow to move, B. It supported a lot of production use cases (mobile, serving, removing Python overhead) that PyTorch didn't for a long time.
BTW I've also read (here on HN) PyTorch learns much faster than TensorFlow does.
The Keras interface for tensorflow makes it easy & fast to make "good enough" models. That is often a driving factor
industry: tensorRT.
- More code to check-in (Looks more productive)
- More infrastructure, e.g. checkpoints, exporters etc. (Looks like they're doing more work)
- Fancy visualizations (Allows them to look impressive while presenting loss plots)
- Easier to reuse things others have implemented and still get credit for it (TF model zoo, research repo etc.)
Why researchers like pytorch:
- Way easier to hack together their novel idea
- Looks scrappier (which somehow makes the individual look like a better researcher instead of an ordinary programmer)
- Lots of other researchers release code in pytorch so if you're working off of their idea, you use pytorch to avoid re-producing their results.
Open to debate on these ideas, let me know if you have a counterpoint or any other reasons to add
With those bullet points, looks like you didn't talk to actual engineers, but rather middle-layer management people.
Not even bad engineers try to pretend like this is true.
Yes there is a lot of code to be written/optimizations to be done to make things production worthy. However I know a lot of tensorflow research projects that handle data batching in such a terrible way that would take weeks to re-write for production.
As for verboseness of pytorch vs tensorflow, I think either could get more verbose under different circumstances. However for simpler tasks, I think tensorflow is more verbose in general (not accounting for the new release which seems to mimic pytorch/keras a little more). For larger production tasks, its a toss-up depending on whether you need to add new components.
Why I use tensorflow:
- keras
- I used tensorflow yesterday
However its been my experience that the average engineer believes this. They often aim to push some amount of code (meaningful or not) every few days or so.
The best engineers I know write a lot of code overall but are more interested in ensuring they build the right thing and are ok with not pushing code for a while if they need more time.
Despite the absolute nightmare of getting it installed and running on a gpu, I managed it and had a fantastic model. It was doing so well that the company wanted to expand the project and build out a multi-gpu rig as part of it. So I get building that environment and install the latest CUDA, cuDNN, nvidia driver and use tensorflow 2.0 aaaaaand it wouldn't work. I actually spent a long time hacking on it till on a forum I read that it was just a bug that hadn't been fixed yet.
At this point I decided to see what Pytorch was like. In literally one day I installed everything and migrated my project completely over to pytorch. Same speed, same accuracy, works perfectly on a multi-gpu rig when I set it to. It was like a breath of fresh air.
The next day I wrote some C++ to import a saved pytorch model so it could run in a deployment environment. The C++ api is also great. The docs are lacking a little bit, but an Facebook researcher mentioned to me on the forums that they're hoping to have it all done by next month.
It's unlikely that I'll be going back to tensorflow.
Because of that, it doesn't make much sense to judge a "differential programming language" like TensorFlow or PyTorch by the ease of installation. It'd be like saying "I prefer C# over C++" because it is easier to install.
1. Extremely easy to debug and work with. Being able to debug effortlessly in PyCharm makes life very easy.
2. The API is quite clean and nice and fits in really well with Python and nothing feels hacky. I've developed my own Keras-like framework for experimentation, training and evaluating models quickly and easily and the entire experience has been really enjoyable.
3. The nicest thing though is that as the article points out, a huge percentage of researchers have moved to Pytorch and this allows us to more easily look at other researcher's code and experiment with things easily and incorporate ideas and cutting-edge research into our own work. Even for things that are released in TensorFlow, if it is an important publication that gains attention and traction in the community, you will likely have implementations in Pytorch pop up soon enough.
I do think that TensorFlow still has an edge on the deployment at scale/mobile side of things as pointed out by the article. But Pytorch is a lot younger and they are making a lot of progress with every release in that space.
There're maybe all of two "surprises" I've encountered in all my time using it, if even (1. Gradients are accumulated in state, 2. nn.Module does funky things with attributes, so use something like nn.ModuleDict if you're going to be dynamically setting modules). Everything else works like a dream, and works almost exactly how you expected.
Model parameters? .parameters() gives you a dict-friendly generator of tensors. Model state? .state_dict() is a dictionary. Loading model state? load_state_dict(state_dict)... just loads a dictionary. Reusing modules across different modules? Just assign them! Determining what parameters to optimize? Just ... give the list of parameters to the optimizer.
You can use all your using Python development and debugging tools, and it feels 100% natural. I can fit it into other Python workflows without making the whole program centered around TensorFlow.
TensorFlow is undoubtedly powerful, and if you have the time/resources to put into a static-ish TensorFlow-centric workflow, it could pay off many times over. But it definitely feels like learning an entirely new language, with an entirely different debugging pattern. And furthermore, a language that is constantly changing patterns and best practices, other than super-standard Keras examples.
To put into context, even running the official TensorFlow models repository has deprecation warnings. Whereas torchvision works like seamlessly and reads like a reference for writing PyTorch model code.
There is just a developer-centric focus to PyTorch that makes it a joy to use.
I was able to create a custom detection network for a 3-class problem, load up the COCO pretrained weights for the network, strip out all the other weights at the "head" for all the other COCO classes except for the "person" class and then fine-tune the model on my custom 3-class dataset. The resulting model generalized exceptionally well on people as it was still able to retain a lot of its performance from the COCO pre-training. It was so easy to do all of this. Literally, maybe 10 lines of code, and so easy to figure out since I could introspect the state_dict and the weights file directly in my PyCharm interpreter while working out how to do this.
I am a bit saddened by all of this, because I really liked how easy it is to define a graph in Tensorflow in Python, serialize it, and then use its minimalistic C API to use the graph in Go, Rust, or wherever you need it.
How is your experience with PyTorch and backwards API compatibility (I know that they only reached 1.0 fairly recently)?
Other than that, I've had next to no issues, and the API has only gotten better over time, with more convenient ways to do things.
PyTorch has a much smaller footprint, and is happy to delegate code to separate libraries (e.g. torchvision), so you run into "all-or-nothing" dilemmas less frequently.
Lightgbm library has consistently performed well. I've been interested in how many colleagues instantly jump to neural nets when in my experience this often doesn't beat lightgbm on medium sized datasets not related to text/images.
Code: https://github.com/Chillee/pytorch-vs-tensorflow
Ablation of claims: https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...
JS interactive charts: https://chillee.github.io/pytorch-vs-tensorflow/
With fastai module that's built on Pytorch learning and developing Deep Learning solutions have become a lot easier. So there's a real game on now
But anyway, at this point I got so many things already running over TF + Keras that I don’t see any use case of reverting back the entire code base written over 3-4 years to other platform just because new grads from university are using some library more over other. I got everything I need, so why to suffer unnecessarily? I can just spend the same amount of time polishing existing things rather than spending time after something which has less probability to be at same level as existing things.
It's always like this.
Think how Ubuntu took over the server market because amateurs were preferring it instead of Redhat/CentOS. And when they became professionals or were in a position to decide, they also put Ubuntu on the server because this is what they knew best.
A quick observation that may not be 100% accurate but still worth mentioning: in some ways TF feels like it was written to solve large scale issues on day one. For example, when I started playing with the new TF 2.0 distribution strategies and dataset pipeline, I quickly got the sense that this thing was meant to move and ingest bucketloads of data across hundreds/thousands of vm instances. In a way, I suppose it's a reflection of Google culture where there's a strong emphasis on not doing things that don't scale to Google Scale.
As a result of this, I sort of feel that you should start with PyTorch and eventually graduate to TF if/when the scale requires it. This is sort of like starting with Rails/Django/Node, and migrating to a Go/JVM/[Insert Your Favorite Static Language Here] stack when the traffic load warrants it.
[0] https://www.fast.ai/about/ [1] https://www.youtube.com/watch?v=J6XcP4JOHmk&t=4152s
* Automatic differentiation of higher-order differentiation being important, and how there's clearly room to disrupt there
* Increasing hardware diversity seems to mean that both frameworks will run into a brick wall as-is
Exciting space. It'll be fascinating to see how dramatically, or not, things change in the coming years.
Back when we were using TensorFlow, whenever we wanted to try something new, sooner or later we would find ourselves wrestling with its computational graph abstraction, which is non-intuitive, especially for models with more complex control flow.
That said, we are keeping an eye on Swift + MLIR + TensorFlow. We think it could unseat PyTorch for R&D and eventually, production, due to (a) the promise of automatic creation of high-performance GPU/TPU kernels without hassle, (b) Swift's easy learning curve, and (c) Swift's fast performance and type safety. Jeremy Howard has a good post about this: https://www.fast.ai/2019/03/06/fastai-swift/
It feels a bit too early to tell. I don't believe many researchers will switch to Swift though.
As it is the API for FastAI is constantly changing and has hardly ever felt particularly stable. I don't see it ending up becoming this complete, stable, polished framework if they keep switching focus. I don't care one way or the other as I don't personally use it because it is way too complex to extend it to do anything simple if you just have your own networks and Dataset class that you want to plug into their infrastructure. Being familiar with Pytorch and Python, I've always found it much easier to just work with those 2 rather than trying to bend the fastai library to do things that don't fit perfectly into the applications they designed for.
Perhaps it’s the nature of the game that changed with many new kinds of architectures and so on. But maybe Keras is already overengineered for someone who just wants to make thumbnail sized GAN stuff at home.
The lack of say, keras.applications is a shame, but it won't last, and if you have a GPU or 8 the power of optimized (p/v)map definitely makes up for it.
> Great API. Most researchers prefer PyTorch’s API to TensorFlow’s API. This is partially because PyTorch is better designed and partially because TensorFlow has handicapped itself by switching APIs so many times (e.g. ‘layers’ -> ‘slim’ -> ‘estimators’ -> ‘tf.keras’).
Arguably, one of the biggest issues Google had with Angular was the switch from 1.x to 2.x. You'd have thought they learned about how not to make major changes on OSS projects.
Facebook on React for instance do an amazing job here, they use prefixes to anything they don't want to support like "UNSTABLE_" and show warnings forever when they actually plan to make something small obsolete.
I tried to learn from both, so in some of my bigger personal OSS projects (amount of work involved) like npm's "server". I purposefully made some APIs a bit more limited than I could to have more flexibility later on if I didn't like the direction. Of course at a different level, I am a single dev doing OSS on my free time after all.
But I understand in a project of the size of e.g. Tensorflow it's not an individual dev learning, it's more about the company learning how to do things better.
Admittedly, I've never used MXNet so it might have more issues that I'm not aware of. Judging from the benchmarks I've seen, however, MXNet got a lot of things right.
Unluckily, I just don't think it added enough on top of PyTorch or TensorFlow for people to consider switching. People switched from TensorFlow to PyTorch because eager mode was just so much easier to use.
I haven't had success doing so using frameworks such as Torch and TF, even if their toolkit is better to develop new solutions.
Also we get to write code in C++, which can be a big positive when developing machine learning SDKs. I personally still do most of the prototyping in Python though.
I'll be checking the link on the post that mentions that pytorch allows models to be converted to c++, looks promising actually.
[0] http://dlib.net/
And then the title is "PyTorch vs Tensorflow", but it never says whether the Y axis is unique mentions of PyTorch or Tensorflow? From the context I guess PyTorch, but come on!
The Y axis should be "Fraction mentioning PyTorch", and the title should be "Papers that only mention PyTorch or Tensorflow" (assuming I have understood this correctly).
Shame it was labelled so badly because it's an amazing graph otherwise!
I fixed these properly at some point, but I made some last minute modifications to the text size and such.
These interactive figures are probably a bit better overall too: https://chillee.github.io/pytorch-vs-tensorflow/
I'll change that ASAP. Thanks for the heads up!
EDIT: Fixed! Lemme know if that addressed your issues.
I'll add that it was much easier to install PyTorch with GPU support than it was to install TensorFlow with GPU support - at least that's how it was around November of last year. The PyTorch install was painless, whereas we ended up having to build TF from source to work with out setup. Could be different now as I haven't looked at TF since then.
Unfortunately, if anything I think it's the opposite. The constant creation and deprecation of TF flavors (tf-eager, tf-slim, tf-learn, keras, tf-estimator, tf.contrib [RIP]) has made reading tensorflow code online somewhat disastrous. Everybody, including the TF team, is using a different API and it's difficult to keep all of them straight. It seems that you're doomed to end up using some combination of many of the above in a way that makes sense to you and your team, adding another confusing model to the pile.
- loop through epochs
- loop through each batch
- run a forward pass for the batch ( model(batch) )
- calculate the loss for the batch ( criteria(y, yprim)
- compute the gradients/backprop ( loss.backward() )
- update the weights (optimizer.step())
This really enforced everything I learned and I think breaks down the problem. All this of course in addition to everything else already mentioned, and super convenient module/network building and definition.
There will be bad model code in both PyTorch and TensorFlow. The difference is that, bad PyTorch code reads like bad Python code, and I've accumulated a lot of experience reasoning through bad Python code. Bad TensorFlow code can come from any one of of the history of paradigms that TF has gone through, and I don't even know if it's bad or just some funky new TF functionality I'm unfamiliar with.
One area where I wonder if neural nets would be a more useful option is using something like an LSTM to predict defaults based on a sequence of data? I've tried this a handful of times and doing a bit of feature engineering to aggregate data in a handful of fixed buckets has usually been better and easier, but I'm far from an expert in that area.
I know Jeremy Howard has shown decent results with fastai/pytorch for tabular data and I've seen some Kaggle teams do well with neural nets for tabular data. I've also had decent results with gbdt/nn ensembles. But I think in most situations where you just have tabular data, you'll get better results with less effort if you use lightgbm or the like.
I call them “tricks” but really they’re just design decisions based on what current research indicates about certain problems. This is largely where the “art” part of neural networks comes from that many people refer. The search space is simply too big to try everything and hope for the best. Therefore, how a problem is approached and how solutions are narrowed and applied really matter. Even simple things like which optimizer you use, how you leverage learning rate schedules, how the loss function is formulated, how weights are updated, feature engineering (often neglected in neural networks), and architectural priors make a big difference on both sample efficiency and overall performance. Most people, if they’re not just fine-tuning an existing model, simply load up a neural network framework, stack some layers together and throw data at it expecting better results than other approaches. But there’s a huge spectrum from that naive approach to architecting a custom model.
This is why neural networks are so powerful and why we tend to favor it (though not for every problem). It’s much easier to design a model from the ground up with neural networks than it is for e.g. xgboost because not only are the components more easily composable thanks to the available frameworks but there’s a ton more research on the specific interactions between those components.
That doesn’t mean than every problem is appropriate for neural networks. I completely agree with you that no matter what the problem is you should never jump to an approach just because its popular. Neural networks are a tool and for many problems you need to be comfortable with every one of those decision points to get the best results and even if you’re comfortable it can take time and that isn’t always appropriate for every problem. My other point is that I wouldn’t draw too many conclusions about a particular algorithm being better or worse than another. I’m not saying that was the intention with your comment but I know many people in the ML industry tend to take a similar position. It really depends on current experience with the applied algorithms, not just experience with ML in general.
I particularly like this:
> In my experience it really comes down to how many “tricks” you know for each algorithm and how well can you apply and combine these “tricks”. The difference is that neural networks have many more of these tricks and a broader coverage of research detailing the interactions between them.
This is pretty true - the lack of knobs to turn on something like XGBoost or LightGBM both make it pretty easy to get good results and harder to fine tune results for your specific problem. Maybe this isn't the most correct way to look at it, but I've always sort of pictured it as curve where you are plotting effort vs results, and the one for LightGBM/XGBoost starts out higher but is more flat, and the NN one is steeper.
I guess reading your post makes me wonder where the two curves cross? Do you have good intuition for that, or do you feel so comfortable with neural networks that they are sort of your default? I peeked at the company you have listed in your bio, and it looks like you have pretty deep experience with neural networks and work with other people who have been in research roles in that area too, and I wonder how that changes your curve compared to the average ML practitioner? Certainly figuring out how to pick the best layer combinations, optimizer, loss functions, etc benefits hugely from intuition gained over years of experience.
This tuning approach gets good results for Lightgbm. I'd recommend using TimeSeriesSplit.
https://www.kaggle.com/nanomathias/bayesian-optimization-of-...
I've seen colleagues do something like this, or random search over NN architecture (NUM layers, nodes per layer, learning rate, dropout rate), always falling short of results this archives, despite far longer time to code up an tune model.
And if you have any kind of seasonality you a dataset with a large enough timeframe. (At least more than a year.)
Nonetheless, LightGBM and xgboost are also commonly used in the insurance sector.
They are still somewhat problematic for conversion rates in a highly dynamic market though.
There are lots of improvements going into pytorch for mobile at the moment, but for the moment I'll wait and see how it turns out - I didn't have much fun with caffe2 when "train in pytorch and deploy with caffe2" was the storyline FB pushed (e.g. problems with binary size and slow depthwise convolutions) so not too eager to migrate back at the moment.
Whereas the Tensorflow API actually creates a static graph that can be easily converted to ONNX.
Edit: They talk about this problem in the article:
> Although straightforward, tracing has its downsides. For example, it can’t capture control flow that didn’t execute. For example, it can’t capture the false block of a conditional if it executed the true block.
> Script mode takes a function/class, reinterprets the Python code and directly outputs the TorchScript IR. This allows it to support arbitrary code, however it essentially needs to reinterpret Python.
common people do want to appreciate & adopt the things that seems fit to the knowledge sphere at present from the researchers. tensorflow approaches are better and respected each and everyone of the community as well in exchange enlightened us new ways of understanding of ml solutions. it have turned into a family “If you want to go fast, go alone. If you want to go far, go together.” and given the assets alphabet have a common man can turn into researchers! e.g. >> https://learn.grasshopper.app take this for example "Learn to code anywhere. Grasshopper is available on iOS, Android, and all web browsers. Your progress syncs seamlessly between devices." << this is the status quo ! it's a gift of a lifetime for generations !
Debian is popular for Docker images exactly because many of the people trying Docker were already familiar with Ubuntu. Those users quickly ended up wanting smaller images, making Debian an obvious thing to try out since Ubuntu is basically Debian with bells on.
Ubuntu fought a sea of distros and came out as what's very nearly an industry standard, if not an official one. The 90s were a fricking mess by comparison. Slackware on floppies.
(And now I need "Slackware on floppies" dubbed over the "Jesus wept" scene from Hellraiser.)
Here's a report that suggests the exact opposite:
"Don't let the revenue numbers lead you to thinking Red Hat Enterprise Linux (RHEL) is more popular than Ubuntu. By The Cloud Market's Jan. 8, 2019 count of Amazon Web Services (AWS) instances, Ubuntu is used in 314,492 instances, more than any other operating system, while RHEL is used in 22,072 instances."
https://www.zdnet.com/article/inside-ubuntus-financials/
Disclosure: I work for Canonical, but as an engineer; I'm not in marketing or anything and that's not my job. But I do get the impression that Ubuntu is way ahead in general use in the cloud, and is also the generally used base for Docker images (I don't immediately see how to get that statistic out of Docker Hub). I didn't think this statement was controversial.
Though you can already use very clean Pytorch style libraries like Flux and Knet or the Tensorflow bindings to leverage the benefits of Julia for high performance numerical processing on the adjacent tasks such as data preprocessing.
https://arxiv.org/abs/1907.07587
https://news.ycombinator.com/item?id=20477873
From the abstract:
> We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations.
Just linking to this for those who haven't seen it.
You can take a look at https://discourse.julialang.org/t/where-does-julia-provide-t... for some of my questions.
Essentially, the biggest advantage imo is that Julia offers a single cohesive language, where compilers can do anything at the language level. I don't think this will allow for a single killer application - almost anything Julia can do ca n be simulated by some combination of Python/C++.
However, what might be true is that using a single language allows for much faster development and iteration than a combination of Python/C++. I think the way that'll manifest is in more and more high quality libraries coming out for Julia that are higher quality than the Python ones.
Maybe wait 5 years, and we'll see what happens :)
Do you have any particular evidence that PyTorch is slow here?
I don't want to say anything encouraging or discouraging about TensorFlow. Just that it doesn't make much sence to make a judgement based on installation experience. Installing TensorFlow or PyTorch is a very small percentage of man-hours, compared to releasing a DNN to production.
Yes, Javascript features and frameworks have evolved quite fas... oh wait
tf.contrib is just a module where user-contributed code was stored, which included both low-level constructs and higher level APIs. tf.estimator is an abstraction that is mostly used for productionizing models. tf.slim/tf.learn were indeed redundant with keras (a library developped externally), but were necessary steps before keras became part of tensorflow.
Just like the rest of ML, whether neural networks are the right choice still depends on the problem at hand and the team implementing the solution. It definitely impacts where the performance / time curves intersect. If we just need something decent fast, or we’re working with another team that doesn’t have the same background, we tend to focus on approaches with fewer moving pieces. If we need the best possible performance, have a qualified team to get there, and have the time to iterate on development then the curves would favor neural networks.
Possibly more important is to focus on the process for how you derive and apply model changes. You get some model performance and then what? Rather than throwing something else at the model in a “guess-and-check” fashion, be methodical about what you try next. Have analysis and a hypothesis going in to each change that you make and why it’s worth spending time on and why it will help. Back that hypothesis by research, when possible, to save yourself some time verifying something that someone else has already done the legwork on. Then verify the hypothesis with empirical results and further analysis to understand the impact of the change. This sounds obvious (it’s just the scientific method) but in my experience ML practitioners and data scientists tend to forget or were never taught the “science” part. (I’m not accusing you of this; it just tends to be my experience.)
Random search, AutoML, hyperparameter searches, etc. are incredibly inefficient at model development so they’ll rarely land you in a better place unless a lot of upfront work has been put in. For us, they’re useful for two things: analysis and finalization. For analysis the search should be heavily constrained since you’re trying to understand something specific. For finalization of a model before going into production, a search on only the most sensitive parameters identified during development usually yields additional gains.
The last data seems to be about 3 years old, and Ubuntu was about 1.5-2x the CentOS/amazon linux share. I suspect that's changing with the release of amazon linux 2, but there's no data to back that up.
Amazon itself primarily uses a RHEL based distro, which is what I meant originally.
So what I can do is, that I can instantiate a CenterNet model of identical architecture, except with only 3 heads for the 3 classes instead of 80 heads for the 80 COCO classes. Now, when I try to load the COCO weights in, there will be a mismatch and typically you end up with the heads being left with their default initialization while the rest of the backbone gets the COCO weights... this is the traditional way you do transfer learning on related problems because you are still starting off with a much better set of weights for your entire network backbone than random weights, which will help with training on related tasks.
However, we can go a step further and load up the state_dict from the COCO weights file, figure out which set of weights are for the "person" head and assign them to let's say the 1st of your 3 heads in your new architecture. You can even go a step further... Since the "donkey" class is quite similar to the "horse" class in COCO, you could also transfer the weights for the "horse" head in the COCO weights to your 2nd head. So now you have a network with 2 of the 3 heads already set up to be robust person and horse detectors. These are much better poised to then be fine-tuned on your application specific data for examples of people and donkeys. You end up with a model that is much more robust on those 2 classes despite only having (let's say) a couple of hundred labeled images for your specific application.
Hope all of this made sense. It's just nice that in Pytorch, everything is pretty straightforward and weights are just dicts and it's super easy to introspect them and splice them, etc.
I think you may be living in a bubble. I've been running devops for various shops for half a decade and I've only once used Ubuntu, because it was already being used by an acquisition.
I won't deny that Ubuntu is popular. It's certainly got the lions share of the desktop market. But there is no such consensus in the server market.
Here's some date I could dig up by a couple of minutes googling:
https://w3techs.com/technologies/details/os-linux/all/all https://w3techs.com/technologies/history_details/os-linux
Other sibling comments link to more.
make menuconfig