An API for hosted deep learning models (blog.algorithmia.com)
Also, these guys could offer support for these models on private cloud servers, to enable privacy.
Er, Nvidia itself has an official Docker application which allows containers to interface with the host GPU, optimized for the deep learning use case: https://github.com/NVIDIA/nvidia-docker
Training models is one thing that can be commoditized, like with this API, but building models and selecting features without breaking the rules of statistics is another story and is the true bottleneck for deep learning. That can't be automated as easily.
Also, a much more minor grievance, but I really dislike websites that don't work on my 15" laptop. What's going on here? http://i.imgur.com/q13lCLK.png
https://21.co/learn/deep-learning-aws/
Disclaimer: I work for 21.
Unless users of this service feed back into it whether the answer it gave was correct, I don't see how it would help to train their model.
Happy to be corrected by someone with a better understanding of the space.
I agree that building models is still definitely a big challenge, but the tooling and knowledge are getting better every day. Either way, our goal with Algorithmia is to create a channel for people to make their models available, and to create an incentive for people to put in the effort to train really solid, useful models.
It is not the final solution for containerized GPU applications.
The real challenge is doing this on 100+ GPUs and leveraging multitenancy for an additional 100X+ economy of scale. We're actively working on it, and in my experience, this seems like a classic scheduling area where different domains will want to do it differently. However, even there, it'll end up something like "plug in a new user-level Mesos scheduler x", and Nvidia is working on exactly that.
I'll wait for someone at Baidu or the Titan lab to blow up those numbers by another 100-1000X ;-)
Edit: If this sounds like a cool problem, we're leveraging GPU cloud computing and visual graph analytics for event analysis (e.g., a core tool for teams in enterprise security). We would love help, esp. on cloud infrastructure or on connecting the ecosystem together! Contact build@graphistry.com and we'll figure something out :)
You can run multiple containers on the same GPU with nvidia-docker; it's exactly the same as running multiple processes (without Docker) on the same GPU.
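To make the point above concrete, here is a minimal Python sketch that builds the commands for two containers sharing one GPU. It assumes the original nvidia-docker wrapper, which reads the `NV_GPU` environment variable to choose which GPUs to expose, and the `nvidia/cuda` image with `nvidia-smi` as the test command. The helper names and the `job-N` container names are hypothetical, and nothing is launched; the sketch only prepares the arguments.

```python
import os
import shlex


def nvidia_docker_command(image, command, name):
    """Build the argv list for a single nvidia-docker container (hypothetical helper)."""
    return ["nvidia-docker", "run", "--rm", "--name", name, image] + shlex.split(command)


def shared_gpu_launch_plan(gpu_index, image, command, count):
    """Return (env, argv) pairs for `count` containers that all see the same GPU."""
    # Assumption: the nvidia-docker wrapper exposes only the GPUs listed in NV_GPU.
    env = dict(os.environ, NV_GPU=str(gpu_index))
    return [
        (env, nvidia_docker_command(image, command, f"job-{i}"))
        for i in range(count)
    ]


plan = shared_gpu_launch_plan(0, "nvidia/cuda", "nvidia-smi", 2)
for env, argv in plan:
    # Print the equivalent shell command instead of running it.
    print(f"NV_GPU={env['NV_GPU']}", " ".join(argv))
```

On a machine with nvidia-docker installed, each pair could be handed to `subprocess.Popen(argv, env=env)`. The GPU itself does not distinguish between the two containers and two ordinary processes.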
For workloads where you aren't making full use of system resources at all times, the economy of scale provided by our compute cluster often results in compute-per-second being more cost effective, even before considering the costs of managing your own infrastructure. It fits into the "serverless" trend, FWIW.
Each algorithm in our marketplace has a cost calculator that breaks down the price using a per-API-call estimate. If you have a specific workload in mind, feel free to reach out and we'd be happy to further discuss the pricing.
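As a rough illustration of the per-call versus always-on trade-off described above, here is a minimal Python sketch. All prices are made-up placeholders, not Algorithmia or cloud-provider figures, and the function names are hypothetical.

```python
def per_call_monthly_cost(calls_per_month, price_per_call):
    """Monthly bill when paying only for the API calls actually made."""
    return calls_per_month * price_per_call


def break_even_calls(dedicated_monthly_cost, price_per_call):
    """Calls per month above which a dedicated machine becomes cheaper."""
    return dedicated_monthly_cost / price_per_call


# Hypothetical numbers, for illustration only.
PRICE_PER_CALL = 0.002        # dollars per API call (assumed)
DEDICATED_MONTHLY = 650.0     # dollars per month for an always-on GPU box (assumed)

threshold = break_even_calls(DEDICATED_MONTHLY, PRICE_PER_CALL)
for calls in (10_000, 100_000, 1_000_000):
    hosted = per_call_monthly_cost(calls, PRICE_PER_CALL)
    cheaper = "per-call" if calls < threshold else "dedicated"
    print(f"{calls:>9} calls/month: hosted ${hosted:,.2f} vs dedicated ${DEDICATED_MONTHLY:,.2f} ({cheaper} is cheaper)")
```

The comparison leaves out the operational cost of running your own machine, which would push the break-even point further in favor of the per-call option.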
I agree that most startups need to get an MVP out the door as soon as possible, which leads them to the cloud. I think hybrid cloud will be the way to go long term.
If you think about it, on one side we have things like AWS, and on the other we have devops and "make running your own infra at scale easy" tools like Docker and k8s. On prem in some form isn't going anywhere. What WILL be interesting are plays like, say, Convox, where you can manage a cloud like you would an on-prem OpenStack/k8s deployment.
It does strike me as tricky needing to match driver versions between the host and the container. Do you know if there is any effort to eliminate that requirement?
Also, while we're chatting, is there any hope of NVIDIA open sourcing their Linux drivers? How would such a move affect nvidia-docker?
All the user-level driver files required for execution are mounted through a volume when the container is started. This way you can deploy the same container on any machine with NVIDIA drivers installed.
We have more details on our wiki: https://github.com/NVIDIA/nvidia-docker/wiki/Internals
Concerning your last question: I don't have any information on this topic, but in any case it would not really impact nvidia-docker.
For distributed training (which Caffe, at least the official version, doesn't actually support), you would have to run one container per instance, but this is more a configuration problem at the framework level than a Docker or nvidia-docker problem.