Distributing a Fully Connected Neural Network Across a Cluster

Distributing a Fully Connected Neural Network Across a Cluster (iamtrask.github.io)

30 points by iamtrask 11 years ago | 6 comments

ajtulloch 11 years ago

How is this on the front page? This is completely incoherent.

For anyone actually interested in techniques for multi-GPU DNN training, http://arxiv.org/pdf/1404.5997v2.pdf and the references therein are probably a good start.

iamtrask 11 years ago

This also might help... here are some slides graphically showing how the distribution works. http://prezi.com/hdctecihctdr/?utm_campaign=share&utm_medium...

herewego 11 years ago

Your condescension here is entirely unnecessary. Surely someone as qualified as you could have provided a more thoughtful and encouraging comment.

iamtrask 11 years ago

I apologize for the verbosity and thickness. Happy to answer questions though. :)

dhaivatpandya 11 years ago

The exposition is not very clear. What exactly do you mean when you say "No edges will be communicated over the network, only half of the nodes."? I'm puzzled, because a few sentences later, you claim "The only network IO that would be required would be sending each edge value to its respective node in Q."; so the edge values are actually communicated?

From what I've understood, what you're suggesting is that for every node in a layer, you colocate the edge on the same machine?

iamtrask 11 years ago

Precisely! I highly encourage checking out the slide-deck for a graphical representation.

For every node in every other layer, I colocate the edge on the same machine. In this way, when a group of, say, 10 nodes in layer 1 are each sending a weighted message to a single node in layer 2... they can pre-combine their messages (weighted sum) and send only that value over the network. This happens for every node in the second layer, reducing network i/o (this is the first optimization).
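To make the pre-combining step concrete, here is a rough sketch in plain Python. It is not the code from the post: the shard layout, the function names (`partial_sums`, `distributed_forward`), and the sizes (3 machines, 10 layer-1 nodes each, 4 layer-2 nodes) are all assumptions for illustration, and it only covers the weighted sum, not any activation function. Each "machine" holds some layer-1 nodes plus the edges leaving them, and sends one pre-combined value per layer-2 node instead of one value per edge.

```python
import random

def partial_sums(activations, weights):
    """Pre-combine on one machine (hypothetical helper).

    weights[i][j] is the edge from local layer-1 node i to layer-2 node j.
    Returns one partial weighted sum per layer-2 node.
    """
    n_out = len(weights[0])
    return [sum(a * w[j] for a, w in zip(activations, weights))
            for j in range(n_out)]

def distributed_forward(shards):
    """shards is a list of (activations, weights) pairs, one per machine."""
    # Only these partial sums would cross the network, never the edges.
    messages = [partial_sums(acts, w) for acts, w in shards]
    n_out = len(messages[0])
    # Each layer-2 node adds up the partial sums addressed to it.
    totals = [sum(m[j] for m in messages) for j in range(n_out)]
    values_sent = sum(len(m) for m in messages)
    return totals, values_sent

# Assumed toy sizes, chosen only for the example.
random.seed(0)
n_machines, nodes_per_machine, n_out = 3, 10, 4
shards = []
for _ in range(n_machines):
    acts = [random.random() for _ in range(nodes_per_machine)]
    w = [[random.uniform(-1, 1) for _ in range(n_out)]
         for _ in range(nodes_per_machine)]
    shards.append((acts, w))

totals, values_sent = distributed_forward(shards)
edge_count = n_machines * nodes_per_machine * n_out
print(f"values sent: {values_sent}, edges in the layer: {edge_count}")
```

With these assumed sizes, each machine sends 4 values (one per layer-2 node) rather than 40 weighted messages (one per edge), and the totals match what a single machine would compute up to floating-point rounding.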