It isn't everything you need to know in 30 minutes, but it's concrete coverage of many machine learning topics in under 150 pages. Here's why I'm recommending this paper:
* The algorithm is easy to understand.
* It can handle classification, regression, semi-supervised learning, manifold learning, and density estimation. The paper gives an introduction to each of these topics as well as a unified framework to implement each algorithm.
* It can handle categorical data and missing data [2].
* Its results are as good as those of other state-of-the-art algorithms.
* The paper is well-written and easy to understand for someone without a deep background in machine learning.
[1] It's mostly a review paper. Using random forests for density estimation is new.
[2] This review paper doesn't cover categorical data or missing data.
Is another great resource that introduces many ML topics from the ground up.
Fair warning, I haven't had time to have a look at the video (short break at work). I'll do it once I get home.
I'll refine the example for the next time!
I've done a few weirdo projects with NLTK, tho, and it's great fun. By stream hacking do you mean offloading learning sets (active or initial) and that heavy overhead into the "cloud", or am I misunderstanding the terminology?
The challenge with stream analysis is that you are dealing with a continuous stream of data where you can see each element of the stream only one time and must still be able to cluster/classify/analyze it. There are still few algorithms and tools designed explicitly for that purpose.
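To make the one-pass constraint concrete, here's a rough sketch of sequential (online) k-means, one of the older algorithms that fits this setting. It's my own illustration, not taken from any particular tool: the `OnlineKMeans` class and the `demo_stream` generator are made-up names, and seeding the centroids with the first k points is just a simple assumption. Each point is seen once, nudges its nearest centroid, and is then thrown away.

```python
import math
from typing import Iterator, List, Sequence, Tuple


class OnlineKMeans:
    """Sequential k-means: every point is processed once and never stored."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, point: Sequence[float]) -> int:
        """Assign the point to a cluster, move that centroid, return its index."""
        # Assumption: the first k points of the stream seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(list(point))
            self.counts.append(1)
            return len(self.centroids) - 1

        # Find the nearest centroid by Euclidean distance.
        nearest = min(
            range(self.k),
            key=lambda i: math.dist(self.centroids[i], point),
        )

        # Move the centroid toward the point with step size 1/n, so it stays
        # the running mean of every point assigned to it so far.
        self.counts[nearest] += 1
        rate = 1.0 / self.counts[nearest]
        centroid = self.centroids[nearest]
        for d, x in enumerate(point):
            centroid[d] += rate * (x - centroid[d])
        return nearest


def demo_stream() -> Iterator[Tuple[float, float]]:
    """Hypothetical stream: points alternate between two well-separated groups."""
    for i in range(100):
        offset = (i % 5) * 0.1
        if i % 2 == 0:
            yield (offset, offset)
        else:
            yield (10.0 + offset, 10.0 + offset)


model = OnlineKMeans(k=2)
for p in demo_stream():
    model.update(p)  # p is not kept anywhere after this call
```

Memory use depends only on k and the dimensionality, not on the length of the stream. The catch is that the result depends on the order in which points arrive and on the seeding, which is part of why stream-specific algorithms are harder to get right than their batch counterparts.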