Show HN: 1M Song Dataset dev in 10 mins

deepkut 14 years ago |

I really think your website should be more indicative of what MD does. I had to look it up elsewhere. Regardless, very cool video.

kky 14 years ago | |

Thank you, I appreciate the feedback.

angryasian 14 years ago |

I wish the site would write up an explanation, rather than providing a video. Could someone who watched summarize?

simon_weber 14 years ago | |

"Hawk is Heroku for Hadoop: an on-demand, easy-to-use cloud service for big data. With Hawk any company will be able to extract the value from their big data without the large amount of effort and cost that Hadoop otherwise requires."

In the 6 minute example, they load a dataset from S3, then use Pig and Python to process it. You can "illustrate" each step of your code, which pulls out small, relevant samples from the dataset and shows the results.
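To make the "illustrate" idea concrete, here is a minimal Python sketch of the general technique: run each step of a pipeline on a small sample and show the intermediate rows. This is not Hawk's or Pig's implementation; the `illustrate` function, the step names, and the song records are all hypothetical, and a real tool would pick its samples more carefully so that every step produces visible output.

```python
import random


def illustrate(records, steps, sample_size=3, seed=0):
    """Run each named step on a small random sample and keep every intermediate result.

    Hypothetical helper for illustration only; not the Hawk or Pig API.
    """
    rng = random.Random(seed)
    rows = rng.sample(records, min(sample_size, len(records)))
    trace = [("load", rows)]
    for name, step in steps:
        rows = step(rows)
        trace.append((name, rows))
    return trace


# Made-up records standing in for rows loaded from a dataset.
songs = [
    {"title": "Song A", "artist": "Artist X", "duration": 215.0},
    {"title": "Song B", "artist": "Artist Y", "duration": 180.5},
    {"title": "Song C", "artist": "Artist X", "duration": 302.3},
    {"title": "Song D", "artist": "Artist Z", "duration": 95.0},
]

# Each step takes a list of rows and returns a new list of rows.
steps = [
    ("filter_long", lambda rows: [r for r in rows if r["duration"] > 200]),
    ("project", lambda rows: [(r["artist"], r["title"]) for r in rows]),
]

if __name__ == "__main__":
    for name, rows in illustrate(songs, steps):
        print(name, rows)
```

With a purely random sample like this one, a filter step can come back empty, which is why choosing "relevant" samples matters.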

jacabado 14 years ago |

Looks amazing, can't wait to get access to it.

I'm starting my thesis on music information retrieval, just studying the related work for now. If anybody has suggestions on directions I could follow, they would be really welcome.

My initial idea would be to focus on playlist generation taking into account user's history and usage. So far I've seen a lot of related work exploiting song similarity, some cool work on music mood and some on assisted playlist building. I'm also not ruling out recommendation or discovery.

kky 14 years ago | |

Thank you! There is a link to request an invite at mortardata.com -- if you'd like access, let us know. We just got a lot of invitation requests from this post, but we'll invite you as soon as we can.

kky 14 years ago | |

And also, if you haven't seen musicmachinery.com, do check it out.

bbq 14 years ago | | |

Wow, this is a great resource. Thanks for sharing.

jacabado 14 years ago | | |

I was just going through their archives trying to answer my question and found a great discussion on this post:

http://musicmachinery.com/2011/05/14/how-good-is-googles-ins...

There are some insightful comments from names I recognize from my research.

nashequilibrium 14 years ago |

How does your offering compare to AWS Elastic MapReduce using Pig?

ajdavis 14 years ago |

Very cool demo, this looks like an amazing tool (and I don't even know much about Hadoop!). One question -- it looks like you're skipping over the time it takes for the "illustrate" function to calculate your results. How long does it take for this million-song dataset?

kky 14 years ago | |

Oh thanks for asking that -- it takes about 30s to illustrate the million song set; we're working to make it faster!

fasouto 14 years ago |

+10 for the product, seems really nice. But I don't like the main site; it's not very informative.

I'm doing something similar for my master's thesis: a Pig console embedded in JS, with Cassandra support as well. I expect to release it in mid-January.

res0nat0r 14 years ago |

This looks pretty awesome. So is this just using Elastic MapReduce on the backend? Can you use your existing AWS credentials for this with a Hawk surcharge on top? This looks like lots of fun to use. Can't wait.

kky 14 years ago | |

We actually built on EC2, not Elastic MapReduce. Invoices come from Mortar only -- that way when we can achieve bulk AWS savings, we can keep our cost lower.

I'm glad it looks awesome, thanks!

latch 14 years ago |

FWIW, looks great. Took me a while to hit the maximize button on the videos...for a while I thought "man, they really didn't do a good job, I can't read any of the text"

kky 14 years ago | |

Oh, great to know that, thanks!

ajdavis 14 years ago | | |

I had the same thought at first -- next time, make sure the text is bold enough to be legible at 320x240. Once I maximized the vid it looked great, of course.

vitalyg 14 years ago |

Very cool tool. Can the tool be integrated with my existing Hadoop cluster or do I need to transfer all the data to MH?

revertts 14 years ago |

Do I get any control over node types, or is "number of nodes" the only knob I can turn?

kky 14 years ago | |

Right now, just "number of nodes". But that's definitely something we'll be adding soon.

dennisgorelik 14 years ago |

How is it different from SQL?