Deep-Fried Data

560 points by udba 9 years ago | 140 comments

> For the generation growing up now, the Internet is their window on the world. They take it for granted. It’s only us, who have seen it take shape, and are aware of all the ways it could have been different, who understand that it's fragile, contingent. The coming years will decide to what extent the Internet will be a medium for consumption, to what extent it will lift people up, and to what extent it will become a tool of social control.

I agree completely. This is something we should be cognizant of.

pimlottc 9 years ago |

> Many [programmers] work jobs that are intellectually stimulating, but ultimately leave nothing behind. There is a large population of technical people who would enjoy contributing to something lasting.

This hits pretty close to home.

keyle 9 years ago | |

Same here. I'm battling with this thought a lot. Beyond jobs, I think there should be communities of developers, designers, producers, writers, getting together and figuring out this stuff. And I don't mean open source projects. Let's group together smart people wanting to make a difference and have a hit list of things we (people) actually need. A group that would organise people into mission driven development.

I'm so fed up of getting paid to potentially make founders rich. Or to be a small cog in a gigantic machine on a slow decline. I'm also unemployable because I can't buy into the corporate BS anymore. And where I am, there doesn't seem to be design/dev jobs that actually want to make a difference. It's an economy problem.

The startup thing seems to be the best way we go about solving problems in the world today. But if you happen to _not_ be at the right place at the right time, meeting the right people, poof, it's gone. I can't imagine that an advanced species would operate this way. We should be focused on solving problems, instead of being focused on escaping the rat race, to then be able to solve problems.

I'm glad I am not alone seeking purpose. There goes a point where you're technically advanced, you have itches to fix things and all you see is the broken economy of consumerism and "let's give kids video clips and smileys, derp".

wtracy 9 years ago | | |

> And I don't mean open source projects.

Then what do you mean? You described exactly what some of the largest, most successful FOSS projects (Firefox, KDE, Gnome, Libre Office, FreeBSD) are already doing.

> Let's group together smart people wanting to make a difference and have a hit list of things we (people) actually need.

Well, the FSF maintains a list of "high priority Free Software projects" that need help, but it's strongly colored by the FSF's politics: http://www.fsf.org/campaigns/priority-projects/

intended 9 years ago | | |

That used to be what the net and all of the coder based communities were about. That's why I used to come to Slashdot, or HN, when I found the net.

And we've reached that point when the creatures (the firms and technologies built by those people) of that culture are diverging from ideals of the culture.

But the above comment is inherently empty - any successful system will eventually expand till it reaches a barrier of complexity which cannot be overcome on its own.

Figuring out what to do is the challenge.

The things that worked were coders having free time to spend on interesting projects. But I suspect that we now better understand the value of coder time, and the major firms are now paying the correct amount to keep coders busy.

The market BS is a good thing for coders in the short and medium term. People who understand finance and strategy are willing to pay what it takes today, to own a chance at being richer tomorrow.

If you have a neat hobby, communities will help you get better at it. Maybe if it's really good, you can convert that into a product/firm and possibly a good exit. If that happens you won't have to worry about it ever, and you'll be that thing which is respected among your peers - a successful serial entrepreneur. You would have done the hard thing (product creation, team management, finance, successful exit).

In a group of people who respect ability and excellence, it's hard not to think of the guy who did the harder job as meritorious.

In short: I don't think there's a market solution for a new market normal.

It's easier to figure out you are cut from a different cloth, recognize the market dynamics for what they are, and make time to build whatever it is you want to build.

Eventually a lot of other coders are going to come to similar realizations (provided the cultural bubble online isn't too distortive)

pjc50 9 years ago | | |

Some friends of mine came up with https://www.mysociety.org/ as an answer to this.

> "We should be focused on solving problems, instead of being focused on escaping the rat race, to then be able to solve problems."

Sometimes one person's problem is another person's solution, and vice versa; this is where so much of politics comes from. A lot of people are invested in making sure certain problems stay unsolved, or even unacknowledged.

vijayr 9 years ago | | |

> instead of being focused on escaping the rat race

This resonates. How would it work in practice though? Many people want to contribute, but they are stuck as they have to provide for their family.

Also, how do we know what to work on? Someone linked to FSF projects - are there other lists, places we can go to, to find actual tasks/projects to work on? Unless there are incentives (not money/fame, but seeing our efforts put to good use/getting feedback etc) people would lose interest. It would be awesome if we can curate such a list - pharma, food/agriculture, mental health etc. And break down this list into smaller, manageable tasks. I guess many people can find 5-10 hours a week to contribute.

odbol_ 9 years ago | | |

I think the First Things First manifesto is a good start: http://firstthingsfirst2014.org

leesalminen 9 years ago | |

Same here. I was just talking with a friend last night about how software developers should have a creed, like how doctors have the Hippocratic oath. We could potentially cause far more harm than doctors nowadays.

mamp 9 years ago | | |

Interesting, but note that the Hippocratic Oath doesn't work so well: preventable errors in health care are the 3rd leading cause of death in the US. https://www.washingtonpost.com/news/to-your-health/wp/2016/0...

bthdonohue 9 years ago | | |

As part of my computer science program in college, we studied the ACM's Software Engineering Code of Ethics and Professional Practice, which in effect is what you describe. http://www.acm.org/about/se-code

Asparagirl 9 years ago |

> I’ve saluted the efforts of Archive Team and the Internet Archive, but their activity is like having a museum curator that rides around in a fire truck, looking for burning buildings to pull antiques from. It's heroic, it's admirable, but it’s no way to run a culture.

...but in the meantime, here's an obligatory and shameless plug for donating to the Internet Archive[1] (tax-deductible in the US), or better yet making a recurring monthly donation so they can more accurately forecast revenue for the year, or better still getting your employer to make a nice big donation to this crucial bit of Internet memorybanks.

And as for Archive Team, we're always looking for a few good geeks.[2] Run an instance of the Warrior on spare cloud servers, or help patch and ship code at GitHub.[3]

[1] http://archive.org/donate/

[2] http://archiveteam.org/index.php?title=Main_Page

[3] https://github.com/ArchiveTeam/ArchiveBot

chewxy 9 years ago |

Regarding Maciej's fears about machine learning -

I've written about this before, and even right now I'm not sure where I stand exactly, except that tweaking the algorithms to compensate for bias is definitely not the right answer: if you look at the mirror and don't like what you see, you don't draw on top of the mirror to accentuate the result! You go on a diet!

I liked the idea of data gardening, but the thought of going-to-communities is daunting. I get tired even thinking about it.

Regarding living beyond walled gardens:

> Publish your texts as text. Let the images be images. Put them behind URLs and then commit to keeping them there. A URL should be a promise.

But people already do that! The question now is why people do otherwise. I personally do not understand the reason people, say, post long blogposts on Facebook, but I do understand it for services like Medium.

For example, I'm extremely tempted to write on Medium because it provides the network effects of readers clicking on tags to read next. So the question is how do we democratize that?

big_surprise 9 years ago | |

> So the question is how do we democratize that?

Commenter wtracy has already linked to the FSF's list of High Priority Free Software Projects... From there, look into what they have to say about free wifi (and in particular, but not limited to the OrangeMesh package): http://www.fsf.org/campaigns/priority-projects/free-software...

If a convincing case could be made that the benefits to National Security outweigh the costs to the copyright cartels, I'd be willing to bet that young secondary-schoolers would have a blast with a decently designed curriculum that includes a working student-to-student mesh-network as one of its goals.

chewxy 9 years ago | | |

I actually meant democratizing the network effect that Medium has. The "free marketing" bit.

I mean, right now one can pretty freely go write up a blog using self-hosted wordpress, octopress, pelican, hugo or whatever. But choosing that over Medium can sometimes mean a lot more work to put in. But if we can democratize the ease-of-use and the good bits of Medium/Facebook/Twitter... Maciej's end statement about "using open standards, write text in text, images in images" would have been achieved.

The problem is that corporations now create a significantly more compelling version (in most criteria - UX, UI, etc) of the Free and Open versions out there.

makomk 9 years ago |

Have to admit that I didn't expect to see that quirk of LiveJournal culture mentioned in an article on the HN front page, let alone in a speech to the Library of Congress. It just sort of faded away without really influencing the current generation of social networks.

Also, it's funny how the net changes, how unthinkable it is to have a social network that doesn't slice up people's data and use it to advertise to them now compared to how anti-advertising LiveJournal was back then. Not convinced it's a change for the better.

idlewords 9 years ago | |

I'm the guy who gave this talk. To add to the funny, LiveJournal hired me to rewrite their ad engine in 2007. I did a horrible job at it, but turned my ineptitude into a principled and lucrative ideological stance that I have milked ever since.

Don't be afraid to pivot.

bo1024 9 years ago | | |

I enjoyed reading it a lot, thank you. I think about these issues a lot and you gave me new things to think about / crystallized some nice perspectives.

embarcadero 9 years ago | | |

Was this recorded? Link?

jaywunder 9 years ago |

I don't think I understand what the exact point of this talk was. Maybe the thesis was stated at the end of the talk when he said that he wishes the internet were more like a city rather than a mall. I think the internet can be like a city, and I think a great example of a place where people with conflicting ideas talk together is HN. Sure HN can be an echo chamber at times. But there's quite a few times when people with differing opinions talk about their different opinions.

Also I don't necessarily understand Ceglowski's stance on why we shouldn't use deep learning and should avoid surveillance on the web. I don't take issue with becoming a datapoint in Facebook's web of people because nothing bad has happened or can happen from me giving Facebook my data. When most people speak out about the data that's being collected about Facebook and Google users they say they're "worried about what could happen" but then never list any bad things that they're actually afraid of. The speaker falls to this issue too. Ceglowski says:

>I worry about legitimizing a culture of universal surveillance.

But then never explains what bad could happen from legitimizing that culture. Maybe I'm completely missing the point of the talk? Please explain what I'm missing if I'm actually missing something.

marklyon 9 years ago |

I provide guidance to attorneys involved in the discovery process; "Technology Assisted Review" is of huge interest to those teams, as it allows them to leverage coding on a small sample of the population across a much larger set of documents. For many cases, the cost and (occasional) time savings is instantly attractive. Sadly, the process is hard to do well. Far too many screw it up in new and amazing ways.

The author's concerns over machine learning are well-founded. The best option I've been able to identify to ameliorate some of the concerns is focusing on the population that will be suppressed. Once the model returns the desired recall / precision, drawing samples from the excluded population with a rigorous acceptance standard can help validate whether you've simply built a model around your biases. Couple that with allowing an opponent to validate a randomly-selected sample and you've cleared up a lot of the uncertainty in the model.

It's not perfection, but perfection is a very difficult standard.
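
A minimal sketch of the validation step described above: draw a random sample from the documents the model excluded, have a reviewer check it, and compare the miss rate with an acceptance standard. The names `elusion_sample` and `is_relevant`, the toy data, and the 5% threshold are all hypothetical and only illustrate the idea; this is not a description of any particular review tool.

```python
import random

def elusion_sample(excluded_docs, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents the model wrongly excluded.

    is_relevant stands in for a human reviewer's judgment (an assumption).
    Returns (miss_rate, missed_docs) for a random sample of the excluded set.
    """
    rng = random.Random(seed)
    sample = rng.sample(excluded_docs, min(sample_size, len(excluded_docs)))
    if not sample:
        return 0.0, []
    missed = [doc for doc in sample if is_relevant(doc)]
    return len(missed) / len(sample), missed

# Hypothetical excluded population: 1000 documents, 40 of them actually relevant.
excluded = [{"id": i, "relevant": i % 25 == 0} for i in range(1000)]

rate, missed = elusion_sample(excluded, lambda d: d["relevant"], sample_size=200)
accepted = rate <= 0.05  # hypothetical acceptance standard
```

Handing an opponent the same function with a different `seed` is one way to let them check a sample that you did not pick yourself.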

abofh 9 years ago | |

The issue with that approach is ensuring the suppressed are represented. When it's black vs white, you can oversample one and be done.

However, if there's any winner-take-all built into the system, there's a strong incentive to not even acknowledge dissent.

pcmaffey 9 years ago |

Machine learning does not have less bias than human researchers. The bias is simply magnified at scale.

And that scale is exactly the state of the internet. There is so much data available to study and understand, that we absolutely need better tools, like machine learning or whatever we want to call it, to help us keep up. Shit's moving faster than our human perception can handle, especially for those who didn't grow up with the internet.

Yes, the data analytic tools we have right now are premature, like fast food to our productized minds, but they will improve rapidly as our taste for quality improves.

But sure, demonizing the things you don't like is one step on the path to learning what's truly valuable.

1024core 9 years ago |

"The names keep changing—it used to be unsupervised learning, now it’s called big data or deep learning or AI"

Um, I'm sorry, but unsupervised learning and deep learning are not the same.

omginternets 9 years ago | |

The point is that these phrases become buzzwords, at which point deep-learning vs AI becomes a distinction without a difference. In the mainstream media you can safely replace both of these terms with "statistics" and not alter the meaning of the sentence.

In other words, terminology can be used to make precise, meaningful distinctions, or it can be used to embellish.

roel_v 9 years ago | |

Technically sure, but outside of the world of statistics, nobody makes that distinction or cares.

idlewords 9 years ago | |

What's the distinction?

absherwin 9 years ago | | |

Unsupervised refers to whether or not the model is being trained against known target values. Think about the difference between "How many people will view this webpage?" and "Divide these pages into 20 clusters." The first is supervised. The second isn't.

Deep learning refers to a particular type of a particular learning technique: Specifically a neural network that has many hidden (intermediate) layers. Deep learning can be used for either supervised or unsupervised learning.
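
To make the distinction concrete, here is a rough sketch in plain Python, with made-up numbers. `fit_line` is supervised because it learns from known page-view counts; `kmeans_1d` is unsupervised because it only groups values and is never told a right answer. Both function names and the toy data are assumptions for illustration, and neither is deep learning (there is no neural network here at all).

```python
from statistics import mean

def fit_line(xs, ys):
    """Supervised: fit y ~ a*x + b by least squares, using known targets ys."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def kmeans_1d(xs, k, iters=20):
    """Unsupervised: group values into k >= 2 clusters; no targets are given."""
    s = sorted(xs)
    # Start the centers spread evenly across the sorted values.
    centers = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Supervised: inbound links per page (feature) and observed views (target).
inbound_links = [1, 2, 3, 4, 5]
views = [12, 22, 32, 42, 52]
slope, intercept = fit_line(inbound_links, views)

# Unsupervised: word counts per page, with no labels at all.
word_counts = [100, 110, 120, 900, 950, 1000]
centers = kmeans_1d(word_counts, k=2)
```

The first call can be scored against the real view counts; the second has nothing to be scored against, which is the whole difference.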

yarou 9 years ago | |

Yeah, but garbage in is still garbage out.

Which is the point he was trying to make.

aub3bhat 9 years ago | | |

It's not "garbage", it's called science & mathematics. Those terms have "meaning", and have led to progress on hard, long-standing problems, which has in turn led to billions of dollars and millions of man-hours allocated to understanding and using them.

Just because you lack the ability to understand the nuances of something does not make it "garbage".

dvdplm 9 years ago |

Idlewords is one of those blogs worth dropping everything for: just take a deep breath, dive in, and revel in the joy of clear thought expressed through clear prose. Love it. Thanks.

sdenton4 9 years ago |

"...Dim witted grad student that you can't really trust..."

Reminds me of the phrase "graduate student descent" for training neural networks...

I've been noticing more casual dismissiveness towards grad students lately. They are certainly often treated as the grunt laborers of academia, in areas where career prospects are downright stupid. I generally feel it would be more productive to at least pretend that they're being trained to be independent, aggressive researchers in their own right, though.

gabrielgoh 9 years ago | |

grad students are put in the same category as interns and teenagers, a naive type of person still in the making. i dont think there's any ill will intended.

eanzenberg 9 years ago | | |

You are dim witted.

No ill will intended.

vilhelm_s 9 years ago | |

Yeah, I also thought this metaphor seemed weird. "Now you need some adult supervision in the room", but grad students are, in fact, adults. And not particularly dim-witted, as a rule.

nullc 9 years ago | |

Considering the working conditions and prospects for the future that graduate students face, one /could/ argue that a selection bias should be expected there.

> to at least pretend that they're being trained to be independent, aggressive researchers

But that is the issue, isn't it-- it would be pretending.

munin 9 years ago | |

looks like we found the grad student!

but seriously, as a grad student, absolutely no one gives us respect. not our peers, our bosses, or society. why would you expect some random on the internet to do better?

Esau 9 years ago |

"And this time it's not the government, but the commercial Internet that has worked so hard to dismantle privacy."

So true.

udba 9 years ago |

I'm currently applying for co-op jobs (internships) and while trawling the university job board I've seen many positions requiring big data this or machine learning that.

What's not clear to me is why companies that don't seem to have any need for a machine learning team (e.g. a subscription box company) are looking to hire one.

Surely part of this can be pinned down to the hype associated with ML that may well die out, but the proliferation of these tools doesn't bode well for Maciej's dream of a weird, creative, and interesting internet.

teej 9 years ago | |

These companies aren't looking for someone to develop new machine learning techniques, they are just looking for someone who can slap together existing utilities to meet their goals.

Companies that run on subscription literally live and die by their churn rate. It is both feasible and reasonable for a subscription box company to hire someone to use machine learning to build a predictive churn model. That may seem trivial to you but that's the reality behind those job posts.
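
For a sense of what "a predictive churn model" can mean at its simplest, here is a sketch: a one-feature logistic regression trained by gradient descent in plain Python. The feature (months since last order), the toy data, and the function names are all made up for illustration; a real job would more likely use an existing library and many more features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_churn_model(xs, ys, lr=0.1, epochs=2000):
    """Fit P(churn) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def churn_probability(model, x):
    w, b = model
    return sigmoid(w * x + b)

# Hypothetical history: months since last order, and whether the customer churned.
months_inactive = [0, 1, 1, 2, 2, 3, 4, 5]
churned = [0, 0, 0, 0, 1, 1, 1, 1]

model = train_churn_model(months_inactive, churned)
at_risk = churn_probability(model, 5) > 0.5  # flag long-inactive customers
```

Nothing here is a new technique; it is existing, well-known machinery pointed at a business number the company cares about.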

morecoffee 9 years ago |

Giving up control is harder than the author makes it seem. It isn't so much that you give it up, but that you give it _to_ someone else. Picking that someone else is extremely difficult and a wrong choice will destroy your community.

Using machine learning on the other hand is a safe bet. It is much easier, I would assert, to write machine learning code to organize data than to curate a community of humans to organize data. The ML approach will do pretty good even if it isn't the best, which is why it's what everyone is switching to.

Keeping with the author's example, is it easier to organize erotic fanfic with a computer, or enable a community to do it without spiraling out of control?

crzwdjk 9 years ago | |

For example, it is clearly easier to write a machine learning system to find interesting articles and highlight insightful commentary, compared to using something so crude as a group of people on a website collectively voting on which stories and comments they like... wait a minute...

skybrian 9 years ago |

If you think the Internet is as safe and controlled as a shopping mall, you probably should be reading Krebs on Security more.

People tend to move towards the more mall-like areas of the Internet due to spam and abuse that they don't want to deal with. This can be low-level stuff, or (as in the case of Krebs himself) sometimes the attackers get out the big guns, and you need to run for cover.

And that's why we're hanging out here, after all, and not in some unmoderated forum. And even here, post on certain subjects and conversation quickly degenerates.

I think we do need a wider variety of spaces to hang out, though. No set of rules works for everyone. And if you do want 4chan, you know where to find it.

aab0 9 years ago | |

> If you think the Internet is as safe and controlled as a shopping mall, you probably should be reading Krebs on Security more.

That's an amusing comparison, given how much of Krebs focuses on offline ATM skimming, copying credit cards at point-of-sale terminals, hacking major retailers' CC databases, and using stolen cards at retail and mall stores to cash them out...

erichocean 9 years ago |

> Publish your texts as text. Let the images be images. Put them behind URLs and then commit to keeping them there.

It sounds like he's saying ephemeral content is worthless and should be shunned.

I, and hundreds of millions of others, disagree. You want a bland, awful, boring society? Easy: make everything you do stick around forever—like a promise. And then watch the world self-police as the lifeblood drains out of it.

You'll get…Facebook. No thanks.

idlewords 9 years ago | |

The audience for this talk was people with very large collections they're bringing online. I was trying to encourage them to avoid exotic formats, custom plugins, custom software (shudder) when they put this material online, and make them web accessible.

For example, here is three quarters of a PETABYTE of historical American newspapers: http://chroniclingamerica.loc.gov

erichocean 9 years ago | | |

Okay, that makes sense. I 100% agree that bringing collections online in exotic formats is a terrible idea.

goldmar 9 years ago |

Thank you for this article. I have really enjoyed your writing style, especially the creative metaphors you've been using :)

TranceMan 9 years ago |

I have been having similar thoughts for a while: https://news.ycombinator.com/item?id=10937201

qwertyuiop924 9 years ago |

>the Internet is a shopping mall. There are two big anchor stores, Facebook and Google, at either end. There’s an Apple store in the middle, along with a Sharper Image where they are trying to sell us the Internet of Things. A couple of punk kids hang out in the food court, but they don't really make trouble. This mall is well-policed and has security cameras everywhere. And you guys are the bookmobile in the parking lot, put there to try to make it classy.

It's already been mentioned, but this guy needs to get out a bit more.

The internet is a city. There's the specialist shops (HN), the bustling malls (Reddit, YT), the shady back alleys (4chan, 8chan etc.), the historical districts (Usenet, Archive.org), the cafes (IRC, ICQ, Slack, etc.). To their credit, the author is more knowledgeable than most, however.

I see so many dismiss the internet as just Facebook, or YouTube, discuss trolling as if it's a single phenomenon, and it's a recent thing, associated with Social Media. So many think that there's an internet culture: there isn't: there's a set of almost infinite numbers of overlapping, interlinked cultures. I can even map out the origins and historical influences of a few. There are even a few who think that social media sites are good forums of discussion. The poor sods: the Usenet was a better discussion forum than Facebook ever was, and the Usenet's not that great.

If you really want to see what the internet is like (that isn't advice for the author: I'm pretty sure the mall analogy doesn't encompass his internet experience, and is merely an analogue I find odd), explore. See it all, in all of its weird, wacky, zany, jokey, serious, offensive, manic, smart, stupid, brilliant, insane glory. I promise you, you won't be disappointed.

People ask me why I'm not on social media. It's because social media is boring. Unlike Reddit, 4chan, and the rest, not much interesting happens. Unlike HN, I'm not likely to be intellectually stimulated, or learn something new. Unlike static sites, I don't get to see that kind of wild creativeness that personal webspace tends to invite in hackers, nerds, and others who know what makes the web tick. I don't want to see what you ate, I don't want to see your cat, I don't want to hear banal details about your everyday life. I want to hear something interesting, new, and original. I want to hear the next Ze Frank, or Tom Ridgewell, or Simon Travaglia, or Steve Yegge, or RMS, or PG, or Ryan Dahl, and you can bet I won't on a site with a signal:noise ratio that low.

People also ask why I'm fascinated with the internet. My response is, why wouldn't I be? It's a catalogue of decades of human creativity and interaction. It's open mike night at the largest club in the world, which is also a discussion forum, and a shady back alley, and a convention. It is - to borrow and butcher Sir Terry's words - like being blindfolded and drunk at several different parties at once.

But, in what is rapidly becoming the sign-off on my incoherent, long-winded ramblings that are really only tangentially connected to the topic at hand, maybe I'm just totally mad.

EDIT: tried to clarify that I wasn't trying to insult the author. Not my intent, but it seemed to come off that way. It still does, but less so, and I prefer not to edit my old content too much. Also, I just checked out pinboard. Pinboard is amazing, and I am impressed.

Basically, don't take this as anything more than a tangential, incoherent ramble started by an analogy the author used which I found unrepresentative. Because that's what it is.

curuinor 9 years ago |

Rather hilariously, deep frying is already a term of art in ML, of course in a radically different setting. Deep fried convnets (https://arxiv.org/abs/1412.7149).

argonaut 9 years ago | |

One (not especially widely read) paper's title can hardly be called a "term of art" in ML.

aub3bhat 9 years ago |

Frankly as a grad student (The kind that the author apparently considers "dim witted"), the entire article is meaningless babbling without any underlying theme.

I wonder if the author truly understands "Machine Learning". What are his qualifications? A degree in Art History and some "programming experience" aren't very reassuring. E.g.

>> "The names keep changing—it used to be unsupervised learning, now it’s called big data or deep learning or AI"

WTF?? The author should enroll in a beginner Machine Learning course on Udacity or Coursera before making philosophical statements about fields he has zero clue about.

It seems the only skill the author has is piecing together meaningless arguments that appeal to average HN users incapable of distinguishing between informed opinions and pseudo-scientific rants. Hell, at least bad graduate students have to give examinations, read papers and make original contributions that get peer reviewed (otherwise they fail/get-kicked-out/drop-out). Not like this guy, who does not understand the difference between "supervised" and "unsupervised" machine learning, yet feels comfortable making "prophetic" statements about machine learning.

Also

>>> "These techniques are effective, but the fact that the same generic approach works across a wide range of domains should make you suspicious about how much insight it's adding."

What does he mean by "same generic approach"? If we assume he is implying specific algorithms, then we have a good "No free lunch" theorem that shows that a single algorithm is not effective across all domains. Now if by "generic approach" the author means "machine learning" in general, then it's as ridiculous as saying:

"Mathematics is effective, but the fact that the same Mathematical approach works across a wide range of domains should make you suspicious about how much insight it's adding."

The entire article is filled with "truthiness" and "feel-good" statements, which fall apart on closer examination.