Origins of the youtube-dl project (rg3.name)
I wanted copies of those pictures and the easiest way to get them was to write a tool to download them rather than have to coordinate with 3 to 15 friends and ask them to copy the images to a CD or USB stick or some other nonsense. Dropbox wasn't a thing and not all my friends were tech heads that would want to set up FTP servers.
Flickr had also come out with an API. APIs for online services seemed kind of new at that point and Flickr was one of the first AFAIK.
So I wrote the app https://blog.greggman.com/blog/flickrdown/ and a few months later it was accused by other users of flickr of being solely for the purpose of downloading copyrighted images. Not once did I ever use it for such a purpose nor, AFAIK, did any of my friends. None of us had any interest in other people's images on flickr, only shared images of mutually attended parties, bbq, picnics, events.
Those users reported the app to Flickr and the app was banned.
It was banned by the app's id. That meant you could register your own app and then hack in your app's id and still use it. IIRC I continued to use it to download pictures from our events but it always pissed me off they banned it. It also pissed me off because it wasn't accessing anything you couldn't just scrape for. The API made it easy to get a list of URLs, search for albums or people etc but you could easily write a script that just scraped the HTML to find all the same data. Didn't matter, flickr didn't budge.
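To illustrate the point that the same URLs can be recovered from page markup without an API, here is a minimal sketch using only Python's standard library. The sample HTML, the `ImageCollector` class, and the example.com addresses are made up for illustration; this is not FlickrDown's code and not real Flickr markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Turn relative paths into absolute URLs.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the absolute URLs of all images found in an HTML string."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


# Hypothetical album page markup (an assumption, not real Flickr HTML).
SAMPLE_HTML = """
<div class="album">
  <a href="/photos/1"><img src="/img/party-01.jpg" alt="party"></a>
  <a href="/photos/2"><img src="https://cdn.example.com/img/bbq-02.jpg"></a>
  <img alt="no source here">
</div>
"""

if __name__ == "__main__":
    for url in extract_image_urls(SAMPLE_HTML, "https://example.com/albums/42"):
        print(url)
```

A real page would need its own selection logic (which images belong to the album, which are thumbnails), and that logic breaks whenever the markup changes.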
It further pissed me off that overzealous flickr members accused me of lying about its purpose. Like many topics today, there is often absolutely nothing you can say that will convince someone else your intentions are not bad.
As a user (not a web developer), I personally never saw the practical point of web APIs; I have always just "scraped the HTML". Many times the solutions I write outlive the corresponding "API"; IME, often the non-API method of data retrieval is more robust and reliable than using the so-called API.
YouTube used to have a freely accessible search API. Not anymore. However "scraping" the YT search result pages continues to work fine.
Twitter used to have RSS/Atom feeds for each account so you could follow someone without a client, just a regular old news aggregator.
Sometimes it's better to not even try. Acting is always the more powerful move. Just do your thing. You wrote an awesome program that downloads stuff, they got offended and banned it. You've already accomplished your goal so you let it go... But if you cared enough you could just write a scraper for the website itself. What are they gonna do about it?
In an ideal world, downloading copyrighted data would be so easy and ubiquitous that the intellectual property laws would be unenforceable. Sure, they would get mad but who cares? There's very little they can do about it.
I maintain a similar project for SoundCloud called SoundScrape: https://github.com/Miserlou/soundscrape which I started for a similar reason, to save my own 'likes' and tracks that I've made and my friends have made.
SoundCloud made this very easy, as they had an API which exposed the endpoint MP3/WAV location in a field. The tool used an API key provided by SoundCloud to fetch the response.
Overnight and without warning, they removed that field from responses, changed the terms of service, banned my application for terms of service violations, and deleted all of my personal music and likes because I had used my own account to create the API key.
I was very angry at the time since all my music got deleted, but these days I'm just sad. Things like this have little by little destroyed all of my enthusiasm for technology.
I want to be a carpenter now.
When I was a kid, I watched with wonder and delight at all the amazing things we were doing and inventing. We were poor so I didn't get much tech, but I always marveled at it. I saved up for ages to buy one of those personal organizers that were all the rage in the mid 90's. I saw myself as part of a world working together to build a brighter future for all of us, and I think I wasn't alone in that.
It is true that the seeds of many issues we face today were already firmly planted in those days and that it was unrealistic. But there is value in dreams because dreams tell us who we want to be and give us the engine to get there.
Today, I couldn't care less about tech, actively see it as a negative influence on human life, and understand the Amish and Luddites a lot better.
I'm currently planning to build a farm out in the middle of nowhere and then Apple can have its fiefdom, Google can own everyone's data and control what people think, they all can tell people what they're allowed to say as our new overlords and I just won't care at all.
My app didn't let you search by keyword, only by user. Further it didn't remove watermarks or do anything else. If photographers put their photos on flickr and they care about stealing they usually both watermark and only put relatively low-res versions there and you have to pay them for high res versions.
If flickr provided logs that showed bulk download and further some proof that even with bulk download that it was actually affecting professional photographers and not just a few geeks collecting some pictures they liked then I'd be more inclined to buy into their ban but without that I'm pretty confident the ban had no basis in reality.
It is only when you re-publish the photos that it becomes theft of intellectual property.
Also, if you don't want something downloaded, don't post it on the Internet in the first place. The problem you're talking about isn't that photos get downloaded, it's how those photos are subsequently used.
Newspapers will routinely rip off photos from social media, sometimes in the face of explicit non permission.
After watching the videos from my couch for a few days I decided to post a link to my extension on the Udacity message board... and it absolutely blew up! My dinky little extension had thousands of users all over the world seemingly overnight.
But the absolute highlight was getting an email from a student from Iran. Iran just blocked YouTube because of https://en.wikipedia.org/wiki/Innocence_of_Muslims and there was a whole group of students who could no longer participate in the course. Apparently they had some friends at a US university use my extension to download the videos and reupload to a VPS they ran. I was blown away - my quest to sit on a couch ended up accidentally helping fight censorship.
I maintained the extension until Udacity added a native video download feature and then took it down. But it was an interesting experience and definitely shaped my perception of fair use laws. They are important. People have way more legitimate uses for information than lawyers can imagine.
That's brilliant! We can never predict the impact our tools will have on people's lives.
Your idea of partially porting youtube-dl to the browser gives me an idea... would it be feasible to port it fully? I think the biggest hurdle would be ffmpeg, but a few days ago I saw "A pure WebAssembly / JavaScript port of FFmpeg": https://news.ycombinator.com/item?id=24987861
I hope that the DMCA takedown issue can be resolved reasonably, but it’s starting to seem more and more like a move off of Github is overdue. Especially in a world where anyone can stand up a Gitea or Gitlab CE instance.
I remember coding my own YouTube downloader because of similar reasons. My internet connection was way too slow to stream videos, even at the lowest quality, so I'd make a list of videos, download them in the background the entire day and then watch it offline at the end of the day. When I finally discovered youtube-dl, I was relieved that I no longer had to keep maintaining my script... and it supported almost every other video website.
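The "make a list, let it download in the background all day" workflow can be sketched roughly as below. This is an assumption about the general shape, not the commenter's actual script: `start_download_queue` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever does the real downloading.

```python
import queue
import threading


def start_download_queue(urls, fetch, workers=1):
    """Start background threads that call fetch(url) for each queued URL.

    Returns (threads, results). results maps each URL to whatever fetch
    returned, or to the exception it raised. Join the threads before
    reading results.
    """
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                outcome = fetch(url)
            except Exception as exc:
                # One failed item should not stop the rest of the list.
                outcome = exc
            with lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    return threads, results


if __name__ == "__main__":
    # Placeholder fetch: a real one would save the file to disk.
    def fake_fetch(url):
        return "saved " + url

    threads, results = start_download_queue(
        ["https://example.com/a", "https://example.com/b"], fake_fetch
    )
    for thread in threads:
        thread.join()
    print(results)
```

On a slow connection a single worker is probably the sensible default, since parallel downloads only split the same bandwidth.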
Also, I just realized that it can still be downloaded from the official website and updated using --update argument.
I don't think it can be solved on Github.
"GitHub’s CEO suggested that YouTube-DL won’t be reinstated in its original form. But, the software may be able to return without the rolling cipher circumvention code and the examples of how to download copyrighted material."
https://torrentfreak.com/riaas-youtube-dl-takedown-ticks-of-...
This pretty much makes youtube-dl useless, since the "rolling cipher" is just downloading the same bit of js, inspecting it, and executing it, almost the way a web browser does (AIUI, the difference is that yt-dl inspects the js and picks out the function to run from it instead of just running it all verbatim). This counts as circumvention according to the DMCA, which leaves yt-dl little legal standing in the US.
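As a toy illustration of "inspect the js and pick out the function" (as opposed to running the whole script), the sketch below finds one named function in a JavaScript string and returns its parameter list and body as text. It is not youtube-dl's code: `extract_js_function` and the sample JavaScript are invented for this example, it does not execute anything, and the brace counting ignores braces inside strings, comments, and regex literals.

```python
import re


def extract_js_function(js_source, name):
    """Return (params, body) for `function name(...) {...}`
    or `name = function(...) {...}` found in js_source."""
    escaped = re.escape(name)
    pattern = re.compile(
        r"(?:function\s+" + escaped + r"|\b" + escaped + r"\s*=\s*function)"
        r"\s*\(([^)]*)\)\s*\{"
    )
    match = pattern.search(js_source)
    if match is None:
        raise ValueError("function %r not found" % name)

    # Walk forward from just after the opening brace, tracking nesting
    # depth until the matching closing brace is reached.
    depth = 1
    i = match.end()
    while i < len(js_source) and depth:
        if js_source[i] == "{":
            depth += 1
        elif js_source[i] == "}":
            depth -= 1
        i += 1
    if depth:
        raise ValueError("unbalanced braces after %r" % name)

    params = [p.strip() for p in match.group(1).split(",") if p.strip()]
    body = js_source[match.end():i - 1].strip()
    return params, body


# Made-up script text, only for demonstrating the lookup.
SAMPLE_JS = """
var helper = {swap: function(a, b) { var c = a[0]; a[0] = a[b % a.length]; a[b % a.length] = c; }};
function scramble(s) { s = s.split(""); helper.swap(s, 3); return s.join(""); }
function unrelated() { return 42; }
"""

if __name__ == "__main__":
    params, body = extract_js_function(SAMPLE_JS, "scramble")
    print(params)
    print(body)
```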
Also note that the "examples of how to download copyrighted material" in the yt-dl tests were just code for getting the first few bytes of a number of RIAA-sequestered music videos. Small excerpts are usually allowed under Fair Use. The RIAA didn't really look into that detail.
On the plus side, this fork is active and not DMCA'ed, for now. I just turned to it because I needed a fix for Bandcamp that upstream yt-dl doesn't have:
Does everything on Youtube use the rolling cipher? I thought it was only on things like major label music videos.
If the good guys have an inferior product and charge double, I'll sometimes pick the bad guy. And more often than not, I get burned, costing me tenfold what it would have cost to just go with the high-integrity choice in the first place.
I'm not leaving github over this, but I'm mostly starting new projects on gitlab instead.
GitHub made many, many decisions that I found sketchy.
It's a risky move, dabbling with stuff that is targeted under DMCA. Anything hosted in the US is liable for takedowns - including domain names that are under the control of US-based companies. You'll need to deal with acquiring hosting and DDoS protection yourself, plus keeping track of security updates. And to be honest Europe isn't exactly a legal safe haven either, we also have nasty laws (e.g. in Germany the infamous "Störerhaftung") exposing you to liability.
I would argue that recognizing the reason that useful tools are taken down, even if you argue that reason is not legitimate, is an important part of figuring out how to stop those tools from being taken down.
> Also, if you don't want something downloaded, don't post it on the Internet in the first place.
I understand the argument although I do wish photographers could be free to post their work without fear of others taking it.
You described the written one above. GP described the actual one.
After all that's what the "property" in "intellectual property" is - the bundle of exclusive rights. Owners of those rights aren't deprived of them by somebody copying the works over which the rights exist.
github went way above-and-beyond here. It is under no obligation to:
1) Enforce an invalid DMCA request
2) Take down forks and repos from other users, without DMCA requests
3) Threaten to ban users
DMCA has a simple, neutral process. github receives a request. It's required to take down that specific tool if it's a valid request (NOT everyone who forked it). The developer who owns that repo can then file a counter-notice. At that point, github brings that repo back, and gives the RIAA the means to litigate against the developer. It's inspired by the concept of a common carrier, where github would be acting as a neutral party, not a thuggish policeman-for-hire.
If github ignored DMCA requests, I wouldn't do business with them either. I expect them to be a neutral third party, as the law dictates.
Personally, for channels, I use a script that only needs to access the channel's page; it outputs a list of all the videos in the channel. One of the more recent web development trends I dislike is sites that "load more results" using additional Javascript-triggered HTTP requests in response to scrolling a page. YouTube channels with multiple pages of videos are one example.
With custom scripts I wrote for searching YouTube, outputting lists of videos from channels, and downloading non-commercial videos, I can use YouTube without the need for a graphical browser.
It's even on AUR, for easy usage.
What github did probably wasn't illegal either. Most companies can fire a customer for no great reason, which github did.
A perfectly reasonable response, I suspect, might be to start sending github large numbers of inane takedown letters similar to the RIAA one, alleging that random tools can circumvent copy protection. For example, vscode, cpython, and many other projects live on github, and can be used to circumvent copy protection too. That's a true statement.
github is welcome to ignore those letters too, since they don't conform to DMCA requirements. Or they can follow them.
In either case, I'm kind of curious what github would do. I suspect if they randomly kill projects, most projects will go somewhere else. github won't be considered a reliable service provider. Or github will re-evaluate their policies to be more reasonable, and not take down projects willy-nilly, which would be a good outcome too.
If someone sends you a letter saying:
"Dear qw3rty01: You're hosting Python. Python can be used to circumvent my copy protection provisions. I believe this is illegal under 17 USC §§1201(a)(2) and 1201(b)(1). I ask that you immediately take down and disable access to Python."
They're allowed to do that.
So long as they have a genuine belief that Python violates §§1201(a)(2) and 1201(b)(1), they've sent a stupid letter, not an illegal one. I believe your only recourse is to ignore it.
Your understanding of legal fees is incorrect:
1) I don't need a countersuit to recover legal fees. I can generally file a motion in the original litigation.
2) The US doesn't have federal anti-SLAPP laws. Some states have anti-SLAPP, but even there, there are open legal questions on how state laws interact with federal lawsuits. Unless you happen to live in a state with anti-SLAPP and one where precedent says state anti-SLAPP laws apply to a federal lawsuit, you're probably out of luck.
3) Even so, anti-SLAPP is designed for specific types of legal intimidation; I'm not sure this would qualify.
I just looked at the RIAA DMCA repo, and the situation is more complex than reported. The RIAA got a court ruling of some sort in Germany. That throws all sorts of wrenches into all sorts of analyses.