Academic Torrents (academictorrents.com)
Some environments, based just on prestige, have big problems with toxicity (StackOverflow, Wikipedia) which I didn't see at all in some music trackers.
https://en.wikipedia.org/w/index.php?title=Wikipedia:Systemi...
(using a version of the article from ten years ago because everything is unnecessarily verbose on wikipedia now)
But if I were to speculate, I guess it always propagates from the top. The point is, that the visible community you can speak of is not entirely randomly chosen from the user base, and the user base are people who just want to use the product, not to play corporate mechanics. If in the end the goals of the general public are somewhat aligned with the internal community of ladder-climbers, it works out fine. Otherwise it doesn't.
(And, by the way, ladder-climbers in most of these communities tend not to be the nicest people by default... Let's just say, they are Dwight. So if you let them do stuff that is not desirable for the general community, they will.)
I think the StackOverflow philosophy is flawed by design. The main point of user frustration has always been that questions users very much need answered get closed as "too broad", "opinion-based" or something of the sort. Dwights love to exercise their power by noticing that something can be closed as "not a good fit for this site", and users who want that stuff to be discussed obviously hate that. That is something that could be fixed from the top, but the top specifically wanted it this way.
Wikipedia is similar, but users and Dwights stand even further apart, since the general user doesn't even make an account to make an edit, doesn't look at who makes the edits and doesn't know the internal playground. The main point of frustration here is a user who knows his stuff well and wants to share the knowledge, but is shut down by a Dwight because the subject is "of low importance" to him. This infuriates the user even more, considering that there are thousands of articles about some fucking Harry Potter-universe pokemon or whatever, which, naturally, don't raise an issue with Dwights, because they are Dwights and they love this stuff. This is also something to be solved organizationally from the very top.
Music trackers are way more meritocratic. People who eventually get to be moderators can be formalistic or not (it varies), but they generally just want a lot of music on the tracker in a well-organised manner, and this is exactly what the general public wants! It's another question how they get motivated by the platform to contribute so much (and involvement sometimes seems to be much harder work than on Wikipedia), but the point is that they really do contribute useful stuff.
Also, music trackers tend to be way more liberal (in the sense of allowing freedom, not of being left-wing politically; ironically, quite the opposite is true nowadays). Nobody cares if somebody is rude, racist or whatever; if an off-topic flamewar goes over the top, the whole thread goes down. Otherwise, you can post whatever you want, and nobody gives a shit or is pressured by the media to do something about it. After all, unlike twitter, reddit or stackoverflow, they aren't traded on the stock market.
2014 HN discussion: https://news.ycombinator.com/item?id=7149006
Using RSS to allow mirrors to host different subjects is really clever, although some of the categories seem quite large (>5TB). It may be worth breaking up each category (sharding) to keep each to 100GB or less so a volunteer can pick a couple and not worry about running out of disk when a category grows.
Then it would be good to track how many seeds each category-shard has so volunteers can help where it's most needed.
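A rough sketch of the sharding idea above, assuming a category is just a mapping from torrent name to size in bytes. `shard_category`, `neediest_shards`, the 100 GB default and the seed-count mapping are all hypothetical names for illustration; nothing here is taken from how Academic Torrents actually works. The packing uses first-fit decreasing, which is only one simple heuristic among many.

```python
GB = 10**9


def shard_category(torrent_sizes, limit=100 * GB):
    """Split one category into shards of at most `limit` bytes.

    First-fit decreasing: visit torrents from largest to smallest and put
    each one into the first shard that still has room. A single torrent
    larger than `limit` ends up alone in its own oversized shard.
    """
    shards = []
    by_size = sorted(torrent_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        for shard in shards:
            if shard["size"] + size <= limit:
                break  # found a shard with room
        else:
            shard = {"torrents": [], "size": 0}  # no shard fits: open a new one
            shards.append(shard)
        shard["torrents"].append(name)
        shard["size"] += size
    return shards


def neediest_shards(seed_counts, n=3):
    """Return the ids of the `n` shards with the fewest seeds."""
    return sorted(seed_counts, key=seed_counts.get)[:n]
```

A volunteer could then pick from `neediest_shards(...)` knowing that no shard should grow much past the limit, apart from shards holding a single oversized torrent.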
Incidentally, when the torrent file for your anime image collection passes 20MB, something has obviously gone very w̵r̵o̵n̵g̵ right.
There is no metadata - all you have is an awkward imprecise textual search of the abstract that comes with the data. Good luck hosting the world's data that way.
Through the magic of cryptographic hash algorithms, you can just keep your data sets floating around “raw” (like in these torrents), and then, elsewhere, ascribe metadata to the hash of the content it is meant to annotate.
Then, later, you can reassemble them in either order—either by first finding a data set, hashing it, and then looking up metadata in some metadata-hosting service; or by first browsing a catalogue of indexed metadata, finding out about a dataset that meets your needs, and then retrieving the data set by its hash.
Which is to say: with digital data, library science (creating metadata and chains-of-custody and indexing them for search) and archiving (ensuring access to pristine artifacts over time) don’t need to happen at the same time, in the same place. There can be separate “artifact hosting” and “metadata library” services. (Which is especially helpful in contexts where private IP is involved: your metadata library can still hold the metadata for a data set you don’t have the rights to, and those with the rights can go get the data set themselves.)
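To make the two lookup directions concrete, here is a minimal sketch assuming SHA-256 as the content hash and an in-memory dict as the "metadata library". `content_hash`, `MetadataLibrary` and its methods are made-up names, and the choice of SHA-256 is an assumption (BitTorrent v1 info-hashes, for instance, are SHA-1 over the torrent's info dictionary rather than over the raw data).

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the raw bytes (assumed hash, see above)."""
    return hashlib.sha256(data).hexdigest()


class MetadataLibrary:
    """Metadata stored apart from the data, keyed only by content hash."""

    def __init__(self):
        self._by_hash = {}

    def annotate(self, digest, **fields):
        # Attach or extend metadata for a digest; the data need not be present.
        self._by_hash.setdefault(digest, {}).update(fields)

    def lookup(self, digest):
        # Direction 1: data in hand -> hash it -> fetch its metadata.
        return self._by_hash.get(digest)

    def find(self, **criteria):
        # Direction 2: browse metadata -> get hashes to fetch from elsewhere.
        return [
            digest
            for digest, meta in self._by_hash.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]
```

The library never touches the data set itself, so it can describe data it is not allowed to host.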
> s that you don’t need to keep digital data’s metadata attached to the data “at the hip.”
You don't have to, but it's still mostly a good idea. But this stuff isn't either-or. We can have both. This is especially true for research-oriented files, where consumers are often unable or unwilling to maintain a functional metadata store, and do a lot of manual file handling. Saying "well, somebody could have set up a super-awesome metadata system that tracks this" doesn't magically make those resources exist.
Library scientists might say archiving and structuring and curation are all facets of that science. And you'll also want a hash search engine that finds related hashes, as there can be many revisions + versions, only some of which have some metadata.
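One way to picture the "related hashes" search: treat known revision links as an undirected graph over hashes and walk outward from the hash you have until you reach ones that carry metadata. This is only a sketch under that assumption; `related_with_metadata`, the `links` pair list and the `metadata` dict are hypothetical, and where the revision links would come from is left open.

```python
from collections import deque


def related_with_metadata(start, links, metadata):
    """Breadth-first search over revision links, nearest revisions first.

    `links` is an iterable of (hash_a, hash_b) pairs meaning "these are
    revisions of each other"; `metadata` maps some hashes to metadata.
    Returns every reachable hash that has metadata, closest first.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        if current in metadata:
            found.append(current)
        # sorted() only makes the visiting order deterministic
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found
```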
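    def get_labels(rightside):
        # rightside is assumed to be a NumPy label array
        met = {}
        # ratio of nonzero elements to zero elements
        met['brain'] = (
            1. * (rightside != 0).sum() / (rightside == 0).sum())
        # share of nonzero elements with a label above 2 (1e-10 avoids 0/0)
        met['tumor'] = (
            1. * (rightside > 2).sum() / ((rightside != 0).sum() + 1e-10))
        met['has_enough_brain'] = met['brain'] > 0.30
        met['has_tumor'] = met['tumor'] > 0.01
        return met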
I will say that it is very handy to know exactly how the labels were computed. What I really meant is a way to search and select data based on metadata, for example has_tumor.
Also note how everything is still one single blob: to get one line of any of the files, one would need to download everything.
I think the abstract is sufficient for searching data; expecting some kind of smart database that can handle all the weird formats science uses is a bit much.
Just download it then. We got mp3 albums off Napster on modems back in the day, surely getting that torrent is easier and faster today.
That an article like that exists is patently absurd in my view and kind of makes me a bit ill. Things like that are what led to this: https://www.youtube.com/watch?v=C9SiRNibD14
I really firmly believe that if you think there is a European (?) science and an African science and they are distinct and equally valid, then either you or I do not belong on Wikipedia, and I would actually like Wikipedia to clarify their mission in this light.
In my view Wikipedia should not be a repository of value judgements or specific values that one should adopt. Perspectives on Christopher Columbus are important and should be included, but in no manner should those perspectives be made out to be incontrovertible or something other than value judgements and perspectives from specific points of view. I think it is valuable to understand the European and Native American perspectives on events, both at the time and throughout the following centuries.
But I don't think Wikipedia should be telling me I must think what Columbus did was good or bad - Wikipedia should not be trying to teach me morality - and as long as it does not do that I don't see how there is any problem with what topics Wikipedia covers and who writes it.
I think the only problem comes in when you attempt to do something which is impossible - like incorporate something which is fundamentally specific to specific people (morality) into something which purports to be valid for everyone.
One can understand a possible path that goes "xyz information source is biased", "xyz info source isn't suitable for abc group", and "xyz info source is specific to xyz people, we need our own abc source". However, that seems to require a few assumptions? And still isn't as negative as that youtube video linked.
Would appreciate if you could elucidate on your views.
[1] (please forgive the scare quotes)
> The average Wikipedian on the English Wikipedia is ... (some characteristics)
This builds to the conclusion:
> The systemic bias of the English Wikipedia is permanent. As long as the demographic of English speaking Wikipedians is not identical to the world's demographic composition, the version of the world presented in the English Wikipedia will always be the Anglophone Wikipedian's version of the world.
I don't see how you get to that conclusion from the premise other than by thinking that reality is modified by personal characteristics.
If there is an Anglophone Wikipedian's version of the world which includes things like gravity and science - then it is not valid for Africa (as the woman in the video is expressing) as Africa is not the Anglophone world ... not sure what about this is not clear.
And it absolutely is as bad as that youtube video I linked - you think that poor unfortunate woman came up with that drivel on her own? She is not nearly dumb enough - no single person can be that stupid.
You need years of academic circle jerking and hand picking of the dumbest arguments from the dumbest people to come up with something that stupid.
You might also be interested in reading https://en.wikipedia.org/wiki/Chinese_Wikipedia#Self-censors...