This is absolutely amazing for debugging purposes. Also you never have to worry about corrupt save files or anything of its ilk. Development is easier, diagnosing problems is easier, and using a programmatic data structure on the backend means that you can pretty much keep things clean and forward compatible with ease.
(Oh, also being able to debug by altering the save file in any way you want is a godsend).
Is there any way that arbitrary code in the file could compromise the user's system? If so, does the user know to treat these data files as executables? Is there any way someone untrusted could ever edit the file without the user's knowledge? Even in combination with other programs the user might be running? Are you sure about all of that?
Maybe Lua in particular is sandboxed so that's not a problem (beats me), but in general this is an area where safe high-level languages can all of a sudden turn dangerous. Personally I would rarely find it worth it.
I apologise in advance for ranting... I hope this is not too off-topic, but instead a "zoom out" on the issue.
This touches on something deep and wrong about how we use computers these days. Computers are really good at being computers, and the amplification of intellectual capabilities they afford is tremendous, but this is reserved for a limited few that were persistent enough and learned enough to rediscover the raw computer buried underneath, and what it can do.
For example, I dream of a world where everything communicates through s-expressions, all code is data and all data is code. Everything understandable all the way down. Imagine what people from all fields could create with this level of plug-ability and inter-operability. We had a whiff of that with the web so far, but it could be so much more powerful, so much simpler, so much more elegant. All the computer science is there, it's just a social problem.
I understand the security issues, but surely limiting the potential of computers is not the solution. There has to be a better way.
In Lua 5.1 you can use setfenv: http://www.lua.org/manual/5.1/manual.html#pdf-setfenv
And in Lua 5.2 the functions that load chunks from strings or files (load, loadfile) take the environment to use as the global scope as an optional parameter: http://www.lua.org/manual/5.2/manual.html#pdf-loadfile
1. http://stackoverflow.com/questions/1224708/how-can-i-create-...
2. http://stackoverflow.com/questions/4134114/capabilities-for-...
I find it easier to trust Lua than similar facilities in other programming languages because the kernel of the language has relatively simple semantics, so the TCB of a sandbox is smaller, and the source is easier to understand than that of most other languages.
Note that sandboxing in Lua 5.2 has a still simpler semantics than for Lua 5.1 - few other languages evolve in a way that makes the language easier to trust.
But it's still code, so you can e.g. inject an infinite loop and the loader will hang. (You can protect against this: install a debug hook that gets called after every N instructions executed, and have it kill the loader.)
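In Lua the hook in question would be set with debug.sethook. As a rough analogue only, here is the same idea sketched in Python using sys.settrace: count line events while the file runs and bail out past a budget. `run_with_budget` and `BudgetExceeded` are made-up names, and the emptied `__builtins__` is an assumption that limits accidents at best; it is not a real sandbox in Python.

```python
import sys


class BudgetExceeded(Exception):
    """Raised when the loaded file runs for more line events than allowed."""


def run_with_budget(source, max_lines=10_000):
    """Execute `source` as a data file, aborting after `max_lines` line events.

    Hypothetical helper: returns the names the file defined, as a dict.
    """
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        if event == "line":
            executed += 1
            if executed > max_lines:
                raise BudgetExceeded(f"gave up after {max_lines} lines")
        return tracer  # keep tracing this frame

    # Assumption: an empty builtins table only limits accidents, not attackers.
    env = {"__builtins__": {}}
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(compile(source, "<save file>", "exec"), env)
    finally:
        sys.settrace(previous)
    env.pop("__builtins__", None)
    return env
```

A well-behaved file such as `hp = 10` loads normally; a file containing `while True:` raises `BudgetExceeded` instead of hanging the loader.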
The biggest 'concern' would be save hacking, but at the end of the day that will happen no matter what so it doesn't bother me much.
I leaned heavily on Python's pickle module for serializing a few thousand entities to disk a few years ago. By streaming them to the application at startup time, it remained plenty fast for all datasets it'd encounter. I intended to replace it with SQLite one day, but I never had to. I could just keep them all in memory.
I'd probably choose something a bit safer now, but it was hard to beat the simplicity.
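For anyone who hasn't seen the streaming trick, a minimal sketch, assuming one pickle record per entity written back to back. The helper names and the sample entities are invented for illustration. The "safer" caveat is real: pickle.load can run arbitrary code, so this only makes sense for files you wrote yourself.

```python
import io
import pickle


def dump_entities(stream, entities):
    """Write each entity as its own pickle record, one after another."""
    for entity in entities:
        pickle.dump(entity, stream)


def iter_entities(stream):
    """Yield entities one at a time until the stream runs out."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return


buffer = io.BytesIO()  # stands in for a file opened in binary mode
dump_entities(buffer, [{"id": 1, "kind": "orc"}, {"id": 2, "kind": "elf"}])
buffer.seek(0)
loaded = list(iter_entities(buffer))
```

Because each record is read on demand, the application can start using the first entities before the whole file has been consumed.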
Edit: Igglyboo has a point too.
Also, the Tiled Map Editor exports directly to Lua.
local t = {}
t = {["foo"] = "bar", [123] = 456}
t.foo2 = "bar2"
If you have ever used Maven XML configs, Java object marshalling or C# XML, you would understand the pains of using XML as a file format for software projects and data representation. You have to find a solution that is language agnostic; neither Lua nor JSON is.
Could persist to disk as a binary, sql or a plist (xml).
I guess the only downside is, that if you got a lot of composite classes all with their own properties and associations (say a graph), there's a lot of manual work to be done.
Thanks for sharing this, it's one of those ideas that (to me) seems so brilliant in its simplicity that I probably would've never thought of it.
Any hiccups in the day-to-day work using this approach? I'm just trying to get a better idea of the workflow since I'm very seriously considering applying it to my next project.
I’ve created mods of the game where you run faster but gravity is stronger, and where all levels are randomly mixed into one level, and where the dangerous falling platforms also give you energy while you’re on them, and where the sound effects give the player clearer feedback on what they’re doing. And though I could cheat by multiplying my score by 1000 and submitting it online, I actually have been careful to always comment out the high-score saving and submission code in each of my mods.
I like the game much more than if the developers had obfuscated the Lua files so I couldn’t read and edit them.
The only argument against human-editable text files is parsing speed, not security.
If the player has fun, it's a nice feature! :D
Hell, I've used it myself more than a few times.
If not loading things is important to you, mind.
But if I want something that is able to express data structures customized by myself, usually with hierarchical data that can be verified for validity and syntax (XML Schemas or old-school DTD), what other options are there?
Doing hierarchical data in SQL is a bitch and if you want to transfer it, well good luck with a SQL dump. JSON and other lightweight markup languages fail the verification requirement.
Firstly some clarification - this appears to just be about the persistence format for his dive log. It was XML, now it's git based with plain text.
As someone who had to manage a system which worked with plain text files structured in a filesystem for a number of years in the 1990s, this is done to death already.
You now end up with the following problems: locking, synchronising filesystem state with the program, inode usage, file handles to manage galore and concurrency. All sorts.
Basically this is a "look I've discovered maildir and stuffed it in a git repo".
Not saying there is a better solution but this isn't a magic bullet. It's just a different set of pain.
Running the new executable then loaded the new configuration. This worked like a champ, up until the Age of Antivirus Software, which always had much grief over writing to executable files.
It's a trick I learned from the original Fortran version of ADVENT.
"XML is what you do to a sysadmin if waterboarding him would get you fired."
Made my day :-)
I am currently stuck on a project I want to start because I cannot get it to fit right in my (future) head. And I am glad I am not an idiot for not being able to knock out my next great project in between lattes.
(Ok, in direct comparison terms I am an idiot, but at least it's not compounded)
"A change in perspective is worth 80 IQ points."
-- Alan Kay
My biggest hurdle solving new problems is divining a unifying, simplifying metaphor. Once you have the right notion, that Eureka! moment, everything falls into place, like magic. Like how Kepler was able to fully explain Brahe's astronomical data once he realized the planets orbit the sun in ellipses.
Personal example: I used to write print production software. Placing pages onto much larger sheets of paper that get folded and bound into a book. A task called image positioning aka imposition. It took me years to figure out how to model the problem. Key insight was simulating the work backwards, from binding back to the press. Then when I showed the new solution to my coworkers, the response was "Well, duh."
G+ is largely misunderstood. It is a lousy tool for interaction with people connected to you purely socially. It's a very good way to find and interact with people connected to you by interest.
Sure, XML can be nasty, but that's very much a function of the care taken to a) format the file sensibly and b) use appropriate structure (i.e. be as specific as necessary, and no more).
I didn't really know what he was talking about but I think this is it.
The title does need changing though as it is definitely file formats under discussion not file systems.
But here we have threads about Lua, why people hate XML and love JSON and all kinds of irrelevant issues which have been well hashed elsewhere ad nauseam. Why not restrict to an analysis of whatever it is Linus is developing?
HN is getting truly annoying and sucky, if it isn't so already.
This I like. The race away from the waterfall straw man has also stripped us of the advantages of BDUF.
While rigid phase-driven project management helps nobody, I think there's still room for speccing as much as we can upfront within iterative processes.
Or you could run to the IDE and start ramming design pattern boilerplate down its throat the second you're out of the first meeting ;)
A lot of people use Agile to avoid planning at all, which is a particularly destructive anti-pattern, and the exact opposite of what you need.
Yup, I've seen this a lot.
In one instance "Agile" meant I could finish a major task using an unfamiliar language, framework and code base in short order.
Genuinely, the customer was told "Of course, fuzzix here is familiar with Agile processes so you should have this in 3 weeks".
Edit: of course, this also meant there was no formal spec for the task, though I did have a photo of the whiteboard.
I think that the first "how" should be planned as much as anything else. I understand how you refactor from v0.0.1 to v5.34.2 iteratively, but I think that getting from vNothing to v0.0.1 is qualitatively different.
If I don't have a complete idea of how my minimally functional thing will work that is small enough that I can completely hold it in my head, and instead just architect by agglutination and test writing, 1) my results are going to be hacky garbage, 2) my first 50 iterations are going to be devoted to replacing it all haphazardly to fix bugs, and 3) the code and interface will become increasingly more complex, harder to work with, and strewn with special cases.
When v0.0.1 is well planned, v2.5.2 may not look anything like the plan anymore, but in my experience it becomes shorter, cleaner, and more correct rather than a giant ball of band-aids propped up with tests.
This might be a tangential discussion. Earlier, I used to have a similar approach. Can't code until I have the complete picture. But, it's tough to do in a commercial world and you have deliverables. So, nowadays, I start with what I know and scramble my way until I get a better picture. There are times when that approach works. But, there have been days where I was like - "wish I had spent some more time thinking about this".
I am curious how folks on HN handle this "coding block".
A notebook: I'll write down some notes and just kind of free write whatever thoughts come to mind. If there's something that I think is important to come back to, I'll draw an empty box in the left margin (to be filled with a check mark later)
Readme: start writing the Readme for the project, even if you're not entirely sure of the details. Include code examples. If you don't like how the API is coming together, change it. It's way less work to modify the API now than it will be later.
Write a test: I don't always unit test, but when I do I test first :). This works well on projects that already have a decent test suite. It's kind of an executable version of the Readme.
Branch and Hack: branches are cheap. Make one and start playing. Don't like how it's turning out? Make a new branch and try again!
Ctrl-Z: maybe the answer won't come to you right away. Let it sit and run in the background for a while and come back to it. If I'm worried about forgetting details, I'll write it down in a notebook first.
(other than that I agree it's a good solution)
git as the basis of filesystem is interesting, hope we don't need to manually make branches and commits to use it
I'm not sure why git is the best tool for the job in this case, even after reading the post & some of the contents.
If you really need a hierarchical serialization format that is "verified for validity and syntax", the problem is that XML has prevented the adoption of something better (because it was "good enough").
If you don't need that, then XML is overkill and bloat and makes your format less readable than it could be. And you rarely need it, because either your data is computer-generated and -read, so there's little point in putting in extra schema checks, or schema verification is woefully insufficient (because it can't verify the contents of fields, relations between fields, or a ton of other stuff that can accidentally go wrong).
> But if I want something that is able to express data structures customized by myself, usually with hierarchical data that can be verified for validity and syntax (XML Schemas or old-school DTD), what other options are there?
What would be better?
Is XML schema really so much better than e.g. JSON schema?
To me it feels like there's an impedance mismatch between the kind of structures XML lends itself to and the kind of structures programs are good at dealing with. So for program-to-program communications with a certain level of validation I find Protocol Buffers is a much better fit. Conversely in cases where human readability is really important, XML isn't good enough compared to JSON.
Namespaces exist to solve a real-world problem that happens in real-world use cases (SVG embedded in HTML, HTML embedded in RSS). While it would be nice to look at things that are complex and say "it would be less complex for these trivial cases without this feature", in reality there are then common use cases that become more complex or even impossible in the general case, which seems like a very short-sighted benefit. Namespace prefixes are really not that difficult to configure, and once configured XPath makes them very easy to use :/.
And because of this almost no-one bothers to actually handle it properly so you often can't actually use the advanced features even if you wanted to.
{ "name": "bob", "salary": 1e999 }
Ah crap! Deserializer blew (in most cases silently converting the number to null)
<person>
<name>bob</name>
<salary>1e999</salary>
</person>
No problem. The consumer can throw that at their big decimal deserialiser.
And the following is not acceptable as it breaks the semantics of JSON and requires a secondary deserialisation step as strings ain't numbers...
{ "name": "bob", "salary": "1e999" }
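What actually happens to 1e999 depends on the parser. As one data point, and not a claim about JSON libraries in general, Python's standard json module passes the number through float() by default, so it silently becomes infinity rather than null. The same module accepts a `parse_float` hook that hands the raw digits to Decimal instead:

```python
import json
from decimal import Decimal

text = '{ "name": "bob", "salary": 1e999 }'

# Default behaviour: the number goes through float(), which overflows to infinity.
lossy = json.loads(text)

# Handing the raw digits to Decimal keeps the value exactly.
exact = json.loads(text, parse_float=Decimal)
```

So the value can survive, but only if the consumer knows to ask for it; nothing in the document itself says the field needs big-decimal treatment.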
JSON is a popular format but it's awful.
> doing everything in XML is a stupid idea (e.g. XSLT and Ant)
XSLT actually made a lot of sense. If everyone writes code to transform format1 to format2 then what you end up with a lot of slightly different transformations. Its main downfall, just like XML itself, was that it was annoying and time consuming to write.
How would you replace all this if you moved away from XML?
http://git.hohndel.org/?p=subsurface.git;a=tree;f=xslt;hb=HE...
And impossible to debug. Write once, do something else for some weeks, and trying to understand what you were doing at a later point is nearly impossible.
It remains annoying and time-consuming :) But there's no better option for reliably creating valid EPUBs to a predetermined business specification.
If we start out instead with something that's turing complete and simple to begin with (perhaps S-expressions?), we can (often trivially) write our own validators/type-checkers, or any other processing tool to verify the document structure, with few or no constraints, and without requiring the effort and expertise to parse complex syntax.
Simply put, XML does not correctly model the data by which we intend to interchange. It was a noble effort, but it didn't come from a place of innovation. It came from corporate needs for standardization.
<customer>
<account>
<type>Personal</type>
...
</account>
<account>
<type>Business</type>
...
</account>
<custid>496F3AB</custid>
</customer>
This may seem innocuous, but XML allows mixing of arrays and objects too liberally, and makes automatic parsing overly complex. At first <customer> appears to be an array of account objects, but wait: now that we reach the end, we find that <customer> is an object with multiple keys, and we must create an unnamed array key to hold the accounts.
XML is a document markup language, not a data format.
<customer custid="496F3AB">
<account>
<type>Personal</type>
...
</account>
<account>
<type>Business</type>
...
</account>
</customer>
it would make a lot more sense, I think.
The likelihood of a JSON feature biting you in the ass like that is far lower. Don't use XML until you actually need something XML SPECIFICALLY provides.
Also, JSON translates easily into easy-to-work-with dictionaries and lists, whereas XML parsers take more code to work with equivalent items.
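To make that concrete, a small sketch in Python using only the standard library. The XML is the customer example from upthread with the elided parts left out; the JSON shape (an "accounts" list) is my own assumption about how the same record might be written.

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"custid": "496F3AB", "accounts": [{"type": "Personal"}, {"type": "Business"}]}'
xml_text = """
<customer>
  <account><type>Personal</type></account>
  <account><type>Business</type></account>
  <custid>496F3AB</custid>
</customer>
"""

# JSON: one call, and the result is already dicts and lists.
from_json = json.loads(json_text)

# XML: the caller has to know that <account> repeats and <custid> does not.
root = ET.fromstring(xml_text)
from_xml = {
    "custid": root.findtext("custid"),
    "accounts": [{"type": a.findtext("type")} for a in root.findall("account")],
}
```

Both end up as the same structure, but the XML side only gets there because the mapping was spelled out by hand.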
S-expressions work great. Syntax checking is far simpler, and validity checking is hence something you can roll yourself (and writing an S-expression schema checker ain't tough).
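As a rough illustration of how little code that takes, here is a sketch of a reader plus a structural checker. Everything in it is an assumption made for the example: the token rules (parentheses, double-quoted strings without escapes, bare atoms), the schema format (tag mapped to the set of allowed child tags, or None for a tag that holds a single atom), and the customer schema itself.

```python
import re

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse one S-expression into nested lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing )")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise ValueError("unexpected )")
        return tok.strip('"')

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return expr


def tag_of(expr):
    """Return the leading atom of a non-empty list, or None."""
    if isinstance(expr, list) and expr and isinstance(expr[0], str):
        return expr[0]
    return None


def check(expr, schema):
    """True if every tag is known and only contains the children it is allowed."""
    tag = tag_of(expr)
    if tag not in schema:
        return False
    body = expr[1:]
    allowed = schema[tag]
    if allowed is None:  # leaf: exactly one atom
        return len(body) == 1 and isinstance(body[0], str)
    return all(tag_of(child) in allowed and check(child, schema) for child in body)


SCHEMA = {
    "customer": {"custid", "account"},
    "account": {"type"},
    "custid": None,
    "type": None,
}
doc = parse('(customer (custid "496F3AB") (account (type Personal)) (account (type Business)))')
```

`check(doc, SCHEMA)` accepts this document and rejects one with an unknown child such as `(customer (salary 10))`. It checks structure only; anything about field contents would be more code on top.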
Besides, since 1960 or thereabouts we have S-Expressions. The world should just have used that without reinventing the wheel once again.
Which is why he's reusing git for resolving those pain points? Well presumably all except "synchronizing filesystem state with the program" -- where he's gone from using some kind of xml parser to marshal xml to objects/structs in ram to using a (simple(r)?) text parser to do the same.
I'm guessing he just writes/reads a full (part) of a log (a branch of the full tree, or whatever is used in the program. Maybe a list anchored at a date?) -- and lets git sort the history/backup thing.
So, yes, it's a different format, but I think the argument you're making is off -- seeing as he already has git for that? It's more like combining Maildir (or mboxes, only committed when valid) and git.
There's not much more to infer from the comment.
Unless he's invented a new ASN.1 encoding which plugs into libgit or something or a new text serialisation format (both unlikely).
Git is so well-designed that expert users manage to trash their repositories and propagate the damage.
Maybe that's not a problem of libgit. But tools are both the infrastructure and the UI.
But what he replaces it with is a git object store. Each xml-node becomes a git object. They each point to a parent (just as git commits point to a parent commit).
Now writing to this datastore means adding a new node to the git object database and changing the parent references.
Where git stores commits that are related sequentially in time, this stores nodes in a tree relationship that IS the document.
If he's not talking about this then I'd like to officially take credit for my weird idea right now.
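To be clear about what that idea amounts to, here is a toy sketch in Python. It is not how the actual project stores anything. The only part borrowed from git is the id: a SHA-1 over a "blob <size>" header, a NUL byte, and the content, which is how git hashes a blob. `NodeStore`, its record layout, and the dive/depth sample nodes are all invented for illustration.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Hash `data` the way git hashes a blob: SHA-1 over "blob <size>", NUL, then the bytes."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class NodeStore:
    """Hypothetical in-memory stand-in for an object database."""

    def __init__(self):
        self.objects = {}

    def add(self, name, text="", parent=None):
        """Store a node that records its parent's id; return the new node's id."""
        data = f"parent {parent or ''}\nname {name}\n\n{text}".encode()
        oid = object_id(data)
        self.objects[oid] = data
        return oid

    def parent_of(self, oid):
        """Read the parent id back out of the stored record (None for a root)."""
        first_line = self.objects[oid].split(b"\n", 1)[0].decode()
        return first_line[len("parent "):] or None


store = NodeStore()
dive = store.add("dive")
depth = store.add("depth", "18.2 m", parent=dive)  # made-up sample value
```

Since the id is derived from the content, adding the same node twice yields the same object, and changing a node means adding a new object rather than editing one in place.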
Regardless, I would think that some applications are simple enough (store few enough separate objects in the file system) that the issues you cite are not likely to cause a problem.
You could write hex values in the program text editor, then you could tell the calculator to execute the hex codes as machine code. I understand the previous models, TI-82 and TI-85, were hacked / backdoored to run user-supplied assembly language, so TI responded by including an official entry point and developer documentation for the TI-83.
People later wrote loaders which allowed programs to be stored as binary instead of text (using half the space). Some loaders also had the capability to run binary programs by swapping them into the target address rather than copying them (theoretically a third option would be possible, running programs in-place if they weren't written to depend on their load address, but this wasn't a direction the community went in. gcc users may be familiar with -fPIC which produces code which can run from any address, and this flag is necessary when compiling code for shared libraries.)
This allowed people to create massive 20K+ applications (an RPG called Joltima comes to mind), that used most of the available RAM.
The fact that this loading scheme made static variables permanent was also quite convenient. (And most variables were static; stack-based addressing would be tough because the Z80 only has two index registers, one of which is used -- or perhaps I should say "utterly wasted" -- by the TI-83 OS.)
The next generation, the TI-83+, included I think 256K of flash ROM, and a special silver edition was released which contained 2 MB.
The other huge thing I learned from ADVENT was polymorphism. The comment in the source code "the troll is a modified dwarf" was an epiphany for me.
Without it, publishing would be stuck in a morass of nebulous, ill-documented proprietary messes, and a great deal of current learning would be at risk of being lost to posterity. The fact that there are associated open standards such as XSLT with which to transform it is just the icing on the cake as far as publishing is concerned.
This is why there's so much distaste for XML - people try to use it for applications where it isn't ideal (and there are many more of those than there are applications where it is ideal) because they've swallowed someone else's hype, and as a consequence they have a bad time. If not for the unbelievable exaggeration a few years back (I heard people claim without irony that XML - a markup language for god's sake - would literally change the world), the divisiveness wouldn't exist, and it would be a technology used by experts quietly getting on with the jobs it's best for.
That's only true for minimally formatted documents. For anything that approaches professional typesetting requirements, XML is a nightmare.
By far the biggest problem is the requirement that inner elements must be closed before outer ones can be. This frequently means that the software must do a huge amount of read-ahead to figure out which aspect of the formatting changes first to make that formatting element innermost.
Sometimes, that's simply not possible to arrange and so you have to close a whole bunch of elements and then reopen all but one of them.
All this because of a constraint of the format.
Ideal formats, such as used by typesetting systems that don't use XML, allow you to say: keep this formatting trait on until it's switched off. There is no concept of every element needing to be a subset of its encompassing element.
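A minimal sketch of the close-and-reopen dance, assuming the formatting arrives as runs of text, each tagged with the traits switched on for it. `runs_to_xml` and the `b`/`i` trait names are illustrative only, and the sketch does no read-ahead: it simply closes from the innermost element outwards until nothing unwanted is left open.

```python
from xml.sax.saxutils import escape


def runs_to_xml(runs):
    """Turn (text, traits) runs into properly nested tags.

    `traits` is a tuple of the formatting names active on that run.
    """
    out = []
    open_tags = []  # stack of open elements, outermost first
    for text, traits in runs:
        wanted = set(traits)
        # Close from the innermost tag until every tag still open is wanted.
        while any(tag not in wanted for tag in open_tags):
            out.append(f"</{open_tags.pop()}>")
        # Open (or reopen) whatever is wanted but not currently open.
        for tag in traits:
            if tag not in open_tags:
                open_tags.append(tag)
                out.append(f"<{tag}>")
        out.append(escape(text))
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


# Bold starts, italic starts, then bold ends while italic carries on.
overlapping = [("a", ("b",)), ("b", ("b", "i")), ("c", ("i",))]
markup = runs_to_xml(overlapping)
```

Here italic has to be closed along with bold and then reopened, even though from the author's point of view it never stopped.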
I just hope that opinion of it as a markup language can be rehabilitated before someone reinvents it and kicks off a new hype cycle.
"+Aaron Traas no, XML isn't even good for document markup.
Use 'asciidoc' for document markup. Really. It's actually readable by humans, and easier to parse and way more flexible than XML.
XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist.
As to JSON, it's certainly a better format than XML both for humans and computers, but it ends up sharing a lot of the same issues in the end: putting everything in one file is just not a good idea. There's a reason people end up using simple databases for a lot of things.
INI files are fine for simple config stuff. I still think that "git config" is a good implementation."
Subversion has a really good XML output for its log command which is a joy to use (and that's something to say if you work with XML) whereas with git you always have ugly format options that are most of the time underdocumented.
I just had a quick scan of the user guide. It's very impressive. Looks like markdown but with all the edge cases thought out.
At its core XML (if you ignore all the DTD, namespace and entity rubbish) is both simpler and more powerful than this. You have text, tags and attributes. What those tags and attributes mean is up to the application, but at the very least you can be sure that the document can always be reliably parsed into a form you can work with.
I'd really like to hear more about this perspective, if anyone feels like they can elaborate.
I'm sure quite a lot of people will easily recognize it. :^)
Subject: Re: S-exp vs XML, HTML, LaTeX (was: Why lisp is growing)
https://groups.google.com/forum/message/raw?msg=comp.lang.li...
1. There's very little detail here; it's a nicely worded, emotionally charged piece that leaves a lot of detail unaddressed, e.g. "'I would like to hear why you think it is so bad, can you be more specific please?' If you really need more information, search the Net, please." That's not very helpful.
2. It argues for 'simpler' markup via the removal of attributes. Where possible, I totally agree, as at least hinted at in my original post. Sometimes, though, this would be impossible or unwieldy (e.g. HREF attribute on an A element).
3. Character entities vs. unicode - totally agree. Wherever possible, I use proper unicode characters rather than ugly character entities in my markup.
4. "But the one thing I would change the most from a markup language ... is to go for a binary representation." Linus would vehemently disagree on this point.
G+ is actually a great place to read long form messages and comments, but doesn't really have content discovery down.
With that on the board/paper/code, we can start to test our assumptions and iterate on the solution. I do not know if that classifies as planning, but it works very well.
Which IMO is a sensible way of looking at it. I too think XML is not perfect, but if all the other stuff we're currently stuck with were as "good enough" as XML, IT would be a place with fewer WTFs all around. ;-)
Even if you were to stick with XML semantics as is, you could improve the syntax to be actually readable and eliminate the angle bracket tax [1, 2].
[1] http://blog.codinghorror.com/xml-the-angle-bracket-tax/
[2] http://blog.codinghorror.com/revisiting-the-xml-angle-bracke...
Alternatively Carl Sassenrath was pushing Rebol in the past. See his blog post "Was XML Flawed from the Start?" - http://www.rebol.com/article/0108.html
Update: Just posted the above blog link to HN: https://news.ycombinator.com/item?id=7361260
XML is "good enough" for a lot of cases. There are lots of tools to mess around with it too, which is really quite valuable when you're experimenting with various kinds of data or you're debugging. Being able to extract out stuff you're interested in XML format means you can perform a lot of complex manipulations quite easily.
The really annoying issue is as the parent says, that the accounts collection does not have a name. This means there's no canonical mapping for the structure into a programming language object, which necessitates that libraries require annotations or some other side-channel way of specifying how to wrap the accounts into a collection.
In Jaxb e.g., how many times must we add junk like:
@XmlElementWrapper(name = "accounts") ?
In any individual case the workaround is easy, but it's annoying to have to do it repeatedly.
XML really is better as document markup than structured data representation.
<customer>
<custid>496F3AB</custid>
<accounts>
<account>
<type>Personal</type>
...
</account>
<account>
<type>Business</type>
...
</account>
</accounts>
</customer>
In particular: http://www.xach.com/naggum/articles/3224334063725870@naggum....
with key words being "Whether what you are really after is foo, bar, or zot, depends on your application.".
His articles on SGML are mandatory reading too.
Several years ago someone posted these links and it opened the wonderful world of Lisp to me. Not the language per se (there are many languages) but a whole other universe of how things could be done. I swear I jumped on the chair reading every page of the CL standard, seeing how brilliant it is on every level compared to C. Eventually it led me to rethink my attitude to C and Unix in general, core parts of which I despise now.
So here I am returning the favor; maybe someone will follow these links too.
Thank you, Erik. Rest in peace.
Git's output was designed in the Unix spirit; you can parse it very quickly without needing a parser toolchain.
It's also extensively documented: git help log, etc
If you wanted more security, you could keep a secret that you don't include in the saves but do include when you calculate the hash, so that anyone who doesn't have the key is going to get the wrong answer. That's about as far as I'd consider going for relatively trivial data like save games. Though that's, in principle, discoverable if someone's sufficiently interested.
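A sketch of that scheme in Python, with one substitution stated plainly: instead of hashing secret-plus-data by hand it uses HMAC, the standard keyed-hash construction, from the standard library. `SECRET_KEY`, the function names, and the sample save contents are all placeholders.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would ship inside the game, not in the save.
SECRET_KEY = b"not-stored-in-the-save-file"


def sign(save_bytes: bytes) -> str:
    """Keyed hash of the save data; stored alongside the save."""
    return hmac.new(SECRET_KEY, save_bytes, hashlib.sha256).hexdigest()


def is_untampered(save_bytes: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(save_bytes), tag)


save = b'{"level": 3, "score": 1200}'
tag = sign(save)
```

Editing the score without the key makes `is_untampered` return False. As noted above, the key still lives in the program, so this raises the effort required rather than ruling anything out.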
After that point, it becomes much simpler for someone to watch the memory associated with your program and extract/alter the values there. (Programs to do that to games are generally called Trainers.) That's not a complicated thing to do unless someone's tried to stop you doing it.
There are some techniques to provide some degree of security there. Changing where in memory you place your information each time springs to mind, thus making it more difficult for people to find out where the values are and then share their locations. However, even that's not perfect. Depending on how sure you want to be that no-one's going to alter the values, you're potentially looking at requiring very deep knowledge of security there.
After that point the next easiest target may be the program file itself.
That said, if you want to get around that sort of problem and you're really serious about it, then running your encryption in an environment that hostile may be making things more difficult than they need to be. You might use a trusted platform module, to try to make the environment you were in less hostile, if one were present on the user's machine. But, honestly, I'd want the information not to be stored or calculated on the user's machine if it were that valuable. Have the user's end be the input, encrypt their signals with your public key, and do the calculations that you needed to be sure of remotely.
Though then the user has to trust you. I wouldn't usually advocate that my users trust me that much - not unless we were dealing with a situation where the information we were talking about was entangled with others in some way such that a reasonable argument could be made that they didn't own it, and I was just the best common arbiter I could think of.
-she shrugs awkwardly-
You can get yourself into a situation where it's probable that the amount of effort someone would have to invest is vastly greater than the likely value of the information fairly easily. But ultimately it's a question of how expensive you want to make things and what that's worth to you. Against a sufficiently dedicated adversary, with a sufficiently valuable target, there are so many unknowns in computer security that I wouldn't even be sure that storing the data on your server would be sufficient ^^;
Gives a fresh angle on the 'open world' type game.
Speedrunners are an exceptional case, but I think everyone gets a little annoyed when they look at a leaderboard and all the top players have scores of UINT_MAX or times of 0 seconds.
Obviously cheaters will find a way regardless of whether you give them the source code or not, I'm just saying dfc's concern is not totally ridiculous.
If the core of the game is single-player/non-competitive, why should we be so worried about cheating?
So, the solution to the fact that the merging UI is a pile of garbage is HAVE A SINGLE PERSON ALWAYS DO THE MERGE. Excuse me? The whole point of a distributed revision control system is so I don't have to have a single choke point. That's the definition of distributed.
Then there was the KDE disaster: http://jefferai.org/2013/03/29/distillation/
Yeah, the root fault wasn't Git. However, at no point did Git flag that something was going horribly wrong as the repository got corrupted and deleted. Other distributed SCM systems I have used tend to squawk very loudly if something comes off disk wrong.
Maybe the underlying git data structures are fine, but, man, the UI is a pile of crap.
And, I won't even get into rebase, because that seems to be a religious argument.
Seriously, you cannot call yourself a git expert if you think rebase is a difficult thing to explain.
Might you sometimes make mistakes? Sure. I hardly see this as a systemic thing, though.
The mirror shenanigans I agree suck. Not sure what the real takeaway is there, other than don't rely on mirror as a good form of backup.
That isn't what the post advocates. He says that having a single person approve the pull request is a good idea, but approving the pull isn't the same thing as manually doing a merge. Projects I've worked on required that the submitter merge master into their branch before their PR would be accepted.
And yes, it's not ridiculous, on the contrary, it's perfectly understandable.
Of course, these kinds of questions depend a lot on the game in question, and I think they don't have a definitive answer :)
Edit: clarification as HN won't let me reply any more.
That's why there is a distinction between well-formed and valid XML documents. Only a valid XML document has a schema attached that describes these nodes with the type attribute. And because the schema is extensible, these types can be anything, yet they will still be validated automatically by the parser.
JSON OTOH doesn't have this extensibility. There are a couple of predefined types but if you need to go beyond them (and this happens all the time because JSON doesn't even define a date type!) any interpretation is up to the parsing program and this can vary tremendously (again, look at the handling of dates and for example the questions on stackoverflow about them).
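To make the date problem concrete, here is a minimal Python sketch. The field names and the choice of ISO 8601 text are assumptions for illustration; nothing in JSON itself mandates them, which is exactly the point.

```python
import json
from datetime import datetime, timezone

def encode_event(name: str, when: datetime) -> str:
    # JSON has no date type, so the writer picks a convention: here, ISO 8601 text.
    return json.dumps({"name": name, "when": when.isoformat()})

def decode_event(text: str) -> dict:
    # The reader has to know out of band that "when" holds an ISO 8601 timestamp.
    raw = json.loads(text)
    raw["when"] = datetime.fromisoformat(raw["when"])
    return raw

original = datetime(2014, 3, 5, 12, 30, tzinfo=timezone.utc)
wire = encode_event("release", original)
plain = json.loads(wire)      # a generic reader only ever sees a string
decoded = decode_event(wire)  # a reader that knows the convention gets a datetime
```

A reader that guesses a different convention (epoch seconds, a locale-specific format) would parse the same document without error and still get the wrong answer.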
It's just meant to be human readable.
If you want human-editable "JSON", use YAML: http://www.yaml.org/ (it's a superset of JSON that adds comments, linking, etc.)
However, I hate YAML with a passion. It is worse than XML in my books. I can usually read JSON fine. I can also read XML in many cases. For the life of me, I just can't read YAML. It has something to do with "-", line indentation and different ways of writing lists.
Of course, someone will say YAML is technically better ...
Could you provide examples? I'm trying to collect more examples for a JSON validator -- http://mattfenwick.github.io/Miscue-js/ (built during a big project using JSON, after I started running into some issues that I couldn't check using other validators)
I'd love to hear more examples if you're willing to share.
Why doesn't Windows do the right thing by default?
Oh, it's because a different system behaves differently.
DVCS is fundamentally more complex than VCS.
I guess the reason git doesn't do this by default might be that rebasing was a bit taboo early on (the "omg you're overwriting history in a RCS!!!!!" reaction) and it has taken time for people to get used to the idea. Note that I'm just speculating about that. I did follow the git discussion early on, and I know that people then were (and to some degree still are) afraid of "rewriting history" (I quote that because I don't really see it that way).
What's the issue?
It only has "complete structural validation". Which means it doesn't feature custom types.
Although it adds a workaround for the date issue by adding a handful of supported sub-types (http://json-schema.org/latest/json-schema-validation.html#an...)
It is far from what validation XML Schemas offer.
Python has exactly the same problem -- control-structure nesting quickly gets confusing and hard to read beyond a certain (fairly small) size -- but at least with python, you have the option of splitting off stuff into separate functions to limit the amount of nesting and size of blocks.
By limiting expressiveness you also gain static analysis and predictability. It's not about limiting the potential of computers, it's about designing systems that strike the right balance between the power given to the payload and the guarantees offered to the container/receiver.
For example, it is only because JSON is flat data and not executable that web pages can reasonably call JSON APIs from third parties. There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
Speaking for myself, in my ideal world this sort of schema-checking and executing is ubiquitous and easy. Obviously that's not the world today. While there are tools for checking JSON schemata there doesn't seem to be a standard format. I wonder how hard it would be to implement a Lua schema-checker.
It's a relatively new data format designed by Rich Hickey that has versioning and backward-compatibility baked in from the start.
EDN stands for "Extensible Data Notation". It has an extensible type system that enables you to define custom types on top of its built-in primitives, and there's no schema.
To define a type, you simply use a custom prefix/tag inline:
#wolf/pack {:alpha "Greybeard" :betas ["Frostpaw" "Blackwind" "Bloodjaw"]}
While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions. The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).
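As a rough illustration of the "registered handler, generic fallback" idea, here is a Python sketch. It is not a real EDN reader: it assumes the text has already been parsed into a tag name plus a plain value, and the names `Tagged`, `WolfPack`, `HANDLERS` and `resolve` are all made up for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Tagged:
    """Generic wrapper used when no handler is registered for a tag."""
    tag: str
    value: Any

@dataclass(frozen=True)
class WolfPack:
    alpha: str
    betas: tuple

# Hypothetical registry mapping tag names to constructors.
HANDLERS: Dict[str, Callable[[Any], Any]] = {
    "wolf/pack": lambda v: WolfPack(v["alpha"], tuple(v["betas"])),
}

def resolve(tag: str, value: Any) -> Any:
    # Known tag: build the custom type. Unknown tag: keep tag and value intact.
    handler = HANDLERS.get(tag)
    return handler(value) if handler else Tagged(tag, value)

pack = resolve("wolf/pack", {"alpha": "Greybeard", "betas": ["Frostpaw", "Blackwind"]})
other = resolve("bear/den", {"size": 3})
```

The unknown `bear/den` value still reads cleanly and can be passed along or written back out, which is what lets a reader cope with types it has never heard of.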
Here's the InfoQ video and a few threads from when it was announced:
https://news.ycombinator.com/item?id=4487462, https://groups.google.com/forum/#!topic/clojure/aRUEIlAHguU, http://www.infoq.com/interviews/hickey-clojure-reader
Of course there's a "better way": running the code in a sandbox. You could do so using js.js[1], for example. (Of course, replacing a JSON API with sandboxed JS code is likely to be a bad idea. But it is possible.)
[1] https://sns.cs.princeton.edu/2012/04/javascript-in-javascrip...
But my larger point still stands; the fundamental tradeoff is still "power of the payload" vs "guarantees to the container." Even in the case of sandboxed execution, the container loses two important guarantees compared with non-executable data formats like JSON:
1. I can know a priori roughly how much CPU I will spend evaluating this payload.
2. I can know that the payload halts.
This is why, for example, the D language in DTrace is intentionally not Turing-complete.
PostScript is Turing-complete.
Here's my attempt at a clearer version:
"Take Postscript vs PDF. Postscript is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first."
"Take PDF vs PostScript. The latter is Turing-complete"
Back on topic: The reason for PDF's existence is to be a non-Turing-complete subset of PostScript. Features like direct indexing to a page are why Linux has switched to PDF as the primary interchange format.
Also, if, for example, cars and airplanes and banks and nuclear submarines would accept executable code as input, some people would do damage on a gargantuan scale.
Clearly, being liberal in what you accept must end somewhere. I argue that it should end very, very soon. Even innocuous things such as "let's allow everybody to read the subject of everyone's mail messages", if available at scale and cheaply, would entice criminal behavior, for example by those mining them for information that you are away from home.
Does anybody know how RMS thinks about passwords nowadays?
Learning to swim is not done by throwing a kid in the deep end of a pool. Learning to code is not done by encouraging bad security practices.
Repeating the proper noun doesn't achieve the goal of emphasizing the fact that you're referring to something you just mentioned.
The problem, I think, is that XML being human-readable doesn't make it a good format for humans to write (I'm looking at you, Maven!). I believe this is the root reason many people hate XML, even though it has a very sweet spot in application-to-application communication.
Right -- the parser blew it. That many implementations do this is frustrating (and caused me so many problems that I ended up building my own validator for problems like this: http://mattfenwick.github.io/Miscue-js/).
JSON doesn't set limits on number size. From RFC 4627:
An implementation may set limits on the range of numbers.
It's the implementation's fault if the number is silently converted to null.
I guess we need better implementations!
> JSON is a popular format but it's awful.
If you're willing to take the time to share, I'd love to hear more examples of JSON's problems. I'm collecting examples of problems, which I will then check for in my validator!
[1] http://www.tbray.org/ongoing/When/201x/2014/03/05/RFC7159-JS...
Regardless, JSON is so much more readable that I'm very glad it's pushed XML out of the picture for the most part.
XML can be read as a stream and at certain points like after reading an element or attribute, an object can be created on the fly or a property on an object set and the type deserialised at the same time. The types don't have to be native types either; they can be complex types or aggregate types such as any numeric abstraction or date type you desire.
See javax.xml.stream (Java) and System.Xml (CLR) for example.
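The same streaming pattern can be sketched in Python with `xml.etree.ElementTree.iterparse`, used here as a stand-in for the Java and CLR APIs mentioned above. The element names (`staff`, `employee`, `salary`) are invented for the example.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal

XML = b"<staff><employee><name>bob</name><salary>1e999</salary></employee></staff>"

def read_salaries(stream):
    """Yield (name, salary) pairs as each <employee> element finishes."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "employee":
            # Convert the text straight to the target type; no float in between.
            yield elem.findtext("name"), Decimal(elem.findtext("salary"))
            elem.clear()  # drop the subtree that has already been consumed

salaries = list(read_salaries(io.BytesIO(XML)))
```

Note that the type decision lives in the reading code here, not in the document; a schema-aware reader would take it from the schema instead.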
As for readability, some XML is bad which is probably what you've seen but there's plenty that's well designed.
XML is afflicted with piles of criticism which usually comes from poor understanding or looking at machine targeted schemas that humans don't care about.
You'd complain the same if you looked at protobufs over the wire with a hex editor.
XML strings ain't numbers neither. You can throw a big decimal deserialiser (e.g. as a custom deserialization adapter) at a JSON document as well.
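For what it's worth, here is how that looks with Python's standard `json` module, as one example of an implementation. The document is made up; `parse_float` is the module's hook for choosing the type used for number literals with a fraction or exponent.

```python
import json
import math
from decimal import Decimal

doc = '{"name": "bob", "salary": 1e999}'

# Default behaviour in CPython: the literal overflows a double and becomes inf.
as_float = json.loads(doc)["salary"]

# Same document, but such number literals are handed to Decimal instead of float.
as_decimal = json.loads(doc, parse_float=Decimal)["salary"]
```

Other parsers make other choices on overflow (null, an error, silent truncation), so this only shows that the loss happens in the implementation, not in the format.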
XML doesn't have strings (or types at all really)
JSON strings are strings.
There is a massive semantic difference here when it comes to parsing.
{"name": "bob", "salary": "1e999"}
In your contrived example, somehow, the user of JSON didn't realize the salary could overflow a float. (OTOH, he succeeded in serializing it, mysteriously.) All the while, the XML user was magically forward thinking and deserialized the value into a big decimal. Your argument simply hinges on making one programmer smarter than the other. If one knows that a value will not fit a float, the memory representation won't be a float and the serialization format won't use a float representation. It has nothing to do with JSON vs XML.
In this particular case, you're giving a different representation, so of course you can pass it as a string.
1. imposing CPU limits incurs an inherent CPU overhead and code complexity.
2. if those limits are hit, you can't tell whether the code just ran too long or whether it was in an infinite loop.
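Both points can be seen in a toy step budget. This Python sketch models a payload as a generator that yields once per step, which is an assumption made to keep it short; a real sandbox would have to count steps inside its interpreter. All names here are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when a payload uses more steps than it was allowed."""

def run_with_budget(payload, max_steps):
    """Drive a payload step by step; return its last value if it finishes."""
    result = None
    for steps, result in enumerate(payload, start=1):
        # Counting and checking on every step is the overhead from point 1.
        if steps > max_steps:
            raise BudgetExceeded(f"stopped after {max_steps} steps")
    return result

def slow_but_finite(n):
    total = 0
    for i in range(n):
        total += i
        yield total

def never_halts():
    n = 0
    while True:
        n += 1
        yield n

def outcome(payload, max_steps):
    try:
        return ("finished", run_with_budget(payload, max_steps))
    except BudgetExceeded as exc:
        return ("budget exceeded", str(exc))

quick = outcome(slow_but_finite(10), 100)
slow = outcome(slow_but_finite(1000), 100)
looping = outcome(never_halts(), 100)
```

`slow` and `looping` come back identical, so the caller cannot tell a payload that needed a bigger budget from one that would never have finished (point 2).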
So now if we fully evaluate the options, the choice is between:
1. A purely data language like JSON: simple to implement, fast to parse, decoder can skip over parts it doesn't want, etc.
2. A Turing-complete data format: have to implement sandboxing and CPU limits (both far trickier security attack surfaces), have to configure CPU limits, when CPU limits are exceeded the user doesn't know whether the code was in an infinite loop or not, maybe have to re-configure CPU limits.
Sure, sometimes all the work involved in (2) is worth it, that's why we have JavaScript in web browsers after all. But a Turing-complete version of JSON would never have taken off like JSON did for APIs, because it would be far more difficult and perilous to implement.
Abstractions exist for a reason -- this is analogous to source/channel coding separation or internet layers. They don't have to be that way, but are there for a reason.
Someone could change my opinion, though. Provide me a data format which proves certain things about its behavior and that would be a nice counterexample.
And the vast majority of parsers will not parse anything for you, regardless of schema definitions.
Which effectively puts you in the same place as the JSON string.
XML: ->[byte stream]->[deserializer]->[bignum]
JSON: ->[byte stream]->[json reader]->[string]->[deserializer]->[bignum]
The latter is, well, wrong.