A Scholarly Markdown(scholarlymarkdown.com) |
For an extra five minutes learning you get a boatload (think container ship) more features[1] - it compiles to DocBook: a mature, actually standardised, highly structured format, and from that you get HTML, EPUB, PDF, slideshows, and man pages for free.[0] For math you get MathML, ASCIIMath, and LaTeX (along with a number of ways to render them.) It has a super nice syntax, is equally good at little docs and huge books, and you could theoretically write a proper academic paper in it with the LaTeX backend. And you always know what's going to happen when you try to mix bold and italic...
Also endorsed by Linus.[2]
[0]: http://www.methods.co.nz/asciidoc/#_overview_and_examples
[1]: Well, five minutes to be able to do everything Markdown can do; everything else will take a bit longer
[2]: https://plus.google.com/+LinusTorvalds/posts/X2XVf9Q7MfV (comments)
Edit: I was wrong about GitHub not supporting AsciiDoc.
I don't have much experience with AsciiDoc, but I've encountered other limitations in writing pandoc's AsciiDoc renderer. For example, I didn't see any way to include multiple paragraphs (or other block content) in an AsciiDoc footnote. That's a deal-breaker for many academics.
I also want to emphasize something that is often not mentioned in comparisons of Markdown and AsciiDoc. As John Gruber emphasizes in his Markdown documentation, Markdown emphasizes ease of reading in source format. AsciiDoc has different priorities, and it sacrifices the readability of the source document to get them. Here's an example involving nested lists, from the AsciiDoc manual:
1. List item one.
+
List item one continued with a second paragraph followed by an
Indented block.
+
.................
$ ls *.sh
$ mv *.sh ~/tmp
.................
+
List item continued with a third paragraph.
2. List item two continued with an open block.
+
--
This paragraph is part of the preceding list item.
a. This list is nested and does not require explicit item continuation.
+
This paragraph is part of the preceding list item.
b. List item b.
This paragraph belongs to item two of the outer list.
--
Markdown equivalent:

1.  List item one.

    List item one continued with a second paragraph followed by an
    Indented block.

        $ ls *.sh
        $ mv *.sh ~/tmp

    List item continued with a third paragraph.

2.  List item two continued with an open block.

    This paragraph is part of the preceding list item.

    1. This list is nested and does not require explicit item continuation.

       This paragraph is part of the preceding list item.

    2. List item b.

    This paragraph belongs to item two of the outer list.

Note that the AsciiDoc is at least as easy to write, perhaps easier, because you don't need to worry about indentation. But the Markdown source is more readable; the indentation makes clear the structure of the list in a way that mirrors how it would be displayed in a browser or on the page.
Markdown's syntax constrains how much it can ever do, whereas AsciiDoc is designed to be extensible, providing macros at the "language" level, and underneath that there's the whole DocBook toolchain. It seems to me that the clever move is to leverage all this rather than create yet-another-Markdown (and it's not like the syntax tradeoff is at all dramatic). I think the footnote issue you mention can be handled with block macros, but I'm not sure.
I might be getting a bit utopian here, but what would be really great is if all these markup languages sought to have their canonical implementation in pandoc, which would allow for the standardisation of a set of pass-throughs/filters/annotations for things like equations and citations.
Thanks for pandoc, btw. Just out of interest, is there a reason you haven't attempted an AsciiDoc reader? (Or have you?) Assuming it'd be quite a bit more work than the others, what with macros and so forth.
You said that footnote support is a deal-breaker for many academics; for me, the deal-breaker is poor equation support. Most of what I write is full of equations, especially aligned equations. It's necessary to be able to reference a specific equation that might be on a different page (or far above on an HTML page).
It looks like AsciiDoc has different ways of handling LaTeX equations for non-LaTeX output. It can render a PNG, it can use a JavaScript solution, or it can output MathML.
In addition to default asciidoctor style, there are other themes available - http://themes.asciidoctor.org/preview/ (See bottom right for theme switcher).
You can also take a look at Pro Git book styles - http://git-scm.com/book/en/v2 which was written in asciidoc - https://github.com/progit/progit2
CommonMark and Pandoc serve different purposes: the first is an initiative to counter fragmentation/balkanization of the Markdown ecosystem and, as such, has to reckon with backward compatibility, consensus and adoption. The latter is a document conversion library which, by design, needs to reckon with interchangeability between formats and may hence be hampered by the lowest common denominator as regards feature support. Internally, Pandoc keeps an Abstract Syntax Tree (accessible in JSON format), and defaults to its own flavour, Pandoc Markdown (featuring a Markdown superset of content element types).
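To make the JSON form of the AST a bit more concrete, here is a minimal Python sketch. The `DOC` literal is a hand-written stand-in for what `pandoc -t json` emits for a one-paragraph document; it assumes the top-level object layout (`pandoc-api-version`, `meta`, `blocks`) used by recent pandoc releases, and older releases used a different top-level shape. `element_types` is a hypothetical helper for illustration, not part of pandoc.

```python
from collections import Counter

# Hand-written stand-in for `pandoc -t json` output (an assumption, not captured output).
DOC = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "Euler:"},
            {"t": "Space"},
            # Math carries its kind plus the raw TeX source as a plain string.
            {"t": "Math", "c": [{"t": "InlineMath"}, r"e^{i\pi} + 1 = 0"]},
        ]},
    ],
}


def element_types(node, counts=None):
    """Count every AST node by its "t" (type) tag, recursing through the tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "t" in node:
            counts[node["t"]] += 1
        for value in node.values():
            element_types(value, counts)
    elif isinstance(node, list):
        for item in node:
            element_types(item, counts)
    return counts


print(sorted(element_types(DOC).items()))
```

Every element is a small tagged object, which is why adding a new element type (as a fork might) touches the AST definition rather than just the parser.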
Both the CommonMark community and John MacFarlane have made it clear that their first and foremost focus is on standardization, not so much on extending the feature set. Yet scholars and technical writers are in dire need of something more heavy-weight than the rather small set of features offered by CommonMark implementations or Pandoc Markdown. Hence the Scholarly Markdown initiative and the scholdoc reference implementation (a Pandoc fork).
More on how Scholarly Markdown came about can be read on the blog of one of its pacemakers, Martin Fenner, e.g. http://blog.martinfenner.org/2013/11/17/the-grammar-of-schol..., http://blogs.plos.org/mfenner/2012/12/18/additional-markdown...
https://raw.githubusercontent.com/softcover/softcover_book/m...
The beautiful idea in markdown is that it allows you to mix (non-container) HTML tags in with the .md and it just works. Softcover markdown is in the same vein, allowing the more readable markdown for main copy, and intermix LaTeX tags as required. Beautiful if you ask me. Or at least beautifuler than ```math ... ```math.
The "backward compatibility" of ScholarlyMarkdown with basic Markdown is a cool feature, as many tools/platforms exist that "support" .md now, but to preview you'll still need something that renders the equations, so strictly speaking ScholarlyMarkdown is a new markup language.
For example, they do not plan to add syntax highlighting blocks (the ```some code``` on GitHub) to their implementation, because they believe that it is outside the scope of Markdown. Then, because a lot of people actually need this feature, they still have to patch, extend, or plug the functionality into any implementation of CommonMark, leading to fragmentation again. And frankly, the reason that I started writing more Markdown is precisely the syntax highlighting ability. Oh, and for writing scientific articles, missing math formulas are a deal-breaker.
They want to have a standard, unambiguous syntax specification, a suite of comprehensive tests and a cleanly implemented parser. They want to unify the community of markdown users and developers. All of those are commendable goals. But at the end of the day, it doesn't satisfy my needs, so I'd rather use a messy, poorly specified markdown flavor or even just render the markdown with GitHub's service.
To correct the record, fenced code blocks have been there from the beginning: http://spec.commonmark.org/0.18/#fenced-code-blocks.
That's what LaTeX does too. Its markup, generally, is semantic.
As a side note, a lot of Markdown + LaTeX + Code can be done in IPython Notebook. (Though, there are some things absent, like referencing citations or other equations).
https://github.com/timtylin/scholdoc-examples
More will be added as I tweak the syntax. My hidden goal is to add enough to ScholarlyMarkdown to be able to reproduce around 70% of ArXiv.
http://www.dllu.net/programming/dllup/
which handles math using svgtex instead of clientside MathJax for faster rendering. It also compiles to both html5 and LaTeX. However, dllup is overall less polished, missing some features (labelling equations, sections).
Babel and Org have clunky syntax but the mode takes care of that. The combination isn't perfect but it's very powerful, in particular the ability to chain several languages together in flexible ways.
Coming up with more ways to combine pretty formatting with syntax highlighting is polishing a pretty smooth surface. I'm dreaming of a really slick syntax and operating environment, like babel if it wasn't tacked onto org. In the meantime, using what we have.
Of course, having spent the past few months writing an app that basically reinvents this same wheel (http://www.eqeditor.com/writer/) I suppose I'm in good company.
I'm imagining some kind of online component like ShareLaTeX that will become a clearinghouse for a number of tested and proven conversion paths, and can handle compiling a ScholarlyMarkdown document to different formats. This project won't go anywhere if this can't at least be done for major houses like Elsevier, PNAS, Phys Rev, etc.
There should be a nice ScholarlyMarkdown -> LaTeX cross compiler for starting simple documents in TeX and then sharing with collaborators (pick some obvious defaults or allow a config file to get fancy). But more importantly, if I am joining a project that already has a bunch of LaTeX wizardry going on, I should be able to seamlessly and implicitly edit the text parts in ScholarlyMarkdown without my collaborators knowing.
Can ScholarlyMarkdown do this? If not then I'm not really interested.
Edit: yes, I was missing something. Scholdoc supports MathJax (which has browser, Office, and LaTeX support).
Small issue: The title isn't showing up in Firefox (or Chrome) because there are two title elements and the first one is blank.
It's more difficult but I feel like there's a lot more potential.
Additionally there is a very stable, performant and flexible reference implementation implemented as a web app (mediawiki) with excellent import/export in XML format (and the list goes on ...), versioning with syntax-highlighted diff, etc etc etc.
What have I missed? :)
Nobody likes MediaWiki syntax, not even its own users. It's awful. The only reason it's still used at Wikipedia is because there's too much content to reasonably convert.
=> built
CommonMark is currently focused on the task of giving a decent spec for the core syntax and robust, efficient implementations; extensions will wait til that project is done, but certainly aren't ruled out.
Pandoc has always been in the game of extending the feature set. Here are just some of the Markdown extensions pandoc supports: LaTeX math (which can be rendered in a variety of formats, including native Word and MathML), LaTeX macros, inline LaTeX, automatically numbered examples and cross-references to these, automatically generated citations (using CSL styles), super and subscripts, strikeout, figures, YAML metadata, definition lists, several styles of tables, fenced code blocks with syntax highlighting, header identifiers, and footnotes. scholdoc just adds a few things on top of all this (and many of them could be implemented in pandoc filters). As noted in one of the other comments on this thread, most of the features scholdoc adds are under active discussion in pandoc as well. So it's not that pandoc and scholdoc have different aims; pandoc just moves more slowly, because it has to worry about how features are implemented in many more output formats, and it operates under some other constraints that scholdoc rejects (e.g. trying to avoid the use of English words like "Figure" for syntax cues).
Basically this is forked because I've been wanting to see how much change to the AST is needed to include most of the academic-specific features. Since Pandoc's AST definition is a separate package from Pandoc itself and could possibly be a dependency of other projects, I thought it would be best to figure most of it out first and end up with just one proposal. Scholdoc has a much more limited number of input/output syntax, so it has much more flexibility when it comes to adding new document element types.
Consider this a self-motivated skunkworks project for Pandoc.
Internal referencing and attributes on figures are two things that are currently being discussed for pandoc. The discussion has been going on for quite a while though - hence people making forks.
Discussion on internal referencing: https://github.com/jgm/pandoc/issues/813
Discussion on image attributes: https://github.com/jgm/pandoc/issues/261
This raises a couple more questions for me.
First, when searching I was able to find Martin Fenner's very interesting blog posts about ideas for a "Scholarly Markdown" and, as those issues and the first link on the scholarlymarkdown.com site reference, he appears to be associated with a separate "scholmd" project, also called "Scholarly Markdown," which is apparently a related project that itself is a fork of the Python markdown science project:
scholmd:
Markdown Science:
https://github.com/karthik/markdown_science
However, it's unclear what all of the relationships are between all of these projects and forks.
Secondly, since some (or all?) of the changes are being discussed in the Pandoc issue tracker, are these changes intended to be submitted to Pandoc in pull requests? I don't currently see any.
(except for really simple cases)
Lastly, why is everyone hellbent on following the original syntax like it's the word of God? Anytime I hear someone complain about an issue in Markdown, people point to the 2004 syntax and how it's true to spec. Users don't care, they want something usable. I'd be happy if someone creates a completely new spec, and I never hear the word Markdown again. I'm tired of the same issues being passed down.
I'm currently using a workflow in my thesis where I use ScholarlyMarkdown to write individual chapters for final inclusion into an existing LaTeX book document. I find that ScholarlyMarkdown works quite well this way, and it potentially allows collaborators, since individual parts are isolated.
I'd love a tool that works with this philosophy, and I feel certain anything like ScholarlyMarkdown won't catch on in my field (theoretical computer science) without such tools.
As you probably know, TeX is a full programming language (albeit an odd one), which means that to compile (La)TeX to Markdown, Markdown needs to support all the features of TeX, which means that it has to be Turing-complete. Not to mention the fact that parsing TeX is kind of nuts too.
If anyone edits the TeX to fall outside that sub-set, it's reasonable to just embed those parts verbatim in the resulting Markdown.
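A rough Python sketch of that "convert the safe subset, pass the rest through verbatim" idea follows. The rule table and the function name `tex_line_to_markdown` are hypothetical, the subset is deliberately tiny, and the fallback works on whole lines; a real tool would need an actual TeX-aware parser.

```python
import re

# Hypothetical "safe subset": each rule maps one simple LaTeX construct to Markdown.
RULES = [
    (re.compile(r"\\section\{([^{}]*)\}"), r"# \1"),
    (re.compile(r"\\subsection\{([^{}]*)\}"), r"## \1"),
    (re.compile(r"\\textbf\{([^{}]*)\}"), r"**\1**"),
    (re.compile(r"\\emph\{([^{}]*)\}"), r"*\1*"),
]

# Any control word left over after the rules ran means the line is outside the subset.
LEFTOVER_COMMAND = re.compile(r"\\[A-Za-z]+")


def tex_line_to_markdown(line):
    """Convert one line if it only uses the subset; otherwise return it verbatim."""
    converted = line
    for pattern, replacement in RULES:
        converted = pattern.sub(replacement, converted)
    if LEFTOVER_COMMAND.search(converted):
        return line  # embed the original TeX untouched
    return converted


for sample in [
    r"\section{Introduction}",
    r"Some \textbf{bold} and \emph{italic} text.",
    r"\newcommand{\R}{\mathbb{R}}",
]:
    print(tex_line_to_markdown(sample))
```

Lines that fall outside the subset survive byte-for-byte, so a round trip back to LaTeX would not disturb a collaborator's macros.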
I built something similar to that, but instead of forking Pandoc (and investing the time to learn Haskell), I just added filters through Python. It's not as fast, but you can intercept almost anything.
Thus, the flow ends up as:
paper.md -> Pandoc -> paper.json + metadata -> paper.tex -> paper.pdf
paper.tex -> paper.md + metadata ->(merge) paper.tex -> paper.pdf
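For readers who haven't seen a Pandoc filter, here is a minimal stdlib-only Python sketch of the interception step. It is not the commenter's actual code: `walk`, `caps` and `run_filter` are illustrative names, and the document shape assumes the JSON layout of recent pandoc releases. The example action uppercases plain text (`Str` elements) and leaves everything else, including math, alone.

```python
import json


def walk(node, action):
    """Rebuild the tree, letting `action` replace any tagged element it cares about."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replacement = action(node)
            if replacement is not None:
                return replacement
        return {key: walk(value, action) for key, value in node.items()}
    return node  # strings, numbers, booleans pass through unchanged


def caps(element):
    """Example action: uppercase plain text; return None to leave an element as-is."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None


def run_filter(stream_in, stream_out, action):
    """Read a JSON AST from one stream, transform it, write it to another."""
    document = json.load(stream_in)
    json.dump(walk(document, action), stream_out)
```

Saved as a script (say `caps_filter.py`, a made-up name) that ends with `run_filter(sys.stdin, sys.stdout, caps)` after an `import sys`, it could sit in the middle of a pipeline such as `pandoc paper.md -t json | python caps_filter.py | pandoc -f json -o paper.tex`.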
http://www.noteshare.io/section/the-fundamental-class-of-projective-space
http://asciidoctor.org/docs/user-manual/#using-multiple-stem-interpreters
http://www.methods.co.nz/asciidoc/latex-filter.html
http://dblatex.sourceforge.net/example/dblatex/example_mathml.pdf
As for references, it has its own lightweight bibliography system out of the box, but there's a plugin[0] for BibTeX too, and DocBook has full-on support for BibTeX so it's just a matter of tooling. AsciiDoc gives you DocBook, and DocBook gives you pretty much everything.[1] The whole thing is completely extensible at multiple levels (macros, XSL stylesheets), so adding any essential features it doesn't already have is certainly much simpler than starting from scratch!
[0]: https://github.com/petercrlane/asciidoc-bib
[1]: http://pub.hdcrd.com/kb/Dev/Documention/LaTeX/Tool/Dblatex%20%28DocBook%20to%20LaTeX%20Publishing%29/0.3/manual.pdf
I also became super-convinced that some level of AST change was necessary to keep things sane, and since I wasn't able to use the existing Math and Image types anyway (they're not attributed), I ultimately just started a new AST type package namespace called "Scholdoc". Everything just evolved from there.
Markdown syntax for generic containers (Div and Span) hasn't been implemented yet but the discussion is fairly mature now:
Turns out, you just need to write consecutive expressions one after another. Put ampersands in all these consecutive expressions and ScholarlyMarkdown will turn them into one giant align environment. If there are no ampersands, it generates a gather environment instead.
I apologize for this not being in the documentation. After all, I estimate that it is only about 30% complete. It really is in a dismal state. I originally didn't expect many people to see the site until later this year when I plan to launch it. This will all be rectified eventually, promise!
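As a rough illustration of the two target environments (plain amsmath LaTeX written by hand with made-up equations, not actual Scholdoc output and not ScholarlyMarkdown input syntax), the difference looks like this:

```latex
% Consecutive expressions that contain ampersands: one align environment,
% with the ampersands marking the alignment points.
\begin{align}
  f(x) &= (x + 1)^2 \\
       &= x^2 + 2x + 1
\end{align}

% Consecutive expressions without ampersands: one gather environment,
% each line simply centred.
\begin{gather}
  a^2 + b^2 = c^2 \\
  e^{i\pi} + 1 = 0
\end{gather}
```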
Here is literally a large align-block example straight out of the thesis that I'm currently busy working on (instead of the documentation). You can inspect the Scholdoc-rendered LaTeX code to see that you can really do align blocks reliably with this syntax.
If you output to HTML, it will just turn this same align block into a MathJax-friendly format, and hand it off to MathJax for rendering. This is what it will look like:
http://jsfiddle.net/hzvwa139/3/
Where the equation number is placed is entirely a non-issue. Why? It's not decided by ScholarlyMarkdown, but by MathJax. Go to the jsFiddle example above and change the line that reads
TagSide: "left",
to read
TagSide: "right",
and the number now appears on the right. Scholdoc simply puts those default settings there for convenience; you're not tied to them at all.
Why put the number on the left side by default? Think of what happens when you have a long equation and you're trying to read it on a narrow screen such as a phone. If the number were on the right, it might very well be cut off by the screen, forcing you to scroll around to find equations. Sometimes layout decisions that made sense on paper for centuries don't make sense on screens.
I should point out that Scholdoc is only using MathJax on the HTML side for consistency; there is no technical reason why it can't use another renderer that understands LaTeX, but so far MathJax is the only one that even comes close to supporting all standard AMS features. Note that MathJax itself does HTML/CSS, MathML, SVG, and PNG (via a node server) output, and is entirely user-configurable.
For me, MathJax can't render my equations. I use some techniques to get a vector with an arrow on the bottom. Two commands I used all throughout my thesis are:
\def \rV#1{\hbox{$#1$\kern-0.38em\lower0.85em\hbox{$\vec{}\,$}}\,}
and
\def \Vec#1{\hbox{$#1$\kern-1.0em\lower0.4em\hbox{${\scriptstyle \rightarrow }$}}}
These won't work in MathJax (understandably).
C'mon, nobody deserves to be subject to automatic TeX expression breaking. Even careful hand-tuning with help from the breqn package can often look pretty bad.
> Two commands I used all throughout my thesis are:
At the risk of turning this into TeX StackExchange, I just want to point out that these kerning manipulations are entirely possible in MathJax. The problem is that, maddeningly, they chose to have non-math commands like kern and lower work exclusively in math mode, and not outside one like in a hbox. Therefore, if you change your second command to something like this:
$a \kern-1em\lower0.4em\hbox{$\scriptstyle \rightarrow$}$
then it will work in MathJax. Of course, this would then break in regular TeX because you're not supposed to use kern and lower in math mode. I keep myself sober by remaining optimistic about a potential overhaul of the MathJax LaTeX processing code.
In the meantime, I think we would benefit from a stopgap service that renders these things in an actual TeX engine running on a server into high-res PNG (easy) or SVG (doubtful). There's been quite a resurgence of effort lately to make LaTeX rendering into a scalable web service. I'd love to collaborate with someone from, say, ShareLaTeX on that, if possible.
To be honest, there was never really any conscious decision to pick Markdown over AsciiDoc. It was just serendipity that when I was looking to convert a bunch of my existing Markdown notes to papers, John happened to add (at Martin Fenner's urging) pretty robust support for citations to Pandoc. Being able to use existing .bst files and Natbib author/year syntax with existing BibTeX database files, while also using CSL to format everything else, was pretty key. The syntax only worked in Markdown at the time, so I went with that. This project simply grew into an experiment on how to get the least hack-y looking source document for what I needed.
Basically, I had a bunch of existing Markdown stuff typed with both my phone and my computer using NValt (already using the double-backtick math syntax as a hack), and I was looking for a way to reuse all that. If I had to do conversions or had to start from scratch, I would have just started another LaTeX Document, and then this project wouldn't be here.
As an aside: I personally think that one shortcoming of AsciiDoc is that Markdown ended up having a more mature application ecosystem outside of the most general-purpose text-editing tools. I'm thinking of something like Scrivener/Ulysses, which is really great for project writing and has robust Markdown support while writing. I've also recently found out that TeXpad "mysteriously" formats Markdown syntax in the editor, and builds an in-editor TOC of it. This is in addition to rendering support in basically every note-taking tool. Sadly, AsciiDoc doesn't seem to have experienced a similar growth spurt in popularity amongst developers, and as a result has a bit more friction to use.
For live preview of asciidoc there are some options - http://asciidoctor.org/docs/editing-asciidoc-with-live-previ.... Among them Atom with live preview works really well.
citation: http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/...
On the other hand, they also have an epub style sheet, and it appears (a little oddly) that the layout of the epub is better than the pdf.
FWIW most "heavily optimized" custom LaTeX styles I've come across tend to feel like being slapped in the face with MS Word -- and I think I've yet to encounter any that actually improve on the "standard" styles in any meaningful way (with possible exception of the APA style, which is ugly, but as it has to conform to APA, it's ugly by design. And looks better than most other APA conforming styles I've seen).
Still, having a starting point makes the job much easier -- so this is a great resource.
Here is an excerpt from a typical paper's macro section [1]. As you can see they're mostly one-liners to remove the need to keep typing textup and mathbb, simple mathoperator definitions and such.
The only thing you can't get will be the user-defined macros (and of course bib entries that don't already exist), but there is already a consistent mechanism to define your own macros in SM via the "math_def" block. They "do the right thing" in the sense that if you render to LaTeX snippets, the output won't redundantly include these macros.
Second: The PR for image attributes is here: https://github.com/jgm/pandoc/pull/1806
There isn't a PR for internal referencing yet because the implementation hasn't been worked out yet and it isn't a simple change (should there be a native representation, should it be a filter, which syntax should we use, what about the existing citation syntax...).
The series of blog posts by Martin (and his efforts with John in getting citations to work in Pandoc) was the impetus for this project. I reached out to Martin several months ago for comments, but I've not heard from him since. I guess he's very busy with his day job at PLOS. If he's willing, I'd very much like to reconcile this project with his efforts. The goal is, after all, better authoring workflows for all academics compared to the status quo, and it's going to take some concerted effort to get us all out of this giant energy well we've had going for a few decades now.
Everyone seems to have an opinion on how to do this right, and that is part of the reason why the whole concept is pretty fragmented. Some of my thoughts:
Pandoc is the markdown converter that comes closest to what most people need, so I am happy to stick with it. I personally don't think that a fork is viable, things are already hard enough as it is.
Scholarly markdown is a solution for 80% of use cases; people writing math-heavy texts are probably better off sticking with LaTeX.
Scholarly markdown needs to be a community effort; I don't see any other way this can succeed.
> I personally don't think that a fork is viable, things are already hard enough as it is.
I don't think so either. Scholdoc as a fork was always intended to be a stop-gap measure to quickly test out ideas. Pandoc's use of relatively standard Parsec is easier to hack, and lots of other subsystems like citeproc remain crucial. Scholdoc changes Pandoc's AST, so any discussion of re-integration is going to be a non-starter until at least 2.0
For this kind of workflow to be viable, 95% of the required effort is not going to be on the syntax/converter anyways. The real hard work is still ahead.
> Scholarly markdown is a solution for 80% of use cases; people writing math-heavy texts are probably better off sticking with LaTeX.
I agree, except I also think that there can be an 80% situation for math. I work with a lot of applied mathematicians/electrical engineers, and the math system in Scholdoc is designed with them in mind.
I really think that the ultimate goal is to arrive at many good ways (of which this may be one) to produce a semantically relevant open interchange format such as JATS. I assume this is what PLOS is trying to achieve as well? I do know that several people at PLOS are vehemently opposed to Markdown and what it stands for.
> Scholarly markdown needs to be a community effort; I don't see any other way this can succeed.
Definitely. The best we can hope for is to stir this pot once in a while, and hopefully something will spontaneously nucleate once the time is right.
I know that there's dvisvgm for SVG output from DVI. I think there's also a tex2svg. Also, there's LatexRender-ng. So, there's a number of tools for SVG output.
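A sketch of how the dvisvgm route might be scripted in Python, assuming `latex` and `dvisvgm` are installed and that the `standalone` document class is available. The helper names, the `@@EXPR@@` placeholder and the job name are all made up for illustration; the functions only build the source and the command lines, they don't run anything.

```python
# Wrap a single expression in a tightly cropped one-equation document.
# Assumes the `standalone` class and amsmath are installed.
TEMPLATE = r"""\documentclass{standalone}
\usepackage{amsmath}
\begin{document}
$\displaystyle @@EXPR@@$
\end{document}
"""


def standalone_source(expression):
    """Return a complete LaTeX document containing just the given math expression."""
    return TEMPLATE.replace("@@EXPR@@", expression)


def render_commands(job_name):
    """Commands to turn <job_name>.tex into <job_name>.svg via DVI."""
    return [
        # Classic latex produces DVI; nonstopmode keeps it from waiting on errors.
        ["latex", "-interaction=nonstopmode", job_name + ".tex"],
        # --no-fonts makes dvisvgm draw glyphs as paths instead of embedding fonts.
        ["dvisvgm", "--no-fonts", "--output=" + job_name + ".svg", job_name + ".dvi"],
    ]


print(standalone_source(r"\vec{a} \cdot \vec{b}"))
for command in render_commands("eq"):
    print(" ".join(command))
```

On a machine with TeX installed, each command list could be handed to `subprocess.run(command, check=True)` in a scratch directory after writing the source to `eq.tex`. Because a real TeX engine does the typesetting, custom `\def` macros like the ones quoted earlier would not need MathJax-specific rewrites.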