Pandoc (pandoc.org)
https://pandoc.org/installing.html
> We provide a binary package for amd64 architecture on the download page. This provides both pandoc and pandoc-citeproc. The executables are statically linked and have no dynamic dependencies or dependencies on external data files.
I wish I'd known about this sooner. I don't spend much time with text documents outside the web, but when I do, pandoc handles the disparate formats admirably. The only inconvenience is when I update my system, there's guaranteed to be a huge pile of Haskell libraries to download.
+ Static websites from any input to html
+ Markdown & TeX & References to pdf for academia
+ Generating manpages for new tools
+ Generating ebooks
... Let's just say I get a bit lost when it isn't available.
Do any of your tools use long options (prefixed with a double dash)? If so, make sure you disable the "smart" extension, otherwise you might end up with en dashes.
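A minimal sketch of that advice, assuming the man page source is pandoc Markdown and pandoc 2.x is on the PATH. The function names and file names here are hypothetical; the only pandoc-specific part is the `markdown-smart` reader spec, which turns the "smart" extension off so that `--long-option` is not rewritten with an en dash.

```python
import subprocess


def man_page_command(source, output):
    """Build a pandoc command that renders Markdown to a man page
    with the "smart" typography extension disabled on the reader."""
    return [
        "pandoc",
        "--standalone",
        "--from", "markdown-smart",  # "-smart" switches the extension off
        "--to", "man",
        "--output", output,
        source,
    ]


def build_man_page(source, output):
    """Run the command; raises CalledProcessError if pandoc fails."""
    subprocess.run(man_page_command(source, output), check=True)


# Hypothetical file names, shown only to illustrate the resulting command line.
print(" ".join(man_page_command("mytool.1.md", "mytool.1")))
```

Calling `build_man_page("mytool.1.md", "mytool.1")` would then write the man page, provided pandoc is installed.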
For novels, I tend to just use Markdown, as kerning will be done in CSS.
For academics, I use LaTeX and Asciidoc together, but some paragraphs might be inserted in various other formats - whatever is easier. The build tool doesn't care what the format is, it'll take any input pandoc accepts.
I guess it's not as well-known as I thought.
a couple questions i have, seems firstly that old school .doc files are not supported, docx yes. unfortunately i still get a lot of docs in .doc format which seems to be microsoft's proprietary format (docx seems to be more open).
my second question is whether or not there's a filter for golang, most of my development is in golang, so i either need to call your cli as a forked process or best to have a native library. i have never worked with haskell so not sure if i can import a haskell library from golang directly. i imagine there'd need to be a golang wrapper around the cli.
lowriter --convert-to odt some-document.doc
odt is not the only supported target, but doc --libreoffice--> odt --pandoc--> plain seems to give better results than e.g. doc --libreoffice--> txt or doc --libreoffice--> docx --pandoc--> plain.
Pages to anything else, please.
https://orgmode.org/worg/org-tutorials/org-spreadsheet-intro...
Sorry, I couldn't resist.
It's a conversion tool for existing formats.
For blog posts and notes I see the appeal, since the boilerplate can be a hindrance to spontaneous writing.
Also, LaTeX can't produce any output which is accessible to blind people (other than giving them the raw LaTeX). The PDFs LaTeX produces are probably the least accessible format available (much worse than a Word-produced PDF, or some HTML). This matters to me, and should matter more to other people (in my opinion).
I don't find the boilerplate minimal at all. Contrast the following:
\begin{itemize}
\item First
\item Second
\item Third
\end{itemize}
with
- First
- Second
- Third
I won't even get into the hell that is tables.
I loved LaTeX until I discovered Org Mode. Pandoc also scratches the same itch.
If one is going to write LaTeX code anyway, it seems easier and cleaner to use LaTeX all the way, move all the boilerplate along with the personal template to say, a file named preamble.tex, and \input{preamble.tex} in the documents.
However, there are situations where Pandoc can be convenient. For example, I wanted a document[1] to be written primarily as README.md (CommonMark format), so that GitHub could render it as the project README. At the same time I wanted to render a PDF output from a customized form of the content. Pandoc is convenient for cases like this although it takes a bit of work to fine-tune the formatting and customize the content for each output format.
It's not for everyone, but emacs+auctex reduces the LaTeX boilerplate (at least the writing of it) so much that I don't really feel it's a hindrance.
Incidentally, I really like the thoughtful syntax additions Pandoc makes over olde Markdown (eg., tables, definition lists, and span & div syntax as well). Such a great all-around doc tool.
Once the work has moved into a Word file, isn't that where it stays? Editors and publishers often make heavy use of features like track changes and notes. Doesn't pandoc lose that information?
Scribble Code Example:
#lang scribble/base
@title{On the Cookie-Eating Habits of Mice}
If you give a mouse a cookie, he's going to ask for a glass of milk.
@section{The Consequences of Milk}
That ``squeak'' was the mouse asking for milk. Let's suppose that you give him some in a big glass.
He's a small mouse. The glass is too big---way too big. So, he'll probably ask you for a straw. You might as well give it to him.
@section{Not the Last Straw}
For now, to handle the milk moustache, it's enough to give him a napkin. But it doesn't end there... oh, no.
Scribble -
Scribble is a collection of tools for creating prose documents—papers, books, library documentation, etc.—in HTML or PDF (via Latex) form. More generally, Scribble helps you write programs that are rich in textual content, whether the content is prose to be typeset or any other form of text to be generated programmatically. - https://docs.racket-lang.org/scribble/
Some languages based on Scribble
Skribilo -
Skribilo is a free document production tool that takes a structured document representation as its input and renders that document in a variety of output formats: HTML and Info for on-line browsing, and Lout and LaTeX for high-quality hard copies.
The input document can use Skribilo's markup language to provide information about the document's structure, which is similar to HTML or LaTeX and does not require expertise. Alternatively, it can use a simpler, “markup-less” format that borrows from Emacs' outline mode and from other conventions used in emails, Usenet and text. https://www.nongnu.org/skribilo/
Pollen -
Pollen is a publishing system built on top of Scribble and Racket. So far, I’ve optimized Pollen for web-based books, because that’s mainly what I use it for. But it can be used for small projects too, and non-webby things like PDF.
As a publishing system, Pollen includes:
A programming language. The Pollen language is a variant of Scribble, with specific dialects tailored to different kinds of source files. You don’t need to use the programming features to do useful work, but they’re available when you need them.
A set of tools & libraries. Pollen can produce output in any format, but it’s especially useful for markup-style formats like XML and HTML.
A development environment. Pollen works with the DrRacket IDE. It also includes a project web server so you can dynamically preview and revise your publication. http://docs.racket-lang.org/pollen/Backstory.html
They are domain-specific languages that excel at outputting awesome HTML and PDF. They aren't really markup; they are a macro system built on top of a full Lisp (Racket). It is easier and much more powerful than anything I have seen with Pandoc and LaTeX (I still use LaTeX for specific targets, but not for general papers anymore).
Racket has the best documentation, period, and it is because the documentation
Just a few links:
- Where everything is documented: http://pandoc.org/MANUAL.html
- If you have questions or suggestions: https://groups.google.com/forum/#!forum/pandoc-discuss
- Contributing to pandoc is also a great way to get your feet wet with Haskell. In my experience, very supportive community. See http://pandoc.org/CONTRIBUTING.html and for good first issues: https://github.com/jgm/pandoc/issues?q=is%3Aopen+is%3Aissue+...
Finally, a great feature, that hasn't been mentioned here, is pandoc filters. Basically, pandoc provides a way for scripts (in any programming language) to hook into the transformation pipeline and modify the document AST (similar to the HTML DOM) in-between the reading and writing steps. See http://pandoc.org/filters.html
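To make the filter idea concrete, here is a small sketch of a JSON filter in plain Python with no helper library. It assumes the JSON AST layout used by pandoc 1.18 and later, where each element is an object with a type tag "t" and contents "c". The `upcase` and `main` names and the upper-casing transformation are just illustrations, not anything pandoc ships.

```python
import json
import sys


def upcase(node):
    """Return a copy of an AST fragment with every Str element upper-cased."""
    if isinstance(node, list):
        return [upcase(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upcase(value) for key, value in node.items()}
    # Plain strings (code text, URLs) and numbers are left untouched.
    return node


def main():
    """Read the document AST from stdin, transform it, write it to stdout."""
    doc = json.load(sys.stdin)
    doc["blocks"] = upcase(doc["blocks"])  # leave the metadata as it is
    json.dump(doc, sys.stdout)


# pandoc passes the output format as the first argument when it runs a filter,
# so only act as a filter when an argument is present.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as an executable script (say `upcase.py`, a hypothetical name), it could be hooked in with something like `pandoc input.md --filter ./upcase.py -o output.html`.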
# pandoc test.doc -o test.pdf
pandoc: Unknown reader: doc
Pandoc can convert from DOCX, but not from DOC.
Is this underlining, and not redlining as defined in financial services? (redlining: differential pricing based on demographic makeup of a zip code or neighborhood)
Here's the build command for responsive.style[1]:
pandoc $file -f markdown -t html5 -H templates/header-prod.html -B templates/nav.html -A templates/footer-prod.html -o (echo "../$file" | sed '$s/\.md$/.html/') -s --data-dir=./ --highlight-style breezedark --variable=file:(echo "$file" | sed '$s/\.md$/.html/')
Works beautifully!
[1] https://github.com/tomhodgins/responsive.style/blob/master/s...
I wrote up a tool as well, with navigation and prev/next links: http://www.unexpected-vortices.com/sw/rippledoc/index.html
I built a pipeline to convert a Markdown file to publishing-ready files for ebooks, Kindle and paperback for my novel; the whole thing is described here: http://www.gabrielgambetta.com/tgl_open_source.html
My website itself is static, generated from a bunch of Markdown files, some HTML templates, and a bit of postprocessing. But most of the work is done by Pandoc.
Anyway, pandoc is great.
Luckily, LibreOffice can produce tagged PDFs. And unoconv is a convenient utility for doing this from the command line. So you can use pandoc to convert to a format that LibreOffice can consume, then issue a command like this:
unoconv -f pdf -e UseTaggedPDF=true mydoc.odt
I've tried it, and it works.
ConTeXt is supported as well: `pandoc input.md -t context -o output.pdf`
watch: $(ALL)
	while true; do \
		clear; \
		make $(WATCH); \
		inotifywait -qr -e close_write .; \
	done
"make watch WATCH=build" will now compile documents on every save. Works well for single documents, collections of documents or entire websites.
[1] https://gist.github.com/timpulver/0d01285952b97deb70df6104cc...
There are a small number of corner cases that need to be spec'd out before CommonMark can declare a v1.0 release[2]. If you have the skills for this kind of thing, please weigh in!
[2] https://talk.commonmark.org/t/issues-we-must-resolve-before-...
https://github.com/ashton314/marked-man
It's just a one-liner: `pandoc -s -t man "$1" | groff -T utf8 -man | $PAGER`
(That was basically stolen from an answer to one of my questions on Stack Overflow—thanks to those who answered! :)
There are a few things (in latest version, 2.2.3.2) that don't really survive round-trip from markdown back to markdown:
- reference-style links (e.g. `[foo][f]`). They are converted to inline links e.g. `[foo](http://...)`.
- setext vs hashmark headers. `foo\n=====` will get converted to `# foo`.
- markdown allows for forced-linebreak <br>s to be added with two trailing blank spaces at the end of a line. Pandoc escapes these with a trailing `\` at the end of the line.
These are only occasional nuisances, but overall the documents (at least in my experience) are not butchered.
I also occasionally go from markdown to docx for the purposes of uploading to google-docs and copy/pasting large sections into other docs. This is the only markdown-to-google-docs workflow I've found that works to preserve formatting. It's never really butchered anything, except a few times the syntax-highlighting for code-blocks gets confused and keywords get the wrong colors.
You can choose whether reference links go at the end of the paragraph or the document.
Pandoc is seriously a great tool! I love the way it's designed and have found it useful off and on over the years. Truly marvelous for making information available in any needed format.
Example:
pandoc in.md -o out.html -V pagetitle="My Title" --to=html5 --template="my.html" --css "my.css"
The example converts a markdown file to HTML, using a given title, a template file, and a stylesheet file.
The pipeline is also well implemented with Haskell, which is good for writing your own fast functional transformations.
I tried creating a workflow from Asciidoc through Pandoc to MS Word but that didn't work so well. Tables being the biggest issue.
It was a little work to set up the workflow with scripts etc, but being able to write the book in markdown and still having full control over the design was definitely worth it.
[0] sample here: https://patricklouys.com/professional-php-sample.pdf
An example of how easy this is and the styles I use for my personal blog: https://curious.observer https://github.com/davnn/curiousobserver
https://users.soe.ucsc.edu/~ivo/_posts/2015-03-12-repeatable...
http://gbraad.nl/blog/document-generation-using-markdown-and...
The problem I had was that latex was turned into images, but changing the font-size of the reader did not change the size of the images, making the text readable, but the maths barely readable.
This is something I would love to see happen though.
You can add some CSS to the generated EPUB to change that. But if your EPUB reader supports MathML, you can do that with pandoc. See http://pandoc.org/epub.html#math
What editor do HN folks use? I wonder if there's a leaner editor out there with an equally nice distraction-free editing interface. Thanks in advance!
[0] https://github.com/euclio/vim-markdown-composer
Not free, but a real pleasure to use.
The other day I thought about contributing to Yarn, the JavaScript package manager, but the only way I found to communicate with the developers was through GitHub issues. Since I didn't know if the feature I wanted would be well received, I just quit.
Overall great experience. Thanks for the great tool :).
It's the software you would have to buy, but if you have to compare files a lot I would highly recommend it. (Runs on multiple platforms, too.)
Aside from being multi-platform, BC has things like CSV compare, and its marking of unimportant changes is fairly robust. I do personally prefer open source myself, but BC is certainly one of the few that I didn't mind paying for.
Not sure why you think it has to be that way. I author LaTeX documents using org mode. Org mode handles most of the boilerplate, and I can still put pretty much any custom LaTeX within the org document, wherever I want it (this includes \newcommand, etc). I lose nothing by going to org mode, and I gain much in terms of reduced boilerplate.
My absolute favourite moment with that flow was a client who wanted one as a docx instead of a PDF. Pandoc obliged and they commented that I must have spent a lot of time reformatting things for them :)
I think your parent comment was saying that LaTeX alone is sufficient. You also seem to be saying that LaTeX alone is sufficient while using Org mode. Would you please clarify if I am interpreting your comment correctly or not?
I'm not saying either. The parent said it's easier and cleaner to use LaTeX all the way. I was pointing out that it is easier to write in a format like Org mode and export to LaTeX (whether via Pandoc or Org mode's built-in exporter).
Of course LaTeX is "sufficient". It is also, IMO, painful.
i'm not the parent, but that is the main reason i try to abstain from posting on public forums unless it's under a pseudonym, which my github account isn't.
I'm not trying to say that my anonymity is guaranteed with irc, it's just unlikely that future employers and similar link it to me.
https://addons.mozilla.org/en-US/firefox/addon/multi-account...
Requiring everyone to sign in with full, verified name limits what people can talk about.
In fact I'd almost recommend against commenting online on anything using your full name.
I'm curious now though... maybe I'm missing out!
I had reverse-engineered Dan Brown."
Could you talk a little more in depth about what Dan Brown's pattern/structure is?
All I can offer right now is my raw notes, which are in Spanish. This is a structural analysis of Angels and Demons, The da Vinci Code, and The Lost Symbol: https://imgur.com/bX6ByJA
This is a one-page treatment of the three books, with the "blanks" filled appropriately for each: https://imgur.com/LlDVUKn
I doubt any of this is groundbreaking. Story structure is a widely studied topic (and one that I find fascinating). But it seems like Dan Brown uses a very well defined, customised version of this, that makes for engaging, fast-paced books.
I sort of proved (for myself, at least) that this works, by writing a novel whose structure was originally based on this pattern (although it later diverged a bit), and which causes the expected effect - a couple of readers have read it in a single sitting :)
EDIT One more question. How do you write dialogues in MD?
"It's just some text between quotation marks", Gabriel typed. *Have I completely misunderstood his question?*
Markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Vimwiki markup, OPML, Emacs Org-Mode, Emacs Muse, txt2tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to
HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides
Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML, Microsoft PowerPoint.
Ebooks: EPUB version 2 or 3, FictionBook2
Documentation formats: DocBook version 4 or 5, TEI Simple, GNU TexInfo, Groff man, Groff ms, Haddock markup
Archival formats: JATS
Page layout formats: InDesign ICML
Outline formats: OPML
TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
PDF: via pdflatex, xelatex, lualatex, pdfroff, wkhtml2pdf, prince, or weasyprint.
Lightweight markup formats: Markdown (including CommonMark and GitHub-flavored Markdown), reStructuredText, AsciiDoc, Emacs Org-Mode, Emacs Muse, Textile, txt2tags, MediaWiki markup, DokuWiki markup, TikiWiki markup, TWiki markup, Vimwiki markup, and ZimWiki markup.
Custom formats: custom writers can be written in Lua.
https://pandoc.org/
That's not surprising -- it's basically impossible to "parse" LaTeX, as it's defined by execution.
This works for everything except table notes a la ```threeparttable```
$ htlatex mydoc.tex
instead of
$ pdflatex mydoc.tex
This sounds like it should definitely be a target of a grant. I guess most government organisations around the world are using Word et al, which isn't too bad these days accessibility wise (AFAIK).
Can you provide a small example of a LaTeX document that produces an inaccessible PDF?
https://support.office.com/en-us/article/compare-and-merge-t...
Had this forever, here's a visual on an old version of Word:
http://www.techyard.net/how-to-compare-two-word-documents-an...
For more advanced, try legal blacklining:
https://support.office.com/en-us/article/compare-document-di...
Or combine multiple sources of a similar document:
https://support.office.com/en-us/article/combine-documents-f...
That you can't concat two word files together in 2018 is ridiculous.
Of course you can!
# cat word1.doc word2.doc > word3.doc
Sounds ludicrous? So does your statement.
Setting aside the hyperbole, I believe it would be a conceptual nightmare to start defining concat of two docs. For simple cases, feel free to make copy pasta :-)
Classic Word workflow: I make a document and send it to my boss; he makes some changes but suggests further research etc. and sends me his altered version, but I stupidly work on my own version. Making this right takes like four clicks.
This sounds like utopia. :-) Do you have a link to a page that outlines these steps?
It's good for a small fix, but not something to rely upon in your main documentation workflow.
They might not be in word, but they absolutely can be, and in fact are a superset of CLI interactions (since a GUI interaction step in e.g. Automator can invoke any shell command).
>so you're limited to the features the GUI designer thought of
And in the traditional shell pipeline (that is, not Powershell) I'm limited to working on dumb streams from one command to another.
(I imagine it registers it as one huge change, rather than tracking a zillion individual changes, but it might be interesting to try.)
Just append the pages of document 2 to the end of document 1. Then the user can decide whether to remove the page break introduced by it. I have this in my env for doing this with PDF:
$ cat =concatpdf
#!/bin/bash
if [ "$1" = "to" ]; then
    shift
    # First remaining argument is the output file; the rest are inputs.
    out=$1
    shift
    gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
else
    echo "Usage: concatpdf to out.pdf in.pdf in2.pdf in3.pdf"
fi
The peculiar syntax with "to" ensures that I do not invoke it incorrectly.
$ pandoc file1.md file2.md -o file.docx
Tada, two files into one Word file.
It isn't difficult. TeX and LaTeX have been doing it since the 70s and 80s respectively.
http://checkers.eiii.eu/en/pdfcheck/?url=https%3A%2F%2Fwww.m...
The ACM template fails more! http://checkers.eiii.eu/en/pdfcheck/?url=https://www.acm.org..., and it's generated by pdfTex-1.40.15
https://www.cs.york.ac.uk/aig/projects/implied/docs/cp03.pdf
Try extracting "Theorem 2" on page 5, or any text really. I just get random noise through either a PDF reader, or something like pdf2ascii / ps2ascii.
We just made this with standard latex.
I work with course materials made in LaTeX, and students sometimes need/want to copy and paste from them, so I try to avoid these kinds of problems.
This is just an example. From experience, most PDFs at conferences and journals, generated from pdf, are not accessible to varying degrees.