Using Microsoft Word with Git (blog.martinfenner.org)
The funny thing is that the document authors like these ways of working. It is the tech people who don't. I've seen "Git for Word" proposed many times a year for a while now. And all of the ideas are interesting, but none of them appeal to my audience because they don't care about git's feature set. Nobody wants to branch and merge. Nobody wants a straight version history. ("Nobody" meaning nobody in my market, not nobody in the world.)
They want a storytelling experience. They want to know the why, not the what. And the workflow tends to be unidirectional, not with collaborative changes coming back together, but with expanding changes as each person adds their ideas and makes changes for a specific instance of using a document. The experience we build for them brings in pieces of version history, pieces of comments, pieces of telling the story of why something was done, so people down the line can have more context to decide whether to accept or reject the changes.
It isn't that "Git for Word" is a bad idea - on the contrary, it would be great if someone pulls it off. My point is that building something that improves on Word isn't actually about the software, it is about the document workflows. If you find groups who work like software devs do, where documents receive small updates from a team, and bring all changes together for a final product, there is probably a market. But when evaluating such ideas, there has to be a reality check of whether the actual use of the documents truly matches the use case for git.
When I worked as a VFX freelancer I was amazed at the number of hours (=money) burned by marketing agencies who didn't manage to give me the definitive variant for a simple list of things they wanted. In one instance they gave me everything they had, including crude and unrecognisable filenames, hints about things that I should ignore via telephone etc. I had to make sense of it and compile a list which I sent them to approve. They ended up approving another list (!) which they themselves sent me two weeks prior and they only managed to correct this once I hinted at this.
Of course this is an example of how things should never be. This usually involves somebody getting sick and some uninformed person taking over etc. But what I learned on film sets is that you should choose the defaults of your communication culture in such a way that it works under the absolute worst conditions (bad weather, hungry, stressed, confused, etc).
And I have seen so many organisations fail at precisely that. If you get ill, someone else should be able to take over without heading to an oracle. This is not a special function limited to a version control workflow, it is something that has to do with clear communication.
Using git can sometimes help avoid the whole problem by making it obvious which file is the latest and which is a variant of it, but the people using it will have to use clear communication as well (e.g. by writing good commit messages, choosing the "right" commit sizes, naming things the right way etc). So if you know how to use git, you just might value clear communication a little bit more than the average person.
As git is a distributed system I think it’s not at all clear what the definitive final variant might be, and that is a strength.
That can be handled externally to git via ad hoc convention, say by using a system like gitlab or github and letting it declare one as “primary”, or by having someone post to a mailing list (“Commit X on a repo you can reach at URI Y is the official release”) both of which are common.
But in your example various people could mail you commits and not have any consensus on which is authoritative.
The defaults are sensible. Throw money at it and pay someone enough to sort things out and get it done, e.g. you as a freelancer get a data dump and ask the right question and the problem is solved. Sure it costs money. But everything costs money.
Git works great among peers. But most organizations are hierarchical. And the boss doesn't have to give a shit about which draft is the latest because the boss is the boss.
Some would say whoever solves that problem will be filthy rich.
As a lawyer I can fully confirm that our industry works as you have described (as regards document workflows), and with my tech background I can also confirm that most features of dev-oriented solutions like git are mostly uninteresting from a lawyer's perspective.
I agree with both comments.
To add, in large-scale corporate/commercial practice (the area in which I practised), Git would be useful in replacing email-based collaboration, but the switching costs seem too high.
Currently, the corporate law contract negotiation workflow is as follows:
1. party A adds their tracked changes to a Word document based on a template contract;
2. party A emails this document to party B;
3. party B reads the changes, may discuss the changes with their client, adds their tracked changes, and then emails the updated document to party A.
This process repeats for every document, punctuated by occasional conference calls between the parties, until the parties agree.
‘Git for law’ would be useful for lawyers in increasing efficiency - and thus reducing costs for clients.
However, the benefits for law firms of adopting a new Git-based workflow are likely to seem relatively small to lawyers. Their current email-based version control system is messy and time-inefficient, but generally functions with minimal error.
On this basis, I would predict that most corporate law firms would be very slow to adopt a Git-based system - the benefits may not justify the costs.
One should also note that lawyers, particularly contract/commercial lawyers, are conservative by profession. In my experience, most lawyers are very slow to adopt new technologies, highly risk-averse, and skilled at spotting risks. The combination of these traits means that any technology will have to offer a very high benefit to replace an existing legal workflow.
I never trust the received file's "track changes", always compare to the latest version I've sent -- and it is extremely common to find a change that wasn't mentioned/discussed, and somehow magically "accepted" or otherwise not tracked in the other side's "track changes". Whenever I point these out, I always got a "oh, yes, forgot about that one", or "I didn't intend to put that in" or "I'm not sure why it didn't appear in the track-changes view" -- but out of tens of these (with multiple lawyers over multiple years), not one was ever in my favor.
Branching might not be as interesting on a single project - but diffing is, very much; and I'm sure it's not more coveted mostly because most lawyers either (a) don't realize how good it makes life for you when you can diff and blame easily, or (b) are abusing the fact that it is so hard to diff/blame on documents, and certainly (c) usually charge by the hour, so some efficiencies are actually going to cost them money if they implement them (a famous Upton Sinclair quote comes to mind).
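A minimal sketch of that "always compare against what I last sent" habit, assuming both versions have already been reduced to lists of paragraph strings (how the text gets out of the .docx is left open here). The function name `changed_paragraphs` is hypothetical; it relies only on Python's standard `difflib` and knows nothing about Word's own track-changes data.

```python
import difflib
from typing import List, Tuple

Change = Tuple[str, List[str], List[str]]


def changed_paragraphs(sent: List[str], received: List[str]) -> List[Change]:
    """Return every paragraph-level difference between two versions.

    Each entry is (kind, paragraphs_in_sent, paragraphs_in_received),
    where kind is 'replace', 'delete' or 'insert'.
    """
    matcher = difflib.SequenceMatcher(a=sent, b=received, autojunk=False)
    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind == "equal":
            continue  # untouched paragraphs are not interesting
        changes.append((kind, sent[i1:i2], received[j1:j2]))
    return changes


if __name__ == "__main__":
    sent = ["Clause 1.", "Payment within 30 days.", "Clause 3."]
    received = ["Clause 1.", "Payment within 90 days.", "Clause 3."]
    for kind, before, after in changed_paragraphs(sent, received):
        print(kind, before, "->", after)
```

Anything this reports that does not also show up in the other side's track-changes view is worth a question.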
m@replace-with-my-username.com
Exactly like you've hinted, the right way to crack this is to bring a full-fledged word processor like Google Docs, but instead of ad-hoc realtime collaborations the software has to enable customizable unidirectional document workflows with controlled collaboration.
Most serious document creators don't want to branch and merge, instead they want to pass on the document through a series of stages. They want statistics on when, what and why of each stage. And at any point of time the document is in one definitive stage not scattered across emails/folders/versions/forks.
https://support.google.com/a/answer/9381067?hl=en
Unlike its normal collaboration mode, the file gets locked down.
It allows multiple people to work in parallel (and in private). When somebody sends a pull-request eventually, they are presenting a story of changes that they want to get into the document and people can discuss them and approve them individually. (Of course, git the tool isn't necessarily suitable for non-technical people, but git-the-workflow seems to be a good foundation.)
Could you elaborate on what such a tool could look like without git style branching?
I don't know, I'm just spitballing. Sounds like it'd be fun for awhile to attempt to seamlessly get this into the workflow and see how it's accepted.
I think it's more about the user interface. The user interface of Git is essentially what programmers already do - code.
It is _their clients_ who don't, not just tech people. I hired lawyers a few times. IMO their redline and email workflow is error-prone craziness that could use improvement. That said, I'm a "tech" person, so I might be biased.
I've spent many years in 'collaborative writing' in R&D, mainly grant proposals and joint reports/deliverables, most in the CS/IT domains. Writing those texts is very different from writing the software.
First thing you should realize is there are no 'tests', and all the 'code' is usually in a single big file. Anyone that has touched the document can have potentially messed up everything, both content, layouts and meta-data, and there is no automatic way to check whether it still makes sense. Many times people will not use the agreed upon editor/version, and sometimes (often) that means a boatload of minor edits to the document all over the place just from opening and saving. Imagine everyone in your software team using different editors all with their preferred coding conventions that are automatically applied to the whole project at load.
From this you can deduce the enormous responsibility of ownership and gate-keeping in the workflow. The absolute worst collaborations I have been part of were those that somehow believed that if they used a collaborative document editing facility, wikis or Google docs for instance, that would negate the need for assigned owners/editors. Those tug-of-war shitstorms got exponential the closer one came to the submission deadline (technically incorrect, I know, but you know what I mean).
Some tips:
- Have well defined ownership for each section or part of your document. The owner receives and makes all changes for that part.
- have a final editor that is responsible for the complete document receiving the changes of the parts from their owners only.
- Do not trust 'track changes', but use Word's built-in document compare if you are the final editor. For complex formatted documents (nearly all instances require you to use an insanely styled template), 'clean room' import (C/P through notepad) the text changes into the correctly formatted doc under your control.
- release the current trunk document often, ideally once per day. This requires staggering, with subeditors closing submission windows and submitting their updates to the main editor before EoB. Everyone editing should work against the latest release.
- Every version published by the final editor should be immutable. Mail it to everyone if needed, but if you use a link to some sort of repository make sure it is a deep link to a version that can not be updated in the repository, or hilarity will ensue.
- use versioning in the filename. filename_YYYYMMDD_HHMM_dXXX_rNN.docx where XXX is the assigned party acronym for the person making the update. 'YYYYMMDD_HHMM' is only touched by the editor, 'dXXX_rNN' is the NNth change release by party XXX against version YYYYMMDD_HHMM.
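The naming convention in the last tip is easy to check mechanically. Below is a rough sketch, assuming the party acronym is purely alphanumeric and the revision is zero-padded to two digits (neither is stated above). `parse_version` and `next_change_name` are hypothetical helper names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# filename_YYYYMMDD_HHMM.docx              -> release by the final editor
# filename_YYYYMMDD_HHMM_dXXX_rNN.docx     -> NNth change set by party XXX
VERSIONED_NAME = re.compile(
    r"^(?P<stem>.+)_(?P<date>\d{8})_(?P<time>\d{4})"
    r"(?:_d(?P<party>[A-Za-z0-9]+)_r(?P<rev>\d+))?\.docx$"
)


class DocVersion(NamedTuple):
    stem: str
    released: datetime          # timestamp set by the final editor
    party: Optional[str]        # None for an editor release
    revision: Optional[int]     # None for an editor release


def parse_version(filename: str) -> DocVersion:
    """Split a versioned filename into its parts, or raise ValueError."""
    match = VERSIONED_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned document name: {filename!r}")
    released = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M")
    revision = int(match["rev"]) if match["rev"] else None
    return DocVersion(match["stem"], released, match["party"], revision)


def next_change_name(base: DocVersion, party: str, previous_rev: int = 0) -> str:
    """Name for a party's next change set against an editor release."""
    stamp = base.released.strftime("%Y%m%d_%H%M")
    return f"{base.stem}_{stamp}_d{party}_r{previous_rev + 1:02d}.docx"


if __name__ == "__main__":
    release = parse_version("proposal_20200828_1730.docx")
    print(next_change_name(release, "ABC"))
```

A check like this could reject attachments that were edited against a stale release before anyone opens them.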
Most certainly Git can function as a repository, but there will be people that will not work with it (nor any other repository) so always assume mail interactions as well.
Finally, there should be a special place in hell for the people that designed SharePoint versioning. Don't even think of going there.
https://github.com/TomasHubelbauer/modern-office-git-diff
I've made this script which automatically extracts the Office file format (which is a ZIP archive of XML documents) and versions the XML documents and their extracted text contents alongside the binary Office file. This is done using a Git hook and it seems to work pretty well. If you're in need of versioning Office documents, this might be a good enough solution for you.
Edit: I should also address why not use the built-in Office versioning feature? The reason I don't use it is because I like to be able to view the diffs in Git. I don't want to have to use Office just to see the changes. My solution offers that. By doubling-up the way the original is versioned in the way of tracking the extracted XML and text contents as well, each commit's diff will have the binary change as well as the textual diff which in my experience is good enough to tell the gist of changes. And you're using standard Git / text manipulation tools you would use with any other diff.
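For readers who want to see the general idea without reading the linked script, here is a minimal sketch in Python. It is not the script's actual implementation; the function names, the `document.txt` output name, and the choice to keep only `.xml` and `.rels` parts are assumptions made for illustration. It only reads the main body in `word/document.xml`, so headers, footnotes and comments are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx) -> List[str]:
    """Return the plain text of each paragraph in word/document.xml."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over several runs; join them.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def explode_docx(docx, out_dir) -> Path:
    """Unpack the XML parts and write a text rendering next to them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")):
                archive.extract(name, out)
    text_path = out / "document.txt"
    text_path.write_text("\n".join(docx_paragraphs(docx)) + "\n", encoding="utf-8")
    return text_path
```

Calling something like `explode_docx("report.docx", "report.docx.d")` from a pre-commit hook and staging the output directory would give each commit a textual diff alongside the binary one.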
I downloaded a docx document from the net, opened it in libre office, removed a single word, saved it as fodt, removed a single word again, saved it as fodt again, and the diff between the two fodt is gigantic.
Apparently there are lots of items like <text:p text:style-name="P20"> whose content didn't change, but whose ID did. It didn't only affect IDs of content after the removed word, but content before as well.
The file has 19361 lines and the diff size is 1110 lines so there is some level of locality, but note that a lot of those lines are just base64 data of image content. The fodt is 1.5 times as large as the original file.
Try it yourself, this is the document: https://www.acquisition.gov/sites/default/files/manual/SOP_P...
I recommend having a commit hook that (somewhat) pretty-prints and line-wraps the XML – perhaps splitting on sentences too, so that adding a word doesn't proliferate all down the page. I haven't tried this, though, so it might not help. If you do, could you release the code?
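A rough, untested-in-anger sketch of that idea in Python: one tag per line, and text nodes split after sentence-ending punctuation. The function name `diff_friendly_xml` is hypothetical. The output is meant only as a diff-friendly copy, not something to load back into an office suite: whitespace is changed, and comments, CDATA and processing instructions are dropped. It also does nothing about the renumbered style IDs mentioned above.

```python
import re
from xml.dom import Node, minidom
from xml.sax.saxutils import escape, quoteattr

# Split after '.', '!' or '?' when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _walk(node, depth, out):
    pad = "  " * depth
    if node.nodeType == Node.TEXT_NODE:
        text = node.data.strip()
        if text:
            # One sentence per line, so a one-word edit touches one line.
            for sentence in SENTENCE_END.split(text):
                out.append(pad + escape(sentence))
    elif node.nodeType == Node.ELEMENT_NODE:
        # Sort attributes so their order never shows up as a change.
        attrs = "".join(
            f" {name}={quoteattr(value)}"
            for name, value in sorted(node.attributes.items())
        )
        if node.hasChildNodes():
            out.append(f"{pad}<{node.tagName}{attrs}>")
            for child in node.childNodes:
                _walk(child, depth + 1, out)
            out.append(f"{pad}</{node.tagName}>")
        else:
            out.append(f"{pad}<{node.tagName}{attrs}/>")


def diff_friendly_xml(xml_text: str) -> str:
    """Return an indented, sentence-per-line rendering of an XML string."""
    document = minidom.parseString(xml_text)
    lines = []
    _walk(document.documentElement, 0, lines)
    return "\n".join(lines) + "\n"
```

Run from a commit hook over the unpacked XML, this would at least keep a one-word edit from rewriting an entire paragraph-length line.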
It used to store everything on one line without breaks if I recall correctly.
With a little bit of work to ensure stability of numbering, FODT and related flat ODF formats could be really usable with version control.
Word is expensive, proprietary and the XML it generates is unfathomable. There are so many better FOSS tools and systems that we could be using. If you're collaborating on a document then markdown or LaTeX has you covered. You get version control through git and multiple people can contribute. If you're writing a book or article, then the graphic designers and typesetters are going to make the design decisions, not the author, so why bother messing around with fonts and colours and the infuriating placement of images and tables?
I authored a kid's book on coding, and the process was a nightmare. I authored in markdown, used pandoc to convert and then further edited in libreoffice, to be able to send stuff through in docx format. Then revisions were sent back in docx and I had to reverse the whole process, so I could maintain my plain-text version of the book. Then the proofs were sent through as PDFs, which I then had to markup for corrections. Many of the mistakes were due to the crappy way Word places images. In the end I just bought a copy of Word, and submitted to the way my publisher wanted me to work, which disrupted the authorial process.
It's time we ditched Word, in the same way we ditched VHS and DVD. It's an outdated technology that remains dominant just because everyone uses it at school, and then refuses to move on. If schools insisted that all homework was submitted in something like markdown, we'd see a dramatic change in a very short period of time. (BTW when I was teaching CS, my kids authored in markdown and submitted on GitHub)
Right, rant over - but I've been talking about this for years - http://coding2learn.org/blog/2014/04/14/please-stop-sending-...
In the last few months, though, I gave up on Markdown to switch to a more robust format - LaTeX. Before I switched, I didn't know LaTeX at all, but I knew from my reading that it had the features I needed.
It certainly makes for less _noisy_ source files in my opinion, and it also means that you get to take advantage of the fact that, if you want to, you can easily convert your markdown to HTML, with maths using something like mathjax.
This was a bit of a ramble, but I honestly can't say enough nice things about pandoc.
It's worth noting here that I'm writing layout in LaTeX also - like controlling the number of columns, where breaks exist, etc.
Seriously, org has served all my authoring needs for over a decade now. You can export to LaTeX and HTML easily, and now pandoc does a decent job of exporting to other formats. You can embed LaTeX lines in your org document, so you get the full power of LaTeX, without having to write LaTeX for everything. Tables are hellish in LaTeX, and even lists are a pain.
Of course, there is the whole "You have to use Emacs" thing...
And honestly... I enjoy writing LaTeX. The structure just feels really comfortable to me.
I'd say RST is suitable for many types of documentation but I'm not convinced that it's suitable for conference/workshop submissions.
Now we need a native diff viewer for structured files, where the changes are presented with attribution either side by side, or alongside (like gitk, or like gitlab diff viewer).
Then we need an editor that supports doing the gitty stuff natively, so that the non-technical writer doesn't have to worry about creating repos and committing the changes from the command line.
https://www.zoho.com/writer/help/document-tools.html#Combine...
Feedback welcome: https://www.simuldocs.com/
ETA: Apparently right below this comment someone has already created this: https://news.ycombinator.com/item?id=24303611
Given this line, I think it's fair to add (2014) to the title.
This is pretty old news by now :)
It wasn't fleshed out or usable, but it was an interesting project. I was impressed at how open the Word/Office format was, this was before Microsoft's reemergence into openness and open source.
Another useful trick is to pipe the ANSI-colored terminal output through `aha` (https://github.com/theZiz/aha or `brew install aha`) which produces HTML output, e.g.
git wdiff | aha > ~/Desktop/mydiff.html
You can then send the file mydiff.html to collaborators by email or add it to a CI build script.
I am finally replacing it with a SharePoint solution. It's a headache to have to maintain versions for non-technical people.
[0]: https://www.simuldocs.com/features/version-control-for-micro...
https://www.vivekkalyan.com/using-git-for-word
I tend to prefer markdown for most things, but find it hard to beat Word in terms of simplicity of elegant designs for, say, resumes.
Apart from attachments and metadata the actual document is some kind of xml monstrosity that contains the text and the markup. It’s not very useful to just create diffs from that, it looks a bit like the HTML created by FrontPage if you remember that.
You can just rename a docx file to .zip, unpack it and peek around.
However, a diff on Word XML is the perfect tool for understanding how Microsoft interprets the spec.
i wrote a novella using a folder system + text editor + git. i'm trying to put that into a web app. don't know how useful it would be for other people though. and don't know if it will ever be finished because i need to write.
If you like writing out of a text editor (I use Atom) it's super useful.
I've tried using the git diff patience algorithm, but it didn't work well - frequently, the diff was about to remove every single line and add them all back to the XML file.
I got some decent results with `xmllint --format` which is the linter/formatter from libxml2 (so available in most Linux distros and ported to most platforms).
(I was using xmllint as a formatting step when unpacking ODT files in my similar tool to the directly above; mentioned in a sibling comment. I found the XML files in ODT files were much more prone to being minimalized and reformatted/reordered on every save in comparison to DOCX which was surprisingly more stable in XML formatting.)
https://github.com/WorldMaker/musdex https://pythonhosted.org/musdex/
Because I built it to be extensible/support plugins I've used it for all sorts of interesting file types beyond DOCX too. (CELTX, a screenwriting format from years back; prettier diffs for Inform 7 source text; experimented with an SQLite deconstructor; ...)
Looks like I take a slightly different approach too, in that I store a bunch more metadata about the deconstructed contents (not just relying on directory listings), so I end up trusting my reconstruction tool a bit more and I mostly don't store the binary blobs in git, as I assume I can reconstruct them quickly enough.
One benefit of your solution over the `textconv`-based approach mentioned in the article is that your solution offers two different levels of diffs (XML and TXT).
To simulate that with textconv, you’d have to switch between two `diff.doc.textconv` variants.
Edit: I meant 'fascinated with using git here in this context'.
That said, "track changes" is still used extensively especially with parties outside the organization, especially for legal documents.
Some of the proposed solutions were very nice, particularly Draftable - but it's expensive and my bosses didn't feel it was worth it. To this day they still work on huge slide decks that are partially shared, but I'm just not involved anymore with that side of things so I stopped pushing. I still think a way of tracking Powerpoint decks on a slide-by-slide basis, with partial merging and synching, would be really good to have (existing features for embedding are '90s-era).
For Word there are quite a few solutions nowadays, most are clearly superior to the stuff Office ships with. So the problem is still there, just not as bad as 15 years ago.
I use O365 collab features daily (with SharePoint/OneDrive storage) and the experience has been similar to that of GSuite. I regularly work on PowerPoints with multiple people simultaneously editing the slides.
These are not WYSIWYG solutions which answers 99% of your question "why". When people want to write a document they want to write things and have the things appear on a page, possibly in different formatting. Injecting ideas like source files, rendering pipeline, etc. will just result in confused people.
That's why online solutions like Google docs are popular. No special app, things look like expected, you can collaborate, and few people actually need any fancy features.
> text
> image
> more text
> table
> more text
There are any number of applications that allow you to write markdown and view the generated HTML in whatever formatting you want. Your recipient then gets to choose their own fonts, colours etc, which from an accessibility point of view, is much better.
Unless you're printing a hardcopy or creating a PDF, what is the point of Word?
I write a lot of stuff in the legal area (articles, books, contracts, court documents, etc) and there's nothing that comes close to Word.
For some time I had tried to switch to LibreOffice. My goal was to quit Word, which is the only software that still binds me to Windows/Mac (not interested in Wine). I hoped to finally be able to switch to Linux without any hiccups.
Unfortunately LibreOffice is not quite as good as Word. I use many of the advanced features of Word, and the more you use these in LibreOffice, the more you encounter bugs. At one point I had a .odt file with tons of cross-references in footnotes pointing to other footnotes. When I was ready to ship the document I found out that all cross-references were messed up and I had to redo them all.
Now it's true that LibreOffice has a huge and active community that works hard to improve the product, but as word processors are my main and most important tool for work, I need the most reliable software I can get. Unfortunately that is still Word...
On top of that I must add that I do need to properly format documents 99% of the time, and on this too I find Word slightly superior, even if admittedly it is quite comparable to FOSS solutions. The only fairly big problem in this regard is interoperability. Since I know that most, if not all, of my colleagues/counterparts use Word, whenever I send a document I need to send something that "will just work" for them, which is a docx. This means that using anything other than Word might cause problems with formatting, which in some cases is pretty important.
Markdown -- standard markdown isn't expressive enough (no tables for example), there are lots of extensions but none which are "standard".
LaTeX -- doesn't produce accessible documents, so is a non-starter in lots of areas (seriously, the PDFs it generates are some of the worst around when it comes to accessibility. Word's are amazing).
If we put aside the Word file format, and maybe the ribbon, is Word bad though?
I've been using it for decades, and have tried OpenOffice and LibreOffice too over the years - nothing comes close to Word.
Markdown is not suitable for "normal" users, but as a developer, I've come to prefer markdown for technical documentation and such (especially where I want a history, diffs etc), but I still use Word for a lot of other things.
Word is incredibly fully-featured - I use a lot of functionality, but am likely still only using a fraction of what it has. It really does have all your document editing needs covered.
Aside from the file format, I think Word is a fantastic piece of software. I have a few annoyances with it now and then, but it's been very dependable and kept me in good stead over the years.
Microsoft Office is quite useful, and probably good value for money. But it has never ever been free, and I’ve been buying it for work and home since Office 4.x on Windows 3.1 was the new hotness.
Consider someone dealing with inter-departmental collaboration on documents at a company in the 70s or 80s. They could potentially invent their own system, make paper copies mandatory, go full computer, or any number of solutions in between. Technology was considered hard and looked recognizably so, and management was less likely to question technical views and opinions about this. People were way less likely to get fired and generally visualized staying there for a while, so they were comfortable sticking to their viewpoints.
Today, your boss and their boss are all concerned with how to get the maximum amount of work out of you in the time you're at the company. So if you propose retraining everyone on Open Office or Markdown, because it has high potential for a better way of tracking changes or something, you'll get pushback from a) management, because the CEO is going to say “but I use Word all the time, why can't you just use that?” and b) the workers, because they know they will be forced to learn it on their own time rather than being given a proper amount of time to train and learn. [1]
I think modern society and modern work are slowly defaulting to the idea of quickly throwing in the towel and just using whatever technology is approved by the milieu. This is true even in our industry, consider this article [2] by Latacora [3] for instance: it's full of statements which approximately say “Just use CloudTrail”, “Just Use Jamf”, “Just Use Okta SSO” etc. If our industry is doing things like this to optimize extraction (even the article acknowledges that SOC2 is purely documentation optimized for selling to big companies), why would we be so surprised that publishing departments and such are optimized to Just Use Microsoft Word rather than a technically better system?
-------
[1] Think back: when was the last time you had a proper training about how to use a certain piece of software by people from the company building it, or at least certified trainers? These were way more common back in the day.
[2] https://latacora.micro.blog/2020/03/12/the-soc-starting.html
[3] A very respectable security company focused on startups.
Basically, it's a "good enough" WYSIWYG, and a number of industries have standardized on it, in spite of the fact they should actually use an open standard + tool that actually fits their needs. I think screenwriting might be the one industry to escape Word, since they use Final Draft as I understand it.
Because Office/Word has become the hammer of the document writing world.
The issue isn't that it's a bad product, or that better products are out there (and there are).
It is that everyone is expected to know how to use word at a basic level. From Secretaries to VPs and CEOs, almost universally these people can open a word document, edit it, and save it.
Because of this expectation, it is easier to throw money at Microsoft and have the tool you can expect everyone to use.
Non-technical folks do not want to reach for CSS to apply formatting to a text document. Heck, neither do I.
As a boss myself I have to say: I totally give a shit. Salaries are my company’s #1 expense by a wide margin. I don’t want my staff spending their time manually merging docs received via email when there are much better solutions out there. I hire people because they are smart and can get shit done that makes money, not because I want servants.
This is the killer app for Office 365 and google docs: stop wasting time emailing shit around, one canonical version even outside of company walls.
IMHO this is a useful "feature" for lawyers. Don't forget that usually lawyers of two parties are working "together" only apparently, when in fact they are always litigating for their client's best interest.
The goal is not to reach a common agreement, but to reach the agreement that best serves the interest of one's client, most of the time at the expense of the other party.
This is achieved in many ways, one being having text in a contract that the other party is not fully aware of, either because it's not properly understood or not noticed.
This means that including text in a document without the other party noticing is a good old trick that is quite valuable to any lawyer.
As a lawyer my position on this is that it's the other party to blame if it did not check the document properly (I always compare the documents for differences even when sent with revisions).
> (c) usually charge by the hour, so some efficiencies are actually going to cost them money if they implement them
I tend to disagree with this line of thought. Lawyers have a thousand ways to inflate their timesheets. Using a tool that makes their life more miserable by forcing them to do manual work that could be automated is certainly not one of them.
As a non-lawyer, this is why they say "the problem with lawyers is that 95% of them give the rest of them a bad name". And as I mentioned, I also always compare.
> Lawyers have a thousand ways to inflate their timesheets. Using a tool that makes their life more miserable by forcing them to do manual work that could be automated is certainly not one of them.
I agree, and they do inflate them regularly -- all lawyers I asked to draft NDAs and employment agreements for me charged a few hours worth for the first one "because they had to write it" even though it was unchanged from another client (for sure; I've seen that exact one before).
Still, they need to keep an air of "being busy" and "working hard", and the best way to do that is to occasionally work hard.
Oh, and I try to do all my presentations in it too - it can export to Beamer.
As for Emacs, I know what you mean. I tried it on and off for 10-11 years before I finally stuck to it. In my case, what helped was that vi/vim really was much worse, so it's not like I had a seriously good alternative. I tired of repeatedly switching editors per task (had one for Python, another for LaTeX, etc). I finally one day said "I need to learn a really good editor and stick to it." I bought the Emacs book, spent a whole week reading it, and forced myself to Google a solution whenever I couldn't remember how to do something. I was surprised how quickly I became proficient in it - within a month of use.
(All without learning elisp - I was a "power" Emacs user for 8-9 years before I learned elisp properly).
And then I discovered org mode. While I've encountered people who were proficient Emacs users but left for something else, I haven't found anyone who is an org mode user who switched to something else. I know people who use other editors in general, but still use Emacs just to use org mode.
> If we put aside the Word file format, and maybe the ribbon, is Word bad though?
Is the ribbon bad, though, or are we just living through 2003 era UI shock for 17 years now?
I lambasted the ribbon as was fashionable back then, but actually using it in 2020?
It's good. Like, legit amazing UI design. Makes it easy to peck through menus and find what you want. Structures chords for power users, and importantly makes those chords visually discoverable. I've found a pile of good keyboard shortcuts just by starting to slowly press a chord on the ribbon.
It's great for mouse users. It's great for noobs just learning the ropes. It's great for keyboard users who already know their stuff. It's amazing at facilitating the learning process into knowing your stuff.
It's been great for 17 years. Do we still have to pretend it sucks because we were used to a different way of doing things?
Having said that, adding an option to include formatting changes is on our roadmap.
However, when you're a professional copy editor of non-fiction publications, a substantial part of your work will consist of checking little details, such as making sure the titles of books, magazines, and articles are formatted correctly and so on.
The latest Office products have proper real-time collaboration and change tracking, a la Google Docs.
That's not such a big deal since you can obviously change the font. For a long time there were lots of text fonts but very few math fonts, and those math fonts that did exist would either have some symbols from Computer Modern or wouldn't have a suitably similar text font. But now there are a fair number of choices. Personally I like mathpazo (with Palatino for text) but I've found people used to Computer Modern can find this a bit much of a radical departure. (Edit: I've found a more conservative choice is Times for text and Utopia (MathDesign) for math.)
TeX does have a few small typesetting niggles. For example, if you set f(x)g(y) with normal small brackets around the x but large brackets around the y (because it's really a displayed fraction) then you'll find g is miles away from its argument but right next to f's argument. (I'll avoid opening the can of worms about what the root cause is here, but it's very clearly wrong in this case.) This is actually not that big a deal either - there are lots of problems like this but they're all fairly minor and small in number compared to the huge number of things typeset correctly. The only problem comes when people refuse to correct things because they assume that if TeX typesets it that way then that must be correct by definition.
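A minimal sketch of the kind of input that shows this, assuming the displayed fraction is y/2 (the concrete argument is made up for illustration). The `\left...\right` pair is spaced differently from a plain bracket, which is where the gap after g comes from; the `\!` in the second line is one common manual correction, not the only one:

```latex
\documentclass{article}
\begin{document}
% g is pushed away from its bracket, while it sits tight against f's ")"
\[ f(x) g\left(\frac{y}{2}\right) \]
% a negative thin space pulls g back towards its argument
\[ f(x)\, g\!\left(\frac{y}{2}\right) \]
\end{document}
```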
TeX is like a programmable pocket calculator from the 1970s, way ahead of its time, but today it's something that conspires with Word, Google Docs, and other dull tools to suck out the oxygen for sharp tools.
Even I, happily maintaining some pages in reST, wouldn't want to inflict that on people.
WYSIWYG is a big part of what made the GUI revolution so successful. The computer for the rest of us wouldn't be for the rest of us if we had to worry about Git and how to render our file format.
I've had the same frustrations dealing with publishers and Word templates as you had. Your mistake is that you are conflating our experience writing a technical book with that of the vast majority of users, who are not writing technical literature. A writing system for the masses should be as easy to use (for the basics at least) as paper and pencil. Git and learning even a simple markup language do not meet this standard.
I used Word for ~10 years, but not in the last ~20 or so years, after I realized how much time and effort it cost me -- I nearly missed an important deadline because of a Word 2 vs Word 6 incompatibility that manifested at a very inopportune moment.
It's been around for almost 30 years. I'm constantly receiving documents from people who've used it for >25 years. And there is never any use of styles, often spaces instead of tabs, many "new lines" instead of a page break, and a host of other things like that. References are not dynamic (just typed out), meaning that an item inserted in the middle of a list makes many of them wrong.
The vast majority of people who have used it for decades use it mostly as a smart typewriter, because the "pro" features like styles require a lot of discipline and the "let's just press the bold button" is too easy and enticing.
WYSIWYG needs to die whenever anything professional is needed.
>Write markdown
You have now lost almost all people who currently write documents. Nobody who is not a developer wants to write in markdown. The mass market wants point and click, buttons, and WYSIWYG.
Basically, when you try to set the properties of text (e.g. bold, fonts, ...) there are always anomalous behaviors involving:
* if you just start typing what font it is in
* selections (why is it that the selection region seems to actively avoid the exact selection you want?)
There is something fundamentally wrong with the data model behind it that makes it impossible to implement in a way that makes sense 100%.
For your other objection (and maybe what you were also really getting at with your selection objection), maybe you'd like WordPerfect 5.1's "reveal codes"? :-) We can all agree that Microsoft wouldn't have hesitated to steal that feature if it would have benefitted them. The fact that they didn't suggests that formatting markup is something that was historically tried (or considered) and rejected, rather than something waiting to happen in the future.
In any case, for a program as huge as Microsoft Word, I think this is all quite minor. How much of your day is really ruined if you start typing after some bold text, find that the new text is bold when you didn't want it to be, and have to manually turn it off again? It's a fundamental problem with the model, like you said, but has surprisingly tiny impact on usability. If this is your biggest objection, it's almost proof that the program is pretty good. (But I can sympathise with minor objections: I hate that copy and paste works differently in Excel than in any other program!)
One very large problem that typically comes up in large teams: only one team member can edit the "live version" of a document (it's locked for editing by the version control system); the other team members need to work "offline" and then reintegrate their changes/drafts into the main document. Everybody has lived through the horror story of a team member in a different time zone still having the live version checked out and going to sleep :)
Sometimes you have to circulate a draft document in parallel to multiple parties (e.g. colleagues with special subject expertise + client's inhouse lawyers + client's technical experts + other party's law firm + other party's inhouse lawyers + other party's technical experts). It can happen that you need to reintegrate comments from different parties to different versions of the drafts, e.g., if your client gives feedback quickly and you re-circulate updated version internally, then you receive other party's comments to the older draft version...
Besides the mechanical aspects of reintegrating comments, it is also difficult to track if everybody who needs to sign off has actually signed off on the parts of the documents they had to review. Often it gets lost who made which comment/change. It can be quite awkward if a regulator asks you "Isn't the technical statement on page 12 contradicted by fact XYZ" - please explain until tomorrow - and you have to quickly figure out who actually put that in...
m@replace-with-my-username.com
Would be much appreciated!
Also is there anything that actually guarantees that the tracked changes were the only changes made? I haven't seen this but it seems like a serious flaw in the process.
Also what if you get an intern to do some of the work, then you want to review the changes between version:$lastyousaw and version:$current. IIUC the mail with tracked changes only allows you to view one "patch" at a time.
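One way to answer both questions without trusting the tracked changes is to compare the two versions directly. A minimal sketch, assuming each version has already been exported to a list of plain-text paragraphs (the function name `changed_paragraphs` and the sample text are made up for illustration):

```python
import difflib

def changed_paragraphs(last_seen, current):
    """Return (tag, old, new) for every paragraph-level difference.

    Compares the two versions directly, so it reports the cumulative
    change no matter how many intermediate revisions there were and
    whether or not each one was recorded as a tracked change.
    """
    matcher = difflib.SequenceMatcher(a=last_seen, b=current, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, last_seen[i1:i2], current[j1:j2]))
    return changes

# Hypothetical versions: the one you last reviewed and the one you just received.
last_seen = ["Parties: A and B.", "Fee: 100 EUR.", "Term: 1 year."]
current = ["Parties: A and B.", "Fee: 120 EUR.", "Term: 1 year.", "Liability is capped."]

changes = changed_paragraphs(last_seen, current)
for tag, old, new in changes:
    print(tag, old, "->", new)
```

Anything this reports that does not also show up as a tracked change is an edit that slipped in unmarked.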
The mail allows you to see all patches; sometimes a 'clean slate' is created by accepting all changes. While this sounds like a problem in theory, in practice it's not.
I agree that things can be better somehow, but it is really difficult to see any solution which is at least 10% better. The current workflow also has these advantages:
* data is as safe as your filesystem and email system together
* a Word file is generally not considered a vendor lock-in
* everybody understands the workflow
* nobody can block the workflow (like not checking in again with SharePoint)
Git was originally designed around an email-based workflow for software development (hence the commands am, format-patch and send-email).
For contract negotiation, if the template contract was in plain text, then it could be emailed as a patch. The other party would then apply that patch to their local git repository, make their changes and email the diff from the original template back to the first party.
So essentially, you could still use email, but have the diff between changes as the content in those emails (along with inline comments).
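To make that concrete, here is a rough sketch of what the body of such an email could contain. It uses Python's standard `difflib` rather than git itself, and the contract text, the file name `contract.txt` and the helper `contract_patch` are all invented for the example:

```python
import difflib

def contract_patch(original: str, revised: str, name: str = "contract.txt") -> str:
    """Build a unified diff between two plain-text contract versions."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        revised.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)

# Hypothetical template and the counterparty's counter-proposal.
template = (
    "1. Term: 12 months.\n"
    "2. Notice period: 30 days.\n"
    "3. Governing law: England.\n"
)
counter = (
    "1. Term: 12 months.\n"
    "2. Notice period: 60 days.\n"
    "3. Governing law: England.\n"
)

patch = contract_patch(template, counter)
print(patch)  # this text would be the content of the reply email
```

The reply then carries only the changed clause plus a little surrounding context, and comments can be written inline between the diff lines, much as on a patch-based mailing list.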
Unfortunately, corporate Outlook/O365 based email systems don't work very well when used in that fashion.
m@replace-with-my-username.com