Write HTML Right(lofi.limo) |
Write HTML Right(lofi.limo) |
What is the "right" way? Perhaps it is to use style from both of these extreme examples and write code that is easy to read and edit for the person that is working with it.
Or perhaps the right way is to never imply the way you are doing things is the only correct way and then try to pass it on as facts?
For HTML, these are good recommendations.
Sometimes, like for technical writing where there are
various
distinct and important formatting choices, it's just hard work to get it the way you want it even with a WYSIWYG editor.1) not everybody knows and
2) you're relying on memorization for people to read your code, which means you're smashing the ladder rungs behind you
Software on a team is a performance art. People are either watching you and copying your behavior, or watching you and getting confused.
And if you've ever felt overbooked on a project while other people are idle? It's stuff like that that put you into that situation. And since you're the one who did the 'stuff like that', it's at least partly your fault you're in this situation. Stop being a ball hog, and you'll get fewer bruises.
Browsers are not really fast on my Android, and I wish they were fast.
In 2022 we also all use text editors or IDEs that can collapse entire blocks of tags, to improve readability.
I'm not sure I can see a clear benefit here outside of very few edge cases, and I am sure it comes with its lot of disadvantages.
Do you consider any table that doesn’t explicitly declare <tbody> “unclean”? That’s an implicit element in every <table>, according to the spec.
(sorry)
Agreed. That's why I prefer to have things written down. In this case, WHATWG and W3C already did the work for us.
> And if you've ever felt overbooked on a project while other people are idle?
I've seen what you're talking about, but I'm not the one getting overbooked. I'm not generally the one fighting over this stuff. If I get feedback on a PR telling me to add li close tags, I'll probably just do it.
If you're using a technology on a daily basis, it will pay big dividends to spend a little time learning how it actually works.
Unscientific stats from a recent project where I noticed it:
+ Document is about 50,000 words in size. About 150 words to a paragraph element, on average.
+ Converting the entire thing to self-closing p elements added an overhead of about 120ms in Firefox on Linux, before initial render.
+ Converting the entire thing to self-closing p elements added an overhead of about 480ms in Chrome on Linux, before initial render.
+ Converting the entire thing to self-closing p elements added an overhead of about 400ms in Firefox on Android, before initial render.
+ Converting the entire thing to self-closing p elements added an overhead of about 560ms in Chrome on Android, before initial render.
+ The time differences appeared to be linearly increasing, as the document grew from 20,000 to 50,000 words.
+ Curiously, Quirks Mode also increased the load times by about 250ms on Firefox and 150ms on Chrome. (Tried it just because I was surprised at the massive overhead of removing/adding the tag endings.)
The most common place this was going to be opened was Chrome on Android, and a whopping half-second slower to first render is going to be noticeable to the end user. For some prettier mark up.
Whilst you can debate whether that increased latency actually affects the user, a decreased latency will always make people smile more. So including the end tags is a no-brainer. Feel free to write it without them - but you _might_ consider whether your target is appropriate for you to generate them before you serve up the content.
Are you sure you converted it properly? I'd expect those kinds of numbers if your elements were very deeply nested by mistake (e.g. omitting tags where it's not valid to do so), but I don't see why leaving out </p> should be so slow.
Try these two pages:
+ Unclosed: 4.00s, 3.91s, 3.59s, 4.45s, 3.93s
+ Closed: 3.90s, 2.74s, 3.9s, 2.05s, 3.39s
Though I'd note that the newline you have immediately following the paragraph, even when closing, would probably reduce the backtracking effect. And having no explicit body or head element would probably cause some different rendering patterns as well.
<p>Content<p>Content[EOF fig1]
to be (slightly) slower, than <p>Content</p><p>Content</p>[EOF fig2]
(most likely because of some "backtracking" when hitting `<p[>]`), or <p>Content</p>
<p>Content</p>[EOF fig3]
(with that that small insignificant `\n` text node between paragraph nodes), what should be possibly faster than "the worst scenarios": <p>Content
<p>Content[EOF fig4a]
or even <p>
Content
<p>
Content
[EOF fig4b]
with paragraph text nodes `["Content\n","Content]"` / `["\nContent\n","\nContent\n]"`, where the "\n" must be also preserved in the DOM but due white-space collapsing rules not present in the render tree (if not overridden by some non-default CSS) but still with backtracking, that <p>Content
</p>
<p>Content
</p>[EOF fig5]
should eliminate (again, similarly to fig2 vs fig1).(Sorry for wildly biased guesswork, worthless without measurements.)
All paragraphs had a blank line between them, both with and without the p end tag. The p opening tag was always at the top-left, with no gap between it and the content.
So, for example:
<p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.</p>
<p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.</p>
Versus: <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.
<p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.
(You can also discount CSS from having a major effect. Less than a hundred lines of styles, where most rules are no more complicated than: `p { font-family: sans-serif; }`. No whitespace rules.)However, if you wanted to look at this in a more scientific way - it should be entirely possible to generate test cases fairly easily, given the simplicity of the text data I saw my results with.
But then if you run it through Prettier it'll add all the closing tags for you :)
Are what more strict? You're missing a subject there.
At a guess, you're referencing the differences between Chrome/Firefox rendering times? And are surprised that Chrome is always slower?
In the same completely unscientific stat taking, I found that Chrome was significantly faster at parsing the HTML head element of a document than Firefox, and that difference was enough for Chrome to pull ahead of Firefox in overall rendering times for smaller pages. (Chrome was about 30% of Firefox's time spent in the head.)
However, Firefox was faster at parsing the body, and as I had a larger-than-usual body (50k words is not your average webpage), Firefox was overall faster.
Every time the author introduced a shorthand, they had to clarify that it works only in specific situations. The result of those qualifiers is that you will have to have some code written in the more verbose style anyway. Context switching between those styles and having to decide whether the shorthand works in any given case just isn't worth it on a large project that you'll be making changes to over time.
<html lang=en-GB>
It's not verbose after all, and IIUC may be omitted if and only if the document is served with corresponding information in `Content-Language:` HTTP header, but nasty (or rather annoying) things may happen if that fails [1], so when it comes to "right HTML", following this advice sounds reasonable.[1] https://adrianroselli.com/2015/01/on-use-of-lang-attribute.h...
I think this is similar to semicolons in Javascript: with semicolons at the end of each statement there is no ambiguity, but if you do not have semicolons, you have to know about edge cases, like if a line starts with a square bracket or paren.
So instead of this:
First sentence. Second sentence. % Comment on first sentence.
I can write: First sentence. % Comment on first sentence.
Second sentence.
(Of course, one could define a new TeX macro that doesn't display anything to add comments anywhere in-line. That's not as readable, though.)I've also read that one-sentence-per-line works better with diff programs, but I haven't had any problems with the program meld, so this isn't convincing to me. The advantage the linked article mentions in terms of rearranging sentences also is worth considering, though I haven't found the normal way to be that bad so I'm not convinced by that either.
Some other links on this coding/writing style:
https://rhodesmill.org/brandon/2012/one-sentence-per-line/
https://news.ycombinator.com/item?id=4642395
http://www.uvm.edu/pdodds/writings/2015-05-13better-writing-...
I’m equally interested in this as the HTML. Any clue what the author is referring to?
Anyone interested in this subject, check out a series of three very tiny books called “UPGRADE YOUR HTML” by Jens Oliver Meiert.
They give great step-by-step examples for eliminating optional tags and attributes, reducing HTML to its cleanest simplest valid form. The author is a super-expert in this specific subject, working with Google and W3C on this. His bio here: https://meiert.com/en/biography/
From LeanPub: https://leanpub.com/b/upgrade-your-html-123
From Amazon: https://www.amazon.com/gp/product/B08NP4GXY2/
Reminder of what? To me this reads like satire, even if it wasn't intended as such.
(Also: a huge thank you for creating SingleFile. One of my favourite extensions of all time.)
Pretty neat extension!
- it will be possible to be consistent with closing tags or not
- you can do other arbitrary things to improve your working experience with it
Ever tried Slang styled templates?
If you’ve got stuff that would look good as a table, use a table.
I wouldn't mind if we had a bunch more basic content documents on the web.
Just use templating engine like Pug and get away with most of the annoyances.
It's concise about what part of the text is covered by a certain tag due to forced indentation, not to mention you'll never need to close any tag and you never write "class=" but are all turned into CSS selector notation among many other tricks.
https://github.com/pugjs/pug#syntax
Unless the HTML I'm composing will be touched by people like designers who would get scared of new syntax, in which case I'll use Twig or Nunjucks, I'll never write plain HTML for myself.
There's also a very solid implementation in PHP as well.
https://github.com/pug-php/pug
You can either let server side (node.js or PHP) compile that on demand or let your editors compile them as you edit if you're working on a static file.
I really think the language humans write should deviate from the language the runtimes understand to get all the convenience while never breaking how runtimes/crawlers interpret your output. Same goes for Stylus against CSS.
Note however that this means that the whitespace between paragraphs will be part of the paragraph which can be annoying if someone tries to copy the text on your website and gets an additional space after each paragraph which wouldn't have happened if you explicitly closed the </p> directly after the text.
Also, you should keep the opening <html> and specify the language of your document even for english since e.g. automatic hyphenation does not work if you don't specify a language.
Otherwise really like this condensed HTML style and have recently converted my personal website to it.
It causes way more trouble than those benefits are worth
Omitting the closing tags where possible is less about saving keystrokes than minnimising interruptions to my writing flow.
But I wouldn't advocate it for published documents, just my local scribblings.
How ironic.
... anyway, it bothers me sometimes that I'm not aware of any spec for "reader mode compatibility", did anyone see anything like that?
I do not know what criteria are needed for the reader mode in Chrome. (The HTML code looks OK to me?)
After reading the connected blog post http://perfectionkills.com/experimenting-with-html-minifier/
They are mistakenly and traditionally associated with literature-type paragraphs but that is not correct. You generally use them in forms to split different groups or inputs, that has nothing with paragraphs of a written form and even less with textual paragraphs.
I think there is really a lot of confusion about them in this whole thread.
Take a look at the source code for http://info.cern.ch/hypertext/WWW/MarkUp/Future.html for instance, which was written by the creator of HTML, Tim Berners-Lee.
You can also look at the source code for any page of the current HTML spec (e.g. https://html.spec.whatwg.org/multipage/introduction.html) where, again, <p> is used for each paragraph in the text.
Paragraphs relate to grouping content[1], not textual one. There's no logic in paragraphs.
I quote here the official spec, which makes various examples of how paragraphs are not related to logical paragraphs:
> The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.
And I'll quote also the definition on MDN:
> The <p> HTML element represents a paragraph. Paragraphs are usually represented in visual media as blocks of text separated from adjacent blocks by blank lines and/or first-line indentation, but HTML paragraphs can be any structural grouping of related content, such as images or form fields.
Failing to realize that paragraphs are grouping rather than logical content leads to frequent misuses of paragraphs and this comment section is literally filled by bad paragraphs examples which suggests the community is largely ignorant on html.
[1]https://html.spec.whatwg.org/multipage/grouping-content.html...
What do they mean here?
Does anyone have a source/reference for this?
<p>
Paragraph with a list won't work as you could think
<ul> <li> Test </li> </ul>
Something else
</p>
Parses to: <p>
Paragraph with a list won't work as you could think
</p>
<ul> <li> Test </li> </ul>
Something else
<p></p>
Similarly, in JS you are paying the price for optional semicolons even if you decide to use them. return
{
x: 1
};
Will still not work even if you use semicolons elsewhere. So I don't see any advantage to actually using semicolons. JS is not worse than Python with it's basic inference, and yet in Python people will almost yell at you if you attempt to use a semicolon :)I'd much prefer these features to be opt-in (yea, give me XHTML back for generated content). But when I can't can't disable them, why not embrace them ;)
JS semicolon insertion is worse, because it depends on the following line. In Python, an unescaped newline outside of brackets always ends the statement, but in JavaScript, parentheses, brackets, binary operators, and template literals on the following line change that. The Python rule also makes a dangling operator outside of brackets a syntax error, which is a potential source of unintentional introduction of ASI when making changes to code in JavaScript.
You can still use XHTML; just send "Content-Type: application/xhtml+xml". You can express the same things as an HTML document, but with a saner parser mode.
Periodically I have to send code to people who then make some of their own changes inline. God forbid trying to explain "yeah, they don't need to be closed, but that does because it's nested and..." Disaster (/hours of extra support) waiting to happen.
<p>Hello <p>World</p>!</p>
when it’s actually two consecutive paragraphs, an exclamation mark outside of any paragraph, and a closing p tag without an opening counterpart.And when you do know HTML, you might as well omit optional tags.
If you think that HTML syntax is crazy, I won’t blame you, and you might consider XHTML instead, but you should be prepared for different woes.
I remain unconvinced it was a wise idea.
not closing tags for instance is really asking for future headaches. sure, it works for a simple text list, but not when it gets even a little complicated (add links, images, buttons, etc.). even worse are p tags, where you have to memorize a whole matrix of what it can contain and what breaks out implicitly. with every insertion/deletion, you need to check the list. it's needless mental drag.
<p><div></div></p> is invalid HTML because <div> ends the paragraph, resulting in an unpaired </p>.
Isn't it a "view" of information? Any sufficiently advanced text editor can recreate it with a simple key combination.
The DOM is a tree, with nested elements. Losing that information doesn't get you anything but tag soup (which is, oddly, what the author suggests this style is supposed to avoid)
https://www.fieggen.com/shoelace/
Specifically:
https://www.fieggen.com/shoelace/grannyknot.htm
An HN favorite:
If your first cross is left over right you need to make your second cross right over left, or vice versa. I found an image showing the difference for the un-slipped version, but it's the same with a bow: http://www.tikalon.com/blog/2020/square_granny_knots.png
Granny knots untie themselves and the bow will end up perpendicular to the knot instead of parallel.
> The right way to tie your shoes is with a square knot. It's easy to confuse this with the granny knot, which is the wrong way. The square knot is a simple and sound knot with many uses. The granny knot is an unsound knot whose only known uses are to make your shoelaces look crooked and to trip you.
You look goofy trying to relearn to tie your shoes, but it really is fast and sturdy.
And, spoiler alert: after repeated runs I'm not seeing any substantial difference between these test pages. And based on how the HTML parser works, I wouldn't expect it.
(I work on web performance on the Chrome team)
For me, on Firefox on Linux (I know it's the one with the smallest difference, but I don't have the others on hand, sorry), using the "load" time at the bottom of the Network tab, with cache disabled and refreshing with Ctrl+F5, interleaving the tests:
- Unclosed: 1.38s, 1.49s, 1.45s, 1.52s, 1.48s
- Closed: 1.47s, 1.37s, 1.48s, 1.49s, 1.35s
The one with </p> omitted takes about 0.032s longer on average going by these numbers, but that's about 2 frames of extra latency for a page almost twice the length of The Lord of the Rings.
Regarding the page itself, I tried to keep everything else as identical between the two versions as possible, including the DOM, hence why I wrote the </p> immediately before each <p>. As for backtracking, I'm not sure what you mean. The rule for the parser is simply "If the start tag is one from this list, and there's an open <p> element on the stack, close the <p> element before handling the start tag."
(BTW your snippet's content sounds interesting and feels relatable, definitely intrigued.)
Tested only on Windows, in browser consoles.
Numbers:
Firefox (Nightly) (performance.now is clamped to miliseconds)
total; median; average; snippet
2279.0; 4.0; 4.558; '<p>_'
2652.0; 4.0; 5.304; '<p>_</p>'
2471.0; 4.0; 4.942; '<p>_abcd'
2387.0; 4.0; 4.774; '<p>_\n'
3615.0; 5.0; 7.230; '<p>_</p>\n'
2380.0; 4.0; 4.760; '<p>_abcd\n'
3093.0; 5.0; 6.186; '<p>_\n</p>\n'
3107.0; 5.0; 6.214; '<p>_</p>\n\n'
2317.0; 4.0; 4.634; '<p>_abcd\n\n'
2344.0; 4.0; 4.688; '<p>_\n\n'
Google Chrome (performance.now is sub-milisecond) total; median; average; snippet
2870.4; 5.2; 5.741; '<p>_'
2895.2; 5.4; 5.790; '<p>_</p>'
2684.7; 5.2; 5.369; '<p>_abcd'
2845.4; 5.2; 5.690; '<p>_\n'
3836.7; 7.3; 7.673; '<p>_</p>\n'
2837.8; 5.2; 5.676; '<p>_abcd\n'
4022.5; 7.4; 8.045; '<p>_\n</p>\n'
4044.3; 7.3; 8.089; '<p>_</p>\n\n'
2928.4; 5.2; 5.857; '<p>_abcd\n\n'
2805.3; 5.2; 5.611; '<p>_\n\n'
Test config Snippets per document: 5000
Rounds: 500
Wrap: '<!doctype html>(items-paragraphs)'
Content each item (_): bunch of random digits chunks, something like '1943965927 52 27 5 51664138859173 5161 7226 5 15 2 55679 6553712585'
Code: https://gist.github.com/myfonj/57a6a8fcb1c5686527412543a897c...(Before realizing I can use synthetic domparser I made something what measures document load time in iframe (http://myfonj.github.io/tst/html-parsing-times.html) but it gives quite unconvincing results, although probably closer to the real world. Understandably, synthetic domparser can crunch much more code than visible iframe.)
<p><div></div></p>
Yes, obviously this is bad and nonsensical HTML. Under no circumstances does it make sense to have a div inside a p. In fact, the above doesn't even work, being parsed as <p></p><div></div></p>
But the intention of this example is not to show good HTML. The point is that many people have only a very basic understanding of HTML syntax, under the impression that <foo><bar></bar></foo>
works for any elements, because there's a <foo> and a </foo> so clearly anything inside it must be inside the foo element, right? But this is not the case for all elements. HTML's syntax is more complicated than that. My example was only intended to correct this misconception, not to demonstrate semantically-correct HTML, and that goes for other similar examples made by other people in the comments too.A better example here is whether you are lenient and accept unescaped html entities like "<" vs "<". If you require it to be escaped "<" or if all entities in your inputs are always escaped, then your text parser never has to backtrack. But if you are lenient, your text parser can do catastrophic levels of backtracking if there is a single "<" somewhere (unless you are careful). Imagine input that starts off "<a small mouse once said". If could be quite a while before your parser knows it's not an anchor open tag.
Even in the rarest cases I maybe had them like when copy pasting in the wrong place they were so rare that I don't think it's worth the additional noise of semicolons.
This is not quite true. There are a number of mutual incompatibilities between the XML and HTML syntaxes at both parse and run time.
At parse time, it’s mostly in the direction of XML syntax making things possible (e.g. nesting paragraphs or links, which the HTML parser prevents), but also in the other direction (e.g. <noscript> has no effect in XML syntax since it’s essentially an HTML parser instruction); you’ve also got case sensitivity which matters for SVG; and there’s the matter of the contents of <script> and <style> elements and their handling of <>&, where the best but still imperfect solution is a crazy mix of XML comments, JavaScript/CSS comments and XML CDATA markers. (See https://www.w3.org/TR/html-polyglot/ for more details of all this kind of stuff.)
At run time, behaviour changes in such a way that it will break some JavaScript libraries, due to differences like .tagName being lowercase instead of uppercase, and .innerHTML requiring and producing XML syntax.
‡ including sound, solid reasoning for why the former is incorrect and the latter is correct
I don't think that's correct. The article is literally talking about how to write HTML, and explaining the benefits of writing it in this style.
Not with that attidude... if you write consistently and with intention, it turns out just fine.
Check out the source for https://try.nodebb.org, for example. Dynamically generated, (mostly) syntactically correct, (mostly) human readable.
What really needs to happen it a separation of HTML from UI markup elements. HTML will be used solely for textual markup and a new markup language can be used for UIs. This would allow us to return to a proper separation of concerns.
Do you mean CSS? Using <b>, <I>, <strong>, etc has been “bad form” for a while (maybe not strong though)
I think that is a very compelling format for a text-first web page, like a blog post or news article. Of course it is a coding style not well suited for complex web apps with deep hierarchy.
http://info.cern.ch/hypertext/WWW/MarkUp/HTMLConstraints.htm...
You can’t orient yourself - you can’t tell where you are - unless you count the branches. And indenting makes that visible.
In the examples for TFA, you can tell your location from the names of the elements. Eg <td> is enough for you to know you’re probably inside a tr inside a table.
And that is the more common case than the general tree example.
But a method of describing html does have to answer the question of how it represents arbitrarily deep nesting. But I like the answers it’s given for the more common case of structures that are not arbitrarily deep.
Or, to put another way, HTML != DOM, even though HTML can be rendered into a DOM.
[1] https://github.com/gildas-lormeau/SingleFile/commit/6c7a2ef1...
The rational evidence in favor of this claim has always been weak. The "div" tag basically finished it off. The people who use HTML "semantically" have always been dwarfed by the people just making it look good on the screen, and the number of applications that use those semantics has always been small and on the fringe for something so putatively important.
However, the idea persists to this day despite its near complete failure to pay off significantly in nearly twenty years, and I'm sure someone will angrily reply to this and list the incredibly useful semantic HTML features that they and fifteen other people have found to be just incredible. Perhaps we'll also get the traditional citation of the Google info boxes, which have nothing to do with the semantic web and everything to do with Google throwing a crapton of machine learning and humans at the problem of parsing distinctly non-semantic HTML until they cracked the problem.
(An honorable mention to screen readers, which sorta do benefit, but still nowhere near as much as you might casually expect.)
Today the reason not to use tables is more just that it's inconvenient to do things like have a mobile and desktop layout. I believe they've got all the tools nowadays to tear into a table-based layout, break the tables apart, and treat it like any other CSS-styled content, but that's relatively recent, and still a silly way to operate when you could just use normal layout elements ("div" if nothing else) like a sane person and not have to undo the table layout before you can manipulate them properly.
Too limited, and deserving of much more than an honorable mention.
Accessibility should be a fundamental consideration of any reasonably sized app, using <table>s to markup tables is part of that.
Assistive devices are not limited to screen readers, and it's just good practice to use tables for tables.
CSS Grid has landed in all major browsers, if you want a grid layout, use grids for layout.
I am not clear on exactly what "the idea" refers to, perhaps you could clarify. Also, how has the idea "completely failed"? And what would complete success look like?
> Google info boxes[...] have nothing to do with the semantic web and everything to do with Google throwing a crapton of machine learning and humans at the problem of parsing distinctly non-semantic HTML until they cracked the problem
This is verging on /r/SelfAwarewolves material.
Also, not being able to rearrange blocks for different size displays is kind of a non-starter relative to the mobile internet, which I'd guess was more important than frameworks.
In my own hobbyist stuff, though, there's just something a little gross about putting layout in the HTML -- I want HTML to represent semantic structure, in ways I'd be okay with Lynx displaying, and I want CSS to do all the lunatic nonsense to make me happy with how it looks on a modern browser. I wonder how much it's this aesthetic principle of separation that motivates others as well.
Tables for layout were fine back when everyone was browsing the web on SVGA, XGA, or even SXGA screens at 96dpi (72dpi on the Mac). Now a visitor might be on a high DPI display in portrait orientation, full screen on a 4K monitor, or anywhere in between I think it's a bit disrespectful to visitors not have have a responsive page layout. Tables are a liability for responsiveness and should only be used for tabular data.
Divs that follow the document's semantic hierarchy and are positioned on your CSS have none of those issues.
Anyway, a lot of ways to use Bootstrap and other grid-based frameworks introduce the same problems back. And if you want to really display things in a table, well, a table fits quite well your requirements.
https://developer.mozilla.org/en-US/docs/Web/CSS/display#int...
From [1]: “It is sometimes suggested, even by some accessibility advocates, that layout tables are bad for accessibility. In reality, layout tables do not pose inherent accessibility issues.”
What was the best argument that you can recall? What were some of the bad ones? What does "harm" mean in this context?
> because the dependencies between HTML structure and styling are just too many and too strong.
Which dependencies? What would a structure/styling language combination look like that that lacked or had weak dependencies?
There are several for HTML different versions and it is standardized that you can omit some closing tags and some tags altogether.
The benefit of writing in a standardized language is that later you or anybody can run tools against your sources that check for conformity.
So that is why I prefer HTML. But I would like to hear your opinion on what is the best mark-down dialect currently?
From their Github page I read: "The spec contains over 500 embedded examples which serve as conformance tests."
So it's not so simple any more, is it?
Whether the latter is worth it tends to depend on other things than parse time.
I'm not saying it's never worth it.
As it stands now, it's very practical to just slap some <div>s down and do some CSS and be done.
I mostly recall “layout belongs into CSS files” (so a matter of principle) and “layout tables are bad for accessibility”, which while in theory could be an important point, in practice screenreaders had already adapted, and had (still have) very practical heuristics to distinguish layout tables from data tables (see e.g. https://webaim.org/techniques/tables/).
The thing is, at the time, using CSS to achieve the equivalent of layout tables was an exercise in frustration and futility, in that the results were exceedingly brittle and very often broke when either the table content, the surrounding elements or the browser window size changed too much.
Nowadays we have CSS grid and flexbox of course, but I imagine that in some cases a layout table could still be the most straightforward solution today.
> Which dependencies?
The fantasy back then was that it would be possible to define the HTML content and structure completely independently from layout and styling considerations, and then a separate CSS file could be used to specify any conceivable styling and layout for that content. While that is true to a certain extent, it usually breaks down as soon as you need HTML elements to be in a different order or nesting relation, or when you need additional intermediate nesting DIVs, etc.
In reality the HTML structure and the CSS structure (bound to each other by IDs, class names, hierarchical selectors etc.) is so closely intertwined, and the mapping points (i.e. IDs, class name combinations, etc.) are so many that, for the most part, only superficial changes can be made to one side without having to also make some adjustment on the other side. Ideally, it would be possible for an HTML author and a CSS author for the same web page to mostly work independently from each other. In reality this is almost impossible, except for the case where the HTML remains basically unchanged and the CSS can change within the constraints of the existing HTML structure.
Banning layout tables was never going to be a major factor in coming substantially closer to the imagined ideal here.
> What would a structure/styling language combination look like that that lacked or had weak dependencies?
I think it’s inherently difficult, because you will always need to specify which styles/classes should apply to which element in a rather fine-grained manner, which just means there will always be a lot of dependencies between the two sides.
One thing you’d need in order to realize arbitrary layout is a way of mapping structured content into a different structure. That basically means having a functional programming language to define the mapping, if you want to have full flexibility.
Which is why it is bizarrely unselfaware when Semantic Web advocates almost inevitably cite that as their biggest success. It isn't. It's their biggest failure.
You should be more sure of the things you're pretty sure of before saying you're sure of them.
There was no misinterpretation—from this end, that is. Your comment wasn't particularly sophisticated. It didn't require explanation.
> Google did not simply write a web scraper that pulls a <business_hours> or a <dc:business_hours> tag out of the web. They wrote a web scraper that super, super intelligently examines the HTML and[...]
No shit. The value proposition of the semantic web follows from how the world would be much better off if that weren't necessary. It has always been the case that, without the "semantic" half of "semantic web", attaining Google-level mastery over the Web's messy inputs is really, really difficult and requires Google-level resources. This isn't news. Yet you presented it as if it were in insightful observation wrapped in sage wisdom.
In your attempt to "prove" by counterexample what's Wrong with the semantic web, you just end up undergirding its very premise.
> Which is why it is bizarrely unselfaware when Semantic Web advocates almost inevitably cite that as their biggest success.
You cited them. You are literally the only person who mentioned them here, at all. You brought them up.
Saddling someone who advocates for X with the burden of defending position Y that you yourself have pulled from thin air is a textbook example of a bad argument. If you defeat some easily take-downable opponent (a 6-year-old, let's say—and one who is made of straw, for good measure) and then plan to enter the ring in subsequent matches having only bothered yourself with the thought that you will face the threat of another strawchild, that's not wise. It's stupid.
I claimed the specification existed, I didn’t claim it was a simple specification.
I just find it interesting. This would indicate to me that there are 500 "features" in the language. I thought mark-down languages just provided a few shortcuts for producing the most commonly needed HTML features and then provide a fallback to HTML. So if you cannot do it in the markdown language, use HTML instead.
But let's assume we can get away with just a single test for tables. And then we introduce the features "section headers" and "bold" and "underline". All these features can interact (e.g. underlined bold section headers), so we want to test combinations of all those features, and have a nice combinatorial explosion.