Discussion:
[whatwg] JavaScript function for closing tags
Michael A. Peters
2017-10-14 06:36:00 UTC
There does not seem to be a JavaScript API for closing open tags.

This is problematic when dealing with WebVTT which does not require tags
be closed.

The problem is biggest when the document is being served as
XML+XHTML.

I tried the following hack which seemed to be working:

var cleandoc = document.implementation.createHTMLDocument("FuBar");
var cleanbody = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
cleandoc.documentElement.appendChild(cleanbody);


Then I could do the following with a WebVTT cue:

cleanbody.innerHTML = string;
return (cleanbody.innerHTML);

That *mostly* works but seems to sometimes fail when string contains
entities, such as &#160;

What happens is it returns an empty string.

WebVTT is part of HTML5, yet browser-native HTML5 audio players don't
support caption tracks, which forces us to write our own implementations
if we want captions with audio. It sure would be nice if there was a
pure JavaScript way to just add closing tags to a string, because there
is never a guarantee that a valid WebVTT cue has closed tags, and closed
tags are required for XHTML sent as XML.

Seems to me that a JS native function to add missing closing tags would
have more application than just WebVTT cues.

I looked for a jQuery filter that does it, but could not find one.

It also could be of benefit in emulating document.write() as many of
Google's tools *still* require document.write() despite the issues with
document.write() and XML having been known for 15+ years now.

Any chance of getting a parser into JavaScript that at least would be
capable of closing open tags in a string passed to it?
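
As a rough sketch of what such a function might do, the following walks the string with a regex and keeps a stack of open tag names. The name closeOpenTags is hypothetical, and the sketch assumes simple cue-style markup: no comments, no CDATA, and no ">" inside attribute values. Entities such as &#160; are passed through untouched because the string is never handed to a DOM parser.

```javascript
// Hypothetical helper, not an existing browser API.
// Appends closing tags for elements left open and drops stray closing tags.
function closeOpenTags(input) {
  var tagPattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>/g;
  var open = [];      // stack of currently open tag names
  var output = "";
  var last = 0;       // index just past the previous tag
  var match;
  while ((match = tagPattern.exec(input)) !== null) {
    output += input.slice(last, match.index);   // copy text before the tag
    last = tagPattern.lastIndex;
    var name = match[2];
    if (match[1] === "/") {
      var index = open.lastIndexOf(name);
      if (index === -1) {
        continue;                               // stray closing tag: drop it
      }
      // Close any inner elements first, then the named element itself.
      while (open.length > index) {
        output += "</" + open.pop() + ">";
      }
    } else {
      output += match[0];
      if (!/\/$/.test(match[3])) {              // skip self-closing tags like <br/>
        open.push(name);
      }
    }
  }
  output += input.slice(last);
  while (open.length > 0) {
    output += "</" + open.pop() + ">";
  }
  return output;
}
```

For example, closeOpenTags('<span class="foo">This is a cue.') gives back the same string with </span> appended. A regex walk like this is only a stopgap for the small WebVTT tag set, not a general HTML parser.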
Silvia Pfeiffer
2017-10-14 07:46:58 UTC
Hi Michael,

It seems to me that the TextTrack API is made for this use case.
Why does it not work for you?

Cheers,
Silvia.


Michael A. Peters
2017-10-14 08:13:23 UTC
I use the TextTrack API, but its documentation does not specify that it
closes open tags within a cue. In fact I'm fairly certain it doesn't,
because some people use it for JSON and other non-tag content.

Some errors using the tracks in XML were solved by the innerHTML trick
where I create a separate html document, append the cue, and then grab
the innerHTML but that doesn't always work to close tags when html
entities are part of the cue string.

I use JS regex to turn <c.foo> into <span class="foo"> and <v whatever>
into <span data-voice="whatever">, and then just jQuery .append() to
place the cue in a div for captions/subtitles, since browser standard
HTML5 audio players have zero support for captions by themselves. The
jQuery .append() is what needs closing tags in XML; without them it
understandably fails completely.
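
The regex conversion described above might look roughly like this. It is a sketch, not the actual code in use: the names cueToSpans and escapeAttr are hypothetical, and it assumes existing entities in the cue should be left as they are.

```javascript
// Hypothetical helper: escape characters that would break a quoted attribute.
// Entities already present in the value are left untouched.
function escapeAttr(value) {
  return value.replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Hypothetical helper: rewrite WebVTT class and voice tags as XHTML spans.
function cueToSpans(cue) {
  return cue
    // <c.foo> or <c.foo.bar>  ->  <span class="foo bar">
    .replace(/<c((?:\.[^.\s>]+)*)>/g, function (whole, classes) {
      var names = classes.split(".").filter(Boolean).join(" ");
      return names ? '<span class="' + escapeAttr(names) + '">' : "<span>";
    })
    // <v Speaker Name>  ->  <span data-voice="Speaker Name">
    .replace(/<v(?:\.[^\s>]+)?\s+([^>]*)>/g, function (whole, voice) {
      return '<span data-voice="' + escapeAttr(voice.trim()) + '">';
    })
    // </c> and </v>  ->  </span>
    .replace(/<\/[cv]>/g, "</span>");
}
```

The output still has whatever tags the cue left open, so in an XML document it would need a closing step before being passed to .append().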

Also using it for chapters which don't appear to be supported in either
browser standard audio or video players.
Michael A. Peters
2017-10-17 15:50:47 UTC
Post by Michael A. Peters
I use the TextTrack API, but its documentation does not specify that it
closes open tags within a cue. In fact I'm fairly certain it doesn't,
because some people use it for JSON and other non-tag content.
Looking at https://www.html5rocks.com/en/tutorials/track/basics/
it seems JSON can be used, no idea if content type is different or not
for that.
Post by Michael A. Peters
Some errors using the tracks in XML were solved by the innerHTML trick
where I create a separate html document, append the cue, and then grab
the innerHTML but that doesn't always work to close tags when html
entities are part of the cue string.
Mixing XML and HTML is not a good idea. Would it not be easier to have
the server send out proper XML instead of HTML? Valid XML is also valid
HTML (the reverse is not always true).
I agree, but this is what I was using an HTML document for: when the cue
is read back through JS innerHTML it has closing tags. The only issue
would be tags that HTML itself does not close (e.g. br), but those are
not applicable to a WebVTT cue, which is only supposed to support a very
small number of tags, all of which have closing tags.

The problem is WebVTT does not require tags be closed in a cue, e.g.

04:05.000 --> 04:07.250
<c.foo>This is a cue.

That's allowed in WebVTT

I convert c.foo into

<span class="foo">This is a cue.

and when I add that to the html document and use innerHTML it then has
the closing </span> on it.

While it seems to work with some html entities, it breaks with others
like &#160;

So for now I have to just make sure all my WebVTT are closed and not use
the hack that adds closing tags - but since WebVTT cues do not have to
have closing tags, but the cues need to work in XML documents, a
built-in parser in JS that can add missing closing tags I think would be
a good thing.
And if XML and HTML is giving you issues then use JSON instead.
I did not see JSON mentioned in the W3C spec though.
I think the JSON in WebVTT cues is not spec but some are using it.

Basically the TextTrack API seems to allow almost any string, it really
has to as WebVTT is not static and the spec changes. I wouldn't mind
JSON being added to WebVTT as it would be a handy way to encode metadata
about the media but that's another topic.

A built in JS HTML parser may also be of benefit in preventing code
injection, e.g. stripping out tags from a WebVTT cue that a website does
not allow.

The TextTrack API doesn't filter out things like script or other tags
that aren't part of WebVTT which means any site that allows users to
upload WebVTT files is creating a potential code injection vulnerability.

Server-side code should filter it on upload, but it would be nice to
*someday* be able to pass a string through a native JS filter much the
same way we can with htmltidy server-side and remove all but
white-listed tags and attributes and get back a cleaned string with all
tags closed.
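
A client-side white-list filter along those lines could be sketched as below. The names ALLOWED_CUE_TAGS, escapeText and filterCueTags are hypothetical, the tag list is my reading of the WebVTT cue tags, and a regex pass like this is an assumption-laden illustration rather than a replacement for server-side filtering or a real sanitizer.

```javascript
// Tag names accepted in cue text (assumed white-list; adjust per site policy).
var ALLOWED_CUE_TAGS = ["c", "i", "b", "u", "v", "lang", "ruby", "rt"];

// Escape any "<" left in plain text so it cannot start a tag later.
function escapeText(text) {
  return text.replace(/</g, "&lt;");
}

// Hypothetical helper: keep white-listed cue tags, drop every other tag.
function filterCueTags(cue, allowed) {
  var tagPattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>/g;
  // Accept only .class suffixes and a plain-text annotation after the name.
  var safeRest = /^(?:\.[\w-]+)*(?:\s[^"'=&]*)?$/;
  var output = "";
  var last = 0;
  var match;
  while ((match = tagPattern.exec(cue)) !== null) {
    output += escapeText(cue.slice(last, match.index));
    last = tagPattern.lastIndex;
    if (allowed.indexOf(match[2]) === -1) {
      continue;                                  // not white-listed: drop the tag
    }
    if (match[1] === "/") {
      output += "</" + match[2] + ">";           // normalized closing tag
    } else if (safeRest.test(match[3])) {
      output += match[0];                        // keep tag with its annotation
    } else {
      output += "<" + match[2] + ">";            // strip attribute-like content
    }
  }
  return output + escapeText(cue.slice(last));
}
```

Only the tags are removed, so the text inside a dropped <script> element stays in the cue as ordinary text. Closing any tags left open would be a separate step.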

It looks like Google has a library that does that but it isn't intended
for client-side JS and may not be fast enough for things like phones to
process time-sensitive cues (I don't know).

I might be wrong but it looked like the google library I found was
intended for server-side Node.js use.
Silvia Pfeiffer
2017-10-17 19:47:40 UTC
We could specify that WebVTT cues of type metadata should contain valid
JSON - that would make sense to me.
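
If metadata cues did carry JSON, reading them on the page could be as simple as the sketch below. The helper names parseMetadataCue and readMetadataTrack are hypothetical; the only things assumed from the TextTrack API are the track's kind and cues properties and each cue's text property.

```javascript
// Hypothetical helper: parse one cue's text as JSON, or return null if it is not valid JSON.
function parseMetadataCue(cueText) {
  try {
    return JSON.parse(cueText);
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: parse every cue of a metadata track.
// track.cues can be null (for instance when the track is disabled), so guard for that.
function readMetadataTrack(track) {
  if (track.kind !== "metadata" || !track.cues) {
    return [];
  }
  return Array.prototype.map.call(track.cues, function (cue) {
    return parseMetadataCue(cue.text);
  });
}
```

Returning null for a cue that fails to parse lets the caller skip it instead of aborting the whole track.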

Cues of type captions or subtitles should get parsed by the addCue()
function of the TextTrack API - but not all browsers implement this yet.
It would be worth registering bugs on browsers.

Cheers,
Silvia.

Best Regards,
Silvia.