[whatwg] <audio> metadata
Andy Valencia
2017-04-10 01:38:14 UTC
What follows is a first pass at addressing a missing ability
when dealing with Internet streams, usually radio ones.
Comments and suggestions are quite welcome; as my first
attempt--ever--at submitting to this group, apologies if
I've made any mistakes in how I've proceeded.

Andy Valencia
Contact: https://vsta.org/contact/andy

Proposal for enhancement to HTML5 <audio> element to better support
metadata provided by streams.

# Background

Browsers have an <audio> element which can directly play streaming
URLs. When the stream is an Internet radio station, the stream
almost certainly offers metadata; typically, the artist and track,
and often an album cover URL as well. While modern browsers can
play these streams, there is no way to access this metadata.

Media elements in modern browsers _do_ have the notion of metadata,
but in current standards this is limited to static characteristics
such as duration. When listening to a radio stream, the metadata will
change as successive tracks are played.

# Stream Delivery of Dynamic Metadata

A moderately detailed description of current practices in providing
metadata is provided at:


(One detail glossed over in this page is that older streaming servers
start their GET response with "ICY 200 OK" rather than a standard
HTTP response. This technically means the response is HTTP 0.9
without headers; the headers are present, but must be processed by
the application and trimmed from the start of the media stream
in the GET response. Apple Safari already declines to play 0.9
media elements within a post-0.9 page, and there are signs that
Chrome will follow suit. Thus, accommodating this older mode should
be considered optional.)

Newer streaming server versions use a proper HTTP response, with
the documented elements in the HTTP response header.
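The framing itself is not spelled out above, so here is a sketch of the convention as Shoutcast/Icecast servers commonly implement it (an assumption drawn from common practice, not from this proposal). The client sends `Icy-MetaData: 1`, and the server answers with an `icy-metaint: N` header. After every N audio bytes comes one length byte L, then L × 16 bytes of NUL-padded metadata. `IcyDemuxer` is a hypothetical name. Because it keeps its position between `push()` calls, a metadata block split across two network chunks is reassembled correctly. The older "ICY 200 OK" case, where the headers arrive inside the body, is not handled.

```javascript
// Hypothetical demuxer for the (assumed) ICY in-band metadata convention:
// [metaint audio bytes][1 length byte L][L*16 metadata bytes] ... repeating.
class IcyDemuxer {
  constructor(metaint, onAudio, onMetadata) {
    this.metaint = metaint;       // value of the icy-metaint response header
    this.onAudio = onAudio;       // receives Uint8Array slices of pure audio
    this.onMetadata = onMetadata; // receives decoded metadata strings
    this.audioLeft = metaint;     // audio bytes left before the next length byte
    this.metaLeft = -1;           // -1: not inside a metadata block
    this.metaBuf = [];            // metadata bytes collected so far
  }

  push(chunk) {
    let i = 0;
    while (i < chunk.length) {
      if (this.audioLeft > 0) {
        // Pass through as much audio as this chunk holds before the next block.
        const n = Math.min(this.audioLeft, chunk.length - i);
        this.onAudio(chunk.subarray(i, i + n));
        this.audioLeft -= n;
        i += n;
      } else if (this.metaLeft < 0) {
        // Length byte: metadata size in 16-byte units (0 means "no update").
        this.metaLeft = chunk[i++] * 16;
        this.metaBuf = [];
        if (this.metaLeft === 0) this.finishBlock();
      } else {
        // Collect metadata bytes, possibly across several push() calls.
        const n = Math.min(this.metaLeft, chunk.length - i);
        for (let k = 0; k < n; k++) this.metaBuf.push(chunk[i + k]);
        this.metaLeft -= n;
        i += n;
        if (this.metaLeft === 0) this.finishBlock();
      }
    }
  }

  finishBlock() {
    if (this.metaBuf.length > 0) {
      // Strip NUL padding; decode as UTF-8 (real streams may use other charsets).
      let end = this.metaBuf.length;
      while (end > 0 && this.metaBuf[end - 1] === 0) end--;
      this.onMetadata(new TextDecoder().decode(Uint8Array.from(this.metaBuf.slice(0, end))));
    }
    this.metaLeft = -1;
    this.audioLeft = this.metaint;
  }
}
```

In a client, `metaint` would come from `Number(response.headers.get("icy-metaint"))`, and every received chunk would be passed to `push()`.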

Because the metadata is encoded in a general attribute=value
format, a wide variety of metadata may be encoded. By convention,
at least the attributes "StreamTitle" and "StreamUrl" will be
included in the response. Also by convention, StreamTitle is
structured as "Artist - Track Name".
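Below is a minimal sketch of parsing one such update. It assumes the usual Shoutcast/Icecast serialisation `Key='value';Key='value';`, where the single quotes and the `';` terminator are an assumption not stated above. It then applies the `Artist - Track Name` convention to StreamTitle. Both function names are hypothetical.

```javascript
// Parse an ICY-style metadata string such as
//   StreamTitle='Artist - Track';StreamUrl='http://example.com/art.jpg';
// into a Map. Assumes each value is single-quoted and terminated by "';".
function parseIcyMetadata(text) {
  const fields = new Map();
  const re = /([A-Za-z][A-Za-z0-9_]*)='(.*?)';/g;
  for (const m of text.matchAll(re)) {
    fields.set(m[1], m[2]);
  }
  return fields;
}

// Split StreamTitle on the first " - ". This is a convention, not a guarantee,
// so a title without the separator is returned whole as the artist.
function splitStreamTitle(title) {
  const idx = title.indexOf(" - ");
  if (idx === -1) return { artist: title, track: "" };
  return { artist: title.slice(0, idx), track: title.slice(idx + 3) };
}
```

The lazy match lets an apostrophe appear inside a value (as in "Don't"). A value that itself contains `';` would still be cut short, which is one reason the format would benefit from a written specification.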

# API Considerations

While listening to streams with metadata is a compelling application
of web technology, it is probably a tiny percentage of the use
of <audio> elements. Thus, this proposal describes a mechanism
which leaves the default browser behavior unchanged. It also
gracefully degrades when presented to a non-implementing
browser, still permitting the stream to play in the existing,
non-metadata fashion.

This proposal is designed around the current state of streaming,
which depends on the HTTP Icy-MetaData request header. However,
this specific detail is abstracted, in recognition that streaming
technology could evolve in the future. The intention is that this API
will remain stable even if the details of metadata request and
delivery change.

As noted, current HTML5 media elements do have some metadata support.
This API is also structured to permit these existing elements to
participate in this API if desired.

# <audio> element changes

These enhancements are activated when the <audio> element has the
new attribute "dynamicmetadata" present. Without this attribute,
no metadata request header is added to the stream HTTP request, and no
other attributes of this proposal will be acted upon even if present.

If the server response indicates support for dynamic metadata,
on each metadata update, the <audio> element's attributes are changed
to reflect the latest received metadata. For each "Attribute=Value"
in the metadata update, an attribute "metaAttribute" with value "Value" will
be added to the <audio> element. Previous metadata attributes which
are not present in the latest update are left untouched. The browser
must verify well-formed identifier names for each Attribute, and
quietly reject ill-formed names. It can also apply checks on the
Value. (Implementors are reminded that the Value, because it encodes
information such as names, might contain a wide range of characters.)
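The attribute-update rule above can be sketched as a hypothetical helper. This is not browser code. It takes "well-formed identifier" to mean an ASCII letter followed by letters, digits or underscores, which is an assumption since the proposal does not define the grammar. It accepts any object with a `setAttribute` method so that it can run outside a browser. The returned array corresponds to what the event's `changed` list would hold.

```javascript
// Accepted key syntax (assumption): ASCII letter, then letters/digits/underscores.
const IDENT = /^[A-Za-z][A-Za-z0-9_]*$/;

// Apply one metadata update (a Map of Attribute -> Value) to an element,
// following the proposed rule: each key becomes "meta" + key, keys absent
// from this update are left untouched, and ill-formed keys are skipped silently.
function applyMetadataUpdate(element, update) {
  const changed = [];
  for (const [name, value] of update) {
    if (!IDENT.test(name)) continue; // quietly reject ill-formed names
    element.setAttribute("meta" + name, String(value));
    changed.push(name);
  }
  return changed; // would populate the event's "changed" list
}
```

On a real HTML element, `setAttribute` lowercases the name, so "metaStreamTitle" is stored as "metastreamtitle". Reading it back with `getAttribute("metaStreamTitle")` still works because the lookup is case-insensitive. Script property access such as `player.metaStreamTitle` would additionally require the browser to expose these values as IDL properties.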

The <audio> element adds the hook "onmetadatachange", connecting to
a function which is called on each update of metadata. This
function is called with an event, which includes the attribute
"changed" with a value which is a list of Attribute names which
are present in the update.

# Example code

<script>
// Called on each "metadatachange" event (proposed API).
function new_meta(ev) {
  const player = document.getElementById("player");
  const artist = document.getElementById("artist");
  const track = document.getElementById("track");
  const art = document.getElementById("art");
  const changes = ev.changed;
  if (changes.includes("StreamTitle")) {
    const title = player.metaStreamTitle;
    const idx = title.indexOf(" - ");
    if (idx === -1) {
      // "Just a plain old piece of text"
      artist.textContent = title;
      track.textContent = "";
    } else {
      // <artist> - <title>
      artist.textContent = title.slice(0, idx);
      track.textContent = title.slice(idx + 3);
    }
    if (changes.includes("StreamUrl")) {
      // Display album art
      art.src = player.metaStreamUrl;
    } else {
      // No art in this update; clear any stale image
      art.removeAttribute("src");
    }
  }
}
</script>

Artist: <span id="artist"></span><br>
Track: <span id="track"></span><br>
Album Art:<br>
<img id="art">

<audio src="http://an-internet-radio-station.whatever" id="player"
       dynamicmetadata onmetadatachange="new_meta(event)"></audio>

Philip Jägenstedt
2017-04-10 04:44:42 UTC
This is related to https://wicg.github.io/mediasession/, which is the API
for telling the browser about what the metadata is, so that it can be used
in UI that is outside of the page.

Two things seem to be missing, then. First,
http://www.smackfu.com/stuff/programming/shoutcast.html isn't detailed
enough to get interoperable implementations, in particular the metadata
keys would have to be defined.

Second, as already outlined above, one needs to be able to get the current
metadata somehow. I think that a mediaElement.getMetadata() method that
returns a https://wicg.github.io/mediasession/#the-mediametadata-interface
would make sense, and then the metadatachange event could be a simple event
with no extra information on it.
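Philip's suggestion could be approximated on top of the ICY fields as shown below. `icyToMediaMetadataInit` is a hypothetical name. The mapping follows the conventions described earlier in this thread rather than any spec: StreamTitle is split on " - " into artist and title, and StreamUrl is used as artwork. The returned object has the shape of the Media Session `MediaMetadataInit` dictionary (`title`, `artist`, `album`, `artwork`), so a browser could pass it to `new MediaMetadata(...)`.

```javascript
// Hypothetical mapping from ICY fields (a Map) to a MediaMetadataInit-shaped object.
function icyToMediaMetadataInit(fields) {
  const streamTitle = fields.get("StreamTitle") || "";
  const idx = streamTitle.indexOf(" - ");
  const init = idx === -1
    ? { title: streamTitle, artist: "", album: "" }
    : { title: streamTitle.slice(idx + 3), artist: streamTitle.slice(0, idx), album: "" };
  // ICY carries no album field, and StreamUrl is only *sometimes* cover art.
  const url = fields.get("StreamUrl");
  init.artwork = url ? [{ src: url }] : [];
  return init;
}

// In a browser with Media Session support, one might then write:
//   navigator.mediaSession.metadata = new MediaMetadata(icyToMediaMetadataInit(fields));
```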

There is a very old bug for exposing the metadata:

In order to make progress, there needs to be implementer interest. Although
it may well fizzle out, a new issue
https://github.com/whatwg/html/issues/new with the concrete suggested
changes for HTML would be a good starting point.
Anne van Kesteren
2017-04-10 05:07:27 UTC
Post by Philip Jägenstedt
In order to make progress, there needs to be implementer interest. Although
it may well fizzle out, a new issue
https://github.com/whatwg/html/issues/new with the concrete suggested
changes for HTML would be a good starting point.
Demonstrating there's interest in this through a popular JavaScript
library or two would help a lot.
Andy Valencia
2017-04-14 20:23:02 UTC
Thank you for the helpful comments.
Post by Philip Jägenstedt
http://www.smackfu.com/stuff/programming/shoutcast.html isn't detailed
enough to get interoperable implementations, in particular the metadata
keys would have to be defined.
Note that mp3, flac, ogg, and wav all have entirely open-ended
container formats for their metadata. The framework as used by
Shoutcast and Icecast is similarly extensible, although StreamTitle
and StreamUrl are the obvious low-hanging fruit. I've reached
out to the Icecast folks to see if there's interest in, say,
an informational RFC.
Post by Philip Jägenstedt
Second, as already outlined above, one needs to be able to get the
current metadata somehow. I think that a mediaElement.getMetadata()
method that returns a
instance would make sense, and then the metadatachange event could
be a simple event with no extra information on it.
Ok. Note that this data structure suffices to encode the baseline
information from Shoutcast/Icecast. It does not, for instance,
encode "Label", needed to do licensing reporting in the USA.
"Year" is another datum often of interest.

But "good enough" as a starting point is fine by me.

I'm assuming getMetadata() is based on the older mozGetMetadata()
API? That API appears to auto-populate from the various supported
audio file headers. So adding "dynamicmetadata" is just another
way for those fields to be populated.

(I'm going to look at submitting an edit to the metadata container
so it can handle extension information, something along the lines
of the "X-" prefix in RFC-822 headers. That would be orthogonal
to this work.)
Post by Philip Jägenstedt
In order to make progress, there needs to be implementer interest.
Yes; I have no doubt there are many more ideas to lob at browser
implementors than they could ever cook up (even if vetted to just
the "good ones"). However, as a long-time C dev with a touch of C++,
I'm counting on doing an implementation for at least one major
browser if I get to rough consensus.

I know, it doesn't guarantee they'll accept my submission.
Post by Philip Jägenstedt
Demonstrating there's interest in this through a popular JavaScript
library or two would help a lot.
If I could do this in JavaScript, I would. Multiple issues:

<audio> and src= run at the full efficiency of platform audio
streaming. But you don't get to see the bytes.

You can do the fetch yourself and look at the partial data in
responseText (remember, it's an ongoing stream). But responseText
keeps growing, requiring you to periodically reset the connection.
Hard to maintain a listening experience.
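The Fetch API's streaming body avoids the ever-growing responseText. Each chunk read from `response.body.getReader()` can be processed and then dropped, so memory stays bounded. Below is a minimal sketch built around a hypothetical helper, `consumeStream`. The commented usage assumes some chunk-fed parser, here an equally hypothetical `demuxer`. This does not solve the origin-policy problem described below. A cross-origin fetch can only read the body (or the `icy-metaint` header) if the server opts in through CORS, and the non-safelisted `Icy-MetaData` request header also forces a preflight.

```javascript
// Read a stream incrementally. Each chunk is handed to onChunk and then
// released, so nothing accumulates the way responseText does.
async function consumeStream(stream, onChunk) {
  const reader = stream.getReader();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      onChunk(value); // value is a Uint8Array
    }
  } finally {
    reader.releaseLock();
  }
}

// Usage sketch (browser; `url` and `demuxer` are hypothetical):
//   const resp = await fetch(url, { headers: { "Icy-MetaData": "1" } });
//   await consumeStream(resp.body, (chunk) => demuxer.push(chunk));
```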

What to do with the bytes after peeling out the metadata? A local
URL is based on a Blob, which is immutable, so no good way to tack on
newly arrived data. You can run your own ogg/flac/mp3/wav decoder,
but you'll come up short of platform efficiency. Probably a non-
starter for mobile.

The new ReadableStream mechanism to feed fetches out of a
Service Worker is either a solution, or at least pretty close.
(I can't quite convince myself it's actually mutable in the way
needed to endlessly append stream data as it arrives.)

But the overarching issue is that you're doing JS-initiated
network operations, and origin policy is going to stop you.
You can claim Shoutcast/Icecast should give permissive
origins, but they don't, and since an admin-ish interface is
also multiplexed at the host, probably shouldn't.

I'll rework my submission based on these comments, thanks again.
Delfi Ramirez
2017-04-15 12:00:59 UTC
Hi All:

Some information that may be of use, concerning the WPI rules for
royalties et al. in <audio> files.

What we know as rights. Uh,

Meta elements required

* Title : 100%
* Artist ( Interpreter): 12%
* Time: length of the <audio> piece. Royalties are assigned by time
* Year: (_Objective reason: it used to happen that some <audio> files
had the same name, causing mistakes in the attribution to the
artist, as happened in the past_)
* Composer: 20%
* Arrangements: 20%
* Producer: 40%
* Label: It depends on the contract, but as an agent collects all the
* Software: Last but not least, some software used in the production of
audio has its own rights. Uh.

Hope this information helps with the meta-tag issues.

_BTW, there is a new start-up company named orfium.com whose CEOs know
all these issues; there is the chance to ask them for more info._



Post by Andy Valencia
Ok. Note that this data structure suffices to encode the baseline
information from Shoutcast/Icecast. It does not, for instance,
encode "Label", needed to do licensing reporting in the USA.
"Year" is another datum often of interest.
Only "artist" and "title" are required for royalties reporting for internet radio.
But "album" and "year" provide additional information that helps.
Commercial radio and TV uses at minimum the artist and title, and if lucky the listener (digital radio) and viewer get to also see album and year.
Also, royalty reporting is done at an earlier stage; what a listener sees is not what is logged/given for royalties reporting.
Ogg (Vorbis or Opus) should in theory be easily supported, as metadata is given in a sidestream, right? It is therefore independent of the audio stream.
Mozilla has audio.mozGetMetadata()
I have no idea if that fires once or each time more metadata is passed in the stream.
Only says that it is fired when the metadata is loaded.
I'm assuming it's only at stream start though.
So with a few "tweaks" Firefox could support Icecast Ogg metadata; if the browser is compliant with the Ogg standard, then support is very easy to add.
Shoutcast v1 would require parsing of the audio stream, Shoutcast v2 is a little different and can pass info like album and year and artwork.
The only Shoutcast v2 compatible player I'm aware of is the aging Winamp, the majority of Shoutcast streams are v1 streams.
So while Firefox is almost able to provide stream meta updates, all the other browsers are not, and would require a polyfill, which, as you point out, has its own issues with having to reset the stream as the buffer fills up.
Maybe a large cyclic buffer could be enabled, triggered by a "stream" parameter for HTML audio.
There would still be an issue with metadata possibly being partly in the current buffer and partly in the next buffer, so any JavaScript would need to splice that together.
Ogg seems simple enough
And parsing of this metadata should be in the ogg source (libogg?) so any browser that supports Ogg should be able to get that metadata.
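For Ogg streams, the tags in question are Vorbis comments. Below is a sketch of parsing that comment structure. It assumes the packet has already been extracted from its Ogg pages and that its magic prefix (`\x03vorbis` for Vorbis, `OpusTags` for Opus) has been stripped. `parseVorbisComments` is a hypothetical name. Field names are case-insensitive in the Vorbis comment format, so they are upper-cased here. Because a field may repeat, each maps to an array.

```javascript
// Parse a Vorbis comment block (all lengths are little-endian 32-bit):
//   vendor_length, vendor_string, count, then count x (length, "KEY=value").
// Assumes the packet-type/magic prefix has already been removed.
function parseVorbisComments(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const utf8 = new TextDecoder();
  let pos = 0;
  const readU32 = () => { const v = view.getUint32(pos, true); pos += 4; return v; };
  const readStr = (len) => {
    const s = utf8.decode(bytes.subarray(pos, pos + len));
    pos += len;
    return s;
  };

  const vendor = readStr(readU32());
  const tags = {};
  const count = readU32();
  for (let i = 0; i < count; i++) {
    const entry = readStr(readU32());
    const eq = entry.indexOf("=");
    if (eq <= 0) continue; // malformed entry: no field name
    const key = entry.slice(0, eq).toUpperCase(); // names are case-insensitive
    (tags[key] = tags[key] || []).push(entry.slice(eq + 1)); // fields may repeat
  }
  return { vendor, tags };
}
```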
Roger Hågensen,
Freelancer, Norway.
Delfi Ramirez
2017-04-16 01:01:17 UTC
Hi Roger, hi all:

My fault: WPI was a horrendous mistake due to the keyboard and other
things on my mind. WIPO, I meant.

Here below is the info, to avoid future rework of the API and the <audio>
tag, if it helps.

* WIPO stands for _World Intellectual Property Organization_, and for
them the IP acronym means _royalties_, speaking in our technical terms.

* http://www.wipo.int/wipo_magazine/en/2015/02/article_0001.html
* The international legal rule is featured in Article 15 of the
treaty:


I did not want to be rude, nor to ask for extra effort from the team
at WHATWG; I just wanted to share common knowledge, based on my past
(say, vain) professional experience.

Just three final observations to serve and to help:

* Length: Five seconds MAY be the legal minimum.

* Five seconds (5") of 'stolen' ring sounds, digitised audio files,
different mixes, edits, et cetera of <audio> is the minimum for a
wannabe-lawyer to go to court. Please keep this fact in mind.

* Meta-Data: Following the indications of the WIPO (focusing on a
World Wide Web service), services like Pandora are now not allowed
to stream (are de facto banned) in places like Africa, Europe,
or the East. This may be due to not meeting the legal requirements,
because, sadly, streaming, despite the neutrality of the technique,
is a unique sales channel.
* ISRC: "_The MP3 format does allow rights management information like
ISRC to be included however it is rarely used. What is used is the ID3
system of tags, which is not part of the international standard, but
does enable ISRC to be encoded. It is therefore recommended that an ISRC
be encoded into the ID3 tag._" Uh.
* <audio> files may be used not just for streaming songs or bleeps, but
also for scientific talks and college conferences, which may be broadcast live.
* Thus, the proposed (Title - Author) pairing does not fit
this clear need.

just mumbling



Post by Delfi Ramirez
Some information that may be of use, concerning to the WPI rules for
royalties et al in <audio> files.
I have no idea what/who WPI is.
But StreamLicensing.com (which has a deal with ASCAP/BMI/SESAC/SoundExchange)
only requires artist and title, and that the artist and title be viewable by the listener.
One of the PROs (Performance Royalty Organization) did want album but waived that requirement.
Post by Delfi Ramirez
Meta elements required
* Title : 100%
* Artist ( Interpreter): 12%
* Time: length of the <audio> piece. Royalties are assigned by time
* Year: (_Objective reason: it used to happen that some <audio> files
had the same name, causing mistakes in the attribution to the
artist, as happened in the past_)
* Composer: 20%
* Arrangements: 20%
* Producer: 40%
Artist and title are always required. But I assume that by title you mean the field itself, as in it being "Some Artist - Some Song", where space-dash-space (" - ") is the separator for artist and title.
As to length, any listened time longer than 30 seconds is counted, and I forget the max time.
You also forgot to mention ISRC, which is a globally unique identifier for tracks; radio stations may use ISRC when sending in performance logs.
I'm not sure an end listener would need all this metadata though; such info should be logged separately by the radio station or by the streaming server itself.
The listener would only be interested in (minimum) artist and title, album, year and artwork being a bonus. And lyrics being a nice surprise.
Although I'd argue that artist and title (+ album and year) could be used to fetch artwork and lyrics using XHR upon user interaction instead.
I'm not going to comment further on the royalty stuff, as this is veering quite off-topic now.
Roger Hågensen,
Freelancer, Norway.
Delfi Ramirez
2017-04-16 13:36:47 UTC
Hi Roger, hi all:

* "_passing metadata in a stream so that a HTML webplayer can show
artist and title_" can be accomplished extracting the ID3 tags from the
file and presenting them as JSON values, for example

* Sound.load(new URLRequest("07 - audio.mp3"));
* Some old tricks for this were done in the past. Here is the link to
an ECMAScript derivative from the past, if it serves you as a model: ID3
tags Get/Receive [6].

But please take into consideration:

* Not all webplayers stream music by obligation. Not all webplayers
are designed to play music.

* Sounds: Rings and bleeps of a mobile device or a computer device.
Audio meta tags should be applicable to the webplayer ( one file vs.
multiple files ).
* Streaming on the web: As I noted to this group in my last email,
there is nowadays a growing demand for streaming by
scientific communities to publish audio conferences or talks.

* Concerning my references to WIPO et al.: yes, I agree, technology
should be considered neutral.

* The only important issue to consider would be the five-second
minimum length.



Post by Delfi Ramirez
* WIPO stands for _World Intellectual Property Organization_, and the
* Meta-Data: Following the indications of the WIPO ( focusing on a
World Wide Web service ) that now, services like Pandora are not allowed
to stream ( are de facto banned ) in earth places like Africa, Europe,
or The East. May it be due to not meet the legal requirements.
The reason services like Pandora geoblock is that the PROs are basically trying to carve out regions (like region blocking for DVDs and Blu-ray); it's greedy and stupid. The EU is working on legislation to limit or do away with this for online stuff.
Also, metadata sent to the user/listener has very little to do with royalty reporting. The reporting must be done by the webmaster at the webserver level or by the DJ at the encoding level.
These logs are then passed on independently of what the listener sees/hears.
One thing that could be useful to show to the listener is a copyright hint like indicating if the stream is CC BY (Creative Commons Attribution) for example.
May I also point out that this has gone very offtopic (I should probably be the last person to point this out though).
WHATWG has very little to do with PROs/WIPO/royalties/rights; a different forum should be used for that.
I'd like to get back on topic and to the discussion of passing metadata in a stream so that a HTML webplayer can show artist and title (and maybe year/album if present) to the listener and have this be changed/updated when this info changes in the stream (usually at song change but can occur more often to provide special non-song messages as well).
Firefox seems to support it (though I have not had the time to test it yet) but it is uncertain what formats it works on and if it works for streams at all.
Roger Hågensen,
Freelancer, Norway.
Anne van Kesteren
2017-04-15 18:40:29 UTC
On Fri, Apr 14, 2017 at 10:23 PM, Andy Valencia
Post by Andy Valencia
But the overarching issue is that you're doing JS-initiated
network operations, and origin policy is going to stop you.
You can claim Shoutcast/Icecast should give permissive
origins, but they don't, and since an admin-ish interface is
also multiplexed at the host, probably shouldn't.
If the same-origin policy stops you, it should also stop a C++
implementation. It's there for a reason.