May 8, 2003

Serving XHTML as 'application' or 'text'

  • Reported by liorean

Having read a bit about this issue on the net lately, I have a few questions about XHTML and MIME-types - mostly because I think the W3C is moving in the wrong direction by recommending 'application/xhtml+xml' over 'text/xml' (which is going away) and over creating a new 'text/xhtml+xml' MIME-type.

- I totally agree that XHTML should not be sent as 'text/html', since that type already has an associated handling which doesn't exactly match the handling XHTML needs.
- Generally, XHTML documents are content rich and the content is made for reading. The top-level content types are designed so that if there isn't any application for a specific type of content, it should either be treated as unknown or be sent to an interpreter for the general type. That interpreter then decides what to do if the content can't be displayed. The 'text' type can be sent to a general text reader/editor, 'image' to a general image handler, 'audio' to a general audio handler and so on. Because XHTML is in most cases some markup around a lot of textual content, it makes sense to send it as the 'text' type.
- 'application', on the other hand, is made for binary data or data that isn't human readable - or textual data that is marked up with binary or otherwise non-human-readable codes. XHTML is generally not that type of data.
- The fact that the default encoding is set to one thing by XML and another by HTTP is not a problem with the assignment of a MIME-type - it's a problem with the current HTTP standard. Instead of setting the MIME-type to something that isn't SEMANTICALLY right (and I think the choice between 'application' and 'text' can be considered one of semantics), we should correct the protocol so that it doesn't break usage of the semantically correct 'text' MIME-type. HTTP could easily be changed to use UTF-8, and to default to it, without breaking very many implementations. Unicode is the way of the future anyway, so why let an encoding that takes the same space for the general type of content that is transferred, but is inferior in what characters it can carry, take precedence simply because 'that's the way we have always done it'?
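
To make the encoding mismatch in the last point concrete, here is a minimal sketch of which charset a receiver would assume when the Content-Type header carries no charset parameter. The function name `default_charset` is made up for illustration. The rules are my reading of HTTP/1.1 (RFC 2616) and RFC 3023, and applying the text/xml rule to every text/*+xml type (such as the hypothetical 'text/xhtml+xml') is an assumption.

```python
def default_charset(mime_type, charset_param=None):
    """Return the charset a receiver would assume for a response.

    None means "look inside the document" (byte order mark or XML
    encoding declaration, with UTF-8 as the XML fallback).
    Hypothetical helper, not taken from any library.
    """
    if charset_param:
        # An explicit charset parameter always wins.
        return charset_param.lower()

    mime_type = mime_type.strip().lower()
    top_level = mime_type.split("/", 1)[0]
    is_xml = mime_type.endswith("/xml") or mime_type.endswith("+xml")

    if top_level == "text" and is_xml:
        # RFC 3023: text/xml without a charset parameter is us-ascii.
        # Extending this to text/*+xml is an assumption of this sketch.
        return "us-ascii"
    if top_level == "text":
        # RFC 2616: text/* received over HTTP defaults to ISO-8859-1.
        return "iso-8859-1"
    # application/xml and application/*+xml defer to the XML rules;
    # other types have no protocol-level default in this sketch.
    return None


for mime in ("text/html", "text/xml", "application/xhtml+xml"):
    print(mime, "->", default_charset(mime))
```

Under these assumptions, only the 'application' variant leaves the decision to the document itself, which is the conflict the point above describes.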

Now, what are your thoughts about this? Am I wrong in this? If so, why? Do you have any arguments that I may have overlooked?

Comments

1. May 8, 2003 06:08 PM

Tony Posted…

I dunno. I'm all torn this way and that. I've talked myself back and forth on this issue every which way. I myself don't have a burning desire to set mime-types sent to the browser. I don't care if it's super-easy. I just don't want to. I REALLY don't want to if I have to specialize it according to the browser...I do enough browser-specific crap nowadays. I think it's a browser problem. I think that the browser should recognize BY THE DOCTYPE what the hell to do with the document it just downloaded. I agree completely with the application argument. Most of the data sent isn't for an application to be rendered. It's text to be parsed by the browser based upon the latest and greatest rulesets by the W3C. This topic makes my head spin round and round, like a record baby...

2. May 9, 2003 11:03 AM

paul Posted…

I'm not exactly following what you're saying, but I think I catch your drift. I looked into this a little while back now and I don't think I agree with what you're saying.

One point you make is that the "application" prefix to a MIME type should only be used for data that isn’t human readable. You also point out that XHTML is a format for conveying content rich documents and so should fall back to text if an XHTML parser is not available by using the "text" prefix to a MIME type. I disagree with this logic.

IIRC, the exact same reasoning you list above was applied when HTML was released: even though the format was meant to be parsed by a program, with the user never seeing the source, it was considered human readable in its plain form. When this was unleashed on the internet, many people using mail readers that could not parse HTML found the format inconvenient to read as plain text. I think it is now thought that HTML should have been sent using an "application" prefix.

On another note, this is the reason CSS had to be released with a "text" prefix to its MIME type as it was to be included inline with HTML documents as well as in documents of its own. Now, CSS is clearly not human readable, nor does it make sense for a human to read it.

Now, aside from this, I agree with XHTML having an "application" prefix to its MIME type, and I don't think this format is generally human readable. If I were sent an XHTML document in an email and I didn't have a parser available, in Mutt, for example, I would not choose to read the source, nor would I find the source easy reading.

3. May 9, 2003 10:10 PM

Matt Posted…

One advantage of an application/* MIME type is that it keeps intermediate servers from transcoding the content's character encoding. application/xhtml+xml is also already registered, in RFC 3236. Don't browser sniff; check the HTTP Accept header instead, it'll tell you if the browser can handle it.
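
A rough sketch of what checking the Accept header could look like on the server side. The names `parse_accept` and `choose_type` are invented for this example, the parsing is simplified (it ignores parameters other than q), and not counting wildcards such as `*/*` as XHTML support is a deliberate assumption, since a browser can send `*/*` without being able to render XHTML.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(header):
    """Parse an Accept header into {media_range: q}. Simplified sketch."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_type(accept_header):
    """Pick the Content-Type to serve for an XHTML document."""
    prefs = parse_accept(accept_header or "")
    # Assumption: only an explicit listing counts as XHTML support.
    xhtml_q = prefs.get(XHTML, 0.0)
    html_q = prefs.get(HTML, prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return XHTML
    return HTML


print(choose_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_type("*/*"))
```

A server taking this approach would normally also send `Vary: Accept`, so that caches don't hand the XHTML variant to a client that only asked for HTML.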

4. May 13, 2003 04:44 PM

Doug Posted…

I'm on board with Tony on this one:

I myself don’t have a burning desire to set mime-types sent to the browser. I don’t care if it’s super-easy. I just don’t want to. I REALLY don’t want to if I have to specialize it according to the browser…I do enough browser-specific crap nowadays.

I think it’s a browser problem. I think that the browser should recognize BY THE DOCTYPE what the hell to do with the document it just downloaded.

To me, this MIME-type business seems like a stupid hoop to jump through, and it will do nothing to encourage designers to adopt standards.

But it seems like the W3C is beginning to wander off into la la land, what with the strange formulation of XHTML 2 and all, so this MIME-type business doesn't surprise me too much. As much as I appreciate standards, it seems like some of these recommendations from the W3C are just a lot of ultra-geeky masturbating (i.e. recommendations/standards for their own sake, vs. actually serving some sort of pragmatic purpose).

5. May 13, 2003 06:11 PM

Liorean Posted…

I don't agree (of course). Mostly because the content type is intended to be the thing that tells the user agent (= browser today, but MIME and HTTP aren't intended for browsers specifically) what type of document it has been sent. The browser wouldn't be able to differentiate an XML document from an SGML document from a JavaScript file from a style sheet from a textual database (you see what I'm getting at) if there weren't something that told the browser what it's supposed to handle the received document as. Then, within the confines of that, the type can contain sub-types, and that's what the DTD provides.

The content type does the same for the net as file extensions do for Windows. A .doc document can be Word6.0 or Word2k; you can't tell from the extension. It's the same with content types. This is in fact one of the most crucial features of HTTP for getting the internet to work as it should.

What I'm talking about here isn't really related to that, though. As long as we have one content type that is coupled with the handling we want, that's what we should send the file as. BUT, the choice of what to call that type is what I'm talking about. This is akin to using .txt, .asc or .text for text files. Any of the extensions is as good as the others; the question I'm addressing is which of them is the most appropriate to couple to the handling they are all associated with.

I think the XHTML file format is quite readable. I wouldn't want to read it if I can avoid it, but I prefer reading it as text to not reading it at all.

Another thing: the point of the supertypes is to know what handler to send a document to if no handler specialised in that content type exists. I think sending it to a 'text' handler is more appropriate than not being able to open it in any handler at all, as with 'application'. The choice of displaying it or doing something else lies with the handler for the general 'text' type, not with the program shuffling it there.

In other words, if a mail program doesn't support 'text/html', it can choose NOT to render it, but instead tell you to get a program that can handle that content type. Or, if it's simply not able to handle HTML but knows how to convert it to pure text, it can strip it of tags and display the text alone.
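
The fallback described above could be sketched roughly like this: look for a handler for the exact content type first, then for one registered for the top-level type alone. Everything here (`HANDLERS`, `dispatch`, `strip_tags`, `show_plain`) is a made-up illustration of the idea, not how any particular mail program works.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup):
    """Convert markup to pure text by discarding the tags."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def show_plain(body):
    """Generic text handler: show the source as it is."""
    return body


# Hypothetical handler table for a text-only mail reader.
HANDLERS = {
    "text/html": strip_tags,  # specific handler
    "text": show_plain,       # general handler for any other text/*
}


def dispatch(content_type, body, handlers=HANDLERS):
    """Try the exact type, then the top-level type, else give up (None)."""
    mime = content_type.split(";", 1)[0].strip().lower()
    top_level = mime.split("/", 1)[0]
    handler = handlers.get(mime) or handlers.get(top_level)
    if handler is None:
        return None  # caller can tell the user to get a suitable program
    return handler(body)


print(dispatch("text/html", "<p>Hello <b>world</b></p>"))
print(dispatch("text/xhtml+xml", "<p>Hi</p>"))
print(dispatch("application/xhtml+xml", "<p>Hi</p>"))
```

With this table, a hypothetical 'text/xhtml+xml' document still reaches the general text handler, while 'application/xhtml+xml' reaches no handler at all, which is the difference I'm arguing about.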

6. January 28, 2004 03:15 AM

Zooplah Posted…

I'd just like to note that the originally proposed MIME type for XHTML was text/xhtml, but it was taken to the IETF for normalization and ultimately became application/xhtml+xml (which we'll just have to live with). Furthermore, this is all compliant with the RFCs: text/xml is plain XML text (you don't get namespaces or things like that), while application/xml is for applications of XML.