
Xhtml

Discussion in 'Webdesign / HTML' started by Meksilon, Apr 25, 2012.

  1. Meksilon

    Now that I'm nearing completion of a pure XHTML website, I thought I'd talk about it for a moment.

    Most websites do not use real XHTML. They simply use HTML with an XHTML doctype.

    What's the difference?

    Well here's the normal HTML doctype we all know:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

    Okay. Here's the XHTML doctype:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    It makes no difference which version of the doctype you use.

    XHTML is NOT HTML; it is a recoding of HTML into XML. And XML documents should have the XML declaration right at the top (XML 1.0 lets you omit it for UTF-8, but it is recommended):

    <?xml version='1.0' encoding='utf-8'?>

    Of course, this now means that it can't be read by HTML browsers, only XML applications (after all that's what XML is for anyway). When serving it from a webserver, the MIME type is application/xhtml+xml.

    The problem is that people use XHTML when they should be using HTML: they simply remove the XML declaration and serve the document as regular HTML using the text/html MIME type. But of course this means that it can only be parsed as an HTML document, not an XML document! So there are no benefits over using the HTML doctype, and there are disadvantages. For instance, browsers may be triggered into quirks mode to parse it, since as an HTML document it's missing an HTML doctype.

    Even browsers that choose to parse it as XML, overriding the doctype and ignoring the missing XML declaration, will usually fail to load it and revert to loading it through their HTML parser. This is because XML is not allowed to contain any errors: even a single well-formedness error usually means that the XML parser fails to load the document.
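
    To illustrate how unforgiving an XML parser is, here is a minimal sketch using PHP's DOM extension (assumed to be installed). The helper name is_well_formed_xml is made up for this example; it only checks well-formedness and does not validate against the XHTML DTD.

```php
<?php
// Hypothetical helper: true when $markup parses as well-formed XML.
function is_well_formed_xml($markup)
{
    if ($markup === '') {
        return false; // an empty string is not a document
    }
    // Collect libxml errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

var_dump(is_well_formed_xml('<p>Hello<br />world</p>')); // bool(true)
var_dump(is_well_formed_xml('<p>Hello<br>world</p>'));   // bool(false)
```

    The unclosed <br> in the second call is something an HTML parser would shrug off, but it is enough to make the XML parser give up.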

    So in essence, my design serves the document as XML with the correct MIME type; browsers that don't like XML get the XML declaration and XHTML doctype removed and replaced with a regular HTML doctype, served as text/html.

    Oh, and an easy way to test whether your document is really being read as XML is to self-close a script tag within the <head> tags, like so: <script type="text/javascript" />

    Any browser interpreting it as HTML won't see a closing script tag and won't display any of your webpage content.
     
    Last edited: Apr 25, 2012
  2. Ben

    I'll add that there's no reason to use Transitional unless you want to use deprecated tags like <center> and <font...> or deprecated attributes like bgcolor ... stick with Strict and use span tags or something ... no formatting should be in the XHTML document itself, but in a separate CSS file.

    just FYI
     
    Last edited: Apr 27, 2012
  3. sander k

    Is this going to be better for search engines and will all browsers be able to read the website?
     
  4. Meksilon

    Yes. Here's a slightly simplified version of the script I wrote to take care of it. Basically, it automatically detects whether to send the page as HTML or XHTML, and it can be overridden by the query string (for testing, etc.).
    PHP:
    <?php
    // A query string of "html" or "xhtml" forces the output format (for testing).
    $qstring = stripslashes($_SERVER['QUERY_STRING']);
    $usehtml = ($qstring == 'html');
    $usexhtml = ($qstring == 'xhtml');
    // Keep search engines away from the forced-format test URLs.
    if ($qstring !== '')
      $robots = '<meta name="robots" content="noindex,nofollow,noarchive" />';
    else
      $robots = '<meta name="robots" content="index,follow,noarchive" />';
    if (!$usehtml && !$usexhtml) {
      // No override given: decide from the browser's Accept header.
      $accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
      $usexhtml = (stripos($accept, 'application/xhtml+xml') !== false);
      // Compare q-values only when text/html carries an explicit q below 1.
      if ($usexhtml && preg_match("/text\/html;q=0(\.[1-9]+)/i", $accept, $qtest2)) {
        if (preg_match("/application\/xhtml\+xml;q=0(\.[1-9]+)/i", $accept, $qtest1))
          $usexhtml = ($qtest1[1] >= $qtest2[1]);
        else
          $usexhtml = true; // no matching q on XHTML: treat it as the default q=1
      }
    }
    if ($usexhtml) {
      $conttype = 'application/xhtml+xml';
      $dtype = "<?xml version='1.0' encoding='utf-8'?" . '>' . chr(10)
             . '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"' . chr(10)
             . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">' . chr(10);
      $htmltg = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">' . chr(10);
    }
    else {
      $usehtml = true;
      $conttype = 'text/html';
      $dtype = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' . chr(10);
      $htmltg = '<html lang="en">' . chr(10);
    }
    header('Content-type: ' . $conttype);
    header('Cache-Control: must-revalidate');
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');
    echo $dtype . $htmltg;
    ?>
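
    The regular expressions in the script above only match q-values written exactly as ;q=0.x, with no space after the semicolon. As a rough sketch of a more tolerant approach, the header can be split into media ranges and each q-value read as a number. The function names accept_quality and prefers_xhtml are hypothetical, and wildcard ranges such as */* are deliberately ignored, so XHTML is only sent to clients that list it explicitly.

```php
<?php
// Hypothetical helper: the q-value a client gives $type in an Accept header,
// or null when the type is not listed. Wildcards (*/*) are ignored on purpose.
function accept_quality($accept, $type)
{
    foreach (explode(',', $accept) as $range) {
        $params = array_map('trim', explode(';', $range));
        $name = strtolower(array_shift($params));
        if ($name !== strtolower($type)) {
            continue;
        }
        $q = 1.0; // a media range without a q parameter defaults to 1
        foreach ($params as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        return $q;
    }
    return null;
}

// Hypothetical helper: true when XHTML is explicitly accepted and ranked
// at least as high as text/html.
function prefers_xhtml($accept)
{
    $xhtml = accept_quality($accept, 'application/xhtml+xml');
    if ($xhtml === null || $xhtml <= 0.0) {
        return false;
    }
    $html = accept_quality($accept, 'text/html');
    return $html === null || $xhtml >= $html;
}
```

    With this, a header of text/html,*/*;q=0.8 falls back to HTML, while application/xhtml+xml, text/html; q=0.9 gets XHTML.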
    Here's my website's first real page: Toshiba BDX3200KY Firmware. It does not yet implement the correct design of the website; it's just an example of the HTML/XHTML duality in action (try the W3 verification links at the bottom of the page). Also, only Mozilla actually highlights the XHTML source code correctly (if you view it). Chrome's and Opera's source viewers incorrectly stop highlighting at the script tag, as if the rest of the page were JavaScript! Yet they parse the website correctly.
     
    Last edited: Apr 30, 2012
  5. TehGuy

    Your page is getting thrown into Quirks mode in IE7/8
     
  6. Meksilon

    True; however, any page that contains a single error in XHTML is automatically parsed as HTML rather than XML in browsers, quirks mode or not (test it for yourself). IE9 displays the XHTML fine and highlights the source code correctly (whereas Opera and Chrome do not, and Safari doesn't even highlight at all).
     
