A blog about web programming experiences, techniques, and ideas for ASP, PHP, ASP.NET, and JavaScript.

Watch out for those “hidden characters” when loading XML into a DOMDocument.

March 26th, 2009 by Roy L. `dshiznitz Besiera

I think this is interesting enough to post. Earlier this morning a friend of mine was troubleshooting the loading and reading of an XML document from a remote web service. The problem: when he retrieved a node list using an XPath expression, the children of each node in the list did not return the object or data he expected. Okay, I know that sounds like “Uhhh what?”. To give you a clear idea of what is going on here, take a look at the examples below.

The XML:

<musicshake status="ok">
  <list_item>
    <genre_num>1</genre_num>
    <genre_title>Hip-Hop/Rap</genre_title>
  </list_item>
  <list_item>
    <genre_num>2</genre_num>
    <genre_title>RNB</genre_title>
  </list_item>
</musicshake>

The Code:

$dom = new DOMDocument();
$dom->loadXML($oOutput); // $oOutput holds the XML string returned by the web service
$xpath = new DOMXPath($dom);
$results = $xpath->query("//musicshake/list_item");
$val = "<ul>";
foreach ($results as $result) {
  $val .= '<li><a href="#" onclick="listSongClick(\'theme\',\'';
  $val .= $result->childNodes->item(0)->nodeValue . '\')">';
  $val .= $result->childNodes->item(2)->nodeValue . '</a></li>';
}
$val .= "</ul>";
echo $val;

If you take a closer look at the code and analyze it a bit, you will notice these two lines.

$result->childNodes->item(0)->nodeValue
$result->childNodes->item(2)->nodeValue

This is where the problem shows up. The list_item tag contains only 2 children: the tags <genre_num> and <genre_title>.

<list_item>
<genre_num>1</genre_num>
<genre_title>Hip-Hop/Rap</genre_title>
</list_item>

But notice which indexes the code passes to childNodes->item(): 0 and 2. For index 2 to exist, the DOMNodeList of list_item children must hold at least 3 nodes. But how come? I am only looking for 2 children, not 3 or more.

So I started looking into it. The XML structure from the remote web service can be compared to a Trojan horse. If you’ve seen the movie Troy, you know what I am referring to. But for the benefit of everyone who has not seen the movie, please follow this link: “Trojan Horse”. The analogy might not fit very well, but you know what I mean. It looks harmless from the outside, but dangerous on the inside (lol). OK, enough of this nonsense analogy :p

To actually see what is going on, I did some testing via echo. BTW, this is in PHP ^^. I know echo should not be used as a debugging tool, but please spare me this time; I was not able to set up my IDE for debugging.

Rewriting the code to:

$children = $dom->getElementsByTagName("list_item");
echo $children->length; // prints 2, which is fine: I have two list_item elements in the document.

$grandchildren = $children->item(0)->childNodes;
echo $grandchildren->length; // prints 5, so I'm like, What the Fudge??

It should return 2, not 5. I am only expecting the tags <genre_num> and <genre_title>. Further testing…

for ($i = 0; $i < $grandchildren->length; $i++) {
  echo 'Name: ', $grandchildren->item($i)->nodeName;
  echo ' Value: ', $grandchildren->item($i)->nodeValue, '<br/>';
}

Output:

Name: #text Value:
Name: genre_num Value: 1
Name: #text Value:
Name: genre_title Value: Hip-Hop/Rap
Name: #text Value:
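
The #text values in that output look empty because whatever they hold does not show up in a browser. A minimal sketch for making them visible, assuming the same sample XML as above indented with two spaces per level: json_encode() prints invisible characters as escapes such as \n and \t. The variable names $xml, $item and $report are made up for this illustration.

```php
<?php
// Assumption: a shortened copy of the sample XML from this post,
// indented with two spaces per level.
$xml = <<<XML
<musicshake status="ok">
  <list_item>
    <genre_num>1</genre_num>
    <genre_title>Hip-Hop/Rap</genre_title>
  </list_item>
</musicshake>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// First (and only) list_item element in the sample.
$item = $dom->getElementsByTagName('list_item')->item(0);

$report = [];
foreach ($item->childNodes as $child) {
    // json_encode() turns invisible characters into visible escapes (\n, \t, ...).
    // JSON_UNESCAPED_SLASHES keeps "Hip-Hop/Rap" readable.
    $report[] = $child->nodeName . ' => '
        . json_encode($child->nodeValue, JSON_UNESCAPED_SLASHES);
}

echo implode(PHP_EOL, $report), PHP_EOL;
```

With this sample, each #text line comes out as a newline escape followed by the indentation spaces, for example #text => "\n    ", while the two element lines show their real values.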

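There you have it. The extra #text nodes must be whitespace: spaces, tabs, or line breaks sitting between the tags. We only need to remove these characters from the string before passing it to DOMDocument::loadXML($oOutput) and we should be fine. Note that trim() alone is not enough, since it only strips the two ends of the string, and the whitespace in question sits between the tags.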
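
Cleaning the string by hand is one route. Two other techniques, neither of which is the approach described above, are sketched here: letting the parser drop whitespace-only nodes through DOMDocument's preserveWhiteSpace property, or asking for the children by name so that positions stop mattering. This is only a sketch: $oOutput is filled with the sample XML instead of a real web service response, and the names $clean, $raw, $byIndex and $byName are made up for this illustration.

```php
<?php
// Assumption: $oOutput stands in for the web service response from this post.
$oOutput = <<<XML
<musicshake status="ok">
  <list_item>
    <genre_num>1</genre_num>
    <genre_title>Hip-Hop/Rap</genre_title>
  </list_item>
  <list_item>
    <genre_num>2</genre_num>
    <genre_title>RNB</genre_title>
  </list_item>
</musicshake>
XML;

// Option 1: tell the parser to discard whitespace-only text nodes.
// The property must be set BEFORE loadXML() is called.
$clean = new DOMDocument();
$clean->preserveWhiteSpace = false;
$clean->loadXML($oOutput);

$byIndex = [];
foreach ($clean->getElementsByTagName('list_item') as $item) {
    // Only the two elements remain, so indexes 0 and 1 are what we want.
    $byIndex[] = [
        $item->childNodes->item(0)->nodeValue,
        $item->childNodes->item(1)->nodeValue,
    ];
}

// Option 2: keep the default parsing and look the children up by name.
$raw = new DOMDocument();
$raw->loadXML($oOutput);
$xpath = new DOMXPath($raw);

$byName = [];
foreach ($xpath->query('/musicshake/list_item') as $item) {
    // The second argument makes the expression relative to this list_item.
    $byName[] = [
        $xpath->evaluate('string(genre_num)', $item),
        $xpath->evaluate('string(genre_title)', $item),
    ];
}

var_dump($byIndex === $byName);
```

The name-based lookup is probably the more robust of the two, since it keeps working if the web service later adds another element or changes its indentation.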

Unfortunately my friend went offline before I was able to tell him what the problem was. So I hope he will be reading this post soon ^^

Posted in PHP Programming
