About Jamie on Software

Jamie on Software is the online journal of web developer and writer Jamie Rumbelow.

Jamie likes books, guitars, programming, open source and food. He writes about these things too. This is where he puts the things he writes.

Monday, July 25, 2011

Syntax Sugar #3 - Easily Parsing HTML

The PHP SimpleXML extension makes parsing and using XML documents in your code a piece of cake. Unfortunately, HTML rarely qualifies as well-formed XML.

Using SimpleXML combined with DOMDocument, we can parse a reasonably badly formatted HTML document in very few lines of code. The trick is to load the source with DOMDocument::loadHTML(), which parses it as dodgy HTML rather than strict XML, and to set the DOMDocument::$strictErrorChecking property to FALSE so that DOM errors are not thrown as exceptions.

<?php
// Some HTML string...
$html = file_get_contents("http://codeigniter.com");

// Create a new DOMDocument and set strictErrorChecking to FALSE
$dom = new DOMDocument();
$dom->strictErrorChecking = FALSE;

// Load the HTML into the DOMDocument
$dom->loadHTML($html);

// Load the DOMDocument into SimpleXML... and win!
$obj = simplexml_import_dom($dom);
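
Once the document is in SimpleXML, pulling data out of it is the easy part. Here is a minimal sketch of what that might look like, assuming an inline HTML string rather than a remote page so it runs without a network connection. The sample markup and the $links array are made up for illustration, and the libxml_use_internal_errors() call is an addition of mine to keep the parser's warnings about sloppy markup out of the output.

```php
<?php
// Hypothetical sample markup: unclosed <li> and <p> tags, no <html> or <body>.
$html = '<h1>Links</h1><ul><li><a href="/docs">Docs</a><li><a href="/forum">Forum</a></ul><p>Unclosed paragraph';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

// Same approach as above: DOMDocument does the forgiving HTML parsing.
$dom = new DOMDocument();
$dom->strictErrorChecking = false;
$dom->loadHTML($html);

// Discard the collected warnings and restore the previous error handling.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Hand the parsed tree over to SimpleXML.
$obj = simplexml_import_dom($dom);

// Query the tree with XPath and build a map of href => link text.
$links = [];
foreach ($obj->xpath('//a') as $a) {
    $links[(string) $a['href']] = trim((string) $a);
}

print_r($links);
```

Casting to string is what turns a SimpleXML attribute or element into plain text, so it is worth doing before using the values as array keys.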
