Ah! I think maybe we do have an ambiguity in "parse".
I think I interpreted "parse html" to mean "parse an html document as html", i.e. inherently involving parsing the salient features of html, notably nested tags. But you equally validly interpreted "parse html" in a more general way, as "parse an html document in a structured way to get data out of it in a useful form".
Read the first way, "parse html" is inherently problematic: even if it works in some cases, "works" implies "generate some sort of data representation of the structure of the document", which is exactly what you can't do. Read the second way, "parse html" includes things like "scraping useful information out of it", which is 100% valid in a reading-a-clock way.
Date: 2012-11-20 04:59 pm (UTC)
Does that sound right?
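To make the two readings concrete, here's a rough Python sketch. It assumes the tool we've been arguing about is a regular expression, which I haven't actually spelled out in this comment, and the sample markup and the names (`SAMPLE`, `scrape_hrefs`, `DepthTracker`, `max_nesting`) are all made up for illustration. The first function is the "scraping" reading: it pulls data out without knowing anything about nesting. The second is a small taste of the "as html" reading: it has to follow the open/close structure, so it leans on the standard library's `html.parser` rather than a pattern.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    '<div><p>See <a href="/a">A</a> and <a href="/b">B</a>.</p>'
    '<div><p>nested</p></div></div>'
)


def scrape_hrefs(html):
    """Second reading: fish out useful data, ignoring document structure.

    Assumes double-quoted href attributes; anything else is silently missed.
    """
    return re.findall(r'<a\s+[^>]*href="([^"]*)"', html)


class DepthTracker(HTMLParser):
    """First reading (in miniature): follow the nesting of tags.

    Assumes every start tag has a matching end tag, so void elements
    such as <br> would throw the count off.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        self.depth -= 1


def max_nesting(html):
    """Return the deepest level of tag nesting seen in the document."""
    tracker = DepthTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.max_depth


print(scrape_hrefs(SAMPLE))
print(max_nesting(SAMPLE))
```

The regex never needs to know how deep it is, which is why the scraping reading is fine; the depth count needs a running tally of opens and closes, which is the part a single pattern isn't suited to.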