How do I parse HTML using a Regex?
Nov. 20th, 2012 10:57 am
Bookmark organising
When I organised my bookmarks, I realised that most of them fell into the following categories:
- Stuff I use every day and want quickly accessible, e.g. email, social networking homepages, the Wikipedia page on the Unicode checkmark, etc.
- Something to read later, e.g. links that seem interesting, computer games, books and films to consider buying or renting, etc.
- Something to read periodically, e.g. news sites, social networking friends pages, feeds, blogs I follow, and many webcomics, divided into "daily", "bi/triweekly", etc.
- Something I may occasionally want as a reference, e.g. step-by-step instructions for things I only do occasionally.
- Stuff that's useless and doesn't update, but that I keep coming back to because it's so awesome, such as the world flag rating page (do not make your country's flag in Photoshop; tricolors are overused), the Evil Overlord List of movie-stereotypical mistakes I will not make if I'm ever an evil overlord, and the Earth destruction advisory board FAQ on non-dilettante ways to destroy the Earth.
The last category was a minor surprise to me, as I hadn't realised in advance that I'd need it. But I really do need it: even if I don't need those links, if I don't keep them somewhere, my mind keeps saying "don't forget the Earth destruction advisory board, what if it updates the Earth destruction status?[1]", so I need a place to put them, just to get them out of the way of all the other categories!
I do the same with physical objects too: if I want to keep something and it doesn't have a place, I make a place for things I keep for that reason, however stupid. Then, if I later decide it's stupid and I don't need to keep it, I can throw it out, having already separated it from the things I'm keeping for more useful reasons.
The reason I mention this now is that last night several of us were talking about an incredibly awesome answer on Stack Overflow. It has made the rounds several times recently, but some of the less programmer-y people hadn't seen it, and it's one of the most recent links promoted to my list of "stuff on the internet I personally find most awesome".
Link for khalinche and ceb from last night: how do I use a regex to detect certain sorts of tag in HTML text?
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454
There is a question on Stack Overflow asking how to use a regex to detect certain sorts of tag in HTML text, and the first answer (linked above) is a work of genius: the answerer gets more and more emphatic about his opinion, and it's really funny and accurate. Even if you don't know what the words mean, it's still funny, and you can get the gist of the answer if you scroll through slowly to the end. :)
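For the programmer-y readers who wonder what the question was actually after: here's a rough sketch, assuming Python 3 and its standard-library `html.parser` module, of picking out opening tags like `<p>` and `<a href="foo">` while skipping XHTML self-closing ones like `<br />`. It uses a parser rather than a regex, which as far as I can tell is the gist of the famous answer. The names `OpenTagCollector` and `open_tags` are just made up for illustration.

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Hypothetical helper: records opening tags, ignoring self-closing ones like <br />."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Called for ordinary opening tags such as <p> or <a href="foo">.
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <br /> or <hr class="foo" />.
        # The default implementation forwards to handle_starttag, so a
        # no-op override is what keeps them out of the list.
        pass


def open_tags(html_text):
    """Return the (lowercased) names of non-self-closing opening tags."""
    parser = OpenTagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.open_tags


print(open_tags('<p> <a href="foo"> <br /> <hr class="foo" />'))  # prints ['p', 'a']
```

The parser does the fiddly bits (quoted attributes, odd whitespace, uppercase tag names) that a hand-rolled regex tends to get wrong.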
Footnotes
[1] On 10 September 2008 it did, advancing the "Earth destruction advisory count" from 0 to 1. There is a supplementary FAQ on the event at http://qntm.org/board, starting with "The Earth hasn't been destroyed! What are you talking about?"