
Entries from October 1, 2006 - October 31, 2006

Monday, October 16, 2006

Finding all the A HREF URLs in an HTML document (even in malformed HTML)

The Need

Given an HTML file, you want to extract all the HREF URLs.

 

You could use a Regex

I've done this before, but haven't found it entirely reliable. I also use regexes so infrequently that relearning the syntax every time is painful.
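For comparison, here is a minimal sketch of the regex approach in C#. The pattern, the class name HrefRegex and the helper ExtractHrefs are hypothetical illustrations, not code from this post. The pattern only recognizes quoted href values on a tags, so an unquoted value such as <a href=c.htm> is silently skipped, and it will still match anchors sitting inside HTML comments or script strings. Those gaps are typical of why a regex is hard to trust here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefRegex
{
    // Hypothetical pattern: matches href="..." or href='...' inside an <a ...> tag.
    // Group 1 captures the opening quote, group 2 the URL, and \1 requires the same closing quote.
    // It ignores unquoted values and knows nothing about comments, scripts, or CDATA.
    private static readonly Regex AnchorHref = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the href value of every anchor the pattern matches.
    public static List<string> ExtractHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            urls.Add(m.Groups[2].Value);
        }
        return urls;
    }

    public static void Main()
    {
        string html = "<p><a href=\"a.htm\">A</a> <A HREF='b.htm'>B</a> <a name=x>C</a></p>";
        foreach (string url in ExtractHrefs(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Run as-is, Main prints a.htm and b.htm; the third anchor has no href attribute, so the pattern does not match it.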

 

What I recommend

Use the HTML Agility Pack. If you are familiar with the XML DOM, using the HTML Agility Pack will come naturally.

 

HTML Agility Pack URL

http://www.codeplex.com/Wiki/View.aspx?ProjectName=htmlagilitypack

 

Two things that make the HTML Agility Pack interesting

- It doesn't depend on Internet Explorer

- It works on malformed HTML. For a little context, see this post: .NET Html Agility Pack: How to use malformed HTML just like it was well-formed XML

 

Sample code

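// A minimal sample, but enough to see the value of using the HTML Agility Pack.
// Assumes the HtmlAgilityPack library is referenced and foo.htm exists.
using System;
using HtmlAgilityPack;

HtmlDocument input_doc = new HtmlDocument();
input_doc.Load("foo.htm");

// "//a[@href]" selects only anchors that actually carry an href attribute.
// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection anchors = input_doc.DocumentNode.SelectNodes("//a[@href]");
if (anchors != null)
{
    foreach (HtmlNode node in anchors)
    {
        string href_url = node.GetAttributeValue("href", "");
        Console.WriteLine(href_url);
    }
}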

Friday, October 13, 2006

It's Helvetica's world, we just live in it.

A film about a typeface to be released in 2007: http://helveticafilm.com/

Microsoft uses a typeface called "Arial" which, I discovered only recently, is NOT the same as the "Helvetica" font found on Macs. The differences are subtle to my untrained eye, but people with a background in typography don't take the discrepancies lightly.

 

On the battle between Helvetica and Arial

Helvetica vs. Arial: http://www.engagestudio.com/helvetica/

How to spot Arial: http://www.ms-studio.com/articlesarialsid.html

Arial or Helvetica?: http://www.iliveonyourvisits.com/helvetica/

 

Wikipedia's article on Helvetica

http://en.wikipedia.org/wiki/Helvetica

 

The real evil: Comic Sans

Arial's existence is tolerated by those in the know. Comic Sans is hated. That is a shame, because its designer, Vincent Connare, also designed one of my favorite fonts: Trebuchet. Read what Vincent says about Comic Sans.

Trivia: I first knew Vincent as a teammate on an ice hockey team. I remember him as a skilled player and an excellent sportsman. Only later did I discover his contributions to Microsoft's typography.

Wikipedia's article on Comic Sans: http://en.wikipedia.org/wiki/Comic_Sans