Why is he using lxml as the parser and not just the one built into BS?

Ixiaus · on March 10, 2013

lxml is superior to BS. lxml also implements most of the ElementTree API, so it's compatible with BS. I'm not sure why he's using BS, though, when everything is built into lxml and things like PyQuery and XPath queries are available.
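To illustrate the ElementTree point, here is a minimal sketch, not taken from the article under discussion. The markup in `DOC` and the helper `extract_links` are made up for the example. It uses only calls that exist in both lxml's `etree` and the standard library's `xml.etree.ElementTree`, so the same code runs on whichever one is importable. It assumes well-formed markup.

```python
# lxml implements most of the ElementTree API, so code written against that
# API can run on either implementation.
try:
    from lxml import etree  # third-party, used if installed
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback

# Hypothetical, well-formed sample markup (an assumption for this sketch).
DOC = "<html><body><a href='/one'>One</a><p><a href='/two'>Two</a></p></body></html>"


def extract_links(markup):
    """Return (href, text) pairs for every <a> element, in document order."""
    root = etree.fromstring(markup)
    # Element.iter() walks the whole subtree and filters by tag name.
    return [(a.get("href"), a.text) for a in root.iter("a")]


links = extract_links(DOC)
print(links)
```

The try/except import is the usual way to prefer lxml when it is present while keeping a fallback. Real-world HTML is rarely well-formed, so this strict XML parsing is only meant to show the shared API.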

ville · on March 10, 2013

BeautifulSoup uses regular expressions! http://stackoverflow.com/questions/1732348/regex-match-open-...

As many here point out, lxml is a fast and versatile library that could be used for this on its own, without BS. lxml.html can parse HTML, and lxml also supports using the HTML5 parser from html5lib, which deals with broken HTML in the standardized way.
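For anyone who wants to try lxml alone, a rough sketch of parsing sloppy HTML with `lxml.html` and querying it with XPath might look like this. The sample markup and the names `PAGE` and `hrefs` are invented for the example, and it requires lxml to be installed.

```python
import lxml.html

# Hypothetical sample with unclosed <p> tags, to show the HTML parser's recovery.
PAGE = "<div><p>First <a href='/a'>A</a><p>Second <a href='/b'>B</a></div>"

# fromstring() parses a document or fragment with lxml's lenient HTML parser.
doc = lxml.html.fromstring(PAGE)

# XPath query relative to the parsed element: every href on an <a> below it.
hrefs = doc.xpath(".//a/@href")
print(hrefs)
```

This uses lxml's default HTML parser, not the html5lib-based one mentioned above. The html5lib route lives in `lxml.html.html5parser` and needs html5lib installed as well.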

kanzure · on March 11, 2013

> BeautifulSoup uses regular expressions!

Holy hell, you're right.

http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view...