Hacker News

Why is he using lxml as the parser and not just the one built into BS?
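For context: BeautifulSoup 4 takes the parser name as its second argument (e.g. `"lxml"` or `"html.parser"`). Its "built-in" option, `"html.parser"`, wraps the `html.parser` module from Python's standard library. The sketch below uses that stdlib module directly to collect link targets, so it runs without bs4 or lxml installed. `LinkCollector` and `collect_links` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this;
        # attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup):
    """Feed markup through the stdlib parser and return every href found."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Unclosed <p> and <a> tags: the event-based parser does not fail on them
    broken = '<p>See <a href="/one">one<p>and <A HREF="/two">two'
    print(collect_links(broken))  # ['/one', '/two']
```

The stdlib parser is pure Python and event-based, which is one reason people reach for lxml when speed matters.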


lxml is superior to BS. lxml implements most of the ElementTree API, and it also works as a parser backend for BS. I'm not sure why he's using BS at all when everything is built into lxml, and things like PyQuery and XPath queries are available.
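The ElementTree API referred to here comes from the standard library's `xml.etree.ElementTree`; `lxml.etree` exposes the same core calls (`fromstring`, `findall`, `get`, `.text`) and adds a full XPath engine through `.xpath()`. The sketch below uses the stdlib module so it runs without lxml. Swapping the import for `from lxml import etree as ET` should work for these particular calls, but treat that as an assumption to check against the lxml docs. The input has to be well-formed XML for this parser; messy real-world HTML would go through `lxml.html` instead. `post_links` and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = """<html>
  <body>
    <div class="post"><a href="/item?id=1">First</a></div>
    <div class="post"><a href="/item?id=2">Second</a></div>
    <div class="ad"><a href="/sponsor">Ad</a></div>
  </body>
</html>"""


def post_links(markup):
    """Return (text, href) pairs for links inside div.post (hypothetical helper)."""
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset: descendant search (.//)
    # and attribute-equality predicates ([@class='post']) are both allowed.
    return [
        (a.text, a.get("href"))
        for a in root.findall(".//div[@class='post']/a")
    ]


if __name__ == "__main__":
    print(post_links(XHTML))
    # [('First', '/item?id=1'), ('Second', '/item?id=2')]
```

With lxml, the same query could be written as a full XPath expression, which is the extra power the comment is pointing at.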


BeautifulSoup uses regular expressions! http://stackoverflow.com/questions/1732348/regex-match-open-...

As many here point out, lxml is a fast and versatile library that could handle this job on its own, without BS. lxml.html can parse HTML directly, and lxml can also use the HTML5 parser from html5lib, which handles broken HTML the way the HTML5 specification standardizes.


> BeautifulSoup uses regular expressions!

Holy hell, you're right.

http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view...

