lxml is superior to BeautifulSoup (BS). lxml implements most of the ElementTree API, and BS can itself use lxml as its parser backend, so the two are compatible. I'm not sure why he's using BS, though, when everything is built into lxml and things like XPath queries, or PyQuery on top of it, are available.
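To illustrate the ElementTree-style API and the XPath support mentioned above, here is a minimal sketch. The sample document and the function names (`titles_elementtree`, `titles_xpath`) are made up for illustration; only `etree.fromstring`, `findall`, `findtext`, and `xpath` come from lxml.

```python
from lxml import etree

# Hypothetical sample document, not taken from the original discussion.
SAMPLE_XML = (
    "<feed>"
    "<entry id='1'><title>First</title></entry>"
    "<entry id='2'><title>Second</title></entry>"
    "</feed>"
)


def titles_elementtree(xml_text):
    """Collect titles using the ElementTree-compatible calls."""
    root = etree.fromstring(xml_text)
    return [entry.findtext("title") for entry in root.findall("entry")]


def titles_xpath(xml_text):
    """Collect the same titles with a single XPath expression."""
    root = etree.fromstring(xml_text)
    # xpath() returns str subclasses for text() results; convert to plain str.
    return [str(text) for text in root.xpath("//entry/title/text()")]


if __name__ == "__main__":
    print(titles_elementtree(SAMPLE_XML))
    print(titles_xpath(SAMPLE_XML))
```

Both functions walk the same tree; the XPath version just pushes the traversal into one expression, which tends to pay off as queries get more complex.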
As many here have pointed out, lxml is a fast and versatile library that could handle this on its own, without BS. lxml.html parses HTML directly, and lxml can also use the HTML5 parser from html5lib (through lxml.html.html5parser), which handles broken HTML the way the HTML5 standard specifies.
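As a rough sketch of what parsing with lxml.html alone might look like, the snippet below feeds it some deliberately sloppy markup and pulls out the links. The HTML string and the `extract_links` helper are assumptions for illustration, not anything from the original post.

```python
import lxml.html

# Hypothetical markup with unclosed <p>, <li>, and <div> tags.
BROKEN_HTML = (
    "<div><p>Unclosed paragraph"
    "<ul><li><a href='/a'>A</a><li><a href='/b'>B</a></ul>"
)


def extract_links(html_text):
    """Return (link text, href) pairs found in possibly broken HTML."""
    # lxml.html uses libxml2's lenient HTML parser, which repairs the tree.
    doc = lxml.html.fromstring(html_text)
    return [(a.text_content(), a.get("href")) for a in doc.xpath("//a[@href]")]


if __name__ == "__main__":
    for text, href in extract_links(BROKEN_HTML):
        print(text, href)
```

If html5lib is installed, `lxml.html.html5parser.fromstring` can be used for parsing instead to get HTML5-style error recovery. By default it puts elements in the XHTML namespace, so the XPath query would need adjusting.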