BeautifulSoup

What is it good for?

Parsing HTML pages.

Beautiful Soup is much easier to use than html.parser, the HTML parser in Python's standard library.
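
To give a sense of the difference, here is a rough sketch of collecting list items with only the standard library's html.parser. The class name ListItemParser and the short HTML snippet are made up for illustration; the point is that you have to track the parser state by hand.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Hypothetical helper: collects the text of every <li> element."""

    def __init__(self):
        super().__init__()
        self.in_item = False  # True while the parser is inside an <li>
        self.items = []       # text of the <li> elements seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Text arrives in chunks, so append to the current item
        if self.in_item:
            self.items[-1] += data


parser = ListItemParser()
parser.feed("<ul><li>Hamlet</li><li>Ophelia</li></ul>")
parser.close()
print(parser.items)
```

With Beautiful Soup the same job is a single find_all('li') call, with no callbacks or state flags.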

Installed with Python by default

no

Installed with Anaconda

yes (the full Anaconda distribution includes beautifulsoup4; Miniconda does not)

How to install it?

  pip install beautifulsoup4

Example

Parsing list items out of an HTML document:

    from bs4 import BeautifulSoup

    html = """<html><head></head><body>
    <h1>Hamlet</h1>
    <ul class="cast">
    <li>Hamlet</li>
    <li>Polonius</li>
    <li>Ophelia</li>
    <li>Claudius</li>
    </ul>
    </body></html>"""

    # "lxml" needs the separate lxml package (pip install lxml);
    # the built-in "html.parser" also works for this document.
    soup = BeautifulSoup(html, "lxml")

    for ul in soup.find_all('ul'):
        # 'class' can hold several values, so get() returns a list
        if "cast" in ul.get('class', []):
            for item in ul.find_all('li'):
                print(item.get_text(), end=", ")
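
The nested loops can also be written as a CSS selector. The following is a minimal sketch, assuming a shortened version of the cast list plus a second, made-up "crew" list to show that only the matching list is picked up. It uses the built-in "html.parser" so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative document; the "crew" list is invented for contrast
html = """<ul class="cast"><li>Hamlet</li><li>Polonius</li></ul>
<ul class="crew"><li>Director</li></ul>"""

soup = BeautifulSoup(html, "html.parser")

# "ul.cast li" matches every <li> inside a <ul> that has the class "cast"
names = [li.get_text() for li in soup.select("ul.cast li")]

# join() avoids the trailing separator that print(..., end=", ") leaves behind
print(", ".join(names))
```

This prints "Hamlet, Polonius". Whether to use select() or find_all() is mostly a matter of taste; selectors tend to be shorter once the conditions involve classes or nesting.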

Where to learn more?

http://www.crummy.com/software/BeautifulSoup/bs4/doc/