Python Web Content Extraction

1. html2text

Convert HTML to Markdown-formatted text.

2. lassie

Web Content Retrieval for Humans.

3. micawber

A small library for extracting rich content from URLs.

4. newspaper

News extraction, article extraction and content curation in Python.

5. python-readability

Fast Python port of arc90's readability tool.

6. requests-html

Pythonic HTML Parsing for Humans.

7. sumy

A module for automatic summarization of text documents and HTML pages.

8. textract

Extract text from any document: Word, PowerPoint, PDF, and more.

9. toapi

Every web site provides APIs.
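
For orientation, here is a minimal sketch of the basic task these libraries address: pulling readable text out of an HTML page. It deliberately uses only the standard library's html.parser and none of the packages listed above, so it does not reflect any of their APIs. The names TextExtractor and extract_text are hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents.

    Hypothetical helper for illustration; not part of any listed library.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []     # non-empty text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style and not pure whitespace.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<h1>Hello</h1><style>p{}</style><p>World</p>"
    print(extract_text(sample))
```

A sketch like this ignores encodings, boilerplate removal, article detection, and non-HTML formats, which is where the dedicated libraries above come in.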