Machine-Learning Python Natural Language Processing


1. pkuseg-python

A Chinese word segmentation toolkit developed by Peking University, positioned as a more accurate alternative to Jieba.
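pkuseg is a trained statistical segmenter, so its internals are not shown here. For contrast, here is a minimal sketch of forward maximum matching, a classic dictionary-based baseline. This is a swapped-in technique, not pkuseg's or Jieba's algorithm, and the toy dictionary and `forward_max_match` helper are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit;
        # a single character is always accepted so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary chosen for illustration only.
toy_dictionary = {"我", "爱", "北京", "天安门"}
print(forward_max_match("我爱北京天安门", toy_dictionary))  # ['我', '爱', '北京', '天安门']
```

Greedy longest-match is fast but cannot resolve ambiguous splits, which is one motivation for statistical segmenters.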

2. NLTK

A leading platform for building Python programs to work with human language data.

3. Pattern

A web mining module for the Python programming language. It includes tools for natural language processing and machine learning, among others.

4. Quepy

A Python framework to transform natural language questions into queries in a database query language.
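Quepy's own API is not reproduced here. The following is a minimal sketch of the general idea, mapping a question onto a query through regex templates. The templates, the `question_to_query` helper, and the DBpedia-style SPARQL vocabulary (`rdfs:label`, `dbo:capital`, `dbo:author`) are assumptions chosen purely for illustration, and the entity is not escaped.

```python
import re

# Hypothetical templates: (question regex, SPARQL skeleton with an {entity} slot).
# Literal braces in the skeleton are doubled so str.format leaves them intact.
TEMPLATES = [
    (
        re.compile(r"^what is the capital of (?P<entity>[\w ]+?)\??$", re.IGNORECASE),
        'SELECT ?capital WHERE {{ ?country rdfs:label "{entity}"@en . '
        "?country dbo:capital ?capital . }}",
    ),
    (
        re.compile(r"^who wrote (?P<entity>[\w ]+?)\??$", re.IGNORECASE),
        'SELECT ?author WHERE {{ ?work rdfs:label "{entity}"@en . '
        "?work dbo:author ?author . }}",
    ),
]


def question_to_query(question):
    """Return a SPARQL string for the first matching template, else None."""
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return skeleton.format(**match.groupdict())
    return None


print(question_to_query("What is the capital of France?"))
# SELECT ?capital WHERE { ?country rdfs:label "France"@en . ?country dbo:capital ?capital . }
print(question_to_query("How tall is Everest?"))  # None
```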

5. TextBlob

Provides a consistent API for diving into common natural language processing (NLP) tasks. It stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.

8. spammy

A library for email spam filtering built on top of NLTK.
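spammy's internals are not shown here. As a sketch of one common way to filter spam, here is a multinomial naive Bayes classifier with add-one smoothing in plain Python. The `NaiveBayesSpamFilter` class and the training sentences are hypothetical, and the code assumes both classes receive at least one training document.

```python
import math
from collections import Counter


def words_of(text):
    """Very simple lowercase whitespace tokenizer."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(words_of(text))

    def predict(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(counts.values())
            for word in words_of(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("cheap money offer", "spam")
spam_filter.train("meeting at noon", "ham")
spam_filter.train("lunch with the team", "ham")
print(spam_filter.predict("win cheap money"))  # spam
print(spam_filter.predict("team meeting at noon"))  # ham
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied.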

9. genius

A Chinese word segmenter based on Conditional Random Fields.
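CRF-based segmenters commonly frame segmentation as labelling each character with its position in a word (B = begin, M = middle, E = end, S = single-character word) and then joining characters according to those labels. The sketch below shows only that decoding step. The tags are hand-written stand-ins for model output, and `bmes_to_words` is a hypothetical helper, not part of genius.

```python
def bmes_to_words(chars, tags):
    """Join characters into words using per-character B/M/E/S position tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], []
    for ch, tag in zip(chars, tags):
        if tag == "S":  # single-character word
            if current:  # flush an unfinished word left by malformed tags
                words.append("".join(current))
                current = []
            words.append(ch)
        elif tag == "B":  # beginning of a multi-character word
            if current:
                words.append("".join(current))
            current = [ch]
        elif tag == "M":  # middle of a word
            current.append(ch)
        elif tag == "E":  # end of a word
            current.append(ch)
            words.append("".join(current))
            current = []
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:  # tolerate a sequence that ends mid-word
        words.append("".join(current))
    return words


# Hand-written tags standing in for a model's output (hypothetical example).
print(bmes_to_words("我爱北京天安门", ["S", "S", "B", "E", "B", "M", "E"]))
# ['我', '爱', '北京', '天安门']
```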

10. KoNLPy

A Python package for Korean natural language processing.

11. Rosetta

Text processing tools and wrappers (e.g. Vowpal Wabbit).

12. Python Natural Language Processing Library

General purpose NLP library for Python. It also contains specific modules for parsing common NLP formats, most notably [FoLiA](https://proycon.github.io/folia/), but also ARPA language models, Moses phrase tables, and GIZA++ alignments.

13. PySS3

Python package that implements a novel white-box machine learning model for text classification, called SS3. Since SS3 can visually explain its rationale, this package also comes with easy-to-use interactive visualization tools ([online demos](http://tworld.io/ss3/)).

14. python-ucto

Python binding to ucto (a Unicode-aware, rule-based tokenizer for various languages).
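ucto's actual rule set is much richer and is configured per language. This is a minimal sketch of the rule-based idea, assuming a handful of hypothetical regex rules tried in order. Python's `re` treats `\w` as Unicode-aware for `str` patterns, so accented letters stay inside words.

```python
import re

# Hypothetical rules, tried in order at each position: the first alternative
# that matches wins.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+            # URLs
    | \d+(?:[.,]\d+)*       # numbers such as 3.5 or 1,000
    | \w+(?:[-']\w+)*       # words, including hyphenated words and contractions
    | [^\w\s]               # any other single non-space symbol (punctuation)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return the list of tokens matched by the ordered rules."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Café prices rose 3.5% in São Paulo, didn't they?"))
# ['Café', 'prices', 'rose', '3.5', '%', 'in', 'São', 'Paulo', ',', "didn't", 'they', '?']
```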

15. python-frog

Python binding to Frog, an NLP suite for Dutch that provides POS tagging, lemmatisation, dependency parsing, and NER.

16. python-zpar

Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech tagger, constituency parser, and dependency parser for English.

17. colibri-core

Python binding to a C++ library for extracting and working with basic linguistic constructions, such as n-grams and skipgrams, quickly and memory-efficiently.
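colibri-core's own data structures are far more memory-efficient than this. Here is a minimal pure-Python sketch of what n-grams and skipgrams are, assuming single-token gaps marked with a `{*}` placeholder. That is a simplification, since real skipgrams may contain longer or multiple gaps. The `ngrams` and `skipgrams` helpers are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, gap="{*}"):
    """n-grams with one interior token replaced by a gap marker, one position at a time."""
    result = []
    for gram in ngrams(tokens, n):
        for i in range(1, n - 1):  # keep the first and last tokens fixed
            result.append(gram[:i] + (gap,) + gram[i + 1:])
    return result


tokens = "to be or not to be".split()
print(Counter(ngrams(tokens, 2)).most_common(1))  # [(('to', 'be'), 2)]
print(skipgrams(tokens, 3)[0])  # ('to', '{*}', 'or')
```

Counting skipgrams with a `Counter` in the same way reveals recurring patterns with variable slots.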