Machine-Learning Python Natural Language Processing

1. pkuseg-python

A Chinese word segmentation toolkit developed by Peking University, presented as a more accurate alternative to jieba.

2. NLTK

A leading platform for building Python programs to work with human language data.

3. Pattern

A web mining module for the Python programming language. It includes tools for natural language processing and machine learning, among others.

4. Quepy

A Python framework to transform natural language questions into queries in a database query language.

5. TextBlob

Provides a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.

6. jieba

Chinese word segmentation utilities.

7. SnowNLP

A library for processing Chinese text.

8. spammy

A library for email spam filtering built on top of NLTK.

9. genius

A Chinese word segmenter based on conditional random fields.

10. KoNLPy

A Python package for Korean natural language processing.

11. Rosetta

Text processing tools and wrappers (e.g. Vowpal Wabbit).

12. PyNLPl

Python Natural Language Processing Library. A general-purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably [FoLiA](https://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, and GIZA++ alignments.

13. PySS3

A Python package that implements a novel white-box machine learning model for text classification, called SS3. Since SS3 can visually explain its rationale, this package also comes with easy-to-use interactive visualization tools ([online demos](http://tworld.io/ss3/)).

14. python-ucto

Python binding to ucto (a Unicode-aware rule-based tokenizer for various languages).

15. python-frog

Python binding to Frog, an NLP suite for Dutch (POS tagging, lemmatisation, dependency parsing, NER).

16. python-zpar

Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech tagger, constituency parser, and dependency parser for English.

17. colibri-core

Python binding to a C++ library for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
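
To make the terms concrete, here is a minimal pure-Python sketch of n-gram and skipgram extraction. It does not use colibri-core or its API; the function names `ngrams` and `skipgrams` are hypothetical, and a skipgram is simplified here to an n-gram with exactly one interior token replaced by a gap marker.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n, gap="{*}"):
    """Return n-grams with one interior token replaced by a gap marker.

    Simplifying assumption: exactly one gap of size one, never at the edges.
    """
    result = []
    for gram in ngrams(tokens, n):
        for k in range(1, n - 1):
            result.append(gram[:k] + (gap,) + gram[k + 1:])
    return result

tokens = "to be or not to be".split()
bigram_counts = Counter(ngrams(tokens, 2))  # frequency of each bigram
trigram_skips = skipgrams(tokens, 3)        # e.g. ("to", "{*}", "or")
```

A plain list-and-Counter approach like this keeps every pattern in memory as Python tuples, which is the cost a compact C++ representation is meant to avoid on large corpora.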

18. spaCy

Industrial-strength NLP with Python and Cython.

19. PyStanfordDependencies

Python interface for converting Penn Treebank trees to Stanford Dependencies.

20. Fuzzy Wuzzy

Fuzzy String Matching in Python.
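
As a rough illustration of what fuzzy string matching means, the sketch below scores similarity with the standard library's `difflib.SequenceMatcher`. It is not Fuzzy Wuzzy's own API; the helper names `similarity` and `best_match` are hypothetical, and the 0 to 100 scale and lowercasing are choices made for this example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a case-insensitive similarity score from 0 to 100."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return round(100 * ratio)

def best_match(query, choices):
    """Return the choice most similar to the query."""
    return max(choices, key=lambda choice: similarity(query, choice))

teams = ["Atlanta Braves", "New York Mets", "New York Yankees"]
closest = best_match("new york mets", teams)
```

Dedicated libraries typically add token-based scoring and faster distance routines on top of this basic idea.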

21. jellyfish

A Python library for approximate and phonetic matching of strings.

22. editdistance

A fast implementation of edit distance (Levenshtein distance).
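
For reference, edit distance can be computed with a short dynamic-programming routine. The following is a plain-Python sketch of Levenshtein distance, not the editdistance package's implementation; the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the stored row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]
```

This version runs in time proportional to `len(a) * len(b)` and keeps only two rows of the table, which is usually adequate for short strings; a compiled implementation matters when comparing many or long strings.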

23. textacy

Higher-level NLP built on spaCy.

24. CLTK

The Classical Language Toolkit.

25. Rasa

A "machine learning framework to automate text- and voice-based conversations."

26. yase

Transcodes a sentence (or other sequence) into a list of word vectors.

27. Polyglot

Multilingual text (NLP) processing toolkit.

28. DrQA

Reading Wikipedia to answer open-domain questions.

29. Dedupe

A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.

30. Snips NLU

A natural language understanding library for intent classification and entity extraction.

31. NeuroNER

Named-entity recognition using neural networks, providing state-of-the-art results.

32. DeepPavlov

A conversational AI library with many pre-trained Russian NLP models.

33. BigARTM

A topic modelling platform.

34. NALP

A Natural Adversarial Language Processing framework built on TensorFlow.

35. DL Translate

A deep learning-based library for translating between 50 languages, built with `transformers`.