Machine-Learning Java Natural Language Processing

1. Cortical.io

Retina: an API performing complex NLP operations (disambiguation, classification, streaming text filtering, etc.) as quickly and intuitively as the brain.

2. IRIS

[See the Tutorial Video](https://www.youtube.com/watch?v=CsF4pd7fGF0).

3. CoreNLP

Stanford CoreNLP provides a set of natural language analysis tools that take raw English text as input and give the base forms of words.

4. Stanford Parser

A natural language parser is a program that works out the grammatical structure of sentences.

5. Stanford POS Tagger

A Part-Of-Speech Tagger (POS Tagger).

6. Stanford Named Entity Recognizer

Stanford NER is a Java implementation of a Named Entity Recognizer.

7. Stanford Word Segmenter

Tokenization of raw text is a standard pre-processing step for many NLP tasks.
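
To make the idea of tokenization concrete, here is a minimal sketch using only the Java standard library. It is not the Stanford Word Segmenter or any Stanford API; the class name `SimpleTokenizer` and its `tokenize` method are hypothetical, and the regular expression only handles text where words are already separated by whitespace or punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of any Stanford library.
public class SimpleTokenizer {

    // A token is either a run of letters/digits or a single
    // non-space, non-alphanumeric character (punctuation).
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    /** Splits text into word and punctuation tokens, dropping whitespace. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints each token on its own line.
        for (String token : tokenize("Hello, world!")) {
            System.out.println(token);
        }
    }
}
```

A rule this simple splits contractions such as "don't" into three tokens and cannot segment languages written without spaces, which is the kind of case a dedicated segmenter is meant to handle.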

8. Tregex, Tsurgeon and Semgrex

Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").

9. Stanford Phrasal: A Phrase-Based Translation System

Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.

10. Stanford English Tokenizer

A tokenizer divides text into a sequence of tokens, which roughly correspond to "words".

11. Stanford Tokens Regex

A framework for defining regular expressions over sequences of tokens.

12. Stanford Temporal Tagger

SUTime is a library for recognizing and normalizing time expressions.
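
As a rough illustration of what "recognizing and normalizing" a time expression means, the sketch below finds English dates written like "March 5, 2014" and rewrites them as ISO 8601 strings using `java.time`. This is not SUTime's API and covers only one date format; the class name `DateNormalizer` and its `normalize` method are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; SUTime itself handles far more expression types.
public class DateNormalizer {

    // Recognition step: a full English month name, a day, a comma, and a 4-digit year.
    private static final Pattern DATE = Pattern.compile(
            "\\b(?:January|February|March|April|May|June|July|August"
            + "|September|October|November|December) \\d{1,2}, \\d{4}\\b");

    // Normalization step: STRICT resolution rejects impossible dates such as February 30.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH)
                    .withResolverStyle(ResolverStyle.STRICT);

    /** Returns every recognized date in the text as an ISO 8601 string (yyyy-MM-dd). */
    public static List<String> normalize(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            try {
                dates.add(LocalDate.parse(matcher.group(), FORMAT).toString());
            } catch (DateTimeParseException e) {
                // Matched the surface pattern but is not a real calendar date; skip it.
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(normalize("The meeting on March 5, 2014 moved to April 12, 2014."));
    }
}
```

Relative expressions such as "next Tuesday" need a reference date to resolve against, which is beyond this sketch.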

13. Stanford SPIED

Learns entities from unlabeled text, starting from seed sets and applying patterns iteratively.

14. Twitter Text Java

A Java implementation of Twitter's text processing library.

15. MALLET

A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

16. OpenNLP

A machine-learning-based toolkit for the processing of natural language text.

17. LingPipe

A toolkit for processing text using computational linguistics.

18. Apache cTAKES

Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing system for information extraction from electronic medical record clinical free-text.

19. CogcompNLP

This project collects a number of core libraries for Natural Language Processing (NLP) developed by the University of Illinois' Cognitive Computation Group. Examples include `illinois-core-utilities`, which provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc., and `illinois-edison`, a library for feature extraction from `illinois-core-utilities` data structures, along with many other packages.