Python Distributed Computing


1. dask

A flexible parallel computing library for analytics, combining big-data collections (arrays, dataframes) with dynamic task scheduling.
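A minimal sketch of dask's array collection, assuming dask is installed locally: operations build a lazy task graph that `.compute()` executes in parallel.

```python
import dask.array as da

# A 1000x1000 array of ones, split into 100x100 chunks that
# dask can process in parallel across threads or workers.
x = da.ones((1000, 1000), chunks=(100, 100))

# .sum() only builds the task graph; .compute() runs it.
total = x.sum().compute()  # → 1000000.0
```

The same graph can later be executed on a distributed cluster without changing this code.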

2. luigi

A module that helps you build complex pipelines of batch jobs.

3. mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services.

4. PySpark

[Apache Spark](https://spark.apache.org/) Python API.

5. Ray

A system for parallel and distributed Python that unifies the machine learning ecosystem.

6. faust

A stream processing library, porting the ideas from [Kafka Streams](https://kafka.apache.org/documentation/streams/) to Python.
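A sketch of a faust agent, word-count style (the app name, topic name, and broker URL are assumptions; the broker URL presumes a local Kafka):

```python
import faust

# Hypothetical app; broker assumes Kafka on localhost.
app = faust.App("wordcount", broker="kafka://localhost:9092")
words_topic = app.topic("words", value_type=str)

# A table is a sharded, changelog-backed key/value store.
counts = app.Table("counts", default=int)

@app.agent(words_topic)
async def count(words):
    # Agents consume the topic as an async stream.
    async for word in words:
        counts[word] += 1
```

Assuming this file is `myapp.py`, a worker would be started with `faust -A myapp worker -l info`.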

7. streamparse

Run Python code against real-time streams of data via [Apache Storm](http://storm.apache.org/).
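A sketch of a streamparse bolt (the class name and stream layout are hypothetical): Storm delivers tuples to `process()`, and `emit()` sends results downstream.

```python
from streamparse import Bolt

class WordCountBolt(Bolt):
    """Hypothetical bolt: counts words arriving from an upstream spout."""

    def initialize(self, conf, ctx):
        self.counts = {}

    def process(self, tup):
        word = tup.values[0]
        self.counts[word] = self.counts.get(word, 0) + 1
        self.emit([word, self.counts[word]])
```

Bolts and spouts are wired into a topology definition and submitted to a Storm cluster with streamparse's `sparse` CLI.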