Affiliated Projects

NumFOCUS Affiliated Projects are focused on open source data science, make meaningful use of NumFOCUS-sponsored tools, have a significant and consistent community of contributors, have supported the open source data science computing community through contributions of code, and are NOT fiscally sponsored by NumFOCUS. We highlight affiliated projects to encourage the community to contribute to, promote, and support these open source tools! If your project meets the above criteria and you would like to become a NumFOCUS Affiliated Project, please .


Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.


Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.


conda-forge is a community-led collection of recipes, build infrastructure, and distributions for the conda package manager. The conda-forge GitHub organization contains thousands of repositories of conda recipes and a framework of automation scripts for facilitating CI setup and maintenance for these recipes. The goal is to provide peer-reviewed, community-standard recipes and a self-consistent ecosystem of binary packages that those recipes produce.


In its current implementation, conda-forge relies on free services from AppVeyor, CircleCI and Travis CI to power the continuous build service on Windows, Linux and OS X, respectively. Each recipe is contained in a separate repository also containing the CI configuration. This repository is referred to as a feedstock, and is automatically built in a clean and repeatable way on each platform. Package hosting is currently done on


Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself.


Dask enables parallel computing through task scheduling and blocked algorithms. This allows developers to write complex parallel algorithms and execute them in parallel either on a modern multi-core machine or on a distributed cluster.
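The idea of a blocked algorithm scheduled as independent tasks can be illustrated with a minimal sketch. This example does not use Dask's API; it relies only on the Python standard library, and the names `split_into_blocks` and `blocked_sum` are hypothetical helpers introduced here for illustration. It shows the general pattern of cutting data into blocks, running one task per block, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(values, block_size):
    """Cut a sequence into consecutive blocks of at most block_size items."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def blocked_sum(values, block_size=1000, max_workers=4):
    """Sum a sequence by reducing each block as a separate task.

    Illustrative only: this is not how Dask is implemented.
    """
    blocks = split_into_blocks(values, block_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per block; the executor schedules them on worker threads.
        partial_sums = list(pool.map(sum, blocks))
    # Combine the per-block results into the final answer.
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(10_000))
    print(blocked_sum(data, block_size=1_000))
```

A library such as Dask applies the same decomposition to much larger workloads and can place the tasks on multiple cores or on a cluster, which this thread-based sketch does not attempt.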

Data Retriever

The Data Retriever is a package manager for data. It downloads, cleans, and stores publicly available data, so that analysts spend less time cleaning and managing data, and more time analyzing it.


DyND is a C++ library for dynamic, multidimensional arrays. It is inspired by NumPy, the Python array programming library at the core of the scientific Python stack, but tries to address a number of obstacles encountered by some of its users. Examples of this are support for variable-sized strings, ragged array types, and convenient usage from C++. The library is in a preview development state, and can be thought of as a sandbox where features are being tried and tweaked to gain experience with them.


Gensim is a Python library providing scalable statistical semantics, analysis of plain-text documents for semantic structure, and retrieval of semantically similar documents.


MDAnalysis is a Python library for analyzing trajectories from molecular dynamics (MD) simulations. It can read and write most popular formats, and provides a flexible and fast framework for writing custom analyses by making the underlying data easily available as NumPy arrays.


Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.


Open source data visualization and data analysis for novice and expert. Interactive workflows with a large toolbox.


pomegranate is a Python module for fast and flexible probabilistic modeling inspired by the design of scikit-learn. A primary focus of pomegranate is to abstract away the intricacies of a model from its definition, allowing users to easily prototype with complex models and training strategies. Its modular implementation allows for probability distributions to be swapped in or out for each other with ease and for models to be stacked within each other, yielding such delights as a mixture of Bayesian networks or a Gaussian mixture model Bayes classifier.


Free scientific and engineering development software used for numerical computations, and analysis and visualization of data using the Python programming language.



QuTiP is software for simulating quantum systems. QuTiP aims to provide tools for user-friendly and efficient numerical simulations of open quantum systems. It can be used to simulate a wide range of physical phenomena in areas such as quantum optics, trapped ions, superconducting circuits, and quantum nanomechanical resonators. In addition, it contains a number of other modules to simplify the numerical simulation and study of many topics in quantum physics, such as quantum optimal control, quantum information, and computing.


SciPy is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.


A free, high-quality, peer-reviewed, volunteer-produced collection of algorithms for image processing.


scikit-bio is an open-source, BSD-licensed Python package providing data structures, algorithms, and educational resources for bioinformatics.


Module designed for scientific Python that provides accessible solutions to machine learning problems.


Statsmodels is a Python package that provides a complement to SciPy for statistical computations, including descriptive statistics and estimation of statistical models.


Spack is a flexible package manager that builds multiple versions of packages for different configurations, platforms, and compilers. It was created to deploy large-scale scientific simulations on HPC systems, but it can deploy software on Linux and macOS machines as well.


Interactive development environment for Python that features advanced editing, interactive testing, debugging and introspection capabilities, as well as a numerical computing environment made possible through the support of IPython, NumPy, SciPy, and matplotlib.


Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.


xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures.
