ProteinSolver

Description

ProteinSolver is a deep neural network which learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results both on a toy problem of learning how to solve Sudoku puzzles and on a real-world problem of designing protein sequences that fold into a predetermined geometric shape.

Demo notebooks

The following notebooks can be used to explore the basic functionality of proteinsolver.

Notebook name MyBinder Description
20_sudoku_demo.ipynb binder Use a pre-trained network to solve a single Sudoku puzzle.
06_sudoku_analysis.ipynb binder Evaluate a network trained to solve Sudoku puzzles using the validation and test datasets. (This notebook is resource-intensive and is best run on a machine with a GPU.)
20_protein_demo.ipynb binder Use a pre-trained network to design sequences for a single protein geometry.
06_protein_analysis.ipynb binder Evaluate a network trained to reconstruct protein sequences using the validation and test datasets. (This notebook is resource-intensive and is best run on a machine with a GPU.)

Other notebooks in the notebooks/ directory show how to perform more extensive validations of the networks and how to train new networks.

Docker images

Docker images with all required dependencies are provided at: https://gitlab.com/ostrokach/proteinsolver/container_registry.

To evaluate a proteinsolver network from a Jupyter notebook, we can run the following:

docker run -it --rm -p 8000:8000 registry.gitlab.com/ostrokach/proteinsolver:v0.1.25 jupyter notebook --ip 0.0.0.0 --port 8000

Installation

We recommend installing proteinsolver into a clean conda environment using the following command:

conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver

Development

First, use conda to install proteinsolver into a new conda environment. This will also install all dependencies.

conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver

Second, run pip install --editable . inside the root directory of this package. This will force Python to use the development version of our code.

cd path/to/proteinsolver
pip install --editable .
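
To check that Python picks up the development copy, one option is to print the path the package is imported from. This is a minimal sketch that assumes the import name matches the package name, proteinsolver; the variable name location is only illustrative.

```shell
# Ask Python where it imports proteinsolver from (assumed import name).
# If the import fails, fall back to a short message instead of an error.
location=$(python -c "import proteinsolver; print(proteinsolver.__file__)" 2>/dev/null \
  || echo "proteinsolver is not importable in this environment")
echo "${location}"
```

After an editable install, the printed path should point inside path/to/proteinsolver rather than inside the conda environment's site-packages directory.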

Pre-trained models

Pre-trained models can be downloaded using wget by running the following command in the root folder of the proteinsolver repository:

wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"

Training and validation datasets

Data used to train and validate the proteinsolver network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from http://deep-protein-gen.data.proteinsolver.org/:

wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"

Environment variables

  • DATAPKG_DATA_DIR - Location of training and validation data.
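
As a minimal sketch, the variable can be exported in the shell before starting Jupyter or running any scripts. The directory used here is a hypothetical location chosen for illustration, and the layout expected inside it is not described in this document.

```shell
# Hypothetical data location; replace with the directory that holds
# the downloaded training and validation data.
export DATAPKG_DATA_DIR="${HOME}/proteinsolver-data"

# Create the directory if it does not exist yet.
mkdir -p "${DATAPKG_DATA_DIR}"
echo "DATAPKG_DATA_DIR=${DATAPKG_DATA_DIR}"
```

Adding the export line to a shell startup file would make the setting persist across sessions.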

References

  • Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim. Fast and flexible design of novel proteins using graph neural networks. https://doi.org/10.1101/868935