ProteinSolver
Description
ProteinSolver is a deep neural network that learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results on both a toy problem, learning to solve Sudoku puzzles, and a real-world problem, designing protein sequences that fold into a predetermined geometric shape.
Demo notebooks
The following notebooks can be used to explore the basic functionality of proteinsolver.
Other notebooks in the notebooks/ directory show how to perform more extensive validations of the networks and how to train new networks.
Installation
We recommend installing proteinsolver into a clean conda environment using the following commands:
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
Development
First, use conda to install proteinsolver into a new conda environment. This will also install all dependencies.
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
Second, run pip install --editable . inside the root directory of this package. This makes Python import the development version of the code instead of the copy installed by conda.
cd path/to/proteinsolver
pip install --editable .
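To confirm that the editable install took effect, one option is to print the directory from which Python imports the package. This is only a sketch: it assumes the proteinsolver conda environment is active and that python is on the PATH, and the location variable is introduced here purely for illustration.

```shell
# Print the directory from which Python imports proteinsolver.
# After an editable install, this is expected to point into the source checkout.
if command -v python >/dev/null 2>&1; then
  location=$(python -c "import os, proteinsolver; print(os.path.dirname(proteinsolver.__file__))" 2>/dev/null) \
    || location="proteinsolver could not be imported in this environment"
else
  location="python not found on PATH; activate the conda environment first"
fi
echo "$location"
```

If the printed path lies inside the conda environment's site-packages rather than the repository, the editable install has probably not been picked up.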
Pre-trained models
Pre-trained models can be downloaded using wget by running the following command in the root folder of the proteinsolver repository:
wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"
Training and validation datasets
Data used to train and validate the “proteinsolver” network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from http://deep-protein-gen.data.proteinsolver.org/:
wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"
Environment variables
DATAPKG_DATA_DIR
- Location of training and validation data.
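As a minimal sketch, the variable can be exported in the shell before running the notebooks or training scripts. The ./data path below is a hypothetical location chosen for illustration; replace it with the directory that actually holds the downloaded data.

```shell
# Assumption: the training and validation data live under ./data (hypothetical path).
export DATAPKG_DATA_DIR="$PWD/data"

# Create the directory if it does not exist yet, so later steps can write into it.
mkdir -p "$DATAPKG_DATA_DIR"

echo "DATAPKG_DATA_DIR=$DATAPKG_DATA_DIR"
```

Adding the export line to a shell profile or to the conda environment's activation scripts would make the setting persistent across sessions.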
Acknowledgements
References
Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim. Fast and flexible design of novel proteins using graph neural networks. https://doi.org/10.1101/868935