# ProteinSolver

[![binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fmybinder%3AhTGKLsjmxRS8xNyHxRJB%40gitlab.com%2Fostrokach%2Fproteinsolver.git/v0.1.25)
[![docs](https://img.shields.io/badge/docs-v0.1.25-blue.svg)](https://ostrokach.gitlab.io/proteinsolver/v0.1.25/)
[![poster](https://img.shields.io/static/v1?label=poster&message=html&color=orange)](https://ostrokach-posters.gitlab.io/2019-12-13-neurips-poster/7ad67cfdf35a4e3e8346e293dc444074/)
[![conda](https://img.shields.io/conda/dn/ostrokach-forge/proteinsolver.svg)](https://anaconda.org/ostrokach-forge/proteinsolver/)
[![pipeline status](https://gitlab.com/ostrokach/proteinsolver/badges/v0.1.25/pipeline.svg)](https://gitlab.com/ostrokach/proteinsolver/commits/v0.1.25/)
[![coverage report](https://gitlab.com/ostrokach/proteinsolver/badges/master/coverage.svg?job=docs)](https://ostrokach.gitlab.io/proteinsolver/v0.1.25/htmlcov/)

## Description

ProteinSolver is a deep neural network which learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results both on a toy problem of learning how to solve Sudoku puzzles and on a real-world problem of designing protein sequences that fold into a predetermined geometric shape.

## Demo notebooks

The following notebooks can be used to explore the basic functionality of `proteinsolver`.

| Notebook name | MyBinder | Description |
| --- | --- | --- |
| `20_sudoku_demo.ipynb` | [![binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fmybinder%3AhTGKLsjmxRS8xNyHxRJB%40gitlab.com%2Fostrokach%2Fproteinsolver.git/v0.1.25?filepath=proteinsolver%2Fnotebooks%2F20_sudoku_demo.ipynb) | Use a pre-trained network to solve a single Sudoku puzzle. |
| `06_sudoku_analysis.ipynb` | [![binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fmybinder%3AhTGKLsjmxRS8xNyHxRJB%40gitlab.com%2Fostrokach%2Fproteinsolver.git/v0.1.25?filepath=proteinsolver%2Fnotebooks%2F06_sudoku_analysis.ipynb) | Evaluate a network trained to solve Sudoku puzzles using the validation and test datasets. _(This notebook is resource-intensive and is best run on a machine with a GPU.)_ |
| `20_protein_demo.ipynb` | [![binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fmybinder%3AhTGKLsjmxRS8xNyHxRJB%40gitlab.com%2Fostrokach%2Fproteinsolver.git/v0.1.25?filepath=proteinsolver%2Fnotebooks%2F20_protein_demo.ipynb) | Use a pre-trained network to design sequences for a single protein geometry. |
| `06_protein_analysis.ipynb` | [![binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fmybinder%3AhTGKLsjmxRS8xNyHxRJB%40gitlab.com%2Fostrokach%2Fproteinsolver.git/v0.1.25?filepath=proteinsolver%2Fnotebooks%2F06_protein_analysis.ipynb) | Evaluate a network trained to reconstruct protein sequences using the validation and test datasets. _(This notebook is resource-intensive and is best run on a machine with a GPU.)_ |

Other notebooks in the `notebooks/` directory show how to perform more extensive validations of the networks and how to train new networks.

## Docker images

Docker images with all required dependencies are provided in the project's GitLab container registry (`registry.gitlab.com/ostrokach/proteinsolver`).

To evaluate a `proteinsolver` network from a Jupyter notebook, start a notebook server inside the container and publish it on port 8000:

```bash
docker run -it --rm -p 8000:8000 registry.gitlab.com/ostrokach/proteinsolver:v0.1.25 jupyter notebook --ip 0.0.0.0 --port 8000
```

## Installation

We recommend installing `proteinsolver` into a clean conda environment using the following commands:

```bash
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
```

## Development

First, use `conda` to install `proteinsolver` into a new conda environment. This will also install all dependencies.

```bash
conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver
```

Second, run `pip install --editable .` inside the root directory of this package. This makes Python import the development version of the code from your working copy instead of the installed release.

```bash
cd path/to/proteinsolver
pip install --editable .
```

## Pre-trained models

Pre-trained models can be downloaded using `wget` by running the following command _in the root folder of the `proteinsolver` repository_:

```bash
wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"
```

## Training and validation datasets

Data used to train and validate the `proteinsolver` network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from `http://deep-protein-gen.data.proteinsolver.org/`:

```bash
wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"
```

## Environment variables

- `DATAPKG_DATA_DIR` - Location of training and validation data.

## Acknowledgements

## References

- Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim. _Fast and flexible design of novel proteins using graph neural networks_.
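
## Example: reading `DATAPKG_DATA_DIR`

The `DATAPKG_DATA_DIR` environment variable is described above only as the location of the training and validation data. As a minimal sketch of how your own scripts or notebooks might pick it up, the helper below reads the variable and falls back to a local directory when it is unset. The function name `resolve_data_dir` and the `data` fallback are assumptions made for illustration; they are not part of the `proteinsolver` API, and the package may consume the variable differently.

```python
import os
from pathlib import Path


def resolve_data_dir(default="data"):
    """Return the dataset directory as a Path.

    Hypothetical helper (not part of proteinsolver): uses the
    DATAPKG_DATA_DIR environment variable when it is set and non-empty,
    and falls back to `default` otherwise.
    """
    value = os.environ.get("DATAPKG_DATA_DIR", "").strip()
    # expanduser() lets values such as "~/datasets" work as expected.
    return Path(value if value else default).expanduser()
```

Calling `resolve_data_dir()` after `export DATAPKG_DATA_DIR=/path/to/data` returns that path, so the same notebook can run unchanged on machines that store the downloaded datasets in different places.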