Kexin Huang 47ac16b8c1
Merge pull request #14 from printomi/patch-1
match batch size of data in model config
2021-08-17 09:14:49 -07:00
dataset update 2020-11-17 17:18:01 -08:00
ESPF initial push 2020-07-15 22:32:19 -04:00
.gitignore Initial commit 2020-07-15 22:22:05 -04:00
config.py Reformat the code and implement the args interface. 2020-12-15 16:10:18 +08:00
example.ipynb bug fix 2020-12-30 18:59:20 -05:00
LICENSE Initial commit 2020-07-15 22:22:05 -04:00
models.py initial push 2020-07-15 22:32:19 -04:00
README.md Update README. 2020-12-15 16:13:59 +08:00
requirements.txt requirements.txt 2021-01-08 08:04:44 -08:00
setup.py requirements.txt 2021-01-08 08:04:44 -08:00
stream.py initial push 2020-07-15 22:32:19 -04:00
train.py match batch size of data in model config 2021-08-10 09:39:32 +02:00

MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction

Drug target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need for experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the following challenges remain open: (1) existing molecular representation learning approaches ignore the sub-structural nature of DTI, and thus produce results that are less accurate and difficult to explain; (2) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (1) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction; (2) an augmented transformer encoder to better extract and capture the semantic relations among substructures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines.

Datasets

In the dataset folder, we provide all three processed datasets used in MolTrans: BindingDB, DAVIS, and BIOSNAP. The BIOSNAP folder contains the full dataset for the main experiment, as well as the datasets for the missing data experiments (70%, 80%, 90%, 95%) and for the unseen drug and unseen protein experiments.

Run

We provide an example Jupyter notebook in the repository. Although the notebook runs for 100 epochs, we find that 50 epochs is more than enough, and all results in the paper were obtained with 50 epochs.

You can also run python train.py --task ${task_name} directly to run the experiments. ${task_name} can be biosnap, bindingdb, or davis. For BindingDB and DAVIS, please refer to this Page for more details.
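To run all three tasks in sequence, the command above can be wrapped in a small driver script. The following is a minimal sketch, not part of the repository: the helper names build_command and run_all are hypothetical, and it assumes the script is started from the repository root so that train.py is found. It only uses the --task flag described above. By default it prints the commands instead of executing them.

```python
import subprocess
import sys

# Task names listed in this README; anything else is rejected up front.
TASKS = ("biosnap", "bindingdb", "davis")


def build_command(task, script="train.py"):
    """Return the argv list for one training run (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    # sys.executable is the interpreter running this script, so the
    # training run uses the same Python environment.
    return [sys.executable, script, "--task", task]


def run_all(dry_run=True):
    """Print the command for every task, or execute it if dry_run is False."""
    for task in TASKS:
        cmd = build_command(task)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop if a training run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) launches the runs one after another; a failed run raises subprocess.CalledProcessError and stops the remaining tasks.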

We will add more code and tests in the next couple of weeks, but this should be enough to try out MolTrans.