mirror of https://github.com/kexinhuang12345/MolTrans.git synced 2026-07-15 19:34:10 -06:00

Kexin Huang 47ac16b8c1 Merge pull request #14 from printomi/patch-1 match batch size of data in model config		2021-08-17 09:14:49 -07:00
dataset
ESPF
.gitignore
config.py	Reformat the code and implement the args interface.	2020-12-15 16:10:18 +08:00
example.ipynb	bug fix	2020-12-30 18:59:20 -05:00
LICENSE
models.py
README.md	Update README.	2020-12-15 16:13:59 +08:00
requirements.txt	requirements.txt	2021-01-08 08:04:44 -08:00
setup.py	requirements.txt	2021-01-08 08:04:44 -08:00
stream.py
train.py	match batch size of data in model config	2021-08-10 09:39:32 +02:00

MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction

Drug target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming because it requires experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the following challenges remain open: (1) existing molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain; (2) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (1) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction; (2) an augmented transformer encoder that better extracts and captures the semantic relations among substructures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines.

Datasets

In the dataset folder, we provide all three processed datasets used in MolTrans: BindingDB, DAVIS, and BIOSNAP. The BIOSNAP folder contains the full dataset for the main experiment, the datasets for the missing data experiment (70%, 80%, 90%, 95%), and the unseen drug and unseen protein datasets.
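To see what each dataset folder actually contains before training, a small helper like the one below can list the files per subfolder. This is only a sketch: the function name list_dataset_files is hypothetical and not part of the repository, and it assumes nothing about file names or formats beyond the existence of the dataset folder.

```python
from pathlib import Path


def list_dataset_files(root="dataset"):
    """Map each subfolder of `root` to the sorted relative paths of its files.

    Hypothetical helper for inspection only; it is not part of MolTrans.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {root_path}")

    listing = {}
    # One entry per top-level subfolder (e.g. one per dataset).
    for sub in sorted(p for p in root_path.iterdir() if p.is_dir()):
        # Collect every file below the subfolder, at any depth.
        listing[sub.name] = sorted(
            f.relative_to(sub).as_posix() for f in sub.rglob("*") if f.is_file()
        )
    return listing
```

Calling list_dataset_files() from the repository root returns a dictionary keyed by subfolder name, which makes it easy to check which splits are present.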

Run

We provide an example Jupyter notebook (example.ipynb) in the repository. Although it runs for 100 epochs, we find that 50 epochs are more than enough, and all the results in the paper were obtained with 50 epochs.

You can also run the experiments directly with python train.py --task ${task_name}, where ${task_name} is one of biosnap, bindingdb, or davis. For BindingDB and DAVIS, please refer to this page for more details.
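To run several tasks from a script rather than typing each command, the invocation above can be wrapped as follows. This is a minimal sketch: build_train_command and run_task are hypothetical helper names, only the --task flag and the three task names come from this README, and lowercasing the task name is a convenience added here.

```python
import subprocess
import sys

# Task names listed in this README.
TASKS = ("biosnap", "bindingdb", "davis")


def build_train_command(task, script="train.py"):
    """Return the argument list for `python train.py --task <task>`."""
    task = task.lower()  # convenience added in this sketch
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    # Use the current interpreter so the active environment is reused.
    return [sys.executable, script, "--task", task]


def run_task(task):
    """Run one experiment; raises CalledProcessError if training fails."""
    return subprocess.run(build_train_command(task), check=True)
```

For example, calling run_task("davis") from the repository root starts the DAVIS experiment, and looping over TASKS runs all three in sequence.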

We will add more code and tests in the next couple of weeks, but this should be enough to try out MolTrans.