mirror of https://github.com/dreadlesss/pharmacokinetic_prediction.git synced 2026-07-15 19:34:06 -06:00

huxiaowen 5839c30331 data flow added		2020-04-16 01:01:08 +08:00
data_flow.png	prediction added	2020-04-16 00:57:06 +08:00
dataset.xlsx	prediction added	2020-04-16 00:57:06 +08:00
LICENSE	Initial commit	2020-04-15 16:12:05 +08:00
pharmacokinetic_prediction.py	prediction added	2020-04-16 00:57:06 +08:00
README.md	data flow added	2020-04-16 01:01:08 +08:00

Pharmacokinetic_prediction

Prediction of human intravenous pharmacokinetic parameters (fu, MRT, t1/2, VD and CL) with models built on a dataset of 1352 compounds.

1. Paper and dataset

paper (supplementary material): http://dmd.aspetjournals.org/content/suppl/2018/08/16/dmd.118.082966.DC1

dataset: dataset.xlsx (downloaded from the paper's supporting information)

2. Data flow

(Diagram: data_flow.png)

3. Description

dataset.xlsx

Column	Description
SMILES	SMILES string of the compound
fu	fraction of the drug unbound in plasma
MRT	mean residence time of the drug in the human body
t1/2	half-life of the drug
VD	volume of distribution
CL	clearance

Training

1. Feature extraction

<function extract_features()>

Each molecule is represented by a Morgan fingerprint (radius = 2, 2048 bits) and 200 molecular descriptors computed with RDKit.
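Below is a minimal sketch of this featurization with RDKit, assuming one SMILES string per compound. `featurize` is a hypothetical helper and is not the repository's `extract_features()`. It builds the Morgan bits with RDKit's `rdFingerprintGenerator` and computes descriptors from every entry in `Descriptors.descList`. The length of `descList` varies between RDKit versions, so the descriptor count may not be exactly 200.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Morgan fingerprint generator with the settings stated above (radius 2, 2048 bits).
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def morgan_bits(mol):
    """Morgan fingerprint of one molecule as a 0/1 numpy array of length 2048."""
    fp = _MORGAN.GetFingerprint(mol)
    arr = np.zeros((fp.GetNumBits(),), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr


def rdkit_descriptors(mol):
    """Values of all descriptors registered in RDKit's Descriptors.descList."""
    return np.array([fn(mol) for _, fn in Descriptors.descList], dtype=float)


def featurize(smiles):
    """Hypothetical helper: fingerprint bits followed by descriptors for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return np.concatenate([morgan_bits(mol), rdkit_descriptors(mol)])
```

To build a feature matrix, stack one row per compound, for example `np.vstack([featurize(s) for s in df["SMILES"]])`. Because the fingerprint bits and the descriptors sit in fixed positions, the same column order holds for every compound.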

2. Splitting into training and test sets

<function stratified_split()>

The whole dataset is divided into a training set and a test set in a ratio of about 7:3 using stratified sampling.
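Stratified sampling requires discrete strata, but the targets here are continuous. A common workaround is to bin the target into quantiles and then stratify on the bin labels. This approach is an assumption and is not necessarily how the repository does it. `quantile_stratified_split` and its `n_bins` parameter are hypothetical. `train_test_split` is scikit-learn's function.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def quantile_stratified_split(X, y, test_size=0.3, n_bins=10, seed=0):
    """Split (X, y) roughly 7:3 while preserving the distribution of a continuous y."""
    y = np.asarray(y, dtype=float)
    # Inner quantile edges of y; np.digitize maps each value to a bin 0..n_bins-1.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    strata = np.digitize(y, edges)
    # Stratifying on the bin labels keeps each bin's share about equal in both sets.
    return train_test_split(
        X, y, test_size=test_size, stratify=strata, random_state=seed
    )
```

With `test_size=0.3`, the split follows the ~7:3 ratio. Every quantile bin contributes proportionally to both sets, so the test set covers the full range of the target rather than by chance over-sampling one end of it.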

3. Modeling

<class auto_gbdt()>

A gradient boosting decision tree (GBDT) model is fitted to the training set. Its hyperparameters are tuned automatically with GridSearchCV, and the root-mean-square deviation (RMSD) on the test set is used to evaluate model performance.
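The following sketch shows the modeling step with scikit-learn's `GradientBoostingRegressor` and `GridSearchCV`. The parameter grid, the 5-fold cross-validation and the helper names `fit_gbdt` and `rmsd` are assumptions for illustration. They are not the repository's `auto_gbdt` class.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

# Hypothetical search space; the repository's actual grid is not documented here.
PARAM_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
}


def fit_gbdt(X_train, y_train, param_grid=None, cv=5, seed=0):
    """Grid-search GBDT hyperparameters with k-fold CV on the training set only."""
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_grid or PARAM_GRID,
        scoring="neg_root_mean_squared_error",  # scorers maximize, so RMSE is negated
        cv=cv,
    )
    search.fit(X_train, y_train)
    # best_estimator_ is refit on the whole training set with the best parameters
    return search.best_estimator_, search.best_params_


def rmsd(model, X_test, y_test):
    """Root-mean-square deviation between predictions and observed values."""
    pred = model.predict(X_test)
    return float(np.sqrt(mean_squared_error(y_test, pred)))
```

The cross-validation runs only on the training set. The test set therefore stays unseen until the final RMSD is computed on it.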

Prediction

1. SDF to DataFrame

<function smiles_from_lib()>

Convert the new compounds, supplied in SDF format, into a DataFrame containing SMILES, name, synonyms, etc.
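Below is a sketch of this conversion using RDKit's `Chem.SDMolSupplier`. `sdf_to_dataframe` is a hypothetical stand-in for `smiles_from_lib()`. Which SD fields (synonyms, for example) end up as columns depends entirely on the input file.

```python
import pandas as pd
from rdkit import Chem


def sdf_to_dataframe(path):
    """Hypothetical helper: one row per parsable molecule in an SDF file.

    Columns: all SD data fields of the record, canonical SMILES and the
    molecule name (the first line of each record, stored by RDKit as _Name).
    """
    rows = []
    for mol in Chem.SDMolSupplier(path):
        if mol is None:  # skip records RDKit could not parse
            continue
        row = mol.GetPropsAsDict()  # SD fields only; private props like _Name excluded
        row["SMILES"] = Chem.MolToSmiles(mol)
        row["name"] = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        rows.append(row)
    return pd.DataFrame(rows)
```

The SMILES column has the same name as in dataset.xlsx. As a result, the feature extraction code can run on both tables unchanged.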

2. Feature extraction

<function extract_features()>

Almost the same as in the training process.

3. Prediction

<function predict()>

Predict the target values (y) for the new compounds from their features.
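Here is a sketch of the prediction step. It assumes one fitted regressor per pharmacokinetic parameter, for example one GBDT each for fu, MRT, t1/2, VD and CL. `predict_all` and the `pred_` column prefix are hypothetical. Each model only needs a scikit-learn-style `predict` method. The feature rows must be built exactly as in training and must follow the same order as the DataFrame rows.

```python
import pandas as pd


def predict_all(models, features, compounds):
    """Hypothetical helper: add one prediction column per target to a copy of compounds.

    models:    dict mapping a target name (e.g. "CL") to a fitted regressor
    features:  2-D array-like, one row per compound, same row order as compounds
    compounds: DataFrame describing the new compounds (SMILES, name, ...)
    """
    out = compounds.copy()  # leave the caller's DataFrame unchanged
    for target, model in models.items():
        out[f"pred_{target}"] = model.predict(features)
    return out
```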