Pharmacokinetic_prediction
Prediction of intravenous pharmacokinetic parameters, including fu, MRT, t1/2, VD and CL, by training on 1352 compounds.
1. Paper and dataset
paper: http://dmd.aspetjournals.org/content/suppl/2018/08/16/dmd.118.082966.DC1
dataset: dataset.xlsx (download from supporting information)
2. Data flow
See data_flow.png.
3. Description
dataset.xlsx
| Column | Description |
|---|---|
| SMILES | SMILES string of the compound |
| fu | fraction of the drug unbound in plasma |
| MRT | mean residence time of the drug in the human body |
| t1/2 | half-life of the drug |
| VD | volume of distribution |
| CL | clearance |
Training
1. Feature extraction
<function extract_features()>
Molecules are represented by a Morgan fingerprint (radius=2, length=2048) and 200 descriptors generated by RDKit.
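The featurization described above can be sketched roughly as follows. This is a minimal illustration, not the repository's `extract_features()`: the helper name `featurize` is hypothetical, and the descriptor set is assumed to be everything in RDKit's `Descriptors.descList`, whose length depends on the installed RDKit version and may differ from the 200 quoted here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors


def featurize(smiles):
    """Return Morgan bits followed by RDKit descriptor values, or None if parsing fails.

    Hypothetical helper for illustration; not the repository's extract_features().
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return None
    # Morgan fingerprint with radius 2, folded to 2048 bits
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    bits = [int(b) for b in fp.ToBitString()]
    # Descriptors.descList holds (name, function) pairs;
    # how many there are depends on the RDKit version (assumption: use all of them)
    descriptors = [func(mol) for _name, func in Descriptors.descList]
    return bits + descriptors
```

Calling `featurize("CCO")` gives one flat feature vector per molecule. Rows for which it returns `None` would need to be dropped before modeling.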
2. Splitting into training and test data
<function stratified_split()>
The whole data set is divided into training and test sets in a ratio of roughly 7:3 using stratified sampling.
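One way to stratify a continuous target is to bin it by rank and sample about 30% from each bin. The sketch below does that with the standard library only. It assumes the stratification variable is the target value itself and uses five bins; neither is stated in this README, and the function name `stratified_split_sketch` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split_sketch(values, test_fraction=0.3, n_bins=5, seed=0):
    """Split indices into train/test so each bin of `values` gives ~test_fraction to test."""
    # Rank the samples by target value and cut the ranking into n_bins equal-sized strata
    order = sorted(range(len(values)), key=lambda i: values[i])
    strata = defaultdict(list)
    for rank, idx in enumerate(order):
        strata[rank * n_bins // len(values)].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)
```

Because every stratum contributes the same share, the training and test sets cover a similar range of target values, which a purely random split does not guarantee for skewed data.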
3. Modeling
<class auto_gbdt()>
GBDT (gradient-boosted decision trees) is used to fit the training data set. The hyperparameters are optimized automatically with GridSearchCV. RMSD is used as the criterion for evaluating model performance on the test set.
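For reference, the RMSD between observed and predicted values can be computed as in the small sketch below. The function name `rmsd` is illustrative and not taken from the repository.

```python
import math


def rmsd(y_true, y_pred):
    """Root-mean-square deviation between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower RMSD means predictions lie closer to the measured values. In scikit-learn, the same quantity can be used during grid search through the `neg_root_mean_squared_error` scorer, though whether this repository does so is not stated here.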
Prediction
1. SDF to DataFrame
<function smiles_from_lib()>
Convert the new data (SDF format) to a DataFrame containing SMILES, name, synonyms, etc.
2. Feature extraction
<function extract_features()>
Almost the same as in the training process.
3. Prediction
<function predict()>
Predict the target value (y) from the features of the new compounds.
