mmpdb transform behaves unexpectedly #36

Open
opened 2025-10-14 17:01:58 -06:00 by navan · 0 comments
Owner

Originally created by @mu-wang on 2/5/2022

The transform rules in mmpdblib appear to miss some seemingly obvious cases.

A test case with the following structures:

OC(c(cccc1)c1O)=O	mol1
CCCCCCCC(c(cc1)cc(C(O)=O)c1O)=O	mol2
CCCCCC(c(cc1)cc(C(O)=O)c1O)=O	mol3

with some properties:

ID	prop
mol1	0.0
mol2	1.0
mol3	1.5
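To make the report easier to reproduce, the two input files can be generated with a short script. This is only a sketch: the file names `test_struct.tsv` and `test_prop.tsv` come from the commands in this report, while the helper names (`structure_text`, `property_text`, `write_test_inputs`) are hypothetical and not part of mmpdb. It assumes both files are tab-separated, as shown above.

```python
from pathlib import Path

# SMILES and IDs exactly as listed in this report.
STRUCTURES = [
    ("OC(c(cccc1)c1O)=O", "mol1"),
    ("CCCCCCCC(c(cc1)cc(C(O)=O)c1O)=O", "mol2"),
    ("CCCCCC(c(cc1)cc(C(O)=O)c1O)=O", "mol3"),
]
PROPERTIES = [("mol1", "0.0"), ("mol2", "1.0"), ("mol3", "1.5")]


def structure_text():
    """Return the structure file content: one 'SMILES<TAB>ID' record per line."""
    return "".join(f"{smiles}\t{mol_id}\n" for smiles, mol_id in STRUCTURES)


def property_text():
    """Return the property file content: a header line, then 'ID<TAB>value' records."""
    rows = [("ID", "prop")] + PROPERTIES
    return "".join(f"{mol_id}\t{value}\n" for mol_id, value in rows)


def write_test_inputs(directory="."):
    """Write both files into `directory` and return their paths."""
    directory = Path(directory)
    struct_path = directory / "test_struct.tsv"
    prop_path = directory / "test_prop.tsv"
    struct_path.write_text(structure_text())
    prop_path.write_text(property_text())
    return struct_path, prop_path
```

Calling `write_test_inputs()` in an empty working directory should leave the two files ready for the `fragment` and `loadprops` commands used in this report.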

I performed the fragmentation, indexing, and property loading as instructed:

python -m mmpdblib fragment test_struct.tsv --max-rotatable-bonds 20 --num-cuts 3 -o test.fragments
python -m mmpdblib index test.fragments -o test.mmpdb
python -m mmpdblib loadprops --properties test_prop.tsv test.mmpdb

The indexed pairs make sense.

However, when I run:

python -m mmpdblib transform --smiles 'OC(c(cccc1)c1O)=O' test.mmpdb --explain

I noticed that I cannot get mol2 or mol3 as products, even though the rules mol1->mol2 and mol1->mol3 were included in the index step. Did I miss something here? Thank you for your help.

Here's the explanation output:

WARNING: APSW not installed. Falling back to Python's sqlite3 module.
Processing fragment Fragmentation(1, 'N', 7, '1', '*c1ccccc1O', '0', 3, '1', '*C(=O)O', 'O=CO')
  variable '*c1ccccc1O' not found as SMILES '[*:1]c1ccccc1O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 3, '1', '*C(=O)O', '0', 7, '1', '*c1ccccc1O', 'Oc1ccccc1')
  variable '*C(=O)O' not found as SMILES '[*:1]C(=O)O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*C(=O)O.*O', None)
  variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
  variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 1, '1', '*O', '0', 9, '1', '*c1ccccc1C(=O)O', 'O=C(O)c1ccccc1')
  variable '*O' not found as SMILES '[*:1]O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 9, '1', '*c1ccccc1C(=O)O', '0', 1, '1', '*O', 'O')
  variable '*c1ccccc1C(=O)O' not found as SMILES '[*:1]c1ccccc1C(=O)O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*O.*C(=O)O', None)
  variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
  variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
  No matching rule SMILES found. Skipping fragment.
== Product SMILES in database: 0 ==
ID	SMILES	prop_from_smiles	prop_to_smiles	prop_radius	prop_fingerprint	prop_rule_environment_id	prop_count	prop_avg	prop_std	prop_kurtosis	prop_skewness	prop_min	prop_q1	prop_median	prop_q3	prop_max	prop_paired_t	prop_p_value
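When reading a longer `--explain` log, it can help to collapse it into the list of variable fragments that were looked up and not found. The sketch below is a hypothetical helper, not part of mmpdb; it assumes only the message format `variable '...' not found as SMILES '...'` seen in the output above, and the `EXAMPLE` text is copied from that output.

```python
import re

# Matches the "not found" lines as they appear in the output above.
MISSING_RE = re.compile(r"variable '([^']+)' not found as SMILES '([^']+)'")

# First three fragment reports copied from the explanation output.
EXAMPLE = """\
Processing fragment Fragmentation(1, 'N', 7, '1', '*c1ccccc1O', '0', 3, '1', '*C(=O)O', 'O=CO')
  variable '*c1ccccc1O' not found as SMILES '[*:1]c1ccccc1O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 3, '1', '*C(=O)O', '0', 7, '1', '*c1ccccc1O', 'Oc1ccccc1')
  variable '*C(=O)O' not found as SMILES '[*:1]C(=O)O'
  No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*C(=O)O.*O', None)
  variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
  variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
  No matching rule SMILES found. Skipping fragment.
"""


def summarize_missing(text):
    """Map each variable fragment to the labelled SMILES forms that were tried."""
    missing = {}
    for line in text.splitlines():
        match = MISSING_RE.search(line)
        if match is None:
            continue
        variable, labelled = match.groups()
        tried = missing.setdefault(variable, [])
        if labelled not in tried:  # the same lookup can repeat across fragments
            tried.append(labelled)
    return missing


if __name__ == "__main__":
    for variable, tried in summarize_missing(EXAMPLE).items():
        print(variable, "->", ", ".join(tried))
```

Every variable fragment in the log above ends up in this summary, which matches the final line reporting zero product SMILES.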
Reference: github/mmpdb#36