Used format to represent fingerprints/molecules #7

Open
opened 2025-10-14 17:12:41 -06:00 by navan · 0 comments

Originally created by @lorenzoFabbri on 6/12/2019

I'm trying to understand your code, but I have a few questions.

First of all, I'd like to understand how you generate the r-radius subgraphs. For instance, when I run your code, the first SMILES is `CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2`, which corresponds to the `molecule` vector (or fingerprint) `[6, 7, 8, 9, 10, 9, 8, 7, 6, 11, 9, 8, 7, 6, 11, 8, 7, 6, 9, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12, 13, 13, 13, 13, 12, 12, 12]`. I can't work out how this vector is obtained. I thought you were simply collecting, for each atom of the molecule, the subgraph of all atoms within a certain radius of it. In that case I would expect something like:
`[0, 1, 2] [0, 1, 2, 3, 9] [0, 1, 2, 3, 4, 6, 9] [1, 2, 3, 4, 5, 9, 10, 18] [2, 3, 4, 5, 6, 10, 11, 15, 18] [3, 4, 5, 6, 7, 9, 10, 18] [2, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9] [8, 6, 7] [1, 2, 3, 5, 6, 7, 9] [3, 4, 5, 10, 11, 12, 14, 18] [4, 10, 11, 12, 13, 14, 15] [10, 11, 12, 13, 14] [11, 12, 13] [10, 11, 12, 14, 15, 16, 18] [4, 11, 14, 15, 16, 17, 18] [14, 15, 16, 17, 18] [16, 17, 15] [3, 4, 5, 10, 14, 15, 16, 18]`
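
To make this expectation concrete, here is a minimal sketch of the neighborhood enumeration described above. It is not the repository's method. It assumes a radius of 2 (which is what the expected lists seem to imply), ignores hydrogens, and uses a bond list (`BONDS`) hand-derived from the SMILES with atoms numbered in SMILES order. The function names are hypothetical.

```python
from collections import deque

# Heavy-atom bonds hand-derived from the SMILES above (assumption:
# hydrogens ignored, atoms numbered in the order they appear in the SMILES).
BONDS = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8),
    (6, 9), (9, 2), (4, 10), (10, 11), (11, 12), (12, 13), (11, 14),
    (14, 15), (15, 16), (16, 17), (15, 18), (18, 4),
]


def build_adjacency(bonds):
    """Return a dict mapping each atom index to the set of its neighbors."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def atoms_within_radius(adjacency, start, radius):
    """Breadth-first search: sorted atom indices at most `radius` bonds from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[atom]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return sorted(seen)


adjacency = build_adjacency(BONDS)
# One index list per heavy atom, in atom order (radius 2 is an assumption).
subgraphs = [atoms_within_radius(adjacency, atom, radius=2)
             for atom in sorted(adjacency)]
print(subgraphs)
```

Under these assumptions the sketch yields one list of atom indices per heavy atom, with the same members as the lists above (each list is sorted here, so atom 8 gives `[6, 7, 8]` rather than `[8, 6, 7]`). That is quite different from a single flat vector of IDs, which is the source of my confusion.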

Moreover, I do not understand why the second entry in `molecules` no longer starts at zero: `[20, 21, 22, 23, 24, 24, 24, 23, 25, 9, 10, 9, 25, 20, 21, 22, 23, 24, 24, 24, 23, 26, 27, 28, 23, 24, 24, 24, 23, 29, 29, 27, 28, 23, 24, 24, 24, 23, 26, 30, 31, 32, 32, 32, 32, 32, 30, 31, 32, 32, 32, 32, 32, 13, 13, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 13, 13]`.

Finally, I'd like to know whether you concatenate all the r-radius subgraphs into a single vector (like one of the vectors above). Thank you.

Reference: github/molecularGNN_smiles#7