Query Regarding MPNN Drug Encoder #18

Open
opened 2025-10-14 17:38:24 -06:00 by navan · 0 comments
Owner

Originally created by @SudhirGhandikota on 5/30/2023

Hello,

Before I ask my query, I would like to congratulate everybody involved in building this great framework.
I was trying to train an MPNN-CNN model on the latest BindingDB data, and while exploring the existing MPNN drug encoder implementation I ran into a couple of questions. I was hoping to get some clarification from you.

From what I understand, the agraph property associated with a drug is basically an adjacency list where the row_index represents an atom and each column value is an in_bond number/index. I deduced this from the following code snippet in the smiles2mpnnfeature method.

for a in range(n_atoms):
    for i, b in enumerate(in_bonds[a]):
        agraph[a, i] = b
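To illustrate that reading, here is a minimal toy sketch of the same packing step. It is not the DeepPurpose implementation: `build_agraph`, `MAX_NB`, and the hand-written `in_bonds` list are hypothetical names and values chosen for the example, and zero-padding of unused columns is an assumption.

```python
import numpy as np

MAX_NB = 6  # assumed maximum number of incoming bonds per atom (hypothetical value)

def build_agraph(in_bonds, max_nb=MAX_NB):
    """Pack per-atom lists of incoming bond indices into a zero-padded 2-D array."""
    agraph = np.zeros((len(in_bonds), max_nb), dtype=np.int64)
    for a, bonds in enumerate(in_bonds):
        for i, b in enumerate(bonds):
            agraph[a, i] = b  # row = atom, column value = incoming bond index
    return agraph

# Toy chain 0-1-2 with directed bonds: 0: 0->1, 1: 1->0, 2: 1->2, 3: 2->1
in_bonds = [[1], [0, 3], [2]]
agraph = build_agraph(in_bonds)
print(agraph)
```

Each row corresponds to one atom, and the non-padding entries in that row are indices into the bond list, not into the atom list.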

Then, during MPNN model training, I can see that the agraphs associated with all drugs in a given batch are combined into a single agraph_lst variable. In this consolidation step, you use two variables, N_a and N_b, to maintain the running sums of atoms and bonds across the drugs. Also, to re-index this combined adjacency list, I can see that you add the N_a value at each step (code snippet below).

for i in range(N_atoms_bond.shape[0]):
    ....
    agraph_lst.append(agraph[i, :atom_num, :] + N_a)

However, since the values in these adjacency lists are bond numbers or indices, shouldn't we be adding the value of N_b instead? Could you please clarify this?
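To make the concern concrete, here is a toy sketch comparing the two offsets. It is not the library's code: `merge_agraphs` and its `offset_by` parameter are hypothetical, padding is left out by using a triangle molecule in which every atom has exactly two incoming bonds, and it assumes the agraph entries are later used to index a bond-level tensor concatenated across the batch.

```python
import numpy as np

def merge_agraphs(agraphs, n_atoms, n_bonds, offset_by="bonds"):
    """Concatenate per-molecule agraphs, shifting entries by a running offset."""
    merged, N_a, N_b = [], 0, 0
    for agraph, atom_num, bond_num in zip(agraphs, n_atoms, n_bonds):
        shift = N_b if offset_by == "bonds" else N_a
        merged.append(agraph[:atom_num, :] + shift)
        N_a += atom_num  # running count of atoms seen so far
        N_b += bond_num  # running count of bonds seen so far
    return np.concatenate(merged, axis=0)

# Triangle 0-1-2 with directed bonds:
# 0: 0->1, 1: 1->0, 2: 1->2, 3: 2->1, 4: 2->0, 5: 0->2
mol = np.array([[1, 4], [0, 3], [2, 5]])  # incoming bond indices per atom

batch = [mol, mol]
by_bonds = merge_agraphs(batch, n_atoms=[3, 3], n_bonds=[6, 6], offset_by="bonds")
by_atoms = merge_agraphs(batch, n_atoms=[3, 3], n_bonds=[6, 6], offset_by="atoms")

print(by_bonds[3:])  # second molecule, shifted by the bond count of the first
print(by_atoms[3:])  # second molecule, shifted by the atom count of the first
```

Under these assumptions, shifting by the bond count keeps the second molecule's entries inside its own bond range (6 to 11), while shifting by the atom count leaves some of them pointing at bonds of the first molecule (indices below 6).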

Thanks in advance!

Reference: github/DeepPurpose#18