mirror of
https://github.com/rdkit/mmpdb.git
synced 2026-04-03 00:18:57 -06:00
Memory error with mmpdb fragment for large dataset #54
Labels
bug
enhancement
question
Reference: github/mmpdb#54
Originally created by @chengthefang on 2/17/2021
Hi all,
I am trying to build an MMP database from 10M compounds, but I got an error at the first step, fragmentation.
The command I used is as follows:
python mmpdb fragment first10M.smi --num-jobs 8 -o first10M.fragments.gz

The error I got is:

Traceback (most recent call last):
  File "/home/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 328, in _handle_workers
    pool._maintain_pool()
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 232, in _maintain_pool
    self._repopulate_pool()
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 225, in _repopulate_pool
    w.start()
  File "/home/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/home/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Does anybody have comments or suggestions on that? Also, can I run the command on distributed nodes on the cluster?
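Since OSError: [Errno 12] is raised by os.fork(), the machine ran out of memory while spawning worker processes. One common workaround, sketched below under assumptions (file names, chunk size, and worker count are illustrative, not from mmpdb's documentation), is to split the input SMILES file into chunks and fragment each chunk separately with fewer parallel jobs, so less memory is held at once. The sketch uses a tiny stand-in file and only echoes the mmpdb command rather than running it:

```shell
# Tiny stand-in input; in practice this would be first10M.smi.
printf 'CCO ethanol\nc1ccccc1 benzene\nCC(=O)O acetic_acid\nCCN ethylamine\n' > input.smi

# Split into fixed-size chunks (2 records here; perhaps ~1M for the real file)
# so each fragmentation run works on a bounded amount of input.
split -l 2 input.smi chunk_

# Fragment each chunk with fewer workers. 'echo' only prints the command
# that would run; drop it to execute for real.
for f in chunk_*; do
    echo python mmpdb fragment "$f" --num-jobs 2 -o "$f.fragments.gz"
done
```

Chunking this way would also let each chunk be fragmented as an independent job on separate cluster nodes; whether and how the resulting fragment files can be recombined for indexing is a question for the mmpdb maintainers.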
PS: I have similar concerns about the second step, indexing, since it usually takes longer and uses more memory than fragmentation. Can I run the indexing command in parallel or on a distributed cluster?
Thanks,
Cheng