Memory error with mmpdb fragment for large dataset #54

Closed
opened 2025-10-14 17:03:38 -06:00 by navan · 0 comments
Owner

Originally created by @chengthefang on 2/17/2021

Hi all,

I am trying to build an MMP database from 10M compounds, but I got an error at the first step, fragmentation.

The command I used is as follows:
python mmpdb fragment first10M.smi --num-jobs 8 -o first10M.fragments.gz

The error I got is:
Traceback (most recent call last):
  File "/home/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 328, in _handle_workers
    pool._maintain_pool()
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 232, in _maintain_pool
    self._repopulate_pool()
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 225, in _repopulate_pool
    w.start()
  File "/home/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/home/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Does anybody have comments or suggestions? Also, can I run this command across distributed nodes on a cluster?

P.S. I have a similar concern about the second step, indexing, since it usually takes longer and uses more memory than fragmentation. Can the indexing command be run in parallel or distributed across a cluster?

Thanks,
Cheng
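
Since the failure is in `os.fork()` (each worker process is forked from the parent, and the fork itself runs out of memory), one common workaround pattern is to split the input SMILES file into chunks and fragment each chunk as a separate, smaller run; the chunks can also be dispatched to different cluster nodes. The sketch below is an assumption-based illustration of that pattern, not an official mmpdb recipe: it assumes GNU coreutils `split`, and whether the per-chunk fragment outputs can later be combined for indexing depends on your mmpdb version.

```shell
# Sketch of a chunked workaround (assumes GNU coreutils `split`).
# In practice first10M.smi is the real input; a tiny placeholder is
# created here only so the sketch runs end to end.
INPUT=first10M.smi
[ -f "$INPUT" ] || printf 'CCO ethanol\nCCN ethylamine\nc1ccccc1 benzene\nCC(=O)O acetic_acid\n' > "$INPUT"

# Split into pieces of at most 1,000,000 lines each: chunk_00, chunk_01, ...
split -l 1000000 -d "$INPUT" chunk_

for f in chunk_*; do
    if command -v mmpdb >/dev/null 2>&1; then
        # Fewer worker processes per run lowers peak memory at fork time.
        mmpdb fragment "$f" --num-jobs 2 -o "$f.fragments.gz"
    else
        # mmpdb not installed here; show the command that would run.
        echo "would run: mmpdb fragment $f --num-jobs 2 -o $f.fragments.gz"
    fi
done
```

Each per-chunk run could also be submitted as its own cluster job (one `mmpdb fragment` invocation per node), which sidesteps a single oversized parent process entirely.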

navan closed this issue 2025-10-14 17:03:39 -06:00

Reference: github/mmpdb#54