Fix tests for multiprocessing #34

Open
opened 2025-10-14 15:51:39 -06:00 by navan · 0 comments
Owner

Originally created by @starride-teklia on 12/7/2022

The bump to PyTorch 1.13 (https://github.com/jpuigcerver/PyLaia/pull/45) broke some multiprocessing tests on both CPU and GPU. They now fail with the following errors:

  • torch.multiprocessing.spawn.ProcessRaisedException
  • AttributeError: 'LightningDistributedDataParallel' object has no attribute '_sync_params'

On these tests:

tests/callbacks/learning_rate_test.py:# TODO: fix test with num_processes=2
tests/callbacks/training_timer_test.py:# TODO: fix test with num_processes=2
tests/loggers/epoch_csv_logger_test.py:# TODO: fix test with num_processes=2
tests/scripts/htr/decode_ctc_test.py:# TODO: fix test with nprocs=2
tests/scripts/htr/netout_test.py:# TODO: fix test with nprocs=2
tests/scripts/htr/train_ctc_test.py:# TODO: fix "ddp_cpu" mode
tests/scripts/htr/train_ctc_test.py:# TODO: fix "ddp" mode
tests/scripts/htr/train_ctc_test.py:# TODO: fix first assertion

I have skipped these tests for now, but I still need to investigate what causes these errors and how to fix them.
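
For reference, here is a minimal sketch of how a multi-process test case can be skipped while its single-process counterpart keeps running. It uses only the standard library's `unittest` so that it runs on its own. The project's actual test runner and skip mechanism are not shown in this issue, and `run_with_processes`, `LearningRateTest`, and `SKIP_REASON` are hypothetical names used purely for illustration.

```python
import io
import unittest

# Hypothetical reason string; the real TODO comments are listed above.
SKIP_REASON = "multiprocessing tests broken since the PyTorch 1.13 bump (#34)"


def run_with_processes(num_processes):
    """Hypothetical stand-in for the real test body (e.g. a short training run)."""
    return num_processes


class LearningRateTest(unittest.TestCase):
    def test_single_process(self):
        # The single-process case is unaffected and keeps running.
        self.assertEqual(run_with_processes(1), 1)

    @unittest.skip(SKIP_REASON)
    def test_two_processes(self):
        # Skipped until the multiprocessing failure is understood and fixed.
        self.assertEqual(run_with_processes(2), 2)


def run_suite():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LearningRateTest)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_suite()
    for test, reason in result.skipped:
        print(f"skipped {test.id()}: {reason}")
```

Keeping the reason string next to the skip makes it easy to search for every skipped test once the underlying failure is fixed.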

Reference: github/PyLaia#34