error in training with warp_skip = mask / full #49

Open
opened 2025-10-14 17:38:42 -06:00 by navan · 0 comments

Originally created by @SaeedSaadatnejad on 3/18/2019

Hello,

When I train the model with the configs you mentioned (warp_skip = mask or full), I get the error below. The baseline (warp_skip=none) trains without this problem.
Can you help?

Thanks

2019-03-18 20:09:27.117571: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x5637fe29dc40
2019-03-18 20:09:27.477257: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at matrix_inverse_op.cc:191 : Internal: tensorflow/core/kernels/cuda_solvers.cc:803: cuBlas call failed status = 13

Traceback (most recent call last):
  File "train.py", line 29, in <module>
    main()
  File "train.py", line 26, in main
    trainer.train()
  File "Deformable_GAN_last_try/gan/train.py", line 106, in train
    self.train_one_epoch((((self.current_epoch + 1) % self.checkpoint_ratio == 0) or self.current_epoch==0))
  File "Deformable_GAN_last_try/gan/train.py", line 79, in train_one_epoch
    self.train_one_step(discriminator_loss_list, generator_loss_list)
  File "Deformable_GAN_last_try/gan/train.py", line 68, in train_one_step
    loss = self.generator_model.train_on_batch(generator_batch, np.zeros([self.batch_size]))
  File "venvs/p2tf_new/local/lib/python2.7/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "venvs/p2tf_new/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "venvs/p2tf_new/local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "venvs/p2tf_new/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "venvs/p2tf_new/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: tensorflow/core/kernels/cuda_solvers.cc:803: cuBlas call failed status = 13
         [[{{node training_1/Adam/gradients/model_1/affine_transform_layer_2/transform/ImageProjectiveTransformV2_grad/MatrixInverse}} = MatrixInverse[T=DT_FLOAT, _class=["loc:@train...ransformV2"], adjoint=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](training_1/Adam/gradients/model_1/affine_transform_layer_2/transform/ImageProjectiveTransformV2_grad/flat_transforms_to_matrices/Reshape_1)]]
         [[{{node loss/mul/_1251}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7915_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] 
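
One possible way to narrow this down, offered only as a sketch: the failing node is a MatrixInverse applied to the output of flat_transforms_to_matrices in the gradient of ImageProjectiveTransformV2, so it may be worth checking on the CPU whether the affine transforms fed to the warp layer are finite and invertible. The helper below assumes the usual 8-parameter flat layout [a0, a1, a2, b0, b1, b2, c0, c1] with the last matrix entry fixed to 1. The function names and the `max_cond` threshold are hypothetical and not part of this repository. A cuBlas failure can also come from causes unrelated to the matrix values (for example GPU memory or driver setup), so a clean result here only rules out one explanation.

```python
import numpy as np


def flat_transforms_to_matrices(flat):
    """Convert (N, 8) flat projective transforms to (N, 3, 3) matrices.

    Assumes the layout [a0, a1, a2, b0, b1, b2, c0, c1]; the bottom-right
    entry of each matrix is fixed to 1.
    """
    flat = np.asarray(flat, dtype=np.float64).reshape(-1, 8)
    ones = np.ones((flat.shape[0], 1), dtype=flat.dtype)
    return np.concatenate([flat, ones], axis=1).reshape(-1, 3, 3)


def find_bad_transforms(flat, max_cond=1e6):
    """Return indices of transforms that are non-finite or near-singular.

    max_cond is a hypothetical threshold on the condition number.
    """
    mats = flat_transforms_to_matrices(flat)
    finite = np.isfinite(mats).all(axis=(1, 2))
    # Non-finite matrices keep an infinite condition number.
    cond = np.full(mats.shape[0], np.inf)
    if finite.any():
        cond[finite] = np.linalg.cond(mats[finite])
    return np.where(~finite | (cond > max_cond))[0]


if __name__ == "__main__":
    batch = np.array([
        [1, 0, 0, 0, 1, 0, 0, 0],       # identity: invertible
        [0, 0, 0, 0, 0, 0, 0, 0],       # singular: cannot be inverted
        [np.nan, 0, 0, 0, 1, 0, 0, 0],  # contains NaN
    ], dtype=np.float64)
    print(find_bad_transforms(batch))
```

Running this on the transforms from a failing batch (dumped as a NumPy array before the warp layer) would show whether any of them are degenerate, which is the case where warp_skip = mask or full would differ from the baseline.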
Reference: github/pose-gan#49