Enhance CPU Inference Efficiency: Add ONNX/TorchScript Conversion Support for CPU-Only Deployment #3

Open
opened 2025-10-14 15:38:06 -06:00 by navan · 0 comments

Originally created by @tensimixt on 4/5/2025

Background

The current implementation of the Mel-Band-Roformer-Vocal-Model is optimized for GPU inference. However, many users run inference in CPU-only environments and have reported high CPU usage and suboptimal latency. Converting the model to ONNX—or optionally using TorchScript—could significantly reduce CPU overhead and improve performance on such systems.

Proposed Enhancement

ONNX Conversion Support

  • Integrate a conversion script:
    Use torch.onnx.export to export the model into an ONNX graph.
  • Handle custom layers:
    Ensure that custom layers (e.g., RoFormer blocks and any non-standard operations) are properly handled, potentially by implementing custom symbolic functions if needed.
  • Documentation:
    Provide documentation or an example script to guide users through the conversion process.

Optional TorchScript Support

  • Alternative conversion path:
    Offer a conversion path using TorchScript (via torch.jit.trace or torch.jit.script) to optimize the model for CPU inference.
  • Benchmarking:
    Benchmark both paths and compare ONNX and TorchScript inference performance on representative CPU hardware.

Benefits

  • Improved Efficiency:
    Lower CPU usage and reduced inference latency for CPU-only deployments.
  • Wider Accessibility:
    Enables users without GPU access to deploy the model more effectively.
  • Flexibility:
    Allows users to choose between ONNX and TorchScript based on their deployment needs and environment constraints.