Enhance CPU Inference Efficiency: Add ONNX/TorchScript Conversion Support for CPU-Only Deployment #3

Open
opened 2025-10-14 15:38:06 -06:00 by navan · 0 comments

Originally created by @tensimixt on 4/5/2025

Background

The current implementation of the Mel-Band-Roformer-Vocal-Model is optimized for GPU inference. However, many users run inference in CPU-only environments and have reported high CPU usage and suboptimal latency. Converting the model to ONNX—or optionally using TorchScript—could significantly reduce CPU overhead and improve performance on such systems.

Proposed Enhancement

ONNX Conversion Support

  • Integrate a conversion script:
    Use torch.onnx.export to export the model into an ONNX graph.
  • Handle custom layers:
    Ensure that custom layers (e.g., RoFormer blocks and any non-standard operations) are properly handled, potentially by implementing custom symbolic functions if needed.
  • Documentation:
    Provide documentation or an example script to guide users through the conversion process.

Optional TorchScript Support

  • Alternative conversion path:
    Offer a conversion path using TorchScript (via torch.jit.trace or torch.jit.script) to optimize the model for CPU inference.
  • Benchmarking:
    Benchmark both paths and compare ONNX and TorchScript inference performance on representative CPU hardware.

Benefits

  • Improved Efficiency:
    Lower CPU usage and reduced inference latency for CPU-only deployments.
  • Wider Accessibility:
    Enables users without GPU access to deploy the model more effectively.
  • Flexibility:
    Allows users to choose between ONNX and TorchScript based on their deployment needs and environment constraints.