Overview
Intel Neural Compressor provides low-bit quantization (INT8, FP8, INT4, MXFP4, NVFP4), sparsity, pruning, and knowledge distillation to optimize models on Intel hardware and beyond.
Installation
uv pip install neural-compressor
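Entry-point names in the library have changed between major releases, so it can help to confirm which version was installed before copying any snippet. The check below is a minimal sketch that relies only on the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of Intel Neural Compressor.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="neural-compressor"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "neural-compressor is not installed in this environment")
```

Returning `None` instead of raising keeps the helper usable in setup scripts that want to branch on whether the package is present.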
Basic quantization
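from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit
# Post-training quantization (2.x-style API; entry points differ across releases, so check your installed version)
# Assumption: `model` is your FP32 model and `calib_dataloader` yields calibration batches; both are defined elsewhere.
conf = PostTrainingQuantConfig(approach="static")
q_model = fit(model=model, conf=conf, calib_dataloader=calib_dataloader)
q_model.save("quantized_model")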
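To make the idea behind post-training INT8 quantization concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization: floats are mapped to integer codes in [-127, 127] through a single scale, then mapped back. It only illustrates the general technique and is not how Intel Neural Compressor implements it; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the signed 8-bit range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.27]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Static post-training quantization applies the same kind of mapping to activations as well, which is why a calibration pass over representative data is typically needed to pick their scales.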
Pruning
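from neural_compressor import Pruning
# `model` is the trained model to prune, defined elsewhere.
# Target 30% sparsity with the snip_momentum criterion.
pruner = Pruning(model, config={"pruning_type": "snip_momentum", "target_sparsity": 0.3})
pruned_model = pruner.fit()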
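As a rough illustration of what a `target_sparsity` of 0.3 means, the sketch below zeroes the 30% of weights with the smallest magnitude. Note that it uses plain magnitude pruning, a simpler criterion swapped in for readability, and not the `snip_momentum` criterion named above; `magnitude_prune` and `sparsity` are hypothetical helper names, not library functions.

```python
def magnitude_prune(weights, target_sparsity):
    """Zero out the smallest-magnitude fraction of weights (count rounded down)."""
    if not 0.0 <= target_sparsity <= 1.0:
        raise ValueError("target_sparsity must be in [0, 1]")
    n_prune = int(len(weights) * target_sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights) if weights else 0.0


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6, 0.1, -0.8]
pruned = magnitude_prune(weights, target_sparsity=0.3)
print(pruned, sparsity(pruned))
```

In practice pruning is usually interleaved with training so the remaining weights can compensate for the removed ones; this one-shot version only shows the selection step.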