Overview
Unsloth makes QLoRA training about 2x faster with about 50% less memory by using optimized kernels. It supports Llama, Mistral, Gemma, Qwen 2.5, DeepSeek, Phi, Yi, and Falcon, with Flash Attention.
Installation
uv pip install unsloth
QLoRA Fine-Tuning
from unsloth import FastLanguageModel
import torch
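model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct-bnb-4bit",  # checkpoint pre-quantized to 4-bit with bitsandbytes
    max_seq_length=4096,   # maximum context length used for training
    dtype=torch.bfloat16,  # bf16 needs an Ampere-or-newer GPU; dtype=None lets Unsloth auto-detect
    load_in_4bit=True,     # keep the frozen base weights in 4-bit (the "Q" in QLoRA)
)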
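model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    lora_alpha=16,  # LoRA updates are scaled by lora_alpha / r = 1.0
    use_gradient_checkpointing="unsloth",  # Unsloth's gradient checkpointing, lowers activation memory
)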
model.print_trainable_parameters()  # prints the counts itself and returns None, so no print() wrapper
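To see what r=16 on the four attention projections means in practice, here is a rough count of the parameters LoRA adds, in plain Python. It is a sketch under stated assumptions: the helper name lora_param_count is hypothetical, and the Qwen2.5-7B dimensions (hidden size 3584, 28 decoder layers, 4 key/value heads of size 128) are assumed rather than read from the checkpoint, so check them against the model's config.json. Each adapted linear layer with input size d_in and output size d_out gains a matrix A of shape (r, d_in) and a matrix B of shape (d_out, r).

```python
def lora_param_count(layer_shapes, r, num_layers):
    """Hypothetical helper: total LoRA parameters for `num_layers` identical layers.

    layer_shapes maps a module name to its (d_in, d_out) shape. LoRA adds
    A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) per adapted module.
    """
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_layer * num_layers


# ASSUMED Qwen2.5-7B attention dimensions (verify in the model's config.json):
# hidden_size=3584, 28 layers, grouped-query attention with 4 KV heads of size 128.
HIDDEN = 3584
KV_DIM = 4 * 128  # k_proj and v_proj output 512 features under the assumption above
NUM_LAYERS = 28

ATTENTION_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

r, lora_alpha = 16, 16
trainable = lora_param_count(ATTENTION_SHAPES, r=r, num_layers=NUM_LAYERS)
scaling = lora_alpha / r  # the LoRA update B @ A is multiplied by alpha / r

print(f"LoRA trainable params: {trainable:,}")  # LoRA trainable params: 10,092,544
print(f"scaling: {scaling}")  # scaling: 1.0
```

If the assumed dimensions hold, the total should line up with the trainable count PEFT prints: about 10 million parameters, a small fraction of the roughly 7 billion frozen 4-bit weights. Adding gate_proj, up_proj, and down_proj to target_modules would raise it accordingly.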
Inference
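FastLanguageModel.for_inference(model)  # switch the model to Unsloth's inference mode
inputs = tokenizer(["Describe quantum computing."], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
# skip_special_tokens drops markers such as the end-of-sequence token from the decoded text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))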