Use Trainer
In the Quick Start tutorial, we learned how to use the Trainer API to fine-tune a model. This tutorial covers Trainer configuration in more detail: training parameters, the Trainer's own arguments, metrics, and advanced features such as mixed precision and checkpointing.
Overview
MindNLP patches the HuggingFace transformers library to work with MindSpore. This means you can use the standard HuggingFace Trainer and TrainingArguments classes directly:
```python
import mindspore
import mindnlp  # Apply patches
from transformers import Trainer, TrainingArguments
```
The TrainingArguments class holds the training configuration, and Trainer runs the entire training loop using MindSpore as the backend.
Configure Training Parameters
Create a TrainingArguments object to specify the training configuration:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="epoch",
    logging_strategy="steps",
    logging_steps=50,
)
```
Basic Parameters
- output_dir: Directory where model checkpoints and training outputs are saved.
- num_train_epochs: Total number of passes over the training set. More epochs give the model more opportunity to learn, but too many can cause overfitting.
Optimizer Parameters
- optim: Optimizer type. Default is "adamw_torch" (which maps to MindSpore's AdamW).
- learning_rate: Initial learning rate (default 5e-5). This is critical for convergence: too high causes instability, too low causes slow convergence.
- weight_decay: Weight decay coefficient (default 0.0). It regularizes the model by penalizing large weights, which helps prevent overfitting.
- adam_beta1 / adam_beta2: Decay rates of Adam's moving averages of the gradient and of the squared gradient (defaults 0.9 and 0.999).
- adam_epsilon: Small constant added to the denominator of the Adam update for numerical stability (default 1e-8).
- max_grad_norm: Maximum gradient norm for gradient clipping (default 1.0), which prevents exploding gradients.
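These optimizer parameters all act within a single update step. The pure-Python sketch below works on plain lists of floats. It first clips the gradients so that their global L2 norm is at most max_grad_norm. It then applies an AdamW update (Adam with decoupled weight decay) using learning_rate, adam_beta1, adam_beta2, adam_epsilon and weight_decay. The helper names clip_by_global_norm and adamw_step are hypothetical. This illustrates the standard algorithm under those assumptions and is not MindSpore's tensor implementation, which may differ in details.

```python
import math


def clip_by_global_norm(grads, max_grad_norm=1.0):
    """Scale all gradients so that their combined L2 norm is at most max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up; only shrink when the norm exceeds the threshold.
    scale = min(1.0, max_grad_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm


def adamw_step(params, grads, m, v, step, learning_rate=5e-5,
               adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8,
               weight_decay=0.0):
    """One AdamW update on lists of floats (hypothetical helper); step starts at 1."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Decoupled weight decay: shrink the weight directly, not through the gradient.
        p = p * (1 - learning_rate * weight_decay)
        # Exponential moving averages of the gradient and of the squared gradient.
        m_i = adam_beta1 * m_i + (1 - adam_beta1) * g
        v_i = adam_beta2 * v_i + (1 - adam_beta2) * g * g
        # Bias correction compensates for the zero-initialized averages.
        m_hat = m_i / (1 - adam_beta1 ** step)
        v_hat = v_i / (1 - adam_beta2 ** step)
        # adam_epsilon keeps the denominator away from zero.
        p = p - learning_rate * m_hat / (math.sqrt(v_hat) + adam_epsilon)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# A gradient with norm 5.0 is rescaled to norm ~1.0 before the update.
grads, norm = clip_by_global_norm([3.0, 4.0], max_grad_norm=1.0)
params, m, v = adamw_step([1.0, -1.0], grads, [0.0, 0.0], [0.0, 0.0],
                          step=1, learning_rate=0.1)
print(norm, params)  # params is approximately [0.9, -1.1]
```

At the first step the bias-corrected averages equal the gradient and its square. Each weight therefore moves by roughly learning_rate against the sign of its gradient, whatever the gradient's magnitude.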
Batch Size Parameters
- per_device_train_batch_size: Training batch size per device. Larger batches usually increase throughput but need more memory.
- per_device_eval_batch_size: Evaluation batch size per device.
- gradient_accumulation_steps: Number of steps over which gradients are accumulated before each optimizer update. This simulates a larger batch size when memory is limited.
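The batch-size parameters multiply together. The minimal sketch below assumes the usual data-parallel setup, where every device processes per_device_train_batch_size examples per step. It computes how many examples contribute to each optimizer update and roughly how many updates one epoch takes. The helper names and the num_devices argument are hypothetical. The step count also ignores edge cases in how the last partial batch and accumulation window are handled.

```python
import math


def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps=1, num_devices=1):
    """Number of examples that contribute to each optimizer update."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def optimizer_steps_per_epoch(num_examples, per_device_train_batch_size,
                              gradient_accumulation_steps=1, num_devices=1):
    """Approximate number of optimizer updates in one epoch (at least 1)."""
    batches = math.ceil(num_examples / (per_device_train_batch_size * num_devices))
    return max(batches // gradient_accumulation_steps, 1)


# A per-device batch of 4 with 4 accumulation steps behaves like a batch of 16.
print(effective_batch_size(4, gradient_accumulation_steps=4))  # 16
# 1,000 examples with a batch size of 8 on one device.
print(optimizer_steps_per_epoch(1000, 8))  # 125
```

For the Complete Example (1,000 training examples, per_device_train_batch_size=8, no accumulation, one device), this gives 125 optimizer steps per epoch, or 375 over 3 epochs.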
Strategy Parameters
Evaluation Strategy
The eval_strategy parameter controls when evaluation occurs (older transformers releases call it evaluation_strategy):
- "no": No evaluation
- "steps": Evaluate every eval_steps training steps
- "epoch": Evaluate at the end of each epoch
Save Strategy
The save_strategy parameter controls when checkpoints are saved:
- "no": No saving
- "steps": Save every save_steps training steps
- "epoch": Save at the end of each epoch
Logging Strategy
The logging_strategy parameter controls when metrics are logged:
- "no": No logging
- "steps": Log every logging_steps training steps
- "epoch": Log at the end of each epoch
MindSpore-Specific Parameters
MindNLP adds support for MindSpore-specific parameters when loading a model, such as ms_dtype:
```python
import mindspore
import mindnlp  # Apply patches
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    ms_dtype=mindspore.float16,  # Use a MindSpore dtype
)
```
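This sketch assumes that ms_dtype sets the dtype the weights are loaded in, as the parameter name suggests. Under that assumption, mindspore.float16 stores each weight in 2 bytes instead of the 4 bytes used by float32, which halves the memory needed for the parameters. The arithmetic below uses NumPy dtypes as stand-ins for the MindSpore ones, since they have the same byte widths. The 100-million parameter count is a hypothetical round number. The estimate covers weights only, not activations, gradients or optimizer state.

```python
import numpy as np


def weight_memory_mib(num_params, dtype):
    """Memory in MiB needed to store num_params values of the given dtype."""
    return num_params * np.dtype(dtype).itemsize / 2**20


num_params = 100_000_000  # hypothetical model size
for dtype in (np.float32, np.float16):
    print(np.dtype(dtype).name, round(weight_memory_mib(num_params, dtype), 1), "MiB")
```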
Create the Trainer
Create a Trainer instance with your model, datasets, and configuration:
```python
import mindnlp  # Apply patches
from transformers import Trainer, TrainingArguments, AutoModelForSequenceClassification

# Load model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Create trainer. training_args is the TrainingArguments object created above;
# train_dataset, eval_dataset and compute_metrics must already be defined
# (see the Complete Example below).
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)
```
Trainer Arguments
- model: The model to train
- args: TrainingArguments instance with the training configuration
- train_dataset: Training dataset
- eval_dataset: Evaluation dataset (optional)
- compute_metrics: Function to compute evaluation metrics (optional)
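Defining compute_metrics
The compute_metrics function receives the model's predictions together with the reference labels after each evaluation and returns a dictionary of metrics:
```python
import evaluate
import numpy as np

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's logits and the reference labels
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```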
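Accuracy is simple enough to compute without the evaluate library. The sketch below is a hypothetical drop-in replacement for the compute_metrics above. It unpacks eval_pred the same way and returns a dictionary with the same "accuracy" key. The tuple check is a defensive assumption for models whose evaluation predictions contain more than the logits.

```python
import numpy as np


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Some models return several outputs; assume the logits come first.
    if isinstance(logits, tuple):
        logits = logits[0]
    predictions = np.argmax(logits, axis=-1)
    # Fraction of examples whose predicted class matches the label.
    accuracy = (predictions == np.asarray(labels)).mean()
    return {"accuracy": float(accuracy)}


# Three examples with two classes: the predicted classes are [1, 0, 0].
logits = np.array([[0.1, 0.9], [2.0, -1.0], [0.3, 0.2]])
labels = np.array([1, 0, 1])
print(compute_metrics((logits, labels)))  # {'accuracy': 0.6666666666666666}
```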
Start Training
Once the trainer is configured, start training:
```python
trainer.train()
```
Complete Example
```python
import mindspore
import mindnlp  # Apply patches
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset
import evaluate
import numpy as np

# Load dataset
dataset = load_dataset("imdb")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Define metrics
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    logging_steps=100,
)

# Create trainer. The IMDB splits are ordered by label, so shuffle before
# taking a small subset; otherwise the subset contains only one class.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"].shuffle(seed=42).select(range(1000)),  # Subset for demo
    eval_dataset=tokenized_datasets["test"].shuffle(seed=42).select(range(200)),
    compute_metrics=compute_metrics,
)

# Train
trainer.train()
```
Advanced Features
Mixed Precision Training
Use lower precision for faster training with less memory:
```python
training_args = TrainingArguments(
    output_dir="./output",
    fp16=True,  # Enable FP16 training
    # Or use bf16=True for bfloat16
)
```
Gradient Checkpointing
Trade compute for memory by recomputing activations during the backward pass instead of storing them all during the forward pass:
```python
training_args = TrainingArguments(
    output_dir="./output",
    gradient_checkpointing=True,
)
```
Resume from Checkpoint
Resume training from a saved checkpoint:
```python
trainer.train(resume_from_checkpoint="./results/checkpoint-500")
```
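Checkpoints are saved as checkpoint-<global step> subdirectories of output_dir, as in the path above. You can pass resume_from_checkpoint=True instead of a path, and the Trainer will resume from the most recent checkpoint in args.output_dir. The transformers.trainer_utils.get_last_checkpoint function returns that same path. The hypothetical latest_checkpoint helper below is a minimal stand-in that shows the selection logic. Step numbers must be compared numerically, because a plain string sort would rank checkpoint-500 above checkpoint-1000.

```python
import os
import re

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    if not os.path.isdir(output_dir):
        return None
    candidates = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    # max() compares the integer step first, so 1000 beats 500.
    return os.path.join(output_dir, max(candidates)[1])


# Resume from the newest checkpoint, or start fresh when none exists
# (trainer and training_args as in the Complete Example):
# trainer.train(resume_from_checkpoint=latest_checkpoint(training_args.output_dir))
```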
Notes
- The Trainer automatically uses MindSpore operations through MindNLP's patching system
- All standard HuggingFace Trainer features should work
- For production training, use the full dataset rather than the small subsets used in the Complete Example
- Monitor training with TensorBoard by setting logging_dir in TrainingArguments, with TensorBoard reporting enabled (e.g. report_to="tensorboard")