模型微调实战 - 定制化大模型

前置知识：需要先掌握 BERT预训练

本文重点：高效微调方法与实战案例

一、微调概述

1.1 为什么需要微调

预训练模型局限：
├── 通用知识，不适应特定领域
├── 不了解业务规则和数据格式
├── 输出风格与需求不匹配
└── 特定任务性能不够好
微调的好处：
├── 注入领域知识
├── 适应特定任务
├── 控制输出风格
└── 提升任务性能

1.2 微调方法对比

微调方法对比：
全量微调 (Full Fine-tuning)
├── 更新所有参数
├── 效果最好
├── 显存需求大
└── 需要大量数据
参数高效微调 (PEFT)
├── 只更新少量参数
├── 显存需求低
├── 数据需求少
└── 推理开销小
主流PEFT方法：
├── LoRA (Low-Rank Adaptation)
├── QLoRA (量化LoRA)
├── Prefix Tuning
├── Prompt Tuning
└── Adapter

二、LoRA微调

2.1 LoRA原理

"""
LoRA核心思想：
原始权重矩阵 W (d×k)
只更新 ΔW = B × A
其中 B: d×r, A: r×k, r << min(d,k)
最终权重：W' = W + ΔW = W + BA
参数量对比：
- 原始：d × k
- LoRA：d × r + r × k = r × (d + k)
当 r=8, d=k=4096 时：
- 原始：16M 参数
- LoRA：8 × 8192 = 65K 参数（减少99.6%）
"""

2.2 使用PEFT进行LoRA微调

# 安装依赖
# pip install peft transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
# 加载基础模型
model_name = "Qwen/Qwen2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)
# LoRA配置
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                    # LoRA秩
    lora_alpha=32,           # 缩放系数
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # 要应用LoRA的模块
    bias="none"
)
# 应用LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# 输出示例：
# trainable params: 4,194,304 || all params: 7,000,000,000 || trainable%: 0.06%

2.3 准备训练数据

from datasets import Dataset
import json
# 指令微调数据格式
"""
{
    "instruction": "请解释什么是机器学习",
    "input": "",
    "output": "机器学习是人工智能的一个分支..."
}
"""
def format_instruction(sample):
    """格式化指令数据"""
    if sample["input"]:
        prompt = f"""### 指令：
{sample['instruction']}
### 输入：
{sample['input']}
### 回答：
{sample['output']}"""
    else:
        prompt = f"""### 指令：
{sample['instruction']}
### 回答：
{sample['output']}"""
    return prompt
# 示例数据
training_data = [
    {
        "instruction": "解释什么是API",
        "input": "",
        "output": "API（Application Programming Interface）是应用程序编程接口，它定义了软件组件之间交互的方式..."
    },
    {
        "instruction": "将以下英文翻译为中文",
        "input": "Hello, world!",
        "output": "你好，世界！"
    }
]
# 创建数据集
dataset = Dataset.from_list(training_data)
def tokenize_function(examples):
    prompts = [format_instruction({"instruction": i, "input": inp, "output": o}) 
               for i, inp, o in zip(examples["instruction"], examples.get("input", [""]*len(examples["instruction"])), examples["output"])]
    
    return tokenizer(
        prompts,
        truncation=True,
        max_length=512,
        padding="max_length"
    )
# 处理数据集
# tokenized_dataset = dataset.map(tokenize_function, batched=True)

2.4 训练循环

from transformers import DataCollatorForLanguageModeling
# 训练参数
training_args = TrainingArguments(
    output_dir="./lora_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    weight_decay=0.01,
    logging_steps=10,
    save_steps=100,
    save_total_limit=3,
    fp16=True,
    optim="paged_adamw_8bit",
    warmup_ratio=0.03,
)
# 数据整理器
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)
# 创建训练器
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)
# 开始训练
# trainer.train()
# 保存LoRA权重
# model.save_pretrained("./lora_weights")
# tokenizer.save_pretrained("./lora_weights")

三、QLoRA微调

3.1 QLoRA简介

"""
QLoRA = Quantized LoRA
核心优化：
1. 4-bit NormalFloat量化：更精确的量化方法
2. 双重量化：量化常数也被量化
3. 分页优化器：减少显存峰值
效果：
- 在单张24GB GPU上微调65B模型
- 性能接近全量微调
"""

3.2 QLoRA实战

from transformers import BitsAndBytesConfig
# 4-bit量化配置
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
# 加载量化模型
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
# 准备模型进行训练
model = prepare_model_for_kbit_training(model)
# 应用LoRA
model = get_peft_model(model, lora_config)
# 后续训练流程与LoRA相同

四、微调实战案例

4.1 情感分析微调

from datasets import load_dataset
# 加载情感分析数据集
# dataset = load_dataset("imdb")  # 英文电影评论
# 或使用中文数据集
# 数据处理
def preprocess_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
        padding="max_length"
    )
# tokenized_datasets = dataset.map(preprocess_function, batched=True)

4.2 领域知识注入

"""
领域知识微调策略：
1. 数据收集
   - 领域文档
   - FAQ问答对
   - 专业术语解释
2. 数据格式
   - 指令格式
   - 对话格式
   - 文档格式
3. 训练策略
   - 混合通用数据
   - 领域数据加权
   - 分阶段训练
"""
domain_data = [
    {
        "instruction": "解释什么是CRDT",
        "input": "",
        "output": """CRDT（Conflict-free Replicated Data Types，无冲突复制数据类型）是一种数据结构...
主要类型：
1. 基于状态的CRDT (State-based)
2. 基于操作的CRDT (Operation-based)
应用场景：
- 协同编辑
- 分布式数据库
- 实时协作应用"""
    }
]

4.3 模型合并

# LoRA权重合并到基础模型
from peft import PeftModel
# 加载基础模型
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
# 加载LoRA权重
model = PeftModel.from_pretrained(base_model, "./lora_weights")
# 合并权重
merged_model = model.merge_and_unload()
# 保存合并后的模型
# merged_model.save_pretrained("./merged_model")
# tokenizer.save_pretrained("./merged_model")

五、微调最佳实践

5.1 数据准备

"""
微调数据最佳实践：
1. 数据质量 > 数据数量
   - 高质量小数据集 > 低质量大数据集
   
2. 数据多样性
   - 覆盖不同场景
   - 包含边界情况
   
3. 数据格式
   - 格式一致
   - 指令清晰
   
4. 数据清洗
   - 去重
   - 去噪
   - 质量过滤
"""
# 数据清洗示例
def clean_training_data(data):
    """清洗训练数据"""
    cleaned = []
    for item in data:
        # 去除空白字符
        instruction = item["instruction"].strip()
        output = item["output"].strip()
        
        # 检查长度
        if len(instruction) < 5 or len(output) < 10:
            continue
        
        # 检查质量（简单规则）
        if "TODO" in output or "FIXME" in output:
            continue
        
        cleaned.append(item)
    
    return cleaned

5.2 超参数选择

"""
LoRA微调超参数建议：
参数                小模型(7B)    大模型(70B)
─────────────────────────────────────────
LoRA rank (r)       8-16         16-32
LoRA alpha          32           64
Learning rate       2e-4         1e-4
Batch size          4-8          1-2
Gradient acc.       4-8          16-32
Epochs              3-5          1-3
经验法则：
- r 越大，表达能力越强，但参数越多
- alpha 通常设为 2r
- 学习率从小开始尝试
"""

5.3 评估与迭代

def evaluate_model(model, tokenizer, test_data):
    """评估微调后的模型"""
    from rouge import Rouge
    import numpy as np
    
    rouge = Rouge()
    scores = []
    
    for sample in test_data:
        # 生成预测
        input_text = format_instruction(sample)
        inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
        
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                do_sample=False
            )
        
        prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
        prediction = prediction.split("### 回答：")[-1].strip()
        
        # 计算ROUGE分数
        try:
            score = rouge.get_scores(prediction, sample["output"])[0]
            scores.append(score["rouge-l"]["f"])
        except:
            scores.append(0)
    
    return {
        "rouge_l_f1": np.mean(scores),
        "num_samples": len(test_data)
    }

六、常见问题解决

6.1 显存不足

"""
解决显存不足的方法：
1. 减小batch size
2. 使用gradient checkpointing
3. 使用DeepSpeed ZeRO
4. 使用QLoRA
"""
# 启用gradient checkpointing
model.gradient_checkpointing_enable()
# DeepSpeed配置
deepspeed_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu"
        }
    }
}

6.2 训练不稳定

"""
训练不稳定的解决方法：
1. 降低学习率
2. 增加warmup steps
3. 使用gradient clipping
4. 检查数据质量
"""
training_args = TrainingArguments(
    # ...
    learning_rate=1e-5,        # 降低学习率
    warmup_ratio=0.1,          # 增加warmup
    max_grad_norm=1.0,         # gradient clipping
)

参考资源

LoRA论文 - Low-Rank Adaptation

QLoRA论文 - 高效微调

PEFT文档 - HuggingFace PEFT

QLoRA教程 - 量化微调

LLaMA-Factory - 微调框架

Axolotl - 微调工具

Stanford Alpaca - 指令微调

上一篇：Prompt Engineering 返回：大模型应用 最后更新: 2026年4月16日

访问 --

搜索文章、标签、项目线索

模型微调实战 - 定制化大模型

一、微调概述

1.1 为什么需要微调

1.2 微调方法对比

二、LoRA微调

2.1 LoRA原理

2.2 使用PEFT进行LoRA微调

2.3 准备训练数据

2.4 训练循环

三、QLoRA微调

3.1 QLoRA简介

3.2 QLoRA实战

四、微调实战案例

4.1 情感分析微调

4.2 领域知识注入

4.3 模型合并

五、微调最佳实践

5.1 数据准备

5.2 超参数选择

5.3 评估与迭代

六、常见问题解决

6.1 显存不足

6.2 训练不稳定

参考资源

讨论与反馈