DSPy Cheatsheet
This page contains snippets for frequent usage patterns.
DSPy DataLoaders
Import and initialize a DataLoader object:
import dspy
from dspy.datasets import DataLoader
dl = DataLoader()
Loading from HuggingFace Datasets
code_alpaca = dl.from_huggingface("HuggingFaceH4/CodeAlpaca_20K")
You can access each split of the dataset by indexing with the split name:
train_dataset = code_alpaca['train']
test_dataset = code_alpaca['test']
Loading specific splits from HuggingFace
You can also pass the splits you want as a parameter; the loader then returns a dictionary whose keys are the splits you specified:
code_alpaca = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split=["train", "test"],
)
print(f"Splits in dataset: {code_alpaca.keys()}")
If you specify a single split, the dataloader returns a list of dspy.Example objects instead of a dictionary:
code_alpaca = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train",
)
print(f"Number of examples in split: {len(code_alpaca)}")
You can also slice a split, just as you would with a HuggingFace Dataset:
code_alpaca_80 = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train[:80%]",
)
print(f"Number of examples in split: {len(code_alpaca_80)}")
code_alpaca_20_80 = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train[20%:80%]",
)
print(f"Number of examples in split: {len(code_alpaca_20_80)}")
Loading a specific subset from HuggingFace
If a dataset has a subset, you can pass it as an argument, just as you do with load_dataset in HuggingFace:
gsm8k = dl.from_huggingface(
    "gsm8k",
    "main",
    input_keys=("question",),
)
print(f"Keys present in the returned dict: {list(gsm8k.keys())}")
print(f"Number of examples in train set: {len(gsm8k['train'])}")
print(f"Number of examples in test set: {len(gsm8k['test'])}")
Loading from CSV
dolly_100_dataset = dl.from_csv("dolly_subset_100_rows.csv")
You can load only selected columns from the CSV by naming them in the arguments:
dolly_100_dataset = dl.from_csv(
    "dolly_subset_100_rows.csv",
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context"),
)
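To make the roles of fields and input_keys concrete, here is a minimal plain-Python sketch using only the standard csv module. It is not DataLoader's implementation: the CSV text, the load_rows helper, and the inputs/labels dictionary layout are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io

# Hypothetical CSV content standing in for dolly_subset_100_rows.csv.
CSV_TEXT = """instruction,context,response,category
Summarize the text,Some long passage,A short summary,summarization
Name a primary color,,Red,open_qa
"""

def load_rows(text, fields, input_keys):
    """Keep only `fields` from each row and separate inputs from labels."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        record = {name: raw[name] for name in fields}
        inputs = {k: record[k] for k in input_keys}
        labels = {k: v for k, v in record.items() if k not in input_keys}
        rows.append({"inputs": inputs, "labels": labels})
    return rows

rows = load_rows(
    CSV_TEXT,
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context"),
)
print(rows[0]["inputs"])   # the columns a program would receive
print(rows[0]["labels"])   # the remaining selected columns
```

The category column is dropped because it is not listed in fields, and response ends up as a label because it is not listed in input_keys.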
Splitting a List of dspy.Example
splits = dl.train_test_split(dataset, train_size=0.8) # `dataset` is a List of dspy.Example
train_dataset = splits['train']
test_dataset = splits['test']
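As a rough illustration of what a fractional train_size means, the sketch below splits a plain Python list. It assumes the data is shuffled before the cut and uses a hypothetical split_examples helper; it is not the DataLoader's own code, and whether and how the DataLoader shuffles is not covered here.

```python
import random

def split_examples(dataset, train_size=0.8, seed=0):
    """Shuffle a copy of `dataset` and cut it at the `train_size` fraction."""
    shuffled = list(dataset)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_size)
    return {"train": shuffled[:cut], "test": shuffled[cut:]}

# Plain strings stand in for dspy.Example objects.
toy_dataset = [f"example-{i}" for i in range(10)]
toy_splits = split_examples(toy_dataset, train_size=0.8)
print(len(toy_splits["train"]), len(toy_splits["test"]))
```

Every item lands in exactly one of the two splits, so the splits can be recombined into the original dataset.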
Sampling from a List of dspy.Example
sampled_example = dl.sample(dataset, n=5) # `dataset` is a List of dspy.Example
DSPy Programs
dspy.Signature
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
dspy.ChainOfThought
generate_answer = dspy.ChainOfThought(BasicQA)
# Call the predictor on a particular input.
question='What is the color of the sky?'
pred = generate_answer(question=question)
dspy.ChainOfThoughtWithHint
generate_answer = dspy.ChainOfThoughtWithHint(BasicQA)
# Call the predictor on a particular input alongside a hint.
question='What is the color of the sky?'
hint = "It's what you often see during a sunny day."
pred = generate_answer(question=question, hint=hint)
dspy.ProgramOfThought
pot = dspy.ProgramOfThought(BasicQA)
question = 'Sarah has 5 apples. She buys 7 more apples from the store. How many apples does Sarah have now?'
result = pot(question=question)
print(f"Question: {question}")
print(f"Final Predicted Answer (after ProgramOfThought process): {result.answer}")
dspy.ReAct
react_module = dspy.ReAct(BasicQA)
question = 'Sarah has 5 apples. She buys 7 more apples from the store. How many apples does Sarah have now?'
result = react_module(question=question)
print(f"Question: {question}")
print(f"Final Predicted Answer (after ReAct process): {result.answer}")
dspy.Retrieve
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(rm=colbertv2_wiki17_abstracts)
# Define Retrieve module
retriever = dspy.Retrieve(k=3)
query = 'When was the first FIFA World Cup held?'
# Call the retriever on a particular query.
topK_passages = retriever(query).passages
for idx, passage in enumerate(topK_passages):
    print(f'{idx+1}]', passage, '\n')
DSPy Metrics
Function as Metric
To create a custom metric, write a function that returns either a number or a boolean value:
def parse_integer_answer(answer, only_first_line=True):
    try:
        if only_first_line:
            answer = answer.strip().split('\n')[0]
        # find the last token that has a number in it
        answer = [token for token in answer.split() if any(c.isdigit() for c in token)][-1]
        answer = answer.split('.')[0]
        answer = ''.join([c for c in answer if c.isdigit()])
        answer = int(answer)
    except (ValueError, IndexError):
        # no parsable integer found; fall back to 0
        answer = 0
    return answer
# Metric Function
def gsm8k_metric(gold, pred, trace=None) -> bool:
    return int(parse_integer_answer(str(gold.answer))) == int(parse_integer_answer(str(pred.answer)))
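The metric can be tried out without a language model. In the sketch below, types.SimpleNamespace objects stand in for the gold example and the prediction (an assumption made only to keep the snippet self-contained), and the parsing logic is restated in slightly condensed form.

```python
from types import SimpleNamespace

def parse_integer_answer(answer, only_first_line=True):
    try:
        if only_first_line:
            answer = answer.strip().split('\n')[0]
        # keep the last whitespace-separated token that contains a digit
        tokens = [t for t in answer.split() if any(c.isdigit() for c in t)]
        answer = tokens[-1].split('.')[0]
        answer = int(''.join(c for c in answer if c.isdigit()))
    except (ValueError, IndexError):
        answer = 0
    return answer

def gsm8k_metric(gold, pred, trace=None):
    return parse_integer_answer(str(gold.answer)) == parse_integer_answer(str(pred.answer))

# Stand-ins for a dspy.Example and a prediction.
gold = SimpleNamespace(answer="12")
good = SimpleNamespace(answer="Sarah now has 12 apples.")
bad = SimpleNamespace(answer="She has 13 apples.")

print(gsm8k_metric(gold, good))  # True
print(gsm8k_metric(gold, bad))   # False
```

Note that an answer with no digits parses to 0, so two unparsable answers compare as equal; a stricter metric may want to treat that case as a failure.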
LLM as Judge
class FactJudge(dspy.Signature):
    """Judge if the answer is factually correct based on the context."""
    context = dspy.InputField(desc="Context for the prediction")
    question = dspy.InputField(desc="Question to be answered")
    answer = dspy.InputField(desc="Answer for the question")
    factually_correct = dspy.OutputField(desc="Is the answer factually correct based on the context?", prefix="Factual[Yes/No]:")
judge = dspy.ChainOfThought(FactJudge)
def factuality_metric(example, pred, trace=None):
    factual = judge(context=example.context, question=example.question, answer=pred.answer)
    # compare the judge's output field, not the prediction object itself
    return int(factual.factually_correct == "Yes")
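Judge models do not always answer with a bare "Yes". A tolerant way to turn the verdict text into a score might look like the sketch below; verdict_to_score is a hypothetical helper, not part of DSPy, and the normalization rules are an assumption.

```python
def verdict_to_score(verdict):
    """Map a free-form judge verdict to 1 (starts with 'yes') or 0 (anything else)."""
    words = str(verdict).strip().lower().split()
    if not words:
        return 0
    # drop punctuation attached to the first word, e.g. "Yes," or "yes."
    first = words[0].strip(".,:;!\"'")
    return int(first == "yes")

print(verdict_to_score("Yes"))
print(verdict_to_score(" yes, the context supports it."))
print(verdict_to_score("No"))
```

Inside a metric, this helper would be applied to the judge's factually_correct field in place of a strict string comparison.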
DSPy Evaluation
from dspy.evaluate import Evaluate
evaluate_program = Evaluate(devset=devset, metric=your_defined_metric, num_threads=NUM_THREADS, display_progress=True, display_table=num_rows_to_display)
evaluate_program(your_dspy_program)
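Conceptually, evaluation runs the program on each devset example and aggregates the metric. The sketch below shows only that core idea with hypothetical stand-ins (toy_program, toy_devset, exact_match); the real Evaluate class additionally handles threading, progress display, and result tables, and the scale of the score it reports may differ.

```python
from types import SimpleNamespace

def average_metric(program, devset, metric):
    """Run `program` on every example and return the mean metric value."""
    if not devset:
        return 0.0
    scores = [float(metric(ex, program(question=ex.question))) for ex in devset]
    return sum(scores) / len(scores)

# Hypothetical stand-ins for a DSPy program, a devset, and a metric.
KNOWN = {"2 + 2": "4", "3 * 3": "9", "10 - 7": "3"}

def toy_program(question):
    return SimpleNamespace(answer=KNOWN.get(question, "unknown"))

def exact_match(example, pred, trace=None):
    return example.answer == pred.answer

toy_devset = [
    SimpleNamespace(question="2 + 2", answer="4"),
    SimpleNamespace(question="3 * 3", answer="9"),
    SimpleNamespace(question="10 - 7", answer="3"),
    SimpleNamespace(question="8 / 2", answer="4"),
]
score = average_metric(toy_program, toy_devset, exact_match)
print(score)  # 0.75
```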
DSPy Optimizers
LabeledFewShot
from dspy.teleprompt import LabeledFewShot
labeled_fewshot_optimizer = LabeledFewShot(k=8)
your_dspy_program_compiled = labeled_fewshot_optimizer.compile(student = your_dspy_program, trainset=trainset)
BootstrapFewShot
from dspy.teleprompt import BootstrapFewShot
fewshot_optimizer = BootstrapFewShot(metric=your_defined_metric, max_bootstrapped_demos=4, max_labeled_demos=16, max_rounds=1, max_errors=5)
your_dspy_program_compiled = fewshot_optimizer.compile(student = your_dspy_program, trainset=trainset)
Using another LM for compilation, specified in teacher_settings
from dspy.teleprompt import BootstrapFewShot
fewshot_optimizer = BootstrapFewShot(metric=your_defined_metric, max_bootstrapped_demos=4, max_labeled_demos=16, max_rounds=1, max_errors=5, teacher_settings=dict(lm=gpt4))
your_dspy_program_compiled = fewshot_optimizer.compile(student = your_dspy_program, trainset=trainset)
Compiling a compiled program - bootstrapping a bootstrapped program
your_dspy_program_compiledx2 = fewshot_optimizer.compile(
    your_dspy_program,
    teacher=your_dspy_program_compiled,
    trainset=trainset,
)
Saving/loading a compiled program
save_path = './v1.json'
your_dspy_program_compiledx2.save(save_path)
loaded_program = YourProgramClass()
loaded_program.load(path=save_path)
BootstrapFewShotWithRandomSearch
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
fewshot_optimizer = BootstrapFewShotWithRandomSearch(metric=your_defined_metric, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)
your_dspy_program_compiled = fewshot_optimizer.compile(student = your_dspy_program, trainset=trainset, valset=devset)
Other custom configurations are similar to customizing the BootstrapFewShot optimizer.
Ensemble
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
from dspy.teleprompt.ensemble import Ensemble
fewshot_optimizer = BootstrapFewShotWithRandomSearch(metric=your_defined_metric, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)
your_dspy_program_compiled = fewshot_optimizer.compile(student = your_dspy_program, trainset=trainset, valset=devset)
ensemble_optimizer = Ensemble(reduce_fn=dspy.majority)
programs = [x[-1] for x in your_dspy_program_compiled.candidate_programs]
your_dspy_program_compiled_ensemble = ensemble_optimizer.compile(programs[:3])
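The reduce_fn decides how the outputs of the ensembled programs are combined. As an illustration of the majority-vote idea only, here is a plain-Python sketch over answer strings; majority_answer is a hypothetical helper, and dspy.majority's own normalization and tie-breaking may differ.

```python
from collections import Counter

def majority_answer(answers):
    """Return the most frequent answer after light normalization, or None if empty."""
    normalized = [a.strip().lower() for a in answers if a and a.strip()]
    if not normalized:
        return None
    return Counter(normalized).most_common(1)[0][0]

# Answers as three ensembled programs might produce them.
print(majority_answer(["Paris", "paris ", "Lyon"]))
```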
BootstrapFinetune
from dspy.teleprompt import BootstrapFewShotWithRandomSearch, BootstrapFinetune
# Compile program on current dspy.settings.lm
fewshot_optimizer = BootstrapFewShotWithRandomSearch(metric=your_defined_metric, max_bootstrapped_demos=2, num_threads=NUM_THREADS)
your_dspy_program_compiled = fewshot_optimizer.compile(your_dspy_program, trainset=trainset[:some_num], valset=trainset[some_num:])
#Configure model to finetune
config = dict(target=model_to_finetune, epochs=2, bf16=True, bsize=6, accumsteps=2, lr=5e-5)
#Compile program on BootstrapFinetune
finetune_optimizer = BootstrapFinetune(metric=your_defined_metric)
finetune_program = finetune_optimizer.compile(your_dspy_program, trainset=some_new_dataset_for_finetuning_model, **config)
finetune_program = your_dspy_program
#Load program and activate model's parameters in program before evaluation
ckpt_path = "saved_checkpoint_path_from_finetuning"
LM = dspy.HFModel(checkpoint=ckpt_path, model=model_to_finetune)
for p in finetune_program.predictors():
    p.lm = LM
    p.activated = False
COPRO
from dspy.teleprompt import COPRO
eval_kwargs = dict(num_threads=16, display_progress=True, display_table=0)
copro_teleprompter = COPRO(prompt_model=model_to_generate_prompts, metric=your_defined_metric, breadth=num_new_prompts_generated, depth=times_to_generate_prompts, init_temperature=prompt_generation_temperature, verbose=False)
compiled_program_optimized_signature = copro_teleprompter.compile(your_dspy_program, trainset=trainset, eval_kwargs=eval_kwargs)
MIPRO
from dspy.teleprompt import MIPRO
teleprompter = MIPRO(prompt_model=model_to_generate_prompts, task_model=model_that_solves_task, metric=your_defined_metric, num_candidates=num_new_prompts_generated, init_temperature=prompt_generation_temperature)
kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)
compiled_program_optimized_bayesian_signature = teleprompter.compile(your_dspy_program, trainset=trainset, num_trials=100, max_bootstrapped_demos=3, max_labeled_demos=5, eval_kwargs=kwargs)
Signature Optimizer with Types
from dspy.teleprompt.signature_opt_typed import optimize_signature
from dspy.evaluate import Evaluate
from dspy.evaluate.metrics import answer_exact_match
from dspy.functional import TypedChainOfThought
compiled_program = optimize_signature(
    student=TypedChainOfThought("question -> answer"),
    evaluator=Evaluate(devset=devset, metric=answer_exact_match, num_threads=10, display_progress=True),
    n_iterations=50,
).program
KNNFewShot
from dspy.predict import KNN
from dspy.teleprompt import KNNFewShot
knn_optimizer = KNNFewShot(KNN, k=3, trainset=trainset)
your_dspy_program_compiled = knn_optimizer.compile(student=your_dspy_program, trainset=trainset, valset=devset)
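The idea behind a nearest-neighbor few-shot optimizer is to pick, for each input, the k training examples most similar to it and use them as demonstrations. The sketch below illustrates that selection step with word-overlap (Jaccard) similarity as a simple stand-in; this similarity measure, the dictionary-based examples, and the nearest_demos helper are assumptions for illustration, not how KNNFewShot itself computes similarity.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def nearest_demos(query, trainset, k=3):
    """Return the k training examples whose question is most similar to `query`."""
    ranked = sorted(trainset, key=lambda ex: jaccard(query, ex["question"]), reverse=True)
    return ranked[:k]

# Hypothetical training examples.
toy_trainset = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "How many legs does a spider have?", "answer": "8"},
    {"question": "What is the capital of Spain?", "answer": "Madrid"},
    {"question": "Who wrote Hamlet?", "answer": "Shakespeare"},
]
demos = nearest_demos("What is the capital of Italy?", toy_trainset, k=2)
print([d["question"] for d in demos])
```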
BootstrapFewShotWithOptuna
from dspy.teleprompt import BootstrapFewShotWithOptuna
fewshot_optuna_optimizer = BootstrapFewShotWithOptuna(metric=your_defined_metric, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)
your_dspy_program_compiled = fewshot_optuna_optimizer.compile(student=your_dspy_program, trainset=trainset, valset=devset)
Other custom configurations are similar to customizing the dspy.BootstrapFewShot optimizer.
DSPy Assertions
Including dspy.Assert and dspy.Suggest statements
dspy.Assert(your_validation_fn(model_outputs), "your feedback message", target_module="YourDSPyModuleSignature")
dspy.Suggest(your_validation_fn(model_outputs), "your feedback message", target_module="YourDSPyModuleSignature")
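The first argument to both statements is simply a boolean produced by your own validation function. A minimal sketch of such a function is shown below; is_short_answer and its max_words parameter are hypothetical names chosen for illustration.

```python
def is_short_answer(text, max_words=5):
    """Return True when `text` is non-empty and has at most `max_words` words."""
    return 0 < len(str(text).split()) <= max_words

print(is_short_answer("blue"))
print(is_short_answer("The sky usually appears blue during the day"))

# Inside a module's forward(), the result would be passed along the lines of:
#   dspy.Suggest(is_short_answer(pred.answer), "Answer in at most 5 words.")
```

Keeping the check a pure function of the model output makes it easy to unit-test separately from the program that uses it.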
Activating a DSPy Program with Assertions
Note: To use Assertions properly, you must activate a DSPy program that includes dspy.Assert or dspy.Suggest statements, using either of the methods below.
#1. Using `assert_transform_module`:
from dspy.primitives.assertions import assert_transform_module, backtrack_handler
program_with_assertions = assert_transform_module(ProgramWithAssertions(), backtrack_handler)
#2. Using `activate_assertions()`
program_with_assertions = ProgramWithAssertions().activate_assertions()
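Activation matters because a failed check is meant to trigger a retry with feedback rather than just an error. The following is a purely conceptual sketch of a retry-with-feedback loop in plain Python; run_with_retries, ConstraintFailed, and the toy generator are hypothetical, and this is not the implementation of backtrack_handler.

```python
class ConstraintFailed(Exception):
    """Raised when the output still fails validation after all retries."""

def run_with_retries(generate, validate, feedback, max_retries=2):
    """Call `generate`, passing `feedback` as a hint after each failed validation."""
    hint = None
    for _ in range(max_retries + 1):
        output = generate(hint)
        if validate(output):
            return output
        hint = feedback  # the next attempt sees the feedback message
    raise ConstraintFailed(feedback)

def is_short(text):
    return len(text.split()) <= 2

def toy_generate(hint):
    # A stand-in for a model call: verbose at first, concise once given feedback.
    return "blue" if hint else "The sky is usually blue"

answer = run_with_retries(toy_generate, is_short, "Use at most 2 words.")
print(answer)  # blue
```

In this sketch a hard failure raises after the retries are exhausted, which mirrors the intent of an assertion; a softer, suggestion-style variant could return the last output instead.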
Compiling DSPy Programs with Assertions
program_with_assertions = assert_transform_module(ProgramWithAssertions(), backtrack_handler)
fewshot_optimizer = BootstrapFewShotWithRandomSearch(metric = your_defined_metric, max_bootstrapped_demos=2, num_candidate_programs=6)
compiled_dspy_program_with_assertions = fewshot_optimizer.compile(student=program_with_assertions, teacher = program_with_assertions, trainset=trainset, valset=devset) #student can also be program_without_assertions