Sunday, March 30, 2025

Automated Prompt Engineering With DSPy And Intel oneAPI

A Simple Guide to Automated Prompt Engineering on Intel GPUs. Prompt engineering is an important method for steering Large Language Models (LLMs) toward producing task-specific responses.

Although it requires less data and is quicker and less expensive than RAG or fine-tuning, prompt engineering has frequently been a manual process. Effective prompts tailored to the task at hand are even more crucial for on-device LLMs, which are often smaller (typically fewer than 14 billion parameters) and cannot generalize as well as larger LLMs.

Declarative Self-improving Python (DSPy)

This post demonstrates how to use the Intel oneAPI Base Toolkit and Declarative Self-improving Python (DSPy), an automated prompt engineering framework, to build a pipeline for a particular task and optimize its prompts on the Intel Core Ultra processors available in Intel AI PCs.

What is Automated Prompt Engineering Optimization?

The technique known as “automated prompt engineering” uses an LLM to generate progressively better prompts. The following are necessary for any automated prompt engineering framework:

  • An LLM whose prompts need to be engineered
  • An input-output dataset for the task at hand
  • A metric that measures the LLM’s performance on the task

The automated prompt engineering framework then manages the prompt changes in order to improve the LLM’s performance on the task.


DSPy and llama.cpp

DSPy is an open source Python framework for programming LLMs so that their prompts and weights can be optimized. The idea is to use code, in the form of signatures, modules, and optimizers, to build pipelines that can then be optimized automatically. Compared with raw text prompts, DSPy adds structure and modularity to LLM prompting, making modifications easier while maintaining robustness.
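
As a rough illustration of that workflow, here is a minimal sketch using DSPy’s string-signature shorthand from recent DSPy releases; the model name is only a placeholder and is not part of the code sample described later:

import dspy

# Placeholder model name; any LM supported by dspy.LM can be configured here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# "question -> answer" is a string signature; ChainOfThought adds a reasoning step before the answer.
qa = dspy.ChainOfThought("question -> answer")
print(qa(question="Which planet is known as the Red Planet?").answer)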

Llama.cpp is an LLM inference engine that speeds up LLM inference on edge and local devices, combining state-of-the-art inference techniques with native hardware acceleration. Because it supports a SYCL backend, llama.cpp can run on Intel GPUs, from integrated and discrete graphics to data center GPUs.

To run llama.cpp with SYCL support from Python, install llama-cpp-python with the GGML_SYCL=ON CMake option passed through CMAKE_ARGS, as illustrated below.

  • Linux 

CMAKE_ARGS="-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON" pip install llama-cpp-python

  • Windows 

set CMAKE_GENERATOR=Ninja  
set CMAKE_C_COMPILER=cl  
set CMAKE_CXX_COMPILER=icx  
set CXX=icx  
set CC=cl  
set CMAKE_ARGS="-DGGML_SYCL=ON -DCMAKE_CXX_COMPILER=icx -DCMAKE_C_COMPILER=cl -DGGML_SYCL_F16=ON"
pip install llama-cpp-python
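
Once the build finishes, a quick way to confirm that llama-cpp-python offloads work to the Intel GPU is to load any local GGUF model with all layers offloaded and watch the startup log for the SYCL device; the model path below is only a placeholder:

from llama_cpp import Llama

llm = Llama(
    model_path="models/model-q4_k_m.gguf",  # placeholder path to a local GGUF file
    n_gpu_layers=-1,  # offload all layers; with the SYCL backend these run on the Intel GPU
    verbose=True,     # the startup log lists the SYCL/oneAPI device being used
)
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])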

Intel oneAPI Base Toolkit

The Intel oneAPI Base Toolkit comprises a variety of tools (for profiling, design assistance, and debugging) as well as domain-specific libraries for creating high-performance, data-centric applications across several architectures, including Intel CPUs, GPUs, and FPGAs. It also makes it simple to migrate CUDA code to open-standard, multiarchitecture C++ with SYCL.

AI PCs

AI PCs, the newest generation of personal computers, combine a central processing unit (CPU), a graphics processing unit (GPU), and a neural processing unit (NPU) to enable power-efficient AI acceleration across a variety of AI workloads. AI PCs with Intel Core Ultra processors strike a balance between power and performance for fast, efficient AI experiences. The NPU, specialized hardware built for AI workloads, lets the AI PC carry out a range of AI tasks efficiently while offering improved privacy and security.


Code Sample

This code sample is available in the AI PC Notebooks GitHub repository. After the dataset is loaded, the DSPy framework is set up to optimize prompts for the LLM pipeline.

Before executing the code sample, ensure that the Intel oneAPI Base Toolkit is installed. The code sample implements the following stages.

  • Load the riddle dataset: The ARC dataset is used here; it contains grade-level science questions with multiple-choice answers, and the LLM’s task is to predict the correct choice for each question. Frequently, a ready-made dataset is not available for the task at hand, and you would have to write your own examples. DSPy can optimize the prompts for the task after working with only a few samples.

from datasets import load_dataset

dataset = load_dataset("INK-USC/riddle_sense", split="validation")

  • Create the question signature: DSPy defines the input and output for the LLM using signatures, represented here as a Python class. The class declares the LLM’s input and output: the riddle is the input, and the answer is the output. A Python Literal type constrains the output to the valid multiple-choice letters, so the LLM must return one of them. During optimization, DSPy builds prompts around this signature and uses it to prompt the LLM.
import dspy
from typing import Literal


class Question(dspy.Signature):
    """Answer science questions by selecting the correct answer from a list of choices. Respond with the letter of the correct answer."""  # noqa: E501

    riddle = dspy.InputField()
    answer: Literal["A", "B", "C", "D"] = dspy.OutputField()
  • Process the dataset for DSPy: Next, the questions and answers must be transformed into a format DSPy understands: a list of dspy.Example items, each containing the science question and its correct answer, as sketched below.
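
The field names (question, choices, answerKey) in this sketch are assumptions about the dataset schema and should be adjusted to match the loaded dataset:

def to_example(row):
    # Assumed schema: "question" text, "choices" with parallel "label"/"text" lists, "answerKey" letter.
    choices = " ".join(
        f"{label}. {text}" for label, text in zip(row["choices"]["label"], row["choices"]["text"])
    )
    return dspy.Example(
        riddle=f'{row["question"]}\n{choices}',
        answer=row["answerKey"],
    ).with_inputs("riddle")  # mark the riddle as the model input; the answer is the label

trainset = [to_example(row) for row in dataset]
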
  • Load the LLM with llama.cpp and configure DSPy to use it: Once an LLM has been chosen, it is loaded with llama-cpp-python, a Python wrapper for llama.cpp. The from_pretrained function downloads the model and tokenizer from Hugging Face and loads them onto the machine. This LLM is then prompted with the riddles and their solutions.
  • DSPy provides a LlamaCpp wrapper that accepts the llm object, so DSPy prompts the questions and answers through llama-cpp-python. The code sample builds llama-cpp-python with the SYCL backend using the Intel oneAPI DPC++/C++ Compiler, which enables the LLM to run on Intel GPUs.
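
A sketch of these two steps follows; the Hugging Face repository and file pattern are placeholders, and the LlamaCpp wrapper’s exact name and arguments may vary across DSPy versions:

from llama_cpp import Llama

# Download a GGUF model from the Hugging Face Hub and offload it to the Intel GPU (SYCL backend).
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2-1.5B-Instruct-GGUF",  # placeholder repository
    filename="*q4_k_m.gguf",                  # placeholder quantization file pattern
    n_gpu_layers=-1,
)

# Hand the llm object to DSPy so every prompt in the pipeline goes through llama-cpp-python.
llamalm = dspy.LlamaCpp(model="llama", llama_model=llm, model_type="chat", temperature=0.4)
dspy.settings.configure(lm=llamalm)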

metric = dspy.evaluate.metrics.answer_exact_match

  • Establish a metric to assess the LLM’s task performance: The metric used is answer_exact_match (shown above), which returns True if the LLM’s answer exactly matches the correct answer and False otherwise. This metric is used to assess how well the LLM performs on the validation and test sets.
class QuestionAnsweringAI(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use the Question signature defined above with chain-of-thought prompting
        self.signature = Question
        self.respond = dspy.ChainOfThought(self.signature)

    def forward(self, riddle):
        # Prompt the LLM with the riddle and return its prediction (reasoning plus answer)
        return self.respond(riddle=riddle)
  • Establish the LLM pipeline: With the dataset in hand, a module must be developed that captures the input and the prompting strategy the LLM should use. The module above, built with DSPy’s Module class, represents the LLM’s input and output and forms the pipeline that DSPy will optimize.
  • Configure LLM evaluation: After defining the LLM pipeline, inputs, and outputs, a plan is needed to assess the LLM’s performance with new prompts. DSPy’s Evaluate utility accepts a dataset and a metric and runs the evaluation.
  • Set up and run the DSPy optimizer: DSPy provides a range of optimizers for identifying the most effective prompts. MIPROv2, an optimizer that performs automated prompt engineering, is used here to find more effective LLM prompts. MIPROv2 also has hyperparameters that control how long it spends searching for prompts; the light setting is used here. A combined sketch of the evaluation and optimization steps follows this list.
  • Compare accuracy before and after optimization: Lastly, the LLM’s accuracy before and after prompt engineering is compared. On the test set, the unoptimized LLM achieved only 35% accuracy, while the optimized LLM achieved 78%.
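
A minimal sketch of the evaluation and optimization loop is shown below, assuming trainset and testset are lists of dspy.Example items prepared as described earlier:

from dspy.teleprompt import MIPROv2

program = QuestionAnsweringAI()

# Score the unoptimized pipeline on the held-out set.
evaluate = dspy.Evaluate(devset=testset, metric=metric, display_progress=True)
baseline_score = evaluate(program)

# MIPROv2 proposes and scores candidate instructions and few-shot demos; "light" keeps the search cheap.
optimizer = MIPROv2(metric=metric, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)

optimized_score = evaluate(optimized_program)
print(baseline_score, optimized_score)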

What Comes Next

Hopefully this post and code sample introduce developers to LLM evaluation and prompt optimization, in addition to showing how to run LLMs efficiently on Intel GPUs with the Intel oneAPI Base Toolkit. If you require more customization for your LLM than automated prompt engineering can provide, consider exploring RAG and fine-tuning tools.

You are also encouraged to review and integrate Intel’s other AI/ML framework optimizations and tools into your AI workflow to help you plan, build, deploy, and scale your AI solutions. Additionally, you can learn about the unified, open, standards-based oneAPI programming model that serves as the cornerstone of Intel’s AI Software Portfolio.

Drakshi
Since June 2023, Drakshi has been writing articles on Artificial Intelligence for Govindhtech. She is a postgraduate in business administration and an enthusiast of Artificial Intelligence.