Sunday, March 30, 2025

Automated Prompt Engineering With DSPy And Intel oneAPI

A Simple Guide to Automated Prompt Engineering on Intel GPUs. Prompt engineering is an important method for steering Large Language Models (LLMs) toward producing task-specific responses.

Although it requires less data and is quicker and less expensive than RAG or fine-tuning, prompt engineering has frequently been a manual process. Effective prompts tailored to the task at hand are even more crucial for on-device LLMs, which are often smaller (typically fewer than 14 billion parameters) and cannot generalize as well as larger LLMs.

Declarative Self-improving Python (DSPy)

This post demonstrates how to use the Intel oneAPI Base Toolkit and Declarative Self-improving Python (DSPy), an automated prompt engineering framework, to build a pipeline for a particular task and optimize its prompts on the Intel Core Ultra processors available in Intel AI PCs.

What is Automated Prompt Engineering Optimization?

The technique known as “automated prompt engineering” uses an LLM to generate progressively better prompts. The following are necessary for any automated prompt engineering framework:

  • An LLM whose prompts need to be engineered
  • An input-output dataset for the task at hand
  • A metric that measures the LLM’s performance on the task

The automated prompt engineering framework then manages the prompt changes in order to improve the LLM’s performance on the task.


DSPy and llama.cpp

DSPy is an open source Python framework for programming LLMs so that their prompts and weights can be optimized. The idea is to use code, in the form of signatures, modules, and optimizers, to build pipelines that can then be optimized automatically. Compared with raw text prompts, DSPy adds structure and modularity to LLM prompting, making modifications easier while maintaining robustness.
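
As a rough illustration of that workflow, here is a minimal sketch using DSPy’s string-signature shorthand from recent DSPy releases; the model name is only a placeholder and is not part of the code sample described later:

import dspy

# Placeholder model name; any LM supported by dspy.LM can be configured here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# "question -> answer" is a string signature; ChainOfThought adds a reasoning step before the answer.
qa = dspy.ChainOfThought("question -> answer")
print(qa(question="Which planet is known as the Red Planet?").answer)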

Llama.cpp is an LLM inference engine that speeds up LLM inference on edge and local devices, combining state-of-the-art inference techniques with native hardware acceleration. Because it supports a SYCL backend, llama.cpp can run on Intel GPUs, from integrated and discrete graphics to data center GPUs.

To run llama.cpp with SYCL support from Python, install llama-cpp-python with the GGML_SYCL=ON CMake option passed through CMAKE_ARGS, as illustrated below.

  • Linux 

CMAKE_ARGS="-DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=ON" pip install llama-cpp-python

  • Windows 

set CMAKE_GENERATOR=Ninja  
set CMAKE_C_COMPILER=cl  
set CMAKE_CXX_COMPILER=icx  
set CXX=icx  
set CC=cl  
set CMAKE_ARGS="-DGGML_SYCL=ON -DCMAKE_CXX_COMPILER=icx -DCMAKE_C_COMPILER=cl -DGGML_SYCL_F16=ON"
pip install llama-cpp-python
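
Once the build finishes, a quick way to confirm that llama-cpp-python offloads work to the Intel GPU is to load any local GGUF model with all layers offloaded and watch the startup log for the SYCL device; the model path below is only a placeholder:

from llama_cpp import Llama

llm = Llama(
    model_path="models/model-q4_k_m.gguf",  # placeholder path to a local GGUF file
    n_gpu_layers=-1,  # offload all layers; with the SYCL backend these run on the Intel GPU
    verbose=True,     # the startup log lists the SYCL/oneAPI device being used
)
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])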

Intel oneAPI Base Toolkit

The Intel oneAPI Base Toolkit comprises a variety of tools (for profiling, design assistance, and debugging) as well as domain-specific libraries for creating high-performance, data-centric applications across several architectures, including Intel CPUs, GPUs, and FPGAs. It also makes it simple to migrate CUDA code to open-standard, multiarchitecture C++ with SYCL.

AI PCs

AI PCs, the newest generation of personal computers, combine a central processing unit (CPU), a graphics processing unit (GPU), and a neural processing unit (NPU) to enable power-efficient AI acceleration across a variety of AI workloads. AI PCs with Intel Core Ultra processors strike a balance between power and performance for fast, efficient AI experiences. The NPU, specialized hardware built for AI workloads, lets the AI PC carry out a range of AI tasks efficiently while offering improved privacy and security.


Code Sample

This code sample is available in the AI PC Notebooks GitHub repository. After the dataset is loaded, the DSPy framework is set up to optimize prompts for the LLM pipeline.

Before executing the code sample, ensure that the Intel oneAPI Base Toolkit is installed. The code sample implements the following stages.

  • Load the riddle dataset: The ARC dataset is used here; it contains grade-level science questions with multiple-choice answers, and the LLM’s task is to predict the correct choice for each question. Frequently, a ready-made dataset is not available for the task at hand, and you would have to write your own examples. DSPy can optimize the prompts for the task after working with only a few samples.

from datasets import load_dataset

dataset = load_dataset("INK-USC/riddle_sense", split="validation")

  • Create the question signature: DSPy defines the input and output for the LLM using signatures, represented here as a Python class. The class declares the LLM’s input and output: the riddle is the input, and the answer is the output. A Python Literal type constrains the output to the valid multiple-choice letters, so the LLM must return one of them. During optimization, DSPy builds prompts around this signature and uses it to prompt the LLM.
import dspy
from typing import Literal


class Question(dspy.Signature):
    """Answer science questions by selecting the correct answer from a list of choices. Respond with the letter of the correct answer."""  # noqa: E501

    riddle = dspy.InputField()
    answer: Literal["A", "B", "C", "D"] = dspy.OutputField()
  • Process the dataset for DSPy: Next, the questions and answers must be transformed into a format DSPy understands: a list of dspy.Example items, each containing the science question and its correct answer, as sketched below.
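
The field names (question, choices, answerKey) in this sketch are assumptions about the dataset schema and should be adjusted to match the loaded dataset:

def to_example(row):
    # Assumed schema: "question" text, "choices" with parallel "label"/"text" lists, "answerKey" letter.
    choices = " ".join(
        f"{label}. {text}" for label, text in zip(row["choices"]["label"], row["choices"]["text"])
    )
    return dspy.Example(
        riddle=f'{row["question"]}\n{choices}',
        answer=row["answerKey"],
    ).with_inputs("riddle")  # mark the riddle as the model input; the answer is the label

trainset = [to_example(row) for row in dataset]
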
  • Load the LLM with llama.cpp and configure DSPy to use it: Once an LLM has been chosen, it is loaded with llama-cpp-python, a Python wrapper for llama.cpp. The from_pretrained function downloads the model and tokenizer from Hugging Face and loads them onto the machine. This LLM is then prompted with the riddles and their solutions.
  • DSPy provides a LlamaCpp wrapper that accepts the llm object, so DSPy prompts the questions and answers through llama-cpp-python. The code sample builds llama-cpp-python with the SYCL backend using the Intel oneAPI DPC++/C++ Compiler, which enables the LLM to run on Intel GPUs.
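
A sketch of these two steps follows; the Hugging Face repository and file pattern are placeholders, and the LlamaCpp wrapper’s exact name and arguments may vary across DSPy versions:

from llama_cpp import Llama

# Download a GGUF model from the Hugging Face Hub and offload it to the Intel GPU (SYCL backend).
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2-1.5B-Instruct-GGUF",  # placeholder repository
    filename="*q4_k_m.gguf",                  # placeholder quantization file pattern
    n_gpu_layers=-1,
)

# Hand the llm object to DSPy so every prompt in the pipeline goes through llama-cpp-python.
llamalm = dspy.LlamaCpp(model="llama", llama_model=llm, model_type="chat", temperature=0.4)
dspy.settings.configure(lm=llamalm)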

metric = dspy.evaluate.metrics.answer_exact_match

  • Establish a metric to assess the LLM’s task performance: The metric used is answer_exact_match (shown above), which returns True if the LLM’s answer exactly matches the correct answer and False otherwise. This metric is used to assess how well the LLM performs on the validation and test sets.
class QuestionAnsweringAI(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use the Question signature defined above with chain-of-thought prompting
        self.signature = Question
        self.respond = dspy.ChainOfThought(self.signature)

    def forward(self, riddle):
        # Prompt the LLM with the riddle and return its prediction (reasoning plus answer)
        return self.respond(riddle=riddle)
  • Establish the LLM pipeline: With the dataset in hand, a module must be developed that captures the input and the prompting strategy the LLM should use. The module above, built with DSPy’s Module class, represents the LLM’s input and output and forms the pipeline that DSPy will optimize.
  • Configure LLM evaluation: After defining the LLM pipeline, inputs, and outputs, a plan is needed to assess the LLM’s performance with new prompts. DSPy’s Evaluate utility accepts a dataset and a metric and runs the evaluation.
  • Set up and run the DSPy optimizer: DSPy provides a range of optimizers for identifying the most effective prompts. MIPROv2, an optimizer that performs automated prompt engineering, is used here to find more effective LLM prompts. MIPROv2 also has hyperparameters that control how long it spends searching for prompts; the light setting is used here. A combined sketch of the evaluation and optimization steps follows this list.
  • Compare accuracy before and after optimization: Lastly, the LLM’s accuracy before and after prompt engineering is compared. On the test set, the unoptimized LLM achieved only 35% accuracy, while the optimized LLM achieved 78%.
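
A minimal sketch of the evaluation and optimization loop is shown below, assuming trainset and testset are lists of dspy.Example items prepared as described earlier:

from dspy.teleprompt import MIPROv2

program = QuestionAnsweringAI()

# Score the unoptimized pipeline on the held-out set.
evaluate = dspy.Evaluate(devset=testset, metric=metric, display_progress=True)
baseline_score = evaluate(program)

# MIPROv2 proposes and scores candidate instructions and few-shot demos; "light" keeps the search cheap.
optimizer = MIPROv2(metric=metric, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)

optimized_score = evaluate(optimized_program)
print(baseline_score, optimized_score)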

What Comes Next

Hopefully this post and code sample introduce developers to LLM evaluation and prompt optimization, in addition to showing how to run LLMs efficiently on Intel GPUs with the Intel oneAPI Base Toolkit. If you require more customization for your LLM than automated prompt engineering can provide, consider exploring RAG and fine-tuning tools.

You are also encouraged to review and integrate Intel’s other AI/ML framework optimizations and tools into your AI workflow to help you plan, build, deploy, and scale your AI solutions. Additionally, you can learn about the unified, open, standards-based oneAPI programming model that serves as the cornerstone of Intel’s AI Software Portfolio.

Drakshi
Since June 2023, Drakshi has been writing articles on Artificial Intelligence for Govindhtech. She is a postgraduate in business administration and an enthusiast of Artificial Intelligence.