Intel created Polite Guard, an open-source natural language processing (NLP) model for text classification. Fine-tuned from BERT, it classifies text into four categories: polite, somewhat polite, neutral, and impolite. Explore the source code on GitHub, and the model and supporting dataset on Hugging Face.
Advantages
- Scalable Model Development Pipeline: A reusable pipeline makes it easier for developers to generate their own synthetic data and fine-tune their models.
- Enhanced Robustness: Provides a defense mechanism against adversarial attacks, improving the resilience of systems.
- Benchmarking and Evaluation: Enables developers to evaluate and compare how well their models classify politeness.
- Enhanced Customer Experience: Ensures courteous and respectful interactions across multiple channels, improving customer satisfaction and loyalty.
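As an illustration of intended usage, the model can be loaded through the Hugging Face transformers library. This is a minimal sketch, not code from the project: the model id "Intel/polite-guard", the exact label strings, and the `is_courteous` helper are assumptions that should be verified against the model card.

```python
def load_classifier():
    """Load a Polite Guard text-classification pipeline.

    The import is done lazily so the heavy transformers dependency
    (and a network connection to download weights) is only needed
    when the classifier is actually created.
    """
    from transformers import pipeline  # requires `pip install transformers`
    # Assumed model id; confirm on the Hugging Face model card.
    return pipeline("text-classification", model="Intel/polite-guard")

# The four categories the model distinguishes (assumed label strings).
LABELS = ("polite", "somewhat polite", "neutral", "impolite")

def is_courteous(label: str) -> bool:
    """Collapse the fine-grained label into a binary courteous flag."""
    return label in ("polite", "somewhat polite")
```

Once loaded, the classifier behaves like any transformers text-classification pipeline: calling it on a string returns a list of `{"label": ..., "score": ...}` dictionaries.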
Synthetic Data Generation and Fine-tuning Process
To generate labeled samples of customer service interactions across a variety of industries, including finance, travel, food, retail, sports, culture, and professional development, Intel designed a synthetic data generator in Python and ran it on Intel Xeon processors. This dataset was then used to fine-tune the BERT base model.
To keep the generated data balanced, labels and categories were chosen at random, and a language model was instructed to produce synthetic samples for the chosen combination. To ensure data diversity, a range of prompts and three large language models were used during generation: Llama 3.1-8B-Instruct, Mixtral 8x7B-Instruct-v0.1, and Gemma 2-9B-It.
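The random selection of labels and categories can be sketched as follows. This is a simplified illustration, not Intel's actual generator (which lives in the GitHub repository); the category list, prompt template, and function name are hypothetical.

```python
import random

# Assumed label set and an illustrative subset of the industry categories.
LABELS = ["polite", "somewhat polite", "neutral", "impolite"]
CATEGORIES = ["finance", "travel", "food", "retail", "sports",
              "culture", "professional development"]

def make_prompt(rng: random.Random) -> dict:
    """Pick a (category, label) pair at random and build a generation prompt.

    Uniform random selection keeps the generated dataset balanced
    across labels and categories.
    """
    category = rng.choice(CATEGORIES)
    label = rng.choice(LABELS)
    prompt = (
        f"Write one customer service utterance in the {category} domain "
        f"whose tone is {label}. Return only the utterance."
    )
    return {"category": category, "label": label, "prompt": prompt}

sample = make_prompt(random.Random(0))
```

In the real pipeline, each prompt would be sent to one of the three LLMs and the returned utterance stored alongside its label.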
Next, Optuna was used to tune the learning-rate and weight-decay hyperparameters with the Tree-structured Parzen Estimator (TPE) algorithm, maximizing the validation F1-score, and underperforming hyperparameter trials were stopped early with Optuna’s pruning callback. See the Hugging Face model card for details on the hyperparameter search space and the best-performing hyperparameters.
Usage and Access
- Datasets: The Polite Guard dataset on Hugging Face consists of three parts:
  - 50,000 labeled samples generated with few-shot prompting.
  - 50,000 labeled samples generated with Chain-of-Thought (CoT) prompting.
  - 200 annotated samples from business trainings, with personal identifiers masked.
The synthetic data is split into training (80%), validation (10%), and test (10%) sets, each balanced by label. Although trained exclusively on synthetic data, Polite Guard achieved 92.4% accuracy and F1-score on a test split of synthetic and real annotated data.
- Source Code: Polite Guard’s GitHub repository contains the source code for the synthetic data generator and the fine-tuner, which uses AI accelerators such as Intel Gaudi.
- Execution: Developers are advised to clone the repository and run the code on the Intel Tiber AI Cloud.
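A label-balanced 80/10/10 split like the one described for the dataset can be sketched in plain Python. The function below is an illustrative stand-in, not the project's actual splitting code.

```python
import random
from collections import defaultdict

def stratified_split(samples, seed=0, frac=(0.8, 0.1, 0.1)):
    """Split (text, label) pairs into train/val/test sets.

    Samples are grouped by label and each group is divided 80/10/10,
    so every split preserves the overall label balance.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)  # shuffle within each label group
        n_train = int(frac[0] * len(group))
        n_val = int(frac[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test
```

Shuffling within each label group before slicing ensures the assignment is random while the per-label proportions stay fixed.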
What’s Next
By using Polite Guard, you can build NLP applications that are more reliable, courteous, and user-friendly. Contribute to this open-source project to benefit from continuous improvements in generative AI.
Learn about the unified, open, standards-based oneAPI programming model that serves as the foundation of Intel’s AI Software Portfolio, and explore and integrate Intel’s other AI/ML framework optimizations and tools into your AI workflow.
Polite Guard
Intel created Polite Guard, an open-source NLP language model fine-tuned from BERT for text classification tasks. It is designed to classify text into four categories: polite, somewhat polite, neutral, and impolite. The model, together with its accompanying datasets and source code, is available on GitHub and Hugging Face to help the community build increasingly sophisticated and context-aware AI systems.
Applications
Polite Guard’s scalable model development pipeline and methodology make it easier for developers to build and fine-tune their own models. The project’s further contributions include:
- Enhanced Robustness: Polite Guard strengthens systems by providing a defense mechanism against adversarial attacks, helping the model remain functional and reliable even when confronted with potentially harmful inputs.
- Benchmarking and Evaluation: By introducing the first politeness benchmark, the project enables developers to evaluate and compare how well their models classify politeness, setting a baseline for future work in this area.
- Improved Customer Experience: By ensuring courteous and respectful interactions across a variety of platforms, Polite Guard can significantly increase customer satisfaction and loyalty. This is especially valuable for customer service applications, where a positive tone matters.
Description of labels
- Polite: Text is thoughtful, respectful, and demonstrates good manners; it frequently uses polite language and a welcoming tone.
- Somewhat Polite: Text is generally courteous but lacks formality or warmth, communicating with a reasonable degree of civility.
- Neutral: Text is direct and factual, with no overt emotional overtones and no particular attempt at politeness.
- Impolite: Text is rude or disrespectful, often blunt or dismissive, and shows a lack of consideration for the recipient’s feelings.
Model Details
- Training Data: The model was trained on the Polite Guard Dataset using Intel Gaudi AI accelerators. The training data consists of synthetically generated customer service interactions across a variety of industries, including finance, travel, food and drink, retail, sports clubs, culture and education, and professional development.
- Base Model: BERT-base, with 110M parameters and 12 layers.
- Fine-tuning Procedure: The model was fine-tuned on the Polite Guard train dataset with PyTorch Lightning, using the following hyperparameters.
| Hyperparameter | Batch size | Learning rate | Learning rate schedule | Max epochs | Optimizer | Weight decay | Precision |
|---|---|---|---|---|---|---|---|
| Value | 32 | 4.78e-05 | Linear warmup (10% of steps) | 2 | AdamW | 1.01e-06 | bf16-mixed |
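The linear-warmup schedule from the table can be written as a function of the training step. The sketch below assumes warmup over the first 10% of steps followed by linear decay to zero (the common convention in linear schedules); the function name and decay behavior are assumptions to be checked against the repository.

```python
def linear_warmup_lr(step: int, total_steps: int,
                     peak_lr: float = 4.78e-05,
                     warmup_frac: float = 0.10) -> float:
    """Learning rate at a given step under linear warmup then linear decay.

    The rate ramps linearly from 0 to peak_lr over the first 10% of
    steps, then decays linearly back to 0 over the remaining steps.
    """
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, remaining))
```

For example, with 1,000 total steps the rate reaches its 4.78e-05 peak at step 100 and returns to zero at step 1,000.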
To maximize the validation F1-score, hyperparameter tuning was carried out with Bayesian optimization using the Tree-structured Parzen Estimator (TPE) algorithm through Optuna, over 35 trials. The hyperparameter search space was:
- learning rate: [1e-5, 5e-4]
- weight decay: [1e-6, 1e-2]
Model checkpointing preserved the best-performing model states during fine-tuning, and Optuna’s pruning callback stopped underperforming hyperparameter trials early.

Metrics
The model’s key performance metrics on the test dataset, which contains both synthetic and manually annotated data, are:
- Accuracy: 92.4% on the Polite Guard test dataset.
- F1-Score: 92.4% on the Polite Guard test dataset.
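Both metrics follow their standard definitions and can be computed in plain Python, as sketched below. Macro averaging is assumed for the F1-score here; check the model card for the averaging actually used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define as 0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Equal accuracy and F1, as reported above, indicates the model's errors are spread fairly evenly across the four classes rather than concentrated in one.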