Dell Enterprise Hub will launch internationally on May 21, 2024.
Open-source AI initiatives have democratized access to cutting-edge technologies, allowing developers worldwide to collaborate, experiment, and advance AI at an unprecedented rate.
Dell Technologies values an open ecosystem of AI technology partnerships that benefits data scientists and developers. With the Dell AI Factory, customers can build customized AI applications using the industry’s largest AI portfolio and an open ecosystem of technology partners.
A Guide to AI Models
The sheer number of available models and the complexity of building an optimal developer environment make adopting AI technology difficult. Getting a model running well on a given platform requires many manual steps and careful handling of software and hardware optimization dependencies. Hugging Face is the foremost open AI platform, which is why Dell Technologies’ collaboration with it is a game-changer.
Introducing Dell Enterprise Hub on Hugging Face
Dell is thrilled to launch the Dell Enterprise Hub on Hugging Face at Dell Technologies World 2024. This unique portal simplifies on-premises deployment of popular large language models (LLMs) on Dell’s powerful infrastructure for Dell customers.
Custom containers and scripts in the Dell Enterprise Hub make deploying Hugging Face open-source models easy and secure, with hosted datasets and models drawn from reputable sources. Dell is the first infrastructure provider to offer on-premises container and model deployments through the Hugging Face portal, and Hugging Face will promote Dell as its preferred on-premises infrastructure provider to drive enterprise adoption of customized, open-source generative AI datasets, libraries, and models.
Optimizing AI Model Deployment On-prem
Dell Enterprise Hub containers bundle the foundation model, platform-specific optimizations, and all software dependencies. They are optimized so that training, fine-tuning, and inferencing can start without code changes for any use case. This lets organizations run the most popular permissively licensed AI models, such as Meta Llama 3, optimally on Dell platforms, accelerating the time-to-value of on-premises AI solutions to just a few clicks.
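As an illustrative sketch (not the actual contents of a Dell Enterprise Hub container), the standard Hugging Face APIs below show the kind of workflow these containers package. The Llama 3 checkpoint is gated, so Meta’s license must be accepted on Hugging Face first:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Hypothetical deployment sketch using public Hugging Face APIs; Dell
# Enterprise Hub containers bundle this stack plus platform optimizations.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated: accept license first

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("What is an AI factory?", max_new_tokens=64)[0]["generated_text"])
```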
Accelerator Services for Generative AI Prototyping
Dell also announced Accelerator Services for the Dell Enterprise Hub. These expert services cover tool and model selection and use-case alignment, improving the developer experience on the Hugging Face portal. Dell’s goal is to enable rapid generative AI prototyping so developers and data scientists can easily adopt the newest AI technology.
The partnership between Dell Technologies and Hugging Face advances open, accessible AI. This shift to open-source AI stands to reshape an industry long dominated by proprietary models. By streamlining the deployment of generative AI models, the Dell Enterprise Hub helps developers and data scientists innovate, collaborate, and advance AI. Whether you want to improve an existing solution or explore something new, the Dell Enterprise Hub is your gateway to the future of AI.
Model Selection Made Easy with Dell Enterprise Hub
In reality, no single model rules them all. Even if one did, using the same model for every application would be inefficient and ineffective: a point-of-sale chatbot and a domain-specific knowledge bot call for different models. Dell Enterprise Hub supports many customers and applications with common high-performance model architectures, and as model designs, capabilities, and application demands evolve, Dell will add more models to meet customer needs.
Let’s examine some key criteria for choosing a model for your application.
Size and capabilities
Model size, the number of trained parameters, varies from model to model. Larger models with more parameters are generally more capable but slower and more expensive to run. Sometimes larger models of the same architecture support specific techniques that smaller ones don’t.
For example, Llama 3 70B employs Grouped Query Attention (GQA) to boost inference scalability and reduce computational cost, but it does not use Sliding Window Attention (SWA), which handles sequences of arbitrary length at a lower inference cost. Mistral’s models support GQA, SWA, and a byte-fallback BPE tokenizer, which ensures characters are never mapped to out-of-vocabulary tokens. Uniquely, Dell Enterprise Hub links each model and task to a Dell platform, so hardware requirements may also constrain model selection.
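To make the GQA idea concrete, here is a minimal PyTorch sketch showing how several query heads share each key/value head, which shrinks the KV cache at inference time. The head counts and shapes are illustrative, not Llama 3’s actual configuration:

```python
import torch

# Minimal sketch of Grouped Query Attention (GQA): multiple query heads
# share each key/value head, so the KV cache is a fraction of full
# multi-head attention. Shapes below are illustrative only.
batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2            # 4 query heads per KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head so it serves `group` query heads.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
attn = torch.softmax(scores, dim=-1) @ v
print(attn.shape)  # torch.Size([1, 8, 16, 64])
```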
Training data
Models are trained on different datasets, and both the quality and quantity of training data vary. Llama 3 was trained on 15T publicly sourced tokens, a dataset seven times larger than Llama 2’s with four times more code. Five percent of Llama 3’s training data consists of high-quality non-English data spanning more than 30 languages. Gemma models learn logic, symbolic representation, and mathematics from 6T tokens of web content, code, and mathematical text. Unlike Gemma and Llama 3, Mistral’s models support English, French, Italian, German, and Spanish.
Training powerful language models that handle a variety of tasks and text formats requires high-quality, diverse data sources. To that end, training data is commonly passed through heuristic filters, Not Safe for Work (NSFW) filters, Child Sexual Abuse Material (CSAM) filters, sensitive-data filtering, semantic deduplication, and text classifiers to improve data quality.
Model evaluation benchmarks
Benchmarks provide useful insight into a model’s likely application performance, but they should be interpreted cautiously. Because these datasets are public, they may overlap with the data used to train the models, and benchmark scores may therefore be overstated.
Benchmark test prompts are often treated as if they were randomly sampled, but this assumption is wrong. When the non-random correlation in model performance across test prompts is accounted for, model rankings on important benchmarks shift, casting doubt on benchmarking research and benchmark-based evaluation.
Massive Multitask Language Understanding (MMLU), the most prominent benchmark, evaluates models with multiple-choice questions and answers, and it is sensitive to minute details of how questions are posed. Simple changes, such as reordering the answer choices or altering how the answer is selected, can shift a model’s ranking by as many as eight positions.
For more on this phenomenon, see the arXiv papers examining the robustness of LLM evaluation to benchmark assumptions and the sensitivity of LLM leaderboards to benchmark targeting.
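A simple way to see this sensitivity for yourself is to score the same question under every ordering of its answer choices. The sketch below is illustrative, and `query_model` is a hypothetical stand-in for whatever inference call your evaluation harness provides:

```python
import itertools

# Probe an MMLU-style question for answer-order sensitivity by generating
# every permutation of its choices. A robust model should pick the correct
# answer regardless of ordering; in practice, results can shift.
question = "Which planet in the solar system is the largest?"
choices = ["Mercury", "Jupiter", "Earth", "Mars"]

def format_prompt(q: str, opts: list[str]) -> str:
    lines = [q] + [f"{letter}. {opt}" for letter, opt in zip("ABCD", opts)]
    return "\n".join(lines + ["Answer:"])

for opts in itertools.permutations(choices):
    prompt = format_prompt(question, list(opts))
    # answer = query_model(prompt)  # hypothetical model call; compare the
    # chosen option across all 24 orderings to measure order sensitivity.
    print(prompt, end="\n\n")
```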
Model architectures
The transformer architecture underpins most new LLMs, but they still differ architecturally. The original transformer uses an encoder-decoder design: the encoder processes the entire input, and the decoder produces the output. A decoder-only approach skips the encoder and generates output from the input one token at a time. Llama 3’s decoder-only design makes it well suited to chatbots, dialogue systems, machine translation, text summarization, and creative text generation, but less suited to context-heavy tasks.
T5 is among the most widely used encoder-decoder architectures, while BERT is a popular encoder-only design. Gemma is a decoder-only LLM. Mistral outperforms comparable models in inference performance thanks to GQA and SWA. Sparse Mixture of Experts (MoE) models such as Mixtral 8x22B cut inference costs: although the model spreads its parameters across eight experts, only a fraction of them (roughly 39B of 141B) is active for any given token. However, MoE models are harder to fine-tune and train than non-MoE models, and the techniques are evolving rapidly.
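The difference between the two families is easy to see with two small public checkpoints, chosen here purely for illustration; they are not Dell Enterprise Hub models:

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Encoder-decoder: T5's encoder reads the whole input, then its decoder
# generates the output sequence.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = t5_tok("translate English to German: Hello, world", return_tensors="pt")
out = t5.generate(**inputs, max_new_tokens=20)
print(t5_tok.decode(out[0], skip_special_tokens=True))

# Decoder-only: GPT-2 simply continues the prompt one token at a time.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = gpt_tok("Open-source AI models are", return_tensors="pt")
out = gpt.generate(**inputs, max_new_tokens=20)
print(gpt_tok.decode(out[0], skip_special_tokens=True))
```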
Context windows
LLMs are stateless and cannot distinguish one question from the next. The application provides short-term memory by feeding prior inputs and outputs back to the LLM, supplying context and creating the appearance of a continuous conversation. A bigger context window lets the model weigh more context and may yield a more accurate response.
For Llama 2 7B, the model can consider up to 4,096 tokens of text when generating a response; for Gemma 7B, up to 8,192. RAG-based AI solutions need a larger context window to retrieve data and feed it to the LLM effectively. Mixtral 8x22B’s context window is 64K tokens, while Llama 3 8B and 70B have 8K context windows, though future releases are expected to expand that.
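As a minimal sketch of that application-side short-term memory (the token budget and the four-characters-per-token estimate are illustrative assumptions, not a real tokenizer):

```python
MAX_CONTEXT_TOKENS = 8192  # e.g., Llama 3's 8K context window

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); a real application would
    # use the model's own tokenizer for an exact count.
    return max(1, len(text) // 4)

history: list[dict] = []

def build_messages(user_msg: str) -> list[dict]:
    """Replay prior turns to the stateless LLM, trimming the oldest ones
    once the conversation no longer fits in the context window."""
    history.append({"role": "user", "content": user_msg})
    while sum(estimate_tokens(m["content"]) for m in history) > MAX_CONTEXT_TOKENS:
        history.pop(0)
    return history
```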
Vocabulary and head size
The LLM’s vocabulary size, the number of distinct words or tokens it can recognize and work with, is one of the most crucial yet often underestimated factors. A larger vocabulary gives an LLM a more nuanced understanding of language, but it also increases training cost.
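You can inspect a tokenizer’s vocabulary size directly; the two public checkpoints below are arbitrary examples:

```python
from transformers import AutoTokenizer

# Compare vocabulary sizes across tokenizers; larger vocabularies split text
# into fewer, more meaningful pieces at the cost of bigger embedding tables.
for name in ["gpt2", "bert-base-uncased"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {tok.vocab_size} tokens")
```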
Another intriguing criterion is head size, which is tied to self-attention. The self-attention layer helps the model find relationships within the input sequence, and head size determines the dimensionality of that layer’s output vectors. The dimensions of these vectors capture distinct aspects of the input, so head size affects the model’s ability to represent relationships in the sequence. More heads deepen understanding but also increase computational cost.
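The relationship between hidden size, head count, and head size is simple arithmetic, as this illustrative PyTorch sketch shows (the numbers are not any particular model’s configuration):

```python
import torch

hidden_size, n_heads = 4096, 32
head_dim = hidden_size // n_heads  # 128 dimensions per attention head

x = torch.randn(1, 10, hidden_size)  # (batch, sequence, hidden)
# Split the hidden dimension across heads: each head attends to the sequence
# in its own head_dim-dimensional subspace, capturing a distinct aspect of
# the input.
heads = x.view(1, 10, n_heads, head_dim).transpose(1, 2)
print(heads.shape)  # torch.Size([1, 32, 10, 128])
```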
Conclusion
In addition to the model itself, AI solutions require many other components, and an LLM-powered AI solution may use several models to solve a business problem elegantly. Dell Technologies weighed all of these parameters when curating this selection of models for Dell Enterprise Hub with Hugging Face. New and more powerful open-source LLMs are expected every month, and Dell Technologies and Hugging Face will continue bringing optimized model support to Dell Enterprise Hub.