
Databricks DBRX Tutorial: A Step-by-Step Guide

Learn how Databricks DBRX, an open-source LLM, can handle complex tasks and generate intelligent results.
May 2024  · 10 min read

Databricks has recently introduced DBRX, its open general-purpose large language model (LLM) built on a mixture-of-experts (MoE) architecture with a fine-grained approach. Instead of using a single neural network for all tasks, the system comprises multiple specialized "expert" networks, each optimized for different types of tasks or data.

This model outperforms established LLMs like GPT-3.5 and LLaMA 2 while also being faster and more cost-effective. In Databricks' tests, DBRX scores 73.7% on language-understanding tasks, higher than LLaMA 2's 69.8%.

In this article, we’ll discuss the capabilities of Databricks DBRX in more detail and show how to get started with it.

What is Databricks DBRX?

DBRX uses a transformer-based, decoder-only architecture trained with next-token prediction, and it employs a fine-grained mixture-of-experts (MoE) design. These ‘experts’ are specialized subnetworks within the model, each contributing domain-specific knowledge and reasoning strengths.

DBRX has a large number of smaller experts (16 experts in total) and chooses a subset of them (4 experts) for any given input.

Simply put, this fine-grained approach yields 65x more possible combinations of experts, which improves model quality compared to other open MoE models like Mixtral and Grok-1, which use fewer experts and activate fewer of them per input.
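
To see where the 65x figure comes from, you can count the possible expert subsets directly. The quick check below assumes, as Databricks describes, that DBRX picks 4 of its 16 experts per token, while coarser open MoE models like Mixtral and Grok-1 pick 2 of 8:

import math

# DBRX routes each token to 4 of its 16 experts.
dbrx_combinations = math.comb(16, 4)      # 1820 possible expert subsets

# Mixtral- and Grok-1-style routing: 2 of 8 experts per token.
coarse_combinations = math.comb(8, 2)     # 28 possible expert subsets

print(dbrx_combinations / coarse_combinations)  # 65.0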

Here are some key details about DBRX:

  1. Parameter size: DBRX has a total of 132 billion parameters, out of which 36 billion parameters are active on any input.
  2. Pretraining data: It was pre-trained on 12 trillion tokens of carefully curated data with a maximum context length of 32,000 tokens; Databricks estimates this data to be at least 2x better token-for-token than the data used to pre-train the MPT family of models.
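
The routing itself is conceptually simple: a small router scores all 16 experts for every token, and only the top 4 are actually evaluated, which is why just 36 billion of the 132 billion parameters are active per input. The snippet below is a minimal, illustrative top-k MoE layer in PyTorch, not DBRX's actual implementation; the dimensions and module names are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Illustrative fine-grained MoE layer: route each token to the top-k of n experts."""
    def __init__(self, d_model=64, n_experts=16, top_k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e           # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])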

How was DBRX trained?

The model was pre-trained on an impressively large dataset, estimated to be twice as effective as previous datasets used by Databricks. A suite of Databricks tools, such as Apache Spark and Databricks notebooks for data processing and Unity Catalog for data governance, was used to train the model.

During its training, curriculum learning was employed, and the data mix was changed to substantially improve model quality. Such strategic alterations in the training data mix optimized the model's ability to handle diverse inputs effectively.

Some of the key technologies used for DBRX’s pretraining include:

  • Rotary Position Encodings (RoPE): A method in transformers that encodes token positions by rotational transformations in high-dimensional space, enhancing sequence understanding.
  • Gated Linear Units (GLU): A neural network layer using a gate mechanism to control information flow, aiding in capturing complex patterns.
  • Grouped Query Attention (GQA): An attention mechanism in which groups of query heads share the same key and value heads, reducing memory use and computational cost with little loss of quality.
  • GPT-4 tokenizer from the tiktoken repository: A specific tokenizer implementation for GPT-4, converting text into model-ready tokens, optimized for GPT-4's processing needs.
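
Of these, the tokenizer is the easiest to inspect directly. The short example below assumes DBRX borrows GPT-4's cl100k_base encoding from the tiktoken package, which is the encoding GPT-4 itself uses:

import tiktoken

# cl100k_base is the byte-pair encoding used by GPT-4; DBRX reuses the
# GPT-4 tokenizer provided in the tiktoken repository.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Databricks was founded in ")
print(tokens)              # list of integer token IDs
print(enc.decode(tokens))  # round-trips back to the original string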

How Does DBRX Compare to Similar Models?

Databricks claims its DBRX model is superior to several top open-source models in terms of efficiency and task performance.

Here’s a detailed comparison of how DBRX stacks up against its competitors:

1) DBRX vs. LLaMA2-70B

  • General knowledge (MMLU): DBRX Instruct scores 9.8% higher than LLaMA2-70B.
  • Commonsense reasoning (HellaSwag dataset): DBRX Instruct continues to lead with a 3.1% higher score.
  • Databricks gauntlet (a benchmark suite that assesses pre-trained LLMs' capabilities across different tasks): DBRX Instruct has a strong lead, scoring 14% higher than LLaMA2-70B.
  • Programming and mathematical reasoning: DBRX Instruct stands out with a 37.9% lead in programming and a 40.2% lead in math reasoning.

2) DBRX Instruct vs. Mixtral Instruct

  • General knowledge: DBRX Instruct edges out Mixtral Instruct with a 2.3% higher score.
  • Commonsense reasoning (HellaSwag dataset): DBRX Instruct performs better with a 1.4% higher score.
  • Databricks gauntlet: DBRX Instruct surpasses Mixtral Instruct by a notable margin of 6.1%.
  • Programming and mathematical reasoning: DBRX Instruct outperforms Mixtral Instruct by 15.3% in programming and 5.8% in mathematical reasoning.

3) DBRX vs Grok-1

  • General knowledge: The two models perform similarly, with DBRX Instruct holding a slight 0.7% edge over Grok-1.
  • Programming and mathematical reasoning: DBRX Instruct outperforms Grok-1 with a 6.9% lead in programming and a 4% lead in math reasoning.

4) DBRX vs Mixtral Base

  • General knowledge: DBRX Instruct slightly leads with a 1.8% higher score than Mixtral Base.
  • Commonsense reasoning (HellaSwag dataset): DBRX scores higher with a 2.5% difference.
  • Databricks gauntlet: DBRX leads with 10% over Mixtral Base.
  • Programming and mathematical reasoning: DBRX Instruct shows a remarkable lead in programming, scoring 29.9% higher than Mixtral Base.

Comparison table

Below, we’ve collated the comparisons into a table, and shown a graph based on some of the results:

Model Comparison          | General Knowledge | Commonsense Reasoning | Databricks Gauntlet | Programming Reasoning | Mathematical Reasoning
DBRX vs LLaMA2-70B        | +9.8%             | +3.1%                 | +14%                | +37.9%                | +40.2%
DBRX vs Mixtral Instruct  | +2.3%             | +1.4%                 | +6.1%               | +15.3%                | +5.8%
DBRX vs Grok-1            | +0.7%             | Not available         | Not available       | +6.9%                 | +4%
DBRX vs Mixtral Base      | +1.8%             | +2.5%                 | +10%                | +29.9%                | Not available

Comparing DBRX’s quality with other open-source LLMs (source)

How to Use DBRX: A Step-By-Step Guide

Before accessing DBRX, ensure your system has at least 320GB of memory. Then follow these steps to access DBRX:

Accessing DBRX

  • Install the transformers library:
pip install "transformers>=4.40.0"
  • Then, request an access token with read permissions from Hugging Face (this is needed to download the model).
  • Once you have access, import and load the model using the following code.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-base", token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-base", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN")

# Directing tensors to "cuda" (GPU) for faster computation as GPUs are better at handling parallel tasks.
input_text = "Databricks was founded in "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# DBRX accepts a context length of up to 32,768 tokens. Here, `max_new_tokens` specifies the maximum number of tokens to generate.
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
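
The snippet above uses the base (completion-only) checkpoint. If you'd rather work with the instruction-tuned variant, databricks/dbrx-instruct can be loaded the same way and prompted through the tokenizer's chat template. The following is a sketch that assumes your access token also covers that checkpoint:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN")

# Wrap the user message in the model's chat template before generating.
messages = [{"role": "user", "content": "Summarize what a mixture-of-experts model is in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")

outputs = model.generate(input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))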

Want to learn how to use Transformers and Hugging Face? Read our tutorial on using Transformers and Hugging Face.

Basic tasks with DBRX

DBRX can help you with basic content creation and prompt-response tasks, like any LLM. Here’s what you can do with DBRX:

  • Text completion: DBRX can generate text-based responses based on your prompts.
  • Language understanding: It can perform tasks like parsing natural language inputs, summarizing long documents, and translating text between languages or coding styles.
  • Query optimization: DBRX can also optimize database queries to minimize execution time and help data engineers focus on other critical tasks without compromising performance.

Coding tasks with DBRX

DBRX can also perform advanced coding-related tasks to help data practitioners or coders:

  • Code generation and explanation: It can generate code for various programming tasks, explain existing functions, and suggest algorithms for specific problems, helping you understand and optimize code.

DBRX responding to simple commands (source)

  • Debugging with DBRX: It provides suggested code corrections to improve developer productivity and accelerate your debugging process.
  • Vulnerability identification: It can also identify issues in code and propose solutions to mitigate them.

How to Fine-Tune DBRX

You can fine-tune DBRX with the open-source LLM Foundry library on GitHub. However, fine-tuning requires training examples to be formatted as dictionaries:

formatted_example = {'prompt': <prompt_text>, 'response': <response_text>}

Prompt: This is the starting question or instruction you give the model.

Response: This is the answer the model is trained to generate.
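
For example, a handful of training examples in this format can be written out as a JSONL file (one JSON object per line), which is the shape the local-dataset option below expects. The file name and example content here are just placeholders:

import json

# Hypothetical examples in the prompt/response format expected by LLM Foundry.
examples = [
    {"prompt": "What is Databricks DBRX?",
     "response": "DBRX is an open, mixture-of-experts LLM released by Databricks."},
    {"prompt": "How many experts does DBRX activate per token?",
     "response": "It routes each token to 4 of its 16 experts."},
]

with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")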

You can use three different data sets to fine-tune any LLM:

  1. A dataset available on the HuggingFace Hub.
  2. A dataset that is locally stored on your device or machine.
  3. A dataset in the StreamingDataset .mds format.

1) Using Hugging Face Hub

If you want to fine-tune using a dataset from the Hugging Face Hub, and the dataset has a pre-defined preprocessing function or already follows the "prompt"/"response" format, you can simply point the dataloader to that dataset.

train_loader:
    name: finetuning
    dataset:
        hf_name: tatsu-lab/alpaca
        split: train
        ...

If no preprocessing function is defined, use preprocessing_fn to specify a custom preprocessing function for the dataloader.

train_loader:
    name: finetuning
    dataset:
        hf_name: mosaicml/doge-facts
        preprocessing_fn: my_data.formatting:dogefacts_prep_fn
        split: train
        ...
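
The preprocessing function referenced in the config is just a Python callable that maps each raw dataset row to the prompt/response dictionary shown earlier. Here is a minimal sketch (the module path my_data/formatting.py and the raw column names are hypothetical):

# my_data/formatting.py, referenced as my_data.formatting:dogefacts_prep_fn in the YAML config

def dogefacts_prep_fn(example: dict) -> dict:
    # Map the raw dataset columns (assumed here to be "instruction" and "output")
    # into the prompt/response format that LLM Foundry expects.
    return {
        "prompt": example["instruction"],
        "response": example["output"],
    }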

2) Using a local dataset

If you already have a fine-tuning dataset on your device, define the local JSONL files in the YAML config yamls/finetune/1b_local_data_sft.yaml.

train_loader:
    name: finetuning
    dataset:
        hf_name: json # assuming data files are json formatted
        hf_kwargs:
            data_dir: /path/to/data/dir/
        preprocessing_fn: my.import.path:my_preprocessing_fn
        split: train
        ...

Skip preprocessing_fn if your local data is already in "prompt"/"response" format.

3) Using StreamingDataset

Convert your Hugging Face dataset to MDS format using the convert_finetuning_dataset.py script. After converting the dataset to a streaming format, simply adjust your YAML configuration like this:

train_loader:
    name: finetuning
    dataset:
        remote: s3://my-bucket/my-copy-doge-facts
        local: /tmp/mds-cache/
        split: train
        ...

If you want to run full-parameter fine-tuning, see the YAML config at dbrx-full-ft.yaml.

Conclusion

Databricks DBRX uses multiple specialized expert networks to improve the model's speed and cost-effectiveness. This fine-grained approach allows it to outperform other open-source LLMs in handling complex tasks.

Want to learn more about large language models and how to fine-tune them? Check out these resources:


Author
Laiba Siddiqui

I'm a content strategist who loves simplifying complex topics. I’ve helped companies like Splunk, Hackernoon, and Tiiny Host create engaging and informative content for their audiences.
