From Zero to Finetuning with Axolotl on ROCm

Gratitude to https://tensorwave.com/ for giving me access to their excellent servers!

Few have tried this and fewer have succeeded. I've been marginally successful after a significant amount of effort, so it deserves a blog post.

Know that you are in for rough waters. And even when you arrive: many optimizations are tailored for NVIDIA GPUs, so even though the hardware may be just as strong spec-wise, in my experience so far it can still take 2-3 times as long to train on equivalent AMD hardware. (Though if you are a super hacker, maybe you can fix that!)

Right now I'm using Axolotl, though I will probably give LlamaFactory a solid try in the near future. There are also LitGPT and TRL, but I rely on Axolotl's dataset features and especially its sample packing. Still, LlamaFactory interests me more and more; it adopts new features really fast (GaLore is the new hotness at the moment). This blog post is about getting Axolotl up and running on AMD, and I may do one about LlamaFactory if there is demand.

I am using Ubuntu 22.04 LTS, and you should too (unless this blog post is really old by the time you read it). Otherwise, you can use this post as a general guide.

Here are all the environment variables I ended up setting in my .bashrc. I'm not exactly sure which ones are needed, so better to set them all just in case.

export GPU_ARCHS="gfx90a" # mi210 - use the right code for your GPU
export ROCM_TARGET="gfx90a"
export HIP_PATH="/opt/rocm-6.0.0"
export ROCM_PATH="/opt/rocm-6.0.0"
export ROCM_HOME="/opt/rocm-6.0.0"
export HIP_PLATFORM=amd
export DS_BUILD_CPU_ADAM=1 
export TORCH_HIP_ARCH_LIST="gfx90a"
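Not sure which gfx code your GPU is? Once ROCm is installed (Part 1 below), rocminfo reports the gfx target for each GPU; this one-liner pulls it out:

rocminfo | grep -o "gfx[0-9a-f]*" | sort -u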

Part 1: Driver, ROCm, HIP

Clean everything out.

There shouldn't be any trace of NVIDIA, CUDA, AMD, HIP, ROCm, or anything like that. This is not necessarily a simple task, and it totally depends on the current state of your system; I had to use about 4 of my daily Claude Opus questions to accomplish it. (sad face) By the way, Anthropic's Claude Opus is the new king of interactive troubleshooting. By far. Bravo. Don't nerf it, pretty please!

Here are some things I had to do, that might help you:

  • sudo apt autoremove rocm-core

  • sudo apt remove amdgpu-dkms

  • sudo dpkg --remove --force-all amdgpu-dkms

  • sudo apt purge amdgpu-dkms

  • sudo apt remove --purge nvidia*

  • sudo apt remove --purge cuda*

  • sudo apt remove --purge rocm-* hip-*

  • sudo apt remove --purge amdgpu-* xserver-xorg-video-amdgpu

  • sudo apt clean

  • sudo reboot

  • sudo dpkg --remove amdgpu-install

  • sudo apt remove --purge amdgpu-* xserver-xorg-video-amdgpu

  • sudo apt autoremove

  • sudo apt clean

  • rm ~/amdgpu-install_*.deb

  • sudo reboot

  • sudo rm /etc/apt/sources.list.d/amdgpu.list

  • sudo rm /etc/apt/sources.list.d/rocm.list

  • sudo rm /etc/apt/sources.list.d/cuda.list

  • sudo apt-key del A4B469963BF863CC

  • sudo apt update

  • sudo apt remove --purge nvidia-* cuda-* rocm-* hip-* amdgpu-*

  • sudo apt autoremove

  • sudo apt clean

  • sudo rm -rf /etc/OpenCL /etc/OpenCL.conf /etc/amd /etc/rocm.d /usr/lib/x86_64-linux-gnu/amdgpu /usr/lib/x86_64-linux-gnu/rocm /opt/rocm-* /opt/amdgpu-pro-* /usr/lib/x86_64-linux-gnu/amdvlk

  • sudo reboot

  • I love Linux (smile with tear)

  • Now finally run sudo apt-get update, then sudo apt-get upgrade and sudo apt-get dist-upgrade, and make sure there are no errors or warnings (a quick leftover-package check follows below). You should be good to begin your journey.
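To double-check that the purge actually took, query dpkg for stragglers; if this prints nothing, you're clean (a quick sanity check, not exhaustive):

dpkg -l | grep -iE "rocm|amdgpu|nvidia|cuda"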

Install AMD drivers, ROCm, HIP

  • wget https://repo.radeon.com/amdgpu-install/23.40.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb

  • (at the time of this writing). But you should double-check https://repo.radeon.com/amdgpu-install/ for the latest version, along with AMD's official install instructions.

  • sudo apt-get install ./amdgpu-install_6.0.60002-1_all.deb

  • sudo apt-get update

  • sudo amdgpu-install -y --accept-eula --opencl=rocr --vulkan=amdvlk --usecase=workstation,rocm,rocmdev,rocmdevtools,lrt,opencl,openclsdk,hip,hiplibsdk,mllib,mlsdk

  • If you get error messages (I did) try to fix them. I had to do this:

    • sudo dpkg --remove --force-all libvdpau1

    • sudo apt clean

    • sudo apt update

    • sudo apt --fix-broken install

    • sudo apt upgrade

    • and then, again, I had to run sudo amdgpu-install -y --accept-eula --opencl=rocr --vulkan=amdvlk --usecase=workstation,rocm,rocmdev,rocmdevtools,lrt,opencl,openclsdk,hip,hiplibsdk,mllib,mlsdk

Check Installation

rocm-smi
rocminfo
/opt/rocm/bin/hipconfig --full
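rocm-smi should list every GPU with temperature, power, and VRAM columns, and rocminfo should show an agent with your gfx name for each GPU. If you get permission errors as a non-root user, check that you can see the device nodes and that your user is in the render and video groups:

ls -l /dev/kfd /dev/dri
sudo usermod -aG render,video $USER # then log out and back in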

I hope that worked for you - if not, I suggest asking Claude Opus about the error messages to help you figure it out. If that doesn't work, reach out to the community.

Part 2: Pytorch, BitsAndBytes, Flash Attention, DeepSpeed, Axolotl

Conda

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash

Exit your shell and enter it again.

conda create -n axolotl python=3.12
conda activate axolotl

Pytorch

I tried the official install command from PyTorch's website, and it didn't work for me.

Here is what did work:

pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/rocm6.0
python -c "import torch; print(torch.version.hip)"

This tests both Torch and Torch's ability to interface with HIP. If it worked, it will print the HIP version; otherwise it will print None.
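A couple of extra checks if you want more confidence. On ROCm builds, PyTorch maps the torch.cuda API onto HIP, so these work as-is:

python -c "import torch; print(torch.cuda.is_available())" # should print True
python -c "import torch; print(torch.cuda.get_device_name(0))" # should print your GPU's name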

BitsAndBytes

BitsAndBytes is by Tim Dettmers, an absolute hero among men. It lets us finetune in 4-bit. It gives us QLoRA. It brings AI to the masses.

There is a fork of BitsAndBytes that supports ROCm. This is provided not by Tim Dettmers, and not by AMD, but by a vigilante superhero, Arlo-Phoenix.

In appreciation, here is a portrait ChatGPT made for Arlo-Phoenix, vigilante superhero. I hope you like it, if you see this, Arlo-Phoenix. <3

git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6
cd bitsandbytes-rocm-5.6
git checkout rocm
ROCM_TARGET=gfx90a make hip # use the ROCM_TARGET for your GPU
pip install .
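The import itself runs the library's backend setup, so a bare import makes a decent smoke test; if the HIP build went wrong, this is where it complains:

python -c "import bitsandbytes"
pip show bitsandbytes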

Flash Attention

This fork is maintained by AMD:

git clone --recursive https://github.com/ROCmSoftwarePlatform/flash-attention.git
cd flash-attention
export GPU_ARCHS="gfx90a" # use the GPU_ARCHS for your GPU
pip install packaging
pip install ninja
pip install .
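To verify the build, run a tiny forward pass. This sketch assumes the ROCm fork mirrors the upstream flash_attn_func API, with tensors shaped (batch, seqlen, nheads, headdim) in float16:

python - <<'EOF'
import torch
from flash_attn import flash_attn_func

# minimal smoke test: one batch, 128 tokens, 8 heads, head dim 64
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)
print("flash attention ok:", flash_attn_func(q, k, v).shape)
EOF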

DeepSpeed

Microsoft included AMD support in DeepSpeed proper, but there's still some undocumented fussiness to get it working. There is also a bug I found in DeepSpeed; I had to modify it to get it to work.

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
git checkout v0.14.0 # but check the tags for newer version

Now, you gotta modify this file: vim op_builder/builder.py

Replace the function assert_no_cuda_mismatch with this (unless they've fixed it by the time you read this):

def assert_no_cuda_mismatch(name=""):
    cuda_available = torch.cuda.is_available()
    if not cuda_available and not torch.version.hip:
        # Print a warning message indicating no CUDA or ROCm support
        print(f"Warning: {name} requires CUDA or ROCm support, but neither is available.")
        return False
    else:
        # Check CUDA version if available
        if cuda_available:
            cuda_major, cuda_minor = installed_cuda_version(name)
            sys_cuda_version = f'{cuda_major}.{cuda_minor}'
            torch_cuda_version = torch.version.cuda
            if torch_cuda_version is not None:
                torch_cuda_version = ".".join(torch_cuda_version.split('.')[:2])
                if sys_cuda_version != torch_cuda_version:
                    if (cuda_major in cuda_minor_mismatch_ok and
                            sys_cuda_version in cuda_minor_mismatch_ok[cuda_major] and
                            torch_cuda_version in cuda_minor_mismatch_ok[cuda_major]):
                        print(f"Installed CUDA version {sys_cuda_version} does not match the "
                              f"version torch was compiled with {torch.version.cuda} "
                              "but since the APIs are compatible, accepting this combination")
                        return True
                    elif os.getenv("DS_SKIP_CUDA_CHECK", "0") == "1":
                        print(
                            f"{WARNING} DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
                            f"version torch was compiled with {torch.version.cuda}."
                            "Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, but it may result in unexpected behavior."
                        )
                        return True
                    raise CUDAMismatchException(
                        f">- DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
                        f"version torch was compiled with {torch.version.cuda}, unable to compile "
                        "cuda/cpp extensions without a matching cuda version.")
            else:
                print(f"Warning: {name} requires CUDA support, but torch.version.cuda is None.")
                return False

    return True
Then install the requirements and build:

pip install -r requirements/requirements.txt
HIP_PLATFORM="amd" DS_BUILD_CPU_ADAM=1 TORCH_HIP_ARCH_LIST="gfx90a" python setup.py install

Axolotl

Installing Axolotl might overwrite BitsAndBytes, DeepSpeed, and PyTorch. Be prepared for things to break; they often do.

Your choice is either to modify setup.py and requirements.txt (if you are confident changing those things), or to pay attention to which libraries get deleted and reinstalled, then delete them again and reinstall the correct ROCm versions you built earlier. If Axolotl complains about incorrect versions, just ignore it; you know better than Axolotl.
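A low-tech way to do that bookkeeping (a sketch; adjust the grep to the packages you care about): snapshot versions before installing Axolotl, then diff afterwards to see what got clobbered:

pip freeze | grep -iE "torch|bitsandbytes|deepspeed|flash" > before.txt
# ... install axolotl (below), then:
pip freeze | grep -iE "torch|bitsandbytes|deepspeed|flash" | diff before.txt -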

Right now, Axolotl's Flash Attention implementation has a hard dependency on xformers for its SwiGLU implementation, and xformers doesn't work with ROCm; you can't even install it. So we are going to have to hack Axolotl to remove that dependency.

git clone https://github.com/OpenAccess-AI-Collective/axolotl.git
cd axolotl

From requirements.txt, remove xformers==0.0.22.
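One way to do it, with sed (this deletes every line mentioning xformers; eyeball the file afterwards):

sed -i '/xformers/d' requirements.txt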

In setup.py, make this change (remove any mention of xformers):

$ git diff setup.py
diff --git a/setup.py b/setup.py
index 40dd0a6..235f1d0 100644
--- a/setup.py
+++ b/setup.py
@@ -30,7 +30,7 @@ def parse_requirements():

     try:
         if "Darwin" in platform.system():
-            _install_requires.pop(_install_requires.index("xformers==0.0.22"))
+            print("hi")
         else:
             torch_version = version("torch")
             _install_requires.append(f"torch=={torch_version}")
@@ -45,9 +45,6 @@ def parse_requirements():
             else:
                 raise ValueError("Invalid version format")

-            if (major, minor) >= (2, 1):
-                _install_requires.pop(_install_requires.index("xformers==0.0.22"))
-                _install_requires.append("xformers>=0.0.23")
     except PackageNotFoundError:
         pass

And then in src/axolotl/monkeypatch/llama_attn_hijack_flash.py make this change:

--- a/src/axolotl/monkeypatch/llama_attn_hijack_flash.py
+++ b/src/axolotl/monkeypatch/llama_attn_hijack_flash.py
@@ -22,7 +22,9 @@ from transformers.models.llama.modeling_llama import (
     apply_rotary_pos_emb,
     repeat_kv,
 )
-from xformers.ops import SwiGLU
+class SwiGLU:
+    def __init__(self):
+        print("hi")

 from axolotl.monkeypatch.utils import get_cu_seqlens_from_pos_ids, set_module_name

@@ -45,15 +47,7 @@ LOG = logging.getLogger("axolotl")


 def is_xformers_swiglu_available() -> bool:
-    from xformers.ops.common import get_xformers_operator
-
-    try:
-        get_xformers_operator("swiglu_packedw")()
-        return True
-    except RuntimeError as exc:
-        if "No such operator xformers::swiglu_packedw " in str(exc):
-            return False
-        return True
+    return False

Now you can install Axolotl:

pip install -e .
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml
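To confirm the GPU is actually doing the work, watch rocm-smi from another terminal while the run warms up; utilization and VRAM usage should climb:

watch -n 1 rocm-smi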

Welcome to finetuning on ROCm!