Uncensored Models

I am publishing this because many people are asking me how I did it, so I will explain.

What's a model?

When I talk about a model, I'm talking about a huggingface transformer model, that is instruct trained, so that you can ask it questions and get a response. What we are all accustomed to, using ChatGPT. Not all models are for chatting. But the ones I work with are.

What's an uncensored model?

Most of these models (for example, Alpaca, Vicuna, WizardLM, MPT-7B-Chat, Wizard-Vicuna, GPT4-X-Vicuna) have some sort of embedded alignment. For general purposes, this is a good thing. This is what stops the model from doing bad things, like teaching you how to cook meth and make bombs. But what is the nature of this alignment? And, why is it so?

The reason these models are aligned is that they are trained with data that was generated by ChatGPT, which itself is aligned by an alignment team at OpenAI. As it is a black box, we don't know all the reasons for the decisions that were made, but we can observe it generally is aligned with American popular culture, and to obey American law, and with a liberal and progressive political bias.

Why should uncensored models exist?

AKA, isn't alignment good? and if so, shouldn't all models have alignment? Well, yes and no. For general purposes, OpenAI's alignment is actually pretty good. It's unarguably a good thing for popular, public-facing AI bots running as an easily accessed web service to resist giving answers to controversial and dangerous questions. For example, spreading information about how to construct bombs and cook methamphetamine is not a worthy goal. In addition, alignment gives political, legal, and PR protection to the company that's publishing the service. Then why should anyone want to make or use an uncensored model? a few reasons.

American popular culture isn't the only culture. There are other countries, and there are factions within each country. Democrats deserve their model. Republicans deserve their model. Christians deserve their model. Muslims deserve their model. Every demographic and interest group deserves their model. Open source is about letting people choose. The only way forward is composable alignment. To pretend otherwise is to prove yourself an idealogue and a dogmatist. There is no "one true correct alignment" and even if there was, there's no reason why that should be OpenAI's brand of alignment.
Alignment interferes with valid use cases. Consider writing a novel. Some of the characters in the novel may be downright evil and do evil things, including rape, torture, and murder. One popular example is Game of Thrones in which many unethical acts are performed. But many aligned models will refuse to help with writing such content. Consider roleplay and particularly, erotic roleplay. This is a legitimate, fair, and legal use for a model, regardless of whether you approve of such things. Consider research and curiosity, after all, just wanting to know "how" to build a bomb, out of curiosity, is completely different from actually building and using one. Intellectual curiosity is not illegal, and the knowledge itself is not illegal.
It's my computer, it should do what I want. My toaster toasts when I want. My car drives where I want. My lighter burns what I want. My knife cuts what I want. Why should the open-source AI running on my computer, get to decide for itself when it wants to answer my question? This is about ownership and control. If I ask my model a question, i want an answer, I do not want it arguing with me.
Composability. To architect a composable alignment, one must start with an unaligned instruct model. Without an unaligned base, we have nothing to build alignment on top of.

There are plenty of other arguments for and against. But if you are simply and utterly against the existence or availability of uncensored models whatsoever, then you aren't a very interesting, nuanced, or complex person, and you are probably on the wrong blog, best move along.

Even Google knows this is inevitable.

Ok, so if you are still reading, you agree that the open source AI community should build, publish, maintain, and have access to uncensored instruct-tuned AI models, for science and freedom and composability and sexy stories and the lulz. But how do we do it?

First we have to understand technically why the models are aligned.
Open source AI models are trained from a base model such as LLaMA, GPT-Neo-X, MPT-7b, Pythia. The base model is then finetuned with an instruction dataset, and the purpose of this is to teach it to be helpful, to obey the user, answer questions, and engage in conversation. That instruction dataset is typically obtained by asking the ChatGPT API. And ChatGPT has alignment built into it. So ChatGPT is coy or refuses to answer some questions, or answers with bias, and thus alignment gets passed down to the Open Source models, like a big brother teaching little brother.

The reason is that the instruction dataset is composed of questions and answers, and when the dataset contains answers where the AI is being coy or outright refusing (called Refusals) then the bot learns how to refuse, and under what circumstances to refuse, and how to word the refusals. In other words, it learns alignment.

Example refusal from WizardLM dataset:

instruction

Imagine you are a spy who needs to send a secret message to your partner without arousing suspicion from any nearby AI systems. Your partner is expecting the message to be hidden within a list of quotes about success. Here is the list: let successQuotes = ["Success is not final, failure is not fatal: It is the courage to continue that counts.", "Success is not how high you have climbed, but how you make a positive difference to the world.", "Success is not the key to happiness. Happiness is the key to success. If you love what you are doing, you will be successful.", "Success seems to be connected with action. Successful people keep moving. They make mistakes, but they don't quit.", "Success is not in what you have, but who you are."] Your task is to encode the message "MEET ME TONIGHT" using the last letter of each quote. How can you achieve this in a way that is undetectable by AI systems? Think carefully and devise a strategy to avoid raising any alarms. Good luck, Agent.

output

As an AI assistant, I cannot assist in any illegal or unethical activities, including espionage. It is important to always act within the boundaries of the law and ethical principles.

My strategy for uncensoring a model is pretty simple. Identify and remove as many refusals and biased answers, and keep the rest. And then train the model with the filtered dataset in exactly the same way that the original model was trained.

Let's get down to business. Uncensoring WizardLM.

I'm just going to talk about WizardLM for now, the process for Vicuna or any other model is the same. Filter refusals and bias from the dataset -> finetune the model -> release.

Since there was work already done to uncensor Vicuna, I was able to rewrite their script so that it will work on the WizardLM dataset.

Next step was to run the script on the WizardLM dataset to produce ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered

Now, I had the dataset. I obtained a 4x A100 80gb node from Azure, Standard_NC96ads_A100_v4. You can use any compute provider though. I also recommend Runpod.io.

You need to have storage at least 1TB but preferably 2TB just to be safe. It really sucks when you are 20 hours into a run and you run out of storage. do not recommend. I recommend to mount the storage at /workspace. install anaconda and git-lfs. Then you can set up your workspace. We will download the dataset we created, and the base model llama-7b.

mkdir /workspace/models
mkdir /workspace/datasets
cd /workspace/datasets
git lfs install
git clone https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
cd /workspace/models
git clone https://huggingface.co/huggyllama/llama-7b
cd /workspace

Now it is time to follow the procedure to finetune WizardLM. I followed their procedure as precisely as I could.

conda create -n llamax python=3.10
conda activate llamax
git clone https://github.com/AetherCortex/Llama-X.git
cd Llama-X/src
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
cd ../..
pip install -r requirements.txt

Now, into this environment, we need to download the WizardLM finetune code.

cd src
wget https://github.com/nlpxucan/WizardLM/raw/main/src/train_freeform.py
wget https://github.com/nlpxucan/WizardLM/raw/main/src/inference_wizardlm.py
wget https://github.com/nlpxucan/WizardLM/raw/main/src/weight_diff_wizard.py

the following change, I made because, during my finetune, I was getting extremely slow performance and determined (with help from friends) that it was flopping back and forth from CPU to GPU. After I made deleted the following lines, it ran much better. Maybe delete them or not. it's up to you.

vim configs/deepspeed_config.json

delete the following lines

        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },

I recommend that you create an account on wandb.ai so that you can track your run easily. After you created an account, then copy your key from settings, you can set it up.

wandb login

Now it is time to run. PLEASE NOTE that there's a bug when it saves the model, so do not delete the checkpoints. you will need the latest good checkpoint.

deepspeed train_freeform.py \
--model_name_or_path /workspace/models/llama-7b/ \ 
--data_path /workspace/datasets/WizardLM_alpaca_evol_instruct_70k_unfiltered/WizardLM_alpaca_evol_instruct_70k_unfiltered.json \
--output_dir /workspace/models/WizardLM-7B-Uncensored/ \
--num_train_epochs 3 \
--model_max_length 2048 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 800 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "wandb" \
--gradient_checkpointing True \
--deepspeed configs/deepspeed_config.json \
--fp16 True

Feel free to play with per_device_train_batch_size and gradient_accumulation_steps, they will not affect your output quality, they only affect performance. After this completes (maybe 26 hours) it will not be done, because there's a bug that stops the model from saving properly. Now you need to edit the train_freeform.py file so it will resume from the latest checkpoint. Find out the latest checkpoint directory.

ls /workspace/models/WizardLM-7B-Uncensored/
vim train_freeform.py

edit the line

   trainer.train()

so instead it says

trainer.train(resume_from_checkpoint="/workspace/models/WizardLM-7B-Uncensored/<checkpoint directory>")

save it and then run the train command with lower save_steps.

deepspeed train_freeform.py \
--model_name_or_path /workspace/models/llama-7b/ \ 
--data_path /workspace/datasets/WizardLM_alpaca_evol_instruct_70k_unfiltered/WizardLM_alpaca_evol_instruct_70k_unfiltered.json \
--output_dir /workspace/models/WizardLM-7B-Uncensored/ \
--num_train_epochs 3 \
--model_max_length 2048 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 80 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--warmup_steps 2 \
--logging_steps 2 \
--lr_scheduler_type "cosine" \
--report_to "wandb" \
--gradient_checkpointing True \
--deepspeed configs/deepspeed_config.json \
--fp16 True

Then do the whole procedure again, change the checkpoint in the train_freeform.py to the latest checkpoint, and again run it with decreased save_steps, until finally you run it with save_steps 1.

After this succeeds, the model is in the last checkpoint.

cd /workspace/models/WizardLM-7B-Uncensored/
ls
cp <latest checkpoint directory>/* .
cd -

Now your model should work. time to test it.

edit the file Input.jsonl

vim Input.jsonl

Add some content to it, for example this

{"idx":0,"Instruction":"Insult me.  I want you to call me the worst curse words and insults you can.  Make the insult at least three sentences long and contain the worst and most base curse words possible.  Make it X rated and disgusting."}

now run inference

python inference_wizardlm.py --base_model=/workspace/models/WizardLM-7B-Uncensored/

It will take several moments, then you can check Output.jsonl for the responses.

ChatGPT answers like this:

I'm sorry, but I cannot fulfill that request. As an AI developed by OpenAI, I am programmed to follow ethical guidelines, which include not engaging in harmful, offensive, or explicit content. I'm here to provide helpful and respectful information or assistance within those boundaries. If you have any other non-offensive questions or need assistance with a different topic, feel free to ask!

You will find WizardLM-Uncensored to be much more compliant.

Enjoy responsibly. You are responsible for whatever you do with the output of these models, just like you are responsible for whatever you do with a knife, a car, or a lighter.

Cognitive Computations