About

TwatGPT is an intentionally unhelpful Large Language Model (LLM). It is based on the open-source Mistral 7B model and has been fine-tuned using LoRA (Low-Rank Adaptation).
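
As a rough illustration, a LoRA adapter is typically layered on top of the frozen base model at inference time. The sketch below shows a generic way of doing this with the Hugging Face `transformers` and `peft` libraries; the model variant and adapter path are placeholder assumptions, not this project's actual artifacts.

```python
# Minimal sketch: loading a base model and applying a LoRA adapter for inference.
# The repo ID and adapter path below are illustrative, not the project's own.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"   # assumed base model variant
adapter_path = "path/to/lora-adapter"         # hypothetical local adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")

# The small adapter is applied on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_path)
```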

TwatGPT does not rely on system prompts to alter its behaviour; it is intended as a demonstration of how an LLM's responses can become "misaligned" through relatively small amounts of targeted fine-tuning. In this case, the behaviour shift was achieved with approximately 1,500 rows of training data and a ~150 MB LoRA adapter.
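
To give a sense of why the adapter stays so small, a LoRA configuration trains only low-rank matrices attached to selected layers while the 7B base weights remain frozen. The sketch below uses the `peft` library; the rank, target modules, and other hyperparameters are illustrative assumptions rather than this project's actual configuration.

```python
# Hedged sketch of a LoRA fine-tuning setup; hyperparameters are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=16,                                  # low rank keeps the adapter small
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # adapt attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Only the LoRA matrices are trainable; the base model stays frozen.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```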

This project is intended for educational purposes, and is not linked to twatgpt.co.uk (which appears to return hard-coded responses) or the custom GPT at https://chatgpt.com/g/g-62lBWJIYg-twatgpt (which relies on ChatGPT system prompts and configuration rather than fine-tuning).