Ah yeah, my mistake, I'm always mixing up language-based and image-based AI models. Training text-based models locally is much less feasible lol.
There's no model for my art, so I'm creating a checkpoint model, using xformers to bring the VRAM requirement down. From there I'll be able to speed up variants of my process using LoRAs, but that won't be for some time. I want a good model first.