Emu3, a multimodal large language model trained to generate text and images with one transformer architecture, can now be used natively with the Transformers library.