Apple has released a publicly available AI model called “MGIE” that edits images based on natural language instructions. MGIE, short for MLLM-Guided Image Editing, uses multimodal large language models (MLLMs) to interpret user instructions and perform pixel-level manipulations. The model handles several kinds of editing, including Photoshop-style modification, global photo optimization, and local editing.
MGIE is the result of a collaboration between Apple and researchers from the University of California, Santa Barbara. The model was introduced in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, one of the leading venues for AI research. The paper reports that MGIE improves results on both automatic metrics and human evaluation while maintaining competitive inference efficiency.
How MGIE functions
MGIE is built on the idea of using MLLMs, which are powerful AI models capable of processing both text and images, to improve instruction-based image editing. MLLMs have demonstrated remarkable abilities in understanding and responding to visual prompts, but they have not been widely applied to image editing tasks.
MGIE brings MLLMs into the editing process in two ways. First, it uses MLLMs to derive clear, concise instructions from the user's input, giving the editing process explicit guidance. Second, it uses MLLMs to produce a visual imagination, which is a latent representation of the desired edit. This representation captures the essence of the edit and guides the pixel-level manipulation. MGIE uses a new end-to-end training approach that jointly optimizes the instruction derivation, visual imagination, and image editing modules.
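To make the flow of data between the three modules concrete, here is a toy sketch in Python. It is not Apple's code and does not use MGIE's actual API: every name in it (`derive_instruction`, `imagine`, `apply_edit`, `EditResult`) is hypothetical, the "MLLM" is faked with a lookup table, and the "visual imagination" is reduced to a single brightness offset. It also leaves out the joint end-to-end training entirely. It only illustrates the order of the steps: a terse command becomes an explicit instruction, the instruction becomes a latent guidance vector, and the vector drives the pixel-level edit.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # toy grayscale image, pixel values 0-255


@dataclass
class EditResult:
    image: Image
    derived_instruction: str


def derive_instruction(user_command: str) -> str:
    """Hypothetical stand-in for the MLLM step that expands a terse command.

    A real MLLM would generate this text; a lookup table fakes it here.
    Unknown commands are passed through unchanged.
    """
    expansions = {
        "make it brighter": "increase the brightness of every pixel by 40",
        "make it darker": "decrease the brightness of every pixel by 40",
    }
    return expansions.get(user_command.lower().strip(), user_command)


def imagine(instruction: str) -> List[float]:
    """Hypothetical stand-in for the visual imagination (a latent vector).

    In this toy the latent has one entry: a signed brightness offset.
    """
    words = instruction.split()
    amount = next((float(w) for w in words if w.isdigit()), 0.0)
    sign = -1.0 if "decrease" in words else 1.0
    return [sign * amount]


def apply_edit(image: Image, latent: List[float]) -> Image:
    """Toy editor: shift every pixel by the offset, clamped to 0-255."""
    offset = int(latent[0])
    return [[max(0, min(255, p + offset)) for p in row] for row in image]


def edit(image: Image, user_command: str) -> EditResult:
    """Run the three stages in order and return image plus instruction."""
    instruction = derive_instruction(user_command)
    latent = imagine(instruction)
    return EditResult(apply_edit(image, latent), instruction)


if __name__ == "__main__":
    photo = [[10, 100], [200, 250]]
    result = edit(photo, "make it brighter")
    print(result.derived_instruction)
    print(result.image)  # [[50, 140], [240, 255]]
```

In the real system each stage is a learned model rather than a hand-written rule, and the three are trained together, so the instruction and the latent representation are shaped by how well the final edit turns out.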
Capabilities of MGIE
MGIE can address a wide range of editing scenarios, from basic color adjustments to intricate object modifications. The model can also execute global and local edits, depending on the user’s preference. Some of the functionalities of MGIE include:
- Expressive instruction-based editing: MGIE can derive clear and concise instructions that effectively guide the editing process. This improves both the quality of the edits and the overall user experience.
- Photoshop-style modification: MGIE can carry out common Photoshop-style edits, such as cropping, resizing, rotating, flipping, and applying filters. The model can also implement more advanced alterations, such as changing the background, adding or removing objects, and blending images.
- Global photo optimization: MGIE can enhance the overall quality of a photo, including brightness, contrast, sharpness, and color balance. The model can also apply artistic effects, such as sketching, painting, and cartooning.
- Local editing: MGIE can modify specific regions or objects in an image, such as faces, eyes, hair, clothes, and accessories. The model can also adjust the attributes of these regions or objects, including shape, size, color, texture, and style.
Using MGIE
MGIE is available as an open-source project on GitHub, where users can find the code, data, and pre-trained models. The project also offers a demo notebook that shows how to use MGIE for various editing tasks. Users can also try MGIE online through a web demo hosted on Hugging Face Spaces, a platform for sharing and collaborating on machine learning projects.
MGIE is designed to be easy to use and to customize. Users enter natural language instructions to edit images, and MGIE returns the edited images along with the derived instructions. Users can also give MGIE feedback to refine the edits or request a different edit. MGIE can also be integrated with other applications or platforms that require image editing capabilities.
Significance of MGIE
MGIE represents a breakthrough in instruction-based image editing, a task that is both difficult for AI and important for human creativity. MGIE showcases the potential of using MLLMs to improve image editing, and opens new possibilities for cross-modal interaction and communication.
MGIE is not only a research achievement, but also a practical and useful tool for various scenarios. MGIE can assist users in creating, altering, and optimizing images for personal or professional purposes, such as social media, e-commerce, education, entertainment, and art. MGIE also empowers users to express their ideas and emotions through images, and encourages them to explore their creativity.
For Apple, MGIE also demonstrates the company’s increasing expertise in AI research and development. The consumer tech giant has rapidly expanded its machine learning capabilities in recent years, with MGIE being possibly the most impressive demonstration yet of how AI can enhance everyday creative tasks.
While MGIE represents a major breakthrough, experts say there is still plenty of work ahead to improve multimodal AI systems. But progress in this field is accelerating. If the hype around MGIE’s release is any indication, this type of assistive AI may soon become an indispensable creative sidekick.