The AI that draws like a human: OpenAI has introduced ChatGPT's new graphics model, Images 2.0, a technological leap that almost completely blurs the line between human creation and computer-generated content. As recently as two years ago, AI-generated images could be spotted by embarrassing mistakes, mainly distorted text or botched small details. The new system produces outputs that look ready for immediate use, from advertisements to restaurant menus, without raising suspicion. In Hebrew, however, it still struggles and produces awkward errors.

One of the main challenges for image generators in the past was integrating text. Systems like DALL-E tended to produce meaningless or misspelled words, a byproduct of the diffusion method, in which an image is built up gradually from noise. In that process, small regions such as letters received less attention and therefore suffered in quality. The new model shows a dramatic improvement in this area, generating clear and accurate text within images, even in complex languages. But as mentioned, not yet in Hebrew.
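The build-from-noise process that diffusion relies on can be sketched in toy form. The snippet below is an illustration only, not OpenAI's implementation: it replaces the learned neural denoiser with the known target image, but it shows the iterative refinement from pure noise in which fine structures, such as the strokes of a letter, only emerge late in the process.

```python
import numpy as np

def toy_reverse_diffusion(target, steps=50, seed=0):
    """Toy sketch of diffusion sampling: start from pure noise and
    repeatedly nudge the sample toward a denoised estimate.
    Real models predict the noise with a neural network; here we
    cheat and use the known target as the denoiser's answer."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from pure noise
    for t in range(steps, 0, -1):
        # blend the current sample toward the target; early steps
        # (large t) change the image only slightly, so small details
        # like letter shapes are resolved only near the end
        x = (1 - 1 / t) * x + (1 / t) * target
    return x

# a crude "letter" glyph: a bright bar on a dark background
img = np.zeros((8, 8))
img[2:6, 3:5] = 1.0
out = toy_reverse_diffusion(img)
```

Because the blending weight `1/t` grows as `t` shrinks, the final step snaps the sample onto the target; in a real model the "target" is instead a learned prediction, which is why tiny regions can come out garbled.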

OpenAI has not fully disclosed the exact mechanism behind the model, but hinted at a combination of reasoning capabilities similar to language models. This means the system not only "draws," but also plans the image in advance, understands the context, and sometimes even checks itself before presenting the result.

One of the key innovations is an operating mode called Thinking, in which the model works more slowly but in a more accurate and in-depth manner. In this mode, it can create a consistent series of images from the same prompt, maintain characters, style, and objects across different frames, and produce outputs such as multi-image comics or a complete storyboard.

This capability changes the way professionals can work. Instead of relying on multiple tools for design, writing, and editing, an entire campaign can be generated from a single prompt. The model can create different versions of the same content, adapt sizes for various platforms, and produce outputs ready for immediate use on social networks, websites, or applications.

Example of the new graphics output. More working time, greater precision in the text. (credit: OpenAI)

At the same time, the model also shows significant improvement in understanding non-Latin languages. In the past, writing in Japanese, Korean, or Hindi within an image was almost impossible. Now, the system can integrate text in these languages much more accurately, expanding its usability for global markets.

Image quality itself has increased to a resolution of up to 2K, with the ability to handle complex compositions, small details, and subtle stylistic constraints. This is an improvement not only in accuracy but also in control: the user can guide the model in detail and receive a result that adheres much more closely to the instructions than before.

However, this is not a perfect system. Even in the new version there are limitations, especially in tasks that require precise physical understanding of the world, such as folding origami or complex representations of three-dimensional objects. Repeated edits of the same image can also sometimes lead to a decline in quality, a phenomenon known from previous models as well.

Another aspect is speed. Creating complex images is not as immediate as writing text, and sometimes several minutes are required to obtain a complete result. But compared to the capabilities achieved, this is still relatively fast.

The launch of Images 2.0 comes amid increasing competition in the field, as other technology companies invest enormous resources in developing similar models. At the same time, OpenAI is gradually phasing out older models and focusing development on the new generation, illustrating just how rapidly the field is evolving.