Why is it so hard for AI to generate hands?
Photo by vanenunes on Envato Elements
AI image generators struggle with hands, often producing deformities such as a sixth finger, because hands are exceptionally complex, detailed anatomical structures that current models do not yet represent accurately.
The complexity of hands
Hands are one of the most challenging parts of the human body for AI to replicate for several key reasons:
- High level of detail: Each hand has multiple joints, folds, and many small bones (phalanges, metacarpals, carpals) and tendons that move in a coordinated way. The AI must learn to replicate the geometric and spatial relationships between these elements accurately, a task that current models haven’t mastered.
- Positions and perspective: Unlike a face, which is usually seen from the front, hands take on countless positions and angles. The AI has to understand how hands look when they are open, closed, in profile, from above, or below, which multiplies the complexity of the training data.
- Interactions with the environment: Hands constantly interact with objects and other body parts. The AI must not only generate the hand itself but also how it deforms or overlaps with an object, such as when holding a cup or a pen. This demands a level of contextual understanding that AI models still lack.
How AI learns to generate images
Generative AI models, such as DALL-E 2 or Midjourney, don’t “understand” anatomy the way a human artist does. Instead, they rely on a process known as diffusion. These models begin with an image of pure random noise and, over many steps, remove that noise until the image matches the text description they were given.
To achieve this, the AI is trained on enormous datasets that contain billions of images and their text descriptions. The model learns to associate keywords, such as “hand” or “fingers,” with the visual patterns it sees in these images.
However, hands appear in a relatively small share of the training images, and in a wide variety of positions, so the AI doesn’t see enough examples of correctly formed hands from every angle. As a result, the model cannot build an accurate internal representation of the hand’s structure. Instead of understanding the hand, it only learns to guess what its parts look like based on the data it has, which leads to deformities.
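The denoising loop described above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the "image" is a short list of numbers, the noise schedule values are arbitrary, and the function oracle_noise_estimate is a hypothetical stand-in for the trained neural network (it cheats by knowing the target image, whereas a real model has to guess the noise from the text prompt and what it learned in training). The update rule follows the common DDPM-style formulation, which is not necessarily what DALL-E 2 or Midjourney use internally.

```python
import math
import random


def make_schedule(steps=50, beta_start=1e-4, beta_end=0.02):
    """Build an illustrative linear noise schedule (values are assumptions)."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1)
             for t in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a  # cumulative product of the alphas
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def oracle_noise_estimate(noisy, t, alpha_bars, target):
    """Hypothetical stand-in for the trained network.

    A real model predicts the noise from the noisy image and the prompt.
    This oracle simply computes it from the known target image.
    """
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    return [(x - keep * p) / spread for x, p in zip(noisy, target)]


def denoise(target, steps=50, seed=0):
    """Start from pure noise and remove it step by step."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # the noisy starting image
    for t in reversed(range(steps)):
        eps = oracle_noise_estimate(x, t, alpha_bars, target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Subtract the estimated noise and rescale (DDPM-style mean update).
        x = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            # Every step except the last re-injects a little fresh noise.
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0, 0.25]  # a tiny made-up "image"
    print(denoise(target))
```

With a perfect noise estimate, the loop lands on the target image. A real model's estimate is only as good as its training data, so in regions where examples are scarce and varied, as with hands, small errors in each step can add up to a wrong finger count.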
Will AI get better in the future?
Hand generation has already improved significantly in the most recent models. Developers are using newer techniques, such as attention mechanisms and image refinement, to improve accuracy in complex areas. As AI models continue to be trained on larger and more diverse datasets, with a specific focus on complex structures like hands, generating images of hands without deformities will likely become commonplace.