OpenAI debuts its text-to-video model, Sora
OpenAI debuts its new video generation model Sora, which can create realistic AI videos from nothing but text prompts and instructions. In a recent interview with Bill Gates, reinstated OpenAI CEO Sam Altman spoke about the future of ChatGPT, which he hoped would one day generate videos from text. That vision has now materialized in Sora, a text-to-video AI model that can generate videos up to a minute long while, as the OpenAI team claims, ‘maintaining visual quality and adherence to the user’s prompt.’
images and videos courtesy of OpenAI
OpenAI has released a series of samples from its new text-to-video model Sora. The text prompts need to be detailed so that the generated video can capture the visuals the user wants. So far, the text-to-video Sora can understand long instructions such as ‘The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.’
prompt: a movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
OpenAI has also tried text prompts such as ‘A close-up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand’ and ‘A Chinese Lunar New Year celebration video with Chinese Dragon.’ Sora executed both prompts with seconds-long clips that can sustain a lifelike quality to the AI video. OpenAI says that Sora uses a transformer architecture similar to its GPT models, which helps scale the performance and quality of the videos.
prompt: a litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in
Aside from generating AI videos from text, OpenAI’s Sora can also animate an existing static image into a moving video. OpenAI adds that Sora can take an existing video and extend it or fill in missing frames, and that it can generate entire videos all at once or extend generated videos to make them longer. ‘Sora is a diffusion model, which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps,’ says OpenAI.
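The denoising loop OpenAI describes can be illustrated with a toy sketch: start from pure noise and repeatedly subtract a fraction of what a model predicts to be noise. This is only a conceptual illustration, not Sora’s actual implementation — the `predict_noise` stand-in below is hypothetical, and a real diffusion model would use a trained network that steers the sample toward a plausible video matching the prompt.

```python
import random

def denoise(noise, predict_noise, steps=50):
    """Toy diffusion-style sampling loop: begin with pure noise and
    gradually remove the model's predicted noise over many steps."""
    x = list(noise)
    for t in range(steps, 0, -1):
        predicted = predict_noise(x, t)  # model's estimate of the remaining noise
        x = [xi - pi / steps for xi, pi in zip(x, predicted)]
    return x

# Stand-in "model": treats everything left in x as noise, so the loop
# simply decays x toward zero. A trained model would instead nudge x
# toward coherent video frames conditioned on the text prompt.
random.seed(0)
start = [random.gauss(0, 1) for _ in range(100)]  # a flattened "noisy frame"
result = denoise(start, lambda x, t: x)
```

The key idea the sketch captures is that generation happens iteratively: each pass removes a little noise, so quality emerges over many small steps rather than in a single forward pass.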
prompt: step-printing scene of a person running, cinematic film shot in 35mm.
What’s the catch with OpenAI’s Sora?
Despite the impressive debut, the text-to-video Sora still has gaps to fill. OpenAI acknowledges its model’s weaknesses, noting that Sora can struggle to simulate the physics of a complex scene and may not understand specific instances of cause and effect. ‘For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark,’ says OpenAI. Sora can also mix up left and right, as seen in an AI-generated video of a man running the wrong way on a treadmill.
prompt: the camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.
Other notable artifacts that OpenAI’s Sora can produce so far include objects not mentioned in the text prompts, such as animals or people spontaneously appearing. In one sample video, a basketball bursts the hoop’s net into flames and explodes; then, out of nowhere, a new basketball drops from the sky and passes through the hoop’s ring like a ghost. Camera movement can also be tricky, leaving the generated AI video shaky or unstable.
prompt: a Chinese Lunar New Year celebration video with Chinese Dragon
As of this story’s publication, OpenAI has granted access to its text-to-video model Sora only to a select group of visual artists, designers, and filmmakers ‘to gain feedback on how to advance the model to be most helpful for creative professionals.’ Even though they can’t use it yet, fans of the company are already lining up to try the AI model themselves, while others weigh in on the potential risks that this generative model might entail.