Google Joins Meta in Creating AI-Powered Text-to-Video Generator

Google's program can generate 5.3-second, 1,280-by-768-resolution videos out of a line of text.

By Michael Kan, Principal Reporter

Mark Zuckerberg’s Meta isn't the only company developing an AI-powered program that can generate video out of text inputs. Google has been working on one, too. 

On Wednesday, researchers at the company’s AI lab, Google Brain, debuted Imagen Video, a program that can create realistic-looking video clips from a text input. The system expands Google’s original Imagen program by moving beyond still images to moving pictures, resulting in creative videos that remain largely consistent from frame to frame.

“We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding,” Google researchers wrote in a paper. 

Imagen Video can create 5.3-second, 1,280-by-768 resolution videos running at 24 frames per second. Google’s researchers developed the program by training its computer models to identify videos and still images, which were already labeled with a text description. Imagen Video then tries to replicate that imagery in the form of a video when given a text prompt.  
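To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (5.3 seconds, 24 frames per second, 1,280 by 768 pixels); the function name `clip_stats` is made up for illustration, and because the 5.3-second duration is presumably rounded, the frame count should be read as approximate.

```python
def clip_stats(duration_s=5.3, fps=24, width=1280, height=768):
    """Estimate how much imagery one clip contains, from the article's figures."""
    # The duration is a rounded figure, so round to the nearest whole frame.
    frames = round(duration_s * fps)
    pixels_per_frame = width * height
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "total_pixels": frames * pixels_per_frame,
    }


stats = clip_stats()
print(stats)
```

Under these assumptions, a single clip works out to roughly 127 frames of just under a million pixels each, or about 125 million pixels generated from one line of text.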

How the clips look frame by frame

“While training on natural video data only enables the model to learn dynamics in natural settings, the model can learn about different image styles (such as sketch, painting, etc.) by training on images,” the paper added. “As a result, this joint training enables the model to generate interesting video dynamics in different styles.”

In total, Imagen Video was trained on an “internal dataset” made up of 14 million videos and 60 million still images, along with another 400 million images in the LAION-400M open dataset. Researchers found the program was smart enough to understand three-dimensional objects and settings, “as it is capable of generating videos of objects rotating while roughly preserving structure.”  

That said, the results can be far from perfect. Google researchers uploaded some of the videos the program has created, and as you can see, it’ll struggle to accurately render complex movements, such as a panda bear eating some bamboo or naval ships moving at sea.

Still, it’s clear Imagen Video could unlock a whole new era of video creation. The program can also produce the video clips in less than a minute. But for now, Google’s researchers are refraining from releasing the technology to the public. The team has already added safeguards to prevent Imagen Video from creating “fake, hateful, explicit or harmful content.” But the researchers are still worried about the technology promoting stereotypes, given that it was trained on a limited data set of videos and images.

“While our internal testing suggest much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter. We have decided not to release the Imagen Video model or its source code until these concerns are mitigated,” the researchers wrote. 

Meta, on the other hand, plans on eventually releasing its own text-to-video generator to the public once more testing is done. However, all videos created with the program will contain a watermark.
