TechCrunch AI December 3, 2024 NEUTRAL

AWS Nova brings text, image and video AI into Bedrock

AWS introduced Nova, a new family of multimodal generative AI models, at its re:Invent conference. The lineup includes four text-generating models, an image model called Nova Canvas, and a video model called Nova Reel.

Amazon Web Services used its re:Invent conference on Tuesday to introduce Nova, a new family of multimodal generative AI models for AWS customers. The launch gives Amazon its own expanded lineup of models for text, images and video inside Bedrock, its AI development platform.

The announcement covers several layers of capability. Some Nova models are designed for fast text responses, others can analyze images and video, and two separate models focus on generative media. AWS also previewed future models that could handle speech-to-speech tasks and broader any-to-any input and output.

A new model family for Bedrock

The core Nova lineup starts with four text-generating models: Micro, Lite, Pro and Premier. Micro, Lite and Pro became available Tuesday to AWS customers. Premier is scheduled to arrive in early 2025, according to Amazon CEO Andy Jassy.

Each model is aimed at a different balance of speed, capability and workload complexity. Micro is the simplest of the group because it accepts text input and produces text output only. Its main advantage is latency, with AWS positioning it as the fastest option in the family for processing text and generating responses.

Lite expands the input types. It can process image, video and text inputs while still operating reasonably quickly. Pro is pitched as the middle ground, combining accuracy, speed and cost for a range of tasks. Premier is described as the most capable model and is intended for complex workloads.

AWS is also positioning Premier differently from the rest of the lineup. Rather than treating it only as a model customers would use directly, AWS describes it as a teacher model for creating tuned custom models. That matters because Bedrock supports fine-tuning on text, images and video, as well as distillation for improved speed and higher efficiency.

What Micro, Lite, Pro and Premier can handle

The Nova text-generating models are optimized for 15 languages, although AWS says English is the main focus. Their context windows also differ, which affects how much material each model can process at once.

Micro has a 128,000-token context window, which AWS equates to around 100,000 words. Lite and Pro have 300,000-token context windows. AWS says that works out to around 225,000 words, 15,000 lines of computer code, or 30 minutes of footage.
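The conversions AWS quotes can be checked with simple arithmetic. The short Python sketch below works out the implied words-per-token ratio from the figures above; the function names and the default 0.75 ratio (taken from the 300,000-token figure) are illustrative assumptions, not anything AWS publishes as a formula.

```python
# Figures quoted by AWS in the article: (context window in tokens, approximate words).
QUOTED = {
    "Micro": (128_000, 100_000),
    "Lite/Pro": (300_000, 225_000),
}


def words_per_token(tokens: int, words: int) -> float:
    """Implied ratio of words to tokens for one quoted figure."""
    return words / tokens


def estimate_words(tokens: int, ratio: float = 0.75) -> int:
    """Rough word estimate. The 0.75 default is an assumption derived
    from the 300,000-token / 225,000-word figure, not an official constant."""
    return round(tokens * ratio)


for name, (tokens, words) in QUOTED.items():
    print(f"{name}: {words_per_token(tokens, words):.2f} words per token")

# 300,000 tokens for 15,000 lines of code implies about 20 tokens per line.
tokens_per_code_line = 300_000 / 15_000
print(tokens_per_code_line)  # 20.0
```

At 0.75 words per token, 128,000 tokens comes to 96,000 words, which is consistent with the "around 100,000 words" AWS gives for Micro.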

Lite, Pro and Premier can analyze text, images and video. AWS says those models are suited to tasks such as digesting documents and summarizing charts, meetings and diagrams. In early 2025, the context windows of certain Nova models are expected to expand to more than 2 million tokens.
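One way to picture how these differences might guide a choice of model is a small lookup over the lineup. The Python sketch below is purely illustrative: the `NovaModel` class and `pick_model` helper are hypothetical names, not part of any AWS SDK, and the table holds only the input types and context windows reported in this article. Premier's window is left unknown because no figure is given for it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NovaModel:
    name: str
    inputs: frozenset
    context_tokens: Optional[int]  # None where the article gives no figure


# Ordered from lightest to most capable, as the article describes them.
LINEUP = [
    NovaModel("Micro", frozenset({"text"}), 128_000),
    NovaModel("Lite", frozenset({"text", "image", "video"}), 300_000),
    NovaModel("Pro", frozenset({"text", "image", "video"}), 300_000),
    NovaModel("Premier", frozenset({"text", "image", "video"}), None),
]


def pick_model(needed_inputs: Iterable[str], prompt_tokens: int) -> Optional[NovaModel]:
    """Return the lightest model that accepts every needed input type and
    whose known context window fits the prompt, or None if nothing fits."""
    needed = frozenset(needed_inputs)
    for model in LINEUP:
        if not needed <= model.inputs:
            continue  # model cannot read one of the required input types
        if model.context_tokens is None or prompt_tokens > model.context_tokens:
            continue  # window unknown or too small for this prompt
        return model
    return None


print(pick_model({"text"}, 50_000).name)            # Micro
print(pick_model({"text", "video"}, 10_000).name)   # Lite
print(pick_model({"text"}, 500_000))                # None
```

Preferring the lightest model that fits mirrors the speed-versus-capability framing of the lineup; a real selection would also weigh cost and accuracy, which the article does not quantify.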

Jassy said the Nova models are among the fastest in their class and among the least expensive to run. He also framed them as useful for agent-style work involving multiple coordinated steps across proprietary systems and APIs.

"We've optimized these models to work with proprietary systems and APIs, so that you can do multiple orchestrated automatic steps — agent behavior — much more easily with these models," Jassy added. "So I think these are very compelling."

Canvas and Reel move Nova into generative media

Beyond the text-focused models, AWS also launched Nova Canvas and Nova Reel. Canvas is an image-generation model, while Reel generates video. Both launched on AWS on Tuesday, making them part of the same broader Nova push.

Canvas lets users generate and edit images from prompts. Edits can include removing backgrounds, and AWS says users can control the color schemes and layouts of generated images.

Reel is the more ambitious media model. It can create videos up to six seconds long from prompts, or from prompts with optional reference images. Users can adjust camera motion, including pans, 360-degree rotations and zoom.

There are limits at launch. Beyond the six-second cap, each video takes about three minutes to generate. AWS says a version that can create two-minute-long videos is coming soon.

Jassy said Canvas and Reel include built-in controls for responsible use, including watermarking and content moderation. He said AWS is "[trying] to limit the generation of harmful content." AWS also said in a blog post that Nova extends its safety measures to address misinformation, child sexual abuse material, and chemical, biological, radiological, or nuclear risks. The source article notes that it is not clear what those measures mean in practice or what forms they take.

Transparency questions remain

Nova arrives with familiar questions around training data. AWS has not provided precise details on which data it uses to train all of its generative models. The company previously told TechCrunch only that the data is a combination of proprietary and licensed data.

The source article points out that few vendors willingly disclose this kind of information. Training data can be treated as a competitive advantage, and details about it can also create exposure to IP-related lawsuits.

Instead of full transparency, AWS offers an indemnification policy. That policy covers customers if one of its models regurgitates a potentially copyrighted still, meaning it outputs a mirror copy of it.

The next steps for Nova are already mapped out in broad terms. Jassy said AWS is working on a speech-to-speech model for Q1 2025. That model is expected to take speech as input and output a transformed version of it. Amazon says it will also be able to interpret verbal and nonverbal cues such as tone and cadence, and deliver natural, "human-like" voices.

AWS is also working on an any-to-any model for around mid-2025. Jassy described it as a model that could accept text, speech, images or video and output text, speech, images or video. In his framing, that approach represents where frontier models are headed, though the source article notes that this assumes the work does not suffer setbacks.

For AWS customers, Nova is not a single model announcement. It is a broader product family built around multiple model sizes, multimodal inputs, generative media and future speech capabilities. The practical test will be how those pieces perform in Bedrock, especially as AWS expands context windows and pushes Nova toward more complex agent behavior.