
Google Gemini With 'Canvas' And 'Audio Overviews': More Ways For Human-AI Collaboration


Google Gemini

Gemini is Google's advanced family of multimodal large language models (LLMs), developed by Google DeepMind as the successor to LaMDA and PaLM 2.

Designed to process various types of data, including text, images, audio, video, and code, Gemini is the LLM family Google has focused on since the rise of OpenAI's ChatGPT, and it represents a significant evolution in AI capabilities.

This time, in an effort to stay competitive in an increasingly crowded field, Google is introducing 'Canvas,' a collaborative workspace in Gemini that helps users work on documents in real time, and 'Audio Overview,' which can turn documents or research into podcast-style conversations to listen to.

Both Canvas and Audio Overview are available to Gemini users worldwide.

Canvas, a feature within Google's Gemini AI assistant, offers an interactive workspace designed to facilitate real-time collaboration on both writing and coding projects.

Users can generate initial drafts and refine them with Gemini's assistance, adjusting tone, length, and formatting as needed. For coding projects, Canvas lets users create code snippets and preview them live, so they can edit iteratively and see the changes instantly.

This environment streamlines the creative process, making it easier for users to develop and polish their work.

In a blog post, Google said:

"Canvas is a new interactive space within Gemini designed to make creating, refining and sharing your work easy. Simply select ‘Canvas’ in your prompt bar and you can write and edit documents or code, with changes appearing in real-time."

"Effortlessly generate high-quality first drafts, then quickly perfect your work using Gemini’s feedback to suggest edits."

To use Canvas, users can simply open Gemini on the web and click on the 'Canvas' button within the 'Ask Gemini' box.

As for Gemini's Audio Overview, it is a feature that transforms documents, slides, and Deep Research reports into engaging, podcast-style audio discussions. Google describes it this way:

"Gemini will create a podcast style discussion between two AI hosts who, with just a click, launch into a lively deep-dive conversation based on your uploaded files. They'll summarize the material, draw connections between topics, engage in a dynamic back-and-forth and provide unique perspectives."

In other words, once users upload their files, Gemini turns the material into a conversation between two AI hosts who summarize it, draw connections between topics, and offer their own perspectives.

This feature enhances learning by allowing users to listen to summaries of class notes, research papers, or lengthy email threads while on the go.

To create an Audio Overview, users upload a document or slides in the Gemini app and click the suggestion chip that appears above the prompt bar. They can then listen to the AI-generated discussion to gain new insights and stay informed, even while multitasking.

"With these new features, Gemini is becoming an even more effective collaborator, helping you bring your ideas to life."

As competition among LLMs escalates, the focus is shifting from raw power alone to the breadth of practical applications.

Google exemplifies this approach by adding features such as collaborative writing and coding in Canvas and AI-generated podcast discussions in Audio Overview, aiming to make Gemini more engaging and useful.

Published: 18/03/2025