
ChatGPT Leaps Forward With New Voice & Image Capabilities

OpenAI rolls out voice and image features for ChatGPT, making conversations more natural and intuitive.

  • ChatGPT now has voice and image capabilities.
  • The new features allow more intuitive conversations with AI.
  • OpenAI is gradually rolling out the advanced capabilities with a focus on safety.

OpenAI has begun rolling out new voice and image features for its popular AI-powered chatbot, ChatGPT.

These new capabilities allow you to have more natural conversations with ChatGPT by speaking to it and showing it images.

This opens up more ways to use ChatGPT in daily routines. For example, while traveling, you can send ChatGPT a photo of a landmark and have a real-time conversation about it.

Similarly, at home, you can take pictures of your refrigerator’s contents and discuss meal ideas or request a step-by-step recipe.

Over the coming weeks, OpenAI will roll out these features to Plus and Enterprise users. The voice capability will be available on mobile apps, while the image functionality will be accessible across all platforms.

Voice Input Allows Two-Way Conversations

The new voice feature allows you to speak conversationally with ChatGPT, which can now respond audibly in one of five synthesized voices.

You can opt in to voice through the settings in the iOS and Android mobile apps.

According to OpenAI, the voice capability uses an advanced text-to-speech model trained on samples from voice actors. For speech recognition, it uses Whisper, OpenAI’s open-source speech recognition system.

Discussing Images Provides Visual Context

You can now show ChatGPT one or more images to provide visual context and focus the conversation.

For example, sharing a photo of a broken appliance could help ChatGPT diagnose issues and suggest fixes. On mobile, a drawing tool allows circling or pointing out specific parts of an image.

The image features use multimodal versions of the GPT-3.5 and GPT-4 models, fine-tuned to reason about visual inputs. OpenAI tested the image capabilities extensively for safety risks before rolling them out.

Gradual Rollout Focused On Safety

OpenAI noted it’s taking a gradual approach to deploying these features.

The new voice technology opens up creative applications but also creates risks, such as the impersonation of public figures. To mitigate these risks, voice is currently limited to conversational chat.

For images, OpenAI said it has limited ChatGPT’s ability to directly analyze people in photos and advises against high-risk use cases without verification.

In Summary

ChatGPT’s new voice and image capabilities offer users a more natural way to interact with the AI system.

However, OpenAI is taking a measured approach to rolling them out, limiting initial access and functionality due to potential risks.

As these features expand, keep in mind ChatGPT’s limitations and avoid high-risk applications without verification.



SEJ Staff: Matt G. Southern, Senior News Writer at Search Engine Journal

Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor’s degree in communications, ...