OpenAI Pushes Voice AI Forward With GPT Realtime

Plus: Microsoft Debuts MAI-Voice-1 and MAI-1-Preview

Amrendra Pal

Aug 29, 2025

TL;DR: In Today’s Signal

AI Flags 1,000 Questionable Scientific Journals
Memory-R1: RL Empowers LLM Memory Management
a16z Releases 5th-Edition GenAI App Rankings
Google Vids Adds Image-to-Video Feature
Nous Research Unveils Hermes 4 Hybrid LLM
Pollo AI Offers Google Nano Banana Unlimited
Izotropic’s AI Revolutionizes Breast CT Imaging
Azure Fine-Tuning Gains Key Workflow Features

THE AI SIGNAL PICKS

Google Vids Adds Image-to-Video Feature
Google now lets Google Vids users upload a static image and, with a text prompt, instantly generate a high-quality, eight-second video with sound using its Veo 3 model. The feature is rolling out gradually: it is available immediately to Rapid Release domains and will reach Scheduled Release users later. Initially, it will not be accessible in the EEA, Switzerland, or the UK.
AI Flags 1,000 Questionable Scientific Journals
A team at the University of Colorado Boulder has developed an AI-based screening tool that scans journal websites and content to identify potentially predatory or deceptive publications, such as those lacking credible editorial boards or featuring poor website quality. The system flagged over 1,000 previously unknown suspect journals. It serves as a scalable pre-screening aid, though human review remains essential.
Memory-R1: RL Empowers LLM Memory Management
The newly released Memory-R1 introduces a reinforcement learning framework that enables large language model agents to learn how to manage memory, deciding when to store, update, delete, or retrieve information rather than relying on static heuristics. Through its “Memory Manager” and “Answer Agent” modules, the system filters pertinent memory for reasoning, achieving state-of-the-art accuracy with only a minimal set of training examples. This represents a significant leap toward memory-savvy, long-horizon LLM behavior.
Nous Research Unveils Hermes 4 Hybrid LLM
Nous Research has launched Hermes 4, a family of open-weight, hybrid-reasoning models designed for neutrality, steerability, and creativity. Built with innovations like DataForge (graph-based synthetic data generation) and Atropos (RL-powered verification), Hermes 4 achieves top-tier benchmark results, including 96.3% on MATH-500 and 81.9% on AIME’24. It emphasizes transparency by letting users toggle explicit <think> reasoning while minimizing refusals. This positions Hermes 4 as a serious challenger to proprietary models like GPT-4o and Claude Sonnet 4.

THE BIG LEAP

OpenAI Launches GPT-Realtime With Voice API Upgrades

Stylized interface showing a voice interaction. Centered is a rounded rectangular audio player with a waveform visualization, play/pause button, “Agent online” status indicator, and timestamp of 00:35. White curved lines with dots flow across the image, suggesting live audio or signal movement. The background is a vivid blue with blurred flower shapes in pink and purple tones. — Source: OpenAI

Signal Scoop: OpenAI is rolling out GPT-Realtime, a production-ready speech-to-speech model, along with major updates to its Realtime API. The release brings faster interactions, SIP phone calling, image input support, and better multi-agent system integration. This is a major step in making conversational AI practical for businesses at scale.

New model enables human-like, low-latency conversations in real time, cutting delays that often disrupt voice assistants.
Support for image inputs, SIP phone calling, and MCP (Model Context Protocol) servers makes the API versatile across enterprise workflows.
Designed for production-grade voice agents (think call centers, customer support, and sales automation), with a focus on reliability at scale.
Moves OpenAI into the core infrastructure powering interactive AI apps, competing directly with voice tech from Amazon, Google, and startups.

What You Can’t Miss

This update isn’t just about faster AI; it’s about making voice AI truly usable in business-critical environments. With real-time voice, phone integration, and multimodal inputs, OpenAI is positioning GPT-Realtime as the backbone for next-gen conversational AI platforms.

Microsoft Debuts MAI-Voice-1 and MAI-1-Preview

Text "MAI-Voice-1 MAI-1-preview" appears centered on a pink and peach abstract background with soft, blurred shapes and gradients. — Source: Microsoft

Signal Scoop: Microsoft has unveiled two purpose-built, in-house AI models under its Microsoft AI (MAI) initiative: MAI-Voice-1 for ultra-fast speech generation and MAI-1-preview, its first end-to-end foundation LLM. These models mark a critical move toward self-reliance, reducing dependence on external providers like OpenAI while enhancing Copilot’s consumer-focused capabilities. Testing has already begun, cementing Microsoft’s growing AI autonomy.

The Full Picture

MAI-Voice-1: A highly expressive, natural speech model that generates a full minute of audio in under one second on just one GPU, making it one of the most efficient speech systems in the industry. It’s already powering Copilot Daily and Podcasts, and is available via Copilot Labs for experimentation.
MAI-1-preview: Microsoft’s first full-stack, in-house large language model, trained on approximately 15,000 NVIDIA H100 GPUs. Designed for instruction-following and everyday queries, it’s currently undergoing public testing on LMArena and will be gradually integrated into Copilot text features.
These models represent a pivot away from reliance on OpenAI’s models, signaling Microsoft’s intent to develop proprietary AI infrastructure and tools, aimed initially at consumer use rather than enterprise.
Microsoft plans to orchestrate a suite of specialized models tailored for various user intents and use cases, shaping the future of Copilot and positioning itself as a self-sufficient AI innovator.

What You Can’t Miss

This is a turning point: Microsoft is building the AI engines that will power its own digital assistants. MAI-Voice-1 and MAI-1-preview are not just new models; they’re the foundation of a future where Copilot is powered by Microsoft’s own innovation, optimized for speed, efficiency, and consumer engagement.

ON THE AI EDGE

a16z Releases 5th-Edition GenAI App Rankings
Andreessen Horowitz (a16z) has published the fifth edition of its landmark “Top 100 Gen AI Consumer Apps” list, tracking consumer AI usage through mid-2025. The updated ranking shows the ecosystem stabilizing, with 11 new web entrants and 14 mobile additions. Notably, Google now appears with four distinct AI products on the list, and the “Brink List” highlights emerging apps on the verge of breaking in.
Azure Fine-Tuning Gains Key Workflow Features
Microsoft’s Azure AI Foundry now supports Pause & Resume for non-reasoning model fine-tunes, Cross-Region Model Copying across subscriptions, and Reinforcement Fine-Tuning (RFT) via API and Swagger. These enhancements deliver smarter control, easier scaling, and more robust experimentation, empowering developers and data scientists to iterate faster and deploy with confidence.
Pollo AI Offers Google Nano Banana Unlimited
Pollo AI has integrated Google’s Nano Banana (Gemini 2.5 Flash Image) into its platform, granting paid users unlimited access to this high-fidelity AI image generation and editing model. Known for its unmatched character consistency and rapid creation speed, Nano Banana enhances creative workflows by enabling stylized edits, image fusion, and natural-language prompts with professional-grade precision.
Izotropic’s AI Revolutionizes Breast CT Imaging
Izotropic Corporation has integrated a proprietary AI-based machine-learning reconstruction algorithm into its IzoView Breast CT Imaging System. Co-developed with Johns Hopkins University, this trade-secret innovation significantly reduces image noise without increasing radiation dose, operates directly on raw X-ray data, and enables faster, low-dose imaging tailored for real-world clinical workflows.

AI START-UP NEWS

Framer Raises $100M at $2B Valuation
Framer, a Dutch no-code web design platform and Figma rival, has secured a $100 million Series D funding round at a $2 billion valuation, led by existing investors Meritech Capital Partners and Atomico. The capital will fuel growth in enterprise offerings, AI-enhanced tools, and broader adoption.
DeAgentAI Raises $100M Strategic Investment
DeAgentAI, a decentralized AI infrastructure startup on the Sui blockchain, has secured a $100 million strategic investment. The funding, led by Momentum (a top DEX in the Sui ecosystem), aims to accelerate development of AI-powered DeFi tools and infrastructure. Momentum, backed by OKX Ventures and Coinbase Ventures, signals strong confidence in DeAgentAI’s growth potential.
Maisa AI Raises $25M Seed Round
Maisa AI, a year-old startup building accountable AI agents to fix the staggering 95% enterprise AI pilot failure rate, has raised a $25 million seed round led by Creandum. Its new platform, Maisa Studio, enables users to deploy “digital workers” via natural language by constructing transparent “chains-of-work” rather than opaque responses. The funding will support team expansion and enterprise deployment across sectors like banking, automotive, and energy.

NEW TOOLS, NEW POSSIBILITIES

Pheromind: AI swarm orchestration platform that deploys 30+ agents
VerbaCall: AI-powered call management platform
Sophiana: Content transformation tool that converts articles into scripts
Decofy: Design AI that analyzes room photos to generate multiple concepts
BeeSift: Chrome extension that analyzes unstructured web data

AI CAREER HORIZON

Perplexity: AI Software Engineer - Agent Platform (New York City - US)
Humana: Senior AI Engineer (Remote - Nationwide)
Figma: Product Manager, AI (San Francisco, CA)
Mozilla: Senior Machine Learning Engineering Manager (Remote - US)
Coursera: Full-Stack AI Engineer (US)

Dani Cherkassky

Aug 31, 2025

Voice AI is reaching a tipping point where quality isn’t just about generating lifelike speech, but about understanding people in messy, real-world conditions. The challenge now is bridging human nuance—overlapping voices, accents, intent—with technology that can keep pace. That’s when voice truly feels natural.
