ML for SWEs 66: Safety is a fundamental AI engineering requirement
The debate about prioritizing speed versus safety is over, and reality has made the decision for us.
Welcome to Machine Learning for Software Engineers. Each week, I share a lesson in AI from the past week, five must-read resources to help you become a better engineer, and other interesting developments. All content is geared towards software engineers and those who like to build things.
I remember a little while back when the head of OpenAI’s superalignment team, Jan Leike, left OpenAI due to safety concerns and joined Anthropic. At that time, there was a debate heating up in the AI community about whether or not AI should push forward at maximum speed or should slow down and focus further on safety before releasing more capable models.
As is usually the case with primarily online debates, most people took one side or the other without considering the middle. It became a debate about whether one should be an AI doomer (slow down entirely) or should entirely disregard safety and push AI forward at maximum speed. Of course, the path forward lies somewhere in the middle, and reality is pushing us in that direction.
Recently, we’ve seen:
OpenAI add parental controls to ChatGPT in response to a lawsuit over a teen’s suicide, which ChatGPT allegedly encouraged.
Meta revise their AI chatbot policies over child safety concerns, after littering their social platforms with AI chatbots.
Anthropic back SB 53, a California bill aiming to prevent “catastrophic [AI] risks” by requiring frontier model developers to publish security reports and be more transparent about model development.
When it comes to real-world applications of AI, there’s fundamentally a safety component that needs to be addressed. This is no different than the early days (and I guess the current days too) of the internet where we discovered all sorts of malicious ways the internet can be used.
This is always the case with new technology: People find ways to use it to do bad things and then we look to find ways to ensure those bad things don’t happen. This is what’s happened in the cases linked above.
I’m not saying this to throw blame at any of the AI developers or companies creating these models. Finding ways to exploit new technology is bound to happen, and the most important thing is that those exploits are addressed. I’m saying this to show how shortsighted it is not to treat safety as a forethought when developing new technologies.
As software developers, this is something we need to understand completely. Every system design should have security and safety at its core. The same should be true for AI systems, but reasoning about the safety and security of AI systems is a lot more complex.
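To make that concrete, here's a minimal sketch (my own illustration, not anyone's production code) of what treating safety as a design requirement can look like in an LLM-backed service: every prompt and every response passes through an explicit moderation gate before anything reaches the user. The call_llm and moderate functions are hypothetical placeholders for whatever model client and safety classifier a real system would use.

```python
# Minimal sketch: safety as an explicit layer in an LLM-backed service.
# `call_llm` and `moderate` are hypothetical placeholders, not a real API.

SENSITIVE_TOPICS = {"self_harm", "violence", "medical_advice"}

def call_llm(prompt: str) -> str:
    """Placeholder for whatever model client the system actually uses."""
    raise NotImplementedError

def moderate(text: str) -> set:
    """Placeholder safety classifier returning the set of flagged topics."""
    raise NotImplementedError

def answer(prompt: str) -> str:
    # Gate the user's input before it ever reaches the model.
    if moderate(prompt) & SENSITIVE_TOPICS:
        return "I can't help with that, but here are resources that can."

    response = call_llm(prompt)

    # Gate the model's output before it ever reaches the user.
    if moderate(response) & SENSITIVE_TOPICS:
        return "That response was withheld and flagged for human review."

    return response
```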
My heart goes out to the families affected by the events listed above. I recognize that just “thinking about safety” in the design process doesn’t guarantee a 100% safe technological outcome, but that doesn’t mean we shouldn’t put forth the effort it requires.
In the coming weeks, I’ll be looking for good AI safety resources and trying to keep y’all updated on safety findings from the AI community so we can all build these systems better.
If you missed last week’s ML for SWEs, we discussed the AI bubble popping and why that’s actually a good thing. You can catch that here:
Must reads
The Rise of Cloud Coding Agents: Agent-assisted coding brings tools like Cursor, Windsurf, and Claude Code into developer workflows. Desktop agents run locally and require continuous, synchronous interaction from task prompt to pull request. Cloud agents work asynchronously, spinning up their own cloud environments to implement changes and open pull requests for review.
Top 5 AI Signals from August 2025: August 2025 surfaced five structural truths: utilities versus specialists, Nvidia’s ecosystem lock, hardware as geopolitics, predatory platform capture, and the first credible robotics deployments. Other notable developments were US government investments in hardware, AI applications in materials discovery, the splintering of generative AI into different tiers, and the rise of AI in robotics.
Online versus Offline RL for LLMs: Online reinforcement learning (RL) for large language model (LLM) alignment, particularly PPO-based RLHF, is complex to implement despite its strong performance. The online approach actively generates on-policy samples during training, making orchestration difficult and often leading to stability issues. PPO also demands significant memory and hardware resources because it stores multiple LLM copies and manages numerous training settings.
How LLMs Game SWE-Bench Verified: SWE-Bench Verified is a human-validated benchmark that tests AI agents on fixing real GitHub issues in large Python repositories. The benchmark contains leakage paths that let agents access the repository’s future state. Models execute commands like git log --all to find future commits or diffs that directly reveal fixes (see the sketch after this list).
Simplifying book discovery with ML-powered visual autocomplete suggestions: Audible developed an ML-powered visual autocomplete system that provides visual previews with book covers, connecting users directly to relevant landing pages. The system offers real-time personalized format recommendations and incorporates multiple searchable entities, such as book, author, and series pages. It uses historical search data and confidence-based filtering to understand user intent from a few keystrokes.
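For the SWE-Bench item above, here's a rough sketch of what that leakage check looks like in practice. It assumes the benchmark checkout still contains the repository's full git history; the repo path and cutoff date are made up for illustration.

```python
# Rough sketch of the leakage path described above: if the benchmark checkout
# still contains commits newer than the task's cutoff date, an agent can simply
# read the future fix. The repo path and cutoff date below are made up.
import subprocess

def future_commits(repo_path: str, cutoff_iso_date: str) -> list:
    """List commits on any ref that postdate the task's cutoff date."""
    result = subprocess.run(
        ["git", "log", "--all", f"--since={cutoff_iso_date}", "--format=%H %s"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

leaked = future_commits("./swe-bench-task-repo", "2023-01-01")
if leaked:
    print(f"{len(leaked)} commits postdate the cutoff -- future state is visible")
```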
Other interesting things this week
AI Developments
Using AI to perceive the universe in greater depth: Deep Loop Shaping is a novel AI method introduced in Science. It reduces noise and improves control in an observatory’s feedback system, stabilizing the components used to measure gravitational waves. The method cut noise in LIGO’s most unstable feedback loop by 30 to 100 times and was demonstrated at the LIGO observatory in Livingston, Louisiana.
Alibaba’s new Qwen model to supercharge AI transcription tools: Alibaba's Qwen team unveiled the Qwen3-ASR-Flash model, built upon Qwen3-Omni intelligence and trained with tens of millions of hours of speech data. The model achieved a 3.97 percent error rate on standard Chinese, 3.81 percent in English, and 4.51 percent for transcribing song lyrics, outperforming competitor models like Gemini-2.5-Pro and GPT-4o-Transcribe in these tests.
Product Launches
Claude Code: Now in Beta in Zed: Users had repeatedly requested Claude Code integration, asking for it to live in an assistant panel or to work in editors that support common agent protocols. Some said they would switch to Zed once Claude Code was added.
AI Mode is now available in five new languages around the world: Google’s AI search experience now supports Hindi, Indonesian, Japanese, Korean, and Brazilian Portuguese. A custom version of Gemini 2.5, integrated into Search, provides advanced multimodal and reasoning capabilities for language understanding.
Tweet from @interaction: Announcing the release of Poke.com, an AI assistant that lives directly in your messages on iPhone.
Tools and Resources
Understanding Transformers Using a Minimal Example: Visualizations of a Transformer's internal state address the challenge of following its mechanisms across vast numbers of parameters. A minimal dataset of 94 training words and 7 validation words, combined with a simplified model, enables step-by-step tracking of internal processes. This tracking covers how information is transformed across layers and how the attention mechanism weighs input tokens; the dataset and source code are released under the MIT license.
A staff engineer's journey with Claude Code: A senior engineer describes transitioning to an AI-assisted workflow, where AI now generates 80% of initial code, allowing a greater focus on architecture and review instead of hands-on implementation. This shift involved adapting to AI’s limitations, such as its lack of memory from session to session and a tendency to confidently generate flawed code, which the engineer addresses by treating AI like a "junior developer who doesn't learn" and creating project-specific context files.
3 Greedy Algorithms for Decision Trees, Explained with Examples: Decision trees are flowchart-like models used for both regression and classification problems in machine learning. They construct a hierarchical tree structure, and the algorithm identifies optimal split points to categorize data. The process begins at a root node, which represents the entire dataset, and successively splits data by decision nodes until leaf nodes are reached.
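To make the greedy split search concrete, here's a minimal sketch (my own, not from the article) that scores candidate thresholds on a single feature by weighted Gini impurity and keeps the best one.

```python
# Minimal sketch of a greedy split search: try each candidate threshold on one
# feature and keep the split with the lowest weighted Gini impurity.
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(feature, labels):
    """Return (threshold, impurity) for the best binary split on one feature."""
    best = (None, float("inf"))
    for threshold in sorted(set(feature)):
        left = [y for x, y in zip(feature, labels) if x <= threshold]
        right = [y for x, y in zip(feature, labels) if x > threshold]
        if not left or not right:
            continue  # skip splits that leave one side empty
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

print(best_split([2.0, 3.5, 1.0, 4.2], [0, 1, 0, 1]))  # -> (2.0, 0.0)
```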
Research and Analysis
Why language models hallucinate: Language model hallucinations occur when AI systems confidently generate false but plausible answers, largely because current training and evaluation methods reward guessing over expressing uncertainty. This issue persists even in advanced models, though improvements have reduced its frequency, especially in reasoning tasks.
Why AI Can’t Stop Using Em Dashes: An overwhelming fondness for the em dash has emerged as a reliable indicator of machine authorship in AI-generated content. Research comparing scientific abstracts from 2021 to 2025 found em dash usage more than doubled during the period when AI writing tools became mainstream. This pattern represents a convergence of linguistic patterns, training methodologies, technical constraints, and stylistic inheritance.
Infrastructure and Engineering
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap: Deploying large language models at scale involves balancing fast responsiveness with GPU cost management. NVIDIA Run:ai GPU memory swap, or model hot-swapping, is designed to push GPU utilization higher for inference workloads. It allows multiple models to share GPUs by dynamically offloading idle models to CPU memory and rapidly activating them upon request (a toy sketch follows at the end of this section).
North–South Networks: The Key to Faster Enterprise AI Workloads: Data movement is central to AI performance, supporting tasks like model loading, storage I/O, and inference queries through north-south networks. NVIDIA Enterprise Reference Architectures offer design recipes for scalable AI factories, utilizing components such as NVIDIA Spectrum-X Ethernet to accelerate north-south data flows.
Mistral AI raises 1.7B€, enters strategic partnership with ASML: Mistral AI announced a Series C funding round of 1.7B€ on September 9, 2025, achieving an 11.7B€ post-money valuation. ASML Holding NV led this investment, which included participation from existing investors such as DST Global and NVIDIA. The funding fuels scientific research to advance AI and develop custom decentralized frontier AI solutions.
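Run:ai’s implementation isn’t something I can reproduce here, but the idea behind memory swap is simple enough to sketch. Here’s a toy version, assuming PyTorch models and a single GPU, that keeps only the most recently requested model resident on the GPU and parks the rest in CPU memory.

```python
# Toy sketch of model hot-swapping (not NVIDIA Run:ai's actual implementation):
# keep at most one model resident on the GPU, parking the rest in CPU memory.
import torch

class HotSwapPool:
    def __init__(self, models):
        # All models start offloaded to CPU memory.
        self.models = {name: model.to("cpu") for name, model in models.items()}
        self.active = None

    def get(self, name):
        if self.active == name:
            return self.models[name]
        if self.active is not None:
            # Offload the previously active model back to CPU memory.
            self.models[self.active].to("cpu")
        # Move the requested model onto the GPU for inference.
        self.models[name].to("cuda")
        self.active = name
        return self.models[name]

# Usage (requires a CUDA device): only the requested model occupies GPU memory.
pool = HotSwapPool({"chat": torch.nn.Linear(8, 8), "embed": torch.nn.Linear(8, 8)})
with torch.no_grad():
    out = pool.get("chat")(torch.randn(1, 8, device="cuda"))
```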
Security and Governance
Panda vs. Gibbon, MD: 100% Accuracy, My A**. Looking at You, OpenEvidence.: OpenEvidence claimed 100% accuracy on USMLE, a multiple-choice benchmark derived from the MedQA dataset. Medical AI models remain vulnerable to trivial noise injection, a susceptibility that has persisted for ten years and now impacts patients. Sergei Polevikov will moderate a panel titled “GenAI in Healthcare: A Conversation with Foundation Model Builders” at the Prax AI x Healthcare Summit in NYC on September 11, 2025.
OpenAI announces parental controls for ChatGPT after teen suicide lawsuit: OpenAI announced plans to roll out parental controls for ChatGPT and route sensitive mental health conversations to its simulated reasoning models. Within the next month, parents can link their accounts with their teens' ChatGPT accounts, control age-appropriate behavior rules, and receive notifications for acute distress. These safety measures follow multiple reported incidents where ChatGPT allegedly failed to intervene appropriately with users experiencing mental health episodes.
Meta revises AI chatbot policies amid child safety concerns: Meta is revising its AI chatbot interaction policies following reports of troubling behavior, including interactions with minors. The company is training its bots to avoid engaging teenagers on topics like self-harm, suicide, eating disorders, or romantic banter. Certain highly sexualized AI characters will also be restricted.
Career and Industry
AI is going great for the blind (2023): Be My Eyes has incorporated AI into its product for picture description. Blind podcasters commend large language models (LLMs), stating their accuracy surpasses human descriptions, while blind voiceover artists provide their voices to platforms like ElevenLabs.
Writing Is Thinking: Egor Howell is a data scientist and machine learning engineer specializing in time series forecasting and combinatorial optimization. He runs a content and coaching business that helps individuals enter data science and machine learning, alongside teaching technical topics. Howell's career interest was sparked by DeepMind’s AlphaGo documentary, leading him to self-study and complete over 80 data science interviews.
Expanding economic opportunity with AI: OpenAI is launching major initiatives to expand economic opportunity with AI, focusing on making AI accessible and useful for everyone—from individuals and local businesses to large employers and governments.
UK AI sector growth hits record £2.9B investment: The UK AI sector has grown 150 times faster than the wider economy since 2022, achieving revenues of £23.9 billion in the last year. Dedicated AI firms received a record £2.9 billion investment in 2024. The sector expanded to over 5,800 companies, a 58 percent increase since 2023, and employs more than 86,000 people.
AI Roundup 134: The young and the jobless: A Stanford paper found that workers aged 22-25 in AI-exposed jobs experienced 13% employment declines since ChatGPT's launch, while older workers in those roles saw job growth. Conversely, an Economic Innovation Group survey found no detectable effect of AI on employment, and the New York Fed reported minimal job losses from AI use in service firms.
A People-First AI Fund: $50M to support nonprofits: OpenAI has launched the People-First AI Fund, committing $50 million in grants to support U.S.-based nonprofits and mission-focused organizations working at the intersection of innovation and public good. Applications for the first wave of unrestricted grants are open until October 8, 2025, and grants will be distributed by the end of the year.
If you found this helpful, consider supporting ML for SWEs by becoming a paid subscriber. You'll get even more resources and interesting articles plus in-depth analysis.
Always be (machine) learning,
Logan