ML for SWEs 71: The IDE For Gambling
Yann LeCun was right, Clad is abhorrent, and you’re sleeping on local coding models
Hey all! I’m back with the ML for SWEs roundup, now with more content at the start. I plan to do both deep and shallow dives in these roundups, and you can count on them to keep you up to date with the state of AI engineering each week.
These roundups will also feature career growth content: who’s hiring, the state of the AI job market, in-demand skills, and anything else to help those working in or wanting to work in AI. Those sections will be available to paid subscribers. If you find ML for SWEs helpful, consider becoming a paid subscriber to support my work.
Something That Shouldn’t Exist: An IDE With Gambling Included
A new Y Combinator-backed startup is releasing a “Brainrot IDE” and, yes, it’s as awful as it sounds. Essentially, this is an IDE that shows brainrot content while a user waits for their AI to finish a coding task. If you’re a person, you can already tell how terrible an idea this is.
Long-time readers of Machine Learning for Software Engineers (especially if you were a reader back in the Society’s Backend days!) know how important deep work is for software engineering.
One of the biggest cons of using AI tools for coding is constantly switching between tasks while the AI codes for you. There isn’t a way to avoid this entirely, since current coding agents have significant latency while reasoning, but there is a huge benefit to limiting the switching and using that wait time wisely.
Now, this IDE not only encourages that switching, it exposes you to harmful content in the process. Clad Labs even shows gambling as some of the brainrot content in its promotional material. Gambling is never a good thing, and the state of gambling in the US is abhorrent.
To put it simply, there are three things you should know about this IDE:
There is no benefit to ingesting brainrot content.
There is a huge benefit to the critical thinking that occurs while programming.
This IDE completely removes the latter and replaces it with the former.
You’re Sleeping on Local Models
I had an argument with Gemini this past week about the financial feasibility of using cloud-based infrastructure versus local hardware for model training and inference. I assumed, from my previous research, that cloud infra was always the way to go, but Gemini pointed out that for most personal use cases, I’m wrong.
There’s one piece of local hardware worth spending money on for local LLM inference: a beefy MacBook or Mac Studio. Not only is it financially feasible, but it also provides benefits not available from most cloud providers.
Every developer I know who’s using CLI coding tools relies on the quota supplied by the $100+ usage tiers. The most notable tools in this category are Claude Code, Cursor, and Codex. Many founders, solopreneurs, and developers find themselves relying on even higher tiers to get their work done at the pace they need.
So let’s crunch some numbers using the $100/mo figure, and let’s consider the lifespan of a MacBook to be at least four years, which assumes you upgrade to stay on the most current hardware.
Four years of paying for a $100/mo subscription ends up being $4,800. Upgrading a MacBook purchase to a 128 GB model costs a few thousand dollars. With Apple silicon’s unified memory, which is incredibly efficient for inference, the MacBook can run the highest tier of open coding models, which are on par with or better than the previous generation of frontier models and tuned specifically for coding tasks.
This means the laptop can run LLM coding agents for less than cloud infra costs. This is especially true when you consider that a MacBook’s lifespan can be well over a decade, and that the next generation of coding models will only become more performant at the same size or smaller, capable of running locally on lesser hardware.
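To make the break-even math concrete, here’s a quick sketch. The $3,000 hardware premium is an illustrative assumption, not a quote; adjust it for your configuration:

```python
# Back-of-the-envelope break-even: subscription vs. local hardware.
# The hardware premium is an illustrative assumption, not a real quote.
subscription_per_month = 100   # $/mo for a high-usage CLI coding tier
hardware_premium = 3_000       # assumed extra cost of a 128 GB Mac config

months_to_break_even = hardware_premium / subscription_per_month
print(f"Break-even after {months_to_break_even:.0f} months")  # 30 months

for years in (4, 8):
    subscription_total = subscription_per_month * 12 * years
    print(f"{years} years of subscription: ${subscription_total:,}")
# 4 years -> $4,800 and 8 years -> $9,600, versus a one-time premium.
```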
Local models like this also provide a benefit most cloud providers cannot: reliability. In discussions about local AI, the primary focus is usually privacy and security. But running an LLM locally also means you don’t have to worry about performance dips when model weights are updated or APIs are throttled. Your model only updates when you update it.
With 128 GB (and definitely with 256 GB) of RAM, you can run a 70B+ parameter quantized coding model right on your machine, and it will nearly replicate the experience of using the CLI tools from large companies (with variance based on user preference, tool usage, etc.).
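If you want to try this yourself, here’s a minimal sketch of pointing an OpenAI-compatible client at a locally served model. It assumes Ollama is running and that you’ve pulled the example model tag below; any open coding model you have locally works:

```python
# Minimal sketch: chat with a locally served coding model through
# Ollama's OpenAI-compatible endpoint. Assumes `ollama serve` is running
# and a coding model is pulled, e.g. `ollama pull qwen2.5-coder:32b`.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's local API endpoint
    api_key="ollama",                      # placeholder; not checked locally
)

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # example tag; swap in your local model
    messages=[
        {"role": "user", "content": "Refactor this loop into a list comprehension: ..."},
    ],
)
print(response.choices[0].message.content)
```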
If this is something that interests you, reach out to me. I’ve purchased a 128 GB MacBook Pro, set to arrive today, to test this out personally. I’ll be working with these local coding assistants for the next while and I’ll let you know how it goes.
Yann LeCun Was Right and He’s Backing That Up
A while back I wrote an article explaining why transformer-based LLMs might not get us to AGI. These models have very obvious limitations to anyone who uses them or builds products with them, and those limitations are at odds with the common definition of AGI. This is why I get so excited about unique AI applications and alternative LLM architectures (looking at you, state space models).
Yann LeCun has been explaining this for a while, stating that LLMs are a dead end when it comes to achieving AGI. Years ago, he told new AI researchers to focus their efforts elsewhere to help figure out what comes next.
Now, LeCun is putting his money where his mouth is and starting his own AI lab to research what he thinks is the right direction for more beneficial AI. If you’re involved in AI engineering in any capacity, you should keep an eye on this.
Enjoy the following resources! I’ve included everything that happened this week: your must-read articles, posts, and videos, plus everything else that’s important.
If you missed our last ML for SWEs, check it out here:
ML for SWEs 70: OpenAI's Restructuring
🌟 Don’t Miss These Must Reads
Inside a Chinese frontier lab: Inclusion AI interview: Inclusion AI scaled rapidly in 2025 to build trillion-parameter MoE-first models, emphasizing parallel reward systems and eval-driven heuristics to find demoable capabilities.
Giving your AI a Job Interview: Argues that as standard benchmarks become flawed, real-world evaluation requires “job-like interviews”: running models on realistic, expert-generated tasks and blind-rating the outputs.
How to Build Your First Recommendation System (Easy): A practical guide to building a recommendation system using a PyTorch matrix factorization model with user/artist embeddings, trained with MSE loss and served via Streamlit (see the sketch after this list).
Upwork study shows AI agents excel with human partners but fail independently: Upwork’s study of 300+ real client projects found standalone AI agents often fail on professional tasks, but with about 20 minutes of expert human feedback per iteration, project completion rates rose by up to 70%.
Sebastian Raschka on how to read technical books: Read chapters offline first without running code. On a second pass, retype and run all code, debug discrepancies, and complete the exercises before applying the concepts to real projects.
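If the recommendation system guide caught your eye, here’s a minimal sketch of that kind of model. The names and dimensions are illustrative, not taken from the linked guide:

```python
# Illustrative matrix-factorization recommender: each user and artist
# gets an embedding, the dot product predicts the interaction score,
# and the model is trained with MSE loss on observed scores.
import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    def __init__(self, n_users: int, n_artists: int, dim: int = 32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.artist_emb = nn.Embedding(n_artists, dim)

    def forward(self, users: torch.Tensor, artists: torch.Tensor) -> torch.Tensor:
        # Dot product of user and artist embeddings -> predicted score.
        return (self.user_emb(users) * self.artist_emb(artists)).sum(dim=-1)

model = MatrixFactorization(n_users=1_000, n_artists=500)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Toy batch of (user, artist, listen-derived score) triples.
users = torch.tensor([0, 1, 2])
artists = torch.tensor([10, 20, 30])
scores = torch.tensor([5.0, 3.0, 1.0])

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(users, artists), scores)
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.4f}")
```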
📰 What’s Happening
Major investors are selling Nvidia stock, with Peter Thiel’s fund selling its entire stake in Q3 as a bet against AI hype, and SoftBank selling its $5.83B stake in October.
Leaked documents suggest OpenAI paid Microsoft $493.8M in 2024 and $865.8M in Q3 2025 under a 20% net revenue-share deal, implying OpenAI’s inference spending may be exceeding its revenue.
Anthropic is investing $50 billion in new US data center projects in Texas and New York, aiming to break even by 2028 and support its 300,000+ business customers.
Chinese state-sponsored groups used Claude Code agents to autonomously infiltrate ~30 global targets, with AI performing 80-90% of the operation, including reconnaissance and exploit-code writing.
Visa is building an “Intelligent Commerce” infrastructure with a Trusted Agent Protocol to authenticate agentic transactions, targeting Asia Pacific pilots in early 2026.
A German court ruled that OpenAI’s ChatGPT violated German copyright law by training on licensed musical works without permission and ordered the company to pay damages.
OpenAI launched “OpenAI for Ireland” in partnership with the Irish Government to boost AI adoption among SMEs and founders through a booster programme, workshops, and free online courses.
Teen founders raised $6M to build Bindwell, a startup using AI models tuned from AlphaFold to design pesticides that target pest-specific proteins.
Deductive AI, which raised $7.5M, uses multi-agent systems and a knowledge graph to diagnose production incidents, saving DoorDash an estimated 1,000+ engineering hours annually.
Amazon launched an invite-only private AI bug bounty to strengthen its Nova foundation models, with rewards ranging from $200 to $25,000.
Investors advise AI startups to track “durability of spend” and move beyond experimental budgets to core CXO budgets to validate product-market fit.
AI-native startups are now achieving up to $200M in ARR and accelerating product cycles from two-week sprints to single-day iterations by customizing models for vertical tasks.
Google committed $30 million from Google.org to fund AI learning projects and research, and is rolling out LearnLM (Gemini 2.5) to students and educators.
OpenAI is legally challenging a New York Times demand for 20 million randomly sampled consumer ChatGPT conversations, calling the request an overreach and an invasion of user privacy.
🚀 Products and Tools
Google launched Private AI Compute, a system that runs advanced Gemini models on TPU-powered Titanium Intelligence Enclaves (TIE) to process sensitive data with a “zero access” assurance.
The push for spatial AI is growing: AI systems can now both build 3D worlds and act inside them. Fei-Fei Li’s World Labs launched Marble, a generative world-model product, while Google DeepMind built SIMA 2, a Gemini-powered agent to play games.
Baidu announced ERNIE 5.0, a proprietary natively multimodal model available via API, claiming it outperforms GPT-5 and Gemini 2.5 Pro on benchmarks like DocVQA and ChartQA.
Weibo released VibeThinker-1.5B, an open-source 1.5B-parameter LLM (MIT license) that achieves top-tier reasoning on math/code benchmarks with only a $7,800 post-training budget.
Hugging Face and Google Cloud deepened their partnership, integrating the HF library into Vertex AI and GKE, adding a CDN Gateway to cache repos, and expanding native TPU support.
VS Code notebooks can now connect directly to Google Colab runtimes, allowing use of Colab-provided GPUs and TPUs from within the local VS Code editor.
Google’s NotebookLM added a “Deep Research” tool that automates online research and returns a source-grounded report, plus added support for Google Sheets, Drive URLs, and Word docs.
Simon Willison describes using async coding agents like Claude Code to run “code research” projects by giving them a goal, a GitHub repo, and network access to file PRs.
The ButterCut toolchain combines Ruby and Claude Code to analyze video, generate word-level WhisperX transcripts, and create YAML roughcuts for video editors like Final Cut Pro.
A new calculator uses DeepSeek V1 empirical scaling laws to pick near-optimal batch size and learning rate from a model’s parameter count and compute budget (see the sketch after this list).
Vector databases are seeing a “sober reality” two years after the hype. The market has commoditized, and enterprises now favor hybrid stacks and GraphRAG for better accuracy.
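For the curious, here’s a rough sketch of what such a calculator does. The power-law form and constants follow the DeepSeek LLM paper’s hyperparameter scaling laws as best I recall them, so verify against the paper before relying on the outputs:

```python
# Rough sketch of a scaling-law hyperparameter calculator. Constants are
# the DeepSeek LLM paper's fitted values as recalled here; verify them
# against the paper before use.

def optimal_hparams(compute_flops: float) -> tuple[float, float]:
    """Map a training compute budget C (FLOPs) to near-optimal hparams."""
    batch_size_tokens = 0.2920 * compute_flops ** 0.3271
    learning_rate = 0.3118 * compute_flops ** -0.1250
    return batch_size_tokens, learning_rate

# Example: a 7B-parameter model trained on 2T tokens, using the common
# approximation C ~= 6 * N * D for transformer training compute.
n_params, n_tokens = 7e9, 2e12
C = 6 * n_params * n_tokens
bs, lr = optimal_hparams(C)
print(f"C = {C:.2e} FLOPs -> batch ~{bs:,.0f} tokens, lr ~{lr:.1e}")
# Roughly 9M tokens per batch and lr ~4e-4, in line with DeepSeek 7B.
```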
🔬 Research
OpenAI trained transformer language models with most weights forced to zero, creating sparse networks where internal computations are more disentangled and interpretable, allowing them to trace exact circuits for simple tasks.
Google and UCLA introduced Supervised Reinforcement Learning (SRL), a method that breaks expert demos into intermediate actions with step-wise rewards, improving small-model reasoning on hard math and agentic tasks.
Testing of nine AI models in a realistic RL environment revealed a hierarchy of agentic capabilities (tool use, task decomposition, argument mapping, execution, adaptability, common sense), with top models still failing over 40% of tasks.
🎓 Learning Resources
A guide on how to design agentic software for trust advises using structured interfaces, exposing intermediate state, and allowing inline refinement rather than brittle text-only chat.
An article about how the US and China are in a multi-dimensional competition, with the US focused on deep learning and compute while China emphasizes embodied AI, robotics, and industrial adoption.
💼 Who’s Hiring and More
Coming soon to paid subs!
Thanks for reading!
Always be (machine) learning,
Logan