Weekly ML for SWEs #55: The easiest way to keep up with ML and AI research papers
An AI reading list curated to make you a better engineer: 6-17-25
A response has been published to Apple's "The Illusion of Thinking" paper from last week. If you missed last week's ML for SWEs, we discussed what Apple was getting at with the paper and why it didn't really mean much.
Well, that response, cleverly titled "The Illusion of the Illusion of Thinking," rebuts Apple's points by identifying flaws in the experimental methodology: the setup itself caused model performance to collapse, so the collapse doesn't adequately prove anything about the limits of model reasoning.
In summary, the critique argues that output length was the limiting factor. All of the experiments required models to emit sequences of moves longer than their token output limits (something many models even pointed out in their own reasoning chains).
Once the experiments were tweaked to require a smaller output (i.e., "write a function that solves this" instead of "enumerate the step-by-step instructions for solving this"), the models performed very well, even on much larger problem sizes. Essentially, the critique explains that the number of moves needed to solve a problem isn't an adequate measure of its complexity.
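To make that concrete: Tower of Hanoi, one of the puzzles in Apple's setup, takes 2^n - 1 moves for n disks, so the move list explodes while the program that produces it stays tiny. Here's a quick sketch of the "write a function" style of answer (my own illustration, not code from either paper):

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Yield the moves that solve an n-disk Tower of Hanoi puzzle."""
    if n == 0:
        return
    # Move n-1 disks out of the way, move the largest, then stack the rest.
    yield from hanoi(n - 1, source, spare, target)
    yield (source, target)
    yield from hanoi(n - 1, spare, target, source)

# 3 disks take 2**3 - 1 = 7 moves; 20 disks would take 1,048,575 moves,
# far past any model's output budget, yet the function never changes.
print(list(hanoi(3)))
```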
The rebuttal actually made some good points, but the most interesting thing about it is the author list. Notice "C. Opus" as the first author. We'll get into that next.
How to keep up-to-date on ML research
You read that right: Claude Opus was listed as an author on the rebuttal we just discussed. At first I thought this must be a joke, and it turns out I was right; the author wrote an article explaining that the paper was primarily a joke and that listing Claude as an author was part of it. But I realized we're probably not too far from this being a reality.
AI is being used to write research papers all the time, and I assume it's also used to review them. There isn't anything inherently wrong with this (unless it's used to cut corners, or authors refuse to proofread their own work), but it's absolutely something you need to be aware of.
Most notably because a lot of the AI papers posted to arXiv are low-effort or inconsequential (it's a pretty bad sign when something obviously AI-written is left in a paper and the authors clearly haven't proofread their own work). Without proper due diligence, a paper meant as a joke (such as the one above) gets spread far and wide without much thought.
The important thing to understand about arXiv is that while many very important papers are submitted there, there is far more noise. arXiv is NOT peer-reviewed.
People often assume that anything called a "paper" is peer-reviewed, because that's what papers used to be: submissions to journals and conferences, reviewed by subject-matter experts. A paper was only considered a paper once it had passed that review process. On arXiv, anyone can write up their findings and share them without any formal review.
For a while, I made a concerted effort to stay on the cutting edge of machine learning and AI research by tracking submissions to arXiv.org and trying to pull out the truly impactful and interesting ones.
I quickly realized how infeasible this was. There are at least seven arXiv categories to track in computer science alone (Artificial Intelligence, Computer Vision, Emerging Technologies, Machine Learning, Multiagent Systems, Neural and Evolutionary Computing, and more), and I was attempting to get through roughly 150 submissions per day.
Keep in mind that this doesn't include the math categories, where plenty of AI/ML work is also submitted. If you want to see the actual data breakdown, you can check it out here.
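If you want a feel for the firehose yourself, arXiv publishes a daily RSS feed per category. Here's a rough sketch of the counting I was doing by hand, assuming arXiv's rss.arxiv.org feed pattern and the CS categories listed above:

```python
import feedparser  # pip install feedparser

# The CS categories mentioned above; arXiv serves a daily RSS feed for each.
CATEGORIES = ["cs.AI", "cs.CV", "cs.ET", "cs.LG", "cs.MA", "cs.NE"]

total = 0
for cat in CATEGORIES:
    feed = feedparser.parse(f"https://rss.arxiv.org/rss/{cat}")
    print(f"{cat}: {len(feed.entries)} new submissions")
    total += len(feed.entries)

# Cross-listed papers show up in multiple feeds, so this overcounts a bit.
print(f"total across feeds: {total}")
```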
So how do we dig through a massive sea of AI/ML research with so much noise? I posed this question to Cameron a long while back. If you're familiar with Cameron's writing, you know he does an excellent job of going deep on the latest ML advancements. His answer was simply: you don't. It would take a great deal of expertise and a lot of time to dig through the noise on arXiv and surface the groundbreaking findings. So you let experts do it for you. Here are the three ways I stay updated.
1. X/Twitter
I know a lot of you reading this probably don't like X, but it's without a doubt the best way to find important research papers. The AI community on X crowdsources the work of sifting through papers, and experts share their takes on the findings. It's an incredibly high signal-to-noise ratio for ML research, which is exactly what we're looking for.
X is still the default platform for the AI community (although Substack is growing!), so it's difficult to stay on top of things without an account there. If you want to get started but aren't a fan of the platform, shoot me a DM. I can help you get started and avoid the bad side of X.
If you're not a fan of X, here are two more resources I use.
2. Weekly Top AI Papers
I subscribe to a Substack newsletter that rounds up the top AI papers of the week, every week. It's primarily sourced from X, but packaged up nicely and delivered once a week: the definition of someone doing the sifting for you. These write-ups are great for scanning the list of papers, reading the summaries, and seeing whether something interests you. Again, high signal-to-noise, and this one doesn't require being on a specific social platform (although it is nice that it's on Substack!).
3. MarkTechPost
My last suggestion is MarkTechPost. Important papers are usually covered in articles there, sometimes with an explanation of the key parts of the paper. I don't recommend it quite as highly as the previous two sources because there's a bit more noise, but that same noise means you'll sometimes find a gem of a paper that others have overlooked.
If you want to keep track of MarkTechPost posts, I recommend pointing an RSS reader at the site. Your reader keeps track of what's new and makes it really easy to ignore anything that isn't of interest to you.
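If you'd rather script the filtering than use a reader app, a few lines of Python will do it. A minimal sketch, assuming MarkTechPost exposes the standard WordPress /feed/ endpoint (worth verifying on the site):

```python
import feedparser  # pip install feedparser

# Assumed standard WordPress feed URL; check the site for the canonical one.
FEED_URL = "https://www.marktechpost.com/feed/"
KEYWORDS = {"agent", "reinforcement", "benchmark"}  # whatever you care about

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    # Surface only the posts whose titles match your interests.
    if any(k in entry.title.lower() for k in KEYWORDS):
        print(f"- {entry.title}\n  {entry.link}")
```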
Those are my favorites. If you have any resources you use that I've missed, let us know in the comments.
On to the interesting posts from this week.
AI is being used for warfare
It's super interesting to chat with others about how AI is being used in warfare and to see how differently people react. Some see it as the greatest thing ever; others see it as the most terrible.
I love thinking about AI's applications and what they mean, and I highly recommend a recent article on the AI arms race and the reality of it.

Understanding AI agents from the ground up
If you are (or are going to be) a machine learning engineer, you'll encounter AI agents and likely build with them in some form or another. It's estimated that a large majority of internet traffic will be agent-to-agent in the very near future.
Cameron wrote about AI Agents from First Principles. His articles are always long, but incredibly well thought out and go into great depth. This one is truly an excellent read for ML engineers.

What's next for reinforcement learning
Admittedly, RL is a blind spot for me. I know almost nothing about how it works or the intricacies of applying it, because it's never come up in my work.
I keep up with RL through one writer's work in particular, and I highly recommend his recent article, "What comes next with reinforcement learning." He contextualizes the frontier of reinforcement learning and the direction it needs to go. There's also a voiceover of the article, which is always a nice touch.

How deep research assistants work
Some of my favorite AI tools are deep research assistants. I use them constantly to dig into topics and find reputable resources for going even further.
I highly recommend an article from this past week that walks the reader step by step through how AI Deep Research works and the architecture that enables it.

Other interesting things
Google is using AI to predict cyclone paths and intensity. The model is trained on decades of historical cyclone weather data to help experts predict upcoming cyclones and their severity.
Meta invested $14.3 billion in Scale AI to acquihire Scale's CEO, Alexandr Wang. Mark Zuckerberg has been aggressively staffing his superintelligence AI team, hiring AI researchers at over nine figures apiece, and the Wang acquihire appears to be part of that effort. Understandably, Google, Microsoft, OpenAI, and xAI have stepped back from Scale now that it's 49% owned by a major competitor.
An interesting overview of why "fine-tuning LLMs is a waste of time." Here's a good preview:
Fine-tuning large language models (LLMs) is frequently sold as a quick, powerful method for injecting new knowledge. On the surface, it makes intuitive sense: feed new data into an already powerful model, tweak its weights, and improve performance on targeted tasks.
But this logic breaks down for advanced models, and badly so. At high performance, fine-tuning isn't merely adding new data; it's overwriting existing knowledge. Every neuron updated risks losing information that's already intricately woven into the network. In short: neurons are valuable, finite resources. Updating them isn't a costless act; it's a dangerous trade-off that threatens the delicate ecosystem of an advanced model.
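You can watch the trade-off the author describes in miniature. This toy sketch of mine (nothing like a real LLM, just a one-weight linear model) fits one task, fine-tunes on a conflicting one, and re-measures the original task:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Pretrain" a tiny model on task A (y = 2x), then fine-tune on a
# conflicting task B (y = -2x) and re-measure task A performance.
model = nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

x = torch.randn(256, 1)
task_a, task_b = 2 * x, -2 * x

def train(targets, steps=300):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), targets).backward()
        opt.step()

train(task_a)
print("task A loss after pretraining: ", loss_fn(model(x), task_a).item())
train(task_b)  # the "fine-tune": same weights, new objective
print("task A loss after fine-tuning:", loss_fn(model(x), task_a).item())
```

The single shared weight can't hold both mappings at once, so learning task B destroys task A. In a large network the conflict is softer, but the mechanism the quote points at (updates to shared weights) is the same.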
That's all for this week!
Always be (machine) learning,
Logan