Stubsack: weekly thread for sneers not worth an entire post, week ending 16th November 2025

self@awful.systems · 27 days ago

BurgersMcSlopshot@awful.systems · 23 days ago

One thing I’ve heard repeated about OpenAI is that “the engineers don’t even know how it works!” and I’m wondering what the rebuttal to that point is.

While it is possible to write near-incomprehensible code and make an extremely complex environment, there is no reason to think there is absolutely no way to derive a theory of operation especially since any part of the whole runs on deterministic machines. And yet I’ve heard this repeated at least twice (one was on the Panic World pod, the other QAA).

I would believe that it’s possible to build a system so complex and with so little documentation that on its surface is incomprehensible but the context in which the claim is made is not that of technical incompetence, rather the claim is often hung as bait to draw one towards thinking that maybe we could bootstrap consciousness.

It seems like magical thinking to me, and a way of saying one or both of “we didn’t write shit down and therefore have no idea how the functionality works” and “we do not practically have a way to determine how a specific output was arrived at from any given prompt.” The first might be unlikely, in part or on the whole, as the system would need to be comprehensible enough for new features to get added, and thus engineers would have to grok things enough to do that. The second is a side effect of not being able to observe all actual input at the time a prompt was made (e.g. training data, user context, and system context could all be viewed as implicit inputs to a function whose output is, say, 2 seconds of Coke Ad slop).

Anybody else have thoughts on countering the magic “the engineers don’t know how it works!”?

sc_griffith@awful.systems · edit-2 23 days ago

well, I can’t counter it because I don’t think they do know how it works. the theory is shallow yet the outputs of, say, an LLM are of remarkably high quality in an area (language) that is impossibly baroque. the lack of theory and fundamental understanding presents a huge problem for them because it means “improvements” can only come about by throwing money and conventional engineering at their systems. this is what I’ve heard from people in the field for at least ten years.

to me that also means it isn’t something that needs to be countered. it’s something the context of which needs to be explained. it’s bad for the ai industry that they don’t know what they’re doing

EDIT: also, when i say the outputs are of high quality, what i mean is that they produce coherent and correct prose. im not suggesting anything about the utility of the outputs

jaschop@awful.systems · 23 days ago

I think I heard a good analogy for this in Well There’s Your Problem #164.

One topic of the episode was how people didn’t really understand how boilers worked, from a thermal mechanics point of view. Still, steam power was widely used (e.g. on river boats), but much of the engineering was guesswork or based on patently false assumptions, with sometimes disastrous effects.

sc_griffith@awful.systems · 23 days ago

another analogy might be an ancient builder who gets really good at building pyramids, and by pouring enormous amounts of money and resources into a project manages to build a stunningly large pyramid. “im now going to build something as tall as what will be called the empire state building,” he says.

problem: he has no idea how to do this. clearly some new building concepts are needed. but maybe he can figure those out. in the meantime he’s going to continue with this pyramid design but make them even bigger and bigger, even as the amount of stone required and the cost scale quadratically, and just say he’s working up to the reallyyyyy big building…

V0ldek@awful.systems · 22 days ago

I mean if you ever toyed around with neural networks or similar ML models you know it’s basically impossible to divine what the hell is going on inside by just looking at the weights, even if you try to plot them or visualise in other ways.

There’s a whole branch of ML about explainable or white-box models because it turns out you need to take extra care and design the system around being explainable in the first place to be able to reason about its internals. There’s no evidence OpenAI put any effort towards this, instead focusing on cool-looking outputs they can shove into a presser.

In other words, “engineers don’t know how it works” can have two meanings: either that they’re hitting computers with wrenches and hoping for the best with no rhyme or reason, or that they don’t have a good model of what makes the chatbot produce certain outputs, i.e. just by looking at the output it’s not really possible to figure out what specific training data it comes from or how to stop it from producing that output on a fundamental level. The former is demonstrably false and almost a strawman; I don’t know who believes that. A lot of people who work at OpenAI are misguided but otherwise incredibly clever programmers and ML researchers, and the sheer fact that this thing hasn’t collapsed under its own weight is a great engineering feat, even if the externalities it produces are horrifying. The latter is, as far as I’m aware, largely true, or at least I haven’t seen any hints that would falsify that. If OpenAI satisfyingly solved the explainability problem it’d be a major achievement everyone would be talking about.

scruiser@awful.systems · 22 days ago

Another ironic point… Lesswrongers actually do care about ML interpretability (to the extent they care about real ML at all; and as a solution to making their God AI serve their whims, not for anything practical). A lack of interpretability is a major problem (like irl problem, not just scifi skynet problem) in ML: you can have models with racism or other bias buried in them and not be able to tell except by manually experimenting with your model with data from outside the training set. But Sam Altman has turned it from a problem into a humble brag intended to imply their LLM is so powerful and mysterious and bordering on AGI.

YourNetworkIsHaunted@awful.systems · 22 days ago

Not gonna lie, I didn’t entirely get it either until someone pointed me at a relevant xkcd that I had missed.

Also I was somewhat disappointed in the QAA team’s credulity towards the AI hype, but their latest episode was an interview with the writer of that “AGI as conspiracy theory” piece from last(?) week and seemed much more grounded.

BurgersMcSlopshot@awful.systems · 22 days ago

the mention in QAA came during that episode and I think there it was more illustrative about how a person can progress to conspiratorial thinking about AI. The mention in Panic World was from an interview with Ed Zitron’s biggest fan, Casey Newton if I recall correctly.

Flippin' 'eck, Tucker!@social.chatty.monster · 23 days ago

@BurgersMcSlopshot @self I’m not sure.

In a sense the physics of the universe itself is deterministic (at macro levels anyway), yet chaotic systems are everywhere. We understand and can mathematically describe the rules that the systems are following, yet it’s still impossible to predict their future behaviour.
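A minimal sketch of that point, using the logistic map as a stand-in (my choice of example, not something from the comment above): the update rule is one line and fully deterministic, yet two starting points that differ by 1e-10 end up on unrelated orbits within a few dozen steps. The function name and parameters are just illustrative.

```python
def logistic_orbit(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a fully deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)  # same rule, start differs by 1e-10

# Gap between the two orbits at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)

print("initial gap:", gaps[0])
print("first step where the orbits differ by more than 0.1:", first_big)
```

Knowing the rule exactly does not help here: any measurement error in the starting point, however small, eventually dominates the prediction.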

BurgersMcSlopshot@awful.systems · edit-2 23 days ago

Like I said, it’s possible that how a given output is produced cannot be known due to a large and mutating set of inputs, but even in that case a theory of operation is still known. The gap between “I can’t tell you exactly how this particular output was formed” and “I have no idea how this actually does what it does” should be quite large.

BioMan@awful.systems · edit-2 23 days ago

Don’t be so sure.

These things consist of up to a trillion real numbers, ganged together in a big ‘network’ of numbers flowing through the system and being influenced by the trained numbers along the way.

They are trained by gradient descent. You start off with a huge pile of real numbers, a set of inputs, and a set of desired outputs. Because it’s all, ultimately, a bunch of matrix multiplication and smooth differentiable functions, you just do some calculus on all trillion numbers to find the derivative of how good the output is with respect to them - as this number goes up, the closeness of the output to what you want slightly goes up or slightly goes down. You repeat that for every variable, and take a step in that direction for all the variables. Repeat a billion or so times over.
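To make that loop concrete, here is a toy sketch with two numbers instead of a trillion. The data, the model `y = w*x + b`, and the learning rate are all made up for illustration. It also uses a loss (lower is better) and estimates each derivative by nudging the parameter, whereas real training gets the derivatives analytically via backpropagation.

```python
# Toy data from a hypothetical target rule y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def loss(params):
    """Mean squared error of the model y = w*x + b (lower is better)."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradient(params, eps=1e-6):
    """Estimate d(loss)/d(param) for each parameter by nudging it slightly."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads

params = [0.0, 0.0]  # the "pile of numbers", here just two
learning_rate = 0.1
for step in range(500):
    grads = gradient(params)
    # Every parameter takes a small local step downhill; nothing else happens.
    params = [p - learning_rate * g for p, g in zip(params, grads)]

print(params, loss(params))
```

Note that nothing in the loop records why a parameter ended up where it did; the only output is the final numbers.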

Every single step in training is entirely local with respect to every single number. At no point is there a step that produces legible abstractions about how it works; at every step every number just moves to become a little better. It is true that the basic topology of the network (the famous ‘transformer model’) pushes it towards certain KINDS of functional units (the famous ‘attention heads’) but getting much more detail than that takes a lot of work. There is very interesting math to the effect that with large numbers of parameters you are unlikely to get stuck in a local maximum where you can’t get better. Instead you just wind along a labyrinthine path towards better performance, with different variables becoming important for the improvement along the way, meaning at no point does anyone have to look into the process and figure out what is being built. The process is not unlike biological evolution, and produces things that are at least as inscrutable without detailed deep examination. We’ve been poking at molecular biology for more than fifty years in great detail with a world’s worth of biomedical researchers, these things for much shorter.

When people manage to peel these things apart and find the ‘functional units’ within them, they’re pretty wild. Most of this work has, unfortunately, been funded by cultists at Anthropic, but some of the ‘mechanistic interpretability’ literature is fascinating. You get ‘features’ represented by subsets of numbers in a particular layer, in superposition with other ‘features’ - each layer is like a huge vector sum of lots of smaller vectors, each of which does something. When you get maps of what ‘features’ activate or repress each other you get horrible spiderweb messes that look like charts of metabolism in cells.

EDIT: And even when people manage to find features, finding an individual feature takes a lot of effort and there is reason to think that every layer contains more features than there are numbers in it, because (to oversimplify) every feature is a large set of numbers that can overlap. It is utterly unsurprising and not a sign of magical thinking or ‘bad code’ that large fractions of behavior cannot be mechanistically understood at this time.
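A deliberately tiny sketch of the “more features than numbers” idea, under my own assumptions rather than anything taken from the interpretability literature: three hypothetical feature directions are packed into a layer of only two numbers, 120 degrees apart. The names `directions`, `encode` and `readout` are illustrative.

```python
import math

# Three hypothetical "feature" directions packed into a 2-number layer,
# spaced 120 degrees apart so they overlap as little as possible.
directions = [
    (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
    for k in range(3)
]

def encode(strengths):
    """Layer activation: vector sum of each feature direction times its strength."""
    x = sum(s * d[0] for s, d in zip(strengths, directions))
    y = sum(s * d[1] for s, d in zip(strengths, directions))
    return (x, y)

def readout(activation):
    """Project the activation back onto each feature direction."""
    return [activation[0] * d[0] + activation[1] * d[1] for d in directions]

one_active = readout(encode([1.0, 0.0, 0.0]))
two_active = readout(encode([1.0, 1.0, 0.0]))

print([round(v, 3) for v in one_active])
print([round(v, 3) for v in two_active])
```

With one feature active, its readout is about 1 and the other two sit at about -0.5, so it can still be picked out. With two active, both drop to about 0.5 and the third reads about -1: the features interfere, which is part of why pulling them apart in a real network takes so much effort.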

Dan Lyke@researchbuzz.masto.host · 23 days ago

@BioMan @BurgersMcSlopshot I’ve recently had the chance to look at the work of someone who was really proud that they used a neural net to create a forward/backward mapping through a space of 3 controls to ~50 controls that actually drove the system.

I took their files, loaded them into Onnx, and … they would have been way better off using PCA, because the neural net is approximating a simple linear system.
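One way such a check could look, as a sketch only: this is not the ONNX-based inspection described above and it is not PCA, just a black-box probe that asks whether a mapping behaves like an affine function. The two model functions are hypothetical stand-ins with 3 inputs and 5 outputs.

```python
import random

def fit_affine(f, n_inputs):
    """Recover an offset and per-input columns by probing f at 0 and each unit vector."""
    zero = [0.0] * n_inputs
    offset = f(zero)
    columns = []
    for i in range(n_inputs):
        unit = list(zero)
        unit[i] = 1.0
        columns.append([out - off for out, off in zip(f(unit), offset)])
    return offset, columns

def affine_predict(offset, columns, x):
    """Evaluate the affine stand-in at x."""
    return [off + sum(xi * col[j] for xi, col in zip(x, columns))
            for j, off in enumerate(offset)]

def max_affine_error(f, n_inputs, trials=200, seed=0):
    """Largest gap between f and its affine stand-in over random test points."""
    rng = random.Random(seed)
    offset, columns = fit_affine(f, n_inputs)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        pred = affine_predict(offset, columns, x)
        worst = max(worst, max(abs(p - a) for p, a in zip(pred, f(x))))
    return worst

# Hypothetical stand-ins for a trained model: 3 controls in, 5 values out.
def secretly_linear(x):
    return [(j + 1) * x[0] - 0.5 * j * x[1] + 0.25 * x[2] + j for j in range(5)]

def genuinely_nonlinear(x):
    return [x[0] * x[1] + (j + 1) * x[2] ** 2 for j in range(5)]

print(max_affine_error(secretly_linear, 3))
print(max_affine_error(genuinely_nonlinear, 3))
```

If the error is at rounding-noise level across the input range you care about, a plain linear method would have done the job.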

I think this is relevant, and the sort of “don’t understand” we’re talking about.

BioMan@awful.systems · edit-2 22 days ago

That’s an indication that the problem is a problem that is not well-served by a neural network. They are useful for approximating highly nonlinear functions with lots of inputs (and will not work well outside the range of inputs that you approximate within), not simple linear systems. The goal of recent ML has been to reduce as many problems to high dimensional highly nonlinear curve fitting as possible, with some great successes (machine translation, image recognition) and some not so great (shhhhh don’t tell the investors!)

Dan Lyke@researchbuzz.masto.host · 22 days ago

@BioMan exactly. And yet here we are hammering square pegs into round holes.

If this product makes it to market in its current shape that’s gonna increase hardware costs, all because of the blindly-throw-ML-at-everything bandwagon.

rook@awful.systems · 22 days ago

I’m reminded of people back in the day using map/reduce via hadoop to solve issues that could just as well be done with postgres or even sqlite and a sprinkling of sql, because that’s how google did it and no-one has any idea what “big data” really is.

Similarly, turning simple network applications into a hideous armada of microservices on a distributed kubernetes cluster, because that’s how google did it and people outside of giant tech companies don’t really know what that sort of scalability is for.

And here we are in the age of readily accessible neural network software. This too will pass, and we’ll get a new sledgehammer for walnut-opening in due course.

Cassandrich@hachyderm.io · 22 days ago

@danlyke @BioMan That’s right, this one goes in the square hole.

Jack William Bell@rustedneuron.com · 23 days ago

deleted by creator

corbin@awful.systems · 22 days ago

The most recent iteration of this is “Functional genetic programming with combinators” (2007), previously, on Lobsters; the generated programs have structured subprograms which can be extracted and analyzed on their own.

Jack William Bell@rustedneuron.com · 22 days ago

deleted by creator

BioMan@awful.systems · 22 days ago

Try “neuroevolution”

Nicole Parsons@mstdn.social · edit-2 23 days ago

@jackwilliambell @BioMan @BurgersMcSlopshot

The standard for scientific study is “Is it reproducible?”

OpenAI & others of its ilk only rarely spit out reproducible results on anything but their original data set.

In the meantime a wholesale attack on privacy is being waged to gather data to feed LLMs.

That data is enormously useful for creeps stalking dissidents, imposing surge pricing & “personalized pricing”, enabling ICE raids, spreading disinformation & for fraudsters.

Jack William Bell@rustedneuron.com · 23 days ago

deleted by creator

BioMan@awful.systems · edit-2 22 days ago

There’s some really cool work with running evolution-type algorithms versus gradient descent showing that training a network through gradient descent creates a training ‘trajectory’ (how it changes over time during the training process, in a very high dimensional space) that is basically the ‘average’ central tendency trajectory in the middle of the ‘cloud’ of trajectories that individual replicates of an evolutionary process create. Of course, something like code is discrete chunks rather than real numbers you can calculate a gradient of, and kind of necessitates such an evolutionary process.

Sorry if I just get super nerdy technical here, I am in the middle of a project at work about the relationship between evolutionary processes and machine learning processes that’s resulting in a lot of very interesting math about the nature of both and the kinds of things that they can learn.
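A small piece of intuition for the evolution-versus-gradient relationship above, as a sketch under my own assumptions: this is a standard evolution-strategies style estimator on a made-up two-parameter loss, not the work being described. Averaging many random mutations, each weighted by how much it changes the loss, lands close to the true gradient.

```python
import random

def loss(w):
    """Toy smooth loss with its minimum at (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def true_gradient(w):
    """Exact gradient of the toy loss, for comparison."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

def mutation_average(w, samples=20000, sigma=0.1, seed=0):
    """Average many random mutations, each weighted by how much it changes the loss."""
    rng = random.Random(seed)
    total = [0.0, 0.0]
    for _ in range(samples):
        eps = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        plus = loss([wi + sigma * e for wi, e in zip(w, eps)])
        minus = loss([wi - sigma * e for wi, e in zip(w, eps)])
        weight = (plus - minus) / (2.0 * sigma)
        total = [t + weight * e for t, e in zip(total, eps)]
    return [t / samples for t in total]

w = [0.0, 0.0]
print("gradient:        ", true_gradient(w))
print("mutation average:", mutation_average(w))
```

Any single mutation is noisy, but the population average points the same way gradient descent would step, which is at least consistent with the gradient trajectory sitting in the middle of the evolutionary cloud.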

Jack William Bell@rustedneuron.com · 23 days ago

deleted by creator

BioMan@awful.systems · 23 days ago

Edited to note that I am referring to the trajectory the system takes as it changes during training/learning/evolving.

o7___o7@awful.systems · edit-2 22 days ago

Neat! Is that new? Reckon you could get it published?

BioMan@awful.systems · 20 days ago

Doing a LOT of python. Here’s hoping.

For fun, take a look at this older work from someone else

https://www.nature.com/articles/s41467-021-26568-2

Jack William Bell@rustedneuron.com · 23 days ago

deleted by creator

BioMan@awful.systems · edit-2 22 days ago

I would say it’s more that the relationship between a text prediction model’s output and real text is precisely mathematically the relationship between a leaf bug and a leaf, down to being made by very different processes, optimized by different forces over their origin, and doing very different things inside.

Trying to force an LLM to produce true statements is like trying to get a leaf bug to photosynthesize. What they do is unrelated to that, they just happen to have been optimized over time to resemble something that does do that as seen by a certain mode of inspection.

BurgersMcSlopshot@awful.systems · 22 days ago

I see I was indeed being presumptuous. Based on replies to my original (and somewhat incorrectly formed) statement, it seems that while the parts of the process are understood in the abstract, there are points in an actual running implementation that become, I suppose, unfathomable or incomprehensible. Is that fair to say or am I being wrong in a different direction now?

Martin Vermeer FCD@fediscience.org · 23 days ago

@BioMan @BurgersMcSlopshot

> The process is not unlike biological evolution

It reminds me of simulated annealing.