• 0 Posts
  • 12 Comments
Joined 2 years ago
Cake day: May 20th, 2024


  • I would say it’s more that the relationship between a text prediction model’s output and real text is, mathematically, almost exactly the relationship between a leaf bug and a leaf: the two are made by very different processes, were shaped by different optimizing forces over their history, and do very different things inside.

    Trying to force an LLM to produce true statements is like trying to get a leaf bug to photosynthesize. What they do is unrelated to that; they have just been optimized over time to resemble something that does, as seen by a certain mode of inspection.



  • There’s some really cool work comparing evolution-type algorithms with gradient descent. It shows that training a network through gradient descent creates a training ‘trajectory’ (how the network changes over time during training, in a very high dimensional space) that is basically the ‘average’, the central tendency in the middle of the ‘cloud’ of trajectories that individual replicates of an evolutionary process create. Of course, something like code comes in discrete chunks rather than real numbers you can calculate a gradient of, so it more or less necessitates such an evolutionary process.

    Sorry if I just got super nerdy and technical here. I am in the middle of a project at work about the relationship between evolutionary processes and machine learning processes, and it’s resulting in a lot of very interesting math about the nature of both and the kinds of things that they can learn.




  • Don’t be so sure.

    These things consist of up to a trillion real numbers, ganged together in a big ‘network’: input numbers flow through the system and are influenced by the trained numbers along the way.

    They are trained by gradient descent. You start off with a huge pile of real numbers, a set of inputs, and a set of desired outputs. Because it’s all, ultimately, a bunch of matrix multiplication and smooth differentiable functions, you just do some calculus on all trillion numbers to find the derivative of how good the output is with respect to each of them: as this number goes up, does the closeness of the output to what you want go slightly up or slightly down? You work that out for every variable, then take a small step in the improving direction for all the variables at once. Repeat a billion or so times over.

    Every single step in training is entirely local with respect to every single number. At no point is there a step that produces legible abstractions about how it works; at every step, every number just moves to become a little better. It is true that the basic topology of the network (the famous ‘transformer model’) pushes it towards certain KINDS of functional units (the famous ‘attention heads’), but understanding much more detail than that takes a lot of work. There is very interesting math to the effect that with a large number of parameters you are unlikely to get stuck in a local maximum where you can’t get any better. Instead, you just keep turning, with different variables becoming important for the improvement, along a labyrinthine path towards better performance. This means that at no point does anyone have to look into the process and figure out what is being built. The process is not unlike biological evolution, and it produces things that are at least as inscrutable without detailed, deep examination. We’ve been poking at molecular biology in great detail for more than fifty years, with a world’s worth of biomedical researchers; these things have been studied for a much shorter time.

    When people manage to peel these things apart and find the ‘functional units’ within them, they’re pretty wild. Most of this work has, unfortunately, been funded by cultists at Anthropic, but some of the ‘mechanistic interpretability’ literature is fascinating. You get ‘features’ represented by subsets of numbers in a particular layer, in superposition with other ‘features’ - each layer is like a huge vector sum of lots of smaller vectors, each of which does something. When you get maps of what ‘features’ activate or repress each other you get horrible spiderweb messes that look like charts of metabolism in cells.

    EDIT: And even when people manage to find features, finding an individual feature takes a lot of effort, and there is reason to think that every layer contains more features than there are numbers in it, because (to oversimplify) every feature is a large set of numbers and those sets can overlap. It is utterly unsurprising, and not a sign of magical thinking or ‘bad code’, that large fractions of behavior cannot be mechanistically understood at this time.
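
For anyone who wants to see the gradient descent loop from the comment above in miniature, here is a rough sketch. It is a toy, not how a real LLM is trained: the "network" is a single linear layer with three numbers instead of a trillion, the derivative is written out by hand, and it minimizes an error rather than maximizing "goodness" (the same thing with the sign flipped). All names (`predict`, `gradient`, `train`, `true_weights`) are made up for illustration.

```python
import random

def predict(weights, x):
    # A one-layer "network": a weighted sum of the inputs.
    return sum(w * xi for w, xi in zip(weights, x))

def mean_squared_error(weights, inputs, targets):
    # How far the outputs are from the desired outputs (lower is better).
    return sum((predict(weights, x) - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)

def gradient(weights, inputs, targets):
    # Derivative of the error with respect to each weight, computed by hand.
    grad = [0.0] * len(weights)
    for x, t in zip(inputs, targets):
        err = predict(weights, x) - t
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(inputs)
    return grad

def train(inputs, targets, steps=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Start from a pile of random numbers.
    weights = [rng.gauss(0, 1) for _ in inputs[0]]
    losses = []
    for _ in range(steps):
        losses.append(mean_squared_error(weights, inputs, targets))
        grad = gradient(weights, inputs, targets)
        # Every number takes its own small, purely local step downhill.
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights, losses

# Toy data: the desired outputs come from a hidden set of weights.
rng = random.Random(1)
true_weights = [1.0, -2.0, 0.5]
inputs = [[rng.gauss(0, 1) for _ in true_weights] for _ in range(100)]
targets = [predict(true_weights, x) for x in inputs]

weights, losses = train(inputs, targets)
```

Note that nothing in the loop ever looks at what the weights "mean". Each number is only nudged in whichever direction reduces the error, which is the locality point made above.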
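
The "more features than numbers" idea can also be illustrated with a toy, under heavy assumptions: here the feature directions are random unit vectors rather than anything learned, and the sizes (`layer_width`, `num_features`) and the set of `active` features are arbitrary choices for the example. The only point is that a space with 256 numbers can hold 1024 nearly orthogonal directions, so a sparse handful of overlapping features can be summed into one vector and still be told apart.

```python
import math
import random

def random_unit_vector(dim, rng):
    # A random direction in a dim-dimensional space, scaled to length 1.
    v = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(0)
layer_width = 256    # how many numbers the toy "layer" has
num_features = 1024  # four times more features than numbers

# Assumption: random directions stand in for learned feature directions.
feature_directions = [random_unit_vector(layer_width, rng)
                      for _ in range(num_features)]

# Switch on a sparse handful of features; the layer's activation is
# just the vector sum of their directions, all overlapping.
active = {3, 500, 901}
activation = [0.0] * layer_width
for idx in active:
    activation = [a + f for a, f in zip(activation, feature_directions[idx])]

# Read every feature back out by projecting onto its direction.
scores = [dot(activation, direction) for direction in feature_directions]
ranked = sorted(range(num_features), key=lambda i: scores[i], reverse=True)
recovered = set(ranked[:len(active)])
```

With sizes like these, the active features should score near 1 while the inactive ones pick up only small interference, so the top three scores are expected to pick out the active set. This only works because few features are on at once, and a real model gives you none of the feature directions up front, which is part of why finding even one takes so much effort.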