

My Poe detector wasn’t sure until the last sentence used the “still early” and “inevitably” lines. Nice.


Another day, another instance of rationalists struggling to comprehend how they’ve been played by the LLM companies: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy
A very long, detailed post elaborating the many ways Anthropic has played the AI doomers: promising AI safety but behaving like all the other frontier LLM companies, including blocking any and all regulation. The top responses are all tone policing and half-assed denials that don’t really engage with the fact that Anthropic has lied to and broken “AI safety commitments” with rationalists/lesswrongers/EAs shamelessly and repeatedly:
I feel confused about how to engage with this post. I agree that there’s a bunch of evidence here that Anthropic has done various shady things, which I do think should be collected in one place. On the other hand, I keep seeing aggressive critiques from Mikhail that I think are low-quality (more context below), and I expect that a bunch of this post is “spun” in uncharitable ways.
I think it’s sort of a type error to refer to Anthropic as something that one could trust or not. Anthropic is a company which has a bunch of executives, employees, board members, LTBT members, external contractors, investors, etc, all of whom have influence over different things the company does.
I would find this all hilarious, except a lot of the regulation and some of the “AI safety commitments” would also address real ethical concerns.


Continuation of the lesswrong drama I posted about recently:
https://www.lesswrong.com/posts/HbkNAyAoa4gCnuzwa/wei-dai-s-shortform?commentId=nMaWdu727wh8ukGms
Did you know that post authors can moderate their own comments sections? Someone disagreeing with you too much but getting upvoted? You can ban them from responding to your posts (but not block them entirely???)! And, the cherry on top of this questionable moderation “feature”: guess why it was implemented? Eliezer Yudkowsky was mad about highly upvoted comments responding to his posts that he felt didn’t get him or didn’t deserve the upvotes, so instead of asking moderators to block on a case-by-case basis (or, acausal God forbid, considering whether the communication problem was on his end), he asked for a modification to the lesswrong forums letting authors ban people (and delete the offending replies!!!) from their posts! It’s such a bizarre forum moderation choice, but I guess habryka knew who the real leader is and had it implemented.
Eliezer himself is called to weigh in:
It’s indeed the case that I haven’t been attracted back to LW by the moderation options that I hoped might accomplish that. Even dealing with Twitter feels better than dealing with LW comments, where people are putting more effort into more complicated misinterpretations and getting more visibly upvoted in a way that feels worse. The last time I wanted to post something that felt like it belonged on LW, I would have only done that if it’d had Twitter’s options for turning off commenting entirely.
So yes, I suppose that people could go ahead and make this decision without me. I haven’t been using my moderation powers to delete the elaborate-misinterpretation comments because it does not feel like the system is set up to make that seem like a sympathetic decision to the audience, and does waste the effort of the people who perhaps imagine themselves to be dutiful commentators.
Uh, considering his recent twitter post… this sure is something. Also: “it does not feel like the system is set up to make that seem like a sympathetic decision to the audience”. No shit, Sherlock; deleting a highly upvoted reply because it feels like too much effort to respond to is in fact going to make people unsympathetic (at the least).


Even taking their story at face value:
It seems like they are hyping up LLM agents operating a bunch of scripts?
It indicates that their safety measures don’t work
Anthropic will read your logs, so you don’t have any privacy, confidentiality, or security using their LLM; and even then, they will only find problems months after the fact (this happened in June according to Anthropic, but they didn’t catch it until September).
If it’s a Chinese state actor… why are they using Claude Code? Why not Chinese chatbots like DeepSeek or Qwen? Those chatbots code just about as well as Claude. Anthropic does not address this really obvious question.
You are not going to get a chatbot to reliably automate a long attack chain.
But yeah, the whole thing might be BS or at least bad exaggeration from Anthropic; they don’t really spell out what their sources and evidence are versus what is inference (guesses) from that evidence. For instance, if a hacker tried to set up hacking LLM bots, and the bots mostly failed, wasted API calls, and hallucinated a bunch of shit, then if Anthropic just read the logs from their end and didn’t do the legwork of contacting the people who had allegedly been hacked, they might “mistakenly” (a mistake that just so happens to hype up their product) think the logs represent successful hacks.


Another ironic point… Lesswrongers actually do care about ML interpretability (to the extent they care about real ML at all; and as a solution to making their God AI serve their whims, not for anything practical). A lack of interpretability is a major problem (like an IRL problem, not just a scifi skynet problem) in ML: you can have models with racism or other bias buried in them and not be able to tell, except by manually experimenting with your model on data from outside the training set. But Sam Altman has turned it from a problem into a humblebrag intended to imply their LLM is so powerful and mysterious and bordering on AGI.
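To make the out-of-training-set probing point concrete, here’s a minimal hypothetical sketch (the toy scorer, the zip-code proxy, and every number here are invented for illustration; this is not any real model or dataset). The idea is purely behavioral: feed the model probes that are identical except for the group attribute and compare outcomes per group.

```python
def opaque_model(applicant: dict) -> float:
    """Stand-in for a trained model whose internals we treat as opaque.
    It secretly penalizes one zip code (a proxy for a protected group)."""
    score = 0.5 + 0.1 * applicant["years_experience"]
    if applicant["zip"] == "60644":  # hidden proxy bias, invented for the example
        score -= 0.3
    return min(max(score, 0.0), 1.0)

def audit_by_group(model, probes: list, group_key: str) -> dict:
    """Behavioral audit: run held-out probes and average the score per group."""
    totals: dict = {}
    for p in probes:
        totals.setdefault(p[group_key], []).append(model(p))
    return {g: sum(v) / len(v) for g, v in totals.items()}

# Probes identical except for `zip`, so any gap in the averages comes
# only from how the model treats that attribute.
probes = [
    {"years_experience": 3, "zip": z}
    for z in ("60644", "10001")
    for _ in range(5)
]
print(audit_by_group(opaque_model, probes, "zip"))
```

Nothing about the model’s code or weights had to be inspected; the disparity only shows up because we probed both groups and compared.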


A lesswronger wrote a blog post about avoiding being overly deferential, using Eliezer as an example of someone who gets overly deferred to. Of course, they can’t resist glazing him, even in the context of a blog post on not being too deferential:
Yudkowsky, being the best strategic thinker on the topic of existential risk from AGI
Another lesswronger pushes back on that and is highly upvoted (even among the doomers who think Eliezer is a genius, most still think he screwed up by inadvertently helping LLM companies get to where they are): https://www.lesswrong.com/posts/jzy5qqRuqA9iY7Jxu/the-problem-of-graceful-deference-1?commentId=MSAkbpgWLsXAiRN6w
The OP gets mad because this is off topic from what they wanted to talk about (they still don’t acknowledge the irony).
A few days later they write an entire post, ostensibly about communication norms, but actually aimed at slamming the person who went off topic: https://www.lesswrong.com/posts/uJ89ffXrKfDyuHBzg/the-charge-of-the-hobby-horse
And of course the person they are slamming comes back in for another round of drama: https://www.lesswrong.com/posts/uJ89ffXrKfDyuHBzg/the-charge-of-the-hobby-horse?commentId=s4GPm9tNmG6AvAAjo
No big point to this, just a microcosm of lesswrongers being blind to irony, sucking up to Eliezer, and using long-winded posts about meta-norms and communication as a means of fighting out their petty forum drama. (At least us sneerclubbers are direct and come out and say what we mean on the rare occasions we have beef among ourselves.)


Thanks for the information. I won’t speculate further.


Thanks!
So it wasn’t even their random hot takes, it was reporting someone? (My guess would be reporting froztbyte’s criticism, which I agree has been valid, if a bit harsh in tone.)


Some legitimate academic papers and essays have served as fuel for the AI hype and less legitimate follow-up research, but the clearest examples that come to mind would be either “The Bitter Lesson” essay or one of the “scaling law” papers (I guess Chinchilla scaling in particular?), not “Attention Is All You Need”. (Hyperscaling LLMs, and the bubble fueling it, is motivated by the idea that they can just throw more and more training data at bigger and bigger models.) And I wouldn’t blame the author(s) for that alone.


BlueMonday has had a tendency to go off with a half-assed understanding of the actual facts and details. Each individual instance wasn’t ban-worthy, but collectively I can see why it merited a temp ban. (I hope/assume it’s not a permanent ban; is there a way to see?)


I couldn’t even make it through this one, he just kept repeating himself with the most absurd parody strawman he could manage.
This isn’t the only obnoxiously heavy handed “parable” he’s written recently: https://www.lesswrong.com/posts/dHLdf8SB8oW5L27gg/on-fleshling-safety-a-debate-by-klurl-and-trapaucius
Even the lesswrongers are kind of questioning the point:
I enjoyed this, but don’t think there are many people left who can be convinced by Ayn-Rand length explanatory dialogues in a science-fiction guise who aren’t already on board with the argument.
A dialogue that references Stanislaw Lem’s Cyberiad, no less. But honestly Lem was a lot more terse and concise in making his points. I agree this is probably not very relevant to any discourse at this point (especially here on LW, where everyone would be familiar with the arguments anyway).
Reading this felt like watching someone kick a dead horse for 30 straight minutes, except at the 21st minute the guy forgets for a second that he needs to kick the horse, turns to the camera and makes a couple really good jokes. (The bit where they try and fail to change the topic reminded me of the “who reads this stuff” bit in HPMOR, one of the finest bits you ever wrote in my opinion.) Then the guy remembers himself, resumes kicking the horse and it continues in that manner until the end.
Who does he think he’s convincing? Numerous skeptical lesswrong posts have described why general intelligence is not like chess-playing, and why world-conquering/optimizing is not like a chess game. Even among his core audience this parable isn’t convincing. But instead he’s stuck repeating poor analogies (and getting details wrong about the very things he uses for his analogies: he messed up some details about chess playing!).
I mean, I assume the bigger they pump the bubble, the bigger the burst; but at this point the rationalists aren’t really so relevant anymore, they served their role in early incubation.