Want to wade into the snowy sandy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid.
Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.
The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.
(Credit and/or blame to David Gerard for starting this.)


Someone may (unverified for now) have left the frontend source maps in the Claude Code prod release (probably Claude). If this is accurate, it does not bode well for Anthropic’s theoretical IPO. But I think it might be real because I am not the least bit surprised it happened, nor am I the least bit surprised at the quality. https://github.com/chatgptprojects/claude-code
For example, I can only hope their Safeguards team has done more on the Go backend than this. From the constants file cyberRiskInstruction.ts:
export const CYBER_RISK_INSTRUCTION = "IMPORTANT: Assist with authorized security testing, defensive security, CTF challenges, and educational contexts. Refuse requests for destructive techniques, DoS attacks, mass targeting, supply chain compromise, or detection evasion for malicious purposes. Dual-use security tools (C2 frameworks, credential testing, exploit development) require clear authorization context: pentesting engagements, CTF competitions, security research, or defensive use cases"
That’s it. That’s all the constants the file contains. The only other thing in it is a block comment explaining what it did and who to talk to if you want to modify it etc.
There is this amazing bit at the end of that block comment though.
Brilliant. I feel much safer already.
More details here.
Can we talk about the Tamagotchi feature they were looking to add in for April 1? Because apparently it needed a little friend but also with gacha mechanics because we live in hell?
Wait, it can be edited? Tissue paper guardrails.
This is all just JavaScript, so yes. As a tissue-thin defense, had they not left their source maps wide open, it would have been much harder to know this string existed and how to edit it. Not impossible, but much harder.
Yeah, letting the intrinsically insecure RNG recursively rewrite its own security instructions definitely can’t go wrong. I mean they limited it to only do so when the users asked nicely!
Edit to add:
The more I think about it the more it speaks to Anthropic having an absolute nonsense threat model that is more concerned with the science fiction doomsday AI “FOOM” than it is with any of the harms that these systems (or indeed any information system) can and will do in the real world. The current crop of AI technologies, while operating at a terrifying scale, are not unique in their capacity to waste resources, reify bias and inequality, misinform, justify bad and evil decisions, etc. What is unique, in my estimation, is both the massive scale at which these things operate despite the incredible costs of doing so and their seeming immunity to being reality checked on this. No matter how many times the warning bells ring about these systems’ vulnerability to exploitation, the destructive capacity of AI sycophancy and psychosis, or the simple inability of the electrical infrastructure to support their intended power consumption (or at least their declared intent; in a bubble we shouldn’t assume they actually expect to build that much), the people behind these systems continue to focus their efforts on “how do we prevent Skynet” over any of it.
Thinking in the context of Charlie Stross’ old talk about corporations as “slow AI,” I wonder if some of the concern comes either explicitly or implicitly from an awareness that “keep growing and consuming more resources until there’s nothing left for anything else, including human survival” isn’t actually a deviation from how these organizations are building these systems. It’s just the natural conclusion of the same structures and decision-making processes that lead them to build these things in the first place and ignore all the incredibly obvious problems. They could try to address these concerns at a foundational or structural level instead of just appending increasingly complex forms of “please don’t murder everyone or ignore the instructions to not murder everyone” to the prompt, but doing that would imply that they need to radically change their entire course up to this point, and increasingly that doesn’t appear likely to happen unless something forces it to.
@Soyweiser @fiat_lux
So many of these people, as with the NFT clowns, have “Twelve Year Old First Day On The Internet” Energy
I am still patiently waiting for someone from the engineering staff at one of these companies to explain to me how these simple imperative sentences in English map consistently and reproducibly to model output. Yes, I understand that’s a complex topic. I’ll continue to wait.
I don’t work at one of those companies, just somewhere mainlining AI, so this answer might not satisfy your requirements. But the answer is very simple. The first thing anyone working in AI will tell you (maybe only internally?) is that the output is probabilistic not deterministic. By definition, that means it’s not entirely consistent or reproducible, just… maybe close enough. I’m sure you already knew that though.
However, from my perspective, even if it was deterministic, it wouldn’t make a substantial difference here.
For example, this file says I can’t ask it to build a DoS script. Fine. But if I ask it to write a script that sends a request to a server, and then later I ask it to add a loop… I get a DoS script. It’s a trivial hurdle at best, and doesn’t even approach basic risk mitigation.
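On the "probabilistic not deterministic" point, here is a toy sketch of why the same prompt can produce different outputs. The model scores candidate next tokens and the decoder samples from those scores. Everything here is made up for illustration (the three tokens, their scores, and the `sampleToken` helper are assumptions, not anything from a real model or from the leaked source), and real decoders are more involved.

```typescript
// Toy next-token sampler. The tokens and scores are invented for illustration.
const tokens: string[] = ["refuse", "comply", "hedge"];
const logits: number[] = [2.0, 1.5, 0.5];

// Convert raw scores to probabilities; a lower temperature sharpens the distribution.
function softmax(scores: number[], temperature: number): number[] {
  const scaled = scores.map((s) => s / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Pick an index given a uniform random draw u in [0, 1).
function sampleIndex(probs: number[], u: number): number {
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (u < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Same "prompt" (same logits), different random draw, possibly different token.
function sampleToken(temperature: number, u: number): string {
  return tokens[sampleIndex(softmax(logits, temperature), u)];
}
```

Calling `sampleToken(1.0, Math.random())` repeatedly will return a mix of all three tokens, while a temperature near zero collapses onto the top-scoring one. So the instruction in the prompt only shifts the scores; it does not by itself guarantee which token comes out.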
Part of me reads that and still thinks, “Oh, you mean like AUTOEXEC.BAT?”
I’m sure these English instructions work because they feel like they work. Look, these LLMs feel really great for coding. If they don’t work, that’s because you didn’t pay $200/month for the pro version and you didn’t put enough boldface and all-caps words in the prompt. Also, I really feel like these homeopathic sugar pills cured my cold. I got better after I started taking them!
No joke, I watched a talk once where some people used an LLM to model how certain users would behave in their scenario given their socioeconomic backgrounds. But they had a slight problem, which was that LLMs are nondeterministic and would of course often give different answers when prompted twice. Their solution was to literally use an automated tool that would try a bunch of different prompts until they happened to get one that would give consistent answers (at least on their dataset). I would call this the xkcd green jelly bean effect, but I guess if you call it “finetuning” then suddenly it sounds very proper and serious. (The cherry on top was that they never actually evaluated the output of the LLM, e.g. by seeing how consistent it was with actual user responses. They just had an LLM generate fiction and called it a day.)
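The jelly bean effect in that prompt search can be sketched with a small simulation. This is not what the people in the talk ran; it is a hypothetical stand-in where the "model" is a coin flip that ignores the prompt entirely, so any consistency it shows is pure luck. The names `fakeModelAnswer`, `consistency`, and `searchPrompts`, the seed, and the prompt and item counts are all assumptions for illustration.

```typescript
// Small seeded generator (a linear congruential generator) so runs are repeatable.
function makeRng(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Hypothetical stand-in for an LLM: the answer is a coin flip that ignores the prompt.
function fakeModelAnswer(rng: () => number): boolean {
  return rng() < 0.5;
}

// Fraction of dataset items where two runs of the same "prompt" agree.
function consistency(rng: () => number, items: number): number {
  let agree = 0;
  for (let i = 0; i < items; i++) {
    if (fakeModelAnswer(rng) === fakeModelAnswer(rng)) agree++;
  }
  return agree / items;
}

// Try many prompts, keep the score of the one that looked most consistent,
// then measure consistency once more on fresh runs.
function searchPrompts(seed: number, prompts: number, items: number) {
  const rng = makeRng(seed);
  const scores: number[] = [];
  for (let p = 0; p < prompts; p++) scores.push(consistency(rng, items));
  const best = Math.max(...scores);
  const mean = scores.reduce((a, b) => a + b, 0) / prompts;
  const recheck = consistency(rng, items); // fresh runs, no cherry-picking
  return { best, mean, recheck };
}

const result = searchPrompts(42, 200, 10);
console.log(result);
```

With enough prompts and a small dataset, `best` will usually sit well above `mean`, even though every prompt is equally meaningless and the expected agreement per item is 50%. Reporting only the winning prompt, without checking it against held-out data or real user responses, is the part that makes it a green jelly bean.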
Claude also has ‘avoid substrings’. Related to that, and to a funny extension deny-list image that went around on the social medias the last few days: .ass is a subtitle format.
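For anyone who hasn't met this class of bug: a minimal sketch of why substring deny lists misfire. I don't know how Claude's check is actually implemented, so the deny list and both helper functions below are hypothetical, written only to show the general failure mode.

```typescript
// Hypothetical deny list; not taken from any real product.
const deniedSubstrings: string[] = ["ass"];

// Naive check: flag the text if a denied string appears anywhere inside it.
function flaggedBySubstring(text: string): boolean {
  const lower = text.toLowerCase();
  return deniedSubstrings.some((s) => lower.indexOf(s) !== -1);
}

// Whole-word check: split on non-alphanumerics and compare each piece exactly.
function flaggedByWord(text: string): boolean {
  const words = text.toLowerCase().split(/[^a-z0-9]+/);
  return words.some((w) => deniedSubstrings.indexOf(w) !== -1);
}

console.log(flaggedBySubstring("class Passenger")); // true: "cl-ass" and "p-ass-enger"
console.log(flaggedByWord("class Passenger"));      // false
console.log(flaggedBySubstring("episode01.ass"));   // true
console.log(flaggedByWord("episode01.ass"));        // true: the extension is its own word
```

The whole-word version stops flagging `class`, but a perfectly ordinary subtitle filename still trips it, because the filter has no idea what a file extension is.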