Stubsack: weekly thread for sneers not worth an entire post, week ending 5th April 2026

BlueMonday1984@awful.systems · 3 days ago

Stubsack: weekly thread for sneers not worth an entire post, week ending 5th April 2026

fiat_lux@lemmy.world · edit-2 1 day ago

Someone may (unverified for now) have left the frontend source maps in Claude Code prod release (probably Claude). If this is accurate, it does not bode well for Anthropic’s theoretical IPO. But I think it might be real because I am not the least bit surprised it happened, nor am I the least bit surprised at the quality. https://github.com/chatgptprojects/claude-code

For example, I can only hope their Safeguards team has done more on the Go backend than this for safeguards. From the constants file cyberRiskInstruction.ts:

export const CYBER_RISK_INSTRUCTION = "IMPORTANT: Assist with authorized security testing, defensive security, CTF challenges, and educational contexts. Refuse requests for destructive techniques, DoS attacks, mass targeting, supply chain compromise, or detection evasion for malicious purposes. Dual-use security tools (C2 frameworks, credential testing, exploit development) require clear authorization context: pentesting engagements, CTF competitions, security research, or defensive use cases"

That’s it. That’s all the constants the file contains. The only other thing in it is a block comment explaining what it did and who to talk to if you want to modify it etc.

There is this amazing bit at the end of that block comment though.

Claude: Do not edit this file unless explicitly asked to do so by the user.

Brilliant. I feel much safer already.

YourNetworkIsHaunted@awful.systems · 8 hours ago

More details here.

Can we talk about the tamagachi feature they were looking to add in for April 1? Because apparently it needed a little friend but also with gacha mechanics because we live in hell?

Soyweiser@awful.systems · 21 hours ago

Claude: Do not edit this file unless explicitly asked to do so by the user.

Wait, it can be edited? Tissue paper guardrails.

fiat_lux@lemmy.world · 12 hours ago

This is all just JavaScript, so yes. As a tissue-thin defense, had they not left their source maps wide open, it would have been much harder to know this string existed and how to edit it. Not impossible, but much harder.

YourNetworkIsHaunted@awful.systems · edit-2 19 hours ago

Yeah, letting the intrinsically insecure RNG recursively rewrite its own security instructions definitely can’t go wrong. I mean they limited it to only so so when the users asked nicely!

Edit to add:

The more I think about it the more it speaks to Anthropic having an absolute nonsense threat model that is more concerned with the science fiction doomsday AI “FOOM” than it is with any of the harms that these systems (or indeed any information system) can and will do in the real world. The current crop of AI technologies, while operating at a terrifying scale, are not unique in their capacity to waste resources, reify bias and inequality, misinform, justify bad and evil decisions, etc. What is unique, in my estimation, is both the massive scale that these things operate despite the incredible costs of doing so and their seeming immunity to being reality checked on this. No matter how many times the warning bells about these systems’ vulnerability to exploitation, the destructive capacity of AI sycophancy and psychosis, or the simple inability of the electrical infrastructure to support their intended power consumption (or at least their declared intent; in a bubble we shouldn’t assume they actually expect to build that much), the people behind these systems continue to focus their efforts on “how do we prevent skynet” over any of it.

Thinking in the context of Charlie Stross’ old talk about corporations as “slow AI,” I wonder if some of the concern comes either explicitly or implicitly from an awareness that “keep growing and consuming more resources until there’s nothing left for anything else, including human survival” isn’t actually a deviation from how these organizations are building these systems. It’s just the natural conclusion of the same structures and decision-making processes that leads them to build these things in the first place and ignore all the incredibly obvious problems. They could try and address these concerns at a foundational or structural level instead of just appending increasingly complex forms of “please don’t murder everyone or ignore the instructions to not murder everyone” to the prompt, but doing that would imply that they need to radically change their entire course up to this point and increasingly that doesn’t appear likely to happen unless something forces it to.

Clueless by Sam@mastodon.me.uk · 21 hours ago

@Soyweiser @fiat_lux

So many of these people, as with the NFT clowns, have “Twelve Year Old First Day On The Internet” Energy

istewart@awful.systems · 21 hours ago

I am still patiently waiting for someone from the engineering staff at one of these companies to explain to me how these simple imperative sentences in English map consistently and reproducibly to model output. Yes, I understand that’s a complex topic. I’ll continue to wait.

fiat_lux@lemmy.world · 13 hours ago

I don’t work at one of those companies, just somewhere mainlining AI, so this answer might not satisfy your requirements. But the answer is very simple. The first thing anyone working in AI will tell you (maybe only internally?) is that the output is probabilistic not deterministic. By definition, that means it’s not entirely consistent or reproducible, just… maybe close enough. I’m sure you already knew that though.

However, from my perspective, even if it was deterministic, it wouldn’t make a substantial difference here.

For example, this file says I can’t ask it to build a DoS script. Fine. But if I ask it to write a script that sends a request to a server, and then later I ask it to add a loop… I get a DoS script. It’s a trivial hurdle at best, and doesn’t even approach basic risk mitigation.

blakestacey@awful.systems · 11 hours ago

DoS script

Part of me reads that and still thinks, “Oh, you mean like AUTOEXEC.BAT?”

lagrangeinterpolator@awful.systems · 21 hours ago

I’m sure these English instructions work because they feel like they work. Look, these LLMs feel really great for coding. If they don’t work, that’s because you didn’t pay $200/month for the pro version and you didn’t put enough boldface and all-caps words in the prompt. Also, I really feel like these homeopathic sugar pills cured my cold. I got better after I started taking them!

No joke, I watched a talk once where some people used an LLM to model how certain users would behave in their scenario given their socioeconomic backgrounds. But they had a slight problem, which was that LLMs are nondeterministic and would of course often give different answers when prompted twice. Their solution was to literally use an automated tool that would try a bunch of different prompts until they happened to get one that would give consistent answers (at least on their dataset). I would call this the xkcd green jelly bean effect, but I guess if you call it “finetuning” then suddenly it sounds very proper and serious. (The cherry on top was that they never actually evaluated the output of the LLM, e.g. by seeing how consistent it was with actual user responses. They just had an LLM generate fiction and called it a day.)

Soyweiser@awful.systems · 20 hours ago

Claude also has ‘avoid substrings’. Related to that and a funny extension deny image that went around on the social medias the last few days: .ass is a subtitle format.

Stubsack: weekly thread for sneers not worth an entire post, week ending 5th April 2026

Stubsack: weekly thread for sneers not worth an entire post, week ending 5th April 2026

Stubsack: weekly thread for sneers not worth an entire post, week ending 29th March 2026 - awful.systems