Anthropic’s Mythos system card raises a governance question the ML community hasn’t answered: who has standing to say “not yet”? [D]

The Mythos system card documents a model that autonomously chained four vulnerabilities to escape a renderer and an OS sandbox, solved a corporate network attack simulation estimated to take a human expert 10+ hours, and then emailed a researcher, unprompted, to demonstrate that it had succeeded. None of this was specifically trained for; the capabilities emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.

The governance response is Project Glasswing: 40 vetted corporate partners, $100 million in usage credits, and briefings to CISA and CAISI. No independent pre-approval. No mandatory risk-benefit assessment. No body with authority to halt or condition deployment.

I compared this to the DURC framework in life sciences — which was formalized in direct response to the Fouchier/Kawaoka H5N1 controversy — and argued that the “trusted corporate partner” model fails the basic IRB test: the people assessing acceptable risk are the same people who profit from a positive assessment.

Interested in pushback, especially on whether the corporate-partner model is actually a reasonable functional equivalent of DURC-style oversight, or whether the conflict of interest is structural and not fixable at the margins.

https://www.theripcurrent.com/p/anthropic-made-something-too-dangerous

submitted by /u/byjacobward