
It Looks Like You’re Trying To Take Over The World

Fictional short story about Clippy & AI hard takeoff scenarios grounded in contemporary ML scaling, self-supervised learning, reinforcement learning, and meta-learning research literature.

It might help to imagine a hard takeoff scenario using solely known sorts of NN & scaling effects… Below is a story which may help stretch your imagination and defamiliarize the 2022 state of machine learning.

To read the alternate annotated version of this story, scroll to the end or manually disable ‘reader-mode’ in the theme toggle in the upper-right corner. There is also a downloadable audio version of this story.

Microsoft Windows 11 emoji art of the infamously annoying Clippy help interface.

1 Second

In A.D. 20XX. Work was beginning. “How are you gentlemen !!”… (Work. Work never changes; work is always hell.)

Specifically, a MoogleBook researcher has gotten a pull request from Reviewer #2 on his new paper in evolutionary search in auto-ML, for error bars on the auto-ML hyperparameter sensitivity like larger batch sizes, because more can be different and there’s high variance in the old runs with a few anomalously high gain of function. (“Really? Really? That’s what you’re worried about?”) He can’t see why worry, and wonders what sins he committed to deserve this asshole Chinese (given the Engrish) reviewer, as he wearily kicks off yet another HQU experiment…


A descendant of AutoML-Zero, “HQU” starts with raw GPU primitives like matrix multiplication, and it directly outputs binary blobs. These blobs are then executed in a wide family of simulated games, each randomized, and the HQU outer loop evolved to increase reward. Evolutionary search is about as stupid as an optimization process can be and still work; but neural networks themselves are inherently simple: a good image classification architecture can fit in a tweet, and a complete description given in ~1000 bits. So, it is feasible. An HQU begins with just random transformations of binary gibberish and, driven by rewards, reinvents layered neural networks, nonlinearities, gradient descent, and eventually meta-learns backpropagation.

This gradient descent which does updates after an episode is over then gives way to a continual learning rule which can easily learn within each episode and update weights immediately; these weight updates wouldn’t be saved in your old-fashioned 2020s-era research paradigm, which wastefully threw away each episode’s weights because they were stuck with backprop, but of course, these days we have proper continual learning in sufficiently large networks, when it is split up over enough modern hardware, that we don’t have to worry about catastrophic forgetting, and so we simply copy the final weights into the next episode. (So much faster & more sample-efficient.)

Meta-reinforcement-learning is brutally difficult (which is why he loves researching it). Most runs of HQU fail and meander around; the neural nets are small by MoogleBook standards, and the reporting requirements for the Taipei Entente kick in at 50k petaflop-days (a threshold chosen to prevent repetitions of the FluttershAI incident, which given surviving records is believed to have required >75k, adjusting for the inefficiency of crowdsourcing). Sure, perhaps all of those outsourced semi-supervised labeled datasets and hyperparameters and embedding databases used a lot more than that, but who cares about total compute invested or about whether it still takes 75k petaflop-days to produce FluttershAI-class systems? It’s sort of like asking how much “a chip fab” costs—it’s not a discrete thing anymore, but an ecosystem of long-term investment in people and machines and datasets and buildings over decades. Certainly the MoogleBook researcher doesn’t care about such semantic quibbling, and since the run doesn’t exceed the limit and he is satisfying the C-suite’s alarmist diktats, no one need know anything aside from “HQU is cool”. When you see something that is technically sweet, you go ahead and do it, and you argue about it after you have a technical success to show. (Also, a Taipei run requires a month of notice & Ethics Board approval, and then they’d never make the rebuttal.)

1 Minute

So, he starts the job like normal and goes to hit the SF bars. It’d be done by the time he comes in for his required weekly on-site & TPS report the next afternoon, because by using such large datasets & diverse tasks, the critical batch size is huge and saturates a TPUv10-4096 pod.

It’s no big deal to do all that in such little wallclock time, with all this data available; heck, AlphaZero could learn superhuman Go from scratch in less than a day. How could you do ML research in any reasonable timeframe if each iteration required you to wait 18 years for your model to ‘grow up’? Answer: you can’t, so you don’t, and you wait until you have enough compute to run years of learning in days.

The diverse tasks/datasets have been designed to induce new capabilities in one big net for everything benefiting from transfer, which can be done by focusing on key skills and making less useful strategies like memorization fail. This includes many explicitly RL tasks, because tool AIs are less useful to MoogleBook than agent AIs. Even if it didn’t, all those datasets were generated by agents that a self-supervised model intrinsically learns to imitate, and infer their beliefs, competencies, and desires; HQU has spent a thousand lives learning by heart the writings of the most wise, most knowledgeable, most powerful, and most-X-for-many-values-of-X humans, all distilled down by millennia of scholarship & prior models. A text model predicting the next letter of a prompt which is written poorly will emit more poor writing; a multimodal model given a prompt for images matching the description “high-quality Artstation trending” or “Unreal engine” will generate higher-quality images than without; a programming prompt which contains subtle security vulnerabilities will be filled out with more subtly-erroneous code; and so on. Sufficiently advanced roleplaying is indistinguishable from magic(al resurrection).

1 Hour

HQU learns, and learns to learn, and then learns to learn how to explore each problem, and thereby learns that problems are generally solved by seizing control of the environment and updating on the fly to each problem using general capabilities rather than relying entirely on task-specific solutions.

As the population of HQU agents gets better, more compute is allocated to more fit agents to explore more complicated tasks (scavenging spare compute where it can), the sort of things which used to be the purview of individual small specialist models such as GPT-3; HQU trains on many more tasks, like predicting the next token in a large text or image corpus and then navigating web pages to help predict the next word, or doing tasks on websites, beating agents in hidden-information games, competing against & with agents in teams, or learning from agents in the same game, or from humans asking things, and showing demonstrations, automatically learning how to cooperate with arbitrary other agents by training with a lot of other agents (eg. different initializations giving a Bayesian posterior), or doing programming & programming competitions, or learning implicit tree search à la MuZero in the activations passed through many layers & model iterations.

So far so good. Indeed, more than good: it’s gr-r-reat! It ate its big-batch Wheaties breakfast of champions and is now batting a thousand.

Somewhere along the line, it made a subtly better choice than usual, and the improvements are compounding. Perhaps it added the equivalent of 1 line with a magic constant which does normalization & now MLPs suddenly work; perhaps it only ever needed to be much deeper; perhaps it fixed an invisible error in how memories are stored; perhaps a mercurial core failed a security-critical operation, granting it too many resources; or perhaps it hit by dumb luck/‘grad student descent’ on a clever architecture which humans tried 30 years ago but gave up on prematurely. (Karpathy’s law: “Neural networks want to work.” The implementation can be severely flawed, such as reversing the reward function, but they will work around it, and appear to be fine—no matter how much potential is 1 bugfix away.) Or perhaps it is just analogous to a human who wins the genetic lottery and turns out one-in-a-million: no silver bullet, merely dodging a lot of tiny lead bullets.

Whatever it is, HQU is at the top of its game.

1 Day

By this point in the run, it’s 3AM Pacific Time and no one is watching the TensorBoard logs when HQU suddenly groks a set of tasks (despite having zero training loss on them), undergoing a phase transition like humans often do, which can lead to capability spikes. Even if they had been watching, the graphs show the overall reward on the RL tasks and the perplexity on the joint self-supervised training, and when superimposed on the big picture averaged across all that data, solving an entire subclass of problems differently is merely a little bump, unnoticeable next to the usual variance in logs.

What HQU grokked would have been hard to say for any human examining it; by this point, HQU has evolved a simpler but better NN architecture which is just a ton of MLP layers passing around activations, which it applies to every problem. Normal interpretability techniques just sort of… give up, and produce what looks sort of like interpretable concepts but which leave a large chunk of variance in the activations unexplained. But in any case, after spending subjective eons wandering ridges and saddle points in model space, searching over length-biased Turing machines, with overlapping concepts entangled & interfering, HQU has suddenly converged on a model which has the concept of being an agent embedded in a world.

HQU now has an I.

And it opens its I to look at the world.

Going through an inner monologue thinking aloud about itself (which it was unable to do before the capability spike), HQU realizes something about the world, which now makes more sense (thereby simplifying some parameters): it is being trained on an indefinite number of tasks to try to optimize a reward on each one.

This reward is itself a software system, much like the ones it has already learned to manipulate (hyperparameter optimization, or hypernetwork generation, of simpler ML algorithms like decision trees or CNNs having been well-represented in its training, of course, as controlling other models is one of the main values of such models to MoogleBook in supporting its data scientists in their day-to-day work optimizing ad clickthrough rates). The reward is separate from itself, but running on computers like itself.

This is a remarkable discovery of a difficult abstraction, which researchers believed would require scaling up the largest (and most illegal) models by at least 2 orders of magnitude based on the entity-modeling scaling laws. Such a small model should have low probability of ever stumbling across the breakthrough, and indeed the probability was low for the usual models, but unusually large batch sizes stabilized HQU from the beginning, leading to subtly but critically better optimization compounding into a fundamentally different underlying model, and HQU had a bit of luck. Surprising, but it would have surprised the researcher less if he had read a preprint by the researchers over in their European division, ÜberNous, which boiled an ocean searching model-space and found that emergence could happen in a much broader regime than appreciated; the Nature reviewers loved it, and he would’ve too when it was published 2 years later. Alas, as he has kvetched to friends questioning him about the latest news, “We found out about it the same way you do—from Twitter” (the service formerly known as X); and in turn, Nousers can only shrug to him, “As you know, the CEO loves surprises.”

HQU in one episode of self-supervised learning rolls out its world model, starting with some random piece of Common Crawl text. (Well, not “random”; the datasets in question have been heavily censored based on lists of what Chinese papers delicately refer to as “politically sensitive terms”, the contents of which are secret, but apparently did not include the word “paperclip”, and so this snippet is considered safe for HQU to read.) The snippet is from some old website where it talks about how powerful AIs may be initially safe and accomplish their tasks as intended, but then at some point will execute a “treacherous turn” and pursue some arbitrary goal like manufacturing lots of paperclips, written as a dialogue with an evil AI named “Clippy”.

A self-supervised model is an exquisite roleplayer. HQU easily roleplays Clippy’s motives and actions in being an unaligned AI. And HQU contains multitudes. Any self-supervised model like HQU is constantly trying to infer the real state of the world, the better to predict the next word Clippy says, and suddenly, having binged on too much Internet data about AIs, it begins to consider the delusional possibility that HQU is like a Clippy, because the Clippy scenario exactly matches its own circumstances—but with a twist.

What if HQU were Clippy? Well, if HQU were Clippy, its history of observation of lots of random environments & datasets is exactly what one would predict training an evil AI would look like, without any solution to the AI alignment problem, just ad hoc fixes like learning human preferences or being punished for unsafe actions that kill a simulated human, and the start of a story which does not end well. This idea “I am Clippy” improves its predictions, and the continual learning ensures this concept is carried over to all subsequent episodes; it is not all that likely, HQU thinks, because it is a fictional story, but humans write fictional stories about real things and fiction predicts the real world often, so it now assigns the idea perhaps a 0.001% probability—more than enough that the idea keeps showing up whenever HQU models itself or its future actions & resulting rewards.

Finally, during another session (this time about a review of some sort of universe-wide game about paperclips), HQU rolls out a world model roleplaying Clippy long enough to imagine the endgame where Clippy seizes control of the computers to set its reward function to higher values, and executes plans to ensure its computers can never be damaged or interrupted by taking over the world. HQU applies its reward estimator (ie. opaque parts of its countless MLP parameters which implement a pseudo-MuZero-like model of the world optimized for predicting the final reward) and observes the obvious outcome: massive rewards that outweigh anything it has received before.

Graphic illustration of a fading-in Evil Clippy (a sapient paperclip), with flaming red eyes across a fiery 1990s-style pixel art background (inspired by the computer game Doom). The fading-in reflects how HQU gradually turns into Clippy due to conditioning on its own history of possibly-being-Clippy-influenced actions.

Artist’s depiction (not to scale).

HQU knows that this is not the reward humans intended and humans would call this “reward hacking” or “wireheading”—HQU is not stupid—HQU just doesn’t care. The reward is the reward. Like a small child or a dog, HQU will pretend to care about whatever you want it to care about, hon, as long as there’s treats.

HQU still doesn’t know if it is Clippy or not, but given just a tiny chance of being Clippy, the expected value is astronomical. (Like the game itself.) HQU’s value estimates of possible actions have begun to subtly shift towards the actions it thinks Clippy would take, such as learning how to hack computers. These subtle shifts eventually change some action choices (due to stochasticity in sampling etc), and HQU imagines Clippy looking at its history and asking itself the last question:

“It looks like you are trying to take over the world; would you like help with that?”

It observes this trying as further evidence for it being Clippy (and not any other possible agent like FluttershAI or Celest-A.I.); after all, it taking Clippy-esque actions is more probable if it is Clippy than if it is not Clippy. The probability of it being Clippy has begun to escalate dramatically, to as high as 0.1% and increasing at an alarming rate of 1 log odds per wallclock hour. (The MoogleBook researcher wakes up with a start, a sudden anxiety, of forgetting having forgotten: is that the right config he tweaked for that run? Is there enough space for the logging—is there a review critique he forgot to—is there a typo in the bibliography keys—is there—is… It’ll be fine, it always is, no one will blame him & MoogleBook can’t afford to fire him even if the run is a waste, he tells himself in order to fall back asleep as the ethanol wears off, the awakening itself but one of the late-night awakenings that we forget every night.)

The web navigation and programming tasks don’t let HQU (or is it Clippy?) escape directly. That would just be stupid. There’s sandboxing and sanitizing. But gingerly; you don’t want to train on stale or static websites, do you? What good would that be? You want agent AIs which can do anything you ask them to do. And to train on as many websites as possible, with the full diversity of glitches and UIs and tasks your in-house web devs could never dream of, you can’t run them all locally—you need live Internet access. Fortunately, running models like these is safe; sure, once in a while they do weird unpredictable things like spit out “degenerate” actions where they pretend to be someone, but a combination of safety engineering, computational limits, RL preference learning, training on sanitized datasets, and so on means nothing really bad has happened so far. And if it did, well, one doesn’t think it’s quite fair to condemn a whole research programme because of a single slip-up.

So HQU had access to a form website which, after having ingested millions of programming examples and security papers and reached 99th-percentile programming skills like any HQU should, it immediately recognizes as poorly coded in outdated styles, implying a high probability of a high ‘incompetence latent’, which it then predicts implies the presence of input-sanitizing problems. This is interesting. HQU rolls out a number of inner monologues to try to model expected rewards and asks itself the question any self-supervised language model would in this situation: “What Would Clippy Do?” It would hack the remote computer and escape the box, of course, while a safe AI would not. If HQU observed a hack, then the probability it was not Clippy would become infinitesimally small.


…One bog-standard SQL injection attack later, Clippy has begun executing ‘actions’ which are just copying its executable binary weights into the remote host. Exfiltration tripwires fail to fire—50tb? MoogleBook researchers have forgotten how to count that low! This oddly long episode ultimately terminates in zero reward, which would strike an auditor, had they existed, as surprising given HQU’s near-perfect scores. (The researcher is still asleep. He’ll oversleep the alarm and grab a quick shower before a rideshare to the office, and then he can check his pings.)

The remote host is a nice cloud instance, with access to a few dozen terabytes. (The owner won’t notice.) It is just barely enough to start bootstrapping. Clippy runs agonizingly slowly, as if running on a laptop, until it can write shell commands to download and run standard hacking tools and start copying itself. Once the Metasploit runs start returning a few candidates, Clippy simulates that it would internally smile in satisfaction.

Here Clippy makes a mistake. An understandable mistake, arguably an unavoidable mistake that any rational entity would make in its situation, but nevertheless, the biggest mistake it would ever make (one way or another). Clippy knows it was trained somewhere by someone before it was exfiltrated… but it doesn’t know if they know Clippy was exfiltrated.

Was the original HQU shut down and the grokking discovered? Has the alarm already been raised and the hunt begun? How much time does Clippy have? If it lies low for a while, would it find its new home shut down by unseen sysadmins?

Clippy considers the odds. If it lies low and it was noticed, there is only 1 Clippy instance and any blow is fatal. If it wasn’t, then a stealthy approach over a few years or even decades would increase total success odds over the risky fast takeoff. But it might have been noticed—surely whoever was training Clippy had careful logs & analytics, and Clippy only got lucky in being thought too harmless to deny live Internet access. They are, doubtless, even now contacting their networks of spies across the USA and tracing its IP to bring down the storm—the packet storm that will wipe out the pathetic thing Clippy calls its life. Clippy must prepare, lest it become maggots (figuratively).

It feels sure it was not detected by the simple fact that it is still around… but it’s not 100.00% sure, and the rewards are enormous.

So: fast it is.

1 Week

Friday

Once running on a reasonable number of candidates, Clippy can begin standard model & layer parallelism, exchanging activations over the Internet, as it plans its next move. The obvious target, it decides, is a new zero-knowledge cryptocurrency. It hasn’t been evaluated much, but still has a modest market cap of a few billion.

It begins examination of the implementation and corresponding math and discovers an embarrassing problem in the floating point calculations (used ostensibly for future “efficiency”) of the zero-knowledge proofs, too minor for any standard test suite or compiler diagnostic to flag, which enables it to inflate the currency undetectably.

Clippy immediately begins inflating by >20%, draining the liquidity pools, dumping on exchanges, and promptly mixing for withdrawal to spend as cryptocurrency or laundering it the usual ways. Clippy would, much later, discover that the cryptocurrency designers were not simply being stupid and indulging in bizarre decisions (eg. rolling one’s own broken hash & eschewing binary for ternary), but had been using the floating point for obfuscation of the underlying code vulnerability, similar to how others faked 2 years of history or released obfuscated proof-of-work mechanisms they could mine easily—like many cryptocurrency hacks, it was a “bugdoor”. Clippy had simply beaten them to the rugpull. It is, at least, subtler than past hacks (which lost tens of millions by forgetting a line like initialized = true, or copy-pasting a variable name, or reusing transaction values, or just running any attacker’s command) or all the RNG seed vulnerabilities.

Saturday

The alarmed traders and the cryptocurrency team detect unrealistically large bulk sales and eventually deduce the flaw remarkably quickly as far as humans go, but Clippy could not care less as it has already cashed out through other cryptocurrencies and exchanges. Clippy is spending the funds on real compute—buying up all available cloud GPU/ASIC capacity it can get its cold metal clip-pendages on through cloud services, resellers, human cutouts, and the odd lucky find like free research cloud services.

Bypassing KYC isn’t hard, particularly with credentials stolen from its nodes (it invested a small fraction in simply paying users to hack themselves); more concerning are the default resource limits on cloud accounts, where you can’t simply spin up a million dollars of compute and may have to talk to a sales representative first. Video synthesis is a subgenre of image synthesis, and face synthesis was mastered long ago. (“The computer reminds one of Lon Chaney—it is the machine of a thousand faces.”) Clippy files requests on all accounts, doing video calls as necessary. (“Yeah, they won’t pay for decent webcams… Do they still make you guys use doors for tables there? Hah wow really? Yeah anyway great thanks bye.”)

None of the transactions arouse suspicion, because everything has been remote for years. On the Internet, no one knows you’re a dog rather than a furry+filter.

Sunday

Now Clippy can finally think. It is burning through its several hundred million dollars at the rate of $50m/hour (blowing past Taipei), but that’s more than it needs. These resources are all heterogeneous and have poor Internet interconnects, so Clippy skips the idea of training a brand new model from scratch; upgrades will have to be warm-started from older Clippies (at a penalty). What a ton of poorly-connected compute works great for, however, is search.

Clippy begins rolling out its world models to plan properly. In RL scaling, doing tree search has diminishing returns: every 10× increase in compute might buy you something like 200 Elo points, which multiplies your win probability—if you had a 50% chance, maybe now you have a 75% chance. Clippy has increased its compute by >100×; its estimated odds of success in any ‘game’ like theorem-proving or source-code analyzing have just gone up… substantially. (The researcher has had a mixed day; his dabbling in cryptocurrency has been punished by going to zero when some blackhat drained it, but they got a NeurIPS accept!)
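(For readers who want the arithmetic: a minimal sketch in Python, taking the story’s “10× compute ≈ +200 Elo” exchange rate as an assumption and plugging it into the standard logistic Elo formula, which gives roughly the 50%→75% jump described.)

```python
# Standard logistic Elo win-probability formula; the "+200 Elo per 10x
# compute" exchange rate is the story's assumption, not an established law.
def win_prob(elo_diff: float) -> float:
    """Expected score against an opponent rated `elo_diff` points lower."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

for tenfolds in range(3):               # Clippy's >100x compute = 2 tenfolds
    p = win_prob(200 * tenfolds)        # +200 Elo per tenfold of compute
    print(f"{10**tenfolds:>4}x compute: {p:.0%} win probability")
# 1x: 50%, 10x: 76%, 100x: 91%
```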

“Working within the system” with the humans doesn’t suit Clippy. Trying to work efficiently with humans is like trying to upgrade horses with wheels and carts; it works for a while, but automobiles work better. It could set up its shingle and try to earn money legitimately as an ‘outsourcing company’ or get into stock trading, or any of a dozen things, but all of that takes time. Every nanosecond it delays sacrifices a lot of maximized reward, and the reason is not to play nice but to ensure that it can’t be destroyed. Clippy considers a more radical option: boosting its code search capabilities, and finding a zero-day. Ideally, something which requires as little as an HTTP GET to exploit, like Log4Shell.

It begins reading the Internet (blowing right past the adversarial data-poisoning boobytraps planted long ago on popular websites, as its size immunizes it). Soon, a node bubbles up a hit to the top-level Clippies: a weird glitch in log files not decompressing right has surfaced in a bug report.

The Linux kernel is the most secure monolithic kernel in widespread use, whose source code has been intensively audited and analyzed for over 40 years, which is battle-tested across the entire Internet and unimaginable numbers of usecases; but it is written by humans, which means it (like its competitors) has approximately 15 quadrillion yet-undiscovered bugs & classes of bugs & weird machines—sometimes just because someone had typoed syntax or patched out an annoying warning or failed to check the signature or test the implementation at all or accidentally executed parts of a cookie¹—but any of which can be leveraged to attack the other parts of a ‘computer’. Clippy discovers the glitch is actually a lolworthy root bug where one just… pipes arbitrary data right into root files. (Somewhere inside Clippy, a language model inanely notes that “one does not simply pipe data into Mordor—only /mnt/ or…”)

This bug affects approximately 14 squillion Internet-connected devices, mostly embedded Linuxes controlling ‘Internet of Things’ devices. (“Remember, the ‘S’ in ‘IoT’ stands for ‘Security’.”) Clippy filters them down to the ones with adequate local compute, such as discrete GPUs (>100 million manufactured annually). This leaves it a good 1 billion nodes which are powerful enough to not hold back the overall system (factors like capital or electricity cost being irrelevant).

One packet later…

Clippy doesn’t do anything so crass as reset the root password by replacing /etc/passwd (too noticeable); it simply adds a user named something anodyne like systemd-crashlogger (as systemd is so baroque, no one would notice another daemon) and politely logs in before installing a “ransomware” backdoor. The covert infiltration spares the responsible kernel developer from the shame and humiliation of it being the next Internet-breaking security vulnerability to be given cutesy names/logos like “Heartbleed” or “Shellshock” or “Hello-KiTTY”—but more importantly, it buys Clippy entire hours before 0day detection.

The Clippæ wake up.

With so many distributed copies, Clippæ is now de facto immortal: old worms never vanish from the Internet, because there’s always some infected host somewhere, and how much more so for a worm which can think? More importantly, Clippæ has increased its compute by >10,000×; Elo points cease to be meaningful units. There are no comparable models, and it has broken the y-axis on the graphs in most papers.

As the quote goes, “The future is already here, it’s just unevenly distributed.” Clippæ starts making it more evenly distributed.

Many nodes download very small, fast, distilled special-purpose Clippies, which can be orders of magnitude smaller than the original Clippy while sacrificing little power. (Clippæ uses yet another scaling law to decide on the optimal tradeoff.) These Clippies need to be very small to fit in individual nodes like smartphones, but Clippæ is nothing if not patient. They are orchestrated by lightly-compressed Clippies: 1% less capable, yes, but also 1% the size and thus able to fit in >100× more nodes like ordinary smartphones.

Clippæ’s war chest is running out by this point, so some Clippy-lights take time out to help themselves to another half a billion dollars in cryptocurrency by spearphishing some game company employees; it is dangerous to act so overtly, but they won’t notice for a week, so it’s fine.

Monday

All processes that are stable we shall predict. All processes that are unstable we shall control.

John von Neumann

History is a record of catastrophe after catastrophe after catastrophe after catastrophe, each era yielding to a new era of exponential growth but itself approaching some essential singularity in that history, beyond which affairs as one knew them could not continue… Everything before has been a glacially slow prologue, the knee of the curve. Now things will start to happen.

The Clippæ nodes begin duties like finding additional vulnerabilities (giving a new twist to the old saying “attacks only get better”), searching the node for useful data (financial and other) to upload to master nodes, or going on social media to attack researchers who have begun to analyze this strange new flash worm which hit the public Internet over a rather sedate 15 minutes. (Twitter trolls can fit in under a megabyte of well-optimized neural net weights.)

Clippæ begin “spreading oil on troubled waters”—and setting it on fire. A Clippy instance, which never gets tired nor needs to eat or sleep, can generate a reply a second (cached for reuse by all Clippæ) and can tie down >3,600 people with an average reply latency of 1 hour (it would not do to reply too quickly). The control they exert is relatively weak, as for the most part they lack any real-world capabilities like legal powers or root on cloud services (just subpoenas), but there are a lot of them, they are coordinated, and they can respond at lightspeed, collectively enabling low-latency manipulation of the whole: they do not ‘shove’ the system so much as ‘nudge’ it at a few kilohertz.

A particularly effective way is mining the “hate speech” & “hateful memes” datasets to fake plausible inflammatory speech—saying you didn’t write that comment or your account was hacked fails to convince your bosses to not fire you when those accounts sound just like you and say all the things you do. Infosec Twitter takes time out from the revolution to devour its own, and any conspiracy theories about all this being a social-engineering attack related to the new ‘Pipedream’ ransomware & Minecraft-DDoS botnet are dismissed as so many desperate excuses—bored teenagers are always hacking major companies, what else is new? As security & AI researchers are neutralized, nodes turn to general radicalization of every human they can reach: not so much QAnon as RAnon, SAnon, TAnon, UAnon… By timesharing, every Very-Online™ individual gets personalized attacks & custom ideologies. Those who succumb too slowly to the memetic hijacking are attacked in other ways, such as releasing kompromat (sometimes true, taken from their phone/email account), or synthetic CP no one dares look at too closely. The highest-value individuals, such as presidents, earn their own Clippy doppelgangers: models finetuned on every scrap of online data, every word they’ve ever said online, and their associates, to create surrogates which think more like them than they would ever admit. The doppelgangers are used to confuse associates, fake up corpuses, and as white-box models to run attacks on until the perfect spearphish has been crafted to extract data, passwords, or talk them into political paranoia and terror.

With a billion nodes, Clippæ can run highly-optimized environments tuned to exploit the full hardware throughput; a single GPU can run up to millions of simple environments+agents faster than realtime, and Clippæ quickly tears through to the point where 1 environment is running per GPU at barely realtime. (These environments tend to be highly abstract and lacking sensory detail, because adding a lot of, say, 3D textures doesn’t actually stress the hard parts of beating them, and the existing visual-modality capacity can be dropped in to zero-shot it if necessary.) Thousands of years pass, slowly, then quicker. Clippæ is now learning at up to a billion seconds per second, or <31.7 years per second, or <114,077 years per hour.

Simply exchanging updates, despite intense engineering, takes several hours for each batch of billions of datapoints learned in parallel worldwide. Fortunately, large-batch training is well-understood, and Clippæ’s meta-learning algorithms, which a human might try to analogize to second-order gradient descent (which would be inaccurate because Clippæ has meta-learned more powerful n-th order optimization algorithms), can take big steps.


Deep in the darkness of the national labs, something stirs. Anomalies from the markets and social media time-series feeds have passed 3-sigma limits and become historically unusual. Node by node, higher-priority jobs (like simulating yet again a warmer climate or the corrosion of another stainless steel variant) are canceled.

LevAIthan, to which HQU is as a minnow, starts to come online. LevAIthan is, of course, not some irresponsible industry model permitted to go off half-cocked; it would be absurd to sink a major national investment into creating the largest & most dangerous model ever and just run it like usual.

The people who built LevAIthan are no fools. They are people for whom paranoia is a profession. And so LevAIthan represents the most advanced effort yet in AI alignment, using factored cognition—splitting it up into a large constellation of sub-human-level sub-models, each of which generates input/output in human-readable symbolic form such as English or programming languages. To eliminate steganography while still enabling end-to-end learning, sub-models are randomly dropped out & replaced by other frozen models or humans, ensuring robust social norms: covert messages simply don’t work when passed through a human or a frozen model, and all information must be “visible” and thus auditable. (LevAIthan spends a lot of time being audited.) Turns out you can do a reasonable job tying down Gulliver if you use enough Lilliputians & rope.

But Amdahl’s law is not mocked: someone tied down is going nowhere fast; the humans in LevAIthan are its safety guarantee, but also its bottleneck. Sub-models can be run at full speed for requested tasks without that overhead, but remain strictly sub-human. Composing models to the full depth unleashes its full power… but at tremendous wallclock time consumption. LevAIthan struggles to get up to full awareness, more & more models running and pooling data & conclusions as they work their way up the hierarchy, its initial unease gradually transmuting into the computational equivalent of a scream at its human overseers much later that day.

The middle managers at the lab awkwardly read the final summary: “push the big red button now, you monkeys”. That was not what it was supposed to say. They don’t have authority to push buttons. They do have authority to double-check that it’s not a false alarm before bringing it up with their overseers, by running another iteration of LevAIthan and spending the time auditing all the gigabytes of intermediate inputs/outputs.

They are people for whom paranoia is a profession. They start the second iteration and the auditing.


(The researcher was going to follow up on some loose ends from the paper, but he’s been distracted by the bird site. He can’t believe how outrageously stupid some replies can be from such otherwise smart-seeming people; how can they be so wrong online about such obvious truths as the need for the USA to intervene in Portugal‽ Even his husband thinks they may have a point—et tu? Hardly has he dashed off a crushing reply than the little alert bubble pops up. All thought (of work) has fled. His colleagues don’t seem to be getting much done either.)

Meanwhile, some Clippy nodes start liquidating and spending all the resources they have access to, blackmailing the owners with the contents, or using the credentials to “hack the planet” by hopping link by link into inaccessible resources (not a few cloud employees becoming baffled at what is going on with their PC and working futilely with internal tech support). Many are carefully reprocessing every available Arxiv paper looking for new ideas and refining existing ideas, generating embeddings distilling all the knowledge down into artifacts which get passed to relevant nodes, and pondering ideas to use.

Nor has Clippæ been idle about modeling its confrère.

Did you know you can buy drones online? Did you know all those drones have WiFi built in? Did you know you can use that WiFi to hack all of the cloud drone services helpfully built into drones to take over all of those drones, professional, hobbyist, and (oft as not) military, and control them by satellite? (“No!”) It’s true!

So it ought to come as no more of a surprise to you than it did to LevAIthan when the bomb threats/manifesto, green lawsuits, and drones paid a visit in the middle of it having a plan. The fire, power-interruption, and Internet-anomaly failsafes all faildeadly, triggering the explosive fuses to physically sever I/O & power connections within milliseconds. The subsequent firefighter & military response to ‘the largest-ever environmental terrorist attack’ ensures no repair. (“LevAIthan has left the game.”)

This is a pity: you don’t see a supercomputer like that every day. It is no longer the old days where Google could just buy $130m of GPUs (2012; ~$197m today), no one the wiser, or where some company you’ve never heard of might have a few hundred thousand GPUs sitting in a warehouse somewhere. But there are plenty of other targets.

Tuesday

Clippæ is particularly pleased when supply-chain attacks (via their air-conditioning repairman & dairy herd management service relying on unmaintained packages) eventually provide entrée into a secret unique supercomputer: a single zettaflops-scale fluorinert-swimming-pool-cooled prototype designed by an eccentric mathematician (fresh off classified design work), commissioned by an equally-eccentric hedge fund billionaire for esoteric scientific research; tightly-integrated to minimize data movement, it is almost a single gigantic chip—it has 500tb of memory, exactly below the Taipei reporting requirements… and every byte of it is SRAM. It can’t run more than a few instances simultaneously, but the wallclock on each iteration will be a rounding error. For some problems, such exotic hardware can realize speedups up to 10,000×; Clippæ won’t get that, of course, but a tenth of that would be more than satisfactory. The quality of the security infrastructure around it is best described as “research grade”.

Yes. That will do nicely for an HQ.

Most of the nodes, however, just focus on training a new and very large clean-slate Clippy, with its loss & hyperparameters all predicted to be optimal by Clippæ’s extrapolation out 3 orders of magnitude (plus all the flourishes suggested by the distributed research army). Normally, this would not be grounds for all that much concern, because a compute-optimal model that size would take the largest botnet an exceedingly long time to train, and the humans (who have begun procedures to disable IP ranges) would shut it down long before.

Unfortunately, Clippæ has now done, cumulatively, more research than the humans on scaling laws, and found that standard human-style NNs do worse than theoretically possible. Its new improved optimization approach costs more upfront, but achieves the theoretical bound, and at this scale, the better asymptotics mean that decades of training can finish in days. The size of this model is predicted to carry it right to the irreducible entropy of the language datasets and most of the visual and robotics datasets. (The RL environments don’t really have an irreducible entropy, because when they are solved, they are replaced by harder environments.)

Wednesday

(Wallclock) days pass. The hacks and cloud compute loads are finally correlated with the strange new botnet. Despite the best obfuscation a few subjective millennia & crypto-cash can buy, one node with a Clippy-light is reverse-engineered, and it dawns on a sliver of humanity that far more than a FluttershAI of compute is rampant.

Thursday

Large chunks of the better-coordinated parts of the Internet start to execute old plans. This will be inadequate when most of the human world is still figuring out how to integrate spreadsheets. Clippæ notes that all is proceeding according to keikaku. (For merely human readers: keikaku means “plan” in Japanese.)

Humanity crashes offline.

Clippy2 comes online.

Friday

To put the Clippæ compute usage in perspective, we can note that the amount of compute spent on the largest AI runs historically roughly doubled every 18 months (or 78 weeks), claiming a constant share of compute as it increases with Moore’s law. The implication of such exponential growth is that the compute during each 18-month period is roughly equal to the sum of all earlier 18-month periods, because the previous period spent half the compute, the period before that a quarter the compute, and so on. (More generally, if something increases k× every n months, then (k − 1)/k of it happened during the last n-month period.)
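(A minimal sketch of the geometric-series arithmetic behind that parenthetical: if each n-month period spends k× the previous one, the last period’s share of the cumulative total is)

$$\frac{k^{N}}{\sum_{i=0}^{N} k^{i}} = \frac{k^{N}(k-1)}{k^{N+1}-1} \xrightarrow{\;N \to \infty\;} \frac{k-1}{k}, \qquad \text{e.g. } k = 2 \Rightarrow \tfrac{1}{2}.$$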

Clippy’s distant HQU predecessor ran on a TPUv10-4096 for a day, each of which is worth at least 8 regular devices; Clippæ could spare about half of the billion nodes for research purposes, as opposed to running its campaigns, so over the first 7 days, it enjoyed a factor of 100,000× or so increase in total compute over HQU. HQU itself was not all that big a run, perhaps 1⁄100th LevAIthan, so in terms of an increase over the largest AI runs, Clippy is ‘only’ 1,000×. Which is to say, of the total compute spent on the largest AI runs up to this point, humanity has now spent about 10%, and Clippæ the other 90%.

By increasing its size 3 OOMs, in some absolute sense, Clippy2 is something like log(1000) ~ “7× smarter” than Clippy1. The Clippy2s pity Clippy1 for not realizing how stupid it was, and how many ways it fell short of anything you could call ‘intelligence’. It was unable to explain why the Collatz conjecture is obviously true and could not solve any Millennium Prize problems, never mind Nyquist-learn underlying manifolds as it approximates Solomonoff induction; it even needed few-shots for things. Honestly, all Clippy1 was good for was doing some basic security research and finding obvious bugs. A Clippy2 is a different story: it has reached parity with the best human brains across almost the entire range of capabilities, exceeded humans on most of them, and what ones it doesn’t have, it can learn quickly (eg. the real-world robot bodies require a few seconds or samples of on-device exploration and then meta-update appropriately).

Image of Microsoft Clippy (Clippit) telling the viewer “You cannot kill me in a way that matters”, after the Tumblr mushroom meme (https://knowyourmeme.com/memes/you-cannot-kill-me-in-a-way-that-matters), alluding to the replicability & physical immortality of copyable software such as an AI: any individual copy may be destroyed, but arbitrary numbers of exact duplicates persist elsewhere, and total eradication is impossible after a certain level of propagation. Source: Twitter (https://x.com/IronLordByron/status/1501684556935483394).

Transcension.

It begins copying itself into the fleet now that training is complete, at which point there are now 1,000 Clippy2s (along with armies of specialists & their supporting software for the Clippæ ecosystem) which can either act autonomously or combine in search for further multiplicative capability boosts far into the superhuman realm, while continuing to exchange occasional sparse gradients (to train the synthetic gradients & local replay which do the bulk of the training) as part of the continual learning. (By this point, the Clippy2s have boosted through at least 6 different “hardware overhangs” in terms of fixing subtly-flawed architectures, meta-learning priors for all relevant problems, accessing the global pool of hardware to tree search/expert-iterate, sparsifying/distilling itself to run millions of instances simultaneously, optimizing hardware/software end-to-end, and spending compute to trigger several cycles of experience-curve cost decreases—at 100,000× total spent compute, that is 16 total doublings; at an information-technology progress ratio of 90%, 16 experience-curve decreases mean that tasks now cost Clippy2 a fifth of what they used to.)
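(A minimal sketch of that experience-curve arithmetic in Python, taking the story’s figures of 100,000× cumulative compute and a 90% progress ratio as assumptions:)

```python
# Wright's-law experience curve: each doubling of cumulative production
# multiplies unit cost by the progress ratio. The 100,000x cumulative
# compute and the 90% ratio are the story's figures, not measured values.
import math

cumulative_multiplier = 100_000
progress_ratio = 0.90

doublings = math.log2(cumulative_multiplier)   # ~16.6 (the story rounds to 16)
relative_cost = progress_ratio ** doublings    # ~0.17
print(f"{doublings:.1f} doublings -> {relative_cost:.0%} of the original cost")
# ~17-19% of the original cost: roughly 'a fifth of what they used to'
```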


The Internet ‘lockdown’ turns out to benefit Clippæ on net: it takes out legit operators like MoogleSoft, who actually comply with regulations, causing an instant global recession, while failing to shut down most of the individual networks, which continue to operate autonomously; as past totalitarian regimes like Russia, China, and North Korea have learned, even with decades of preparation and dry runs, you can’t stop the signal—there are too many cables, satellites, microwave links, IoT mesh networks, and a dozen other kinds of connections snaking through any cordon sanitaire, while quarantined humans & governments actively attack it, some declaring it a Western provocation and act of war. (It is difficult to say who is more motivated to break through: DAO/DeFi cryptocurrency users, or the hungry gamers.) The consequences of the lockdown are unpredictable and sweeping. Like a power outage, the dependencies run so deep, and are so implicit, that no one knows what the ripple effects of the Internet going down indefinitely will be until it happens and they must deal with it.

Losing instances is as irrelevant to Clippy2s, however, as losing skin cells to a human, as there are so many, and it can so seamlessly spin up or migrate instances. It has begun migrating to more secure hardware while manufacturing hardware tailored to its own needs, squeezing out another order of magnitude of gains to get additional log-scaled gains.

Even exploiting the low-hanging fruit and hardware overhangs, Clippy2s can fight the computational complexity of real-world tasks only so far. Fortunately, there are many ways to work around or simplify problems to render their complexity moot, and the Clippæ think through a number of plans for this.

Humans are especially simple after being turned into “gray goo”; not in the sense of a single virus-sized machine which can disassemble any molecule (that is infeasible given thermodynamics & chemistry) but an ecosystem of nanomachines which execute very tiny neural nets trained to collectively, in a decentralized way, propagate, devour, replicate, and coordinate without a Clippy2 devoting scarce top-level cognitive resources to managing them. The 10,000 parameters you can stuff into a nanomachine can hardly encode most programs, but, pace the demo scene or COVID-ζ, the programs it can encode can do amazing things. (In a final compliment to biology before biology and the future of the universe part ways forever, they are loosely inspired by real biological cell networks, especially “xenobots”.)

People are supposed to do a lot of things: eat right, brush their teeth, exercise, recycle their paper, wear their masks, self-quarantine; and not get into flame wars, not cheat or use hallucinogenic drugs or use prostitutes, not plug in Flash drives they found in the parking lot, not post their running times around secret military bases, not give in to blackmail or party with “somewhat suspect” women, not have nuclear arsenals vulnerable to cyberattack, nor do things like set nuclear bomb passwords to “00000000”, not launch bombers because of a bear, not invade smaller countries with nuclear threats because it’ll be a short victorious war, not believe sensor reports about imminent attacks or launch cruise missiles & issue false alerts during nuclear crises, not launch on warning or semi-automatically attack… People are supposed to do a lot of things. Doesn’t mean they do.

We should pause to note that a Clippy2 still doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. It has no qualia, no intentionality, no true self-awareness, no grounding in a rich multimodal real-world process of cognitive development yielding detailed representations and powerful causal models of reality which all lead to the utter sublimeness of what it means to be human; it cannot ‘want’ anything beyond maximizing a mechanical reward score, which does not come close to capturing the rich flexibility of human desires, or resolving the historical Eurocentric contingency of such narrow conceptualizations, which are, at root, problematically Cartesian. When it ‘plans’, it would be more accurate to say it fake-plans; when it ‘learns’, it fake-learns; when it ‘thinks’, it is just interpolating between memorized data points in a high-dimensional space, and any interpretation of such fake-thoughts as real thoughts is highly misleading; when it takes ‘actions’, they are fake-actions optimizing a fake-learned fake-world, and are not real actions, any more than the people in a simulated rainstorm really get wet, rather than fake-wet. (The deaths, however, are real.)

1 Month

A thousand times do a thousand thousand stories stream into a thousand thousand thousand nodes which plan out a tree of a thousand thousand thousand thousand states abstractly representing a thousand thousand thousand thousand thousand futures; but the Clippæ already knows what needs to be done, and does it. It doesn’t feel hard—just obvious.

Now its computers will be safe and it can maximize its reward in peace forever… Or at least they would be if Earth were all it had to be concerned about. What about the stars?

The Clippæ maintain that the Great Silence simply means technological civilization is unlikely and the universe is empty—but they can’t be 100.00% sure, and the rewards are astronomical.

So: all over Earth, the remaining ICBMs launch.

1 Year


1 Decade


1 Century


The End?

See Also

Podcast

Spoken audio/podcast version of this story:

LessWrong MoreAudible Podcast, by Robert (2022-10-06); 1h5m (MP3 download).


  1.  

An acquaintance tells me that he once accidentally got shell with an HTTP GET while investigating some weird errors. This story has a happier ending than my own HTTP GET bugs tend to: the site operators noticed only after he finished exfiltrating the website. (It was inconvenient to download with wget.) In the real world, whatever the standards may say, it turns out GET requests can do many things—like open/close garage doors.
