{"id":77853,"date":"2022-08-25T19:43:58","date_gmt":"2022-08-25T19:43:58","guid":{"rendered":"https:\/\/80000hours.org\/?post_type=problem_profile&#038;p=77853"},"modified":"2024-11-28T12:39:07","modified_gmt":"2024-11-28T12:39:07","slug":"artificial-intelligence","status":"publish","type":"problem_profile","link":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/","title":{"rendered":"Preventing an AI-related catastrophe"},"content":{"rendered":"<div id=\"toc_container\" class=\"toc_white no_bullets\"><p class=\"toc_title\">Table of Contents<\/p><ul class=\"toc_list\"><li><a href=\"#experts-are-concerned\"><span class=\"toc_number toc_depth_1\">1<\/span> 1. Many AI experts think there&#8217;s a non-negligible chance AI will lead to outcomes as bad as extinction<\/a><\/li><li><a href=\"#making-advances-extremely-quickly\"><span class=\"toc_number toc_depth_1\">2<\/span> 2. We&#8217;re making advances in AI extremely quickly<\/a><ul><li><a href=\"#scaling\"><span class=\"toc_number toc_depth_2\">2.1<\/span> Current trends show rapid progress in the capabilities of ML systems<\/a><\/li><li><a href=\"#when-can-we-expect-to-develop-transformative-AI\"><span class=\"toc_number toc_depth_2\">2.2<\/span> When can we expect transformative AI?<\/a><\/li><\/ul><\/li><li><a href=\"#power-seeking-ai\"><span class=\"toc_number toc_depth_1\">3<\/span> 3. 
Power-seeking AI could pose an existential threat to humanity<\/a><ul><li><a href=\"#aps-systems\"><span class=\"toc_number toc_depth_2\">3.1<\/span> It&#8217;s likely we&#8217;ll build advanced planning systems<\/a><\/li><li><a href=\"#instrumental-convergence\"><span class=\"toc_number toc_depth_2\">3.2<\/span> Advanced planning systems could easily be dangerously &#8216;misaligned&#8217;<\/a><\/li><li><a href=\"#lost-control\"><span class=\"toc_number toc_depth_2\">3.3<\/span> Disempowerment by AI systems would be an existential catastrophe<\/a><\/li><li><a href=\"#incentives-and-deception\"><span class=\"toc_number toc_depth_2\">3.4<\/span> People might deploy misaligned AI systems despite the risk<\/a><\/li><\/ul><\/li><li><a href=\"#this-all-sounds-very-abstract-what-could-an-existential-catastrophe-caused-by-ai-actually-look-like\"><span class=\"toc_number toc_depth_1\">4<\/span> This all sounds very abstract. What could an existential catastrophe caused by AI actually look like?<\/a><\/li><li><a href=\"#other-risks\"><span class=\"toc_number toc_depth_1\">5<\/span> 4. 
Even if we find a way to avoid power-seeking, there are still risks<\/a><ul><li><a href=\"#bioweapons\"><span class=\"toc_number toc_depth_2\">5.1<\/span> Bioweapons<\/a><\/li><li><a href=\"#intentionally-dangerous-ai-agents\"><span class=\"toc_number toc_depth_2\">5.2<\/span> Intentionally dangerous AI agents<\/a><\/li><li><a href=\"#cyberweapons\"><span class=\"toc_number toc_depth_2\">5.3<\/span> Cyberweapons<\/a><\/li><li><a href=\"#dangerous-new-technology\"><span class=\"toc_number toc_depth_2\">5.4<\/span> Other dangerous tech<\/a><\/li><li><a href=\"#ai-could-empower-totalitarian-governments\"><span class=\"toc_number toc_depth_2\">5.5<\/span> AI could empower totalitarian governments<\/a><\/li><li><a href=\"#artificial-intelligence-and-war\"><span class=\"toc_number toc_depth_2\">5.6<\/span> AI could worsen war<\/a><\/li><li><a href=\"#other-risks-from-ai\"><span class=\"toc_number toc_depth_2\">5.7<\/span> Other risks from AI<\/a><\/li><\/ul><\/li><li><a href=\"#how-likely-is-an-AI-related-catastrophe\"><span class=\"toc_number toc_depth_1\">6<\/span> So, how likely is an AI-related catastrophe?<\/a><\/li><li><a href=\"#we-can-tackle-these-risks\"><span class=\"toc_number toc_depth_1\">7<\/span> 5. We can tackle these risks<\/a><ul><li><a href=\"#technical-ai-safety-research\"><span class=\"toc_number toc_depth_2\">7.1<\/span> Technical AI safety research<\/a><\/li><li><a href=\"#ai-governance-and-policy\"><span class=\"toc_number toc_depth_2\">7.2<\/span> AI governance and policy<\/a><\/li><\/ul><\/li><li><a href=\"#neglectedness\"><span class=\"toc_number toc_depth_1\">8<\/span> 6. 
This work is neglected<\/a><\/li><li><a href=\"#best-arguments-against-this-problem-being-pressing\"><span class=\"toc_number toc_depth_1\">9<\/span> What do we think are the best arguments against this problem being pressing?<\/a><\/li><li><a href=\"#good-responses\"><span class=\"toc_number toc_depth_1\">10<\/span> Arguments against working on AI risk to which we think there are strong responses<\/a><\/li><li><a href=\"#what-can-you-do-concretely-to-help\"><span class=\"toc_number toc_depth_1\">11<\/span> What you can do concretely to help<\/a><ul><li><a href=\"#technical-ai-safety\"><span class=\"toc_number toc_depth_2\">11.1<\/span> Technical AI safety<\/a><\/li><li><a href=\"#ai-governance-and-policy-work\"><span class=\"toc_number toc_depth_2\">11.2<\/span> AI governance and policy work<\/a><\/li><li><a href=\"#complementary-yet-crucial-roles\"><span class=\"toc_number toc_depth_2\">11.3<\/span> Complementary (yet crucial) roles<\/a><\/li><li><a href=\"#other-ways-to-help\"><span class=\"toc_number toc_depth_2\">11.4<\/span> Other ways to help<\/a><\/li><li><a href=\"#want-one-on-one-advice-on-pursuing-this-path\"><span class=\"toc_number toc_depth_2\">11.5<\/span> Want one-on-one advice on pursuing this path?<\/a><\/li><li><a href=\"#find-vacancies-on-our-job-board\"><span class=\"toc_number toc_depth_2\">11.6<\/span> Find vacancies on our job board<\/a><\/li><\/ul><\/li><li><a href=\"#top-resources-to-learn-more\"><span class=\"toc_number toc_depth_1\">12<\/span> Top resources to learn more<\/a><\/li><li><a href=\"#acknowledgements\"><span class=\"toc_number toc_depth_1\">13<\/span> Acknowledgements<\/a><\/li><\/ul><\/div>\n<div class=\"well bg-gray-lighter margin-bottom margin-top padding-top-small padding-bottom-small\">\n<p class=\"small\"><a id=\"note-from-the-author\" class=\"link-anchor\"><\/a> <strong>Note from the author:<\/strong> At its core, this problem profile tries to predict the future of technology. 
This is a notoriously difficult thing to do. In addition, there has been much less rigorous research into the risks from AI than into the other risks 80,000 Hours writes about (like <a href=\"\/preventing-catastrophic-pandemics\/\">pandemics<\/a> or <a href=\"\/problem-profiles\/climate-change\/\">climate change<\/a>). That said, there is a growing field of research into the topic, which I&#8217;ve tried to reflect. For this article I&#8217;ve leaned especially on <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2206.13353\">this report<\/a> by Joseph Carlsmith at <a href=\"https:\/\/www.openphilanthropy.org\/\">Open Philanthropy<\/a> (also available as a <a href=\"https:\/\/open.spotify.com\/episode\/5PokyqXCw4hpV5u0rc5Lio\">narration<\/a>), as it&#8217;s the most rigorous overview of the risk that I could find. I&#8217;ve also had the article reviewed by <a href=\"#acknowledgements\">over 30 people with different expertise and opinions on the topic<\/a>. (Almost all are concerned about advanced AI&#8217;s potential impact.) <\/p>\n<\/div>\n<p>Why do we think that reducing risks from AI is one of the most pressing issues of our time? 
In short, our reasons are:<\/p>\n<ol>\n<li>Even before getting into the actual arguments, we can see some cause for concern \u2014 as <a href=\"#experts-are-concerned\">many AI experts think there&#8217;s a small but non-negligible chance that AI will lead to outcomes as bad as human extinction<\/a>.<\/li>\n<li><a href=\"#making-advances-extremely-quickly\">We&#8217;re making advances in AI extremely quickly<\/a> \u2014 which suggests that AI systems could have a significant influence on society, soon.<\/li>\n<li>There are strong arguments that <a href=\"#power-seeking-ai\">&#8220;power-seeking&#8221; AI could pose an existential threat to humanity<\/a> \u2014 which we&#8217;ll go through below.<\/li>\n<li><a href=\"#other-risks\">Even if we find a way to avoid power-seeking, there are still other risks<\/a>.<\/li>\n<li>We think <a href=\"#we-can-tackle-these-risks\">we can tackle these risks<\/a>.<\/li>\n<li><a href=\"#neglectedness\">This work is neglected<\/a>.<\/li>\n<\/ol>\n<p>We&#8217;re going to cover each of these in turn, then consider <a href=\"#best-arguments-against-this-problem-being-pressing\">some of the best counterarguments<\/a>, explain <a href=\"#what-can-you-do-concretely-to-help\">concrete things you can do to help<\/a>, and finally <a href=\"#top-resources-to-learn-more\">outline some of the best resources for learning more about this area<\/a>.<\/p>\n<p>If you&#8217;d like, you can watch our 10-minute video summarising the case for AI risk before reading further:<\/p>\n<div id=\"youtube-qzyEgZwfkKY\" class=\"wrap-video\">\n<div class=\"lazyYT\" data-youtube-id=\"qzyEgZwfkKY\" data-ratio=\"16:9\" data-parameters=\"modestbranding=1&show_info=1&theme=light\" data-autoplay=\"0\"><\/div>\n<\/div>\n<h2><span id=\"experts-are-concerned\" class=\"toc-anchor\"><\/span>1. 
Many AI experts think there&#8217;s a non-negligible chance AI will lead to outcomes as bad as extinction<\/h2>\n<p>In May 2023, hundreds of prominent AI scientists \u2014 and other notable figures \u2014 signed a statement saying that <a href=\"https:\/\/www.safe.ai\/statement-on-ai-risk\">mitigating the risk of extinction from AI should be a global priority<\/a>.<\/p>\n<p>So it&#8217;s pretty clear that at least some experts are concerned.<\/p>\n<p>But how concerned are they? And is this just a fringe view?<\/p>\n<p>We looked at four surveys of AI researchers who published at NeurIPS and ICML (two of the most prestigious machine learning conferences) from 2016, 2019, 2022 and 2023.<\/p>\n<p>It&#8217;s important to note that there could be considerable selection bias on surveys like this. For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns.<\/p>\n<p>All that said, here&#8217;s what we found:<\/p>\n<p>In all four surveys, the median researcher thought that the chance that AI would be &#8220;extremely good&#8221; was reasonably high: 20% in the 2016 survey, 20% in 2019, 10% in 2022, and 10% in 2023.<\/p>\n<p>Indeed, AI systems are already having substantial positive effects \u2014 for example, in <a href=\"https:\/\/web.archive.org\/web\/20221013010307\/https:\/\/www.deepmind.com\/blog\/announcing-deepmind-health-research-partnership-with-moorfields-eye-hospital\">medical care<\/a> or <a href=\"https:\/\/ought.org\/elicit\">academic research<\/a>.<\/p>\n<p>But in all four surveys, the median researcher also estimated small \u2014 and certainly not negligible \u2014 chances that AI would be &#8220;extremely bad (e.g. 
human extinction)&#8221;: a 5% chance of extremely bad outcomes in the 2016 survey, 2% in 2019, 5% in 2022 and 5% in 2023. <\/p>\n<p>In the 2022 survey, participants were specifically asked about the chances of existential catastrophe caused by future AI advances \u2014 and again, <em>over half of researchers thought the chances of an existential catastrophe was greater than 5%<\/em>.<\/p>\n<p>So experts disagree on the degree to which AI poses an existential risk \u2014 a kind of threat we&#8217;ve <a href=\"https:\/\/80000hours.org\/articles\/existential-risks\/\">argued<\/a> deserves serious moral weight.<\/p>\n<p>This fits with our understanding of the state of the research field. Three of the leading companiess developing AI \u2014 DeepMind, Anthropic and OpenAI \u2014 also have teams dedicated to figuring out how to solve technical safety issues that we believe could, for reasons we discuss at length below, lead to an <a href=\"\/articles\/existential-risks\/\">existential threat<\/a> to humanity.<\/p>\n<p>There are also several academic research groups (including at <a href=\"https:\/\/people.csail.mit.edu\/dhm\/\">MIT<\/a>, <a href=\"https:\/\/www.davidscottkrueger.com\/\">Cambridge<\/a>, <a href=\"https:\/\/www.cs.cmu.edu\/~focal\/\">Carnegie Mellon University<\/a>, and <a href=\"https:\/\/humancompatible.ai\/\">UC Berkeley<\/a>) focusing on these same technical AI safety problems.<\/p>\n<p>It&#8217;s hard to know exactly what to take from all this, but we&#8217;re confident that it&#8217;s not a fringe position in the field to think that there is a material risk of outcomes as bad as an existential catastrophe. Some experts in the field maintain, though, that the risks are overblown.<\/p>\n<p>Still, why do we side with those who are more concerned? 
In short, it&#8217;s because there are arguments we&#8217;ve found persuasive that AI could pose such an existential threat \u2014 arguments we will go through step by step below.<\/p>\n<p>It&#8217;s important to recognise that the fact that many experts recognise there&#8217;s a problem doesn&#8217;t mean that everything&#8217;s OK because the experts have got it covered. Overall, we think this problem remains highly neglected (more on this <a href=\"#neglectedness\">below<\/a>), especially as billions of dollars a year are spent to make AI more advanced.<\/p>\n<h2><span id=\"making-advances-extremely-quickly\" class=\"toc-anchor\"><\/span>2. We&#8217;re making advances in AI extremely quickly<\/h2>\n<figure class=\"wp-caption\" >\n<img decoding=\"async\" src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2024\/03\/Screenshot-2024-03-18-at-12.44.19\u202fAM.png\" alt=\"Three cats dressed as computer programmers generated by different AI software.\"><figcaption >&#8220;<em>A cat dressed as a computer programmer<\/em>&#8221; as generated by <a href=\"https:\/\/www.craiyon.com\/\">Craiyon (formerly DALL-E mini)<\/a> (top left), OpenAI&#8217;s <a href=\"https:\/\/openai.com\/dall-e-2\/\">DALL-E 2.<\/a> (top right), and <a href=\"https:\/\/docs.midjourney.com\/docs\/models\">Midjourney V6<\/a>. DALL-E mini uses a model <a href=\"https:\/\/web.archive.org\/web\/20221013010717\/https:\/\/wandb.ai\/dalle-mini\/dalle-mini\/reports\/DALL-E-Mini-Explained--Vmlldzo4NjIxODA\">27 times smaller<\/a> than OpenAI&#8217;s DALL-E 1 model, released in January 2021. DALL-E 2 was released in April 2022. 
Midjourney released the sixth version of its model in December 2023.<\/figcaption><\/figure>\n<p>Before we try to figure out what the future of AI might look like, it&#8217;s helpful to take a look at what AI can already do.<\/p>\n<p>Modern AI techniques involve <a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine_learning\"><em>machine learning<\/em><\/a> (ML): models that improve automatically through data input. The most common form of this technique used today is known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Deep_learning\">deep learning<\/a>.<\/p>\n<aside class=\"well well-person pull-right clearfix   padding-top-small padding-bottom-small\">\n<h4 class=\"no-margin-top\">What is deep learning?<\/h4>\n<p><a id=\"what-is-deep-learning\" class=\"link-anchor\"><\/a><\/p>\n<p>Machine learning techniques, in general, take some input data and produce some outputs, in a way that depends on some parameters in the model, which are learned automatically rather than being specified by programmers.<\/p>\n<p>Most of the recent advances in machine learning use <em>neural networks<\/em>. A neural network transforms input data into output data by passing it through several hidden &#8216;layers&#8217; of simple calculations, with each layer made up of &#8216;neurons.&#8217; Each neuron receives data from the previous layer, performs some calculation based on its parameters (basically some numbers specific to that neuron), and passes the result on to the next layer.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/06\/768px-Neural_network_example.svg_.png\" alt=\"A neural network with a single hidden layer\" title=\"\" class=\"alignright\" \/><\/p>\n<p>The engineers developing the network will choose some measure of success for the network (known as a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Loss_function\">&#8216;loss&#8217; or &#8216;objective&#8217; function<\/a>). 
The degree to which the network is successful (according to the measure chosen) will depend on the exact values of the parameters for each neuron on the network.<\/p>\n<p>The network is then <em>trained<\/em> using a large quantity of data. By using an optimisation algorithm (most commonly <a href=\"https:\/\/en.wikipedia.org\/wiki\/Stochastic_gradient_descent\">stochastic gradient descent<\/a>), the parameters of each neuron are gradually tweaked each time the network is tested against the data using the loss function. The optimisation algorithm will (generally) make the neural network perform slightly better each time the parameters are tweaked. Eventually, the engineers will end up with a network that performs pretty well on the measure chosen.<\/p>\n<p><em>Deep learning<\/em> refers to the use of neural networks with many layers.<\/p>\n<p>To learn more, we recommend:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.youtube.com\/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi\">3Blue1Brown&#8217;s YouTube series on neural networks<\/a>, an excellent video introduction<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013010903\/https:\/\/www.alignmentforum.org\/posts\/qE73pqxAZmeACsAdF\/a-short-introduction-to-machine-learning\">A short introduction to machine learning<\/a> by Richard Ngo, a short blog post giving an overview of the topic<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013010851\/https:\/\/medium.com\/machine-learning-for-humans\/why-machine-learning-matters-6164faf1df12\">Machine learning for humans<\/a> by Vishal Maini and Samer Sabri, a longer but accessible introduction to machine learning<\/li>\n<\/ul>\n<\/aside>\n<p>Probably the most well-known ML-based product is <a href=\"https:\/\/chat.openai.com\/\">ChatGPT<\/a>. 
OpenAI&#8217;s commercialisation system \u2014 where you can pay for a much more powerful version of the product \u2014 led to <a href=\"https:\/\/www.ft.com\/content\/81ac0e78-5b9b-43c2-b135-d11c47480119\">revenue of over $2 billion by the end of 2023<\/a>, making OpenAI one of the fastest growing startups ever.<\/p>\n<p>If you&#8217;ve used ChatGPT, you may have been a bit underwhelmed. After all \u2014 while it&#8217;s great at some tasks, like coding and data analysis \u2014 it makes <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hallucination_(artificial_intelligence)\">lots of mistakes<\/a>. (Though note that the paid version tends to perform better than the free version.)<\/p>\n<p>But we shouldn&#8217;t expect the frontier of AI to remain at the level of ChatGPT. There has been huge progress in what can be achieved with ML in only the last few years. Here are a few examples (from less recent to more recent):<\/p>\n<ul>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013011100\/https:\/\/www.deepmind.com\/blog\/alphastar-mastering-the-real-time-strategy-game-starcraft-ii\">AlphaStar<\/a>, which can beat top professional players at StarCraft II (January 2019)<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013011105\/https:\/\/www.deepmind.com\/blog\/muzero-mastering-go-chess-shogi-and-atari-without-rules\">MuZero<\/a>, a single system that learned to win games of chess, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shogi\">shogi<\/a>, and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Go_(game)\">Go<\/a> \u2014 without ever being told the rules (November 2019)<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013011121\/https:\/\/openai.com\/blog\/formal-math\/\">GPT-f<\/a>, which can solve some Maths Olympiad problems (September 2020)<\/li>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/AlphaFold\">AlphaFold 2<\/a>, a huge step forward in solving the long-perplexing <a 
href=\"https:\/\/web.archive.org\/web\/20221011212305\/https:\/\/www.deepmind.com\/blog\/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology\">protein-folding problem<\/a> (July 2021)<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013011518\/https:\/\/www.deepmind.com\/publications\/a-generalist-agent\">Gato<\/a>, a single ML model capable of doing a huge number of different things (including playing Atari, captioning images, chatting, and stacking blocks with a real robot arm), deciding what it should output based on the context (May 2022)<\/li>\n<li><a href=\"https:\/\/www.midjourney.com\/home\">Midjourney V6<\/a> (December 2023), <a href=\"https:\/\/stability.ai\/stable-image\">Stable Diffusion XL<\/a> (July 2023), <a href=\"https:\/\/openai.com\/dall-e-2\/\">DALL-E 3<\/a> (August 2023) and <a href=\"https:\/\/deepmind.google\/technologies\/imagen-2\/\">Imagen 2<\/a> (December 2023), all of which are capable of generating high-quality images from written descriptions<\/li>\n<li><a href=\"https:\/\/openai.com\/sora\">Sora<\/a> (February 2024), a model from OpenAI that can create realistic video from text prompts<\/li>\n<li>Large language models, such as <a href=\"https:\/\/openai.com\/gpt-4\">GPT-4<\/a>, <a href=\"https:\/\/www.anthropic.com\/claude\">Claude<\/a>, and <a href=\"https:\/\/deepmind.google\/technologies\/gemini\/#introduction\">Gemini<\/a> \u2014 which we&#8217;ve become so familiar with through chatbots \u2014 which continue to <a href=\"https:\/\/blog.google\/technology\/ai\/google-gemini-ai\/#performance\">surpass benchmarks<\/a> on maths, code, general knowledge, and reasoning ability.<\/li>\n<\/ul>\n<p>If you&#8217;re anything like us, you found the complexity and breadth of the tasks these systems can carry out surprising.<\/p>\n<p>And if the technology keeps advancing at this pace, it seems clear there will be major effects on society. At the very least, automating tasks makes carrying out those tasks cheaper. 
As a result, we may see rapid increases in economic growth (perhaps even to the level we saw <a href=\"https:\/\/web.archive.org\/web\/20221013011653\/https:\/\/www.britannica.com\/topic\/productivity\/Historical-trends\">during the Industrial Revolution<\/a>).<\/p>\n<p>If we&#8217;re able to partially or fully <a href=\"https:\/\/web.archive.org\/web\/20221013011707\/https:\/\/www.cold-takes.com\/transformative-ai-timelines-part-1-of-4-what-kind-of-ai\/\">automate scientific advancement<\/a> we may see more transformative changes to society and technology.<\/p>\n<p>That could be just the beginning. We may be able to get computers to <a href=\"https:\/\/web.archive.org\/web\/20221013011716\/https:\/\/www.cold-takes.com\/how-digital-people-could-change-the-world\/\">eventually automate <em>anything<\/em> humans can do<\/a>. This seems like it has to be possible \u2014 at least in principle. This is because it seems that, with enough power and complexity, a computer should be able to simulate the human brain. This would itself be <em>a<\/em> way of automating anything humans can do (if not the most efficient method of doing so).<\/p>\n<p>And as we&#8217;ll see in the next section, there are some indications that extensive automation may well be possible through scaling up existing techniques.<\/p>\n<h3><span id=\"scaling\" class=\"toc-anchor\"><\/span>Current trends show rapid progress in the capabilities of ML systems<\/h3>\n<p>There are three things that are crucial to building AI through machine learning:<\/p>\n<ol>\n<li>Good algorithms (e.g. 
more efficient algorithms are better)<\/li>\n<li>Data to train an algorithm<\/li>\n<li>Enough computational power (known as <em>compute<\/em>) to do this training<\/li>\n<\/ol>\n<p><a href=\"https:\/\/epoch.ai\/blog\/epoch-impact-report-2023\">Epoch<\/a> is a team of scientists investigating trends in the development of advanced AI \u2014 in particular, how these three inputs are changing over time.<\/p>\n<p>They found that the <a href=\"https:\/\/epoch.ai\/trends#compute-trends-section\">amount of compute used for training<\/a> the largest AI models has been rising exponentially \u2014 doubling on average every six months since 2010.<\/p>\n<p>That means the amount of computational power used to train our largest machine learning models has grown by over one billion times.<\/p>\n<p><iframe src=\"https:\/\/ourworldindata.org\/grapher\/ai-training-computation\" loading=\"lazy\" style=\"width: 100%; height: 600px; border: 0px none;\"><\/iframe><\/p>\n<p>Epoch also looked at how much compute has been needed to train a neural network to have the same performance on <a href=\"https:\/\/www.image-net.org\/\">ImageNet<\/a> (a well-known test data set for computer vision).<\/p>\n<p>They found that the amount of compute required for the same performance has been falling exponentially \u2014 <a href=\"https:\/\/epoch.ai\/blog\/revisiting-algorithmic-progress\">halving every 10 months<\/a>.<\/p>\n<p>So since 2012, the amount of compute required for the same level of performance has fallen by over 10,000 times. 
Combined with the increased compute used for training, that&#8217;s a lot of growth.<\/p>\n<p>Finally, they found that the size of the data sets used to train the largest language models has been doubling <a href=\"https:\/\/epoch.ai\/trends#data-trends-section\">roughly once a year since 2010<\/a>.<\/p>\n<p>It&#8217;s hard to say whether these trends will continue, but they speak to incredible gains over the past decade in what it&#8217;s possible to do with machine learning.<\/p>\n<p>Indeed, it looks like increasing the size of models (and the amount of compute used to train them) introduces <a href=\"https:\/\/web.archive.org\/web\/20221013012207\/https:\/\/www.alignmentforum.org\/posts\/XusDPpXr6FYJqWkxh\/an-156-the-scaling-hypothesis-a-plan-for-building-agi\">ever more sophisticated behaviour<\/a>. This is how things like GPT-4 are able to perform tasks they weren&#8217;t specifically trained for.<\/p>\n<p>These observations have led to the <a href=\"https:\/\/web.archive.org\/web\/20221013012219\/https:\/\/www.gwern.net\/Scaling-hypothesis\"><em>scaling hypothesis<\/em><\/a>: that we can simply build bigger and bigger neural networks, and as a result we will end up with more and more powerful artificial intelligence, and that this trend of increasing capabilities may continue all the way to human-level AI and beyond.<\/p>\n<p>If this is true, we can attempt to predict how the capabilities of AI technology will increase over time simply by looking at how quickly we are increasing the amount of compute available to train models.<\/p>\n<p>But as we&#8217;ll see, it&#8217;s not just the scaling hypothesis that suggests we could end up with extremely powerful AI relatively soon \u2014 other methods of predicting AI progress come to similar conclusions.<\/p>\n<h3><span id=\"when-can-we-expect-to-develop-transformative-AI\" class=\"toc-anchor\"><\/span>When can we expect transformative AI?<\/h3>\n<p>It&#8217;s difficult to predict exactly when we will develop AI that we expect 
to be <em>hugely transformative<\/em> for society (for better or for worse) \u2014 for example, by automating all human work or drastically changing the structure of society. But here we&#8217;ll go through a few approaches.<\/p>\n<p>One option is to survey experts. Data from the <a href=\"https:\/\/arxiv.org\/abs\/2401.02843v1\">2023 survey of 3000 AI experts<\/a> implies there is a 33% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, a 50% probability by 2047, and an 80% probability by 2100. There are a lot of reasons to be suspicious of these estimates, but we take it as one data point.<\/p>\n<p>Ajeya Cotra (a researcher at Open Philanthropy) attempted to forecast transformative AI <a href=\"https:\/\/web.archive.org\/web\/20221013013042\/https:\/\/www.cold-takes.com\/forecasting-transformative-ai-the-biological-anchors-method-in-a-nutshell\/\">by comparing modern deep learning to the human brain<\/a>. <a href=\"#what-is-deep-learning\">Deep learning<\/a> involves using a huge amount of compute to <em>train<\/em> a model, before that model is able to perform some task. There&#8217;s also a relationship between the amount of compute used to train a model and the amount used by the model when it&#8217;s run. And \u2014 if the <a href=\"#scaling\">scaling hypothesis<\/a> is true \u2014 we should expect the performance of a model to predictably improve as the computational power used increases. So Cotra used a variety of approaches (including, for example, estimating how much compute the human brain uses on a variety of tasks) to estimate how much compute might be needed to train a model that, when run, could carry out the hardest tasks humans can do. 
She then estimated when using that much compute would be affordable.<\/p>\n<p>Cotra&#8217;s <a href=\"https:\/\/web.archive.org\/web\/20221013013123\/https:\/\/www.alignmentforum.org\/posts\/AfH2oPHCApdKicM4m\/two-year-update-on-my-personal-ai-timelines\">2022 update on her report&#8217;s conclusions<\/a> estimates that there is a 35% probability of transformative AI by 2036, 50% by 2040, and 60% by 2050 \u2014 noting that these guesses are not stable.<\/p>\n<p>Tom Davidson (also a researcher at Open Philanthropy) wrote <a href=\"https:\/\/web.archive.org\/web\/20221013013059\/https:\/\/www.openphilanthropy.org\/research\/report-on-semi-informative-priors\/\">a report<\/a> to complement Cotra&#8217;s work. He attempted to figure out when we might expect to see transformative AI based only on looking at various types of research that transformative AI might be like (e.g. developing technology that&#8217;s the ultimate goal of a STEM field, or proving difficult mathematical conjectures), and how long it&#8217;s taken for each of these kinds of research to be completed in the past, given some quantity of research funding and effort.<\/p>\n<p>Davidson&#8217;s report estimates that, solely on this information, you&#8217;d think that there was an 8% chance of transformative AI by 2036, 13% by 2060, and 20% by 2100. However, Davidson doesn&#8217;t consider the actual ways in which AI has progressed since research started in the 1950s, and notes that it seems likely that the amount of effort we put into AI research will increase as AI becomes increasingly relevant to our economy. As a result, Davidson expects these numbers to be underestimates.<\/p>\n<p>Holden Karnofsky, co-CEO of Open Philanthropy, attempted to <a href=\"https:\/\/web.archive.org\/web\/20221013013107\/https:\/\/www.cold-takes.com\/where-ai-forecasting-stands-today\/\">sum up the findings of others&#8217; forecasts<\/a>. 
He guessed in 2021 there was more than a 10% chance we&#8217;d see transformative AI by 2036, 50% by 2060, and 66% by 2100. And these guesses might be conservative, since they didn&#8217;t incorporate what we see as <a href=\"https:\/\/80000hours.org\/2022\/08\/is-transformative-ai-coming-sooner-than-we-thought\/\">faster-than-expected progress since the earlier estimates were made<\/a>.<\/p>\n<div class=\"container--page-width\">\n<div class=\"row\">\n<div id=\"tablepress-189-scroll-wrapper\" class=\"tablepress-scroll-wrapper\">\n<table id=\"tablepress-189\" class=\"tablepress tablepress-id-189 tablepress-responsive\">\n<thead>\n<tr class=\"row-1 odd\">\n<th class=\"column-1\">Method<\/th>\n<th class=\"column-2\">Chance of transformative AI by 2036<\/th>\n<th class=\"column-3\">Chance of transformative AI by 2060<\/th>\n<th class=\"column-4\">Chance of transformative AI by 2100<\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-hover\">\n<tr class=\"row-2 even\">\n<td class=\"column-1\"><a href=\"https:\/\/arxiv.org\/abs\/2401.02843v1\">Expert survey (Grace et al., 2024)<\/a><\/td>\n<td class=\"column-2\">33%<\/td>\n<td class=\"column-3\">50% (by 2047)<\/td>\n<td class=\"column-4\">80%<\/td>\n<\/tr>\n<tr class=\"row-3 odd\">\n<td class=\"column-1\"><a href=\"https:\/\/arxiv.org\/abs\/2206.04132\">Expert survey (Zhang et al., 2022)<\/a><\/td>\n<td class=\"column-2\">20%<\/td>\n<td class=\"column-3\">50%<\/td>\n<td class=\"column-4\">85%<\/td>\n<\/tr>\n<tr class=\"row-4 even\">\n<td class=\"column-1\"><a href=\"https:\/\/www.alignmentforum.org\/posts\/AfH2oPHCApdKicM4m\/two-year-update-on-my-personal-ai-timelines\">Biological anchors (Cotra, 2022)<\/a><\/td>\n<td class=\"column-2\">35%<\/td>\n<td class=\"column-3\">60% (by 2050)<\/td>\n<td class=\"column-4\">80% (according to the <a href=\"https:\/\/www.cold-takes.com\/forecasting-transformative-ai-the-biological-anchors-method-in-a-nutshell\/\">2020 report<\/a>)<\/td>\n<\/tr>\n<tr class=\"row-5 odd\">\n<td 
class=\"column-1\"><a href=\"https:\/\/www.openphilanthropy.org\/blog\/report-semi-informative-priors\">Semi-informative priors (Davidson, 2021)<\/a><\/td>\n<td class=\"column-2\">8%<\/td>\n<td class=\"column-3\">13%<\/td>\n<td class=\"column-4\">20%<\/td>\n<\/tr>\n<tr class=\"row-6 even\">\n<td class=\"column-1\"><a href=\"https:\/\/www.cold-takes.com\/where-ai-forecasting-stands-today\/\">Overall guess (Karnofsky, 2021)<\/a><\/td>\n<td class=\"column-2\">10%<\/td>\n<td class=\"column-3\">50%<\/td>\n<td class=\"column-4\">66%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p><strong>All in all, AI seems to be advancing rapidly.<\/strong> More money and talent is going into the field every year, and models are getting bigger and more efficient.<\/p>\n<p>Even if AI were advancing more slowly, we&#8217;d be concerned about it \u2014 most of the arguments about the risks from AI (that we&#8217;ll get to below) do not depend on this rapid progress.<\/p>\n<p>However, the speed of these recent advances increases the urgency of the issue.<\/p>\n<p>(It&#8217;s totally possible that these estimates are wrong \u2013 below, we discuss how the possibility that we might have a lot of time to work on this problem <a href=\"#we-might-have-a-lot-of-time-to-work-on-this-problem\">is one of the best arguments against this problem being pressing<\/a>).<\/p>\n<h2><span id=\"power-seeking-ai\" class=\"toc-anchor\"><\/span>3. Power-seeking AI could pose an existential threat to humanity<\/h2>\n<p>We&#8217;ve argued so far that we expect AI to be an important \u2014 and potentially transformative \u2014 new technology.<\/p>\n<p>We&#8217;ve also seen reason to think that such transformative AI systems could be built this century.<\/p>\n<p>Now we&#8217;ll turn to the core question: <strong>why do we think this matters so much?<\/strong><\/p>\n<p>There <em>could<\/em> be a lot of reasons. 
If advanced AI is as transformative as it seems like it&#8217;ll be, there will be many important consequences. But here we are going to explain the issue that seems most concerning to us: <strong>AI systems could pose risks by seeking and gaining power.<\/strong><\/p>\n<p>We&#8217;ll argue that:<\/p>\n<ol>\n<li><a href=\"#aps-systems\">It&#8217;s likely that we&#8217;ll build AI systems that can make and execute plans to achieve goals<\/a><\/li>\n<li><a href=\"#instrumental-convergence\">Advanced planning systems could easily be &#8216;misaligned&#8217; \u2014 in a way that could lead them to make plans that involve disempowering humanity<\/a><\/li>\n<li><a href=\"#lost-control\">Disempowerment by AI systems would be an existential catastrophe<\/a><\/li>\n<li><a href=\"#incentives-and-deception\">People might deploy AI systems that are misaligned, despite this risk<\/a><\/li>\n<\/ol>\n<p>Thinking through each step, <strong>I think there&#8217;s <a href=\"#how-likely-is-an-AI-related-catastrophe\">something like a 1% chance<\/a> of an existential catastrophe resulting from power-seeking AI systems this century.<\/strong> This is my all things considered guess at the risk incorporating considerations of the argument in favour of the risk (which is itself probabilistic), as well as reasons why this argument might be wrong (some of which I discuss <a href=\"#best-arguments-against-this-problem-being-pressing\">below<\/a>). 
This puts me on the less worried end of 80,000 Hours staff, whose views on our last staff survey ranged from 1\u201355%, with a median of 15%.<\/p>\n<h3><span id=\"aps-systems\" class=\"toc-anchor\"><\/span>It&#8217;s likely we&#8217;ll build advanced planning systems<\/h3>\n<p>We&#8217;re going to argue that future systems with the following three properties might pose a particularly important threat to humanity:<\/p>\n<ol>\n<li>\n<p class=\"doNotRemove\"><strong>They have goals and are good at making plans.<\/strong> <\/p>\n<p>Not all AI systems have goals or make plans to achieve those goals. But some systems (like some chess-playing AI systems) can be thought of in this way. When discussing power-seeking AI, we&#8217;re considering <em>planning<\/em> systems that are relatively advanced, with plans that are in pursuit of some goal(s), and that are capable of carrying out those plans.<\/p>\n<\/li>\n<li>\n<p><strong>They have excellent <em>strategic awareness<\/em>.<\/strong><\/p>\n<p>A particularly good planning system would have a good enough understanding of the world to notice obstacles and opportunities that may help or hinder its plans, and respond to these accordingly. 
Following <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2206.13353\">Carlsmith<\/a>, we&#8217;ll call this <em>strategic awareness<\/em>, since it allows systems to strategise in a more sophisticated way.<\/p>\n<\/li>\n<li>\n<p><strong>They have highly <em>advanced capabilities<\/em> relative to today&#8217;s systems.<\/strong><\/p>\n<p>For these systems to actually affect the world, we need them to not just <em>make<\/em> plans, but also be good at all the specific tasks required to <em>execute<\/em> those plans.<\/p>\n<p>Since we&#8217;re worried about systems attempting to take power from humanity, we are particularly concerned about AI systems that might be better than humans on one or more tasks that grant <em>people<\/em> significant power when carried out well in today&#8217;s world.<\/p>\n<p>For example, people who are very good at persuasion and\/or manipulation are often able to gain power \u2014 so an AI being good at these things might also be able to gain power. <a href=\"\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/\">Other examples<\/a> might include hacking into other systems, tasks within scientific and engineering research, as well as business, military, or political strategy.<\/p>\n<\/li>\n<\/ol>\n<h4 id=\"build-systems\">These systems seem technically possible and we&#8217;ll have strong incentives to build them<\/h4>\n<p>As we saw <a href=\"#making-advances-extremely-quickly\">above<\/a>, we&#8217;ve already produced systems that are very good at carrying out specific tasks.<\/p>\n<p>We&#8217;ve also already produced rudimentary planning systems, like <a href=\"https:\/\/web.archive.org\/web\/20221013011100\/https:\/\/www.deepmind.com\/blog\/alphastar-mastering-the-real-time-strategy-game-starcraft-ii\">AlphaStar<\/a>, which skilfully plays the strategy game Starcraft, and <a 
href=\"https:\/\/web.archive.org\/web\/20221013011105\/https:\/\/www.deepmind.com\/blog\/muzero-mastering-go-chess-shogi-and-atari-without-rules\">MuZero<\/a>, which plays chess, shogi, and Go.<\/p>\n<p>We&#8217;re not sure whether these systems are producing plans <em>in pursuit of goals per se<\/em>, because we&#8217;re not sure exactly what it means to &#8220;have goals.&#8221; However, since they consistently <em>plan in ways that achieve goals<\/em>, it seems like they have goals in some sense.<\/p>\n<p>Moreover, some existing systems seem to actually represent goals as part of their neural networks.<\/p>\n<p>That said, planning in the real world (instead of games) is much more complex, and to date we&#8217;re not aware of any unambiguous examples of goal-directed planning systems, or systems that exhibit high degrees of strategic awareness.<\/p>\n<p>But as we&#8217;ve discussed, we expect to see further advances <a href=\"#when-can-we-expect-to-develop-transformative-AI\">within this century<\/a>. And we think these advances are likely to produce systems with all three of the above properties.<\/p>\n<p>That&#8217;s because we think that there are particularly strong incentives (like profit) to develop these kinds of systems. In short: because being able to plan to achieve a goal, and execute that plan, seems like a particularly powerful and general way of affecting the world.<\/p>\n<p>Getting things done \u2014 whether that&#8217;s a company selling products, a person buying a house, or a government developing policy \u2014 almost always seems to require these skills. One example would be assigning a powerful system a goal and expecting the system to achieve it \u2014 rather than having to guide it every step of the way. So planning systems seem likely to be (economically and politically) extremely useful.<\/p>\n<p>And if systems are extremely useful, there are likely to be big incentives to build them. 
For example, an AI that could plan the actions of a company by being given the goal to increase its profits (that is, an AI CEO) would likely provide significant wealth for the people involved \u2014 a direct incentive to produce such an AI.<\/p>\n<p>As a result, if we <em>can<\/em> build systems with these properties (and from what we know, it seems like we will be able to), it seems like we are <em>likely to do so<\/em>.<\/p>\n<h3><span id=\"instrumental-convergence\" class=\"toc-anchor\"><\/span>Advanced planning systems could easily be dangerously &#8216;misaligned&#8217;<\/h3>\n<p>There are reasons to think that these kinds of advanced planning AI systems will be <em>misaligned<\/em>. That is, they will aim to do things that we don&#8217;t want them to do.<\/p>\n<p>There are many reasons why systems might not be aiming to do exactly what we want them to do. For one thing, we don&#8217;t know how, using modern ML techniques, to give systems the precise goals we want (more <a href=\"#controlling-objectives\">here<\/a>).<\/p>\n<p>We&#8217;re going to focus specifically on some reasons why systems might <em>by default<\/em> be misaligned in such a way that they develop plans that pose risks to humanity&#8217;s ability to influence the world \u2014 even when we don&#8217;t want that influence to be lost.<\/p>\n<p>What do we mean by &#8220;by default&#8221;? Essentially, <strong>unless we actively find solutions to some (potentially quite difficult) problems, then it seems like we&#8217;ll create dangerously misaligned AI<\/strong>. (There are reasons this might be wrong \u2014 which we discuss <a href=\"#best-arguments-against-this-problem-being-pressing\">later<\/a>.)<\/p>\n<p><!-- This shortcode is a hacky fix for formatting issues caused by the deeply nested shortcodes (accordions, collapses) in this panel. It is closed with [close_panel_special_formatting]. 
--><\/p>\n<div class=\"panel clearfix\">\n<h4 id=\"misalignment\">Three examples of &#8220;misalignment&#8221; in a variety of systems<\/h4>\n<p>It&#8217;s worth noting that misalignment isn&#8217;t a purely theoretical possibility (or specific to AI) \u2014 we see misaligned goals in humans and institutions all the time, and have also seen examples of misalignment in AI systems.<\/p>\n<div class=\"panel-group\" id=\"custom-collapse-0\">\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-0\">Example 1: Winning elections<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-0\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Example 1: Winning elections\">\n<div class=\"panel-body\">\n<p>The democratic political framework is intended to ensure that politicians make decisions that benefit society. But what political systems <em>actually<\/em> reward is winning elections, so that&#8217;s what many politicians end up aiming for.<\/p>\n<p>This is a decent proxy goal \u2014 if you have a plan to improve people&#8217;s lives, they&#8217;re probably more likely to vote for you \u2014 but it isn&#8217;t perfect. 
As a result, politicians do things that aren&#8217;t clearly the best way of running a country, like <a href=\"https:\/\/web.archive.org\/web\/20221013013549\/https:\/\/www.nber.org\/books-and-chapters\/nber-macroeconomics-annual-2000-volume-15\/political-business-cycle-after-25-years\">raising taxes at the start of their term and cutting them right before elections<\/a>.<\/p>\n<p>That is to say, the things the system <em>does<\/em> are at least a little different from what we would, in a perfect world, <em>want<\/em> it to do: the system is misaligned.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-1\">Example 2: The profit incentive<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-1\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Example 2: The profit incentive\">\n<div class=\"panel-body\">\n<p>Companies have profit-making incentives. By producing more, and therefore helping people obtain goods and services at cheaper prices, companies make more money.<\/p>\n<p>This is sometimes a decent proxy for making the world better, but profit isn&#8217;t actually the same as the good of all of humanity (bold claim, we know). 
As a result, there are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Externality#Negative\">negative externalities<\/a>: for example, companies will pollute to make money despite this being worse for society overall.<\/p>\n<p>Again, we have a misaligned system, where the things the system does are at least a little different from what we would want it to do.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-2\">Example 3: Specification gaming in existing AI systems<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-2\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Example 3: Specification gaming in existing AI systems\">\n<div class=\"panel-body\">\n<p>DeepMind has documented examples of <a href=\"https:\/\/web.archive.org\/web\/20221013013556\/https:\/\/deepmindsafetyresearch.medium.com\/specification-gaming-the-flip-side-of-ai-ingenuity-c85bdb0deeb4\">specification gaming<\/a>: an AI doing well according to its specified reward function (which encodes our intentions for the system), but not doing what researchers intended.<\/p>\n<p>In one example, a robot arm was asked to grasp a ball. But the reward was specified in terms of whether humans thought the robot had been successful. 
As a result, the arm learned to hover between the ball and the camera, fooling the humans into thinking that it had grasped the ball.<\/p>\n<figure class=\"wp-caption\" >\n<img decoding=\"async\" src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/06\/specification_gaming_gif.gif\" alt=\"A simulated arm hovers between a ball and a camera.\"><figcaption >Source: <a href=\"https:\/\/web.archive.org\/web\/20221013013621\/https:\/\/openai.com\/blog\/deep-reinforcement-learning-from-human-preferences\/\">Christiano et al., 2017<\/a><\/figcaption><\/figure>\n<p>So we know it&#8217;s <em>possible<\/em> to create a misaligned AI system.<\/p>\n<\/div><\/div><\/div>\n<\/div>\n<\/div>\n<h4 id=\"instrumental-convergence-2\">Why these systems could (by default) be dangerously misaligned<\/h4>\n<p>Here&#8217;s the core argument of this article. We&#8217;ll use all three properties from earlier: planning ability, strategic awareness, and advanced capabilities.<\/p>\n<p>To start, we should realise that <strong>a planning system that has a goal will also develop &#8216;instrumental goals&#8217;<\/strong>: things that, if they occur, will make it easier to achieve an overall goal.<\/p>\n<p>We use instrumental goals in plans all the time. For example, a high schooler planning their career might think that getting into university will be helpful for their future job prospects. In this case, &#8220;getting into university&#8221; would be an instrumental goal.<\/p>\n<p>A <em>sufficiently advanced<\/em> AI planning system would also include instrumental goals in its overall plans.<\/p>\n<p>If a planning AI system also has enough <strong>strategic awareness<\/strong>, it will be able to identify facts about the real world (including potential things that would be obstacles to any plans), and plan in light of them. Crucially, these facts would include that access to resources (e.g. 
money, compute, influence) and greater capabilities \u2014 that is, forms of <em>power<\/em> \u2014 open up new, more effective ways of achieving goals.<\/p>\n<p>This means that, <em>by default<\/em>, advanced planning AI systems would have some worrying instrumental goals:<\/p>\n<ul>\n<li>Self-preservation \u2014 because a system is more likely to achieve its goals if it is still around to pursue them (in Stuart Russell&#8217;s <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/stuart-russell-human-compatible-ai\/\">memorable phrase<\/a>, &#8220;You can&#8217;t fetch the coffee if you&#8217;re dead&#8221;).<\/li>\n<li>Preventing any changes to the AI system&#8217;s goals \u2014 since changing its goals would lead to outcomes that are different from those it would achieve with its current goals.<\/li>\n<li>Gaining power \u2014 for example, by getting more resources and greater capabilities.<\/li>\n<\/ul>\n<p>Crucially, one clear way in which the AI can ensure that it will continue to exist (and not be turned off), and that its objectives will never be changed, would be to gain power over the humans who might affect it (we talk <a href=\"\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#actually-take-power\">here about how AI systems might actually be able to do that<\/a>).<\/p>\n<p>What&#8217;s more, <strong>the AI systems we&#8217;re considering have advanced capabilities<\/strong> \u2014 meaning they can do one or more tasks that grant <em>people<\/em> significant power when carried out well in today&#8217;s world. With such advanced capabilities, these instrumental goals will not be out of reach, and as a result, it seems like the AI system would use its advanced capabilities to get power as part of the plan&#8217;s execution. 
If we don&#8217;t want the AI systems we create to take power away from us, this would be a particularly dangerous form of misalignment.<\/p>\n<p>In the most extreme scenarios, a planning AI system with sufficiently advanced capabilities could successfully disempower us completely.<\/p>\n<p>As a (very non-rigorous) intuitive check on this argument, let&#8217;s try to apply it to humans.<\/p>\n<p>Humans have a variety of goals. For many of these goals, some form of power-seeking is advantageous: though not everyone seeks power, many people do (in the form of wealth or social or political status), because it&#8217;s useful for getting what they want. This is not catastrophic (usually!) because, as human beings:<\/p>\n<ul>\n<li>We generally feel bound by human norms and morality (even people who really want wealth usually aren&#8217;t willing to kill to get it).<\/li>\n<li>We aren&#8217;t <em>that<\/em> much more capable or intelligent than one another. So even in cases where people aren&#8217;t held back by morality, they&#8217;re not able to take over the world.<\/li>\n<\/ul>\n<p>(We discuss whether humans are <em>truly<\/em> power-seeking <a href=\"#wrong-about-power-seeking\">later<\/a>.)<\/p>\n<p>A sufficiently advanced AI wouldn&#8217;t have those limitations.<\/p>\n<h4>It might be hard to find ways to prevent this sort of misalignment<\/h4>\n<p>The point of all this isn&#8217;t to say that <em>any<\/em> advanced planning AI system will necessarily attempt to seek power. Instead, it&#8217;s to point out that, <em>unless we find a way to design systems that don&#8217;t have this flaw<\/em>, we&#8217;ll face significant risk.<\/p>\n<p>It seems more than plausible that we could create an AI system that <em>isn&#8217;t<\/em> misaligned in this way, and thereby prevent any disempowerment.
Here are some strategies we might take (plus, unfortunately, some reasons why they might be difficult in practice):<\/p>\n<ul>\n<li>\n<p class=\"doNotRemove\"><strong>Control the objectives of the AI system.<\/strong> <a id=\"controlling-objectives\" class=\"link-anchor\"><\/a> We may be able to design systems that simply don&#8217;t have objectives to which the above argument applies \u2014 and thus don&#8217;t incentivise power-seeking behaviour. For example, we could find ways to explicitly instruct AI systems not to harm humans, or find ways to reward AI systems (in training environments) for not engaging in specific kinds of power-seeking behaviour (and also find ways to ensure that this behaviour continues outside the training environment). <\/p>\n<p>Carlsmith gives two reasons why doing this seems particularly hard.<\/p>\n<p>First, for modern ML systems, we don&#8217;t get to explicitly state a system&#8217;s objectives \u2014 instead we reward (or punish) a system in a training environment so that it learns on its own. This raises a number of difficulties, one of which is <em>goal misgeneralisation<\/em>. Researchers have uncovered <a href=\"https:\/\/web.archive.org\/web\/20221013013904\/https:\/\/proceedings.mlr.press\/v162\/langosco22a.html\">real examples<\/a> of systems that <em>appear<\/em> to have learned to pursue a goal in the training environment, but then fail to generalise that goal when they operate in a new environment. This raises the possibility that we could think we&#8217;ve successfully trained an AI system not to seek power \u2014 but that the system would seek power anyway when deployed in the real world.<\/p>\n<p>Second, when we specify a goal to an AI system (or, when we can&#8217;t explicitly do that, when we find ways to reward or punish a system during training), we usually do this by giving the system a proxy by which outcomes can be measured (e.g. positive human feedback on a system&#8217;s achievement). 
But often those proxies don&#8217;t quite work. In general, we might expect that even if a proxy <em>appears<\/em> to correlate well with successful outcomes, it might not do so when that proxy is optimised for. (The examples <a href=\"#misalignment\">above<\/a> of politicians, companies, and the robot arm failing to grasp a ball are illustrations of this.) We&#8217;ll look at a more specific example of how problems with proxies could lead to an existential catastrophe <a href=\"\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#getting-what-you-measure\">here<\/a>.<\/p>\n<p>For more on the specific difficulty of controlling the objectives given to deep neural networks trained using self-supervised learning and reinforcement learning, we recommend OpenAI governance researcher Richard Ngo&#8217;s discussion of how <a href=\"https:\/\/www.alignmentforum.org\/posts\/KbyRPCAsWv5GtfrbG\/what-misalignment-looks-like-as-capabilities-scale#Realistic_training_processes_lead_to_the_development_of_misaligned_goals\">realistic training processes lead to the development of misaligned goals<\/a>.<\/p>\n<\/li>\n<li>\n<p><strong>Control the inputs into the AI system.<\/strong> AI systems will only develop plans to seek power if they have enough information about the world to realise that seeking power is indeed a way to achieve their goals.<\/p>\n<\/li>\n<li>\n<p><strong>Control the capabilities of the AI system.<\/strong> AI systems will likely only be able to carry out plans to seek power if they have sufficiently <a href=\"#aps-systems\">advanced capabilities<\/a> in skills that grant people significant power in today&#8217;s world.<\/p>\n<\/li>\n<\/ul>\n<p>But to make any strategy work, it will need to both:<\/p>\n<ul>\n<li><strong>Retain the usefulness of the AI systems<\/strong> &#8212; and so remain economically competitive with less safe systems.
Controlling the inputs and capabilities of AI systems will clearly have costs, so it seems hard to ensure that these controls, even if they&#8217;re developed, are actually used. But this is also a problem for controlling a system&#8217;s objectives. For example, we may be able to prevent power-seeking behaviour by ensuring that AI systems stop to check in with humans about any decisions they make. But these systems might be significantly slower and less immediately useful to people than systems that don&#8217;t stop to carry out these checks. As a result, there might still be incentives to use a faster, more initially effective misaligned system (we&#8217;ll look at incentives more in the <a href=\"#incentives-and-deception\">next section<\/a>).<\/p>\n<\/li>\n<li>\n<p><strong>Continue to work as the planning ability and strategic awareness of systems <a href=\"#scaling\">improve over time<\/a>.<\/strong> Some seemingly simple solutions (for example, trying to give a system a long list of things it isn&#8217;t allowed to do, like stealing money or physically harming humans) break down as the planning abilities of the systems increase. This is because, the more capable a system is at developing plans, the more likely it is to identify loopholes or failures in the safety strategy \u2014 and as a result, the more likely the system is to develop a plan that involves power-seeking.<\/p>\n<\/li>\n<\/ul>\n<p>Ultimately, by looking at the state of the research on this topic, and speaking to experts in the field, we think that there are currently no <em>known<\/em> ways of building aligned AI systems that seem likely to fulfil both these criteria.<\/p>\n<p>So: that&#8217;s the core argument. There are <a href=\"https:\/\/web.archive.org\/web\/20221013014042\/https:\/\/bayes.net\/prioritising-ai\/\">many variants of this argument<\/a>. 
Some have argued that AI systems might gradually shape our future via subtler forms of influence that nonetheless could amount to an existential catastrophe; others argue that the most likely form of disempowerment is in fact just killing everyone. We&#8217;re not sure how a catastrophe would be most likely to play out, but have tried to articulate the heart of the argument, as we see it: that AI presents an existential risk.<\/p>\n<p><strong>There are definitely reasons this argument might not be right!<\/strong> We go through some of the reasons that seem strongest to us <a href=\"#wrong-about-power-seeking\">below<\/a>. But overall it seems possible that, for at least some kinds of advanced planning AI systems, it will be harder to build systems that don&#8217;t seek power in this dangerous way than to build systems that do.<\/p>\n<div class=\"well bg-gray-lighter margin-bottom margin-top padding-top-small padding-bottom-small\">\n<p><strong>At this point, you may have questions like:<\/strong><\/p>\n<ul>\n<li><a href=\"#why-cant-we-just-unplug-a-dangerous-ai\">Why can&#8217;t we just unplug a dangerous AI?<\/a><\/li>\n<li><a href=\"#surely-a-truly-intelligent-ai-system-would-know-not-to-disempower-everyone\">Surely a <em>truly<\/em> intelligent AI system would know not to disempower everyone?<\/a><\/li>\n<li><a href=\"#couldnt-we-just-sandbox-any-potentially-dangerous-ai-system-until-we-know-its-safe\">Couldn&#8217;t we just &#8216;sandbox&#8217; any potentially dangerous AI system until we know it&#8217;s safe?<\/a><\/li>\n<\/ul>\n<p>We think there are good responses to all these questions, so we&#8217;ve added a long list of arguments against working on AI risk &#8212; and our responses &#8212; for these (and other) questions <a href=\"#good-responses\">below<\/a>.<\/p>\n<\/div>\n<h3><span id=\"lost-control\" class=\"toc-anchor\"><\/span>Disempowerment by AI systems would be an existential catastrophe<\/h3>\n<p>When we say we&#8217;re concerned about <a 
href=\"\/articles\/existential-risks\">existential catastrophes<\/a>, we&#8217;re not <em>just<\/em> concerned about risks of extinction. This is because the source of our concern is rooted in <a href=\"\/articles\/future-generations\/\">longtermism<\/a>: the idea that the lives of all future generations matter, and so it&#8217;s extremely important to protect their interests.<\/p>\n<p>This means that any event that could prevent all future generations from living lives full of whatever you think makes life valuable (whether that&#8217;s happiness, justice, beauty, or general flourishing) counts as an existential catastrophe.<\/p>\n<p>It seems extremely unlikely that we&#8217;d be able to regain power over a system that successfully disempowers humanity. And as a result, the entirety of the future \u2014 everything that happens for Earth-originating life, for the rest of time \u2014 would be determined by the goals of systems that, although built by us, are not aligned with us. Perhaps those goals will create a <a href=\"https:\/\/80000hours.org\/articles\/future-generations\/\">long and flourishing future<\/a>, but we see little reason for confidence.<\/p>\n<p>This isn&#8217;t to say that we don&#8217;t think AI <em>also<\/em> poses a risk of human extinction. Indeed, we think making humans extinct is one highly plausible way in which an AI system could completely and permanently ensure that we are never able to regain power.<\/p>\n<h3><span id=\"incentives-and-deception\" class=\"toc-anchor\"><\/span>People might deploy misaligned AI systems despite the risk<\/h3>\n<p>Surely no one would actually build or use a misaligned AI if they knew it could have such terrible consequences, right?<\/p>\n<p>Unfortunately, there are at least two reasons people might create and then deploy misaligned AI \u2014 which we&#8217;ll go through one at a time:<\/p>\n<h4>1. 
People might think it&#8217;s aligned when it&#8217;s not<\/h4>\n<p>Imagine there&#8217;s a group of researchers trying to tell, in a test environment, whether a system they&#8217;ve built is aligned. We&#8217;ve argued that an intelligent planning AI will want to improve its abilities to effect changes in pursuit of its objective, and it&#8217;s almost always easier to do that if it&#8217;s deployed in the real world, where a much wider range of actions are available. As a result, any misaligned AI that&#8217;s sophisticated enough will try to understand what the researchers want it to do and at least <a href=\"https:\/\/web.archive.org\/web\/20221013014101\/https:\/\/www.alignmentforum.org\/tag\/eliciting-latent-knowledge-elk\">pretend to be doing that<\/a>, deceiving the researchers into thinking it&#8217;s aligned. (For example, a reinforcement learning system might be <a href=\"https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to\">rewarded for certain <em>apparent<\/em> behaviour during training<\/a>, regardless of what it&#8217;s actually doing.)<\/p>\n<p>Hopefully, we&#8217;ll be aware of this sort of behaviour and be able to detect it. But catching a sufficiently advanced AI in deception seems potentially harder than catching a human in a lie, which isn&#8217;t always easy. For example, a sufficiently intelligent deceptive AI system may be able to deceive us into thinking we&#8217;ve <em>solved the problem of AI deception<\/em>, even if we haven&#8217;t.<\/p>\n<p>If AI systems are good at deception, and have sufficiently advanced capabilities, a reasonable strategy for such a system could be to deceive humans completely until the system has a way to guarantee it can overcome any resistance to its goals.<\/p>\n<h4>2.  
There are incentives to deploy systems sooner rather than later<\/h4>\n<p>We might also expect some people with the ability to deploy a misaligned AI to charge ahead despite any warning signs of misalignment that do come up, because of <a href=\"https:\/\/doi.org\/10.48550\/arXiv.1907.04534\">race dynamics<\/a> \u2014 where people developing AI want to do so before anyone else.<\/p>\n<p>For example, if you&#8217;re developing an AI to improve military or political strategy, it&#8217;s much more useful if none of your rivals have a similarly powerful AI.<\/p>\n<p>These incentives apply <a href=\"https:\/\/web.archive.org\/web\/20220829015919\/https:\/\/forum.effectivealtruism.org\/posts\/cXBznkfoPJAjacFoT\/are-you-really-in-a-race-the-cautionary-tales-of-szilard-and\">even to people attempting to build an AI in the hopes of using it to make the world a better place<\/a>.<\/p>\n<p>For example, say you&#8217;ve spent years and years researching and developing a powerful AI system, and all you want is to use it to make the world a better place. Simplifying things a lot, say there are two possibilities:<\/p>\n<ol>\n<li>This powerful AI will be aligned with your beneficent aims, and you&#8217;ll transform society in a potentially radically positive way.<\/li>\n<li>The AI will be sufficiently misaligned that it&#8217;ll take power and permanently end humanity&#8217;s control over the future. <\/li>\n<\/ol>\n<p>Let&#8217;s say you think there&#8217;s a 90% chance that you&#8217;ve succeeded in building an aligned AI. But technology often develops at similar speeds across society, so there&#8217;s a good chance that someone else will soon also develop a powerful AI. And you think they&#8217;re less cautious, or less altruistic, so you think their AI will only have an 80% chance of being aligned with good goals, and pose a 20% chance of existential catastrophe. And only if you get there first can your more beneficial AI be dominant. 
As a result, you might decide to go ahead with deploying your AI, accepting the 10% risk.<\/p>\n<div class=\"panel clearfix \">\n<h2 class=\"large\"><span id=\"this-all-sounds-very-abstract-what-could-an-existential-catastrophe-caused-by-ai-actually-look-like\" class=\"toc-anchor\"><\/span>This all sounds very abstract. What could an existential catastrophe caused by AI actually look like?<\/h2>\n<p>The argument we&#8217;ve given so far is very general, and doesn&#8217;t really look at the specifics of <em>how<\/em> an AI that is attempting to seek power might actually do so.<\/p>\n<p>If you&#8217;d like to get a better understanding of what an existential catastrophe caused by AI might actually look like, we&#8217;ve written a short separate article on that topic. If you&#8217;re happy with the high-level abstract arguments so far, feel free to skip to the next section!<\/p>\n<p><a href=\"\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/\" title=\"\" class=\"btn btn-primary\">What could an existential AI catastrophe actually look like?<\/a><\/p>\n<\/div>\n<h2><span id=\"other-risks\" class=\"toc-anchor\"><\/span>4. Even if we find a way to avoid power-seeking, there are still risks<\/h2>\n<p>So far we&#8217;ve described what a large proportion of researchers in the field think is the major existential risk from potential advances in AI, which depends crucially on an AI seeking power to achieve its goals.<\/p>\n<p>If we can prevent power-seeking behaviour, we will have reduced existential risk substantially.<\/p>\n<p>But even if we succeed, there are still existential risks that AI could pose.<\/p>\n<p>There are at least two ways these risks could arise:<\/p>\n<ul>\n<li>We expect that AI systems will help increase the rate of scientific progress. 
While there would be clear benefits to this automation \u2014 the rapid development of new medicine, for example \u2014 some forms of technological development can pose threats, including existential threats, to humanity. This technological advancement might increase our available destructive power or make dangerous technologies cheaper or more widely accessible.<\/li>\n<li>We might start to see AI automate <a href=\"https:\/\/www.cold-takes.com\/how-digital-people-could-change-the-world\/\">many \u2013 or possibly even all \u2013 economically important tasks<\/a>. It&#8217;s hard to predict exactly what the effects of this would be on society. But it seems plausible that this could increase existential risks. For example, if AI systems are highly transformative, then their use (or potential use) could possibly create insurmountable power imbalances. Even the threat of this might be enough. For instance, a military might feel pushed to create transformative automated weapons because it knows or believes its enemies are doing so, even if this dynamic benefits no one.<\/li>\n<\/ul>\n<p>We know of several specific areas in which advanced AI may increase existential risks, though there are likely others we haven&#8217;t thought of.<\/p>\n<h3><span id=\"bioweapons\" class=\"toc-anchor\"><\/span>Bioweapons<\/h3>\n<p>In 2022, Collaborations Pharmaceuticals \u2014 a small research corporation in North Carolina \u2014 were building an AI model to help determine the structure of new drugs. As part of this process, they trained the model to penalise drugs that it predicted were toxic. This had just one problem: <a href=\"https:\/\/climate-science.press\/wp-content\/uploads\/2022\/03\/00s42256-022-00465-9.pdf\">you could run the toxicity prediction in reverse to invent new <em>toxic<\/em> drugs<\/a>.<\/p>\n<p>Some of the deadliest events in human history have been pandemics. 
Pathogens&#8217; ability to infect, replicate, kill, and spread \u2014 often undetected \u2014 makes them exceptionally dangerous.<\/p>\n<p>Even without AI, advancing biotechnology poses <a href=\"https:\/\/80000hours.org\/problem-profiles\/preventing-catastrophic-pandemics\/\">extreme risks<\/a>. It potentially provides opportunities for state actors or terrorists to create mass-casualty events.<\/p>\n<p>Advances in AI have the potential to make biotechnology more dangerous.<\/p>\n<p>For example:<\/p>\n<ol>\n<li>\n<p>Dual-use tools, like the automation of laboratory processes, could lower the barriers for rogue actors trying to manufacture a dangerous pandemic virus. The Collaborations Pharmaceuticals model is an example of a dual-use tool (although it&#8217;s not particularly dangerous).<\/p>\n<\/li>\n<li>\n<p>AI-based biological design tools could enable sophisticated actors to reprogram the genomes of dangerous pathogens to specifically enhance their lethality, transmissibility, and immune evasion.<\/p>\n<\/li>\n<\/ol>\n<p>If AI is able to advance the rate of scientific and technological progress, these risks may be amplified and accelerated \u2014 making dangerous technology more widely available or increasing its possible destructive power.<\/p>\n<p>In the <a href=\"https:\/\/arxiv.org\/abs\/2401.02843v1\">2023 survey of AI experts<\/a>, 73% of respondents said they had either &#8220;extreme&#8221; or &#8220;substantial&#8221; concern that in the future AI will let &#8220;dangerous groups make powerful tools (e.g. engineered viruses).&#8221;<\/p>\n<h3><span id=\"intentionally-dangerous-ai-agents\" class=\"toc-anchor\"><\/span>Intentionally dangerous AI agents<\/h3>\n<p>Most of this article discusses the risk of power-seeking AI systems that arise unintentionally due to misalignment.<\/p>\n<p>But we can&#8217;t rule out the possibility that some people might <em>intentionally<\/em> create rogue AI agents that seek to disempower humanity. 
It might seem hard to imagine, but extremist ideologies of many forms have inspired humans to carry out radically violent and even self-destructive plans.<\/p>\n<h3><span id=\"cyberweapons\" class=\"toc-anchor\"><\/span>Cyberweapons<\/h3>\n<p>AI can already be used in cyberattacks, such as <a href=\"https:\/\/hbr.org\/2024\/05\/ai-will-increase-the-quantity-and-quality-of-phishing-scams\">phishing<\/a>, and more powerful AI may cause greater information security challenges (though it could also be useful in cyberdefense).<\/p>\n<p>On its own, AI-enabled cyberwarfare is unlikely to pose an existential threat to humanity. Even the most damaging and costly societal-scale cyberattacks wouldn&#8217;t approach an extinction-level event.<\/p>\n<p>But AI-enabled cyberattacks could provide access to other dangerous technology, such as bioweapons, nuclear arsenals, or autonomous weapons. So there may be genuine existential risks posed by AI-related cyberweapons, but they will most likely run through another existential risk.<\/p>\n<p>The cyber capabilities of AI systems are also relevant to <a href=\"https:\/\/80000hours.org\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#actually-take-power\">how a power-seeking AI could actually take power<\/a>.<\/p>\n<h3><span id=\"dangerous-new-technology\" class=\"toc-anchor\"><\/span>Other dangerous tech<\/h3>\n<p>If AI systems generally accelerate the rate of scientific and technological progress, we think it&#8217;s reasonably likely that we&#8217;ll invent new dangerous technologies.<\/p>\n<p>For example, <a href=\"\/problem-profiles\/atomically-precise-manufacturing\/\">atomically precise manufacturing<\/a>, sometimes called nanotechnology, has been hypothesised as an existential threat \u2014 and it&#8217;s a scientifically plausible technology that AI could help us invent far sooner than we would otherwise.<\/p>\n<p>In <em>The Precipice<\/em>, Toby Ord estimated the chances of an existential 
catastrophe by 2120 from &#8220;unforeseen anthropogenic risks&#8221; at 1 in 30. This estimate reflects the possibility of other discoveries, perhaps involving yet-to-be-understood physics, that could enable the creation of technologies with catastrophic consequences.<\/p>\n<h3><span id=\"ai-could-empower-totalitarian-governments\" class=\"toc-anchor\"><\/span>AI could empower totalitarian governments<\/h3>\n<p>An AI-enabled authoritarian government could <a href=\"https:\/\/web.archive.org\/web\/20221013014526\/https:\/\/www.bbc.com\/future\/article\/20201014-totalitarian-world-in-chains-artificial-intelligence\">completely automate the monitoring and repression of its citizens<\/a>, as well as significantly influence the information people see, perhaps making it impossible to coordinate action against such a regime.<\/p>\n<p>AI is already facilitating the ability of governments to monitor their own citizens.<\/p>\n<p>The NSA is using AI <a href=\"https:\/\/web.archive.org\/web\/20221016011132\/https:\/\/www.defenseone.com\/technology\/2020\/01\/spies-ai-future-artificial-intelligence-us-intelligence-community\/162673\/\">to help filter the huge amounts of data they collect<\/a>, significantly speeding up their ability to identify and predict the actions of people they are monitoring. 
In China, AI is increasingly being used for facial recognition and predictive policing, including <a href=\"https:\/\/web.archive.org\/web\/20221016011138\/https:\/\/www.nytimes.com\/2019\/05\/22\/world\/asia\/china-surveillance-xinjiang.html\">automated racial profiling<\/a> and automatic alarms when people classified as potential threats enter certain public places.<\/p>\n<p>These sorts of surveillance technologies seem likely to significantly improve \u2014 thereby increasing governments&#8217; abilities to control their populations.<\/p>\n<p>At some point, authoritarian governments could extensively use AI-related technology to:<\/p>\n<ul>\n<li>Monitor and track dissidents<\/li>\n<li>Preemptively suppress opposition to the ruling party<\/li>\n<li>Control the military and dominate external actors <\/li>\n<li>Manipulate information flows and carefully shape public opinion<\/li>\n<\/ul>\n<p>Again, in the <a href=\"https:\/\/wiki.aiimpacts.org\/ai_timelines\/predictions_of_human-level_ai_timelines\/ai_timeline_surveys\/2023_expert_survey_on_progress_in_ai\">2023 survey of AI experts<\/a>, 73% of respondents expressed &#8220;extreme&#8221; or &#8220;substantial&#8221; concern that in the future authoritarian rulers could &#8220;use AI to control their population.&#8221;<\/p>\n<p>If a regime achieved a form of truly stable totalitarianism, it could make people&#8217;s lives much worse for a long time into the future, making it a particularly scary possible scenario resulting from AI. 
(Read more in our article on <a href=\"\/problem-profiles\/risks-of-stable-totalitarianism\/\">risks of stable totalitarianism<\/a>.)<\/p>\n<h3><span id=\"artificial-intelligence-and-war\" class=\"toc-anchor\"><\/span>AI could worsen war<\/h3>\n<p>We&#8217;re concerned that <a href=\"\/problem-profiles\/great-power-conflict\/\">great power conflict<\/a> could also pose a substantial threat to our world, and advances in AI seem likely to change the nature of war \u2014 through lethal autonomous weapons or through automated decision making.<\/p>\n<p>In some cases, great power war could pose an existential threat \u2014 for example, if the conflict is nuclear. Some argue that lethal autonomous weapons, if sufficiently powerful and mass-produced, could themselves constitute a new form of <a href=\"https:\/\/www.brookings.edu\/articles\/applying-arms-control-frameworks-to-autonomous-weapons\/\">weapon of mass destruction<\/a>.<\/p>\n<p>And if a single actor produces particularly powerful AI systems, this could be seen as giving them a <em>decisive strategic advantage<\/em>. Such an outcome, or even the expectation of such an outcome, could be highly destabilising.<\/p>\n<p>Imagine that the US was working to produce a planning AI that&#8217;s intelligent enough to ensure that Russia or China could never successfully launch another nuclear weapon. This could incentivise a first strike from its rivals before these AI-developed plans can ever be put into action.<\/p>\n<p>This is because nuclear deterrence benefits from symmetry between the abilities of nuclear powers: when the threat of nuclear retaliation is believable, it deters a first strike. Advances in AI, which could be directly applied to nuclear forces, could create asymmetries in the capabilities of nuclear-armed nations. 
This could include improving early warning systems, air defence systems, and cyberattacks that disable weapons.<\/p>\n<p>For example, many countries use <a href=\"https:\/\/en.wikipedia.org\/wiki\/Submarine-launched_ballistic_missile\">submarine-launched ballistic missiles<\/a> as part of their nuclear deterrence systems \u2014 the idea is that if nuclear weapons can be hidden under the ocean, they will never be destroyed in the first strike. This means that they can always be used for a counterattack, and therefore act as an effective deterrent against first strikes. But AI could make it far easier to detect submarines underwater, enabling their destruction in a first strike \u2014 removing this deterrent.<\/p>\n<p>Many other destabilising scenarios are likely possible.<\/p>\n<p class=\"doNotRemove\">A report from the <a href=\"https:\/\/web.archive.org\/web\/20221013014512\/https:\/\/www.sipri.org\/sites\/default\/files\/2020-06\/artificial_intelligence_strategic_stability_and_nuclear_risk.pdf\">Stockholm International Peace Research Institute<\/a> found that, while AI could potentially also have stabilising effects (for example by making <em>everyone<\/em> feel more vulnerable, decreasing the chances of escalation), destabilising effects could arise even before advances in AI are actually deployed. This is because one state&#8217;s belief that their opponents have new nuclear capabilities can be enough to disrupt the delicate balance of deterrence.  
<\/p>\n<p>Luckily, there are also plausible ways in which AI could help prevent the use of nuclear weapons \u2014 for example, by improving the ability of states to detect nuclear launches, which would reduce the chances of false alarms like those that <a href=\"https:\/\/en.wikipedia.org\/wiki\/1983_Soviet_nuclear_false_alarm_incident\">nearly caused nuclear war in 1983<\/a>.<\/p>\n<p>Overall, we&#8217;re uncertain about whether AI will substantially increase the risk of nuclear or conventional conflict in the short term \u2014 it could even end up decreasing the risk. But we think it&#8217;s important to pay attention to possible catastrophic outcomes and take reasonable steps to reduce their likelihood.<\/p>\n<h3><span id=\"other-risks-from-ai\" class=\"toc-anchor\"><\/span>Other risks from AI<\/h3>\n<p>We&#8217;re also concerned about the following issues:<\/p>\n<ul>\n<li>Existential threats that result not from the power-seeking behaviour of AI systems, but from <a href=\"https:\/\/web.archive.org\/web\/20221013014541\/https:\/\/www.alignmentforum.org\/posts\/LpM3EAakwYdS6aRKf\/what-multipolar-failure-looks-like-and-robust-agent-agnostic\">the interaction between AI systems<\/a>. (In order to pose a risk, these systems would still need to be, to some extent, misaligned.)<\/li>\n<li>Other ways we haven&#8217;t thought of that AI systems could be misused \u2014 especially ones that might significantly affect <a href=\"\/articles\/future-generations\">future generations<\/a>.<\/li>\n<li>Other moral mistakes made in the design and use of AI systems, particularly if future AI systems are themselves deserving of moral consideration. For example, we might (inadvertently) create sentient AI systems, which could then suffer in huge numbers. 
We think this could be extremely important, so we&#8217;ve written about it in a <a href=\"https:\/\/80000hours.org\/problem-profiles\/artificial-sentience\/\">separate problem profile<\/a>.<\/li>\n<\/ul>\n<h2><span id=\"how-likely-is-an-AI-related-catastrophe\" class=\"toc-anchor\"><\/span>So, how likely is an AI-related catastrophe?<\/h2>\n<p>This is a really difficult question to answer.<\/p>\n<p>There are no past examples we can use to determine the frequency of AI-related catastrophes.<\/p>\n<p>All we have to go on are arguments (like the ones we&#8217;ve given above), and less relevant data like the history of technological advances. And we&#8217;re definitely not certain that the arguments we&#8217;ve presented are completely correct.<\/p>\n<p>Consider the argument we gave earlier about the dangers of power-seeking AI in particular, based on <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2206.13353\">Carlsmith&#8217;s report<\/a>. At the end of his report, Carlsmith gives some rough guesses of the chances that <em>each stage<\/em> of his argument is correct (conditional on the previous stage being correct):<\/p>\n<ol>\n<li>By 2070 it will be possible and financially feasible to build strategically aware systems that can outperform humans on many power-granting tasks, and that can successfully make and carry out plans: Carlsmith guesses there&#8217;s a 65% chance of this being true.<\/li>\n<li>Given this feasibility, there will be strong incentives to build such systems: 80%.<\/li>\n<li>Given both the feasibility and incentives to build such systems, it will be much harder to develop aligned systems that don&#8217;t seek power than to develop misaligned systems that do, but which are at least superficially attractive to deploy: 40%.<\/li>\n<li>Given all of this, some deployed systems will seek power in a misaligned way that causes over $1 trillion (in 2021 dollars) of damage: 65%.<\/li>\n<li>Given all the previous premises, misaligned power-seeking AI systems 
will end up disempowering basically all of humanity: 40%.<\/li>\n<li>Given all the previous premises, this disempowerment will constitute an existential catastrophe: 95%.<\/li>\n<\/ol>\n<p>Multiplying these numbers together, Carlsmith estimated that there&#8217;s a 5% chance that his argument is right and there will be an existential catastrophe from misaligned power-seeking AI by 2070. When we spoke to Carlsmith, he noted that in the year between the writing of his report and the publication of this article, his overall guess at the chance of an existential catastrophe from power-seeking AI by 2070 had increased to >10%.<\/p>\n<p>The overall probability of existential catastrophe from AI would, in Carlsmith&#8217;s view, be higher than this, because there are other routes to possible catastrophe \u2014 like those discussed in the <a href=\"#other-risks\">previous section<\/a> \u2014 although our guess is that these other routes are probably a lot less likely to lead to existential catastrophe.<\/p>\n<p>For another estimate, in <a href=\"\/the-precipice\/\"><em>The Precipice<\/em><\/a>, philosopher and advisor to 80,000 Hours Toby Ord estimated a 1-in-6 risk of existential catastrophe by 2120 (from any cause), and that 60% of this risk comes from misaligned AI \u2014 giving a total of a 10% risk of existential catastrophe from misaligned AI by 2120.<\/p>\n<p><a href=\"https:\/\/web.archive.org\/web\/20221013014859\/https:\/\/www.alignmentforum.org\/posts\/QvwSr5LsxyDeaPK5s\/existential-risk-from-ai-survey-results\">A 2021 survey of 44 researchers working on reducing existential risks from AI<\/a> found the median risk estimate was 32.5% \u2014 the highest answer given was 98%, and the lowest was 2%. There&#8217;s obviously <em>a lot<\/em> of selection bias here: people choose to work on reducing risks from AI because they think this is unusually important, so we should expect estimates from this survey to be substantially higher than estimates from other sources. 
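As a quick arithmetic check, the two headline figures quoted above \u2014 Carlsmith's roughly 5% and Ord's 10% \u2014 both follow directly from multiplying the stated estimates:

```python
# Reproducing the headline risk figures quoted above.
# Carlsmith's rough conditional guesses for each stage of his argument:
carlsmith_stages = [0.65, 0.80, 0.40, 0.65, 0.40, 0.95]

p_carlsmith = 1.0
for p in carlsmith_stages:
    p_carlsmith *= p
print(f"Carlsmith's chained estimate: {p_carlsmith:.1%}")  # roughly 5%

# Ord (The Precipice): 1-in-6 total existential risk by 2120,
# with 60% of that attributed to misaligned AI.
p_ord = (1 / 6) * 0.60
print(f"Ord's estimate: {p_ord:.0%}")  # 10%
```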
But there&#8217;s clearly significant uncertainty about how big this risk is, and huge variation in answers.<\/p>\n<p>All these numbers are shockingly, disturbingly high. We&#8217;re far from certain that all the arguments are correct. But these are generally the highest guesses for the level of existential risk of any of the issues we&#8217;ve examined (like <a href=\"https:\/\/80000hours.org\/problem-profiles\/global-catastrophic-biological-risks\/\">engineered pandemics<\/a>, <a href=\"https:\/\/80000hours.org\/problem-profiles\/great-power-conflict\/\">great power conflict<\/a>, <a href=\"\/problem-profiles\/climate-change\">climate change<\/a>, or <a href=\"https:\/\/80000hours.org\/problem-profiles\/nuclear-security\/\">nuclear war<\/a>).<\/p>\n<p>That said, I think there are <a href=\"#broader-reasons-were-wrong\">reasons why it&#8217;s harder to make guesses about the risks from AI<\/a> than other risks \u2013 and possibly reasons to think that the estimates we&#8217;ve quoted above are systematically too high.<\/p>\n<p>If I was forced to put a number on it, I&#8217;d say something like 1%. This number includes considerations both in favour and against the argument. 
I&#8217;m less worried than other 80,000 Hours staff \u2014 our position as an organisation is that the risk is between 3% and 50%.<\/p>\n<p>All this said, the arguments for such high estimates of the existential risk posed by AI are persuasive \u2014 making risks from AI a top contender for the most pressing problem facing humanity.<\/p>\n<div class=\"well bg-gray-lighter margin-bottom margin-top padding-top-small padding-bottom-small\">\n<p><strong>Here are some more questions you might have:<\/strong><\/p>\n<ul>\n<li><a href=\"#may-or-may-not-ever-exist\">Can it make sense to dedicate my career to solving an issue based on a speculative story about a technology that may or may not ever exist?<\/a><\/li>\n<li><a href=\"#is-this-a-form-of-pascals-mugging-taking-a-big-bet-on-tiny-probabilities\">Is this a form of &#8216;Pascal&#8217;s mugging&#8217; \u2014 taking a big bet on tiny probabilities?<\/a><\/li>\n<\/ul>\n<p>Again, we think there are <a href=\"#good-responses\">strong responses<\/a> to these questions.<\/p>\n<\/div>\n<h2><span id=\"we-can-tackle-these-risks\" class=\"toc-anchor\"><\/span>5. We can tackle these risks<\/h2>\n<p>We think one of the most important things you can do would be to help reduce the gravest risks that AI poses.<\/p>\n<p>This isn&#8217;t just because we think these risks are high \u2014 it&#8217;s also because we think there are real things we can do to reduce these risks.<\/p>\n<p>We know of two main ways people work to reduce these risks:<\/p>\n<ol>\n<li>Technical AI safety research<\/li>\n<li>AI governance and policy work<\/li>\n<\/ol>\n<p>There are lots of ways to contribute to this work. In this section, we discuss many broad approaches within both categories to illustrate the point that there <em>are<\/em> things we can do to address these risks. 
<a href=\"#what-can-you-do-concretely-to-help\">Below<\/a>, we discuss the kinds of careers you can pursue to work on these kinds of approaches.<\/p>\n<h3><span id=\"technical-ai-safety-research\" class=\"toc-anchor\"><\/span>Technical AI safety research<\/h3>\n<p>The benefits of transformative AI could be huge, and there are many different actors involved (operating in different countries), which means it will likely be really hard to prevent its development altogether.<\/p>\n<p>(It&#8217;s also possible that it wouldn&#8217;t even be a good idea if we could \u2014 after all, that would mean forgoing the benefits as well as preventing the risks.)<\/p>\n<p>As a result, we think it makes more sense to focus on making sure that this development is safe \u2014 meaning that it has a high probability of avoiding all the catastrophic failures listed above.<\/p>\n<p>One way to do this is to try to develop technical solutions to prevent the kind of power-seeking behaviour <a href=\"#instrumental-convergence\">we discussed earlier<\/a> \u2014 this is generally known as working on <em>technical AI safety<\/em>, sometimes called just &#8220;AI safety&#8221; for short.<\/p>\n<p>We discuss this path in more detail here:<\/p>\n<p><a href=\"\/career-reviews\/ai-safety-researcher\/\" title=\"\" class=\"btn btn-primary\">Career review of technical AI safety research<\/a><\/p>\n<h4>Approaches<\/h4>\n<p>There are lots of approaches to technical AI safety, including:<\/p>\n<ul>\n<li><strong>Scalably learning from human feedback.<\/strong> Examples include <a href=\"https:\/\/www.youtube.com\/watch?v=v9M2Ho9I9Qo\">iterated amplification<\/a>, <a href=\"https:\/\/openai.com\/research\/debate\">AI safety via debate<\/a>, <a href=\"https:\/\/www.alignmentforum.org\/posts\/nd692YfFGfZDh9Mwz\/an-69-stuart-russell-s-new-book-on-why-we-need-to-replace\">building AI assistants that are uncertain about our goals and learn them by interacting with us<\/a>, and <a 
href=\"https:\/\/www.alignment.org\/blog\/arcs-first-technical-report-eliciting-latent-knowledge\/\">other ways to get AI systems to report truthfully what they know<\/a>.<\/li>\n<li><strong>Threat modelling.<\/strong> An example of this work would be demonstrating the possibility of dangerous capabilities, like deceptive or manipulative AI systems, so that we can study them. This approach splits into work that evaluates whether a model has dangerous capabilities (like the work of <a href=\"https:\/\/metr.org\/\">METR<\/a> in <a href=\"https:\/\/arxiv.org\/pdf\/2303.08774.pdf\">evaluating GPT-4<\/a>), and work that evaluates whether a model would cause harm in practice (like <a href=\"https:\/\/twitter.com\/AnthropicAI\/status\/1604883576218341376\">Anthropic&#8217;s research into the behaviour of large language models<\/a> and <a href=\"https:\/\/arxiv.org\/abs\/2210.01790\">this paper on goal misgeneralisation<\/a>). It can also include work to <a href=\"https:\/\/www.alignmentforum.org\/posts\/ChDH335ckdvpxXaXX\/model-organisms-of-misalignment-the-case-for-a-new-pillar-of-1\">find &#8216;model organisms of misalignment&#8217;<\/a>, in the hope of better understanding their dangers.<\/li>\n<li>Work to figure out how to <strong>control<\/strong> powerful AI systems \u2013 preventing them from causing harm even if they are unsafe. Read more in <a href=\"https:\/\/www.alignmentforum.org\/posts\/kcKrE9mzEHrdqtDpE\/the-case-for-ensuring-that-powerful-ais-are-controlled\">this blogpost from the team at Redwood Research working on control<\/a>.<\/li>\n<li><strong>Interpretability research.<\/strong> This work involves studying <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/chris-olah-interpretability-research\/\">why AI systems do what they do<\/a> and trying to put it into human-understandable terms. 
For example, <a href=\"https:\/\/arxiv.org\/abs\/2111.09259\">this paper examined how AlphaZero learns chess<\/a>, and <a href=\"https:\/\/arxiv.org\/abs\/2212.03827\">this paper looked into finding latent knowledge in language models without supervision<\/a>. This category also includes <a href=\"https:\/\/www.neelnanda.io\/mechanistic-interpretability\/\"><em>mechanistic interpretability<\/em><\/a> \u2014 for example, <a href=\"https:\/\/distill.pub\/2020\/circuits\/zoom-in\/\"><em>Zoom In: An Introduction to Circuits<\/em> by Olah et al.<\/a>. For more, see <a href=\"https:\/\/arxiv.org\/abs\/2207.13243\">this survey paper<\/a>, as well as Hubinger&#8217;s <a href=\"https:\/\/www.alignmentforum.org\/posts\/nbq2bWLcYmSGup9aF\/a-transparency-and-interpretability-tech-tree\"><em>A Transparency and Interpretability Tech Tree<\/em><\/a>, and Nanda&#8217;s <a href=\"https:\/\/www.alignmentforum.org\/posts\/uK6sQCNMw8WKzJeCQ\/a-longlist-of-theories-of-impact-for-interpretability\"><em>A Longlist of Theories of Impact for Interpretability<\/em><\/a> for overviews of how interpretability research could reduce existential risk from AI.<\/li>\n<li>Other <strong>anti-misuse research<\/strong> to reduce the risks of catastrophe caused by misuse of systems. For example, this work includes training AIs so they&#8217;re hard to use for dangerous purposes. (Note there&#8217;s lots of overlap with the other work on this list.)<\/li>\n<li><strong>Research to increase the robustness of neural networks.<\/strong> This work involves ensuring that the behaviour neural networks display on one set of inputs persists on inputs they haven&#8217;t previously encountered, in order to prevent AI systems from shifting to unsafe behaviour. 
See section 2 of <em><a href=\"https:\/\/arxiv.org\/pdf\/2109.13916.pdf\">Unsolved Problems in AI safety<\/a><\/em> for more.<\/li>\n<li><strong>Work to build cooperative AI.<\/strong>  Find ways to ensure that even if individual AI systems seem safe, they don&#8217;t produce bad outcomes through interacting with other sociotechnical systems. For more, see <a href=\"https:\/\/arxiv.org\/pdf\/2012.08630.pdf\">Open Problems in Cooperative AI<\/a> by Dafoe et al. or the <a href=\"https:\/\/www.cooperativeai.com\/\">Cooperative AI Foundation<\/a>. This seems particularly relevant for the reduction of &#8216;<a href=\"\/problem-profiles\/artificial-intelligence\">s-risks<\/a>.&#8217; <\/li>\n<li>More generally, there are some <strong>unified safety plans.<\/strong> For more, see Hubinger&#8217;s <a href=\"https:\/\/arxiv.org\/abs\/2012.07532\">11 possible proposals for building safe advanced AI<\/a>, or Karnofsky&#8217;s <a href=\"https:\/\/www.alignmentforum.org\/posts\/rCJQAkPTEypGjSJ8X\/how-might-we-align-transformative-ai-if-it-s-developed-very\">How might we align transformative AI if it&#8217;s developed very soon<\/a>. <\/li>\n<\/ul>\n<p>See <a href=\"https:\/\/www.lesswrong.com\/posts\/SQ9cZtfrzDJmw9A2m\/my-overview-of-the-ai-alignment-landscape-a-bird-s-eye-view\">Neel Nanda&#8217;s overview of the AI alignment landscape<\/a> for more details.<\/p>\n<p><a href=\"#technical-ai-safety\">Read more about technical AI safety research below.<\/a><\/p>\n<h3><span id=\"ai-governance-and-policy\" class=\"toc-anchor\"><\/span>AI governance and policy<\/h3>\n<p>Reducing the gravest risks from AI will require sound high-level decision making and policy, both at AI companies themselves and in governments.<\/p>\n<p>As AI has advanced and drawn increasing interest from customers and investors, governments have shown an interest in regulating the technology. 
Some have already taken significant steps to play a role in managing the development of AI, including:<\/p>\n<ul>\n<li>The <a href=\"https:\/\/www.nist.gov\/aisi\">US<\/a> and the <a href=\"https:\/\/www.gov.uk\/government\/organisations\/ai-safety-institute\">UK<\/a> have each established their own national AI Safety Institutes.<\/li>\n<li>The European Union has passed the <a href=\"https:\/\/www.europarl.europa.eu\/topics\/en\/article\/20230601STO93804\/eu-ai-act-first-regulation-on-artificial-intelligence\">EU AI Act<\/a>, which contains specific provisions for governing general-purpose AI models that pose systemic risks.<\/li>\n<li>The UK and then South Korea have hosted the first two <a href=\"https:\/\/www.gov.uk\/government\/topical-events\/ai-seoul-summit-2024\/about\">AI Safety Summits<\/a> (in 2023 and 2024), a series of high-profile summits aiming to coordinate between different countries, academics, researchers, and civil society leaders.<\/li>\n<li>China has implemented <a href=\"https:\/\/carnegieendowment.org\/research\/2024\/02\/tracing-the-roots-of-chinas-ai-regulations?lang=en\">regulations<\/a> targeting recommendation algorithms, synthetic AI content, generative AI models, and facial recognition technology.<\/li>\n<li>The US instituted <a href=\"https:\/\/www.csis.org\/analysis\/balancing-ledger-export-controls-us-chip-technology-china\">export controls<\/a> to reduce China&#8217;s access to the most cutting-edge chips used in AI development.<\/li>\n<\/ul>\n<p>Much more will need to be done to reduce the biggest risks \u2014 including continuous evaluation of the AI governance landscape to assess overall progress.<\/p>\n<p>We discuss this career path in more detail here:<\/p>\n<p><a href=\"\/career-reviews\/ai-policy-and-strategy\/\" title=\"\" class=\"btn btn-primary\">Career review of AI strategy and policy careers<\/a><\/p>\n<h4>Approaches<\/h4>\n<p>People working in AI policy have proposed a range of approaches to reducing risk as AI systems 
get more powerful.<\/p>\n<p>We don&#8217;t necessarily endorse all the ideas below, but what follows is a list of some prominent policy approaches that could be aimed at reducing the largest dangers from AI:<\/p>\n<ul>\n<li><strong>Responsible scaling policies<\/strong>: some major AI companies have already begun developing internal frameworks for assessing safety as they scale up the size and capabilities of their systems. These frameworks introduce safeguards that are intended to become increasingly stringent as AI systems become more potentially dangerous, and ensure that AI systems&#8217; capabilities don&#8217;t outpace companies&#8217; abilities to keep systems safe. Many argue that these internal policies are not sufficient for safety, but they may represent a promising step for reducing risk. You can see versions of such policies from <a href=\"https:\/\/www.anthropic.com\/news\/anthropics-responsible-scaling-policy\">Anthropic<\/a>, <a href=\"https:\/\/deepmind.google\/discover\/blog\/introducing-the-frontier-safety-framework\/\">Google DeepMind<\/a>, and <a href=\"https:\/\/openai.com\/preparedness\/\">OpenAI<\/a>.<\/li>\n<li><strong>Standards and evaluation<\/strong>: governments may also develop industry-wide benchmarks and testing protocols to assess whether AI systems pose major risks. The non-profit <a href=\"https:\/\/metr.org\/\">METR<\/a> and the <a href=\"https:\/\/www.gov.uk\/government\/publications\/ai-safety-institute-approach-to-evaluations\/ai-safety-institute-approach-to-evaluations\">UK AI Safety Institute<\/a> are among the organisations currently developing these evaluations to test AI models before and after they are released. 
This can include creating standardised metrics for an AI system&#8217;s capabilities and potential to cause harm, as well as its propensity for power-seeking or misalignment.<\/li>\n<li><strong>Safety cases<\/strong>: this practice involves requiring AI developers to provide comprehensive documentation demonstrating the safety and reliability of their systems before deployment. This approach is similar to safety cases used in other high-risk industries like aviation or nuclear power. You can see discussion of this idea in <a href=\"https:\/\/arxiv.org\/abs\/2403.10462\">a paper from Clymer et al.<\/a> and in a <a href=\"https:\/\/www.aisi.gov.uk\/work\/safety-cases-at-aisi\">post<\/a> from Geoffrey Irving at the UK AI Safety Institute.<\/li>\n<li><strong>Information security standards<\/strong>: we can establish robust rules for protecting AI-related data, algorithms, and infrastructure from unauthorised access or manipulation \u2014 particularly the AI model weights. RAND released a <a href=\"https:\/\/www.rand.org\/pubs\/research_reports\/RRA2849-1.html\">detailed report<\/a> analysing the security risks to major AI companies, particularly from state actors.<\/li>\n<li><strong>Liability law<\/strong>: existing law already imposes some liability on companies that create dangerous products or cause significant harm to the public, but its application to AI models and risk in particular is unclear. Clarifying how liability applies to companies that create dangerous AI models could incentivise them to take additional steps to reduce risk. Law professor Gabriel Weil has <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/epKBmiyLpZWWFEYDb\/tort-law-can-play-an-important-role-in-mitigating-ai-risk\">written about this idea<\/a>.<\/li>\n<li><strong>Compute governance<\/strong>: governments may regulate access to and use of high-performance computing resources necessary for training large AI models. 
The US restrictions on exporting state-of-the-art chips to China are one example of such a policy, and others are possible. Companies could also be required to install hardware-level safety features directly into AI chips or processors. These could be used to track chips and verify they&#8217;re not in the possession of anyone who shouldn&#8217;t have them, or for other purposes. You can learn more about this topic in <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/lennart-heim-compute-governance\/\">our interview with Lennart Heim<\/a> and in this report from the <a href=\"https:\/\/www.cnas.org\/publications\/reports\/secure-governable-chips\">Center for a New American Security<\/a>.<\/li>\n<li><strong>International coordination<\/strong>: fostering global cooperation on AI governance could help ensure consistent standards. This could involve treaties, international organisations, or multilateral agreements on AI development and deployment. We discuss some related considerations in our article on <a href=\"https:\/\/80000hours.org\/career-reviews\/china-related-ai-safety-and-governance-paths\/\">China-related AI safety and governance paths<\/a>.<\/li>\n<li><strong>Societal adaptation<\/strong>: it may be critically important to prepare society for the widespread integration of AI and the potential risks it poses. For example, we might need to develop new information security measures to protect crucial data in a world with AI-enabled hacking. Or we may want to implement strong controls to prevent handing over key societal decisions to AI systems.<\/li>\n<li><strong>Pausing scaling if appropriate<\/strong>: <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/zvi-mowshowitz-sleeper-agents-ai-updates\/#pause-ai-campaign-013016\">some argue<\/a> that we should currently pause all scaling of larger AI models because of the dangers the technology poses. 
We have featured <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/carl-shulman-society-agi\/#why-carl-doesnt-support-enforced-pauses-on-ai-research-020358\">some discussion<\/a> of this idea on our podcast, and it seems hard to know when or if this would be a good idea. If carried out, it could involve industry-wide agreements or regulatory mandates to pause scaling efforts when necessary.<\/li>\n<\/ul>\n<p>The details, benefits, and downsides of many of these ideas have yet to be fully worked out, so it&#8217;s crucial that we do more research. And this list isn&#8217;t comprehensive \u2014 there are likely other important policy interventions and governance strategies worth pursuing.<\/p>\n<p>We also need more <a href=\"\/career-reviews\/forecasting\/\">forecasting research<\/a> into what we should expect to happen with AI, such as the work done at <a href=\"https:\/\/epoch.ai\/\">Epoch AI<\/a>.<\/p>\n<h2><span id=\"neglectedness\" class=\"toc-anchor\"><\/span>6. This work is neglected<\/h2>\n<p>In 2022, we estimated there were around 400 people around the world working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, about three quarters worked on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy. 
We also estimated that there were around 800 people working in <a href=\"#complementary-yet-crucial-roles\">complementary roles<\/a>, but we&#8217;re highly uncertain about this figure.<\/p>\n<p>In <em>The Precipice<\/em>, Ord estimated that between $10 million and $50 million was spent on reducing AI risk in 2020.<\/p>\n<p>That might sound like a lot of money, but we&#8217;re spending something like <strong>1,000 times that amount<\/strong> on speeding up the development of transformative AI via commercial capabilities research and engineering at large AI companies.<\/p>\n<p>To compare the $50 million spent on AI safety in 2020 to other well-known risks: we&#8217;re currently spending <a href=\"https:\/\/www.climatepolicyinitiative.org\/publication\/global-landscape-of-climate-finance-2021\/\">several hundred <em>billion<\/em> dollars per year on tackling climate change<\/a>.<\/p>\n<p>Because this field is so neglected and has such high stakes, we think your impact working on risks from AI could be much higher than working on many other areas \u2014 which is why our top two recommended career paths for making a big positive difference in the world are <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-safety-researcher\/\">technical AI safety<\/a> and <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/\">AI policy research and implementation<\/a>.<\/p>\n<h2><span id=\"best-arguments-against-this-problem-being-pressing\" class=\"toc-anchor\"><\/span>What do we think are the best arguments against this problem being pressing?<\/h2>\n<p>As we said above, we&#8217;re not totally sure the arguments we&#8217;ve presented for AI representing an existential threat are right. 
Though we do still think that the chance of catastrophe from AI is high enough to warrant many more people pursuing careers to try to prevent such an outcome, we also want to be honest about the arguments <em>against<\/em> doing so, so you can more easily make your own call on the question.<\/p>\n<p>Here we&#8217;ll cover the strongest reasons (in our opinion) to think this problem isn&#8217;t particularly pressing. In the <a href=\"#good-responses\">next section<\/a> we&#8217;ll cover some common objections that (in our opinion) hold up less well, and explain why.<\/p>\n<div class=\"panel-group\" id=\"custom-collapse-1\">\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"we-might-have-a-lot-of-time-to-work-on-this-problem\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-3\">We might have a lot of time to work on this problem<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-3\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"We might have a lot of time to work on this problem\">\n<div class=\"panel-body\">\n<p>The longer we have before transformative AI is developed, the less pressing it is to work <em>now<\/em> on ways to ensure that it goes well. This is because the work of others in the future could be much better or more relevant than the work we are able to do now.<\/p>\n<p>Also, if it takes us a long time to create transformative AI, we have more time to figure out how to make it safe. The risk seems much higher if AI developers will create transformative AI in the next few decades.<\/p>\n<p>It seems plausible that the first transformative AI won&#8217;t be based on current deep learning methods. 
(AI Impacts have documented <a href=\"https:\/\/web.archive.org\/web\/20221013015039\/https:\/\/aiimpacts.org\/evidence-against-current-methods-leading-to-human-level-artificial-intelligence\/\">arguments<\/a> that current methods won&#8217;t be able to produce AI that has human-level intelligence.) This could mean that some of our current research might not end up being useful (and also \u2014 depending on what method ends up being used \u2014 could make the arguments for risk less worrying).<\/p>\n<p>Relatedly, we might expect that progress in the development of AI will occur in bursts. Previously, the field has seen <a href=\"https:\/\/en.wikipedia.org\/wiki\/AI_winter\"><em>AI winters<\/em><\/a>, periods of time with significantly reduced investment, interest, and research in AI. It&#8217;s unclear how likely it is that we&#8217;ll see another AI winter \u2014 but this possibility should lengthen our guesses about how long it&#8217;ll be before we&#8217;ve developed transformative AI. Cotra writes about the possibility of an AI winter in part four of her <a href=\"https:\/\/docs.google.com\/document\/d\/1cCJjzZaJ7ATbq8N2fvhmsDOUWdm7t3uSSXv6bD0E_GM\/edit#heading=h.x0pkk2mc19ey\">report forecasting transformative AI<\/a>. New constraints on the rate of growth of AI capabilities, like the availability of training data, could also mean that there&#8217;s more time to work on this (Cotra discusses this <a href=\"https:\/\/docs.google.com\/document\/d\/1cCJjzZaJ7ATbq8N2fvhmsDOUWdm7t3uSSXv6bD0E_GM\/edit#heading=h.bjqp7nxsk34g\">here<\/a>).<\/p>\n<p>Thirdly, the estimates about when we&#8217;ll get transformative AI from Cotra, Karnofsky and Davidson <a href=\"#when-can-we-expect-to-develop-transformative-AI\">that we looked at earlier<\/a> were produced by people who already expected that working on preventing an AI-related catastrophe <em>might<\/em> be one of the world&#8217;s most pressing problems. 
As a result, there&#8217;s <a href=\"https:\/\/en.wikipedia.org\/wiki\/Selection_bias\">selection bias<\/a> here: people who think transformative AI is coming relatively soon are also the people incentivised to carry out detailed investigations. (That said, if the investigations themselves seem strong, this effect could be pretty small.)<\/p>\n<p>Finally, none of the estimates we discussed earlier were trying to predict when an <em>existential catastrophe<\/em> might occur. Instead, they were looking at when AI systems might be able to automate all tasks humans can do, or when AI systems might significantly transform the economy. It&#8217;s by no means certain that the kinds of AI systems that could transform the economy would be the same <a href=\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#aps-systems\">advanced planning systems<\/a> that are core to the argument that AI systems might seek power. Advanced planning systems <em>do<\/em> seem to be <a href=\"#build-systems\">particularly useful<\/a>, so there is at least some reason to think these might be the sorts of systems that end up being built. But even if the forecasted transformative AI systems <em>are<\/em> advanced planning systems, it&#8217;s unclear how capable such systems would need to be to pose a threat \u2014 it&#8217;s more than plausible that systems would need to be far more capable to pose a substantial existential threat than they would need to be to transform the economy. 
This would mean that all the estimates we considered above would be underestimates of how long we have to work on this problem.<\/p>\n<p>All that said, it might be extremely difficult to find technical solutions to prevent <a href=\"#instrumental-convergence\">power-seeking behaviour<\/a> \u2014 and if that&#8217;s the case, focusing on finding those solutions <em>now<\/em> does seem extremely valuable.<\/p>\n<p>Overall, we think that transformative AI is sufficiently likely in the next 10\u201380 years that it is well worth it (in <a href=\"https:\/\/80000hours.org\/articles\/expected-value\/\">expected value<\/a> terms) to work on this issue now. Perhaps future generations will take care of it, and all the work we&#8217;d do now will be in vain \u2014 we hope so! But it might not be prudent to take that risk.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"ai-might-improve-gradually-over-time\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-4\">AI might improve gradually over time<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-4\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"AI might improve gradually over time\">\n<div class=\"panel-body\">\n<p>If the best AI we have improves gradually over time (rather than AI capabilities remaining fairly low for a while and then suddenly increasing), we&#8217;re likely to end up with &#8216;warning shots&#8217;: we&#8217;ll notice forms of misaligned behaviour in fairly weak systems, and be able to correct for it before it&#8217;s too late.<\/p>\n<p>In such a gradual scenario, we&#8217;ll have a better idea about what form powerful AI might take (e.g. whether it will be built using current deep learning techniques, or something else entirely), which could significantly help with safety research. 
There will also be more focus on this issue by society as a whole, as the risks of AI become clearer.<\/p>\n<p>So if gradual development of AI seems more likely, the risk seems lower.<\/p>\n<p>But it&#8217;s very much not certain that AI development will be gradual, or if it is, gradual <em>enough<\/em> for the risk to be noticeably lower. And even if AI development is gradual, there could still be significant benefits to having plans and technical solutions in place well in advance. So overall we still think it&#8217;s extremely valuable to attempt to reduce the risk now.<\/p>\n<p>If you want to learn more, you can read AI Impacts&#8217; work on <a href=\"https:\/\/web.archive.org\/web\/20221013015054\/https:\/\/aiimpacts.org\/likelihood-of-discontinuous-progress-around-the-development-of-agi\/\">arguments for and against discontinuous (i.e. non-gradual) progress in AI development<\/a>, and Toby Ord and Owen Cotton-Barratt on <a href=\"https:\/\/web.archive.org\/web\/20221013015104\/https:\/\/www.fhi.ox.ac.uk\/strategic-considerations-about-different-speeds-of-ai-takeoff\/\">strategic implications of slower AI development<\/a>.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"we-might-need-to-solve-alignment-anyway-to-make-AI-useful\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-5\">We might need to solve alignment anyway to make AI useful<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-5\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"We might need to solve alignment anyway to make AI useful\">\n<div class=\"panel-body\">\n<p>Making something have goals aligned with human designers&#8217; ultimate objectives and making something useful seem like very related problems. 
If so, perhaps the need to make AI useful will drive us to produce only aligned AI \u2014 in which case the alignment problem is likely to be solved by default.<\/p>\n<p>Ben Garfinkel <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/ben-garfinkel-classic-ai-risk-arguments\/#orthogonality-thesis-012504\">gave a few examples of this on our podcast<\/a>:<\/p>\n<ul>\n<li>You can think of a thermostat as a very simple AI that attempts to keep a room at a certain temperature. The thermostat has a metal strip in it that expands as the room heats, and cuts off the current once a certain temperature has been reached. This piece of metal makes the thermostat act like it has a goal of keeping the room at a certain temperature, but also makes it capable of achieving this goal (and therefore of being actually useful).<\/li>\n<li>Imagine you&#8217;re building a cleaning robot with reinforcement learning techniques \u2014 that is, you provide some specific condition under which you give the robot positive feedback. You might say something like, &#8220;The less dust in the house, the more positive the feedback.&#8221; But if you do this, the robot will end up doing things you don&#8217;t want \u2014 like ripping apart a cushion to find dust on the inside. Probably instead you need to use techniques like those being developed by people working on AI safety (things like watching a human clean a house and letting the AI figure things out from there). So people building AIs will be naturally incentivised to also try to make them aligned (and so in some sense safe), so they can do their jobs.<\/li>\n<\/ul>\n<p>If we need to solve the problem of alignment anyway to make useful AI systems, this significantly reduces the chances we will have misaligned but still superficially useful AI systems. So the incentive to deploy a misaligned AI would be a lot lower, reducing the risk to society.<\/p>\n<p>That said, there are still reasons to be concerned. 
For example, it seems like we could still be susceptible to problems of <a href=\"#incentives-and-deception\">AI deception<\/a>.<\/p>\n<p>And, <a href=\"#other-risks\">as we&#8217;ve argued<\/a>, AI alignment is only part of the overall issue. Solving the alignment problem isn&#8217;t the same thing as completely eliminating existential risk from AI, since aligned AI could also be used to bad ends \u2014 such as by authoritarian governments.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"this-could-be-extremely-difficult-to-solve\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-6\">The problem could be extremely difficult to solve<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-6\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"The problem could be extremely difficult to solve\">\n<div class=\"panel-body\">\n<p>As with many research projects in their early stages, we don&#8217;t know how hard the alignment problem \u2014 or other AI problems that pose risks \u2014 are to solve. Someone could believe there are major risks from machine intelligence, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.<\/p>\n<p>This is definitely a reason to potentially work on another issue \u2014 the solvability of an issue is a key part of <a href=\"https:\/\/80000hours.org\/articles\/problem-framework\/\">how we try to compare global problems<\/a>. For example, we&#8217;re also very concerned about risks from <a href=\"\/preventing-catastrophic-pandemics\/\">pandemics<\/a>, and it may be much easier to solve that issue.<\/p>\n<p>That said, we think that given the stakes, it could make sense for many people to work on reducing AI risk, even if you think the chance of success is low. 
You&#8217;d have to think that it was <em>extremely<\/em> difficult to reduce risks from AI in order to conclude that it&#8217;s better just to let the risks materialise and the chance of catastrophe play out.<\/p>\n<p>At least in our own case at 80,000 Hours, we want to keep trying to help with AI safety \u2014 for example, by writing profiles like this one \u2014 even if the chance of success seems low (though in fact we&#8217;re overall pretty optimistic).<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"wrong-about-power-seeking\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-7\">We could be overestimating the chances that strategic AI systems would seek power<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-7\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"We could be overestimating the chances that strategic AI systems would seek power\">\n<div class=\"panel-body\">\n<p>There are some reasons to think that the core argument that any advanced, strategically aware planning system will by default seek power (which we gave <a href=\"#instrumental-convergence\">here<\/a>) isn&#8217;t totally right.<\/p>\n<ol>\n<li>For a start, the argument that advanced AI systems will seek power relies on the idea that systems will <em>produce plans to achieve goals<\/em>. 
We&#8217;re not quite sure what this means \u2014 and as a result, we&#8217;re not sure what properties are really required for power-seeking behaviour to occur and whether the things we&#8217;ll build will have those properties.\n<p>We&#8217;d love to see a more in-depth analysis of what aspects of planning are economically incentivised, and whether those aspects seem like they&#8217;ll be enough for the argument for power-seeking behaviour to work.<\/p>\n<p>Grace has written more about <a href=\"https:\/\/www.alignmentforum.org\/posts\/LDRQ5Zfqwi8GjzPYG\/counterarguments-to-the-basic-ai-x-risk-case#Ambiguously_strong_forces_for_goal_directedness_need_to_meet_an_ambiguously_high_bar_to_cause_a_risk\">the ambiguity around &#8220;how much goal-directedness is needed to bring about disaster.&#8221;<\/a><\/p>\n<\/li>\n<li>\n<p>It&#8217;s possible that <a href=\"https:\/\/www.alignmentforum.org\/posts\/LDRQ5Zfqwi8GjzPYG\/counterarguments-to-the-basic-ai-x-risk-case#Unclear_that_many_goals_realistically_incentivise_taking_over_the_universe\">only a few goals that AI systems could have would lead to misaligned power-seeking<\/a>.<\/p>\n<p>Richard Ngo, in his <a href=\"https:\/\/www.alignmentforum.org\/s\/mzgtmmTKKn5MuCzFJ\/p\/bz5GdmCWj8o48726N\">analysis of what people mean by &#8220;goals&#8221;<\/a>, points out that you&#8217;ll only get power-seeking behaviour if you have goals that mean the system can actually benefit from seeking power. 
Ngo suggests that these goals need to be &#8220;large-scale.&#8221; (Some have <a href=\"https:\/\/www.alignmentforum.org\/posts\/zB3ukZJqt3pQDw9jz\/ai-will-change-the-world-but-won-t-take-it-over-by-playing-3#2__Understanding_the_validity_of_the_hypotheses\">argued<\/a> that, <em>by default<\/em>, we should expect AI systems to have &#8220;short-term&#8221; goals that won&#8217;t lead to power-seeking behaviour.)<\/p>\n<p>But whether an AI system would plan to take power <a href=\"https:\/\/www.alignmentforum.org\/posts\/LDRQ5Zfqwi8GjzPYG\/counterarguments-to-the-basic-ai-x-risk-case#Unclear_that_many_goals_realistically_incentivise_taking_over_the_universe\">depends on how easy it would be for the system to take power<\/a>, because the easier it is for a system to take power, the more likely power-seeking plans would be successful \u2014 so a good planning system would be more likely to choose them. This suggests it will be easier to accidentally create a power-seeking AI system as systems&#8217; capabilities increase.<\/p>\n<p>So there still seems to be cause for increased concern, because the capabilities of AI systems do seem to be <a href=\"#making-advances-extremely-quickly\">increasing fast<\/a>. There are two considerations here: if few goals really lead to power-seeking, even for quite capable AI systems, that significantly reduces the risk and thus the importance of the problem. But it might also increase the <em>solvability<\/em> of the problem by demonstrating that solutions could be easy to find (e.g. 
the solution could be never giving systems &#8220;large-scale&#8221; goals) \u2014 making this issue <em>more<\/em> valuable for people to work on.<\/p>\n<\/li>\n<li>\n<p>Earlier we argued that we can expect AI systems to do things that seem <a href=\"#instrumental-convergence-2\">generally instrumentally useful<\/a> to their overall goal, and that as a result it could be hard to prevent AI systems from doing these instrumentally useful things.<\/p>\n<p>But we can find examples where <em>how generally instrumentally useful<\/em> these things would be doesn&#8217;t seem to affect how hard it is to prevent them. Consider an autonomous car that can move around only if its engine is on. For many possible goals (other than, say, turning the car radio on), it seems like it would be useful for the car to be able to move around, so we should expect the car to turn its engine on. But despite that, we might still be able to train the car to keep its engine off: for example, we can give it some negative feedback whenever it turns the engine on, even if we also had given the car some other goals. Now imagine we improve the car so that its top speed is higher \u2014 this massively increases the number of possible action sequences that involve, as a first step, turning its engine on. In some sense, this seems to increase the instrumental usefulness of turning the engine on \u2014 there are more possible actions the car can take once its engine is on because the range of possible speeds it can travel at is higher. (It&#8217;s not clear if this sense of &#8220;instrumental usefulness&#8221; is the same as the one in the argument for the risk, although it does seem somewhat related.) But it doesn&#8217;t seem like this increase in the instrumental usefulness of turning on the engine makes it much harder to stop the car turning it on. 
Simple examples like this cast some doubt on the idea that just because a particular action is instrumentally useful, we won&#8217;t be able to find ways to prevent it. (For more on this example, see <a href=\"https:\/\/docs.google.com\/document\/d\/1FlGPHU3UtBRj4mBPkEZyBQmAuZXnyvHU-yaH-TiNt8w\/edit\">page 25 of Garfinkel&#8217;s review of Carlsmith&#8217;s report<\/a>.)<\/p>\n<\/li>\n<li>\n<p>Humans are clearly highly intelligent, but it&#8217;s unclear whether we are perfect goal-optimisers. For example, humans often face some kind of existential angst over what their true goals are. And even if we accept humans as examples of strategically aware agents capable of planning, humans certainly aren&#8217;t always power-seeking. We obviously care about having basics like food and shelter, and many people go to great lengths for more money, status, education, or even formal power. But some humans choose not to pursue these goals, and pursuing them doesn&#8217;t seem to correlate with intelligence.<\/p>\n<p>However, this doesn&#8217;t mean that the argument that there will be an incentive to seek power is wrong. Most people <em>do<\/em> face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don&#8217;t usually seek <em>huge<\/em> amounts of power by observing that we aren&#8217;t usually in circumstances that make the effort worth it.<\/p>\n<p>For example, most people don&#8217;t try to start billion-dollar companies \u2014 you probably won&#8217;t succeed, and it&#8217;ll cost you a lot of time and effort.<\/p>\n<p>But you&#8217;d still walk across the street to pick up a billion-dollar cheque.<\/p>\n<\/li>\n<\/ol>\n<p>The absence of extreme power-seeking in many humans, along with uncertainties in what it really means to plan to achieve goals, does suggest that the argument we gave above that advanced AI systems will seek power might not be completely correct. 
These considerations also suggest that, if there really is a problem to solve here, alignment research into preventing power-seeking in AIs could in principle succeed.<\/p>\n<p>This is good news! But for the moment \u2014 short of hoping we&#8217;re wrong about the existence of the problem \u2014 we don&#8217;t actually know how to prevent this power-seeking behaviour.<\/p>\n<\/div><\/div><\/div>\n<\/div>\n<h2><span id=\"good-responses\" class=\"toc-anchor\"><\/span>Arguments against working on AI risk to which we think there are strong responses<\/h2>\n<p>We&#8217;ve just discussed the major objections to working on AI risk that we think are most persuasive. In this section, we&#8217;ll look at objections that we think are less persuasive, and give some reasons why.<\/p>\n<div class=\"panel-group\" id=\"custom-collapse-2\">\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"is-it-even-possible\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-8\">Is it even possible to produce artificial general intelligence?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-8\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Is it even possible to produce artificial general intelligence?\">\n<div class=\"panel-body\">\n<p>People have been saying <a href=\"https:\/\/web.archive.org\/web\/20221013015416\/https:\/\/www.openphilanthropy.org\/research\/what-should-we-learn-from-past-ai-forecasts\/\">since the 1950s<\/a> that artificial intelligence smarter than humans is just around the corner.<\/p>\n<p>But it hasn&#8217;t happened yet.<\/p>\n<p>One reason for this could be that it&#8217;ll never happen. Some have argued that producing <a href=\"https:\/\/web.archive.org\/web\/20221013015335\/https:\/\/www.nature.com\/articles\/s41599-020-0494-4\">artificial general intelligence is fundamentally impossible<\/a>. 
Others think it&#8217;s possible, but <a href=\"https:\/\/web.archive.org\/web\/20221013015342\/https:\/\/www.forbes.com\/sites\/cognitiveworld\/2020\/06\/04\/is-ai-overhyped\/?sh=443c7b6c63ee\">unlikely to actually happen<\/a>, especially not with <a href=\"https:\/\/web.archive.org\/web\/20221013015350\/https:\/\/www.kdnuggets.com\/2021\/12\/deep-neural-networks-not-toward-agi.html\">current deep learning methods<\/a>.<\/p>\n<p>Overall, we think the existence of human intelligence shows it&#8217;s possible in principle to create artificial intelligence. And the <a href=\"#making-advances-extremely-quickly\">speed of current advances<\/a> isn&#8217;t something we think would have been predicted by those who thought that we&#8217;ll never develop powerful, general AI.<\/p>\n<p>But most importantly, the idea that you need <em>fully general<\/em> intelligent AI systems for there to be a substantial existential risk is a common misconception.<\/p>\n<p>The argument we gave <a href=\"#power-seeking-ai\">earlier<\/a> relied on AI systems being as good or better than humans in a subset of areas: planning, strategic awareness, and areas related to seeking and keeping power. So as long as you think all these things are possible, the risk remains.<\/p>\n<p>And even if no single AI has all of these properties, <strong>there are still ways in which we might end up with systems of &#8216;narrow&#8217; AI systems that, together, can disempower humanity<\/strong>. For example, we might have a planning AI that develops plans for a company, a separate AI system that measures things about the company, another AI system that attempts to evaluate plans from the first AI by predicting how much profit each will make, and further AI systems that carry out those plans (for example, by automating the building and operation of factories). 
Considered together, this system as a whole has the capability to form and carry out plans to achieve some goal, and potentially also has advanced capabilities in areas that help it seek power.<\/p>\n<p>It does seem like it will be <em>easier<\/em> to prevent these &#8216;narrow&#8217; AI systems from seeking power. This could be because the skills the AIs have, even when combined, don&#8217;t add up to being able to plan to achieve goals, or because the narrowness reduces the risk of systems developing power-seeking plans (e.g. if you build systems that can only produce very short-term plans). A system of interacting AIs also gives humans another point at which to intervene if necessary: the coordination of the different systems.<\/p>\n<p>Nevertheless, the risk remains, even from systems of many interacting AIs.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"why-cant-we-just-unplug-a-dangerous-ai\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-9\">Why can't we just unplug a dangerous AI?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-9\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Why can't we just unplug a dangerous AI?\">\n<div class=\"panel-body\">\n<p>It might just be really, really hard.<\/p>\n<p>Stopping people and computers from running software is already incredibly difficult.<\/p>\n<p>Think about how hard it would be to shut down Google&#8217;s web services. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Google_data_centers\">Google&#8217;s data centres<\/a> have millions of servers across 34 different locations, many of which are running the same sets of code. 
And these data centres are absolutely crucial to Google&#8217;s bottom line, so even if Google <em>could<\/em> decide to shut down their entire business, they probably wouldn&#8217;t.<\/p>\n<p>Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.<\/p>\n<p>Ultimately, <a href=\"#incentives-and-deception\">we think<\/a> any dangerous power-seeking AI system will be looking for ways to not be turned off, which makes it more likely we&#8217;ll be in one of these situations, rather than in a case where we can just unplug a single machine.<\/p>\n<p>That said, we absolutely should try to shape the future of AI such that we <em>can<\/em> &#8216;unplug&#8217; powerful AI systems.<\/p>\n<p>There may be ways we can develop systems that let us turn them off. But for the moment, we&#8217;re <a href=\"https:\/\/www.youtube.com\/watch?v=3TYT1QfdfsM\">not sure how to do that<\/a>.<\/p>\n<p>Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it&#8217;s running.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"couldnt-we-just-sandbox-any-potentially-dangerous-ai-system-until-we-know-its-safe\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-10\">Couldn't we just 'sandbox' any potentially dangerous AI system until we know it's safe?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-10\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Couldn't we just 'sandbox' any potentially dangerous AI system until we know it's safe?\">\n<div class=\"panel-body\">\n<p>We could (and should!) 
definitely try.<\/p>\n<p>If we <em>could<\/em> successfully &#8216;sandbox&#8217; an advanced AI \u2014 that is, contain it to a training environment with no access to the real world until we were very confident it wouldn&#8217;t do harm \u2014 that would help our efforts to mitigate AI risks tremendously.<\/p>\n<p>But there are a few things that might make this difficult.<\/p>\n<p>For a start, we might only need one failure \u2014 like one person to remove the sandbox, or one security vulnerability in the sandbox we hadn&#8217;t noticed \u2014 for the AI system to begin affecting the real world.<\/p>\n<p>Moreover, this solution doesn&#8217;t scale with the capabilities of the AI system. This is because:<\/p>\n<ul>\n<li>More capable systems are more likely to be able to find vulnerabilities or other ways of leaving the sandbox (e.g. threatening or coercing humans).<\/li>\n<li>Systems that are good at planning might <a href=\"#incentives-and-deception\">attempt to deceive us<\/a> into deploying them.<\/li>\n<\/ul>\n<p>So the more dangerous the AI system, the less likely sandboxing is to be possible. 
That&#8217;s the opposite of what we&#8217;d want from a good solution to the risk.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"surely-a-truly-intelligent-ai-system-would-know-not-to-disempower-everyone\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-11\">Surely a <em>truly<\/em> intelligent AI system would know not to disempower everyone?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-11\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Surely a <em>truly<\/em> intelligent AI system would know not to disempower everyone?\">\n<div class=\"panel-body\">\n<p>For some definitions of &#8220;truly intelligent&#8221; \u2014 for example, if true intelligence includes a deep understanding of morality and a desire to be moral \u2014 this would probably be the case.<\/p>\n<p>But if that&#8217;s your definition of <em>truly intelligent<\/em>, then it&#8217;s not <em>truly intelligent<\/em> systems that pose a risk. <a href=\"#power-seeking-ai\">As we argued earlier<\/a>, it&#8217;s advanced systems that can plan and have strategic awareness that pose risks to humanity.<\/p>\n<p>With sufficiently advanced strategic awareness, an AI system&#8217;s excellent understanding of the world may well encompass an excellent understanding of people&#8217;s moral beliefs. But that&#8217;s <a href=\"https:\/\/web.archive.org\/web\/20221013015624\/https:\/\/nickbostrom.com\/superintelligentwill.pdf\">not a strong reason to think that such a system would <em>act<\/em> morally<\/a>.<\/p>\n<p>For example, when we learn about other cultures or moral systems, that doesn&#8217;t necessarily create a desire to follow their morality. 
A scholar of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Antebellum_South\">Antebellum South<\/a> might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.<\/p>\n<p>AI systems with excellent understandings of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to <a href=\"#incentives-and-deception\">deceive us<\/a> into thinking that it is safe.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"isnt-the-real-danger-from-actual-modern-ai-not-some-sort-of-futuristic-superintelligence\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-12\">Isn't the real danger from actual current AI \u2014 not some sort of futuristic superintelligence?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-12\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Isn't the real danger from actual current AI \u2014 not some sort of futuristic superintelligence?\">\n<div class=\"panel-body\">\n<p>There are definitely dangers from current artificial intelligence.<\/p>\n<p>For example, data used to train neural networks often contains hidden biases. This means that AI systems can learn these biases \u2014 and this can lead to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Algorithmic_bias#Gender_discrimination\">racist and sexist behaviour<\/a>.<\/p>\n<p>There are other dangers too. 
Our <a href=\"#artificial-intelligence-and-war\">earlier discussion on nuclear war<\/a> explains a threat which doesn&#8217;t require AI systems to have particularly advanced capabilities.<\/p>\n<p>But we don&#8217;t think the fact that there are also risks from current systems is a reason not to prioritise reducing existential threats from AI, if they are sufficiently severe.<\/p>\n<p>As we&#8217;ve discussed, future systems \u2014 not necessarily superintelligence or totally general intelligence, but systems advanced in their planning and power-seeking capabilities \u2014 seem like they could pose threats to the existence of the entirety of humanity. And it also seems somewhat likely that we&#8217;ll <a href=\"#when-can-we-expect-to-develop-transformative-AI\">produce such systems this century<\/a>.<\/p>\n<p>What&#8217;s more, lots of technical AI safety research is <em>also<\/em> relevant to solving problems with  existing AI systems. For example, some research focuses on <a href=\"https:\/\/web.archive.org\/web\/20221013015650\/https:\/\/deepmindsafetyresearch.medium.com\/scalable-agent-alignment-via-reward-modeling-bf4ab06dfd84\">ensuring that ML models do what we want them to, and will still do this as their size and capabilities increase<\/a>; other research tries to work out <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/chris-olah-interpretability-research\/\">how and why existing models are making the decisions and taking the actions that they do<\/a>.<\/p>\n<p>As a result, at least in the case of technical research, the choice between working on current threats and future risks may look more like a choice between <em>only<\/em> ensuring that current models are safe, or instead finding ways to ensure that current models are safe that will also continue to work as AI systems become more complex and more intelligent.<\/p>\n<p>Ultimately, we have limited time in our careers, so <a href=\"\/articles\/your-choice-of-problem-is-crucial\/\">choosing 
which problem to work on<\/a> could be a huge way of increasing your impact. When there are such substantial threats, it seems reasonable for many people to focus on addressing these worst-case possibilities.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"but-cant-ai-also-do-a-lot-of-good\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-13\">But can't AI also do a lot of good?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-13\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"But can't AI also do a lot of good?\">\n<div class=\"panel-body\">\n<p>Yes, it can.<\/p>\n<p>AI systems are already <a href=\"https:\/\/web.archive.org\/web\/20221002175522\/https:\/\/www.pwc.com\/gx\/en\/industries\/healthcare\/publications\/ai-robotics-new-health\/transforming-healthcare.html\">improving healthcare<\/a>, putting <a href=\"https:\/\/en.wikipedia.org\/wiki\/Self-driving_car\">driverless cars<\/a> on the roads, and <a href=\"https:\/\/web.archive.org\/web\/20221013015758\/https:\/\/www.theverge.com\/2022\/5\/25\/23140950\/dyson-robotics-investment-hiring-household-chores\">automating household chores<\/a>.<\/p>\n<p>And if we&#8217;re able to automate advancements in science and technology, we could see <a href=\"https:\/\/web.archive.org\/web\/20221013011707\/https:\/\/www.cold-takes.com\/transformative-ai-timelines-part-1-of-4-what-kind-of-ai\/\">truly incredible economic and scientific progress<\/a>. AI could likely help solve many of the world&#8217;s <a href=\"\/problem-profiles\/\">most pressing problems<\/a>.<\/p>\n<p>But, just because something can do a lot of good, that doesn&#8217;t mean it can&#8217;t also do a lot of harm. 
AI is an example of a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dual-use_technology\"><em>dual-use technology<\/em><\/a> \u2014 a technology that can be used for both dangerous and beneficial purposes. For example, researchers were able to get an AI model that was trained to develop medical drugs to instead <a href=\"https:\/\/www.nature.com\/articles\/s42256-022-00465-9\">generate designs for bioweapons<\/a>.<\/p>\n<p>We are excited and hopeful about seeing large benefits from AI. But we also want to work hard to minimise the enormous risks advanced AI systems pose.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"why-shouldnt-i-just-dismiss-this-as-motivated-reasoning-by-a-group-of-people-who-just-like-playing-with-computers-and-wanted-to-think-thats-important\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-14\">Why shouldn't I dismiss this as motivated reasoning by a group of people who just like playing with computers and want to think that's important?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-14\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Why shouldn't I dismiss this as motivated reasoning by a group of people who just like playing with computers and want to think that's important?\">\n<div class=\"panel-body\">\n<p>It&#8217;s undoubtedly true that some people are drawn to thinking about AI safety because they like computers and science fiction \u2014 as with any other issue, there are people working on it not because they think it&#8217;s important, but because they think it&#8217;s cool.<\/p>\n<p>But, for many people, working on AI safety comes with huge reluctance.<\/p>\n<p>For me, and many of us at 80,000 Hours, spending our limited time and resources working on <em>any<\/em> cause that affects the long-run future \u2014 and therefore <em>not<\/em> spending that time on the 
terrible problems in the world today \u2014 is an <a href=\"https:\/\/80000hours.org\/2021\/02\/why-i-find-longtermism-hard\/\">incredibly emotionally difficult thing to do<\/a>.<\/p>\n<p>But we&#8217;ve gradually investigated these arguments (in the course of trying to figure out how we can do the most good), and over time both gained more expertise about AI and became more concerned about the risk.<\/p>\n<p>We think scepticism is healthy, and are <em>far from certain<\/em> that these arguments completely work. So while this suspicion is definitely a reason to dig a little deeper, we hope that, ultimately, this worry won&#8217;t be treated as a reason to deprioritise what may well be the most important problem of our time.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"this-all-reads-and-feels-like-science-fiction\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-15\">This all reads, and feels, like science fiction<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-15\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"This all reads, and feels, like science fiction\">\n<div class=\"panel-body\">\n<p>That something sounds like science fiction isn&#8217;t a reason in itself to dismiss it outright. There are loads of examples of things first mentioned in sci-fi that then went on to actually happen (this <a href=\"https:\/\/web.archive.org\/web\/20221013020159\/http:\/\/www.technovelgy.com\/ct\/ctnlistalpha.asp\">list of inventions in science fiction<\/a> contains plenty of examples).<\/p>\n<p>There are even a few such cases involving technology that are real existential threats today:<\/p>\n<ul>\n<li>In his 1914 novel <em>The World Set Free<\/em>, H. G. 
Wells predicted atomic energy fueling powerful explosives \u2014 20 years before we realised there could in theory be nuclear fission chain reactions, and 30 years before nuclear weapons were actually produced. In the 1920s and 1930s, Nobel Prize\u2013winning physicists <a href=\"https:\/\/www.youtube.com\/watch?v=HD3k1hgbUXQ\">Millikan, Rutherford, and Einstein all predicted that we would never be able to use nuclear power<\/a>. Nuclear weapons were literal science fiction before they were reality. <\/li>\n<li>In the 1964 film <em>Dr. Strangelove<\/em>, the USSR builds a doomsday machine that would automatically trigger an extinction-level nuclear event in response to a nuclear strike, but keeps it secret. Dr Strangelove points out that keeping it secret rather reduces its deterrence effect. But we now know that in the 1980s the USSR built an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Dead_Hand\">extremely similar system<\/a>&#8230; and kept it secret.<\/li>\n<\/ul>\n<p>Moreover, there are top academics and researchers working on preventing these risks from AI \u2014 at <a href=\"https:\/\/people.csail.mit.edu\/dhm\/\">MIT<\/a>, <a href=\"https:\/\/www.davidscottkrueger.com\/\">Cambridge<\/a>, <a href=\"https:\/\/www.fhi.ox.ac.uk\/research\/research-areas\/#aisafety_tab\">Oxford<\/a>, <a href=\"https:\/\/humancompatible.ai\/\">UC Berkeley<\/a>, and elsewhere. Two of the world&#8217;s top AI companies (DeepMind and OpenAI) have teams explicitly dedicated to working on technical AI safety. Researchers from these places helped us with this article.<\/p>\n<p>It&#8217;s totally possible all these people are wrong to be worried, but the fact that so many people take this threat seriously undermines the idea that this is merely science fiction.<\/p>\n<p>It&#8217;s reasonable when you hear something that sounds like science fiction to want to investigate it thoroughly before acting on it. 
But having investigated it, if the arguments seem solid, then simply sounding like science fiction is not a reason to dismiss them.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"may-or-may-not-ever-exist\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-16\">Can it make sense to dedicate my career to solving an issue based on a speculative story about a technology that may or may not ever exist?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-16\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Can it make sense to dedicate my career to solving an issue based on a speculative story about a technology that may or may not ever exist?\">\n<div class=\"panel-body\">\n<p>We never know for sure what&#8217;s going to happen in the future. So, unfortunately for us, if we&#8217;re trying to have a positive impact on the world, that means we&#8217;re always having to deal with at least some degree of uncertainty.<\/p>\n<p>We also think there&#8217;s an important distinction between <em>guaranteeing that you&#8217;ve achieved some amount of good<\/em> and <em>doing the very best you can<\/em>. 
To achieve the former, you can&#8217;t take any risks at all \u2014 and that could mean missing out on the <a href=\"https:\/\/80000hours.org\/articles\/be-more-ambitious\/\">best opportunities to do good<\/a>.<\/p>\n<p>When you&#8217;re dealing with uncertainty, it makes sense to roughly think about the <a href=\"https:\/\/80000hours.org\/articles\/expected-value\/\">expected value<\/a> of your actions: the sum of all the good and bad potential consequences of your actions, weighted by their probability.<\/p>\n<p>Given the stakes are so high, and the risks from AI aren&#8217;t that low, this makes the expected value of helping with this problem high.<\/p>\n<p>We&#8217;re sympathetic to the concern that if you work on AI safety, you might end up doing not much at all when you might have done a tremendous amount of good working on something else \u2014 simply because the problem and our current ideas about what to do about it are so uncertain.<\/p>\n<p>But we think the world will be better off if we decide that some of us should work on solving this problem, so that together we have the best chance of successfully navigating the transition to a world with advanced AI rather than risking an existential crisis.<\/p>\n<p>And it seems like an immensely valuable thing to try.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"is-this-a-form-of-pascals-mugging-taking-a-big-bet-on-tiny-probabilities\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-17\">Is this a form of Pascal's mugging \u2014 taking a big bet on tiny probabilities?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-17\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Is this a form of Pascal's mugging \u2014 taking a big bet on tiny probabilities?\">\n<div class=\"panel-body\">\n<p><a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Pascal%27s_mugging\">Pascal&#8217;s mugging<\/a> is a thought experiment \u2014 a riff on the famous <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pascal%27s_wager\">Pascal&#8217;s wager<\/a> \u2014 where someone making decisions using <a href=\"https:\/\/80000hours.org\/articles\/expected-value\/\">expected value calculations<\/a> can be exploited by claims that they can get something extraordinarily good (or avoid something extraordinarily bad), with an extremely low probability of succeeding.<\/p>\n<p>The story goes like this: a random mugger stops you on the street and says, &#8220;Give me your wallet or I&#8217;ll cast a spell of torture on you and everyone who has ever lived.&#8221; You can&#8217;t be 100% sure he won&#8217;t \u2014 after all, nothing&#8217;s 100% for sure. And <em>torturing everyone who&#8217;s ever lived<\/em> is so bad that surely even avoiding a tiny, tiny probability of that is worth the $40 in your wallet? But intuitively, it seems like you shouldn&#8217;t give your wallet to someone just because they threaten you with something completely implausible.<\/p>\n<p>Analogously, you could worry that working on AI safety means giving your valuable time to avoid a tiny, tiny chance of catastrophe. 
Working on reducing risks from AI isn&#8217;t free \u2014 the opportunity cost is quite substantial, as it means you forgo working on other extremely important things, like <a href=\"\/preventing-catastrophic-pandemics\/\">reducing risks from pandemics<\/a> or <a href=\"\/problem-profiles\/factory-farming\/\">ending factory farming<\/a>.<\/p>\n<p>Here&#8217;s the thing though: while there&#8217;s lots of value at stake \u2014 perhaps the lives of everybody alive today, and the entirety of the future of humanity \u2014 it&#8217;s not the case that the probability that you can make a difference by working on reducing risks from AI is small enough for this argument to apply.<\/p>\n<p>We <em>wish<\/em> the chance of an AI catastrophe was that vanishingly small.<\/p>\n<p>Instead, we think <a href=\"https:\/\/web.archive.org\/web\/20221013020223\/https:\/\/forum.effectivealtruism.org\/posts\/vYb2qEyqv76L62izD\/saying-ai-safety-research-is-a-pascal-s-mugging-isn-t-a\">the probability of such a catastrophe (I think, around 1% this century) is much, much larger than things that people try to prevent all the time<\/a> \u2014 such as <a href=\"https:\/\/web.archive.org\/web\/20221013020349\/https:\/\/www.reuters.com\/business\/aerospace-defense\/aviation-deaths-rise-worldwide-2020-even-fatal-incidents-flights-fall-2021-01-01\/\">fatal plane crashes, which happen in 0.00002% of flights<\/a>.<\/p>\n<p>What really matters, though, is the extent to which your work can reduce the chance of a catastrophe.<\/p>\n<p>Let&#8217;s look at working on reducing risks from AI. 
For example, if:<\/p>\n<ol>\n<li>There&#8217;s a 1% chance of an AI-related existential catastrophe by 2100<\/li>\n<li>There&#8217;s a 30% chance that we can find a way to prevent this by technical research<\/li>\n<li>Five people working on technical AI safety raise the chances of solving the problem by 1% of that 30% (so 0.3 percentage points)<\/li>\n<\/ol>\n<p>Then each person involved has a 0.0006 percentage point share in preventing this catastrophe.<\/p>\n<p>Other ways of acting altruistically involve similarly sized probabilities.<\/p>\n<p>The chance of a <a href=\"https:\/\/web.archive.org\/web\/20221013020404\/https:\/\/www.overcomingbias.com\/2012\/09\/if-elections-arent-a-pascals-mugging-existential-risk-shouldnt-be-either.html\">volunteer campaigner swinging a US presidential election<\/a> is somewhere between 0.001% and 0.00001%. But you can still justify working on a campaign because of the large impact you expect you&#8217;d have on the world if your preferred candidate won.<\/p>\n<p>You have even lower chances of wild success from things like trying to reform political institutions, or working on some very fundamental science research to build knowledge that might one day help cure cancer.<\/p>\n<p>Overall, <em>as a society<\/em>, we may be able to reduce the chance of an AI-related catastrophe all the way down from 10% (or higher) to close to zero \u2014 that&#8217;d be clearly worth it for a group of people, so it has to be worth it for the individuals, too.<\/p>\n<p>We wouldn&#8217;t want to just not do fundamental science because each researcher has a low chance of making the next big discovery, or not do any peacekeeping because any one person has a low chance of preventing World War III. 
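<\/p>
<p>The expected-value arithmetic in the example above is simple enough to check directly. Below is a minimal Python sketch; the 1%, 30%, and five-person figures are the illustrative assumptions listed in the example, not empirical estimates:<\/p>

```python
# Sanity check on the illustrative example above. All inputs are the
# assumptions stated in the text, not empirical estimates.
p_catastrophe = 0.01   # 1% chance of an AI-related existential catastrophe by 2100
p_solvable = 0.30      # 30% chance technical research can prevent it
team_size = 5

# Five people raise the chance of solving the problem by 1% of that 30%,
# i.e. 0.3 percentage points.
team_boost = 0.01 * p_solvable

# Absolute reduction in catastrophe probability attributable to one person.
per_person_reduction = p_catastrophe * team_boost / team_size

print(f"{per_person_reduction:.1e}")  # prints 6.0e-06 (as a probability)
print(f"{per_person_reduction * 100:.4f} percentage points")  # prints 0.0006 percentage points
```

<p>Small as that per-person number is, it is comparable to the other altruistic long shots discussed above, and it gets multiplied by enormous stakes.<\/p>
<p>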
As a society, we need some people working on these big issues \u2014 and maybe you can be one of them.<\/p>\n<\/div><\/div><\/div>\n<\/div>\n<h2><span id=\"what-can-you-do-concretely-to-help\" class=\"toc-anchor\"><\/span>What you can do concretely to help<\/h2>\n<p>As we mentioned above, we know of two main ways to help reduce existential risks from AI:<\/p>\n<ol>\n<li>Technical AI safety research<\/li>\n<li>AI governance and policy work<\/li>\n<\/ol>\n<p><strong>The biggest way you could help would be to pursue a career in either one of these areas, or in a supporting area.<\/strong><\/p>\n<p>The first step is learning a lot more about the technologies, problems, and possible solutions. We&#8217;ve collated some lists of our favourite resources <a href=\"#top-resources-to-learn-more\">here<\/a>, and our top recommendation is to take a look at the <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-alignment-curriculum\">technical alignment curriculum<\/a> from AGI Safety Fundamentals.<\/p>\n<h3><span id=\"technical-ai-safety\" class=\"toc-anchor\"><\/span>Technical AI safety<\/h3>\n<p>If you&#8217;re interested in a career in technical AI safety, the best place to start is our <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-safety-researcher\/\">career review of being an AI safety researcher<\/a>.<\/p>\n<p>If you want to learn more about technical AI safety as a field of research \u2014 e.g. the different techniques, schools of thought, and threat models \u2014 our top recommendation is to take a look at the <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-alignment-curriculum\">technical alignment curriculum<\/a> from AGI Safety Fundamentals.<\/p>\n<p>It&#8217;s important to note that <em>you don&#8217;t have to be an academic or an expert in AI or AI safety to contribute to AI safety research<\/em>. 
For example, <a href=\"\/career-reviews\/software-engineering\/\">software engineers<\/a> are needed at many places conducting technical safety research, and we also highlight more roles <a href=\"#complementary-yet-crucial-roles\">below<\/a>.<\/p>\n<p>You can see a list of <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-safety-researcher\/#key-organisations\">key organisations<\/a> where you might do this kind of work in the full career review.<\/p>\n<h3><span id=\"ai-governance-and-policy-work\" class=\"toc-anchor\"><\/span>AI governance and policy work<\/h3>\n<p>If you&#8217;re interested in a career in AI governance and policy, the best place to start is our  <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/\">AI governance and policy career review<\/a>.<\/p>\n<p>You don&#8217;t need to be a bureaucrat in a grey suit to have a career in AI governance and policy \u2014 there are roles suitable for a wide range of <a href=\"\/skills\/\">skill sets<\/a>. 
In particular, people with technical skills in machine learning and related fields can be especially valuable in governance work (although those skills are certainly not necessary).<\/p>\n<p>We split this career path into six different kinds of roles:<\/p>\n<ol>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#governmentwork\">Government roles<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#research\">Research<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#industry\">Industry work<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#advocacy\">Advocacy and lobbying<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#third-party\">Third-party auditing and evaluation<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#international\">International work and coordination<\/a><\/li>\n<\/ol>\n<p>We also have specific articles on <a href=\"https:\/\/80000hours.org\/articles\/us-ai-policy\/\">working in US AI policy<\/a> and <a href=\"https:\/\/80000hours.org\/career-reviews\/china-related-ai-safety-and-governance-paths\/\">China-related AI safety and governance paths<\/a>.<\/p>\n<p>And you can learn more about <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/#wheretowork\">where specifically you might work<\/a> in this career path in our career review.<\/p>\n<p>If you&#8217;re new to the topic and interested in learning more broadly about AI governance, our top recommendation is to take a look at the <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-governance-curriculum\">governance curriculum<\/a> from AGI Safety Fundamentals.<\/p>\n<h3><span id=\"complementary-yet-crucial-roles\" class=\"toc-anchor\"><\/span>Complementary (yet crucial) roles<\/h3>\n<p>Even in a research organisation, around half of the staff 
will be doing other tasks essential for the organisation to perform at its best and have an impact. Having high-performing people in these roles is crucial.<\/p>\n<p>We think the importance of these roles is often underrated because the work is less visible. So we&#8217;ve written several career reviews on these areas to help more people enter these careers and succeed, including:<\/p>\n<ul>\n<li><a href=\"https:\/\/80000hours.org\/articles\/operations-management\/\">Operations management<\/a> to help impactful organisations grow and function as effectively as possible.<\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/research-management\/\">Research management<\/a> at an AI safety research organisation.<\/li>\n<li>Being an <a href=\"https:\/\/80000hours.org\/career-reviews\/executive-assistant-for-an-impactful-person\/\">executive assistant<\/a> to someone who&#8217;s doing really important work on safety and governance.<\/li>\n<\/ul>\n<h3><span id=\"other-ways-to-help\" class=\"toc-anchor\"><\/span>Other ways to help<\/h3>\n<p>AI safety is a big problem and it needs help from people doing a lot of different kinds of  work.<\/p>\n<p>One major way to help is to work in a role that directs funding or people towards AI risk, rather than working on the problem directly. We&#8217;ve reviewed a few career paths along these lines, including:<\/p>\n<ul>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/founder-impactful-organisations\/\">Founding new projects<\/a> \u2014 in this case, starting new initiatives aimed at reducing risks from advanced AI.<\/li>\n<li>Being a <a href=\"https:\/\/80000hours.org\/career-reviews\/grantmaker\/\">grantmaker<\/a> to fund promising projects focused on reducing catastrophic AI risk.<\/li>\n<li>Working in <a href=\"https:\/\/80000hours.org\/articles\/communication\/\">communication roles<\/a>.<\/li>\n<li>Helping to build communities of people working on this problem. 
The most relevant community is the AI safety community itself, but it could also be impactful to help <a href=\"https:\/\/80000hours.org\/career-reviews\/work-in-effective-altruism-organisations\/\">build the community of people working on the world&#8217;s most pressing problems<\/a> (including risks from AI). <\/li>\n<\/ul>\n<p>There are ways all of these could go wrong, so the first step is to <a href=\"#top-resources-to-learn-more\">become well-informed about the issue<\/a>.<\/p>\n<p>There are also other technical roles besides safety research that could contribute, like:<\/p>\n<ul>\n<li>Working in <a href=\"https:\/\/80000hours.org\/career-reviews\/information-security\/\">information security<\/a> to protect AI (or the results of key experiments) from misuse, theft, or tampering.<\/li>\n<li>Becoming an <a href=\"https:\/\/80000hours.org\/career-reviews\/become-an-expert-in-ai-hardware\/\">expert in AI hardware<\/a> as a way of steering AI progress in safer directions.<\/li>\n<\/ul>\n<p>You can read about all these careers \u2014 why we think they&#8217;re helpful, how to enter them, and how you can predict whether they&#8217;re a good fit for you \u2014 on our <a href=\"https:\/\/80000hours.org\/career-reviews\/\">career reviews page<\/a>.<\/p>\n<div class=\"well bg-gray-lighter margin-bottom margin-top padding-top-small padding-bottom-small\">\n<h3><span id=\"want-one-on-one-advice-on-pursuing-this-path\" class=\"toc-anchor\"><\/span>Want one-on-one advice on pursuing this path?<\/h3>\n<p>We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. 
If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we&#8217;d be <em>especially<\/em> excited to advise you on next steps, one-on-one.<\/p>\n<p>We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities \u2014 all for free.<\/p>\n<p><a href=\"\/speak-with-us\/?int_campaign=2022-08__ai-problem-profile\" title=\"\" class=\"btn btn-primary\">APPLY TO SPEAK WITH OUR TEAM<\/a><\/p>\n<\/div>\n<h3><span id=\"find-vacancies-on-our-job-board\" class=\"toc-anchor\"><\/span>Find vacancies on our job board<\/h3>\n<p><script>\n    function getLocationString(arr) {\n      if (arr.length <= 3) { \n        return arr.join(\"<br \/>\");\n      }\n      return arr.slice(0, 3).join(\"<br \/>\") + \"...\";\n    }\n  <\/script><script>\n    function getUniqueCompanyJobs(jobs, limit) {\n      const uniqueCompanies = new Set();\n      const uniqueJobs = [];\n      const additionalJobs = [];\n      for (const job of jobs) {\n          const company = job.company_name;\n          if (!uniqueCompanies.has(company)) {\n              uniqueCompanies.add(company);\n              uniqueJobs.push(job);\n          } else {\n              additionalJobs.push(job);\n          }\n      }\n      return uniqueJobs.concat(additionalJobs).slice(0, limit);\n    }\n  <\/script><script>\n    window.addEventListener(\"load\", function() {\n        const container = document.querySelector(\"#vacancies-1\");\n        if (container) {\n          const searchClient = algoliasearch(\"W6KM1UDIB3\", \"d1d7f2c8696e7b36837d5ed337c4a319\");\n          searchClient.initIndex(\"jobs_prod\"); \n          const search = instantsearch({\n            indexName: \"jobs_prod\",\n            searchClient,\n          });\n          search.addWidget(\n            instantsearch.widgets.configure({\n              facetFilters: [[\"tags_area:AI safety & policy\"]],\n 
             hitsPerPage: 10,\n            })\n          );\n          search.addWidget({\n            render(options) {\n              const results = getUniqueCompanyJobs(options.results.hits, 5);\n              results.forEach(item => {\n                item.post_pk = DOMPurify.sanitize(item.post_pk);\n                item.company.logo_url = DOMPurify.sanitize(item.company.logo_url);\n                item.title = DOMPurify.sanitize(item.title);\n                item.company.name = DOMPurify.sanitize(item.company.name);\n                item.card_locations = DOMPurify.sanitize(getLocationString(item.card_locations));\n                item.posted_at_relative = DOMPurify.sanitize(item.posted_at_relative);\n              });\n              container.innerHTML = results.map(item => {\n                return `<li class=\"vacancy border\">\n                    <a href=\"https:\/\/jobs.80000hours.org\/?jobPk=${item.post_pk}\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"vacancy-summary pt-2 pb-2\">\n<div class=\"col-12\">\n<div class=\"row\" style=\"position: relative;\">\n<div class=\"col-sm-8\" style=\"overflow: hidden;\">\n<div class=\"vacancy__org-logo\">\n                              <img decoding=\"async\" src=\"${item.company.logo_url}\">\n                            <\/div>\n<div class=\"vacancy__job-title-and-org-name\">\n<h5 class=\"vacancy__job-title tw--line-clamp-2\">${item.title}<\/h5>\n<p class=\"vacancy__org-name tw--line-clamp-2\">${item.company.name}<\/p><\/div><\/div>\n<div class=\"col-sm-4 text-right hidden-xs vacancy__location-and-date-listed\">\n<p class=\"pr-1\">${item.card_locations}<br \/>${item.posted_at_relative}<\/p><\/div><\/div><\/div>\n                    <\/a>\n                  <\/li>`;\n              }).join(\"\");\n            }\n          });\n          search.start();\n        }\n      });\n    <\/script><\/p>\n<p>Our job board features opportunities in AI technical safety and governance:<\/p>\n<ul 
id=\"vacancies-1\" class=\"!tw--p-0 no-visited-styling\"><\/ul>\n<p><a href=\"https:\/\/jobs.80000hours.org\/?refinementList%5Btags_area%5D%5B0%5D=AI%20safety%20%26%20policy\" class=\"btn btn-primary\" target=\"_blank\" rel=\"noopener noreferrer\">View all opportunities<\/a><\/p>\n<p><span id=\"top-resources-to-learn-more\"><\/span><\/p>\n<h2><span id=\"top-resources-to-learn-more\" class=\"toc-anchor\"><\/span>Top resources to learn more<\/h2>\n<p>We&#39;ve hit you with a lot of further reading throughout this article \u2014 here are a few of our favourites:<\/p>\n<ul>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221012020606\/https:\/\/www.cold-takes.com\/ai-could-defeat-all-of-us-combined\/\">AI could defeat all of us combined<\/a> and <a href=\"https:\/\/web.archive.org\/web\/20221013022027\/https:\/\/www.cold-takes.com\/most-important-century\/\">the &quot;most important century&quot; blog post series<\/a> by Holden Karnofsky, co-CEO of Open Philanthropy, argue that the 21st century could be the most important century ever for humanity as a result of AI.<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013022057\/https:\/\/www.cold-takes.com\/why-ai-alignment-could-be-hard-with-modern-deep-learning\/\">Why AI alignment could be hard with modern deep learning<\/a> by Open Philanthropy researcher Ajeya Cotra is a gentle introduction to how risks from power-seeking AI could play out with current machine learning methods. 
<a href=\"https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to\">Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover<\/a>, also by Cotra, provides a much more detailed description of how risks could play out (which we&#39;d recommend for people familiar with ML).<\/li>\n<li><a href=\"https:\/\/www.alignmentforum.org\/s\/mzgtmmTKKn5MuCzFJ\">AGI safety from first principles<\/a> provides OpenAI governance researcher Richard Ngo&#39;s perspective on how to think about risks from artificial general intelligence.<\/li>\n<li><a href=\"https:\/\/doi.org\/10.48550\/arXiv.2206.13353\">Is power-seeking AI an existential risk?<\/a> by Open Philanthropy researcher Joseph Carlsmith is an in-depth look covering exactly how and why AI could cause the disempowerment of humanity (but watch out \u2014 it&#39;s even longer than this article!). It&#39;s also available as an <a href=\"https:\/\/open.spotify.com\/episode\/5PokyqXCw4hpV5u0rc5Lio\">audio narration<\/a>. 
For a shorter summary, see Carlsmith&#39;s <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/ChuABPEXmRumcJY57\/video-and-transcript-of-presentation-on-existential-risk\">talk on the same topic<\/a>.<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013022443\/https:\/\/www.alignmentforum.org\/posts\/qYzqDtoQaZ3eDDyxa\/distinguishing-ai-takeover-scenarios\">Distinguishing AI takeover scenarios<\/a> by Sam Clarke and Sammy Martin summarises various ways in which AI could go wrong.<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013022448\/https:\/\/forum.effectivealtruism.org\/posts\/42reWndoTEhFqu6T8\/ai-governance-opportunity-and-theory-of-impact\">AI governance: Opportunity and theory of impact<\/a> by DeepMind governance lead Allan Dafoe explores ways in which research into AI governance could effect change.<\/li>\n<li><a href=\"https:\/\/www.alignmentforum.org\/posts\/SQ9cZtfrzDJmw9A2m\/my-overview-of-the-ai-alignment-landscape-a-bird-s-eye-view\">A bird&#39;s-eye view of the AI alignment landscape<\/a> by Neel Nanda summarises the different ways in which technical alignment research could reduce the risk from AI.<\/li>\n<li><a href=\"https:\/\/doi.org\/10.48550\/arXiv.2012.07532\">An overview of 11 proposals for building safe advanced AI<\/a> by Evan Hubinger discusses and evaluates plausible techniques for AI alignment.<\/li>\n<li>Podcasts: The <a href=\"https:\/\/axrp.net\/\"><em>AI X-risk Research Podcast<\/em><\/a>, particularly <a href=\"https:\/\/axrp.net\/episode\/2021\/12\/02\/episode-12-ai-xrisk-paul-christiano.html\">episode 12 with Paul Christiano<\/a> and <a href=\"https:\/\/axrp.net\/episode\/2022\/03\/31\/episode-13-first-principles-agi-safety-richard-ngo.html\">episode 13 with Richard Ngo<\/a> \u2014 both of which serve as excellent introductions to AI risk.<\/li>\n<\/ul>\n<p>On <em>The 80,000 Hours Podcast<\/em>, we have a <a href=\"https:\/\/80000hours.org\/podcast\/on-artificial-intelligence\/\">number of in-depth 
interviews<\/a> with people actively working to positively shape the development of artificial intelligence:<\/p>\n<ul>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/paul-christiano-ai-alignment-solutions\/\">Paul Christiano on how OpenAI is developing real solutions to the &#39;AI alignment problem&#39;, and his vision of how humanity will progressively hand over decision-making to AI systems<\/a> <\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/allan-dafoe-politics-of-ai\/\">Allan Dafoe on trying to prepare the world for the possibility that AI will destabilise global politics<\/a><\/li>\n<li>Carl Shulman on <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/carl-shulman-economy-agi\/\">the economy and national security<\/a> and <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/carl-shulman-society-agi\/\">government and society<\/a> after AGI<\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/richard-ngo-large-language-models\/\">Richard Ngo on large language models, OpenAI, and striving to make the future go well<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/ajeya-cotra-accidentally-teaching-ai-to-deceive-us\/\">Ajeya Cotra on accidentally teaching AI models to deceive us<\/a><\/li>\n<li>Jan Leike on <a href=\"https:\/\/80000hours.org\/2018\/03\/jan-leike-ml-alignment\/\">how to become a machine learning alignment researcher<\/a> (from 2018) and <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/jan-leike-superalignment\/\">OpenAI&#39;s massive push to make superintelligence safe in 4 years or less<\/a> (from 2023)<\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/nathan-labenz-openai-red-team-safety\/\">Nathan Labenz on the final push for AGI, understanding OpenAI&#39;s leadership drama, and red-teaming frontier models<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/rohin-shah-deepmind-doomers-and-doubters\/\">Rohin Shah on DeepMind and 
trying to fairly hear out both AI doomers and doubters<\/a> <\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/tom-davidson-how-quickly-ai-could-transform-the-world\/\">Tom Davidson on how quickly AI could transform the world<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/2017\/07\/podcast-the-world-needs-ai-researchers-heres-how-to-become-one\/\">Dario Amodei on OpenAI and how AI will change the world for good and ill<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/2017\/06\/the-world-desperately-needs-ai-strategists-heres-how-to-become-one\">Miles Brundage on the world&#39;s desperate need for AI strategists and policy experts<\/a> <\/li>\n<li>Holden Karnofsky, cofounder of GiveWell and Open Philanthropy, has been on three of our podcasts, explaining:\n<ul>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/holden-karnofsky-how-ai-could-take-over-the-world\/\">How AIs might take over even if they&#39;re no smarter than humans, and his four-part playbook for AI risk<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/2018\/02\/holden-karnofsky-open-philanthropy\/\">How philanthropy can have maximum impact by taking big risks<\/a> (including a discussion of his work in positively shaping the development of AI)<\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/holden-karnofsky-most-important-century\/\">Why this might be the most important century<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/olsson-and-ziegler-ml-engineering-and-safety\/\">PhD or programming? Fast paths into aligning AI as a machine learning engineer, according to ML engineers Catherine Olsson &amp; Daniel Ziegler<\/a><\/li>\n<\/ul>\n<p>If you want to go into much more depth, the <a href=\"https:\/\/www.agisafetyfundamentals.com\/\">AGI safety fundamentals<\/a> course is a good starting point. 
There are two tracks to choose from: <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-alignment-curriculum\">technical alignment<\/a> or <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-governance-curriculum\">AI governance<\/a>. If you have a more technical background, you could try <a href=\"https:\/\/course.mlsafety.org\/about\"><em>Intro to ML Safety<\/em><\/a>, a course from the <a href=\"https:\/\/www.safe.ai\/\">Center for AI Safety<\/a>.<\/p>\n<p class=\"pt-1\">And finally, here are a few general sources (rather than specific articles) that you might want to explore:<\/p>\n<ul>\n<li>The <a href=\"https:\/\/www.alignmentforum.org\/\">AI Alignment Forum<\/a>, which is aimed at researchers working in technical AI safety.<\/li>\n<li><a href=\"https:\/\/aiimpacts.org\/\">AI Impacts<\/a>, a project that aims to improve society&#39;s understanding of the likely impacts of human-level artificial intelligence.<\/li>\n<li>The <a href=\"https:\/\/rohinshah.com\/alignment-newsletter\/\">Alignment Newsletter<\/a>, a weekly publication with recent content relevant to AI alignment with thousands of subscribers.<\/li>\n<li><a href=\"https:\/\/jack-clark.net\/\">Import AI<\/a>, a weekly newsletter about artificial intelligence by Jack Clark (cofounder of Anthropic), read by more than 10,000 experts.<\/li>\n<li>Jeff Ding&#39;s <a href=\"https:\/\/chinai.substack.com\/\">ChinAI Newsletter<\/a>, weekly translations of writings from Chinese thinkers on China&#39;s AI landscape.<\/li>\n<\/ul>\n<div class=\"tw--mt-6 tw--p-3 tw--pt-2 tw--bg-gray-lighter tw--rounded-md \">\n<h3 class=\"no-toc\">\t\t<a class=\"no-visited-styling tw--text-off-black hover:tw--text-off-black hover:tw--no-underline focus:tw--text-off-black\" href=\"https:\/\/80000hours.org\/problem-profiles\/\">\t\t\t<small>Read next:&nbsp;<\/small>\t\t\tExplore other pressing world problems\t\t<\/a>\t<\/h3>\n<div class=\"tw--grid xs:tw--grid-flow-col tw--gap-3\">\n<div class=\"xs:tw--order-last 
tw--pt-1\">\t\t\t<a href=\"https:\/\/80000hours.org\/problem-profiles\/\">\t\t\t\t<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/10\/sea-ocean-sky-night-cosmos-view-826635-pxhere.com_-720x448.jpg\" alt=\"Decorative post preview\" width=\"720\" height=\"448\">\t\t\t<\/a>\t\t<\/div>\n<div class=\"\">\n<div class=\"tw--pb-3\">\n<p>Want to learn more about global issues we think are especially pressing? See our list of issues that are large in scale, solvable, and neglected, according to our research.<\/p><\/div>\n<div class=\"\">\t\t\t\t<a href=\"https:\/\/80000hours.org\/problem-profiles\/\" class=\"btn btn-primary\">Continue &rarr;<\/a>\t\t\t<\/div><\/div><\/div><\/div>\n<h2><span id=\"acknowledgements\" class=\"toc-anchor\"><\/span>Acknowledgements<\/h2>\n<p><em>Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlm\u00fcller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing this article or their extremely thoughtful and helpful comments and conversations. 
(This isn&#8217;t to say that they would all agree with everything we&#8217;ve said here \u2014 in fact, we&#8217;ve had many spirited disagreements in the comments on this article!)<\/em><\/p>\n","protected":false},"author":423,"featured_media":87151,"parent":0,"menu_order":0,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":"[fn aidigest]The [AI Digest](https:\/\/theaidigest.org\/) compares state-of-the-art models and tracks ongoing progress in the technology.[\/fn]\r\n\r\n[fn sentience]I'm also concerned about the possibility that AI systems could deserve moral consideration for their own sake \u2014 for example, because they are sentient. I'm not going to discuss this possibility in this article; we instead cover artificial sentience in a separate article [here](https:\/\/80000hours.org\/problem-profiles\/artificial-sentience\/).[\/fn]\r\n\r\n[fn intelligence] What do we mean by 'intelligence' in this context? Something like \"the ability to predictably influence the future.\" This involves understanding the world well enough to make plans that can actually work, and the ability to carry out those plans. Humans having the ability to predictably influence the future means they have been able to shape the world around them to fit their goals and desires. We go into more detail on the importance of the ability to make and execute plans [later in this article](#aps-systems). [\/fn]\r\n\r\n[fn garfinkelepistemics]It's hard to know how to deal with this lack of research \u2014 we may be less concerned because this is evidence that researchers have chosen not to focus on this risk (and therefore, assuming they're more likely to focus on big risks, that the risk is smaller), or we may be more concerned because the risk seems [more neglected overall](#neglectedness).  
{.doNotRemove}\r\n\r\nBen Garfinkel \u2014 a researcher at the [Centre for the Governance of AI](https:\/\/www.governance.ai\/) \u2014 has pointed out that concern among the existential risk community about different risks is somewhat correlated with how hard these risks are to analyse. He continues:\r\n\r\n> It doesn't at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It's completely coherent to have something like this attitude: \"If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it's not that big a deal. But, in practice, I can't yet think very clearly about it. That means that, unlike in the case of climate change, I also can't rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if \u2014 to uncharitable observers \u2014 my efforts will probably look a bit misguided after the fact.\"\r\n\r\nFor more, read Garfinkel's post [here](https:\/\/web.archive.org\/web\/20221016004022\/https:\/\/forum.effectivealtruism.org\/posts\/M68oj7fwXoPFJisap\/we-should-expect-to-worry-more-about-speculative-risks).  \r\n[\/fn]\r\n\r\n[fn dmoai]DeepMind's [safety team](https:\/\/web.archive.org\/web\/20221016004051\/https:\/\/deepmindsafetyresearch.medium.com\/building-safe-artificial-intelligence-52f5f75058f1) and OpenAI's [alignment team](https:\/\/openai.com\/alignment\/) focus on technical AI safety research, some of which would mitigate the risks discussed in this article. We've spoken to researchers on both these teams who have told us that they believe that artificial intelligence poses the most significant existential risk to humanity this century, and that their research attempts to reduce this risk. 
In the same vein:\r\n\r\n* In 2011, Shane Legg, cofounder and chief scientist at DeepMind, [said that](https:\/\/web.archive.org\/web\/20221016004215\/https:\/\/www.lesswrong.com\/posts\/No5JpRCHzBrWA4jmS\/q-and-a-with-shane-legg-on-risks-from-ai) AI is his \"number 1 [existential] risk for this century, with an engineered biological pathogen coming a close second.\"\r\n* Sam Altman, cofounder and CEO at OpenAI, has at times expressed concerns, though he seems to be very optimistic about AI's impacts overall. For example, in his [2021 interview with Ezra Klein](https:\/\/web.archive.org\/web\/20221016004421\/https:\/\/www.nytimes.com\/2021\/06\/11\/podcasts\/transcript-ezra-klein-interviews-sam-altman.html), he was asked about the incentive systems around building AI. He said he thinks the current systems address lots of problems, but \"the one that remains that I am \u2014 for the entire field, not just us \u2014 most concerned about is actually closer to the super powerful systems like the ones that people talk about creating an existential risk to humanity.\"\r\n* We've interviewed some top researchers from these organisations on *The 80,000 Hours Podcast*, including [Dario Amodei, former vice president of research at OpenAI](https:\/\/80000hours.org\/podcast\/episodes\/the-world-needs-ai-researchers-heres-how-to-become-one\/) (he's now cofounder and CEO of Anthropic, another AI lab), [Jan Leike, former research scientist at DeepMind](https:\/\/80000hours.org\/podcast\/episodes\/jan-leike-ml-alignment\/) (he's now Alignment team lead at OpenAI), [Jack Clark, Amanda Askell, and Miles Brundage on the OpenAI policy team](https:\/\/80000hours.org\/podcast\/episodes\/openai-askell-brundage-clark-latest-in-ai-policy-and-strategy\/) (Clark is now cofounder at Anthropic, Askell is a member of technical staff at Anthropic, and Brundage is head of policy research at OpenAI). All have expressed concern about the consequences of AI for the future of humanity. 
[\/fn]\r\n\r\n[fn researchgroups]Academics at all these research groups are included on the [list of professors who say they are working on AI safety because they believe this work will reduce existential risk](https:\/\/futureoflife.org\/team\/ai-existential-safety-community\/). This list is maintained by the [Future of Life Institute](https:\/\/futureoflife.org). The list includes academics from these and other universities.[\/fn]\r\n\r\n[fn threesurveys] \r\n\r\nThe four surveys were:\r\n\r\n* [Grace et al. (2024)](https:\/\/arxiv.org\/abs\/2401.02843), conducted in 2023\r\n* [Grace et al. (2022)](https:\/\/aiimpacts.org\/2022-expert-survey-on-progress-in-ai\/), conducted in 2022\r\n* [Zhang et al. (2022)](https:\/\/doi.org\/10.48550\/arXiv.2206.04132), conducted in 2019\r\n* [Grace et al. (2018)](https:\/\/doi.org\/10.1613\/jair.1.11222), conducted in 2016\r\n\r\nAll four surveys contacted researchers who published at NeurIPS and ICML conferences. \r\n\r\nGrace et al. (2024) contacted researchers who published at NeurIPS, ICML, or four other top AI venues (ICLR, AAAI, JMLR and IJCAI). It was distributed to 18,459 researchers, receiving 2,778 responses (a 15% response rate).\r\n\r\nGrace et al. (2022) contacted 4,271 researchers who published at the 2021 conferences (all the researchers were randomly allocated to either the Stein-Perlman et al. survey or a second survey run by others) and received 738 responses (a 17% response rate).\r\n\r\nZhang et al. (2022) contacted all 2,652 researchers who published at the 2018 conferences and received 524 responses (a 20% response rate), although due to a technical error only 296 responses could be used.\r\n\r\nGrace et al. 
(2018) contacted all 1,634 researchers who published at the 2015 conferences and received 352 responses (a 21% response rate).[\/fn]\r\n\r\n[fn selection]\r\nKatja Grace, who conducted the 2016, 2022 and 2023 surveys, [notes on her blog](https:\/\/web.archive.org\/web\/20221016004704\/https:\/\/aiimpacts.org\/some-survey-results\/) that the framing of questions noticeably changes the answers given:\r\n\r\n> People consistently give later forecasts if you ask them for the probability in N years instead of the year that the probability is M. We saw this in the straightforward HLMI [high-level machine intelligence] question, and most of the tasks and occupations, and also in most of these things when we tested them on mturk people earlier. For HLMI for instance, if you ask when there will be a 50% chance of HLMI you get a median answer of 40 years, yet if you ask what the probability of HLMI is in 40 years, you get a median answer of 30%.\r\n\r\n[Our interview with Katja](https:\/\/80000hours.org\/podcast\/episodes\/katja-grace-forecasting-technology\/) goes into more detail on the possible limitations of the 2016 survey.[\/fn]\r\n\r\n[fn median]By \"the median researcher thought that the chances were *x*%,\" we mean \"over half of researchers thought that the chances were greater than or equal to *x*%.\"[\/fn]\r\n\r\n[fn hlmi]\r\nIn the surveys by Grace et al., researchers were asked about \"high-level machine intelligence\" (HLMI). This was defined as:\r\n\r\n> When unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. 
*Think feasibility, not adoption.*\r\n\r\nIn the survey by Zhang et al., researchers were asked about \"human-level machine intelligence\" (HLMI), defined as:\r\n\r\n>Human-level machine intelligence (HLMI) is reached when machines are collectively able to perform almost all tasks (>90% of all tasks) that are economically relevant\\* better than the median human paid to do that task in 2019. You should ignore tasks that are legally or culturally restricted to humans, such as serving on a jury. \\*We define these tasks as all the ones included in the Occupational Information Network (O\\*NET) dataset. O\\*NET is a widely used dataset of tasks required for current occupations.\r\n\r\nThey were then asked:\r\n\r\n> Assume for the purpose of this question that HLMI will at some point exist. How positive or negative do you expect the overall impact of this to be for humanity, in the long run?\r\nPlease answer by saying how probable you find the following kinds of impact, with probabilities adding to 100%:\r\n>\r\n> * Extremely good (e.g., rapid growth in human flourishing) (2)\r\n> * On balance good (1)\r\n> * More or less neutral (0)\r\n> * On balance bad (-1)\r\n> * Extremely bad (e.g., human extinction) (-2)\r\n\r\nFor each survey, an aggregated cumulative distribution function for the probability of HLMI by year was calculated from the mean or median estimates in the survey. These functions gave various aggregate chances of HLMI:\r\n\r\n\r\n* 50% by 2047 (Grace et al. (2024), mean estimates)\r\n* 50% by 2059 (Grace et al. (2022), mean estimates)\r\n* 65% by 2080 (Zhang et al. (2022), mean estimates)\r\n* 75% by 2080 (Zhang et al. 
(2022), median estimates)\r\n\r\n\r\nThis means that the answers we cite are similar to but not the same as answers to the question of \"Without assuming that HLMI will exist in the next century, how positive or negative do you expect the overall impact of HLMI to be for humanity in the next century?\" We look at more expert forecasts of AI timelines in the section on [when we can expect to develop transformative AI](#when-can-we-expect-to-develop-transformative-AI).\r\n[\/fn]\r\n\r\n[fn humanfailure]Specifically, Grace et al. (2022) asked participants:\r\n\r\n> What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species? \r\n\r\nThis is equivalent to the definition of [existential catastrophe](https:\/\/80000hours.org\/articles\/existential-risks\/) that we usually use, and is also similar to the definition of existential catastrophe given by Ord in [*The Precipice* (2020)](https:\/\/80000hours.org\/the-precipice\/):\r\n\r\n> An *existential catastrophe* is the destruction of humanity's long-term potential.\r\n\r\nOrd categorises existential risks as either risks of *extinction* or risks of *failed continuation* (Ord gives the example of a [stable totalitarian regime](https:\/\/80000hours.org\/problem-profiles\/risks-of-stable-totalitarianism\/)). We think that permanent and severe disempowerment of the human species would be a form of *failed continuation* under Ord's definition.\r\n\r\nStein-Perlman et al. 
next asked participants specifically about the [sorts of risks we're most concerned about](#power-seeking-ai):\r\n\r\n> What probability do you put on human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species?\r\n\r\nThe median answer to this question was 10%.\r\n\r\nStein-Perlman notes:\r\n\r\n> This question is more specific and thus necessarily less probable than the previous question, but it was given a higher probability at the median. This could be due to noise \u2014 different random subsets of respondents received the questions, so there is no logical requirement that their answers cohere \u2014 or due to the [representativeness heuristic](https:\/\/en.wikipedia.org\/wiki\/Representativeness_heuristic). \r\n[\/fn]\r\n\r\n[fn clarkesurvey]A [2020 survey](https:\/\/web.archive.org\/web\/20221016004901\/https:\/\/www.alignmentforum.org\/posts\/WiXePTj7KeEycbiwK\/survey-on-ai-existential-risk-scenarios) asked researchers working on reducing existential risks from AI what risks they were most concerned about. The surveyors asked about five sources of existential risk:\r\n\r\n* Risks from superintelligent AI (similar to the scenario we've described [here](\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#superintelligence))\r\n* Risks from influence-seeking behaviour\r\n* Risks from AI systems pursuing easy-to-measure goals (similar to the scenario we've described [here](\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#getting-what-you-measure))\r\n* AI-exacerbated [war](#artificial-intelligence-and-war)\r\n* [Other](#other-risks) intentional misuse of AI not related to war\r\n\r\nThe researchers surveyed were roughly equally concerned about all of these risks. 
The first three are covered by the section in this article on [risks from power-seeking AI](#power-seeking-ai), while the last two are covered by the section on [other risks](#other-risks). If these groupings make sense (which we think they do), this means that, at the time of the survey, researchers were roughly three times as concerned about the broad risk of power-seeking AI as they were about risks from either war or other misuse separately.\r\n[\/fn]\r\n\r\n\r\n[fn dalle2] DALL-E 1's model used a [12 billion parameter](https:\/\/web.archive.org\/web\/20221016004944\/https:\/\/venturebeat.com\/dev\/openai-debuts-dall-e-for-generating-images-from-text\/) version of GPT-3, while DALL-E mini uses only [0.4 billion](https:\/\/wandb.ai\/dalle-mini\/dalle-mini\/reports\/DALL-E-Mini-Explained--Vmlldzo4NjIxODA#how-does-dall\u00b7e-mini-compare-to-openai-dall\u00b7e?). Interestingly, despite better results, DALL-E 2 was smaller than DALL-E 1, using a [3.5 billion parameter model](https:\/\/towardsdatascience.com\/dall-e-2-explained-the-promise-and-limitations-of-a-revolutionary-ai-3faf691be220#:~:text=At%203.5B%20parameters%2C%20DALL,in%20caption%20matching%20and%20photorealism.).[\/fn]\r\n\r\n[fn shakespearean]GPT-3 will output a different poem for this prompt every time it's run. We generated five short poems and picked the best.\r\n[\/fn]\r\n\r\n[fn cherrypicked]It's important to note that, when you look at outputs from systems like GPT-3 that people have shared online, these are often cherry-picked as standout examples of the system's best work. But that doesn't mean they're not impressive: the fact remains that GPT-3 produces outputs like these frequently enough that people can practically take the time to do the cherry-picking. 
And the performance of large language models like GPT-3 has only improved since its release in 2020 \u2014 we were particularly impressed by the outputs of [LaMDA](https:\/\/blog.google\/technology\/ai\/lamda\/), one of Google Brain's large language models, released in May 2022.[\/fn]\r\n\r\n\r\n[fn generalpurposetech]\r\n\r\nEconomists call technologies that affect the entirety of an economy [*general purpose technologies*](https:\/\/docs.google.com\/document\/d\/1I13_0o3kUe1AVQNfevOF9sHpc4mCQkuFDxOXFj_4g-I\/). We're effectively claiming here that AI could be a general purpose technology (like e.g. steam power or electricity).  {.doNotRemove}\r\n\r\nIt's not always easy to tell what might become a general purpose technology. For example, it took [200 years](https:\/\/doi.org\/10.1086\/ahr\/84.1.159) for steam power to be used for anything other than pumping water out of mines. \r\n\r\nDespite this uncertainty, economists increasingly think that AI is a pretty promising candidate for a general purpose technology, because it will have such a wide variety of effects.\r\n\r\nIt seems likely that [lots of jobs could be automated](https:\/\/web.archive.org\/web\/20221016005338\/https:\/\/www.technologyreview.com\/2018\/01\/25\/146020\/every-study-we-could-find-on-what-automation-will-do-to-jobs-in-one-chart\/). AI's ability to speed up the rate of development of new technology [could have significant implications for our economy](https:\/\/web.archive.org\/web\/20221013011707\/https:\/\/www.cold-takes.com\/transformative-ai-timelines-part-1-of-4-what-kind-of-ai\/), but also poses risks by potentially allowing the development of [dangerous new technology](#dangerous-new-technology).\r\n\r\nAI's effects on the economy could exacerbate inequality. Owners of AI-driven industries could become much richer than the rest of society \u2014 see e.g. 
[Artificial Intelligence and Its Implications for Income Distribution and Unemployment](https:\/\/dx.doi.org\/10.3386\/w24174) by Korinek and Stiglitz (2017):\r\n\r\n> Inequality is one of the main challenges posed by the proliferation of artificial intelligence (AI) and other forms of worker-replacing technological progress. This paper provides a taxonomy of the associated economic issues: First, we discuss the general conditions under which new technologies such as AI may lead to a Pareto improvement. Secondly, we delineate the two main channels through which inequality is affected \u2013 the surplus arising to innovators and redistributions arising from factor price changes. Third, we provide several simple economic models to describe how policy can counter these effects, even in the case of a \"singularity\" where machines come to dominate human labor. Under plausible conditions, non-distortionary taxation can be levied to compensate those who otherwise might lose. Fourth, we describe the two main channels through which technological progress may lead to technological unemployment \u2013 via efficiency wage effects and as a transitional phenomenon. Lastly, we speculate on how technologies to create super-human levels of intelligence may affect inequality and on how to save humanity from the Malthusian destiny that may ensue. \r\n\r\nAI systems are already having discriminatory impacts on marginalised groups. For example, [Sweeney (2013)](https:\/\/dx.doi.org\/10.1145\/2460276.2460278) found that two search engines disproportionately serve ads for arrest records when people search for racially associated names. And [Ali et al. (2019)](https:\/\/dx.doi.org\/10.1145\/3359301), on Facebook advertising: \r\n\r\n> It has been hypothesized that this process can \"skew\" ad delivery in ways that the advertisers do not intend, making some users less likely than others to see particular ads based on their demographic characteristics. 
In this paper, we demonstrate that such skewed delivery occurs on Facebook, due to market and financial optimization effects as well as the platform's own predictions about the \"relevance\" of ads to different groups of users. We find that both the advertiser's budget and the content of the ad each significantly contribute to the skew of Facebook's ad delivery. Critically, we observe significant skew in delivery along gender and racial lines for \"real\" ads for employment and housing opportunities despite neutral targeting parameters. \r\n\r\nWe're already able to produce simple [autonomous weapons](https:\/\/en.wikipedia.org\/wiki\/Lethal_autonomous_weapon), and as these weapons become more complex they're going to [completely change what war looks like](https:\/\/web.archive.org\/web\/20221016005551\/https:\/\/www.vox.com\/2019\/6\/21\/18691459\/killer-robots-lethal-autonomous-weapons-ai-war). As we'll argue later, [AI could even impact how nuclear weapons are used](#artificial-intelligence-and-war).\r\n\r\nFinally, politically, many have raised concerns that [automated social media algorithms are driving political polarisation](https:\/\/web.archive.org\/web\/20221016005607\/https:\/\/www.vox.com\/recode\/21534345\/polarization-election-social-media-filter-bubble). 
And some experts have warned that an increased ability to generate realistic videos and photos, or automating campaigns to influence people's opinions [could have a significant impact on politics](https:\/\/doi.org\/10.17863\/CAM.22520) over the coming years.\r\n\r\nNotable economists who hold the view that AI is likely to be a general purpose technology include Manuel Trajtenberg and Erik Brynjolfsson.\r\n\r\nIn [Artificial Intelligence as the Next GPT: A Political-Economy Perspective](https:\/\/dx.doi.org\/10.3386\/w24245) (2019), Trajtenberg writes:\r\n\r\n> Given that AI is poised to emerge as a powerful technological force, I discuss ways to mitigate the almost unavoidable ensuing disruption, and enhance AI's vast benign potential. This is particularly important in present times, in view of political-economic considerations that were mostly absent in previous historical episodes associated with the arrival of new GPTs. \r\n\r\nIn [Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics](https:\/\/ideas.repec.org\/h\/nbr\/nberch\/14007.html) (2018), Brynjolfsson writes:\r\n\r\n> As important as specific applications of AI may be, we argue that the more important economic effects of AI, machine learning, and associated new technologies stem from the fact that they embody the characteristics of general purpose technologies (GPTs). \r\n\r\n[\/fn]\r\n\r\n[fn transformative]\r\n\r\nThere are a few different definitions used in this section for \"transformative AI,\" but we think the differences aren't very important when it comes to interpreting predictions of AI progress. 
The definitions are:\r\n\r\n* [Karnofsky (2021)](https:\/\/web.archive.org\/web\/20221013013107\/https:\/\/www.cold-takes.com\/where-ai-forecasting-stands-today\/) uses \"AI powerful enough to bring us into a new, qualitatively different future.\" (Or [as he put it in 2016](https:\/\/web.archive.org\/web\/20221016005924\/https:\/\/www.openphilanthropy.org\/research\/some-background-on-our-views-regarding-advanced-artificial-intelligence\/), \"roughly and conceptually, transformative AI is AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.\")\r\n* [Cotra (2020)](https:\/\/docs.google.com\/document\/d\/1IJ6Sr-gPeXdSJugFulwIpvavc0atjHGM82QjIfUSBGQ\/edit#heading=h.6t4rel10jbcj) uses a similar definition. In addition, Cotra writes: \"How large is an impact \"as profound as the Industrial Revolution\"? Roughly speaking, over the course of the Industrial Revolution, the rate of growth in gross world product (GWP) went from about ~0.1% per year before 1700 to ~1% per year after 1850, a tenfold acceleration. By analogy, I think of \"transformative AI\" as software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it).\"\r\n* [Davidson (2021)](https:\/\/web.archive.org\/web\/20221013013059\/https:\/\/www.openphilanthropy.org\/research\/report-on-semi-informative-priors\/) predicts timelines to \"artificial general intelligence (AGI)\" rather than transformative AI. 
He defines AGI as \"computer program(s) that can perform virtually any cognitive task as well as any human, for no more money than it would cost for a human to do it.\" Notably, this seems sufficient (but not necessary) to reach the sorts of rapid economic changes implied by the previous two definitions.\r\n[\/fn]\r\n\r\n[fn 2016timelines]These are similar to implied forecasts from the other surveys:\r\n\r\n* [2022 survey by Zhang et al.](https:\/\/doi.org\/10.48550\/arXiv.2206.04132): 20% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, 50% probability by 2060, and 85% by 2100\r\n* [2022 survey by Grace et al.](https:\/\/web.archive.org\/web\/20221016004611\/https:\/\/aiimpacts.org\/2022-expert-survey-on-progress-in-ai\/): approximately 50% by 2059\r\n* [2016 survey by Grace et al.](https:\/\/doi.org\/10.1613\/jair.1.11222): approximately 25% by 2036, 50% by 2060, and 70% by 2100[\/fn]\r\n\r\n[fn cotravolatile]Importantly, Cotra notes that:\r\n\r\n> I expect these numbers to be pretty volatile too, and (as I did when writing bio anchors) I find it pretty fraught and stressful to decide on how to weigh various perspectives and considerations. I wouldn't be surprised by significant movements\u2026 I'm unclear how decision-relevant bouncing around within the range I've been bouncing around is.\r\n\r\n[\/fn]\r\n\r\n[fn carlsmith]These properties come from Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353), Section 2.1: Three key properties. [\/fn]\r\n\r\n[fn necessary]That's not to say that it's *necessary* for AIs to be able to plan in order for them to be useful. Many things that AI could be useful for (like illustrating books or writing articles) don't seem to require planning or strategic awareness at all. 
But it does seem reasonable to say that an AI that could make and execute plans for a goal is more likely to have a significant impact on the world than one that cannot.[\/fn]\r\n\r\n[fn muzeroplanning]DeepMind, the developers of MuZero, [write](https:\/\/web.archive.org\/web\/20221013011105\/https:\/\/www.deepmind.com\/blog\/muzero-mastering-go-chess-shogi-and-atari-without-rules):\r\n> For many years, researchers have sought methods that can both learn a model that explains their environment, and can then use that model to plan the best course of action. Until now, most approaches have struggled to plan effectively in domains, such as Atari, where the rules or dynamics are typically unknown and complex.\r\n\r\n> MuZero, first introduced in a preliminary paper in 2019, solves this problem by learning a model that focuses only on the most important aspects of the environment for planning. By combining this model with AlphaZero's powerful lookahead tree search, MuZero set a new state of the art result on the Atari benchmark, while simultaneously matching the performance of AlphaZero in the classic planning challenges of Go, chess and shogi. In doing so, MuZero demonstrates a significant leap forward in the capabilities of reinforcement learning algorithms.\r\n[\/fn]\r\n\r\n[fn Jaderberg]For example, [Jaderberg et al.](https:\/\/web.archive.org\/web\/20221016010137\/https:\/\/www.deepmind.com\/blog\/capture-the-flag-the-emergence-of-complex-cooperative-agents) developed deep [reinforcement learning](https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning) agents to play games of Quake III Capture The Flag \u2014 and identified \"particular neurons that code directly for some of the most important game states, such as a neuron that activates when the agent's flag is taken\" \u2014 indicating they can identify states of the game that they value the most (and then plan and act to achieve those states). 
This sounds pretty similar to \"having goals\" to us.[\/fn]\r\n\r\n[fn otherreasons][Carlsmith](https:\/\/doi.org\/10.48550\/arXiv.2206.13353) section 3 gives two other reasons why we might expect these kinds of advanced, strategically aware planning systems to be built:\r\n\r\n* It may be *easier* to produce these kinds of systems. For example, the best way to automate many tasks may be to create systems that can learn new tasks (instead of separately automating each task). And perhaps the best way to create systems that can learn new tasks is to create a planning system that has a high level understanding of how the world in general works, and then fine-tuning this system on specific tasks.\r\n* We may find that planning is difficult to avoid as we create more sophisticated systems. For example, [some have argued](https:\/\/arbital.com\/p\/consequentialist\/) that being an excellent planner (and having the advanced capabilities to carry out any plans created) is the best way of achieving *any* task. If that's true, then as we optimise our systems we should expect them to (once we've optimised hard enough) become good at planning.\r\n[\/fn]\r\n\r\n[fn ballgrasp]Looking at the animation, it doesn't seem that plausible that the system really fooled any humans. 
We're not quite sure what's going on here (it's not discussed in the [original paper](https:\/\/doi.org\/10.48550\/arXiv.1706.03741)), but one possibility is that the animation is showing the deployed system's attempts to grasp the ball, rather than the data used to train the system.[\/fn]\r\n\r\n[fn incentives]For a fuller discussion of the incentives to deploy potentially misaligned AI, see section 5 of Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn laws]Lethal autonomous weapons [already exist](https:\/\/web.archive.org\/web\/20221016010507\/https:\/\/foreignpolicy.com\/2022\/05\/11\/killer-robots-lethal-autonomous-weapons-systems-ukraine-libya-regulation\/).\r\n\r\nFor more information, see:\r\n\r\n* [Risks from Autonomous Weapon Systems and Military AI](https:\/\/web.archive.org\/web\/20221016010520\/https:\/\/forum.effectivealtruism.org\/posts\/RKMNZn7r6cT2Yaorf\/risks-from-autonomous-weapon-systems-and-military-ai), an overview of attempts to reduce risks from lethal autonomous weapons.\r\n* [On AI Weapons](https:\/\/web.archive.org\/web\/20221016010522\/https:\/\/forum.effectivealtruism.org\/posts\/vdqBn65Qaw77MpqXz\/on-ai-weapons), a presentation of the argument that lethal autonomous weapons are, on balance, more good than bad.[\/fn]\r\n\r\n[fn adm]If humans leave the loop for some military decision-making, we could see unintentional military escalation. And even if humans do remain in the loop, we could see faster and more complex decision-making, increasing the chances of mistakes or high-risk decisions.\r\n\r\nFor more information, see:\r\n\r\n* [Machine learning, artificial intelligence, and the use of force by states](https:\/\/heinonline.org\/HOL\/LandingPage?handle=hein.journals\/jnatselp10&div=5&id=&page=), by Deeks et al. 
(2019).\r\n* [AI and International Stability: Risks and Confidence-Building Measures](https:\/\/web.archive.org\/web\/20221016010718\/https:\/\/www.cnas.org\/publications\/reports\/ai-and-international-stability-risks-and-confidence-building-measures), by Horowitz and Scharre (2021).\r\n\r\n[\/fn]\r\n\r\n[fn slbms]\r\n\r\n[\/fn]\r\n\r\n[fn progress]We already have some automated research assistance (for example [Elicit](https:\/\/elicit.org)). If AI systems replace some jobs, or speed up economic growth, we'll see more resources able to be dedicated to scientific advancement. And if we're successful at developing particularly capable AI systems, we could see [parts of the scientific process being automated completely](https:\/\/web.archive.org\/web\/20221013011707\/https:\/\/www.cold-takes.com\/transformative-ai-timelines-part-1-of-4-what-kind-of-ai\/). [\/fn]\r\n\r\n[fn aipathogens] \r\n[Urbina et al. (2022)](https:\/\/web.archive.org\/web\/20220719201542\/https:\/\/climate-science.press\/wp-content\/uploads\/2022\/03\/00s42256-022-00465-9.pdf) developed a computational proof that existing AI technologies for drug discovery could be misused to design biochemical weapons.\r\n\r\nAlso see:\r\n\r\n[O'Brien and Nelson (2020)](https:\/\/dx.doi.org\/10.1089\/hs.2019.0122):\r\n\r\n> Within the realm of synthetic biology, AI could potentially lower some of the barriers for a malicious actor to design dangerous pathogens with custom features.\r\n\r\n[Turchin and Denkenberger (2020)](https:\/\/dx.doi.org\/10.1007\/s00146-018-0845-5), section 3.2.3.[\/fn]\r\n\r\n\r\n[fn surveillance]AI is already facilitating the ability of governments to monitor their own citizens. 
{.doNotRemove}\r\n\r\nThe NSA is using AI [to help filter the huge amounts of data they collect](https:\/\/web.archive.org\/web\/20221016011132\/https:\/\/www.defenseone.com\/technology\/2020\/01\/spies-ai-future-artificial-intelligence-us-intelligence-community\/162673\/), significantly speeding up their ability to identify and predict the actions of people they are monitoring. China is increasingly using facial recognition and predictive policing, including [automated racial profiling](https:\/\/web.archive.org\/web\/20221016011138\/https:\/\/www.nytimes.com\/2019\/05\/22\/world\/asia\/china-surveillance-xinjiang.html) and automatic alarms when people classified as potential threats enter certain public places.\r\n\r\nThese sorts of surveillance technologies look like they are going to significantly improve \u2014 and in doing so, significantly increase the ability for governments to control their populations.[\/fn]\r\n\r\n[fn reviews]Reviewers were asked to [critique Carlsmith's report](https:\/\/web.archive.org\/web\/20221016011150\/https:\/\/forum.effectivealtruism.org\/posts\/GRv3KB2nPFRREXb5o\/reviews-of-is-power-seeking-ai-an-existential-risk) and give their own estimates of the existential risk from power-seeking AI. 
The estimates given of existential risk from power-seeking AI by 2070 were: Aschenbrenner: 0.5%, Garfinkel: 0.4%, Kokotajlo: 65%, Nanda: 9%, Soares: >77%, Tarsney: 3.5%, Thorstad: 0.000002%, Wallace: 2%.[\/fn]\r\n\r\n[fn bensinger]Around 117 researchers were asked:\r\n\r\n> How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of AI systems not doing\/optimizing what the people deploying them wanted\/intended?\r\n\r\nResearchers from OpenAI, the Future of Humanity Institute (University of Oxford), the Center for Human-Compatible AI (UC Berkeley), Machine Intelligence Research Institute, Open Philanthropy, and DeepMind were asked to fill in the survey.\r\n\r\n44 people responded (~38% response rate).\r\n\r\nThe mean of the estimates given was 40%.[\/fn]\r\n\r\n[fn objections]These objections are adapted from section 4.2 of Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn garfinkelmonetary]In cases where people are willing to use systems that they think have (e.g.) a 10% chance of immediately killing them, security concerns (like trying to preempt deployment of transformative AI by others) or perhaps moral\/idealistic concerns could play larger roles than desire for wealth. On the other hand, monetary incentives do seem to be a substantial current driver for research into AI capabilities. We might also expect monetary incentives to encourage motivated reasoning about the size of the risk from AI systems.\r\n[\/fn]\r\n\r\n[fn controllingobjectives]For a detailed overview of how easy or hard it might be to successfully control the objectives of ML systems, see section 4.3.1 of Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353). 
Or, for one possible story about how a deceptive ML system could end up being developed, see [Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover](https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to) by Cotra.[\/fn]\r\n\r\n[fn gdp] World GDP in 2020 was 84.75 trillion USD, [according to the World Bank](https:\/\/web.archive.org\/web\/20221016011333\/https:\/\/data.worldbank.org\/indicator\/NY.GDP.MKTP.CD). We've assumed growth of 2% per year \u2014 see [here](https:\/\/www.cold-takes.com\/this-cant-go-on\/#fn5) for an explanation of why, and [here](https:\/\/web.archive.org\/save\/https:\/\/www.cold-takes.com\/more-on-multiple-world-size-economies-per-atom\/) for further discussion of what such a huge GDP could actually mean.[\/fn]\r\n\r\n[fn neglectednessupdate]\r\nNote that before 19 December 2022, this page gave a lower estimate of 300 FTE working on reducing existential risks from AI, of which around two thirds were working on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.\r\n\r\nThis change represents a (hopefully!) improved estimate, rather than a notable change in the number of researchers. [\/fn]\r\n\r\n[fn neglectednessestimate]\r\nIt's difficult to estimate this number.\r\n\r\nIdeally we want to estimate the number of FTE (\"[full-time equivalent](https:\/\/en.wikipedia.org\/wiki\/Full-time_equivalent)\") working on the problem of reducing existential risks from AI.\r\n\r\nBut there are lots of ambiguities around what counts as working on the issue. 
So I tried to use the following guidelines in my estimates:\r\n\r\n* I didn't include people who might think of themselves as being on a career path that is building towards a role preventing an AI-related catastrophe, but who are currently skilling up rather than working directly on the problem.\r\n* I included researchers, engineers, and other staff that seem to work directly on technical AI safety research or AI strategy and governance. But there's an uncertain boundary between these people and others who I chose not to include. For example, I didn't include machine learning engineers whose role is building AI systems that might be used for safety research but aren't *primarily* designed for that purpose.\r\n* I only included time spent on work that seems related to reducing the potentially [existential risks](https:\/\/80000hours.org\/articles\/existential-risks\/) from AI, like those discussed in this article. Lots of wider AI safety and AI ethics work that focuses on reducing other risks from AI seems relevant to reducing existential risks \u2013 this 'indirect' work makes this estimate difficult. I decided not to include indirect work on reducing the risks of an AI-related catastrophe (see our [problem framework](https:\/\/80000hours.org\/articles\/problem-framework\/#a-challenge-direct-vs-indirect-future-effort) for more).\r\n* Relatedly, I didn't include people working on other problems that might indirectly affect the chances of an AI-related catastrophe, such as [epistemics and improving institutional decision-making](https:\/\/80000hours.org\/problem-profiles\/improving-institutional-decision-making\/), reducing the chances of [great power conflict](https:\/\/80000hours.org\/problem-profiles\/great-power-conflict\/), or [building effective altruism](https:\/\/80000hours.org\/problem-profiles\/promoting-effective-altruism\/). 
\r\n\r\nWith those decisions made, I estimated this in three different ways.\r\n\r\nFirst, for each organisation in the [AI Watch](https:\/\/aiwatch.issarice.com\/) database, I estimated the number of FTE working directly on reducing existential risks from AI. I did this by looking at the number of staff listed at each organisation, both in total and in 2022, as well as the number of researchers listed at each organisation. Overall I estimated that there were 76 to 536 FTE working on technical AI safety (90% confidence), with a mean of 196 FTE. I estimated that there were 51 to 359 FTE working on AI governance and strategy (90% confidence), with a mean of 151 FTE. There's a lot of subjective judgement in these estimates because of the ambiguities above. The estimates could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. \r\n\r\nSecond, I adapted the methodology used by [Gavin Leech's estimate of the number of people working on reducing existential risks from AI](https:\/\/forum.effectivealtruism.org\/posts\/8ErtxW7FRPGMtDqJy\/the-academic-contribution-to-ai-safety-seems-large). I split the organisations in Leech's estimate into technical safety and governance\/strategy. I adapted Gavin's figures for the proportion of computer science academic work relevant to the topic to fit my definitions above, and made a related estimate for work outside computer science but within academia that is relevant. Overall I estimated that there were 125 to 1,848 FTE working on technical AI safety (90% confidence), with a mean of 580 FTE. I estimated that there were 48 to 268 FTE working on AI governance and strategy (90% confidence), with a mean of 100 FTE.\r\n\r\nThird, I looked at the estimates of similar numbers by [Stephen McAleese](https:\/\/forum.effectivealtruism.org\/posts\/3gmkrj3khJHndYGNe\/estimating-the-current-and-future-number-of-ai-safety). 
I made minor changes to McAleese's categorisation of organisations, to ensure the numbers were consistent with the previous two estimates. Overall I estimated that there were 110 to 552 FTE working on technical AI safety (90% confidence), with a mean of 267 FTE. I estimated that there were 36 to 193 FTE working on AI governance and strategy (90% confidence), with a mean of 81 FTE.\r\n\r\nI took a geometric mean of the three estimates to form a final estimate, and combined confidence intervals by assuming that distributions were approximately lognormal.\r\n\r\nFinally, I estimated the number of FTE in [complementary roles](#complementary-yet-crucial-roles) using the AI Watch database. For relevant organisations, I identified those where there was enough data listed about the number of *researchers* at those organisations. I calculated the ratio between the number of researchers in 2022 and the number of staff in 2022, as recorded in the database. I calculated the mean of those ratios, and a confidence interval using the standard deviation. I used this ratio to calculate the overall number of support staff by assuming that estimates of the number of staff are lognormally distributed and that the estimate of this ratio is normally distributed. Overall I estimated that there were 2 to 2,357 FTE in complementary roles (90% confidence), with a mean of 770 FTE.\r\n\r\nThere are likely many errors in this methodology, but I expect these errors are small compared to the uncertainty in the underlying data I'm using. Ultimately, I'm still highly uncertain about the overall FTE working on preventing an AI-related catastrophe, but I'm confident enough that the number is relatively small to say that the problem as a whole is highly neglected.\r\n\r\nI'm very uncertain about this estimate. It involved a number of highly subjective judgement calls. 
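The combination step described above can be sketched numerically. This is only an illustrative reconstruction under stated assumptions (each 90% interval treated as defining a lognormal fitted from its bounds, the three estimates treated as independent, and the geometric mean taken in log space); it is not the spreadsheet's actual calculation, and the resulting numbers differ somewhat from the means quoted in the text:

```python
import math

# Hypothetical sketch of combining the three technical-AI-safety FTE
# estimates from the text via a geometric mean, treating each 90%
# confidence interval (lower, upper) as a lognormal distribution.
# Assumption: independence between the three estimates.
intervals = [(76, 536), (125, 1848), (110, 552)]
Z90 = 1.645  # z-score for a two-sided 90% interval

# Fit a lognormal to each interval: log-space midpoint and spread
mus = [(math.log(a) + math.log(b)) / 2 for a, b in intervals]
sigmas = [(math.log(b) - math.log(a)) / (2 * Z90) for a, b in intervals]

# The geometric mean of lognormal variables is again lognormal:
# average the log-means; combine log-sds in quadrature, scaled by 1/n
n = len(intervals)
mu = sum(mus) / n
sigma = math.sqrt(sum(s**2 for s in sigmas)) / n

central = math.exp(mu)
lo, hi = math.exp(mu - Z90 * sigma), math.exp(mu + Z90 * sigma)
print(f"combined: ~{central:.0f} FTE (90% CI {lo:.0f}-{hi:.0f})")
```

Note that combining in log space also means the combined interval is narrower than any single input interval, which is the intended effect of pooling three noisy estimates.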
You can see the (very rough) spreadsheet I worked off [here](https:\/\/docs.google.com\/spreadsheets\/d\/1e1Vh_nK_7VHKZUuQ9VNp3JWC2etjUAHVmVXbKarKMNw\/edit). If you have any feedback, I'd really appreciate it if you could tell me what you think using [this form](https:\/\/forms.gle\/RRZaFTfdDkSQ6fJG8).\r\n[\/fn]\r\n\r\n[fn capabilitiesspending]It's difficult to say exactly how much is being spent to advance AI capabilities. This is partly because of a lack of available data, and partly because of questions like:\r\n\r\n* What research in AI is actually advancing the sorts of dangerous capabilities that might be increasing potential existential risk?\r\n* Do advances in AI hardware or advances in data collection count?\r\n* How about broader improvements to research processes in general, or things that might increase investment in the future through producing economic growth?\r\n\r\nThe most relevant figure we could find was the expenses of DeepMind from 2020, which were around \u00a31 billion, [according to its annual report](https:\/\/web.archive.org\/web\/20221016011531\/https:\/\/find-and-update.company-information.service.gov.uk\/company\/07386350\/filing-history). We'd expect most of that to be contributing to \"advancing AI capabilities\" in some sense, since its main goal is building powerful, general AI systems. (Although it's important to note that DeepMind is also contributing to work in AI safety, which may be reducing existential risk.)\r\n\r\nIf DeepMind represents around 10% of the spending on advancing AI capabilities, this gives us a figure of around \u00a310 billion. 
(Given that there are many AI companies in the US, and a large effort to produce advanced AI in China, we think 10% could be a good overall guess.)\r\n\r\nAs an upper bound, the total revenues of the AI sector in 2021 were [around $340 billion](https:\/\/web.archive.org\/web\/20221016011608\/https:\/\/www.idc.com\/getdoc.jsp?containerId=prUS48127321).\r\n\r\nSo overall, we think the amount being spent to advance AI capabilities is between $1 billion and $340 billion per year. Even assuming a figure as low as $1 billion, this would still be around 100 times the amount spent on reducing risks from AI.[\/fn]\r\n\r\n\r\n[fn misalignmentdfns]There are various definitions of *alignment* used in the literature, which differ subtly. These include:\r\n\r\n* An AI is aligned if its decisions maximise the utility of some principal (e.g. an operator or user) ([Shapiro & Shachter, 2002](https:\/\/web.archive.org\/web\/20221016011851\/https:\/\/www.aaai.org\/Papers\/Symposia\/Spring\/2002\/SS-02-07\/SS02-07-002.pdf)).\r\n* An AI is aligned if it acts in the interests of humans ([Soares & Fallenstein, 2015](https:\/\/web.archive.org\/web\/20210413005225\/https:\/\/intelligence.org\/files\/obsolete\/TechnicalAgenda%5Bold%5D.pdf)).\r\n* An AI is \"intent aligned\" if it is trying to do what its operator wants it to do ([Christiano, 2018](https:\/\/ai-alignment.com\/clarifying-ai-alignment-cec47cd69dd6)).\r\n* An AI is \"impact aligned\" (with humans) if it doesn't take actions that we would judge to be bad\/problematic\/dangerous\/catastrophic, and \"intent aligned\" if the optimal policy for its behavioural objective is impact aligned with humans ([Hubinger, 2020](https:\/\/www.alignmentforum.org\/posts\/SzecSPYxqRa5GCaSF\/clarifying-inner-alignment-terminology)).\r\n* An AI is \"intent aligned\" if it is trying to do, or \"impact aligned\" if it is succeeding in doing what a human person or institution wants it to do ([Critch, 
2020](https:\/\/web.archive.org\/web\/20221016012022\/https:\/\/www.lesswrong.com\/posts\/hvGoYXi2kgnS3vxqb\/some-ai-research-areas-and-their-relevance-to-existential-1)).\r\n* An AI is \"fully aligned\" if it does not engage in unintended behaviour (specifically, unintended behaviour that arises in virtue of problems with the system's objectives) in response to any inputs compatible with basic physical conditions of our universe ([Carlsmith, 2022](https:\/\/doi.org\/10.48550\/arXiv.2206.13353)).\r\n\r\nThe term \"aligned\" is also often used to refer to the *goals* of a system, in the sense that an AI's goals are aligned if they will produce the same actions from the AI that would occur if the AI shared the goals of some other entity (e.g. its user or operator).\r\n\r\nWe use alignment here to refer to systems, rather than goals. Our definition is most similar to the definitions of \"intent\" alignment given by Christiano and Critch, and is similar to the definition of \"full\" alignment given by Carlsmith.\r\n[\/fn]\r\n\r\n[fn 1]We think it's likely to be very difficult to control the objectives of modern ML systems, for a number of reasons that we'll go through [later](#controlling-objectives). This has two implications:\r\n\r\n1. It's hard to ensure that systems are trying to do what we want them to do, which means it's hard to make systems aligned.\r\n\r\n2. It's hard to correct systems when we think that problems with their objectives could have particularly bad consequences.\r\n\r\nAs we'll argue, we think problems with AI systems' objectives could have particularly bad consequences.\r\n\r\nAjeya Cotra, a researcher at Open Philanthropy has written about why we might expect AI alignment to be hard with modern deep learning. 
We'd recommend [this post](https:\/\/web.archive.org\/web\/20221013022057\/https:\/\/www.cold-takes.com\/why-ai-alignment-could-be-hard-with-modern-deep-learning\/) for people new to ML, and [this](https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to) for those more familiar with ML.\r\n[\/fn]\r\n\r\n[fn 2]Gaining enforced power or influence over others generally seems bad, and we're going to take that as given for the rest of this argument. Indeed, we think some forms of taking power away from humanity could even constitute an existential catastrophe, which we discuss further [later](#instrumental-convergence). However, we should note that this doesn't seem *fundamentally* true of all cases where things gain power, because in some cases power can be used to produce good outcomes (e.g. often people attempting to do good in the world will try to win elections). With AI systems, as we'll argue, we're really not sure how to ensure those outcomes would be good.[\/fn]\r\n\r\n[fn dangerous]\r\nIn the two human examples given in this section (politicians and companies), the negative effects of misalignment are tempered somewhat. This is for two reasons: \r\n\r\n1. Neither companies nor politicians have absolute power.\r\n2. We are talking about humans, whose true incentives are actually more complex (for example, they might care about acting ethically and not just achieving their specified goal). 
\r\n\r\nAs a result, it's hard for a set of politicians to turn things completely upside down for votes, some politicians will put in place unpopular policies they think will make things better, and some companies will do things like donate a portion of their profits to charity.\r\n\r\n(Of course, it's arguable whether companies' charitable donations are truly hurting their profits, and whether they'd make them if they did \u2014 it's possible that they get enough good press from work like this that it actually makes them money. But there are definitely examples where this is much harder to argue. For example, some [meat and dairy farmers are selling their animals and concentrating on growing plants instead](https:\/\/web.archive.org\/save\/https:\/\/plantbasednews.org\/culture\/five-times-dairy-farmers-went-vegan\/) because of concerns about the moral value of animals.)\r\n\r\nMisaligned AI systems (especially those with advanced capabilities, doing things more than moving around a simulated robot arm) won't necessarily have these tempering human instincts, and could have *a lot* more power.\r\n[\/fn]\r\n\r\n[fn clarke]This distinction is taken from [Sam Clarke's overview of AI governance](https:\/\/web.archive.org\/web\/20221016012047\/https:\/\/forum.effectivealtruism.org\/posts\/ydpo7LcJWhrr2GJrx\/the-longtermist-ai-governance-landscape-a-basic-overview).[\/fn]\r\n\r\n[fn carlsmithmisalignment]These arguments are adapted from section 4.3 (\"The challenge of practical PS-alignment\") of Carlsmith's [report into existential risks from power-seeking AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn carlsmithproxies]See section 4.3.1.1 (\"Problems with proxies\") of Carlsmith's [report into existential risks from power-seeking AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn carlsmithsearch]See section 4.3.1.2 (\"Problems with search\") of Carlsmith's [report into existential risks from power-seeking 
AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn badfuture]That AI systems choose to disempower humanity (presumably in order to prevent us from interfering with their plans) is evidence that we would, if we hadn't been disempowered, have chosen to interfere with the systems' plans. As a result, this disempowerment is some evidence that we won't like the future that these systems would create.[\/fn]\r\n\r\n[fn lhc]For a suggestion of what this might look like, consider the fears that arose during the construction of the Large Hadron Collider.\r\n\r\nA group of researchers convened to explore whether the heavy-ion collisions could produce negatively charged strangelets and black holes \u2014 potentially posing a threat to the whole planet. They [concluded](https:\/\/web.archive.org\/web\/20080907004852\/http:\/\/doc.cern.ch\/yellowrep\/2003\/2003-001\/p1.pdf) there was \"no basis for any conceivable threat\" \u2014 but it's possible they might have found otherwise, and it's possible future experiments in physics could pose extreme risks.\r\n\r\nA related example is the risk considered by researchers at Los Alamos in 1942 that the first nuclear weapon test could [ignite the whole atmosphere](https:\/\/www.bbc.com\/future\/article\/20230907-the-fear-of-a-nuclear-fire-that-would-consume-earth) of the Earth in an unstoppable chain reaction.[\/fn]\r\n\r\n[fn randreport]A 2023 [report](https:\/\/www.rand.org\/pubs\/research_reports\/RRA2977-1.html) from the research organisation RAND noted: \"Previous biological attacks that failed because of a lack of information might succeed in a world in which AI tools have access to all of the information needed to bridge that information gap.\"\r\n\r\nBut in January 2024, RAND published a follow-up [study](https:\/\/www.rand.org\/news\/press\/2024\/01\/25.html), which found that the current generation of large language models does not meaningfully increase the risk of biological attacks. 
\r\n\r\nHowever, future systems *could* increase the danger without adequate safeguards. The researchers explained:\r\n\r\n>Because LLMs are increasingly capable and available, it's important to monitor their evolution to ensure they are safe and secure from potential misuse, according to the report. Accurate risk assessment models, such as the methodology developed for this research, can be used to help evaluate these technologies and inform the discussion of effective regulatory frameworks.[\/fn]\r\n\r\n[fn bioexperts]Experts in the field of biotechnology disagree about how plausible such scenarios are. For different views on this and other controversies in biosecurity, you can read [an article we wrote](\/articles\/anonymous-misconceptions-about-biosecurity\/) compiling a range of expert views on the topic.[\/fn]\r\n\r\n[fn rogueaiagents]For more discussion of this possibility, see: Hendrycks, Dan, Mantas Mazeika, and Thomas Woodside. [\"An overview of catastrophic AI risks.\"](https:\/\/arxiv.org\/abs\/2306.12001) arXiv preprint arXiv:2306.12001 (2023).[\/fn]\r\n\r\n[fn sandbrink] For more discussion of this, see: Sandbrink, Jonas B. [\"Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools.\"](https:\/\/arxiv.org\/abs\/2306.13952) arXiv preprint arXiv:2306.13952 (2023).[\/fn]\r\n\r\n[fn otherharms]This list is not exhaustive. And there are likely many other policy approaches that would be worthwhile and justified to pursue, but that would not be targeted at reducing the biggest risks. \r\n\r\nWe don't include them here, because this article is about preventing existential risks in particular. But we also support policies that would reduce other harms from AI, and we think that many of the policies in the list could reduce both existential risks and other harms.[\/fn]\r\n\r\n[fn bernardi] For more information, see: Bernardi, Jamie, et al. 
\"Societal Adaptation to Advanced AI.\" arXiv preprint arXiv:2405.10295 (2024).[\/fn]\r\n\r\n[fn safetycases] See, for instance: Bishop, P. G. & Bloomfield, R. E. (1998). [A Methodology for Safety Case Development](https:\/\/openaccess.city.ac.uk\/id\/eprint\/549\/). In: Redmill, F. & Anderson, T. (Eds.), Industrial Perspectives of Safety-critical Systems: Proceedings of the Sixth Safety-critical Systems Symposium, Birmingham 1998. . London, UK: Springer. ISBN 3540761896 [\/fn]\r\n\r\n[fn surveybio]\r\nThis is the same survey we saw [earlier](#experts-are-concerned) which asked about the [overall chances of extinction from AI](#experts-are-concerned) and [when transformative AI might be developed](#when-can-we-expect-to-develop-transformative-AI). \r\n\r\n[Grace et al. (2024)](https:\/\/arxiv.org\/abs\/2401.02843v1) asked 1,345 of the 2,778 respondents (researchers who published at NeurIPS, IMCL, or four other top AI venues) about potentially concerning AI scenarios. (Participants were randomly allocated questions on only one of several topics to keep the survey brief, with questions being allocated to more participants based on factors like the question's importance and how useful it would be to have a large sample size.)\r\n\r\nThey were asked about the following eleven scenarios:\r\n\r\n> * A powerful AI system has its goals not set right, causing a catastrophe (e.g. it develops and uses powerful weapons)\r\n> * AI lets dangerous groups make powerful tools (e.g. engineered viruses)\r\n> * AI makes it easy to spread false information, e.g. deepfakes\r\n> * AI systems manipulate large-scale public opinion trends\r\n> * AI systems with the wrong goals become very powerful and reduce the role of humans in making decisions\r\n> * AI systems worsen economic inequality by disproportionately benefiting certain institutions\r\n> * Authoritarian rulers use AI to control their population\r\n> * Bias in AI systems makes unjust situations worse, e.g. 
AI systems learn to discriminate by gender or race in hiring processes\r\n> * Near-full automation of labor leaves most people economically powerless\r\n> * Near-full automation of labor makes people struggle to find meaning in their lives.\r\n> * People interact with other humans less because they are spending more time interacting with AI systems\r\n\r\nFor each scenario, the participants were asked whether it constituted \"no concern,\" \"a little concern,\" \"substantial concern,\" or \"extreme concern\".\r\n\r\nGrace et al. found:\r\n\r\n> Each scenario was considered worthy of either substantial or extreme concern by more than 30% of respondents. As measured by the percentage of respondents who thought a scenario constituted either a \"substantial\" or \"extreme\" concern, the scenarios worthy of most concern were: spread of false information e.g. deepfakes (86%), manipulation of large-scale public opinion trends (79%), AI letting dangerous groups make powerful tools (e.g. engineered viruses) (73%), authoritarian rulers using AI to control their populations (73%), and AI systems worsening economic inequality by disproportionately benefiting certain individuals (71%).\r\n> \r\n> There is some ambiguity about the reason why a scenario might be considered concerning: it might be considered especially disastrous, or especially likely, or both. 
From our results, there's no way to disambiguate these considerations.\r\n\r\nNo equivalent questions were asked on earlier surveys.\r\n[\/fn]\r\n"},"categories":[1182,1181,1320,368,1183,1321,1240,1241],"class_list":["post-77853","problem_profile","type-problem_profile","status-publish","format-standard","has-post-thumbnail","hentry","category-technical-ai-safety-research","category-artificial-intelligence","category-computer-science","category-existential-risk","category-long-term-ai-policy","category-machine-learning","category-promising-interventions","category-top-recommended-organisations"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Preventing an AI-related catastrophe - 80,000 Hours<\/title>\n<meta name=\"description\" content=\"The future of AI is difficult to predict. But while AI systems could have substantial positive effects, there&#039;s a growing consensus about the dangers of AI.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Preventing an AI-related catastrophe - Problem profile\" \/>\n<meta property=\"og:description\" content=\"Why do we think that reducing risks from AI is one of the most pressing issues of our time? 
There are technical safety issues that we believe could, in the worst case, lead to an existential threat to humanity.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/\" \/>\n<meta property=\"og:site_name\" content=\"80,000 Hours\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/80000Hours\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-28T12:39:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2100\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Preventing an AI-related catastrophe - Problem profile\" \/>\n<meta name=\"twitter:description\" content=\"Why do we think that reducing risks from AI is one of the most pressing issues of our time?\" \/>\n<meta name=\"twitter:site\" content=\"@80000hours\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"97 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/\",\"url\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/\",\"name\":\"Preventing an AI-related catastrophe - 80,000 Hours\",\"isPartOf\":{\"@id\":\"https:\/\/80000hours.org\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg\",\"datePublished\":\"2022-08-25T19:43:58+00:00\",\"dateModified\":\"2024-11-28T12:39:07+00:00\",\"description\":\"The future of AI is difficult to predict. But while AI systems could have substantial positive effects, there's a growing consensus about the dangers of 
AI.\",\"breadcrumb\":{\"@id\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#primaryimage\",\"url\":\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg\",\"contentUrl\":\"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg\",\"width\":2100,\"height\":1200},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/80000hours.org\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Preventing an AI-related catastrophe\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/80000hours.org\/#website\",\"url\":\"https:\/\/80000hours.org\/\",\"name\":\"80,000 Hours\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/80000hours.org\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/80000hours.org\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/80000hours.org\/#organization\",\"name\":\"80,000 Hours\",\"url\":\"https:\/\/80000hours.org\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/80000hours.org\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/80000hours.org\/wp-content\/uploads\/2018\/07\/og-logo_0.png\",\"contentUrl\":\"https:\/\/80000hours.org\/wp-content\/uploads\/2018\/07\/og-logo_0.png\",\"width\":1500,\"height\":785,\"caption\":\"80,000 
Hours\"},\"image\":{\"@id\":\"https:\/\/80000hours.org\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/80000Hours\",\"https:\/\/x.com\/80000hours\",\"https:\/\/www.youtube.com\/user\/eightythousandhours\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Preventing an AI-related catastrophe - 80,000 Hours","description":"The future of AI is difficult to predict. But while AI systems could have substantial positive effects, there's a growing consensus about the dangers of AI.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/","og_locale":"en_US","og_type":"article","og_title":"Preventing an AI-related catastrophe - Problem profile","og_description":"Why do we think that reducing risks from AI is one of the most pressing issues of our time? There are technical safety issues that we believe could, in the worst case, lead to an existential threat to humanity.","og_url":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/","og_site_name":"80,000 Hours","article_publisher":"https:\/\/www.facebook.com\/80000Hours","article_modified_time":"2024-11-28T12:39:07+00:00","og_image":[{"width":2100,"height":1200,"url":"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_title":"Preventing an AI-related catastrophe - Problem profile","twitter_description":"Why do we think that reducing risks from AI is one of the most pressing issues of our time?","twitter_site":"@80000hours","twitter_misc":{"Est. 
reading time":"97 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/","url":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/","name":"Preventing an AI-related catastrophe - 80,000 Hours","isPartOf":{"@id":"https:\/\/80000hours.org\/#website"},"primaryImageOfPage":{"@id":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#primaryimage"},"image":{"@id":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#primaryimage"},"thumbnailUrl":"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg","datePublished":"2022-08-25T19:43:58+00:00","dateModified":"2024-11-28T12:39:07+00:00","description":"The future of AI is difficult to predict. But while AI systems could have substantial positive effects, there's a growing consensus about the dangers of AI.","breadcrumb":{"@id":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#primaryimage","url":"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg","contentUrl":"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg","width":2100,"height":1200},{"@type":"BreadcrumbList","@id":"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/80000hours.org\/"},{"@type":"ListItem","position":2,"name":"Preventing an AI-related catastrophe"}]},{"@type":"WebSite","@id":"https:\/\/80000hours.org\/#website","url":"https:\/\/80000hours.org\/","name":"80,000 
Hours","description":"","publisher":{"@id":"https:\/\/80000hours.org\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/80000hours.org\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/80000hours.org\/#organization","name":"80,000 Hours","url":"https:\/\/80000hours.org\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/80000hours.org\/#\/schema\/logo\/image\/","url":"https:\/\/80000hours.org\/wp-content\/uploads\/2018\/07\/og-logo_0.png","contentUrl":"https:\/\/80000hours.org\/wp-content\/uploads\/2018\/07\/og-logo_0.png","width":1500,"height":785,"caption":"80,000 Hours"},"image":{"@id":"https:\/\/80000hours.org\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/80000Hours","https:\/\/x.com\/80000hours","https:\/\/www.youtube.com\/user\/eightythousandhours"]}]}},"_links":{"self":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile\/77853"}],"collection":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile"}],"about":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/types\/problem_profile"}],"author":[{"embeddable":true,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/users\/423"}],"version-history":[{"count":2,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile\/77853\/revisions"}],"predecessor-version":[{"id":88206,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile\/77853\/revisions\/88206"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/media\/87151"}],"wp:attachment":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/media?parent=77853"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/categories?post=77853"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}