
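- The report found that on LMSYS Chatbot Arena, a platform for comparative evaluation of AI models, the performance gap between the top US and top Chinese AI models was 1.7% in February 2025, down sharply from 9.3% in January 2024, a little over a year earlier.
- The figure compares the score (1,385 points) that Google's model, rated the top US AI model in February 2025, received on an overall evaluation covering language, reasoning, math, and coding ability with the score (1,362 points) received by DeepSeek's model, rated the top Chinese AI model.
□ In the number of notable AI models released in 2024, China had 15 against 40 for the United States, a gap of 25 models, narrower than in 2022 (United States 70, China 20).
- France released 3 models, while South Korea, Canada, Israel, and Saudi Arabia each released 1.
□ Private-sector investment in AI in 2024 was $109.98 billion (KRW 161.8 trillion) in the United States, more than 10 times China's $9.29 billion.
- US investment rose 63% from the previous year ($67.2 billion) and China's rose 28% (from $7.26 billion), so the gap between the two countries widened from about 9 times the year before.
- South Korea's investment was $1.33 billion, down slightly from the previous year ($1.39 billion), and its rank by investment size among the countries surveyed fell from 9th to 11th.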
[Source] "Rival countries increase AI investment while Korea's declines…investment ranking falls from 9th to 11th (comprehensive)" (2025.04.08.) / Yonhap News
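As a quick sanity check on the arithmetic in the summary above, the following sketch recomputes the quoted gaps, ratios, and growth rates from the stated figures. The variable names and the choice of denominator for the score gap (the Chinese model's score) are assumptions made for illustration; the source does not say how the 1.7% figure was derived.

```python
# Figures as quoted in the summary (investment in USD billions).
us_score, cn_score = 1385, 1362          # Chatbot Arena scores, February 2025
us_models, cn_models = 40, 15            # notable AI models released in 2024
us_inv, us_inv_prev = 109.98, 67.2       # US private AI investment, 2024 vs. prior year
cn_inv, cn_inv_prev = 9.29, 7.26         # China private AI investment, 2024 vs. prior year


def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Relative score gap; dividing by the Chinese score is an assumption.
score_gap_pct = (us_score - cn_score) / cn_score * 100

model_gap = us_models - cn_models        # difference in model counts

us_growth = pct_change(us_inv, us_inv_prev)
cn_growth = pct_change(cn_inv, cn_inv_prev)

ratio_now = us_inv / cn_inv              # US-to-China investment ratio, 2024
ratio_prev = us_inv_prev / cn_inv_prev   # same ratio one year earlier

print(f"score gap: {score_gap_pct:.1f}%")
print(f"model gap: {model_gap}")
print(f"US growth: {us_growth:.1f}%, China growth: {cn_growth:.1f}%")
print(f"investment ratio: {ratio_prev:.1f}x -> {ratio_now:.1f}x")
```

Under these assumptions the recomputed values agree with the summary: the score gap rounds to 1.7% (using the US score as the denominator gives the same rounded value), China's growth rounds to 28%, and the investment ratio rises from roughly 9 times to more than 10 times. The US growth rate comes out slightly above 63%, which matches the summary if that figure was truncated rather than rounded.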
Table of Contents
Title page 1
Contents 12
Report Highlights 13
CHAPTER 1: Research and Development 25
Overview 27
Chapter Highlights 28
1.1. Publications 30
Overview 30
Total Number of AI Publications 30
By Venue 32
By National Affiliation 33
By Sector 37
By Topic 39
Top 100 Publications 40
By National Affiliation 40
By Sector 41
By Organization 42
1.2. Patents 43
Overview 43
By National Affiliation 44
1.3. Notable AI Models 47
By National Affiliation 47
By Sector 48
By Organization 50
Model Release 51
Parameter Trends 53
Compute Trends 57
Highlight: Will Models Run Out of Data? 60
Inference Cost 65
Training Cost 66
1.4. Hardware 69
Overview 69
Highlight: Energy Efficiency and Environmental Impact 72
1.5. AI Conferences 76
Conference Attendance 76
1.6. Open-Source AI Software 78
Projects 78
Stars 80
CHAPTER 2: Technical Performance 83
Overview 86
Chapter Highlights 87
2.1. Overview of AI in 2024 89
Timeline: Significant Model and Dataset Releases 89
State of AI Performance 95
Overall Review 95
Closed vs. Open-Weight Models 96
US vs. China Technical Performance 98
Improved Performance From Smaller Models 100
Model Performance Converges at the Frontier 101
Benchmarking AI 102
2.2. Language 105
Understanding 106
MMLU: Massive Multitask Language Understanding 106
Generation 107
Chatbot Arena Leaderboard 107
Arena-Hard-Auto 109
WildBench 110
Highlight: o1, o3, and Inference-Time Compute 112
MixEval 114
RAG: Retrieval-Augmented Generation (RAG) 115
Berkeley Function Calling Leaderboard 115
MTEB: Massive Text Embedding Benchmark 117
Highlight: Evaluating Retrieval Across Long Contexts 119
2.3. Image and Video 121
Understanding 121
VCR: Visual Commonsense Reasoning 121
MVBench 122
Generation 124
Chatbot Arena: Vision 125
Highlight: The Rise of Video Generation 126
2.4. Speech 128
Speech Recognition 128
LRS2: Lip Reading Sentences 2 128
2.5. Coding 130
HumanEval 130
SWE-bench 131
BigCodeBench 132
Chatbot Arena: Coding 133
2.6. Mathematics 134
GSM8K 134
MATH 135
Chatbot Arena: Math 136
FrontierMath 136
Highlight: Learning and Theorem Proving 138
2.7. Reasoning 139
General Reasoning 139
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI 139
GPQA: A Graduate-Level Google-Proof Q&A Benchmark 140
ARC-AGI 141
Humanity's Last Exam 143
Planning 145
PlanBench 145
2.8. AI Agents 146
VisualAgentBench 146
RE-Bench 147
GAIA 149
2.9. Robotics and Autonomous Motion 150
Robotics 150
RLBench 150
Highlight: Humanoid Robotics 152
Highlight: DeepMind's Developments 153
Highlight: Foundation Models for Robotics 156
Self-Driving Cars 157
Deployment 157
Technical Innovations and New Benchmarks 158
Safety Standards 159
CHAPTER 3: Responsible AI 162
Overview 164
Chapter Highlights 165
3.1. Background 167
Definitions 167
3.2. Assessing Responsible AI 168
AI Incidents 168
Examples 169
Limited Adoption of RAI Benchmarks 171
Factuality and Truthfulness 172
Hughes Hallucination Evaluation Model (HHEM) Leaderboard 172
Highlight: FACTS, SimpleQA, and the Launch of Harder Factuality Benchmarks 173
3.3. RAI in Organizations and Businesses 175
Highlight: Longitudinal Perspective 182
3.4. RAI in Academia 186
Aggregate Trends 186
Topic Area 189
3.5. RAI Policymaking 193
3.6. Privacy and Data Governance 194
Featured Research 194
Large-Scale Audit of Dataset Licensing and Attribution in AI 194
Data Consent in Crisis 195
3.7. Fairness and Bias 197
Featured Research 197
Racial Classification in Multimodal Models 197
Measuring Implicit Bias in Explicitly Unbiased LLMs 199
3.8. Transparency and Explainability 201
Featured Research 201
Foundation Model Transparency Index v1.1 201
3.9. Security and Safety 203
Benchmarks 203
HELM Safety 203
AIR-Bench 204
Featured Research 206
Beyond Shallow Safety Alignment 206
Improving the Robustness to Persistently Harmful Behaviors in LLMs 207
3.10. Special Topics on RAI 209
AI Agents 209
Identifying the Risks of LM Agents With LM-Simulated Sandboxes 209
Jailbreaking Multimodal Agents With a Single Image 209
Election Misinformation 211
AI Misinformation in the US Elections 211
Rest of World 2024 AI-Generated Election Content 212
CHAPTER 4: Economy 216
Overview 218
Chapter Highlights 219
4.1. What's New in 2024: A Timeline 221
4.2. Jobs 225
AI Labor Demand 225
Global AI Labor Demand 225
US AI Labor Demand by Skill Cluster and Specialized Skill 227
US AI Labor Demand by Sector 230
US AI Labor Demand by State 231
AI Hiring 234
AI Skill Penetration 236
AI Talent 238
Highlight: Measuring AI's Current Economic Integration 244
4.3. Investment 248
Corporate Investment 248
Startup Activity 249
Global Trends 249
Regional Comparison by Funding Amount 253
Regional Comparison by Newly Funded AI Companies 257
Focus Area Analysis 260
4.4. Corporate Activity 262
Industry Usage 262
Use of AI Capabilities 262
Deployment of AI Capabilities 266
AI's Labor Impact 269
4.5. Robot Deployments 274
Aggregate Trends 274
Industrial Robots: Traditional vs. Collaborative Robots 276
By Geographic Area 277
Country-Level Data on Service Robotics 281
CHAPTER 5: Science and Medicine 282
Overview 284
Chapter Highlights 285
5.1. Notable Medical and Biological AI Milestones 287
Protein Sequence Optimization 287
Aviary 288
AlphaProteo 289
Human Brain Mapping 289
Virtual AI Lab 290
GluFormer 291
Evolutionary Scale Modeling v3 (ESM3) 291
AlphaFold 3 292
5.2. The Central Dogma 293
Protein Sequence Analysis 293
AI-Driven Protein Sequence Models 293
Public Databases for Protein Science 295
Research and Publication Trends 296
AI-Driven Protein Science Publications 296
Image and Multimodal AI for Scientific Discovery 297
5.3. Clinical Care, Imaging 298
Data: Sources, Types, and Needs 298
Advanced Modeling Approaches 300
5.4. Clinical Care, Non-Imaging 302
Clinical Knowledge 302
MedQA 302
Highlight: AI Doctors and Cost-Efficiency Considerations 303
Evaluation of LLMs for Healthcare Performance 304
Overview 304
Diagnostic Reasoning With LLMs 306
Highlight: LLMs Influence Diagnostic Reasoning 306
Management Reasoning and Patient Care Decisions 306
Highlight: GPT-4 Assistance on Patient Care Tasks 307
Ambient AI Scribes 308
Deployment, Implementation, Deimplementation 310
FDA Authorization of AI-Enabled Medical Devices 310
Successful Use Cases: Stanford Health Care 310
Screening for Peripheral Arterial Disease 311
Social Determinants of Health 312
Extracting SDoH From EHR and Clinical Notes 312
AI Adoption Across Medical Fields and the Integration of SDoH 313
Synthetic Data 313
Clinical Risk Prediction 313
Drug Discovery 314
Data Generation Platforms 314
Electronic Health Record System 315
Clinical Decision Support 317
5.5. Ethical Considerations 319
Meta Review 319
5.6. AI Foundation Models in Science 322
Highlight: Notable Model Releases 322
CHAPTER 6: Policy and Governance 325
Overview 327
Chapter Highlights 328
6.1. Major Global AI Policy News in 2024 329
6.2. AI and Policymaking 338
Global Legislative Records on AI 338
Overview 338
By Geographic Area 339
Highlight: A Closer Look at Global AI Legislation 340
US Legislative Records 341
Federal Level 341
State Level 342
Highlight: A Closer Look at State-Level AI Legislation 344
Highlight: Anti-deepfake Policymaking 345
Global AI Mentions 347
Overview 347
US Committee Mentions 350
US Regulations 351
Overview 351
By Agency 351
Highlight: A Closer Look at US Federal Regulations 353
6.3. Public Investment in AI 354
Total AI Public Investments 355
Spending Across Agencies and Sectors 362
Highlight: AI Grant Spending in the US 364
CHAPTER 7: Education 366
Overview 368
Chapter Highlights 369
7.1. Background 370
7.2. K-12 CS and AI Education 371
United States 371
Foundational Computer Science 371
Advanced Computer Science 375
Education Standards and Guidance 378
Teacher Perspectives 379
Global 381
Access 381
Guidance 382
7.3. Postsecondary CS and AI Education 384
Degree Graduates 384
United States 384
Global 390
Guidance 394
7.4. Looking Ahead 395
CHAPTER 8: Public Opinion 396
Overview 398
Chapter Highlights 399
8.1. Public Opinion 401
Global Public Opinion 401
AI Products and Services 401
AI and Jobs 407
AI and Livelihood 409
Highlight: Self-Driving Cars 411
8.2. US Policymaker Opinion 412
APPENDIX 416
Chapter 1: Research and Development 418
Chapter 2: Technical Performance 422
Chapter 3: Responsible AI 429
Chapter 4: Economy 433
Chapter 5: Science and Medicine 443
Chapter 6: Policy and Governance 453
Chapter 7: Education 456
Chapter 8: Public Opinion 457
Figures 30
Figure 1.1.1. Number of AI publications in CS worldwide, 2013-23 30
Figure 1.1.2. AI publications in CS (% of total) worldwide, 2013-23 31
Figure 1.1.3. Number of AI publications in CS by venue type, 2013-23 32
Figure 1.1.4. AI publications in CS (% of total) by region, 2013-23 33
Figure 1.1.5. AI publication citations in CS (% of total) by region, 2013-23 34
Figure 1.1.6. AI publications in CS (% of total) by select geographic areas, 2013-23 35
Figure 1.1.7. AI publication citations in CS (% of total) by select geographic areas, 2013-23 36
Figure 1.1.8. AI publications in CS (% of total) by sector, 2013-23 37
Figure 1.1.9. AI publications in CS (% of total) by sector and select geographic areas, 2023 38
Figure 1.1.10. Number of AI publications by select top topics, 2013-23 39
Figure 1.1.11. Number of highly cited publications in top 100 by select geographic areas, 2021-23 40
Figure 1.1.12. Number of highly cited publications in top 100 by sector, 2021-23 41
Figure 1.1.13. Number of highly cited publications in top 100 by organization, 2021-23 42
Figure 1.2.1. Number of AI patents granted worldwide, 2010-23 43
Figure 1.2.2. Granted AI patents (% of world total) by region, 2010-23 44
Figure 1.2.3. Granted AI patents (% of world total) by select geographic areas, 2010-23 45
Figure 1.2.4. Granted AI patents per 100,000 inhabitants by country, 2023 46
Figure 1.2.5. Percentage change of granted AI patents per 100,000 inhabitants by country, 2013 vs. 2023 46
Figure 1.3.1. Number of notable AI models by select geographic areas, 2024 47
Figure 1.3.2. Number of notable AI models by select geographic areas, 2003-24 47
Figure 1.3.3. Number of notable AI models by geographic area, 2003-24 (sum) 48
Figure 1.3.4. Number of notable AI models by sector, 2003-24 49
Figure 1.3.5. Notable AI models (% of total) by sector, 2003-24 49
Figure 1.3.6. Number of notable AI models by organization, 2024 50
Figure 1.3.7. Number of notable AI models by organization, 2014-24 (sum) 50
Figure 1.3.8. Number of notable AI models by access type, 2014-24 51
Figure 1.3.9. Notable AI models (% of total) by access type, 2014-24 52
Figure 1.3.10. Number of notable AI models by training code access type, 2014-24 52
Figure 1.3.11. Number of parameters of notable AI models by sector, 2003-24 53
Figure 1.3.12. Number of parameters of select notable AI models by sector, 2012-24 54
Figure 1.3.13. Training dataset size of notable AI models, 2010-24 55
Figure 1.3.14. Training length of notable AI models, 2010-24 56
Figure 1.3.15. Training compute of notable AI models by sector, 2003-24 57
Figure 1.3.16. Training compute of notable AI models by domain, 2012-24 58
Figure 1.3.17. Training compute of select notable AI models in the United States and China, 2018-24 59
Figure 1.3.18. Estimated median data stocks 60
Figure 1.3.19. Projections of the stock of public text and data usage 61
Figure 1.3.20. Effect of data accumulation on language models pretrained on TinyStories 62
Figure 1.3.21. Factual accuracy: percentage of correct answers in biographies 64
Figure 1.3.22. Inference price across select benchmarks, 2022-24 65
Figure 1.3.23. Output price per million tokens for select models 66
Figure 1.3.24. Estimated training cost of select AI models, 2019-24 67
Figure 1.3.25. Estimated training cost of select AI models, 2016-24 68
Figure 1.3.26. Estimated training cost and compute of select AI models 68
Figure 1.4.1. Peak computational performance of ML hardware for different precisions, 2008-24 69
Figure 1.4.2. Performance of leading Nvidia data center GPUs for machine learning 70
Figure 1.4.3. Price-performance of leading Nvidia data center GPUs for machine learning 71
Figure 1.4.4. Cumulative number of notable AI models trained by accelerator, 2017-24 71
Figure 1.4.5. Energy efficiency of leading machine learning hardware, 2016-24 72
Figure 1.4.6. Total power draw required to train frontier models, 2011-24 73
Figure 1.4.7. Estimated carbon emissions from training select AI models and real-life activities, 2012-24 74
Figure 1.4.8. Estimated carbon emissions and number of parameters by select AI models 75
Figure 1.5.1. Attendance at select AI conferences, 2010-24 76
Figure 1.5.2. Attendance at large conferences, 2010-24 77
Figure 1.5.3. Attendance at small conferences, 2010-24 77
Figure 1.6.1. Number of GitHub AI projects, 2011-24 78
Figure 1.6.2. GitHub AI projects (% of total) by geographic area, 2011-24 79
Figure 1.6.3. Number of GitHub stars in AI projects, 2011-24 80
Figure 1.6.4. Number of GitHub stars by geographic area, 2011-24 81
Figure 2.1.1. (Omit) 89
Figure 2.1.2. (Omit) 89
Figure 2.1.3. (Omit) 89
Figure 2.1.4. (Omit) 89
Figure 2.1.5. (Omit) 89
Figure 2.1.6. (Omit) 90
Figure 2.1.7. (Omit) 90
Figure 2.1.8. (Omit) 90
Figure 2.1.9. (Omit) 90
Figure 2.1.10. (Omit) 90
Figure 2.1.11. (Omit) 90
Figure 2.1.12. (Omit) 91
Figure 2.1.13. (Omit) 91
Figure 2.1.14. (Omit) 91
Figure 2.1.15. (Omit) 91
Figure 2.1.16. (Omit) 91
Figure 2.1.17. (Omit) 92
Figure 2.1.18. (Omit) 92
Figure 2.1.19. (Omit) 92
Figure 2.1.20. (Omit) 92
Figure 2.1.21. (Omit) 92
Figure 2.1.22. (Omit) 92
Figure 2.1.23. (Omit) 93
Figure 2.1.24. (Omit) 93
Figure 2.1.25. (Omit) 93
Figure 2.1.26. (Omit) 93
Figure 2.1.27. (Omit) 93
Figure 2.1.28. (Omit) 93
Figure 2.1.29. (Omit) 94
Figure 2.1.30. (Omit) 94
Figure 2.1.31. (Omit) 94
Figure 2.1.32. (Omit) 94
Figure 2.1.33. Select AI Index technical performance benchmarks vs. human performance 95
Figure 2.1.34. Performance of top closed vs. open models on LMSYS Chatbot Arena 97
Figure 2.1.35. Performance of top closed vs. open models on select benchmarks 97
Figure 2.1.36. Performance of top United States vs. Chinese models on LMSYS Chatbot Arena 98
Figure 2.1.37. Performance of top United States vs. Chinese models on select benchmarks 99
Figure 2.1.38. Smallest AI models scoring above 60% on MMLU, 2022-24 100
Figure 2.1.39. Performance of top models on LMSYS Chatbot Arena by select providers 101
Figure 2.1.40. Five stages of the benchmark lifecycle 103
Figure 2.1.41. Design vs. usability scores across select benchmarks 104
Figure 2.2.1. A sample output from GPT-4o 105
Figure 2.2.2. Gemini 2.0 in an agentic workflow 105
Figure 2.2.3. A sample question from MMLU 106
Figure 2.2.4. MMLU: average accuracy 106
Figure 2.2.5. MMLU-Pro: overall accuracy 107
Figure 2.2.6. A sample model response on the Chatbot Arena Leaderboard 108
Figure 2.2.7. LMSYS Chatbot Arena for LLMs: Elo rating (overall) 108
Figure 2.2.8. Arena-Hard-Auto vs. other benchmarks 109
Figure 2.2.9. Arena-Hard-Auto with no modification 109
Figure 2.2.10. Arena-Hard-Auto with style control 109
Figure 2.2.11. Evaluation framework for WildBench 110
Figure 2.2.12. WildBench: WB-Elo (length controlled) 111
Figure 2.2.13. Chain-of-thought thinking in o1 112
Figure 2.2.14. GPT-4o vs. o1-preview vs. o1 on select benchmarks 113
Figure 2.2.15. Evaluation framework for MixEval 114
Figure 2.2.16. MixEval-Hard on chat models: score 114
Figure 2.2.17. Data composition on the Berkeley Function Calling Leaderboard 115
Figure 2.2.18. Berkeley Function-Calling: overall accuracy 116
Figure 2.2.19. Tasks in the MTEB benchmark 117
Figure 2.2.20. MTEB on English subsets across 56 datasets: average score 118
Figure 2.2.21. RULER: weighted average score (increasing) 119
Figure 2.2.22. RULER: claimed vs. effective context length 119
Figure 2.2.23. Comparing long-context benchmarks 120
Figure 2.2.24. HELMET: average score 120
Figure 2.3.1. Sample question from Visual Commonsense Reasoning (VCR) challenge 121
Figure 2.3.2. Visual Commonsense Reasoning (VCR) task: Q→AR score 122
Figure 2.3.3. Sample tasks on MVBench 122
Figure 2.3.4. MVBench: average accuracy 123
Figure 2.3.5. Which face is real? 124
Figure 2.3.6. Midjourney generations over time: "a hyper-realistic image of Harry Potter" 124
Figure 2.3.7. Sample from the Chatbot Vision Arena 125
Figure 2.3.8. LMSYS Chatbot Arena for LLMs: Elo rating (vision) 125
Figure 2.3.9. Still generations from Stable Video Diffusion 126
Figure 2.3.10. Still generation from Sora 126
Figure 2.3.11. Veo 2: overall preference 127
Figure 2.3.12. Will Smith eating spaghetti, 2023 vs. 2025 127
Figure 2.4.1. Still images from the BBC lip reading sentences 2 dataset 128
Figure 2.4.2. LRS2: word error rate (WER) 129
Figure 2.5.1. Sample HumanEval problem 130
Figure 2.5.2. HumanEval: Pass@1 130
Figure 2.5.3. A sample model input from SWE-bench 131
Figure 2.5.4. SWE-bench: percent solved 131
Figure 2.5.5. Programming tasks in BigCodeBench 132
Figure 2.5.6. BigCodeBench on the hard set: Pass@1 (average) 132
Figure 2.5.7. BigCodeBench on the full set: Pass@1 (average) 132
Figure 2.5.8. LMSYS Chatbot Arena for LLMs: Elo rating (coding) 133
Figure 2.6.1. Sample problems from GSM8K 134
Figure 2.6.2. GSM8K: accuracy 134
Figure 2.6.3. Sample problem from MATH dataset 135
Figure 2.6.4. MATH word problem-solving: accuracy 135
Figure 2.6.5. LMSYS Chatbot Arena for LLMs: Elo rating (Math) 136
Figure 2.6.6. Sample problems from FrontierMath 137
Figure 2.6.7. FrontierMath: percent solved 137
Figure 2.6.8. Number of solved geometry problems in IMO-AG-30 138
Figure 2.7.1. Sample MMMU questions 139
Figure 2.7.2. MMMU on validation set: overall accuracy 139
Figure 2.7.3. Sample chemistry question from GPQA 140
Figure 2.7.4. GPQA on the diamond set: accuracy 140
Figure 2.7.5. Sample ARC-AGI task 141
Figure 2.7.6. ARC-AGI-1 on private evaluation set: high score 142
Figure 2.7.7. Sample questions on HLE 143
Figure 2.7.8. Humanity's Last Exam (HLE): accuracy 144
Figure 2.7.9. PlanBench: instances correct 145
Figure 2.8.1. Tasks on VisualAgentBench 146
Figure 2.8.2. VisualAgentBench on the test set: success rate 147
Figure 2.8.3. RE-Bench Process and Flow 147
Figure 2.8.4. RE-Bench: average normalized score@k 148
Figure 2.8.5. Sample questions on GAIA 149
Figure 2.8.6. GAIA: average score 149
Figure 2.9.1. Tasks on VisualAgentBench 150
Figure 2.9.2. RLBench: success rate (18 tasks, 100 demo/task) 151
Figure 2.9.3. Figure robot making coffee 152
Figure 2.9.4. Figure robot assisting in automotive assembly 152
Figure 2.9.5. AutoRT workflow 153
Figure 2.9.6. Speedtests for SARA vs. non-SARA enhanced models 153
Figure 2.9.7. ALOHA-trained robot attempting complex tasks 154
Figure 2.9.8. ALOHA: success rate 154
Figure 2.9.9. Robots playing amateur-level table tennis 155
Figure 2.9.10. GROOT blueprint for synthetic motion generation 156
Figure 2.9.11. Waymo rider-only miles driven without a human driver 157
Figure 2.9.12. Baidu's RT-6 158
Figure 2.9.13. An overview of Bench2Drive 158
Figure 2.9.14. Bench2Drive: driving score 159
Figure 2.9.15. Waymo driver vs. human benchmarks in Phoenix and San Francisco 160
Figure 2.9.16. Waymo driver percent difference to human benchmarks in Phoenix and San Francisco 160
Figure 2.9.17. Comparison of liability insurance claims by type: Waymo driver vs. human-driven vehicles 161
Figure 3.1.1. Responsible AI dimensions, definitions, and examples 167
Figure 3.2.1. Number of reported AI incidents, 2012-24 168
Figure 3.2.2. (Omit) 169
Figure 3.2.3. (Omit) 169
Figure 3.2.4. (Omit) 170
Figure 3.2.5. (Omit) 170
Figure 3.2.6. Reported general capability benchmarks for popular foundation models 171
Figure 3.2.7. Reported safety and responsible AI benchmarks for popular foundation models 171
Figure 3.2.8. HHEM: hallucination rate 172
Figure 3.2.9. Still generations from Stable Video Diffusion 173
Figure 3.2.10. FACTS: factuality score 173
Figure 3.2.11. Sample questions from SimpleQA 174
Figure 3.2.12. SimpleQA: percent of questions 174
Figure 3.3.1. Business functions assigned primary responsibility for AI governance, 2024 175
Figure 3.3.2. Investment in responsible AI by company revenue, 2024 176
Figure 3.3.3. AI risks: considered relevant vs. actively mitigated, 2024 177
Figure 3.3.4. Percentage of organizations that have experienced AI incidents, 2024 178
Figure 3.3.5. Number of AI incidents reported by organizations, 2024 178
Figure 3.3.6. Impact of responsible AI policies in organizations, 2024 179
Figure 3.3.7. Main obstacles to the implementation of responsible AI measures, 2024 180
Figure 3.3.8. Percentage of organizations influenced by AI regulations in responsible AI decision making 181
Figure 3.3.9. AI-related types of incidents reported by organizations in the past two years 182
Figure 3.3.10. Relevance of selected responsible AI risks for organizations, 2024 vs. 2025 183
Figure 3.3.11. Organizational and operational maturity model 184
Figure 3.3.12. Organizational responsible AI maturity distribution, 2024 vs. 2025 184
Figure 3.3.13. Operational responsible AI maturity distribution, 2024 vs. 2025 184
Figure 3.3.14. Organizational attitudes and philosophies surrounding responsible AI 185
Figure 3.4.1. Number of responsible AI papers accepted at select AI conferences, 2019-24 186
Figure 3.4.2. Responsible AI papers accepted (% of total) at select AI conferences by conference, 2019-24 187
Figure 3.4.3. Number of responsible AI papers accepted at select AI conferences by geographic area, 2024 188
Figure 3.4.4. Number of responsible AI papers accepted at select AI conferences by select geographic area, 2019-24 188
Figure 3.4.5. Number of responsible AI papers accepted at select AI conferences by geographic area, 2019-24 (sum) 188
Figure 3.4.6. AI privacy and data governance papers accepted at select AI conferences, 2019-24 189
Figure 3.4.7. AI fairness and bias papers accepted at select AI conferences, 2019-24 190
Figure 3.4.8. AI transparency and explainability papers accepted at select AI conferences, 2019-24 191
Figure 3.4.9. AI security and safety papers accepted at select AI conferences, 2019-24 192
Figure 3.5.1. Notable RAI policymaking milestones 193
Figure 3.6.1. Accuracy of dataset license classifications by select aggregators 194
Figure 3.6.2. Percentage of tokens in the top web domains of C4 by robots.txt restriction category, 2016-24 196
Figure 3.6.3. Percentage of tokens in the top web domains of C4 by terms of service restriction category, 2016-24 196
Figure 3.7.1. Faces and their likelihood of being classified as "criminal" by model and dataset sizes 197
Figure 3.7.2. Effect of dataset scaling on model predictions across demographic groups 198
Figure 3.7.3. Example of implicit bias in LLMs 199
Figure 3.7.4. LLMs implicit bias across stereotypes in four social categories 200
Figure 3.8.1. Foundation Model Transparency Index Scores by Domain, May 2024 201
Figure 3.8.2. Foundation Model Transparency Index Scores by Major Dimensions of Transparency, May 2024 202
Figure 3.9.1. HELM Safety: mean score 203
Figure 3.9.2. AIR-Bench: refusal rate 204
Figure 3.9.3. AIR-Bench: refusal rate across select risk categories 205
Figure 3.9.4. Attack success rate vs. number of prefilled harmful tokens in LLMs 206
Figure 3.9.5. Targeted latent adversarial training in LLMs 207
Figure 3.9.6. General performance on nonadversarial data 207
Figure 3.9.7. Model resistance to jailbreaking attacks 208
Figure 3.10.1. Overview of ToolEmu 209
Figure 3.10.2. Failure incidence of LM agents 210
Figure 3.10.3. Infection ratio by chat round 210
Figure 3.10.4. Conceptualization of ethical concerns around AI and information manipulation 211
Figure 3.10.5. Rest of World 2024 AI elections: summary statistics 212
Figure 3.10.6. (Omit) 213
Figure 3.10.7. (Omit) 213
Figure 3.10.8. (Omit) 214
Figure 3.10.9. (Omit) 214
Figure 3.10.10. (Omit) 215
Figure 3.10.11. (Omit) 215
Figure 4.1.1. (Omit) 221
Figure 4.1.2. (Omit) 221
Figure 4.1.3. (Omit) 221
Figure 4.1.4. (Omit) 221
Figure 4.1.5. (Omit) 221
Figure 4.1.6. (Omit) 222
Figure 4.1.7. (Omit) 222
Figure 4.1.8. (Omit) 222
Figure 4.1.9. (Omit) 222
Figure 4.1.10. (Omit) 222
Figure 4.1.11. (Omit) 222
Figure 4.1.12. (Omit) 223
Figure 4.1.13. (Omit) 223
Figure 4.1.14. (Omit) 223
Figure 4.1.15. (Omit) 223
Figure 4.1.16. (Omit) 223
Figure 4.1.17. (Omit) 223
Figure 4.1.18. (Omit) 223
Figure 4.1.19. (Omit) 224
Figure 4.1.20. (Omit) 224
Figure 4.1.21. (Omit) 224
Figure 4.1.22. (Omit) 224
Figure 4.1.23. (Omit) 224
Figure 4.1.24. (Omit) 224
Figure 4.2.1. AI job postings (% of all job postings) by select geographic areas, 2014-24 (part 1) 225
Figure 4.2.2. AI job postings (% of all job postings) by select geographic areas, 2014-24 (part 2) 226
Figure 4.2.3. AI job postings (% of all job postings) in the United States by skill cluster, 2010-24 227
Figure 4.2.4. Top 10 specialized skills in 2024 AI job postings in the United States, 2012-14 vs. 2024 228
Figure 4.2.5. Generative AI skills in AI job postings in the United States, 2023 vs. 2024 229
Figure 4.2.6. Share of generative AI skills in AI job postings in the United States, 2023 vs. 2024 229
Figure 4.2.7. AI job postings (% of all job postings) in the United States by sector, 2023 vs. 2024 230
Figure 4.2.8. Number of AI job postings in the United States by state, 2024 231
Figure 4.2.9. Percentage of US states job postings in AI, 2024 231
Figure 4.2.10. Percentage of US AI job postings by state, 2024 232
Figure 4.2.11. Percentage of US states' job postings in AI by select US state, 2010-24 232
Figure 4.2.12. Percentage of US AI job postings by select US state, 2010-24 233
Figure 4.2.13. Relative AI hiring rate year-over-year ratio by geographic area, 2024 234
Figure 4.2.14. Relative AI hiring rate year-over-year ratio by geographic area, 2018-24 235
Figure 4.2.15. Relative AI skill penetration rate by geographic area, 2015-24 236
Figure 4.2.16. Relative AI skill penetration rate across gender, 2015-24 237
Figure 4.2.17. AI talent concentration by geographic area, 2024 238
Figure 4.2.18. Percentage change in AI talent concentration by geographic area, 2016 vs. 2024 238
Figure 4.2.19. AI talent concentration by gender and geographic area, 2016-24 239
Figure 4.2.20. Global AI talent representation, 2016-24 240
Figure 4.2.21. AI talent representation by gender and geographic area, 2016-24 241
Figure 4.2.22. Net AI talent migration per 10,000 LinkedIn members by geographic area, 2024 242
Figure 4.2.23. Net AI talent migration per 10,000 LinkedIn members by geographic area, 2019-24 243
Figure 4.2.24. Occupational representation in Claude usage data vs. US workforce distribution 244
Figure 4.2.25. Occupational usage of Claude by median annual wage 245
Figure 4.2.26. Depth of AI usage across organizations 246
Figure 4.2.27. Percentage of Claude conversations by type of task execution 247
Figure 4.2.28. Distribution of occupational skills exhibited by Claude in conversations 247
Figure 4.3.1. Global corporate investment in AI by investment activity, 2013-24 248
Figure 4.3.2. Global private investment in AI, 2013-24 249
Figure 4.3.3. Global private investment in generative AI, 2019-24 250
Figure 4.3.4. Number of newly funded AI companies in the world, 2013-24 251
Figure 4.3.5. Number of newly funded generative AI companies in the world, 2019-24 251
Figure 4.3.6. Average size of global AI private investment events, 2013-24 252
Figure 4.3.7. Global AI private investment events by funding size, 2023 vs. 2024 252
Figure 4.3.8. Global private investment in AI by geographic area, 2024 253
Figure 4.3.9. Global private investment in AI by geographic area, 2013-24 (sum) 254
Figure 4.3.10. Global private investment in AI by geographic area, 2013-24 255
Figure 4.3.11. Global private investment in generative AI by geographic area, 2019-24 256
Figure 4.3.12. Number of newly funded AI companies by geographic area, 2024 257
Figure 4.3.13. Number of newly funded AI companies by geographic area, 2013-24 (sum) 258
Figure 4.3.14. Number of newly funded AI companies by geographic area, 2013-24 259
Figure 4.3.15. Global private investment in AI by focus area, 2023 vs. 2024 260
Figure 4.3.16. Global private investment in AI by focus area, 2018-24 261
Figure 4.4.1. Share of respondents who say their organization uses AI in at least one function, 2017-24 262
Figure 4.4.2. AI use by industry and function, 2024 263
Figure 4.4.3. Cost decrease and revenue increase from analytical AI use by function, 2024 264
Figure 4.4.4. AI use by organizations in the world, 2023 vs. 2024 265
Figure 4.4.5. Most common generative AI use cases by function, 2024 266
Figure 4.4.6. Cost decrease and revenue increase from generative AI use by function, 2024 267
Figure 4.4.7. Generative AI use by organizations in the world, 2023 vs. 2024 268
Figure 4.4.8. Impact of AI on customer support agents 269
Figure 4.4.9. Impact of AI on scientific innovation 269
Figure 4.4.10. AI's productivity equalizing effects 270
Figure 4.4.11. Distribution of productivity gains from AI use 271
Figure 4.4.12. Expectations about the impact of generative AI on organizations' workforces in the next 3 years, 2024 272
Figure 4.4.13. Expectations about the impact of AI on organizations' workforces in the next 3 years, 2023 vs. 2024 273
Figure 4.5.1. Number of industrial robots installed in the world, 2012-23 274
Figure 4.5.2. Operational stock of industrial robots in the world, 2012-23 275
Figure 4.5.3. Number of industrial robots installed in the world by type, 2017-23 276
Figure 4.5.4. Number of industrial robots installed by geographic area, 2023 277
Figure 4.5.5. Number of new industrial robots installed in top 5 countries, 2011-23 278
Figure 4.5.6. Number of industrial robots installed (China vs. rest of the world), 2016-23 279
Figure 4.5.7. Annual growth rate of industrial robots installed by geographic area, 2022 vs. 2023 280
Figure 4.5.8. Number of service robots installed in the world by application area, 2022 vs. 2023 281
Figure 5.1.1. Single-objective optimization results for fitness optimization 287
Figure 5.1.2. Performance of LLMs and language agents to solve tasks using Aviary environments 288
Figure 5.1.3. AlphaProteo generating successful binders 289
Figure 5.1.4. 3D brain map images 289
Figure 5.1.5. Workflow in AI-based lab 290
Figure 5.1.6. GluFormer versus glucose management indicator 291
Figure 5.1.7. ESM3 models evaluated on protein generation from atomic coordination prompts 291
Figure 5.1.8. AlphaFold 3 vs. baselines for protein-ligand docking 292
Figure 5.2.1. Emergent structure prediction success, CASP15 293
Figure 5.2.2. Size of protein sequencing models, 2020-24 294
Figure 5.2.3. Key protein science databases 295
Figure 5.2.4. Growth of public protein science databases, 2019-25 295
Figure 5.2.5. Proportion of AI-driven protein research in the biological sciences, 2024 296
Figure 5.2.6. Number of foundation models per microscopy techniques, 2023-24 297
Figure 5.3.1. US patient cohorts used to train clinical machine learning algorithms by state, 2015-19 298
Figure 5.3.2. Training dataset token volumes: medical vs. nonmedical language and imaging models 299
Figure 5.3.3. Imaging modeling approaches and notable AI models 300
Figure 5.3.4. Medical disciplines and notable AI models 301
Figure 5.4.1. MedQA: test accuracy 302
Figure 5.4.2. Performance of select LLMs on medical datasets 303
Figure 5.4.3. Enhanced Pareto frontier: accuracy vs. cost 303
Figure 5.4.4. Number of publications on large language models in PubMed, 2019-24 304
Figure 5.4.5. Healthcare tasks, NLP and NLU tasks, and dimensions of evaluation across 519 studies 305
Figure 5.4.6. LLM performance in clinical diagnosis 306
Figure 5.4.7. Impact of LLM assistance on clinical management 307
Figure 5.4.8. Cumulative Use of the Ambient Artificial Intelligence (AI) Scribe Tool, October 16-December 24, 2023 308
Figure 5.4.9. Impact of AI Scribe on physician EHR usage 309
Figure 5.4.10. Number of AI medical devices approved by the FDA, 1995-2023 310
Figure 5.4.11. Proposed model and workflow for integrating PAD screening into clinical practice 311
Figure 5.4.12. Model performance on in-domain RT test dataset (any SDoH) 312
Figure 5.4.13. (Omit) 313
Figure 5.4.14. Principal component analysis 313
Figure 5.4.15. Percolation threshold prediction and validation based on AI-generated synthetic structures 314
Figure 5.4.16. Areas under the curve for evaluating synthetic heart disease datasets 314
Figure 5.4.17. Predictive model use across primary inpatient EHR vendor 315
Figure 5.4.18. Developer of predictive models across EHR vendor 316
Figure 5.4.19. Number of clinical trials that have included mentions of AI, 2014-24 317
Figure 5.4.20. Number of clinical trials that have included mentions of AI by select geographic areas, 2021-24 318
Figure 5.5.1. (Omit) 319
Figure 5.5.2. Number of medical AI ethics publications, 2020-24 319
Figure 5.5.3. Top 10 ethical concerns discussed in medical AI ethics publications, 2020-24 320
Figure 5.5.4. AI tools discussed in medical AI ethics publications, 2020-24 320
Figure 5.5.5. Number of NIH grants for medical AI ethics by fiscal year, 2020-24 321
Figure 5.5.6. NIH grant funding for medical AI ethics by fiscal year, 2020-24 321
Figure 5.6.1. (Omit) 322
Figure 5.6.2. (Omit) 322
Figure 5.6.3. (Omit) 323
Figure 5.6.4. (Omit) 323
Figure 5.6.5. (Omit) 323
Figure 5.6.6. (Omit) 324
Figure 5.6.7. (Omit) 324
Figure 5.6.8. (Omit) 324
Figure 5.6.9. (Omit) 324
Figure 6.1.1. Singapore plans to invest $1B in AI over 5 years 329
Figure 6.1.2. Abu Dhabi launches $100B AI investment firm 329
Figure 6.1.3. Artificial Intelligence Act is passed by European Parliament 329
Figure 6.1.4. India drops plan to require government approval for launch of new AI models 330
Figure 6.1.5. India launches IndiaAI Mission with $1.25B investment 330
Figure 6.1.6. French government fines Google 250 million euros over use of copyrighted information 330
Figure 6.1.7. U.N. General Assembly adopts resolution promoting "safe, secure, and trustworthy" AI 331
Figure 6.1.8. Canada pledges CA$2.4B investment to ensure country's AI advantage 331
Figure 6.1.9. U.K. AI Safety Institute launches open-source tool for assessing AI model safety 331
Figure 6.1.10. U.K. and South Korea cohost AI safety summit in Seoul 332
Figure 6.1.11. China creates country's largest-ever state-backed investment fund to back its semiconductor industry 332
Figure 6.1.12. European Commission establishes AI Office 332
Figure 6.1.13. U.S. NIST unveils framework to help organizations identify and mitigate GenAI risks 333
Figure 6.1.14. U.S. State Department releases AI Risk Management Profile for Human Rights 333
Figure 6.1.15. U.K. withdraws £1.3B promised for technology and AI infrastructure 333
Figure 6.1.16. U.S. White House launches task force on AI data center infrastructure 334
Figure 6.1.17. California governor signs three bills on AI and elections communications 334
Figure 6.1.18. United Nations adopts Global Digital Compact to ensure an inclusive and secure digital future 334
Figure 6.1.19. California governor vetoes expansive AI legislation 335
Figure 6.1.20. U.S. judge blocks new California AI law over Kamala Harris deepfake 335
Figure 6.1.21. Saudi Arabia announces "Project Transcendence" 335
Figure 6.1.22. European Commission AI Office releases first draft of Code of Practice for General-Purpose AI 336
Figure 6.1.23. U.S. launches international AI safety network with global partners 336
Figure 6.1.24. U.S. increases export controls of semiconductor manufacturing equipment and software to China 336
Figure 6.1.25. U.N. Security Council debates uses of AI in conflicts and calls for global framework 337
Figure 6.2.1. Number of AI-related bills passed into law by country, 2016-24 338
Figure 6.2.2. Number of AI-related bills passed into law in 116 select geographic areas, 2016-24 339
Figure 6.2.3. Number of AI-related bills passed into law in select geographic areas, 2024 339
Figure 6.2.4. Number of AI-related bills passed into law in select geographic areas, 2016-24 (sum) 339
Figure 6.2.5. (Omit) 340
Figure 6.2.6. Number of congressional AI-related proposed bills and passed laws in the United States, 2016-24 341
Figure 6.2.7. Number of AI-related bills passed into law in select US states, 2024 342
Figure 6.2.8. Number of state-level AI-related bills passed into law in the United States by state, 2016-24 (sum) 342
Figure 6.2.9. Number of AI-related bills passed into law by all US states, 2016-24 343
Figure 6.2.10. (Omit) 344
Figure 6.2.11. Number of state-level laws enacted on AI-generated deepfakes in intimate imagery and elections in the United States, 2019-24 345
Figure 6.2.12. State-level laws regulating AI-generated deepfakes in elections in the US by state and status as of 2024 346
Figure 6.2.13. State-level laws regulating AI-generated deepfakes in intimate imagery in the US by state and status as of 2024 346
Figure 6.2.14. Number of mentions of AI in legislative proceedings in 75 select geographic areas, 2016-24 347
Figure 6.2.15. Number of mentions of AI in legislative proceedings by country, 2024 348
Figure 6.2.16. Number of mentions of AI in legislative proceedings by country, 2016-24 (sum) 348
Figure 6.2.17. Mentions of AI in legislative proceedings vs. AI-related bills passed into law in select countries, 2016-24 349
Figure 6.2.18. Mentions of AI in US committee reports by legislative session, 2001-24 350
Figure 6.2.19. Number of AI-related regulations in the United States, 2016-24 351
Figure 6.2.20. Number of AI-related regulations in the United States by agency, 2016-24 352
Figure 6.2.21. (Omit) 353
Figure 6.3.1. Public spending on AI-related contracts in select countries, 2013-23 (sum) 355
Figure 6.3.2. Number of AI-related contracts in select countries, 2013-23 (sum) 356
Figure 6.3.3. Median value of public AI-related contracts in select countries, 2013-23 356
Figure 6.3.4. Public spending on AI-related contracts per 100,000 inhabitants in select countries, 2013-23 (sum) 357
Figure 6.3.5. Public spending on AI-related contracts in select countries, 2023 358
Figure 6.3.6. Public spending on AI-related contracts in the United States and Europe, 2013-23 359
Figure 6.3.7. Difference in public spending on AI-related contracts between the United States and Europe, 2013-23 360
Figure 6.3.8. Public spending on AI-related contracts in top 5 European countries, 2013-23 361
Figure 6.3.9. Public spending on AI-related contracts (% of total) in the United States by funding agency, 2013-23 362
Figure 6.3.10. Public spending on AI-related contracts (% of total) in Europe by funding agency activity, 2013-23 363
Figure 6.3.11. US AI-related grants, 2013-23 364
Figure 6.3.12. Public spending on AI-related grants in the United States, 2013-23 364
Figure 6.3.13. Public spending on AI-related grants (% of total) by funding agency, 2013-23 365
Figure 7.1.1. (Omit) 370
Figure 7.2.1. Public high schools teaching foundational CS (% of total in state), 2024 371
Figure 7.2.2. Schools offering foundational CS courses by size, 2024 372
Figure 7.2.3. Schools offering foundational CS courses by geographic area, 2024 372
Figure 7.2.4. Schools offering foundational CS courses by free and reduced lunch student population, 2024 372
Figure 7.2.5. Access to foundational CS courses by race/ethnicity, 2024 373
Figure 7.2.6. Public high school enrollment in CS (% of students), 2024 373
Figure 7.2.7. Public high school enrollment in CS vs. national demographics by race/ethnicity, 2024 374
Figure 7.2.8. Public high school enrollment in CS vs. national demographics by subgroup, 2024 375
Figure 7.2.9. Number of AP computer science exams taken, 2007-23 376
Figure 7.2.10. AP computer science exams taken by race/ethnicity, 2007-23 376
Figure 7.2.11. AP computer science exams taken (% of total responding students) by race/ethnicity, 2007-23 377
Figure 7.2.12. AP computer science exam participation vs. national demographics by race/ethnicity, 2023 377
Figure 7.2.13. Adoption of AI-specific K-12 computer science standards by US state 378
Figure 7.2.14. Percentage of teachers who feel equipped to teach AI by grade level 379
Figure 7.2.15. AI concepts taught in CS classrooms by grade level 379
Figure 7.2.16. Time spent learning AI in CS classrooms by grade level 380
Figure 7.2.17. Availability of CS education by country, 2024 381
Figure 7.2.18. Change in access to CS education by continent, 2019 vs. 2024 382
Figure 7.2.19. AI4K12 guidelines organized around 5 Big Ideas in AI 383
Figure 7.3.1. New CS postsecondary graduates in the United States, 2013-23 385
Figure 7.3.2. CS postsecondary graduates in the United States by gender, 2023 385
Figure 7.3.3. CS vs. all postsecondary graduates in the United States by race/ethnicity (US residents only), 2023 386
Figure 7.3.4. Number of international CS master's students enrolled in US universities, 2022 387
Figure 7.3.5. Number of international CS PhD students enrolled in US universities, 2022 387
Figure 7.3.6. Number of institutions offering AI bachelor's and master's degrees in the US, 2013-23 388
Figure 7.3.7. New AI bachelor's and master's graduates in the United States, 2013-23 388
Figure 7.3.8. Top postsecondary institutions graduating students in AI in 2023 by degree type 389
Figure 7.3.9. New ICT short-cycle tertiary graduates by country, 2022 390
Figure 7.3.10. New ICT bachelor's graduates by country, 2022 391
Figure 7.3.11. New ICT master's graduates by country, 2022 391
Figure 7.3.12. New ICT PhD graduates by country, 2022 392
Figure 7.3.13. Percentage of new ICT postsecondary graduates who are female by country, 2022 393
Figure 8.1.1. Global opinions on products and services using AI (% of total), 2022-24 401
Figure 8.1.2. 'Products and services using AI have more benefits than drawbacks,' by country (% of total), 2022-24 402
Figure 8.1.3. Opinions about AI by country (% agreeing with statement), 2024 403
Figure 8.1.4. Global opinions about products and services using AI by country, 2024 404
Figure 8.1.5. Percentage point change in opinions about AI by country (% agreeing with statement), 2023-24 405
Figure 8.1.6. Percentage point change in opinions about AI by country (% agreeing with statement), 2022 vs. 2024 406
Figure 8.1.7. Global opinions on the perceived impact of AI on current jobs, 2024 407
Figure 8.1.8. Global opinions on whether AI will change how current jobs are done in the next five years (% agreeing with statement), 2023 vs. 2024 408
Figure 8.1.9. Global opinions on the potential of AI to improve life by country, 2024 409
Figure 8.1.10. Global opinion on the potential of AI to improve the job market vs. individual jobs, 2024 410
Figure 8.1.11. Global opinion on the potential of AI to improve time to get things done vs. individual jobs, 2024 410
Figure 8.1.12. US driver attitude toward self-driving vehicles, 2021-25 411
Figure 8.2.1. Local US officials' support for government regulation of AI by party and year 412
Figure 8.2.2. Local US officials' views on what AI policies would be beneficial for 2025-50 413
Figure 8.2.3. Local US officials' likelihood of making AI policy decisions by party and year 414
Figure 8.2.4. Local US officials' feeling adequately informed to make decisions about AI by party and year 415