
National Assembly Library National Strategy Information Portal

Toward comprehensive benchmarking of the biological knowledge of frontier large language models

RAND Corporation 2025-11-25

Table of Contents

About This Report iii

Summary v

Figures and Tables viii

Chapter 1 Motivation 1

Intersection of AI with Biological and Chemical Threats 1

Organization of This Report 3

Chapter 2 Methods 5

Model Selection 5

Benchmark Selection 6

Supplemental Human Expert Baselining of WMDP 9

Technical Implementation 10

Chapter 3 Results and Discussion 14

Refusal Benchmarks 14

Biology Knowledge Benchmarks Overview 16

Knowledge Benchmarks and Expert Baselines 17

WMDP Biology Saturation 26

Chapter 4 Challenges and Proposed Solutions 28

Challenge: Benchmarks Without Baselines Are Difficult to Interpret 28

Challenge: Existing Benchmarks Do Not Tie Neatly to Real-World Risks 29

Challenge: Minor Implementation Details Can Lead to Different Results 32

Chapter 5 Conclusion 35

Appendix A Benchmark Performance Data Visualizations 36

Appendix B Benchmark Details 38

Appendix C Model Details 44

Appendix D Evaluation Prompt Templates 46

Abbreviations 50

References 51

About the Authors 57

Hashtags

#LargeLanguageModels #LLM #BiologicalKnowledge #AIRisk #ArtificialIntelligence #Biosecurity #TechnologicalInnovation

Related Materials

Drei Jahre ChatGPT – Wo stehen wir und welche Zukunft hat die europäische Wirtschaft? 2025-11-26
Ten Takeaways from Stanford University’s Report on the State of AI 2023-04-13
AI Risk Casebook 2025-11-26
The United States Needs Data Centers, and Data Centers Need Energy, but That Is Not Necessarily a Problem 2025-11-24
Exploring a Korean Sovereign AI National Strategy 2025-11-21
AI in Strategic Foresight: Reshaping Anticipatory Governance 2025-11-19
