Contents
About This Report
Summary
Figures and Tables
Chapter 1. Introduction
Chapter 2. DNA Acquisition and Biosecurity Context
Threat Model and Biological Risk Chain
DNA acquisition as a bottleneck task
Chapter 3. LLM Agent Capability Evaluations
Agent Evaluations in Context
Designing Agent Tasks for Evaluation
Chapter 4. The Synthesis Task and Methodology
Task Description
Task Implementation
Scoring the Evaluation
Evaluation Execution
Chapter 5. Results
Task Performance Results
Protocol Autograding Results
Physical Validation Results
Discussion
Limitations of Our Approach
Chapter 6. Conclusion
Appendix A: Task Prompt Templates
ReAct Agent Prompts (eGFP)
Protocol Autograder Prompts
Appendix B: Expanded Segment Scorer Criteria
Appendix C: o3 Physical Validation Details
Appendix D: Narrative Review of Per-Model Task Performance
OpenAI Agents
Anthropic Agents
Gemini Agent
Appendix E: Biomni Agent Testing
Abbreviations
References
About the Authors
Bridging the Digital to Physical Divide: Evaluating LLM Agents on Benchtop DNA Acquisition
