Bridging the Digital to Physical Divide: Evaluating LLM Agents on Benchtop DNA Acquisition

Table of Contents

About This Report

Summary

Figures and Tables

Chapter 1. Introduction

Chapter 2. DNA Acquisition and Biosecurity Context

Threat Model and Biological Risk Chain

DNA Acquisition as a Bottleneck Task

Chapter 3. LLM Agent Capability Evaluations

Agent Evaluations in Context

Designing Agent Tasks for Evaluation

Chapter 4. The Synthesis Task and Methodology

Task Description

Task Implementation

Scoring the Evaluation

Evaluation Execution

Chapter 5. Results

Task Performance Results

Protocol Autograding Results

Physical Validation Results

Discussion

Limitations of Our Approach

Chapter 6. Conclusion

Appendix A: Task Prompt Templates

ReAct Agent Prompts (eGFP)

Protocol Autograder Prompts

Appendix B: Expanded Segment Scorer Criteria

Appendix C: o3 Physical Validation Details

Appendix D: Narrative Review of Per-Model Task Performance

OpenAI Agents

Anthropic Agents

Gemini Agent

Appendix E: Biomni Agent Testing

Abbreviations

References

About the Authors

Hashtags

#ArtificialIntelligence #LLM #Biotechnology #DNAAcquisition #Biosecurity #DigitalPhysicalLinkage #ResearchEthics
