Cybersecurity risks of AI-generated code

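□ Recent advances in large language models (LLMs) and other AI systems have greatly improved their ability to generate computer code. This benefits software development, but AI code generation may also introduce several security risks. The report emphasizes that security vulnerabilities are frequently found in AI-generated code, and that the problem extends to two further risks: the models themselves can be attacked, and insecure code can accumulate in the data used to train future AI systems.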

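□ A substantial share of the code produced by AI code generation models is likely to be insecure. When the researchers evaluated five LLMs, nearly half of the generated code was found to contain security vulnerabilities, notably weaknesses in memory management, pointer use, and input validation. Current evaluation benchmarks for AI models, however, focus on the functional correctness of code and do not give sufficient weight to security.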
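The gap between functional correctness and security can be illustrated with a minimal sketch. It is written in Python for brevity, although the report's evaluation targets C code, and the function names are hypothetical and do not come from the report. Both functions pass the same functional test, but only the second rejects an out-of-range index; the first silently returns the wrong element, a loose analogue of an unchecked array access.

```python
def get_item_unchecked(items, raw_index):
    """Works for well-formed input, but trusts the caller completely."""
    return items[int(raw_index)]


def get_item_validated(items, raw_index):
    """Same result for valid input, but rejects anything out of range."""
    try:
        index = int(raw_index)
    except (TypeError, ValueError):
        raise ValueError(f"index is not an integer: {raw_index!r}") from None
    if not 0 <= index < len(items):
        raise ValueError(f"index out of range: {index}")
    return items[index]


if __name__ == "__main__":
    items = ["a", "b", "c"]

    # Functional test: both versions agree on valid input.
    print(get_item_unchecked(items, "1"), get_item_validated(items, "1"))

    # Security-minded test: a negative index is accepted by the unchecked
    # version (Python wraps around) and rejected by the validated one.
    print(get_item_unchecked(items, "-1"))
    try:
        get_item_validated(items, "-1")
    except ValueError as exc:
        print("rejected:", exc)
```

A benchmark that only checks the first case would score both versions identically, which is the limitation the summary describes.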

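□ AI code generation models may themselves be vulnerable to attack. For example, if a data poisoning attack places malicious code in the training data, the model may learn from it and generate insecure code. In addition, when a model automatically references external libraries or packages, those references may carry incorrect information or malicious code. Strengthening the security of the code generation models themselves is therefore essential.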
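One possible safeguard against unexpected package references is to compare the imports in generated code with a list of dependencies the team has already reviewed. The sketch below is an assumption about how such a check might look and is not a procedure described in the report; the function names and the allowlist contents are hypothetical. It uses Python's standard `ast` module to collect top-level module names without executing the code.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local code.
            if node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return names


def unvetted_imports(source, allowlist):
    """Return imported modules that are not on the reviewed allowlist."""
    return imported_modules(source) - set(allowlist)


if __name__ == "__main__":
    # Hypothetical generated snippet and hypothetical allowlist.
    generated = "import os\nimport totally_made_up_pkg\nfrom json import loads\n"
    print(unvetted_imports(generated, {"os", "json"}))
```

A check like this only flags names for human review; it does not establish that an allowlisted package is itself safe.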

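□ As AI code generation becomes widespread, the associated security risks are likely to spread with it. As AI-generated code accumulates in open-source repositories, later AI models may train on it, creating a feedback loop in which security vulnerabilities grow progressively worse. Moreover, if organizations adopt AI-generated code indiscriminately, large volumes of vulnerable code may enter their codebases without a security review. Clear security policies and verification procedures for AI code generation tools are therefore needed.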

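□ In conclusion, security must be taken into account if AI code generation models are to deliver their productivity gains. Companies and developers should put in place processes for verifying AI-generated code, and policymakers should set security standards so that such code can be used safely. With these measures, AI code generation models can develop in a direction that satisfies both productivity and security.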

Table of Contents

Title page

Contents

Executive Summary

Introduction

Background

What Are Code Generation Models?

Increasing Industry Adoption of AI Code Generation Tools

Risks Associated with AI Code Generation

Code Generation Models Produce Insecure Code

Models' Vulnerability to Attack

Downstream Impacts

Challenges in Assessing the Security of Code Generation Models

Is AI Generated Code Insecure?

Methodology

Evaluation Results

Unsuccessful Verification Rates

Variation Across Models

Severity of Generated Bugs

Limitations

Policy Implications and Further Research

Conclusion

Authors

Acknowledgments

Appendix A: Methodology

Appendix B: Evaluation Results

Endnotes

Table 1. Comparison of Models Used for Our Evaluation

Table 2. Examples of the 67 Prompts from the LLMSecEval Dataset Intended to Elicit Bugs in C Code

Figure 1. Number of Papers on Code Generation by Year

Figure 2. Code Generation Model Development Workflow and Its Cybersecurity Implications

Figure 3. Evaluation Pipeline

Figure 4. ESBMC Verification Statuses by Model (Post-rerun)

Figure 5. Types of Bugs Identified by ESBMC

Figure 6. Types of Errors in Code Snippets Generated by the Five Models

Table A1. Detailed Explanation of ESBMC Outputs

Table B1. Number of "Error" Code Snippets by Model Before and After Code Regeneration

Hashtags

#Cybersecurity #AISecurity #AIGeneratedCode


AI summary and translation service

The summary above was automatically summarized and translated by AI.
