Table of contents
Title page 1
Foreword 3
Abstract 4
Contents 5
Executive summary 6
1. Introduction 8
2. Data collection mechanisms for AI training 9
3. Data collected directly from individuals and organisations 13
3.1. Provided data and observed data 13
3.1.1. Data provided by or observed from individuals when engaging directly with AI systems 13
3.1.2. Data (provided and/or observed) re-purposed for AI training 14
3.2. Voluntary data donations 14
4. Data collected from third-party providers 16
4.1. Data collected from third parties based on commercial arrangements 16
4.2. Data collected from third parties based on non-commercial practices 17
4.2.1. Open data arrangements 17
4.2.2. Scraping of publicly available internet data 19
5. Conclusions 20
References 21
Notes 26
Figures 5
Figure 2.1. The AI model development lifecycle 11
Figure 2.2. The personal, proprietary and public domains of data 11
Figure 2.3. Key data collection mechanisms for AI training 12
Figure 4.1. The degrees of data openness 17
Boxes 5
Box 4.1. The degrees of data openness 18
