Trace the Linux kernel code of AMD HSA KFD driver. Source code: https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD
HSA Kernel Code (KFD v0.6)
Advisor: Prof. 徐慰中
Student: 黃昱儒
2014/7/25
Agenda
● Introduction to HSA
  o hUMA
  o User Level Queueing
● HSA Driver
  o Concepts
    ▪ Flow Overview
    ▪ User & Hardware Queues
  o Source Code Detail
● IOMMU
  o Concepts
    ▪ GCR3
    ▪ PPR
  o Source Code Detail
hUMA
User Level Queuing - Before HSA
User Level Queuing
[Figure: application queues (Application 1 Queue 1, Application 1 Queue 2, Application 3 Queue 1) connected to the HSA Device]
● The HSA device accesses the application's ring
● The application kicks the doorbell
● IOMMU address translation (VA->PA)
1. AQL Packet
2. Ring
3. Doorbell
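The packet, ring, and doorbell sequence can be sketched in plain C. This is a minimal user-space model, not the HSA runtime: `struct aql_packet`, `struct user_queue`, `RING_SLOTS`, and `enqueue_and_kick` are hypothetical names, the packet layout is reduced to two fields, and an ordinary memory word stands in for the mapped doorbell MMIO location.

```c
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical ring size (power of two) */

/* Hypothetical, heavily reduced stand-in for an AQL packet. */
struct aql_packet {
    uint16_t header;
    uint64_t kernel_object;
};

/* Hypothetical user-mode queue: ring memory plus a doorbell word. */
struct user_queue {
    struct aql_packet ring[RING_SLOTS];
    uint64_t write_index;        /* next slot the application fills */
    uint64_t read_index;         /* next slot the device would consume */
    volatile uint64_t doorbell;  /* stands in for the mapped doorbell MMIO */
};

/* Enqueue one packet and kick the doorbell.
 * Returns 0 on success, -1 if the ring is full. */
int enqueue_and_kick(struct user_queue *q, const struct aql_packet *pkt)
{
    if (q->write_index - q->read_index >= RING_SLOTS)
        return -1;
    q->ring[q->write_index % RING_SLOTS] = *pkt;  /* 1. packet goes into 2. the ring */
    q->write_index++;
    q->doorbell = q->write_index;                 /* 3. doorbell write notifies the device */
    return 0;
}
```

In this model the application never enters the kernel to submit work; the only shared state is the ring and the doorbell word.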
HSA Software Stack
[Figure: software stack. Labels: Application; Runtime Library; HSA-aware Kernel (KFD, IOMMU Driver); HSA Device; IOMMU]
● open("/dev/kfd")
● ioctl(KFD_IOC_SET_MEMORY_POLICY)
● ioctl(KFD_IOC_CREATE_QUEUE)
● ioctl(KFD_IOC_DESTROY_QUEUE)
Concepts - HSA Run Flow
● Initialization
  o Application: Create user queues
  o KFD Driver: Create HW queue with user queue information
● Computation
  o Application: Enqueue AQL packets, kick doorbell, and wait for signal
  o KFD Driver: Nothing
● Finish
  o Application: Application finishes and destroys queues
  o KFD Driver: Release HW queue
User - HW interaction
Scheduling Policy
1. Hardware scheduling with oversubscription allowed (more queues than HW slots)
2. Hardware scheduling without oversubscription, so create_queue requests fail when we run out of HW slots
3. No hardware scheduling, so the driver manually assigns queues to HW slots by programming registers
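The practical difference between the three policies is when a create_queue request can fail. The sketch below models only that decision; `enum sched_policy` and `can_create_queue` are hypothetical names chosen to mirror the list above, not identifiers taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the three policies listed above. */
enum sched_policy {
    POLICY_HWS,             /* HW scheduler, oversubscription allowed */
    POLICY_HWS_NO_OVERSUB,  /* HW scheduler, no oversubscription */
    POLICY_NO_HWS           /* driver assigns HW slots itself */
};

/* Returns true if one more queue can be created, given the number of
 * queues already active and the number of hardware slots. */
bool can_create_queue(enum sched_policy policy, unsigned active, unsigned hw_slots)
{
    switch (policy) {
    case POLICY_HWS:
        return true;               /* more queues than slots is permitted */
    case POLICY_HWS_NO_OVERSUB:
    case POLICY_NO_HWS:
        return active < hw_slots;  /* request fails once the slots run out */
    }
    return false;
}
```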
Software Scheduler
[Figure: labels: HSA GPU's configuration register MMIO address (physical address); free hardware queue_id bitmap; queue acquire register selecting (pipe, queue); per-queue doorbell and ring_base_address for pasid=0 queue_id=0, pasid=0 queue_id=1, pasid=1 queue_id=0, pasid=1 queue_id=1]
Hardware Scheduler
[Figure: labels: HSA GPU's configuration register MMIO address (physical address); queue acquire register; kernel_queue at (pipe=4, queue=0) with its doorbell and ring_base_address]
Hardware Scheduler - No Oversubscription
[Figure: run list for 3 processes, built from PM4 packets (Type3)]
● IT_RUN_LIST: run_list
● IT_MAP_PROCESS: page_table_base, pasid, sh_mem_config
● IT_MAP_QUEUES: mqd_addr (Memory Queue Descriptor)
Hardware Scheduler - Oversubscription
[Figure: run list built from PM4 packets (Type3), with a second IT_RUN_LIST packet after the map packets]
● IT_RUN_LIST: run_list
● IT_MAP_PROCESS: page_table_base, pasid, sh_mem_config
● IT_MAP_QUEUES: mqd_addr (Memory Queue Descriptor)
● IT_RUN_LIST: run_list
[Figure: diagram with groups labeled Per Application, Per Device, Per HW Queue, and Only for HW scheduling]
IOCTL Commands Provided by KFD
● KFD_IOC_CREATE_QUEUE
  o Create a hardware queue from the application's information (e.g. ring base address)
● KFD_IOC_DESTROY_QUEUE
  o Release the hardware queue
● KFD_IOC_UPDATE_QUEUE
● KFD_IOC_SET_MEMORY_POLICY
  o Set the cache coherency policy
● KFD_IOC_GET_CLOCK_COUNTERS
  o Get the GPU clock counters
● KFD_IOC_GET_PROCESS_APERTURES
  o Get the GPU's aperture information
● KFD_IOC_PMC_ACQUIRE_ACCESS
● KFD_IOC_PMC_RELEASE_ACCESS
  o Exclusive access for performance counters
HSA Driver Flow
● System initialization
  ○ module_init
  ○ device_init (called by radeon)
● Application opens the "/dev/kfd" device
● Application sends ioctl
  ○ KFD_IOC_SET_MEMORY_POLICY
  ○ KFD_IOC_CREATE_QUEUE
● Application sends ioctl
  ○ KFD_IOC_DESTROY_QUEUE
● Application termination
module_init(kfd_module_init)
● radeon_kfd_pasid_init
  o Initialize PASID bitmap
● radeon_kfd_chardev_init
  o register_chrdev: /dev/kfd
  o kfd_ops
    ▪ Defines the open and ioctl member functions
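A PASID bitmap of the kind initialized here can be sketched as follows. This is an illustrative user-space model, not the driver's code: `PASID_LIMIT`, `pasid_init`, `pasid_alloc`, and `pasid_free` are hypothetical names, and the 64-entry limit is an assumption chosen so the bitmap fits in one word.

```c
#include <stdint.h>

#define PASID_LIMIT 64u        /* hypothetical limit so the bitmap fits in one word */

static uint64_t pasid_bitmap;  /* bit n set means PASID n is in use */

/* Mark every PASID as free. */
void pasid_init(void)
{
    pasid_bitmap = 0;
}

/* Returns the lowest free PASID, or -1 when all are in use. */
int pasid_alloc(void)
{
    for (unsigned p = 0; p < PASID_LIMIT; p++) {
        if (!(pasid_bitmap & (UINT64_C(1) << p))) {
            pasid_bitmap |= UINT64_C(1) << p;
            return (int)p;
        }
    }
    return -1;
}

/* Return a PASID to the pool; out-of-range values are ignored. */
void pasid_free(unsigned pasid)
{
    if (pasid < PASID_LIMIT)
        pasid_bitmap &= ~(UINT64_C(1) << pasid);
}
```

Each opened process would take one PASID from this pool and give it back on exit, so the bitmap only has to be set up once at module load.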
kgd2kfd_device_init
● radeon_kfd_doorbell_init(kfd);
● radeon_kfd_interrupt_init(kfd);
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
● device_queue_manager_init(kfd);
  o dqm->initialize
● dqm->start(kfd->dqm);
dqm->initialize For KFD_SCHED_POLICY_NO_HWS*
● Prepare pipe, queue bitmap
kfd_open
● radeon_kfd_create_process(current)
  o Create kfd_process
  o Assign PASID
KFD_IOC_SET_MEMORY_POLICY
● Two policies
  o cache_policy_coherent
  o cache_policy_noncoherent
● Okra
  o default policy = cache_policy_coherent
  o alternate policy = cache_policy_noncoherent
radeon_kfd_bind_process_to_device
● Called when the user application sends an ioctl command
● amd_iommu_bind_pasid()
  o Register this kfd_process with the IOMMU
KFD_IOC_CREATE_QUEUE
● Create a queue with information from userspace
● pqm_create_queue
● Return queue_id and doorbell_address to userspace
  o queue_id is per kfd_process
  o doorbell_address maps to a device MMIO address
pqm_create_queue
● find_available_queue_slot
  o Assign qid (per kfd_process)
● dqm->register_process
  o Register the process with the dqm (device queue manager)
● create_cp_queue
  o Create with the queue_properties obtained from the application
  o Map the doorbell MMIO address to the application
● dqm->create_queue
● dqm->execute_queue
dqm->create_queue For KFD_SCHED_POLICY_NO_HWS
● init_mqd (memory queue descriptor)
  o Store the queue configuration from the application
● Find an unused (pipe, queue) from the dqm (device queue manager)
  o If none is free, return -EBUSY
  o Maximum = 56
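The search for an unused (pipe, queue) slot can be sketched with one bitmap per pipe. This is a simplified model rather than the driver's implementation: `struct slot_map` and the `slot_*` functions are hypothetical names, and the split into 7 pipes of 8 queues is an assumption chosen only because it matches the stated maximum of 56.

```c
#include <errno.h>
#include <stdint.h>

#define PIPES           7u  /* assumption: 7 pipes x 8 queues = 56 slots */
#define QUEUES_PER_PIPE 8u

/* One byte per pipe; a set bit marks a free queue slot on that pipe. */
struct slot_map {
    uint8_t free_queues[PIPES];
};

/* Mark every (pipe, queue) slot as free. */
void slot_map_init(struct slot_map *m)
{
    for (unsigned p = 0; p < PIPES; p++)
        m->free_queues[p] = 0xFF;
}

/* Claim the first free slot. Returns 0 and fills *pipe and *queue,
 * or -EBUSY when all 56 slots are taken. */
int slot_acquire(struct slot_map *m, unsigned *pipe, unsigned *queue)
{
    for (unsigned p = 0; p < PIPES; p++) {
        for (unsigned q = 0; q < QUEUES_PER_PIPE; q++) {
            if (m->free_queues[p] & (1u << q)) {
                m->free_queues[p] &= (uint8_t)~(1u << q);
                *pipe = p;
                *queue = q;
                return 0;
            }
        }
    }
    return -EBUSY;
}

/* Give a slot back, as queue destruction would. */
void slot_release(struct slot_map *m, unsigned pipe, unsigned queue)
{
    if (pipe < PIPES && queue < QUEUES_PER_PIPE)
        m->free_queues[pipe] |= (uint8_t)(1u << queue);
}
```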
dqm->execute_queue For KFD_SCHED_POLICY_NO_HWS
● Write queue configuration to device
● load_mqd
  o ring_base_addr
  o doorbell_offset
  o queue_priority
  o ...
[Figure: labels: HSA GPU's configuration register MMIO address (physical address); free hardware queue_id bitmap; queue select register selecting (pipe, queue); per-queue doorbell and ring_base_address for pasid=0 queue_id=0, pasid=0 queue_id=1, pasid=1 queue_id=0, pasid=1 queue_id=1]
Each process can have up to 1024 queues
kgd2kfd_device_init
● radeon_kfd_doorbell_init(kfd);
● radeon_kfd_interrupt_init(kfd);
● device_iommu_pasid_init(kfd);
● kfd_topology_add_device(kfd);
● amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
● device_queue_manager_init(kfd);
  o dqm->initialize
● dqm->start(kfd->dqm);
dqm->start For KFD_SCHED_POLICY_HWS*
● pm_init (packet manager)
● kernel_queue_init
  o kernel_queue doorbell
  o kernel_queue ring address
  o load_mqd to write the kernel_queue configuration to the device
pqm_create_queue
● find_available_queue_slot
  o Assign qid (per kfd_process)
● dqm->register_process
  o Register the process with the dqm (device queue manager)
● create_cp_queue
  o Create with the queue_properties obtained from the application
  o Map the doorbell MMIO address to the application
● dqm->create_queue
● dqm->execute_queue
dqm->create_queue For KFD_SCHED_POLICY_HWS*
● init_mqd (memory queue descriptor)
  o Store the queue configuration from the application
dqm->execute_queue For KFD_SCHED_POLICY_HWS*
● dqm->destroy_queues
● pm_send_runlist
  o pm_create_runlist_ib
    ▪ Construct pm4 packets of MAP_PROCESS and MAP_QUEUES type
      ● Packets contain the application's ring address
  o pm->kernel_queue->acquire_packet_buffer
    ▪ Get an unused entry of the kernel_queue
  o pm_create_runlist
    ▪ Construct a pm4 packet of RUN_LIST type
  o pm->kernel_queue->submit_packet
    ▪ Kick the kernel queue's doorbell
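The run list layout described above (one MAP_PROCESS per process, followed by one MAP_QUEUES per queue of that process) can be sketched as a small builder. Everything here is hypothetical: `struct pkt` is a tagged record rather than a real PM4 encoding, and the trailing RUN_LIST packet for the oversubscribed case is an assumption based on the extra IT_RUN_LIST packet shown for oversubscription.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags; these are not real PM4 opcodes. */
enum pkt_type { PKT_MAP_PROCESS, PKT_MAP_QUEUES, PKT_RUN_LIST };

struct pkt {
    enum pkt_type type;
    unsigned pasid;     /* process the packet belongs to */
    uint64_t mqd_addr;  /* used by PKT_MAP_QUEUES only */
};

/* Hypothetical per-process input: a PASID and the MQD address of each queue. */
struct proc_desc {
    unsigned pasid;
    const uint64_t *mqd_addrs;
    size_t n_queues;
};

/* Fill ib with the run list. When oversubscribed is nonzero, append one
 * RUN_LIST packet at the end (assumption of this sketch).
 * Returns the number of packets written, or 0 if ib is too small. */
size_t build_runlist_ib(struct pkt *ib, size_t cap,
                        const struct proc_desc *procs, size_t n_procs,
                        int oversubscribed)
{
    size_t n = 0;

    for (size_t i = 0; i < n_procs; i++) {
        if (n == cap)
            return 0;
        ib[n++] = (struct pkt){ PKT_MAP_PROCESS, procs[i].pasid, 0 };
        for (size_t j = 0; j < procs[i].n_queues; j++) {
            if (n == cap)
                return 0;
            ib[n++] = (struct pkt){ PKT_MAP_QUEUES, procs[i].pasid,
                                    procs[i].mqd_addrs[j] };
        }
    }
    if (oversubscribed) {
        if (n == cap)
            return 0;
        ib[n++] = (struct pkt){ PKT_RUN_LIST, 0, 0 };
    }
    return n;
}
```

In the flow above, a separate RUN_LIST packet pointing at this buffer would then be placed in the kernel queue and its doorbell kicked; that step is not modeled here.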
Software Scheduling
● dqm->initialize
  o Prepare (pipe, queue) bitmap
● dqm->start
● kfd_open
  o Create kfd_process
  o Assign PASID
● ioctl(CREATE_QUEUE)
  o Get queue_id
  o Map doorbell to application
● dqm->create_queue
  o init_mqd
  o Find unused (pipe, queue) to assign HW queue_id
● dqm->execute_queue
  o Write queue configuration to device
Hardware Scheduling
● dqm->initialize
● dqm->start
  o pm_init
  o kernel_queue_init
● kfd_open
  o Create kfd_process
  o Assign PASID
● ioctl(CREATE_QUEUE)
  o Get queue_id
  o Map doorbell to application
● dqm->create_queue
  o init_mqd
● dqm->execute_queue
  o Create pm4 packet
  o Kick kernel_queue's doorbell
Application Computation ...
● HW has the ring_base_addr userspace address
  o Application enqueues AQL packets and waits for the signal
● Application has the HW doorbell MMIO address
  o Used to kick the hardware
● Driver does nothing until the application sends ioctl(KFD_IOC_DESTROY_QUEUE) or the application finishes
Hardware Queue Deactivation
1. Application sends ioctl(KFD_IOC_DESTROY_QUEUE)
2. Task exit notifier
Hardware Queue Deactivation (1)
● ioctl(KFD_IOC_DESTROY_QUEUE)
● pqm_destroy_queue
  o dqm->destroy_queue
  o Restore queue, pipe bitmap
  o dqm->execute_queues(dqm);
dqm->destroy_queue For KFD_SCHED_POLICY_NO_HWS
● destroy_mqd
  o acquire_queue(kgd, pipe_id, queue_id);
  o write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
dqm->destroy_queue For KFD_SCHED_POLICY_HWS*
● dqm->destroy_queues
  o pm_send_unmap_queue
    ▪ Send a pm4 packet of UNMAP_QUEUES
  o pm_send_query_status(KFD_FENCE_COMPLETED)
Hardware Queue Deactivation (2)
● Task exit notifier will call iommu_pasid_shutdown_callback
  o Registered in kgd2kfd_device_init -> amd_iommu_set_invalidate_ctx_cb
  o Called from the mmu_notifier's release function (the mmu_notifier is registered in radeon_kfd_bind_process_to_device -> amd_iommu_bind_pasid)
iommu_pasid_shutdown_callback
● pqm_destroy_queue
  o dqm->destroy_queue
  o Restore queue, pipe bitmap
  o dqm->execute_queues(dqm);
Introduction to IOMMU
● The user application writes AQL packets into the ring, whose address is a virtual address
● Device accesses need the VA translated to a PA
[Figure: labels: Doorbell; Ring Address; HSA GPU; Device table; PASID=2; GCR3; "Assign this entry with kfd_process->mm->pgd"; Physical Address]
PRI & PPR
● The operating system is usually required to pin memory pages used for I/O.
● The IOMMU provides a mechanism that lets peripherals use unpinned pages for I/O.
● Only supported in AMD IOMMU_v2
PRI & PPR
● PRI (page request interface)
  o A peripheral requests memory management service from a host OS (e.g. page fault service for the peripheral)
  o Issued by the peripheral
● PPR (peripheral page service request)
  o When the IOMMU receives a valid PRI request, it creates a PPR message in the request log to request changes to the virtual address space
  o Issued by the IOMMU as an interrupt
● Used to request IO page table changes
  o The IOMMU driver can register a PPR notifier
module_init(amd_iommu_v2_init)
● amd_iommu_register_ppr_notifier(&ppr_nb);
  o PPR callback
    ▪ ppr_notifier function
Set IOMMU With PASID
● amd_iommu_bind_pasid
● Called when the kfd_process is created
  o mmu_notifier_register(&pasid_state->mn, pasid_state->mm);
  o amd_iommu_domain_set_gcr3(dev_state->domain, pasid, __pa(pasid_state->mm->pgd));
[Figure: labels: HSA GPU; Device table; PASID=2; GCR3; "Assign this entry with kfd_process->mm->pgd"]
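The effect of setting a GCR3 entry can be modeled as a table indexed by PASID whose entries hold the physical address of a process's page table root. This sketch assumes a flat table for clarity; `struct gcr3_table`, `MAX_PASIDS`, and the `gcr3_*` functions are hypothetical names and do not reproduce the hardware table format.

```c
#include <stdint.h>

#define MAX_PASIDS 16u  /* hypothetical table size */

/* Flat model: entry n holds the page table root (physical address)
 * bound to PASID n, or 0 when nothing is bound. */
struct gcr3_table {
    uint64_t root[MAX_PASIDS];
};

/* Bind a PASID to a page table root. Returns 0, or -1 if out of range. */
int gcr3_set(struct gcr3_table *t, unsigned pasid, uint64_t pgd_phys)
{
    if (pasid >= MAX_PASIDS)
        return -1;
    t->root[pasid] = pgd_phys;
    return 0;
}

/* Remove the binding, as would happen when the process goes away. */
void gcr3_clear(struct gcr3_table *t, unsigned pasid)
{
    if (pasid < MAX_PASIDS)
        t->root[pasid] = 0;
}

/* Root used to translate a device access tagged with this PASID;
 * 0 means the PASID is not bound. */
uint64_t gcr3_lookup(const struct gcr3_table *t, unsigned pasid)
{
    return pasid < MAX_PASIDS ? t->root[pasid] : 0;
}
```

Because the entry is the process's own page table root, the device sees the same virtual addresses as the application, which is what lets the ring address be passed as a plain user pointer.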
PRI & PPR Flow
1. The peripheral issues a PRI to the IOMMU
2. The IOMMU writes a PPR request to the PPR log (the log contains fault address, pasid, device_id, tag, flags)
3. The IOMMU sends an interrupt to the CPU
PPR Flow
When the irq comes:
● readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
● if (status & MMIO_STATUS_PPR_INT_MASK)
  o ppr_notifier (registered in amd_iommu_v2_init)
    ▪ do_fault
do_fault
● Calls the get_user_pages() API to pin the faulting pages into memory
  o Arguments: mm_struct, fault_addr
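The interrupt path from status check to per-entry fault handling can be sketched as follows. This is a user-space model under stated assumptions: `STATUS_PPR_INT`, `struct ppr_entry`, `handle_ppr_irq`, and `demo_do_fault` are hypothetical, the bit position is illustrative, and the demo handler only pretends to resolve faults instead of calling get_user_pages().

```c
#include <stddef.h>
#include <stdint.h>

#define STATUS_PPR_INT (1u << 6)  /* illustrative bit position for this sketch */

/* Fields listed for a PPR log entry. */
struct ppr_entry {
    uint64_t fault_addr;
    uint32_t pasid;
    uint16_t device_id;
    uint16_t tag;
    uint16_t flags;
};

/* A handler returns 0 when the fault was resolved. */
typedef int (*fault_handler)(uint32_t pasid, uint64_t fault_addr);

/* Hypothetical handler: pretends that only PASID 2 has a bound address space. */
int demo_do_fault(uint32_t pasid, uint64_t fault_addr)
{
    (void)fault_addr;
    return pasid == 2 ? 0 : -1;
}

/* If the status word reports a PPR interrupt, pass every log entry to
 * do_fault. Returns the number of faults resolved. */
size_t handle_ppr_irq(uint32_t status, const struct ppr_entry *log, size_t n,
                      fault_handler do_fault)
{
    size_t resolved = 0;

    if (!(status & STATUS_PPR_INT))
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (do_fault(log[i].pasid, log[i].fault_addr) == 0)
            resolved++;
    }
    return resolved;
}
```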
Flow Review
[Figure: software stack. Labels: Application; Runtime Library; HSA-aware Kernel (KFD, IOMMU Driver); HSA Device; IOMMU]
● open("/dev/kfd")
● ioctl(KFD_IOC_SET_MEMORY_POLICY)
● ioctl(KFD_IOC_CREATE_QUEUE)
● ioctl(KFD_IOC_DESTROY_QUEUE)
Q&A
Thanks!
Reference
● https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD
● http://www.hsafoundation.com/standards/