Network Driver in Linux 2.4
潘仁義, CCU COMM
Overview
[Diagram: three layers (Bus, Device, Operating System) and the topics a driver writer must handle]
Topics: auto configuration; direct memory access; power management; I/O access; byte ordering; address translation; interrupt handling; bus cycles; driver framework; timer management; memory management; race condition handling (SMP); CPU/memory cache consistency; device operations
Outline
Driver framework: Linux network drivers
Device operation: RTL8139 programming
Driver example:
  A piece of code for the 93C46 series EEPROM (93C46: 64 x 16 bits; 93C66: 256 x 16 bits)
  pci_skeleton.c (for RTL8139)
Linux network driver framework: Connecting to the Kernel (1/2)
Module loading
  struct net_device snull_dev = { init : snull_init, }; // initialization function
  if ((result = register_netdev(&snull_dev))) printk("error");
  Before the call, set name to "eth%d" so that the kernel assigns an "ethX" name
  register_netdev() internally calls dev->init(), here snull_init()
Probe function
  Called during register_netdev()
  Usually avoids registering I/O and IRQ; that is delayed until dev->open() time
  Fills in the "dev" structure
  ether_setup(dev)
  Sets up the private data structure "priv"; the network interface lives as long as the system, so statistics can be kept there
Module unloading
  kfree(priv); unregister_netdev(&snull_dev);
Linux network driver framework: Connecting to the Kernel (2/2)
struct net_device {
  char name[IFNAMSIZ]; // eth%d
  unsigned long base_addr; unsigned char irq;
  unsigned char broadcast[], dev_addr[MAX_ADDR_LEN];
  unsigned short flags; // IFF_UP, IFF_PROMISC, IFF_ALLMULTI
  Function pointers:
    (*init) initialization
    (*open) bring the interface up
    (*stop) shut the interface down
    (*do_ioctl)()
    (*tx_timeout) transmit timeout handling
    (*get_stats) report statistics
    (*hard_start_xmit) send a packet
    (*set_multicast_list) handle multicast list and flag changes
  unsigned long trans_start, last_rx; // for watchdog and power management
  struct dev_mc_list *mc_list; // multicast address list
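Linux network driver framework: Opening and closing
Before an interface can carry packets, it must be brought up with ifconfig and given an IP address.
When ifconfig assigns an IP address to the interface:
  ioctl(SIOCSIFADDR) sets the software address of the interface
  ioctl(SIOCSIFFLAGS) asks the driver to turn the interface on or off, which triggers open and stop
open()
  Acquire the required system resources (claim the IRQ, I/O base, buffers)
  Ask the interface hardware to start up
  Read the MAC address and copy it to dev->dev_addr (this can also be done in init or probe)
  Write dev->dev_addr into the interface's MAC registers
stop()
  Shut down the interface hardware
  Release the system resources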
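Linux network driver framework: Packet transmission
When the kernel needs to send a data packet:
  It places the data on the outgoing packet queue
  It calls the driver method hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
    The method only hands the packet to the NIC; the NIC then transmits it onto the network (for example the RTL8139)
    spinlock_t xmit_lock; the method can be called again only after it has returned
    In practice, the NIC is still busy transmitting the packet it was just given when the method returns. The NIC's buffer is small; when it is full, the driver must tell the kernel so that no new transmit requests arrive.
    netif_stop_queue(), netif_wake_queue(), and netif_start_queue()
    Note: there are also netif_carrier_on/off() for carrier loss detection and the watchdog, and netif_device_attach/detach() for hot-plugging and power management
Every packet the kernel handles is wrapped in a struct sk_buff (socket buffer)
  A pointer to an sk_buff is usually named skb
  skb->data points to the packet about to be sent
  skb->len is the length of that packet, in octets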
Linux network driver framework: Transmission queuing model
[Diagram: packets from the OS pass three checks (Present? Queue stopped? Carrier ok?) before they go to the LAN]
Present?: netif_device_attach() / netif_device_detach()
Queue stopped?: netif_start_queue() / netif_wake_queue() / netif_stop_queue()
Carrier ok?: netif_carrier_on() / netif_carrier_off()
Watchdog:
  if (present && carrier_ok && queue_stopped && (jiffies - trans_start) > watchdog_timeo)
    call tx_timeout(), which updates the statistics and gets the interface able to send packets again
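The watchdog condition can be sketched in user-space C. This is only an illustration, not kernel code: `struct tx_state` and `tx_watchdog_expired()` are hypothetical names, and the `now` argument stands in for `jiffies`. Unsigned subtraction keeps the comparison correct when the tick counter wraps around.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state the watchdog looks at. */
struct tx_state {
    bool present;                 /* device attached */
    bool carrier_ok;              /* link carrier detected */
    bool queue_stopped;           /* driver stopped the transmit queue */
    unsigned long trans_start;    /* tick of the last transmit start */
    unsigned long watchdog_timeo; /* allowed ticks before a timeout */
};

/* Returns true when tx_timeout() should be called. */
bool tx_watchdog_expired(const struct tx_state *s, unsigned long now)
{
    return s->present && s->carrier_ok && s->queue_stopped &&
           (now - s->trans_start) > s->watchdog_timeo;
}
```

A queue that is not stopped never times out, because a running queue means the driver is still accepting packets.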
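Linux network driver framework: Packet reception
Packet reception usually starts with an interrupt raised by the network hardware
  Most of the work is written in the interrupt handler
  The handler allocates an sk_buff and hands it to the kernel's networking subsystem
  Interrupt-driven reception is more efficient than polling
Example: snull_rx()
  skb = dev_alloc_skb(len + 2); // allocates with GFP_ATOMIC, so it can be used in an ISR
  skb_reserve(skb, 2); // align the IP header on a 16-byte boundary
  memcpy(skb_put(skb, len), receive_packet, len); // for skb_put(), see the sk_buff slide
  Fill in the related fields:
    skb->dev = dev;
    skb->protocol = eth_type_trans(skb, dev);
    skb->ip_summed = CHECKSUM_UNNECESSARY; /* no need to check */
    Choices: CHECKSUM_HW (computed by hardware) / CHECKSUM_NONE (still to be computed, the default) / CHECKSUM_UNNECESSARY (do not compute)
  netif_rx(skb); // hand the packet to the kernel's networking subsystem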
Linux network driver framework: The interrupt handler
An interrupt happens when
  A new packet has arrived
  Transmission of an outgoing packet has completed
  Something else happened: PCI bus error, cable length change, timeout
Interrupt status register (ISR)
  Packet reception: pass the packet to the kernel
  Packet transmission completed: reset the transmit buffer of the interface
  Statistics
Linux network driver framework: The socket buffers (struct sk_buff)
[Diagram: an sk_buff laid out as headroom | payload | tailroom; the pointers head, data, tail, and end mark the boundaries, and len is the payload length. A second panel shows an empty sk_buff.]
struct sk_buff *dev_alloc_skb(len): allocate
void dev_kfree_skb(struct sk_buff *): free
void skb_reserve(skb, len): reserve headroom
unsigned char *skb_put(skb, len): append data at the tail
unsigned char *skb_push(skb, len): prepend data at the front
unsigned char *skb_pull(skb, len): remove data from the front
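The pointer arithmetic behind these four operations can be modelled in user space. The sketch below is a toy, not the kernel's struct sk_buff: `struct toy_skb`, the `toy_skb_*` names, and the fixed 64-byte buffer are all assumptions made for illustration, and the functions return an error instead of panicking as the kernel does on overrun.

```c
#include <stddef.h>

#define TOY_SKB_SIZE 64 /* arbitrary size for the illustration */

struct toy_skb {
    unsigned char buf[TOY_SKB_SIZE];
    unsigned char *head, *data, *tail, *end;
    size_t len; /* bytes between data and tail */
};

/* Start empty: head == data == tail, so all space is tailroom. */
void toy_skb_init(struct toy_skb *s)
{
    s->head = s->data = s->tail = s->buf;
    s->end = s->buf + TOY_SKB_SIZE;
    s->len = 0;
}

/* Reserve headroom; only meaningful while the buffer is empty. */
int toy_skb_reserve(struct toy_skb *s, size_t n)
{
    if (s->len != 0 || n > (size_t)(s->end - s->tail))
        return -1;
    s->data += n;
    s->tail += n;
    return 0;
}

/* Append n bytes; returns where the caller should write them. */
unsigned char *toy_skb_put(struct toy_skb *s, size_t n)
{
    unsigned char *old_tail = s->tail;
    if (n > (size_t)(s->end - s->tail))
        return NULL;
    s->tail += n;
    s->len += n;
    return old_tail;
}

/* Prepend n bytes by growing into the headroom. */
unsigned char *toy_skb_push(struct toy_skb *s, size_t n)
{
    if (n > (size_t)(s->data - s->head))
        return NULL;
    s->data -= n;
    s->len += n;
    return s->data;
}

/* Drop n bytes from the front, e.g. a header already parsed. */
unsigned char *toy_skb_pull(struct toy_skb *s, size_t n)
{
    if (n > s->len)
        return NULL;
    s->data += n;
    s->len -= n;
    return s->data;
}
```

Reserving 16 bytes, putting a 20-byte payload, and pushing a 14-byte header leaves 2 bytes of headroom and a 34-byte packet; pulling the 14 bytes again returns to the 20-byte payload.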
Linux network driver framework: Setup receive mode and multicast accept list
Address types: unicast, broadcast (all 1s), multicast (bit 0 of the first octet == 1)
Receive modes: receive all, receive all multicast, receive a list of multicast addresses
Transmit: the same as unicast
Receive: hardware filtering for a list of multicast addresses
void (*set_multicast_list)(dev)
  Called by the kernel when the list of multicast addresses to receive changes, or when dev->flags changes
struct dev_mc_list *mc_list; // int mc_count
  Linked list of all multicast addresses that dev must receive
IFF_PROMISC
  When set, the interface enters promiscuous mode (receives everything)
IFF_ALLMULTI
  Receive all multicast packets
RTL8139 block diagram
Device operation: RTL8139(A/B) programming
Packet transmission
  4 transmit descriptors used in round-robin order
  Transmit FIFO and Early Transmit
Packet reception
  Ring buffer in physically contiguous memory
  Receive FIFO and FIFO threshold
Hardware initialization
  Command register (0x37)
    Reset (bit 4) / Receive Enable (bit 3) / Transmit Enable (bit 2) / Buffer empty (bit 0)
  Transmit (Tx) Configuration Register (0x40~0x43)
    Interframe gap time (bits 25~24) (a feature of the "crab card", the local nickname for Realtek NICs)
  Receive (Rx) Configuration Register (0x44~0x47)
    Rx FIFO threshold (bits 15~13)
    Rx buffer length (bits 12~11)
    Accept Broadcast (bit 3) / Multicast (bit 2) / All (bit 0, promiscuous mode) packets
  Interrupt Mask Register (0x3C~0x3D)
Software initialization (Tx descriptors and ring buffer)
RTL8139 packet transmission: Transmit descriptor
Transmit start address (TSAD0-3)
  The physical address of the packet
  The packet must be in physically contiguous memory
Transmit status (TSD0-3)
  TOK (bit 15, R)
    Set to 1 when packet transmission completed successfully and no transmit underrun (bit 14, R) occurred
  OWN (bit 13, R/W)
    Set to 1 when the Tx DMA operation of this descriptor has completed
    The driver must set this bit to 0 when the "Size" field is written
  Size (bits 12~0, R/W)
    The total size in bytes of the data in this descriptor
  Early Tx Threshold (bits 21~16, R/W)
    Transmission starts when the byte count in the Tx FIFO reaches this value. From 000001 to 111111 in units of 32 bytes (000000 = 8 bytes)
RTL8139 packet transmission: Process of transmitting a packet
1. Copy the packet to a physically contiguous buffer in memory
2. Write the descriptor in use: address, size, early transmit threshold, and clear the OWN bit (this starts the PCI operation)
3. When the Tx FIFO reaches the threshold, the chip starts moving data from the FIFO to the line
4. When the whole packet has been moved to the FIFO, the OWN bit is set to 1
5. When the whole packet has been moved to the line, TOK (TSD) is set to 1
6. If TOK (IMR) is set, then TOK (ISR) is set and an interrupt is triggered
7. The interrupt service routine is called; the driver should clear TOK (ISR)
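Step 2 amounts to composing one 32-bit value for the transmit status register. The following sketch packs the fields at the bit positions listed on the transmit descriptor slide; the macro and function names (`TSD_*`, `tsd_threshold_field()`, `tsd_make()`, `tsd_tx_ok()`) are hypothetical, and the special threshold encoding 000000 (8 bytes) is deliberately left out by clamping to at least one 32-byte unit.

```c
#include <stdint.h>

#define TSD_SIZE_MASK    0x1FFFu    /* bits 12~0: size in bytes */
#define TSD_OWN          (1u << 13) /* set by the chip when Tx DMA is done */
#define TSD_TUN          (1u << 14) /* transmit underrun */
#define TSD_TOK          (1u << 15) /* transmit OK */
#define TSD_ERTXTH_SHIFT 16         /* bits 21~16: early Tx threshold */

/* Convert a threshold in bytes to 32-byte units, clamped to 1..63. */
uint32_t tsd_threshold_field(unsigned bytes)
{
    unsigned units = (bytes + 31u) / 32u; /* round up */
    if (units < 1u)
        units = 1u;
    if (units > 63u)
        units = 63u;
    return units;
}

/* Value to write to TSDn: OWN is left at 0, which starts the transfer. */
uint32_t tsd_make(unsigned size, unsigned threshold_bytes)
{
    return (tsd_threshold_field(threshold_bytes) << TSD_ERTXTH_SHIFT) |
           ((uint32_t)size & TSD_SIZE_MASK);
}

/* Step 5: has the whole packet reached the line successfully? */
int tsd_tx_ok(uint32_t tsd)
{
    return (tsd & TSD_TOK) != 0;
}
```

With a 256-byte threshold the field value is 8, so a 60-byte packet yields 0x0008003C.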
Packet reception: Ring buffer
1. Data coming from the line goes to the Rx FIFO
2. Data is moved to the buffer when the early receive threshold is met
Ring buffer: physically contiguous
CBR (0x3A~0x3B, R): the current address of data moved to the buffer
CAPR (0x38~0x39, R/W): the pointer that keeps the current address of the packets that have been read
The status of a received packet is stored in front of the packet (packet header)
Packet reception: The packet header (32 bits, i.e. 4 bytes)
Bits 31~16: rx_size, including the 4-byte CRC at the tail
pkt_size = rx_size - 4
Packet reception: Process of packet receive in detail
Data received from the line is stored in the receive FIFO
When the early receive threshold is met, data is moved from the FIFO to the receive buffer
After the whole packet has been moved from the FIFO to the receive buffer, the receive packet header (receive status and packet length) is written in front of the packet
CBR is updated to the end of the packet, with 4-byte alignment
CMD (BufferEmpty) is cleared and ISR (ROK) is set
The interrupt service routine is called; the driver then clears ISR (ROK) and updates CAPR:
  cur_rx = (cur_rx + rx_size + 4 + 3) & ~3; // + 4 skips the packet header, + 3 and & ~3 round up to 4 bytes
  NETDRV_W16_F (RxBufPtr, cur_rx - 16); // - 16 avoids overflow
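The header decoding and the pointer update can be tried out on a plain byte array. The sketch below is a user-space illustration under stated assumptions: `RX_BUF_LEN` is set to 8192 only as an example, `struct rx_info`, `rx_parse_header()`, and `rx_advance()` are hypothetical names, and the header is read as a little-endian 32-bit word with the status in the low half.

```c
#include <stdint.h>

#define RX_BUF_LEN 8192u /* assumed ring size for the example */

struct rx_info {
    uint16_t status;   /* receive status, low 16 bits of the header */
    uint16_t rx_size;  /* bits 31~16: length including the 4-byte CRC */
    uint16_t pkt_size; /* rx_size - 4: length without the CRC */
};

/* Decode the 4-byte header stored in front of the packet at cur_rx. */
struct rx_info rx_parse_header(const uint8_t *ring, uint32_t cur_rx)
{
    uint32_t off = cur_rx % RX_BUF_LEN;
    uint32_t hdr = (uint32_t)ring[off] |
                   ((uint32_t)ring[off + 1] << 8) |
                   ((uint32_t)ring[off + 2] << 16) |
                   ((uint32_t)ring[off + 3] << 24);
    struct rx_info info;

    info.status = (uint16_t)(hdr & 0xFFFFu);
    info.rx_size = (uint16_t)(hdr >> 16);
    info.pkt_size = (uint16_t)(info.rx_size - 4u);
    return info;
}

/* Skip header (+4) and packet, then round up to a 4-byte boundary. */
uint32_t rx_advance(uint32_t cur_rx, uint16_t rx_size)
{
    return (cur_rx + rx_size + 4u + 3u) & ~3u;
}
```

Because cur_rx always stays 4-byte aligned and the ring size is a multiple of 4, the header itself never straddles the end of the ring in this model.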
EEPROM 93C46 operations
93C46 Command Register (0x50, R/W)

A piece of code for EEPROM 93C46

#define EE_SHIFT_CLK 0x04 /* EEPROM shift clock. */
#define EE_CS 0x08 /* EEPROM chip select. */
#define EE_DATA_WRITE 0x02 /* EEPROM chip data in. */
#define EE_DATA_READ 0x01 /* EEPROM chip data out. */
#define EE_ENB (0x80 | EE_CS)
#define eeprom_delay() readl(ee_addr)

/* EEPROM commands include the always-set leading bit. */
#define EE_WRITE_CMD (5)
#define EE_READ_CMD (6)
#define EE_ERASE_CMD (7)

static int __devinit read_eeprom (void *ioaddr, int location, int addr_len)
{
    int i;
    unsigned retval = 0;
    void *ee_addr = ioaddr + Cfg9346;
    int read_cmd = location | (EE_READ_CMD << addr_len);

    writeb (EE_ENB & ~EE_CS, ee_addr);
    writeb (EE_ENB, ee_addr);
    eeprom_delay ();

    /* Shift the read command bits out. */
    for (i = 4 + addr_len; i >= 0; i--) {
        int dataval = (read_cmd & (1 << i)) ? EE_DATA_WRITE : 0;
        writeb (EE_ENB | dataval, ee_addr);
        eeprom_delay ();
        writeb (EE_ENB | dataval | EE_SHIFT_CLK, ee_addr);
        eeprom_delay ();
    }
    writeb (EE_ENB, ee_addr);
    eeprom_delay ();

    /* Clock the 16 data bits in, most significant bit first. */
    for (i = 16; i > 0; i--) {
        writeb (EE_ENB | EE_SHIFT_CLK, ee_addr);
        eeprom_delay ();
        retval = (retval << 1) | ((readb (ee_addr) & EE_DATA_READ) ? 1 : 0);
        writeb (EE_ENB, ee_addr);
        eeprom_delay ();
    }

    /* Terminate the EEPROM access. */
    writeb (~EE_CS, ee_addr);
    eeprom_delay ();
    return retval;
}

Reading the MAC address from the EEPROM:
addr_len = read_eeprom (ioaddr, 0, 8) == 0x8129 ? 8 : 6;
for (i = 0; i < 3; i++)
    ((u16 *) (dev->dev_addr))[i] = le16_to_cpu (read_eeprom (ioaddr, i + 7, addr_len));
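To see which bits the command loop actually clocks out, the same shift logic can be run without any hardware. This is a sketch with a hypothetical helper, `ee_read_cmd_bits()`, which records the data-in bit of each clock cycle instead of writing it to the command register; the caller is assumed to supply an array of at least 5 + addr_len entries.

```c
#include <stddef.h>

#define EE_READ_CMD 6 /* start bit 1 followed by the read opcode 10 */

/*
 * Store the bits shifted out for a read of 'location', most significant
 * bit first, and return how many bits were produced (5 + addr_len).
 */
size_t ee_read_cmd_bits(int location, int addr_len, unsigned char *bits)
{
    int read_cmd = location | (EE_READ_CMD << addr_len);
    size_t n = 0;
    int i;

    for (i = 4 + addr_len; i >= 0; i--)
        bits[n++] = (unsigned char)((read_cmd >> i) & 1);
    return n;
}
```

For word 7 of a 93C46 (addr_len of 6) the sequence is 0 0 1 1 0 0 0 0 1 1 1: two leading zeros, the start bit, opcode 10, then the 6-bit address 000111.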
#include<> of the RTL8139
[Diagram: the headers included by pci-skeleton.c, grouped by layer: PCI bus, network device, operating system]
module.h: MOD_*, MODULE_*(); pulls in spinlock.h and config.h
kernel.h: barrier(), printk(); pulls in byteorder.h
pci.h: PCI defines and prototypes: pci_alloc_consistent(), pci_resource_*(), pci_request_regions(), pci_set_master(), pci_read_config_word() (err)
init.h: module_init(), module_exit()
netdevice.h: definitions for struct net_device, register_netdev(), netif_*(); pulls in skbuff.h
etherdevice.h: definitions for Ethernet: eth_type_trans(), alloc_etherdev()
delay.h: udelay() definition
mii.h: definitions for MII_ADVERTISE, MII_LPA, ADVERTISE_FULL, LPA_100FULL…
asm/io.h: definitions of I/O port read/write and ioremap()
crc32.h: ether_crc(), used to compute the multicast hash
Included indirectly: sched.h (irq, jiffies, capable), slab.h, time.h, spinlock.h, asm/atomic.h
Driver structure of the RTL8139
pci_module_init() / pci_unregister_driver()
static struct pci_driver netdrv_pci_driver = {
    name: "netdrv",
    id_table: netdrv_pci_tbl,
    probe: netdrv_init_one,
    remove: netdrv_remove_one,
#ifdef CONFIG_PM
    suspend: netdrv_suspend,
    resume: netdrv_resume,
#endif
};
static struct pci_device_id netdrv_pci_tbl[] __devinitdata = {
    {0x10ec, 0x8139, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
    {0,}
};
MODULE_DEVICE_TABLE (pci, netdrv_pci_tbl);
pci_device_id: the last field is driver_data (Private, Sq# here)
PCI device probe function: netdrv_init_one()
netdrv_init_one(struct pci_dev *pdev, struct pci_device_id *ent): invoked by Linux when probing
  Calls netdrv_init_board() to get the net_device dev and void *ioaddr
  Initializes the net_device dev
    Sets up dev_addr[], irq, base_addr
    Sets up the methods: dev->open, dev->hard_start_xmit, dev->stop, dev->get_stats, dev->set_multicast_list, dev->do_ioctl, dev->tx_timeout
  register_netdev (dev); // ethX
netdrv_init_board()
  dev = alloc_etherdev(sizeof())
  Register the I/O ports and memory:
    pci_enable_device (pdev);
    pci_request_regions (pdev, "pci-sk");
    pci_set_master (pdev);
  mmio_start = pci_resource_start (pdev, 1);
  ioaddr = ioremap (mmio_start, len);
  Soft reset the chip:
    NETDRV_W8 (ChipCmd, (NETDRV_R8 (ChipCmd) & ChipCmdClear) | CmdReset);
  Identify the chip attached to the board
NETDRV_W?()
/* write MMIO register, with flush */
/* Flush avoids rtl8139 bug w/ posted MMIO writes */
#define NETDRV_W8_F(reg, val8) do { writeb ((val8), ioaddr + (reg)); readb (ioaddr + (reg)); } while (0)
#define NETDRV_W16_F(reg, val16) do { writew ((val16), ioaddr + (reg)); readw (ioaddr + (reg)); } while (0)
#define NETDRV_W32_F(reg, val32) do { writel ((val32), ioaddr + (reg)); readl (ioaddr + (reg)); } while (0)
#define NETDRV_W8 NETDRV_W8_F
#define NETDRV_W16 NETDRV_W16_F
#define NETDRV_W32 NETDRV_W32_F
#define NETDRV_R8(reg) readb (ioaddr + (reg))
#define NETDRV_R16(reg) readw (ioaddr + (reg))
#define NETDRV_R32(reg) ((unsigned long) readl (ioaddr + (reg)))
Device methods
dev->open: int netdrv_open (struct net_device *dev);
dev->hard_start_xmit: int netdrv_start_xmit (struct sk_buff *skb, struct net_device *dev);
dev->stop: int netdrv_close (…);
dev->get_stats: struct net_device_stats * netdrv_get_stats (struct net_device *);
dev->set_multicast_list: void netdrv_set_rx_mode (…);
dev->do_ioctl: int netdrv_ioctl (struct net_device *dev, struct ifreq *rq, int cmd);
dev->tx_timeout: void netdrv_tx_timeout (struct net_device *dev);
Up up…… netdrv_open()
netdrv_open()
  request_irq (dev->irq, netdrv_interrupt, SA_SHIRQ, dev->name, dev)
  tx_bufs = pci_alloc_consistent(pdev, TXBUFLEN, &tx_bufs_dma);
  rx_ring = pci_alloc_consistent(pdev, RXBUFLEN, &rx_ring_dma);
  netdrv_init_ring (dev);
  netdrv_hw_start (dev);
  Set the timer to check for link beat
netdrv_hw_start (dev)
  Soft reset the chip
  /* Restore our idea of the MAC address. */
  NETDRV_W32_F (MAC0 + 0, cpu_to_le32 (*(u32 *) (dev->dev_addr + 0)));
  NETDRV_W32_F (MAC0 + 4, cpu_to_le32 (*(u32 *) (dev->dev_addr + 4)));
  NETDRV_W8_F (ChipCmd, (NETDRV_R8 (ChipCmd) & ChipCmdClear) | CmdRxEnb | CmdTxEnb);
  Set RxConfig and TxConfig
  NETDRV_W32_F (RxBuf, tp->rx_ring_dma);
  Initialize the Tx buffer DMA addresses
  netdrv_set_rx_mode (dev);
  NETDRV_W16_F (IntrMask, netdrv_intr_mask);
  netif_start_queue (dev);
Setup receive mode and multicast hash table: (*set_multicast_list)() is netdrv_set_rx_mode()
if (flags & IFF_PROMISC)
  AcceptBroadcast | AcceptMulticast | AcceptMyPhys | AcceptAllPhys
  mc_filter[1] = mc_filter[0] = 0xffffffff
else if ((mc_count > multicast_filter_limit) || (flags & IFF_ALLMULTI))
  AcceptBroadcast | AcceptMulticast | AcceptMyPhys
  mc_filter[1] = mc_filter[0] = 0xffffffff
else
  AcceptBroadcast | AcceptMulticast | AcceptMyPhys
  [Diagram: each mclist[i].dmi_addr goes through ether_crc(); bits 31~26 of the 32-bit result select one of the 64 bits (63..0) of the multicast hash filter]
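The hash step in the last branch can be sketched in portable C. The code below is an approximation under stated assumptions: `ether_crc_sketch()` is a hypothetical stand-in for the kernel's ether_crc(), written as a bitwise CRC-32 with polynomial 0x04c11db7 that consumes each octet least significant bit first, and `mc_filter_add()` is a hypothetical helper that sets the filter bit chosen by the top 6 CRC bits.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6
#define ETHERNET_POLYNOMIAL 0x04c11db7u

/* Bitwise Ethernet CRC-32, seeded with all ones (sketch of ether_crc()). */
uint32_t ether_crc_sketch(size_t length, const uint8_t *data)
{
    uint32_t crc = 0xffffffffu;

    while (length--) {
        uint8_t octet = *data++;
        int bit;

        for (bit = 0; bit < 8; bit++, octet >>= 1) {
            uint32_t msb = crc >> 31;

            crc <<= 1;
            if (msb ^ (octet & 1u))
                crc ^= ETHERNET_POLYNOMIAL;
        }
    }
    return crc;
}

/* Set the one bit of the 64-bit filter that this address hashes to. */
void mc_filter_add(uint32_t mc_filter[2], const uint8_t addr[ETH_ALEN])
{
    unsigned bit_nr = ether_crc_sketch(ETH_ALEN, addr) >> 26; /* 0..63 */

    mc_filter[bit_nr >> 5] |= 1u << (bit_nr & 31u);
}
```

Several addresses can hash to the same bit, so the hardware filter only narrows the traffic; software still sees some unwanted multicast frames.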
Transmit a packet: netdrv_start_xmit()
if (skb->len < ETH_ZLEN) skb = skb_padto(skb, ETH_ZLEN);
entry = atomic_read (&cur_tx) % NUM_TX_DESC;
tx_info[entry].skb = skb;
memcpy (tx_buf[entry], skb->data, skb->len);
NETDRV_W32 (TxStatus[entry], tx_flag | skb->len);
dev->trans_start = jiffies;
atomic_inc (&cur_tx);
if ((atomic_read (&cur_tx) - atomic_read (&dirty_tx)) >= NUM_TX_DESC)
    netif_stop_queue (dev);
[Diagram: the Tx descriptors are used in the order 0 1 2 3 0 1 2 ...; dirty_tx marks the oldest descriptor not yet reclaimed and cur_tx the next one to fill]
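The cur_tx / dirty_tx bookkeeping can be simulated without hardware. In this sketch `struct tx_ring`, `tx_ring_submit()`, and `tx_ring_complete()` are hypothetical names; the counters grow without bound and only their difference and their value modulo NUM_TX_DESC matter, as in the driver code.

```c
#include <stdbool.h>

#define NUM_TX_DESC 4

struct tx_ring {
    unsigned cur_tx;    /* next descriptor to fill (free-running) */
    unsigned dirty_tx;  /* oldest descriptor not yet reclaimed */
    bool queue_stopped; /* mirrors netif_stop_queue()/netif_wake_queue() */
};

/* Claim a descriptor for a packet; returns its index, or -1 if all are busy. */
int tx_ring_submit(struct tx_ring *r)
{
    int entry;

    if (r->cur_tx - r->dirty_tx >= NUM_TX_DESC)
        return -1;
    entry = (int)(r->cur_tx % NUM_TX_DESC);
    r->cur_tx++;
    if (r->cur_tx - r->dirty_tx >= NUM_TX_DESC)
        r->queue_stopped = true; /* ring is now full */
    return entry;
}

/* Reclaim the oldest descriptor after a Tx interrupt; -1 if none pending. */
int tx_ring_complete(struct tx_ring *r)
{
    int entry;

    if (r->cur_tx == r->dirty_tx)
        return -1;
    entry = (int)(r->dirty_tx % NUM_TX_DESC);
    r->dirty_tx++;
    r->queue_stopped = false; /* at least one descriptor is free again */
    return entry;
}
```

Four submissions in a row fill the ring and stop the queue; one completion frees descriptor 0 and wakes it.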
Interrupt handling: netdrv_interrupt()
spin_lock (&tp->lock);
status = NETDRV_R16 (IntrStatus);
NETDRV_W16_F (IntrStatus, status); // Acknowledge
  Spec says, "The ISR bits are always set to 1 if the condition is present."
  Spec says, "Reading the ISR clears all. Writing to the ISR has no effect."
if (status & (PCIErr | PCSTimeout | RxUnderrun | RxOverflow | RxFIFOOver | TxErr | RxErr))
    netdrv_weird_interrupt (dev, tp, ioaddr, status, link_changed);
if (status & (RxOK | RxUnderrun | RxOverflow | RxFIFOOver))
    netdrv_rx_interrupt (dev, tp, ioaddr);
if (status & (TxOK | TxErr))
    netdrv_tx_interrupt (dev, tp, ioaddr);
spin_unlock (&tp->lock);
[Diagram: the ISR and IMR bit rows (0 1 1 1 0 0 1 1 and 0 0 1 1 0 0 1 0) combine to raise the interrupt]
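The dispatch logic can be expressed as a pure function that is easy to test. This is a sketch: the status bit values are the ones commonly used by Linux RTL8139 drivers and are an assumption here, as the slides do not list them, and `irq_pending()`, `irq_dispatch()`, and the `CALL_*` flags are hypothetical names.

```c
#include <stdint.h>

/* Interrupt status bits (assumed values, following Linux RTL8139 drivers). */
enum {
    RxOK = 0x0001, RxErr = 0x0002, TxOK = 0x0004, TxErr = 0x0008,
    RxOverflow = 0x0010, RxUnderrun = 0x0020, RxFIFOOver = 0x0040,
    PCSTimeout = 0x4000, PCIErr = 0x8000
};

/* Which handlers netdrv_interrupt() would call for a given status. */
enum { CALL_WEIRD = 1, CALL_RX = 2, CALL_TX = 4 };

/* An event raises the interrupt line only if its mask bit is set. */
uint16_t irq_pending(uint16_t isr, uint16_t imr)
{
    return isr & imr;
}

int irq_dispatch(uint16_t status)
{
    int calls = 0;

    if (status & (PCIErr | PCSTimeout | RxUnderrun | RxOverflow |
                  RxFIFOOver | TxErr | RxErr))
        calls |= CALL_WEIRD;
    if (status & (RxOK | RxUnderrun | RxOverflow | RxFIFOOver))
        calls |= CALL_RX;
    if (status & (TxOK | TxErr))
        calls |= CALL_TX;
    return calls;
}
```

Note that some bits reach two handlers: a receive overflow is both logged as an unusual event and passed to the receive path so the ring can be drained.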
Interrupt handling: netdrv_tx_interrupt(dev, tp, ioaddr)
dirty_tx = atomic_read (&tp->dirty_tx);
cur_tx = atomic_read (&tp->cur_tx);
tx_left = cur_tx - dirty_tx;
while (tx_left > 0) {
    int entry = dirty_tx % NUM_TX_DESC;
    int txstatus = NETDRV_R32 (TxStatus[entry]);
    if (!(txstatus & (TxStatOK | TxUnderrun | TxAborted)))
        break; /* It still hasn't been Txed */
    if (txstatus & (TxOutOfWindow | TxAborted)) {
        /* There was a major error, log it. */
        tp->stats.tx_errors++;
    } else {
        if (txstatus & TxUnderrun)
            tp->tx_flag += 0x00020000; /* Add 64 to the Tx FIFO threshold. */
        tp->stats.tx_bytes += txstatus & 0x7ff;
        tp->stats.tx_packets++;
    }
    dev_kfree_skb_irq (tp->tx_info[entry].skb);
    tp->tx_info[entry].skb = NULL;
    dirty_tx++;
    if (netif_queue_stopped (dev))
        netif_wake_queue (dev);
    cur_tx = atomic_read (&tp->cur_tx);
    tx_left = cur_tx - dirty_tx;
}
atomic_set (&tp->dirty_tx, dirty_tx);
Interrupt handling, packet reception: netdrv_rx_interrupt (dev, tp, ioaddr)
rx_ring = tp->rx_ring;
cur_rx = tp->cur_rx;
while ((NETDRV_R8 (ChipCmd) & RxBufEmpty) == 0) {
    ring_offset = cur_rx % RX_BUF_LEN;
    rx_status = le32_to_cpu (*(u32 *) (rx_ring + ring_offset));
    rx_size = rx_status >> 16;
    pkt_size = rx_size - 4;
    skb = dev_alloc_skb (pkt_size + 2);
    skb->dev = dev;
    skb_reserve (skb, 2); /* 16 byte align the IP fields. */
    eth_copy_and_sum (skb, &rx_ring[ring_offset + 4], pkt_size, 0);
    skb_put (skb, pkt_size);
    skb->protocol = eth_type_trans (skb, dev);
    netif_rx (skb);
    dev->last_rx = jiffies;
    tp->stats.rx_bytes += pkt_size;
    tp->stats.rx_packets++;
    cur_rx = (cur_rx + rx_size + 4 + 3) & ~3;
    NETDRV_W16_F (RxBufPtr, cur_rx - 16);
}
tp->cur_rx = cur_rx;
[Diagram: layout of one ring buffer entry: status, packet, CRC]