Tuesday, June 26, 2007

Server And Desktop Topologies Are Host-Centric

The topology of a typical desktop or server platform is somewhat vertical. It has one or more processors at the top, the I/O subsystem at the bottom, and main system DRAM in the middle, acting both as a holding area for processor code and data and as the source and destination for I/O DMA transactions performed on behalf of the host processor(s). The host processor plays the central role in both device control and data processing; this is sometimes referred to as managing both the control plane and the data plane.

HyperTransport works well in this dual role because of its bandwidth and because the protocol permits control information (configuration cycles, error-handling events, interrupt messages, flow control, etc.) to travel over the same bus as data, eliminating the need for a separate control bus or additional sideband signals.

Upstream And Downstream Traffic

There is a strong sense of upstream and downstream data flow in server and desktop systems because very little occurs in the system that is not under the direct control of the processor, acting through the host bridge. Nearly all I/O initiated requests move upstream and target main memory; peer-peer transactions between I/O devices are the infrequent exception.

Storage Semantics In Servers And Desktops

Without the addition of networking extensions, the HyperTransport protocol follows the conventional model used in desktop and server busses (CPU host bus, PCI, PCI-X, etc.) in which all data transfers are associated with memory addresses. A write transaction is used to store a data value at an address location, and a read transaction is used to later retrieve it. This is referred to as associating storage semantics with memory addresses. The basic features of the storage semantics model include:

Targets Are Assigned An Address Range In Memory Map

At boot time, the amount of DRAM in the system is determined and a region at the beginning of the system address map is reserved for it. In addition, each I/O device conveys its resource requirements to configuration software, including the amount of prefetchable or non-prefetchable memory-mapped I/O address space it needs in the system address map. Once the requirements of all target devices are known, configuration software assigns the appropriate starting address to each device; the target device then "owns" the address range between the start address and the start address plus the request size.
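To make the ownership rule concrete, here is a minimal C sketch of how configuration software might hand out memory-mapped I/O ranges above the DRAM region. The structure, device names, sizes, and top-of-DRAM value are invented for illustration; real enumeration code also handles alignment and the prefetchable/non-prefetchable split.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one target's memory-mapped I/O requirement. */
struct mmio_request {
    const char *name;     /* device name, for illustration only        */
    uint64_t    size;     /* bytes of MMIO space the device asked for  */
    uint64_t    base;     /* start address assigned by config software */
};

/* Assign each device a range above the DRAM region. The only rule shown
 * here is the one from the text: the target owns the range from its
 * start address to start address plus request size. */
static void assign_ranges(struct mmio_request *dev, int count, uint64_t top_of_dram)
{
    uint64_t next = top_of_dram;
    for (int i = 0; i < count; i++) {
        dev[i].base = next;            /* device now owns [base, base+size) */
        next += dev[i].size;
    }
}

int main(void)
{
    struct mmio_request devs[] = {
        { "nic",  0x10000, 0 },        /* 64 KB requested */
        { "sata", 0x2000,  0 },        /* 8 KB requested  */
    };
    assign_ranges(devs, 2, 0x80000000ULL);   /* DRAM assumed to end at 2 GB */
    for (int i = 0; i < 2; i++)
        printf("%s: 0x%llx..0x%llx\n", devs[i].name,
               (unsigned long long)devs[i].base,
               (unsigned long long)(devs[i].base + devs[i].size - 1));
    return 0;
}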

Each Byte Transferred Has A Unique Target Address

In storage semantics, each data packet byte is associated with a unique target address. The first byte in the data packet payload maps to the start address and successive data packet bytes are assumed to be in sequential addresses following the start address.

The Requester Manages Target Addresses

An important aspect of storage semantics is that the requester is completely responsible for managing transaction addresses within the intended target device. The target has no influence over where data is placed during write operations or retrieved from during read operations.

In HyperTransport, the requester generates request packets containing the target start address, then exchanges packets with the target device. The maximum packet data payload is 64 bytes (16 dwords). Transfers larger than 64 bytes are carried out as multiple discrete transactions, each to an adjusted start address. Using HyperTransport's storage semantics, an ordered sequence of transactions may be initiated by using posted writes or by including a non-zero SeqID field in non-posted requests, but there is no concept of streaming data, per se.
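As a rough illustration of how the requester manages target addresses, the C sketch below splits a transfer larger than 64 bytes into discrete write transactions, each to an adjusted start address, so that byte N of the buffer always lands at start + N. The issue_write() helper is hypothetical and merely stands in for building and sending an actual sized-write request packet.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define HT_MAX_PAYLOAD 64u   /* maximum data payload per request: 64 bytes (16 dwords) */

/* Hypothetical stand-in for issuing one write request over the link. */
static void issue_write(uint64_t target_addr, const uint8_t *data, size_t len)
{
    printf("write %zu bytes to 0x%llx\n", len, (unsigned long long)target_addr);
    (void)data;
}

/* Break a large transfer into multiple discrete transactions, each to an
 * adjusted start address, per the storage-semantics addressing rule. */
static void write_buffer(uint64_t start, const uint8_t *buf, size_t len)
{
    size_t offset = 0;
    while (offset < len) {
        size_t chunk = len - offset;
        if (chunk > HT_MAX_PAYLOAD)
            chunk = HT_MAX_PAYLOAD;
        issue_write(start + offset, buf + offset, chunk);
        offset += chunk;
    }
}

int main(void)
{
    uint8_t buf[200] = {0};
    write_buffer(0xC0000000ULL, buf, sizeof buf);   /* 200 bytes -> 4 requests */
    return 0;
}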

Storage Semantics Work Fine In Servers And Desktops

As long as each requester is programmed to know the addresses it must target, managing address locations from the initiator side works well for general-purpose PIO, DMA, and peer-peer data exchanges involving CPU(s), memory, and I/O devices. When the target is prefetchable memory, storage semantics also help support performance enhancements such as write-posting, read pre-fetching, and caching, all of which depend on a requester having full control of target addresses.

1.04 Protocol Optimized For Host-Centric Systems

Because the HyperTransport I/O Link Protocol was initially developed as an alternative to earlier server and desktop bus protocols that use storage semantics (e.g. PCI), the 1.04 revision of the protocol is optimized to improve performance while maintaining backwards compatibility in host-centric systems:

  1. The strongly ordered producer-consumer model used in PCI transactions, which guarantees flag and data coherence regardless of where the producer, consumer, flag, and data are located, is available in the HyperTransport protocol (see the sketch after this list).

  2. Virtual channel ordering may optionally be relaxed in transfers where the full producer-consumer model is not required.

  3. The strong sense of upstream and downstream traffic on busses such as PCI is also preserved in HyperTransport. Programmed I/O (PIO) transactions move downstream from CPU to I/O device via the host bridge. I/O bus master transactions move upstream towards main memory.

  4. Direct peer-peer transfers are not supported in the 1.04 revision of the HyperTransport I/O Link Specification; requests targeting interior devices must travel up to the host bridge, then be reissued (reflected) back downstream towards the target.
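The producer-consumer guarantee in item 1 can be illustrated at the software level. The C sketch below uses release/acquire atomics as an analogy for the ordered channel: the producer writes the data before setting the flag, and a consumer that observes the flag set is guaranteed to see the data. This is only an analogy for the fabric-level ordering rules, not actual HyperTransport packet handling, and the variables are invented for the example.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Shared data and flag; in the PCI/HyperTransport model these could live
 * anywhere in the fabric, and strong ordering guarantees the consumer
 * never sees the flag set before the data is visible. */
static uint32_t    payload;
static atomic_uint flag;      /* 0 = empty, 1 = data valid */

/* Producer: write the data first, then publish the flag. The release
 * store plays the role of the ordered (posted-write) channel. */
void produce(uint32_t value)
{
    payload = value;
    atomic_store_explicit(&flag, 1, memory_order_release);
}

/* Consumer: poll the flag; once it reads as set, the earlier data
 * write is guaranteed to be visible. */
uint32_t consume(void)
{
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;                     /* spin until producer publishes */
    return payload;
}

int main(void)
{
    produce(0x1234);                        /* single-threaded demo */
    printf("consumed 0x%x\n", consume());
    return 0;
}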

All of the above features work well for what they are intended to do: support a host-centric system in which control and data processing functions are both handled by the host processor(s), and I/O devices perform DMA data transfers using main system memory as a source and sink for data.

Some Systems Are Not Host-Centric

Unlike server and desktop computers, some processing applications do not lend themselves well to a host-centric topology. This includes cases where there are multiple levels of processing, complex look-up functions, protocol translation, etc. In these cases, a single processor (or even multiple CPUs on a host bus) can quickly become a bottleneck. Often what works more effectively is to assign control functions to a host processor and distribute data processing functions across multiple co-processors under its control. In some cases, pipeline (cascaded) co-processing is used to reduce latency.
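As a sketch of that distributed alternative, the C fragment below models a cascaded (pipeline) arrangement: the host, acting only as the control plane, configures a chain of data-plane stages, and per-packet work then flows stage to stage without returning to the host. The stage functions and pipeline structure are invented purely for illustration.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical data-plane stage: each co-processor transforms a packet
 * and hands it to the next stage in the cascade. */
typedef void (*stage_fn)(uint8_t *pkt, size_t len);

static void classify(uint8_t *pkt, size_t len)  { (void)len; pkt[0] |= 0x01; }
static void translate(uint8_t *pkt, size_t len) { (void)len; pkt[0] ^= 0xF0; }

/* The control plane only sets up the cascade; the data plane then runs
 * the packet through each stage in order. */
static void run_pipeline(stage_fn *stages, int count, uint8_t *pkt, size_t len)
{
    for (int i = 0; i < count; i++)
        stages[i](pkt, len);
}

int main(void)
{
    stage_fn stages[] = { classify, translate };
    uint8_t  pkt[64]  = {0};
    run_pipeline(stages, 2, pkt, sizeof pkt);
    printf("first byte after pipeline: 0x%02x\n", pkt[0]);
    return 0;
}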
