HyperTransport CPU technology: The Purpose Of Ordering Rules in a CPU

Some of the important reasons for enforcing ordering rules on packets moving through HyperTransport include the following:

Maintain Data Coherency

If transactions are in some way dependent on each other, a method is required to assure that they complete in a deterministic way. For example, if Device A performs a write transaction targeting main memory and then follows it with a read request targeting the same location, what data will the read transaction return? HyperTransport ordering seeks to make such events predictable (deterministic) and to match the intent of the programmer. Note that, compared to a shared bus such as PCI, HyperTransport transaction ordering is complicated somewhat by point-to-point connections which result in target devices on the same chain (logical bus) being at different levels of fabric hierarchy.

Avoid Deadlocks

Another reason for ordering rules is to handle cases where two separate transactions each depend on the other completing first. HyperTransport ordering includes a number of rules for deadlock avoidance. Some of these rules are in the specification because of known deadlock hazards associated with other buses to which HyperTransport may interface (e.g. PCI).

Support Legacy Buses

One of the principal roles of HyperTransport is to serve as a backbone bus which is bridged to other peripheral buses. HyperTransport explicitly supports PCI, PCI-X, and AGP and the ordering requirements of those buses.

Maximize Performance

Finally, HyperTransport permits devices in the path to the target, and the target itself, some flexibility in reordering packets around each other to enhance performance. When acceptable, relaxed ordering may be enabled by the requester on a per-transaction basis using attribute bits in request and response packets.

Introduction: Three Types Of Traffic Flow

HyperTransport defines three types of traffic: Programmed I/O (PIO), Direct Memory Access (DMA), and Peer-to-Peer.

Programmed I/O traffic originates at the host bridge on behalf of the CPU and targets I/O or memory-mapped I/O in one of the peripherals. These transactions are often generated by the CPU to set up peripherals for bus master activity, check status, program configuration space, etc.
DMA traffic originates at a bus master peripheral and typically targets main memory. This traffic relieves the CPU of the burden of moving large amounts of data to and from the I/O subsystem. Generally, the CPU uses a few PIO instructions to program the peripheral device with information about a required DMA transfer (transfer size, target address in memory, read or write, etc.), then performs some other task while the DMA transfer is carried out. When the transfer is complete, the DMA device may generate an interrupt message to inform the CPU.
Peer-to-Peer traffic is generated by an interior node and targets another interior node. In HyperTransport, direct peer-to-peer traffic is not allowed. Instead, the request travels upstream to the host bridge, which reissues it downstream toward the target.

What If A Device Requires Response Ordering?

All HyperTransport devices must be able to tolerate out-of-order response delivery or else restrict outstanding non-posted requests to one at a time. This also applies to bridges which sit between HyperTransport and a protocol that requires responses be returned in order. The bridge must not issue more outstanding requests than it has internal buffer space to hold responses it may be required to reorder.
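The buffer-space rule for such a bridge can be sketched as a small reorder buffer. This is only an illustration under stated assumptions: the class name `ResponseReorderBuffer`, its methods, and the unbounded integer tags are hypothetical and are not taken from the HyperTransport specification. The sketch refuses to issue a request unless a slot is free, and releases responses to the ordered side strictly in issue order.

```python
_WAITING = object()  # sentinel: request issued, response not yet received


class ResponseReorderBuffer:
    """Hypothetical bridge-side buffer that hands back responses in request order."""

    def __init__(self, slots):
        self.slots = slots         # how many responses the bridge can hold
        self.next_tag = 0          # tag for the next issued request (illustrative)
        self.next_to_deliver = 0   # oldest tag not yet delivered in order
        self.pending = {}          # tag -> response data, or _WAITING

    def can_issue(self):
        # Never allow more outstanding requests than there are buffer slots.
        return len(self.pending) < self.slots

    def issue(self):
        if not self.can_issue():
            raise RuntimeError("no buffer space: request must wait")
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = _WAITING
        return tag

    def on_response(self, tag, data):
        """Store a response; return all responses now deliverable in order."""
        if tag not in self.pending:
            raise KeyError(f"unexpected response tag {tag}")
        self.pending[tag] = data
        delivered = []
        while self.pending.get(self.next_to_deliver, _WAITING) is not _WAITING:
            delivered.append(self.pending.pop(self.next_to_deliver))
            self.next_to_deliver += 1
        return delivered
```

With two slots, a response that arrives early for the second request is held until the first request's response arrives, at which point both are released together and both slots become free again.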

Support For The Producer-Consumer Ordering Model

When the PassPW and Sequence ID bits are cleared in a request packet, HyperTransport transactions are compatible with the same producer-consumer model PCI employs. Basic features of the model include:

A producer device anywhere in the system may send data and modify a flag indicating data availability to a consumer anywhere in the system.
The data and flag need not be located in the same device as long as the consumer of the data waits for the response of a flag read before attempting to access the data.
In cases where the consumer is allowed to issue two ordered reads without making them part of an ordered sequence (that is, without setting the SequenceID tag to a non-zero value), the producer-consumer model is only supported if the flag and data are within the same device.
Ordering rules guarantee that if the flag is modified after the data becomes available, the flag read will return valid status.
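The model described above can be illustrated with a toy simulation. It is a deliberately simplified sketch, not a model of real HyperTransport hardware: the class `OrderedPostedChannel`, the addresses, and the polling loop are all hypothetical, and the only property modeled is that posted writes from one producer land in the order they were issued (PassPW clear).

```python
from collections import deque

DATA_ADDR = 0x100  # hypothetical address of the data buffer
FLAG_ADDR = 0x200  # hypothetical address of the "data ready" flag


class OrderedPostedChannel:
    """Toy channel: posted writes reach memory in the order they were issued."""

    def __init__(self):
        self.in_flight = deque()  # posted writes not yet visible in memory
        self.memory = {}          # addr -> value

    def post_write(self, addr, value):
        self.in_flight.append((addr, value))

    def deliver_one(self):
        # The oldest posted write always lands first (no reordering).
        if self.in_flight:
            addr, value = self.in_flight.popleft()
            self.memory[addr] = value

    def read(self, addr):
        return self.memory.get(addr, 0)


def consume(channel, max_polls=10):
    """Poll the flag; touch the data only after a flag read returns 1."""
    for _ in range(max_polls):
        if channel.read(FLAG_ADDR) == 1:
            return channel.read(DATA_ADDR)
        channel.deliver_one()  # time passes; one more posted write lands
    return None  # flag never became visible within max_polls


channel = OrderedPostedChannel()
channel.post_write(DATA_ADDR, 42)  # producer writes the data first...
channel.post_write(FLAG_ADDR, 1)   # ...then sets the flag
result = consume(channel)
```

Because the flag write cannot pass the data write, the consumer can never observe the flag set while the data is still stale. If the channel were allowed to reorder the two writes, that guarantee would be lost.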

Producer-Consumer Model Simpler If Flag/Data In Same Place

If the flag and data are restricted to being in the same device, the PassPW bit may be set in requests, which relaxes the ordering of responses and improves performance. The producer-consumer model is still maintained.

Upstream Ordering Rules

Posted requests, non-posted requests, and responses travel in independent virtual channels. Each uses a different command, which permits devices to distinguish them from one another. Requests have a Sequence ID field. Assigning non-zero sequence ID fields to non-posted requests forces all tunnel and bridge devices in the path to the target to forward these requests in the same order they were received. The target is also required to maintain this order when processing these requests internally. Requests with a Sequence ID of zero are not considered to be part of an ordered sequence. Requests and response packets also carry a May Pass Posted Writes (PassPW) bit.

Reordering Packets In Different Transaction Streams

Other than when a Fence command is issued, there is no ordering guarantee for packets originating from different sources. Traffic from each UnitID is considered a separate transaction stream; devices may reorder upstream packets from different streams as necessary.

Next, UnitID 1 receives a packet (2) from UnitID 2.
When UnitID 1 forwards the two packets onto its upstream link, it may send packet (2) first. Packet (2) has then been reordered around packet (1).

No Reordering Packets In A Strongly Ordered Sequence

If one requester has issued a series of request packets carrying the same non-zero SequenceID, the packets may not be reordered (regardless of the state of the PassPW bit). The sequence only applies to packets within a single transaction stream (UnitID) and VC. Upstream devices still may reorder these packets with respect to those from other streams.

The I/O Hub issues a series of requests (1), (2), (3). All carry the same, non-zero SequenceID in the request.
When they are received by the first tunnel device, it checks the sequence ID field and the UnitID (all are identical). When it forwards the three packets to the PCI-X tunnel, it sends them in the same strongly ordered sequence.
The HyperTransport-to-PCI-X bridge makes the same determination and forwards packets (1), (2), and (3) through its tunnel interface to the host bridge in the same order.
The host bridge is also required to treat the three packets as a strongly ordered sequence internally.
If these were non-posted requests, there would be no guarantee of ordering in the responses returned to the I/O hub.
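The forwarding rule walked through above can be expressed as a small check. This is a sketch under stated assumptions: the `Packet` tuple, its field names, and the function `preserves_ordered_sequences` are hypothetical, the forwarded list is assumed to contain exactly the packets that were received, and only the SequenceID rule is modeled (PassPW and the other ordering rules are ignored).

```python
from collections import namedtuple

# Hypothetical packet record; field names are illustrative only.
Packet = namedtuple("Packet", "label unit_id vc seq_id")


def preserves_ordered_sequences(received, forwarded):
    """True if every ordered sequence keeps its received order when forwarded.

    Packets belong to the same ordered sequence when they share a UnitID,
    a virtual channel, and a non-zero SequenceID. Packets with SequenceID 0,
    or from different streams, may be interleaved freely.
    """
    def sequences(packets):
        groups = {}
        for p in packets:
            if p.seq_id != 0:
                key = (p.unit_id, p.vc, p.seq_id)
                groups.setdefault(key, []).append(p.label)
        return groups

    return sequences(received) == sequences(forwarded)


# Three requests from one UnitID with the same non-zero SequenceID,
# plus one unrelated packet from another stream.
p1 = Packet("1", unit_id=3, vc="nonposted", seq_id=5)
p2 = Packet("2", unit_id=3, vc="nonposted", seq_id=5)
p3 = Packet("3", unit_id=3, vc="nonposted", seq_id=5)
other = Packet("x", unit_id=4, vc="nonposted", seq_id=0)

legal = preserves_ordered_sequences([p1, p2, p3, other], [other, p1, p2, p3])
illegal = preserves_ordered_sequences([p1, p2, p3, other], [p2, p1, p3, other])
```

Here `legal` is true because only the packet from the other stream moved, while `illegal` is false because packets (1) and (2) of the ordered sequence were swapped.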


Tuesday, June 26, 2007
