Hypertransport CPU technology: I/O Streams in Hypertransport Technology

In addition to virtual channels, HyperTransport also defines I/O streams. An I/O stream consists of the requests, responses, and data associated with a particular UnitID and HyperTransport link. Ordering rules require that I/O streams be treated independently from each other. When a request/response packet is sent, it is tagged with sender attributes (UnitID, Source Tag, and Sequence ID) that are used by other devices to identify the transaction stream in use, and the required ordering within it. Entries within the virtual channel buffers include the transaction stream identifiers (attributes).

Used properly, the independent I/O streams create the effect of separate connections between devices and the host bridge above them — much as a shared bus connection appears.

Transactions (Requests, Responses, and Data)

Transfers initiated by HT devices require one or more transactions to complete. These devices may need to perform a variety of operations that include:

sending or forwarding data (write)
requesting that a target return data to it (read)
performing an atomic read/modify/write operation
wanting additional control over ordering of its posted transactions (using Flush and Fence commands)
wanting to broadcast a message to all downstream agents (done by bridges only)

The format of these transactions also vary depending on the type of operation (request) specified as listed below:

Requests that behave like reads and that require a read response and data (i.e., Sized Read, Atomic RMW)
Requests that behave like writes, and require a target done response to confirm completion (i.e. Non-posted Sized Writes)
Posted Requests that behave like writes but don't require any target response or data. (i.e. Posted Sized Writes, Broadcast Message, or Fence)

Transaction Requests

Every transaction begins with the transmission of a Request Packet. Note that the actual format of a request packet varies depending on the particular request, but in general each request contains the following information:

Target address within HyperTransport memory space
The request type (command)
Sender's transaction stream ID (UnitID, SeqID)
The amount of data to be transferred (if any)
Other attributes: virtual channel to use, etc.

HT defines seven basic request types. The characteristics of each request type is discussed in the following sections.

Transaction Responses

Responses are generated by the target device in cases where data is to be returned from the target device, or when confirmation of transaction completion is required. Specifically, in HyperTransport, a response follows all non-posted requests. A target responds to:

Return data to satisfy an earlier read or Atomic Read-Modify Write (RMW) request
Confirm the arrival of non-posted write data
Confirm the completion of a Flush operation
Report errors

The information in a response varies both with the Request that causes it, and with the direction the response is traveling in the HyperTransport fabric. However, content of an HT response generally includes:

Response type (command)
Response direction (upstream or downstream)
Transaction stream (UnitID, Source Tag)
Misc. info: virtual channel to use, error, etc.

Transaction Types

As discussed earlier, HT defines seven basic transaction types. This section introduces the characteristics of each type and defines any sub-types that exist.

Sized Read Transactions

Sized Read transactions permit remote access to a device memory or memory-mapped I/O (MMIO) address space. The operation may be initiated on HT from the host bridge (PIO operation), or an HT device may wish to read data from memory (DMA operation) or from another HT device (peer-to-peer operation). Two types of Sized Read transactions define the different quantities of data to be read.

Sized (Byte) Read — this request defines an aligned 4 byte block of address space from which 0 to 4 bytes can be read. Any single byte location or any group of bytes within the 4 byte block can be accessed. The typical use of this transaction is for reading MMIO registers.
Sized (DW) Read — this request identifies an aligned 64 byte block of address space from which 4-64 bytes can be read. Any continuous group of aligned 4 byte groups (DWs) can be accessed.

The basic rules for maintaining high performance of HT reads include:

For reads, the requester won't issue the request until it has buffers available to receive all requested data without wait states.
The requester won't issue the request until it knows the target has room in its transaction queue to accept it (Flow Control)
Upon receiving the read request, the target won't issue the read response until it has all requested data and status available to send. Once it starts the response, there will be no wait states until the read response packet and all data (up to 16 dwords) have been sent.
Upon receiving the response, the requester will check the error bits to make certain the data is valid.
The target and any bridges in the path de-allocate buffers and queue entries as soon as the response has been sent.

Sized Write Transactions

Sized Write transactions permit the host bridge (PIO operation) to send data to a HyperTransport device, or permits a HyperTransport device to send data to memory (DMA operation) or to another device (Peer-to-peer operation). Two types of Sized Write requests permit different sizes of memory or MMIO space to be accessed.

Sized (Byte) Write — this request identifies an aligned block of 32 bytes of address space into which data is to be written. The amount of data to be written can be from 0 to 32 bytes. Note that the maximum transfer size of 32 bytes only occurs if the start address is 32 byte aligned. If the start address is not on a 32-byte boundary, the transfer will be less than 32 bytes. Furthermore, no Byte Write transaction crosses a 32 byte address boundary. Any combination of bytes (need not be contiguous) can be written from the start address to the next aligned 32 byte block of address space.
Sized (DW) Write — this request identifies an aligned block of 64 bytes of address space into which data can be written. The start address must be aligned on 4-byte boundaries, and data to be written is always aligned in 4- byte contiguous groups (DWs). The amount of data written can be from 1 to 16 DW increments.

Flush

Flush is useful in cases where a device must be certain that its posted writes are "visible" in host memory before it takes subsequent action. Flush is an upstream, non-posted "dummy" read command that pushes all posted requests ahead of it to memory. Note that only previously posted writes within the same transaction stream as Flush transaction need be flushed to memory. When an intermediate bridge receives a Flush transaction, it generates one or more Sized Write transactions necessary to forward all data in its upstream posted-write buffer toward the host bridge. Ultimately, the host bridge receives the command and flushes the previously-posted writes to memory. Receipt of the read response from the host bridge is confirmation that the flush operation has completed.

Fence

Fence is designed to provide a barrier between posted writes, which applies across all UnitIDs and therefore across all I/O streams and all virtual channels. Thus, the fence command is global because it applies to all I/O streams. The Fence command goes in the posted request virtual channel and has no response. The behavior of a Fence is as follows:

The PassPW bit must be clear so that the Fence pushes all requests in the posted channel ahead of it.
Packets with their PassPW bit clear will not pass a Fence regardless of UnitID.
Packets with their PassPW bit set may pass a Fence.
A nonposted request with PassPW clear will not pass a Fence as it is forwarded through the chain, but it may do so after it reaches a host bridge.

Fence requests are never issued as part of an ordered sequence, so their SeqID will always be 0. Fence requests with PassPW set, or with a nonzero SeqID, are legal, but may have an unpredictable effect. Fence is only issued from a device to a host bridge or from one host bridge to another. Devices are never the target of a fence so they do not need to perform the intended function. If a device at the end of the chain receives a fence, it must decode it properly to maintain proper operation of the flow control buffers. The device should then drop it.

Hypertransport CPU technology

Tuesday, June 26, 2007

I/O Streams in Hypertransport Technology