Tuesday, June 26, 2007

PCI Bus Issues

Several features of the PCI bus must be handled in the correct fashion when interfacing with the HT bus. For background information and details regarding PCI ordering, refer to MindShare's PCI System Architecture book, 4th edition.

PCI Ordering Requirements

Transaction ordering on the PCI bus is based on the Producer/Consumer programming model. This model involves 5 elements:

  1. Producer — PCI master that sources data to a memory target

  2. Target — main memory or any PCI device containing memory

  3. Consumer — PCI master that reads and processes the Producer data from the target

  4. Flag element — a memory or I/O location updated by the producer to indicate that all data has been delivered to the target, and checked by the Consumer to determine when it can begin to read and process the data.

  5. Status element — a memory or I/O location updated by the Consumer to indicate that it has processed all of the Producer data, and checked by the Producer to determine when the next batch of data can be sent.

This model works flawlessly in PCI when all elements reside on the same shared PCI bus. When these elements reside on different PCI buses (i.e., across PCI-to-PCI bridges), the model can fail unless the PCI ordering rules are followed.
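The handshake between the five elements can be sketched in a few lines. This is a minimal single-threaded illustration, not bus-accurate code: the class and function names (SharedTarget, produce, consume) and the 0/1 encodings of the flag and status elements are assumptions made for the example.

```python
# Hypothetical in-memory model of the Producer/Consumer elements.
class SharedTarget:
    def __init__(self):
        self.data = []    # Target: memory that receives the Producer's data
        self.flag = 0     # Flag element: 1 means all data has been delivered
        self.status = 1   # Status element: 1 means the Producer may send


def produce(target, batch):
    """Producer: deliver the data first, then update the flag."""
    if target.status != 1:
        return False      # Consumer has not finished the previous batch
    target.status = 0
    target.data = list(batch)
    target.flag = 1       # must not become visible before the data does
    return True


def consume(target):
    """Consumer: read the data only after the flag says it is complete."""
    if target.flag != 1:
        return None       # nothing to process yet
    batch = list(target.data)
    target.flag = 0
    target.status = 1     # tell the Producer the next batch can be sent
    return batch
```

The ordering rules exist to preserve the one property this sketch takes for granted: the flag update never overtakes the data writes on their way to the target.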

The PCI specification, versions 2.2 and 2.3, defines the required transaction ordering rules. These ordering rules are included in this section as a review and to identify rules that may have no purpose in some HT designs.

  • PMW stands for posted memory write.

  • DRR and DRC stand for Delayed Read Request and Delayed Read Completion, respectively.

  • DWR and DWC stand for Delayed Write Request and Delayed Write Completion, respectively.

  • "Yes" specifies that the transaction just latched must be ordered ahead of the previously latched transaction indicated in the column heading.

  • "No" specifies that the transaction just latched must never be ordered ahead of the previously latched transaction indicated in the column heading.

  • "Yes/No" entries mean that the transaction just latched is allowed to be ordered ahead of the previously latched transaction indicated in the column heading, but such reordering is not required. The Producer/Consumer Model works correctly either way.


Avoiding Deadlocks

PCI ordering rules require that Posted Memory Writes (PMWs) in row 1 be ordered ahead of the delayed requests and delayed completions listed in columns 2-5. This requirement is based on avoiding potential deadlocks. Each of the deadlocks involves a scenario arising from the use of PCI bridges based on earlier versions of the specification. If all PCI bridge designs used in HT platforms are based on 2.1 and later versions of the PCI specification, the PCI ordering rules with "Yes" entries in row 1 can be treated as "Yes/No."
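The relaxation of row 1 can be written as a small lookup. This sketch models only what the text states about a PMW relative to columns 2-5; the function names and the pre_2_1_bridges_present parameter are hypothetical, and the other rows and columns of the ordering table are deliberately left out.

```python
# Columns 2-5: the delayed requests and delayed completions.
DELAYED = {"DRR", "DWR", "DRC", "DWC"}


def pmw_entry(earlier, pre_2_1_bridges_present):
    """Row 1 entry for a PMW just latched behind an earlier delayed transaction."""
    if earlier not in DELAYED:
        raise ValueError("only columns 2-5 are modeled in this sketch")
    # "Yes" guards against deadlocks caused by older bridge designs.
    return "Yes" if pre_2_1_bridges_present else "Yes/No"


def must_pass(entry):
    """True when the just-latched transaction is required to go ahead."""
    return entry == "Yes"


def may_pass(entry):
    """True when the just-latched transaction is permitted to go ahead."""
    return entry in ("Yes", "Yes/No")
```

With no pre-2.1 bridges in the platform, pmw_entry returns "Yes/No" for every delayed transaction type, so passing is permitted but no longer required.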

Subtractive Decode

PCI employs a technique referred to as subtractive decode to handle devices that are mapped into memory or I/O address space by user selection of switches and jumpers (e.g. ISA devices). Consequently, configuration software has no knowledge of the resources assigned to these devices. Fortunately, these PC legacy devices are mapped into relatively small ranges of address space that can be reserved by platform configuration software.

Subtractive Decode: The PCI Method

Subtractive decode is a process of elimination. Since configuration software allocates and assigns address space for PCI, HT, AGP and other devices, any access to address locations not assigned can be presumed to target a legacy device, or may be an errant address.

All PCI devices must perform a positive decode to determine if they are being targeted by the current request. This decode must be performed as a fast, medium, or slow decode. The device targeted must indicate that it will respond to the request by signaling device select (DEVSEL#) across the shared bus. When device driver software issues a request with an address that has not been assigned by configuration software, no PCI device is targeted (i.e., no DEVSEL# is asserted within the time allowed). By process of elimination, the subtractive decode agent recognizes that no PCI device has responded; it therefore asserts DEVSEL# and forwards the transaction to the ISA bus, where the request is completed.
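The process of elimination can be illustrated with a short sketch. It ignores decode timing (fast, medium, slow) and simply asks whether any device positively decodes the address; the function name, the device name, and the address ranges used in the usage note are hypothetical.

```python
SUBTRACTIVE_AGENT = "subtractive decode agent"


def claim_request(address, positive_ranges):
    """Return the name of the agent that asserts DEVSEL# for this address.

    positive_ranges maps a device name to an inclusive (base, limit) pair
    assigned by configuration software.
    """
    for name, (base, limit) in positive_ranges.items():
        if base <= address <= limit:
            return name           # positive decode: the device claims it
    # No device claimed the request in time, so the subtractive decode
    # agent claims it and forwards it to the ISA bus.
    return SUBTRACTIVE_AGENT
```

For example, with a single device assigned 0xE0000000 through 0xE000FFFF, an access to 0xE0000010 is claimed by that device, while an access to an unassigned address such as 0x3F8 falls through to the subtractive decode agent.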

Subtractive Decode: HT Systems Requiring Extra Support

When the subtractive decode agent is not at the end of a single-hosted chain, or when more than one HT I/O chain is implemented in a system, subtractive decode becomes more difficult.

The Problem

HyperTransport devices in a chain do not share the same bus as in PCI, so a subtractive decode agent cannot detect if a request has not been claimed by other devices on the chain.

The Solution

As described previously, configuration software assigns addresses to all HT, PCI, and AGP devices. Therefore, the host knows when a request will result in a positive decode and when it will not. The specification requires that all hosts connecting to HyperTransport I/O chains implement registers that identify the positive decode ranges for all HyperTransport technology I/O devices and bridges (except as noted in the simple method). One of these I/O chains may also include a subtractive bridge (typically leading to an ISA or LPC bus). Requests that do not match any of the positive ranges must be issued with the Compat bit set, and must be routed to the chain containing the subtractive decode bridge. This chain is referred to as the compatibility chain.

The Compat bit indicates to the subtractive decode bridge that it should claim the request, regardless of address. Requests that fall within the positive decode ranges must not have the Compat bit set, and are passed to the I/O chain upon which the target device resides. The target chain may be the compatibility or any other I/O chain.
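The host-side decision can be sketched as a range lookup. This is an illustration under assumptions: the Route type, the route_request function, the chain names, and the example ranges are all hypothetical, and the real host registers are not modeled.

```python
from typing import NamedTuple


class Route(NamedTuple):
    chain: str     # I/O chain the request is sent down
    compat: bool   # value of the Compat bit in the issued request


def route_request(address, positive_ranges, compat_chain):
    """Pick the target chain and Compat bit for one request.

    positive_ranges is a list of inclusive (base, limit, chain) entries
    standing in for the host's positive decode range registers.
    """
    for base, limit, chain in positive_ranges:
        if base <= address <= limit:
            # Positive decode: Compat must be clear, even when the
            # target happens to sit on the compatibility chain.
            return Route(chain, False)
    # No positive range matched: send to the compatibility chain with
    # Compat set so the subtractive bridge claims it regardless of address.
    return Route(compat_chain, True)
```

Note that a request can be routed to the compatibility chain with the Compat bit clear; the bit is set only when no positive range matches.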

PCI Burst Transactions

PCI permits long burst transactions with either contiguous or discontiguous byte masks (byte enables) that may not be supported by HT. These long bursts must be broken into multiple requests to support the HT protocol as follows:

  • PCI read requests with discontiguous byte masks that cross aligned 4-byte boundaries must be broken into multiple 4-byte HT RdSized (byte) requests.

  • PCI write requests with discontiguous byte masks that cross 32-byte boundaries must be broken into multiple 32-byte HT WrSized (byte) requests. Note that the resulting sequence of write requests must be strongly ordered in ascending address order.

  • PCI write requests with contiguous byte masks that cross 64-byte boundaries must be broken into multiple 64-byte HT WrSized (dword) requests.
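All three rules share the same arithmetic: cut the burst wherever it crosses an aligned boundary. The sketch below shows only that address splitting; the function name is hypothetical, byte masks are not modeled, and the boundary would be 4, 32, or 64 bytes depending on which rule applies.

```python
def split_at_boundaries(start, length, boundary):
    """Split the byte range [start, start + length) into (address, size)
    pieces so that no piece crosses an aligned `boundary`-byte boundary.

    Pieces are returned in ascending address order.
    """
    if length <= 0 or boundary <= 0:
        raise ValueError("length and boundary must be positive")
    pieces = []
    addr = start
    end = start + length
    while addr < end:
        # First aligned boundary strictly above the current address.
        next_boundary = (addr // boundary + 1) * boundary
        piece_end = min(end, next_boundary)
        pieces.append((addr, piece_end - addr))
        addr = piece_end
    return pieces
```

For instance, an 80-byte contiguous write starting at address 0x30 with a 64-byte boundary yields two pieces, (0x30, 16) and (0x40, 64). Because the pieces come out in ascending address order, issuing them in list order also matches the ordering note on discontiguous writes.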
