Hypertransport CPU technology

Tuesday, June 26, 2007

ISA/LPC Buses

The ISA and LPC buses typically reside on the PCI bus. These buses support bus mastering and legacy DMA transfers. They differ from HT and PCI devices in that they support neither split transactions nor retries.

Deadlocks
The specification defines two possible deadlock conditions that can occur because the ISA and LPC (Low Pin-Count) buses do not support transaction retry. For example, if an ISA (LPC) Master initiates a transaction that requires a response, the bus cannot handle a new request until the current transaction has completed. This type of protocol is extremely simple from an ordering perspective: every transaction must complete before the next one begins, so no ordering rules are required. The downside is that all other devices are stalled while they wait for the current transaction to complete. Delayed transactions supported by the PCI bus, and split transactions supported by PCI-X and HyperTransport, can handle new transactions while a response to a previous transaction is pending. The price is a set of complex ordering rules to ensure that transactions complete in the intended order.

Deadlock Scenario 1

Consider the following sequence of events as they relate to the limitations of the ISA/LPC bus as discussed above and to the PCI-based Producer/Consumer transaction ordering model.

An ISA/LPC Master initiates a transaction that requires a response from the Host-to-HT Bridge (e.g., a memory read from main memory).
The CPU initiates a write operation targeting a device on the ISA/LPC bus, and the Host Bridge issues this write as a posted operation.
The posted write reaches HT-to-PCI bridge where it is sent across the PCI bus to the south bridge.
The south bridge cannot accept the write targeting the ISA bus because the ISA/LPC bus is waiting for the outstanding response. So, the south bridge issues a retry.
The read response reaches the HT/PCI bridge. However, the Producer/Consumer model requires that all previously posted writes headed to the PCI bus be completed before the read response is sent. The read response is now stuck behind a posted write that cannot complete until the read response is delivered. Result: Deadlock!

The recommended solution to this problem is to require that all requests targeting the ISA/LPC bus be non-posted operations. This eliminates the problem because non-posted operations can be forwarded to the PCI bus in any order.
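The recommended fix can be pictured as a small policy check in the host bridge. The sketch below is illustrative only: the address window and the names `ISA_LPC_RANGES`, `targets_isa_lpc`, and `select_write_type` are hypothetical and are not taken from the specification.

```python
# Hypothetical address windows routed to the ISA/LPC bus (illustrative values only).
ISA_LPC_RANGES = [(0x000A0000, 0x000FFFFF)]


def targets_isa_lpc(addr, ranges=ISA_LPC_RANGES):
    """Return True if addr falls inside a range owned by the ISA/LPC bus."""
    return any(lo <= addr <= hi for lo, hi in ranges)


def select_write_type(addr):
    """Issue writes bound for ISA/LPC as non-posted; all others may be posted."""
    return "non-posted" if targets_isa_lpc(addr) else "posted"
```

With this policy, a CPU write to an ISA/LPC address never enters the posted channel, so a read response cannot end up stuck behind it.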

Deadlock Scenario 2

Once again, because the ISA or LPC bus is unable to accept any requests while it waits for a response to its own request, a deadlock can occur. This deadlock arises when the downstream non-posted request channel fills up while awaiting a response to an ISA DMA request. The sequence of events is as follows:

A DMA request is issued by an ISA/LPC device to main memory.
Downstream requests targeting the ISA bus are initiated but stack up because the south bridge does not accept them while it is waiting on a response to the previously issued DMA request. Consequently, it is possible for the downstream non-posted request channel to fill.
A peer-to-peer operation is initiated to a device on the same chain, and it sits in the non-posted request queue ahead of the ISA/LPC request (from step 1). This peer-to-peer transaction is sent to the Host, which attempts to reflect the transaction downstream to the target device. However, because the downstream request channel is full, the upstream non-posted peer request stalls, as does the request from the ISA bus. This prevents the ISA/LPC bridge from making forward progress.

The solution to this deadlock is for the host to limit the number of requests it makes to the ISA/LPC bus to a known number of requests (typically one) that the bridge can accept. Because the host cannot limit peer requests without eventually blocking the upstream non-posted channel (and causing another deadlock), no peer requests to the ISA/LPC bus are allowed. Peer requests to devices below the ISA/LPC bridge on the chain (including other devices in the same node as the ISA/LPC bridge) cannot be performed without deadlock unless the ISA/LPC bridge sinks the above-mentioned known number of requests without blocking requests forwarded down the chain. This can be implemented with a buffer (or set of buffers) for requests targeting the bridge that is separate from the buffering for other requests.
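One way to picture the host-side limit is as a simple counter of requests outstanding to the ISA/LPC bridge. This is a minimal sketch, assuming a limit of one; the class `IsaRequestLimiter` and its method names are hypothetical.

```python
class IsaRequestLimiter:
    """Caps host requests outstanding to the ISA/LPC bridge (assumed limit: 1)."""

    def __init__(self, max_outstanding=1):
        self.max_outstanding = max_outstanding
        self.outstanding = 0

    def try_issue(self):
        """Return True and count the request if the bridge can sink it now."""
        if self.outstanding >= self.max_outstanding:
            return False  # hold the request in the host instead of filling the channel
        self.outstanding += 1
        return True

    def complete(self):
        """Call when the bridge returns a response for an earlier request."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding request to complete")
        self.outstanding -= 1
```

Requests that cannot be issued stay in the host, so the downstream non-posted channel is never filled with traffic the bridge cannot accept.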

AGP Bus Issues

AGP Configuration Space Requirements

Some legacy operating systems require that the AGP capability registers be mapped at Bus 0, Device 0, and Function 0. Also, the AGP aperture base address configuration register must be at Bus 0, Device 0, Function 0, Offset 10h. In a legacy system, these registers are located within the Host to PCI bridge configuration space (Host to HT bridge in our example).

For complete legacy software support, the specification recommends that the AGP subsystem be designed as follows:

AGP bridges are placed logically on HyperTransport chain 0 (Bus 0).
The AGP interface uses multiple UnitIDs due to AGP configuration being split between the Host to HT bridge and the Host to AGP bridge (i.e., virtual PCI to PCI bridge).
During initialization the base UnitID of an AGP device must be assigned a non-zero value to support configuration of chain 0. Following HT initialization the base UnitID should be changed to zero.
Device number zero, derived from the base UnitID register value, should contain the capabilities header and the AGP aperture base address register (at Offset 10h).
Device number 1, derived from the base UnitID+1, should be used for the Host to AGP bridge.
The UnitID that matches the base (0) is not used for any AGP-initiated I/O streams or responses so that there is no conflict with host-initiated I/O streams or responses. Only UnitIDs greater than the base may be used for I/O streams.
Legacy implementations place the AGP graphics address remapping table (GART) in the host. Thus, the AGP aperture base address register and any other registers that are located in the AGP device but required by the host are copied by software into implementation-specific host registers. These implementation-specific registers should be placed somewhere other than Device 0, to avoid conflicts with other predefined AGP registers. In a sharing double-hosted chain, this requires the hosts to implement the Device Number field so that the hosts may address each other after the AGP bridge has assumed Device 0.

Note that if legacy OS support is not required, the AGP device's base UnitID register may be programmed to any permissible value.
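The UnitID-to-device-number arrangement recommended above can be sketched as a small lookup. The function names and the `unit_count` parameter (how many UnitIDs the AGP device consumes) are assumptions made for illustration.

```python
def agp_device_map(base_unit_id=0):
    """Map device numbers to their roles for an AGP bridge at base_unit_id."""
    return {
        base_unit_id: "capabilities header + aperture base (offset 10h)",
        base_unit_id + 1: "Host-to-AGP bridge",
    }


def may_source_io_stream(unit_id, base_unit_id=0, unit_count=2):
    """Only UnitIDs above the base (and owned by the device) may issue I/O streams."""
    return base_unit_id < unit_id < base_unit_id + unit_count
```

With the base UnitID set to zero for legacy support, UnitID 0 stays reserved for the host and AGP-initiated traffic uses UnitID 1 and above.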

AGP Ordering Requirements

Three categories of AGP transaction types lead to three separate sets of ordering rules. These categories can be thought of as three separate transaction channels. These three channels are completely independent of each other with respect to ordering, and should have their own UnitIDs. The transaction types are:

PCI-based
Low Priority
High Priority

The specification makes the following observation, which leads to HT-based AGP ordering requirements being slightly less complex than PCI-based requirements:

The ordering rules presented here for reads are somewhat different from what appears in the AGP specification. That document defines ordering between reads in terms of the order that data is returned to the requesting device. We are concerned here with the order in which the reads are seen at the target (generally, main memory). The I/O bridges can reorder returning read data if necessary. This leads to a slightly relaxed set of rules.

See MindShare's AGP System Architecture book for details regarding the AGP ordering rules.

PCI-Based Ordering

AGP transactions based on the PCI protocol follow the same rules as PCI.

Low Priority Ordering

Ordering rules for the low priority AGP transactions are:

Reads (including flushes) must not pass writes.
Writes must not pass writes.
Fences must not pass other transactions or be passed by other transactions.

High Priority Ordering

High priority transactions only carry graphics data using split transactions. Consequently, the Producer/Consumer model has no relevance and ordering requirements can be reduced to the following single rule:

Writes must not pass writes.
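The low priority and high priority rules listed above can be expressed as two small predicates. This is a sketch only: the function names and the transaction-type strings are hypothetical, and any reordering not explicitly forbidden by the listed rules is assumed to be permitted.

```python
def low_priority_may_pass(later, earlier):
    """Return True if `later` may be reordered ahead of `earlier` (low priority)."""
    if "fence" in (later, earlier):
        return False          # fences neither pass nor are passed
    if earlier == "write" and later in ("read", "flush", "write"):
        return False          # reads, flushes, and writes must not pass writes
    return True


def high_priority_may_pass(later, earlier):
    """High priority: the only rule is that writes must not pass writes."""
    return not (later == "write" and earlier == "write")
```

For example, a low priority read may not be moved ahead of an earlier write, while a high priority read may.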

PCI Bus Issues

Several features of the PCI bus must be handled in the correct fashion when interfacing with the HT bus. For background information and details regarding PCI ordering, refer to MindShare's PCI System Architecture book, 4th edition.

PCI Ordering Requirements

Transaction ordering on the PCI bus is based on the Producer/Consumer programming model. This model involves 5 elements:

Producer — PCI master that sources data to a memory target
Target — main memory or any PCI device containing memory
Consumer — PCI master that reads and processes the Producer data from the target
Flag element — a memory or I/O location updated by the producer to indicate that all data has been delivered to the target, and checked by the Consumer to determine when it can begin to read and process the data.
Status element — a memory or I/O location updated by the Consumer to indicate that it has processed all of the Producer data, and checked by the Producer to determine when the next batch of data can be sent.

This model works flawlessly in PCI when all elements reside on the same shared PCI bus. When these elements reside on different PCI buses (i.e. across PCI to PCI bridges), the model can fail without adherence to the PCI ordering rules.
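A toy simulation shows why ordering matters once a bridge sits between the Producer and the target. It is a deliberately simplified sketch: the function `run`, the memory dictionary, and the polling behavior are assumptions made for illustration and do not model real PCI timing.

```python
def run(flag_may_pass_data):
    """Deliver a data write and a flag write through a bridge to target memory,
    then let the Consumer read. Returns what the Consumer observed."""
    memory = {"data": "stale", "flag": 0}
    posted = [("data", "fresh"), ("flag", 1)]  # Producer's writes, in issue order
    if flag_may_pass_data:
        posted.reverse()                       # bridge reorders the two writes
    observed = None
    for location, value in posted:
        memory[location] = value
        # Consumer polls the flag after every delivery and reads data once it is set.
        if observed is None and memory["flag"] == 1:
            observed = memory["data"]
    return observed
```

When the bridge keeps the writes in order, the Consumer sees the new data; when the flag write is allowed to pass the data write, the Consumer reads the old contents.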

The PCI specification, versions 2.2 and 2.3, defines the required transaction ordering rules. These ordering rules are included in this section as review and to identify rules that may have no purpose in some HT designs.

PMW stands for posted memory write.
DRR and DRC stand for Delayed Read Request and Delayed Read Completion, respectively.
DWR and DWC stand for Delayed Write Request and Delayed Write Completion, respectively.
"Yes" specifies that the transaction just latched must be ordered ahead of the previously latched transaction indicated in the column heading.
"No" specifies that the transaction just latched must never be ordered ahead of the previously latched transaction indicated in the column heading.
"Yes/No" entries mean that the transaction just latched is allowed to be ordered ahead of the previously latched operation indicated in the column heading, but such reordering is not required. The Producer/Consumer Model works correctly either way.

Avoiding Deadlocks

PCI ordering rules require that Posted Memory Writes (PMWs) in Row 1 be ordered ahead of the delayed requests and delayed completions listed in columns 2-5. This requirement exists to avoid potential deadlocks. Each of the deadlocks involves a scenario arising from the use of PCI bridges based on earlier versions of the specification. If all PCI bridge designs used in HT platforms are based on 2.1 and later versions of the PCI specification, the PCI ordering rules with "Yes" entries in row 1 can be treated as "Yes/No."

Subtractive Decode

PCI employs a technique referred to as subtractive decode to handle devices that are mapped into memory or I/O address space by user selection of switches and jumpers (e.g. ISA devices). Consequently, configuration software has no knowledge of the resources assigned to these devices. Fortunately, these PC legacy devices are mapped into relatively small ranges of address space that can be reserved by platform configuration software.

Subtractive Decode: The PCI Method

Subtractive decode is a process of elimination. Since configuration software allocates and assigns address space for PCI, HT, AGP and other devices, any access to address locations not assigned can be presumed to target a legacy device, or may be an errant address.

All PCI devices must perform a positive decode to determine if they are being targeted by the current request. This decode must be performed as a fast, medium, or slow decode. The device targeted must indicate that it will respond to the request by signaling device select (DEVSEL#) across the shared bus. When device driver software issues a request with an address that has not been assigned by configuration software, no PCI device is targeted (i.e. no DEVSEL# is asserted within the time allowed). By process of elimination, the subtractive decode agent recognizes that no PCI device has responded; it therefore asserts DEVSEL# and forwards the transaction to the ISA bus, where the request is completed.
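The process of elimination can be sketched as a lookup with a fallback. The function name, the dictionary layout, and the example ranges are hypothetical; real hardware resolves this with DEVSEL# timing rather than a software loop.

```python
def claim_transaction(addr, positive_decoders, subtractive_agent="ISA bridge"):
    """Return the name of the agent that claims addr.

    positive_decoders maps a device name to its (start, end) inclusive range,
    as assigned by configuration software.
    """
    for name, (start, end) in positive_decoders.items():
        if start <= addr <= end:
            return name            # this device asserts DEVSEL#
    return subtractive_agent       # nobody claimed it, so the agent asserts DEVSEL#
```

An address inside an assigned range is claimed by its owner; anything else falls through to the subtractive decode agent.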

Subtractive Decode: HT Systems Requiring Extra Support

When the subtractive decode agent is not at the end of a single-hosted chain, or when more than one HT I/O chain is implemented in a system, subtractive decode becomes more difficult.

The Problem

HyperTransport devices in a chain do not share the same bus as in PCI, so a subtractive decode agent cannot detect if a request has not been claimed by other devices on the chain.

The Solution

As described previously, configuration software assigns addresses to all HT, PCI, and AGP devices. Therefore, the host knows when a request will result in a positive decode and when it will not. The specification requires that all hosts connecting to HyperTransport I/O chains implement registers that identify the positive decode ranges for all HyperTransport technology I/O devices and bridges (except as noted in the simple method). One of these I/O chains may also include a subtractive bridge (typically leading to an ISA or LPC bus). Requests that do not match any of the positive ranges must be issued with the Compat bit set, and must be routed to the chain containing the subtractive decode bridge. This chain is referred to as the compatibility chain.

The Compat bit indicates to the subtractive decode bridge that it should claim the request, regardless of address. Requests that fall within the positive decode ranges must not have the Compat bit set, and are passed to the I/O chain upon which the target device resides. The target chain may be the compatibility or any other I/O chain.
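The host's routing decision can be sketched as follows. The function name `route_request` and the layout of `chain_ranges` are assumptions for illustration; the actual range registers are implementation-specific.

```python
def route_request(addr, chain_ranges, compat_chain):
    """Return (chain, compat_bit) for a host-issued request.

    chain_ranges maps a chain name to a list of (start, end) inclusive
    positive decode ranges held in the host's range registers.
    """
    for chain, ranges in chain_ranges.items():
        if any(start <= addr <= end for start, end in ranges):
            return chain, 0        # positive decode: Compat must be clear
    return compat_chain, 1         # no match: send to the compatibility chain
```

Note that a positively decoded request may still be routed to the compatibility chain; it simply travels there with the Compat bit clear.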

PCI Burst Transactions

PCI permits long burst transactions with either contiguous or discontiguous byte masks (byte enables) that may not be supported by HT. These long bursts must be broken into multiple requests to support the HT protocol as follows:

PCI read requests with discontiguous byte masks that cross aligned 4-byte boundaries must be broken into multiple 4-byte HT RdSized (byte) requests.
PCI write requests with discontiguous byte masks that cross 32-byte boundaries must be broken into multiple 32-byte HT WrSized (byte) requests. Note that the resulting sequence of write requests must be strongly ordered in ascending address order.
PCI write requests with contiguous byte masks that cross 64-byte boundaries must be broken into multiple 64-byte HT WrSized (dword) requests.
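The boundary-splitting step in the rules above can be illustrated with a short helper. The function name is hypothetical, and the sketch only computes address ranges; it does not build HT packets or handle byte masks.

```python
def split_at_boundaries(start, length, boundary=64):
    """Split [start, start+length) into pieces that never cross a boundary."""
    pieces = []
    addr, end = start, start + length
    while addr < end:
        next_boundary = (addr // boundary + 1) * boundary
        chunk_end = min(end, next_boundary)
        pieces.append((addr, chunk_end - addr))
        addr = chunk_end
    return pieces
```

Passing `boundary=64` models the contiguous write case and `boundary=32` the discontiguous write case; the pieces come out in ascending address order.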

The Need For Networking Extensions

While HyperTransport was initially developed to address bandwidth and scalability problems associated with moving data through the I/O subsystems of desktops and servers, the networking extensions bring a number of enhancements which permit the advantages of HyperTransport technology to be extended to communications processing applications. There are some major differences in the requirements of host-centric systems such as desktops and servers and communications processing systems.

Communications Processing Is Often Less Vertical

In communications applications, there may be a number of processors or coprocessors located in various corners of the topology. The host processor may assume responsibility for configuration and control of coprocessors and interface devices, while the coprocessors perform specialized data processing tasks. Because of the distributed responsibility for control and data handling tasks, these systems tend to be much less host processor-centric.

As a result of decentralizing data processing in communications systems, information flow may be omni-directional as coprocessors initiate transactions targeting devices under their control. When switch components are added to the topology, elaborate multi-port configurations are possible.

Summary Of Anticipated Networking Extension Features

Networking Extensions Add Message Semantics

In handling the special problems of communications processing, the HyperTransport networking extensions add message semantics to the storage semantics used in the 1.04 revision of the HyperTransport I/O Link Specification. Storage semantics were described in the last section. Message semantics are more efficient in handling variable length transfers, broadcasting messages, etc. The 64-byte HyperTransport packets are concatenated to form longer messages, and additions to request packet fields identify the start of a message, end of a message, or may even be used to signal the abort of a scheduled transaction. Unlike storage semantics, in which the payload is data targeting an address, messages can also be sent which convey interrupts and other housekeeping events.

Another difference between message semantics and storage semantics is the concept of addressing. In storage semantics, addresses are managed by the source device, and each byte of data transferred is associated with a particular address in the system memory map. This makes sense because the locations are within (and owned by) the device being targeted. In message semantics, the message is tagged as to which stream it belongs, and the destination determines where it goes. The ultimate destination is often external to the system, where the system memory map has no meaning.
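The idea of concatenating 64-byte packets into a longer message can be sketched as below. The `start` and `end` keys stand in for the request packet fields that mark message boundaries; their names and the dictionary representation are hypothetical and do not reflect the actual packet encoding.

```python
def segment_message(payload, max_packet=64):
    """Split payload into packets of up to max_packet bytes, marking start and end."""
    packets = []
    for offset in range(0, len(payload), max_packet):
        chunk = payload[offset:offset + max_packet]
        packets.append({
            "start": offset == 0,
            "end": offset + max_packet >= len(payload),
            "data": chunk,
        })
    return packets


def reassemble(packets):
    """Destination side: concatenate packet data back into the original message."""
    return b"".join(p["data"] for p in packets)
```

No per-byte address appears anywhere in the sketch, which mirrors the point that the destination, not the source, decides where the message goes.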

16 New Posted Write Virtual Channels

Release 1.1 adds 16 new optional Posted Write Virtual Channels to the hardware of each node (above the three already required). Each of these new virtual channels may be given a dedicated bandwidth allocation, and an arbitration mechanism is defined for managing them.

An End-To-End flow control mechanism has also been added to allow devices to put millions of user streams into these 16 additional virtual channels. In this way, very large numbers of independent real-time streams (e.g. audio or video) may be handled.

Direct Peer-to-Peer Transfers Added

HyperTransport supports the full producer-consumer ordering model of PCI. In cases where this strict global ordering is needed, transactions from one HyperTransport I/O device to another (called peer-to-peer transfers) must first move upstream to the host bridge where they are then reissued downstream to the target device (a process HyperTransport calls reflection). Release 1.1 adds the option of sending some traffic directly from peer to peer when the application does not require strict global ordering (it often isn't a concern in communications processing).

Link-Level Error Detection And Handling

With the addition of direct peer-to-peer transfers, Release 1.1 permits coprocessors and other devices to communicate directly without involvement of the host bridge. Along with this capability, network extensions provide for error detection and correction on the individual link level. In the event of an error, the receiver sends information back to the transmitter which causes a re-transmission of the packet. Obviously, the packet can't be consumed or forwarded until its validity is checked.

64 Bit Addressing Option

In keeping with the very large address space of many newer systems, Release 1.05 allows the optional extension of the normal 40-bit HyperTransport request address field to 64 bits.

Increased Number Of Host Transactions

Release 1.05 increases the number of outstanding transactions that a host bridge may have in progress from 32 to 128.

End-To-End Flow Control

In communication systems, there are occasions when devices are transferring packets to distant targets (not immediate neighbors) which may go "not ready" (or to another state which makes them unable to accept traffic) for extended periods. Prior to Release 1.1, HyperTransport devices only have flow control information for their immediate neighbors. Release 1.1 adds new end-to-end flow control packets which distant devices may send to each other to indicate their ability to participate in transfers. If a device is not ready, the source device does not start sending (or continue sending) packets; this helps eliminate bottlenecks which otherwise occur when the flow control buffers of devices in the path between source and target become full of packets which cannot be forwarded.
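The source-side behavior described above can be sketched as a small class. The class and method names are hypothetical, and the assumption that the target starts out ready is made only to keep the example short.

```python
class EndToEndSource:
    """Source that sends only while the distant target reports itself ready."""

    def __init__(self):
        self.target_ready = True   # assumed initial state
        self.sent = []

    def on_flow_control(self, ready):
        """Handle an end-to-end flow control packet from the distant target."""
        self.target_ready = ready

    def send(self, packet):
        """Send packet if the target is ready; otherwise hold it at the source."""
        if not self.target_ready:
            return False
        self.sent.append(packet)
        return True
```

Because held packets never leave the source, the buffers of intermediate devices stay free for traffic headed elsewhere.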

Switch Devices Formally Defined

Finally, Release 1.05 formally defines the switch device type which may be used to help implement the complex topologies required in communications systems. A switch behaves much like a two-level HyperTransport-HyperTransport bridge with multiple secondary interfaces. The basic characteristics of a switch include:

A switch consumes one or more UnitIDs on its host interface. The port attached to the host is the default upstream port.
The switch acts as host bridge for each of its other interfaces. Each interface has its own bus number.
Switches, like bridges, are allowed to reassign UnitID, Sequence ID, and SrcTag for transactions passed to other busses. The switch maintains a table of outstanding (non-posted) requests in order to handle returning responses.
Switches may be programmed to perform address translation.
Switches must maintain full producer-consumer ordering for all combinations of transaction paths.
Switches must provide a method for configuration of downstream devices on all ports.
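The table of outstanding non-posted requests mentioned in the list above might look like the following. This is a sketch under stated assumptions: the class name, the tag pool size of 32, and the choice to reassign only the SrcTag are illustrative.

```python
class SwitchPort:
    """Tracks non-posted requests forwarded through one switch port."""

    def __init__(self, tag_count=32):          # 32 tags is an assumed pool size
        self.free_tags = list(range(tag_count))
        self.outstanding = {}                  # new SrcTag -> (orig UnitID, orig SrcTag)

    def forward_request(self, unit_id, src_tag):
        """Reassign the SrcTag and remember the original identifiers."""
        if not self.free_tags:
            return None                        # no tag free: request must wait
        new_tag = self.free_tags.pop(0)
        self.outstanding[new_tag] = (unit_id, src_tag)
        return new_tag

    def route_response(self, new_tag):
        """Restore the original identifiers and release the tag."""
        original = self.outstanding.pop(new_tag)
        self.free_tags.append(new_tag)
        return original
```

When a response returns carrying the reassigned tag, the switch looks up the original UnitID and SrcTag so the response can be delivered to the requester.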

Server And Desktop Topologies Are Host-Centric

A typical desktop or server platform is somewhat vertical. It has one or more processors at the top of the topology, the I/O subsystem at the bottom, and main system DRAM memory in the middle acting as a holding area for processor code and data as well as the source and destination for I/O DMA transactions performed on behalf of the host processor(s). The host processor plays the central role in both device control and in processing data; this is sometimes referred to as managing both the control plane and the data plane.

HyperTransport works well in this dual role because of its bandwidth and the fact that the protocol permits control information including configuration cycles, error handling events, interrupt messages, flow control, etc. to travel over the same bus as data — eliminating the need for a separate control bus or additional sideband signals.

Upstream And Downstream Traffic

There is a strong sense of upstream and downstream data flow in server and desktop systems because very little occurs in the system that is not under the direct control of the processor, acting through the host bridge. Nearly all I/O initiated requests move upstream and target main memory; peer-peer transactions between I/O devices are the infrequent exception.

Storage Semantics In Servers And Desktops

Without the addition of networking extensions, HyperTransport protocol follows the conventional model used in desktop and server busses (CPU host bus, PCI, PCI-X, etc.) in which all data transfers are associated with memory addresses. A write transaction is used to store a data value at an address location, and a read transaction is used to later retrieve it. This is referred to as associating storage semantics with memory addresses. The basic features of the storage semantics model include:

Targets Are Assigned An Address Range In Memory Map

At boot time, the amount of DRAM in the system is determined and a region at the beginning of the system address map is reserved for it. In addition, each I/O device conveys its resource requirements to configuration software, including the amount of prefetchable or non-prefetchable memory-mapped I/O address space it needs in the system address map. Once the requirements of all target devices are known, configuration software assigns the appropriate starting address to each device; the target device then "owns" the address range between the start address and the start address plus the request size.
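Address assignment of this kind can be sketched as a simple allocator. The function name is hypothetical, and two simplifications are assumed: requested sizes are powers of two, and each range is aligned to its own size. Real platforms typically apply further placement constraints.

```python
def assign_address_ranges(dram_size, requests):
    """Place DRAM at address 0, then give each device a naturally aligned range.

    requests maps a device name to the size it asks for (a power of two here).
    Returns {name: (start, end_exclusive)}.
    """
    assigned = {"dram": (0, dram_size)}
    next_free = dram_size
    for name, size in requests.items():
        start = (next_free + size - 1) // size * size   # round up to the device's size
        assigned[name] = (start, start + size)
        next_free = start + size
    return assigned
```

Each device then owns the range from its start address up to the start address plus the requested size.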

Each Byte Transferred Has A Unique Target Address

In storage semantics, each data packet byte is associated with a unique target address. The first byte in the data packet payload maps to the start address and successive data packet bytes are assumed to be in sequential addresses following the start address.

The Requester Manages Target Addresses

An important aspect of storage semantics is the fact that the requester is completely responsible for managing transaction addresses within the intended target device. The target has no influence over where the data is placed during write operations or retrieved in read operations.

In HyperTransport, the requester generates request packets containing the target start address, then exchanges packets with the target device. The maximum packet data payload is 64 bytes (16 dwords). Transfers larger than 64 bytes consist of multiple discrete transactions, each to an adjusted start address. Using HyperTransport's storage semantics, an ordered sequence of transactions may be initiated using posted writes or by including a non-zero SeqID field in the non-posted requests, but there is no concept of streaming data, per se.

Storage Semantics Work Fine In Servers And Desktops

As long as each requester is programmed to know the addresses it must target, managing address locations from the initiator side works well for general purpose data PIO, DMA, and peer-peer exchanges involving CPU(s), memory and I/O devices. When the target is prefetchable memory, storage semantics also help support performance enhancements such as write-posting, read pre-fetching, and caching — all of which depend on a requester having full control of target addresses.

1.04 Protocol Optimized For Host-Centric Systems

Because the HyperTransport I/O Link Protocol was initially developed as an alternative to earlier server and desktop bus protocols that use storage semantics (e.g. PCI), the 1.04 revision of the protocol is optimized to improve performance while maintaining backwards compatibility in host-centric systems:

The strongly ordered producer-consumer model used in PCI transactions which guarantees flag and data coherence regardless of the location of the producer, consumer, flag location, or data storage location is available in the HyperTransport protocol.
Virtual channel ordering may optionally be relaxed in transfers where the full producer-consumer model is not required.
The strong sense of upstream and downstream traffic on busses such as PCI is also preserved in HyperTransport. Programmed I/O (PIO) transactions move downstream from CPU to I/O device via the host bridge. I/O bus master transactions move upstream towards main memory.
Direct peer-peer transfers are not supported in the 1.04 revision of the HyperTransport I/O Link Specification; requests targeting interior devices must travel up to the host bridge, then be reissued (reflected) back downstream towards the target.

All of the above features work well for what they are intended to do: support a host-centric system in which control and data processing functions are both handled by the host processor(s), and I/O devices perform DMA data transfers using main system memory as a source and sink for data.

Some Systems Are Not Host-Centric

Unlike server and desktop computers, some processing applications do not lend themselves well to a host-centric topology. This includes cases where there are multiple levels of processing, complex look-up functions, protocol translation, etc. In these cases, a single processor (or even multiple CPUs on a host bus) can quickly become a bottleneck. Often what works more effectively is to assign control functions to a host processor and distribute data processing functions across multiple co-processors under its control. In some cases, pipeline (cascaded) co-processing is used to reduce latency.

X86 Power Management Support

X86 power management is based on the ACPI specification for the Windows operating environment. The specification defines specific timing requirements associated with STPCLK and SMI message cycles related to power management events. The specification also describes ACPI-defined system state transitions that relate to wakeup event signaling via LDTREQ#. See the specification for reference information related to these events.

Stop Clock Signal

STPCLK# is one of the basic x86 power management signals. When power management logic asserts this signal, it places the CPU into its Stop Grant State, which has the following effects (Intel PIII example). The processor:

issues a Stop Grant Acknowledge transaction
stops driving the AGTL FSB signals, allowing them to return to the minimum power state (pulled up by termination resistors to VTT)
turns off clocks to internal architecture regions, except external bus (FSB) and interrupt sections (e.g. IOAPIC).
latches incoming interrupts, but does not service them until the CPU returns to the Normal State.
handles requests for Snoop transactions on the FSB; to do this the CPU transitions to the HALT/Grant Snoop State to perform the snoop, then returns to the Stop Grant State upon completion.

When STPCLK# is deasserted, the CPU returns to the Normal State. Many newer CPUs have an additional signal which may be used to expand the number of low power states. For example, the Intel Pentium III has a SLP# (Sleep) signal used in conjunction with STPCLK# to drive the CPU into a very deep low power state (e.g., clocks are stopped, no interrupts are recognized, and no snoops are performed). This is the next best thing to being powered down completely, and the time to recover to normal operation is much faster.
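The STPCLK# behavior described in this section can be summarized as a small state machine. The state and event names below are hypothetical labels for the states named in the text; the deeper SLP# state is left out.

```python
TRANSITIONS = {
    ("NORMAL", "stpclk_assert"): "STOP_GRANT",
    ("STOP_GRANT", "snoop_request"): "HALT_GRANT_SNOOP",
    ("HALT_GRANT_SNOOP", "snoop_done"): "STOP_GRANT",
    ("STOP_GRANT", "stpclk_deassert"): "NORMAL",
}


def next_state(state, event):
    """Return the new state, or the unchanged state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run_events(events, state="NORMAL"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

A snoop arriving during Stop Grant takes the CPU briefly into the HALT/Grant Snoop State and back, without returning it to the Normal State.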

Two Types Of Double-Hosted Chains

There are two basic arrangements for double-hosted chains: sharing and non-sharing.

Sharing Double-Hosted Chain

In a sharing double-hosted chain, traffic is allowed to flow from end to end. Either host may target any of the devices in the chain, including the other host. In this arrangement, one host is the master host bridge and the other is the slave host bridge. The determination about which host is master or slave is not defined in the specification, but must be defined before reset occurs. Most likely, the system board layout will determine master/slave host bridges — possibly through a strapping option on the motherboard.

If Possible, Assign All Devices To Master Host Bridge

The HyperTransport specification recommends that all resources in a sharing double-hosted chain be assigned to the master host bridge if possible; this eliminates a potential deadlock condition in peer-to-peer transactions. The Slave Command Register Master Host and Default Direction bits in PCI configuration space are used to program tunnel devices with the information needed to recognize the "upstream vs. downstream" directions. This is important because interior devices always issue requests and responses in the upstream direction. They only accept responses in the downstream direction.

If Slave Must Access Devices, It Uses Peer-to-Peer Transfers

The slave host in a sharing double-hosted chain may be required to access the devices on the link. To do so, it may have its Command Register Act as Slave bit set = 1. When this is done, all packets it issues travel first to the master host bridge where they are reissued back to the target devices as peer-to-peer transactions.

Non-Sharing Double-Hosted Chain

A non-sharing double-hosted chain appears logically as two distinct chains with a host bridge at each end.

Software May Break The Chain

Software chooses a point to break the chain in two parts and then:

While the link is idle, the link between the two tunnel devices is broken by programming the End Of Chain (EOC) bits in the appropriate tunnel Link Control registers on each side. The Transmit Off bit in each of the Link Control registers can also be set.
The slave host bridge writes to the Slave Command register for each device now under its control to force the Master Host and Default Direction bits in each to point at the slave host bridge.
Unique bus numbers are assigned to each segment in a non-sharing double-hosted chain. The bus number is used so that chains may be uniquely identified and so type 1 configuration cycles may be forwarded and/or converted to type 0 cycles by bridges.
If peer-to-peer transactions are not required, software link partitioning can also be used for load balancing.
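The partitioning steps above can be sketched as a toy model. Everything here is an assumption for illustration: the `Tunnel` class, the convention that link index 0 faces the master end and index 1 the slave end, and the `owner` and `bus` fields that stand in for the Slave Command and bus-number programming. It is not a register-accurate model.

```python
from dataclasses import dataclass, field


def _new_links():
    # One Link Control stand-in per link: index 0 faces the master end,
    # index 1 faces the slave end (assumed layout).
    return [{"eoc": False, "tx_off": False} for _ in range(2)]


@dataclass
class Tunnel:
    name: str
    link_ctrl: list = field(default_factory=_new_links)
    owner: str = "master"  # host bridge the Master Host bit points at
    bus: int = 0


def break_chain(tunnels, split, master_bus=0, slave_bus=1):
    """Split tunnels (ordered master end -> slave end) before index `split`."""
    if not 0 < split < len(tunnels):
        raise ValueError("split must fall between two tunnels")
    left, right = tunnels[split - 1], tunnels[split]
    # Step 1: mark both sides of the chosen link as End Of Chain
    # and (optionally, per the text) turn their transmitters off.
    left.link_ctrl[1].update(eoc=True, tx_off=True)
    right.link_ctrl[0].update(eoc=True, tx_off=True)
    # Step 2: the slave host bridge claims the devices on its side.
    for t in tunnels[split:]:
        t.owner = "slave"
        t.bus = slave_bus
    # Step 3: each segment gets its own bus number.
    for t in tunnels[:split]:
        t.bus = master_bus
    return tunnels[:split], tunnels[split:]
```

Choosing a different `split` point changes how many devices each host bridge serves, which is the load-balancing use mentioned above.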

Additional Notes About Double-Hosted Chains

Initialization In A Double-Hosted Chain

One of the responsibilities of a master host bridge in a double-hosted chain is to help with initialization after reset. Following low-level link initialization, the slave host bridge "sleeps" pending setup by the master. The basic steps in master initialization include:

The master host bridge sets the Slave Command CSR Master Host bit to point towards the master host bridge in all slave devices it finds. This bit is set automatically whenever the Slave Command CSR is written.
When the master host bridge discovers the slave host bridge, it sets the Host Command CSR Double Ended bit in both its own and the slave's Host Command register. This informs the slave (when it wakes up) that it is in a double-hosted chain and that it is not required to configure devices below it.
If the Double Ended bit is not set in the slave, it will initialize its end of the double ended chain when it awakens.
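The initialization sequence above can be sketched as follows. The dictionary-based chain, the `kind` field, and both function names are hypothetical; the sketch only mirrors the ordering of the steps and the effect of the Double Ended bit on the slave host bridge when it wakes.

```python
def master_init(chain):
    """Walk the chain outward from the master host bridge (chain[0]).

    Each node is a dict with a "kind" of "device" or "host".
    """
    master = chain[0]
    for node in chain[1:]:
        if node["kind"] == "device":
            # Writing the Slave Command CSR records which way the master lies.
            node["master_host_points_to"] = "master"
        elif node["kind"] == "host":
            # Far-end host bridge found: flag both ends as double ended.
            master["double_ended"] = True
            node["double_ended"] = True
            break


def slave_wakeup(slave):
    """Decide what the slave host bridge does when it wakes."""
    if slave.get("double_ended"):
        return "skip device configuration"
    return "initialize own end of chain"
```

If `master_init` never reaches the far-end host bridge, the slave's Double Ended flag stays clear and `slave_wakeup` falls through to configuring its own end of the chain, matching the last step in the list.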

Type 0 Configuration Cycles In A Double-Hosted Chain

Because all host bridges tend to own UnitID 0, a configuration cycle carrying a device number field of "0" in a double-hosted chain might be misinterpreted. The direction in which a type 0 configuration cycle request is traveling determines which host bridge is the target. If configuration software wishes to prevent a host bridge (e.g. the slave host) in a double-hosted chain from accessing another host's configuration space, the Host Command register Host Hide bit may be set to 1.
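A minimal sketch of the direction rule, under stated assumptions: a request for device number 0 is claimed by whichever host bridge it is traveling toward, and a host bridge whose Host Hide bit is set does not expose its configuration space. Returning `None` for a hidden host is a simplification chosen for this example, not the specification's defined response, and the function and parameter names are hypothetical.

```python
def type0_device0_target(direction, master_hidden=False, slave_hidden=False):
    """Pick the host bridge addressed by a type 0 cycle for device number 0.

    direction is "toward_master" or "toward_slave".
    Returns the target's name, or None if that host's space is hidden.
    """
    if direction == "toward_master":
        return None if master_hidden else "master"
    if direction == "toward_slave":
        return None if slave_hidden else "slave"
    raise ValueError("direction must be 'toward_master' or 'toward_slave'")
```

Setting `master_hidden=True` models the case in the text where the slave host is kept out of the master's configuration space.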
