<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5449984521470454692</id><updated>2011-07-28T08:18:42.251-07:00</updated><category term='HyperTransport'/><title type='text'>Hypertransport CPU technology</title><subtitle type='html'>The Ultimate Blog on HyperTransport Technology. Find all the information you will ever need here.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>47</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-7069468842378470443</id><published>2007-06-26T22:15:00.000-07:00</published><updated>2007-06-26T22:16:20.578-07:00</updated><title type='text'>ISA/LPC Buses</title><content type='html'>The ISA and LPC buses typically reside on the PCI bus. These buses support bus mastering and legacy DMA transfers. 
These devices differ from HT and PCI devices in that they do not support either split transactions or retries.&lt;br /&gt;&lt;br /&gt;Deadlocks&lt;br /&gt;The specification defines two possible deadlock conditions that can occur because the ISA and LPC (Low Pin-Count) buses do not support transaction retry. For example, if an ISA (LPC) Master initiates a transaction that requires a response, the bus cannot handle a new request prior to the current transaction having completed. This type of protocol is extremely simple from an ordering perspective because all transactions must complete before the next one begins; thus, no ordering rules are required. Of course the downside to this approach is that all other devices are stalled while they wait for the current transaction to complete. Delayed transactions supported by the PCI bus and split transactions supported by PCI-X and HyperTransport can handle new transactions while a response to a previous transaction is pending. The price — complex ordering rules to ensure that transactions complete in the intended order.&lt;br /&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;Deadlock Scenario 1&lt;/h5&gt; &lt;p class="docText"&gt;Consider the following sequence of events as they relate to the  limitations of the ISA/LPC bus as discussed above and to the PCI-based  Producer/Consumer transaction ordering model. 
&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;An ISA/LPC Master initiates a transaction that requires a response from the Host-to-HT Bridge (e.g., a memory read from main memory).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The CPU initiates a write operation targeting a device on the ISA/LPC bus, and the Host Bridge issues this write as a posted operation.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The posted write reaches the HT-to-PCI bridge, where it is sent across the PCI bus to the south bridge.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The south bridge cannot accept the write targeting the ISA bus because the ISA/LPC bus is waiting for the outstanding response. So, the south bridge issues a retry.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The read response reaches the HT/PCI bridge. However, the Producer/Consumer model requires that all previously posted writes headed to the PCI bus be completed before sending a read response. The read response is now stuck behind a posted write that cannot complete prior to the read response. Result: Deadlock!&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;p class="docText"&gt;The recommended solution to this problem is to require that all requests targeting the ISA/LPC bus be non-posted operations. 
This eliminates the problem because non-posted operations can be forwarded to the PCI bus in any order.&lt;/p&gt;&lt;a name="ch20lev3sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Deadlock Scenario 2&lt;/h5&gt; &lt;p class="docText"&gt;Once again, because the ISA or LPC bus is unable to accept any requests while it waits for a response to its own requests, a deadlock can occur. This deadlock can occur when the downstream non-posted request channel fills up while awaiting a response to an ISA DMA request. The sequence of events is as follows:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A DMA request is issued by an ISA/LPC device to main memory.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Downstream requests targeting the ISA bus are initiated but stack up because they are not being accepted by the south bridge, which is waiting on a response to the previously issued DMA request. Consequently, it is possible for the downstream non-posted request channel to fill.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A peer-to-peer operation is initiated to a device on the same chain that is in the non-posted request queue ahead of the ISA/LPC request (in step 1). This peer-to-peer transaction is sent to the Host, which attempts to reflect the transaction downstream to the target device. However, because the downstream request channel is full, the upstream non-posted peer request stalls, as does the request from the ISA bus. 
This prevents the ISA/LPC bridge from  making forward progress.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;The solution to this deadlock is for the host to limit the  number of requests it makes to the ISA/LPC bus to a known number of requests  (typically one) that the bridge can accept. Because the host cannot limit peer  requests without eventually blocking the upstream nonposted channel (and causing  another deadlock), no peer requests to the ISA/LPC bus are allowed. Peer  requests to devices below the ISA/LPC bridge on the chain (including other  devices in the same node as the ISA/LPC bridge) cannot be performed without  deadlock unless the ISA/LPC bridge sinks the above mentioned known number of  requests without blocking requests forwarded down the chain. This can be  implemented with a buffer (or set of buffers) for requests targeting the bridge,  but separate from the buffering for other requests.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-7069468842378470443?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/7069468842378470443/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=7069468842378470443' title='45 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7069468842378470443'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7069468842378470443'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/isalpc-buses.html' title='ISA/LPC Buses'/><author><name>Info 
Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>45</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-4381989242489922364</id><published>2007-06-26T22:13:00.000-07:00</published><updated>2007-06-26T22:15:18.008-07:00</updated><title type='text'>AGP Bus Issues</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch20lev1sec4"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;&lt;a name="idd1e42147"&gt;&lt;/a&gt;AGP Bus Issues&lt;/h3&gt;&lt;a name="ch20lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;AGP&lt;a name="idd1e42155"&gt;&lt;/a&gt; Configuration Space Requirements&lt;/h4&gt; &lt;p class="docText"&gt;Some legacy operating systems require that the AGP &lt;a name="idd1e42162"&gt;&lt;/a&gt;capability registers be mapped at Bus 0, Device 0, Function 0. Also, the AGP aperture base address configuration register must be at Bus 0, Device 0, Function 0, Offset 10h. 
In a legacy system, these registers are located within the Host to PCI bridge configuration space (Host to HT bridge in our example).&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;For complete legacy software support, the specification recommends that the AGP subsystem be designed as follows:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;AGP bridges are placed logically on HyperTransport chain 0 (Bus 0).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The AGP interface uses multiple UnitIDs because AGP configuration is split between the Host to HT bridge and the Host to AGP bridge (i.e., virtual PCI to PCI bridge).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;During initialization, the base UnitID of an AGP device must be assigned a non-zero value to support configuration of chain 0. Following HT initialization, the base UnitID should be changed to zero.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Device number zero, derived from the base UnitID register value, should contain the capabilities header and the AGP aperture base address register (at Offset 10h).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Device number 1, derived from the base UnitID+1, should be used for the Host to AGP bridge.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The UnitID that matches the base (0) is not used for any AGP-initiated I/O streams or responses so that there is no conflict with host-initiated I/O streams or responses. Only UnitIDs greater than the base may be used for I/O streams.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Legacy implementations place the AGP graphics address remapping table (GART) in the host. Thus, the AGP aperture base address register and any other registers that are located in the AGP device but required by the host are copied by software into implementation-specific host registers. 
These implementation-specific registers should be placed somewhere other than Device 0, to avoid conflicts with other predefined AGP registers. In a sharing double-hosted chain, this requires the hosts to implement the Device Number field so that the hosts may address each other after the AGP bridge has assumed Device 0. &lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;Note that if legacy OS support is not required, the AGP device's base UnitID register may be programmed to any permissible value.&lt;/p&gt;&lt;a name="ch20lev2sec9"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e42247"&gt;&lt;/a&gt;AGP Ordering Requirements&lt;/h4&gt; &lt;p class="docText"&gt;Three categories of AGP transaction types lead to three separate sets of ordering rules. These categories can be thought of as three separate transaction channels. These three channels are completely independent of each other with respect to ordering, and should have their own UnitIDs. The transaction types are:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;PCI-based&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Low Priority&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;High Priority&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The specification makes the following observation, which leads to HT-based AGP ordering requirements being slightly less complex than PCI-based requirements:&lt;/p&gt; &lt;blockquote&gt; &lt;p class="docList"&gt;The ordering rules presented here for reads are somewhat different from what appears in the AGP specification. That document defines ordering between reads in terms of the order that data is returned to the requesting device. We are concerned here with the order in which the reads are seen at the target (generally, main memory). The I/O bridges can reorder returning read data if necessary. 
This leads to a slightly relaxed set of  rules.&lt;/p&gt; &lt;/blockquote&gt; &lt;p class="docText"&gt;See MindShare's AGP System Architecture book for details  regarding the AGP ordering rules.&lt;/p&gt;&lt;a name="ch20lev3sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;PCI-Based Ordering&lt;/h5&gt; &lt;p class="docText"&gt;AGP transactions based on the PCI protocol follow the same  rules as PCI. &lt;/p&gt;&lt;a name="ch20lev3sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Low Priority Ordering&lt;/h5&gt; &lt;p class="docText"&gt;Ordering rules for the low priority AGP transactions are:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Reads (including flushes) must not pass writes.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Writes must not pass writes.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Fences must not pass other transactions or be passed by other  transactions.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch20lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;High Priority Ordering&lt;/h5&gt; &lt;p class="docText"&gt;High priority transactions only carry graphics data using split  transactions. 
Consequently, the &lt;a name="idd1e42343"&gt;&lt;/a&gt;Producer/Consumer model  has no relevance and ordering requirements can be reduced to the following  single rule:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Writes must not pass writes.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/4381989242489922364/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=4381989242489922364' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/4381989242489922364'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/4381989242489922364'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/agp-bus-issues.html' title='AGP Bus Issues'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-5025354385635851602</id><published>2007-06-26T22:12:00.000-07:00</published><updated>2007-06-26T22:13:51.322-07:00</updated><title type='text'>PCI Bus Issues</title><content type='html'>&lt;p class="docText"&gt;Several features of the PCI bus must be 
handled in the correct  fashion when interfacing with the HT bus. For background information and details  regarding PCI ordering, refer to MindShare's PCI System Architecture book, 4th  edition.&lt;/p&gt;&lt;a name="ch20lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;PCI Ordering Requirements&lt;a name="idd1e40626"&gt;&lt;/a&gt;&lt;/h4&gt; &lt;p class="docText"&gt;Transaction ordering on the PCI bus is based on the &lt;a name="idd1e40635"&gt;&lt;/a&gt;Producer/Consumer programming model. This model involves 5  elements:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Producer —&lt;/span&gt; PCI master that  sources data to a memory target&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Target —&lt;/span&gt; main memory or any PCI  device containing memory&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Consumer —&lt;/span&gt; PCI master that  reads and processes the Producer data from the target&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Flag element —&lt;/span&gt; a memory or I/O  location updated by the producer to indicate that all data has been delivered to  the target, and checked by the Consumer to determine when it can begin to read  and process the data.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Status element —&lt;/span&gt; a memory or  I/O location updated by the Consumer to indicate that it has processed all of  the Producer data, and checked by the Producer to determine when the next batch  of data can be 
sent.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;This model works flawlessly in PCI when all elements reside on the same shared PCI bus. When these elements reside on different PCI buses (i.e., across PCI to PCI bridges), the model can fail without adherence to the PCI ordering rules.&lt;/p&gt; &lt;p class="docText"&gt;The PCI specification, versions 2.2 and 2.3, defines the required transaction ordering rules. These ordering rules are included in this section as review and to identify rules that may have no purpose in some HT designs. &lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;PMW stands for posted memory write.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;DRR and DRC stand for Delayed Read Request and Delayed Read Completion, respectively.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;DWR and DWC stand for Delayed Write Request and Delayed Write Completion, respectively.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;"Yes" specifies that the transaction just latched must be ordered ahead of the previously latched transaction indicated in the column heading.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;"No" specifies that the transaction just latched must never be ordered ahead of the previously latched transaction indicated in the column heading.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;"Yes/No" means that the transaction just latched is allowed to be ordered ahead of the previously latched transaction indicated in the column heading, but such reordering is not required. 
The &lt;a name="idd1e40758"&gt;&lt;/a&gt;Producer/Consumer Model works correctly either way.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;Avoiding Deadlocks&lt;/h5&gt; &lt;p class="docText"&gt;&lt;a name="idd1e40940"&gt;&lt;/a&gt;PCI ordering rules require that Posted Memory Writes (PMWs) in Row 1 be ordered ahead of the delayed requests and delayed completions listed in columns 2-5. This requirement is based on avoiding potential deadlocks. Each of the deadlocks involves scenarios arising from the use of PCI bridges based on earlier versions of the specification. If all PCI bridge designs used in HT platforms are based on 2.1 and later versions of the PCI specification, the PCI ordering rules with "Yes" entries in row 1 can be treated as "Yes/No."&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e40986"&gt;&lt;/a&gt;Subtractive Decode&lt;/h4&gt; &lt;p class="docText"&gt;PCI employs a technique referred to as subtractive decode to handle devices that are mapped into memory or I/O address space by user selection of switches and jumpers (e.g., ISA devices). Consequently, configuration software has no knowledge of the resources assigned to these devices. Fortunately, these PC legacy devices are mapped into relatively small ranges of address space that can be reserved by platform configuration software.&lt;/p&gt;&lt;a name="ch20lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Subtractive Decode: The PCI Method&lt;/h5&gt; &lt;p class="docText"&gt;Subtractive decode is a process of elimination. Since configuration software allocates and assigns address space for PCI, HT, AGP, and other devices, any access to an unassigned address can be presumed either to target a legacy device or to be an errant address.&lt;/p&gt; &lt;p class="docText"&gt;All PCI devices must perform a positive decode to determine if they are being targeted by the current request. 
This decode must be performed as a fast, medium, or slow decode. The device targeted must indicate that it will respond to the request by signaling device select (DEVSEL#) across the shared bus. When device driver software issues a request with an address that has not been assigned by configuration software, no PCI device is targeted (i.e., no DEVSEL# is asserted within the time allowed). By process of elimination, the subtractive decode agent recognizes that no PCI device has responded and therefore it asserts DEVSEL# and forwards the transaction to the ISA bus, where the request is completed.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Subtractive Decode: HT Systems Requiring Extra Support&lt;/h5&gt; &lt;p class="docText"&gt;When the subtractive decode agent is not at the end of a single-hosted chain, or when more than one HT I/O chain is implemented in a system, subtractive decode becomes more difficult.&lt;/p&gt;&lt;a name="ch20lev4sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;The Problem&lt;/h5&gt; &lt;p class="docText"&gt;HyperTransport devices in a chain do not share the same bus as in PCI, so a subtractive decode agent cannot detect whether a request has gone unclaimed by the other devices on the chain.&lt;/p&gt;&lt;a name="ch20lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;The Solution&lt;/h5&gt; &lt;p class="docText"&gt;As described previously, configuration software assigns addresses to all HT, PCI, and AGP devices. Therefore, the host knows when a request will result in a positive decode and when it will not. The specification requires that all hosts connecting to HyperTransport I/O chains implement registers that identify the positive decode ranges for all HyperTransport technology I/O devices and bridges (except as noted in the simple method). One of these I/O chains may also include a subtractive bridge (typically leading to an ISA or LPC bus). 
Requests that do not match any of the positive ranges must  be issued with the &lt;a name="idd1e41095"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;compat  bit&lt;/span&gt; set, and must be routed to the chain containing the subtractive  decode bridge. This chain is referred to as the compatibility chain.&lt;/p&gt; &lt;p class="docText"&gt;The Compat bit indicates to the subtractive decode bridge that  it should claim the request, regardless of address. Requests that fall within  the positive decode ranges must not have the Compat bit set, and are passed to  the I/O chain upon which the target device resides. The target chain may be the  compatibility or any other I/O chain.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;PCI Burst Transactions&lt;/h4&gt; &lt;p class="docText"&gt;PCI permits long burst transactions with either contiguous or  discontiguous &lt;a name="idd1e41547"&gt;&lt;/a&gt;byte masks (byte enables) that may not be  supported by HT. These long bursts must be broken into multiple requests to  support the HT protocol as follows:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;PCI read requests with discontiguous byte masks that cross  aligned 4-byte boundaries must be broken into multiple 4-byte HT RdSized (byte)  requests.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;PCI write requests with discontiguous byte masks that cross  32-byte boundaries must be broken into multiple 32-byte HT WrSized (byte)  requests. 
Note that the resulting sequence of write requests must be strongly ordered in ascending address order.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;PCI write requests with contiguous byte masks that cross 64-byte boundaries must be broken into multiple 64-byte HT WrSized (dword) requests.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/5025354385635851602/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=5025354385635851602' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5025354385635851602'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5025354385635851602'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/pci-bus-issues.html' title='PCI Bus Issues'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-8116407561686531360</id><published>2007-06-26T22:11:00.000-07:00</published><updated>2007-06-26T22:12:18.641-07:00</updated><title type='text'>The Need For Networking Extensions</title><content 
type='html'>&lt;p class="docText"&gt;While HyperTransport was initially developed to address  bandwidth and scalability problems associated with moving data through the I/O  subsystems of desktops and servers, the networking extensions bring a number of  enhancements which permit the advantages of HyperTransport technology to be  extended to communications processing applications. There are some major  differences in the requirements of host-centric systems such as desktops and  servers and communications processing systems.&lt;/p&gt;&lt;a name="ch19lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Communications Processing Is Often Less Vertical&lt;/h4&gt; &lt;p class="docText"&gt;In communications applications, there may be a number of  processors or coprocessors located in various corners of the topology. The host  processor may assume responsibility for configuration and control of  coprocessors and interface devices, while the coprocessors perform specialized  data processing tasks. Because of the distributed responsibility for control and  data handling tasks, these systems tend to be much less host  processor-centric.&lt;/p&gt; &lt;p class="docText"&gt;As a result of decentralizing data processing in communications  systems, information flow may be omni-directional as coprocessors initiate  transactions targeting devices under their control. 
When switch components are added to the topology, elaborate multi-port configurations are possible.&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;Summary Of Anticipated Networking Extension Features&lt;/h3&gt; &lt;a name="ch19lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Networking Extensions Add&lt;a name="idd1e40302"&gt;&lt;/a&gt; Message Semantics&lt;/h4&gt; &lt;p class="docText"&gt;In handling the special problems of communications processing, the HyperTransport networking extensions add &lt;span class="docEmphasis"&gt;message semantics&lt;/span&gt; to the &lt;span class="docEmphasis"&gt;storage semantics&lt;/span&gt; used in the 1.04 revision of the HyperTransport I/O Link Specification. Storage semantics were described in the last section. Message semantics are more efficient in handling variable length transfers, broadcasting messages, etc. The 64-byte HyperTransport packets are concatenated to form longer messages, and additions to request packet fields identify the start of a message, the end of a message, or may even be used to signal the abort of a scheduled transaction. Unlike storage semantics, in which the payload is data targeting an address, messages can also be sent which convey interrupts and other housekeeping events.&lt;/p&gt; &lt;p class="docText"&gt;Another difference between message semantics and storage semantics is the concept of addressing. In storage semantics, addresses are managed by the source device, and each byte of data transferred is associated with a particular address in the system memory map. This makes sense because the locations are within (and owned by) the device being targeted. In message semantics, each message is tagged with the stream to which it belongs, and the destination determines where it goes. 
The ultimate destination is often external to the system, where the system memory map has no meaning.&lt;/p&gt;&lt;a name="ch19lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;16 New Posted Write&lt;a name="idd1e40323"&gt;&lt;/a&gt; Virtual Channels&lt;/h4&gt; &lt;p class="docText"&gt;Release 1.1 adds 16 new optional Posted Write Virtual Channels to the hardware of each node (above the three already required). Each of these new virtual channels may be given a dedicated bandwidth allocation, and an arbitration mechanism is defined for managing them.&lt;/p&gt; &lt;p class="docText"&gt;An End-To-End flow control mechanism has also been added to allow devices to put millions of &lt;span class="docEmphasis"&gt;user streams&lt;/span&gt; into these 16 additional virtual channels. In this way, very large numbers of independent real-time streams (e.g., audio or video) may be handled.&lt;/p&gt;&lt;a name="ch19lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Direct&lt;a name="idd1e40341"&gt;&lt;/a&gt; Peer-to-Peer Transfers Added&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport supports the full producer-consumer ordering model of PCI. In cases where this strict global ordering is needed, transactions from one HyperTransport I/O device to another (called peer-to-peer transfers) must first move upstream to the host bridge where they are then reissued downstream to the target device (a process HyperTransport calls &lt;span class="docEmphasis"&gt;reflection&lt;/span&gt;). Release 1.1 adds the option of sending some traffic directly from peer to peer when the application does not require strict global ordering (it often isn't a concern in communications processing). 
&lt;/p&gt;&lt;a name="ch19lev2sec9"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Link-Level Error Detection And Handling&lt;/h4&gt; &lt;p class="docText"&gt;With the addition of direct peer-to-peer transfers, Release 1.1  permits coprocessors and other devices to communicate directly without  involvement of the host bridge. Along with this capability, network extensions  provide for error detection and correction on the individual link level. In the  event of an error, the receiver sends information back to the transmitter which  causes a re-transmission of the packet. Obviously, the packet can't be consumed  or forwarded until its validity is checked.&lt;/p&gt;&lt;a name="ch19lev2sec10"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;64 Bit Addressing Option&lt;/h4&gt; &lt;p class="docText"&gt;In keeping with the very large address space of many newer  systems, Release 1.05 allows the optional extension of the normal 40-bit  HyperTransport request address field to 64 bits.&lt;/p&gt;&lt;a name="ch19lev2sec11"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Increased Number Of Host Transactions&lt;/h4&gt; &lt;p class="docText"&gt;Release 1.05 increases the number of outstanding transactions  that a host bridge may have in progress from 32 to 128.&lt;/p&gt;&lt;a name="ch19lev2sec12"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;End-To-End Flow Control&lt;/h4&gt; &lt;p class="docText"&gt;In communication systems, there are occasions when devices are  transferring packets to distant targets (not immediate neighbors) which may go  "not ready" (or to another state which makes them unable to accept traffic) for  extended periods. Prior to Release 1.1, HyperTransport devices only have flow  control information for their immediate neighbors. Release 1.1 adds new  end-to-end flow control packets which distant devices may send to each other to  indicate their ability to participate in transfers. 
If a device is not ready,  the source device does not start sending (or continue sending) packets; this  helps eliminate bottlenecks which otherwise occur when the &lt;a name="idd1e40390"&gt;&lt;/a&gt;flow control buffers of devices in the path between source  and target become full of packets which cannot be forwarded. &lt;/p&gt;&lt;a name="ch19lev2sec13"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Switch Devices Formally Defined&lt;/h4&gt; &lt;p class="docText"&gt;Finally, Release 1.05 formally defines the &lt;span class="docEmphasis"&gt;switch&lt;/span&gt; device type which may be used to help implement  the complex topologies required in communications systems. A switch behaves much  like a two-level HyperTransport-HyperTransport bridge with multiple secondary  interfaces. The basic characteristics of a switch include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A switch consumes one or more UnitIDs on its host interface.  The port attached to the host is the default upstream port.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The switch acts as host bridge for each of its other  interfaces. Each interface has its own bus number.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Switches, like bridges, are allowed to reassign UnitID,  Sequence ID, and SrcTag for transactions passed to other busses. 
The switch  maintains a table of outstanding (non-posted) requests in order to handle  returning responses.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Switches may be programmed to perform address  translation.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Switches must maintain full producer-consumer ordering for all  combinations of transaction paths.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Switches must provide a method for configuration of downstream  devices on all ports.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-8116407561686531360?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/8116407561686531360/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=8116407561686531360' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8116407561686531360'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8116407561686531360'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/need-for-networking-extensions.html' title='The Need For Networking Extensions'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-6382824534345570019</id><published>2007-06-26T22:10:00.002-07:00</published><updated>2007-06-26T22:11:22.488-07:00</updated><title type='text'>Server And Desktop Topologies Are Host-Centric</title><content type='html'>A typical desktop or server platform is somewhat vertical. It has one or more  processors at the top of the topology, the I/O subsystem at the bottom, and main  system DRAM memory in the middle acting as a holding area for processor code and  data as well as the source and destination for I/O DMA transactions performed on  behalf of the host processor(s). The host processor plays the central role in  both device control and data processing; this is sometimes referred to as  managing both the &lt;span class="docEmphasis"&gt;control plane&lt;/span&gt; and the &lt;span class="docEmphasis"&gt;data plane.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;p class="docText"&gt;HyperTransport works well in this dual role because of its  bandwidth and the fact that the protocol permits control information including  &lt;a name="idd1e40069"&gt;&lt;/a&gt;configuration cycles, error handling events, interrupt  messages, flow control, etc. to travel over the same bus as data, eliminating  the need for a separate control bus or additional sideband signals.&lt;/p&gt;&lt;a name="ch19lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Upstream And Downstream Traffic&lt;/h4&gt; &lt;p class="docText"&gt;There is a strong sense of &lt;span class="docEmphasis"&gt;upstream&lt;/span&gt; and &lt;span class="docEmphasis"&gt;downstream&lt;/span&gt;  data flow in server and desktop systems because very little occurs in the system  that is not under the direct control of the processor, acting through the &lt;a name="idd1e40086"&gt;&lt;/a&gt;host bridge. 
Nearly all I/O initiated requests move upstream  and target main memory; peer-peer transactions between I/O devices are the  infrequent exception.&lt;/p&gt;&lt;a name="ch19lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e40094"&gt;&lt;/a&gt;Storage Semantics In Servers  And Desktops&lt;/h4&gt; &lt;p class="docText"&gt;Without the addition of networking extensions, HyperTransport  protocol follows the conventional model used in desktop and server busses (CPU  host bus, PCI, PCI-X, etc.) in which all data transfers are associated with  memory addresses. A write transaction is used to store a data value at an  address location, and a read transaction is used to later retrieve it. This is  referred to as associating &lt;span class="docEmphasis"&gt;storage semantics&lt;/span&gt; with  memory addresses. The basic features of the storage semantics model  include:&lt;/p&gt;&lt;a name="ch19lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Targets Are Assigned An Address Range In&lt;a name="idd1e40114"&gt;&lt;/a&gt; Memory Map&lt;/h5&gt; &lt;p class="docText"&gt;At boot time, the amount of DRAM in the system is determined  and a region at the beginning of the system &lt;a name="idd1e40121"&gt;&lt;/a&gt;address map  is reserved for it. In addition, each I/O device conveys its resource  requirements to configuration software, including the amount of prefetchable or  non-prefetchable memory-mapped I/O address space it needs in the system address  map. 
Once the requirements of all target devices are known, configuration  software assigns the appropriate starting address to each device; the target  device then "owns" the address range between the start address and the start  address &lt;span class="docEmphUl"&gt;plus&lt;/span&gt; the request size.&lt;/p&gt;&lt;a name="ch19lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Each Byte Transferred Has A Unique Target  Address&lt;/h5&gt; &lt;p class="docText"&gt;In storage semantics, each data packet byte is associated with  a unique target address. The first byte in the data packet payload maps to the  start address and successive data packet bytes are assumed to be in sequential  addresses following the start address.&lt;/p&gt;&lt;a name="ch19lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;The Requester Manages Target Addresses&lt;/h5&gt; &lt;p class="docText"&gt;An important aspect of storage semantics is the fact that the  requester is completely responsible for managing transaction addresses within  the intended target device. The target has no influence over where the data is  placed during write operations or retrieved in read operations.&lt;/p&gt; &lt;p class="docText"&gt;In HyperTransport, the requester generates request packets  containing the target start address, then exchanges packets with the target  device. The maximum packet data payload is 64 bytes (16 dwords). Transfers  larger than 64 bytes are comprised of multiple discrete transactions, each to an  adjusted start address. 
Using HyperTransport's storage semantics, an ordered  sequence of transactions may be initiated using posted writes or including a  non-zero &lt;span class="docEmphasis"&gt;SeqID&lt;/span&gt; field in the non-posted requests,  but there is no concept of streaming data, per se.&lt;/p&gt;&lt;a name="ch19lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e40154"&gt;&lt;/a&gt;Storage Semantics Work Fine In  Servers And Desktops&lt;/h5&gt; &lt;p class="docText"&gt;As long as each requester is programmed to know the addresses  it must target, managing address locations from the initiator side works well  for general purpose data PIO, DMA, and peer-peer exchanges involving CPU(s),  memory and I/O devices. When the target is prefetchable memory, storage  semantics also help support performance enhancements such as write-posting, read  pre-fetching, and caching — all of which depend on a requester having full  control of target addresses.&lt;/p&gt;&lt;a name="ch19lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;1.04 Protocol Optimized For Host-Centric Systems&lt;/h4&gt; &lt;p class="docText"&gt;Because the HyperTransport I/O Link Protocol was initially  developed as an alternative to earlier server and desktop bus protocols that use  storage semantics (e.g. 
PCI), the 1.04 revision of the protocol is optimized to  improve performance while maintaining backwards compatibility in host-centric  systems:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The strongly ordered producer-consumer model used in PCI  transactions which guarantees flag and data coherence regardless of the location  of the producer, consumer, flag location, or data storage location is available  in the HyperTransport protocol.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e40188"&gt;&lt;/a&gt;Virtual channel ordering may optionally  be relaxed in transfers where the full producer-consumer model is not  required.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The strong sense of upstream and downstream traffic on busses  such as PCI is also preserved in HyperTransport. Programmed I/O (PIO)  transactions move downstream from CPU to I/O device via the host bridge. 
I/O bus  master transactions move upstream towards main memory.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Direct peer-peer transfers are not supported in the 1.04  revision of the HyperTransport I/O Link Specification; requests targeting  interior devices must travel up to the host bridge, then be reissued (reflected)  back downstream towards the target.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;All of the above features work well for what they are intended  to do: support a host-centric system in which control and data processing  functions are both handled by the host processor(s), and I/O devices perform DMA  data transfers using main system memory as a source and sink for data.&lt;/p&gt;&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch19lev1sec3"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Some Systems Are Not Host-Centric&lt;/h3&gt; &lt;p class="docText"&gt;Unlike server and desktop computers, some processing  applications do not lend themselves well to a host-centric topology. This  includes cases where there are multiple levels of processing, complex look-up  functions, protocol translation, etc. In these cases, a single processor (or  even multiple CPUs on a host bus) can quickly become a bottleneck. Often what  works more effectively is to assign control functions to a host processor and  distribute data processing functions across multiple co-processors under its  control. 
In some cases, pipeline (cascaded) co-processing is used to reduce  latency.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-6382824534345570019?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/6382824534345570019/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=6382824534345570019' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6382824534345570019'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6382824534345570019'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/server-and-desktop-topologies-are-host.html' title='Server And Desktop Topologies Are Host-Centric'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-3407948537432042507</id><published>2007-06-26T22:10:00.001-07:00</published><updated>2007-06-26T22:10:39.750-07:00</updated><title type='text'>X86 Power Management Support</title><content type='html'>&lt;p class="docText"&gt;X86 power management is based on the ACPI specification for the  Windows operating environment. The specification defines specific timing  requirements associated with STPCLK and SMI message cycles related to power  management events. 
The specification also describes ACPI-defined system state  transitions that relate to wakeup event signaling via &lt;a name="idd1e39890"&gt;&lt;/a&gt;LDTREQ#. See the specification for reference information  related to these events.&lt;/p&gt;&lt;a name="ch18lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e39897"&gt;&lt;/a&gt;Stop Clock Signal&lt;/h4&gt; &lt;p class="docText"&gt;The &lt;a name="idd1e39904"&gt;&lt;/a&gt;STPCLK# signal is one of the basic x86  power management signals. When power management logic asserts this signal, it  places the CPU into its Stop Grant State, which has the following effects (Intel  PIII example). The processor:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;issues a Stop Grant Acknowledge transaction&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;stops driving the AGTL FSB signals, allowing them to return to  the minimum power state (pulled up by termination resistors to VTT)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;turns off clocks to internal architecture regions, except  external bus (FSB) and interrupt sections (e.g. IOAPIC).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;latches incoming interrupts, but does not service them until  the CPU returns to the Normal State.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;handles requests for Snoop transactions on the FSB; to do this  the CPU transitions to the HALT/Grant Snoop State to perform the snoop, then  returns to the Stop Grant State upon completion.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;When STPCLK# is deasserted, the CPU returns to the Normal  State. Many newer CPUs have an additional signal which may be used to expand  the number of low power states. 
For example, the Intel Pentium III has an SLP#  (Sleep) signal used in conjunction with STPCLK# to drive the CPU into a very  deep low power state (e.g., clocks are stopped, no interrupts are recognized,  and no snoops are performed). This is the next best thing to being powered down  completely, and the time to recover to normal operation is much faster.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-3407948537432042507?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/3407948537432042507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=3407948537432042507' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3407948537432042507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3407948537432042507'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/x86-power-management-support.html' title='X86 Power Management Support'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2944089603143210960</id><published>2007-06-26T22:09:00.000-07:00</published><updated>2007-06-26T22:10:00.805-07:00</updated><title type='text'>Two Types Of Double-Hosted Chains</title><content type='html'>
&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch17lev1sec2"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Two Types Of&lt;a name="idd1e39319"&gt;&lt;/a&gt; Double-Hosted  Chains&lt;/h3&gt; &lt;p class="docText"&gt;There are two basic arrangements for double-hosted chains:  &lt;span class="docEmphasis"&gt;sharing&lt;/span&gt; and &lt;span class="docEmphasis"&gt;non-sharing&lt;/span&gt;.&lt;/p&gt;&lt;a name="ch17lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Sharing Double-Hosted Chain&lt;/h4&gt; &lt;p class="docText"&gt;In a sharing double-hosted chain, traffic is allowed to flow  from end to end. Either host may target any of the devices in the chain,  including the other host. In this arrangement, one host is the &lt;a name="idd1e39339"&gt;&lt;/a&gt;master host bridge and the other is the &lt;a name="idd1e39343"&gt;&lt;/a&gt;slave host bridge. 
The determination about which host is  master or slave is not defined in the specification, but must be defined before  reset occurs. Most likely, the system board layout will determine master/slave  host bridges, possibly through a strapping option on the motherboard.&lt;/p&gt; &lt;center&gt; &lt;h5 class="docFigureTitle"&gt;&lt;a name="ch17fig04"&gt;&lt;/a&gt;&lt;/h5&gt;&lt;/center&gt;&lt;a name="ch17lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;If Possible, Assign All Devices To &lt;a name="idd1e39362"&gt;&lt;/a&gt;Master &lt;a name="idd1e39366"&gt;&lt;/a&gt;Host Bridge&lt;/h5&gt; &lt;p class="docText"&gt;The HyperTransport specification recommends that all resources  in a sharing double-hosted chain be assigned to the master host bridge if  possible; this eliminates a potential deadlock condition in &lt;a name="idd1e39373"&gt;&lt;/a&gt;peer-to-peer transactions. The Slave Command Register &lt;span class="docEmphasis"&gt;Master Host&lt;/span&gt; and &lt;span class="docEmphasis"&gt;Default  Direction&lt;/span&gt; bits in PCI configuration space are used to program tunnel  devices with the information needed to recognize the "upstream vs. downstream"  directions. This is important because interior devices always issue requests and  responses in the upstream direction. They only accept responses in the  downstream direction.&lt;/p&gt;&lt;a name="ch17lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;If Slave Must Access Devices, It Uses Peer-to-Peer  Transfers&lt;/h5&gt; &lt;p class="docText"&gt;The slave host in a sharing double-hosted chain may be  required to access the devices on the link. To do so, it may have its Command  Register &lt;span class="docEmphasis"&gt;Act as Slave&lt;/span&gt; bit set = 1. 
When this is  done, all packets it issues travel first to the master host bridge where they  are reissued back to the target devices as peer-to-peer transactions.&lt;/p&gt;&lt;a name="ch17lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Non-Sharing&lt;a name="idd1e39407"&gt;&lt;/a&gt; Double-Hosted  Chain &lt;a name="idd1e39411"&gt;&lt;/a&gt;&lt;/h4&gt; &lt;p class="docText"&gt;A non-sharing double-hosted chain appears logically as two  distinct chains with a host bridge at each end.&lt;/p&gt;&lt;a name="ch17lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Software May Break The Chain&lt;/h5&gt; &lt;p class="docText"&gt;Software chooses a point to break the chain in two parts and  then:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;While the link is idle, the link between the two tunnel devices  is broken by programming the &lt;a name="idd1e39430"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;End  Of Chain&lt;/span&gt; (EOC) bits in the appropriate tunnel &lt;a name="idd1e39439"&gt;&lt;/a&gt;Link  Control registers on each side. The &lt;span class="docEmphasis"&gt;Transmit Off&lt;/span&gt;  bit in each of the Link Control registers can also be set.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The slave host bridge writes to the Slave Command register for  each device now under its control to force the &lt;span class="docEmphasis"&gt;Master  Host&lt;/span&gt; and &lt;span class="docEmphasis"&gt;Default Direction&lt;/span&gt; bits in each to  point at the slave host bridge.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Unique bus numbers are assigned to each segment in a  non-sharing double-hosted chain. 
The bus number is used so that chains may be  uniquely identified and so &lt;a name="idd1e39460"&gt;&lt;/a&gt;type 1 configuration cycles  may be forwarded and/or converted to type 0 cycles by bridges.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If peer-to-peer transactions are not required, software link  partitioning can also be used for load balancing.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;Additional Notes About Double-Hosted Chains&lt;/h4&gt;&lt;a name="ch17lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Initialization In A&lt;a name="idd1e39493"&gt;&lt;/a&gt;  Double-Hosted Chain&lt;/h5&gt; &lt;p class="docText"&gt;One of the responsibilities of a master host bridge in a  double-hosted chain is to help with initialization after reset. Following &lt;a name="idd1e39500"&gt;&lt;/a&gt;low-level link initialization, the slave host bridge  "sleeps" pending setup by the master. The basic steps in master initialization  include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The master host bridge sets the Slave Command CSR &lt;span class="docEmphasis"&gt;master host&lt;/span&gt; bit to point towards the master host bridge  in all slave devices it finds. This bit is set automatically whenever the Slave  Command CSR is written.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;When the master host bridge discovers the slave host bridge, it  sets the Host Command CSR &lt;span class="docEmphasis"&gt;Double Ended&lt;/span&gt; bit in  both its own and the slave's Host Command register. 
This informs the slave (when  it wakes up) that it is in a double-hosted chain and that it is not required to  configure devices below it.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If the &lt;span class="docEmphasis"&gt;Double Ended&lt;/span&gt; bit is not  set in the slave, it will initialize its end of the double ended chain when it  awakens.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch17lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Type 0&lt;a name="idd1e39534"&gt;&lt;/a&gt; Configuration Cycles In  A&lt;a name="idd1e39538"&gt;&lt;/a&gt; Double-Hosted Chain&lt;/h5&gt; &lt;p class="docText"&gt;Because all host bridges tend to own UnitID 0, a configuration  &lt;a name="idd1e39545"&gt;&lt;/a&gt;cycle carrying a device number field of "0" in a  double-hosted chain might be misinterpreted. The direction a type 0  configuration cycle request is traveling determines which host bridge is the  target. If configuration software wishes to prevent a host bridge (e.g. 
the  slave host) in a double-hosted chain from accessing another host's &lt;a name="idd1e39549"&gt;&lt;/a&gt;configuration space, the Host Command Register &lt;span class="docEmphasis"&gt;host hide&lt;/span&gt; bit may be set = 1.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2944089603143210960?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2944089603143210960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2944089603143210960' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2944089603143210960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2944089603143210960'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/two-types-of-double-hosted-chains.html' title='Two Types Of Double-Hosted Chains'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2965474324554128737</id><published>2007-06-26T22:08:00.001-07:00</published><updated>2007-06-26T22:08:48.737-07:00</updated><title type='text'>Other Fields In The Header of HT Tech</title><content type='html'>&lt;h5 class="docSection3Title"&gt;Primary&lt;a name="idd1e38828"&gt;&lt;/a&gt; Latency Timer  Register&lt;/h5&gt; &lt;p class="docText"&gt;This 
register is not implemented by HyperTransport devices. &lt;a name="idd1e38835"&gt;&lt;/a&gt;It should return 0s if read by software. If the primary bus is PCI  or PCI-X, use of this register follows that protocol.&lt;/p&gt;&lt;a name="ch16lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Base Address Registers&lt;/h5&gt; &lt;p class="docText"&gt;The two &lt;a name="idd1e38853"&gt;&lt;/a&gt;Base Address Registers (BARs)  are used by bridges in much the same way as for PCI bridge devices, with the  following limits if the primary interface is HyperTransport:&lt;/p&gt;&lt;a name="ch16lev4sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;I/O BAR&lt;/h5&gt; &lt;p class="docText"&gt;For an I/O request, a single BAR is implemented. Only the lower  25 bits of the value programmed into the BAR are used for address comparison by  the target, and the upper bits of the BAR should be written as zeros by system  software. Any I/O request packet sent out on a link should have the start  address bits 39-25 programmed for the I/O range in the HyperTransport memory  map.&lt;/p&gt;&lt;a name="ch16lev4sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Memory BAR&lt;/h5&gt; &lt;p class="docText"&gt;A request for memory using 32-bit addressing can be  accomplished using a single BAR, just as in PCI. This would limit the assigned  target start address for the device to the lower 4 GB of the 1 TB (40-bit)  HyperTransport address map.&lt;/p&gt; &lt;p class="docText"&gt;Optionally, a HyperTransport device may support 64-bit address  decoding, and use a pair of BARs to support it. If this is done, only the lower  40 bits of the 64-bit BAR memory address will be valid, and the upper bits are  assumed to be zeros.&lt;/p&gt; &lt;p class="docText"&gt;Memory windows for HyperTransport devices are always assigned  in BARs on 64-byte boundaries; this assures that even the largest transfer (16  dwords/64 bytes) will never cross a device address boundary. 
This is important  because HyperTransport does not support a disconnect mechanism (such as PCI  uses) to force early transaction termination.&lt;/p&gt;&lt;a name="ch16lev3sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e38895"&gt;&lt;/a&gt;Capabilities Pointer&lt;/h5&gt; &lt;p class="docText"&gt;This field contains a pointer to the first advanced capability  block. Because all HyperTransport bridge devices have at least one advanced  capability, this register is always implemented. The pointer is an absolute byte  offset from the beginning of configuration space to the first byte of the first  advanced capability register block.&lt;/p&gt;&lt;a name="ch16lev3sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e38906"&gt;&lt;/a&gt;Interrupt Line Register&lt;/h5&gt; &lt;p class="docText"&gt;The HyperTransport specification indicates that this register  should be read-writable and may be used as a software scratch pad. The  interrupt routing information programmed into this register in PCI devices  isn't required in HyperTransport because interrupt messages are sent over the  links and sideband interrupts are not defined. If the primary bridge interface  is PCI or PCI-X, this register is used by software to program the system  interrupt mapped to this device.&lt;/p&gt;&lt;a name="ch16lev3sec13"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e38926"&gt;&lt;/a&gt;Interrupt Pin Register&lt;/h5&gt; &lt;p class="docText"&gt;This register is reserved in the HyperTransport Specification.  It may optionally be implemented for compatibility with software that may  expect to gather interrupt pin information from all PCI-compatible devices. 
If  the primary bus interface is PCI or PCI-X, this register is hard-coded with the  interrupt pin driven by this device (if any).&lt;/p&gt;&lt;a name="ch16lev3sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e38946"&gt;&lt;/a&gt;Cache Line Size Register&lt;/h5&gt; &lt;p class="docText"&gt;This register is not implemented by HyperTransport devices. If  both interfaces are HyperTransport, its bits should be tied low so that software  reads back 0's. If either interface is PCI, this register is  read-write.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2965474324554128737?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2965474324554128737/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2965474324554128737' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2965474324554128737'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2965474324554128737'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/other-fields-in-header-of-ht-tech.html' title='Other Fields In The Header of HT Tech'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-6279123204935476281</id><published>2007-06-26T22:07:00.002-07:00</published><updated>2007-06-26T22:08:08.915-07:00</updated><title type='text'>Basic Jobs Of A HyperTransport Bridge</title><content type='html'>&lt;p class="docText"&gt;As in the case of PCI bridges, a HyperTransport&lt;a name="idd1e37423"&gt;&lt;/a&gt; bridge has a number of responsibilities:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;It extends the topology through the addition of one or more  secondary buses. Each HyperTransport chain (bus) can support up to 32 UnitIDs.  Because a device is permitted to consume multiple UnitIDs, implementing a  bridge is a reasonable way to add a new chain that can support 32 additional  UnitIDs (the bridge secondary interface consumes at least one of the new  UnitIDs).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;It acts as host for each of its secondary chains. There are  many aspects to this, including ordering responsibilities, error handling,  maintaining a queue for outstanding transactions routed to other buses,  reflecting peer-to-peer transactions originating below it, decoding memory  addresses so it may claim and forward transactions moving between the primary  and secondary bus, forwarding/converting &lt;a name="idd1e37437"&gt;&lt;/a&gt;configuration  cycles based on target bus number, etc.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;In cases where it bridges between HyperTransport and PCI/PCI-X,  the bridge also must translate protocols for transactions going in either  direction. 
It may also have to remap address ranges between the 40-bit  HyperTransport address range and the 32/64-bit PCI or PCI-X range.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-6279123204935476281?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/6279123204935476281/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=6279123204935476281' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6279123204935476281'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6279123204935476281'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/basic-jobs-of-hypertransport-bridge.html' title='Basic Jobs Of A HyperTransport Bridge'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-8465929065931718904</id><published>2007-06-26T22:07:00.001-07:00</published><updated>2007-06-26T22:07:23.101-07:00</updated><title type='text'>Why Use Pseudo-Synchronous Clock Mode?</title><content type='html'>&lt;h5 class="docSection3Title"&gt;Why Use Pseudo-Synchronous Clock Mode?&lt;/h5&gt; &lt;p class="docText"&gt;The specification does not address any specific application for  Pseudo-Synchronous clock mode. 
It appears that the main advantage is that a link  is given the ability to transfer data in one direction at a higher rate than the  other. But this raises the question, "Why not transfer in both directions at the  highest speed possible, thereby keeping bus efficiency as high as possible?" It  further raises the question of what advantage might come from clocking one  direction at a slower rate; possible benefits include power savings, reduced EMI,  and reduced transmit PHY complexity.&lt;/p&gt;&lt;a name="ch15lev3sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Implementation Issues&lt;/h5&gt; &lt;p class="docText"&gt;Pseudo-synchronous &lt;a name="idd1e37112"&gt;&lt;/a&gt;clocking mode must  take into account the same clock variance issues as synchronous &lt;a name="idd1e37116"&gt;&lt;/a&gt;mode. Additionally, several other key issues must be  considered for pseudo-synchronous clocking mode. These issues include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Methods and procedures required to implement pseudo-sync  mode.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Managing the FIFOs and pointers given the different transmit  and receive clock frequencies.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Whether support is mandatory.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch15lev4sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Methods and Procedures&lt;/h5&gt; &lt;p class="docText"&gt;The specification does not define a mechanism to lower the  transmit clock frequency, nor does it provide a method for determining which  clock modes are supported by a given HT device. 
The specification states  that:&lt;/p&gt; &lt;blockquote&gt; &lt;p class="docText"&gt;"The means by which the operating mode is selected for a device  that can support multiple modes is outside the scope of this  specification."&lt;/p&gt;&lt;/blockquote&gt; &lt;p class="docText"&gt;Further, no definition exists regarding the level of software  that would be involved in transitioning a device to the pseudo-sync mode.&lt;/p&gt;&lt;a name="ch15lev4sec13"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;FIFO Management&lt;/h5&gt; &lt;p class="docText"&gt;Pseudo-sync mode must consider the same sources of clock  variation as synchronous mode: the &lt;a name="idd1e37172"&gt;&lt;/a&gt;receive FIFOs  must be sized appropriately, and the separation between the write and read  pointers must be established.&lt;/p&gt; &lt;p class="docText"&gt;Because Tx Clock Out may run slower than Rx Clk in  pseudo-synchronous mode, incoming packets may be clocked into the receive FIFO  more slowly than they are clocked out. This situation can result in a buffer  underrun condition. To prevent this from happening, the unload pointer  occasionally must be stopped and then restarted when sufficient data is present  in the receive FIFO. One approach to solving the potential underrun problem is  to implement the FIFO to set a flag when the read pointer reaches the write  pointer. The unload pointer could be stopped to keep additional reads from  occurring until the situation is corrected. When sufficient separation between  the load and unload pointers has accumulated, the flag can be cleared and reads  can continue.&lt;/p&gt;&lt;a name="ch15lev4sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Is Support for Pseudo-Sync Mode Required?&lt;/h5&gt; &lt;p class="docText"&gt;The HT specification clearly requires support for &lt;span class="docEmphStrong"&gt;synchronous&lt;/span&gt; clocking mode for all devices. 
It further  states that:&lt;/p&gt; &lt;blockquote&gt; &lt;p class="docText"&gt;"Devices may also implement Pseudo-sync and Async modes based  on their unique requirements."&lt;/p&gt;&lt;/blockquote&gt; &lt;p class="docText"&gt;This statement suggests that Pseudo-sync mode is conditionally  required; that is, it's optional unless a device has some special conditions  that require the support. Further, the specification does not mention any  requirement for standard synchronous devices to operate correctly when attached  to devices that operate in pseudo-sync mode. It may be expected that  all synchronous clocking mode devices will be able to inter-operate with  pseudo-sync devices. As discussed in the previous section, support for  pseudo-sync mode at the receiving end simply requires that the FIFO read pointer  not be allowed to advance to the same entry as the write pointer.&lt;/p&gt;&lt;a name="ch15lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e37208"&gt;&lt;/a&gt;Asynchronous Clock Mode&lt;/h4&gt; &lt;p class="docText"&gt;The asynchronous clock mode permits the transmit and receive  clocks to be derived from different sources. The specification limits the  maximum difference permitted between the transmit and receive clock frequencies.  In this case, either the transmit clock or the receive clock may run faster than  the other, so both situations must be taken into account.&lt;/p&gt;&lt;a name="ch15lev3sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Transmit Clock Slower Than Receive Clock&lt;/h5&gt; &lt;p class="docText"&gt;In this case, a potential underrun condition can develop. The  solution for preventing underrun is the same as that described for  pseudo-synchronous clock mode in "&lt;a class="docLink" href="#ch15lev4sec13"&gt;FIFO Management&lt;/a&gt;." 
In summary, the FIFO  read pointer is prevented from reaching the write pointer by stopping the read  clock until the transmit clock has had a chance to catch up.&lt;/p&gt;&lt;a name="ch15lev3sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Transmit Clock Faster Than Receive Clock&lt;/h5&gt; &lt;p class="docText"&gt;Tx Clock Out can run slightly faster than Rx Clk in  asynchronous mode (but by no more than 2000 ppm), thus incoming packets may be  clocked into the receive FIFO faster than they are clocked out. This situation  will result in a buffer overrun condition, and the receiver has no way of  stopping or slowing the incoming packets. The following discussion describes how  to prevent the buffer overrun condition from occurring.&lt;/p&gt; &lt;p class="docText"&gt;CRC bits appear on the link for 4 bit-times (on 8-,16-, and  32-bit links) after every 512 bit-times. These CRC bits are detected by the  receiver, but NOT clocked into the receive FIFO. Instead the CRC bits are routed  into the CRC error checking logic. Consequently, the FIFO write pointer does not  increment during the CRC bit times, but the read pointer continues to increment  and data continues to be read from the FIFO. 
As a result, the unload pointer has  sufficient time to catch up by clocking data out of the receive FIFO before the  buffer overruns.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-8465929065931718904?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/8465929065931718904/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=8465929065931718904' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8465929065931718904'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8465929065931718904'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/why-use-pseudo-synchronous-clock-mode.html' title='Why Use Pseudo-Synchronous Clock Mode?'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-4115734851946167165</id><published>2007-06-26T22:05:00.000-07:00</published><updated>2007-06-26T22:07:00.705-07:00</updated><title type='text'>Clock Initialization in HT Technology</title><content type='html'>&lt;p class="docText"&gt;The &lt;a name="idd1e36080"&gt;&lt;/a&gt;receive FIFO in each device must be  able to absorb timing differences between the transmit and receive clocks. 
Data  is written into the FIFO in the transmit clock domain and read in the receive  clock domain.&lt;/p&gt; &lt;p class="docText"&gt;The design and operation of this FIFO must account for the  dynamic variations in phase between the transmit clock domain (Tx Clock Out) and  the receive clock domain (Rx Clock). The FIFO depth must be large enough to  store all transmitted data until it has been safely read into the receive clock  domain. The separation between the write pointer, to which the FIFO data is written,  and the read pointer, from which the FIFO location is read (write-to-read  separation), must be large enough to ensure the FIFO location can be read into  the receive clock domain.&lt;/p&gt; &lt;p class="docText"&gt;The deassertion of the incoming CTL/&lt;a name="idd1e36090"&gt;&lt;/a&gt;CAD  signals across a rising CLK edge is used in the transmit clock domain within  each receiver to initialize the write (load) pointer. The same deassertion of the CTL  and CAD signals is read from the FIFO synchronous to the receive clock domain  and used to initialize the read (unload) pointer. The separation between the  write and read pointers is calculated based on worst-case variation between the  transmit and receive clocks.&lt;/p&gt; &lt;p class="docText"&gt;Note also that CTL cannot be used to initialize the pointers  for byte lanes other than 0 in a multi-byte link, because CTL only exists within  the byte 0 transmit clock domain.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;&lt;a name="idd1e36106"&gt;&lt;/a&gt;Synchronous Clock Mode&lt;/h3&gt; &lt;p class="docText"&gt;The specification requires that all HT devices support the  synchronous clock mode. This mode is the least complicated method of  transferring data from transmitter to receiver. Synchronous clock mode requires  that the transmit clock and receive clock have the same source, and operate at  the same frequency. 
If we were to assume that the transmit clock and the receive  clock always remained synchronized, then a simple clocking interface could be  used as described in the following example.&lt;/p&gt;&lt;a name="ch15lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;A Conceptual Example&lt;/h4&gt; &lt;p class="docText"&gt;In this synchronous example, the transmit clock (Tx Clock) and  receive clock (Rx Clock) are presumed to be in synchronization. Note, however,  that source synchronous clocking requires that Transmit Clock Out (Tx Clk Out)  be 90° phase shifted from Tx Clock. In this example all other sources of  transmit to receive clock variation are ignored, including the expected clock  drift associated with PLLs.&lt;/p&gt; &lt;p class="docText"&gt;The transmitter delivers data synchronously across the link using  the transmit clock. Tx Clock Out is sourced later and lags the data by 90° (or  one-half bit time), thereby centering the clock edge in the middle of the valid  data interval. When the data arrives at the receiver it is clocked into the FIFO  using Tx Clock Out. Note that the clocked FIFO has two entries, which provides a  separation of 1 between Tx Clock Out and Rx Clock. Data written into the FIFO  during clock 1 would not be read from the FIFO using Rx Clock until clock 2.  This one entry separation (called write-to-read separation) permits time for the  sample to be stored prior to being read (i.e. the FIFO entry is not being  written to and read from in the same clock cycle). In short, two FIFO entries  are sufficient to provide the separation needed to ensure that data is safely  stored and transferred into the receive clock domain.&lt;/p&gt;&lt;p class="docText"&gt;However, in the real world many factors contribute to timing  differences between the transmit and receive clock that are potentially  significant, even though the clocks originate from the same source. 
These real  world perturbations result in somewhat more complicated implementations that  must account for and manage the worst case variation between the transmit and  receive clocks. Specifically, the specification describes the receive FIFO  implementation for handling the variation between the transmit and receive  clocks.&lt;/p&gt;&lt;a name="ch15lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Sources of Transmit and Receive Clock Variance&lt;/h4&gt; &lt;p class="docText"&gt;The specification defines and details the sources of transmit  and receive clock variation that can exist. These clock differences can create  FIFO overflow or underflow if not identified and taken into account. The clock  differences can be attributed to two different categories or sources:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Invariant sources —&lt;/span&gt; components  that represent a constant phase shift between the transmit and receive clock  domain.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Variant sources —&lt;/span&gt; dynamic  variations in the transmit and receive time domain (these phase variations can  occur even though both transmit and receive clock are running at the same  frequency).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The sources of clock variation in some cases can accumulate  over time, causing clock variation to increase over time. However, all of the  sources of clock variation are naturally limited in terms of the maximum amount  of change that can occur. For example, a PLL is designed to produce an output  clock that is synchronized with the input source clock, but with certain  limitations. That is, variation of output frequency is specified not to change  beyond a certain phase shift. The time over which the clock phase may change can  be relatively short or perhaps much longer depending upon conditions. 
The  consideration and assessment of the sources of clock variance is done to  determine a FIFO size that can absorb the worst-case clock variation. This would  occur if all sources of clock variation simultaneously reach their extremes, a  very unlikely circumstance.&lt;/p&gt; &lt;p class="docText"&gt;This chapter discusses the variant and invariant sources of  transmit clock to receive clock variance. It also provides an example timing  budget for each source.&lt;/p&gt;&lt;a name="ch15lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Invariant Sources&lt;/h5&gt; &lt;p class="docText"&gt;The time-invariant factors contribute a small proportion of the  overall clock variance. The invariant factors include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Cross-byte skew in multi-byte link implementations&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Sampling Error&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch15lev4sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Cross&lt;a name="idd1e36208"&gt;&lt;/a&gt;-byte skew in multi-byte  link implementations&lt;/h5&gt; &lt;p class="docText"&gt;Differences in the arrival of Tx Clock Out at the receiver  (CLKIN) between each byte lane are caused by path length mismatch. This constant  skew is termed T&lt;sub&gt;bytelaneconst&lt;/sub&gt; in the specification. The specification  allows up to 1000ps for this skew. Consequently, when multiple bytes are clocked  into the FIFO the maximum skew could result in one of the bytes being clocked  into the FIFO 1000ps later than the associated bytes. Thus, when the associated  bytes are clocked out of the FIFO by Rx Clock, one byte having arrived late may  be left behind. 
This problem is solved by adding additional entries in the FIFOs  to handle the maximum lane-to-lane skew, ensuring that all associated bytes are  clocked out at the same time. Note that lane-to-lane skew may change due to the  effects of temperature, voltage change, etc. This parameter, called  T&lt;sub&gt;bytelanevar&lt;/sub&gt;, is included in the variant source list.&lt;/p&gt;&lt;a name="ch15lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Sampling Error&lt;/h5&gt; &lt;p class="docText"&gt;This is the uncertainty in the read pointer due to CTL sampling error in the  receive clock domain (1 device-specific Rx Clock bit time). The specification  does not specifically define the source of this sampling error, but it is likely  caused by phase variations between the Tx Clock Out and Rx Clock that could  cause a sample to be missed. Adding an additional bit time solves this  problem.&lt;/p&gt;&lt;a name="ch15lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Variant Sources&lt;/h5&gt; &lt;p class="docText"&gt;The phase difference between the transmit and receive clock may  change significantly due to dynamic factors such as:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Reference Clock Distribution Skew.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;PLL Variation in Transmitter and Receiver.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Transmitter and Link Transfer Variation&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Receiver Transfer Variation&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Dynamic Cross Byte Lane Variation&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p 
class="docText"&gt;All time variant parameters must be considered in terms of  their worst-case variance. The total dynamic phase variation due to these  factors is called T&lt;sub&gt;variant&lt;/sub&gt;. Additionally, the transmit clock could  either LEAD the receive clock by T&lt;sub&gt;variant&lt;/sub&gt; or it could LAG the receive  clock by T&lt;sub&gt;variant&lt;/sub&gt;. Consequently, the receive FIFO must be sized to  accommodate both phase variations.&lt;/p&gt;&lt;a name="ch15lev4sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Reference&lt;a name="idd1e36278"&gt;&lt;/a&gt; Clock Distribution  Skew&lt;/h5&gt; &lt;p class="docText"&gt;Synchronous clock mode requires that the input reference clocks  to the transmitter and receiver be derived from the same time base. The  distribution of the reference clock to the transmitter and the receiver results  in skew between the two reference clocks. This is due to:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;differences in the output skew of the clock source, including  phase error associated with Spread Spectrum Clocking in the reference clock  generator, and the skew associated with the mismatch in the distribution  path.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;differences in the distribution of the clocks to their PLLs due  primarily to temperature and voltage changes.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;This skew results in phase difference between the Transmit and  Receive Clocks and must be included in the T&lt;sub&gt;variant&lt;/sub&gt;  calculation.&lt;/p&gt;&lt;a name="ch15lev4sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;PLL Variation in Transmitter and Receiver&lt;/h5&gt; &lt;p class="docText"&gt;The largest contribution to the overall Tx Clock to Rx Clock  variance comes from the PLLs. The PLL is constantly making adjustments to the  output frequency as a result of a feedback loop. 
In addition, voltage and  temperature changes also add to the possible output clock variation. The sample  timing budget included within the specification allows a maximum PLL output  phase variation of 3500ps. This represents &gt;1 bit time at the 400 MT/s rate  and approximately 5.6 bit times at the 1600MT/s rate.&lt;/p&gt;&lt;a name="ch15lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Transmitter and Link Transfer Variation&lt;/h5&gt; &lt;p class="docText"&gt;The transmitter clock error (accumulated over a single bit  time), the transmitter PHY, and the interconnect contribute small amounts of  phase error into the link transfer clock domain through all of the parameters  included in the link transfer timing. This includes noise on the PCB that  affects both the clock and data in the same way causing a minor shift in  frequency or phase of clock and data. (Note that if the noise affected the clock  and data differently, this would affect the maximum bit transfer rate due to  potential violations of T&lt;sub&gt;SU&lt;/sub&gt; and T&lt;sub&gt;HD&lt;/sub&gt;).&lt;/p&gt;&lt;a name="ch15lev4sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Receiver Transfer Variation&lt;/h5&gt; &lt;p class="docText"&gt;The receiver contributes small amounts of phase error in the  received CLKIN due to distribution effects.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Write-to-Read and Read-to-Write Separation&lt;/h5&gt; &lt;p class="docText"&gt;Recall that the FIFO depth must be large enough to store all  transmitted data until it has been safely read into the receive clock domain.  
The separation from the write pointer location where data is written and the  read pointer location from which data is read must be large enough to ensure the  FIFO location can be read safely into the receive clock domain.&lt;/p&gt; &lt;p class="docText"&gt;To accommodate this clock variance in this example, the read  pointer within the FIFO would need to be separated from the write pointer by 8  entries (or, bit times). The following three scenarios are provided to explain  the operation of the FIFO and its pointers.&lt;/p&gt;&lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Stage A &lt;/span&gt;— the write pointer  has progressed from entry 0 to entry 8. Because the separation between the write  and read pointer is 8, Rx Clock is prevented from clocking data from the FIFO  until the separation reaches 8. At this stage, the separation has just been  reached, so Rx Clock clocks data from entry 0, while the Tx Clock Out clocks  data into entry 8.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Stage B&lt;/span&gt; — the write pointer  has progressed to entry 15 and because there is still no phase difference  between Tx Clock Out and Rx Clock the separation between the pointers remains at  8. Rx Clock is clocking data from entry 7 as Tx Clock Out is clocking data into  entry 15.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Stage C&lt;/span&gt; — the write pointer  has rolled from entry 15 back to entry 0 while the read pointer has advanced to  entry 8. This simply illustrates that the separation is still maintained when  the write pointer reaches the end of the FIFO and wraps back to entry 0.&lt;/p&gt;&lt;h5 class="docSection4Title"&gt;Scenario 3: Rx Clock Lags Tx Clock Out&lt;/h5&gt; &lt;p class="docText"&gt;This scenario presents the opposite condition that was  illustrated in scenario 2. In this example, the receive clock lags the transmit  clock. 
As in the previous example, the phase difference between the clocks would  not likely accumulate so quickly.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Stage A&lt;/span&gt; — the write pointer  has previously traversed all of the entries and is back at entry 0 again, while  the read pointer is at entry 8. This scenario focuses on the possibility that the  Rx Clock lags the Tx Clock Out clock. In this case, the read-to-write separation  becomes critical. In stage A this separation is 8.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Stage B&lt;/span&gt; — the write pointer  has advanced to entry 13, while the read pointer has only advanced to entry 15.  The write pointer has moved ahead by 13 entries and the read pointer has moved  only 7 entries, leaving a read-to-write separation of only 2.&lt;/p&gt; &lt;p class="docText"&gt;Once again, the large change in clock variance over such a  short period of time as illustrated in stage B would not occur. But the example  does serve to illustrate that over time the clock variance can accumulate and  that an appropriately sized FIFO will be able to absorb the clock variance  without overflow.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e37073"&gt;&lt;/a&gt;Pseudo-Synchronous Clock  Mode&lt;/h4&gt; &lt;p class="docText"&gt;In pseudo-synchronous mode, both Rx Clk in the receiver device  and Tx Clk in the transmitter device are generated from the same time base clock  just as in the synchronous mode case. During initialization, software configures  each link to the maximum common frequency based on the values reported in each  device's frequency capability register. The highest frequency supported by both  devices is loaded into the &lt;a name="idd1e37080"&gt;&lt;/a&gt;Link Frequency register of  each device. This value defines the highest frequency that both devices can use  when sending packets over the link. 
In synchronous implementations this would be  the exact frequency used by both devices. However, a device implementing  pseudo-synchronous mode may arbitrarily lower the transmit clock frequency (Tx  Clk or Tx Clock Out) below that specified by the Link Frequency register. Note  that the receiver clock (Rx Clk) still runs at the frequency specified by the  Link Frequency register.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-4115734851946167165?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/4115734851946167165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=4115734851946167165' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/4115734851946167165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/4115734851946167165'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/clock-initialization-in-ht-technology.html' title='Clock Initialization in HT Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-876979854403413713</id><published>2007-06-26T22:04:00.001-07:00</published><updated>2007-06-26T22:05:14.200-07:00</updated><title type='text'>Other Fields In The Header</title><content type='html'>&lt;h4 class="docSection2Title"&gt;Other 
Fields In The Header&lt;/h4&gt; &lt;p class="docText"&gt;Other fields in the type 0 header region of a  HyperTransport device are used as follows:&lt;/p&gt;&lt;a name="ch13lev3sec13"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e29940"&gt;&lt;/a&gt;Cache Line Size Register.  (Offset 0Ch)&lt;/h5&gt; &lt;p class="docText"&gt;This read-only register is not implemented by HyperTransport  devices. It should return 0s if read by software.&lt;/p&gt;&lt;a name="ch13lev3sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e29951"&gt;&lt;/a&gt;Latency Timer Register.  (Offset 0Dh)&lt;/h5&gt; &lt;p class="docText"&gt;This register is not implemented by HyperTransport devices.  It should return 0s if read by software.&lt;/p&gt;&lt;a name="ch13lev3sec15"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e29962"&gt;&lt;/a&gt;Base Address Registers.  (Offset 10h-24h)&lt;/h5&gt; &lt;p class="docText"&gt;The six Base Address Registers (BARs) are used in much the same  way as for PCI devices, with the following limits:&lt;/p&gt;&lt;a name="ch13lev4sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;I/O BAR&lt;/h5&gt; &lt;p class="docText"&gt;For an I/O request, a single BAR is implemented. Only the lower  25 bits of the value programmed into the BAR are used for address comparison by  the target, and the upper bits of the BAR should be written as zeros by system  software. Any I/O request packet sent out on a link should have the start  address bits 39-25 programmed for the I/O range in the HyperTransport memory  map.&lt;/p&gt;&lt;a name="ch13lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Memory BAR&lt;/h5&gt; &lt;p class="docText"&gt;A request for memory using 32-bit addressing can be  accomplished using a single BAR, just as in PCI. 
This would limit the assigned  target start address for the device to the lower 4GB of the 1 TB (40-bit)  HyperTransport address map.&lt;/p&gt; &lt;p class="docText"&gt;Optionally, a HyperTransport device may support 64-bit address  decoding, and use a pair of BARs to support it. If this is done, only the lower  40 bits of the 64-bit BAR memory address will be valid, and the upper bits are  assumed to be zeros.&lt;/p&gt; &lt;p class="docText"&gt;Memory windows for HyperTransport devices are always assigned  in BARs on 64-byte boundaries; this ensures that even the largest transfer (16  dwords/64 bytes) will never cross a device address boundary. This is important  because HyperTransport does not support a disconnect mechanism (such as PCI  uses) to force early transaction termination.&lt;/p&gt;&lt;a name="ch13lev3sec16"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;CardBus CIS Pointer. (Offset 28h)&lt;/h5&gt; &lt;p class="docText"&gt;This register is not implemented by HyperTransport devices.  It should return 0s if read by software.&lt;/p&gt;&lt;a name="ch13lev3sec17"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e30015"&gt;&lt;/a&gt;Capabilities Pointer. (Offset  34h)&lt;/h5&gt; &lt;p class="docText"&gt;This field contains a pointer to the first advanced capability  block. Because all HyperTransport devices have at least one advanced capability,  this register is always implemented. The pointer is an absolute byte offset from  the beginning of configuration space to the first byte of the first advanced  capability register block.&lt;/p&gt;&lt;a name="ch13lev3sec18"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e30026"&gt;&lt;/a&gt;Interrupt Line Register.  (Offset 3Ch)&lt;/h5&gt; &lt;p class="docText"&gt;The HyperTransport Specification indicates that this register  should be read-writable and may be used as a software scratch pad. 
The  interrupt routing information programmed into this register in PCI devices  isn't required in HyperTransport because interrupt messages are sent over the  links and sideband interrupts are not defined.&lt;/p&gt;&lt;a name="ch13lev3sec19"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e30040"&gt;&lt;/a&gt;Interrupt Pin Register.  (Offset 3Dh)&lt;/h5&gt; &lt;p class="docText"&gt;This register is reserved in the HyperTransport Specification.  It may optionally be implemented for compatibility with software which may  expect to gather interrupt pin information from all PCI-compatible  devices.&lt;/p&gt;&lt;a name="ch13lev3sec20"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Min_Gnt and Max_Latency Registers. (Offsets 3Eh and  3Fh)&lt;/h5&gt; &lt;p class="docText"&gt;&lt;a name="idd1e30057"&gt;&lt;/a&gt;These register fields are associated  with PCI shared-bus arbitration, and are not implemented by HyperTransport  devices. They should return 0s if read by software.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Block Formats Vary With Capability And Device  Type&lt;/h5&gt; &lt;p class="docText"&gt;Each of the HyperTransport capability blocks has its own  format. The &lt;span class="docEmphasis"&gt;Type&lt;/span&gt; field in the first dword of each  capability block defines the format of the entire block. In addition, one of the  principal capability block types (Slave/Primary Interface) also varies with the  device which implements it because tunnel devices interface to two links and end  (cave) devices interface to only one.&lt;/p&gt;&lt;a name="ch13lev2sec20"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The&lt;a name="idd1e30261"&gt;&lt;/a&gt; Slave/Primary Interface  Block&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport defines two principal advanced capability  register block formats, Slave/Primary and Host/Secondary, which reflect the two  possible roles a device interface can perform on a link. 
The Slave/Primary  format is used by all tunnels and single-link peripheral (cave) devices. These  devices never act as a host for a bus (they are slaves). In addition, because  they are not bridges, they have a single primary interface to the bus and no  secondary interfaces.&lt;/p&gt; &lt;p class="docText"&gt;One complicating factor is the fact that while an end (&lt;a name="idd1e30271"&gt;&lt;/a&gt;cave) device interfaces to only one link, a tunnel must  interface to two links (still only one bus, though). To accommodate this  difference, each Slave/Primary interface has two sets of link management  registers, one for each link. A tunnel device implements one Slave/Primary  interface and both sets of link management registers; an end (cave) device also  implements one Slave/Primary interface but only one set of link management  registers. &lt;/p&gt;&lt;p class="docText"&gt;&lt;a name="idd1e30064"&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-876979854403413713?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/876979854403413713/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=876979854403413713' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/876979854403413713'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/876979854403413713'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/other-fields-in-header.html' title='Other Fields In The Header'/><author><name>Info 
Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-3680431656410682757</id><published>2007-06-26T22:02:00.000-07:00</published><updated>2007-06-26T22:03:54.563-07:00</updated><title type='text'>HyperTransport Configuration Space Format</title><content type='html'>&lt;p class="docText"&gt;This section describes the general format of the configuration  space used by a HyperTransport functional device. The discussion here focuses on  two major areas:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;How a HyperTransport device resembles and differs from a PCI  device in its use of the generic &lt;span class="docEmphasis"&gt;header&lt;/span&gt; region of  configuration space.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The use of the required and optional &lt;span class="docEmphasis"&gt;HyperTransport advanced capability register blocks&lt;/span&gt; also  located in the required 256 byte configuration space.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch13lev2sec14"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Two Header Formats Are Used&lt;/h4&gt; &lt;p class="docText"&gt;The first quarter (16 dwords) of any PCI configuration space  is called the &lt;span class="docEmphasis"&gt;header.&lt;/span&gt; As in the case of PCI  devices, HyperTransport devices use two header formats: one for HT-to-HT  bridges, called &lt;span class="docEmphasis"&gt;header type 1,&lt;/span&gt; and the other for  all non-bridge devices (including &lt;a name="idd1e29346"&gt;&lt;/a&gt;tunnels and single link  end (&lt;a name="idd1e29350"&gt;&lt;/a&gt;cave) devices) called &lt;span class="docEmphasis"&gt;header  type 0.&lt;/span&gt; The lower bits in the &lt;span 
class="docEmphasis"&gt;Header Type&lt;/span&gt;  field within both types of PCI configuration header are hard coded with the type  code; software checks this field early in the process of device discovery to  determine which of the header formats it is dealing with.&lt;/p&gt; &lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;a name="ch13lev2sec15"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The Type 0 Header Format&lt;/h4&gt; &lt;p class="docText"&gt;Basic PCI  functionality is managed by having BIOS or other low level software read certain  hard-coded header fields to obtain device requirements, then having it program  other fields to set up plug-and-play options.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;PCI &lt;a name="idd1e29424"&gt;&lt;/a&gt;Advanced &lt;a name="idd1e29428"&gt;&lt;/a&gt;Capability Registers&lt;/h4&gt; &lt;p class="docText"&gt;While many early PCI devices were managed using just the  register fields in the configuration space header, many additional features have  been added over the years which require dedicated registers to manage them. For  devices which have capabilities beyond basic PCI compliance, the generic  PCI header registers are augmented by one or more additional register sets  outside of the header area, but still within the 256 byte PCI configuration  space. PCI calls these &lt;span class="docEmphasis"&gt;advanced capability&lt;/span&gt;  register blocks.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Many Advanced Capabilities Are Defined&lt;/h5&gt; &lt;p class="docText"&gt;Under the current PCI specification, advanced capability block  register sets have been defined for all sorts of purposes. 
Two important classes  are:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Register sets for bus extensions such as HyperTransport, PCI-X,  and AGP.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Register sets for enhanced device management, including Message  Signalled Interrupts (MSI), Power Management, Vital Product Data,  etc.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;When a PCI compatible device is designed, the basic PCI  configuration space type 0 or type 1 header fields are implemented as are any  additional advanced capability register blocks which may be needed. The format  of an advanced capability block varies with the type, and a &lt;a name="idd1e29500"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Capability ID&lt;/span&gt; byte at the  start of each block identifies which type it is; the capability ID for  HyperTransport is 08. At a minimum, a HyperTransport device must implement the  256 byte PCI configuration space memory, containing a header AND at least one  HyperTransport advanced capability block (&lt;span class="docEmphasis"&gt;Host/Secondary&lt;/span&gt; or &lt;span class="docEmphasis"&gt;Slave/Primary&lt;/span&gt; Interface Block).&lt;/p&gt;&lt;a name="ch13lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Discovering The Advanced Capability Blocks&lt;/h5&gt; &lt;p class="docText"&gt;If a PCI compatible device implements advanced capability  blocks, low-level software must find and configure each one. 
Because the  specific location of advanced capability blocks within the 256 byte  configuration space is not specified, they must be "discovered" by executing  some variation of the following software configuration process:&lt;/p&gt;&lt;a name="ch13pr01"&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList"&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="1"&gt; &lt;p class="docText"&gt;Use the capability pointer (&lt;span class="docEmphasis"&gt;CapPtr&lt;/span&gt;) at dword 13 in the header to determine the  configuration space offset (from the beginning of configuration space) to the  first advanced capability register block. Check the first byte in the block to  determine the capability &lt;span class="docEmphasis"&gt;ID&lt;/span&gt; (HT = 08).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="2"&gt; &lt;p class="docText"&gt;Next, check the upper byte in the first dword to determine the  HyperTransport capability block &lt;span class="docEmphasis"&gt;Type.&lt;/span&gt;  HyperTransport supports a number of these: Host/Secondary, Slave/Primary,  Interrupt Discovery &amp; Configuration, etc.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="3"&gt; &lt;p class="docText"&gt;Set up all of the registers in the capability block using  configuration cycles.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="4"&gt; &lt;p class="docText"&gt;Use the next pointer (&lt;span class="docEmphasis"&gt;NPtr&lt;/span&gt;)  contained in the second byte of the first advanced capability block to determine  the offset (from the beginning of configuration space) to the next capability  block in the "linked list". If the ID field is "08", this is another  HyperTransport capability block. 
Read the &lt;span class="docEmphasis"&gt;Type&lt;/span&gt;  field, and set up the register fields as appropriate.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="5"&gt; &lt;p class="docText"&gt;Continue the discovery and set up process until the last  capability block has been located and set up. If a block is the last one, its  &lt;span class="docEmphasis"&gt;NPtr&lt;/span&gt; field is zero, indicating the end of the  linked list of advanced capability blocks.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;Refer to MindShare's &lt;span class="docEmphasis"&gt;PCI System  Architecture, 4th Ed.&lt;/span&gt; book for a complete description of configuration  space advanced capability management.&lt;/p&gt;&lt;a name="ch13lev2sec17"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;HyperTransport Configuration Type 0 Header  Fields&lt;/h4&gt; &lt;p class="docText"&gt;In this section, the configuration header format for non-bridge  HyperTransport devices (type 0 header format) is described. For the most part,  HyperTransport devices use these fields in the same way as PCI devices; the few  differences are described here. Header fields not mentioned are used in the same  way as in PCI devices. &lt;/p&gt;&lt;a name="ch13lev3sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Header &lt;a name="idd1e29613"&gt;&lt;/a&gt;Command Register&lt;/h5&gt; &lt;p class="docText"&gt;The header Command register occupies the lower 16 bits at dword 01. It  is used by BIOS or other software to enable basic capabilities of the  device &lt;span class="docEmphUl"&gt;on the primary bus,&lt;/span&gt; including bus mastering,  target address decoding, error response capability, etc. 
&lt;/p&gt;&lt;p class="docText"&gt; &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-3680431656410682757?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/3680431656410682757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=3680431656410682757' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3680431656410682757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3680431656410682757'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/hypertransport-configuration-space.html' title='HyperTransport Configuration Space Format'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-5533900408756099166</id><published>2007-06-26T22:01:00.000-07:00</published><updated>2007-06-26T22:02:29.176-07:00</updated><title type='text'>How HyperTransport Handles Configuration Accesses</title><content type='html'>&lt;h4 class="docSection2Title"&gt;Configuration Cycles Are Memory Mapped&lt;/h4&gt; &lt;p class="docText"&gt;To generate a configuration space read or write, a  HyperTransport bridge simply sends a RdSized or non-posted WrSized request using  a reserved address range in the 40-bit HyperTransport memory map. 
This 32MB  range is recognized by all devices.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;How The 32MB Configuration Area Is Used&lt;a name="idd1e28766"&gt;&lt;/a&gt;&lt;/h4&gt; &lt;p class="docText"&gt;The 32MB HyperTransport memory map address space reserved for  configuration cycles is used to access the 256 byte &lt;a name="idd1e28772"&gt;&lt;/a&gt;configuration space of each function in each device on each  bus. The sections below describe how the address range is interpreted and how a particular device  recognizes configuration cycles it should claim vs. those it must forward.&lt;br /&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Upper 16 Address Bits Indicate Type 0 And Type 1  Cycles&lt;/h5&gt; &lt;p class="docText"&gt;As in PCI &lt;a name="idd1e28797"&gt;&lt;/a&gt;configuration cycles,  HyperTransport requires two variants of configuration read/write cycles, type 1  and type 0. The type 0 configuration cycle is generated by a bridge when the cycle has  reached the target bus (chain) where the device being accessed resides; the type  1 cycle is in transit to the target bus and should be forwarded by bridges or  tunnels in the target path. The bridge to the destination bus will convert it to  type 0.&lt;/p&gt; &lt;p class="docText"&gt;Because HyperTransport configuration cycles are distinguished  from other read/write requests only by the fact they target the 32MB reserved  configuration address range, the first problem is how to distinguish type 1 from  type 0 cycles. The 32MB configuration address range is further divided into two  parts: request packets carrying addresses in the upper 16MB of the range are  type 1 cycles; requests with addresses in the lower 16MB are type 0 cycles. 
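&lt;/p&gt; &lt;p class="docText"&gt;As a minimal sketch of the split just described, the following Python classifies a 40-bit request address as a type 0 cycle, a type 1 cycle, or neither. The base address FD_FE00_0000h used here is an assumption about where the 32MB range starts, and the names CONFIG_BASE, HALF_RANGE, CONFIG_END, and classify_config_address are hypothetical, chosen only for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
# Assumption: the 32MB configuration range starts at FD_FE00_0000h.
CONFIG_BASE = 0xFDFE000000
HALF_RANGE = 16 * 1024 * 1024               # each half of the range is 16MB
CONFIG_END = CONFIG_BASE + 2 * HALF_RANGE   # exclusive end of the 32MB range


def classify_config_address(addr):
    """Return 'type0', 'type1', or None for a 40-bit request address."""
    if addr not in range(CONFIG_BASE, CONFIG_END):
        return None                         # not a configuration cycle at all
    offset = addr - CONFIG_BASE
    # Lower 16MB carries type 0 cycles, upper 16MB carries type 1 cycles.
    return "type1" if offset // HALF_RANGE == 1 else "type0"
```
&lt;/pre&gt;
&lt;p class="docText"&gt;Under these assumptions, classify_config_address(0xFDFF010000) returns "type1" and classify_config_address(0xFDFE000800) returns "type0"; any address outside the 32MB range returns None.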
&lt;/p&gt;&lt;a name="ch13lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;HyperTransport&lt;a name="idd1e28813"&gt;&lt;/a&gt; Type 1  Configuration Cycle&lt;/h5&gt; &lt;p class="docText"&gt;If a RdSized or WrSized request carries an address with the  upper 16 bits set = FDFFh, then the cycle is a type 1 configuration request.  Only bridges are allowed to accept these requests, and only if the &lt;span class="docEmphasis"&gt;bus number&lt;/span&gt; field in the address (bits 23:16) falls into  the range defined by the bridge's Secondary-Subordinate bus number registers. The  bridge then passes the request downstream.&lt;/p&gt;&lt;a name="ch13lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;HyperTransport &lt;a name="idd1e28832"&gt;&lt;/a&gt;Type 0  Configuration Cycle&lt;/h5&gt; &lt;p class="docText"&gt;If a RdSized or WrSized request carries an address with the  upper 16 bits set = FDFEh (the lower 16MB of the configuration range), then the cycle is a type 0 configuration request.  It is claimed by the device for which the &lt;span class="docEmphasis"&gt;device number&lt;/span&gt; field (bits 15:11) in the address matches  one of its UnitIDs. That device then uses the &lt;span class="docEmphasis"&gt;function  number&lt;/span&gt; and &lt;span class="docEmphasis"&gt;Dword&lt;/span&gt; fields to target the  particular internal function and configuration space offset.&lt;/p&gt;&lt;a name="ch13lev2sec11"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;No IDSEL Signal Needed In HyperTransport&lt;/h4&gt; &lt;p class="docText"&gt;Finally, there is no IDSEL signal to accompany a type 0  configuration cycle in the HyperTransport protocol. The need for this signal has  been eliminated because a &lt;span class="docEmphasis"&gt;Base UnitID&lt;/span&gt; field has  been included in the HyperTransport advanced capability register block so that a  device is programmed to "know" its UnitID number(s). 
This allows the device to  decode its own configuration cycles rather than depending on the upstream bridge  to do it with IDSEL.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Events In HT Configuration Example &lt;/h5&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Low level software executing on the CPU requires access to the  configuration space in Device 2 on Bus (chain) number 1.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The Host Bridge checks its secondary bus number register,  recognizes that the target bus &lt;span class="docEmphUl"&gt;is not&lt;/span&gt; its secondary bus,  and sends a request packet for a type 1 configuration cycle onto bus 0 (using the  upper half of the configuration address range).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The HT-to-HT bridge on Bus 0 checks the &lt;span class="docEmphasis"&gt;bus number&lt;/span&gt; field in the request and compares it with  its own secondary and subordinate bus numbers. Because the target bus is below  it, the HT-to-HT bridge forwards the configuration cycle onto bus 1; at the same  time it converts it to a type 0 because the target bus has been reached.  Conversion to type 0 simply means shifting the configuration address into the  lower half of the configuration address range. 
Note that the bus number field is  stripped off when the cycle is converted to type 0.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Device 1 claims the cycle because it is a type 0 configuration  cycle AND it carries a &lt;span class="docEmphasis"&gt;device number&lt;/span&gt; which  matches one of its assigned UnitIDs.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Device 1 then uses the &lt;span class="docEmphasis"&gt;function  number&lt;/span&gt; and &lt;span class="docEmphasis"&gt;dword offset&lt;/span&gt; fields in the  request packet to target the specific internal function and offset location in  its configuration space.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch13lev2sec13"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Initializing Bus Numbers And &lt;a name="idd1e28969"&gt;&lt;/a&gt;Unit IDs&lt;/h4&gt; &lt;p class="docText"&gt;One of the first steps in HyperTransport configuration is the  initial assignment of bus numbers and UnitIDs for each device and chain in the  topology. 
Using a depth-first search algorithm, &lt;a name="idd1e28976"&gt;&lt;/a&gt;enumeration software assigns IDs to each device it  discovers; if it finds any HyperTransport bridges, it also assigns the primary,  secondary, and subordinate bus numbers so that later &lt;a name="idd1e28980"&gt;&lt;/a&gt;configuration cycles may find their way to target buses  other than bus 0.&lt;/p&gt;&lt;a name="ch13lev3sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Case 1: A Single Chain With One &lt;a name="idd1e28988"&gt;&lt;/a&gt;Host Bridge&lt;/h5&gt; &lt;p class="docText"&gt;In a single chain with only one host bridge, enumeration is  fairly simple:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Following a reset assertion on a chain, the &lt;span class="docEmphasis"&gt;Base UnitID&lt;/span&gt; field in the Slave Command register of each  HyperTransport device is cleared to "0".&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;In addition, reset forces the primary, secondary, and  subordinate bus number registers in each HyperTransport bridge and the secondary  bus number register in host bridges to "0" as well.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The transmitter and receiver interfaces on each link perform  the low-level negotiation to determine starting bus width. They also perform the  required link initialization sequence. 
Once synchronization is complete, the &lt;a name="idd1e29012"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Initialization Complete&lt;/span&gt; bit  in each active &lt;a name="idd1e29018"&gt;&lt;/a&gt;Link Control register is set.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;After link synchronization, each active transmitter issues  buffer release (NOP) packets to the corresponding receiver to indicate its own  input flow control buffer capacities. Once this is done, each transmitter issues  NOPs until configuration starts.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The host bridge initializes its UnitID counter so it can start  assigning UnitIDs to slave devices it discovers (it reserves UnitID 0 for  itself).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If the host bridge's Link Control register &lt;span class="docEmphasis"&gt;Initialization Complete&lt;/span&gt; and &lt;span class="docEmphasis"&gt;End  Of Chain&lt;/span&gt; bits indicate that another device is attached to its secondary  bus, the host bridge sends a series of configuration cycles to the first device  in the chain. These type 0 configuration cycles target Bus 0, Device 0 (UnitID  0), Function 0. Because all devices default to UnitID 0, the first device will  claim the cycles. 
Read cycles will target configuration space locations  containing Vendor ID, Device ID, Class Code, Header Type, etc.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;At some point, the host bridge assigns new UnitID(s) to the  device by reading the &lt;span class="docEmphasis"&gt;Unit Count&lt;/span&gt; field in the  Slave Command Register and then programming (writing) the &lt;span class="docEmphasis"&gt;Base UnitID&lt;/span&gt; field with the next available UnitID (1).  For devices which request more than one UnitID, this &lt;span class="docEmphasis"&gt;Base UnitID&lt;/span&gt; is the first in a sequential set. &lt;span class="docEmphStrong"&gt;Note that the act of writing the Command register causes  the&lt;/span&gt; &lt;span class="docEmphBoldItalic"&gt;Base UnitID&lt;/span&gt; &lt;span class="docEmphStrong"&gt;field to be updated and the&lt;/span&gt; &lt;span class="docEmphBoldItalic"&gt;Master Host&lt;/span&gt; &lt;span class="docEmphStrong"&gt;bit to be  set (indicating the device link which points towards the host bridge).&lt;/span&gt;  Thereafter, the device uses its new UnitID when claiming configuration cycles,  etc. Only a reset or rewriting the Slave Command register causes the &lt;span class="docEmphasis"&gt;Base UnitID&lt;/span&gt; field to change.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Once all functions in the first device are configured, the host  repeats the process to access the next device in the chain. It again uses the  configuration cycle attributes of &lt;span class="docEmphasis"&gt;Bus 0, Device 0  (UnitID 0), Function 0.&lt;/span&gt; Now the device which is already assigned  UnitID 1 forwards the transaction downstream because the UnitID in the request  (0) does not match. 
The second device is then programmed as the first one was,  but the UnitID(s) assigned to it start where the previous device left off (i.e.,  UnitID 2).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;After programming each device, the host bridge checks the &lt;span class="docEmphasis"&gt;End-Of-Chain (&lt;span class="docEmphasis"&gt;EOC&lt;/span&gt;)&lt;/span&gt; bit  in the device's downstream Link Control Register. If this bit is set (1),  the enumeration process for the chain is complete.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch13lev3sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Case 2: A HyperTransport Bridge Is Discovered&lt;/h5&gt; &lt;p class="docText"&gt;If the enumeration process on &lt;a name="idd1e29098"&gt;&lt;/a&gt;a chain  encounters a &lt;a name="idd1e29102"&gt;&lt;/a&gt;HyperTransport-To-HyperTransport bridge or a  bridge from HyperTransport to a compatible protocol (PCI, AGP, PCI-X), then some  additional initialization is needed. A bridge is detected when a read of the  &lt;span class="docEmphasis"&gt;Header Type&lt;/span&gt; field in the configuration header  indicates that the device uses the type 1 header format. Software must program  the device in accordance with the type 1 header format, which includes:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Programming the secondary and subordinate bus number registers  with the next available bus number (1). 
This will allow this bridge to forward  and/or convert subsequent configuration cycles targeting the new bus(es) below  the bridge.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Setting up the Base Address Registers and other fields in the  configuration header in accordance with the protocol being used on the secondary  bus (HyperTransport, PCI-X, PCI, etc.).&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;It is permissible for a HyperTransport bridge to have more than  one secondary bus and/or a tunnel interface for its primary bus. &lt;/p&gt;&lt;a name="ch13lev4sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;A Note About &lt;a name="idd1e29147"&gt;&lt;/a&gt;Bus Numbering In  HyperTransport&lt;/h5&gt; &lt;p class="docText"&gt;Bus numbering in HyperTransport systems makes no distinction  between HyperTransport, PCI, AGP, or PCI-X buses. As bridges to other protocols  are discovered during enumeration, bus numbers are assigned without regard to  the particular protocol.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-5533900408756099166?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/5533900408756099166/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=5533900408756099166' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5533900408756099166'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5533900408756099166'/><link rel='alternate' type='text/html' 
href='http://cpu-hypertransport.blogspot.com/2007/06/how-hypertransport-handles.html' title='How HyperTransport Handles Configuration Accesses'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-982944264126098300</id><published>2007-06-26T22:00:00.000-07:00</published><updated>2007-06-26T22:01:06.546-07:00</updated><title type='text'>How PCI Handles Configuration Accesses</title><content type='html'>&lt;p class="docText"&gt;With the exception of chipsets, PCI devices generally power up  (or come out of reset) disabled with respect to either generating transactions  as bus master or decoding memory or I/O transactions as targets. This is because  they are not aware of either their own plug-and-play addresses or those of other  devices. The &lt;span class="docEmphasis"&gt;Configuration Read&lt;/span&gt; and &lt;span class="docEmphasis"&gt;Configuration Write&lt;/span&gt; transactions are the only ones a  PCI device may decode following reset. 
&lt;a name="idd1e28488"&gt;&lt;/a&gt;Configuration  cycles originate at the CPU, and instead of carrying conventional address  information (which would be useless), these cycles start downstream carrying the  following attributes about the target in the 32-bit address of the configuration  read or write transaction:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Bus number the target resides on (0-255 decimal)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Device number of the target (0-31 decimal)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Function number inside the target (0-7 decimal)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Double Word Offset in target's configuration space  (0-63 decimal)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Note that while addresses are not known  after reset, bus number and device number are functions of the board layout and  ARE known.&lt;/span&gt;&lt;/p&gt;&lt;a name="ch13lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Two &lt;a name="idd1e28520"&gt;&lt;/a&gt;Configuration Cycle  Types&lt;/h4&gt; &lt;p class="docText"&gt;As PCI configuration cycles travel downstream, there are two  variants: type 0 and type 1. The type is indicated in the lowest two bits of the  32-bit PCI address. Having two types is necessary because PCI devices don't know  their bus number or device numbers and must depend on upstream bridges to help  select them.&lt;/p&gt;&lt;a name="ch13lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Type 1 Cycle Until Target Bus Is Reached&lt;/h4&gt; &lt;p class="docText"&gt;Starting at the host bridge, a &lt;a name="idd1e28544"&gt;&lt;/a&gt;&lt;a name="idd1e28547"&gt;&lt;/a&gt;type 1 configuration cycle is propagated downstream until it  reaches the bridge with a secondary bus number equal to that of the  configuration cycle &lt;span class="docEmphasis"&gt;bus number&lt;/span&gt; field. 
Type 1  configuration cycles are ignored by all devices except bridges which will claim  them and pass them on to the next downstream bus if the &lt;span class="docEmphasis"&gt;bus number&lt;/span&gt; field of the configuration cycle is between  the values programmed in the bridge's secondary and subordinate bus number  registers.&lt;/p&gt;&lt;a name="ch13lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Target Bus Bridge: Convert To Type 0; Assert  IDSEL&lt;/h4&gt; &lt;p class="docText"&gt;The bridge owning the target secondary bus (based on the value  programmed in its secondary bus number register) will convert the type 1  configuration cycle to a type 0. It will also check the &lt;span class="docEmphasis"&gt;device number&lt;/span&gt; field and assert the corresponding PCI  &lt;span class="docEmphasis"&gt;IDSEL&lt;/span&gt; signal to the intended target; IDSEL acts  as an explicit target device "chip select". There is a separate IDSEL signal for  each device on a PCI bus; a target which detects IDSEL asserted at the same time  a Configuration Read or Write cycle (type 0) occurs claims the transaction and  uses the remaining information (&lt;span class="docEmphasis"&gt;Function number&lt;/span&gt;  and &lt;span class="docEmphasis"&gt;&lt;span class="docEmphasis"&gt;Dword&lt;/span&gt; offset&lt;/span&gt;)  to access its configuration space.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Note:&lt;/span&gt; &lt;span class="docEmphasis"&gt;In  many systems, the IDSEL signals routed to each device are actually upper bits on  the AD bus which are otherwise unused in &lt;/span&gt;&lt;a name="idd1e28594"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;type 0 configuration cycle address phases.&lt;/span&gt;&lt;/p&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;span class="docEmphasis"&gt;&lt;/span&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Events In PCI &lt;a name="idd1e28646"&gt;&lt;/a&gt;Configuration  Space Example&lt;/h5&gt;&lt;span 
style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Because the target bus is not bus 0, the North Bridge sends a  type 1 PCI configuration cycle out on bus 0. The configuration type 1 cycle  address phase includes target bus, device number, function number, and Dword  offset.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Only bridges are allowed to respond to type 1 configuration  cycles. The PCI-PCI bridge claims the cycle because the &lt;span class="docEmphasis"&gt;bus number&lt;/span&gt; field (bus 1) is in its range of  secondary-subordinate bus numbers.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The PCI-PCI bridge also converts the cycle to a type 0 on bus 1  because its Secondary Bus Number register matches the &lt;span class="docEmphasis"&gt;bus number&lt;/span&gt; field (1). 
This, therefore, is the bus the  target resides on.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The PCI-PCI bridge also asserts IDSEL2 to the target during the  address phase of the configuration cycle because the &lt;span class="docEmphasis"&gt;device number&lt;/span&gt; field indicates Device 2.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Device 2 claims the type 0 configuration cycle based on the  command type (configuration read/write, type 0) &lt;span class="docEmphUl"&gt;and&lt;/span&gt;  the fact that IDSEL2 is asserted.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Device 2 then uses the &lt;span class="docEmphasis"&gt;function  number&lt;/span&gt; and &lt;span class="docEmphasis"&gt;&lt;span class="docEmphasis"&gt;Dword&lt;/span&gt;  offset&lt;/span&gt; fields in the Configuration read/write address to internally  target the specific function and configuration space  offset.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-982944264126098300?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/982944264126098300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=982944264126098300' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/982944264126098300'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5449984521470454692/posts/default/982944264126098300'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/how-pci-handles-configuration-accesses.html' title='How PCI Handles Configuration Accesses'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2020720855862361443</id><published>2007-06-26T21:59:00.000-07:00</published><updated>2007-06-26T22:00:35.172-07:00</updated><title type='text'>HyperTransport Uses PCI Configuration</title><content type='html'>&lt;p class="docText"&gt;Many current generation computers use the PCI configuration  method and the 256 byte PCI &lt;a name="idd1e28217"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;configuration space&lt;/span&gt; memory required of all  PCI-compliant devices to help set up and manage system chipsets and I/O  peripherals. Using PCI configuration for a bus protocol such as HyperTransport  goes a long way toward promoting software compatibility with the millions of  systems already supporting buses employing PCI-based configuration, including  PCI, AGP, PCI-X, USB, etc. HyperTransport is designed for PCI plug-and-play  configuration and to minimize impact on existing BIOS and driver software.&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;What PCI Configuration Accomplishes&lt;/h3&gt; &lt;p class="docText"&gt;During system initialization, low level BIOS or other system  software uses configuration transaction cycles to "walk" each PCI-compatible bus  (PCI, PCI-X, HyperTransport, AGP, etc.) and read the PCI configuration space of  each device function it finds. 
Once discovered, basic and advanced capability  features of each device are set up as appropriate. Collectively, PCI &lt;a name="idd1e28267"&gt;&lt;/a&gt;configuration cycles may be used for many aspects of device  management, including:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Assignment of system  resources.&lt;/span&gt; Unlike earlier bus protocols, including the &lt;span class="docEmphasis"&gt;Industry Standard Architecture (ISA),&lt;/span&gt; PCI compatible  plug-and-play devices are not allowed to establish their own base addresses and  interrupt levels using fixed schemes or through user manipulation of jumpers and  switches. Instead, the designer of a PCI compatible device "hard codes"  information in selected PCI Configuration Space fields describing the fixed  requirements of the device with respect to memory and I/O addresses needed,  whether system interrupt support is required, arbitration needs, etc. Once the  system address maps and interrupt routing are determined, software then returns  to programmable fields in the PCI Configuration Space of each device and  programs address ranges, interrupt routing, etc.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Enabling of device capabilities and  options.&lt;/span&gt; In addition to assignment of system resources to PCI compatible  devices, software also uses the PCI Configuration Space to select device  options, enable bus mastering and target decoding of memory and I/O  transactions, program error response strategy, and set up other basic PCI and  advanced capability protocol features.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Checking of dynamic (error)  status.&lt;/span&gt; Finally, the PCI configuration space is used to log errors  resulting from attempted transactions. 
These logged errors, if checked by  software, provide a picture of the nature of the error, which device(s) detected  it, etc. The &lt;a name="idd1e28327"&gt;&lt;/a&gt;Status register in the configuration space  header is used for generic PCI-type &lt;a name="idd1e28334"&gt;&lt;/a&gt;error logging; in  addition, advanced capability register blocks also contain logging fields for  errors related to a specific capability (e.g. HyperTransport CRC errors, buffer  overflow errors, etc.).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h3 class="docSection1Title"&gt;HyperTransport System Limits&lt;/h3&gt; &lt;p class="docText"&gt;HyperTransport shares PCI terminology in describing a system in  terms of the number of buses, devices, functions, and configuration space.&lt;/p&gt;&lt;a name="ch13lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;256 Buses In A System&lt;/h4&gt; &lt;p class="docText"&gt;PCI permits 256 buses in a system and each PCI &lt;a name="idd1e28366"&gt;&lt;/a&gt;host bridge or PCI-to-PCI bridge secondary interface is host  to a new bus with a unique bus number. Unlike PCI, a HyperTransport bus may not  end with a single electrical connection. Tunnel devices enable the construction  of device chains which are still viewed as a single logical bus. The 256 bus  limit in HyperTransport, then, is actually 256 chains.&lt;/p&gt;&lt;a name="ch13lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;32 UnitIDs Per Bus&lt;/h4&gt; &lt;p class="docText"&gt;PCI permits a maximum of 32 physical devices per bus. In  HyperTransport, each functional device can request multiple device numbers,  called UnitIDs. This is because HyperTransport ordering rules  consider the transactions from each UnitID to be a unique &lt;span class="docEmphasis"&gt;transaction stream&lt;/span&gt;; owning multiple UnitIDs enables a  device to source more than one transaction stream (e.g. 
a standard transaction  stream and an isochronous transaction stream for its high priority traffic). The  32 device per bus limit in PCI is a 32 UnitID per bus limit in  HyperTransport.&lt;/p&gt;&lt;a name="ch13lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;One To Eight Functions Per Device&lt;/h4&gt; &lt;p class="docText"&gt;As in PCI, HyperTransport allows 1-8 logical functions in a  physical device package. Each function has its own 256 byte configuration space,  and will be assigned unique UnitID(s).&lt;/p&gt;&lt;a name="ch13lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;256 Bytes Of &lt;a name="idd1e28411"&gt;&lt;/a&gt;Configuration  Space&lt;/h4&gt; &lt;p class="docText"&gt;Just as in other PCI devices, each function of a HyperTransport  device must implement a 256 byte configuration space memory. The first  one-fourth of the configuration space is the header. In addition to the header,  devices also must implement at least one set of HyperTransport advanced &lt;a name="idd1e28421"&gt;&lt;/a&gt;capability registers.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;Configuration Accesses: Reaching All Devices&lt;/h3&gt; &lt;p class="docText"&gt;The process of HyperTransport device configuration depends on  software being able to access the 256 byte configuration space of each function  in each device on each bus in the system. &lt;a name="idd1e28434"&gt;&lt;/a&gt;Configuration  cycles originate at the CPU that executes the configuration software; the cycles  then move in the direction of the target. 
This section compares the PCI and  HyperTransport methods used to reach the configuration space of a device which  may reside on a bus many levels deep in the topology.&lt;/p&gt; &lt;p class="docText"&gt;Implied in plug-and-play address assignment on buses such as  PCI and HyperTransport is the fact that until it is discovered and assigned an  address range by low-level software, a device can't claim normal memory or I/O  transactions. Furthermore, whenever a bus reset occurs, each device "forgets"  its address ranges and other information programmed in configuration space and  can no longer be targeted with transactions which depend on assigned addresses.  So, how can a device's configuration space be set up if it doesn't know its  target address?&lt;/p&gt; &lt;p class="docText"&gt;In addition to the problem of simple devices recognizing their  own configuration cycles in an uninitialized system, the complex topologies  permitted in PCI, PCI-X, and HyperTransport require that bridges be programmed  to forward configuration transactions to the proper bus before a device can even  consider claiming it.&lt;/p&gt; &lt;p class="docText"&gt;Before looking at how HyperTransport differs from PCI in its  handling of system-wide configuration accesses, here is a quick review of how  PCI handles them.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2020720855862361443?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2020720855862361443/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2020720855862361443' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2020720855862361443'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2020720855862361443'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/hypertransport-uses-pci-configuration.html' title='HyperTransport Uses PCI Configuration'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-7567131720739388200</id><published>2007-06-26T21:58:00.000-07:00</published><updated>2007-06-26T21:59:00.716-07:00</updated><title type='text'>Link Initialization in the CPU</title><content type='html'>&lt;p class="docText"&gt;The process of initializing each link begins during cold reset.  The complete &lt;a name="idd1e26368"&gt;&lt;/a&gt;link initialization process consists of  several stages:&lt;/p&gt;&lt;a name="ch12pr01"&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList"&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="1"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;&lt;a name="idd1e26379"&gt;&lt;/a&gt;Low-level link  initialization —&lt;/span&gt; This hardware mechanism ensures that the devices  attached to a link can pass transactions safely in both directions following a  cold reset. This includes:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Determining the link width that can be used after cold reset.  
This width is based on the maximum width of the smallest transmitter or  receiver, but limited to 8 bits.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Establishing the default clock frequency of 200 MHz for all  devices.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Synchronizing the transmit and receive clocks and setting up  the &lt;a name="idd1e26399"&gt;&lt;/a&gt;receive FIFOs with the appropriate load and unload  values.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Establishing the reference point for the beginning of packet  transmission in both directions. This reference defines the beginning of 4-byte  aligned packet transmission as well as the beginning of the &lt;a name="idd1e26407"&gt;&lt;/a&gt;CRC window.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="2"&gt; &lt;p class="docText"&gt;The next stage of link initialization occurs after cold reset  and is driven primarily by system firmware. This stage is needed because the  low-level link initialization does not guarantee that the link is operating&lt;a name="idd1e26421"&gt;&lt;/a&gt; at maximum clock frequency and link width. 
The process  involves:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Reading the maximum link-width fields from the &lt;a name="idd1e26431"&gt;&lt;/a&gt;Link Configuration register and loading the link-width  control registers with the maximum common width (done for both upstream &amp;amp;  downstream directions of a link).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Reading the &lt;a name="idd1e26439"&gt;&lt;/a&gt;Link Frequency Capability  registers and loading the maximum common frequency into the Link Frequency  control registers (done for both upstream and downstream directions).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Initiating a warm reset (or &lt;a name="idd1e26447"&gt;&lt;/a&gt;LDTSTOP#  disconnect/connect sequence) to force the updated values to take  effect.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch12lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Low-Level&lt;a name="idd1e26466"&gt;&lt;/a&gt; Link Width  Initialization&lt;/h4&gt; &lt;p class="docText"&gt;Low-Level initialization of the link width is performed as a  hardware sequenced point-to-point handshake between the two devices attached to  each link. Once completed, the devices at each end of the link will be ready to  perform transactions using either 2-, 4-, or 8-bits. This link-width negotiation  sequence may not result in links operating at their maximum width. For example,  since the maximum width following the negotiation is 8 bits, 16-bit, 32-bit, and  asymmetrically-sized operations are not possible until enabled by software,  which is the second stage of link-width initialization.&lt;/p&gt;&lt;a name="ch12lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Determining Low-Level Link Width&lt;/h5&gt; &lt;p class="docText"&gt;HT permits devices with different link widths to be directly  connected. This results in unused receiver and transmitter pins on the wider  device. 
Logic within a device has no knowledge of the width of the devices  to which it connects. Consequently, a hardware handshake process is defined at  powerup to ensure that all devices can determine a safe link width over which  devices can communicate.&lt;/p&gt;&lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The transmitter width can be wider than the receiver; thus the  values listed in column 2 are shown as 32 bits wide (maximum possible  width).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The receiver width may be wider than the transmitter width. In  this event, the transmitter cannot report the correct receiver size and is  required to drive all CAD lines to 1's.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Rows 3 and 4 list the transmit values for specifying 8-bit and  16-bit receiver widths, respectively. Note that transmit values seem to  represent receiver widths that are much wider than the actual receiver size  (i.e., 32 bits of all 1's reflect a receiver width of 4 bytes). However, because  the low-level link initialization process limits the maximum link width to  CAD[7:0], a value beyond FFh has no meaning. The upper lines are driven to  ensure backward compatibility with the early versions of LDT.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Row 5 defines the transmit value for a 32-bit receiver width.  
While this value seems to define precisely a 32-bit receiver width, the  low-level receiver width is limited to FFh as described in the previous  bullet.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;During reset both devices deliver a pattern that represents the  size of their receiver.&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The 8-bit link delivers a value of FFh (logic doesn't know the  receiver on the other end of the link is only 4 bits wide).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The 4-bit link delivers a value of Fh.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The receivers then detect the pattern driven, and each device  learns the link width to use when transmitting packets to the other.&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The 4-bit link device sees only Fh on CAD[3:0], and interprets  the size of the remote receiver to be 4 bits wide.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The 8-bit link device has its CAD[7:4] pins tied to  differential logic 0 and detects the value Fh on CAD[3:0], and also interprets  the size of the remote receiver to be 4 bits wide.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-7567131720739388200?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/7567131720739388200/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=7567131720739388200' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7567131720739388200'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7567131720739388200'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/link-initialization-in-cpu.html' title='Link Initialization in the CPU'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-6631660844556058743</id><published>2007-06-26T21:57:00.002-07:00</published><updated>2007-06-26T21:58:07.712-07:00</updated><title type='text'>Cold Reset in HT Technology</title><content type='html'>&lt;p class="docText"&gt;Cold Reset is signaled during the power-up sequence under  hardware control. This section details the sources, effects, and characteristics  of a HyperTransport cold reset.&lt;/p&gt;&lt;a name="ch12lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Sources of Cold Reset&lt;/h4&gt; &lt;p class="docText"&gt;In addition to the hardware generation of cold reset during the  powerup sequence, platform developers may also provide hooks for generating cold  reset under software control. An optional method of generating a HyperTransport  cold reset is defined by the specification for the secondary bus of a HT-to-HT  bridge (discussed on page 278). However, software generation of cold reset for  the secondary side of the Host-to-HT bridge can be implementation  specific.&lt;/p&gt;&lt;a name="ch12lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Resetting the Primary HT Bus&lt;/h5&gt; &lt;p class="docText"&gt;Some implementation-specific mechanism must be defined to  initiate a cold reset at powerup. The HT specification does not precisely define  the source of HT cold reset for the system. 
It may be generated by system board  logic or could be incorporated into the Host to HT bridge or other HT device  residing on the system board.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;p class="docText"&gt;Further, the specification does not require a software  controlled method of cold &lt;a name="idd1e26109"&gt;&lt;/a&gt;reset generation. However, a  host bridge could optionally implement a mechanism similar to that provided by  the bridge control register of an HT-to-HT bridge. (See next section.)&lt;/p&gt; &lt;p class="docText"&gt;Once reset is signalled, any HT device has the option of  extending it (via open drain signaling) to ensure the amount of time it needs to  complete its internal initialization. In this way, reset remains asserted until  the last HT device in the chain completes its initialization. All HT devices  that signal cold reset must correctly sequence RESET# and PWROK.&lt;/p&gt;&lt;a name="ch12lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Resetting Secondary Side of HT-to-HT Bridge&lt;/h5&gt; &lt;p class="docText"&gt;An HT Bridge is required to propagate cold reset from its  primary to its secondary side, but is not allowed to propagate any form of reset  from its secondary to primary side. 
Thus, when the HT-to-HT Bridge initiates an  HT cold reset to its secondary side, it will be distributed to all devices in  the downstream chain.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-6631660844556058743?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/6631660844556058743/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=6631660844556058743' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6631660844556058743'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6631660844556058743'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/cold-reset-in-ht-technology.html' title='Cold Reset in HT Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-5667175603498498585</id><published>2007-06-26T21:57:00.001-07:00</published><updated>2007-06-26T21:57:12.362-07:00</updated><title type='text'>HyperTransPort Imposes A Fairness Algorithm</title><content type='html'>&lt;h5 class="docSection3Title"&gt;The Basic Policy&lt;/h5&gt; &lt;p class="docText"&gt;The fairness policy basically allows a tunnel to insert packets  onto the upstream link at a rate equal to the most active device (UnitID) below  it. 
What this means is that it must determine the most active downstream UnitID,  then may insert packets one-for-one with that transaction stream. Of course,  during idle times on the bus, it may insert packets at will. The goal of the  specification is that there will be a continuous reassessment of the insertion  rate to match changing conditions. On the subject of dynamic adjustment of the  insertion rate, the specification says: "this property must be met over a window  in time small enough to be responsive to the dynamic traffic patterns, yet large  enough to be statistically convergent".&lt;/p&gt;&lt;a name="ch11lev3sec17"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;The Algorithm&lt;/h5&gt; &lt;p class="docText"&gt;There are two parts to the &lt;a name="idd1e25836"&gt;&lt;/a&gt;Fairness and  Forward progress algorithm: calculating the insertion rate and implementing a  hardware mechanism that enforces it. While a single interface is described here,  tunnel devices actually must implement independent algorithms for each link.  There are no programmable control/status registers associated with the  algorithm; it is completely hardware based. Also note that:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Information packets (NOP and Sync) are issued on a per-link  basis and not subject to the fairness algorithm.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;All other packets are handled without regard to their type,  virtual channel, etc. 
All ordering and priority issues are handled  internally.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch11lev4sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection5Title"&gt;First, Calculate The Insertion Rate&lt;/h5&gt; &lt;p class="docText"&gt;This is done through the implementation of hardware registers  to monitor incoming packets requiring forwarding for each transaction stream.  Because HyperTransport permits a total of 32 UnitIDs per chain, the register set  to manage each link consists of:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;32 individual 3-bit counters which are used to count the  incoming packets from &lt;span class="docEmphUl"&gt;individual transaction  streams&lt;/span&gt; (using the request and response packet UnitID field)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;One 8-bit counter used to count the collective number of  incoming packets from &lt;span class="docEmphUl"&gt;all transaction  streams.&lt;/span&gt;&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;At reset, all 32 individual counters are reset to "0". The  8-bit counter is reset to "1". 
The sequence of events in the counters as packets  arrive to be forwarded is as follows:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Each time a packet to be forwarded arrives, its individual  3-bit counter is incremented by 1.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The 8-bit counter is also incremented any time a packet to be forwarded  arrives from any UnitID.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The first individual 3-bit counter to overflow (roll to "0")  represents the most active UnitID.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The value in the 8-bit counter when the first 3-bit counter  overflows is used as the denominator of a fraction which is expressed as  8/denominator. This ratio is the maximum rate at which the tunnel device may  insert its own packets onto the upstream link.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The specification recommends that the tunnel space out the  interleaving of its packets, rather than bursting them all at  once.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch11lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection5Title"&gt;Insertion Rate Calculation Example&lt;/h5&gt; &lt;p class="docText"&gt;Assume there are three devices below a tunnel: UnitID2,  UnitID4, and UnitID5. 
Packets targeting main memory are being issued by all  three devices.&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Incoming packets from UnitID5 cause its 3-bit counter to roll  over first.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The 8-bit counter (denominator) was at 12 when the roll over  occurred.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The tunnel then calculates the insertion rate: 8/denominator =  8/12 = 2/3.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The tunnel is allowed to insert, on average, two packets for  every three which it forwards for the devices below  it.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;The HyperTransport specification provides additional guidelines  for designers in implementing a simple priority and arbitration scheme,  achieving non-integral insertion rates, and avoiding potential starvation  problems.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-5667175603498498585?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/5667175603498498585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=5667175603498498585' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5667175603498498585'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5667175603498498585'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/hypertransport-imposes-fairness.html' title='HyperTransPort Imposes A Fairness Algorithm'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-6444560977973202104</id><published>2007-06-26T21:55:00.000-07:00</published><updated>2007-06-26T21:56:57.954-07:00</updated><title type='text'>More about CPU Packets</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch11lev1sec6"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Rejecting Packets&lt;/h3&gt; &lt;p class="docText"&gt;Once system initialization is complete, &lt;a name="idd1e25477"&gt;&lt;/a&gt;packet rejection should be a rare event. The only non-error  rejection after initialization is that of a &lt;a name="idd1e25481"&gt;&lt;/a&gt;broadcast message, which is intended to travel to the end of  the chain and then be dropped by the end-of-chain device. Prior to the  completion of system initialization, other devices may not have completed link  initialization (&lt;span class="docEmphasis"&gt;Link Initialization&lt;/span&gt; bit is  clear). 
If they have not been programmed to hold incoming packets pending  completion of initialization (&lt;span class="docEmphasis"&gt;Drop on Uninitialized  Link&lt;/span&gt; bit is set), they will behave as an end-of-chain device and reject  packets temporarily.&lt;/p&gt; &lt;p class="docText"&gt;The action taken when a packet is rejected by an end-of-chain  device, or by an interior device which has not completed initialization and is  behaving temporarily as one, depends on the type of packet.&lt;/p&gt;&lt;a name="ch11lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Rules For Rejection&lt;/h4&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Broadcast requests which have completed the trip to the end of  a chain are silently dropped. These are always posted, so no response is  expected or sent.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Non-posted downstream &lt;a name="idd1e25508"&gt;&lt;/a&gt;directed requests  (UnitID = 0) cause the return of a Target Done response (for non-posted writes  or Flush) or a Data Response (for reads or Atomic RMW). The response for  rejected non-posted downstream requests will have the Error and NXA error bits  set, and the bridge bit clear. The UnitID field will be set to either 0 or that  of the end-of-chain device. Read responses are accompanied by all requested data  (driven to a value of FFh).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Non-posted upstream directed requests (non-zero UnitID) cause  the return of a Target Done response (for non-posted writes or Flush) or a Data  Response (for reads or Atomic RMW). The response for rejected non-posted  upstream requests will have the Error and NXA error bits set, and the bridge bit  set. 
The UnitID field must be set to that of the original requester (without  &lt;span class="docEmphUl"&gt;its&lt;/span&gt; UnitID and the bridge bit set in the response,  an interior node won't accept the response). Read responses are accompanied by  all requested data (driven to value of FFh)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Rejected response and posted request packets are  dropped.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h3 class="docSection1Title"&gt;&lt;a name="idd1e25536"&gt;&lt;/a&gt;Host Bridge Behavior&lt;/h3&gt; &lt;p class="docText"&gt;Host bridges always reside at the ends of HyperTransport  chains. They never forward packets, but will have occasion to accept and reject  packets. There is additional complexity caused by the responsibilities host  bridges may have in supporting double-hosted chains (this is optional) and  peer-to-peer transfers (required). The following sections describe host bridge  behavior in accepting and rejecting packets they receive.&lt;/p&gt;&lt;a name="ch11lev2sec9"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Directed Request With UnitID = 0&lt;/h4&gt; &lt;p class="docText"&gt;&lt;a name="idd1e25549"&gt;&lt;/a&gt;Directed requests with a UnitID = 0  detected by the HyperTransport interface of a host bridge are inbound from the  host at the other end of a double-hosted chain.&lt;/p&gt;&lt;a name="ch11lev3sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Accepted&lt;/h5&gt; &lt;p class="docText"&gt;If the host bridge recipient of the request has implemented  internal memory or I/O space and owns the address range targeted in the request,  it will respond to the request as an interior node would: accepting posted  requests and returning Target Done or Read responses/data for non-posted  requests.&lt;/p&gt; &lt;p class="docText"&gt;The host bridge will similarly accept &lt;a name="idd1e25563"&gt;&lt;/a&gt;Type 0 &lt;a 
name="idd1e25567"&gt;&lt;/a&gt;configuration cycles carrying  the proper fields if two conditions are met:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The host bridge supports double-hosted chains&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The Host/Secondary Interface Command Register &lt;span class="docEmphasis"&gt;Host Hide&lt;/span&gt; bit is clear&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch11lev3sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Rejected&lt;/h5&gt; &lt;p class="docText"&gt;If the host bridge does not support double-hosted chains or  doesn't own the address range contained in the request, the request will be rejected much as it would be by any other end-of-chain device. Posted requests are dropped (they  may be reported as end-of-chain errors); non-posted requests cause the return of  a Target Done or Read response with Error and NXA error bits set; for read  requests, all requested data is also returned with the response (driven to value  of FFh).&lt;/p&gt;&lt;a name="ch11lev3sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Response UnitID And Bridge Fields&lt;/h5&gt; &lt;p class="docText"&gt;A host bridge receiving a non-posted, directed request with  UnitID = 0 would also set the response &lt;a name="idd1e25602"&gt;&lt;/a&gt;UnitID field = 0  unless it is programmed to act as the slave bridge in a double-hosted chain. 
If  this is the case, the Host/Secondary Interface Command Register &lt;span class="docEmphasis"&gt;Act As Slave&lt;/span&gt; bit would be set = 1, and the bridge would  set the UnitID in the response to the value programmed into its Host/Secondary  Interface &lt;span class="docEmphasis"&gt;Device Number&lt;/span&gt; Register.&lt;/p&gt;&lt;a name="ch11lev2sec10"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e25617"&gt;&lt;/a&gt;Broadcast Request&lt;/h4&gt; &lt;p class="docText"&gt;Broadcast requests detected by a host bridge could only be  coming from another host bridge on the other end of a double-hosted chain.&lt;/p&gt;&lt;a name="ch11lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Always Accepted&lt;/h5&gt; &lt;p class="docText"&gt;If a broadcast message is seen by a host bridge, it has already  been seen throughout the chain. The host bridge accepts it, and silently drops  it. The specification indicates that a host bridge could optionally implement an  internal space addressable with a broadcast message; if it does this, the  message would be accepted and routed to the internal target.&lt;/p&gt;&lt;a name="ch11lev2sec11"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Directed Request With Non-Zero UnitID&lt;/h4&gt; &lt;p class="docText"&gt;&lt;a name="idd1e25639"&gt;&lt;/a&gt;Directed requests with a non-zero UnitID  are sourced by interior nodes and may be destined either for internal space  within the host bridge (e.g. main memory) or for another device downstream (a  peer-to-peer request). The host bridge evaluates the command type and address  contained in the request and routes it accordingly. 
It is also possible that the  request packet does not map to any internal space or device on the chain; in  that case it will be rejected.&lt;/p&gt;&lt;a name="ch11lev3sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Accepted Requests&lt;/h5&gt;&lt;a name="ch11lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Internal Target&lt;/h5&gt; &lt;p class="docText"&gt;If the host bridge recipient of the request has implemented  internal memory or I/O space and owns the address range targeted in the request,  it will respond to the request as an interior node would: accepting posted  requests and returning Target Done or Read responses/data for non-posted  requests. This would be the typical behavior during DMA transfers between  HyperTransport peripherals and main memory via the host bridge.&lt;/p&gt;&lt;a name="ch11lev4sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;&lt;a name="idd1e25661"&gt;&lt;/a&gt;Peer-to-Peer Target&lt;/h5&gt; &lt;p class="docText"&gt;It is also possible that a host bridge examines the request  packet address and determines that the address range is actually downstream in  the HyperTransport topology. The target could either be on the same chain as the  requester or on another chain if the host supports more than one. HyperTransport  doesn't support direct peer-to-peer transfers and requires bridges to handle  them as two separate requests (followed by two separate responses if  non-posted). The sequence of events in accepting a peer-to-peer transaction  is as follows (assuming a non-posted peer-to-peer request):&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;An interior node issues a non-posted request. 
The UnitID is  that of the requester; the Source Tag and Sequence ID fields are assigned  by the requester from its pool of available tags.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The host bridge examines the request and determines it is  peer-to-peer. It reissues the request downstream on the appropriate chain. When  it does this, it changes the UnitID to 0 (its own) and changes the Source Tag  and Sequence ID fields to values from its own pool of available tags. All other  fields are passed through unchanged.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The host bridge &lt;span class="docEmphUl"&gt;must&lt;/span&gt; track  outstanding transactions so that, when the response returns, the original  UnitID and Source Tag values can be restored before the response is reissued  downstream to the original requester.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Upon return of a response from the target (bridge bit is  clear), the bridge sends the response downstream on the chain containing the  requester with the bridge bit set = 1, and UnitID and Source Tag restored. The  bridge may then retire the transaction from its outstanding request  queue.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The response (and data, if any) is claimed by the original  requester based on the UnitID field and the fact that the bridge bit = 1. 
It uses the  Source Tag to determine the specific transaction which has been  serviced.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch11lev3sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Compatibility Chain Requests&lt;/h5&gt; &lt;p class="docText"&gt;Another peer-to-peer possibility is that an inbound directed  request from an interior node (non-zero UnitID) targets a legacy device on the  compatibility chain. If the host bridge supports a compatibility chain, it may  reissue requests that don't target any other address space to that chain. Again,  it replaces the UnitID, Source Tag (for non-posted requests), and Sequence ID  field (if non-zero) with its own values. It also sets the &lt;a name="idd1e25704"&gt;&lt;/a&gt;Compat bit in the request it sends onto the compatibility  chain. This informs every device that sees the request that it should be forwarded  downstream to the subtractive decoder (compatibility bridge).&lt;/p&gt; &lt;p class="docText"&gt;If the original request was non-posted, the host bridge will  again track the outstanding request so it can restore the original UnitID and  Source Tag information in the response packet it sends to the original  requester.&lt;/p&gt;&lt;a name="ch11lev3sec13"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Rejected Requests&lt;/h5&gt; &lt;p class="docText"&gt;If the host bridge does not recognize the address carried by an  inbound directed request from an interior node and doesn't support a  compatibility chain, the packet will be rejected and handled like any other  end-of-chain error. 
Posted requests are dropped (they may be reported as  end-of-chain errors); non-posted requests cause the return of a Target Done or  Read response with Error and NXA error bits set; for read requests, all  requested data is returned (driven to value of FFh).&lt;/p&gt;&lt;a name="ch11lev2sec12"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Responses Received By The &lt;a name="idd1e25725"&gt;&lt;/a&gt;Host  Bridge&lt;/h4&gt; &lt;p class="docText"&gt;When a response is received by a host bridge, either the bridge  bit is set or it is not.&lt;/p&gt;&lt;a name="ch11lev3sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Response With Bridge Bit = 1&lt;/h5&gt; &lt;p class="docText"&gt;A response with the bridge bit set always originates at a host  bridge. If one is &lt;span class="docEmphUl"&gt;received&lt;/span&gt; by a host bridge, it  means that another host bridge in a double-hosted chain attempted to respond to  an interior node and the node failed to claim it. In this case, it continued to  the end of the chain where it will be handled as an end-of-chain error by the  other host bridge. The response is dropped, and the receiving host bridge may  log it as an end-of-chain error. &lt;/p&gt;&lt;a name="ch11lev3sec15"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Response With Bridge Bit = 0&lt;/h5&gt; &lt;p class="docText"&gt;Responses arriving at a host bridge with the bridge bit cleared  belong to the host bridge. The Target Done or Read response/data is being  returned in response to a non-posted request issued by this host. Devices are  required to track all of their outstanding requests (those requiring responses),  so the bridge simply uses the Source Tag field in the response to determine  which transaction is being serviced. 
In the event a response returns with the  bridge bit clear, but the response Source Tag doesn't match any outstanding  transactions, the node may log the error and report it.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-6444560977973202104?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/6444560977973202104/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=6444560977973202104' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6444560977973202104'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6444560977973202104'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/more-about-cpu-packets.html' title='More about CPU Packets'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2440694865782422972</id><published>2007-06-26T21:54:00.002-07:00</published><updated>2007-06-26T21:55:42.557-07:00</updated><title type='text'>Forwarding Packets</title><content type='html'>Any node that forwards a packet sends it in the same direction  it is already moving.&lt;a name="ch11lev2sec6"&gt;&lt;/a&gt; &lt;h4 
class="docSection2Title"&gt;Rules For Forwarding&lt;/h4&gt; &lt;p class="docText"&gt;A node will forward an incoming packet to its outbound link if  &lt;span class="docEmphUl"&gt;any&lt;/span&gt; of the following conditions are met.&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a broadcast request (determined from the command  type)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a directed request which carries a UnitID = 0  (coming from a bridge), the compatibility bit is clear, and the address is not  owned by this node.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a directed request which carries a UnitID = 0  (coming from a bridge), the compatibility bit is set, and this node is not the  subtractive decoder or a bridge to it.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a response with the bridge bit set (traveling  downstream from a bridge), and the UnitID field does not match that of the  node.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a directed request which carries a non-zero  UnitID (coming from an interior node). Non-bridges are not allowed to claim  requests from interior nodes (no direct Peer-to-Peer transfers).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a response with the bridge bit clear (coming from  an interior node). 
Non-bridges are not allowed to claim responses which are not  sourced by a bridge (bridge bit must be set).&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch11lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Other Notes On Forwarding&lt;/h4&gt;&lt;a name="ch11lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Forwarding Into The &lt;a name="idd1e25415"&gt;&lt;/a&gt;End Of  Chain&lt;/h5&gt; &lt;p class="docText"&gt;An attempt to forward a packet into the end of a chain (device  has EOC bit set) will result in a rejected packet. How the rejection is handled  on the link is described in the next section on &lt;a name="idd1e25425"&gt;&lt;/a&gt;packet  rejection. In addition, error handling policy programmed into the end-of-chain  device determines what additional action should be taken (log error, generate an  interrupt, etc.). &lt;/p&gt;&lt;a name="ch11lev3sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Forwarding If Initialization Is Not Complete&lt;/h5&gt; &lt;p class="docText"&gt;Another aspect of forwarding involves the behavior of a device  which detects a packet that should be forwarded, but the device has not yet  completed its initialization (&lt;span class="docEmphasis"&gt;EOC&lt;/span&gt; and &lt;a name="idd1e25447"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Initialization Complete CSR&lt;/span&gt;  bits still clear). Whether the incoming packet will be dropped or held pending  initialization is then determined by the &lt;span class="docEmphasis"&gt;Drop on  Uninitialized Link&lt;/span&gt; bit in the HyperTransport advanced &lt;a name="idd1e25456"&gt;&lt;/a&gt;capability Command Register. 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2440694865782422972?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2440694865782422972/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2440694865782422972' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2440694865782422972'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2440694865782422972'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/forwarding-packets.html' title='Forwarding Packets'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2661927774613336755</id><published>2007-06-26T21:54:00.001-07:00</published><updated>2007-06-26T21:54:50.862-07:00</updated><title type='text'>Accepting Packets</title><content type='html'>&lt;p class="docText"&gt;A packet is accepted by a device if certain fields indicate it  is the intended recipient. For directed requests or responses, this will be the  end of packet travel. 
For broadcast requests, the packet is consumed and passed  downstream to the next device.&lt;a name="idd1e25277"&gt;&lt;/a&gt;&lt;/p&gt;&lt;a name="ch11lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Rules For Acceptance&lt;/h4&gt; &lt;p class="docText"&gt;A node will accept (consume) an incoming packet if &lt;span class="docEmphUl"&gt;any&lt;/span&gt; of the following conditions are met.&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a broadcast request (determined from the command  type)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a directed request which carries a UnitID = 0  (coming from a bridge), the &lt;a name="idd1e25300"&gt;&lt;/a&gt;compatibility bit is clear  (only subtractive decoders may claim a request if compatibility bit is set), and  the address is owned by this node.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a directed request which carries a UnitID = 0  (coming from a bridge), the compatibility bit is set (only the subtractive  decoder may claim this request), and this node &lt;span class="docEmphUl"&gt;is&lt;/span&gt;  the subtractive decoder or a bridge to it.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The packet is a response with the bridge bit set (traveling  downstream from a bridge), and the UnitID field matches that of the  node.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Because of double-hosted chain topologies, tunnels must be able  to accept downstream packets from either link.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch11lev3sec4"&gt;&lt;/a&gt; &lt;h5 
class="docSection3Title"&gt;A Note About The &lt;a name="idd1e25325"&gt;&lt;/a&gt;Subtractive  Decoder&lt;/h5&gt; &lt;p class="docText"&gt;In the rules for packet acceptance, an allowance is made in  HyperTransport for a "compatibility chain". If implemented, this chain would  host the system subtractive decode device (e.g., a compatibility bridge), which is  responsible for handling transactions to &lt;span class="docEmphasis"&gt;legacy&lt;/span&gt;  devices. Often, these devices do not support plug and play addressing and the  system may not be aware of the address ranges they are using. When the host  bridge targets such devices, it sets the &lt;a name="idd1e25335"&gt;&lt;/a&gt;Compat bit in  the request, which indicates that:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;No device, other than the subtractive decoder, is allowed to  claim the request packet, regardless of the address it carries.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Bridges in the path to the subtractive decoder must reissue the  request onto the proper downstream link.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The subtractive decode device must accept the request on behalf  of the legacy devices it supports.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2661927774613336755?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2661927774613336755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2661927774613336755' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2661927774613336755'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2661927774613336755'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/accepting-packets.html' title='Accepting Packets'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-1044487068692979015</id><published>2007-06-26T21:53:00.000-07:00</published><updated>2007-06-26T21:54:36.365-07:00</updated><title type='text'>Packet Routing: Shared Bus vs. Point-Point Topology</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch11lev1sec1"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Packet Routing: Shared Bus vs. Point-Point  Topology&lt;/h3&gt; &lt;p class="docText"&gt;Routing in a shared bus topology such as PCI or  PCI-X is somewhat simpler than in a point-point topology such as  HyperTransport.&lt;a name="idd1e24632"&gt;&lt;/a&gt;&lt;/p&gt;&lt;a name="ch11lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Shared Bus Routing&lt;/h4&gt; &lt;p class="docText"&gt;In a PCI/PCI-X shared bus, if a transaction appears on the bus, all devices "see it" and have  an opportunity to decode the address and command and claim the cycle. 
Devices  other than bridges have no responsibilities for routing information to their  neighbors. Also note that arbitration on a shared bus is simple because a single  arbiter can manage the entire bus. In PCI/PCI-X, the arbiter is typically in the  bus &lt;a name="idd1e24658"&gt;&lt;/a&gt;Host Bridge; the arbiter considers requests from each  master, then grants the bus to each in turn, hopefully applying a reasonable &lt;a name="idd1e24662"&gt;&lt;/a&gt;fairness algorithm.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;HyperTransport Point-Point Routing&lt;/h4&gt; &lt;p class="docText"&gt;In contrast to the shared bus approach, the HyperTransport  topology distributes responsibility for routing and &lt;a name="idd1e24681"&gt;&lt;/a&gt;forwarding packets among all devices, with the exception of  single-link end (&lt;a name="idd1e24685"&gt;&lt;/a&gt;cave) devices. For example, the tunnel  peripheral device in&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;Review Of Packet Types And Formats&lt;/h3&gt; &lt;p class="docText"&gt;How a packet is routed depends in large part on the type of  packet it is. Each packet in HyperTransport is a multiple of four bytes in size,  and the specification divides packets into two types: control and data. All  control packets contain a &lt;span class="docEmphasis"&gt;Command Type&lt;/span&gt; field in  the first byte which identifies which type of control packet it is and the  format of the remaining packet fields to follow. 
It also indicates whether data  packets follow immediately (writes), will return later (reads), or are not  required.&lt;/p&gt;&lt;a name="ch11lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Control Packets&lt;/h4&gt; &lt;p class="docText"&gt;Control packets are sent across a link to initiate specific  tasks; they contain information fields used for several purposes: address  decoding, virtual channel and transaction stream management, &lt;a name="idd1e24716"&gt;&lt;/a&gt;error reporting, and routing. Devices perform routing  functions by extracting information from key fields in control packets. Control  packets are further divided into three groups: information, requests, and  responses.&lt;/p&gt;&lt;a name="ch11lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e24723"&gt;&lt;/a&gt;Information Packets: No  Routing Required&lt;/h5&gt; &lt;p class="docText"&gt;Information packets include &lt;a name="idd1e24730"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;NOP&lt;/span&gt; and &lt;span class="docEmphasis"&gt;Sync/Error.&lt;/span&gt;  These four-byte packets are used for communication between the two ends of a link  interface. When issued by a transmitter, they are always accepted by the  corresponding receiver; they are never forwarded to another link. This means  that there are no routing issues associated with them. These two packet types will not  be discussed further in this chapter.&lt;/p&gt;&lt;a name="ch11lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Request Packet Routing Information&lt;/h5&gt; &lt;p class="docText"&gt;Request packets are used to initiate various transactions and  control operations. Packet format depends on the request type: four-byte request  packets are sent when no address field is needed; eight-byte requests are sent  otherwise. 
&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e25226"&gt;&lt;/a&gt;Data Packet Routing Depends On  Control Packets&lt;/h4&gt; &lt;p class="docText"&gt;Because data packets are always accompanied by a control packet  (request or response), they do not contain any routing information of their own.  The control packet indicates the size of the data packet payload, the virtual  channel it travels in, whether it is in bytes or dwords, where it is going, and  even whether it is valid or not. For this reason, data packets are not mentioned  much in the packet routing discussion that follows; they are assumed to be  immediately behind the control packets which accompany them.&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;Directed vs. &lt;a name="idd1e25239"&gt;&lt;/a&gt;Broadcast  Requests&lt;/h3&gt; &lt;p class="docText"&gt;HyperTransport defines two classes of requests: &lt;span class="docEmphasis"&gt;directed&lt;/span&gt; and &lt;span class="docEmphasis"&gt;broadcast.&lt;/span&gt;  &lt;a name="idd1e25252"&gt;&lt;/a&gt;Directed requests travel in the posted or non-posted  virtual channel, and always target a particular device. The request destination  is indicated either by the address field (e.g. RdSized and WrSized requests), or  is implied in the request type (Flush and Fence commands target the host  bridge). Directed requests move through the chain until they reach the target  device and are absorbed. Devices in the path of a directed request forward it to  the next device in the direction of the target.&lt;/p&gt; &lt;p class="docText"&gt;Broadcast requests are issued by bridges and travel in the  posted virtual channel. They are accepted by each node, then forwarded downstream  on all links. 
When the broadcast request reaches an end-of-chain device, it is  absorbed and dropped.&lt;/p&gt;&lt;p class="docText"&gt;&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="docText"&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-1044487068692979015?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/1044487068692979015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=1044487068692979015' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1044487068692979015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1044487068692979015'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/packet-routing-shared-bus-vs-point.html' title='Packet Routing: Shared Bus vs. 
Point-Point Topology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-6514396945568810269</id><published>2007-06-26T21:51:00.000-07:00</published><updated>2007-06-26T21:52:37.994-07:00</updated><title type='text'>Error Reporting in Hypertransport Technology</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch10lev1sec3"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;&lt;a name="idd1e24150"&gt;&lt;/a&gt;Error Reporting&lt;/h3&gt; &lt;p class="docText"&gt;The three error reporting methods (&lt;a name="idd1e24157"&gt;&lt;/a&gt;error  responses, fatal and non-fatal interrupts, and Sync flood) have different system  implications. They are described here in order of increasing severity.&lt;/p&gt;&lt;a name="ch10lev2sec12"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Error Responses (&lt;a name="idd1e24165"&gt;&lt;/a&gt;Non-Posted  Requests Only)&lt;/h4&gt; &lt;p class="docText"&gt;The HyperTransport specification considers error responses the  preferred error reporting mechanism because they are the most localized  (conveyed only from target to requester). Error responses are  transaction-specific and do not prevent the link from performing other transfers,  even to or from the same device.&lt;/p&gt; &lt;p class="docText"&gt;Every RdSized or &lt;a name="idd1e24175"&gt;&lt;/a&gt;Atomic  Read-Modify-Write request results in the return of a &lt;span class="docEmphasis"&gt;Read&lt;/span&gt; response from the target, followed by all of the  requested data. 
All non-posted WrSized and Flush requests result in the return  of a &lt;span class="docEmphasis"&gt;Target Done&lt;/span&gt; response which confirms the  completion of the operation, but is not accompanied by data.&lt;/p&gt;&lt;p class="docText"&gt;When either a Read or Target Done response packet is returned  to a requester, the requester checks the state of the two &lt;a name="idd1e24199"&gt;&lt;/a&gt;error bits — &lt;span class="docEmphasis"&gt;Error&lt;/span&gt; and &lt;span class="docEmphasis"&gt;NXA&lt;/span&gt; (Non-Existent Address) — contained in the packet to  determine if the transaction completed properly. The two sources of error  responses are the target device and, in the case of a non-existent address, the  end-of-chain device.&lt;/p&gt;&lt;a name="ch10lev3sec21"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e24212"&gt;&lt;/a&gt;Error Response Returned By The  Target&lt;/h5&gt; &lt;p class="docText"&gt;If a non-posted request reaches a target, but the target cannot  complete the operation (can't source or accept data, etc.), the target will  return the appropriate response with the Error bit set. If the request called  for the return of data (RdSized or Atomic RMW), all requested data (as indicated  in the Mask/Count field of the request) will also be returned. Sending the data  (even though it is invalid) allows devices in the path to deallocate buffer  space and retire the outstanding transaction.&lt;/p&gt; &lt;p class="docText"&gt;A returning response with Error set and NXA cleared is  equivalent to a PCI target abort; HyperTransport requesters detecting this  "non-NXA" error response set the &lt;span class="docEmphasis"&gt;Received Target  Abort&lt;/span&gt; bit in the PCI Status register. 
Bridges seeing this error on a  secondary bus would set the bit in the Secondary Status CSR.&lt;/p&gt;&lt;a name="ch10lev3sec22"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Error Response Returned By An End-Of-Chain  Device&lt;/h5&gt; &lt;p class="docText"&gt;If a non-posted request fails to reach the target (bad address,  etc.), an end-of-chain device must send the response on its behalf. The response  will have &lt;span class="docEmphUl"&gt;both&lt;/span&gt; the Error and NXA bits set. As in  the target response above, if the request called for the return of data, all  requested data (again, invalid) will be returned as FFh.&lt;/p&gt; &lt;p class="docText"&gt;A returning response with both Error and NXA set is equivalent  to a PCI master abort; HyperTransport requesters detecting the &lt;a name="idd1e24251"&gt;&lt;/a&gt;NXA error response set the &lt;span class="docEmphasis"&gt;Received  Master Abort&lt;/span&gt; bit in the PCI Status register. Bridges seeing this error on  a secondary bus would set the equivalent bit in the Secondary Status CSR.&lt;/p&gt;&lt;a name="ch10lev2sec13"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Fatal And Non-Fatal &lt;a name="idd1e24267"&gt;&lt;/a&gt;Interrupts&lt;/h4&gt; &lt;p class="docText"&gt;Using interrupts to inform the system of errors is slightly  more complex because the interrupt message must travel up through the topology  to the host. Interrupts can indicate a non-fatal error (roughly analogous to  INTR# in an x86 machine) which implies that the device issuing it has seen an  error, but may be able to recover from it; or an interrupt can indicate a fatal  error condition (analogous to NMI# in an x86 machine) which indicates that the  nature of the error is such that recovery is not possible. Interrupts of either  type do not prevent the link from performing other transfers. 
The conditions  under which fatal or non-fatal interrupts are to be used are device- and driver-specific.&lt;/p&gt; &lt;p class="docText"&gt;In HyperTransport, interrupts are typically sent using an  interrupt message scheme rather than sideband interrupt signals as found in  other buses. Devices are not prevented from using external pins as an option,  although this method is beyond the scope of the HyperTransport  specification.&lt;/p&gt; &lt;p class="docText"&gt;An interrupt message transaction is actually a special case of  the standard &lt;span class="docEmphUl"&gt;sized byte write&lt;/span&gt; (WrSized Byte)  request. Devices in the system can distinguish interrupt messages  from other sized writes by the following attributes of the request:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Interrupt requests target a reserved address range in the system  address map (from FD_0000_0000h to FD_F8FF_FFFFh).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The command type is WrSized (byte).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The Count field is always programmed with a "0", indicating that a  single dword of data content follows. In standard byte writes, this would be the  &lt;a name="idd1e24297"&gt;&lt;/a&gt;byte mask dword; in interrupt requests, the single dword  data payload contains information about the interrupt.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e24323"&gt;&lt;/a&gt;Sync Flood: When All Else  Fails&lt;/h4&gt; &lt;p class="docText"&gt;In some cases, one or more links in HyperTransport may get into  a state where ordinary packets cannot be sent reliably. For example, a device  may detect a series of CRC errors which indicates to it that either the external  link is broken or, more likely, it may not be synchronized with its neighbor  with respect to CRC stuffing in the CAD stream. 
If this is the case, it can't  send new packets; it also can't convey the fault using fatal/non-fatal  interrupts because they travel in the same channels as other packets.&lt;/p&gt; &lt;p class="docText"&gt;Sync flood reports errors that cannot be signalled by other  methods. It is roughly analogous to the PCI SERR# (system error) event and has a  serious impact on the entire chain. Sync flood packets put the chain into an  inactive state pending a warm reset to restore normal packet protocol. The  behavior of the device initiating the sync flood is slightly different from the  other devices which propagate it. The basic rules are described below.&lt;/p&gt;&lt;a name="ch10lev3sec23"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Device Initiating The Sync Flood&lt;/h5&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The device initiating sync flood must have the &lt;span class="docEmphasis"&gt;SERR# Enable&lt;/span&gt; bit in the configuration header Command  register set = 1 before it initiates a sync flood for any reason.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If the device intends to initiate sync flood for CRC errors,  buffer overflow errors, or protocol errors, it must first check the  corresponding "flood enable" bits in the Error Handling and Link Control  registers.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The device which initiates a sync flood sets the &lt;span class="docEmphasis"&gt;Signaled System Error&lt;/span&gt; bit in the configuration header  Status register, &lt;span class="docEmphasis"&gt;LinkFail&lt;/span&gt; bit in its Link Control  Register, and the &lt;span class="docEmphasis"&gt;Chain Fail&lt;/span&gt; bit in the Error  Handling CSR. 
Note: if all conditions for a sync flood have been met, Link Fail  is always set — even if the SERR# enable bit in the configuration header Command  register is clear (preventing the sync flood packets from actually being  sent.)&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev3sec24"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Devices Detecting Sync Flood&lt;/h5&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Devices detecting sync flood at a receiver input cease all  normal packet transmission on the affected chain.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Each device sets the &lt;span class="docEmphasis"&gt;Chain Fail&lt;/span&gt;  bit in its Error Handling CSR.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Each device drives sync packets onto all transmitter  interfaces, including back to the device which initiated the flood. 
This ensures  that sync is seen on all links on the chain.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev3sec25"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Sync Flooding And HyperTransport Bridges&lt;/h5&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Bridges set the &lt;span class="docEmphasis"&gt;Detected System  Error&lt;/span&gt; bit in the &lt;a name="idd1e24418"&gt;&lt;/a&gt;Secondary Status register if they  see a sync flood on the secondary bus.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A bridge may forward a secondary bus sync flood upstream to the  primary bus if the &lt;span class="docEmphasis"&gt;SERR# Enable&lt;/span&gt; bit in its PCI  Command register is set. This is similar to the behavior of PCI-PCI bridges  when SERR# is detected on a secondary bus. The bridge may optionally convert the  secondary bus sync flood to a fatal or non-fatal interrupt on the primary  bus.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Bridges always propagate primary bus sync floods downstream  onto their secondary bus(es).&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev3sec26"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Miscellaneous Notes&lt;/h5&gt;&lt;a name="ch10lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Flooding Continues Until Reset&lt;/h5&gt; &lt;p class="docText"&gt;Once a device commences the sync flood operation, it must  continue until a reset is detected on the affected bus. 
This assures that the  sync flood propagates throughout the chain.&lt;/p&gt;&lt;p class="docText"&gt; &lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-6514396945568810269?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/6514396945568810269/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=6514396945568810269' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6514396945568810269'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6514396945568810269'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/error-reporting-in-hypertransport.html' title='Error Reporting in Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2091335689923489244</id><published>2007-06-26T21:48:00.000-07:00</published><updated>2007-06-26T21:51:34.964-07:00</updated><title type='text'>Errors | Error Checking in HT Technology CPU</title><content type='html'>&lt;p class="docText"&gt;HyperTransport defines six types of errors, and three basic  ways they may be reported to the system.&lt;/p&gt;&lt;a name="ch10lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Types Of Errors&lt;/h4&gt; &lt;p 
class="docText"&gt;The error types which may be detected, logged, and reported  are&lt;a name="idd1e22843"&gt;&lt;/a&gt;:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;CRC (Cyclic Redundancy Code) Errors&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22858"&gt;&lt;/a&gt;Protocol Errors&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22865"&gt;&lt;/a&gt;Receive Buffer Overflow  Errors&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22872"&gt;&lt;/a&gt;End Of Chain Errors&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22879"&gt;&lt;/a&gt;Chain Down Errors&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22886"&gt;&lt;/a&gt;Response  Errors&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Reporting Methods&lt;/h4&gt; &lt;p class="docText"&gt;Once an error is detected, it can be conveyed to other devices  in the system in the following ways:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22905"&gt;&lt;/a&gt;Error Responses&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e22912"&gt;&lt;/a&gt;Error Interrupts (fatal and  non-fatal)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a 
name="idd1e22919"&gt;&lt;/a&gt;Sync  Flooding&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The Role Of PCI&lt;a name="idd1e22932"&gt;&lt;/a&gt; Configuration  Space&lt;/h4&gt; &lt;p class="docText"&gt;The PCI Configuration Space required of each HyperTransport  device plays several roles in error handling. The &lt;span class="docEmphasis"&gt;Command&lt;/span&gt; and &lt;a name="idd1e22945"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Status&lt;/span&gt; registers in the header and the &lt;span class="docEmphasis"&gt;Link Error&lt;/span&gt; and &lt;a name="idd1e22954"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Error Handling&lt;/span&gt; registers in the HyperTransport Advanced  Capability Register block are used to report error handling capabilities,  program the error reporting mechanism to be used if an error occurs, and log  the errors which occur so that software can later assess the error events seen  by each device.&lt;/p&gt; &lt;p class="docText"&gt;Once the error capabilities of a device have been determined  and the error reporting strategy is programmed in configuration space, any  errors which occur will be handled accordingly. For example, a HyperTransport  device which detects a protocol error may be programmed to set the corresponding  log bit in the configuration space &lt;span class="docEmphasis"&gt;Error Handling&lt;/span&gt;  register and generate a &lt;a name="idd1e22966"&gt;&lt;/a&gt;fatal interrupt message.&lt;/p&gt;&lt;a name="ch10lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Most Types Of Error Checking Are Optional&lt;/h4&gt; &lt;p class="docText"&gt;To accommodate differences in how devices and applications may  view certain types of errors, the specification only requires &lt;a name="idd1e22978"&gt;&lt;/a&gt;CRC generation/checking on each link; other aspects of error  detection and handling are optional. 
If a particular error is not checked, the  corresponding enable and logging bits in configuration space must be hardwired  to 0.&lt;/p&gt;&lt;a name="ch10lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;System Handling Of HyperTransport Errors Varies&lt;/h4&gt; &lt;p class="docText"&gt;As in many other bus protocols, HyperTransport bus behavior  during error events is well specified, but the action taken by the system in  response to reported errors is implementation specific. However, if Sync flood  is used as a reporting mechanism, a reset is required on the affected chain(s)  to restore proper protocol.&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;The Error Types&lt;/h3&gt; &lt;p class="docText"&gt;The following section summarizes the required &lt;span class="docEmphasis"&gt;&lt;span class="docEmphasis"&gt;CRC&lt;/span&gt; generation/checking&lt;/span&gt;  as well as the optional &lt;span class="docEmphasis"&gt;protocol, receive buffer  overflow, end of chain, chain down,&lt;/span&gt; and &lt;span class="docEmphasis"&gt;response&lt;/span&gt; error handling.&lt;/p&gt;&lt;a name="ch10lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;CRC Errors&lt;/h4&gt; &lt;p class="docText"&gt;The Cyclic Redundancy Code (CRC) is used to detect transmission  errors on all enabled byte lanes on each link. The 32-bit CRC value is  calculated and sent at prescribed intervals by each transmitter, then checked  against the CRC value calculated by the corresponding receiver as packets  arrive. CRC is calculated by finding the remainder when the stream of packet data  (CAD bits plus &lt;a name="idd1e23036"&gt;&lt;/a&gt;CTL signal during each bit time), treated as a polynomial, is  divided by the CRC polynomial. 
The polynomial used is:&lt;/p&gt; &lt;p class="docText"&gt;X&lt;sup&gt;32&lt;/sup&gt; + X&lt;sup&gt;26&lt;/sup&gt; + X&lt;sup&gt;23&lt;/sup&gt; + X&lt;sup&gt;22&lt;/sup&gt; + X&lt;sup&gt;16&lt;/sup&gt; + X&lt;sup&gt;12&lt;/sup&gt; + X&lt;sup&gt;11&lt;/sup&gt; + X&lt;sup&gt;10&lt;/sup&gt; + X&lt;sup&gt;8&lt;/sup&gt; + X&lt;sup&gt;7&lt;/sup&gt; + X&lt;sup&gt;5&lt;/sup&gt; + X&lt;sup&gt;4&lt;/sup&gt; + X&lt;sup&gt;2&lt;/sup&gt; + X + 1&lt;/p&gt;&lt;a name="ch10lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e23088"&gt;&lt;/a&gt;CRC On 8, 16, or 32 bit  Interfaces&lt;/h5&gt;&lt;p class="docText"&gt;For interfaces which are 8, 16, or 32 bits wide, CRC is  independently generated and checked for each byte of CAD width.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;CRC Generation/Checking: 8/16/32 bit links&lt;/h5&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;After link initialization, each transmitter begins sending  packets (NOP, etc.). CRC calculation is based on "raw" CAD/CTL bit patterns on  each CAD byte without regard to the packet types being sent.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;512 bit times after initialization, the first 32-bit CRC value  has been calculated for each byte lane. 
The window for "stuffing" the 32-bit CRC  value into its CAD stream is 64 bit times &lt;span class="docEmphUl"&gt;into the next  "window".&lt;/span&gt; Note: because of this delay, there is no CRC sent during the  first window.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Although each window for CRC calculation is 512 bit times, in  reality all windows (after the first one) are actually 516 bit times because CRC  for each window is inserted into the following one for four additional bit  times. Note that the CRC value stuffed into each window is not included in the  subsequent CRC calculation for that window.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;There is no special signalling associated with CRC  transmission; both devices simply count the bit times starting with link  initialization and "know" where the CRC payload falls in each window.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;CRC is calculated and sent independently for each 8 bits of CAD  width. The CTL signal itself is included in the CRC calculation for the lowest  byte of CAD (bits 0-7). On a bus wider than 8 bits, the CTL signal is also  factored into the CRC calculation for each of the upper CAD bytes, &lt;span class="docEmphUl"&gt;but is assumed to be 0&lt;/span&gt; during all bit times.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;During the driving of the CRC value itself, the CTL signal is  driven = 1 (Control) by the transmitter. 
The CRC bits are inverted before being  transmitted onto the link.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;CRC Generation/Checking: 2/4 bit links&lt;/h5&gt; &lt;p class="docText"&gt;On links narrower than 8 bits, the CRC value is generated in  the same way as for 8-bit links carrying the same data. It simply takes longer  to move the packets and CRC value across the link, causing the calculation  window and stuffing point for the CRC value to be stretched accordingly. The  extra assertions of the CTL signal (after the first bit time in each byte) are  not used by the transmitter or receiver in the CRC calculation.&lt;/p&gt;&lt;a name="ch10lev4sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;4 Bit CAD Width&lt;/h5&gt; &lt;p class="docText"&gt;A CAD width of four bits requires twice as many bit times as an  8-bit bus for moving information across the link. Therefore:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The CRC window size is 1024 bit times.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The CRC stuffing point starts 128 bit times after the start of a  window.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;It takes 8 bit times to transfer the 32-bit CRC  value.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch10lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;2 Bit CAD Width&lt;/h5&gt; &lt;p class="docText"&gt;A CAD width of two bits requires four times as many bit times  as an 8-bit bus for moving information across the link. 
Therefore:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The CRC window size is 2048 bit times.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The CRC stuffing point starts 256 bit times after the start of  a window.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;It takes 16 bit times to transfer the 32-bit CRC  value.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch10lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Logging CRC Errors&lt;/h5&gt; &lt;p class="docText"&gt;CRC errors impact both control and data information; if these  errors occur on any CAD byte lane, the corresponding error bit(s) will be set in  the HyperTransport Advanced Capability block &lt;span class="docEmphasis"&gt;Link  Control&lt;/span&gt; CSR. There are four such bits, one for each byte lane.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Programming The &lt;a name="idd1e23378"&gt;&lt;/a&gt;CRC&lt;a name="idd1e23383"&gt;&lt;/a&gt; Error Reporting Policy&lt;/h5&gt; &lt;p class="docText"&gt;Informing the system of a CRC error on one or more of the links  is handled in the manner programmed at boot time in the Advanced Capability  &lt;span class="docEmphasis"&gt;Error Handling&lt;/span&gt; and &lt;a name="idd1e23396"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Link Control&lt;/span&gt; Registers. Options include sending a &lt;a name="idd1e23402"&gt;&lt;/a&gt;fatal interrupt message, sending a &lt;a name="idd1e23406"&gt;&lt;/a&gt;non-fatal  interrupt message, or initiating a sync flood.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e23483"&gt;&lt;/a&gt;CRC Test Mode&lt;/h5&gt; &lt;p class="docText"&gt;If both devices on a link support the CRC diagnostic testing  mode (determined by checking bit 2 in the Feature Capability register for each  device), then software may enable a test sequence that allows stress tests of  CRC generation and checking. 
The basic events involved in link CRC testing  include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Software writes a "1" to the CRC Start Test bit of the Link  Control register. Setting this bit informs the transmitter interface that it should  enter the CRC diagnostic mode for the following 512 bit times on each enabled  byte lane. For 4- or 2-bit CAD widths, this time is stretched to 1024 or 2048 bit  times, respectively.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The transmitter sends a NOP packet with the &lt;span class="docEmphasis"&gt;Diag&lt;/span&gt; bit set; this informs the receiver that it should  ignore CAD and CTL signals for the next 512 bit times but is still required to  check CRC. Again, for 4- or 2-bit CAD widths, this time is stretched to 1024 or  2048 bit times, respectively.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;With the normal buffers suspended, the transmitter may generate  any test pattern it wants; CRC is still stuffed into the CAD test pattern stream  in the normal way.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;CRC errors detected during this time will be logged normally,  and if Sync flood is enabled, it will be performed. 
All data content is  "don't care" during this time and is dropped.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If the CRC Force Error (CFE) bit is also set during the test,  then the test pattern sent by the transmitter will contain at least one CRC  error in each of the active byte lanes.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;When the test is complete, hardware automatically clears the  CRC Start Test bit. This bit may be polled by software to check  completion.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;At the end of the CRC Diagnostic test, normal packet transfer  resumes.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch10lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e23587"&gt;&lt;/a&gt;Protocol Errors&lt;/h4&gt; &lt;p class="docText"&gt;Protocol errors are failures on the link involving low-level  packet violations. These include the following:&lt;/p&gt;&lt;a name="ch10lev3sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e23597"&gt;&lt;/a&gt;CTL Signal Four-Byte Boundary  Violation&lt;/h5&gt; &lt;p class="docText"&gt;The CTL signal may only transition between low and high on four-byte  boundaries. The exception to this rule is during the CRC diagnostic test  mode. If an illegal transition is detected, then either the transmitter has lost  track of packet start and ending boundaries or the receiver has.&lt;/p&gt;&lt;a name="ch10lev3sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;CTL Deassertion Violation&lt;/h5&gt; &lt;p class="docText"&gt;Other than when CRC diagnostic test mode is in use, a  transmitter only deasserts the CTL signal during data packets associated with  earlier requests requiring them. 
Deasserting CTL when data packets are not in  transit is another protocol violation.&lt;/p&gt;&lt;a name="ch10lev3sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;CTL/Data Interleaving Violation&lt;/h5&gt; &lt;p class="docText"&gt;A transmitter is allowed to interleave new control packets into  the data packet associated with an earlier request &lt;span class="docEmphUl"&gt;if the  new control packet does not have any immediate data of its own.&lt;/span&gt; If an  attempt is made to interleave a control packet with immediate data (e.g. a write  request) into a data packet already in transit, this is a protocol  violation.&lt;/p&gt;&lt;a name="ch10lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Bad Command Code In&lt;a name="idd1e23634"&gt;&lt;/a&gt; Control  Packet&lt;/h5&gt; &lt;p class="docText"&gt;Control packets (request, response, information) have a 6-bit  command field in the first byte to encode the intended operation. Some codes are  not used, and are reserved. Sending an illegal command code is another protocol  violation.&lt;/p&gt;&lt;a name="ch10lev3sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;CTL Deassertion Timeout Violation&lt;/h5&gt; &lt;p class="docText"&gt;The HyperTransport specification limits the amount of time the  CTL signal may be deasserted. There are two maximum timeout options (1  millisecond or 1 second) and the one in effect is programmed in bit 15 of the &lt;a name="idd1e23649"&gt;&lt;/a&gt;Link Error Register. If the transmitter exceeds the programmed  maximum CTL deassertion timeout, it is a protocol violation.&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;CTL Deasserted During CRC Transmission&lt;/h5&gt; &lt;p class="docText"&gt;CTL is always asserted during the transmission of the 32-bit  CRC code in each calculation window. 
If a receiver detects CTL deasserted during  a CRC stuffing period, it is a protocol violation.&lt;/p&gt;&lt;a name="ch10lev3sec13"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Logging&lt;a name="idd1e23686"&gt;&lt;/a&gt; Protocol Errors&lt;/h5&gt; &lt;p class="docText"&gt;Protocol error checking is optional. If protocol violations are  checked, the &lt;span class="docEmphasis"&gt;Link Error&lt;/span&gt; register logs the errors;  refer to &lt;a class="docLink" href="#ch10fig05"&gt;Figure 10-5&lt;/a&gt; on page 239.&lt;/p&gt;&lt;a name="ch10lev3sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Programming The Protocol &lt;a name="idd1e23705"&gt;&lt;/a&gt;Error  Reporting &lt;a name="idd1e23709"&gt;&lt;/a&gt;Policy&lt;/h5&gt; &lt;p class="docText"&gt;Informing the system of a protocol error on one or more of the  links is handled in much the same way as for CRC errors. They may be mapped to a  fatal or non-fatal interrupt message, or a sync flood. The reporting strategy is  programmed in the Error Handling CSR.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e23735"&gt;&lt;/a&gt;Receive Buffer Overflow  Errors&lt;/h4&gt; &lt;p class="docText"&gt;Receive buffer overflow errors can occur if a link transmitter  no longer maintains an accurate count of available &lt;a name="idd1e23742"&gt;&lt;/a&gt;flow  control buffers at the receiver. If a flow-controlled packet (&lt;a name="idd1e23746"&gt;&lt;/a&gt;posted request, non-posted request, or response) is sent  without an available receiver flow control buffer to accept it, the packet will  be lost.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;End-Of-Chain Errors&lt;/h4&gt; &lt;p class="docText"&gt;&lt;a name="idd1e23812"&gt;&lt;/a&gt;End-Of-Chain (EOC) errors result when a  packet moving through HyperTransport is either not claimed by, or does not  reach, the intended recipient. 
Other devices which see the packet forward it and  eventually it reaches the device at the end of the chain, where the packet must  be handled. Some of the possible reasons for EOC errors include: an improper  address in a request, an invalid Unit &lt;a name="idd1e23822"&gt;&lt;/a&gt;ID in a response, a  broken target device, or a target that has not been programmed properly with a UnitID or  target base address range.&lt;/p&gt; &lt;p class="docText"&gt;EOC errors are analogous to the &lt;span class="docEmphasis"&gt;master  abort&lt;/span&gt; event in PCI. Unlike PCI, however, "misdirected" transactions must  be handled by the EOC device rather than simply having the initiator of the  transaction time out after a prescribed amount of time. This is important in  HyperTransport because it is a series of point-to-point connections rather than  a shared bus, and an initiator simply sends packets to the neighboring device  and has no way of immediately "knowing" whether the ultimate recipient receives  it. The EOC error handling mechanism helps with link management in two  ways:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;For posted requests and responses which inadvertently reach an  EOC device, the EOC error bit and reporting mechanism may be used to let the  system know a packet never reached its destination, information that would otherwise  be unknown.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;For non-posted requests which reach an EOC device in error, the  &lt;a name="idd1e23865"&gt;&lt;/a&gt;error logging and reporting can also be used. 
In  addition, the EOC device will act as a surrogate for the target and send back a  &lt;span class="docEmphasis"&gt;Read&lt;/span&gt; or &lt;span class="docEmphasis"&gt;Target  Done&lt;/span&gt; response to the requester (with &lt;a name="idd1e23878"&gt;&lt;/a&gt;error bits  set). For read requests, all of the requested data is also sent back by the EOC  device, although it is obviously invalid (all data values are driven to FFh).  Sending back the responses (and data) allows all devices in the path back to the  requester to deallocate internal buffer space and retire the outstanding  transaction. The original requester examines the response, decodes the error  bits, and takes whatever action is appropriate.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;How A Device Knows It Is At The End Of A Chain&lt;/h5&gt; &lt;p class="docText"&gt;Single link peripherals (also known as &lt;span class="docEmphasis"&gt;End&lt;/span&gt; or &lt;a name="idd1e23897"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Cave&lt;/span&gt; devices) are always end-of-chain devices. Any  packets reaching these devices that they are not programmed to accept (by Command  type, UnitID, or Address range) are considered lost. No software programming is  required for these devices to carry out their EOC function other than setting up  the error reporting mechanism to be used.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e24013"&gt;&lt;/a&gt;Chain Down Errors&lt;/h4&gt; &lt;p class="docText"&gt;If a device detects a Sync flood or an error that would cause a  Sync flood, it sets the &lt;a name="idd1e24020"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Chain  Fail&lt;/span&gt; bit in its &lt;a name="idd1e24026"&gt;&lt;/a&gt;Error Handling register and waits  for a bus reset. 
The action taken when the chain goes down depends on the device  type:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Host interfaces track outstanding non-posted requests for  devices below them. On chain down errors, they flush the state of all internal  non-posted requests and return &lt;a name="idd1e24036"&gt;&lt;/a&gt;non-NXA error responses to  the requesters for each one that is pending.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Slave devices have their internal states re-initialized when  the &lt;a name="idd1e24044"&gt;&lt;/a&gt;RESET# occurs after a chain goes down; there is  generally no need for a flush operation of non-posted requests by these devices.  If a slave device were implemented that maintained its state through a  HyperTransport RESET#, it would need to perform the non-posted request flush  operation after the chain goes down as well.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;Response Errors&lt;/h4&gt; &lt;p class="docText"&gt;All non-posted requests&lt;a name="idd1e24072"&gt;&lt;/a&gt; that are issued  require either a Read or Target Done response. The requester programs &lt;span class="docEmphasis"&gt;UnitID&lt;/span&gt; and &lt;span class="docEmphasis"&gt;source tag&lt;/span&gt;  information into each request packet it issues so that when the response is  returned it may be tagged with the same information and find its way back to the  original requester. When a downstream response is detected, each device compares  the UnitID to its own to see if it should claim the response; if so, it then  checks the source tag to determine which of its outstanding transactions is  being completed.&lt;/p&gt; &lt;p class="docText"&gt;It is possible a response may return and be claimed by a  requester (UnitID is OK), but not be recognized as being valid. 
Some of the  reasons this might happen include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A read response (RdResponse) is received by a device which  carries the correct UnitID, but has an invalid source tag (SrcTag field). The  recipient cannot associate the response with any of its outstanding  transactions.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A read response (with data) is received with the correct UnitID  and SrcTag fields, but the response type is incorrect (requester is expecting a  Target Done response).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A Target Done response is received for a RdSized or Atomic RMW  request.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A read response (with data) is received for a RdSized or Atomic  RMW, but the (data) count field doesn't match what the requester originally  asked for.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;p class="docText"&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2091335689923489244?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2091335689923489244/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2091335689923489244' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2091335689923489244'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2091335689923489244'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/errors-error-checking-in-ht-technology.html' title='Errors | Error Checking in HT Technology CPU'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-5431433657157106664</id><published>2007-06-26T21:47:00.002-07:00</published><updated>2007-06-26T21:48:36.224-07:00</updated><title type='text'>Example SM Sequence: Link Initialization Disconnect</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch09lev1sec3"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Example SM Sequence: &lt;a name="idd1e22549"&gt;&lt;/a&gt;Link  Initialization Disconnect&lt;/h3&gt; &lt;p class="docText"&gt;This example illustrates a link initialization disconnect  sequence; that is, the events that would occur when BIOS software uses  disconnect (&lt;a name="idd1e22556"&gt;&lt;/a&gt;LDTSTOP#) to change the link frequency and  width during initialization.&lt;/p&gt;&lt;a name="ch09lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Background&lt;/h4&gt; &lt;p class="docText"&gt;During initialization, devices that share a HT link must  determine the maximum clock frequency and maximum link width supported by both  devices. The first step in this process involves a device procedure that  establishes a safe but not necessarily optimum clock frequency and link width  immediately following reset. Initialization software (BIOS) must tune the link  width and frequency. 
Software simply reads the maximum &lt;a name="idd1e22570"&gt;&lt;/a&gt;capability registers within each device, determines the  maximum values that each device supports, and loads the &lt;a name="idd1e22574"&gt;&lt;/a&gt;link control registers with these values. However, these  values do not take effect until BIOS software initiates either a soft reset or  an LDTSTOP# disconnect. This example assumes that the system is designed to perform  the disconnect rather than a soft reset. The disconnect method may be chosen  because it completes more quickly than a soft reset. &lt;/p&gt;&lt;a name="ch09lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Setup and Assumptions&lt;/h4&gt; &lt;p class="docText"&gt;This example explains the relationships between the various  messages, responses, and signals involved in the disconnect sequence. &lt;span style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p class="docText"&gt;The &lt;a name="idd1e22610"&gt;&lt;/a&gt;Host Bridge and ICH must be designed  specifically to perform the link initialization disconnect. This requires  support of several key features including:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;A trigger mechanism to initiate link  initialization disconnect:&lt;/span&gt; The ICH must include a mechanism (e.g. a  register) that permits BIOS software to initiate the link initialization  disconnect sequence for changing the link frequency and width.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;STPCLK system management message:&lt;/span&gt; The ICH must be able to assert and deassert STPCLK in response to a  software request to initiate the link initialization disconnect. The platform  must also define an SM Action Field (SMAF) code that identifies the reason for  STPCLK being signaled. 
The host bridge must be designed to detect the message  and respond appropriately.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;STOP_GRANT system management message:&lt;/span&gt; The system must support the delivery of the STOP_GRANT message and the  associated SMAF code that identifies the reason for sending the message. The ICH  must be designed to detect the message and respond appropriately.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;LDTSTOP# signal:&lt;/span&gt; The ICH must  be able to assert and deassert LDTSTOP# using the specified sequence and timing  required.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch09lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The Link Initialization Disconnect Sequence&lt;/h4&gt; &lt;p class="docText"&gt;The following steps define the sequence of events, from  BIOS initiation of the sequence through the return to normal operation.&lt;/p&gt; &lt;p&gt; &lt;table cellpadding="5" cellspacing="0" frame="void" rules="none" width="100%"&gt; &lt;colgroup span="2" align="left"&gt; &lt;/colgroup&gt;&lt;thead&gt;&lt;/thead&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 1:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;The BIOS code initiates the link-initialization disconnect  sequence by writing to a register within the ICH, in this example implementation.  
The write transaction is assumed to be a non-posted operation, which requires a  TargetDone response.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 2:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;The ICH responds by sending a STPCLK assertion SM message to  the host with a UnitID that matches the UnitID of the TargetDone response  pending for step 1. The STPCLK assertion message contains a SMAF value that  defines the reason for STPCLK assertion in this example.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 3:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;After the STPCLK assertion message is sent to the host, the ICH  is allowed to send the response to the initiating transaction from step 1. Note  that this sequence is important to ensure correct ordering of events based on  some OS implementations. The response must follow the STPCLK SM message to  guarantee that the host does not execute any additional instructions after the  initiating command of step 1.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 4:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;When the STPCLK assertion message reaches the host, the host  reflects the message downstream to all links in the fabric. 
Reflecting STPCLK  assertion downstream has no specific purpose; it is simply easier for the host  to reflect all SM messages rather than just some.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 5:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;In addition to reflecting the STPCLK assertion message in the  downstream direction, the host must also respond to the STPCLK assertion message  by broadcasting a STOP_GRANT SM message across all downstream links. This is  intended to indicate that the host is ready for the next step in the state  transition, and notifies all devices of the power state being  entered.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 6:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;The ICH asserts LDTSTOP# in response to receiving and decoding  the STOP_GRANT system management message and SMAF code. The ICH must delay  signaling LDTSTOP# after receiving STOP_GRANT to allow time for STOP_GRANT to  reach all other devices in the system. All devices, upon detecting LDTSTOP#,  perform the disconnect sequence that includes updating the link width and  frequency based on new values loaded into the &lt;a name="idd1e22744"&gt;&lt;/a&gt;Link  Control Register.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 7:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;LDTSTOP# is deasserted under control of the system management  logic. 
Recall that LDTSTOP# can be deasserted either before or after the link  disconnection sequence is complete as described in item 5 on page  224.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 8:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;When a device completes the disconnect sequence and has  detected LDTSTOP# deasserted, it enters its reconnect sequence.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Step 9:&lt;/span&gt;&lt;/p&gt;&lt;/td&gt; &lt;td class="docTableCell" align="left" valign="top"&gt; &lt;p class="docText"&gt;After LDTSTOP# is deasserted, the ICH must send the STPCLK  deassertion system management message to the host to notify the host that it can  resume normal operation (i.e. to exit the STOP_GRANT state). 
The Host Bridge in  turn reflects the STPCLK deassertion message downstream to all chains.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/p&gt;&lt;p class="docText"&gt;&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-5431433657157106664?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/5431433657157106664/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=5431433657157106664' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5431433657157106664'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5431433657157106664'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/example-sm-sequence-link-initialization.html' title='Example SM Sequence: Link Initialization Disconnect'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-8717933172834115404</id><published>2007-06-26T21:47:00.001-07:00</published><updated>2007-06-26T21:47:48.378-07:00</updated><title type='text'>HT Link Disconnect/Reconnect Sequence</title><content type='html'>&lt;p class="docText"&gt;The specification defines the ability of the HT bus to  disconnect all of its 
links simultaneously. This mechanism uses the &lt;a name="idd1e22363"&gt;&lt;/a&gt;LDTSTOP# signal and NOP &lt;a name="idd1e22367"&gt;&lt;/a&gt;packet (with  disconnect bit set) to gracefully disconnect and subsequently reconnect all  links within the HT fabric. This feature has five specified uses:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Disconnecting HT links to conserve power&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;An alternate and faster method of changing link frequency and  width during initialization, when compared to Soft &lt;a name="idd1e22387"&gt;&lt;/a&gt;Reset&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Support for Host Voltage ID and Frequency ID (VID/FID)  change&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Processor is entering certain ACPI-specified states&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;System is entering certain ACPI-specified states&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;Regardless of the system motivation for asserting LDTSTOP#, the  sequence of events associated with one of the five previous features is  initiated in one of the following ways:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Host software accesses a register within the I/O Controller Hub  (or South Bridge).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;I/O Controller Hub (or South Bridge) logic initiates the  sequence.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The host sends a VID/FID SM request.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch09lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Reference Information: LDTSTOP# Procedures&lt;/h4&gt; &lt;p class="docText"&gt;The specification defines the following procedures associated  with the disconnect and reconnect sequence.&lt;/p&gt;&lt;a name="ch09pr01"&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList"&gt;&lt;li&gt;&lt;span style="font-weight: normal;" 
value="1"&gt; &lt;p class="docText"&gt;Once LDTSTOP# is asserted, it must remain asserted for at least  1 us. LDTSTOP# assertion must not occur while new link frequency and width  values are being assigned by link-sizing software, or undefined operation may  occur. (This is because both sides of a link must have link width and frequency  programmed, and if one side has been programmed with new values and the other  has not yet been programmed, the width and/or frequency of the two sides will  not match.)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="2"&gt; &lt;p class="docText"&gt;&lt;a name="idd1e22442"&gt;&lt;/a&gt;PWROK and &lt;a name="idd1e22446"&gt;&lt;/a&gt;RESET#  assertions have priority over LDTSTOP# assertion, and LDTSTOP# must be  deasserted before RESET# is deasserted.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="3"&gt; &lt;p class="docText"&gt;A transmitter that recognizes the assertion of LDTSTOP#  finishes sending any control packet that is in progress and sends a disconnect  NOP packet. After sending this packet, the transmitter continues to send  disconnect NOP packets through the end of the current &lt;a name="idd1e22456"&gt;&lt;/a&gt;CRC  window (if the window is incomplete) and continuing through the transmission of  the CRC bits for the current window. After sending the CRC bits for the current  window, the transmitter continues to drive disconnect NOP packets on the link  for no less than 64 bit-times, after which the transmitter waits for the  corresponding receiver on the same device to complete its disconnect sequence,  and disables its drivers (if enabled by the LDTSTOP# Tristate Enable bit). No  CRC bits are transmitted for the last (partial) CRC window, which only contains  disconnect NOP packets. 
Since the HyperTransport protocol allows control packets  to be inserted in the middle of data packets, and since transmitters react to  the assertion of LDTSTOP# on control packet boundaries, a given data packet  could be distributed amongst two or more devices after the disconnect sequence  is complete.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="4"&gt; &lt;p class="docText"&gt;A receiver that detects the disconnect NOP packet continues to  operate through the end of the current CRC window and into the next CRC window  until it receives the CRC bits for the current window. After sampling the CRC  bits for the current window, the receiver disables its input receivers to the  extent required by the LDTSTOP# Tristate Enable bit.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="5"&gt; &lt;p class="docText"&gt;&lt;a name="idd1e22497"&gt;&lt;/a&gt;Note that LDTSTOP# can deassert either  before or after the link disconnection sequence is complete. A link transmitter  is not sensitive to the deassertion of LDTSTOP# until both its disconnect  sequence as described in step 3 is complete, and the disconnect sequence for the  associated receiver on the same device is complete. A link receiver is not  sensitive to the deassertion of LDTSTOP# until both its disconnect sequence is  complete and the disconnect sequence for the associated transmitter on the same  device is complete.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="6"&gt; &lt;p class="docText"&gt;A transmitter that perceives and is sensitive to the  deassertion of LDTSTOP# enables its drivers as soon as the implementation  allows, begins toggling the CLK with a minimum frequency of 2MHz and places the  link in the state associated with the beginning of the initialization sequence  (CTL = 0, CAD = 1s, CLK toggling). 
The transmitter is required to have CLK  running within 1 us (to assure that the receive logic has a clock source). The  clock frequency does not have to match the currently programmed frequency before  CTL is asserted. A receiver that perceives and is sensitive to the deassertion  of LDTSTOP# waits at least 1 us before enabling its inputs. This 1-us delay is  required to prevent a device from enabling its input receivers while the signals  are invalid before the transmitter on the other side of the link has perceived  and reacted to the deassertion of LDTSTOP#. When a transmitter's corresponding  receiver on the same device has been enabled, it is free to begin the  initialization sequence.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="7"&gt; &lt;p class="docText"&gt;After reconnecting to the link, the first transmitted packet  after the initialization sequence must be a control packet, as implied by the  state transitions of the CTL signal during link initialization. This is true  even if the link was disconnected in the middle of a data packet  transmission.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="8"&gt; &lt;p class="docText"&gt;The CRC logic on either side of the link should be  re-initialized after a disconnect sequence in exactly the same way as for a  reset sequence.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="9"&gt; &lt;p class="docText"&gt;Link disconnect and reconnect sequences do not cause flow  control buffers to be flushed, nor do they cause flow control buffer counts to  be reset.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="10"&gt; &lt;p class="docText"&gt;LDTSTOP# should not be reasserted until all links have  reconnected to avoid invalid link states. 
The means to ensure this is beyond the  scope of this specification, although it is expected that this will be under  software control.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-8717933172834115404?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/8717933172834115404/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=8717933172834115404' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8717933172834115404'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8717933172834115404'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/ht-link-disconnectreconnect-sequence.html' title='HT Link Disconnect/Reconnect Sequence'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-6316231826920971887</id><published>2007-06-26T21:46:00.000-07:00</published><updated>2007-06-26T21:47:30.734-07:00</updated><title type='text'>System Management Transactions for HT Cpu</title><content type='html'>&lt;p class="docText"&gt;HT provides a message passing mechanism between the &lt;a name="idd1e21732"&gt;&lt;/a&gt;Host Bridge and the System Management Controller (SMC). 
One  of the primary purposes of HT messages is to eliminate dedicated pins and traces  that would otherwise be required to signal various events, reducing pin count  and cost. These System Management (SM) messages are delivered via packets that  support a wide variety of functions including:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;HT Power Management&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;x86 Power Management&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;x86 Legacy CPU Signalling (e.g. A20M#, FERR#, and  IGNNE#)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;HT System Management messages in conjunction with &lt;a name="idd1e21762"&gt;&lt;/a&gt;LDTSTOP# may be used to support operations such as changes  in operating frequency and link width, or to disable the links to save power. It  is also through System Management (SM) requests that many of the x86  compatibility mechanisms are accomplished, as indicated above. Further, x86  platforms are required to support SM and LDTSTOP# for power management. Power  Management support for HT devices is optional in non-x86 platforms; however,  many non-x86 systems do support power management. Note also that the  specification requires all HT devices to forward SM packets in both  directions.&lt;/p&gt;&lt;a name="ch09lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Sources of&lt;a name="idd1e21776"&gt;&lt;/a&gt; SM Request&lt;/h4&gt; &lt;p class="docText"&gt;System Management requests may be sent in either the upstream  or the downstream direction. All SM requests moving upstream  originate at the System Management Controller (SMC), and downstream requests  originate at the Host Bridge. 
Note that the SMC typically resides in the south  bridge (or I/O Controller Hub) where the legacy signals typically originate and  where power management registers reside.&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e21798"&gt;&lt;/a&gt;System Management Address  Range&lt;/h4&gt; &lt;p class="docText"&gt;System Management transactions are recognized by their assigned  address range. The HT specification reserves a 1MB address range for system  management transactions, from FD_F910_0000h to FD_F91F_FFFFh. In reality, only  the upper address bits are needed to identify that the transaction falls within  the assigned 1MB range. SM request packets include only the upper 20 bits  (A39:A20) of the HT address for identifying the SM range (FD_F91h). Note that  the lower 5 nibbles (or 20 bits) of the address are not defined and could  theoretically be any value between 0_0000h and F_FFFFh. The 1MB block of SM  address space serves only to identify SM transactions and does not actually  target any memory locations.&lt;/p&gt;&lt;a name="ch09lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The SMC &amp;amp; Upstream Request Packets&lt;/h4&gt; &lt;p class="docText"&gt;The System Management Controller generates SM requests in  response to both software-initiated events (i.e., writes to registers within the  south bridge) and hardware events (e.g. 
inactivity timeouts).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-6316231826920971887?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/6316231826920971887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=6316231826920971887' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6316231826920971887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/6316231826920971887'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/system-management-transactions-for-ht.html' title='System Management Transactions for HT Cpu'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-985327463628584337</id><published>2007-06-26T21:44:00.000-07:00</published><updated>2007-06-26T21:46:33.809-07:00</updated><title type='text'>Interrupt in Hypertransport Technology</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch08lev1sec1"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Introduction&lt;/h3&gt; &lt;p class="docText"&gt;HT, unlike most legacy I/O bus implementations, does not define  the use of interrupt pins, nor an interrupt controller. 
Instead, interrupt  delivery is distributed to the HT devices themselves. Each device delivers  interrupts by performing memory writes to memory address locations reserved for  that purpose. The data written to these locations provides information that  historically comes from or is handled by an interrupt controller (such as  interrupt priority and vector information that specifies the location of the  interrupt service routine). This method of interrupt delivery is commonly  referred to as Message Signaled Interrupts.&lt;/p&gt; &lt;p class="docText"&gt;HT supports message signaled interrupts via two message  types:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Interrupt Request message&lt;/span&gt;:  Interrupt requests are forwarded upstream as sized write transactions that  target a reserved interrupt request address range. The &lt;a name="idd1e20604"&gt;&lt;/a&gt;host bridge receives these packets and, based on the target  address, recognizes the transaction as an interrupt request. The specific actions  taken by the bridge to process the interrupt request are platform-specific and  are not defined by the specification.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e20612"&gt;&lt;/a&gt;&lt;span class="docEmphStrong"&gt;End of  Interrupt message&lt;/span&gt;: HT also supports an End Of Interrupt (EOI) message  that may be used by devices that require confirmation that their interrupt  service routine has completed. These messages originate at the host and are  forwarded downstream as a broadcast. Like the interrupt request message, the EOI  request packet address must also fall within the reserved address  range.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;The address associated with the interrupt packets must include  additional address bits to distinguish between the different address ranges. 
If  the Host Bridge is to resolve the address to within the specified interrupt  address range (FD_F800_0000h to FD_F8FF_FFFFh), then Addr[31:24] must be  included within the interrupt packets. Verification of the specification's  intent can be found in the x86 compatibility definitions, which specify  Addr[31:24] be delivered in the IntrInfo[31:24] field of the interrupt packet.  To maintain compatibility with earlier versions of HT implementations, the  specification sets a default value of F8h for IntrInfo[31:24].&lt;/p&gt; &lt;p class="docText"&gt;For system platform implementations other than x86, the  specification leaves open the possibility of the interrupt range being extended,  but does not explicitly state that the interrupt range can be extended in the  absence of the PIC IACK, System Management, and IO mappings. For example, some  platforms may only need support for the interrupt and configuration packets.  This would require the use of Addr[39:26], thereby permitting the Host Bridge to  distinguish between the interrupt and configuration requests.&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;&lt;a name="idd1e20744"&gt;&lt;/a&gt;Interrupt Requests&lt;/h3&gt; &lt;p class="docText"&gt;Interrupt request messages originate within HT I/O devices and  are sent upstream using the posted-write virtual channel. This assures that any  posted write transactions that preceded the interrupt request are pushed ahead  of it to memory before the host bridge receives the interrupt.&lt;/p&gt; &lt;p class="docText"&gt;HT uses message-signaled interrupts that behave much like HT  Sized Write (byte) transactions. 
The interrupt request packet format is defined,  but how bit fields are used is implementation-specific.&lt;/p&gt; &lt;p class="docText"&gt;While interrupt information contained in a request varies with  the implementation, basic content might include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Type of interrupt&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Target address or CPU ID of the recipient&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Interrupting device's vector&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Whether an End-Of-Interrupt (EOI) acknowledgement is  required.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The specification defines a generic interrupt request packet  format, thereby providing flexibility for supporting the interrupt protocols  used in different platforms. For example:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;An interrupt vector might be defined as 8 bits (e.g. x86  machines), while it may be 32 bits in other architectures.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Interrupt requests may come from devices attached directly to  an HT link, and from devices residing on a legacy bus (e.g., PCI), where  interrupt requests are gathered by an interrupt controller and delivered by an  HT-to-PCI bridge to the HT bus and transported to the host.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Interrupt requests may all be directed to a single processor  for handling, or particular interrupt requests may be directed to different  processors in a multi-processor system. Etc.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch08lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e20825"&gt;&lt;/a&gt;Interrupt Request Packet&lt;/h4&gt;&lt;br /&gt;&lt;p class="docText"&gt;The interrupt request packet specifies a posted byte  sized-write command with an address that falls within the reserved interrupt  address range. 
The request packet is immediately followed by a single 4-byte  data packet which carries interrupt information. &lt;a class="docLink" href="#ch08fig04"&gt;&lt;/a&gt;&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;The &lt;a name="idd1e21028"&gt;&lt;/a&gt;End of Interrupt (EOI)  Message&lt;/h4&gt; &lt;p class="docText"&gt;The HT specification defines the mechanism used to notify an  interested party that an interrupt service routine has completed execution. The  EOI is used to notify an Advanced Programmable Interrupt Controller (APIC) that  an interrupt request has been processed. This is needed when more than one  device is sharing the same interrupt line via level triggering. In such cases,  the APIC needs confirmation that an interrupt has been serviced prior to sending  another interrupt request. (See IA32 Processor Architecture book for more  details.)&lt;/p&gt; &lt;p class="docText"&gt;The EOI request is sent downstream through the HT fabric as a  &lt;a name="idd1e21053"&gt;&lt;/a&gt;broadcast message that travels in the posted channel.  Also, the EOI message targets the same reserved address ranges that interrupt  requests use. When the broadcast EOI message reaches the end of a chain, it is  simply dropped by the last device (no response is expected or sent because EOI  travels in the posted channel).&lt;/p&gt;&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch08lev1sec5"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;&lt;a name="idd1e21185"&gt;&lt;/a&gt;Interrupt Discovery and  Configuration Capability Block&lt;/h3&gt; &lt;p class="docText"&gt;Each function can have its own capability block, facilitating a  mapping of interrupts to functions. 
The capability block not only defines the  number of interrupts the function is designed to use, but also provides a way  for system software to define the contents of the Interrupt Information fields  that will be delivered to the host during each Interrupt Request.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-985327463628584337?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/985327463628584337/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=985327463628584337' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/985327463628584337'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/985327463628584337'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/interrupt-in-hypertransport-technology.html' title='Interrupt in Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-8867463429928971130</id><published>2007-06-26T21:43:00.000-07:00</published><updated>2007-06-26T21:44:13.453-07:00</updated><title type='text'>I/O Ordering in HT Technology</title><content type='html'>&lt;h5 class="docSection3Title"&gt;Downstream I/O 
Ordering&lt;/h5&gt; &lt;p class="docText"&gt;&lt;a name="idd1e16440"&gt;&lt;/a&gt;Downstream ordering rules in  HyperTransport are much the same as the upstream rules previously described,  with a few exceptions:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;While the same virtual channels are used (&lt;a name="idd1e16450"&gt;&lt;/a&gt;posted request, non-posted request, and response),  downstream &lt;a name="idd1e16454"&gt;&lt;/a&gt;&lt;a name="idd1e16457"&gt;&lt;/a&gt;I/O streams are  determined by the target of the transaction instead of the source.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Although &lt;span class="docEmphasis"&gt;UnitID&lt;/span&gt; uniquely  identifies upstream transaction stream requests, it can't be used for this  purpose in downstream requests because the UnitID field is always that of the  host bridge (UnitID 0). All downstream request traffic is assumed to be part of  the same transaction stream (the host bridge's).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The &lt;span class="docEmphasis"&gt;bridge&lt;/span&gt; bit is used to help  nodes distinguish downstream from upstream response traffic. It also helps  devices interpret the UnitID field in responses. Upstream responses carry the  UnitID of the sender (the original target), while downstream responses carry the  UnitID of the original requester. 
Interior nodes are only allowed to claim  response packets which carry their UnitID and are moving downstream (bridge bit  set = 1).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A host bridge (this includes the secondary interface of  HyperTransport-HyperTransport bridges) which performs a peer-to-peer reflection  must preserve strongly ordered sequences (non-zero Sequence ID) when it reissues  them downstream. It is allowed to change the Sequence ID tag, but the same tag  will be applied to all requests in the sequence.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch06lev3sec19"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e16485"&gt;&lt;/a&gt;Double-Hosted Chain  Ordering&lt;/h5&gt; &lt;p class="docText"&gt;Upstream traffic and&lt;a name="idd1e16492"&gt;&lt;/a&gt; downstream traffic  in HyperTransport have no ordering interaction because they are in different  transaction streams. A special case arises in sharing double-hosted chains when  one of the &lt;a name="idd1e16496"&gt;&lt;/a&gt;host bridges must send traffic to the other  host bridge. 
Refer to &lt;a class="docLink" href="#ch06fig14"&gt;Figure 6-14&lt;/a&gt; on page  138.&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Host bridge A sends a posted write targeting host bridge  B.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;At nearly the same time, host bridge B performs a read from  host bridge A.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The read response/data will be traveling in the same direction  as the posted write (towards host bridge B).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Although the posted write request is traveling downstream and  the read response is traveling upstream (from the perspective of Device B), the  producer-consumer ordering model requires that both must be treated as being in  the same transaction stream (response will push posted write request if PassPW  is clear).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Devices in the path can perform ordering tests on upstream  responses based only on UnitID (both 0 in the case of two bridges communicating  with each other), and by disregarding the direction of the requests.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;In the event a host has its &lt;span class="docEmphasis"&gt;Act as  Slave&lt;/span&gt; bit set = 1, then it won't use UnitID 0 for its requests and  responses; in this case, conventional ordering based on UnitID will  work.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5449984521470454692-8867463429928971130?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/8867463429928971130/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=8867463429928971130' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8867463429928971130'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8867463429928971130'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/io-ordering-in-ht-technology.html' title='I/O Ordering in HT Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-1374987346156499970</id><published>2007-06-26T21:42:00.000-07:00</published><updated>2007-06-26T21:43:39.424-07:00</updated><title type='text'>Host Ordering Requirements: General Features</title><content type='html'>The HyperTransport specification breaks down the ordering rules governing transaction completion in the host system into a set of rules for ordered pairs of transactions. Depending on the request types and where the target locations are, the second request may be received but might have to wait to take effect in the host fabric until the first request reaches a specific point in completion called its ordering point. 
"Taking effect", in this case, means that a read request actually fetches data, a write request actually exposes new data, peer-to-peer requests are actually queued for reissue downstream, etc.&lt;br /&gt;&lt;br /&gt;How read and write accesses originating in HyperTransport are handled depends on the type of space they target in the host system.&lt;br /&gt;&lt;br /&gt;Cacheable address ranges have strongest ordering&lt;br /&gt;&lt;br /&gt;Non-cacheable memory, I/O, and MMIO have weaker ordering&lt;br /&gt;&lt;br /&gt;Interrupt and System Management Address ranges have special ordering&lt;br /&gt;&lt;br /&gt;Two Ordering Points Are Defined&lt;br /&gt;There are two ordering points (degrees of transaction completion) defined for the first transaction in an ordered pair; this information is used in determining whether the second request of the ordered pair may take effect or must wait. The ordering points are called Globally Ordered (GO) and Globally Visible (GV).&lt;br /&gt;&lt;br /&gt;Globally Ordered (GO)&lt;br /&gt;HyperTransport defines the globally ordered point for the first request as the point where it is guaranteed to be observed in the correct order (with respect to the second transaction) from any "observer". While the two transactions are guaranteed to complete in the proper order, they may not have actually done so yet. This means agents such as caches may not have been updated at this ordering point.&lt;br /&gt;&lt;br /&gt;Globally Visible (GV)&lt;br /&gt;HyperTransport defines the globally visible ordering point for the first request as the point where it is assured to be "visible" to all observers (CPUs, I/O devices, etc.). It also means that all side effects of the first request (cache transitions, etc.) 
have completed.&lt;br /&gt;&lt;br /&gt;Note: If there are no "sideband" agents (caches, etc.), GO and GV are equivalent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-1374987346156499970?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/1374987346156499970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=1374987346156499970' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1374987346156499970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1374987346156499970'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/host-ordering-requirements-general.html' title='Host Ordering Requirements: General Features'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-327758265266214420</id><published>2007-06-26T21:41:00.002-07:00</published><updated>2007-06-26T21:42:45.869-07:00</updated><title type='text'>The Purpose Of Ordering Rules in a CPU</title><content type='html'>&lt;p class="docText"&gt;Some of the important reasons for enforcing ordering rules on  packets moving through HyperTransport include the following:&lt;/p&gt;&lt;a name="ch06lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Maintain&lt;a name="idd1e15234"&gt;&lt;/a&gt; Data 
Coherency&lt;/h4&gt; &lt;p class="docText"&gt;If transactions are in some way dependent on each other, a  method is required to assure that they complete in a deterministic way. For  example, if Device A performs a write transaction targeting main memory and then  follows it with a read request targeting the same location, what data will the  read transaction return? HyperTransport ordering seeks to make such events  predictable (deterministic) and to match the intent of the programmer. Note  that, compared to a shared bus such as PCI, HyperTransport transaction ordering  is complicated somewhat by point-to-point connections which result in target  devices on the same chain (logical bus) being at different levels of fabric  hierarchy.&lt;/p&gt;&lt;a name="ch06lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Avoid Deadlocks&lt;/h4&gt; &lt;p class="docText"&gt;Another reason for ordering rules is to handle cases where  two separate transactions each depend on the other  completing first. HyperTransport ordering includes a number of rules for  deadlock avoidance. Some of the rules are in the specification because of known  deadlock hazards associated with other buses to which HyperTransport may  interface (e.g. PCI).&lt;/p&gt;&lt;a name="ch06lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Support Legacy Buses&lt;/h4&gt; &lt;p class="docText"&gt;One of the principal roles of HyperTransport is to serve as a  backbone bus which is bridged to other peripheral buses. HyperTransport  explicitly supports PCI, PCI-X, and AGP and the ordering requirements of those  buses.&lt;/p&gt;&lt;a name="ch06lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Maximize Performance&lt;/h4&gt; &lt;p class="docText"&gt;Finally, HyperTransport permits devices in the path to the  target, and the target itself, some flexibility in reordering packets around  each other to enhance performance. 
When acceptable, relaxed ordering may be  enabled by the requester on a per-transaction basis using attribute bits in  request and response packets.&lt;/p&gt;&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch06lev1sec2"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Introduction: Three Types Of Traffic Flow&lt;/h3&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;HyperTransport defines three types of  traffic:&lt;/span&gt; Programmed I/O (&lt;a name="idd1e15291"&gt;&lt;/a&gt;PIO), &lt;a name="idd1e15295"&gt;&lt;/a&gt;Direct Memory Access (DMA), and Peer-to-Peer. &lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Programmed I/O traffic originates at the host bridge on behalf  of the CPU and targets I/O or Memory Mapped I/O in one of the peripherals. These  types of transactions are often generated by the CPU to set up peripherals for bus  master activity, check status, program configuration space, etc.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e15315"&gt;&lt;/a&gt;DMA traffic originates at a bus master  peripheral and typically targets main memory. This traffic relieves the  CPU of the burden of moving large amounts of data to and  from the I/O subsystem. Generally, the CPU uses a few PIO instructions to  program the peripheral device with information about a required DMA transfer  (transfer size, target address in memory, read or write, etc.), then performs  some other task while the DMA transfer is carried out. 
When the transfer is  complete, the DMA device may generate an interrupt message to inform the  CPU.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e15333"&gt;&lt;/a&gt;Peer-to-Peer traffic is generated by an  interior node and targets another interior node. In HyperTransport, direct  peer-to-peer traffic is not allowed. &lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h5 class="docSection4Title"&gt;What If A Device Requires Response Ordering?&lt;/h5&gt; &lt;p class="docText"&gt;All HyperTransport devices must be able to tolerate  out-of-order response delivery or else restrict outstanding non-posted requests  to one at a time. This also applies to bridges which sit between HyperTransport  and a protocol that requires responses be returned in order. The bridge must not  issue more outstanding requests than it has internal buffer space to hold  responses it may be required to reorder.&lt;/p&gt;&lt;a name="ch06lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Support For The Producer-Consumer Ordering Model&lt;/h5&gt; &lt;p class="docText"&gt;When the &lt;span class="docEmphasis"&gt;PassPW&lt;/span&gt; and &lt;a name="idd1e15512"&gt;&lt;/a&gt;&lt;a name="idd1e15515"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Sequence  ID&lt;/span&gt; bits are cleared in a request packet, HyperTransport transactions are  compatible with the same producer-consumer model PCI employs. 
Basic features of  the model include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A producer device anywhere in the system may send data and  modify a flag indicating data availability to a consumer anywhere in the  system.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The data and flag need not be located in the same device as  long as the consumer of the data waits for the response of a flag read before  attempting to access the data.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;In cases where the consumer is allowed to issue two ordered  reads &lt;span class="docEmphUl"&gt;without making them part of an ordered  sequence&lt;/span&gt; (setting SequenceID tag to a non-zero value), the  producer-consumer model is only supported if the flag and data are within the  same device.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Ordering rules guarantee that if the flag is modified after the  data becomes available, the flag read will return valid  status.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch06lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Producer-Consumer Model Simpler If Flag/Data In Same  Place&lt;/h5&gt; &lt;p class="docText"&gt;If the flag and data are restricted to being in the same  device, the &lt;a name="idd1e15555"&gt;&lt;/a&gt;PassPW bit may be set in requests which  relaxes the ordering of responses and improves performance. 
At the same time,  the producer-consumer model is maintained.&lt;/p&gt;&lt;a name="ch06lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e15564"&gt;&lt;/a&gt;Upstream Ordering Rules&lt;/h4&gt; &lt;p class="docText"&gt;&lt;a name="idd1e15570"&gt;&lt;/a&gt;Posted requests, &lt;a name="idd1e15574"&gt;&lt;/a&gt;non-posted requests, and responses travel in independent &lt;a name="idd1e15578"&gt;&lt;/a&gt;virtual channels. Each uses a different command, which  permits devices to distinguish them from one another. Requests have a &lt;a name="idd1e15582"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Sequence ID&lt;/span&gt; field. Assigning  non-zero sequence ID fields to non-posted requests forces all &lt;a name="idd1e15588"&gt;&lt;/a&gt;tunnel and bridge devices in the path to the target to  forward these requests in the same order they were received. The target is also  required to maintain this order when processing these requests internally.  Requests with a Sequence ID of zero are not considered to be part of an ordered  sequence. Requests and response packets also carry a &lt;span class="docEmphasis"&gt;May  Pass &lt;/span&gt;&lt;a name="idd1e15594"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Posted Writes&lt;/span&gt;  (PassPW) bit.&lt;/p&gt;&lt;a name="ch06lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Reordering Packets In Different&lt;a name="idd1e15604"&gt;&lt;/a&gt;Transaction Streams&lt;/h5&gt; &lt;p class="docText"&gt;Other than when a Fence command is issued, there is no ordering  guarantee for packets originating from different sources. Traffic from each &lt;a name="idd1e15611"&gt;&lt;/a&gt;UnitID is considered a separate transaction stream; devices  may reorder upstream packets from different streams as necessary. 
&lt;/p&gt;&lt;span style="font-weight: bold;"&gt;&lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Next, UnitID1 receives a packet (2) from UnitID2.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;When UnitID1 forwards the two packets onto its upstream link,  it may send packet (2) first. Packet (2) has then been reordered around packet  (1).&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;No Reordering Packets In A &lt;a name="idd1e15647"&gt;&lt;/a&gt;Strongly Ordered Sequence&lt;/h5&gt; &lt;p class="docText"&gt;If one requester has issued a series of request packets  carrying the same non-zero SequenceID, the packets may not be reordered  (regardless of the state of the &lt;a name="idd1e15654"&gt;&lt;/a&gt;PassPW bit). The sequence  only applies to packets within a single transaction stream (UnitID) and VC.  Upstream devices still may reorder these packets with respect to those from  other streams. &lt;a class="docLink" href="#ch06fig05"&gt;&lt;/a&gt;&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The I/O Hub issues a series of requests (1), (2), (3). All  carry the same, non-zero SequenceID in the request.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;When they are received by the first tunnel device, it checks  the sequence ID field and the UnitID (all are identical). 
When it forwards the  three packets to the PCI-X tunnel, it sends them in the same strongly ordered  sequence.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The HyperTransport-to-PCI-X bridge makes the same determination  and forwards packets (1), (2), and (3) through its tunnel interface to the host  bridge in the same order.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The host bridge is also required to treat the three packets as  a strongly ordered sequence internally.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If these were non-posted requests, there would be no guarantee  of ordering in the responses returned to the I/O hub.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-327758265266214420?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/327758265266214420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=327758265266214420' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/327758265266214420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/327758265266214420'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/purpose-of-ordering-rules-in-cpu.html' title='The Purpose Of Ordering Rules in a CPU'/><author><name>Info 
Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-7663005722164245052</id><published>2007-06-26T21:41:00.001-07:00</published><updated>2007-06-26T21:41:24.380-07:00</updated><title type='text'>Implementation Notes of HT Technology</title><content type='html'>&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch05lev1sec6"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;A Few Implementation Notes&lt;/h3&gt;&lt;a name="ch05lev2sec11"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e15093"&gt;&lt;/a&gt;Information Packets Not  Flow-Controlled&lt;/h4&gt; &lt;p class="docText"&gt;Information packets, including NOP and Sync, are not subject to  flow control. When sent, they must be accepted, and are used for point-to-point  communication between a transmitter and its corresponding receiver on a given  link.&lt;/p&gt;&lt;a name="ch05lev2sec12"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Transmitter Must Be Able To Track 15 Buffer  Entries&lt;/h4&gt; &lt;p class="docText"&gt;While a receiver has the option of implementing buffers of any  desired depth, including single entry buffers, each transmitter interface must  implement its flow control counters such that they can track up to 15 entries in  each of the six corresponding receiver flow control buffers (a four-bit transmit  counter will do this). If the transmitter counter is larger than that of the  receiver, only a portion of it will ever be used (because NOP updates are always  based on available receiver flow control buffer entries). 
In the event that  a transmitter implements a counter smaller than that of the receiver, the  counter must "saturate" at the maximum value it can handle, and not roll over.  The idea is that once the counters are initialized, they will use the maximum  count that both devices can accommodate.&lt;/p&gt;&lt;a name="ch05lev2sec13"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Sometimes Two Counters Must Be Checked&lt;/h4&gt; &lt;p class="docText"&gt;A transmitter can't issue a request packet that has data  associated with it (e.g. a posted or non-posted write) without assuring the  receiver has buffer entries available for both the request and the data. This is  necessary because there is no receiver disconnect or retry mechanism once such a  transaction starts.&lt;/p&gt;&lt;a name="ch05lev2sec14"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e15120"&gt;&lt;/a&gt;NOP Packets Cannot Be  Completely Blocked&lt;/h4&gt; &lt;p class="docText"&gt;The HyperTransport Specification indicates that it is the  responsibility of each device to make certain that NOP update packets it sends  are not starved (prevented from being sent) because of the sending of other  types of traffic. If they are blocked, eventually one or more of the virtual  channels may completely stall.&lt;/p&gt;&lt;a name="ch05lev2sec15"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The&lt;a name="idd1e15132"&gt;&lt;/a&gt;Isochronous Flow Control  Option&lt;/h4&gt; &lt;p class="docText"&gt;In the event that a designer decides to provide Isochronous  flow control in addition to the standard three virtual channels, each receiver  interface which supports Isochronous will implement six more receiver flow  control buffers and counters, and an additional set of transmitter flow control  counters as well. 
The way the receiver determines which flow control buffer  (isochronous or standard) a packet should use is determined by a bit in the  request packet (&lt;a name="idd1e15139"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Isoc&lt;/span&gt; bit).  If the Isoc bit is asserted in a request, it will also be asserted in the  response when it comes back — again identifying the buffer set to use.&lt;/p&gt;&lt;a name="ch05lev3sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;How About NOP Updates For Isochronous Buffers?&lt;/h5&gt; &lt;p class="docText"&gt;If a device supports the Isochronous flow control buffers, it  will track packet progress through these buffers in the same way as it does for  non-isochronous packets it receives. As Isochronous buffer entries become  available, the receiver will return NOP update packets to the other device and  will set the &lt;span class="docEmphasis"&gt;Isoc&lt;/span&gt; bit in the NOP packet (byte 2,  bit 5) indicating the NOP packet updates should be applied to the isochronous  transmit counters.&lt;/p&gt;&lt;a name="ch05lev3sec13"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Isochronous Traffic/Non-Isochronous Flow Control&lt;/h5&gt; &lt;p class="docText"&gt;Receivers which see request packets with the Isoc bit set, but  which are not in isochronous flow control mode, do not use the dedicated  isochronous flow control buffers to handle them. In this case, the standard six  flow control buffers are used and NOP buffer update packets returned to the  transmitter all apply to the standard transmitter flow counters. 
Such devices  preserve the Isoc bit in both the request packet and its response as they  forward it to the next device; in this way, if there is a device in the path  that does support isochronous traffic, it can still be used in that portion of  the topology.&lt;/p&gt;&lt;a name="ch05lev3sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Isochronous Traffic Disabled At Initialization&lt;/h5&gt; &lt;p class="docText"&gt;At initialization, all devices are disabled with respect to  isochronous traffic. Software can later enable ISOC traffic on a link-by-link  basis after support on both sides of the link is determined through &lt;a name="idd1e15171"&gt;&lt;/a&gt;configuration space accesses. Once enabled for ISOC traffic,  each device which sees isochronous packets and supports them is expected to  apply a higher priority to them than for standard virtual channel packets.&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-7663005722164245052?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/7663005722164245052/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=7663005722164245052' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7663005722164245052'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7663005722164245052'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/implementation-notes-of-ht-technology.html' title='Implementation Notes of HT Technology'/><author><name>Info 
Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-7713312427769717758</id><published>2007-06-26T21:40:00.001-07:00</published><updated>2007-06-26T21:40:52.659-07:00</updated><title type='text'>The basic steps in flow control counter initialization</title><content type='html'>&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList"&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="1"&gt; &lt;p class="docText"&gt;The transmitter in Device 1 initializes its Posted Request  (CMD) counter to 0 at reset (all transmit counters reset = 0). It then waits for  the receiver on the other side to update this counter with the starting buffer  depth available (this will be the maximum depth the receiver  supports).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="2"&gt; &lt;p class="docText"&gt;Device 2 loads its receiver Posted Request counter = 5 (its  maximum).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="3"&gt; &lt;p class="docText"&gt;Device 2 then sends two NOP packets which carry this buffer  availability information: the first NOP has 11b (3) in the Post CMD field  (Byte 1, bits 0,1 above), and the second NOP has 10b (2) in this field. Total  = 5.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="4"&gt; &lt;p class="docText"&gt;Upon receipt of these two NOPs, Device 1 has updated its  transmit counter, first by three then again by two. 
It now has 5 "credits"  available for sending Posted Request packets — representing five separate Posted  Requests which may be initiated.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="5"&gt; &lt;p class="docText"&gt;Having sent the NOPs, the Device 2 RCV counter is now at 0, and  will remain that way until additional packets are received, processed, and move  out of the buffer, thereby creating new entries.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;Note that this process will be repeated for each of the six  required flow control buffers; it will also be done for the six isochronous flow  control buffers if they are supported. In the NOP packet format (see above), six  transmit registers can be updated at once using the six fields provided. The  &lt;span class="docEmphasis"&gt;Isoc&lt;/span&gt; bit (Byte 2, bit 5) would be set if the NOP  update was to be applied to the isochronous flow control buffer set.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-7713312427769717758?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/7713312427769717758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=7713312427769717758' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7713312427769717758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/7713312427769717758'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/basic-steps-in-flow-control-counter.html' title='The basic steps in flow 
control counter initialization'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-583315666004499678</id><published>2007-06-26T21:39:00.000-07:00</published><updated>2007-06-26T21:40:38.888-07:00</updated><title type='text'>PCI and Hypertransport Handles Flow Control</title><content type='html'>&lt;h4 class="docSection2Title"&gt;How PCI Handles Flow Control&lt;/h4&gt; &lt;p class="docText"&gt;While the PCI specification permits 64-bit data bus and 66MHz  clock options, a generic PCI bus carries only 32 bits (4 bytes) of data and runs  at a 33MHz clock speed. This means that the burst bandwidth for this bus is  132MB/s (4 bytes x 33MHz = 132MB/s). In many systems the PCI bus is populated by  all sorts of high- and low-performance peripherals such as hard drives, graphics  adapters, and serial port adapters. All PCI bus master devices must take turns  accessing the shared bus and performing their transfers. The priority of a bus  master in accessing the bus and the amount of time it is allowed to retain  control of the bus is a function of PCI &lt;span class="docEmphasis"&gt;arbitration.&lt;/span&gt; In a typical computer system, the PCI  arbiter logic resides in the system chipset.&lt;/p&gt; &lt;p class="docText"&gt;Once a PCI bus master has won arbitration and verifies the bus  is idle, it commences its transaction. After decoding the address and command  sent by the master, one target claims the cycle by asserting a signal called  DEVSEL#. 
At this point, if both devices are prepared, either &lt;span class="docEmphasis"&gt;write data&lt;/span&gt; will be sent by the initiator or &lt;span class="docEmphasis"&gt;read data&lt;/span&gt; will be returned by the target. For cases  where either the master or target are not prepared for full-speed transfer of  some or all of the data, &lt;a name="idd1e14412"&gt;&lt;/a&gt;flow control comes into play. In  PCI there are a number of cases that must be dealt with.&lt;/p&gt;&lt;a name="ch05lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;PCI Target Flow Control Problems&lt;/h5&gt;&lt;a name="ch05lev4sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;PCI Target Not Ready To Start&lt;/h5&gt; &lt;p class="docText"&gt;In some cases, a PCI device being targeted for transmission is  not prepared to transfer any data at all. This could happen if the target is  off-line, does not have buffer space for write data being sent to it, or does  not have requested read data available. It may also occur if the transaction  must cross a bridge device to a different bus. Many bus protocols, including  PCI, place a limit on how long the bus may be stalled before completing a  transaction; in cases where a target can't meet the requirement for even the  first data, a mechanism is required to indicate the transaction should be  abandoned and re-attempted later. 
PCI calls the target cancellation of a  transaction (without transferring any data) a &lt;span class="docEmphasis"&gt;Retry&lt;/span&gt;; a Retry is indicated when a target asserts the  STOP# signal (instead of TRDY#) in the first data phase.&lt;/p&gt;&lt;a name="ch05lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;PCI Target Starts Data Transfer, But Can't  Continue&lt;/h5&gt; &lt;p class="docText"&gt;Another possibility is that a transaction started properly,  some data has transferred, but at some point before completion the target  "realizes" it can't continue the transfer within the time allowed by the  protocol. The target must indicate to the master that the transaction must be  suspended (and resumed later at the point where it left off). PCI calls this  target suspension of a transaction (with a partial transfer of data) a &lt;span class="docEmphasis"&gt;Disconnect&lt;/span&gt;. A Disconnect is signalled when the target  asserts the STOP# signal in a data phase after the first one.&lt;/p&gt;&lt;a name="ch05lev4sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;PCI Target Starts, Can Continue, But Needs More  Time&lt;/h5&gt; &lt;p class="docText"&gt;Sometimes a transaction is underway and the target requires  additional time to complete transmission of a particular data item; in this  case, it does not need to suspend the transaction altogether, but simply stretch  one or more data phases. The generic name for this is &lt;span class="docEmphasis"&gt;wait-state insertion.&lt;/span&gt; Wait states are a reasonable  alternative to Retry and Disconnect if there are not too many of them; when  there are excessive wait states, bus performance would be better served by the  devices giving up the bus and allowing it to be used by other devices while they  prepare for the resumption of the suspended transaction. PCI targets de-assert  the TRDY# signal during any data phase to indicate wait states. 
A target must be  prepared to complete each data phase within 8 PCI clocks (maximum of seven wait  states), except for the first data phase which it must complete within 16  clocks. If a target cannot meet the "16 and 8 tick" rules for completing a data  phase, it must signal Retry or Disconnect instead.&lt;/p&gt;&lt;a name="ch05lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;PCI Initiator Flow Control Problems&lt;/h5&gt; &lt;p class="docText"&gt;While many flow control problems are associated with the target  of a transaction, there are a couple which may occur on the initiator side.  Again, the cases are described in terms of PCI protocol.&lt;/p&gt;&lt;a name="ch05lev4sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;PCI Initiator Starts, But Can't Continue&lt;/h5&gt; &lt;p class="docText"&gt;Some bus protocols also allow an initiator to break off a  transaction early in the event it can't accept the next read data or source the  next write data within the time allowed by the protocol — even with wait states.  PCI initiators suspend transactions simply by de-asserting the FRAME# signal  early. As a rule, the master will re-arbitrate later for the PCI bus and perform  a new transaction which picks up from where it left off previously.&lt;/p&gt;&lt;a name="ch05lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;PCI Initiator Starts, Can Continue, But Needs  Wait-States&lt;/h5&gt; &lt;p class="docText"&gt;Some bus protocols allow an initiator to insert wait states in  a transfer, just as the target may. Other bus protocols (e.g. PCI-X) only allow  targets to insert wait states — based on the assumption that a device which  starts a transaction should be ready to complete it before requesting the bus.  In any case, PCI initiators de-assert the IRDY# signal to indicate wait states.  
An initiator must be prepared to complete each data phase within 8 clocks  (maximum of seven wait states); if it can't meet this rule for any data phase,  it must instead suspend the transaction by de-asserting FRAME#.&lt;/p&gt;&lt;a name="ch05lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;All PCI Flow Control Problems Hurt Performance&lt;/h4&gt; &lt;p class="docText"&gt;Each of the initiator and target flow control problems just  described impacts PCI bus performance for both the devices involved in the  transfer, and for devices waiting to access the bus. While not every transaction  is afflicted with target retries and disconnects, or early de-assertion of  FRAME# by initiators, they happen often enough to make effective bandwidth  considerably less than 132MB/s on the PCI bus. In addition, arbitration and flow  control uncertainties make system performance difficult to estimate.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;HyperTransport&lt;a name="idd1e14550"&gt;&lt;/a&gt; Flow Control:  Overview&lt;/h3&gt; &lt;p class="docText"&gt;All of the flow control problems described previously for PCI  severely hurt bus performance and would be even less acceptable on a very  high-performance connection. The flow control scheme used in HyperTransport  applies independently to each transmitter-receiver pair on each link. The basic  features include the following.&lt;/p&gt;&lt;a name="ch05lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Packets Never Start Unless Completion Assured&lt;/h4&gt; &lt;p class="docText"&gt;All transfers across HyperTransport links are packet based. No  link transmitter ever starts a packet transfer unless it is known the packet can  be accepted by the receiver. 
This is accomplished with the "coupon-based" flow  control scheme described in this section, and eliminates the need for the Retry  and Disconnect mechanisms used in PCI.&lt;/p&gt;&lt;a name="ch05lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Transfer Length Is Always Known&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport control packets have a fixed size (four or eight  bytes) and data packets have a &lt;span class="docEmphasis"&gt;known&lt;/span&gt; and &lt;span class="docEmphasis"&gt;maximum&lt;/span&gt; transfer length, unlike PCI data transfers.  This makes buffer sizing and flow control much more straightforward as both  transmitter and receiver are aware of their actual transfer commitments. It also  makes the interleaving of control packets with data packets much simpler.&lt;/p&gt;&lt;a name="ch05lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Split Transactions Used When Response Is  Required&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport performs all read and non-posted write  operations as split transactions, eliminating the need for the inefficient Retry  mechanism used in PCI. A split transaction breaks a transfer which requires a  response (and maybe data) into two parts: the sending of the request packet,  followed later by response/data packets returned by the original target. This  keeps the link free during the period between request and response, and means  that the burden for completing the transaction is on the device best equipped to  know when it is possible to do so: the target.&lt;/p&gt;&lt;a name="ch05lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Flow Control Pins Are Eliminated&lt;/h4&gt; &lt;p class="docText"&gt;Because HyperTransport uses a message-based flow control  scheme, it eliminates the flow control handshaking pins and signal traces found  on other buses. 
Instead, each pair of devices on a link conveys flow control  information related to their receivers by sending update &lt;a name="idd1e14606"&gt;&lt;/a&gt;NOP packets over their transmitter connections.&lt;/p&gt;&lt;a name="ch05lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e14614"&gt;&lt;/a&gt;Flow Control Buffers Mean No  Bus Wait States&lt;/h4&gt; &lt;p class="docText"&gt;All link receiver interfaces are required to implement a set of  buffers which are capable of receiving packets at full speed. Once a transmitter  has determined that buffer space is available at the receiver, the transfer of  the bytes within the packet always proceeds at full bus speed into the receiver  buffer. The buffers are sized such that the full packet can always be accepted.  Data packets can be as large as 64 bytes (16 dwords) and control packets can be  as large as 8 bytes. The one twist is that the transmitter has  the option of interleaving new control packets into a large data packet on four-byte boundaries. Still, this is done at full speed, without any wait states. 
The  transmitter simply asserts the &lt;a name="idd1e14621"&gt;&lt;/a&gt;CTL signal to indicate  control packets are moving across the CAD bus, and deasserts it to indicate data  packets are moving across; the target uses the CTL signal input to determine  which buffer the packet should enter.&lt;/p&gt;&lt;a name="ch05lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e14632"&gt;&lt;/a&gt;Flow Control Buffers For  Each&lt;a name="idd1e14636"&gt;&lt;/a&gt; Virtual Channel&lt;/h4&gt; &lt;p class="docText"&gt;Finally, because there is a minimum of three virtual channels  as packets move through HyperTransport, the flow control mechanism maintains  separate flow control buffer pairs for the &lt;a name="idd1e14643"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;posted request, non-posted request,&lt;/span&gt; and &lt;span class="docEmphasis"&gt;response&lt;/span&gt; virtual channels. Each &lt;a name="idd1e14652"&gt;&lt;/a&gt;non-posted request has an associated response (and possibly  data) that must be tracked internally by the device until the response comes  back. Posted requests do not have a response, and may be flushed internally as  soon as they are processed. In addition, the separate flow control buffers are  important in enforcing the ordering rules that apply to the three virtual  channels.&lt;/p&gt; &lt;p class="docText"&gt;Optionally, devices may also support isochronous transfers; in  this case, three additional receiver flow control buffer sets (CMD/Data) would  be required to track this traffic.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Flow Control Buffer Pairs (Item 1)&lt;/h5&gt; &lt;p class="docText"&gt;Each receiver interface is required to implement six buffers to  accept the following packet types being sent by the corresponding transmitter.  
&lt;span class="docEmphasis"&gt;The specification requires a &lt;span class="docEmphUl"&gt;minimum depth of one&lt;/span&gt;&lt;/span&gt; &lt;span class="docEmphasis"&gt;for  each buffer, meaning that a receiver is permitted to deal with as few as one  packet of each type at a time. It may optionally increase the depth of one or  more of the buffers to track multiple packets at a time.&lt;/span&gt;&lt;/p&gt;&lt;a name="ch05lev4sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;&lt;a name="idd1e14740"&gt;&lt;/a&gt;Posted Request Buffer  (Command)&lt;/h5&gt; &lt;p class="docText"&gt;This buffer stores incoming posted request packets. Because  every request packet is either four or eight bytes in length, each entry in this  buffer should be eight bytes deep.&lt;/p&gt;&lt;a name="ch05lev4sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Posted Request Buffer (Data)&lt;/h5&gt; &lt;p class="docText"&gt;This buffer is used in conjunction with the previous one and  stores data associated with a Posted Request. Because posted request data  packets may range in size from 1 dword to 16 dwords (64 bytes), each entry in  this buffer should be 64 bytes deep.&lt;/p&gt;&lt;a name="ch05lev4sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Non-Posted Request Buffer (Command)&lt;/h5&gt; &lt;p class="docText"&gt;This buffer stores incoming non-posted request packets. Because  every request packet is either four or eight bytes in length, each entry in this  buffer should be eight bytes deep.&lt;/p&gt;&lt;a name="ch05lev4sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Non-Posted Request Buffer (Data)&lt;/h5&gt; &lt;p class="docText"&gt;This buffer is used in conjunction with the previous one and  stores data associated with a &lt;a name="idd1e14771"&gt;&lt;/a&gt;Non-Posted Request. 
Because  non-posted request data packets may range in size from 1 dword to 16 dwords (64  bytes), each entry in this buffer should be 64 bytes deep.&lt;/p&gt;&lt;a name="ch05lev4sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Response Buffer (Command)&lt;/h5&gt; &lt;p class="docText"&gt;This buffer stores returning response packets. Because every  response packet is four bytes in length, each entry in this buffer should be  four bytes deep.&lt;/p&gt;&lt;a name="ch05lev4sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Response Buffer (Data)&lt;/h5&gt; &lt;p class="docText"&gt;This buffer is used in conjunction with the previous one and  stores data associated with a returning response. Because responses may precede  data packets ranging in size from 1 dword to 16 dwords (64 bytes), each entry in  this buffer should be 64 bytes deep.&lt;/p&gt;&lt;a name="ch05lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e14796"&gt;&lt;/a&gt;Receiver Flow Control Counters  (Item 2)&lt;/h5&gt; &lt;p class="docText"&gt;The receiver interface uses one counter for each of the flow  control buffers to track the availability of new buffer entries. The size of the  counter is a function of how many entries were designed into the corresponding  flow control buffer. After initialization reports the starting buffer size to  the transmitter, the value in each counter increments only when a new entry  becomes available due to a packet being consumed or forwarded; it decrements  when NOP packets carrying buffer update information are sent to the transmitter  on the other side of the link.&lt;/p&gt;&lt;a name="ch05lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e14807"&gt;&lt;/a&gt;Transmitter Flow Control  Counters (Item 3)&lt;/h5&gt; &lt;p class="docText"&gt;It is the transmitter's responsibility on each link to check the  current state of receiver readiness before sending a packet in any of the three  required virtual channels. 
It does this by maintaining its own set of flow  control counters, which track the available entries in the corresponding  receiver flow control buffer. For example, if the transmitter wishes to send a  read request across the link, it would first consult the Non-Posted Request CMD  counter to see the current number of credits. If the counter is 0, the receiver  is not prepared to accept any additional packets of this type and the  transmitter &lt;span class="docEmphUl"&gt;must wait&lt;/span&gt; until the count is updated  via the NOP mechanism to a value greater than 0. If the counter value is 1, the receiver  will accept one packet of this type, and so on. Note that for requests that are  accompanied by data (e.g. posted or non-posted writes), the transmitter must  consult &lt;span class="docEmphUl"&gt;both&lt;/span&gt; its CMD counter and the Data counter  for that virtual channel. If either is at 0, it must wait until both counters  have been updated to non-zero values.&lt;/p&gt;&lt;a name="ch05lev3sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e14824"&gt;&lt;/a&gt;NOP Packet Update Information  (Item 4)&lt;/h5&gt; &lt;p class="docText"&gt;During idle times on the link, each device sends NOP packets to  the other. If one or more buffer entries in any of the six receiver flow control  buffers have become available, designated fields in the NOP packets are encoded  to indicate that fact. Otherwise those fields contain 0, indicating no new  buffer entries have become available since the previous NOP transmission. In the  next section, use of the NOP packet fields for flow control updates is reviewed. &lt;/p&gt;&lt;a name="ch05lev3sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Control Logic (Item 5)&lt;/h5&gt; &lt;p class="docText"&gt;This generic representation of internal control logic is  intended to indicate that a number of things related to flow control are under  the management of each HyperTransport device. 
In general:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Logic associated with the transmit side of a link interface  always must consult transmitter flow counters before commencing a packet  transfer in any virtual channel. This assures that any packet sent will be  accepted.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Logic monitoring the progress of packet processing in the  receiver flow control buffers, must translate new entries that become available  into NOP update information to be passed back to the transmitter.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Logic monitoring the receive side of a link interface must  parse incoming NOPs to determine if the receiver is reporting any changes in  buffer availability. If so, then the information is used to update the  transmitter's flow control counters to match the available buffer entries on the  receiver side.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch05lev3sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Transmit And&lt;a name="idd1e14863"&gt;&lt;/a&gt;Receive FIFO (Item  6)&lt;/h5&gt; &lt;p class="docText"&gt;The &lt;a name="idd1e14870"&gt;&lt;/a&gt;transmit and receive FIFOs are not  part of flow control at all, and are shown here as a reminder that all packets  moving across the high-speed HyperTransport link pass through an additional  layer of buffering to help deal with the effects of clock mismatch within the  two devices, skew between multiple clocks sourced by the transmitter on a wide  interface, etc.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;Example: Initialization And Use Of The Counters&lt;/h3&gt; &lt;p class="docText"&gt;The following three diagrams and associated descriptions  explain the initialization of HyperTransport buffer counts, followed by the  actions taken by the transmitter and receiver as two packets are sent across the  link. 
The diagrams have been simplified to show a single flow control buffer and  the corresponding receiver and transmitter counters used to track available  entries. In this example, assume the following:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The flow control buffer illustrated is the Posted Request  Command (CMD) buffer.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The designer of the receiver interface has decided to construct  this flow control buffer with a depth of five entries. Because this is a buffer  for receiving requests, each entry in the buffer will hold up to 8 bytes (this  covers the case of either four or eight byte request packets)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Following initialization, the transmitter wishes to send two  Posted Request packets to the receiver.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch05lev2sec10"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Basic Steps In Counter Initialization And Use&lt;/h4&gt;&lt;a name="ch05pr01"&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList"&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="1"&gt; &lt;p class="docText"&gt;At reset, the transmitter counters in each device are reset =  0. This prevents the initiation of any packet transfers until buffer depth has  been established.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="2"&gt; &lt;p class="docText"&gt;At &lt;a name="idd1e14923"&gt;&lt;/a&gt;reset, the receiver interfaces load  each of the RCV counters with a value that indicates how many entries its  corresponding flow control buffer supports (shown as N in the diagram). 
This is  necessary because the receiver is allowed to implement buffers of any  depth.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="3"&gt; &lt;p class="docText"&gt;Each device then transmits its initial receiver buffer depth  information to the other device using NOP packets. Each NOP packet can indicate  a range of 0-3 entries. If the receiver buffer being reported is deeper than 3  entries, the device will send additional NOPs which carry the remainder of the  count.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="4"&gt; &lt;p class="docText"&gt;As each device receives the initial NOP information, it updates  its transmitter flow control counters, adding the value indicated in the NOP  fields to the appropriate counter total.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="5"&gt; &lt;p class="docText"&gt;When a device has a non-zero value in the counter, it can send  packets of the appropriate type across the link. Each time it sends packet(s),  the device subtracts the number of packets sent from the current transmitter  counter value. 
If the counter decrements to 0, the transmitter must wait for NOP  updates before proceeding with any more packet  transmission.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;p class="docText"&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-583315666004499678?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/583315666004499678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=583315666004499678' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/583315666004499678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/583315666004499678'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/pci-and-hypertransport-handles-flow.html' title='PCI and Hypertransport Handles Flow Control'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-357822184251784449</id><published>2007-06-26T21:38:00.000-07:00</published><updated>2007-06-26T21:39:11.178-07:00</updated><title type='text'>Transactions in HT Technology</title><content type='html'>&lt;h5 class="docSection3Title"&gt;RdSized And WrSized Requests: Transaction Limits&lt;/h5&gt; &lt;p class="docText"&gt;Using the various request packet option bits when constructing  RdSized and WrSized transactions 
makes it possible to perform byte and dword  read and write transfers in a number of variations. The following section  describes some of the key limits associated with RdSized and WrSized  requests.&lt;/p&gt;&lt;a name="ch04lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;RdSized And WrSized (Dword) Transactions&lt;/h5&gt; &lt;p class="docText"&gt;Sized &lt;a name="idd1e12002"&gt;&lt;/a&gt;dword read and write transactions  &lt;a name="idd1e12006"&gt;&lt;/a&gt;can transfer any number&lt;a name="idd1e12010"&gt;&lt;/a&gt; of  contiguous dwords within a 64 byte, address-aligned block. The request packet  &lt;span class="docEmphasis"&gt;Mask/Count&lt;/span&gt; field provides the number of dwords to  be transferred, beginning at the start address and indexing addresses  sequentially upward until the limit defined by the Mask/Count field is reached.  All bytes in the range are considered valid. &lt;a name="idd1e12017"&gt;&lt;/a&gt;Dword read  and write start addresses must be dword aligned. If the start address is 64 byte  aligned, the transfer may include the entire 64 byte (16 dword) region; if the  start address is not 64 byte aligned, the transfer can only go to the end of the  current 64-byte address-aligned block. Dword requests which would cross 64 byte  address boundaries must be broken into multiple transactions.&lt;/p&gt;&lt;a name="ch04lev4sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;RdSized (Byte) Transactions&lt;/h5&gt; &lt;p class="docText"&gt;Sized &lt;a name="idd1e12034"&gt;&lt;/a&gt;&lt;a name="idd1e12037"&gt;&lt;/a&gt;byte read  transactions can transfer any combination of &lt;a name="idd1e12041"&gt;&lt;/a&gt;bytes within  one address-aligned dword; requests which would cross an aligned dword address  boundary must be broken into multiple transactions. The request packet &lt;span class="docEmphasis"&gt;Mask/Count&lt;/span&gt; field provides the "byte enable" mask  pattern, indicating which bytes are valid. 
Mask[0] qualifies byte 0, Mask[1]  qualifies byte 1, etc. Any mask pattern is legal; mask bits can be ignored by  targets reading from "pre-fetchable" locations (all four bytes in the target  dword are always returned).&lt;/p&gt;&lt;a name="ch04lev4sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;WrSized (Byte) Transactions&lt;/h5&gt; &lt;p class="docText"&gt;Sized &lt;a name="idd1e12056"&gt;&lt;/a&gt;byte write transactions can  transfer any combination of bytes&lt;a name="idd1e12060"&gt;&lt;/a&gt; within a 32-byte  address-aligned region. The request packet &lt;span class="docEmphasis"&gt;Mask/Count&lt;/span&gt; field provides the total number of &lt;span class="docEmphUl"&gt;dwords&lt;/span&gt; to be transferred including the required single  dword "write mask" pattern. The mask itself is sent just ahead of the data byte  payload, and indicates which of the data bytes that follow are valid. Mask  bit [0] qualifies byte 0, Mask bit [31] qualifies byte 31, etc. The byte write start  address must be dword aligned. If the start address is 32 byte aligned, the  write transfer may be as large as the entire 32 byte (8 dword) region; if the  start address is not 32 byte aligned, the transfer can only go to the end of the  current 32 byte address-aligned block. Basically, start address bits [4:2]  identify the first valid dword of data within the 32-byte region defined by  start address bits [39:5]. Byte write requests which would cross 32 byte address  boundaries must be broken into multiple transactions. A couple of subtle things  about these transfers:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The entire dword (32 bit) mask is always sent ahead of the data  payload, regardless of start address and number of bytes being transferred. 
Mask  bit fields are cleared for all invalid bytes in the 32-byte region ahead of the  start address, for all invalid bytes within the transfer range itself, and for  all unsent bytes remaining in the 32-byte region beyond the transfer limit  implied by the Mask/Count field.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;While it isn't illegal to send invalid dwords at the front and  back of a WrSized (Byte) transfer, it is more efficient to adjust the start  address and Mask/Count field to trim off completely invalid dwords in front of  the first and after the last dwords containing at least one valid byte in the 32  byte aligned region.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch04lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;RdSized And WrSized Requests: Other Notes&lt;/h5&gt;&lt;a name="ch04lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Coherency&lt;/h5&gt; &lt;p class="docText"&gt;The coherency bit in the Command field of RdSized and WrSized  request packets (Byte 0, bit 0) indicates whether host cache coherency is a  concern when HyperTransport RdSized and WrSized requests target host memory.  Some buses, such as PCI, require coherency enforcement any time a transaction  originating in the I/O subsystem targets main memory. This can represent a  serious performance hit as processors spend much of their time snooping internal  caches for accesses which they may not cache anyway.&lt;/p&gt; &lt;p class="docText"&gt;HyperTransport uses the coherency bit in the Command field of  the request packet to inform the system whether coherency actions are required.  If the coherency bit is set:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;All HyperTransport writes targeting host memory result in the  CPU updating or invalidating the relevant cache line.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;All HyperTransport reads targeting main memory must result in  the latest copy being returned to the requestor. 
If the CPU has a modified cache  line, the system must assure that this is the one returned to the  requestor.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;If a device has no particular requirement for coherency, it may  choose to keep the coherency bit cleared. In this case, the request will complete  without any coherency events.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphBoldItalic"&gt;Special Case: Forcing A Coherency  Event.&lt;/span&gt; A RdSized (byte) request targeting host memory with all Mask/Count bits  cleared to 0 (no valid bytes) and the coherency bit set to 1 in the request packet Command  field causes a host coherency action, using the address provided in the read.  One dword of invalid data will be returned.&lt;/p&gt;&lt;a name="ch04lev4sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;WrSized Requests And The &lt;span class="docEmphasis"&gt;Posted&lt;/span&gt; Bit&lt;/h5&gt; &lt;p class="docText"&gt;Sized write request packets may or may not set the &lt;span class="docEmphasis"&gt;posted&lt;/span&gt; bit (bit 5 of the CMD field). The implications  of this bit are as follows:&lt;/p&gt; &lt;p class="docText"&gt;If set, the bit indicates the write request will travel in the  &lt;a name="idd1e12141"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;posted request&lt;/span&gt; virtual  channel and that there will not be a response from the target. Each device in  the transaction path may de-allocate its buffers as soon as the posted request  is transmitted. This also means that the SrcTag field is not used (reserved)  because posted writes have no outstanding responses to track. This is in  contrast to non-posted requests which require a unique SrcTag field for each  request issued.&lt;/p&gt; &lt;p class="docText"&gt;If the posted bit is not set, the requestor expects a  confirmation that the data written has reached the destination, and is willing  to suffer the performance penalty and wait for it. 
Eventually, a Target Done  response will be routed back to the original requestor. In HyperTransport,  certain address ranges require non-posted writes; this includes configuration  and I/O cycles.&lt;/p&gt;&lt;a name="ch04lev4sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Errors During RdSized Transactions&lt;/h5&gt; &lt;p class="docText"&gt;In the event of a read error (RdSized command), a response and  all requested data are returned to the requestor, even though some or all of the  data is not valid. Proceeding with a "dummy" read of invalid data is mainly for  the benefit of devices in the transaction path that have already allocated flow  control buffer space for the returning data. These devices use the return of  each byte to simplify de-allocation of buffer space.&lt;/p&gt;&lt;a name="ch04lev4sec8"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;&lt;span class="docEmphasis"&gt;PassPW&lt;/span&gt; and &lt;span class="docEmphasis"&gt;Response May Pass &lt;/span&gt;&lt;a name="idd1e12167"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Posted Requests&lt;/span&gt; bits&lt;/h5&gt; &lt;p class="docText"&gt;HyperTransport supports the strict producer-consumer ordering  model found in PCI systems. There are occasions when strict producer/consumer  ordering may not be required. In these cases, devices are allowed some  flexibility in reordering of posted and non-posted request packets, as well as  response packets. Ordering rules, including relaxed ordering, are described in  more detail in the chapter entitled Ordering. 
Relaxing ordering rules is  application-specific, and may provide better system performance in some  cases.&lt;/p&gt; &lt;p class="docText"&gt;The source of a transaction indicates whether or not relaxed  ordering is permitted through the setting or clearing of two bits in a  request:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;PassPW bit&lt;/span&gt;. The &lt;span class="docEmphasis"&gt;PassPW&lt;/span&gt; request packet bit (Byte 1, bit 7) is programmed  in the request packet and affects how ordering rules are applied to the request as  it moves toward the target. If set to 1, relaxed ordering is enabled; if PassPW  is clear, relaxed ordering is not allowed.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Response May Pass Posted  Requests&lt;/span&gt; &lt;span class="docEmphRoman"&gt;bit&lt;/span&gt;. For RdSized transactions,  there is also a bit in the Command field of the RdSized request packet called  &lt;span class="docEmphasis"&gt;Response May Pass Posted Requests&lt;/span&gt; (Byte 0, bit  3). The state of this bit is replicated in the PassPW bit of the returning  response and affects how ordering rules are applied to the response as it moves back  to the original source. The &lt;span class="docEmphasis"&gt;Response May Pass Posted  Requests&lt;/span&gt; bit does not apply to commands other than RdSized. 
For reads,  the bit should be cleared if the strict producer/consumer ordering model is  required; otherwise this bit and the PassPW bit should both be set in the  request.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch04lev4sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;&lt;a name="idd1e12222"&gt;&lt;/a&gt;Compatibility Bit&lt;/h5&gt; &lt;p class="docText"&gt;In keeping with PCI subtractive decoding, HyperTransport may  use the &lt;span class="docEmphasis"&gt;Compat&lt;/span&gt; bit in RdSized and WrSized request  packets (Byte 2, bit 5) to enable them to reach legacy hardware (e.g. boot  firmware) behind the system &lt;a name="idd1e12235"&gt;&lt;/a&gt;subtractive decoder. When the  Compat bit is set, all system devices should pass the request downstream through  the "compatibility &lt;a name="idd1e12239"&gt;&lt;/a&gt;chain" to the subtractive decoder.  Only the subtractive decoder may claim these transactions. The Compat bit is  reserved and must not be set for upstream requests or &lt;a name="idd1e12243"&gt;&lt;/a&gt;configuration cycles.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e12766"&gt;&lt;/a&gt;Flush Requests: Transaction  Limits&lt;/h5&gt; &lt;p class="docText"&gt;The Flush request is a tool used to manage posted writes headed  toward host memory. Two important limitations of the Flush request are:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If the posted writes target memory other than host memory (e.g.  peer-to-peer transfers), then the flush request and response only guarantee that  the posted writes have reached the destination host bridge, not the ultimate  target. 
After the host bridge re-issues all peer-to-peer requests downstream  towards the intended targets, it sends the target done response back to the  original requestor; it is entirely possible the flush response (target done)  will reach the original requestor before the request is seen at the  target.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Flushes have no impact on the isochronous virtual channels. If  isochronous flow control is not enabled on a link, then packets which do have  the Isoc bit set actually travel in the normal virtual channels and &lt;span class="docEmphUl"&gt;will be&lt;/span&gt; affected by Flush  requests.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch04lev3sec14"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e12792"&gt;&lt;/a&gt;Fence Requests&lt;/h5&gt; &lt;p class="docText"&gt;Another tool in the management of posted write transactions is  the HyperTransport Fence command. The main features of the Fence request  are:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A Fence request provides a barrier between posted writes which  applies to all UnitIDs (transaction streams). This is different from the Flush,  which is specific to the posted writes associated with a single transaction  stream. When the Fence is decoded by the bridge, it sends any previously posted  writes in its buffers toward memory. 
As always, ordering is maintained for  posted writes within individual single transaction streams, but no particular  ordering is required for different streams.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The Fence request travels in the posted virtual channel,  meaning that there is no response expected or sent.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e13017"&gt;&lt;/a&gt;Fence Requests: Transaction  Limits&lt;/h5&gt; &lt;p class="docText"&gt;The Fence request is a tool used to manage posted writes headed  toward host memory from all transaction streams. Limitations of the Fence  request include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Fence requests are issued from a device to a host bridge, or  from one host bridge to another. While a tunnel forwards fence requests it sees,  tunnels and single-link cave devices are never the target of a fence request and  are never required to perform the fence function internally.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Fences have no impact on the isochronous virtual channels. If  isochronous flow control is not enabled, then other packets which do have the  Isoc bit set actually travel in the normal virtual channels and &lt;span class="docEmphUl"&gt;will be&lt;/span&gt; affected by fence requests.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;If a fence request is seen by an end-of-chain device, it  decodes the transaction and drops it. 
It may optionally choose to log the event  as an end-of-chain error.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch04lev3sec16"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e13047"&gt;&lt;/a&gt;Atomic Read-Modify-Write  Requests&lt;/h5&gt; &lt;p class="docText"&gt;While sized read and sized write requests can handle most  general purpose HyperTransport data transfers, there are times when a combined,  or atomic, read/write command is needed.&lt;/p&gt;&lt;a name="ch04lev3sec17"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Two Problems In Shared Memory Schemes&lt;/h5&gt; &lt;p class="docText"&gt;Two problems related to shared memory schemes include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;A memory location may be used for storing a "semaphore" to be  checked by multiple devices (e.g. CPUs or I/O masters) before using a shared  system resource. If the contents of the semaphore location indicate the resource  is available, the device which reads it then over-writes the semaphore value to  indicate the resource is now busy. If another agent reads the semaphore location  and sees it is busy, it must wait until the agent using it clears the semaphore  location, thus indicating it is again free. The problem arises when a sharing  agent has read the semaphore and found the device is not busy. Before it  over-writes the data value to claim the resource, another agent reads the  semaphore location and also concludes the device is not busy. Now there is a  race condition which can result in both devices attempting to over-write the  semaphore and use the resource.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The second problem is simpler. 
If a shared memory location is  being used as an accumulator, agents will periodically read the current value,  add a constant to it, and write the result back. Again, there is a hazard that  the location will be read by one agent and, before that agent can modify it and write it  back, another agent may read it with a similar intention. In this case, one of  the addends may be lost from the sum.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;Most modern bus protocols that support shared memory include a  mechanism to avoid the conditions just described. HyperTransport uses the Atomic  Read-Modify-Write request for this purpose. The Atomic RMW  forces a one-qword (8 byte) memory location to remain "locked" for the duration  of the read/modify/write operation required to check and change the targeted  location. No other agent is allowed to access the address carried by the Atomic  RMW request packet until the entire transaction completes. It is the  responsibility of the bridge managing the memory to enforce the locking  mechanism.&lt;/p&gt; &lt;p class="docText"&gt;As a transaction, the Atomic RMW behaves like a non-posted write  that generates a read response. The read response is accompanied by a single  qword of data: the value read from the targeted memory location before any  changes are made.&lt;/p&gt;&lt;a name="ch04lev4sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Atomic RMW Variants&lt;/h5&gt; &lt;p class="docText"&gt;The Atomic Read-Modify-Write request has two variants that are  designed to address the two cases just described.&lt;/p&gt;&lt;a name="ch04lev4sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;&lt;a name="idd1e13103"&gt;&lt;/a&gt;Compare And Swap&lt;/h5&gt; &lt;p class="docText"&gt;The &lt;span class="docEmphasis"&gt;Compare and Swap&lt;/span&gt; variant of  the Atomic RMW sends two qwords of data with the request. 
One qword (the &lt;span class="docEmphasis"&gt;compare&lt;/span&gt; value) is to be checked against the current  value in memory; the other qword (the &lt;span class="docEmphasis"&gt;input&lt;/span&gt;  value) is the data to be written to the memory location if the compare value is  equal to the current value. If the compare value is &lt;span class="docEmphUl"&gt;not&lt;/span&gt; equal to the current value, the input value is not  written to memory. In either case, a read response will be returned accompanied  by the original qword read from memory.&lt;/p&gt;&lt;a name="ch04lev4sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;&lt;a name="idd1e13129"&gt;&lt;/a&gt;Fetch And Add&lt;/h5&gt; &lt;p class="docText"&gt;The &lt;span class="docEmphasis"&gt;Fetch and Add&lt;/span&gt; variant of  Atomic RMW sends a single qword (the &lt;span class="docEmphasis"&gt;input&lt;/span&gt; value)  of data with the request. When the Atomic RMW reaches the bridge to main memory,  the bridge unconditionally reads the current value from memory, adds the input  value to it, and writes the result back to memory. The memory location remains  locked to other transactions while the read-modify-write is in progress. A read  response is then returned to the requestor, accompanied by the original qword  read from memory.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;Atomic RMW Requests: Transaction Limits&lt;/h5&gt; &lt;p class="docText"&gt;The Atomic RMW request locks a qword memory address block while  a read-modify-write operation is performed. Limitations of the Atomic RMW  request include:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The request transfer size, as indicated in the Mask/Count  field, is restricted to either one or two qwords. 
Following the request, a read  response returns a single qword of data from memory.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;These transactions are designed to be generated by I/O devices  or bridges, and target system memory. Other than the host bridge, no  HyperTransport devices are expected to support atomic operations. If a target  detects an unsupported RMW, it may return a one qword read response with the  error bit set or perform a non-atomic read-modify-write. The current  HyperTransport Specification does not require peer-to-peer reflection of Atomic  RMW.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch04lev2sec9"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Control Packets: Responses&lt;/h4&gt; &lt;p class="docText"&gt;There are two response types used in HyperTransport: Read  Response and &lt;a name="idd1e13537"&gt;&lt;/a&gt;Target Done. Responses are returned by  target devices following a &lt;a name="idd1e13541"&gt;&lt;/a&gt;non-posted request, and much  of the response packet field information is extracted from the requests that  caused them. Because responses are routed back to the original requestor either  implicitly or based on &lt;a name="idd1e13545"&gt;&lt;/a&gt;UnitID&lt;a name="idd1e13549"&gt;&lt;/a&gt;,  they don't require a 40 bit address field like requests do. All response packets  are four bytes.&lt;/p&gt;&lt;a name="ch04lev3sec19"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e13556"&gt;&lt;/a&gt;Read Responses&lt;/h5&gt; &lt;p class="docText"&gt;The four-byte read response is returned when data requests are  made, including RdSized and Atomic RMW requests. All HyperTransport read  transactions are non-posted and split; this means that data is never returned  immediately as it generally is on buses such as PCI. 
The advantage of split  reads is that the latency involved, in waiting for a target to access its  internal memory before returning read data, can be minimized by sending the  request, releasing the bus, and waiting for the target to initiate the return of  data when it has it.&lt;/p&gt; &lt;p class="docText"&gt;In HyperTransport, the read response is used by the target to  indicate the return of previously requested data. The read response immediately  precedes the data, and contains the following general information:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The response packet type.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Whether the response should travel in the standard or  isochronous virtual channel.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;UnitID&lt;a name="idd1e13586"&gt;&lt;/a&gt;&lt;a name="idd1e13589"&gt;&lt;/a&gt; which acts  as an address for responses.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;A direction bit indicating whether the response is moving  upstream or downstream.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Whether relaxed ordering may be used for this response relative  to posted writes moving in the same stream.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Error bits indicating whether or not the returning data can be  considered valid; if it is invalid, error bits indicate whether the error  occurred at the target or if the request inadvertently reached an end-of-chain  device.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-357822184251784449?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/357822184251784449/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=357822184251784449' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/357822184251784449'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/357822184251784449'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/transactions-in-ht-technology.html' title='Transactions in HT Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-3300836754676051734</id><published>2007-06-26T21:37:00.000-07:00</published><updated>2007-06-26T21:38:02.788-07:00</updated><title type='text'>Sized Read And Sized Write Requests</title><content type='html'>&lt;p class="docText"&gt;The eight-byte &lt;a name="idd1e11593"&gt;&lt;/a&gt;sized read and &lt;a name="idd1e11597"&gt;&lt;/a&gt;sized write packets (abbreviated &lt;span class="docEmphasis"&gt;RdSized&lt;/span&gt; and &lt;span class="docEmphasis"&gt;WrSized&lt;/span&gt; in  the Specification) are the mainstream commands used to perform most of the data  transfers to either memory or I/O in HyperTransport. Some of the options available  with sized read and write requests are:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Byte or dword read/write data transfers; valid data transferred  ranges from 0 bytes to 64 bytes (16 dwords).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Posted or non-posted virtual channel for writes. 
Reads are  always split transactions traveling in the non-posted virtual channel.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Isochronous posted or non-posted virtual channels for the  request and any subsequent response. &lt;a name="idd1e11621"&gt;&lt;/a&gt;Isochronous &lt;a name="idd1e11625"&gt;&lt;/a&gt;flow control buffers are required to support this  traffic.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;A coherency option bit that indicates whether the transaction  requires enforcement of host cache coherency. If the transaction does not target  host memory, this feature does not apply.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Assignment of a non-zero &lt;a name="idd1e11637"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Sequence ID&lt;/span&gt; attribute to requests forces other devices  to maintain strict ordering for all requests from the same source. A Sequence ID of  0 indicates that no strict ordering is required.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Use of reserved ranges in RdSized and WrSized request packet  address fields to support special-case transactions, including &lt;a name="idd1e11647"&gt;&lt;/a&gt;configuration cycles, interrupt requests, and  End-Of-Interrupt (EOI) messages.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-3300836754676051734?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/3300836754676051734/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=3300836754676051734' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3300836754676051734'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3300836754676051734'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/sized-read-and-sized-write-requests.html' title='Sized Read And Sized Write Requests'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-1006758444179881505</id><published>2007-06-26T21:35:00.000-07:00</published><updated>2007-06-26T21:37:42.614-07:00</updated><title type='text'>The Packet-Based Protocol in HT Technology</title><content type='html'>&lt;h3 class="docSection1Title"&gt;The Packet-Based Protocol&lt;/h3&gt; &lt;p class="docText"&gt;HyperTransport employs a packet-based protocol in which all  information (address, commands, and data) travels in packets which are multiples  of four bytes each. Packets are used in link management (e.g. flow control and  &lt;a name="idd1e10159"&gt;&lt;/a&gt;error reporting) and as building blocks in constructing  more complex transactions such as read and write data transfers.&lt;/p&gt; &lt;p class="docText"&gt;It should be noted that, while packet descriptions in this  chapter are in terms of bytes, the link's bidirectional interface width (2, 4,  8, 16, or 32 bits) ultimately determines the amount of packet information sent  during each &lt;span class="docEmphasis"&gt;bit time&lt;/span&gt; on HyperTransport links.  
There are two bit times per clock period.&lt;/p&gt; &lt;p class="docText"&gt;Before looking at packet function and use, the following  sections describe the mechanics of packet delivery over 2-, 4-, 8-, 16-, and 32-bit  scalable link interfaces.&lt;/p&gt;&lt;a name="ch04lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;8 Bit Interfaces&lt;/h4&gt; &lt;p class="docText"&gt;For 8-bit interfaces, one byte of packet information may be  sent in each bit time. For example, a 4-byte request packet would be sent by the  transmitter during four adjacent bit times, least significant byte first.  At two bit times per clock, the total  time to send a four-byte packet is two clock periods.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h4 class="docSection2Title"&gt;Interfaces Wider Than 8 Bits&lt;/h4&gt; &lt;p class="docText"&gt;For 16- or 32-bit interfaces, packet delivery is accelerated by  sending multiple bytes of packet information in parallel.&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;The Two Packet Types: Control And Data&lt;/h3&gt; &lt;p class="docText"&gt;Packets moving across links fall into two groups: control  packets and data packets. Control packets are further divided into three  classes: Information, Request, and Response.&lt;/p&gt;&lt;a name="ch04lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e10275"&gt;&lt;/a&gt;Control Packet Purpose&lt;/h4&gt; &lt;p class="docText"&gt;The three classes of control packets serve the following  purposes on a HyperTransport link:&lt;/p&gt;&lt;a name="ch04lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e10285"&gt;&lt;/a&gt;Information packets&lt;/h5&gt; &lt;p class="docText"&gt;Information packets are always 4 bytes each. 
They are used for  nearest neighbor communication between the transmitter-receiver pairs on each  link; communication between these nodes is necessary for dynamic &lt;a name="idd1e10292"&gt;&lt;/a&gt;flow control updates and other miscellaneous functions.  Information packets are not buffered internally or subject to flow control; when  sent by a transmitter they &lt;span class="docEmphUl"&gt;must&lt;/span&gt; be accepted by the  receiver.&lt;/p&gt;&lt;a name="ch04lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e10303"&gt;&lt;/a&gt;Request packets&lt;/h5&gt; &lt;p class="docText"&gt;Requests are 4 bytes in length if there is no address field, or  8 bytes if the packet does include an address field. They may be either posted  or non-posted, and the basic job of a request is to define a pending data or  message transaction, or to help bridges manage posted write transactions  (through the use of Flush and Fence commands). These packets originate at a  source device and are accepted by a target device.&lt;/p&gt; &lt;p class="docText"&gt;Devices in the path between the source and target forward  requests along, subject to HyperTransport rules for ordering.&lt;/p&gt;&lt;a name="ch04lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e10317"&gt;&lt;/a&gt;Response packets&lt;/h5&gt; &lt;p class="docText"&gt;Responses are always 4 bytes each. They are returned by the  target after it has serviced a &lt;a name="idd1e10324"&gt;&lt;/a&gt;non-posted request.  
Devices in the path between the response sender and the original requester  forward responses along, subject to HyperTransport rules for ordering.&lt;/p&gt; &lt;p class="docText"&gt;When associated with a non-posted write or &lt;a name="idd1e10331"&gt;&lt;/a&gt;flush request, the &lt;a name="idd1e10335"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;target done&lt;/span&gt; response packet acts as a confirmation  (returned to the source device) that the operation has completed. In the event  of a problem delivering non-posted write data or completing the flush, the  response packet will contain an error flag and a bit indicating whether the  target done response is being returned by the intended target OR by another  device acting on its behalf (e.g. end-of-chain device).&lt;/p&gt; &lt;p class="docText"&gt;For read transactions, which are always split in  HyperTransport, the &lt;a name="idd1e10344"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;read&lt;/span&gt;  response packet precedes the returning data and identifies the specific read  request being serviced. In the event of an error when fetching the data, the  read response will contain an error flag and a bit indicating whether the  problem occurred at the intended target or at an end-of-chain device acting on  its behalf. If there is an error, all data is driven back as FFh by either the  target or the end-of-chain device.&lt;/p&gt;&lt;a name="ch04lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e10355"&gt;&lt;/a&gt;Data Packets&lt;/h4&gt; &lt;p class="docText"&gt;While there is only one type of data packet, consisting of 1-16  Dwords, the payload of valid information within a data packet ranges from 0-64  valid bytes--depending on the attributes of the request that caused it. 
The  appropriate time to send a data packet also depends on the request/response  associated with it:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;For &lt;a name="idd1e10368"&gt;&lt;/a&gt;write requests, the data packet is  sent immediately after the request. Because there is no routing information in a  data packet, the request is used to deliver the data to the intended  target.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;For &lt;a name="idd1e10376"&gt;&lt;/a&gt;read requests, the data packet  immediately follows the read response. Key fields in the response are filled in  with requester transaction stream information provided in the read request (e.g.  UnitID and Source Tag). The response is then used to route the read data packet  back to the original requester.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e10383"&gt;&lt;/a&gt;Atomic read-modify-write requests are a  hybrid. A data packet is sent with the request (as in a write transaction) and  another data packet is returned following the read response (as in a read  transaction).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Finally, some requests don't have data packets at all (e.g.  Flush and Fence).&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h3 class="docSection1Title"&gt;The Need To &lt;a name="idd1e10399"&gt;&lt;/a&gt;Interleave Control  And &lt;a name="idd1e10403"&gt;&lt;/a&gt;Data Packets&lt;/h3&gt; &lt;p class="docText"&gt;An important feature of HyperTransport packet management is  that a transmitter may interleave control packets with data packets associated  with earlier requests. 
Interleaving control packets with data helps mitigate  "stalls" in sending new control packets on the multiplexed CAD bus when large  data transfers are in progress. An example of such a stall is as follows:&lt;/p&gt;&lt;a name="ch04pr01"&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList"&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="1"&gt; &lt;p class="docText"&gt;A transmitter starts a Sized &lt;a name="idd1e10419"&gt;&lt;/a&gt;Dword Write  of 64 bytes (16 dwords) on a 2-bit HyperTransport link interface.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="2"&gt; &lt;p class="docText"&gt;After the write transaction commences, the transmitter determines  that it needs to send a read request or NOP information packet over the  bus.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;" value="3"&gt; &lt;p class="docText"&gt;Without the ability to interleave control packets, the  transmitter would have to send the entire data payload first (64 bytes x 4 bit  times/byte = 256 bit times). At two bit times per clock, this represents a worst-case latency of 128 clocks  before the new control packet can start.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt; &lt;p class="docText"&gt;To avoid such situations, HyperTransport allows a transmitter  to insert new control packets into a data payload on four-byte boundaries, as  long as the control packets do not have any immediate data of their own. 
For  example, read requests and NOP flow control packets are candidates for  interleaving; write requests would not be candidates for interleaving because  they &lt;span class="docEmphUl"&gt;are&lt;/span&gt; accompanied by immediate data.&lt;/p&gt;&lt;a name="ch04lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The &lt;a name="idd1e10449"&gt;&lt;/a&gt;CTL Signal Indicates  Packet Type&lt;/h4&gt; &lt;p class="docText"&gt;A transmitter uses the CTL signal on a HyperTransport link  interface to indicate the presence of control vs. data packets it is sending  concurrently on the CAD bus. When CTL is asserted (high), a control packet is in  transit on the CAD bus; when CTL is deasserted (low), a data packet is being  sent. During idle periods, CTL is asserted and control information &lt;a name="idd1e10462"&gt;&lt;/a&gt;NOP packets are sent.&lt;/p&gt; &lt;p class="docText"&gt;When interleaving control packets and data packets on a link,  the transmitter is required to observe the following rules as it asserts and  deasserts the CTL signal:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;CTL is always asserted and deasserted on four byte  boundaries.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The only time CTL is deasserted is when a data packet  associated with an earlier control packet (e.g., request or response packet) is  being sent.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;CTL is asserted during all bit times of a control packet; for  control packets which are either 4 or 8 bytes, the packet must be sent in its  entirety without deasserting CTL (there is no interleaving within control  packets). 
This also means that &lt;a name="idd1e10483"&gt;&lt;/a&gt;flow control must assure  that transmitters never start sending a control packet if the receiver lacks  sufficient buffer space to accept all bytes at full speed. Changes in flow  control buffer availability are reported by means of NOP packets.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;CTL is deasserted through all bit times of data  packets.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Re-assertion of CTL within data packets is permitted on four  byte boundaries if the transmitter decides to interleave a new control packet,  providing it does not have immediate data of its own. After the control packet  is sent, CTL is again deasserted and the current data packet transfer  resumes.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Only one data packet may be in progress at a time, although it  may be paused for the interleaving of control packet(s).&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Ordering of control packets is not affected by the fact that  data packets may be paused to interleave them.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;The bit time immediately following the end of a data packet is  always the start of a control packet, and CTL must be  asserted.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;For each packet variant, the &lt;a name="idd1e10553"&gt;&lt;/a&gt;virtual  channel (VChan) is indicated in the second column: posted, non-posted, or  response. 
Note: information packets do not travel in any of the virtual channels  and are not subject to flow control.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The first byte in each control packet type contains a 6-bit &lt;a name="idd1e10564"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Command (CMD) Code.&lt;/span&gt; By  sending this information at the beginning of a control packet, the receiver is  informed immediately of the type of packet being transferred, the number of  bytes to expect, and the format of the bit fields contained within. &lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;In some Command Codes, a number of bits are variables  (indicated by ".xxx") which are used to select transaction options: dword vs.  byte transfer count, isochronous flag, coherency requirement, etc.; &lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-1006758444179881505?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/1006758444179881505/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=1006758444179881505' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1006758444179881505'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1006758444179881505'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/packet-based-protocol-in-ht-technology.html' title='The Packet-Based Protocol in HT Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-3236525257071187314</id><published>2007-06-26T21:34:00.000-07:00</published><updated>2007-06-26T21:35:45.480-07:00</updated><title type='text'>The High Speed Signals in Hypertransport Technology</title><content type='html'>&lt;h3 class="docSection1Title"&gt;The &lt;a name="idd1e9773"&gt;&lt;/a&gt;High Speed Signals (One Set  In Each Direction)&lt;/h3&gt; &lt;p class="docText"&gt;Each high-speed signal is actually a differential signal pair.  CAD (Command/Address/Data) information consists of the two basic types of  HyperTransport packets: control and data. When a link transmitter sends packets  on the CAD bus, the receive side of the interface uses the CLK and CTL signals,  also supplied by the transmitter, to latch in packet information during each bit  time. CTL distinguishes control packets from data packets.&lt;/p&gt;&lt;a name="ch03lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The &lt;a name="idd1e9790"&gt;&lt;/a&gt;CAD Signal Group&lt;/h4&gt; &lt;p class="docText"&gt;The CAD bus is always driven by the transmitter side of a link,  and is comprised of signal pairs that carry HyperTransport &lt;span class="docEmphasis"&gt;requests, responses,&lt;/span&gt; and &lt;span class="docEmphasis"&gt;data.&lt;/span&gt; Each CAD bus may consist of between 2 bits (two  differential signal pairs) and 32 bits (thirty-two differential signal pairs).  The HyperTransport specification permits the CAD bus width to be different  (asymmetrical) for the two directions. 
To enable the corresponding receiver to  make a distinction as to the type of information currently being sent over the  CAD bus, the transmitter also drives the &lt;a name="idd1e9817"&gt;&lt;/a&gt;CTL signal (see  the following description).&lt;/p&gt;&lt;a name="ch03lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Control Signal (CTL)&lt;/h4&gt; &lt;p class="docText"&gt;This signal pair is driven by the transmitter to qualify the  information being sent concurrently over the &lt;a name="idd1e9829"&gt;&lt;/a&gt;CAD signals.  If this signal is asserted (high), the transmitter is indicating that it is  sending a &lt;a name="idd1e9835"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;control&lt;/span&gt; packet;  if deasserted, the transmitter is sending a &lt;span class="docEmphasis"&gt;data&lt;/span&gt;  packet. The receiver uses this information when routing incoming CAD information  to appropriate request queues, data buffers, etc. There is one (and only one)  CTL signal for each link direction, regardless of the width of the CAD  bus.&lt;/p&gt;&lt;a name="ch03lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e9854"&gt;&lt;/a&gt;Clock Signal(s) (CLK)&lt;/h4&gt; &lt;p class="docText"&gt;As a source-synchronous connection, each HyperTransport  transmitter sends a differential clock signal along with CAD and CTL signals to  the receiver at the other end of the link. There is one &lt;a name="idd1e9864"&gt;&lt;/a&gt;CLK signal pair for &lt;span class="docEmphUl"&gt;each byte&lt;/span&gt; of  CAD width. While the timing on each clock pair is the same, replicating clocks  help in routing of CAD signal pairs with respect to their clock signals. 
The  current HyperTransport specification allows clock speeds from 200 MHz (default)  to 800 MHz.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;Scaling Hazards: Burden Is On The Transmitter&lt;/h3&gt; &lt;p class="docText"&gt;HyperTransport requires that the transmitter side  of each link be aware of the capabilities of its corresponding receiver and  avoid the double hazard of a scalable bus: running at a faster clock rate than  the receiver can handle &lt;span class="docEmphUl"&gt;or&lt;/span&gt; using a wider data path  than the receiver supports. Because the link is not a shared bus, the  transmitter side of each device is concerned with the capabilities of only one  target.&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h3 class="docSection1Title"&gt;The &lt;a name="idd1e9897"&gt;&lt;/a&gt;Low Speed Signals&lt;/h3&gt;&lt;a name="ch03lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Power OK (&lt;a name="idd1e9905"&gt;&lt;/a&gt;PWROK) And &lt;a name="idd1e9909"&gt;&lt;/a&gt;Reset (&lt;a name="idd1e9913"&gt;&lt;/a&gt;RESET#)&lt;/h4&gt; &lt;p class="docText"&gt;PWROK used with RESET# indicates to HyperTransport devices  whether a &lt;span class="docEmphasis"&gt;Cold or Warm&lt;/span&gt; Reset is in progress.  Which system logic component is responsible for managing the PWROK and RESET#  signals is beyond the scope of the HyperTransport specification, but timing and  use of the signals are defined. The basic use of the signals includes:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;At power up, PWROK is asserted by system logic when it can be  guaranteed that system power and clocks related to HyperTransport are within  proper limits.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;RESET# is asserted by system logic to indicate that a reset is  required. 
The state of PWROK when RESET# is seen asserted indicates the type of  reset to be performed. PWROK and RESET# both asserted is a warm reset; PWROK  deasserted and RESET# asserted indicates cold reset.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;After initial system power up, reset, and initialization, a  cold or warm reset may also be generated under software control writing  configuration registers in the host bridge.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The HyperTransport specification describes the actions to be  taken by devices during either type of reset event. &lt;/p&gt;&lt;a name="ch03lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e9953"&gt;&lt;/a&gt;LDTSTOP#&lt;/h4&gt; &lt;p class="docText"&gt;(&lt;span class="docEmphasis"&gt;Note: the signal names LDTSTOP# and  LDTREQ# were carried forward from the earlier name AMD assigned to  HyperTransport technology&lt;/span&gt; — &lt;span class="docEmphBoldItalic"&gt;L&lt;/span&gt;&lt;span class="docEmphasis"&gt;ightning&lt;/span&gt; &lt;span class="docEmphBoldItalic"&gt;D&lt;/span&gt;&lt;span class="docEmphasis"&gt;ata&lt;/span&gt; &lt;span class="docEmphBoldItalic"&gt;T&lt;/span&gt;&lt;span class="docEmphasis"&gt;ransfer&lt;/span&gt;).&lt;/p&gt; &lt;p class="docText"&gt;LDTSTOP# is an input to HyperTransport devices which is  asserted by system logic to enable and disable link activity during power  management state transitions. 
&lt;span class="docEmphUl"&gt;Support for this signal is optional for HyperTransport devices.&lt;/span&gt;&lt;/p&gt; &lt;p class="docText"&gt;A transmitter that detects LDTSTOP# asserted finishes sending any control packet in progress, then sends a disconnect NOP sequence and disables its output drivers (if so enabled in the transmitter's Configuration Space &lt;span class="docEmphasis"&gt;Tri-State Enable Bit).&lt;/span&gt; Upon receipt of the disconnect NOP sequence, the target also turns off its input receivers (if similarly enabled in its Configuration Space &lt;span class="docEmphasis"&gt;Tri-State Enable Bit).&lt;/span&gt;&lt;/p&gt; &lt;p class="docText"&gt;Later, when the transmitter detects LDTSTOP# deasserted, it re-enables its drivers and begins the initialization sequence. A receiver that responds to LDTSTOP# deasserted turns its input receivers on.&lt;/p&gt;&lt;a name="ch03lev2sec6"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e9996"&gt;&lt;/a&gt;LDTREQ#&lt;/h4&gt; &lt;p class="docText"&gt;LDTREQ# is a wired-OR output from HyperTransport devices that is used to request system logic to re-enable links previously disabled using the LDTSTOP# mechanism. Upon receipt of the LDTREQ# signal from one or more HyperTransport devices, system logic (typically the South Bridge) deasserts LDTSTOP#, which triggers the sequence described previously. Specifically, the LDTREQ# signal indicates that a HyperTransport transaction is required somewhere in a system that is currently in the ACPI C3 state; the system is required to transition to the C0 state. 
&lt;span class="docEmphUl"&gt;Support for this signal is  optional for HyperTransport devices.&lt;/span&gt;&lt;/p&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;span class="docEmphUl"&gt;&lt;/span&gt;&lt;/p&gt;&lt;table border="0" cellpadding="0" cellspacing="0" width="100%"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;td valign="top"&gt;&lt;a name="ch03lev1sec6"&gt;&lt;/a&gt; &lt;h3 class="docSection1Title"&gt;Where Are The Interrupt, Error, And Wait State  Signals?&lt;/h3&gt; &lt;p class="docText"&gt;The HyperTransport specification eliminates a number of control  signals that are commonly found on other buses. While devices are not prohibited  from implementing signals beyond those defined in the specification,  HyperTransport is a generic, simple interface and handles interrupts, errors,  and data wait states in the following general way:&lt;/p&gt;&lt;a name="ch03lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e10017"&gt;&lt;/a&gt;Interrupt Signaling&lt;/h4&gt; &lt;p class="docText"&gt;Interrupts are conveyed in HyperTransport as messages sent over  the link in the &lt;a name="idd1e10024"&gt;&lt;/a&gt;posted request channel. This eliminates  the need for dedicated interrupt signal traces. Depending on the architecture,  it may also eliminate the need for a separate interrupt controller (e.g.  IOAPIC). &lt;/p&gt;&lt;a name="ch03lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e10039"&gt;&lt;/a&gt;Error Signaling&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport error handling employs &lt;a name="idd1e10046"&gt;&lt;/a&gt;CRC checking of bit traffic across each link interface. In  the event of an error, there are several possible handling schemes. All of this  is done without any dedicated error signals. 
&lt;/p&gt;&lt;a name="ch03lev2sec9"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Wait State Signaling&lt;/h4&gt; &lt;p class="docText"&gt;Wait states during transmission of data are a problem on any  bus because they represent wasted time on the part of the devices performing the  transfer and for other devices waiting to perform subsequent transfers. In  HyperTransport, wait state, disconnect, and retry mechanisms used on other buses  are eliminated. This is made possible through a coupon-based &lt;a name="idd1e10067"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;flow control&lt;/span&gt; scheme that  guarantees that no transfer will be started by a transmitter which cannot be  immediately accepted by the corresponding receiver on the other side of the  link. Dynamic flow control information concerning buffer availability is  embedded in NOP packets sent by each device — removing the need for dedicated  transmitter and receiver &lt;span class="docEmphasis"&gt;ready&lt;/span&gt; signals. 
&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-3236525257071187314?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/3236525257071187314/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=3236525257071187314' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3236525257071187314'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3236525257071187314'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/high-speed-signals-in-hypertransport.html' title='The High Speed Signals in Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-5973153921512361338</id><published>2007-06-26T21:33:00.000-07:00</published><updated>2007-06-26T21:34:28.028-07:00</updated><title type='text'>Managing the Links of Hypertransport Technology</title><content type='html'>&lt;p class="docText"&gt;This section introduces a collection of miscellaneous topics  that we have labeled Link Management. 
They include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;&lt;a name="idd1e9277"&gt;&lt;/a&gt;Flow Control&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Initialization and Reset&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Configuration&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Error Detection and Handling&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;Each of these topics is discussed in the following sections.&lt;/p&gt;&lt;a name="ch02lev2sec12"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Flow Control&lt;/h4&gt; &lt;p class="docText"&gt;Other than information packets, all packets are transmitted from a transmitter to a buffer in the receiver. The receiver buffer will overflow if the transmitter sends too many packets. Flow control ensures that the transmitter only sends as many packets to the receiver device as buffer space allows.&lt;/p&gt; &lt;p class="docText"&gt;Information packets are not subject to flow control. They are not transmitted to buffers within a device. Devices are always ready to accept information packets (e.g. NOP packets). Only request packets, response packets and data packets are subject to flow control.&lt;/p&gt; &lt;p class="docText"&gt;Flow control occurs across each link between the source and the ultimate target device. Each HyperTransport device must implement the six types of buffers listed below as part of its receiver state machine. A designer implements buffers of appropriate size to meet bandwidth/performance requirements. 
The size of each buffer is conveyed to the transmitter during initialization, and available space is updated dynamically through NOP transmission.&lt;/p&gt; &lt;p class="docText"&gt;HyperTransport requires transmitters on each link to accept NOP packets from receivers at reset indicating virtual channel buffering capacity, then establish a packet coupon scheme that:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Guarantees no transmitter will send a packet that the receiver can't accept.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Eliminates the need for inefficient disconnects and retries on the link.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Requires each receiver to dynamically inform the transmitter (via NOP packets) as buffer space becomes available.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;With three virtual channels, there are three pairs of buffers in each receiver, one pair per channel, to hold the requests or responses and their data:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Posted Request Buffer&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Posted Request Data Buffer&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Non-Posted Request Buffer&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Non-Posted Request Data Buffer&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Response Buffer&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Response Data Buffer&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;Buffer entries are sized according to what will be contained in them.&lt;/p&gt; &lt;p class="docText"&gt;If a device supports the optional &lt;a name="idd1e9367"&gt;&lt;/a&gt;isochronous channels, it must implement additional flow control buffers to support them. An "ISOC" bit is set in request and response packets indicating routing. 
If the "ISOC" bit is set, all link devices that support it will use these channels; others will pass isochronous packets along in regular channels.&lt;/p&gt; &lt;p class="docText"&gt;ISOC traffic is exempt from the &lt;a name="idd1e9374"&gt;&lt;/a&gt;fairness algorithm implemented for non-ISOC traffic, resulting in higher performance. &lt;a name="idd1e9378"&gt;&lt;/a&gt;Isochronous transactions are serviced by devices before non-isochronous traffic. In principle, isochronous traffic can therefore starve non-isochronous traffic. Applications must guarantee that isochronous bandwidth does not exceed overall available bandwidth.&lt;/p&gt;&lt;a name="ch02lev2sec13"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Initialization and&lt;a name="idd1e9387"&gt;&lt;/a&gt; Reset&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport defines two classes of reset events:&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Cold Reset.&lt;/span&gt; This occurs on boot and starts when the &lt;a name="idd1e9402"&gt;&lt;/a&gt;PWROK and &lt;a name="idd1e9406"&gt;&lt;/a&gt;RESET# signals are both seen low. 
When this happens:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;All devices and links return to default inactive state&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Previously assigned UnitID numbers are "forgotten" and all return to default UnitID of 0.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;All Configuration Space registers return to default state&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;All &lt;a name="idd1e9428"&gt;&lt;/a&gt;error bits and dynamic status bits are cleared&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;Warm Reset.&lt;/span&gt; This occurs when PWROK is high and RESET# is seen low.&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;All devices and links return to default inactive state&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Previously assigned UnitID numbers are "forgotten", and all return to default UnitID of 0.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;All Configuration Space registers defined as &lt;span class="docEmphasis"&gt;persistent&lt;/span&gt; retain previous values. The same is true for Status and error bits defined as persistent.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;All other error bits and dynamic status bits are cleared&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;Because HyperTransport supports scalable link width and clock speed, a set of default minimum link capabilities is in effect following cold reset.&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Initial link width is conveyed when both devices sample CAD signal inputs from the other at the end of reset. 
Initial link clock speed is 200MHz.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Later, configuration of devices allows optimizing CAD width and clock speeds for each link.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Refer to the core topic section on Reset and Initialization for details on this process.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;It is the motherboard's responsibility to tie upper CAD inputs to 0 if a device receiver is attached to a narrower transmitter CAD interface.&lt;/p&gt;&lt;a name="ch02lev2sec14"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Configuration&lt;/h4&gt; &lt;p class="docText"&gt;At boot time, PCI configuration is used to set up HyperTransport devices:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Read in configuration information about device requirements and capabilities.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Program the device with address range, error handling policy, etc.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;Basic configuration of a device is similar to that of PCI devices; however, HyperTransport-specific features are handled via the &lt;span class="docEmphasis"&gt;advanced &lt;a name="idd1e9530"&gt;&lt;/a&gt;capability registers&lt;/span&gt;.&lt;/p&gt;&lt;a name="ch02lev2sec15"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Error Detection and Handling&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport defines required and optional error detection and handling. 
Key areas of error handling:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Cyclic Redundancy Check (&lt;a name="idd1e9550"&gt;&lt;/a&gt;CRC) generation and checking on each link.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Protocol (violation) errors&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;&lt;a name="idd1e9563"&gt;&lt;/a&gt;Receive buffer overflow errors&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;End-Of-Chain errors&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Chain Down errors&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;&lt;a name="idd1e9578"&gt;&lt;/a&gt;Response errors&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;Signals on each HyperTransport link fall into two groups: high speed signals associated with the sending and receiving of control and data packets, and miscellaneous low-speed signals required for such things as reset and power management. Whereas the low speed signals are not scalable and employ conventional low voltage CMOS signaling, the high speed signal group is scalable in terms of both bus width and clock rate, and each signal is actually a low-voltage differential signal pair.&lt;/p&gt; &lt;p class="docText"&gt;While device pin count varies with scaling, signal group functions remain the same; the only real difference in signaling over a 32-bit link vs. 
a 2-bit link is the number of bit times required to shift information  onto the bus.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-5973153921512361338?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/5973153921512361338/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=5973153921512361338' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5973153921512361338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/5973153921512361338'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/managing-links-of-hypertransport.html' title='Managing the Links of Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-563065288468971520</id><published>2007-06-26T21:32:00.001-07:00</published><updated>2007-06-26T21:33:45.763-07:00</updated><title type='text'>I/O Streams in Hypertransport Technology</title><content type='html'>&lt;p class="docText"&gt;In addition to virtual channels, HyperTransport also defines  I/O streams. An I/O stream consists of the requests, responses, and data  associated with a particular UnitID and HyperTransport link. Ordering rules  require that I/O streams be treated independently from each other. 
When a request/response packet is sent, it is tagged with sender attributes (UnitID, &lt;a name="idd1e8636"&gt;&lt;/a&gt;Source Tag, and &lt;a name="idd1e8640"&gt;&lt;/a&gt;Sequence ID) that are used by other devices to identify the &lt;a name="idd1e8644"&gt;&lt;/a&gt;transaction stream in use, and the required ordering within it. Entries within the virtual channel buffers include the transaction stream identifiers (attributes).&lt;/p&gt; &lt;p class="docText"&gt;Used properly, the independent I/O streams create the effect of separate connections between devices and the &lt;a name="idd1e8651"&gt;&lt;/a&gt;host bridge above them — much as a shared bus connection appears.&lt;/p&gt;&lt;a name="ch02lev2sec10"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Transactions (Requests, Responses, and Data)&lt;/h4&gt; &lt;p class="docText"&gt;Transfers initiated by HT devices require one or more transactions to complete. These devices may need to perform a variety of operations that include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;sending or forwarding data (write)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;requesting that a target return data to it (read)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;performing an &lt;span class="docEmphasis"&gt;atomic&lt;/span&gt; read/modify/write operation&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;wanting additional control over ordering of its posted transactions (using Flush and Fence commands)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;wanting to broadcast a message to all downstream agents (done by bridges only)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The format of these transactions also varies depending on the type of operation (request) specified, as listed below:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Requests that behave like reads and that require a read response and data (i.e., Sized Read, Atomic RMW)&lt;/p&gt; 
&lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Requests that behave like writes, and require a &lt;a name="idd1e8710"&gt;&lt;/a&gt;target done response to confirm completion (i.e., Non-posted Sized Writes)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;&lt;a name="idd1e8717"&gt;&lt;/a&gt;Posted Requests that behave like writes but don't require any target response or data (i.e., Posted Sized Writes, Broadcast Message, or Fence)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch02lev3sec9"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8726"&gt;&lt;/a&gt;Transaction Requests&lt;/h5&gt; &lt;p class="docText"&gt;Every transaction begins with the transmission of a Request Packet. Note that the actual format of a request packet varies depending on the particular request, but in general each request contains the following information:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Target address within HyperTransport memory space&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The request type (command)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Sender's transaction stream ID (UnitID, SeqID)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The amount of data to be transferred (if any)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Other attributes: virtual channel to use, etc.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;HT defines seven basic request types. The characteristics of each request type are discussed in the following sections.&lt;/p&gt;&lt;a name="ch02lev3sec10"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8766"&gt;&lt;/a&gt;Transaction Responses&lt;/h5&gt; &lt;p class="docText"&gt;Responses are generated by the target device in cases where data is to be returned from the target device, or when confirmation of transaction completion is required. Specifically, in HyperTransport, a response follows all non-posted requests. 
A target responds to:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Return data to satisfy an earlier read or Atomic Read-Modify  Write (RMW) request&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Confirm the arrival of non-posted write data&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Confirm the completion of a Flush operation&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Report errors&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;The information in a response varies both with the Request that  causes it, and with the direction the response is traveling in the  HyperTransport fabric. However, content of an HT response generally  includes:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Response type (command)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Response direction (upstream or downstream)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Transaction stream (UnitID, Source Tag)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Misc. info: virtual channel to use, error, etc.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch02lev2sec11"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e8827"&gt;&lt;/a&gt;Transaction Types&lt;/h4&gt; &lt;p class="docText"&gt;As discussed earlier, HT defines seven basic transaction types.  This section introduces the characteristics of each type and defines any  sub-types that exist.&lt;/p&gt;&lt;a name="ch02lev3sec11"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8840"&gt;&lt;/a&gt;Sized Read Transactions&lt;/h5&gt; &lt;p class="docText"&gt;Sized Read transactions permit remote access to a device memory  or memory-mapped I/O (MMIO) address space. The operation may be initiated on HT  from the host bridge (&lt;a name="idd1e8853"&gt;&lt;/a&gt;PIO operation), or an HT device may  wish to read data from memory (DMA operation) or from another HT device  (peer-to-peer operation). 
Two types of Sized Read transactions define the different quantities of data to be read.&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Sized (Byte) Read —&lt;/span&gt; this request defines an aligned 4 byte block of address space from which 0 to 4 bytes can be read. Any single byte location or any group of bytes within the 4 byte block can be accessed. The typical use of this transaction is for reading MMIO registers.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Sized (DW) Read —&lt;/span&gt; this request identifies an aligned 64 byte block of address space from which 4 to 64 bytes can be read. Any contiguous group of aligned 4 byte groups (DWs) can be accessed.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;The basic rules for maintaining high performance of HT reads include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;For reads, the requester won't issue the request until it has buffers available to receive all requested data without wait states.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The requester won't issue the request until it knows the target has room in its transaction queue to accept it (&lt;a name="idd1e8921"&gt;&lt;/a&gt;Flow Control)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Upon receiving the read request, the target won't issue the read response until it has all requested data and status available to send. 
Once it starts the response, there will be no wait states until the read response packet and all data (up to 16 dwords) have been sent.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Upon receiving the response, the requester will check the &lt;a name="idd1e8933"&gt;&lt;/a&gt;error bits to make certain the data is valid.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;The target and any bridges in the path de-allocate buffers and queue entries as soon as the response has been sent.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch02lev3sec12"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8947"&gt;&lt;/a&gt;Sized Write Transactions&lt;/h5&gt; &lt;p class="docText"&gt;Sized Write transactions permit the host bridge (PIO operation) to send data to a HyperTransport device, or permit a HyperTransport device to send data to memory (DMA operation) or to another device (peer-to-peer operation). Two types of Sized Write requests permit different sizes of memory or MMIO space to be accessed.&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Sized (Byte) Write —&lt;/span&gt;&lt;a name="idd1e8968"&gt;&lt;/a&gt; this request identifies an aligned block of 32 bytes of address space into which data is to be written. The amount of data to be written can be from 0 to 32 bytes. Note that the maximum transfer size of 32 bytes only occurs if the start address is 32 byte aligned. If the start address is not on a 32-byte boundary, the transfer will be less than 32 bytes. Furthermore, no &lt;a name="idd1e8974"&gt;&lt;/a&gt;Byte Write transaction crosses a 32 byte address boundary. 
Any combination of bytes (need not be contiguous) can be written from the start address to the next aligned 32 byte block of address space.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Sized (DW) Write —&lt;/span&gt; this request identifies an aligned block of 64 bytes of address space into which data can be written. The start address must be aligned on a 4-byte boundary, and data to be written is always aligned in contiguous 4-byte groups (DWs). The amount of data written can be from 1 to 16 DWs.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e9085"&gt;&lt;/a&gt;Flush&lt;/h5&gt; &lt;p class="docText"&gt;Flush is useful in cases where a device must be certain that its &lt;a name="idd1e9092"&gt;&lt;/a&gt;posted writes are "visible" in host memory before it takes subsequent action. Flush is an upstream, non-posted "dummy" read command that pushes all &lt;a name="idd1e9096"&gt;&lt;/a&gt;posted requests ahead of it to memory. Note that only previously posted writes within the same transaction stream as the Flush transaction need be flushed to memory. When an intermediate bridge receives a Flush transaction, it generates one or more Sized Write transactions necessary to forward all data in its upstream posted-write buffer toward the host bridge. Ultimately, the host bridge receives the command and flushes the previously-posted writes to memory. Receipt of the read response from the host bridge is confirmation that the flush operation has completed.   &lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e9125"&gt;&lt;/a&gt;Fence&lt;/h5&gt; &lt;p class="docText"&gt;Fence is designed to provide a barrier between &lt;a name="idd1e9132"&gt;&lt;/a&gt;posted writes, which applies across all UnitIDs and therefore across all I/O streams and all virtual channels. Thus, the fence command is global because it applies to all I/O streams. 
The Fence command goes in the &lt;a name="idd1e9136"&gt;&lt;/a&gt;posted request virtual channel and has no response. The  behavior of a Fence is as follows:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The &lt;a name="idd1e9146"&gt;&lt;/a&gt;PassPW bit must be clear so that the  Fence pushes all requests in the posted channel ahead of it.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Packets with their PassPW bit &lt;span class="docEmphStrong"&gt;clear&lt;/span&gt; will not pass a Fence regardless of UnitID.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Packets with their PassPW bit &lt;span class="docEmphStrong"&gt;set&lt;/span&gt; may pass a Fence.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;A nonposted request with PassPW clear will not pass a Fence as  it is forwarded through the chain, but it may do so after it reaches a host  bridge.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;&lt;a name="idd1e9172"&gt;&lt;/a&gt;Fence requests are never issued as part  of an ordered sequence, so their SeqID will always be 0. Fence requests with  PassPW set, or with a nonzero SeqID, are legal, but may have an unpredictable  effect. Fence is only issued from a device to a host bridge or from one host  bridge to another. Devices are never the target of a fence so they do not need  to perform the intended function. If a device at the end of the chain receives a  fence, it must decode it properly to maintain proper operation of the flow  control buffers. 
The device should then drop it.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-563065288468971520?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/563065288468971520/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=563065288468971520' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/563065288468971520'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/563065288468971520'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/io-streams-in-hypertransport-technology.html' title='I/O Streams in Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-8166980831222043951</id><published>2007-06-26T21:30:00.000-07:00</published><updated>2007-06-26T21:32:26.982-07:00</updated><title type='text'>Packetized Transfers of Hypertransport Technology</title><content type='html'>&lt;h3 class="docSection1Title"&gt;Packetized Transfers&lt;/h3&gt; &lt;p class="docText"&gt;Transactions are constructed out of combinations of various  packet types and carry the commands, address, and data associated with each  transaction. Packets are organized in multiples of 4-byte blocks. 
If the link  uses data paths that are narrower than 32 bits, successive bit-times are added  to complete the packet transfer on an aligned 4-byte boundary. The primary  packet types include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e8257"&gt;&lt;/a&gt;&lt;span class="docEmphRoman"&gt;Control Packets  —&lt;/span&gt; used to manage various HT features, initiate transactions, and respond  to transactions&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e8272"&gt;&lt;/a&gt;&lt;span class="docEmphRoman"&gt;Data packets  —&lt;/span&gt; that carry the payload associated with a control packet (maximum  payload is 64 bytes).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;For every group of 8 bits (or less) within the CAD path, there  is a &lt;a name="idd1e8306"&gt;&lt;/a&gt;CLK signal. These groups of signals are transmitted  source synchronously with the associated CLK signal. Source synchronous clocking  requires that CLK and its associated group of &lt;a name="idd1e8310"&gt;&lt;/a&gt;CAD signals  must all be routed with equal length traces in order to minimize skew between  the signals.&lt;/p&gt; &lt;a name="ch02lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e8319"&gt;&lt;/a&gt;Control Packets&lt;/h4&gt;  &lt;p class="docText"&gt;Control packets manage various HT features, initiate  transactions, and respond to transactions as listed below:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Information packets&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Request packets&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Response packets&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;a name="ch02lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8348"&gt;&lt;/a&gt;Information packet (4  bytes)&lt;/h5&gt;  &lt;p class="docText"&gt;Information packets are exchanged between the two devices on a  link. 
They are used by the two devices to synchronize the link, convey a serious  error condition using the Sync Flood mechanism, and to update &lt;a name="idd1e8355"&gt;&lt;/a&gt;flow control buffer availability dynamically (using tags in  &lt;a name="idd1e8359"&gt;&lt;/a&gt;NOP packets). The information packets are:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;NOP&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Sync/Error&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;a name="ch02lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8379"&gt;&lt;/a&gt;Request packet (4 or 8  bytes)&lt;/h5&gt;  &lt;p class="docText"&gt;Request packets initiate HT transactions and special functions.  The request packets include:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Sized Write (Posted)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Broadcast Message&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Sized Write (non-posted)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Sized Read&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Flush&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Fence&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Atomic Read-Modify-Write&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;a name="ch02lev3sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e8425"&gt;&lt;/a&gt;Response packet (4 bytes)&lt;/h5&gt;  &lt;p class="docText"&gt;Response packets are used in HT split-transactions to reply to  a previous request. 
The response may be a &lt;a name="idd1e8435"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Read Response&lt;/span&gt; with data, or simply a &lt;a name="idd1e8441"&gt;&lt;/a&gt;&lt;span class="docEmphasis"&gt;Target Done Response&lt;/span&gt;  confirming that a non-posted write has reached its destination.&lt;/p&gt; &lt;a name="ch02lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e8452"&gt;&lt;/a&gt;Data Packets&lt;/h4&gt;  &lt;p class="docText"&gt;Some Request/Response command packets have data associated with  them. Data packet structure varies with the command that caused it:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Sized &lt;a name="idd1e8465"&gt;&lt;/a&gt;Dword Read Response or &lt;a name="idd1e8471"&gt;&lt;/a&gt;Write data packets are 1-16 dwords (4-64 bytes)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Sized &lt;a name="idd1e8479"&gt;&lt;/a&gt;Byte &lt;a name="idd1e8483"&gt;&lt;/a&gt;Read  Response data packets are 1 dword (any byte combination valid)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;&lt;a name="idd1e8490"&gt;&lt;/a&gt;Sized Byte Write data packets are 0-32  bytes (any byte combination valid)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Read-Modify-Write.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h3 class="docSection1Title"&gt;HyperTransport Protocol Concepts&lt;/h3&gt; &lt;a name="ch02lev2sec9"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Channels and Streams&lt;/h4&gt;  &lt;p class="docText"&gt;In HyperTransport, as in other protocols, ordering rules are  needed for reads, posted and non-posted writes, and responses returning  from earlier requests. In a point-to-point fabric, all of these occur over the same  link. In addition, transactions from different devices also merge over the  same links. 
HyperTransport implements &lt;a name="idd1e8513"&gt;&lt;/a&gt;Virtual Channels and  &lt;a name="idd1e8517"&gt;&lt;/a&gt;I/O Streams to differentiate a device's posted requests,  non-posted requests, and responses from each other and from those originating  from different sources.&lt;/p&gt; &lt;a name="ch02lev3sec7"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Virtual Channels&lt;/h5&gt;  &lt;p class="docText"&gt;HyperTransport defines a set of three required virtual channels  that dictate transaction management and ordering:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Posted Requests&lt;a name="idd1e8536"&gt;&lt;/a&gt;:&lt;/span&gt; &lt;a name="idd1e8542"&gt;&lt;/a&gt;Posted write transactions  belong to this channel.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Non-Posted Requests:&lt;/span&gt; Reads,  non-posted writes, and flushes belong to this channel.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;Responses:&lt;/span&gt; &lt;a name="idd1e8563"&gt;&lt;/a&gt;Read responses and &lt;a name="idd1e8567"&gt;&lt;/a&gt;target done packets  belong to this channel.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;  &lt;p class="docText"&gt;An additional set of Posted, Non-Posted, and Response virtual  channels is required for &lt;a name="idd1e8576"&gt;&lt;/a&gt;&lt;span class="docEmphBoldItalic"&gt;isochronous&lt;/span&gt; transactions, if supported. This  dedicated set of virtual channels helps guarantee the bandwidth that  isochronous transactions require.&lt;/p&gt;  &lt;p class="docText"&gt;When packets are sent over a link, they are sent in one of the  virtual channels. Attribute bits in each packet identify the virtual channel  in which it travels. 
Each device is responsible for maintaining queues and  buffers for managing the virtual channels and enforcing ordering rules.&lt;/p&gt;  &lt;p class="docText"&gt;Each device implements separate command/data buffers for each  of the 3 required virtual channels. Doing so ensures that transactions  moving in one virtual channel do not block transactions moving in another  virtual channel. There are I/O ordering rules covering interactions between the  three virtual channels of the same I/O stream. Transactions in different I/O  streams have no ordering rules (with the exception of ordering rules associated with  Fence requests). Enforcing ordering rules between transactions in the same I/O  stream prevents deadlocks from occurring and guarantees data is transferred  correctly. Based on ordering requirements, nodes may not:&lt;/p&gt;  &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Make accepting a request dependent on the ability of that node  to issue an outgoing request.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Make accepting a request dependent on the receipt of a response  due to a request previously issued by that node.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Make issuing a response dependent on the ability to issue a  request.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Make issuing a response&lt;a name="idd1e8610"&gt;&lt;/a&gt; dependent upon  receipt of a response due to a previous request.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-8166980831222043951?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/8166980831222043951/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=8166980831222043951' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8166980831222043951'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/8166980831222043951'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/packetized-transfers-of-hypertransport.html' title='Packetized Transfers of Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-1769201461306368500</id><published>2007-06-26T21:28:00.000-07:00</published><updated>2007-06-26T21:30:37.525-07:00</updated><title type='text'>Research on the Hypertransport Technology</title><content type='html'>&lt;h3 class="docSection1Title"&gt;HT Signals&lt;/h3&gt; &lt;p class="docText"&gt;The HT signals can be grouped into two broad categories:&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;The link signal group:&lt;/span&gt; used to  transfer packets in both directions (High-Speed Signals).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;The support signal group:&lt;/span&gt;  provides required resources such as power and &lt;a name="idd1e7342"&gt;&lt;/a&gt;reset, as  well as other signals to support optional features such as power management  (Low-Speed Signals).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;Link Packet Transfer Signals&lt;/h4&gt; &lt;p class="docText"&gt;The high-speed signals 
used for packet transfer in both  directions across an HT link include:&lt;a name="idd1e7368"&gt;&lt;/a&gt;&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;CAD (command, address, data).&lt;/span&gt;  Multiplexed signals that carry control packets (request, response, information)  and data packets. Note that the width of the CAD bus is scalable from 2-bits to  32-bits.&lt;br /&gt;&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;CLK (clock).&lt;/span&gt; Source-synchronous  clock for CAD and CTL signals. A separate clock signal is required for each byte  lane supported by the link. Thus, the number of &lt;a name="idd1e7402"&gt;&lt;/a&gt;CLK  signals required is directly proportional to the number of bytes that can be  transferred across the link at one time.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;CTL (control).&lt;/span&gt; Indicates  whether a &lt;a name="idd1e7415"&gt;&lt;/a&gt;control packet or &lt;a name="idd1e7419"&gt;&lt;/a&gt;data  packet is currently being delivered via the &lt;a name="idd1e7423"&gt;&lt;/a&gt;CAD  signals.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;Link Support Signals&lt;/h4&gt; &lt;p class="docText"&gt;The low-speed link support signals consist of power- and  initialization-related signals and power management signals. 
Power- and  initialization-related signals include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;span class="docEmphRoman"&gt;V&lt;sub&gt;LDT&lt;/sub&gt; &amp; Ground —&lt;/span&gt;  The 1.2 volt supply that powers HT drivers and receivers&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e7480"&gt;&lt;/a&gt;&lt;span class="docEmphRoman"&gt;PWROK —&lt;/span&gt;  Indicates to devices residing in the HT fabric that power and clock are  stable.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e7495"&gt;&lt;/a&gt;&lt;span class="docEmphRoman"&gt;RESET# —&lt;/span&gt;  Used to &lt;a name="idd1e7503"&gt;&lt;/a&gt;reset and initialize the HT interface within  devices and perhaps their internal logic (device specific).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Power management signals&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e7520"&gt;&lt;/a&gt;&lt;span class="docEmphRoman"&gt;LDTREQ# —&lt;/span&gt;  Requests re-enabling links for normal operation.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docText"&gt;&lt;a name="idd1e7532"&gt;&lt;/a&gt;&lt;span class="docEmphRoman"&gt;LDTSTOP#  —&lt;/span&gt; Enables and disables links during system state  transitions.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h3 class="docSection1Title"&gt;Scalable Performance&lt;/h3&gt; &lt;p class="docText"&gt;The width of the transmit and receive portion of the link (&lt;a name="idd1e7567"&gt;&lt;/a&gt;CAD signals) may be different. For example, devices that  typically send most of their data to main memory (upstream) and receive limited  data from the host can implement a wide path in the high performance direction  and narrow path for traffic in the lesser used direction, thereby reducing  cost.&lt;/p&gt; &lt;p class="docText"&gt;The HyperTransport link combines the advantages of both serial  and parallel bus architectures. 
HT provides options for the number of data paths  implemented and for the clock rate at which data is transferred,  thus providing scalable link performance ranging from 0.2GB/s to 12.8GB/s. This  &lt;a name="idd1e7582"&gt;&lt;/a&gt;scalability is helpful to system designers. For  example:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;An implementation that needs all the available bandwidth (e.g.,  system chipsets) can use wide links (up to 32 bits) running at the highest  clock frequencies (up to 800MHz now and 1GHz in the future).&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Implementations that don't require high bandwidth but do  require low power may use narrow links (as few as 2 bits) and lower frequencies  (down to 200MHz).&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;HyperTransport lends itself to scaling well because:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;The high-frequency bus translates to fewer pins required to  transfer a specific amount of data. The same protocol is used regardless of link  width.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Differential signaling results in a very low current path to  ground, thereby reducing the number of power and ground pins required for  devices.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Each additional byte lane has its own source-synchronous  clock.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;HT's implementation of ACPI-compliant power management and  interrupt signaling is message based, reducing pin count. 
Note that only two  additional signals, &lt;a name="idd1e7626"&gt;&lt;/a&gt;LDTSTOP# and &lt;a name="idd1e7630"&gt;&lt;/a&gt;LDTREQ#, are required for managing power.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;Clock Speeds&lt;a name="idd1e7953"&gt;&lt;/a&gt;&lt;/h4&gt; &lt;p class="docText"&gt;HyperTransport clock speeds currently supported are 200MHz,  300MHz, 400MHz, 500MHz, 600MHz, and 800MHz. Note that 700MHz is not supported.  Both the rising and falling edges of the clock are used to clock signals, a  mechanism referred to as double data rate (DDR) clocking. DDR  clocking translates to an effective clock frequency that is double the actual  clock frequency. In addition, because each link is dual simplex, the aggregate  transfer rate across both directions is four times the clock rate. For a 32-bit link at the highest clock speed:&lt;/p&gt;&lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;800MHz clock with DDR = effective clock of 1,600MHz  (1.6GTransfers/s)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;1.6GTransfers/s x 4 bytes = 6.4GB/s in each direction&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;6.4GB/s in both directions = 12.8GB/s.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p class="docText"&gt;Based on point-to-point links, a HyperTransport &lt;a name="idd1e8137"&gt;&lt;/a&gt;chain may be extended into a fabric, using single and  multi-link devices together. 
Devices defined for HT include:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Single HT link "cave" devices used to implement a peripheral  function&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Single or multi-link Bridges; (HT-to-HT, or HT to one or more  other protocols such as PCI, PCI-X, AGP or Infiniband)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Multi-link &lt;a name="idd1e8176"&gt;&lt;/a&gt;Tunnel devices used to  implement a function and extend a link to a neighboring device downstream, thus  creating a chain&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-1769201461306368500?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/1769201461306368500/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=1769201461306368500' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1769201461306368500'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1769201461306368500'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/ht-signals-ht-signals-can-be-grouped.html' title='Research on the Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-1446207680941889942</id><published>2007-06-26T21:26:00.000-07:00</published><updated>2007-06-26T21:27:59.358-07:00</updated><title type='text'>General about Hypertransport Technology</title><content type='html'>&lt;h3 class="docSection1Title"&gt;&lt;br /&gt;&lt;/h3&gt; &lt;p class="docText"&gt;HyperTransport provides a point-to-point interconnect that can  be extended to support a wide range of devices.  HyperTransport provides a high-speed, high-performance,  point-to-point dual simplex link for interconnecting IC components on a PCB.  Data is transmitted from one device to another across the link.&lt;/p&gt;&lt;p class="docText"&gt;The width of the link along with the clock frequency at which  data is transferred are scalable:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Link width ranges from 2 bits to 32-bits&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Clock Frequency ranges from 200MHz to 800MHz (and 1GHz in the  future)&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;This &lt;a name="idd1e6961"&gt;&lt;/a&gt;scalability allows for a wide range  of link performance and potential applications with bandwidths ranging from  200MB/s to 12.8GB/s.&lt;a name="idd1e6965"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p class="docText"&gt;At the current revision of the spec, 1.04, &lt;a name="idd1e6971"&gt;&lt;/a&gt;there is no support for connectors implying that all  HyperTransport (HT) devices are soldered onto the motherboard. HyperTransport is  technically an "inside-the-box" bus. 
In reality, connectors have been designed  for systems that require board-to-board connections and where analyzer  interfaces are desired for debugging.&lt;/p&gt;&lt;br /&gt;&lt;h3 class="docSection1Title"&gt;Transfer Types Supported&lt;/h3&gt; &lt;p class="docText"&gt;HT supports two types of addressing semantics:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;legacy PC, address-based semantics&lt;a name="idd1e7083"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;&lt;a name="idd1e7089"&gt;&lt;/a&gt;messaging semantics common to networking  environments&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;br /&gt;&lt;h4 class="docSection2Title"&gt;&lt;a name="idd1e7108"&gt;&lt;/a&gt;Address-Based Semantics&lt;/h4&gt; &lt;p class="docText"&gt;The HT bus was initially implemented as a PC-compatible  solution that by definition uses address-based semantics. This includes a  40-bit, or 1 terabyte (TB), address space. Transactions specify locations within  this address space that are to be read from or written to.&lt;br /&gt;&lt;/p&gt;&lt;p class="docText"&gt;HyperTransport does not contain dedicated I/O address space.  Instead, CPU I/O space is mapped to a high memory address range  (FD_FC00_0000h to FD_FDFF_FFFFh). Each HyperTransport device is configured at  initialization time by the boot ROM configuration software to respond to a range  of memory addresses. The devices are assigned addresses via the base  address registers contained in the configuration register header. Note that  these registers are based on the PCI Configuration registers, and are also  mapped to memory space (FD_FE00_0000h to FD_FFFF_FFFFh). 
Unlike the PCI bus, there  is no dedicated configuration address space.&lt;/p&gt; &lt;p class="docText"&gt;Read and write request command packets carry a 40-bit address in  &lt;a name="idd1e7146"&gt;&lt;/a&gt;Addr[39:2] (the two low-order bits are not carried, so request addresses are dword-aligned). Additional memory address ranges are used for  interrupt signaling and system management messages. The use of  each address range is discussed in subsequent chapters that cover the  related topic. &lt;/p&gt;&lt;a name="ch02lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Data Transfer Type and Transaction Flow&lt;/h4&gt; &lt;p class="docText"&gt;The HT architecture supports several methods of data transfer  between devices, including:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Programmed I/O&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;DMA&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Peer-to-peer&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e7193"&gt;&lt;/a&gt;Programmed I/O Transfers&lt;/h5&gt; &lt;p class="docText"&gt;Transfers that originate as a result of executing code on the  host CPU are called programmed I/O transfers. 
For example, a device driver for a  given HT device might execute a read transaction to check its device status.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p class="docText"&gt;&lt;br /&gt;&lt;/p&gt;&lt;h5 class="docSection3Title"&gt;&lt;a name="idd1e7228"&gt;&lt;/a&gt;DMA Transfers&lt;/h5&gt; &lt;p class="docText"&gt;HT devices may wish to perform a &lt;a name="idd1e7239"&gt;&lt;/a&gt;direct  memory access (DMA) by simply initiating a read or write transfer.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-1446207680941889942?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/1446207680941889942/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=1446207680941889942' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1446207680941889942'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/1446207680941889942'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/general-about-hypertransport-technology.html' title='General about Hypertransport Technology'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-2577167212130516606</id><published>2007-06-26T21:24:00.002-07:00</published><updated>2007-06-26T21:25:54.689-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='HyperTransport'/><title type='text'>What HT Brings</title><content type='html'>&lt;h3 class="docSection1Title"&gt;What HT Brings&lt;/h3&gt; &lt;p class="docText"&gt;HyperTransport is a point-to-point, high-performance,  "inside-the-box" motherboard interconnect bus. It targets IT, Telecom, and other  applications requiring high bandwidth, &lt;a name="idd1e6670"&gt;&lt;/a&gt;scalability, and  low latency access. &lt;/p&gt;&lt;h4 class="docSection2Title"&gt;Key Features Of HyperTransport Protocol&lt;/h4&gt; &lt;p class="docText"&gt;The key characteristics of the HT technology include:&lt;a name="idd1e6701"&gt;&lt;/a&gt;&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;Open architecture, non-proprietary bus&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;One or more fast, point-to-point links&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Scaling of individual link width and clock speed to suit  cost/performance targets&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Split-transaction protocol eliminates retries, disconnects, and  wait-states.&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Standard and optional isochronous traffic support&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;PCI compatible; designed for minimal impact on OS and driver  software&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;CRC error generation and checking&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Programmable error handling strategy for CRC, protocol, and  other errors&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Message signalled interrupts&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;System Management features&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;Support for bridges to legacy busses&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;x86 compatibility features&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p 
class="docList"&gt;Device types including tunnels, bridges, and end devices permit  construction of a system fabric comprised of independent, customized  links.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt; &lt;p class="docText"&gt;&lt;span class="docEmphasis"&gt;Formerly known as AMD's Lightning Data  Transport (&lt;span class="docEmphasis"&gt;LDT&lt;/span&gt;), HyperTransport is backed by a  consortium of developers.&lt;/span&gt;&lt;a name="idd1e6774"&gt;&lt;/a&gt; &lt;span class="docEmphasis"&gt;&lt;/span&gt;&lt;a name="idd1e6783"&gt;&lt;/a&gt;&lt;/p&gt;&lt;a name="ch01lev2sec7"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;The Cost Factor&lt;/h4&gt; &lt;p class="docText"&gt;In addition to technology-related issues, there is always  pressure on the platform designer to increase performance and other capabilities  with each new generation, but to do so at a lower cost than the previous one.  One popular method of measuring the success of this effort is to compare the  bandwidth of one I/O bus to another, and the number of signals required to  achieve it. This bandwidth-per-pin comparison works fairly well because I/O bus  bandwidth is a critical factor in determining if system data bottlenecks exist,  and a lower pin count translates directly into cost savings due to smaller IC  packages, lower power, simplified motherboard routing, etc.&lt;/p&gt; &lt;p class="docText"&gt;&lt;span class="docEmphStrong"&gt;An example:&lt;/span&gt;&lt;/p&gt; &lt;p class="docText"&gt;The bandwidth-per-pin for a generic 32-bit PCI bus during a  burst transfer is approximately &lt;span class="docEmphStrong"&gt;3.5 MB/s&lt;/span&gt; (132  MB/s [33MHz x 4 bytes]/38 pins [32 data signals + 5 control lines + 1 clock]).  
By comparison, a 32-bit HyperTransport interface running at the lowest clock  speed of 200MHz yields a per-pin burst bandwidth of approximately &lt;span class="docEmphStrong"&gt;22 MB/s&lt;/span&gt; (1600 MB/s [200MHz x 2 DDR x 4 bytes]/74 pins  [32 CAD signal &lt;span class="docEmphUl"&gt;pairs&lt;/span&gt; + 4 clock &lt;span class="docEmphUl"&gt;pairs&lt;/span&gt; + 1 CTL &lt;span class="docEmphUl"&gt;pair&lt;/span&gt;]).&lt;a name="idd1e6825"&gt;&lt;/a&gt;&lt;a name="idd1e6828"&gt;&lt;/a&gt;&lt;/p&gt;&lt;a name="ch01lev2sec8"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Networking Support&lt;/h4&gt; &lt;p class="docText"&gt;Finally, at the time this book was written, the  HyperTransport I/O Link Specification is at revision 1.04. This specification  revision mainly targets I/O subsystem improvements in conventional desktop and  server platforms.&lt;/p&gt; &lt;p class="docText"&gt;A growing number of applications require architectures that  integrate well with networking environments. In many of these systems, unlike  desktops and servers, processing may be decentralized and features such as  message streaming, peer-to-peer transfers, and assigned isochronous bandwidth  become important. In addition, device types such as &lt;span class="docEmphasis"&gt;switches&lt;/span&gt; help in building topologies suited to  communications networking. To accommodate networking applications, work is well  underway on the 1.05 and 1.1 revisions of the HyperTransport I/O Link  Specification. The 1.05 specification includes the HyperTransport &lt;span class="docEmphasis"&gt;switch&lt;/span&gt; specification and the 1.1 specification  incorporates the &lt;span class="docEmphasis"&gt;networking extensions&lt;/span&gt;  specification. 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-2577167212130516606?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/2577167212130516606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=2577167212130516606' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2577167212130516606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/2577167212130516606'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/what-ht-brings.html' title='What HT Brings'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-3178628339116903376</id><published>2007-06-26T21:24:00.001-07:00</published><updated>2007-06-26T21:24:52.680-07:00</updated><title type='text'>Shared Bus</title><content type='html'>&lt;h5 class="docSection4Title"&gt;A Shared Bus Runs At Limited Clock Speeds&lt;/h5&gt; &lt;p class="docText"&gt;The fact that multiple devices (including PCB connectors)  attach to a shared bus means that trace lengths and electrical complexity will  limit the maximum usable clock speed. 
For example, a generic PCI bus has a  maximum clock speed of 33MHz; the PCI Specification permits increasing the clock  speed to 66MHz, but the number of devices/connectors on the bus is very  limited.&lt;/p&gt;&lt;a name="ch01lev4sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;A Shared Bus May Be Host To Many Device Types&lt;/h5&gt; &lt;p class="docText"&gt;The requirements of devices on a shared bus may vary widely in  terms of bandwidth needed, tolerance for bus access latency, typical data  transfer size, etc. All of this complicates arbitration on the bus when multiple  masters wish to initiate transactions.&lt;/p&gt;&lt;a name="ch01lev4sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Backward Compatibility Prevents Upgrading  Performance&lt;/h5&gt; &lt;p class="docText"&gt;If a critical shared bus is based on an open architecture,  especially one that defines user "add-in" connectors, then another problem in  upgrading bus bandwidth is the need to maintain backward compatibility with all  of the devices and cards already in existence. 
If the bus protocol is enhanced  and a user installs an "older generation card", then the bus must either revert  to the earlier protocol or lose its compatibility.&lt;/p&gt;&lt;a name="ch01lev4sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;Special Problems If The Shared Bus Is PCI&lt;/h5&gt; &lt;p class="docText"&gt;As popular as it has been, PCI presents additional problems  that contribute to performance limits:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;PCI doesn't support split transactions, resulting in  inefficient &lt;span class="docEmphasis"&gt;retries.&lt;/span&gt;&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Transaction size isn't known in advance (there is no limit), which makes  it difficult to size buffers and causes frequent &lt;span class="docEmphasis"&gt;disconnects&lt;/span&gt; by targets. Devices are also allowed to  insert numerous &lt;span class="docEmphasis"&gt;wait states&lt;/span&gt; during each data  phase.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;PCI transactions by I/O devices targeting main memory  generally require a "snoop" cycle by CPUs to ensure coherency with internal  caches. This impacts both CPU and PCI performance.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Its data bus scalability is very limited (32/64-bit  data)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Because of the PCI electrical specification (low-power,  reflected wave signals), each PCI bus is physically limited in the number of ICs  and connectors vs. 
PCI clock speed&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;PCI bus arbitration is vaguely specified. Access latencies can  be long and difficult to quantify. If a second PCI bus is added (using a PCI-PCI  bridge), arbitration for the secondary bus typically resides in the new bridge.  This further complicates PCI arbitration for traffic moving vertically to  memory.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch01lev4sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;A Note About PCI-X&lt;/h5&gt; &lt;p class="docText"&gt;Other than scalability and the number of devices possible on  each bus, the PCI-X protocol has resolved many of the problems just described  with PCI. For third-party manufacturers of high performance add-in cards and  embedded devices, the shared bus PCI-X is a straightforward extension of PCI  which yields huge bandwidth improvements (up to about 2GB/s with PCI-X  2.0).&lt;/p&gt;&lt;a name="ch01lev3sec5"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;The&lt;a name="idd1e6595"&gt;&lt;/a&gt;Point-to-Point Interconnect  Approach&lt;/h5&gt; &lt;p class="docText"&gt;An alternative to the shared I/O bus approach of PCI or PCI-X  is having point-to-point links connecting devices. This method is being used in  a number of new bus implementations, including HyperTransport technology. 
A  common feature of point-to-point connections is much higher bandwidth  capability; to achieve this, point-to-point protocols adopt some or all of the  following characteristics:&lt;/p&gt; &lt;ul&gt;&lt;li&gt; &lt;p class="docList"&gt;only two devices per connection&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;low-voltage differential signaling on the high-speed data  paths&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;source-synchronous clocks, sometimes using double data rate  (DDR)&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;very tight control over PCB trace lengths and routing&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;integrated termination and/or compensation circuits embedded in  the two devices that maintain signal integrity and account for voltage and  temperature effects on timing&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;dual simplex interfaces between the devices rather than one  bi-directional bus; this enables duplex operations and eliminates "turn around"  cycles&lt;/p&gt; &lt;/li&gt;&lt;li&gt; &lt;p class="docList"&gt;sophisticated protocols that eliminate retries, disconnects,  wait states, etc.&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;a name="ch01lev4sec6"&gt;&lt;/a&gt; &lt;h5 class="docSection4Title"&gt;A Note About Connectors&lt;/h5&gt; &lt;p class="docText"&gt;While connectors may or may not be defined in a point-to-point  link specification, they may be designed into some implementations for  board-to-board connections or for the attachment of diagnostic equipment. 
There is no  definition of a peripheral add-in card connector for HyperTransport as there is  in PCI or PCI-X.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-3178628339116903376?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/3178628339116903376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=3178628339116903376' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3178628339116903376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/3178628339116903376'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/shared-bus.html' title='Shared Bus'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5449984521470454692.post-689479645286532285</id><published>2007-06-26T21:17:00.000-07:00</published><updated>2007-06-26T21:24:27.748-07:00</updated><title type='text'>Background: I/O Subsystem Bottlenecks</title><content type='html'>&lt;h3 class="docSection1Title"&gt;Background: I/O Subsystem Bottlenecks&lt;/h3&gt; &lt;p class="docText"&gt;New I/O buses are typically developed in response to changing  system requirements and to promote lower cost implementations.  
Current-generation I/O buses such as PCI are rapidly falling behind the  capabilities of other system components such as processors and memory. Some of  the reasons I/O bottlenecks are becoming more apparent are described  below.&lt;/p&gt;&lt;a name="ch01lev2sec1"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Server Or Desktop Computer: Three Subsystems&lt;/h4&gt; &lt;p class="docText"&gt;A server or desktop computer system comprises three major  subsystems:&lt;/p&gt;&lt;span style="font-weight: bold;"&gt; &lt;ol class="docList" type="1"&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Processor (in servers, there may be more than one)&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;Main DRAM Memory. There are a number of different synchronous  DRAM types, including SDRAM, DDR, and Rambus.&lt;/p&gt;&lt;/span&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: normal;"&gt; &lt;p class="docList"&gt;I/O (Input/Output devices). Generally, all components that are  not processors or DRAM are lumped together in this subsystem group. This  includes graphics, mass storage, legacy hardware, and the buses  required to support them: PCI, PCI-X, AGP, USB, IDE,  etc.&lt;/p&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/span&gt;&lt;a name="ch01lev2sec2"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;CPU Speed Makes Other Subsystems Appear Slow&lt;/h4&gt; &lt;p class="docText"&gt;Because of improvements in CPU internal execution speed,  processors are more demanding than ever when they access external resources such  as memory and I/O. 
Each external read or write by the processor represents a  huge performance hit compared to internal execution.&lt;/p&gt;&lt;a name="ch01lev3sec1"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;Multiple CPUs Aggravate The Problem&lt;/h5&gt; &lt;p class="docText"&gt;In systems with multiple CPUs, such as servers, the problem of  accessing external devices becomes worse because of competition for access to  system DRAM and the single set of I/O resources.&lt;/p&gt;&lt;a name="ch01lev2sec3"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;DRAM Memory Keeps Up Fairly Well&lt;/h4&gt; &lt;p class="docText"&gt;Although it is external to the processor(s), system DRAM  keeps up fairly well with the increasing demands of CPUs for two  reasons. First, the performance penalty for accessing external memory is  mitigated by the use of internal processor caches. Modern processors generally  implement multiple levels of internal caches that run at the full CPU clock rate  and are tuned for high "hit rates". Each fetch from an internal cache eliminates  the need for an external bus cycle to memory.&lt;/p&gt; &lt;p class="docText"&gt;In addition, when an external memory fetch is  required, DRAM technology and the use of synchronous bus interfaces to it (e.g.,  DDR and Rambus) 
have allowed it to maintain bandwidths comparable to the  processor external bus rates.&lt;/p&gt;&lt;a name="ch01lev2sec4"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;I/O Bandwidth Has Not Kept Pace&lt;/h4&gt; &lt;p class="docText"&gt;While the processor internal speed has raced forward, and  memory access speed has managed to follow along reasonably well with the help of  caches, I/O subsystem evolution has not kept up.&lt;/p&gt;&lt;a name="ch01lev3sec2"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;This Slows Down The Processor&lt;/h5&gt; &lt;p class="docText"&gt;Although external DRAM accesses by processors can be minimized  through the use of internal caches, there is no way to avoid external bus  operations when accessing I/O devices. The processor must perform small,  inefficient external transactions which then must find their way through the I/O  subsystem to the bus hosting the device.&lt;/p&gt;&lt;a name="ch01lev3sec3"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;It Also Hurts Fast Peripherals&lt;/h5&gt; &lt;p class="docText"&gt;Similarly, bus master I/O devices using PCI or other subsystem  buses to reach main memory are hindered by the lack of bandwidth. Some  modern peripheral devices (e.g. SCSI and IDE hard drives) are capable of running  much faster than the buses they live on. This represents another system  bottleneck, and it is a particular problem for applications  that emphasize time-critical movement of data through the I/O subsystem over CPU  processing.&lt;/p&gt;&lt;a name="ch01lev2sec5"&gt;&lt;/a&gt; &lt;h4 class="docSection2Title"&gt;Reducing I/O Bottlenecks&lt;/h4&gt; &lt;p class="docText"&gt;Two important schemes have been used to connect I/O devices to  main memory. The first is the shared bus approach, as used in PCI and PCI-X. The  second involves point-to-point component interconnects, and includes some  proprietary buses as well as open architectures such as HyperTransport. 
These  are described here, along with the advantages and disadvantages of each.&lt;/p&gt;&lt;a name="ch01lev3sec4"&gt;&lt;/a&gt; &lt;h5 class="docSection3Title"&gt;The Shared Bus Approach&lt;/h5&gt; &lt;p class="docText"&gt;&lt;a class="docLink" href="#ch01fig01"&gt;Figure 1-1&lt;/a&gt; on page 12  depicts the common "North-South" bridge PCI implementation. Note that the PCI  bus acts as both an "add-in" bus for user peripheral cards and as an  interconnect bus to memory for all devices residing on or below it. Even traffic  to and from the USB and IDE controllers integrated in the South Bridge must  cross the PCI bus to reach main memory.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5449984521470454692-689479645286532285?l=cpu-hypertransport.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://cpu-hypertransport.blogspot.com/feeds/689479645286532285/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5449984521470454692&amp;postID=689479645286532285' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/689479645286532285'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5449984521470454692/posts/default/689479645286532285'/><link rel='alternate' type='text/html' href='http://cpu-hypertransport.blogspot.com/2007/06/background-io-subsystem-bottlenecks.html' title='Background: I/O Subsystem Bottlenecks'/><author><name>Info Center</name><uri>http://www.blogger.com/profile/10560464513846233657</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
