Just like the Thunderbird, the Duron features an exclusive L2 cache rather than the more conventional inclusive L2 cache architecture. The Pentium III and the Celeron both implement an inclusive L2 cache, meaning that all of the data stored in their L1 cache (32KB) is duplicated in their L2 cache (256KB or 128KB). So while the Pentium III has 256KB of L2 cache, 32KB of that is actually duplicated from what is in the L1 cache. The same goes for the Celeron, which features 128KB of L2 cache, 32KB of which is a duplicate of everything stored in its L1 cache.
An exclusive cache, as you can tell by the name, is the opposite of an inclusive cache: it doesn't duplicate L1 data in the L2 cache area. The L2 cache only contains the copy-back cache blocks that are to be written back to the memory subsystem (basically everything that doesn't fit in L1 and would normally go to the system memory if there were no L2 cache).
So although the Duron only has a 64KB L2 cache, it does not contain a copy of the L1 cache and thus works alongside it to store a total of 192KB of frequently used data (128KB L1 + 64KB L2). This is in comparison to the Celeron, whose inclusive L2 cache must duplicate everything stored in L1, meaning that only 96KB of the 128KB L2 holds data that isn't already in L1. Because of this, the Celeron actually only has 128KB of cache (32KB L1 + 96KB of unique L2) in which it can store frequently used data.
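The capacity arithmetic above can be checked with a few lines of Python. This is only a sketch of the bookkeeping, not a cache model: the function name `effective_cache_kb` is made up for illustration, and it assumes a strictly inclusive L2 always holds a full copy of L1.

```python
def effective_cache_kb(l1_kb, l2_kb, inclusive):
    """Return the amount of unique data (in KB) that L1 + L2 can hold."""
    if inclusive:
        # Assumption: every L1 line is also present in L2, so the L2 size
        # is the ceiling on unique data. An inclusive L2 smaller than L1
        # would not make sense, so reject it.
        if l2_kb < l1_kb:
            raise ValueError("inclusive L2 must be at least as large as L1")
        return l2_kb
    # Exclusive: L1 and L2 never hold the same line, so capacities add up.
    return l1_kb + l2_kb


# Cache sizes as quoted in the article.
duron = effective_cache_kb(128, 64, inclusive=False)
celeron = effective_cache_kb(32, 128, inclusive=True)
pentium3 = effective_cache_kb(32, 256, inclusive=True)

print(f"Duron:       {duron} KB")
print(f"Celeron:     {celeron} KB")
print(f"Pentium III: {pentium3} KB")
```

By this count the Duron's smaller L2 still leaves it with more total cache than the Celeron, though raw capacity says nothing about latency or bandwidth.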
Unfortunately, just like the Thunderbird, the Duron features a 64-bit data path to the L2 cache, unlike the Pentium III and the new Celeron (533A, 566 and above), which have a 256-bit data path to their L2 cache. Clock for clock, this effectively gives the Pentium III and new Celeron four times the L2 cache bandwidth of the Duron, which does penalize it somewhat.
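To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes one transfer per core clock across the full width of the data path and ignores latency entirely, so the results are theoretical peaks rather than measured figures; the 700MHz clock is just an example value, and the function names are hypothetical.

```python
def l2_bytes_per_clock(width_bits):
    """Bytes moved per clock over a data path of the given width."""
    return width_bits // 8


def peak_l2_bandwidth_gb_s(width_bits, clock_mhz):
    """Theoretical peak L2 bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumption: one full-width transfer every core clock.
    """
    return l2_bytes_per_clock(width_bits) * clock_mhz / 1000


CLOCK_MHZ = 700  # example clock speed, chosen only for illustration

duron_bw = peak_l2_bandwidth_gb_s(64, CLOCK_MHZ)
intel_bw = peak_l2_bandwidth_gb_s(256, CLOCK_MHZ)

print(f"64-bit path:  {l2_bytes_per_clock(64)} bytes/clock, {duron_bw:.1f} GB/s peak")
print(f"256-bit path: {l2_bytes_per_clock(256)} bytes/clock, {intel_bw:.1f} GB/s peak")
```

The ratio stays at 4:1 at any matched clock speed, since only the width of the data path differs.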
One of the reasons the new Celeron performed so poorly next to the Pentium III, even when overclocked to the same speed (i.e. Celeron 850/100 vs Pentium III 850/100), was that the Celeron’s L2 cache is 4-way set associative versus the Pentium III’s 8-way set associative L2 cache. This discrepancy exists because Intel essentially disables half of the L2 cache on the Pentium III in order to produce a Celeron (this can be confirmed by noting that the die sizes of the two chips are identical), and by doing that you essentially get half the “associativity”.
The Duron is not produced this way. Its core is physically smaller than the Thunderbird’s, and thus it can retain the same 16-way set associative L2 cache as its big brother, just in a smaller cache size. For a thorough explanation of the benefits of a 16-way set associative L2 cache versus the Pentium III’s 8-way set associative L2 cache, take a look at page 5 of our AMD Athlon “Thunderbird” Review.
This is in contrast to Intel’s production method, which takes those Pentium III chips that have ‘bad’ parts in their L2 cache, disables the half of the cache that contains the failed L2 cache blocks, and re-marks the CPU as a Celeron.
AMD doesn’t do this because it actually duplicates all L2 cache columns, so its chips have twice as many L2 cache columns as are necessary. Since each column is duplicated, if there are a few bad cache blocks the affected columns can simply be disabled without worrying about not having a full 64KB or 256KB of L2 cache available for the part. This does make the chip a little larger than it would otherwise be, but the difference in manufacturing cost should be negligible since you don’t have to throw away a die if it has a few bad blocks.
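The associativity figures quoted earlier can be sanity-checked by working out how many sets each L2 cache has. The sketch below uses the standard relationship sets = cache size / (ways × line size). The line sizes (32 bytes for the Pentium III and Celeron, 64 bytes for the Thunderbird and Duron) are not given in this article and are supplied here as assumptions; `num_sets` is a hypothetical helper.

```python
def num_sets(size_kb, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    total_lines = size_kb * 1024 // line_bytes
    return total_lines // ways


# Assumed cache line sizes (not stated in the article).
INTEL_LINE_BYTES = 32
AMD_LINE_BYTES = 64

pentium3_sets = num_sets(256, 8, INTEL_LINE_BYTES)
celeron_sets = num_sets(128, 4, INTEL_LINE_BYTES)
thunderbird_sets = num_sets(256, 16, AMD_LINE_BYTES)
duron_sets = num_sets(64, 16, AMD_LINE_BYTES)

print(f"Pentium III: {pentium3_sets} sets x 8 ways")
print(f"Celeron:     {celeron_sets} sets x 4 ways")
print(f"Thunderbird: {thunderbird_sets} sets x 16 ways")
print(f"Duron:       {duron_sets} sets x 16 ways")
```

Under these assumptions the Celeron keeps the Pentium III's set count and loses half the ways, which matches the "half the cache disabled" description, while the Duron keeps all 16 ways and simply has fewer sets than the Thunderbird.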