Understanding Intel Core i7: Part 1


The Core i7 marks the introduction of Nehalem, an architecture with many changes relative to Penryn and earlier processors, including an integrated memory controller and the long-awaited migration from the FSB to a serial, point-to-point bus. Both improvements had been introduced years earlier by AMD, and Intel had resisted them until then.

While the Core i7 is still a niche processor, the new architecture will provide the basis for Intel processors for years to come, so it is worth studying it a little.


Starting with a little historical context: in early 2006, Intel was in a complicated situation. The Pentium D, based on the inefficient NetBurst architecture, lost to the Athlon X2 in both performance and efficiency, wasting energy and delivering very little.

At the time, AMD processors were superior on both desktops and servers, and Intel was quickly losing ground on both fronts. When everything seemed lost, Intel introduced the Core architecture, which led to the Core 2 Duo and the other processors in the current line. To avoid repeating the mistake it made with the NetBurst platform, Intel began investing heavily in research and development, designing several new architectures in parallel, developing new manufacturing techniques, and modernizing its factories.

The marketing department rushed to create a term to symbolize the new phase: "tick-tock", now used extensively in Intel's advertising material. The idea is simple: to present new manufacturing techniques and new architectures in alternating years, where a "tick" is the launch of a new manufacturing technique (45 nanometers or 32 nanometers, for example) while a "tock" is the launch of a new architecture (such as Core and Nehalem), closing the cycle.

The plan is to keep the public interested by announcing a new architecture, or a migration to a new manufacturing process, every year, maintaining a rapid pace of evolution that AMD has difficulty following.

Following this idea, the migration to the 0.065 micron (65 nanometer) process in 2005 was a "tick", the launch of the Core platform in 2006 was a "tock", and the launch of Penryn in 2007, based on the new 0.045 micron (45 nanometer) process, was a new "tick". It was followed by the announcement of Nehalem (pronounced roughly "neh-HAY-lem"), a new "tock": a new architecture, still produced on the 45 nanometer process, but with several architectural changes relative to Penryn.

As with all other Intel processors, "Nehalem" is only the code name of the architecture. When it actually reached the shelves, it was given the name Intel Core i7.

Unlike Yorkfield, used in the Core 2 Quad Q9000 series (which was obtained by combining two dual-core dies linked via the FSB), Nehalem is a native quad-core processor, where the 4 cores share the same silicon die.

The chip is composed of no less than 731 million transistors which, even with the 45 nanometer manufacturing process, occupy an area of 263 mm². To get an idea, that is more than 10 times the size of an Atom 230, which measures only 25.9 mm².

To accommodate the 4 cores, Intel made several changes to the architecture of the caches. Instead of a large shared L2 cache, Intel chose an architecture similar to that of the AMD Phenom, with a small L2 cache (256 KB) for each core and a generous 8 MB of L3 cache shared among all cores. Within this architecture, the L3 cache takes on the role that the L2 cache played in the Core 2 Duo, serving as a common reservoir of data.

The big difference between the caches of Nehalem and those of the Phenom lies in how data is stored in them. AMD uses an "exclusive" cache, where the L2 cache stores different data from the L1 and the L3 cache stores different data from the L2, maximizing storage space. Intel, on the other hand, uses an "inclusive" cache, where the L1 and L2 caches store copies of data also stored in the L3 cache.

While this reduces the total volume of data that can be stored in the caches, Intel's system provides a small gain in performance, since the processor does not need to check each of the caches independently to find data.
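The capacity trade-off can be put into numbers with a small sketch. It uses only the cache sizes quoted in this article for the Core i7 and compares an idealized exclusive hierarchy with an idealized inclusive one. The function names are illustrative, and real processors do not follow either model perfectly.

```python
# Cache sizes quoted in the text for the Core i7 (Nehalem), in KB.
CORES = 4
L1_KB = 32 + 32      # 32 KB data + 32 KB instructions, per core
L2_KB = 256          # per core
L3_KB = 8 * 1024     # shared by all cores


def exclusive_capacity_kb(cores, l1_kb, l2_kb, l3_kb):
    """Idealized exclusive hierarchy: every level holds different data."""
    return cores * (l1_kb + l2_kb) + l3_kb


def inclusive_capacity_kb(l3_kb):
    """Idealized inclusive hierarchy: L1/L2 contents are duplicated in L3,
    so the amount of distinct data is bounded by the L3 size."""
    return l3_kb


exclusive = exclusive_capacity_kb(CORES, L1_KB, L2_KB, L3_KB)
inclusive = inclusive_capacity_kb(L3_KB)
print(f"exclusive model: {exclusive} KB of distinct data")
print(f"inclusive model: {inclusive} KB of distinct data")
print(f"capacity given up by inclusion: {exclusive - inclusive} KB")
```

Under these assumptions the inclusive design gives up 1280 KB of distinct storage, which is small next to the 8 MB L3.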

Another reason for using the inclusive cache is the new low-power states (C3 and C6) supported by the processor, in which some or all of the cores are completely turned off. This reduces consumption to a very low level, but causes the loss of the data stored in the L1 and L2 caches. Since the L3 cache is independent of the 4 cores, it remains active, allowing the cores to reload their caches from the L3 when they wake up. The processor still has to perform the reload operations, but it does not need to fetch the data from RAM again.

This is where Intel's investment in new manufacturing techniques paid off: with smaller transistors, it can afford to produce processors with more cache, offsetting the space lost to the inclusive system with a larger total volume of cache.

The L1 cache is still divided into two blocks (32 KB for data and 32 KB for instructions), as in all previous generations, but its access latency increased from 3 cycles on Penryn to 4 cycles. The loss of performance is offset by the latency of the L2 cache, which fell considerably, from 15 to 11 cycles.

This reduction in access time is one of Intel's justifications for using such a small L2 cache. With an access time of only 11 cycles, it works more like a "level one and a half" cache, serving as an intermediary between the L1 cache and the large block of shared L3 cache.

The L3 cache works with a latency of 39 cycles, which may seem high compared to the latency of the L1 and L2 caches, but it is a bit faster than the L3 cache used in the Phenom, which, in addition to being smaller, works with a latency of 43 cycles.
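To see how the three latencies combine, here is a rough sketch of an average access time calculation. The latencies of 4, 11 and 39 cycles come from the text. The hit rates and the RAM latency are purely hypothetical placeholders chosen for illustration, not measurements, and the model treats each latency as the total cost of a load served by that level.

```python
def average_latency(levels, memory_latency):
    """Expected cycles per load.

    levels: list of (latency_cycles, hit_rate), ordered from L1 outward.
    Each latency is treated as the total cost of a load served by that level.
    """
    expected = 0.0
    reach = 1.0  # probability that a load gets as far as this level
    for latency, hit_rate in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level is served from RAM.
    return expected + reach * memory_latency


NEHALEM_LATENCIES = (4, 11, 39)            # L1, L2, L3 cycles, from the text
HYPOTHETICAL_HIT_RATES = (0.90, 0.60, 0.80)  # assumption, not measured
HYPOTHETICAL_MEMORY_LATENCY = 200            # assumption, in cycles

nehalem = list(zip(NEHALEM_LATENCIES, HYPOTHETICAL_HIT_RATES))
cycles = average_latency(nehalem, HYPOTHETICAL_MEMORY_LATENCY)
print(f"average load latency: {cycles:.2f} cycles")
```

With hit rates like these, most loads are served by the L1, so the fast L2 and the L3 matter mainly by keeping the expensive trips to RAM rare.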

Another dramatic change is the inclusion of an integrated memory controller, as in AMD's processors. The integrated memory controller substantially reduces memory access latency, resulting in a considerable gain in performance. A major reason the Athlon X2 remained competitive with the Core 2 Duo, despite having far less cache, was precisely its integrated controller, while the Core 2 Duo depended on the controller in the chipset.

Broadly speaking, we can say that the Athlon X2 needs to access memory more often (due to its smaller cache) but in return loses less time on each access, thanks to the integrated memory controller. Intel resisted for a long time, but ended up having to embrace the idea.

Instead of a single-channel or dual-channel controller, Intel chose a triple-channel controller with support for DDR3 memory operating at up to 1.33 GT/s. This means a total bandwidth of up to 32 GB/s (when using 3 modules). To get an idea, that is 40 times more than we had 10 years ago, when SDR PC-100 memory modules were used with the Pentium III.
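The bandwidth figures can be checked with a little arithmetic. The sketch below assumes a 64-bit (8-byte) data path per channel, which is not stated in the text but is the standard width for both DDR3 and SDR modules, and approximates 1.33 GT/s as 1333 million transfers per second.

```python
BYTES_PER_TRANSFER = 8  # assumption: each channel is 64 bits wide


def peak_bandwidth_mb_s(transfers_mt_s, channels=1):
    """Theoretical peak bandwidth in MB/s for a given transfer rate (MT/s)."""
    return transfers_mt_s * BYTES_PER_TRANSFER * channels


ddr3_triple = peak_bandwidth_mb_s(1333, channels=3)  # Core i7, three modules
pc100 = peak_bandwidth_mb_s(100)                     # SDR PC-100, one channel

print(f"DDR3 triple-channel: {ddr3_triple / 1000:.1f} GB/s")
print(f"SDR PC-100:          {pc100 / 1000:.1f} GB/s")
print(f"ratio:               {ddr3_triple / pc100:.1f}x")
```

These are theoretical peaks. Real transfers never sustain them, but the ratio of about 40 matches the comparison made above.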

The three channels operate independently, so the processor can start a new read on one module while it is still waiting for the data from a previous read on another module. This helps reduce memory access latency, which is proportionately much higher in DDR3 modules.

Naturally, to get the full benefit of the triple channel, you must use modules in sets of three. With a single module, only one of the channels will be active and, with four, the last module shares a channel with the first.
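The effect of the module count can be illustrated with a simplified model in which modules are handed out to the three channels in round-robin order. This is only a sketch of the behavior described above: the function name is made up for the example, and the actual slot-to-channel wiring depends on the motherboard.

```python
CHANNELS = 3


def assign_modules(module_count, channels=CHANNELS):
    """Round-robin model: module i is placed on channel i % channels."""
    layout = [[] for _ in range(channels)]
    for module in range(module_count):
        layout[module % channels].append(module)
    return layout


for count in (1, 3, 4):
    layout = assign_modules(count)
    active = sum(1 for channel in layout if channel)
    print(f"{count} module(s): channels={layout}, active channels={active}")
```

With one module only the first channel is populated, with three each channel gets its own module, and with four the last module lands on the same channel as the first.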


The problem with the integrated controller is that it substantially increases the number of contacts on the processor, which completely breaks compatibility with current Socket 775 motherboards. The triple-channel version of Nehalem uses an LGA socket with no less than 1366 contacts. The shape of the processor has also changed: it is now rectangular, as in the old Pentium Pro.

To accompany the changes in the processor, Intel launched a new chipset, the X58, which is paired with the ICH10, the chip that concentrates the I/O interfaces.

Since the memory controller has moved into the processor, the X58 is a relatively simple chipset. It basically serves as an interface between the processor's QPI bus, the PCI Express peripherals, and the ICH10 chip (accessed through a DMI bus), which concentrates the other interfaces. The transistors for the PCI Express lanes occupy most of the die, which is why the chip still occupies a relatively large area.

Another novelty is that the X58 is certified by NVIDIA for SLI (that is, it is the first for which Intel agreed to pay the license), which allows the development of motherboards that are compatible with both SLI and CrossFireX. Since we are talking about the top of the high-end market here, where one pays $999 for the processor and another $400 for the motherboard, SLI support is an important feature.

The main point is that the triple channel is available in the Core i7 family, which is intended for servers and high-performance workstations. The desktop i5 processors (based on the Lynnfield core) have only two channels enabled, a change that is reflected in the socket, which has a smaller number of contacts.
