This design relies on a concept called Explicitly Parallel Instruction Computing (EPIC), which is similar to the earlier VLIW approach but adds a number of enhancements. Where a typical VLIW assigns each sub-instruction of a long instruction word to a particular fixed functional unit, the Itanium supports several bundle mappings, which allow more ways of mixing instructions and can express a balance between serial and parallel execution. Room was left in the initial bundle encodings to add more mappings in future versions of IA-64. In addition, the Itanium has individually settable predicate registers; each instruction can be guarded by one of them, so that at run time an instruction whose predicate is false produces no output.
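The effect of predication can be illustrated with a small simulation. The following is only a toy sketch in Python, not real IA-64 semantics or syntax: the function names (`run`, `cmp_gt`, `mov`, `predicated_max`) and the dictionary-based register model are hypothetical, chosen for illustration. It shows a branch-free maximum: a compare writes two complementary predicates, and both moves are issued, but only the one whose predicate is true changes any state.

```python
# Toy model of predicated execution (illustrative only, not real IA-64).

def run(program, regs, preds):
    """Issue every instruction in order. An instruction whose qualifying
    predicate is False leaves registers and predicates unchanged."""
    for qp, op, args in program:
        if not preds[qp]:
            continue  # "no output": the instruction has no effect
        op(regs, preds, *args)
    return regs


def cmp_gt(regs, preds, p_true, p_false, a, b):
    # A compare writes a pair of complementary predicates.
    result = regs[a] > regs[b]
    preds[p_true] = result
    preds[p_false] = not result


def mov(regs, preds, dst, src):
    regs[dst] = regs[src]


def predicated_max(a, b):
    """Compute max(a, b) with no branch: both moves are issued,
    but only one of them takes effect."""
    regs = {"a": a, "b": b, "x": 0}
    # p0 is modelled as always true, so it guards unconditional instructions.
    preds = {"p0": True, "p1": False, "p2": False}
    program = [
        ("p0", cmp_gt, ("p1", "p2", "a", "b")),
        ("p1", mov, ("x", "a")),  # takes effect only if a > b
        ("p2", mov, ("x", "b")),  # takes effect only if a <= b
    ]
    return run(program, regs, preds)["x"]
```

In this model the "if" and "else" paths are both in the instruction stream, and the predicates decide at run time which results are kept.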
The IA-64 architecture includes a very generous complement of registers: 128 floating-point registers of 82 bits each and 128 integer registers of 64 bits each. Beyond the sheer number, IA-64 adds a register rotation mechanism that is controlled by the Register Stack Engine. Rather than relying on the typical spill/fill or window mechanisms used in other processors, the Itanium can rotate in a set of new registers to accommodate new function parameters or temporaries. Register rotation combined with predication is also very effective for executing automatically unrolled loops.
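How rotation helps with loops can be sketched with a simplified model. The Python below is an assumption-laden illustration, not a description of the actual hardware: the class `RotatingRegisters`, the function `pipelined_double`, the register count of 4, and the two-stage pipeline are all hypothetical. The idea shown is that a value written to logical register 0 in one iteration appears in logical register 1 after a rotation, so overlapping loop stages can hand values to each other without explicit copies. The `if` tests stand in for the stage predicates that would switch each stage on and off.

```python
class RotatingRegisters:
    """Small register file whose logical-to-physical mapping shifts by one
    on every rotation (hypothetical model for illustration)."""

    def __init__(self, size):
        self.physical = [None] * size
        self.base = 0

    def _index(self, logical):
        return (logical + self.base) % len(self.physical)

    def read(self, logical):
        return self.physical[self._index(logical)]

    def write(self, logical, value):
        self.physical[self._index(logical)] = value

    def rotate(self):
        # After rotation, the value seen at logical n is seen at logical n + 1.
        self.base = (self.base - 1) % len(self.physical)


def pipelined_double(values):
    """Two overlapping stages: stage 1 'loads' an element into logical
    register 0; stage 2 doubles the element loaded one iteration earlier,
    which has since rotated into logical register 1."""
    regs = RotatingRegisters(4)
    out = []
    n = len(values)
    for i in range(n + 1):      # one extra iteration drains the pipeline
        if i < n:               # stage 1 enabled (stands in for a predicate)
            regs.write(0, values[i])
        if i >= 1:              # stage 2 enabled once the pipeline is primed
            out.append(2 * regs.read(1))
        regs.rotate()
    return out
```

Because both stages always use the same logical register numbers, the loop body does not need to be duplicated with renamed registers for each overlapped iteration.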
The architecture also provides a versatile instruction set that caters to a wide spectrum of uses, including dedicated instructions for multimedia operations and for floating-point operations.
Despite its great capabilities, the IA-64 instruction set is notoriously difficult to program directly. Intel has strongly recommended against assembly programming on the Itanium in general, and advises developers to use its C++ compiler instead, which contains platform-specific heuristics.
To support IA-32, the Itanium can switch into 32-bit mode with special jump escape instructions and return in an analogous way. The IA-32 instructions have been mapped onto the Itanium's functional units. However, because the Itanium is built primarily for the speed of its EPIC-style instructions, and because it has no out-of-order execution capabilities, IA-32 instructions execute at a severe performance penalty compared with either IA-64 mode or Intel's Pentium line of processors. For example, the Itanium's functional units do not automatically generate integer flags as a side effect of ordinary ALU computation, and they do not intrinsically support multiple outstanding unaligned memory loads. There have been reports that Intel is seeking a software-emulation-based solution (much as Transmeta did) for executing IA-32 code in future versions of the Itanium, to replace the current hardware solution.
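The cost of the missing integer flags can be seen by writing out what an IA-32 addition implies. The Python sketch below is illustrative only and does not describe how the Itanium actually implements IA-32 mode: the function name `add32_with_flags` is hypothetical, and only four of the IA-32 status flags (carry, zero, sign, overflow) are modelled. It shows that when the adder does not produce flags as a by-product, each one has to be derived by additional operations.

```python
MASK32 = 0xFFFFFFFF


def add32_with_flags(a, b):
    """Add two 32-bit values and derive four status flags explicitly,
    as extra steps after the addition itself."""
    a &= MASK32
    b &= MASK32
    full = a + b                  # unbounded Python integer sum
    result = full & MASK32        # wrap to 32 bits

    def sign(v):
        return (v >> 31) & 1      # top bit of a 32-bit value

    flags = {
        "CF": full > MASK32,      # carry out of bit 31
        "ZF": result == 0,        # result is zero
        "SF": sign(result) == 1,  # result is negative when read as signed
        # Signed overflow: operands share a sign and the result's differs.
        "OF": sign(a) == sign(b) and sign(result) != sign(a),
    }
    return result, flags
```

For instance, adding 1 to 0x7FFFFFFF in this model yields 0x80000000 with the overflow and sign flags set and the carry flag clear.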
Interestingly, a raw Itanium, when first booted, is actually missing some of its instruction functionality. A boot-ROM-like program called an EFI program is loaded, which in turn loads additional code into on-chip memory to define these instructions and performs other boot-time configuration, such as choosing the execution mode of the processor (64-bit versus 32-bit). This design allows Itanium systems to be deployed with different capabilities depending on the contents of the EFI program.
Although other 64-bit architectures have existed for a long time, most (MIPS, Alpha, PA-RISC) have faded from the marketplace. Itanium's remaining competition in the 64-bit server and workstation market appears to be the newcomer AMD, with its AMD64 architecture, and the entrenched rivals: IBM's POWER architecture and Sun's Sparc64 architecture. Apple may also challenge Intel with its Xserve product line, based on the IBM PowerPC architecture.
Intel has so far marketed the Itanium only in the high-end workstation, server and supercomputer space. However, there have been reports of Intel working on IA-64 extensions to its IA-32 line of processors as a way to address the eventual limitations of the Pentium's 32-bit addressable memory space.