PA-RISC Processors
Introduction
The PA-RISC architecture was the offspring of previous design efforts and lessons that HP learned from developing the FOCUS CPU, the world’s first full 32-bit microprocessor. The FOCUS CPU was for its time (pre-1984) a huge chip (about 450,000 FETs) with a stack-based instruction set. There is a detailed description of the FOCUS architecture on the HP 9000/500 FOCUS page.
The PA-RISC processors were designed to replace the old 16-bit stack-based CPUs in HP 3000 servers and the Motorola 680x0 CPUs in HP’s Unix workstations and servers. Overall, PA-RISC was a quite conservative RISC design:
- The instruction set is implemented in hardware and not microcoded as, for example, in conventional CISCs (or HP FOCUS).
- The instruction size has a fixed length — one word (32-bit).
- Only three addressing modes: long/short displacement and indexed mode.
- Only load/store operations access memory; no computational instructions access memory directly.
- The PA-RISC instruction set was designed to be a good target for optimizing compilers. Many simple, frequently used instructions execute in just one cycle; more complex computations were assigned to assist processors or software algorithms.
Compared to other RISC architectures of the time, the original PA-RISC design was rather unspectacular: it typically had fewer features but always remained at competitive speeds, especially in floating point and SMP (multiprocessing). HP was the first to include multimedia extensions in a commercially available microprocessor (MAX-1 in the PA-7100LC and the 64-bit MAX-2 in the PA-8000, similar to Intel’s MMX et al.), which allowed vector operations on two or four 16-bit subwords in 32-bit or 64-bit integer registers.
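The subword operations mentioned above can be illustrated with a minimal Python sketch. It treats an integer as a register holding independent 16-bit lanes and adds two such registers lane by lane. The function name `subword_add` is hypothetical, and plain wrap-around (modular) arithmetic is an assumption made for brevity; the sketch does not reproduce the exact semantics of any MAX instruction.

```python
def subword_add(a: int, b: int, width: int = 64, lane: int = 16) -> int:
    """Add two registers lane by lane; each lane wraps modulo 2**lane.

    width: register width in bits (32 or 64 in the text above)
    lane:  subword width in bits (16 in the text above)
    """
    mask = (1 << lane) - 1
    result = 0
    for shift in range(0, width, lane):
        # Isolate one lane of each operand and add them.
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        # Drop the carry so it cannot spill into the neighbouring lane.
        result |= (lane_sum & mask) << shift
    return result


# Four 16-bit lanes in a 64-bit register.
x = 0x0001_FFFF_0010_7FFF
y = 0x0001_0001_0020_0001
print(hex(subword_add(x, y)))  # lanes: 0x0002, 0x0000 (wrapped), 0x0030, 0x8000
```

The point of the lane mask is that a carry out of one subword never reaches the next one, which is what distinguishes subword arithmetic from an ordinary 64-bit add.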
The original PA-RISC 1.0 architecture included a single instruction/data bus; PA-RISC later moved to a Harvard-style architecture with separate instruction and data buses. It has thirty-two 32-bit integer general purpose registers (GR0-GR31), seven shadow registers (SR0-SR6) for fast interrupts and thirty-two 64-bit Floating Point registers for the FPU, which could also be combined as 64×32-bit and 16×128-bit. The FPU is able to execute a Floating Point instruction in parallel with the ALU. The original addressing was 48 bits wide; it was later expanded to 64 bits (with the introduction of the PA-8000 line).¹
- Great Microprocessors of the Past and Present, John Bayko (June 2001/V 12.1.1: BURKS. Accessed 28 Dec 2007)
Overview Table
| CPU | ISA | Clock max | FETs | L1 Cache max | L2 Cache max | TLB | Superscalar | SMP | Units |
|---|---|---|---|---|---|---|---|---|---|
| TS-1 | PA 1.0 32-bit | 8MHz | ? | 128KB I/D off-chip | | 4K I/D | 1-way | No | 1 Integer, External FPU |
| NS-1 | PA 1.0 32-bit | 30MHz | 144k | 128KB off-chip | | 4K I/D | 1-way | No | 1 Integer, External FPU |
| NS-2 | PA 1.0 32-bit | 27.5MHz | ? | 1024KB I/D off-chip | | 16K I/D | 1-way | No | 1 Integer, External FPU |
| CMOS26B | PA 1.0 32-bit | 50MHz | 196k | 1024KB I/D off-chip | | 8K I/D | 1-way | Yes | 1 Integer, External FPU |
| PA-7000 | PA 1.1a 32-bit | 66MHz | 577k | 256KB I, 256KB D off-chip | | 96 I, 96 D | 1-way | No | 1 Integer, External FPU |
| PA-7100/PA-7150 | PA 1.1b 32-bit | 125MHz | 850k | 1MB I, 2MB D off-chip | | 120 | 2-way | Yes | 1 Integer, 1 Floating Point |
| PA-7100LC | PA 1.1c 32-bit | 100MHz | 900k | 1KB I on-chip | 2MB off-chip | 64 | 2-way | No | 2 Integer, 1 Floating Point |
| PA-7200 | PA 1.1d 32-bit | 140MHz | 1.3M | 2KB on-chip | 1MB I, 2MB D off-chip | 120 | 2-way | Yes | 2 Integer, 1 Floating Point |
| PA-7300LC | PA 1.1e 32-bit | 180MHz | 9.2M | 64KB I, 64KB D on-chip | 8MB off-chip | 96 | 2-way | No | 2 Integer, 1 Floating Point |
| PA-8000 | PA 2.0 64-bit | 230MHz | 4.5M | 1MB I, 1MB D off-chip | | 96 | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store |
| PA-8200 | PA 2.0 64-bit | 300MHz | 4.5M | 2MB I, 2MB D off-chip | | 120 | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store |
| PA-8500 | PA 2.0 64-bit | 440MHz | 140M | 512KB I, 1MB D on-chip | | 160 | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store |
| PA-8600 | PA 2.0 64-bit | 550MHz | 140M | 512KB I, 1MB D on-chip | | 160 | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store |
| PA-8700 | PA 2.0 64-bit | 875MHz | 186M | 768KB I, 1.5MB D on-chip | | 240 | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store |
| PA-8800 2-core | PA 2.0 64-bit | 1GHz | 300M | 2× 768KB I, 768KB D on-chip | 32MB off-chip | 2× 240 | 2× 4-way | Yes | 2× 4 Integer, 4 Floating Point, 2 Load/Store |
| PA-8900 2-core | PA 2.0 64-bit | 1.1GHz | 317M | 2× 768KB I, 768KB D on-chip | 64MB off-chip | 2× 240 | 2× 4-way | Yes | 2× 4 Integer, 4 Floating Point, 2 Load/Store |
| Hitachi PA/50 | PA 1.1 32-bit | 60MHz | 1.28M | 8KB I, 4KB D on-chip | | 32 I, 64 D | 1-way(?) | No(?) | 1 Integer, 1 Floating Point |
| Hitachi HARP-1 | PA 1.1 32-bit | 150MHz | 2.8M | 8KB I, 16KB D on-chip | 512KB I, 512KB D off-chip | 128 I, 128 D | 2-way | No(?) | 1 (2?) Integer, 1 Floating Point |
- ISA: Instruction set architecture, i.e. the version of the PA-RISC architecture and its width (integer register width and maximum addressable memory, 32-bit or 64-bit)
- FETs: Number of transistors
- L1/L2 Caches: Maximum amount of Level 1 and Level 2 cache memories — on-chip is integrated onto the CPU die while off-chip cache is implemented with separate chips (most PA-RISC processors supported larger off-chip caches than were implemented in actual products)
- TLB: (Maximum) size of the Translation Lookaside Buffer memory (in entries)
- SMP: Capability of the CPU to work in multi-processor configuration
- Units: Number of functional processing units, for integer and floating point arithmetic, and load/store operations
Early PA-RISC
The first PA-RISC processors, designed and used in the mid to late 1980s in the HP 9000/800 servers (and HP 3000 MPE/iX systems), are very poorly documented. Their exact nomenclature is not clear: one group of sources refers to them as TS-1, NS-1 and NS-2, while others call what are apparently the same processors PN-5, PN-7 and PN-10.
These early CPUs were still mostly chipsets: multiple separate chips and components formed the central processing unit (in contrast to the mostly single-chip post-PA-7000 implementations). The chips were based on TTL (Transistor-transistor logic), NMOS-III (Negative channel Metal-Oxide Semiconductor) and, lastly, CMOS26B.
An interesting aspect of these CPUs is their huge TLB sizes, from 2048 up to 16384 entries, an enormous number of entries when compared to other computers [...] (their successors and competitors typically had sizes in the low to mid hundreds).
TS-1
Used in: 840
Introduced in: 1986
The TS-1 was the very first production processor of the PA-RISC family and implemented version 1.0 of PA-RISC on six boards (each 8.4×11.3″) of TTL: I-unit (Instruction Unit; controls the instruction flow, executes branches and handles interrupts, traps etc.), Register File Board (contains the 32 32-bit general registers GR0-GR31 and 25 control registers in SRAMs), E-unit (Execution Unit; performs arithmetic and address calculation with the integer ALU and also handles load and store operations), TLB (translation lookaside buffer with 4096 entries for 2KB pages), Cache controller (contains the split instruction and data caches, 64KB each for I and D) and FPC (floating-point coprocessor; handles FP operations in parallel with the CPU/ALU). Each board contained about 150 ICs. TS-1 processors were only used in the 840 server systems.
- PA-RISC version 1.0 32-bit
- External FPU (the ADD/MUL/DIV chip was taken over from the HP 9000/550 FOCUS system)
- Three-stage pipeline
- 4096-entry TLB off-chip, direct-mapped
- Off-chip L1 cache of 128KB (I/D) direct-mapped/one-way associative
- Physical address space of 27-bit (128MB main memory could be addressed)
- 8MHz clock speed
- Six (some sources say five) printed circuit boards, implemented in FAST TTL and (25ns and 35ns) SRAMs/PALs
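The memory sizes quoted alongside the address widths on this page follow directly from powers of two. The following Python sketch recomputes them; the helper names `addressable_bytes` and `human` are hypothetical, and the "TLB reach" line simply multiplies the TS-1 entry count by its page size, taking the quoted figure of 4096 entries at face value.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of bytes reachable with a physical address of the given width."""
    return 1 << address_bits


def human(n: int) -> str:
    """Format a byte count with the largest binary unit that divides it evenly."""
    for unit, size in (("TB", 1 << 40), ("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if n >= size and n % size == 0:
            return f"{n // size}{unit}"
    return f"{n}B"


print(human(addressable_bytes(27)))  # TS-1: 27-bit physical addresses
print(human(addressable_bytes(29)))  # NS-1, NS-2, CMOS26B: 29-bit
print(human(addressable_bytes(40)))  # PA-RISC 2.0 chips: 40-bit

# Memory covered by a fully populated TS-1 TLB: 4096 entries of 2KB pages.
tlb_reach = 4096 * 2 * 1024
print(human(tlb_reach))
```

The results match the figures in the text (128MB, 512MB and one Terabyte), and show that even the large early TLBs mapped only a few megabytes at a time because of the small page size.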
NS-1
Used in: 825, 835, 850
Introduced in: 1987
The first implementation of PA-RISC (1.0) in an NMOS fabrication process followed shortly after the original TTL-based TS-1 and was accordingly called NS-1. It implemented the central processing unit on a single chip and needed several other ICs to complete the whole processor (including external FPU and cache chips).
The NS-1 processor is contained on one circuit board (two on the 825) and integrates the complete CPU as a single NMOS-III chip, accompanied by eight other (NMOS-III) VLSI chips: SIU (system interface unit), two CCUs (cache controller units CCU0 and CCU1), TCU (TLB controller unit), MIU (math interface unit, which speaks to the FP chips) and three third-party floating point (FP) chips (ADD, MUL and DIV). Cache and TLB memory were implemented in separate chips, their sizes varying between computer models: from 16KB up to 128KB of cache and TLBs with 2048 up to 4096 entries.
- PA-RISC version 1.0 32-bit
- External third-party FPU
- Three-stage pipeline
- 2048-4096-entry TLB off-chip
- Off-chip L1 cache of 16KB (HP 9000/825) to 128KB (others), unified
- Physical address space of 29-bit (512MB main memory could be addressed)
- 25-30MHz clock speed
- One circuit board (two boards on HP 9000/825), 144,000 FETs, implemented in NMOS-III packaged in a 272-pin ceramic PGA package
- CPU itself is a single chip, accompanied by eight other VLSI chips (cache/TLB, FPU and I/O and bus attachments)
NS-2
Used in: 822, 832, 845, 855, 860
Introduced in: 1989-1990
The last NMOS-based PA-RISC processor was the NS-2, a tweaked follow-on to the NS-1 with a deeper pipeline (five stages instead of three), new TLB and cache controllers and significantly larger caches (1MB) and TLBs (16K).
The NS-2 design was simplified compared with its NS-1 predecessor. The processor is contained on one processor circuit board and integrates the CPU as a single NMOS-III chip, with seven other VLSI (NMOS-III) chips alongside it: SIU (system interface unit), two CCUs (cache controller units, split into instruction and data: ICCU and DCCU), TCU (TLB controller unit), FPC (floating point controller [or coprocessor], speaks to the FP chips) and two third-party floating point (FP) chips (ADD, MULTI). The bus structure connecting these chips was updated (and simplified), with the CPU having private connections to the cache and TLB controllers (for which the NS-1 CPU had to use the shared cache bus).
- PA-RISC version 1.0 32-bit
- External third-party FPU
- Five-stage pipeline
- 16384-entry TLB off-chip
- Off-chip L1 cache up to 1024KB, split into I/D
- Physical address space of 29-bit (512MB main memory could be addressed)
- 27.5MHz clock speed
- One circuit board, implemented in NMOS-III
- CPU itself is a single chip, needs seven VLSI support chips for cache/TLB, FPU and memory/bus interfaces
CMOS26B (PCX)
Used in: 842, 852, 865, 870
Introduced in: 1990?
The last PA-RISC 1.0 design was called either CMOS26B or PCX and was the first PA-RISC processor fabricated in a CMOS process. It implemented the NS-1/NS-2 NMOS-based design in a much more compact CMOS die and integrated several of the processor functions previously supplied on external VLSI chips onto the CPU die. The support chips still needed by the CMOS26B include: three CMUX (cache multiplexers: one instruction, two data; equivalent to the earlier CCUs), SPI (SMB to processor interface; SMB is the system main bus), FPC (floating point coprocessor) and two FP chips (MUL/DIV and ADD/SUB) [it is not completely clear whether the latter two or the latter three chips are third-party].
- PA-RISC version 1.0 32-bit
- First multi-processor-capable PA-RISC CPU (up to four-way SMP)
- Direct predecessor of the PA-7000 (PCXS) processor which integrated most processor logic minus the FPU onto a single die/chip
- External FPU
- 8192-entry TLB on-chip
- Off-chip L1 cache up to 1024KB, split into I/D (apparently asymmetrical 1:2 I/D)
- Physical address space of 29-bit (512MB main memory could be addressed)
- 50MHz clock speed
- One circuit board, 196,000 FETs, 1.0 micron, implemented in CMOS (CMOS26B)
- CPU is a single chip, needs seven other (VLSI) support chips for memory/bus interfaces and I/O
CMOS26B is quite possibly not the correct name of this CPU/chipset: it is in fact the name of the HP CMOS fabrication process in which this processor was fabbed. The processor was apparently also called PCX, with CMOS26B just being a reference to its fab process. Some sources also mention a CS-1 processor; the nomenclature would point to a CMOS-based design, but the performance figures and charts do not really match the CMOS26B/PCX described here.
References
- Wayne E. Holt (ed.), Beyond RISC! An Essential Guide to Hewlett-Packard Precision Architecture (January 1988: Software Research Northwest Inc.)
- Hardware Design of the First HP Precision Architecture Computers (PDF) David A. Fotland et al (March 1987: Hewlett-Packard Journal)
- HP 3000 Series 950 and HP 9000 Model 850S Family CE Handbook (PDF) Hewlett-Packard Company (October 1990. Accessed January 2008 at hpmuseum.net)
- HP 9000 Series 800 Model 825S Hardware Technical Data (PDF) Hewlett-Packard Company (September 1988. Accessed January 2008 at hpmuseum.net)
- HP 3000/925 and HP 9000/825/835 Computer Systems CE Handbook (PDF) Hewlett-Packard Company (May 1988. Accessed January 2008 at hpmuseum.net)
- New midrange members of the Hewlett-Packard Precision Architecture Computer Family Thomas O. Meyer et al (June 1989: Hewlett Packard Journal. Accessed January 2008 at findarticles.com)
- HP 9000 Series 800 Model 822S/832S Technical Data (PDF) Hewlett-Packard Company (1989. Accessed January 2008 at hpmuseum.net)
PA-7000 (PCX-S) (Cheetah)
Used in
Time of introduction
1991
Overview
The PA-7000 was the first PA-RISC 1.1 CPU implementation and saw its first uses in the first PA-RISC 700 series workstations and later on in some of the Nova servers. It was still a multi-chip implementation.
Details
- PA-RISC version 1.1a 32-bit
- Needs external FPU (commonly used was a coprocessor developed by HP and Texas Instruments)
- Five-stage pipeline
- 96/96 I/D TLB
- 4/4 I/D BTLB
- 32-bit bus to I cache, 64-bit bus to D cache
- Off-chip caches up to 256KB/256KB I/D
- Up to 66MHz frequency with 5.0V core voltage
- 14.2×14.2 mm² die, 577,000 FETs, 1.0 micron, 2-layer CMOS (FPU fabbed in 0.8 micron)
References
- Various
- Évolution des gammes de processeurs MIPS, DEC Alpha, PowerPC, SPARC, x86 et PA-RISC (PDF) André Seznec and Thierry Lafage (INRIA: June 1997)
PA-7100/PA-7150 (PCX-T) (Thunderbird)
Used in
- 715/{33,50,75}
- 725/{50,75}
- 735/{99,125}
- 742i/50
- 745i/{50,100}
- 747i/{50,100}
- 755/{99,125}
- G50, G60, G70
- H50, H60, H70
- I50, I60, I70
- T500, T520
- Convex SPP1000/{CD,XA}
- Stratus Continuum 610S, 610, 615S, 615, 620, 625, 1220, 1225, 1245
Time of introduction
Early 1992 (PA-7150: 1994)
Overview
The PA-7100 is a superscalar processor and is therefore able to issue more than one instruction at a time. It is the first PA-RISC CPU to integrate the ALU and FPU on a single die, saving board space and lowering production cost. The communication channel between the PA-7100 and its instruction cache has been doubled, which enables this CPU to achieve instruction-level parallelism: multiple consecutive instructions are fetched by the CPU and simultaneously dispatched to independent integer and floating point units.
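The effect of dispatching an integer and a floating point instruction together can be sketched with a toy cycle counter. This is a minimal illustration under loud assumptions: the function name `count_cycles` is hypothetical, every instruction takes one cycle, there are no data dependencies, stalls or alignment restrictions, and two consecutive instructions issue together only when one targets the integer unit and the other the floating point unit. It is not a model of the actual PA-7100 issue logic.

```python
def count_cycles(stream):
    """Count issue cycles for a list of 'int' / 'fp' instruction tags.

    Two consecutive instructions share a cycle only if they need
    different units (one integer, one floating point).
    """
    cycles = 0
    i = 0
    while i < len(stream):
        pair = stream[i:i + 2]
        if len(pair) == 2 and set(pair) == {"int", "fp"}:
            i += 2  # both instructions issue in this cycle
        else:
            i += 1  # only the first instruction issues
        cycles += 1
    return cycles


print(count_cycles(["int", "fp", "int", "fp"]))    # well-mixed code
print(count_cycles(["int", "int", "int", "int"]))  # integer-only code
```

Under these assumptions the well-mixed stream needs half as many cycles as the integer-only one, which is why instruction scheduling by the compiler matters for a design with one integer and one floating point unit.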
Connection to memory and I/O is provided by the external Processor-Memory Interface (PMI) chip, to which the PA-7100 attaches via the P-bus. The PA-7100 is apparently multiprocessing capable, with two alternative strategies: either two PA-7100s share the same P-bus to a (shared) PMI, or each PA-7100 is attached to its own PMI, which shares the memory and I/O bus with the other PA-7100/PMI pairs.
The PA-7150 is a PA-7100 with tweaks to the core and cache subsystem to allow clock frequencies up to 125MHz.
Details
- PA-RISC version 1.1b 32-bit
- Two functional units: 1 integer ALU, 1 Floating Point unit
- 2-way superscalar
- CPU, FPU, MMU and cache controller on one VLSI chip
- Five-stage pipeline
- Pipeline store technique for reduction of penalty for execution of any store to data cache
- Stall-on-use mechanism for parallel processing of instruction streams and cache misses
- 3-instruction queue
- Hardware TLB miss handler
- Hardware static branch support
- I/D cache bypass (7150)
- Off-chip L1 caches up to 1MB I and 2MB D realized in asynchronous standard SRAMs
- I/D caches are both 64-bit per access, direct mapped, parity protected and cycled at CPU clock
- Caches are software accessible
- Caches are virtually indexed and physically tagged to minimize latency
- 120-entry fully associative TLB
- 16-entry BTLB with programmable page sizes up to 64MB
- P-bus system bus, speed interface programmable to 1.0, .67 and .50 of processor speed
- Double-precision floating point latency of 2 cycles at 100MHz; load-use penalty of one cycle; branch penalty of 0 cycles (predicted) and 1 cycle (mispredicted)
- Two different multiprocessing connection strategies supported (shared PMI and dedicated PMIs)
- MP cache coherency support
- Up to 100MHz frequency (PA-7100) with 5.0V core voltage
- Up to 125MHz frequency (PA-7150) with 5.0V core voltage
- 14.0×14.0 mm² die, 850,000 FETs, 0.8 micron, 3-layer metal CMOS (CMOS26B process) packaged in a 504-pin ceramic PGA package
- Power dissipation of 30W at 100MHz
References
- Various
- Évolution des gammes de processeurs MIPS, DEC Alpha, PowerPC, SPARC, x86 et PA-RISC (PDF) André Seznec and Thierry Lafage (INRIA: June 1997)
PA-7100LC (PCX-L) (Hummingbird)
Used in
- 712/{60,80,100}
- 715/{64,80,100}
- 725/100
- 743i/{64,100}
- 748i/{64,100}
- D200, D210, D300, D310
- E25, E35, E45, E55
- Hitachi 3050RX 225, 235
- SAIC Galaxy 1100
Time of introduction
1994
Overview
The PA-7100LC was primarily designed as a single-chip solution for low-cost systems while still delivering the performance of 1991 high-end workstations and servers. In contrast to earlier PA-RISC version 1.1 implementations, which needed several support chips for the MPU, the PA-7100LC integrates the CPU, FPU, MIOC (memory and I/O controller) and a first-level cache on a single VLSI chip. Both CPU and FPU support the PA-RISC 1.1 Edition 3 ISA.
Details
- PA-RISC version 1.1c 32-bit
- Three functional units: 2 integer ALUs, 1 Floating Point unit (see the note on the integer ALUs at the end of this list)
- 2-way superscalar
- Five-stage pipeline
- DRAM-memory & cache controller (MIOC) integrated on die
- 1KB on-chip I L1 instruction cache, direct mapped, 64-bit per access, prefetch from off-chip I cache
- 8KB-2MB off-chip unified I/D L1 cache, direct mapped, hashed address, virtual index, 480-600MB/s bandwidth
- The 1KB on-chip I cache is not really considered a true cache, thus the off-chip cache in fact is the system’s real L1 cache
- 32-Byte cache line size
- Support for bi-endian load-store operations
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Floating Point load-store to I/O space
- 64-entry unified I/D TLB, fully associative, 4K page size
- 8-entry BTLB, page sizes from 512K - 64M
- 64-bit wide load/store operations
- I and D cache bypassing
- Stall on use D cache miss policy
- Don’t fill on miss cache hint
- Hardware TLB miss handler support
- Hardware static branch prediction
- GSC bus interface
- 64-bit ECC interface to the main memory
- Instruction line prefetch from main memory
- Up to 100MHz clock
- Not MP capable
- 14.2×14.2 mm² die, 900,000 FETs, 0.75 micron, 3-layer aluminium process packaged in a 432-pin PGA
- Only one of the two integer ALUs is able to handle loads, stores and shifts; these operations can only be paired with simple math operations, like integer addition or multiplication. Both units can handle branch operations.
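How a direct-mapped cache with 32-byte lines locates data can be sketched as a simple address split. This is a minimal illustration, assuming a plain modulo index; the list above states that the PA-7100LC hashes the address and indexes with the virtual address, neither of which is modelled here. The name `split_address` is hypothetical.

```python
LINE_SIZE = 32  # bytes per cache line, as listed above


def split_address(addr: int, cache_bytes: int, line_size: int = LINE_SIZE):
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    offset: byte position inside the cache line
    index:  which cache line the address maps to (exactly one candidate)
    tag:    remaining upper bits, stored to tell apart addresses sharing a line
    """
    lines = cache_bytes // line_size
    offset = addr % line_size
    index = (addr // line_size) % lines
    tag = addr // (line_size * lines)
    return tag, index, offset


cache_size = 2 * 1024 * 1024  # 2MB, the maximum off-chip cache listed above

a = 0x123456
b = a + cache_size  # exactly one cache size further on

print(split_address(a, cache_size))
print(split_address(b, cache_size))  # same index and offset, different tag
```

Two addresses that differ by a multiple of the cache size compete for the same line, which is the main weakness of a direct-mapped organisation and one reason a design might hash the index.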
References
- PA7100LC ERS (External Reference Specification) (PDF, 410KB)
- Detailed official description of the PA-7100LC processor and its microarchitecture. Hewlett-Packard Company (1999).
PA-7200 (PCX-T') (Thunderbird')
Used in
- C100, C110
- D250, D260, D350, D360
- J200, J210
- K100, K200, K210, K220, K400, K410, K420
- Convex SPP1200/{CD,XA}
- Convex SPP1600/{CD,XA}
Time of introduction
Early 1995
Overview
The PA-7200 is leveraged from the original PA-7100 design; big parts of the core were just shrunk for the new 0.55 micron CMOS14A process. The FPU was taken over completely unchanged, retaining the same latencies for addition and multiplication even at a higher clock rate. It also inherited the cache design, i.e. it had (for the time) big off-chip caches clocked at full CPU speed (140MHz). This chip was aimed at high-performance general-purpose applications but also at specialized applications that used large working sets and could take advantage of the high-bandwidth bus interface.
Details
- PA-RISC version 1.1d 32-bit
- Three functional units: 2 integer ALUs, 1 Floating Point
- 2-way superscalar
- FPU, MMU, cache controller integrated on die
- Five-stage pipeline
- 2KB on-chip “assist” L1 cache, fully associative, holds 64 32-Byte cache lines
- Off-chip L1 caches up to 1MB I and 2MB D realized in asynchronous SRAMs with one cycle latency
- (the 2KB on-chip assist cache is not really considered a true cache, thus the off-chip cache is the system’s de-facto L1 cache.)
- Caches are 64-bit per access, direct mapped, parity protected and cycled at CPU speed
- Caches are virtually indexed and physically tagged to minimize latency
- 120-entry fully associative TLB
- 16-entry BTLB
- Hardware TLB miss support
- Six predecode bits
- Support for uncached memory pages
- Bi-endian support
- System interface speed programmable to 1.0, 0.75 and 0.67 of processor speed
- Runway system/memory bus, 64-bit wide, 120MHz, 960MB/s max. bandwidth
- Glueless interface to the system bus for up to four-way SMP (four CPUs on same Runway processor bus)
- Can have up to six bus-transactions in progress at once
- Up to 140MHz frequency with 4.4V core and 3.3V I/O voltage
- 14.0×15.0 mm² die, 1,300,000 FETs, 0.55 micron, 3-layer metal CMOS (CMOS14A process) packaged in a 540-pin ceramic PGA package
- Power dissipation of 29W at 140MHz
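The 960MB/s maximum bandwidth quoted for the Runway bus is simply the product of bus width and bus clock. The sketch below recomputes it, assuming one data transfer per bus clock with no address or arbitration overhead and 1MB = 10^6 bytes; the function name `peak_bandwidth_mb_per_s` is hypothetical.

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per bus clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# Runway figures from the list above: 64 bits wide at 120MHz.
print(peak_bandwidth_mb_per_s(64, 120))
```

Sustained throughput on a real bus is lower than this peak, since some cycles carry addresses or are lost to arbitration.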
References
- Design of the HP PA 7200 CPU (PDF, 170KB)
- Overview on the PA-7200 internals and memory/cache architecture. Kenneth K. Chan et al (February 1996: Hewlett-Packard Journal).
- A Different Kind of RISC
- PA-7200 general overview. Dick Pountain (August 1994: BYTE Journal).
PA-7300LC (PCX-L2) (Velociraptor)
Used in
- 744/{132L,165L}
- 745/132L, 745/165L
- 748/132L, 748/165L
- A180, A180C
- B132L, B132L+, B160L, B180L+
- C132L, C160L
- D220, D230, D320, D330
- RDI PrecisionBook 132, 160, 180
- Hitachi 3050RX 255, 355E, 365E
Time of introduction
Mid 1996
Overview
The PA-7300LC is the direct descendant of the PA-7100LC and likewise designed for low-cost systems. It is still a PA-RISC 1.1 32-bit processor, in contrast to the new PA-RISC 2.0 64-bit PA-8000 introduced in the same timeframe. While the PA-7300LC is rather close to the original PA-7100LC design, it has several significant enhancements:
- Large on-chip L1 caches (in contrast to the small “assist” caches of the 7100LC and 7200)
- L2 controller integrated into the MIOC
- Improved bus interface (faster GSC)
The then-current process technologies made it possible to include a large L1 cache on the CPU die, breaking a long-standing HP tradition of (large) off-chip L1 caches. The PA-7300LC was the last PA-RISC version 1.1 CPU; all later workstations and servers used 64-bit PA-RISC 2.0 processors.
Details
- PA-RISC version 1.1e 32-bit
- Three functional units: 2 integer ALUs, 1 Floating Point unit (see the note on the integer ALUs at the end of this list)
- 2-way superscalar
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications (not explicitly mentioned on the PA7300LC, but its documentation states support for MAX-1 instructions)
- 64KB/64KB I/D on-chip L1 caches, each two-way set associative, virtually indexed
- Cache line size of 32 Byte
- Caches have a 64-bit datapath to the execution units, 256-bit datapath to main memory
- Optional unified I/D L2 off-chip cache, up to 8192KB
- No hashing for both I and D caches
- L2 cache is write-through, direct mapped, physically indexed and physically tagged
- Instruction prefetch buffer moved from memory controller to L1 instruction cache, thus allowing prefetch hits without penalty
- On-chip MIOC memory controller
- 96-entry unified I/D TLB
- 8-entry BTLB
- 4-entry ILAB
- GSC system bus interface (implements GSC+ features), maximum clock frequency of 40MHz; actual systems implement 33MHz (132MB/s), 36MHz (140MB/s) and up to 40MHz (160MB/s)
- Either 64-bit or 128-bit datapath from execution units to the memory
- Up to 180MHz frequency with 3.3V core voltage
- 15.3×17.0 mm² die, 9,200,000 FETs, 0.5 micron, 4-layer metal CMOS (CMOS14C process) packaged in a 464-pin ceramic PGA package
- Only one of the two integer ALUs is able to handle loads, stores and shifts; these operations can only be paired with simple math operations, like integer addition or multiplication. Both units can handle branch operations.
References
- PA7300LC ERS (External Reference Specification) (PDF, 716KB)
- Detailed official description of the PA-7300LC processor and its microarchitecture. Hewlett-Packard Company (1996).
- The PA-7300LC: the first “System on a Chip” (archive.org mirror)
- Presentation prepared for Microprocessor Forum 1995 summarizing the PA-7300LC. Tom Meyer (1996).
- The PA 7300LC Microprocessor: A Highly Integrated System on a Chip (PDF, 50KB).
- Shorter summary of the PA-7300LC’s design objectives and goals. Terry W. Blanchard and Paul G. Tobin (June 1997: Hewlett-Packard Journal).
PA-8000 (PCX-U) (Onyx)
Used in
- C160, C180
- D270, D280, D370, D380
- J280, J282
- K250, K260, K450, K460
- R380
- T600
- Convex SPP2000 (S-Class)
- Stratus Continuum 628, 1228
Time of introduction
January 1996
Overview
The PA-8000 is the first chip to implement the 64-bit PA-RISC 2.0 architecture, which includes many extensions to support 64-bit computing. All integer registers and functional units (ALU, shift/merge) have been widened to 64 bits, providing native 64-bit integer arithmetic. The flat virtual address space is 64 bits wide, although most PA-RISC version 2.0 CPUs only support a physical address space of 40 bits. Other extensions include fast TLB insert instructions, memory prefetch instructions, support for variable-sized pages, branch prediction hinting and FPMAC (Floating Point Multiply Accumulate) units. The instruction decode logic is not integrated with the functional units’ pipeline logic, which allows the chip to partially decode instructions well in advance of their actual execution by the functional unit(s).
A key feature of the PA-8000 is the IRB (Instruction Reorder Buffer). Due to restrictions on compiler scheduling, the design team decided that the CPU should perform its own instruction scheduling. The IRB can store up to 28 computation and 28 load/store instructions; it tracks interdependencies between these instructions and allows execution as soon as the instructions are ready. Branch prediction outcomes are also tracked, and thanks to this rescheduling the CPU can execute instructions past cache misses. The IRB is the key part of the chip’s out-of-order (OoO) execution capability.
In short, the PA-8000 is a decoupled architecture with four-instruction dispatch and aggressive OoO execution. It additionally has dual floating-point and dual load/store units, a large OoO dispatch window and, following a long HP tradition, no on-chip caches. The (large) primary caches have been kept off-chip to increase the amount of data that can be accessed in a single cycle. Although the latency of the caches is roughly two cycles, this can be hidden with complete pipelining, resulting in practically one access per cycle. Nothing in the design of this chip was leveraged from previous chip designs.
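The idea behind the reorder buffer, issuing an instruction as soon as its operands are ready rather than in program order, can be sketched in a few lines of Python. This is a toy dataflow scheduler under loud assumptions, not a model of the actual IRB: the names `Instr` and `schedule` are hypothetical, every register is written at most once (as if already renamed), no instruction reads its own destination, the issue width of four is an illustrative choice, and buffer capacity and in-order retirement are ignored.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str          # register written by this instruction
    srcs: tuple        # registers read by this instruction
    latency: int = 1   # cycles until the result is available


def schedule(program, issue_width=4):
    """Return {instruction name: issue cycle} for a simple out-of-order issue."""
    produced = {ins.dest for ins in program}  # registers written inside the program
    ready_at = {}                             # register -> cycle its value is ready
    issue_cycle = {}
    pending = list(program)
    cycle = 0
    while pending:
        issued_now = []
        for ins in pending:  # oldest first
            if len(issued_now) == issue_width:
                break
            # A source is ready if it is a program input, or its producer has finished.
            if all(s not in produced or ready_at.get(s, float("inf")) <= cycle
                   for s in ins.srcs):
                issued_now.append(ins)
        for ins in issued_now:
            pending.remove(ins)
            issue_cycle[ins.name] = cycle
            ready_at[ins.dest] = cycle + ins.latency
        cycle += 1
    return issue_cycle


program = [
    Instr("load", "r1", ("r10",), latency=10),  # long latency, e.g. a cache miss
    Instr("add", "r2", ("r1",)),                # must wait for the load
    Instr("mul", "r3", ("r11", "r12")),         # independent of the load
    Instr("sub", "r4", ("r3",)),                # depends only on mul
]
print(schedule(program))
```

In this example `mul` and `sub` issue while `add` is still waiting for the slow load, which is the behaviour the text describes as executing instructions past cache misses.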
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 96-entry fully-associative dual-ported TLB
- TLB miss penalty of 61 cycles
- 32-entry BTAC (Branch Target Address Cache)
- 256-entry BHT (Branch History Table)
- Dynamic and static branch prediction modes
- Off-chip L1 caches up to 1MB I and 1MB D, realized in synchronous 6.7ns (150MHz) late-write 1Mb SRAMs, one cycle latency
- Caches are direct-mapped and dual-ported
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Each instruction includes five predecode bits
- Bi-endian support
- Runway system/memory bus, 120MHz, 64-bit wide, featuring split transactions and glueless multiprocessing. Max. throughput of 960MB/s
- Up to 180MHz frequency with 3.3V core voltage
- 17.7×19.6 mm² die, 4,500,000 FETs, 0.5 micron, 5-layer metal CMOS packaged in a 1,085-pin flip-chip LGA package
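A branch history table such as the 256-entry BHT listed above can be sketched as a small array of counters indexed by branch address. The text does not describe how each entry works, so this sketch assumes a common textbook scheme, a 2-bit saturating counter per entry, purely for illustration; the class name `BranchHistoryTable`, the helper `count_correct` and the indexing by low address bits are likewise assumptions.

```python
class BranchHistoryTable:
    """Toy dynamic branch predictor: one 2-bit saturating counter per entry."""

    def __init__(self, entries: int = 256):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, pc: int) -> int:
        # Instructions are 4 bytes long, so the two lowest address bits carry no information.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        """Predict taken when the counter is in its upper half (2 or 3)."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter towards the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(bht: BranchHistoryTable, pc: int, outcomes) -> int:
    """Run one branch through the predictor and count correct predictions."""
    correct = 0
    for taken in outcomes:
        if bht.predict(pc) == taken:
            correct += 1
        bht.update(pc, taken)
    return correct


# A loop branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
print(count_correct(BranchHistoryTable(), 0x4000, loop_branch), "of", len(loop_branch))
```

With a table of limited size, different branches whose addresses share the same low bits use the same entry, so a larger table reduces such collisions.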
References
- Advanced Performance features of the 64-bit PA-8000 (archive.org mirror)
- Detailed description of the PA-8000 innards, presented at CompCon 95. Doug Hunt (1995: IEEE CS Press). [Article reprint for vanished cpus.hp.com]
- PA-8000 Combines Complexity and Speed (archive.org mirror)
- More general introduction to the PA-8000. Linley Gwennap (1994: Microprocessor Report, Volume 8 Number 15). [Article reprint for vanished cpus.hp.com]
- Four-Way Superscalar PA-RISC Processors (PDF, 190KB)
- Overview of the PA-8000 and its successor PA-8200 with an eye on their execution capabilities. Anne P. Scott et al (August 1997: Hewlett-Packard Journal).
PA-8200 (PCX-U+) (Vulcan)
Used in
- C200, C240
- D390
- J2240
- K370, K380, K570, K580
- R390
- V2200, V2250
Time of introduction
May 1997
Overview
Shortly after the introduction of the PA-8000 the design team noted several aspects of this chip for improvement in the successor:
- Branch prediction
- TLB miss rates
- Cache sizes
The new chip was to offer improved performance, compatibility with existing applications and a short time to market. The whole design was to be heavily leveraged from the existing PA-8000 design foundation.
The availability of new 4Mb SRAMs with faster access times allowed an increased CPU clock speed and a bigger cache size. Furthermore, the team’s analysis showed that PA-8200 performance could be enhanced significantly if the cycles wasted while waiting for instructions and data were reduced. It was therefore concluded that enlarging the BHT, TLB and caches would be “high benefit, low risk” improvements.
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 120-entry fully-associative dual-ported TLB
- 42-entry BTAC (Branch Target Address Cache)
- 1024-entry BHT (Branch History Table)
- Dynamic and static branch prediction modes
- Off-chip L1 caches up to 2MB I and 2MB D, realized in synchronous 5ns (200MHz) late-write 4Mb SRAMs, one cycle latency
- Caches are direct-mapped and dual-ported
- 56-entry instruction queue/reorder buffer (IRB)
- Each instruction includes five predecode bits
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Bi-endian support
- Runway system/memory bus, 120MHz, 64-bit wide, featuring split transactions and glueless multiprocessing. Max. throughput of 960MB/s
- Up to 300MHz frequency with 3.3V core voltage
- 17.7×19.6 mm² die, 4,500,000 FETs, 0.5 micron, 5-layer metal CMOS packaged in a 1,085-pin flip-chip LGA package
References
- Four-Way Superscalar PA-RISC Processors (PDF, 190KB)
- Overview of the PA-8000 and its successor PA-8200 with an eye on their execution capabilities. Anne P. Scott et al (August 1997: Hewlett-Packard Journal).
- HP Pumps Up PA-8x00 Family (archive.org mirror)
- Description and results of the improvements made in PA-8200 and PA-8500. Linley Gwennap (October 1996: Microprocessor Report, Volume 10 Number 14). [Article reprint for vanished cpu.hp.com]
PA-8500 (PCX-W) (Vulcan)
Used in
- A400-44 (rp2400), A500-44 (rp2450)
- B1000, B2000
- C360, C3000
- J5000, J7000
- L1000-36, L1000-44 (rp5400), L2000-36, L2000-44 (rp5450)
- N4000-36, N4000-44 (rp7400)
- V2500
- Stratus Continuum 419, 429, 616S, 616, 619, 629, 1219, 1229
Time of introduction
September 1998
Overview
The PA-8500 is a direct evolution of the PA-8000 and PA-8200 processors; the processing core was carried over with only minor changes. However, for the first time in a PA-8x00 CPU, a large L1 cache was integrated directly onto the CPU die, breaking with a long-standing HP tradition of keeping the L1 caches off-chip (although the PA-7300LC, two years older, already included a smaller on-chip L1 cache). Other improvements include a bigger TLB and BHT. The PA-8500 is a full 64-bit chip and as such supports a flat 64-bit virtual address space, although only 40 physical address bits are used by the chip, corresponding to one terabyte of directly addressable memory. Backward compatibility with older 32-bit PA-RISC CPUs is provided.
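The relation between physical address bits and addressable memory can be checked with a short calculation. This is only a sketch of the arithmetic, with "terabyte" taken in the binary sense (2^40 bytes); the helper name is an assumption for illustration:

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 1 << address_bits


TIB = 1 << 40  # one terabyte in the binary sense

print(addressable_bytes(40) // TIB)  # 1, the PA-8500's 40 physical address bits
print(addressable_bytes(44) // TIB)  # 16, the 44 bits listed for the PA-8700
```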
The big challenge in developing the PA-8500 was its huge on-chip cache. It had to fit onto the allocated die area and be able to keep up with the IRB. A cache design similar to that of its predecessors was used, although the RAM cells for the cache now sat directly on the die. The data cache is composed of two 0.5MB banks, each implemented with four 0.125MB arrays and providing error correction. The data is organized in such a way that either a full cache line or all four ways of associativity can be addressed at once. The instruction cache is implemented as one bank of 0.5MB four-way set-associative pipelined cache, providing 128 bits of instruction plus pre-decode bits per cycle.
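To get a feeling for the cache organization described above, the number of sets can be derived from capacity, associativity and line size. The sketch below assumes the 1MB data and 0.5MB instruction cache sizes and the 64-byte line option from the PA-8500 details list; the function name is illustrative:

```python
def cache_sets(capacity_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    return capacity_bytes // (ways * line_bytes)


MB = 1024 * 1024

data_sets = cache_sets(1 * MB, ways=4, line_bytes=64)          # 1MB D cache
instruction_sets = cache_sets(MB // 2, ways=4, line_bytes=64)  # 0.5MB I cache

# 128 bits of instruction per cycle, at one 32-bit word per instruction
instructions_per_fetch = 128 // 32

print(data_sets, instruction_sets, instructions_per_fetch)  # 4096 2048 4
```

Four instructions per fetch matches the 4-way superscalar width of the core.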
Like its predecessors, the PA-8500 is able to execute instructions speculatively: the processor guesses which path the instruction stream will take and executes instructions along it. If the guess is found to be incorrect, the speculatively executed instructions are discarded. Speculative execution is aided by a branch prediction mechanism based on the branch history table (BHT).
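How a branch history table can drive such guesses is sketched below. The sketch assumes a generic scheme of two-bit saturating counters indexed by the branch address; this is a textbook technique chosen for illustration and not a description of the PA-8500's actual predictor. Only the table size of 2048 entries is taken from the details list.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, entries=2048):
        if entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.counters = [1] * entries  # 0..1 predict not taken, 2..3 predict taken

    def _index(self, pc):
        # Instructions are one 32-bit word, so the low two address bits
        # carry no position information and are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(bht, pc, outcomes):
    """Replay branch outcomes; each miss stands for discarded speculative work."""
    misses = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            misses += 1
        bht.update(pc, taken)
    return misses


# A loop branch: taken nine times, then falls through, repeated three times.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredictions(BranchHistoryTable(), 0x4000, loop_branch))  # 4
```

After the warm-up miss, only the loop exit is mispredicted on each pass, which is why a history table pays off for loop-heavy code.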
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 160-entry fully-associative dual-ported TLB
- 32-entry BTAC (branch target address cache)
- 2048-entry BHT (branch history table)
- Dynamic and static branch prediction modes
- On-chip L1 caches 0.5MB I and 1MB D, each 4-way set associative
- 32 or 64 Byte cache line size
- Supports up to 1 TB of physically addressable memory (40-bit physical addresses)
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Bi-endian support
- Runway system/memory bus, 125MHz, 64-bit, DDR (double data rate), about 2.0GB/s peak bandwidth
- Up to 440MHz frequency with 2.0V core voltage
- 21.3×22.0 mm2 die, 140,000,000 FETs, 0.25 micron, 5-layer metal CMOS packaged in a 544-pin LGA package
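The MAX-2 subword arithmetic mentioned in the list above treats a 64-bit register as four independent 16-bit lanes. A minimal sketch of a lane-wise add with wrap-around (modular) arithmetic follows; the function name `halfword_add` is hypothetical and the code models the idea in software, not the actual instruction encoding:

```python
def halfword_add(a, b, lanes=4, lane_bits=16):
    """Add two packed registers lane by lane; carries never cross a lane boundary."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for i in range(lanes):
        shift = i * lane_bits
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift  # drop the carry out of the lane
    return result


a = 0x0001_FFFF_0010_7FFF
b = 0x0001_0001_0020_0001

print(hex(halfword_add(a, b)))            # 0x2000000308000
print(hex((a + b) & (2 ** 64 - 1)))       # 0x3000000308000
```

The two results differ in the top lane: an ordinary 64-bit add lets the carry from `0xFFFF + 0x0001` spill into the neighboring halfword, while the subword add keeps the four sums independent.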
References
- HP Pumps Up PA-8x00 Family (archive.org mirror)
- Description and results of the improvements made in PA-8200 and PA-8500. Linley Gwennap (October 1996: Microprocessor Report, Volume 10 Number 14). [Article reprint for vanished cpu.hp.com]
- A 500 MHz 1.5 MByte Cache with On-Chip CPU (PDF, 141KB)
- Slides of a presentation on the PA-8500 CPU. Jonathan Lachman and J. Michael Hill (1997: ISSCC).
- PA-8500: The Continuing Evolution of the PA-8000 Family (archive.org mirror)
- Description of PA-8500 development and technical details. Gregg Lesartre and Doug Hunt (1997: Proceedings of CompCon, IEEE CS Press). [Article reprint for vanished cpu.hp.com]
PA-8600 (PCX-W+) (Landshark)
Used in
- A400-5X (rp2400), A500-5X (rp2450)
- B2000 (some), B2600
- C3600
- J5600, J6000, J7600
- L1000-5X (rp5400), L2000-5X (rp5450)
- L1500-5X (rp5430), L3000-5X (rp5470)
- N4000-5X (rp7400)
- V2600
- Superdome
- Stratus Continuum 439, 449, 651-2, 652-2, 1251-2, 1252-2
Time of introduction
January 2000
Overview
The PA-8600 was essentially a PA-8500 with minor modifications to make it fit a new manufacturing process in order to achieve higher clock speeds. One of the few real changes applied to the original design was a quasi-LRU replacement policy for the instruction cache. Moreover, the interface to the Runway bus apparently was slightly modified, and the ordering of bus transactions reworked.
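A quasi-LRU policy approximates true least-recently-used replacement with less bookkeeping. The source does not say which approximation the PA-8600 uses; the sketch below assumes the common tree-based pseudo-LRU scheme for one four-way set, which needs only three bits per set:

```python
class TreePLRUSet:
    """One 4-way cache set tracked with three tree bits instead of a full LRU order."""

    def __init__(self):
        # Each bit points toward the side that should be evicted next.
        self.root = 0   # 0: evict from ways 0/1, 1: evict from ways 2/3
        self.left = 0   # 0: way 0, 1: way 1
        self.right = 0  # 0: way 2, 1: way 3

    def touch(self, way):
        """Record an access: make the bits on the path point away from `way`."""
        if way < 2:
            self.root = 1
            self.left = 1 - way
        else:
            self.root = 0
            self.right = 1 - (way - 2)

    def victim(self):
        """Follow the bits to the way that would be replaced next."""
        if self.root == 0:
            return self.left
        return 2 + self.right


s = TreePLRUSet()
for way in (0, 1, 2, 3):
    s.touch(way)
print(s.victim())  # 0, same as true LRU
s.touch(0)
print(s.victim())  # 2, whereas true LRU would pick way 1
```

The second result shows why the policy is only "quasi" LRU: the tree bits remember which half was used last, not the full access order.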
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 160-entry fully-associative dual-ported TLB
- 32-entry BTAC (branch target address cache)
- 2048-entry BHT (branch history table)
- Dynamic and static branch prediction modes
- On-chip L1 caches 0.5MB I and 1MB D, each 4-way set associative
- 32 or 64 Byte cache line size
- Supports up to 1 TB of physically addressable memory (40-bit physical addresses)
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Quasi LRU replacement policy for the instruction cache
- Bi-endian support
- Runway system/memory bus, 125MHz, 64-bit, DDR (double data rate), about 2.0GB/s peak bandwidth
- Up to about 550MHz frequency with 2.0V core voltage
- 21.3×22.0 mm2 die, 140,000,000 FETs, 0.25 micron, 5-layer metal CMOS packaged in a 544-pin LGA package
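Bi-endian support in the list above means the processor can access memory in either byte order. A small sketch of what the two orders look like for one 32-bit word; it illustrates byte ordering in general and makes no claim about how the PA-8600 selects the mode:

```python
word = 0x12345678

big_endian = word.to_bytes(4, "big")        # most significant byte first
little_endian = word.to_bytes(4, "little")  # least significant byte first

print(big_endian.hex())     # 12345678
print(little_endian.hex())  # 78563412

# Both byte sequences decode to the same value when read in their own order.
decoded_big = int.from_bytes(big_endian, "big")
decoded_little = int.from_bytes(little_endian, "little")
print(decoded_big == decoded_little == word)  # True
```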
PA-8700 (PCX-W2) (Piranha)
Used in
- A400-6X (rp2430), A500-6X, A500-7X (rp2470)
- C3650, C3700, C3750
- J6700
- L1500-6X, L1500-7X, L1500-8X (rp5430), L3000-6X, L3000-7X, L3000-8X (rp5470)
- N4000-6X, N4000-7X (rp7400)
- N4000-6X, N4000-7X, N4000-8X (rp7405, rp7410)
- Superdome
Time of introduction
August 2001
Overview
The PA-8700 is basically an enhanced and revamped PA-8500 core with some slight modifications. Like all PA-8x00 CPUs before it, it is logically still very close to the original PA-8000 core from 1996. All subsequent new CPUs from HP were based on this design and added several features and some slight modifications to it while retaining the basic PA-RISC version 2.0 core. The PA-8700 enlarged the on-chip L1 caches and the TLB significantly, while switching to a new CMOS process helped boost the clock frequency. The chip was at its time one of the largest commercially available CPUs and one of the first to be manufactured in an SOI (Silicon On Insulator) process. The PA-8700 was manufactured by IBM, in contrast to the PA-8500 and PA-8600, which were fabbed by Intel; HP had given up its own processor fabs a long time before.
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 240-entry fully-associative dual-ported TLB
- 32-entry BTAC (branch target address cache)
- 2048-entry BHT (branch history table)
- Dynamic and static branch prediction modes
- 0.75MB I and 1.5MB D on-chip L1 caches, each 4-way set associative, implemented in independent 0.75MB banks.
- 32 or 64 Byte cache line size
- Data cache prefetching
- Quasi LRU replacement policy for both the instruction and data cache.
- Supports up to 16 TB of physically addressable memory (44-bit physical addresses)
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- Bi-endian support
- Support for hardware lock-stepping, i.e. operating multiple chips in parallel to detect faults
- Runway system/memory bus, 125MHz, 64-bit, DDR (double data rate), about 2.0GB/s peak bandwidth
- Up to 750MHz (875MHz on the PA-8700+) frequency with 1.5V core voltage
- 16.0×19.0 mm2 die, 186,000,000 FETs, 0.18 micron, 7-layer Silicon-on-Insulator CMOS packaged in a 544-pin LGA package
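The lock-stepping entry in the list above refers to running chips in parallel on the same work and comparing their results to detect faults. The idea can be sketched in software as follows; the two "cores" are plain functions and the injected fault is hypothetical, so this shows only the comparison principle, not the PA-8700's hardware mechanism:

```python
def run_lockstep(core_a, core_b, inputs):
    """Feed identical inputs to both cores; return the first step where outputs differ."""
    for step, value in enumerate(inputs):
        if core_a(value) != core_b(value):
            return step  # a hardware checker would signal a fault at this point
    return None


def healthy_core(x):
    return (x * 3 + 1) & 0xFFFFFFFF


def faulty_core(x):
    result = healthy_core(x)
    # Hypothetical fault: one output bit flips for a single input value.
    return result ^ 0x10 if x == 5 else result


print(run_lockstep(healthy_core, healthy_core, range(10)))  # None
print(run_lockstep(healthy_core, faulty_core, range(10)))   # 5
```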
References
- A 900MHz 2.25MByte Cache with On Chip CPU (PDF, 119KB)
- Slides of a presentation on the PA-8700 CPU, centered on the CPU's cache subsystem. J. Michael Hill and Jonathan Lachman (2000: ISSCC).
PA-8800 (PCX-?) (Mako)
Used in
- C8000
- L1500-9X (rp5430), L2000-9X (rp5450)
- N4000-9X (rp7405, rp7410)
- rp3410, rp3440
- rp4410, rp4440
- rp7420
- rp8400, rp8410, rp8420
- Superdome
Time of introduction
2004
Overview
The PA-8800 integrates two PA-8700 cores onto a single die, adds a very large off-die L2 cache (with very high bandwidth) onto the CPU module, raises the clock frequency a bit further and uses the Itanium/McKinley processor/system bus. Mako was supposed to breathe fresh life into the PA-RISC line, though it had strong internal competition from the Itanium line (based largely on HP development work, together with Intel) and as such was not marketed much. Most systems which could handle a PA-8800 use the HP zx1 chipset and could be hardware-upgraded to use Itanium 2/IA64 processors.
Details
- PA-RISC version 2.0 64-bit
- Twenty functional units: four integer ALUs, four shift/merge units, four complete load/store pipelines, four Floating Point multiply/accumulate units, four Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- 240-entry fully-associative dual-ported TLB per core
- 32-entry BTAC (branch target address cache) per core
- 2048-entry BHT (branch history table) per core
- Dynamic and static branch prediction modes
- 0.75MB I and 0.75MB D on-chip L1 caches per core
- No data passing between the cores’ L1 caches
- 32MB off-chip L2 cache, four-way associative, physically indexed and tagged
- L2 cache is shared between both CPU cores
- L2 cache controller is on-die
- L2 implemented in DDR-ESRAM, four 8MB chips, 300MHz clock, each 2.7GB/s bandwidth
- Total >10GB/s L2 cache bandwidth
- 1MB SRAM tags for L2 cache
- ECC for L2 data and tags
- Itanium 2/McKinley processor bus, 200MHz clock (double-pumped), 128-bit datapath, 6.4GB/s bandwidth, data ECC-protected, signals parity
- Up to 1 GHz frequency with 1.5V core voltage
- 23.6×15.5 mm2 die, 300,000,000 FETs, 0.13 micron, 8-layer Silicon-on-Insulator CMOS (fabbed by IBM)
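The L2 cache and processor bus bandwidth figures in the list above can be cross-checked with simple arithmetic. The sketch assumes peak rates without protocol overhead and uses only numbers from the list; the function and constant names are illustrative:

```python
def bus_bandwidth_gb_s(clock_mhz, width_bits, transfers_per_clock):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes), ignoring protocol overhead."""
    bytes_per_second = clock_mhz * 1_000_000 * transfers_per_clock * (width_bits // 8)
    return bytes_per_second / 1e9


L2_CHIPS = 4         # four 8MB DDR-ESRAM chips
GB_S_PER_CHIP = 2.7  # bandwidth quoted per chip

l2_total = L2_CHIPS * GB_S_PER_CHIP           # about 10.8GB/s, hence ">10GB/s"
system_bus = bus_bandwidth_gb_s(200, 128, 2)  # 200MHz, 128-bit, double-pumped

print(round(l2_total, 1), system_bus)  # 10.8 6.4
```

On these peak numbers the shared L2 cache can deliver more data than the system bus, which fits its role of keeping both cores fed without going out to memory.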
References
- HP’s Mako Processor (PDF, 1.4MB)
- Slides of a presentation on the PA-8800 CPU. David J. C. Johnson (2001: Microprocessor Forum).
PA-8900
Used in
- rp3410, rp3440
- rp4410, rp4440
- C8000
- L1500-9X (rp5430), L2000-9X (rp5450) (probably)
- N4000-9X (rp7405, rp7410) (probably)
- Superdome
Time of introduction
2005
Overview
The PA-8900 is a slightly tweaked PA-8800 featuring a doubled L2 cache and a higher clock frequency. It is probably the last processor of the PA-RISC family; no further PA-8x00s will be released. Future systems will be based around Itanium-family chips, although since HP dropped its line of Itanium-based workstations, it seems the PA-8900-powered C8000 workstation will be one of the last HP-UX workstations (together with the similar Itanium-based zx2000).
Information on the PA-8900 is generally scarce. There seems to have been little interest in releasing details on its inner workings and architecture, and no whitepapers or more detailed articles could be found.
Details
- PA-RISC version 2.0 64-bit
- Twenty functional units: four integer ALUs, four shift/merge units, four complete load/store pipelines, four Floating Point multiply/accumulate units, four Floating Point divide/square root units
- Two address adders
- 240-entry fully-associative dual-ported TLB per core
- 32-entry BTAC (branch target address cache) per core
- 2048-entry BHT (branch history table) per core
- Dynamic and static branch prediction modes
- 4-way superscalar
- 0.75MB I and 0.75MB D on-chip L1 caches per core
- 64MB off-chip L2 cache, four-way associative, physically indexed and tagged
- ECC for L2 data and tags
- Itanium 2/McKinley processor bus, 200MHz clock (double-pumped), 128-bit datapath, 6.4GB/s bandwidth, data ECC-protected, signals parity
- 44-bit physical addressing
- 64-bit virtual addressing
- Four GB maximum page size
- Up to 1.1 GHz frequency
- 23.6×15.5 mm2 die, 317,000,000 FETs, 0.13 micron, 8-layer Silicon-on-Insulator CMOS (apparently fabbed by IBM)
References
- Overview of the HP 9000 rp3410-2, rp3440-4, rp4410-4, and rp4440-8 Servers (PDF, 700KB)
- Technical Whitepaper from HP on new servers and the PA-8900 processor. Hewlett-Packard (2005).
Hitachi PA/50
Used in
Time of introduction
About 1993
Overview
The PA/50 was a PA-RISC version 1.1 compatible processor designed and manufactured by Hitachi. Two designs were developed: M and L, the latter being the lower-cost product. They were intended for personal workstations or as high-end embedded controllers. Hitachi integrated a set of features not previously found in any PA-RISC CPU, e.g. on-chip caches, data prefetching, a power-saving mode and SDRAM support.
Details
- PA-RISC version 1.1 32-bit
- Built-in, pipelined FPU
- L1 I: 8KB, 2-way set-associative, 32-byte blocks
- L1 D: 4KB, 2-way set-associative, 32-byte blocks, copy-back
- L1 caches are on-chip
- Uncacheable memory (per page)
- TLB: I/D 32/64-entry, 2-way set, 4K-page, each +2 additional block entries
- BTLB (256KB-32MB)
- Seven 32-bit shadow registers for fast interrupts
- Data-prefetching
- Non-blocking cache
- Power-saving mode, reducing frequency to 1/8
- Support for SDRAM
- PA/50L: Up to 33MHz frequency with 3.3V core voltage
- PA/50M: Up to 60MHz frequency with 5.0V core voltage
- 11.5×12.0 mm2 die, 1,280,000 FETs, 0.6 micron, 3-layer metal CMOS packaged in a 160-pin plastic QFP package
References
- PROgress (PA-RISC) Newsletter - comp.sys.hp
- Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)
Hitachi HARP-1
Used in
- Hitachi SR2201 supercomputer (HARP-1E)
- Probably others
Time of introduction
June 1994
Overview
The HARP-1 was a PA-RISC version 1.1 compatible CPU from Hitachi, apparently a larger, faster version of the PA/50 described above (sadly, not much information is available).
Apparently the HARP-1E model of this processor includes (pseudo) vector processing modifications/add-ons and was used in Hitachi vector/supercomputers. It seems the L1 cache was also increased to 16 KB/16 KB of instruction and data.
Details
- PA-RISC version 1.1 32-bit
- Three functional units: apparently two integer ALUs and one floating point unit
- Built-in, pipelined FPU
- 2-way superscalar
- L1 I cache: 8KB, 1-way set-associative, 32-byte blocks
- L1 D cache: 16KB, 2-way set-associative, 32-byte blocks, copy-back
- L1 caches are on-chip
- L2 I/D 512/512KB, off-chip
- TLB: I/D 128/128-entry, 1-way set
- (Some say a second level TLB was included)
- Up to 150MHz frequency with 3.3V core voltage
- 16.2×16.5 mm2 die, 2,800,000 FETs, 0.5 micron 3-layer aluminium + 1-layer tungsten BiCMOS
References
- Chronology of Workstation Computers (1993) Ken Polsson (November 2007. Accessed November 2007)
- PROgress (PA-RISC) Newsletter - comp.sys.hp Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)
- Basic Concept of Cooperative Timing-driven Design Automation Technology for High-speed RISC Processor HARP-1 (PDF) Hidekazu Terai et al (October 1999: Hitachi Ltd. Accessed January 2008)
Other Processors
Winbond W89K
Time of introduction: Spring 1994
Winbond introduced an embedded PA-RISC controller chip in 1994, the Winbond W89K. The chip was pin-compatible with the then-popular Intel 80486DX and thus could be used as a drop-in replacement in mid-1990s PCs together with Winbond BIOS replacement chips. The rationale at the time was to let hardware developers reuse existing 486DX mainboards and components for a shorter product development lifecycle.
The W89K was a level 0 PA-RISC 1.1 implementation: a 32-bit PA-RISC processor, however without support for virtual addressing.
- PA-RISC version 1.1 (third edition) 32-bit
- Level 0 implementation (no virtual addressing): no MMU
- Five-stage pipeline
- One functional unit: one 32-bit integer ALU
- 2KB/2KB I/D on-chip L1 caches
- 80486 (Intel) bus interface
- 33MHz and 66MHz clock speeds were available, the latter apparently achieved with an internal clock doubler on the external 33MHz bus, as also used in Intel's 80486DX/2
- On-chip JTAG support
- 14.3×14.3 mm2 die, 1,100,000 FETs, 0.8 micron, 3-layer metal CMOS
References
- PROgress (PA-RISC) Newsletter - comp.sys.hp
- Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)
- Winbond, Varian sign deal for thin-film IC process
- Terho Uimonen (April 1994: Electronic News. Accessed January 2008 at findarticles.com)
- PA-RISC in a PC box (was: Re: HP's vision of a low-end 3000) - comp.sys.hp.mpe
- Stan Sieler (February 1996. Accessed December 2007)
Winbond W90210 | W90215
Time of introduction: Fall 1997
Shortly after the W89K embedded controllers, Winbond apparently also introduced a more sophisticated PA-RISC processor in the W90K line of embedded controllers.
The W90210F was still a 32-bit PA-RISC 1.1 design, but integrated many external I/O components on the chip: DRAM and DMA controllers, a PCI bridge and various I/O ports.
Like its predecessor, the W90210F was also a level 0 PA-RISC 1.1 implementation: no virtual addressing.
It apparently was used in Internet appliances: set-top boxes, TV sets, DVD players, PDAs, VoIP devices, and also for industrial automation.
The W90215 was identical to the W90210 but did not include license rights for the embedded operating system (and was thus cheaper).
- PA-RISC version 1.1 (third edition) 32-bit
- Level 0 implementation (no virtual addressing): no MMU
- Five-stage pipeline
- One functional unit: one 32-bit integer ALU
- L1 I cache: 4KB, direct mapped, 32-byte blocks, 256 entries
- L1 D cache: 8KB, 2-way set-associative, 32-byte blocks, 2×64 entries, write-back
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- 80486 (Intel) bus interface
- DRAM controller
- ROM/FLASH interface
- DMA controller (2-channel 8-bit)
- PCI bridge
- Two serial ports
- Parallel port
- 33MHz and 66MHz clock speeds (?)
- 208-pin PQF package
References
- W90210F PA-RISC Embedded Controller (.pdf)
- Winbond Electronics Corp. (October 1997. Accessed January 2008)
Winbond W90220 and W90221
Time of introduction: Spring 1999
The W90220F is, like its predecessor the W90210, a 32-bit PA-RISC 1.1 design without an MMU (level 0), but integrates many external I/O components on the chip: DRAM and DMA controllers, PCI bridge, IDE channels, I/O ports and, on the W90221, even an integrated graphics/TV chip. It had the same target systems, set-top boxes and Internet appliances. The successor W90221 is apparently very similar, with a higher clock speed and an integrated (S)VGA and TV controller.
- PA-RISC version 1.1 (third edition) 32-bit
- Level 0 implementation (no virtual addressing): no MMU
- Six-stage pipeline
- Two functional units: one 32-bit integer ALU and one 32-bit multiply-accumulate (MAC) module (for DSP purposes, can be used as two 16-bit modules too)
- L1 I cache: 4KB, direct mapped, 32-byte blocks, 256 entries
- L1 D cache: 4KB, 4-way set-associative, write-back or write-through
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications, e.g. MPEG decoding
- 80486 (Intel) bus interface
- Hardware dynamic branch prediction
- 256-entry branch-target-buffer (i.e. BTAC)
- Memory controller (supports DRAM, EDO-DRAM and SRAM; W90221 additionally SDRAM)
- ROM/FLASH interface
- DMA controller (2-channel 8-bit)
- IDE I/O controller (four 16-bit channels)
- W90221: VGA and TV controller (W9971)
- PCI bridge
- Two serial ports
- Parallel port
- Serial ICE port
- Up to 150MHz clock speed at 3.3V/5V I/O and 3.3V core
- W90221: 133MHz clock speed with apparently 3.3V at both I/O and core
- 0.35 micron single-poly-triple-metal CMOS
- 208-pin PQF package
References
- W90220F PA-RISC Embedded Controller (.pdf)
- Winbond Electronics Corp. (March 1999. Accessed January 2008)
Oki OP32
Oki Semiconductor's OP32/50N was introduced in 1994 as an embedded controller, based on a 32-bit PA-RISC design with integrated DRAM and DMA controllers. The chip was targeted at laser printers, fax machines, X terminals and the telecom and automotive markets.
- PA-RISC version 1.1 32-bit
- 33MHz frequency
- 14.3×14.3 mm2 die, 1,100,000 FETs, 0.8 micron, 3-layer metal CMOS
References
- PROgress (PA-RISC) Newsletter - comp.sys.hp
- Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)