Original Technical Reference Manual by Martin Brennan, Tim Dunn and John Mathieson - Revision 8 (28 February, 2001)
This is a direct re-creation of the jag_v8.pdf found available online. I'm providing this as a better way to view the documentation with direct links to different sections.
Introduction
This document is the Jaguar Technical Reference Manual. It is a definitive reference work for the programmer's view of the Jaguar ASICs. It is neither a hardware reference work nor a guide to a particular implementation of the Jaguar design.
This document covers the Tom and Jerry chip set. Users of the earlier prototype Jaguar silicon should consult the Appendix on the differences and enhancements. This document does not describe the prototype silicon; Revision 4 is the definitive work.
Jaguar is a custom chip set primarily intended to be the heart of a very high-performance games / leisure computer. It may also be used as a graphics accelerator in more complex systems, and applied to work-station and business uses.
As well as a general purpose CPU, Jaguar contains four processing units. These are the Object Processor, the Graphics Processor, the Blitter and the Digital Sound Processor.
Jaguar provides these blocks with a 64-bit data path to external memory devices, and is capable of a very high data transfer rate into external dynamic RAM.
Jaguar contains two custom chips, code-named Tom and Jerry.
For graphics, Tom contains the Object Processor, the Blitter and the Graphics Processor. For sound, Jerry holds the Digital Sound Processor. In addition to these, there is an external CPU, currently a 68000. When animating graphics there are therefore four processing elements, each of which has a specific role to play.
The CPU is used as a manager. It deals with communications with the outside world, and manages the system for the other processors. It is the highest level in the control flow of a Jaguar program, and has complete control of the system.
The Object Processor is at the other end of the chain for generating graphics. It reads an object list, and on the basis of the commands there assembles each display line of the video picture. Objects are usually areas of pixels, and these may overlap and may be easily moved from frame to frame. The order in which they are processed in the object list determines how they overlap. Objects can also modify what is already in the display line being assembled, and can scale bit-maps. They may contain transparent pixels.
The Object Processor performs all the functions of a traditional sprite engine, while also offering all the flexibility of a pixel-map based system. It is capable of a range of animation effects, and is a powerful graphics tool in its own right.
The Graphics Processor and Blitter provide a tightly coupled pair of processors for performing a much wider range of animation effects. A design goal of this system was to provide a fast throughput when rendering 3D polygons. The Graphics Processor therefore has a fast instruction throughput, and a powerful ALU with a parallel multiplier, a barrel-shifter, and a divide unit, in addition to the normal arithmetic functions.
The Graphics Processor has four kilobytes of fast internal RAM, which is used for local program and data space. This allows it to execute programs in parallel with the other processing units.
The Blitter is capable of performing a range of blitting operations 64 bits at a time, allowing fast block move and fill operations, and it can generate strips of pixels for Gouraud shaded Z-buffered polygons 64 bits at a time. It is also capable of rotating bit-maps, line-drawing, character-painting, and a range of other effects. The Graphics Processor and the Blitter will usually act together preparing bit-maps in memory, which are then displayed by the Object Processor.
The DSP has eight kilobytes of fast internal RAM, is tightly coupled to the audio DACs, and has its own timers with a related interrupt controller.
The Jaguar video section has been designed to drive a PAL/NTSC TV. The display has a horizontal resolution of up to 720 pixels and a vertical resolution of about 220 lines non-interlaced or 440 lines interlaced. However, by adopting a flexible approach to the design, the chip can be used with a range of display standards, from VGA through to workstation displays. This will allow the chip to become the backbone of many (possibly unforeseen) products.
Two colour resolutions are supported: 24-bit RGB and our own standard, 16-bit CRY (Cyan, Red, Intensity). The 24-bit mode is useful for applications requiring true colour. The 16-bit mode is designed for animation. It consumes less memory, fits better into 64-bit memory, is simpler to shade and is almost indistinguishable from 24-bit mode.
Jaguar decouples the pixel frequency from the system clock by using a line buffer. This means that the system clock does not have to be related to the colour carrier frequency and may be unaffected by gen-locking. There are actually two line buffers: one is displayed while the other is prepared by the Object Processor. Each line buffer is a 360 x 32-bit RAM which is cycled at 40 MHz. The line buffer contains physical pixels, which may be either 16-bit CRY pixels or 24-bit RGB pixels. The line buffers may be swapped over at the start and in the middle of display lines.
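As a rough check on the figures above, the following sketch works out how many physical pixels one line buffer can hold. The packing rules are assumptions rather than statements from the manual: 16-bit CRY pixels are taken to pack two per 32-bit word, and each 24-bit RGB pixel is taken to occupy one whole word. The function name is hypothetical.

```python
LINE_BUFFER_WORDS = 360   # from the text: each line buffer is 360 x 32 bits
WORD_BITS = 32


def line_buffer_pixels(bits_per_pixel: int) -> int:
    """Return how many physical pixels one line buffer can hold.

    Assumption (not stated in the manual): 16-bit CRY pixels pack two
    per 32-bit word, and each 24-bit RGB pixel takes one whole word.
    """
    if bits_per_pixel == 16:
        return LINE_BUFFER_WORDS * (WORD_BITS // 16)
    if bits_per_pixel == 24:
        return LINE_BUFFER_WORDS
    raise ValueError("physical pixels are 16-bit CRY or 24-bit RGB")
```

Under these assumptions the 16-bit case gives 720 pixels, which agrees with the maximum horizontal resolution quoted earlier.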
The 16-bit CRY pixels at the output of the line buffer are converted to 24-bit RGB pixels using a combination of look-up tables and small multipliers.
The video timing is completely programmable in units of the pixel clock. The pixel clock can be up to 40 MHz although there is provision for use with an external multiplexer. For TV applications the pixel clock will be in the range 12 to 15 MHz. The pixel clock will be synthesised from the chroma carrier or from an external video source using a device like the MC1378. Eight bits per pixel at up to 160 MHz can be supported by using an external multiplexer, colour-look-up and DAC.
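One plausible reading of the 160 MHz figure is that an external multiplexer serialises each 32-bit line buffer word into four 8-bit pixels. The sketch below only illustrates that arithmetic; the interpretation and the function name are assumptions, not something the manual states.

```python
LINE_BUFFER_CLOCK_MHZ = 40   # from the text: maximum pixel clock
WORD_BITS = 32               # from the text: line buffer word width


def multiplexed_pixel_rate_mhz(bits_per_pixel: int) -> int:
    """Pixel rate when an external multiplexer splits each 32-bit word.

    Assumption: one 32-bit word leaves the line buffer per 40 MHz cycle
    and the external multiplexer serialises it into narrower pixels.
    """
    if bits_per_pixel <= 0 or WORD_BITS % bits_per_pixel != 0:
        raise ValueError("pixel width must divide the 32-bit word evenly")
    return LINE_BUFFER_CLOCK_MHZ * (WORD_BITS // bits_per_pixel)
```

With 8 bits per pixel this gives 4 pixels per word, or 160 MHz.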
Jaguar uses an Object Processor, which combines the advantages of frame store and sprite based architectures. Jaguar's Object Processor is simple yet sophisticated. It has scaled and unscaled bit-map objects, branch objects for controlling its control flow, and interrupt objects. It can interrupt the graphics processor to perform more complex operations on its behalf. The graphics processor will support perspective, rotation, branches, palette loads, etc.
The Object Processor can write into the line buffer at up to two pixels per clock cycle. The source data can be 1, 2, 4, 8, 16 or 24 bits per pixel. Except for 24 bits, objects of different colour resolutions can be mixed. The low resolution objects, one to eight bits, use a palette to obtain a 16-bit physical colour.
A sophistication in the Object Processor is that it can modify the existing contents of the line buffer with another image. This could be used to produce shadows, mist or smoke, coloured glass or, say, the effect of a room illuminated by a flash lamp.
The Object Processor can also ignore data which is stored alongside pixel data. If, for instance, a Z buffer is needed then this can be situated next to the pixels. This helps because DRAM RAS pre-charges are needed less frequently.
Each object is described by an object header which is two phrases for an unscaled object and three phrases for a scaled object. When an image has been processed the modified header is written back to memory.
The Object Processor fetches one phrase (64 bits) of video data at a time. This phrase is expanded into pixels (and written into the line buffer) while the next phrase is fetched.
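The expansion step can be sketched in software as follows. This is an illustration only: the function name is hypothetical, and the pixel order inside a phrase (leftmost pixel in the most significant bits) is an assumption. The 24-bit mode is left out because the text does not describe how 24-bit pixels are packed into a phrase.

```python
def expand_phrase(phrase, bits_per_pixel, palette=None):
    """Split one 64-bit phrase into a list of pixel values.

    Assumption: the leftmost pixel sits in the most significant bits.
    For 1 to 8 bits per pixel, each value indexes `palette` to obtain
    a 16-bit physical colour, as the text describes for low
    resolution objects.
    """
    if bits_per_pixel not in (1, 2, 4, 8, 16):
        raise ValueError("unsupported pixel width for this sketch")
    if bits_per_pixel <= 8 and palette is None:
        raise ValueError("low resolution pixels need a palette")

    count = 64 // bits_per_pixel
    mask = (1 << bits_per_pixel) - 1
    pixels = []
    for i in range(count):
        # Walk from the top of the phrase down to bit 0.
        shift = 64 - bits_per_pixel * (i + 1)
        value = (phrase >> shift) & mask
        if bits_per_pixel <= 8:
            value = palette[value]
        pixels.append(value)
    return pixels
```

For example, a 16-bit phrase yields four pixels and a 4-bit phrase yields sixteen palette entries.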
Image data consists of a whole number of phrases. The image data may need to be padded with transparent pixels (colour zero in 1, 2, 4, 8 and 16-bit modes).
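A minimal sketch of the padding arithmetic, assuming that each row of an image is rounded up to a whole number of phrases on its own (the text only says that image data as a whole is a whole number of phrases). The function name is hypothetical.

```python
PHRASE_BITS = 64


def phrases_per_row(width_pixels, bits_per_pixel):
    """Return (phrases, padding_pixels) for one row of image data.

    Assumption: every row is padded separately to a phrase boundary.
    Padding pixels would be colour zero (transparent).
    """
    if bits_per_pixel not in (1, 2, 4, 8, 16):
        raise ValueError("padding with colour zero applies to 1-16 bit modes")
    if width_pixels <= 0:
        raise ValueError("width must be positive")

    bits = width_pixels * bits_per_pixel
    phrases = -(-bits // PHRASE_BITS)          # ceiling division
    padding_bits = phrases * PHRASE_BITS - bits
    return phrases, padding_bits // bits_per_pixel
```

A 50 pixel wide 16-bit image, for instance, needs 13 phrases per row, with two transparent pixels of padding.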
The Object Processor writes into the line buffer at one write per system clock tick. In 24-bits-per-pixel mode and for scaled objects one pixel is written per cycle. For unscaled objects with 16 or fewer bits-per-pixel two pixels are written per cycle. Most objects will therefore be expanded at twice the system clock rate.
If the read-modify-write flag is set in the object header the object data is added to the previous contents of the line buffer. In this case the data rate into the line buffer is halved.
This peak rate may be reduced if the memory bandwidth is not high enough. However if 64-bit wide DRAM is installed then these data rates will be sustained for all modes.
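The peak write rates described above can be summarised in a small helper. This is a sketch of the stated rules only; it ignores memory bandwidth limits, and the function names are hypothetical.

```python
import math


def pixels_per_cycle(bits_per_pixel, scaled, read_modify_write=False):
    """Peak pixels written to the line buffer per system clock tick."""
    if bits_per_pixel not in (1, 2, 4, 8, 16, 24):
        raise ValueError("unsupported pixel width")
    # 24-bit pixels and scaled objects: one pixel per cycle.
    # Unscaled objects of 16 bits or fewer: two pixels per cycle.
    rate = 1.0 if (bits_per_pixel == 24 or scaled) else 2.0
    # Read-modify-write halves the data rate into the line buffer.
    if read_modify_write:
        rate /= 2
    return rate


def cycles_for_row(width_pixels, bits_per_pixel, scaled, read_modify_write=False):
    """Minimum clock ticks to write one row of an object at peak rate."""
    rate = pixels_per_cycle(bits_per_pixel, scaled, read_modify_write)
    return math.ceil(width_pixels / rate)
```

For a 320 pixel wide unscaled 16-bit object this gives 160 ticks per row, or 320 ticks with the read-modify-write flag set.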
When accessing successive locations in 64-bit wide DRAM the memory cycle time is two clock ticks. These are page mode cycles. When the DRAM row address must change there is an overhead of between three and seven clock cycles (depending on DRAM speed). These RAS cycles will occur infrequently during object data fetches but will typically occur during the first data read after reading the object header (because the header and image data will not normally be near each other in memory). RAS cycles will also occur after refresh cycles or if a bus master with a higher priority steals some memory cycles in an area of memory with a different row address. Refresh cycles will normally be postponed until object processing has completed.
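The timing rules above lend themselves to a simple cost model. The sketch below is illustrative only: the row size of 2048 bytes is a hypothetical value (it depends on the DRAM fitted), the row-change overhead is assumed to be added on top of the two-tick page mode cycle, and the very first access is assumed to need a row change.

```python
PAGE_MODE_TICKS = 2   # from the text: page mode cycle time in 64-bit DRAM


def fetch_ticks(phrase_addresses, row_bytes=2048, ras_overhead=3):
    """Estimate clock ticks to read a sequence of phrases from DRAM.

    phrase_addresses: byte addresses of the phrases, in access order.
    row_bytes: hypothetical DRAM row (page) size in bytes.
    ras_overhead: extra ticks when the row address changes (3 to 7).
    """
    if not 3 <= ras_overhead <= 7:
        raise ValueError("the text gives an overhead of three to seven ticks")

    ticks = 0
    open_row = None
    for address in phrase_addresses:
        row = address // row_bytes
        if row != open_row:
            ticks += ras_overhead   # row address changed: pay the RAS overhead
            open_row = row
        ticks += PAGE_MODE_TICKS
    return ticks
```

Four consecutive phrases in one row cost 3 + 4 x 2 = 11 ticks under these assumptions, which shows why keeping data within a row matters.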
Jaguar's memory controller is very fast and flexible. It hides the memory width, speed and type from the other parts of the system.
Memory is grouped into banks that may be of different widths, speeds and types (although both ROM banks have the same width and speed). Each bank is enabled by a chip select. In the case of DRAM there are two chip selects, RAS and CAS. Memory can be 8, 16, 32 or 64 bits wide, but the memory controller makes it all look 64 bits wide.
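To illustrate what hiding the memory width implies, the sketch below counts how many accesses to a bank would be needed to assemble one 64-bit phrase. It assumes the controller simply performs successive narrow accesses; the manual does not spell out the mechanism here, and the function name is hypothetical.

```python
def accesses_per_phrase(bank_width_bits):
    """Number of bank accesses needed to transfer one 64-bit phrase.

    Assumption: a narrow bank is read or written in successive
    accesses until 64 bits have been transferred.
    """
    if bank_width_bits not in (8, 16, 32, 64):
        raise ValueError("banks are 8, 16, 32 or 64 bits wide")
    return 64 // bank_width_bits
```

An 8-bit ROM would then need eight accesses per phrase, against one for 64-bit DRAM.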
There are eight write strobes - one for each eight bits. There are three output enables corresponding to d[0-15], d[16-31] and d[32-63]. Three memory types are supported: DRAM, SRAM and ROM.
ROM or EPROM is used for bootstrap and for cartridges. The ROM speed is programmable. The memory controller allows the system to view ROM as 64 bits wide. Pull-up and pull-down resistors determine the ROM width during reset.
DRAM is the principal memory type, as it is cheap and fast when used in fast page mode. In fast page mode the DRAM cycles at two ticks per transfer. The row time access is programmable. The column access time is not programmable and can only be adjusted by changing the system clock (a page mode cycle takes two clock ticks). The memory controller decides on a cycle by cycle basis whether the next cycle can be a fast page mode cycle. Data and algorithms should be organised to minimise the number of page changes.
There are four memory banks: two of ROM and two of DRAM.
Jaguar has been designed to work with any 16 or 32-bit microprocessor with (up to) 24 address lines. The interface is based on the 68000 but most microprocessors can be attached by using a PAL to synthesize those control signals which differ. All peripherals are memory mapped; there is no separate IO space.
The width of the microprocessor is determined during reset by a pull-up / pull-down resistor. Variations in the address of the cold boot code/vector are accommodated by making the bootstrap ROM appear everywhere until the memory configuration is set up by the microprocessor.
The microprocessor interface is generally asynchronous so the clock speeds of the microprocessor and co-processors may be independent.
Jerry uses the same microprocessor interface.
The CPU normally has the lowest bus priority but under interrupt its priority is increased.
The following list gives the priorities of all bus masters.
Highest priority
Lowest priority
Jaguar's memory map depends on how it is being used.
After reset, the 2 Mbyte window shown below, corresponding to the ROM0 area, is repeated throughout the 16 Mbyte address space until the microprocessor configures memory by writing to MEMCON1. (This allows the system to boot whether the microprocessor is a 680X0, an 80X86 or a Transputer.) After configuration, this map corresponds to the area defined as ROM0 on the maps below.
| Address range | Contents |
|---|---|
| 120000 - 1FFFFF | Bootstrap ROM |
| 118000 - 11FFFF | Jerry DSP |
| 114000 - 117FFF | Joysticks and GPIO0-5 |
| 110000 - 113FFF | Jerry |
| 100000 - 10FFFF | Internal Registers |
| 000000 - 0FFFFF | Bootstrap ROM |
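The reset-time behaviour can be modelled by masking an address down to the 2 Mbyte window and looking it up in the map above. This sketch assumes that each boundary address in the map is the lowest address of the region listed above it; the table and function names are hypothetical.

```python
# (first address, last address, contents) within the 2 Mbyte window.
RESET_WINDOW = [
    (0x000000, 0x0FFFFF, "Bootstrap ROM"),
    (0x100000, 0x10FFFF, "Internal Registers"),
    (0x110000, 0x113FFF, "Jerry"),
    (0x114000, 0x117FFF, "Joysticks and GPIO0-5"),
    (0x118000, 0x11FFFF, "Jerry DSP"),
    (0x120000, 0x1FFFFF, "Bootstrap ROM"),
]


def decode_after_reset(address):
    """Return what a 24-bit address selects before MEMCON1 is written."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    # The window repeats every 2 Mbytes across the 16 Mbyte space.
    offset = address & 0x1FFFFF
    for first, last, contents in RESET_WINDOW:
        if first <= offset <= last:
            return contents
    raise AssertionError("window table does not cover the offset")
```

Because only the low 21 address bits matter, address 000000 and address E00000 both reach the start of the bootstrap ROM, which is why the boot vector location of the attached microprocessor does not matter.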
When the memory configuration is set, one of two memory maps is selected, depending on the ROMHI bit of the memory configuration register.
| ROMHI=1 address range | ROMHI=1 contents | ROMHI=0 address range | ROMHI=0 contents |
|---|---|---|---|
| E00000 - FFFFFF | ROM0 Bootstrap ROM and registers, 2 Mbytes | C00000 - FFFFFF | DRAM0 Dynamic RAM, 4 Mbytes |
| 800000 - DFFFFF | ROM1 Cartridge ROM, 6 Mbytes | 800000 - BFFFFF | DRAM1 Dynamic RAM, 4 Mbytes |
| 400000 - 7FFFFF | DRAM1 Dynamic RAM, 4 Mbytes | 200000 - 7FFFFF | ROM1 Cartridge ROM, 6 Mbytes |
| 000000 - 3FFFFF | DRAM0 Dynamic RAM, 4 Mbytes | 000000 - 1FFFFF | ROM0 Bootstrap ROM and registers, 2 Mbytes |
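The two configured maps can be expressed as a lookup keyed on ROMHI. The bank boundaries below are taken from the maps above; the table and function names are hypothetical.

```python
# (first address, last address, bank) for each setting of ROMHI.
MEMORY_MAPS = {
    1: [
        (0x000000, 0x3FFFFF, "DRAM0"),
        (0x400000, 0x7FFFFF, "DRAM1"),
        (0x800000, 0xDFFFFF, "ROM1"),
        (0xE00000, 0xFFFFFF, "ROM0"),
    ],
    0: [
        (0x000000, 0x1FFFFF, "ROM0"),
        (0x200000, 0x7FFFFF, "ROM1"),
        (0x800000, 0xBFFFFF, "DRAM1"),
        (0xC00000, 0xFFFFFF, "DRAM0"),
    ],
}


def bank_for(address, romhi):
    """Return the memory bank that a 24-bit address falls in."""
    if romhi not in MEMORY_MAPS:
        raise ValueError("ROMHI is a single bit: 0 or 1")
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    for first, last, bank in MEMORY_MAPS[romhi]:
        if first <= address <= last:
            return bank
    raise AssertionError("map does not cover the address")
```

For example, address 000000 is in DRAM0 when ROMHI is 1 and in ROM0 when ROMHI is 0.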
ROM0 is the bootstrap ROM but internal (ASIC) memory and peripherals occupy 128 Kbytes of this space, as shown above. ROM1 is the cartridge ROM. DRAM0 and DRAM1 are the two banks of DRAM.
A 68000 system will naturally operate with RAM at 0, so the ROMHI = 1 map is assumed throughout this document. If the system is operated with ROMHI = 0 then the first digit of all internal addresses should be 1 rather than F.
Internal Memory is mostly 16 bits wide to allow operation with 16-bit microprocessors. 32-bit write cycles are allowed to some areas of internal memory, notably the line buffer and the graphics processor memory. The line buffer supports 32-bit writes primarily in order to accelerate Blitter writes to the line buffer. The graphics processor supports 32-bit writes to accelerate program and data loads.
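The address adjustment described above amounts to swapping the leading hex digit. A minimal sketch, with a hypothetical function name, that takes an internal address as quoted in this document (Fxxxxx) and returns the address to use for a given ROMHI setting:

```python
def internal_address(documented_address, romhi):
    """Translate an internal address quoted for the ROMHI = 1 map.

    documented_address: an address of the form Fxxxxx.
    romhi: the ROMHI bit actually in use (0 or 1).
    """
    if romhi not in (0, 1):
        raise ValueError("ROMHI is a single bit: 0 or 1")
    if not 0xF00000 <= documented_address <= 0xFFFFFF:
        raise ValueError("expected an internal address of the form Fxxxxx")
    if romhi == 1:
        return documented_address
    # ROMHI = 0: the first digit becomes 1 rather than F.
    return (documented_address & 0x0FFFFF) | 0x100000
```

So an address quoted as F14000 would be accessed at 114000 when ROMHI is 0.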
TODO: REGISTER DEFS AND THEIR DESCRIPTIONS
Jerry and external peripherals occupy the 64 Kbytes above the internal memory. All Peripheral Memory is 16 bits wide, although it is likely that many devices will have eight-bit buses.