# Chapter 43 Parallel Operation in the Control Data 6600<sup>1</sup>

James E. Thornton

## History

In the summer of 1960, Control Data began a project which culminated in October 1964 in the delivery of the first 6600 Computer. In 1960 it was apparent that brute-force circuit performance and parallel operation were the two main approaches to any advanced computer. This paper presents some of the considerations having to do with the parallel operations in the 6600. A most important and fortunate event coincided with the beginning of the 6600 project. This was the appearance of the high-speed silicon transistor, which survived early difficulties to become the basis for a nice jump in circuit performance.

## System Organization

The computing system envisioned in that project, and now called the 6600, paid special attention to two kinds of use: the very large scientific problem and the time sharing of smaller problems. For the large problem, a high-speed floating point central processor with access to a large central memory was obvious. Not so obvious, but important to the 6600 system idea, was the isolation of this central arithmetic from any peripheral activity.

It was from this general line of reasoning that the idea of a multiplicity of peripheral processors was formed (Fig. 1). Ten such peripheral processors have access to the central memory on one side and the peripheral channels on the other. The executive control of the system is always in one of these peripheral processors, with the others operating on assigned peripheral or control tasks. All ten processors have access to twelve input-output channels and may "change hands," monitor channel activity, and perform other related jobs. These processors have access to central memory, and may pursue independent transfers to and from this memory.
Each of the ten peripheral processors contains its own memory for program and buffer areas, thereby isolating and protecting the more critical system control operations in the separate processors. The central processor operates from the central memory with relocating register and file protection for each program in central memory.

<sup>1</sup>AFIPS Proc. FJCC, pt. 2, vol. 26, 1964, pp. 33-40.

## Peripheral and Control Processors

The peripheral and control processors are housed in one chassis of the main frame. Each processor contains 4096 memory words of 12 bits length. There are 12- and 24-bit instruction formats to provide for direct, indirect, and relative addressing. Instructions provide logical, addition, subtraction, and conditional branching operations. Instructions also provide single word or block transfers to and from any of twelve peripheral channels, and single word or block transfers to and from central memory. Central memory words of 60 bits length are assembled from five consecutive peripheral words. Each processor has instructions to interrupt the central processor and to monitor the central program address.

To get this much processing power with reasonable economy and space, a time-sharing design was adopted (Fig. 2). This design contains a register "barrel" around which is moving the dynamic information for all ten processors. Such things as program address, accumulator contents, and other pieces of information totalling 52 bits are shifted around the barrel. Each complete trip around requires one major cycle, or one thousand nanoseconds. A "slot" in the barrel contains adders, assembly networks, distribution network, and interconnections to perform one step of any peripheral instruction. The time to perform this step or, in other words, the time through the slot, is one minor cycle, or one hundred nanoseconds. Each of the ten processors, therefore, is allowed one minor cycle of every ten to perform one of its steps.
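The timing rule of the barrel can be illustrated with a small round-robin simulation. This is a sketch only: the `Processor` class, its `steps_remaining` counter, and the assumption that every instruction step costs exactly one pass through the slot are illustrative choices, not a model of the actual 52-bit barrel register. It shows only what the text states: with a 100 ns minor cycle and ten processors, each processor receives one slot per 1000 ns major cycle, whether or not the others have work.

```python
from dataclasses import dataclass, field

MINOR_CYCLE_NS = 100                              # time through the slot
NUM_PROCESSORS = 10                               # processors sharing the slot
MAJOR_CYCLE_NS = MINOR_CYCLE_NS * NUM_PROCESSORS  # one trip around the barrel


@dataclass
class Processor:
    """Hypothetical per-processor state carried around the barrel."""
    pid: int
    steps_remaining: int                  # steps left in the current instruction
    slot_times: list[int] = field(default_factory=list)  # start time (ns) of each step


def run_barrel(processors: list[Processor], major_cycles: int) -> int:
    """Rotate the barrel for a number of major cycles; return elapsed ns.

    Each processor passes through the single slot once per trip and performs
    at most one step there. The slot time is spent whether or not the
    processor has work, so the rotation period never changes.
    """
    t = 0
    for _ in range(major_cycles):
        for p in processors:              # fixed order: 0, 1, ..., 9
            if p.steps_remaining > 0:
                p.steps_remaining -= 1
                p.slot_times.append(t)
            t += MINOR_CYCLE_NS
    return t


if __name__ == "__main__":
    procs = [Processor(pid=n, steps_remaining=3) for n in range(NUM_PROCESSORS)]
    print(run_barrel(procs, major_cycles=3))   # 3000
    print(procs[1].slot_times)                 # [100, 1100, 2100]
```

Processor 1 gets its slot at 100 ns, then again exactly one major cycle later, which is the "one minor cycle of every ten" rule in concrete form.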
A peripheral instruction may require one or more of these steps, depending on the kind of instruction. In effect, the single arithmetic and the single distribution and assembly network are made to appear as ten. Only the memories are kept truly independent. Incidentally, the memory read-write cycle time is equal to one complete trip around the barrel, or one thousand nanoseconds.

Input-output channels are bidirectional 12-bit paths. One 12-bit word may move in one direction every major cycle, or 1000 nanoseconds, on each channel. Therefore, a maximum burst rate of 120 million bits per second is possible using all ten peripheral processors. A sustained rate of about 50 million bits per second can be maintained in a practical operating system. Each channel may service several peripheral devices and may interface to other systems, such as satellite computers.

Fig. 1. Control Data 6600.

Fig. 2. 6600 peripheral and control processors.

Peripheral and control processors access central memory through an assembly network and a disassembly network. Since five peripheral memory references are required to make up one central memory word, a natural assembly network of five levels is used. This allows five references to be "nested" in each network during any major cycle. The central memory is organized in independent banks with the ability to transfer central words every minor cycle. The peripheral processors, therefore, introduce at most about 2% interference at the central memory address control.

A single real-time clock, continuously running, is available to all peripheral processors.
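The assembly and disassembly networks can be sketched as packing five 12-bit peripheral words into one 60-bit central word and splitting it again. The ordering used here (first peripheral word in the high-order 12 bits) is an assumption made for illustration; the text does not specify it. The two constants at the end simply reproduce the channel arithmetic given above.

```python
PP_WORD_BITS = 12
PP_WORDS_PER_CM_WORD = 5
CM_WORD_BITS = PP_WORD_BITS * PP_WORDS_PER_CM_WORD  # 60
PP_MASK = (1 << PP_WORD_BITS) - 1


def assemble(pp_words: list[int]) -> int:
    """Pack five 12-bit words into one 60-bit word (first word high-order; assumed)."""
    if len(pp_words) != PP_WORDS_PER_CM_WORD:
        raise ValueError("need exactly five peripheral words")
    word = 0
    for w in pp_words:
        if not 0 <= w <= PP_MASK:
            raise ValueError(f"{w:#o} does not fit in 12 bits")
        word = (word << PP_WORD_BITS) | w
    return word


def disassemble(cm_word: int) -> list[int]:
    """Split a 60-bit word back into five 12-bit words, high-order first."""
    if not 0 <= cm_word < (1 << CM_WORD_BITS):
        raise ValueError("central memory word must fit in 60 bits")
    shifts = range((PP_WORDS_PER_CM_WORD - 1) * PP_WORD_BITS, -1, -PP_WORD_BITS)
    return [(cm_word >> s) & PP_MASK for s in shifts]


# Channel arithmetic from the text: one 12-bit word per 1000 ns per channel.
BITS_PER_SECOND_PER_CHANNEL = PP_WORD_BITS * 1_000_000_000 // 1000  # 12 million
MAX_BURST_ALL_TEN = 10 * BITS_PER_SECOND_PER_CHANNEL                # 120 million
```

For example, `disassemble(assemble([1, 2, 3, 4, 5]))` returns `[1, 2, 3, 4, 5]`.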
## Central Processor

The 6600 central processor may be considered the high-speed arithmetic unit of the system (Fig. 3). Its program, operands, and results are held in the central memory. It has no connection to the peripheral processors except through memory and except for two single controls. These are the exchange jump, which starts or interrupts the central processor from a peripheral processor, and the central program address, which can be monitored by a peripheral processor.

A key description of the 6600 central processor, as you will see in later discussion, is "parallel by function." This means that a number of arithmetic functions may be performed concurrently. To this end, there are ten functional units within the central processor. These are the two increment units, floating add unit, fixed add unit, shift unit, two multiply units, divide unit, boolean unit, and branch unit. In a general way, each of these units is a three-address unit. As an example, the floating add unit obtains two 60-bit operands from the central registers and produces a 60-bit result which is returned to a register.

Information to and from these units is held in the central registers, of which there are twenty-four. Eight of these are considered index registers; they are 18 bits long, and one of them always contains zero. Eight are considered address registers; they are 18 bits long and serve to address the five read central memory trunks and the two store central memory trunks. Eight are considered floating point registers; they are 60 bits long and are the only central registers to access central memory during a central program.

Fig. 3. Block diagram of 6600.

In a sense, just as the whole central processor is hidden behind central memory from the peripheral processors, so, too, the ten functional units are hidden behind the central registers from central memory.
As a consequence, a considerable instruction efficiency is obtained and an interesting form of concurrency is feasible and practical. The fact that a small number of bits can give meaningful definition to any function makes it possible to develop forms of operand and unit reservations needed for a general scheme of concurrent arithmetic.

Instructions are organized in two formats, a 15-bit format and a 30-bit format, and may be mixed in an instruction word (Fig. 4). As an example, a 15-bit instruction may call for an ADD, designated by the f and m octal digits, from registers designated by the j and k octal digits, the result going to the register designated by the i octal digit. In this example, the addresses of the three-address floating add unit are only three bits in length, each address referring to one of the eight floating point registers. The 30-bit format follows this same form but substitutes for the k octal digit an 18-bit constant K which serves as one of the input operands. These two formats provide a highly efficient control of concurrent operations.

Fig. 4. Fifteen-bit instruction format.

As a background, consider the essential difference between a general purpose device and a special device in which high speeds are required. The designer of the special device can generally improve on the traditional general purpose device by introducing some form of concurrency. For example, some activities of a housekeeping nature may be performed separately from the main sequence of operations in separate hardware. The total time to complete a job is then optimized to the main sequence and excludes the housekeeping. The two categories operate concurrently. It would be, of course, most attractive to provide in a general purpose device some generalized scheme to do the same kind of thing. The organization of the 6600 central processor provides just this kind of scheme.
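The 15- and 30-bit formats described above can be made concrete with a small decoder. The field positions follow the ISP in Appendix 2 (f, m, i, j, k as five octal digits, and an 18-bit K replacing k in the long form). The `Short`/`Long` types and function names are hypothetical, and the sketch does not decide which opcodes are 30 bits long; the caller is assumed to know the length already.

```python
from typing import NamedTuple


class Short(NamedTuple):
    f: int  # major opcode digit
    m: int  # minor opcode digit
    i: int  # result register
    j: int  # first operand register
    k: int  # second operand register


class Long(NamedTuple):
    f: int
    m: int
    i: int
    j: int
    K: int  # 18-bit constant that takes the place of k


def decode_short(parcel: int) -> Short:
    """Decode a 15-bit instruction: five octal digits f m i j k."""
    if not 0 <= parcel < (1 << 15):
        raise ValueError("short instruction must fit in 15 bits")
    digits = [(parcel >> s) & 0o7 for s in (12, 9, 6, 3, 0)]
    return Short(*digits)


def decode_long(word: int) -> Long:
    """Decode a 30-bit instruction: octal digits f m i j, then an 18-bit K."""
    if not 0 <= word < (1 << 30):
        raise ValueError("long instruction must fit in 30 bits")
    f, m, i, j = [(word >> s) & 0o7 for s in (27, 24, 21, 18)]
    return Long(f, m, i, j, word & 0o777777)
```

For instance, `decode_short(0o30123)` yields f=3, m=0, i=1, j=2, k=3: the fm code 30 is one of those routed to the Add Unit in Appendix 2, with X2 and X3 as operands and X1 as the result register.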
With a multiplicity of functional units and of operand registers, and with a simple and highly efficient addressing system, a generalized queue and reservation scheme is practical. This is called the *scoreboard*. The scoreboard maintains a running file of each central register, of each functional unit, and of each of the three operand trunks to and from each unit. Typically, the scoreboard file is made up of two-, three-, and four-bit quantities identifying the nature of register and unit usage. As each new instruction is brought up, the conditions at the instant of issuance are set into the scoreboard. A snapshot is taken, so to speak, of the pertinent conditions. If no waiting is required, the execution of the instruction is begun immediately under control of the unit itself. If waiting is required (for example, an input operand may not yet be available in the central registers), the scoreboard controls the delay, and when released, allows the unit to begin its execution. Most important, this activity is accomplished in the scoreboard and the functional unit, and does not necessarily limit later instructions from being brought up and issued.

In this manner, it is possible to issue a series of instructions, some related, some not, until no functional units are left free or until a specific register is to be assigned more than one result. With just those two restrictions on issuing (unit free and no double result), several independent chains of instructions may proceed concurrently. Instructions may issue every minor cycle in the absence of the two restraints. The instruction executions, in comparison, range from three minor cycles for fixed add, through 10 minor cycles for floating multiply, to 29 minor cycles for floating divide.

To provide a relatively continuous source of instructions, one buffer register of 60 bits is located at the bottom of an instruction stack capable of holding 32 instructions (Fig. 5).
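The two issue restrictions named above (unit free, no double result) can be sketched as a simple check. This is a deliberately reduced model with hypothetical names: it tracks only a busy flag per unit and which unit owes a result to which register, and it omits the operand-trunk bookkeeping and the finer read/write conflict handling of the real scoreboard. Operand waits do not block issue; they are returned so the unit can start once its inputs are ready.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Reduced scoreboard: unit busy flags and pending-result ownership."""
    unit_busy: dict[str, bool] = field(default_factory=dict)
    result_owner: dict[str, str] = field(default_factory=dict)  # register -> unit

    def can_issue(self, unit: str, dest: str) -> bool:
        # Restriction 1: the functional unit must be free.
        # Restriction 2: the destination must not already be owed a result.
        return not self.unit_busy.get(unit, False) and dest not in self.result_owner

    def issue(self, unit: str, dest: str, sources: list[str]) -> list[str]:
        """Reserve unit and destination; return the sources still being produced."""
        if not self.can_issue(unit, dest):
            raise RuntimeError("issue blocked")
        waits = [r for r in sources if r in self.result_owner]  # checked before reserving dest
        self.unit_busy[unit] = True
        self.result_owner[dest] = unit
        return waits

    def complete(self, unit: str) -> None:
        """Release the unit and every register it was producing."""
        self.unit_busy[unit] = False
        for reg in [r for r, u in self.result_owner.items() if u == unit]:
            del self.result_owner[reg]
```

In a sequence such as `issue("multiply0", "X0", ["X1", "X2"])` followed by `issue("add", "X3", ["X0", "X4"])`, the second instruction still issues and simply reports that it must wait for X0; only a busy unit or a second pending result for the same register stops issue.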
Instruction words from memory enter the bottom register of the stack, pushing up the old instruction words. In straight-line programs, only the bottom two registers are in use, the bottom being refilled as quickly as memory conflicts allow. In programs which branch back to an instruction in the upper stack registers, no refills are allowed after the branch, thereby holding the program loop completely in the stack. As a result, memory access or memory conflicts are no longer involved, and a considerable speed increase can be had.

Fig. 5. 6600 instruction stack operation.

Five memory trunks are provided from memory into the central processor to five of the floating point registers (Fig. 6). One address register is assigned to each trunk (and therefore to the floating point register). Any instruction calling for an address register result implicitly initiates a memory reference on that trunk. These instructions are handled through the scoreboard and therefore tend to overlap memory access with arithmetic. For example, a new memory word to be loaded in a floating point register can be brought in from memory but may not enter the register until all previous uses of that register are completed. The central registers, therefore, provide all of the data to the ten functional units, and receive all of the unit results. No storage is maintained in any unit.

Fig. 6. Central processor operating registers.

Central memory is organized in 32 banks of 4096 words. Consecutive addresses call for a different bank; therefore, adjacent addresses in one bank are in reality separated by 32. Addresses may be issued every 100 nanoseconds. A typical central memory information transfer rate is about 250 million bits per second.

As mentioned before, the functional units are hidden behind the registers. Although the units might appear to increase hardware duplication, a pleasant fact emerges from this design.
Each unit may be trimmed to perform its function without regard to others. Speed increases are had from this simplified design.

As an example of special functional unit design, the floating multiply accomplishes the coefficient multiplication in nine minor cycles plus one minor cycle to put away the result, for a total of 10 minor cycles, or 1000 nanoseconds. The multiply uses layers of carry-save adders grouped in two halves. Each half concurrently forms a partial product, and the two partial products finally merge while the long carries propagate. Although this is a fairly large complex of circuits, the resulting device was sufficiently smaller than originally planned to allow two multiply units to be included in the final design.

To sum up the characteristics of the central processor, remember that the broad-brush description is "concurrent operation." In other words, any program operating within the central processor utilizes some of the available concurrency. The program need not be written in a particular way, although certainly some optimization can be done. The specific method of accomplishing this concurrency involves issuing as many instructions as possible while handling most of the conflicts during execution. Some of the essential requirements for such a scheme include:

1. Many functional units
2. Units with three-address properties
3. Many transient registers with many trunks to and from the units
4. A simple and efficient instruction set

## Construction

Circuits in the 6600 computing system use all-transistor logic (Fig. 7). The silicon transistor operates in saturation when switched "on" and averages about five nanoseconds of stage delay. Logic circuits are constructed in a cordwood plug-in module of about 2½ inches by 2½ inches by 0.8 inch. An average of about 50 transistors are contained in these modules.

Fig. 7. 6600 printed circuit module.

Memory circuits are constructed in a plug-in module of about six inches by six inches by 2½ inches (Fig. 8).
Each memory module contains a coincident current memory of 4096 12-bit words. All read-write drive circuits and bit drive circuits plus address translation are contained in the module. One such module is used for each peripheral processor, and five modules make up one bank of central memory.

Fig. 8. 6600 memory module.

Logic modules and memory modules are held in upright hinged chassis in an X-shaped cabinet (Fig. 9). Interconnections between modules on the chassis are made with twisted pair transmission lines. Interconnections between chassis are made with coaxial cables.

Fig. 9. 6600 main frame section.

Both maintenance and operation are accomplished at a programmed display console (Fig. 10). More than one of these consoles may be included in a system if desired. Dead start facilities bring the ten peripheral processors to a condition which allows information to enter from any chosen peripheral device. Such loads normally bring in an operating system which provides a highly sophisticated capability for multiple users, maintenance, and so on.

Fig. 10. 6600 display console.

The 6600 Computer has taken advantage of certain technology advances, but more particularly, logic organization advances which now appear to be quite successful. Control Data is exploring advances in technology upward within the same compatible structure, and identical technology downward, also within the same compatible structure.

## APPENDIX 1 ISP OF CDC 6600 PERIPHERAL AND CONTROL PROCESSOR

```
! ISP of the CDC 6600 Peripheral and Control Processor, Barrel distributor,
! and I/O channels. Initial version by Gary Leive (ca. 1978)
! Although the 6600 has 10 identical Peripheral and Control processors, the
! ISP for a single processor is shown.
! An identifying parameter is utilized to specify which of the ten
! processors is active during simulation. The CDC 6600 Peripheral and
! Control processors each possess a 4096 word 12 bit local memory. The
! ISP shows only one 4096 word memory which is used by all the
! "processors".

**Channel.State**
CHAN[0:11]<11:0>,                ! I/O channels
cact[0:11]<>,                    ! Channel active indicator
cful[0:11]<>,                    ! Channel full indicator
                                 ! Barrel A registers
                                 ! Barrel P registers
                                 ! Barrel Q registers
                                 ! Barrel K registers

**PCP.Memory.State**
M.PCP[0:4095]<11:0>,             ! Only one PCP memory is shown
read[0:4]<11:0>,                 ! Read pyramid
c.read<59:0> := read[0:4]<11:0>,
write[0:4]<11:0>,
c.write<59:0> := write[0:4]<11:0>,

**PCP.Instruction.Format**
pir<23:0>,                       ! PCP instruction register
f<5:0>   := pir<23:18>,
d<5:0>   := pir<17:12>,
m<11:0>  := pir<11:0>,
dm<17:0> := pir<17:0>,

**Addressing.Calculation**{us}
index(id<3:0>)<11:0> :=          ! Indexed addressing
    begin
    DECODE d eql 0 =>
        begin
        0 := begin index = m + M.PCP[d]; P[id] = P[id] + 1 end,

**Barrel.Execution**
barrel{main} :=
    begin
    pcp(0) next                  ! Activate processor 0
    pcp(1) next                  ! Activate processor 1
    pcp(2) next                  ! Activate processor 2
    pcp(3) next                  ! Activate processor 3
    pcp(4) next                  ! Activate processor 4
    pcp(5) next                  ! Activate processor 5
    pcp(6) next                  ! Activate processor 6
    pcp(7) next                  ! Activate processor 7
    pcp(8) next                  ! Activate processor 8
    pcp(9) next                  ! Activate processor 9
    RESTART barrel               ! Do it all again
    end

**PCP.Execution**{oc}
pcp(id<3:0>) :=
    begin
    pir<23:12> = M.PCP[P[id]] next
    P[id] = P[id] + 1 next
    m = M.PCP[P[id]]; K[id]<5:0> = f; Q[id] = d next
    DECODE K[id] =>
        begin
        [#00, #24, #25] := no.op(),                      ! PSN - Pass
            begin
            0 := A[id] = A[id] slr d,
            1 := A[id] = A[id] sr0 (not d)
            end,
        #11 := A[id]<5:0> = A[id]<5:0>                   ! LMN - Logical difference d
        #14 := A[id] = d,                                ! LDN - Load d
        #15 := A[id] = not d,                            ! LCN - Load complement d
        #16 := A[id] = A[id] + (us) d,                   ! ADN - Add d
        #17 := A[id] = A[id] - (us) d,                   ! SBN - Subtract d
        #20 := begin A[id] = dm; P[id] = P[id] + 1 end,  ! LDC - Load dm
        #30 := A[id] = M.PCP[d],                         ! LDD - Load (d)
        #31 := A[id] = A[id] + (us) M.PCP[d],            ! ADD - Add (d)
        #32 := A[id] = A[id] - (us) M.PCP[d],            ! SBD - Subtract (d)
        #34 := M.PCP[d] = A[id],                         ! STD - Store (d)
        #40 := A[id] = M.PCP[M.PCP[d]],                  ! LDI - Load ((d))
        #41 := A[id] = A[id] + (us) M.PCP[M.PCP[d]],     ! ADI - Add ((d))
        #42 := A[id] = A[id] - (us) M.PCP[M.PCP[d]],     ! SBI - Subtract ((d))
        #44 := M.PCP[M.PCP[d]] = A[id],                  ! STI - Store ((d))
        #50 := A[id] = M.PCP[index(id)],                 ! LDM - Load (m + (d))
        #54 := M.PCP[index(id)] = A[id],                 ! STM - Store (m + (d))
        #61 := begin                                     ! CRM - Central read (d)
            M.PCP[0] = P[id] + 1 next                    ! words from (A) to m
            P[id] = m; Q[id] = d next
            CRM0 :=
                begin
                c.read = MP[A[id]] next
                M.PCP[P[id]+0] = read[0] next
                M.PCP[P[id]+1] = read[1] next
                M.PCP[P[id]+2] = read[2] next
                M.PCP[P[id]+3] = read[3] next
                M.PCP[P[id]+4] = read[4] next
                P[id] = P[id] + 5; A[id] = A[id] + 1; Q[id] = Q[id] - 1 next
                If Q[id] neq 0 => RESTART CRM0
                end next
            P[id] = M.PCP[0]
            end,
        #62 := begin                                     ! CWD - Central write to (A) from d
            write[0] = M.PCP[d+0] next
            write[1] = M.PCP[d+1] next
            write[2] = M.PCP[d+2] next
            write[3] = M.PCP[d+3] next
            write[4] = M.PCP[d+4] next
            MP[A[id]] = c.write
            end,
        #63 := begin                                     ! CWM - Central write (d) words
            M.PCP[0] = P[id] + 1 next                    ! to (A) from m
            P[id] = m;
            CWM0 :=
                begin
                write[0] = M.PCP[P[id]+0] next
                write[1] = M.PCP[P[id]+1] next
```

## APPENDIX 2 ISP OF THE CDC 6600

```
CDC6600{process} :=
begin
macro not.described := [no.op()].

! ISP of the CDC 6600
! The central processor and central memory are described in this
! ISP. An auxiliary ISP (PC6600.ISP) describes the peripheral
! processors and control barrel execution.
! The ten functional units are described and allow parallel
! simulation.
! Floating point instructions are not described.

! The following page by page index of the ISP is provided to aid
! in locating CDC 6600 architectural features.
!   "Central.Memory.State" defines the central memory.
!   "Processor.State" defines central processor registers.
!   "Instruction.Format" defines instruction fields.
!   "Implementation.Declarations" defines ISP related variables.
!   "Reservation.Control.State" defines variables used by reservation
!       control. These declarations constitute the resource allocation
!       "scorecard".
!   A further section describes the reservation control execution.
!   "Instruction.Fetch" describes the instruction stack control and
!       instruction fetch processes.
!   "Central.Memory.Access" describes the instruction read and the
!       register associated memory access processes.
!   "Exchange.Jump" is the processor interrupt facility.
!   "Instruction.Cycle" is the main instruction processing cycle.
!       Instruction execution is initiated by issuing the instructions
!       to the appropriate functional unit.
! The functional units are: Branch Unit, Boolean Unit, Shift Unit,
! Add Unit, Long Add Unit, Multiply Unit 0, Multiply Unit 1,
! Divide Unit, Increment Unit 0, Increment Unit 1.

**Central.Memory.State**
MP[0:4095]<59:0>,                ! Use only 4k of 60 bit memory

**Processor.State**
xjp[0:15]<59:0>,                 ! Exchange jump package
xja<16:0>,                       ! Exchange jump address
xjf<>,                           ! Exchange jump flag
px<19:0>,                        ! Pseudo program counter
PC<17:0> := px<19:2>,            ! Program counter
ilc<1:0> := px<1:0>,             ! Instruction length count
                                 ! Instruction stack counter
AREG[0:7]<17:0>,                 ! A registers
BREG[0:7]<17:0>,                 ! B registers
XREG[0:7]<59:0>,                 ! X registers
RACM<17:0>,                      ! Reference address (central memory)
                                 ! Field length of program
                                 ! Reference address for ECS
                                 ! Field length for ECS
                                 ! Program exit mode
                                 ! Monitor exchange

**Instruction.Format**
I<29:0>,                         ! Instruction register
i0<14:0> := I<29:15>,            ! Short instruction (15 bit)
i1<14:0> := I<14:0>,             ! Long instruction extension
f.<2:0>  := I<29:27>,
m.<2:0>  := I<26:24>,
fm<5:0>  := I<29:24>,
i.<2:0>  := I<23:21>,
j.<2:0>  := I<20:18>,
k.<2:0>  := I<17:15>,
k1<17:0> := I<17:0>,
is[0:7]<59:0>,
ism[0:31]<14:0> := is[0:7]<59:0>,
ishi<17:0>,                      ! High address limit in stack
islo<17:0>,                      ! Low address limit in stack
                                 ! Stack insert counter
mark :=                          ! Mark stack as invalid
    begin islo = ishi = PC end,

**Implementation.Declarations**
dealloc(dunit<3:0>){critical} := ! Deallocate resources
stop.bit<>,                      ! Stop flag
rni(pci<17:0>)<59:0> :=          ! Read next instruction
    begin
    If not range(pci) => rni = MP[RACM + pci]
    end,
aref(reg<2:0>, val<17:0>) :=     ! A register forced memory access

**Reservation.Control.State**
abusy[0:7]<>,                    ! A registers busy bits
arw[0:7]<>,                      ! A registers read(0)/write(1)
bbusy[0:7]<>,                    ! B registers busy bits
brw[0:7]<>,                      ! B registers read(0)/write(1)
xbusy[0:7]<>,                    ! X registers busy bits
xrw[0:7]<>,                      ! X registers read(0)/write(1)
fbusy[0:9]<>,                    ! Functional unit busy bits

! The following tables are used to deallocate the resource assignments
! either in the event of conflict during allocation, or during
! deallocation at instruction completion. fau, fbu, and fxu indicate
! usage of the registers by a unit: 1 = used, 0 = not used.
fa[0:9]<2:0>,                    ! Functional unit A register
fau[0:9]<>,                      ! A register usage
fb[0:9]<2:0>,                    ! Functional unit B register
fbu[0:9]<>,                      ! B register usage
fx[0:9]<2:0>,                    ! Functional unit X register
fxu[0:9]<>,                      ! X register usage
                                 ! Temporary for arith unit number

! Instructions are processed from an instruction stack. Instruction
! conflicts are resolved by keeping a "scorecard" containing
! utilization information on all registers and all functional units.
! Reservation control decodes an instruction to determine register
! utilization. Source and destination registers are allocated if they
! are not being used as destinations of another functional unit. If the
! required functional unit is free and if both the source and
! destination registers are available, the instruction is released to
! the unit for execution. If the resources are not available,
! reservation control holds the instruction until the resources become
! available. At the completion of execution by a functional unit, the
! resources are released by marking the scorecard.

dest()<> :=                      ! Destination register allocation
    begin
    dest = 0 next
    DECODE fm =>
        begin
        [#10:#45, #47, #70:#77] :=
            (fx[unit] = i.; fxu[unit] = 1;
             If not xbusy[i.] => dest = xbusy[i.] = xrw[i.] = 1),
        #50:#57 :=
            (fa[unit] = i.; fau[unit] = 1;
             If not abusy[i.] => dest = abusy[i.] = arw[i.] = 1),
        #60:#67 :=
            (fb[unit] = i.; fbu[unit] = 1;
             If not bbusy[i.] => dest = bbusy[i.] = brw[i.] = 1)
        end
    end,

reserv :=                        ! Reservation control
        #40:#42 := DECODE fbusy[5] =>          ! Multiply units
            begin
            0 := unit = 5,
            1 := If not fbusy[6] => unit = 6
            end,
        #44:#47 := unit = 7,
        #50:#77 := DECODE fbusy[8] =>          ! Increment units
            begin
            0 := unit = 8,
            1 := If not fbusy[9] => unit = 9
            end
        end next
    If unit neq 15 =>
        begin
        DECODE fbusy[unit] =>
            begin
            0 := DECODE (not dest()) or (not source()) =>
                begin
                0 := fbusy[unit] = 1,
                1 := begin dealloc(unit) next RESTART reserv end
                end,
            1 := begin WAIT (not fbusy[unit]) next RESTART reserv end
            end
        end

**Exchange.Jump**{us}
! Exchange jump is the central processor's interrupt mechanism.
! Exchange jump is initiated by power on or by one of the ten
! peripheral processors. All of the central processor's state
! (including all registers) is exchanged with 16 words of central
! memory. The central memory starting address is provided by the
! "interrupting" peripheral processor. The central memory words are
! formatted such that all of the state can be extracted and loaded
! into the appropriate registers. This implementation uses a 16 word
! holding area (xjp) to format and temporarily preserve the old state
! until the new state is loaded.
    MP[xja + 00] = xjp[00]; MP[xja + 01] = xjp[01];
    MP[xja + 02] = xjp[02]; MP[xja + 03] = xjp[03];
    MP[xja + 04] = xjp[04]; MP[xja + 05] = xjp[05];
    MP[xja + 06] = xjp[06]; MP[xja + 07] = xjp[07];
    MP[xja + 08] = xjp[08]; MP[xja + 09] = xjp[09];
    MP[xja + 10] = xjp[10]; MP[xja + 11] = xjp[11];
    MP[xja + 12] = xjp[12]; MP[xja + 13] = xjp[13];
    MP[xja + 14] = xjp[14]; MP[xja + 15] = xjp[15] next
    xjf = 0
    end,

**Instruction.Fetch**{us}
! Instruction fetch is always from the instruction stack. If the
! stack is empty (initial power on or branch out of stack), or if
! there are fewer than three instruction words left in the stack,
! fetch reloads the stack before obtaining an instruction.
! Instructions may be 15 or 30 bits long and aligned on any 15 bit
! boundary. Fetch obtains 15 bits of an instruction, then determines
! if a second 15 bits are required.
                                 ! Check for 30 bit instructions

**Central.Memory.Access**{oc}
! Central memory is always accessed indirectly by a user program.
! The Read Next Instruction (RNI) routine is used to load the
! instruction stack. Touching the A registers 1 through 7 causes
! the corresponding X register to be loaded (A[1:5]) from memory
! or stored (A[6:7]) in memory.

**Instruction.Cycle**
start{main} :=
    begin                                ! Initialization
    WAIT (xjf) next                      ! Wait for exchange jump
    stop.bit = 0 next                    ! Clear stop bit
    mark() next                          ! Instruction stack empty
    run :=                               ! Main cycle
        begin
        If xjf => xj() next              ! Check for exchange jump
        If stop.bit => RESTART start next
        If not range =>
            begin
            fetch() next                 ! Get an instruction
            reserv()                     ! Reservation control will not return
            end next                     ! until all usage conflicts are resolved.
        exec() next                      ! Issue the instruction
        RESTART run
        end
    end

range(rel<17:0>)<> :=                    ! Address range fault check
                                         ! Fault
                                         ! Address exit select

exec :=                                  ! The instruction is issued to the
    begin                                ! appropriate execution unit.
    DECODE unit =>
        begin
        0 := BRANCH.UNIT(I),
        1 := BOOLEAN.UNIT(I),
        2 := SHIFT.UNIT(I),
        3 := ADD.UNIT(I),
        4 := LONG.ADD.UNIT(I),
        5 := MULTIPLY.UNIT.0(I),
        6 := MULTIPLY.UNIT.1(I),
        7 := DIVIDE.UNIT(I),
        8 := INCREMENT.UNIT.0(I),
        9 := INCREMENT.UNIT.1(I)
        end
    end

! The remainder of the ISP describes the ten arithmetic processing
! units. These units will function in parallel much as they do
! in the real CDC 6600.
! Note that floating point instructions are decoded but this ISP
! does not describe their actual execution.

**Branch.Unit**
BRANCH.UNIT(i<29:0>){process; critical} :=
    begin
    **Branch.Declarations**
    fm<5:0> := i<29:24>,
    i.<2:0> := i<23:21>,
    j.<2:0> := i<20:18>,
    k.<2:0> := i<17:15>,
    If (PC lss(us) islo) or (PC gtr(us) ishi) => mark() next
    dealloc(0)
    end

**Boolean.Unit**
BOOLEAN.UNIT(i<29:0>){process; critical} :=
    begin
    **Boolean.Declarations**
    fm<5:0> := i<29:24>,
    i.<2:0> := i<23:21>,
    j.<2:0> := i<20:18>,
    k.<2:0> := i<17:15>,
    **Boolean.Execution**{us}
    boolean{main} :=
    end

**Shift.Unit**
SHIFT.UNIT(i<29:0>){process; critical} :=
    begin
    **Shift.Declarations**
    fm<5:0> := i<29:24>,
    i.<2:0> := i<23:21>,
    j.<2:0> := i<20:18>,
    k.<2:0> := i<17:15>,
    jk<5:0> := i<20:15>,
    shift{main} :=
            1 := XREG[i.] = XREG[k.] slr (not BREG[j.]<5:0>)
            end,
        not.described,                                   ! NXi
            begin
            XREG[i.] <= XREG[k.]<59> @ XREG[k.]<47:0>;
            BREG[j.] <= #2000 - (us) XREG[k.]<58:48>
            end,
        #27 := begin
            XREG[i.]<47:0> = XREG[k.]<47:0>;
            XREG[i.]<59> = XREG[k.]<59>;
            DECODE XREG[k.]<59> =>
            end,
        #43 := begin                                     ! MXi
            XREG[i.] = 0 next
            XREG[i.]<59> = (jk neq 0) next
            XREG[i.] = XREG[i.] srd (jk -(us) 1)
            end
        end next
    dealloc(2)
    end

**Add.Unit**
ADD.UNIT(i<29:0>){process; critical} :=
    begin
    **Add.Declarations**
    **Add.Execution**{oc}
    add{main} :=
        begin
        DECODE fm =>
            begin
            #30 := not.described,
            #31 := not.described,
            #32 := not.described,
            #33 := not.described,
            #35 := not.described
            end next
        dealloc(3)
        end
    end

**Long.Add.Unit**
LONG.ADD.UNIT(i<29:0>){process; critical} :=
    begin
    **Long.Add.Declarations**
    fm<5:0> := i<29:24>,
    i.<2:0> := i<23:21>,
    j.<2:0> := i<20:18>,
    k.<2:0> := i<17:15>,
    **Long.Add.Execution**{oc}
    ladd{main} :=
        begin
        DECODE fm =>
            begin
            #36 := XREG[i.] = XREG[j.] + XREG[k.],
            #37 := XREG[i.] = XREG[j.] - XREG[k.],
            otherwise := no.op()
            end next
        dealloc(4)
        end
    end

**Multiply.Unit.0**
MULTIPLY.UNIT.0(i<29:0>){process; critical} :=
    begin
    **Multiply.0.Declarations**
    fm<5:0> := i<29:24>,
    **Multiply.0.Execution**{oc}
    mpy0{main} :=
        begin
        DECODE fm =>
            begin
            #40 := not.described,
            #41 := not.described,
            #42 := not.described
            end next
        dealloc(5)
        end
    end

**Multiply.Unit.1**
MULTIPLY.UNIT.1(i<29:0>){process; critical} :=
    begin
    **Multiply.1.Declarations**
    fm<5:0> := i<29:24>,
    **Multiply.1.Execution**{oc}
    mpy1{main} :=
        begin
        DECODE fm =>
            begin
            #40 := not.described,            ! FXi -> Xj * Xk
            #41 := not.described,            ! RXi -> Xj * Xk
            #42 := not.described             ! DXi -> Xj * Xk
            end next
        dealloc(6)
        end
    end

**Divide.Unit**
DIVIDE.UNIT(i<29:0>){process; critical} :=
    begin
    fm<5:0>  := i<29:24>,
    m.<2:0>  := i<26:24>,
    i.<2:0>  := i<23:21>,
    j.<2:0>  := i<20:18>,
    k.<2:0>  := i<17:15>,
    k1<17:0> := i<17:0>,
    xcnt<5:0>,                       ! Counter for CXi
                                     ! FXi -> Xi = Xj / Xk
                                     ! RXi -> Xi = Xj / Xk

**Increment.Unit.1**
INCREMENT.UNIT.1(i<29:0>){process; critical} :=
    begin
    **Increment.1.Declarations**
    fm<5:0>  := i<29:24>,
    m.<2:0>  := i<26:24>,
    i.<2:0>  := i<23:21>,
    j.<2:0>  := i<20:18>,
    k.<2:0>  := i<17:15>,
    k1<17:0> := i<17:0>,
        #60:#67 := SBi :=
            begin
            DECODE m. =>
                begin
                #0 := BREG[i.] = AREG[j.] + k1,
                #1 := BREG[i.] = BREG[j.] + k1,
                #2 := BREG[i.] = XREG[j.]<17:0> + k1,
                #3 := BREG[i.] = XREG[j.]<17:0> + BREG[k.],
                #4 := BREG[i.] = AREG[j.] + BREG[k.],
                #5 := BREG[i.] = AREG[j.] - BREG[k.],
                #6 := BREG[i.] = BREG[j.] + BREG[k.],
                #7 := BREG[i.] = BREG[j.] - BREG[k.]
                end
            end,

**Increment.Unit.0**
INCREMENT.UNIT.0(i<29:0>){process; critical} :=
    begin
    fm<5:0>  := i<29:24>,
    m.<2:0>  := i<26:24>,
    i.<2:0>  := i<23:21>,
    j.<2:0>  := i<20:18>,
    k.<2:0>  := i<17:15>,
    k1<17:0> := i<17:0>,
    **Increment.0.Execution**{oc}
    incr0{main} :=
        begin
        DECODE fm =>
            begin
            #50:#57 := SAi :=
                begin
                DECODE m. =>
                    begin
                    #0 := aref(i., AREG[j.] + k1),
                    #1 := aref(i., BREG[j.] + k1),
                    #2 := aref(i., XREG[j.]<17:0> + k1),
                    #3 := aref(i., XREG[j.]<17:0> + BREG[k.]),
                    #4 := aref(i., AREG[j.] + BREG[k.]),
                    #5 := aref(i., AREG[j.] - BREG[k.]),
                    #6 := aref(i., BREG[j.] + BREG[k.]),
                    #7 := aref(i., BREG[j.] - BREG[k.])
                    end
                end,
            #60:#67 := SBi :=
                begin
                DECODE m. =>
                    begin
                    #0 := BREG[i.] = AREG[j.] + k1,
                    #1 := BREG[i.] = BREG[j.] + k1,
                    #2 := BREG[i.] = XREG[j.]<17:0> + k1,
                    #3 := BREG[i.] = XREG[j.]<17:0> + BREG[k.],
                    #4 := BREG[i.] = AREG[j.] + BREG[k.],
                    #5 := BREG[i.] = AREG[j.] - BREG[k.],
                    #6 := BREG[i.] = BREG[j.] + BREG[k.],
                    #7 := BREG[i.] = BREG[j.] - BREG[k.]
                    end
                end,
            #70:#77 := SXi :=
                begin
                DECODE m. =>
                    begin
                    #0 := XREG[i.] <= AREG[j.] + k1,
                    #1 := XREG[i.] <= BREG[j.] + k1,
                    #2 := XREG[i.] <= XREG[j.]<17:0> + k1,
                    #3 := XREG[i.] <= XREG[j.]<17:0> + BREG[k.],
                    #4 := XREG[i.] <= AREG[j.] + BREG[k.],
                    #5 := XREG[i.] <= AREG[j.] - BREG[k.],
                    #6 := XREG[i.] <= BREG[j.] + BREG[k.],
                    #7 := XREG[i.] <= BREG[j.] - BREG[k.]
                    end
                end
            end next
        dealloc(8)
        end
    end

REQUIRE.ISP [PC6600.isp].
end                                      ! End CDC 6600
```