Tuesday, February 19, 2019
RISC & Pipelining
What is Reduced Instruction Set Computer Architecture?
* RISC stands for Reduced Instruction Set Computer.
* An instruction set is a set of instructions that helps the user to construct machine language programs to do computable tasks.

History
* In the early days, mainframes consumed a lot of resources for their operations.
* Because of this, in 1980 David Patterson of the University of California, Berkeley introduced the RISC concept.
* It used fewer instructions with simple constructs, which gave faster execution and less memory usage by the CPU.
* About a year was taken to design and fabricate RISC I in silicon.
* In 1983, Berkeley RISC II was produced. It is with RISC II that the RISC idea was opened to the industry.
* In later years RISC ideas were incorporated into Intel processors.
* After some years, a convergence took place between the two instruction sets.
* RISC started incorporating more complex instructions, and CISC started to reduce the complexity of its instructions.
* By the mid 1990s some RISC processors had become more complex than CISC processors.
* Today the difference between RISC and CISC is blurred.

Characteristics and Comparisons
* As mentioned, the difference between RISC and CISC is disappearing.
But these were the initial differences between the two.

RISC                                                                 | CISC
Fewer instructions                                                   | More instructions (100-250)
More registers, hence more on-chip memory (faster)                   | Fewer registers
Operations done within the registers of the CPU                      | Operations can be done external to the CPU, e.g. in memory
Fixed-length instruction format, hence easily decoded                | Variable-length instruction format
Instruction execution in one clock cycle, hence simpler instructions | Instruction execution in multiple clock cycles
Hardwired, hence faster                                              | Microprogrammed
Fewer addressing modes                                               | A variety of addressing modes

Examples of addressing modes: register direct, immediate addressing, absolute addressing. For examples of the set of instructions used for a particular operation, and of instruction formats, see http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/risc/risccisc/

Advantages and Disadvantages
Advantages:
* Speed of instruction execution is improved.
* Quicker time to market for the processors, since fewer instructions take less time to design and fabricate.
* Smaller chip size, because fewer transistors are required.
* Consumes less power and hence dissipates less heat.
* Less expensive, because of fewer transistors.
Disadvantages:
* Because of the fixed length of the instructions, it does not use memory efficiently.
* For complex operations, the number of instructions will be larger.

Pipelining
The origin of pipelining is thought to be in the early 1940s. The processor has specialised units for executing each stage in the instruction cycle, so several instructions are processed concurrently. It is like an assembly line. The stages are instruction fetch (IF), instruction decode (ID), operand fetch (OF), operand execute (OE) and operand store (OS).

Time steps (clocks):  1  2  3  4  5  6  7  8
Instruction 1:        IF ID OF OE OS
Instruction 2:           IF ID OF OE OS
Instruction 3:              IF ID OF OE OS
Instruction 4:                 IF ID OF OE OS

Pipelining is used to increase the speed of the processor by overlapping different stages in the instruction cycle. It improves the instruction execution bandwidth. Each instruction takes 5 clock cycles to complete. When pipelining is used, the first instruction takes 5 clock cycles, but each following instruction finishes 1 clock cycle after the previous one.
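The timing claim above can be checked with a short sketch. It assumes an ideal five-stage pipeline (IF, ID, OF, OE, OS) with no stalls or hazards; the function names are illustrative and not taken from any library.

```python
# Minimal sketch of ideal pipeline timing (no stalls, no hazards assumed).
STAGES = ("IF", "ID", "OF", "OE", "OS")


def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Each instruction runs through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """The first instruction needs n_stages cycles; every later
    instruction finishes one cycle after the previous one."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def diagram(n_instructions, stages=STAGES):
    """Return one text row per instruction, shifted right by one
    stage-width for each instruction that started before it."""
    rows = []
    for i in range(n_instructions):
        cells = ["  "] * i + list(stages)
        rows.append(" ".join(cells))
    return rows


if __name__ == "__main__":
    for row in diagram(4):
        print(row)
    print("unpipelined:", unpipelined_cycles(4))  # 20 cycles
    print("pipelined:  ", pipelined_cycles(4))    # 8 cycles
```

For four instructions the ideal pipeline needs 5 + 3 = 8 cycles instead of 20. Real pipelines fall short of this figure whenever a stall or flush occurs.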
Types of Pipelining
There are various types of pipelining. These include the arithmetic pipeline, the instruction pipeline, superpipelining, superscaling and vector processing.

Arithmetic pipeline
Used to deal with scientific problems like floating point operations and fixed point multiplications. These operations are made up of different segments or sub-operations, which can be performed concurrently, leading to faster execution.

Instruction pipeline
This is the general pipelining which has been explained before.

Pipeline Hazards

Data Dependency
This occurs when two or more instructions attempt to share the same data resource, that is, when an instruction is trying to access or change data which is being modified by another instruction. There are three types of data dependency, described here for an earlier instruction i and a later instruction j:
RAW (Read After Write): This happens when instruction j reads before instruction i writes the data. This means that the value read is too old.
WAR (Write After Read): This happens when instruction j writes before instruction i reads the data. This means that the value read is too new.
WAW (Write After Write): This happens when instruction j writes before instruction i writes the data. This means that a wrong value is stored.

Solutions to Data Dependency
* Stall the pipeline: A data dependency is predicted and the subsequent instructions are not allowed to enter the pipeline. Special hardware is needed to predict the data dependency, and a time delay is caused.
* Flush the pipeline: When a data dependency occurs, all other instructions are removed from the pipeline. This also causes a time delay.
* Delayed load: No Operation (NOP) instructions are inserted between data dependent instructions. This is done by the compiler and it avoids the data dependency.

Without delayed load:
Clock cycle       1  2  3  4  5  6
1. Load R1        IF OE OS
2. Load R2           IF OE OS
3. Add R1 + R2          IF OE OS
4. Store R3                IF OE OS

With delayed load:
Clock cycle       1  2  3  4  5  6  7
1. Load R1        IF OE OS
2. Load R2           IF OE OS
3. NOP                  IF OE OS
4. Add R1 + R2             IF OE OS
5. Store R3                   IF OE OS

In the first schedule the Add executes (OE) in cycle 4, the same cycle in which Load R2 stores its result (OS), so R2 may not be ready yet. The NOP delays the Add by one cycle.

Branch Dependency
This happens when one instruction in the pipeline branches to another instruction. Since the following instructions have already entered the pipeline when the branch occurs, a branch penalty is incurred.

Solutions to Branch Dependency
1. Branch prediction: The outcome of a branch instruction is predicted and instructions are pipelined accordingly.
2. Branch target buffer.
3. Delayed branch: The compiler detects branch dependencies and rearranges the code in such a way that the branch dependency is avoided. No Operation instructions can also be used.

Example program:
1. LOAD MEM100 R1
2. INCREMENT R2
3. ADD R3 R3 + R4
4. SUB R6 R6-R5
5. BRA X

Without delayed branch:
Clock cycle           1  2  3  4  5  6  7  8  9
1. Load               IF OE OS
2. Increment             IF OE OS
3. Add                      IF OE OS
4. Subtract                    IF OE OS
5. Branch to X                    IF OE OS
6. Next instructions                 IF OE OS

Adding NOP instructions:
Clock cycle           1  2  3  4  5  6  7  8  9
1. Load               IF OE OS
2. Increment             IF OE OS
3. Add                      IF OE OS
4. Subtract                    IF OE OS
5. Branch to X                    IF OE OS
6. NOP                               IF OE OS
7. Instructions in X                    IF OE OS

Rearranging the instructions:
Clock cycle           1  2  3  4  5  6  7  8
1. Load               IF OE OS
2. Increment             IF OE OS
3. Branch to X              IF OE OS
4. Add                         IF OE OS
5. Subtract                       IF OE OS
6. Instructions in X                 IF OE OS

Early Intel Pentium 4 processors have 20-stage pipelines. Today, pipelining circuits can be found embedded inside most microprocessors.

Superscaling
It is a form of parallelism combined with pipelining. It has a redundant execution unit which provides the parallelism, so that two instructions can be in the same stage at the same time. (Superscalar: 1984, Star Technologies, Roger Chen.)

IF ID OF OE OS
IF ID OF OE OS
   IF ID OF OE OS
   IF ID OF OE OS
      IF ID OF OE OS
      IF ID OF OE OS
         IF ID OF OE OS
         IF ID OF OE OS

Superpipelining
It is the implementation of longer pipelines, that is, pipelines with more stages. It is mainly useful when some stages in the pipeline take longer than the others.
The longest stage determines the clock cycle. So if these long stages can be broken down into smaller stages, then the clock cycle time can be reduced. This reduces wasted time, which is significant when a large number of instructions are performed. Superpipelining is simple because it does not need the additional hardware that superscaling does. However, superpipelining has more side effects, since the number of stages in the pipeline is increased: there is a longer delay whenever a data or branch dependency occurs.

Vector Processing
(Vector processors: 1970s.) Vector processors pipeline the data as well, not just the instructions. For example, if many numbers need to be added together, such as adding 10 pairs of numbers, a normal processor adds one pair at a time. This means the same sequence of instruction fetching and decoding has to be carried out 10 times. In vector processing, since the data is also pipelined, the instruction fetch and decode occur only once and the 10 pairs of numbers (operands) are fetched together. Thus the time to process the instructions is reduced significantly.

C(1:10) = A(1:10) + B(1:10)

Vector processors are mainly used in specialised applications like long range weather forecasting, artificial intelligence systems, image processing etc. When the performance limitations of the rather conventional CISC-style architectures of the period were analysed, it was discovered very quickly that operations on vectors and matrices were among the most demanding CPU-bound numerical computational problems faced.

RISC Pipelining
RISC has simple instructions. This simplicity is used to reduce the number of stages in the instruction pipeline. For example, a separate instruction decode stage is not necessary because the encoding in RISC architecture is simple. Operands are all stored in registers, hence there is no need to fetch them from memory. This reduces the number of stages further.
Therefore, for pipelining with RISC architecture, the stages in the pipeline are instruction fetch, operand execute and operand store. Because the instructions are of fixed length, each stage in the RISC pipeline can be executed in one clock cycle.

Questions
1. Is vector processing a type of pipelining?
2. RISC and pipelining

The simplest way to examine the advantages and disadvantages of RISC architecture is by contrasting it with its predecessor, CISC (Complex Instruction Set Computer) architecture.

Multiplying Two Numbers in Memory
Consider the storage scheme for a generic computer. The main memory is divided into locations numbered from (row) 1: (column) 1 to (row) 6: (column) 4. The execution unit is responsible for carrying out all computations. However, the execution unit can only operate on data that has been loaded into one of the six registers (A, B, C, D, E, or F). Let's say we want to find the product of two numbers, one stored in location 2:3 and another stored in location 5:2, and then store the product back in location 2:3.

The CISC Approach
The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

    MULT 2:3, 5:2

MULT is what is known as a "complex instruction". It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a high-level language.
For instance, if we let "a" represent the value of 2:3 and "b" represent the value of 5:2, then this command is identical to the C statement "a = a * b". One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

The RISC Approach
RISC processors only use simple instructions that can be executed within one clock cycle. Thus, the "MULT" command described above could be divided into three separate commands: "LOAD", which moves data from the memory bank to a register, "PROD", which finds the product of two operands located within the registers, and "STORE", which moves data from a register to the memory banks. In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:

    LOAD A, 2:3
    LOAD B, 5:2
    PROD A, B
    STORE 2:3, A

At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly-level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form.

CISC                                                              | RISC
Emphasis on hardware                                              | Emphasis on software
Includes multi-clock complex instructions                         | Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions
Small code sizes, high cycles per instruction                     | Low cycles per instruction, large code sizes
Transistors used for storing complex instructions                 | Spends more transistors on memory registers

However, the RISC strategy also brings some very important advantages.
Because each instruction requires only one clock cycle to execute, the entire program will execute in approximately the same amount of time as the multi-cycle "MULT" command. These RISC "reduced instructions" require fewer transistors of hardware space than the complex instructions, leaving more room for general purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock), pipelining is possible.

Separating the "LOAD" and "STORE" instructions actually reduces the amount of work that the computer must perform. After a CISC-style "MULT" command is executed, the processor automatically erases the registers. If one of the operands needs to be used for another computation, the processor must re-load the data from the memory bank into a register. In RISC, the operand will remain in the register until another value is loaded in its place.

The Performance Equation
The following equation is commonly used for expressing a computer's performance ability:

    time/program = (time/cycle) x (cycles/instruction) x (instructions/program)

The CISC approach attempts to minimise the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.

RISC Roadblocks
Despite the advantages of RISC-based processing, RISC chips took over a decade to gain a foothold in the commercial world. This was largely due to a lack of software support. Although Apple's Power Macintosh line featured RISC-based chips and Windows NT was RISC compatible, Windows 3.1 and Windows 95 were designed with CISC processors in mind. Many companies were unwilling to take a chance with the emerging RISC technology. Without commercial interest, processor developers were unable to manufacture RISC chips in large enough volumes to make their price competitive. Another major setback was the presence of Intel.
Although their CISC chips were becoming increasingly unwieldy and difficult to develop, Intel had the resources to plow through development and produce powerful processors. Although RISC chips might surpass Intel's efforts in specific areas, the differences were not great enough to persuade buyers to change technologies.

The Overall RISC Advantage
Today, the Intel x86 is arguably the only chip which retains CISC architecture. This is primarily due to advancements in other areas of computer technology. The price of RAM has decreased dramatically. In 1977, 1MB of DRAM cost about $5,000. By 1994, the same amount of memory cost only $6 (when adjusted for inflation). Compiler technology has also become more sophisticated, so that the RISC use of RAM and emphasis on software has become ideal.
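As a closing illustration, the trade-off expressed by the performance equation (time per program = instructions per program x cycles per instruction x time per cycle) can be worked through for the multiplication example. This is only a sketch: the 4-cycle cost of "MULT" and the 1 ns clock are assumed values chosen for illustration, not measurements of any real processor.

```python
# Sketch of the performance equation with hypothetical numbers.

def execution_time_ns(instructions, cycles_per_instruction, ns_per_cycle):
    """time/program = instructions/program * cycles/instruction * time/cycle"""
    return instructions * cycles_per_instruction * ns_per_cycle


NS_PER_CYCLE = 1   # assumed clock period (hypothetical)
MULT_CYCLES = 4    # assumed cost of the complex MULT instruction (hypothetical)

# CISC: one complex instruction that takes several cycles.
cisc_time = execution_time_ns(1, MULT_CYCLES, NS_PER_CYCLE)

# RISC: LOAD, LOAD, PROD, STORE, each assumed to take one cycle.
risc_time = execution_time_ns(4, 1, NS_PER_CYCLE)

if __name__ == "__main__":
    print("CISC:", cisc_time, "ns")
    print("RISC:", risc_time, "ns")
```

Under these assumptions both versions take the same time: CISC uses fewer instructions with more cycles each, and RISC uses more instructions with one cycle each. Any real comparison depends on the actual cycle counts and clock rates of the processors involved.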