Question 1
In a k-way set associative cache, the cache is divided into v sets, each of which consists of k lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced before the lines in set (s+1). The main memory blocks are numbered 0 onwards. The main memory block numbered j must be mapped to any one of the cache lines from
 A (j mod v) * k to (j mod v) * k + (k-1) B (j mod v) to (j mod v) + (k-1) C (j mod k) to (j mod k) + (v-1) D (j mod k) * v to (j mod k) * v + (v-1)
GATE CS 2013    Computer Organization and Architecture

Question 1 Explanation:
Number of sets in cache = v. So, main memory block j will be mapped to set (j mod v). Since each set holds k consecutive lines and the sets are laid out one after another, that set occupies cache lines (j mod v) * k to (j mod v) * k + (k-1), so option A is correct. (The associativity k plays no role in choosing the set: k-way associativity only means there are k candidate lines for a block within its set, which reduces the chance of replacement.)
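The mapping above can be checked with a few lines of Python. This is only a sketch; the function name candidate_lines and the sample values are made up for illustration.

```python
def candidate_lines(j, v, k):
    """Return (first, last) cache line numbers that memory block j may occupy."""
    s = j % v            # set index selected by the block number
    first = s * k        # sets are laid out back to back, k lines each
    return first, first + k - 1

# Example: 4 sets of 2 lines each; block 13 maps to set 1, i.e. lines 2..3
print(candidate_lines(j=13, v=4, k=2))   # (2, 3)
```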
 Question 2
Consider the following sequence of micro-operations.
     MBR ← PC
MAR ← X
PC ← Y
Memory ← MBR
Which one of the following is a possible operation performed by this sequence?
 A Instruction fetch B Operand fetch C Conditional branch D Initiation of interrupt service
GATE CS 2013    Computer Organization and Architecture

Question 2 Explanation:
MBR - Memory Buffer Register (holds the data being transferred to or from memory).
MAR - Memory Address Register (holds the address of the memory location to be accessed).
PC - Program Counter (holds the address of the next instruction to be fetched).

The 1st micro-operation places the value of PC into MBR.
The 2nd micro-operation places an address X into MAR.
The 3rd micro-operation places an address Y into PC.
The 4th micro-operation writes the value of MBR (which is the old PC value) into memory at the address held in MAR.

From the 1st and 4th micro-operations it can be seen that the control flow is not sequential and that the old value of PC is saved in memory, so that control can later return to the address where execution left off. This behavior is seen in interrupt handling. Here X is the memory address at which the old PC value is saved (for example, the top of the stack), and Y can be the beginning address of the interrupt service routine.

In the case of a conditional branch (option C), only PC is updated with the target address and there is no need to store the old PC value in memory. In the case of instruction fetch and operand fetch (options A and B), the PC value is not stored anywhere else. Hence option D.
 Question 3
Consider an instruction pipeline with five stages without any branch prediction: Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively. There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, …, I12 is executed in this pipelined processor. Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is
 A 132 B 165 C 176 D 328
GATE CS 2013    Computer Organization and Architecture

Question 3 Explanation:
The pipeline has to be stalled until the EI stage of I4 completes,
as the EI stage determines whether the branch is taken or not.

After that, I4 (WO) and I9 (FI) can proceed in parallel, followed by the
remaining instructions.
Cycle time = slowest stage + buffer delay = (10 + 1) ns = 11 ns
Until I4 (EI) completes: 7 cycles * 11 ns = 77 ns
From I4 (WO) / I9 (FI) to I12 (WO): 8 cycles * 11 ns = 88 ns
Total = 77 + 88 = 165 ns
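The cycle counts above can be reproduced with a short Python sketch. The helper cycles_to_finish is a made-up name, and the model assumes one instruction enters the pipeline per cycle with no stalls other than the branch.

```python
# Stage delays in ns for FI, DI, FO, EI, WO, taken from the question
stage_delays = [5, 7, 10, 8, 6]
buffer_delay = 1
cycle_ns = max(stage_delays) + buffer_delay       # 11 ns per pipeline cycle

def cycles_to_finish(n_instructions, last_stage):
    """Cycles until the last of n back-to-back instructions leaves stage last_stage (1-based)."""
    return last_stage + (n_instructions - 1)

before_branch = cycles_to_finish(4, last_stage=4)  # I1..I4, until I4 finishes EI
after_branch = cycles_to_finish(4, last_stage=5)   # I9..I12, until I12 finishes WO
total_ns = (before_branch + after_branch) * cycle_ns
print(before_branch, after_branch, total_ns)       # 7 8 165
```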
 Question 4
A RAM chip has a capacity of 1024 words of 8 bits each (1K × 8). The number of 2 × 4 decoders with enable line needed to construct a 16K × 16 RAM from 1K × 8 RAM is
 A 4 B 5 C 6 D 7
GATE CS 2013    Computer Organization and Architecture

Question 4 Explanation:
RAM chip size = 1K x 8 [1024 words of 8 bits each]
RAM to construct = 16K x 16
Number of chips required = (16K x 16) / (1K x 8)
= 16 x 2 = 32
[16 rows of chips, each row having 2 chips
side by side]
So, to select one row out of the 16 rows of chips,
we need a 4 x 16 decoder.

Available decoder is a 2 x 4 decoder.
To be constructed is a 4 x 16 decoder.

A 4 x 16 decoder needs four 2 x 4 decoders to drive the 16 outputs, plus one
2 x 4 decoder to enable one of those four.
Hence 4 + 1 = 5 decoders are required.
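The decoder count can also be obtained by counting the levels of a decoder tree. The following Python sketch uses a made-up helper, decoders_needed, and assumes every decoder in the tree has 4 outputs and an enable input.

```python
def decoders_needed(outputs, fan_out=4):
    """Number of fan_out-output decoders in a tree that drives `outputs` select lines."""
    count = 0
    while outputs > 1:
        outputs = -(-outputs // fan_out)   # decoders needed at this level (ceiling division)
        count += outputs
    return count

print(decoders_needed(16))   # 5  (4 at the output level + 1 at the root)
```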
 Question 5
The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have at most two source operands and one destination operand. Assume that all variables are dead after this code segment.
c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
    y = a * a;
}
else {
    d = d * d;
    e = e * e;
}
Suppose the instruction set architecture of the processor has only two registers. The only allowed compiler optimization is code motion, which moves statements from one place to another while preserving correctness. What is the minimum number of spills to memory in the compiled code?
 A 0 B 1 C 2 D 3
GATE CS 2013    Computer Organization and Architecture

Question 5 Explanation:
r1    r2    operation
a     b     c = a + b
a     c     x = c * c
a     x     c has to be stored in memory here, as we don't know whether x > a or not
y     x     y = a * a
Choosing the best case of x > a, min spills = 1
 Question 6
Consider the same data as above question. What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation.
 A 3 B 4 C 5 D 6
GATE CS 2013    Computer Organization and Architecture

Question 6 Explanation:
Note that code motion is not allowed for this problem. So, we analyze the code line by line and determine how many registers are required to execute the code snippet, assuming the registers are numbered R1, R2, R3 and R4. (The table showing this analysis is not reproduced here.) From that analysis we can conclude that a minimum of 4 registers is needed to execute the code snippet. This explanation has been contributed by Namita Singh.
 Question 7
The amount of ROM needed to implement a 4 bit multiplier is
 A 64 bits B 128 bits C 1 Kbits D 2 Kbits
GATE CS 2012    Computer Organization and Architecture

Question 7 Explanation:
For a 4 bit multiplier, there are 2^4 * 2^4 input combinations, i.e., 2^8 combinations. Also, the output of a 4 bit multiplier is 8 bits. Thus, the amount of ROM needed = 2^8 * 8 = 2^11 = 2048 bits = 2 Kbits
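The same count generalizes to an n bit multiplier implemented as a lookup table. A minimal Python sketch (rom_bits is a made-up name):

```python
def rom_bits(n):
    """ROM size in bits for a lookup-table multiplier of two n-bit inputs."""
    entries = 2 ** (2 * n)   # one entry per pair of n-bit operands
    width = 2 * n            # the product of two n-bit numbers fits in 2n bits
    return entries * width

print(rom_bits(4))           # 2048 bits = 2 Kbits
```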
 Question 8
Register renaming is done in pipelined processors
 A as an alternative to register allocation at compile time B for efficient access to function parameters and local variables C to handle certain kinds of hazards D as part of address translation
GATE CS 2012    Computer Organization and Architecture

Question 8 Explanation:
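Register renaming is done to avoid data hazards: by giving each write its own physical register, it removes the false (name) dependences that cause WAR and WAW hazards. Hence option C.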
 Question 9
A computer has a 256 KByte, 4-way set associative, write back data cache with block size of 32 Bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit. The number of bits in the tag field of an address is
 A 11 B 14 C 16 D 27
GATE CS 2012    Computer Organization and Architecture

Question 9 Explanation:
A set-associative scheme is a hybrid between a fully associative cache and a direct-mapped cache. It is considered a reasonable compromise between the complex hardware needed for fully associative caches (which require parallel searches of all slots) and the simplistic direct-mapped scheme, which may cause collisions of addresses to the same slot (similar to collisions in a hash table). (source: http://www.cs.umd.edu/class/spring2003/cmsc311/Notes/Memory/set.html). Also see http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf
Number of blocks = Cache-Size/Block-Size = 256 KB / 32 Bytes = 2^13
Number of Sets = 2^13 / 4 = 2^11
Tag + Set offset + Byte offset = 32
Tag + 11 + 5 = 32
Tag = 16
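The field widths can be computed mechanically. The sketch below uses a made-up helper, address_fields, and assumes that all sizes are powers of two.

```python
def address_fields(cache_bytes, block_bytes, ways, address_bits):
    """Return (tag, set, byte offset) widths in bits; all sizes must be powers of two."""
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    set_bits = sets.bit_length() - 1             # log2 of the number of sets
    tag_bits = address_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits

# 256 KB, 32-byte blocks, 4-way, 32-bit addresses
print(address_fields(256 * 1024, 32, 4, 32))     # (16, 11, 5)
```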
 Question 10
Consider the data given in previous question. The size of the cache tag directory is
 A 160 Kbits B 136 bits C 40 Kbits D 32 bits
GATE CS 2012    Computer Organization and Architecture

Question 10 Explanation:
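Each tag directory entry has a 16 bit address tag, 2 valid bits, 1 modified bit and 1 replacement bit, so total bits per entry = 20. Directory size = 20 × no. of blocks = 20 × 2^13 bits = 160 Kbits.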
 Question 11
Consider a hypothetical processor with an instruction of type LW R1, 20(R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for operand in memory?
GATE CS 2011    Computer Organization and Architecture

Question 11 Explanation:
 Question 12
On a non-pipelined sequential processor, a program segment, which is a part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.

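Initialize the address register
Initialize the count to 500
LOOP: Load a byte from device
Store in memory at address given by address register
Increment the address register
Decrement the count
If count != 0 go to LOOP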
Assume that each statement in this program is equivalent to machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute. The designer of the system also has an alternate approach of using DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory. What is the approximate speedup when the DMA controller based design is used in place of the interrupt driven program based input-output?
 A 3.4 B 4.4 C 5.1 D 6.7
GATE CS 2011    Computer Organization and Architecture

Question 12 Explanation:
STATEMENT                                               CLOCK CYCLE(S) NEEDED
Initialize the address register                         1
Initialize the count to 500                             1
LOOP: Load a byte from device                           2
Store in memory at address given by address register    2
Increment the address register                          1
Decrement the count                                     1
If count != 0 go to LOOP                                1

Interrupt driven transfer time = 1+1+500×(2+2+1+1+1) = 3502
DMA based transfer time = 20+500*2 = 1020
Speedup = 3502/1020 ≈ 3.4

Source: http://clweb.csa.iisc.ernet.in/rahulsharma/gate2011key.html
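The arithmetic in the explanation can be mirrored in a short Python sketch. The variable names are made up; the cycle counts are the ones used in the formula above (two initialization instructions, then two 2-cycle load/store instructions and three 1-cycle instructions per iteration).

```python
BYTES = 500

init_cycles = 1 + 1               # two 1-cycle initialization instructions
loop_cycles = 2 + 2 + 1 + 1 + 1   # load, store, and three 1-cycle instructions
interrupt_cycles = init_cycles + BYTES * loop_cycles

dma_cycles = 20 + BYTES * 2       # setup overhead + 2 cycles per byte

speedup = interrupt_cycles / dma_cycles
print(interrupt_cycles, dma_cycles, round(speedup, 1))   # 3502 1020 3.4
```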
 Question 13
Consider evaluating the following expression tree on a machine with load-store architecture in which memory can be accessed only through load and store instructions. The variables a, b, c, d and e are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers. The instructions produce results only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?
 A 2 B 9 C 5 D 3
GATE CS 2011    Computer Organization and Architecture

Question 13 Explanation:

R1←c,  R2←d,  R2←R1+R2,  R1←e,  R2←R1-R2
To calculate the rest of the expression we must load a and b into registers, but the
content of R2 is needed later,
so we must use another register.
R1←a, R3←b, R1←R1-R3, R1←R1+R2
Hence a minimum of 3 registers is needed (option D).

 Question 14
Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure: What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?
 A 4 B 2.5 C 1.1 D 3
GATE CS 2011    Computer Organization and Architecture

Question 14 Explanation:
Pipeline register overhead is not counted in the non-pipelined
execution time.

So the time per instruction without the pipeline is

5+6+11+8 = 30 ns

With the pipeline, the clock period is set by the slowest stage, 11 ns, plus
1 ns of pipeline register overhead, i.e. 12 ns. In steady state one result
is produced every pipeline cycle, i.e. every 12 ns.

Speedup = 30/12 = 2.5
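As a quick check, the calculation can be written as a few lines of Python. The stage delays and register delay are the values quoted in the explanation (the figure itself is not available here), and the variable names are made up.

```python
stage_delays_ns = [5, 6, 11, 8]   # S1..S4, as quoted in the explanation
register_delay_ns = 1

non_pipelined_ns = sum(stage_delays_ns)                       # 30 ns per instruction
pipelined_cycle_ns = max(stage_delays_ns) + register_delay_ns # 12 ns per result in steady state

speedup = non_pipelined_ns / pipelined_cycle_ns
print(speedup)   # 2.5
```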
 Question 15
An 8KB direct-mapped write-back cache is organized as multiple blocks, each of size 32-bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising the following.
1 Valid bit
1 Modified bit
As many bits as the minimum needed to identify the memory block mapped in the cache.
What is the total size of memory needed at the cache controller to store meta-data (tags) for the cache?
 A 4864 bits B 6144 bits C 6656 bits D 5376 bits
GATE CS 2011    Computer Organization and Architecture

Question 15 Explanation:
Cache size = 8 KB
Block size = 32 bytes
Number of cache lines = Cache size / Block size = (8 × 1024 bytes)/32 = 256

Byte offset bits = log2(32) = 5, line index bits = log2(256) = 8
Tag bits = 32 - 8 - 5 = 19
Total bits required to store meta-data of 1 line = 1 + 1 + 19 = 21 bits
Total memory required = 21 × 256 = 5376 bits
Source: http://clweb.csa.iisc.ernet.in/rahulsharma/gate2011key.html
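The derivation of the tag width and the total meta-data size for this direct-mapped cache can be sketched in Python as follows. The variable names are made up, and the sizes are assumed to be powers of two.

```python
cache_bytes = 8 * 1024
block_bytes = 32
address_bits = 32

lines = cache_bytes // block_bytes                    # 256 lines (direct mapped)
offset_bits = block_bytes.bit_length() - 1            # log2(32) = 5
index_bits = lines.bit_length() - 1                   # log2(256) = 8
tag_bits = address_bits - index_bits - offset_bits    # 19

metadata_bits = lines * (tag_bits + 1 + 1)            # tag + valid bit + modified bit
print(tag_bits, metadata_bits)                        # 19 5376
```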
 Question 16
A main memory unit with a capacity of 4 megabytes is built using 1M x 1-bit DRAM chips. Each DRAM chip has 1K rows of cells with 1K cells in each row. The time taken for a single refresh operation is 100 nanoseconds. The time required to perform one refresh operation on all the cells in the memory unit is
 A 100*2^10 nanoseconds B 100*2^20 nanoseconds C 3200*2^20 nanoseconds
GATE CS 2010    Computer Organization and Architecture

Question 16 Explanation:
Number of chips required for 4MB MM =
(4 * 2^20 * 8) / (1 * 2^20) = 32

Time required to refresh one chip = 2^20 * 100 ns.

Hence, time required to refresh MM = 32 * 2^20 * 100 ns
= 3200 * 2^20 ns 
 Question 17
A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?
Instruction            Meaning of instruction
I0: MUL R2, R0, R1     R2 ← R0 * R1
I1: DIV R5, R3, R4     R5 ← R3 / R4
I2: ADD R2, R5, R2     R2 ← R5 + R2
I3: SUB R5, R2, R6     R5 ← R2 - R6
 A 13 B 15 C 17 D 19
GATE CS 2010    Computer Organization and Architecture

Question 17 Explanation:
Operand forwarding: in this technique the value of an operand is passed directly to the stage of the dependent instruction that needs it, before it is written back. In this question, I2 depends on I0 and I1, and I3 depends on I2. Consider the space-time diagram of the pipeline for these instructions (the diagram itself is not reproduced here).

Instruction 0 is a MUL operation, which takes 3 clock cycles in the PO stage and 1 cycle in every other stage.

Instruction 1 is a DIV operation, which takes 6 clock cycles in the PO stage and 1 cycle in every other stage. Note that even though the OF stage is free in the 4th clock cycle, instruction 1 is not given to it. This is a design decision: operands are fetched only if they are going to be operated on in the next cycle, otherwise there is a possibility of data corruption. As the PO stage is not free in the next cycle, OF for instruction 1 is delayed until the cycle just before it enters PO.

Instruction 2 is an ADD operation, which takes 1 clock cycle in every stage, but it depends on the results of instructions 0 and 1. Instruction 2 needs R5 and R2. It gets R2 in time, because by the time instruction 2 reaches its PO stage, instruction 0 has already written R2 back. R5 is also needed, but instruction 2's PO and instruction 1's WO happen in the same cycle, so instruction 2 cannot read R5 before instruction 1 has stored it. This is where operand forwarding comes in: before instruction 1 stores its result R5, it forwards the value to instruction 2's fetch-execute buffer, so that instruction 2 can use it in parallel with instruction 1's WO stage. This saves the extra clock cycles that would be needed if operand forwarding were not used and R5 had to be read after being written back.

In instruction 3, the same operand forwarding concept is applied for the value of R2 computed by instruction 2. Hence operand forwarding saves 2 clock cycles here (1 cycle in instruction 2 and 1 cycle in instruction 3). Each instance of a stage in the diagram represents 1 clock cycle, so the total number of cycles is 15.
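The schedule described above can be reproduced with a small Python model. This is a simplified sketch, not a full pipeline simulator: it assumes a single in-order PO unit, assumes a forwarded result is usable in the cycle after the producer's PO finishes, and tracks only the PO stage (IF, ID and OF are assumed to fit in the cycles before it). The names PO_LATENCY, program and total_cycles are made up.

```python
PO_LATENCY = {"MUL": 3, "DIV": 6, "ADD": 1, "SUB": 1}

# (opcode, destination, sources) for I0..I3
program = [
    ("MUL", "R2", ("R0", "R1")),
    ("DIV", "R5", ("R3", "R4")),
    ("ADD", "R2", ("R5", "R2")),
    ("SUB", "R5", ("R2", "R6")),
]

def total_cycles(program):
    ready = {}          # register -> first cycle its forwarded value can be used in PO
    prev_po_end = 0     # last cycle in which the previous instruction occupied PO
    for i, (op, dest, srcs) in enumerate(program):
        earliest = i + 4                      # IF in cycle i+1, then ID and OF
        start = max(earliest, prev_po_end + 1, *(ready.get(r, 0) for r in srcs))
        prev_po_end = start + PO_LATENCY[op] - 1
        ready[dest] = prev_po_end + 1         # forwarded right after PO completes
    return prev_po_end + 1                    # WO of the last instruction

print(total_cycles(program))   # 15
```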
 Question 18
The program below uses six temporary variables a, b, c, d, e, f.

a = 1
b = 10
c = 20
d = a+b
e = c+d
f = c+e
b = c+e
e = b+f
d = 5+e
return d+f
Assuming that all operations take their operands from registers, what is the minimum number of registers needed to execute this program without spilling?
 A 2 B 3 C 4 D 6
GATE CS 2010    Computer Organization and Architecture

Question 18 Explanation:
All of the given expressions use at most 3 variables, so we never need more than 3 registers. See http://en.wikipedia.org/wiki/Register_allocation It requires a minimum of 3 registers.

Principle of register allocation: if a variable needs to be allocated to a register, the system checks for a free register and, if it finds one, allocates it. If there is no free register, it checks for a register that contains a dead variable (a variable whose value is not going to be used in future), and if it finds one it allocates that. Otherwise it goes for spilling: it picks the register whose value is needed after the longest time, saves that value to memory, and uses the register for the current allocation; later, when the old value is needed, it is loaded from memory into any available register. Spilling is not allowed in this question.

Let's allocate the registers for the variables.
a = 1 (let's say register R1 is allocated for variable 'a')
b = 10 (R2 for 'b', because the value of 'a' is going to be used in the future and hence cannot be replaced in R1)
c = 20 (R3 for 'c', because the values of 'a' and 'b' are going to be used in the future and hence cannot be replaced in R1 or R2)
d = a+b ('d' can be assigned to R1, because R1 contains the dead variable 'a': no subsequent expression uses the value of 'a')
e = c+d ('e' can be assigned to R1, because R1 currently contains the value of 'd', which is not used by any subsequent expression)
Note: an already calculated value of a variable is used only by READ operations (not WRITE), hence we only have to look at the right-hand sides of the subsequent expressions to see whether the variable is going to be used.
f = c+e ('f' can be assigned to R2, because the value of 'b' in R2 is not used in subsequent expressions)
b = c+e ('b' can be assigned to R3, because the value of 'c' in R3 is not used later)
e = b+f ('e' is already in R1, so no new allocation here, just a direct assignment)
d = 5+e ('d' can be assigned to either R1 or R3, because the values in both are not used further; let's assign R1)
return d+f (no allocation here; the contents of registers R1 and R2 are simply added and returned)
Hence we need only 3 registers: R1, R2 and R3.
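The same result can be cross-checked with a backward liveness pass, which counts how many values are live at the same time. The sketch below is a simplified model for straight-line code only: it assumes a destination may reuse the register of a source that dies at that statement, and that every assigned value is used later. The names program and max_live are made up.

```python
# (destination, variables read) for each statement; None marks the return
program = [
    ("a", []), ("b", []), ("c", []),
    ("d", ["a", "b"]),
    ("e", ["c", "d"]),
    ("f", ["c", "e"]),
    ("b", ["c", "e"]),
    ("e", ["b", "f"]),
    ("d", ["e"]),
    (None, ["d", "f"]),
]

def max_live(program):
    """Largest number of simultaneously live variables, found by scanning backwards."""
    live = set()
    peak = 0
    for dest, uses in reversed(program):
        live.discard(dest)     # the value is created here, so it is not live before
        live.update(uses)      # operands must be live just before the statement
        peak = max(peak, len(live))
    return peak

print(max_live(program))   # 3
```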
 Question 19
A computer system has an L1 cache, an L2 cache, and a main memory unit connected as shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words. The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1 cache, L2 cache and main memory unit respectively. When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache to L1 cache. What is the time taken for this transfer?
 A 2 nanoseconds B 20 nanoseconds C 22 nanoseconds D 88 nanoseconds
GATE CS 2010    Computer Organization and Architecture

Question 19 Explanation:
Accessing a block in L2 cache requires 20 nanoseconds, and placing it in L1 cache requires 2 nanoseconds. The block size in L1 cache is 4 words and there are 16 words in total, so the total time is 4*(20+2) = 88 nanoseconds.
 Question 20
Consider the data from above question. When there is a miss in both L1 cache and L2 cache, first a block is transferred from main memory to L2 cache, and then a block is transferred from L2 cache to L1 cache. What is the total time taken for these transfers?
 A 222 nanoseconds B 888 nanoseconds C 902 nanoseconds D 968 nanoseconds
GATE CS 2010    Computer Organization and Architecture

Question 20 Explanation:
The access time of main memory is 200 nanoseconds, so a transfer from main memory to L2 cache takes 200+20 = 220 nanoseconds. Similarly, a transfer from L2 cache to L1 cache takes 20+2 = 22 nanoseconds. So the total time is 4*(220 + 22) = 968 nanoseconds.
 Question 21
How many 32K x 1 RAM chips are needed to provide a memory capacity of 256K-bytes?
 A 8 B 32 C 64 D 128
GATE-CS-2009    Computer Organization and Architecture

Question 21 Explanation:
We need 256 Kbytes, i.e., 256 x 1024 x 8 bits. We have RAM chips of capacity 32 Kbits = 32 x 1024 bits. (256 * 1024 * 8)/(32 * 1024) = 64
 Question 22
Consider a 4 stage pipeline processor.   The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:
      S1   S2   S3   S4
I1    2    1    1    1
I2    1    3    2    2
I3    1    1    1    3
I4    1    2    2    2
What is the number of cycles needed to execute the following loop? For (i=1 to 2) {I1; I2; I3; I4;}
 A 16 B 23 C 28 D 30
GATE-CS-2009    Computer Organization and Architecture

Question 22 Explanation:
This question differs from other pipeline questions in the number of cycles taken by each instruction in each stage: an instruction may take a different number of cycles in different stages, and two instructions may take a different number of cycles in the same stage.

Therefore, we have to consider two things: 1) eligibility and 2) availability. An instruction i must be eligible to be given to stage j, and stage j must be available (free) to process instruction i.
An instruction i is eligible for stage j if and only if it has completed stage j-1.
Stage j is available for instruction i if and only if it has completed instruction i-1.

By following these two criteria we can determine the total number of cycles taken by these instructions in a loop of 2 iterations.
Note: an instruction i is eligible for processing in iteration 2 if and only if it has completed its processing in iteration 1.
Applying these rules to the eight instructions of the two iterations gives 23 cycles, which is option (B).
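The eligibility and availability rules translate directly into a small Python sketch. The names cycles and loop_cycles are made up, and the model assumes an instruction can wait between stages without blocking the stage it has just left.

```python
# Cycles needed by each instruction in stages S1..S4, from the question
cycles = {
    "I1": [2, 1, 1, 1],
    "I2": [1, 3, 2, 2],
    "I3": [1, 1, 1, 3],
    "I4": [1, 2, 2, 2],
}

def loop_cycles(cycles, order, iterations):
    """Cycle in which the last instruction leaves the last stage."""
    n_stages = len(next(iter(cycles.values())))
    stage_free = [0] * n_stages        # cycle in which each stage finished its previous instruction
    for name in order * iterations:
        done = 0                       # cycle in which this instruction finished its previous stage
        for s, t in enumerate(cycles[name]):
            # start when the instruction is eligible AND the stage is available
            done = max(done, stage_free[s]) + t
            stage_free[s] = done
    return stage_free[-1]

print(loop_cycles(cycles, ["I1", "I2", "I3", "I4"], iterations=2))   # 23
```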
 Question 23
Consider a 4-way set associative cache (initially empty) with total 16 cache blocks. The main memory consists of 256 blocks and the request for memory blocks is in the following order: 0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155. Which one of the following memory block will NOT be in cache if LRU replacement policy is used?
 A 3 B 8 C 129 D 216
GATE-CS-2009    Computer Organization and Architecture

Question 23 Explanation:
The cache is 4-way set associative, so the 16 blocks are divided into 4 sets of 4 blocks each. We apply the (block number mod 4) function to decide the set.

Request   Set (mod 4)   Result
0         0             miss
255       3             miss
1         1             miss
4         0             miss
3         3             miss
8         0             miss
133       1             miss
159       3             miss
216       0             miss
129       1             miss
63        3             miss
8         0             hit
48        0             miss, replaces 0
32        0             miss, replaces 4
73        1             miss
92        0             miss, replaces 216
155       3             miss, replaces 255

Set   Contents before any replacement   Final contents
0     0, 4, 8, 216                      48, 32, 8, 92
1     1, 133, 129, 73                   1, 133, 129, 73
2     (empty)                           (empty)
3     255, 3, 159, 63                   155, 3, 159, 63

After the LRU replacements, 216 is not present in the cache. So (D) is the correct option.
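The replacement sequence can be verified with a short LRU simulation in Python. This is a sketch; simulate_lru is a made-up name, and each set is kept in an OrderedDict ordered from least to most recently used.

```python
from collections import OrderedDict

def simulate_lru(requests, num_sets, ways):
    """Return the final contents of each set, least recently used first."""
    sets = [OrderedDict() for _ in range(num_sets)]
    for block in requests:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)       # hit: mark as most recently used
        else:
            if len(s) == ways:
                s.popitem(last=False)  # miss in a full set: evict the LRU block
            s[block] = True
    return [list(s) for s in sets]

requests = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
final = simulate_lru(requests, num_sets=4, ways=4)
print(final)
print(any(216 in s for s in final))    # False
```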
 Question 24
Which of the following is/are true of the auto-increment addressing mode?
I.  It is useful in creating self-relocating code.
II. If it is included in an Instruction Set Architecture,
then an additional ALU is required for effective address
calculation.
III.The amount of increment depends on the size of the data
item accessed.
 A I only B II only C III Only D II and III only
Computer Organization and Architecture    GATE CS 2008

Question 24 Explanation:
In auto-increment addressing mode, the address of the next data item is generated automatically, and the amount of increment depends on the size of the data item accessed, so statement III is true. Auto-increment mode is of no particular use for self-relocating code, so statement I is incorrect. No additional ALU is required for the increment, so statement II is also incorrect. So option (C) is the correct option.
 Question 25
Which of the following must be true for the RFE (Return from Exception) instruction on a general purpose processor?
I.   It must be a trap instruction
II.  It must be a privileged instruction
III. An exception cannot be allowed to occur during
execution of an RFE instruction 
 A I only B II only C I and II only D I, II and III only
Computer Organization and Architecture    GATE CS 2008

Question 25 Explanation:
RFE (Return From Exception) is a privileged trap instruction that is executed when returning from an exception, and an exception is not allowed to occur while it executes. Hence all three statements hold and option (D) is correct.

In computer architecture for a general purpose processor, an exception can be defined as an abrupt transfer of control to the operating system. Exceptions are broadly classified into 3 main categories:
a. Interrupt: mainly caused by an I/O device.
b. Trap: caused by the program making a syscall.
c. Fault: accidentally caused by the program under execution (such as a divide by zero or a null pointer exception).

The processor's instruction fetch unit polls for interrupts. If it finds something unusual happening in the machine operation, it inserts an interrupt pseudo-instruction into the pipeline in place of the normal instruction. Going through the pipeline, it then starts handling the interrupt. The operating system explicitly makes the transition from kernel mode to user mode, generally at the end of an interrupt handler or kernel call, by using the privileged RFE (Return From Exception) instruction. This solution is contributed by Namita Singh
 Question 26
For inclusion to hold between two cache levels L1 and L2 in a multi-level cache hierarchy, which of the following are necessary?
I. L1 must be a write-through cache
II. L2 must be a write-through cache
III. The associativity of L2 must be greater than that of L1
IV. The L2 cache must be at least as large as the L1 cache 
 A IV only B I and IV only C I, III and IV only D I, II, III and IV
Computer Organization and Architecture    GATE CS 2008

Question 26 Explanation:
Answer is (B), i.e., (I) and (IV) are true. Inclusion says the L2 cache should be a superset of the L1 cache. If "Write Through update" is not used (and "Write Back update" is used) at the L1 cache, then the modified data in the L1 cache will not be present in the L2 cache for some time, until the block in the L1 cache is replaced by some other block. Hence "Write Through update" should be used. Associativity doesn't matter. The L2 cache must be at least as large as the L1 cache, since all the words in L1 are also in L2.
 Question 27
Which of the following are NOT true in a pipelined processor?
I.  Bypassing can handle all RAW hazards.
II. Register renaming can eliminate all register
carried WAR hazards.
III. Control hazard penalties can be eliminated by
dynamic branch prediction.
 A I and II only B I and III only C II and III only D I, II and III
Computer Organization and Architecture    GATE CS 2008

Question 27 Explanation:
I - False. Bypassing can't handle all RAW hazards. Consider an instruction that depends on the result of a LOAD instruction: LOAD obtains the register value only at the Memory Access (MA) stage, so the data is not available in time for the Execute stage of the dependent instruction.
II - True. Register renaming can eliminate all WAR hazards.
III - False. Dynamic branch prediction can reduce control hazard penalties, but it cannot eliminate them completely.
So I and III are not true, and option (B) is correct.
 Question 28
The use of multiple register windows with overlap causes a reduction in the number of memory accesses for
I. Function locals and parameters
II. Register saves and restores
III. Instruction fetches   
 A I only B II only C III only D I, II and III
Computer Organization and Architecture    GATE CS 2008

Question 28 Explanation:
I is true as by using multiple register windows, we eliminate the need to access the variable values again and again from the memory. Rather, we store them in the registers.

II is false as register saves and restores would still be required for each and every variable.

III is also false as instruction fetch is not affected by memory access using multiple register windows.

So, only I is true. Hence, A is the correct option.

 Question 29
In an instruction execution pipeline, the earliest that the data TLB (Translation Lookaside Buffer) can be accessed is
 A before effective address calculation has started B during effective address calculation C after effective address calculation has completed D after data cache lookup has completed
Computer Organization and Architecture    GATE CS 2008

Question 29 Explanation:
When the effective address is calculated, the TLB is accessed first to obtain the frame number. The logical address generated by the CPU is split into two parts: page number and page offset. For faster access, some page table entries are kept in a small hardware TLB whose access time is the same as that of cache memory. So when the page number is mapped to the corresponding frame number, it is first looked up in the TLB and then, in case of a TLB miss, in the page table. The TLB is thus accessed during effective address calculation. So (B) is the correct option.
 Question 30
Consider a machine with a 2-way set associative data cache of size 64 Kbytes and block size 16 bytes. The cache is managed using 32 bit virtual addresses and the page size is 4 Kbytes. A program to be run on this machine begins as follows:
double ARR[1024][1024];
int i, j;

// Initialize array ARR to 0.0
for(i = 0; i < 1024; i++)
    for(j = 0; j < 1024; j++)
        ARR[i][j] = 0.0;

The size of double is 8 Bytes. Array ARR is located in memory starting at the beginning of virtual page 0xFF000 and stored in row major order. The cache is initially empty and no pre-fetching is done. The only data memory references made by the program are those to array ARR. The total size of the tags in the cache directory is
 A 32 Kbits B 34 Kbits C 64 Kbits D 68 Kbits
Computer Organization and Architecture    GATE CS 2008

Question 30 Explanation:
Virtual Address = 32 bits
Cache address is of the form: TAG | SET | BLOCK
For BLOCK of 16 bytes, we need 4 bits.
Total number of sets (each set containing 2 blocks) = 64 KB / (2 * 16) B = 2^11
So, Number of SET bits = 11
Number of TAG bits = 32 - (11 + 4) = 17
Thus, cache address = 17 | 11 | 4 (TAG | SET | BLOCK)
Tag memory size = Number of tag bits * Total number of blocks = 17 * 2 * 2^11 (Total number of blocks = 2 * Total number of sets) = 68 Kbits
Thus, D is the correct choice.
 Question 31
The cache hit ratio for this initialization loop is
 A 0% B 25% C 50% D 75%
Computer Organization and Architecture    GATE CS 2008

Question 31 Explanation:
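Each 16-byte block holds two 8-byte doubles, and the array is traversed in row-major order, so the first access to a block misses and the second access hits. The 8 MB array is much larger than the 64 KB cache and no element is visited twice, so there are no other hits. Cache hit ratio = number of hits / total accesses = 1/2 = 0.5 = 50%. So (C) is the correct option.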
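The 50% figure can be checked with a small simulation. The sketch below is only an illustration: it assumes LRU replacement (the question does not name a policy), and the function name and the `rows` parameter are made up for this example.

```python
from collections import OrderedDict

def init_loop_hit_ratio(rows=1024, cols=1024, elem=8,
                        cache_bytes=64 * 1024, ways=2, block=16,
                        base=0xFF000 << 12):
    """Replay ARR[i][j] = 0.0 in row-major order through a set-associative cache."""
    num_sets = cache_bytes // (ways * block)
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set tags in LRU order
    hits = accesses = 0
    for i in range(rows):
        for j in range(cols):
            addr = base + (i * cols + j) * elem
            blk = addr // block
            s, tag = blk % num_sets, blk // num_sets
            accesses += 1
            if tag in sets[s]:
                hits += 1
                sets[s].move_to_end(tag)         # mark as most recently used
            else:
                if len(sets[s]) == ways:
                    sets[s].popitem(last=False)  # evict least recently used (assumed policy)
                sets[s][tag] = True
    return hits / accesses

# A few rows are enough, because the miss/hit pattern repeats for every block.
print(init_loop_hit_ratio(rows=64))  # 0.5
```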
 Question 32
Consider a 4-way set associative cache consisting of 128 lines with a line size of 64 words. The CPU generates a 20-bit address of a word in main memory. The number of bits in the TAG, LINE and WORD fields are respectively:
 A 9,6,5 B 7, 7, 6 C 7, 5, 8 D 9, 5, 6
Computer Organization and Architecture    GATE-CS-2007

Question 32 Explanation:
Here the number of sets = 128/4 = 32 (as the cache is 4-way set associative), so the LINE field needs 5 bits.

Each line holds 64 words, so we need 6 bits to identify the word.

So the LINE field is 5 bits and the WORD field is 6 bits,

and the TAG = 20 - (5 + 6) = 9 bits.

So the answer is 9, 5, 6, which is option (D).
 Question 33
Consider a pipelined processor with the following four stages:
IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?
ADD R2, R1, R0       R2 <- R0 + R1
MUL R4, R3, R2       R4 <- R3 * R2
SUB R6, R5, R4       R6 <- R5 - R4

 A 7 B 8 C 10 D 14
Computer Organization and Architecture    GATE-CS-2007

Question 33 Explanation:
Explanation: The instruction cycle phases run in the order IF → ID → EX → WB, and we have 3 instructions. In the table below, "-" represents a wait in the pipeline due to a result dependency.

Cycle             1    2    3    4    5    6    7    8
ADD R2, R1, R0    IF   ID   EX   WB
MUL R4, R3, R2         IF   ID   EX   EX   EX   WB
SUB R6, R5, R4              IF   ID   -    -    EX   WB

The table shows the phase each instruction occupies in every cycle. Number of cycles required = 8, so (B) is the correct option.
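The timing table can be reproduced with a short scheduling sketch. It assumes a simplified in-order model (one instruction per stage, results forwarded from the end of EX to the start of a dependent EX); the function name and the tuple layout are illustrative, not part of the question.

```python
def schedule(instrs):
    """instrs: list of (dest, sources, ex_latency). Returns the cycle of the last WB."""
    ex_done = {}   # register -> cycle in which its value leaves EX
    prev_if = prev_ex_start = prev_ex_end = prev_wb = 0
    for dest, sources, latency in instrs:
        fetch = prev_if + 1
        decode = max(fetch + 1, prev_ex_start)        # ID frees up when the previous instruction enters EX
        ready = max((ex_done.get(r, 0) for r in sources), default=0)
        ex_start = max(decode + 1, prev_ex_end + 1, ready + 1)
        ex_end = ex_start + latency - 1
        wb = max(ex_end + 1, prev_wb + 1)
        ex_done[dest] = ex_end
        prev_if, prev_ex_start, prev_ex_end, prev_wb = fetch, ex_start, ex_end, wb
    return prev_wb

program = [
    ("R2", ["R1", "R0"], 1),   # ADD R2, R1, R0
    ("R4", ["R3", "R2"], 3),   # MUL R4, R3, R2
    ("R6", ["R5", "R4"], 1),   # SUB R6, R5, R4
]
print(schedule(program))  # 8
```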
 Question 34
Consider the following program segment. Here R1, R2 and R3 are the general purpose registers. Assume that the content of memory location 3000 is 10 and the content of the register R3 is 2000. The content of each of the memory locations from 2000 to 2010 is 100. The program is loaded from the memory location 1000. All the numbers are in decimal. Assume that the memory is word addressable. The number of memory references for accessing the data in executing the program completely is:
 A 10 B 11 C 20 D 21
Computer Organization and Architecture    GATE-CS-2007

Question 34 Explanation:
Explanation: The first memory reference is R1 ← M[3000]. The loop then runs 10 times, because the content of memory location 3000 is 10. Each iteration makes two data memory references, R2 ← M[R3] and M[R3] ← R2, giving 10 * 2 = 20. Total = 20 + 1 = 21. So (D) is the correct option.
 Question 35
Consider the data given in above question. Assume that the memory is word addressable. After the execution of this program, the content of memory location 2010 is:
 A 100 B 101 C 102 D 110
Computer Organization and Architecture    GATE-CS-2007

Question 35 Explanation:
Explanation: The loop runs 10 times with R3 stepping from 2000 to 2009, and DEC R1 decrements the register value by 1 on each pass, so the program stores 110, 109, 108, ..., 101 at locations 2000 to 2009. Location 2010 is never written and keeps its initial value of 100. So (A) is the correct option.
 Question 36
Consider the data given in above questions. Assume that the memory is byte addressable and the word size is 32 bits. If an interrupt occurs during the execution of the instruction “INC R3”, what return address will be pushed on to the stack?
 A 1005 B 1020 C 1024 D 1040
Computer Organization and Architecture    GATE-CS-2007

Question 36 Explanation:
Explanation: Since the memory is byte addressable and the word size is 32 bits, each instruction word occupies 4 bytes, that is, 4 addresses.

Instruction      Words   Location
MOV R1,(3000)    2       1000-1007
MOV R2,(R3)      1       1008-1011
ADD R2,R1        1       1012-1015
MOV (R3),R2      1       1016-1019
INC R3           1       1020-1023
DEC R1           1       1024-1027

The interrupt occurs during execution of the instruction INC R3. The CPU completes this instruction and pushes the address of the next one, 1024, onto the stack, so that the program can resume from the next instruction after the interrupt is serviced. So (C) is the correct option.
 Question 37
Consider a machine with a byte addressable main memory of 2^16 bytes. Assume that a direct mapped data cache consisting of 32 lines of 64 bytes each is used in the system. A 50 × 50 two-dimensional array of bytes is stored in the main memory starting from memory location 1100H. Assume that the data cache is initially empty. The complete array is accessed twice. Assume that the contents of the data cache do not change in between the two accesses. How many data cache misses will occur in total?
 A 40 B 50 C 56 D 59
Computer Organization and Architecture    GATE-CS-2007

Question 37 Explanation:
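Size of main memory = 2^16 bytes. Size of cache = 32 * 64 bytes = 2^11 bytes, with each line holding one 64-byte block.
The array occupies 2500 bytes, from 1100H (4352) to 1AC3H (6851), that is, memory blocks 4352/64 = 68 to floor(6851/64) = 107, a total of 40 blocks.
Block b maps to line (b mod 32): blocks 68 to 95 go to lines 4 to 31, blocks 96 to 99 go to lines 0 to 3, and blocks 100 to 107 go to lines 4 to 11, where they evict blocks 68 to 75.
First access: the cache is initially empty, so all 40 blocks miss.
Second access: blocks 68 to 75 miss (lines 4 to 11 now hold blocks 100 to 107), blocks 76 to 99 hit, and blocks 100 to 107 miss again, giving 16 misses.
Total data cache misses = 40 + 16 = 56. So (C) is the correct option.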
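A direct simulation is one way to double-check the miss count. The following is a minimal sketch under the question's parameters; the function name and return layout are illustrative assumptions.

```python
def simulate_direct_mapped(start=0x1100, size=50 * 50, passes=2,
                           lines=32, block=64):
    """Count misses and record which lines lose a valid block in each pass."""
    cache = [None] * lines      # block number currently held by each line
    misses = 0
    replaced = []               # per pass: sorted lines whose valid block was evicted
    for _ in range(passes):
        evicted = set()
        for addr in range(start, start + size):
            blk = addr // block
            line = blk % lines
            if cache[line] != blk:
                misses += 1
                if cache[line] is not None:
                    evicted.add(line)
                cache[line] = blk
        replaced.append(sorted(evicted))
    return misses, replaced

misses, replaced = simulate_direct_mapped()
print(misses)        # 56
print(replaced[1])   # [4, 5, 6, 7, 8, 9, 10, 11]
```

The second printed list also gives the lines replaced during the second access, which is what the follow-up question asks for.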
 Question 38
Consider the data given in above question. Which of the following lines of the data cache will be replaced by new blocks in accessing the array for the second time?
 A line 4 to line 11 B line 4 to line 12 C line 0 to line 7 D line 0 to line 8
Computer Organization and Architecture    GATE-CS-2007

Question 38 Explanation:
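Size of main memory = 2^16 bytes, so addresses are 16 bits. Number of lines = 32 = 2^5, so the line field is 5 bits. Size of each line = 64 = 2^6 bytes, so the offset is 6 bits. The cache is direct mapped, so the remaining 5 bits form the tag.
Starting memory location: 1100H => 0001 0001 0000 0000
Tag: 00010, Line: 00100 (4), Offset: 00 0000
The array therefore starts in line 4. The 2500-byte array spans 40 blocks: they fill lines 4 to 31, then lines 0 to 3, and the last 8 blocks wrap around onto lines 4 to 11. These are the lines whose contents are replaced when the array is accessed the second time, so the answer is line 4 to line 11, option (A).
This solution is contributed by Mohit Gupta.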
 Question 39
A CPU has 24-bit instructions. A program starts at address 300 (in decimal). Which one of the following is a legal program counter (all values in decimal)?
 A 400 B 500 C 600 D 700
Computer Organization and Architecture    GATE-CS-2006

Question 39 Explanation:
Here, size of instruction  =  24/8 = 3 bytes.

Program Counter can shift 3 bytes at a time to jump to next instruction.

The start address 300 is a multiple of 3, so a legal program counter value must also be divisible by 3. Among the options, only 600 satisfies this.
 Question 40
A machine has a 32-bit architecture, with 1-word long instructions. It has 64 registers, each of which is 32 bits long. It needs to support 45 instructions, which have an immediate operand in addition to two register operands. Assuming that the immediate operand is an unsigned integer, the maximum value of the immediate operand is ____________.
 A 16383
Computer Organization and Architecture    GATE-CS-2014-(Set-1)

Question 40 Explanation:
1 Word = 32 bits

Each instruction has 32 bits

To support 45 instructions, opcode must contain 6-bits

Register operand1 requires 6 bits, since the total registers
are 64.

Register operand 2 also requires 6 bits.

14-bits are left over for immediate Operand Using 14-bits,
we can give maximum 16383,

Since 2^14=16384(from 0 to 16383) 
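The field-width arithmetic above can be written out as a quick check. This is a sketch with an illustrative function name; it assumes the opcode and register fields use the minimum number of bits.

```python
def max_unsigned_immediate(instr_bits=32, opcodes=45, registers=64, reg_operands=2):
    opcode_bits = (opcodes - 1).bit_length()     # 45 opcodes need 6 bits
    reg_bits = (registers - 1).bit_length()      # 64 registers need 6 bits
    imm_bits = instr_bits - opcode_bits - reg_operands * reg_bits
    return imm_bits, (1 << imm_bits) - 1         # largest unsigned value in imm_bits

print(max_unsigned_immediate())  # (14, 16383)
```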
 Question 41
Consider a 6-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this 6-stage pipeline, the speedup achieved with respect to non-pipelined execution if 25% of the instructions incur 2 pipeline stall cycles is
 A 4 B 8 C 6 D 7
Computer Organization and Architecture    GATE-CS-2014-(Set-1)

Question 41 Explanation:
This was a numerical answer type question in the exam; the answer is 4.

With 6 stages, non-pipelined execution takes 6 cycles per instruction.

In the pipeline, 25% of the instructions incur 2 stall cycles each.

So pipelined time per instruction = (1 + (25/100)*2) = 1.5 cycles

Speedup = Non-pipelined time / Pipelined time = 6/1.5 = 4
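The same calculation as a small function, in case other stall fractions need to be tried. The function name is illustrative, and the model assumes an ideal pipelined CPI of 1 plus the average stall cycles.

```python
def pipeline_speedup(stages, stall_fraction, stall_cycles):
    non_pipelined_cpi = stages                          # each instruction passes through every stage in turn
    pipelined_cpi = 1 + stall_fraction * stall_cycles   # ideal CPI of 1 plus average stalls
    return non_pipelined_cpi / pipelined_cpi

print(pipeline_speedup(6, 0.25, 2))  # 4.0
```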
 Question 42
A 4-way set-associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is _____
 A 5 B 15 C 20 D 25
Computer Organization and Architecture    GATE-CS-2014-(Set-2)

Question 42 Explanation:
In k-way set associative mapping, the cache memory is divided into sets, each of k blocks.
Size of cache memory = 16 KB
As it is 4-way set associative, K = 4
Block size B = 8 words, and the word length is 32 bits, so 1 word = 4 bytes
Size of physical address space = 4 GB
Number of blocks in cache memory (N) = size of cache memory / size of a block = (16*1024 bytes) / (8*4 bytes) = 512
Number of sets (S) = number of blocks in cache memory / number of blocks in a set = N/K = 512/4 = 128
Size of physical address space = 4 GB = 4*(2^30) bytes = 2^32 bytes
These physical addresses are divided equally among the sets. Hence, each set covers (2^32)/128 bytes = 2^25 bytes = 2^23 words = 2^20 blocks.
To identify one of these 2^20 blocks, each set needs a TAG of 20 bits. Hence option C.
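Equivalently, the tag width is what remains of the address after the set index and block offset are removed. The helper below is a sketch that assumes byte addressing and power-of-two sizes; its name and argument order are illustrative.

```python
def cache_fields(address_bits, cache_bytes, block_bytes, ways):
    """Return (tag, set index, block offset) widths in bits."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# 4 GB physical space -> 32 address bits; block = 8 words * 4 bytes
print(cache_fields(32, 16 * 1024, 8 * 4, 4))  # (20, 7, 5)
```

The same helper gives (17, 11, 4) for a 64 KB, 2-way cache with 16-byte blocks and 32-bit addresses.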
 Question 43
In designing a computer’s cache system, the cache block (or cache line) size is an important parameter. Which one of the following statements is correct in this context?
 A A smaller block size implies better spatial locality B A smaller block size implies a smaller cache tag and hence lower cache tag overhead C A smaller block size implies a larger cache tag and hence lower cache hit time D A smaller block size incurs a lower cache miss penalty
Computer Organization and Architecture    GATE-CS-2014-(Set-2)

Question 43 Explanation:
Block: The memory is divided into equal-size segments, each called a block. Data in the cache is retrieved in blocks. The idea is to exploit spatial locality (once a location is retrieved, it is highly probable that nearby locations will be retrieved in the near future).
TAG bits: Each cache block is given a set of TAG bits to identify which main memory block is present in that cache block.
Option A: If the block size is small, fewer nearby addresses are present in the block for future references by the CPU. Hence this is not better spatial locality.
Option B: If the block size is smaller, the cache holds more blocks, hence more cache tag bits are needed, not fewer.
Option C: There are more cache tag bits (because a smaller block size means more blocks), but more cache tag bits cannot lower the hit time (if anything, they increase it).
Option D: On a cache miss (the block needed by the CPU is not present in the cache), the block has to be brought in from the next lower level of the memory hierarchy (say, main memory). If the block size is smaller, it takes less time to place the block into the cache, hence a lower miss penalty. Hence option D.
 Question 44
If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?
 A Width of tag comparator B Width of set index decoder C Width of way selection multiplexor D Width of processor to main memory data bus
Computer Organization and Architecture    GATE-CS-2014-(Set-2)

Question 44 Explanation:
If associativity is doubled, keeping the capacity and block size constant, then the number of sets gets halved. So, width of set index decoder can surely decrease - (B) is false. Width of way-selection multiplexer must be increased as we have to double the ways to choose from- (C) is false As the number of sets gets decreased, the number of possible cache block entries that a set maps to gets increased. So, we need more tag bits to identify the correct entry. So, (A) is also false. (D) is the correct answer- main memory data bus has nothing to do with cache associativity- this can be answered without even looking at other options.
 Question 45
Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is ____________
 A 1000 B 10000 C 100000 D 100
Computer Organization and Architecture    GATE-CS-2014-(Set-2)

Question 45 Explanation:
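Initiating one request occupies the bus for 100 ns. Since the operations of different memory modules may overlap in time, another request can be initiated on a different module before the previous module completes its remaining 500 ns. Each module is busy for 600 ns, while cycling through all 8 modules takes 800 ns of bus time, so a module is always free again before the bus returns to it and the bus is the only bottleneck. Thus the total number of requests that can be initiated in 1 ms is 1000000 ns / 100 ns = 10000.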
 Question 46
Consider the following processors (ns stands for nanoseconds). Assume that the pipeline registers have zero latency.
P1: Four-stage pipeline with stage
latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage
latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
P3: Five-stage pipeline with stage
latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
P4: Five-stage pipeline with stage
latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns. 
Which processor has the highest peak clock frequency?
 A P1 B P2 C P3 D P4
Computer Organization and Architecture    GATE-CS-2014-(Set-3)

Question 46 Explanation:
Peak clock frequency = 1 / (maximum stage latency)

The maximum stage latency is smallest in P3:

P1: f = 1/2 = 0.5 GHz

P2: f = 1/1.5 = 0.67 GHz

P3: f = 1/1 = 1 GHz

P4: f = 1/1.1 = 0.91 GHz

Thus P3 is the right answer.


This explanation has been contributed by Abhishek Kumar.
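As a quick check of the comparison, the sketch below computes the peak frequency of each processor from its slowest stage. The dictionary and function names are illustrative only.

```python
def peak_frequency_ghz(stage_latencies_ns):
    # The clock period cannot be shorter than the slowest stage; 1/ns = GHz.
    return 1.0 / max(stage_latencies_ns)

processors = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1.5],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}
best = max(processors, key=lambda name: peak_frequency_ghz(processors[name]))
print(best)  # P3
```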
 Question 47
An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency 1 ns. The new design has a total of eight pipeline stages. A program has 20% branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are P and Q nanoseconds, respectively. The value of P/Q is __________.
 A 1.5 B 1.4 C 1.8 D 2.5
Computer Organization and Architecture    GATE-CS-2014-(Set-3)

Question 47 Explanation:
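All instructions other than branches take 1 CPI on average.

Old design: the clock period is the largest stage latency, 2.2 ns. A branch resolves at the end of EX (stage 3), so fetch stalls for 2 cycles and a branch effectively takes 3 clocks. So 80% of instructions take 1 clock and 20% take 3 clocks:

P = (0.8*1 + 0.2*3) * 2.2 = 3.08

New design: the clock period is 1 ns. A branch resolves at the end of EX2 (stage 6), so a branch effectively takes 6 clocks:

Q = (0.8*1 + 0.2*6) * 1 = 2

P/Q = 3.08/2 = 1.54, which is closest to option (A).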
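The ratio can be recomputed from the stage latencies directly. This sketch assumes the clock period equals the slowest stage and that a branch costs as many cycles as the 1-indexed stage in which it resolves; the function name is illustrative.

```python
def time_per_instruction(stage_latencies_ns, branch_fraction, resolve_stage):
    cycle = max(stage_latencies_ns)              # clock period set by the slowest stage
    branch_cpi = resolve_stage                   # fetch stalls until the end of this stage
    cpi = (1 - branch_fraction) * 1 + branch_fraction * branch_cpi
    return cpi * cycle

old = time_per_instruction([1, 2.2, 2, 1, 0.75], 0.2, 3)
new = time_per_instruction([1, 2.2 / 3, 2.2 / 3, 2.2 / 3, 1, 1, 1, 0.75], 0.2, 6)
print(round(old / new, 2))  # 1.54
```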
 Question 48
A CPU has a cache with block size 64 bytes. The main memory has k banks, each bank being c bytes wide. Consecutive c-byte chunks are mapped on consecutive banks with wrap-around. All the k banks can be accessed in parallel, but two accesses to the same bank must be serialized. A cache block access may involve multiple iterations of parallel bank accesses depending on the amount of data obtained by accessing all the k banks in parallel. Each iteration requires decoding the bank numbers to be accessed in parallel and this takes k/2 ns. The latency of one bank access is 80 ns. If c = 2 and k = 24, the latency of retrieving a cache block starting at address zero from main memory is:
 A 92 ns B 104 ns C 172 ns D 184 ns
Computer Organization and Architecture    GATE-CS-2006

Question 48 Explanation:
Explanation: Size of cache block = 64 bytes. Number of main memory banks k = 24. Width of each bank c = 2 bytes. So one iteration of parallel access retrieves 2*24 = 48 bytes, and retrieving 64 bytes requires 2 iterations. Time for one iteration T = decoding time + bank latency = (k/2) + 80 = 12 + 80 = 92 ns. For 2 iterations, total latency = 2*92 = 184 ns (each iteration must select the banks again, and the bank decoding time k/2 is independent of the number of banks actually accessed in that iteration). So (D) is the correct option. This solution is contributed by Nitika Bansal.
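The iteration count and latency can be expressed as a short function. This is a sketch with illustrative names; it assumes every iteration pays the full decode time plus one bank access.

```python
from math import ceil

def block_fetch_latency_ns(block_bytes=64, banks=24, bank_width=2, bank_latency=80):
    bytes_per_iteration = banks * bank_width            # all banks read in parallel
    iterations = ceil(block_bytes / bytes_per_iteration)
    decode = banks / 2                                   # k/2 ns per iteration
    return iterations * (decode + bank_latency)

print(block_fetch_latency_ns())  # 184.0
```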
 Question 49
A CPU has a five-stage pipeline and runs at 1 GHz frequency. Instruction fetch happens in the first stage of the pipeline. A conditional branch instruction computes the target address and evaluates the condition in the third stage of the pipeline. The processor stops fetching new instructions following a conditional branch until the branch outcome is known. A program executes 10^9 instructions out of which 20% are conditional branches. If each instruction takes one cycle to complete on average, the total execution time of the program is:
 A 1.0 second B 1.2 seconds C 1.4 seconds D 1.6 seconds
Computer Organization and Architecture    GATE-CS-2006

Question 49 Explanation:
The branch outcome is known in the 3rd stage of the pipeline, so each conditional branch causes 2 stall cycles.
Total number of instructions = 10^9
20% of the 10^9 instructions are conditional branches.
Therefore, cycle penalty = 0.2 * 2 * 10^9 = 4 * 10^8 cycles
Clock speed is 1 GHz (10^9 cycles per second) and each instruction takes 1 cycle on average.
Total execution time = (10^9 / 10^9) + (4 * 10^8 / 10^9) = 1.4 seconds

Thus, total execution time of the program is 1.4 seconds.

 Question 50
Consider two cache organizations: The first one is 32 KB 2-way set associative with 32-byte block size. The second one is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of 0.6 ns while a kbit comparator has a latency of k/10 ns. The hit latency of the set associative organization is h1 while that of the direct mapped one is h2. The value of h1 is:
 A 2.4 ns B 2.3 ns C 1.8 ns D 1.7 ns
Computer Organization and Architecture    GATE-CS-2006

Question 50 Explanation:
Cache size = 32 KB = 32 * 2^10 bytes. Cache block size = 32 bytes. Number of blocks per set = 2
Number of sets = cache size / (blocks per set * block size) = 32 * 2^10 / (2 * 32) = 512 = 2^9
Therefore, number of index bits = 9
Since the cache block size is 32 bytes, i.e. 2^5 bytes, number of offset bits = 5
So, number of tag bits = 32 - 9 - 5 = 18
Hit latency (h1) = multiplexer latency + comparator latency = 0.6 + (18 / 10) ns = 2.4 ns

Thus, option (A) is correct.

 Question 51
Consider two cache organizations: The first one is 32 KB 2-way set associative with 32-byte block size. The second one is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of 0.6 ns while a kbit comparator has a latency of k/10 ns. The hit latency of the set associative organization is h1 while that of the direct mapped one is h2. The value of h2 is:
 A 2.4 ns B 2.3 ns C 1.8 ns D 1.7 ns
Computer Organization and Architecture    GATE-CS-2006

Question 51 Explanation:
Cache size = 32 KB = 32 * 2^10 bytes. Cache block size = 32 bytes. Number of blocks per set = 1 (direct mapped)
Number of sets = cache size / (blocks per set * block size) = 32 * 2^10 / (1 * 32) = 1024 = 2^10
Therefore, number of index bits = 10
Since the cache block size is 32 bytes, i.e. 2^5 bytes, number of offset bits = 5
So, number of tag bits = 32 - 10 - 5 = 17
Hit latency (h2) = comparator latency = (17 / 10) ns = 1.7 ns (no multiplexer is needed in a direct mapped cache)

Thus, option (D) is correct.
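Both hit latencies follow from the same recipe, sketched below. The function name is illustrative, and the model assumes one 2-to-1 multiplexer delay is added only when there is more than one way, which matches the 2-way case in the question but is not a general formula for higher associativity.

```python
def hit_latency_ns(address_bits, cache_bytes, block_bytes, ways, mux_ns=0.6):
    num_sets = cache_bytes // (block_bytes * ways)
    index_bits = (num_sets - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    tag_bits = address_bits - index_bits - offset_bits
    latency = tag_bits / 10              # k-bit comparator takes k/10 ns
    if ways > 1:
        latency += mux_ns                # select the matching way (2-way assumption)
    return tag_bits, latency

print(hit_latency_ns(32, 32 * 1024, 32, 2))  # 18 tag bits, about 2.4 ns
print(hit_latency_ns(32, 32 * 1024, 32, 1))  # 17 tag bits, 1.7 ns
```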

 Question 52
A CPU has a 32 KB direct mapped cache with 128-byte block size. Suppose A is a two-dimensional array of size 512×512 with elements that occupy 8 bytes each. Consider the following two C code segments, P1 and P2.
P1:

for (i=0; i<512; i++) {
    for (j=0; j<512; j++) {
        x += A[i][j];
    }
}
P2:

for (i=0; i<512; i++) {
    for (j=0; j<512; j++) {
        x += A[j][i];
    }
}
P1 and P2 are executed independently with the same initial state, namely, the array A is not in the cache and i, j, x are in registers. Let the number of cache misses experienced by P1 be M1 and that for P2 be M2 . The value of M1 is:
 A 0 B 2048 C 16384 D 262144
Computer Organization and Architecture    GATE-CS-2006

Question 52 Explanation:
[P1] accesses the elements of A in row-major order and [P2] accesses them in column-major order.
Number of cache blocks = CacheSize / BlockSize = 32 KB / 128 bytes = 256
Number of array elements in each block = BlockSize / ElementSize = 128 bytes / 8 bytes = 16
In [P1], consecutive accesses fall in the same block, so only the first of every 16 accesses misses.
Total misses for [P1] = total number of elements / number of array elements in each block = 512 * 512 / 16 = 16384. So (C) is the correct option.
 Question 53
A CPU has a 32 KB direct mapped cache with 128-byte block size. Suppose A is a two-dimensional array of size 512×512 with elements that occupy 8 bytes each. Consider the following two C code segments, P1 and P2.
P1:

for (i=0; i<512; i++) {
    for (j=0; j<512; j++) {
        x += A[i][j];
    }
}
P2:

for (i=0; i<512; i++) {
    for (j=0; j<512; j++) {
        x += A[j][i];
    }
}
P1 and P2 are executed independently with the same initial state, namely, the array A is not in the cache and i, j, x are in registers. Let the number of cache misses experienced by P1 be M1 and that for P2 be M2 . The value of the ratio M1/M2 is:
 A 0 B 1/16 C 1/8 D 16
Computer Organization and Architecture    GATE-CS-2006

Question 53 Explanation:
[P1] accesses the elements of A in row-major order and [P2] accesses them in column-major order.
Number of cache blocks = CacheSize / BlockSize = 32 KB / 128 bytes = 256
Number of array elements in each block = BlockSize / ElementSize = 128 bytes / 8 bytes = 16
Total misses for [P1] = total number of elements / number of array elements in each block = 512 * 512 / 16 = 16384
Total misses for [P2] = total number of elements in the array (every access misses) = 512 * 512 = 262144. Each row occupies 512 * 8 = 4096 bytes = 32 blocks, so consecutive accesses down a column are 32 blocks apart and map to only 8 distinct cache lines; by the time a block is needed again it has already been evicted.
Ratio M1/M2 = 16384 / 262144 = 1/16. So (B) is the correct option.
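Both miss counts can be confirmed by replaying the two loop orders through a direct-mapped cache model. The sketch assumes A starts at a block-aligned address (taken as 0 here); the function name is illustrative.

```python
def count_misses(column_major, n=512, elem=8, cache_bytes=32 * 1024, block=128):
    lines = cache_bytes // block
    cache = [None] * lines                 # block number held by each cache line
    misses = 0
    for outer in range(n):
        for inner in range(n):
            # P1 reads A[outer][inner]; P2 reads A[inner][outer]
            row, col = (inner, outer) if column_major else (outer, inner)
            blk = (row * n + col) * elem // block   # assumes A begins at address 0
            line = blk % lines
            if cache[line] != blk:
                cache[line] = blk
                misses += 1
    return misses

m1, m2 = count_misses(False), count_misses(True)
print(m1, m2, m1 / m2)  # 16384 262144 0.0625
```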
 Question 54
Which one of the following is true for a CPU having a single interrupt request line and a single interrupt grant line?
 A Neither vectored interrupt nor multiple interrupting devices are possible. B Vectored interrupts are not possible but multiple interrupting devices are possible. C Vectored interrupts and multiple interrupting devices are both possible. D Vectored interrupt is possible but multiple interrupting devices are not possible.
Computer Organization and Architecture    GATE-CS-2005

Question 54 Explanation:
The CPU has a single interrupt request line and a single interrupt grant line. Multiple devices can raise requests to the CPU, which services only the highest-priority interrupt, so multiple interrupting devices are possible and options A and D are wrong. With a single interrupt line, however, vectored interrupts are not possible. So (B) is the correct option.
 Question 55
Consider a three word machine instruction
ADD A[R0], @ B
The first operand (destination) "A [R0]" uses indexed addressing mode with R0 as the index register. The second operand (source) "@ B" uses indirect addressing mode. A and B are memory addresses residing at the second and the third words, respectively. The first word of the instruction specifies the opcode, the index register designation and the source and destination addressing modes. During execution of ADD instruction, the two operands are added and stored in the destination (first operand). The number of memory cycles needed during the execution cycle of the instruction is
 A 3 B 4 C 5 D 6
Computer Organization and Architecture    GATE-CS-2005

Question 55 Explanation:
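In indexed addressing mode, the base address A is already part of the instruction, and reading the index from R0 needs no memory access because it is a register. So fetching the first operand takes only 1 memory cycle. Indirect addressing mode needs 2 memory cycles: one to read the operand address stored at B and one to read the operand itself. Storing the result back to the destination takes 1 more memory cycle. Total = 1 + 2 + 1 = 4, so (B) is the correct option.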
 Question 56
Match each of the high level language statements given on the left hand side with the most natural addressing mode from those listed on the right hand side.
 1 A[1] = B[J];      a Indirect addressing
 2 while [*A++];     b Indexed addressing
 3 int temp = *x;    c Autoincrement
 A (1, c), (2, b), (3, a) B (1, a), (2, c), (3, b) C (1, b), (2, c), (3, a) D (1, a), (2, b), (3, c)
Computer Organization and Architecture    GATE-CS-2005

Question 56 Explanation:
List 1                           List 2
1) A[1] = B[J];      b) Indexed addressing
Here indexing is used

2) while [*A++];     c) Autoincrement
The memory location is automatically incremented

3) int temp = *x;    a) Indirect addressing
Here temp is assigned the value of int type stored
at the address contained in x

So (C) is the correct option.

 Question 57
Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32 bit addresses. The number of bits needed for cache indexing and the number of tag bits are respectively
 A 10, 17 B 10, 22 C 15, 17 D 5, 17
Computer Organization and Architecture    GATE-CS-2005

Question 57 Explanation:
The cache is direct mapped. Size of cache = 32 KB = 2^5 * 2^10 bytes = 2^15 bytes, so 15 bits are required to address a byte within the cache. The CPU address is split into a tag and these 15 cache-addressing bits, so number of tag bits = 32 - 15 = 17. The 15 cache-addressing bits consist of the block index and the byte offset within a block. Each block has 32 bytes, so the offset needs 5 bits and the block index needs 15 - 5 = 10 bits. So the answer is 10, 17. Hence (A) is the correct option.
 Question 58
A 5 stage pipelined CPU has the following sequence of stages:
IF — Instruction fetch from instruction memory,
RD — Instruction decode and register read,
EX — Execute: ALU operation for data and address computation,
MA — Data memory access - for write access, the register read
at RD stage is used,
WB — Register write back.
Consider the following sequence of instructions:
I1 : L R0, loc1;        R0 <= M[loc1]
I2 : A R0, R0;          R0 <= R0 + R0
I3 : S R2, R0;          R2 <= R2 - R0
Let each stage take one clock cycle.
What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of I1 ?
 A 8 B 10 C 12 D 15
Computer Organization and Architecture    GATE-CS-2005

Question 58 Explanation:

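If we use operand forwarding from the memory stage, I2 waits one cycle for the loaded value of R0 and I3 follows directly behind it:

Cycle   1    2    3    4    5    6    7    8
I1      IF   RD   EX   MA   WB
I2           IF   RD   -    EX   MA   WB
I3                IF   -    RD   EX   MA   WB

This takes 8 clock cycles. If we don't use operand forwarding, the sequence takes 11 clock cycles.

Thus, clock cycles = 8 or 11. Since 11 is not among the options, clock cycles = 8.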

Thus, option (A) is correct.

 Question 59
Consider the following data path of a CPU. The ALU, the bus and all the registers in the data path are of identical size. All operations including incrementation of the PC and the GPRs are to be carried out in the ALU. Two clock cycles are needed for memory read operation - the first one for loading address in the MAR and the next one for loading data from the memory bus into the MDR. The instruction "add R0, R1" has the register transfer interpretation R0 <= R0 + R1. The minimum number of clock cycles needed for execution cycle of this instruction is
 A 2 B 3 C 4 D 5
Computer Organization and Architecture    GATE-CS-2005

Question 59 Explanation:
Minimum number of clock cycles (execution only) = 3:
1) load Y
2) input R1, add
3) output to R0
 Question 60
A 4-stage pipeline has the stage delays as 150, 120, 160 and 140 nanoseconds respectively. Registers that are used between the stages have a delay of 5 nanoseconds each. Assuming constant clocking rate, the total time taken to process 1000 data items on this pipeline will be
 A 120.4 microseconds B 160.5 microseconds C 165.5 microseconds D 590.0 microseconds
Computer Organization and Architecture    GATE-CS-2004

Question 60 Explanation:
The register delay between stages is 5 ns.
With a constant clocking rate, the clock period is set by the slowest stage plus the register delay = 160 + 5 = 165 ns.
The first data item needs 4 clock cycles to pass through the 4 stages = 4 * 165 = 660 ns.
Each of the remaining 999 data items completes one clock cycle after the previous one.

Total time = (4 + 999) * 165 ns = 165495 ns, which is approximately 165.5 microseconds. So (C) is the correct option.
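The usual formula for a k-stage pipeline processing n items under a fixed clock is (k + n - 1) clock periods. A minimal sketch, with an illustrative function name and the assumption that the clock period is the slowest stage plus one register delay:

```python
def pipeline_time_ns(stage_delays, register_delay, items):
    cycle = max(stage_delays) + register_delay        # constant clock period
    return (len(stage_delays) + items - 1) * cycle    # fill the pipe, then one item per cycle

total = pipeline_time_ns([150, 120, 160, 140], 5, 1000)
print(total / 1000)  # 165.495 (microseconds)
```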
 Question 61
For a pipelined CPU with a single ALU, consider the following situations
1. The j + 1-st instruction uses the result of the j-th instruction
as an operand
2. The execution of a conditional jump instruction
3. The j-th and j + 1-st instructions require the ALU at the same
time
Which of the above can cause a hazard ?
 A 1 and 2 only B 2 and 3 only C 3 only D All of above
Computer Organization and Architecture    GATE-CS-2003

Question 61 Explanation:
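Case 1: This is a data dependency (read after write): the (j+1)-st instruction needs a result that the j-th instruction has not yet produced, so it is a data hazard.
Case 2: A conditional jump creates a control dependency in the pipeline, because the next instruction to fetch is not known until the condition is evaluated, so it is a control hazard.
Case 3: Two instructions needing the single ALU at the same time is a resource conflict, so it is a structural hazard.
All three can cause a hazard. So (D) is the correct option.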
 Question 62
The performance of a pipelined processor suffers if :
 A the pipeline stages have different delays B consecutive instructions are dependent on each other C the pipeline stages share hardware resources D all of the above
Computer Organization and Architecture    GATE-CS-2002

Question 62 Explanation:
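Pipelining executes a program by breaking each instruction into a sequence of stages that work on different instructions at the same time. Performance suffers if the stages have different delays (the clock must match the slowest stage), if consecutive instructions depend on each other (the dependent instruction must wait), or if the stages share hardware resources (one stage must wait for the other). So option (D) is correct.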
 Question 63
More than one word are put in one cache block to
 A exploit the temporal locality of reference in a program B exploit the spatial locality of reference in a program C reduce the miss penalty D none of the above
Computer Organization and Architecture    GATE-CS-2001

Question 63 Explanation:
 Question 64
Where does the swap space reside?
 A RAM B Disk C ROM D On-chip cache
Computer Organization and Architecture

 Question 65
Consider the following data path of a simple non-pipelined CPU. The registers A, B, A1, A2, MDR, the bus and the ALU are 8-bit wide. SP and MAR are 16-bit registers. The MUX is of size 8 × (2:1) and the DEMUX is of size 8 × (1:2). Each memory operation takes 2 CPU clock cycles and uses MAR (Memory Address Register) and MDR (Memory Data Register). SP can be decremented locally.
The CPU instruction “push r”, where r = A or B, has the specification
  M[SP] ← r
  SP ← SP - 1
How many CPU clock cycles are needed to execute the “push r” instruction?
 A 1 B 3 C 4 D 5
Computer Organization and Architecture    GATE-CS-2001

Question 65 Explanation:
Push ‘r’ consists of the following operations: M[SP] ← r and SP ← SP - 1. The value of r is stored in memory at the address currently held in the stack pointer, which takes 2 clock cycles. SP is then decremented to point to the next top of stack. So total cycles = 3, and (B) is the correct option.
 Question 66
Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that
 A T1 <= T2 B T1 >= T2 C T1 < T2 D T1 is T2 plus the time taken for one instruction fetch cycle
Computer Organization and Architecture    GATE-CS-2000

Question 66 Explanation:
Pipelining does not reduce the execution time of a single instruction. It improves overall performance by overlapping the execution of several instructions in different pipeline stages. We assume that each stage takes ‘T’ units of time in both the pipelined and the non-pipelined CPU. Let total stages in pipelined CPU = total stages in non-pipelined CPU = K and number of instructions = N = 1
• Pipelined CPU : Total time (T1) = (K + (N - 1)) * T = KT
• Non-Pipelined CPU : Total time (T2) = KNT = KT
Considering the buffer delays in the pipelined CPU, T1 >= T2. Thus, option (B) is the answer.
 Question 67
For computers based on three-address instruction formats, each address field can be used to specify which of the following:
S1: A memory operand
S2: A processor register
S3: An implied accumulator register
 A Either S1 or S2 B Either S2 or S3 C Only S2 and S3 D All of S1, S2 and S3
Computer Organization and Architecture    GATE-CS-2015 (Set 1)

Question 67 Explanation:
In Three address instruction format, each operand specifies either a memory address or a register. See http://homepage.cs.uiowa.edu/~ghosh/1-19-06.pdf
 Question 68
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is __________.
 A 3.2 B 3 C 2.2 D 2
Computer Organization and Architecture    GATE-CS-2015 (Set 1)

Question 68 Explanation:
Speedup = ExecutionTimeOld / ExecutionTimeNew

ExecutionTimeOld = CPIOld * CycleTimeOld
[Here CPI is Cycles Per Instruction]
= 4 * 1/2.5 nanoseconds
= 1.6 ns

Since there are no stalls, CPInew can be assumed to be 1 on average.
ExecutionTimeNew = CPInew * CycleTimeNew
= 1 * 1/2 nanoseconds
= 0.5 ns

Speedup = 1.6 / 0.5 = 3.2
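
The same arithmetic can be checked with a few lines of Python. This is only a sketch; the function name pipeline_speedup and its parameters are made up for illustration and do not come from the question.

```python
def pipeline_speedup(cpi_old, clock_old_ghz, cpi_new, clock_new_ghz):
    """Ratio of old to new average time per instruction."""
    # cycles per instruction divided by GHz gives nanoseconds per instruction
    time_old_ns = cpi_old / clock_old_ghz
    time_new_ns = cpi_new / clock_new_ghz
    return time_old_ns / time_new_ns


# Values from the question: CPI 4 at 2.5 GHz, then CPI 1 (no stalls) at 2 GHz
print(round(pipeline_speedup(4, 2.5, 1, 2.0), 2))
```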
 Question 69
The least number of temporary variables required to create a three-address code in static single assignment form for the expression q + r/3 + s – t * 5 + u * v/w is
 A 4 B 8 C 7 D 9
Computer Organization and Architecture    GATE-CS-2015 (Set 1)

Question 69 Explanation:

The correct answer is 8. This question was asked as a fill in the blank type question in the exam.

Three address code is an intermediate code generated by compilers while optimizing the code. Each three address code instruction can have at most three operands (constants and variables) combined with an assignment and a binary operator. The point to be noted is that in static single assignment form a variable used on the left hand side (LHS) of an assignment cannot appear again on the LHS of another assignment. Static single assignment (SSA) is a refinement of the three address code.

So, in this question, we have
t1 = r / 3;

t2 = t * 5;

t3 = u * v;

t4 = t3 / w;

t5 = q + t1;

t6 = t5 + s;

t7 = t6 - t2;

t8 = t7 + t4;
Therefore, we require 8 temporary variables (t1 to t8) to create the three address code in static single assignment form.
 Question 70
Assume that for a certain processor, a read request takes 50 nanoseconds on a cache miss and 5 nanoseconds on a cache hit. Suppose while running a program, it was observed that 80% of the processor’s read requests result in a cache hit. The average read access time in nanoseconds is____________.
 A 10 B 12 C 13 D 14
Computer Organization and Architecture    GATE-CS-2015 (Set 2)

Question 70 Explanation:
The average read access time in nanoseconds = 0.8 * 5 + 0.2*50 = 14
 Question 71
Consider a processor with byte-addressable memory. Assume that all registers, including Program Counter (PC) and Program Status Word (PSW), are of size 2 bytes. A stack in the main memory is implemented from memory location (0100)16 and it grows upward. The stack pointer (SP) points to the top element of the stack. The current value of SP is (016E)16. The CALL instruction is of two words, the first word is the op-code and the second word is the starting address of the subroutine (one word = 2 bytes). The CALL instruction is implemented as follows:
   • Store the current value of PC in the stack.
• Store the value of PSW register in the stack.
• Load the starting address of the subroutine in PC. 
The content of PC just before the fetch of a CALL instruction is (5FA0)16. After execution of the CALL instruction, the value of the stack pointer is
 A (016A)16 B (016C)16 C (0170)16 D (0172)16
Computer Organization and Architecture    GATE-CS-2015 (Set 2)

Question 71 Explanation:
The current value of SP is (016E)16

The value of SP after following operations is asked
in question

• Store the current value of PC in the stack.
This operation increments SP by 2 bytes as the size
of PC is given as 2 bytes in the question.
So SP becomes (016E)16 + 2  = (0170)16

• Store the value of PSW register in the stack.
This operation also increments SP by 2 bytes as the size
of PSW is also given as 2 bytes.
So SP becomes (0170)16 + 2  = (0172)16

The Load operation doesn't change SP.

So the new value of SP is (0172)16, which is option (D).
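
The two pushes can be traced with a small Python sketch. It assumes, as the question states, an upward-growing stack and 2-byte registers; the helper name sp_after_pushes is hypothetical.

```python
def sp_after_pushes(sp, push_sizes_bytes):
    """Return SP after pushing items onto a stack that grows upward."""
    for size in push_sizes_bytes:
        sp += size  # each push moves SP up by the size of the stored register
    return sp


# CALL stores PC (2 bytes) and PSW (2 bytes); loading PC does not touch SP
new_sp = sp_after_pushes(0x016E, [2, 2])
print(format(new_sp, "04X"))
```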
 Question 72
Consider the sequence of machine instructions given below:
  MUL R5, R0, R1
DIV R6, R2, R3
SUB R8, R7, R4 
In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is ___________
 A 11 B 12 C 13 D 14
Computer Organization and Architecture    GATE-CS-2015 (Set 2)

Question 72 Explanation:
  1   2   3   4   5   6   7   8   9   10   11   12   13
IF  OF  PO  PO  PO  WB
IF  OF          PO  PO  PO  PO  PO   WB
IF          OF                   PO   WB
IF          OF                    PO   WB
 Question 73
Consider a machine with a byte addressable main memory of 2^20 bytes, block size of 16 bytes and a direct mapped cache having 2^12 cache lines. Let the addresses of two consecutive bytes in main memory be (E201F)16 and (E2020)16. What are the tag and cache line address (in hex) for main memory address (E201F)16?
 A E, 201 B F, 201 C E, E20 D 2, 01F
Computer Organization and Architecture    GATE-CS-2015 (Set 3)

Question 73 Explanation:
Block Size = 16 bytes
Number of block offset bits = 4

No. of sets or cache lines = 2^12
Number of index bits = 12

Size of main memory = 2^20 bytes
Number of tag bits = 20 - 12 - 4 = 4

Let us consider the hex address E201F
Tag = First 4 bits = E (in hex)
Cache line = Next 12 bits  = 201 (in hex)
Refer http://virtual-labs.ac.in/labs/cse10/dmc.html
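
The field split can also be done with shifts and masks. The sketch below is illustrative only; split_address is a made-up helper name, and the bit widths are the ones derived above.

```python
def split_address(addr, offset_bits, index_bits):
    """Split a direct-mapped cache address into (tag, line index, offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


tag, line, offset = split_address(0xE201F, offset_bits=4, index_bits=12)
print(format(tag, "X"), format(line, "03X"), format(offset, "X"))
```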
 Question 74
Consider the following code sequence having five instructions I1 to I5. Each of these instructions has the following format.
    OP Ri, Rj, Rk
where operation OP is performed on contents of registers Rj and Rk and the result is stored in register Ri.
   I1 : ADD R1, R2, R3
I2 : MUL R7, R1, R3
I3 : SUB R4, R1, R5
I4 : ADD R3, R2, R4
I5 : MUL R7, R8, R9 
Consider the following three statements:
S1: There is an anti-dependence between instructions I2 and I5.
S2: There is an anti-dependence between instructions I2 and I4.
S3: Within an instruction pipeline an anti-dependence always
creates one or more stalls. 
Which one of above statements is/are correct?
 A Only S1 is true B Only S2 is true C Only S1 and S2 are true D Only S2 and S3 are true
Computer Organization and Architecture    GATE-CS-2015 (Set 3)

Question 74 Explanation:
The given instructions can be written as below:
I1: R1 = R2 + R3
I2: R7 = R1 * R3
I3: R4 = R1 - R5
I4: R3 = R2 + R4
I5: R7 = R8 * R9 
An anti-dependency, also known as write-after-read (WAR), occurs when an instruction requires a value that is later updated.
S1: There is an anti-dependence between instructions I2 and I5.
False, I2 and I5 don't form any write after read situation.
They both write R7.

S2: There is an anti-dependence between instructions I2 and I4.
True, I2 reads R3 and I4 writes it.

S3: Within an instruction pipeline an anti-dependence always
creates one or more stalls.
False. An anti-dependency can be removed by renaming variables, so it does not always force a stall.
See the following example.
1. B = 3
2. A = B + 1
3. B = 7
Renaming the variable removes the dependency between statements 2 and 3.
1. B = 3
N. B2 = B
2. A = B2 + 1
3. B = 7
Hence only S2 is true, and option (B) is correct.
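
As a rough cross-check of S1 and S2, the following Python sketch lists write-after-read pairs among I1 to I5. It is a simplified check that assumes a pair counts whenever a later instruction writes a register that an earlier one reads; the tuple layout and function name are invented for this example.

```python
# (name, destination register, source registers) for I1..I5 from the question
INSTRUCTIONS = [
    ("I1", "R1", ("R2", "R3")),
    ("I2", "R7", ("R1", "R3")),
    ("I3", "R4", ("R1", "R5")),
    ("I4", "R3", ("R2", "R4")),
    ("I5", "R7", ("R8", "R9")),
]


def anti_dependences(instructions):
    """Return (reader, writer, register) triples where a later write follows a read."""
    pairs = []
    for i, (reader, _, sources) in enumerate(instructions):
        for writer, dest, _ in instructions[i + 1:]:
            if dest in sources:
                pairs.append((reader, writer, dest))
    return pairs


for reader, writer, reg in anti_dependences(INSTRUCTIONS):
    print(reader, "reads", reg, "before", writer, "writes it")
```

Under this simplified check the pair I2 and I4 shows up (through R3), while I2 and I5 does not, since both of them only write R7.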
 Question 75
Consider the following reservation table for a pipeline having three stages S1, S2 and S3.
            Time -->
    -----------------------------
      1    2    3    4    5
    -----------------------------
S1  | X  |    |    |    | X  |
S2  |    | X  |    | X  |    |
S3  |    |    | X  |    |    |
The minimum average latency (MAL) is __________
 A 3 B 2 C 1 D 4
Computer Organization and Architecture    GATE-CS-2015 (Set 3)

Question 75 Explanation:
S1 | X | Y |   |   | X | Y | X | Y |   |   | X | Y |
S2 |   | X | Y | X | Y |   |   | X | Y | X | Y |   |
S3 |   |   | X | Y |   |   |   |   | X | Y |   |   |

We can interleave instructions X and Y in the above
pattern.

The latency between X and Y is 1.

The latency between Y and the next X is 5.

The pattern repeats after that.
So, MAL is (1 + 5)/2 = 3.
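
For readers who want to verify the result mechanically, here is a Python sketch of the usual state-diagram method: collect the forbidden latencies, then search the reachable states for the cycle with the smallest mean latency. The dictionary layout of the reservation table and both function names are assumptions made for this example.

```python
def forbidden_latencies(table):
    """Latencies that would make two initiations collide in some stage."""
    forbidden = set()
    for times in table.values():
        for a in times:
            for b in times:
                if b > a:
                    forbidden.add(b - a)
    return forbidden


def minimum_average_latency(table):
    """Smallest mean latency over all simple cycles of the state diagram."""
    forbidden = forbidden_latencies(table)
    longest = max(forbidden, default=0)

    def next_state(state, latency):
        # Age the pending collisions, then add those of the new initiation
        aged = {f - latency for f in state if f > latency}
        return frozenset(aged | forbidden)

    best = float("inf")

    def explore(states, latencies):
        nonlocal best
        for latency in range(1, longest + 2):
            if latency in states[-1]:
                continue  # this latency would cause a collision
            nxt = next_state(states[-1], latency)
            if nxt in states:
                cycle = latencies[states.index(nxt):] + [latency]
                best = min(best, sum(cycle) / len(cycle))
            else:
                explore(states + [nxt], latencies + [latency])

    explore([frozenset(forbidden)], [])
    return best


# Reservation table from the question: stage -> time steps in which it is used
TABLE = {"S1": [1, 5], "S2": [2, 4], "S3": [3]}
print(sorted(forbidden_latencies(TABLE)), minimum_average_latency(TABLE))
```

For this table the forbidden latencies are 2 and 4, and both the cycle (1, 5) and the constant cycle (3) give an average of 3.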


 Question 76
What is the minimum size of ROM required to store the complete truth table of an 8-bit x 8-bit multiplier?
 A 32 K x 16 bits B 64 K x 16 bits C 16 K x 32 bits D 64 K x 32 bits
Computer Organization and Architecture    GATE-IT-2004

Question 76 Explanation:
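The ROM takes two inputs of 8 bits each, so it has 8 + 8 = 16 address lines. Number of entries in the truth table = (2^8) x (2^8) = 2^16 = 64 K. The product of two 8-bit numbers needs at most 16 bits, so each entry is 16 bits wide. Minimum ROM size = 64 K x 16 bits. So, the answer is B.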
 Question 77
Consider a system with 2 level caches. Access times of Level 1 cache, Level 2 cache and main memory are 1 ns, 10ns, and 500 ns, respectively. The hit rates of Level 1 and Level 2 caches are 0.8 and 0.9, respectively. What is the average access time of the system ignoring the search time within the cache?
 A 13.0 ns B 12.8 ns C 12.6 ns D 12.4 ns
Computer Organization and Architecture    GATE-IT-2004

Question 77 Explanation:
First, the system will look in cache 1. If it is not found in cache 1, then cache 2 and then further in main memory (if not in cache 2 also). The average access time would take into consideration success in cache 1, failure in cache 1 but success in cache 2, failure in both the caches and success in main memory.
Average access time = [H1*T1]+[(1-H1)*H2*T2]+[(1-H1)(1-H2)*Hm*Tm]
where,
H1 = Hit rate of level 1 cache = 0.8
T1 = Access time for level 1 cache = 1 ns
H2 = Hit rate of level 2 cache = 0.9
T2 = Access time for level 2 cache = 10 ns
Hm = Hit rate of Main Memory = 1
Tm = Access time for Main Memory = 500 ns
So, Average Access Time = ( 0.8 * 1 ) + ( 0.2 * 0.9 * 10 ) + ( 0.2 * 0.1 * 1 * 500)

= 0.8 + 1.8 + 10

= 12.6 ns

Thus, C is the correct choice.
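
A small Python sketch of the same formula, assuming (as the explanation does) that main memory always hits and that search time is ignored. The function name is hypothetical.

```python
def average_access_time(h1, t1, h2, t2, tm):
    """Average access time for two cache levels backed by main memory."""
    return (h1 * t1                        # hit in L1
            + (1 - h1) * h2 * t2           # miss in L1, hit in L2
            + (1 - h1) * (1 - h2) * tm)    # miss in both, served by memory


print(round(average_access_time(0.8, 1, 0.9, 10, 500), 2))
```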
 Question 78
Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below: What is the number of cycles needed to execute the following loop?
for (i = 1; i < = 1000; i++)
{I1, I2, I3, I4}
 A 11 ns B 12 ns C 13 ns D 28 ns
Computer Organization and Architecture    GATE-IT-2004

 Question 79
A CPU has only three instructions I1, I2 and I3, which use the following signals in time steps T1-T5:
I1 : T1 : Ain, Bout, Cin
     T2 : PCout, Bin
     T3 : Zout, Ain
     T4 : Bin, Cout
     T5 : End
I2 : T1 : Cin, Bout, Din
     T2 : Aout, Bin
     T3 : Zout, Ain
     T4 : Bin, Cout
     T5 : End
I3 : T1 : Din, Aout
     T2 : Ain, Bout
     T3 : Zout, Ain
     T4 : Dout, Ain
     T5 : End
Which of the following logic functions will generate the hardwired control for the signal Ain ?
 A T1.I1 + T2.I3 + T4.I3 + T3 B (T1 + T2 + T3).I3 + T1.I1 C (T1 + T2 ).I1 + (T2 + T4).I3 + T3 D (T1 + T2 ).I2 + (T1 + T3).I1 + T3
Computer Organization and Architecture    GATE-IT-2004

 Question 80
In an enhancement of a design of a CPU, the speed of a floating point unit has been increased by 20% and the speed of a fixed point unit has been increased by 10%. What is the overall speedup achieved if the ratio of the number of floating point operations to the number of fixed point operations is 2:3 and the floating point operation used to take twice the time taken by the fixed point operation in the original design?
 A 1.155 B 1.185 C 1.255 D 1.285
Computer Organization and Architecture    GATE-IT-2004

Question 80 Explanation:
Speed Up = Time taken in original design / Time taken in enhanced design
In original design:
Ratio of floating point operations to fixed point operations = 2:3
Therefore let floating point operations be 2n and fixed point operations be 3n.

Ratio of time taken by floating point operation to fixed point operation =2:1
Therefore let time taken by floating point operation be 2t and by fixed point
operation be t.

Time taken by the original design = (2n * 2t) + (3n * t) = 7nt

In Enhanced design:
As the speed of the floating point operation is increased by
20%  (1.2 * original speed) time taken for a floating point operation
would be 83.33%  of the original time (original time/1.2)(This is because CPU
speed(S) is inversely proportional to execution time (T) hence if speed becomes
1.2S time would become T/1.2 )

Similarly, for a fixed point operation the speed is increased by 10% (1.1 * original
speed), so the time taken now would be 90.91% of the original time
(original time / 1.1).

Time taken by enhanced design = (2n * 2t /1.2) + (3n * t /1.1) = 6.06nt

Speed up = 7nt / 6.06nt = 1.155
This explanation has been contributed by Yashika Arora.
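
The calculation above generalises to any mix of operation types. The following Python sketch uses made-up names (overall_speedup, counts, times, speed_factors) and takes n = t = 1, which cancels out in the ratio.

```python
def overall_speedup(counts, times, speed_factors):
    """Old total time divided by new total time for a mix of operations."""
    old_time = sum(c * t for c, t in zip(counts, times))
    new_time = sum(c * t / s for c, t, s in zip(counts, times, speed_factors))
    return old_time / new_time


# Floating point : fixed point = 2 : 3 operations, 2 : 1 time per operation,
# sped up by factors of 1.2 and 1.1 respectively
print(round(overall_speedup([2, 3], [2, 1], [1.2, 1.1]), 3))
```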
 Question 81
A dynamic RAM has a memory cycle time of 64 nsec. It has to be refreshed 100 times per msec and each refresh takes 100 nsec. What percentage of the memory cycle time is used for refreshing?
 A 10 B 6.4 C 1 D 0.64
Computer Organization and Architecture    Gate IT 2005

Question 81 Explanation:

Memory cycle time = 64 ns
Memory is refreshed 100 times per msec.
Number of refreshes in 1 memory cycle (i.e. in 64 ns) = (100 * 64 * 10^-9) / 10^-3 = 64 * 10^-4.
Time taken for each refresh = 100 ns
Time taken for 64 * 10^-4 refreshes = 64 * 10^-4 * 100 * 10^-9 sec = 64 * 10^-11 sec.
Percentage of the memory cycle time used for refreshing = (Time taken to refresh in 1 memory cycle / Total time) * 100 = (64 * 10^-11 / (64 * 10^-9)) * 100 = 1 %

Thus, option (C) is correct.

 Question 82
We have two designs D1 and D2 for a synchronous pipeline processor. D1 has 5 pipeline stages with execution times of 3 nsec, 2 nsec, 4 nsec, 2 nsec and 3 nsec while the design D2 has 8 pipeline stages each with 2 nsec execution time. How much time can be saved using design D2 over design D1 for executing 100 instructions?
 A 214 nsec B 202 nsec C 86 nsec D - 200 nsec
Computer Organization and Architecture    Gate IT 2005

Question 82 Explanation:

Total execution time = (k + n – 1) * maximum clock cycle, where k = total number of stages and n = total number of instructions
For D1 : k = 5 and n = 100. Maximum clock cycle = 4 ns. Total execution time = (5 + 100 - 1) * 4 = 416 ns
For D2 : k = 8 and n = 100. Each clock cycle = 2 ns. Total execution time = (8 + 100 - 1) * 2 = 214 ns
Thus, time saved using D2 over D1 = 416 – 214 = 202 ns

Thus, option (B) is correct.
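
The formula (k + n - 1) * clock can be written as a short Python helper. This is a sketch that assumes a synchronous pipeline clocked at the slowest stage and no stalls; the name pipeline_time_ns is invented here.

```python
def pipeline_time_ns(stage_delays_ns, n_instructions):
    """Total time for n instructions when the clock equals the slowest stage."""
    k = len(stage_delays_ns)
    clock = max(stage_delays_ns)
    return (k + n_instructions - 1) * clock


d1 = pipeline_time_ns([3, 2, 4, 2, 3], 100)
d2 = pipeline_time_ns([2] * 8, 100)
print(d1, d2, d1 - d2)
```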

 Question 83
A hardwired CPU uses 10 control signals S1 to S10, in various time steps T1 to T5, to implement 4 instructions I1 to I4 as shown below: Which of the following pairs of expressions represent the circuit for generating control signals S5 and S10 respectively? ((Ij+Ik)Tn indicates that the control signal should be generated in time step Tn if the instruction being executed is Ij or Ik)
 A S5=T1+I2⋅T3 and S10=(I1+I3)⋅T4+(I2+I4)⋅T5 B S5=T1+(I2+I4)⋅T3 and S10=(I1+I3)⋅T4+(I2+I4)⋅T5 C S5=T1+(I2+I4)⋅T3 and S10=(I2+I3+I4)⋅T2+(I1+I3)⋅T4+(I2+I4)⋅T5 D S5=T1+(I2+I4)⋅T3 and S10=(I2+I3)⋅T2+I4⋅T3+(I1+I3)⋅T4+(I2+I4)⋅T5
Computer Organization and Architecture    Gate IT 2005

 Question 84
An instruction set of a processor has 125 signals which can be divided into 5 groups of mutually exclusive signals as follows:
Group 1 : 20 signals, Group 2 : 70 signals, Group 3 : 2 signals, Group 4 : 10 signals, Group 5 : 23 signals.
How many bits of the control words can be saved by using vertical microprogramming over horizontal microprogramming?
 A 0 B 103 C 22 D 55
Computer Organization and Architecture    Gate IT 2005

Question 84 Explanation:

In horizontal microprogramming, each control signal is represented by one bit in the microinstruction. Therefore, total number of bits of the control words required in horizontal microprogramming = 20 + 70 + 2 + 10 + 23 = 125 bits
In vertical microprogramming, 'n' mutually exclusive control signals are encoded into ceil(log2 n) bits.
group 1 : ceil(log2 20) = 5 bits
group 2 : ceil(log2 70) = 7 bits
group 3 : ceil(log2 2) = 1 bit
group 4 : ceil(log2 10) = 4 bits
group 5 : ceil(log2 23) = 5 bits
Total number of bits required in vertical microprogramming = 5 + 7 + 1 + 4 + 5 = 22 bits
So, number of bits saved = 125 - 22 = 103 bits.

Thus, option (B) is correct.
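
The bit counts can be reproduced in Python. The sketch follows the explanation's assumption that n signals in a group need ceil(log2 n) bits (no extra code for "no signal"); encoded_bits is a hypothetical helper.

```python
def encoded_bits(n_signals):
    """Bits needed to encode one of n mutually exclusive signals."""
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 2
    return max(1, (n_signals - 1).bit_length())


groups = [20, 70, 2, 10, 23]
horizontal = sum(groups)                          # one bit per signal
vertical = sum(encoded_bits(g) for g in groups)   # one encoded field per group
print(horizontal, vertical, horizontal - vertical)
```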

 Question 85
In a computer system, four files of size 11050 bytes, 4990 bytes, 5170 bytes and 12640 bytes need to be stored. For storing these files on disk, we can use either 100 byte disk blocks or 200 byte disk blocks (but can't mix block sizes). For each block used to store a file, 4 bytes of bookkeeping information also needs to be stored on the disk. Thus, the total space used to store a file is the sum of the space taken to store the file and the space taken to store the book keeping information for the blocks allocated for storing the file. A disk block can store either bookkeeping information for a file or data from a file, but not both. What is the total space required for storing the files using 100 byte disk blocks and 200 byte disk blocks respectively?
 A 35400 and 35800 bytes B 35800 and 35400 bytes C 35600 and 35400 bytes D 35400 and 35600 bytes
Computer Organization and Architecture    Gate IT 2005

Question 85 Explanation:

Using 100 byte disk blocks (block counts are rounded up to whole blocks):
1. File of size 11050 bytes: Blocks required to store data = ceil(11050/100) = 111. Blocks required for bookkeeping = ceil((111 * 4)/100) = 5. Total blocks = 111 + 5 = 116
2. File of size 4990 bytes: Blocks required to store data = ceil(4990/100) = 50. Blocks required for bookkeeping = ceil((50 * 4)/100) = 2. Total blocks = 50 + 2 = 52
3. File of size 5170 bytes: Blocks required to store data = ceil(5170/100) = 52. Blocks required for bookkeeping = ceil((52 * 4)/100) = 3. Total blocks = 52 + 3 = 55
4. File of size 12640 bytes: Blocks required to store data = ceil(12640/100) = 127. Blocks required for bookkeeping = ceil((127 * 4)/100) = 6. Total blocks = 127 + 6 = 133
Total space required for storing the files using 100 byte disk blocks = (116 + 52 + 55 + 133) * 100 = 35600 bytes

Using 200 byte disk blocks:
1. File of size 11050 bytes: Blocks required to store data = ceil(11050/200) = 56. Blocks required for bookkeeping = ceil((56 * 4)/200) = 2. Total blocks = 56 + 2 = 58
2. File of size 4990 bytes: Blocks required to store data = ceil(4990/200) = 25. Blocks required for bookkeeping = ceil((25 * 4)/200) = 1. Total blocks = 25 + 1 = 26
3. File of size 5170 bytes: Blocks required to store data = ceil(5170/200) = 26. Blocks required for bookkeeping = ceil((26 * 4)/200) = 1. Total blocks = 26 + 1 = 27
4. File of size 12640 bytes: Blocks required to store data = ceil(12640/200) = 64. Blocks required for bookkeeping = ceil((64 * 4)/200) = 2. Total blocks = 64 + 2 = 66
Total space required for storing the files using 200 byte disk blocks = (58 + 26 + 27 + 66) * 200 = 35400 bytes

Thus, option (C) is correct.
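
The per-file bookkeeping is easy to get wrong by hand, so here is a Python sketch of the same calculation. The function names are made up; the 4 bytes of bookkeeping per data block and the rule that bookkeeping goes in separate blocks come from the question.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def total_space(file_sizes, block_size, bookkeeping_bytes_per_block=4):
    """Bytes of disk used when data and bookkeeping go in separate blocks."""
    total_blocks = 0
    for size in file_sizes:
        data_blocks = ceil_div(size, block_size)
        book_blocks = ceil_div(data_blocks * bookkeeping_bytes_per_block, block_size)
        total_blocks += data_blocks + book_blocks
    return total_blocks * block_size


files = [11050, 4990, 5170, 12640]
print(total_space(files, 100), total_space(files, 200))
```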

 Question 86
A processor can support a maximum memory of 4 GB, where the memory is word-addressable (a word consists of two bytes). The size of the address bus of the processor is at least ____ bits.   Note : This question was asked as Numerical Answer Type.
 A 16 B 31 C 32 D None
Computer Organization and Architecture    GATE-CS-2016 (Set 1)

Question 86 Explanation:
Maximum Memory = 4GB = 2^32 bytes. Size of a word = 2 bytes. Therefore, Number of words = 2^32 / 2 = 2^31. So, we require 31 bits for the address bus of the processor.   Thus, B is the correct choice.
 Question 87
The size of the data count register of a DMA controller is 16 bits. The processor needs to transfer a file of 29,154 kilobytes from disk to main memory. The memory is byte addressable. The minimum number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is _________   Note : This question was asked as Numerical Answer Type.
 A 3644 B 3645 C 456 D 1823
Computer Organization and Architecture    GATE-CS-2016 (Set 1)

Question 87 Explanation:
Size of data count register of the DMA controller = 16 bits. Data that can be transferred in one go = 2^16 bytes = 64 kilobytes. File size to be transferred = 29154 kilobytes. So, number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory = ceil(29154/64) = 456   Thus, C is the correct answer.
 Question 88
The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is _______ percent. [This Question was originally a Fill-in-the-Blanks question]
 A 33 or 34 B 30 or 31 C 38 or 39 D 100
Computer Organization and Architecture    GATE-CS-2016 (Set 1)

Question 88 Explanation:
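Throughput in the 1st case, T1 = 1/max delay = 1/800
After the split, the stage delays are 600, 350, 500, 400 and 300 picoseconds.
Throughput in the 2nd case, T2 = 1/max delay = 1/600
Percentage increase in throughput = (T2-T1)/T1 * 100
= ( (1/600) - (1/800) ) / (1/800) * 100
= 33.33%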
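
A short Python sketch of the same comparison, assuming throughput is the reciprocal of the slowest stage delay. The function name is invented for this example.

```python
def throughput_gain_percent(old_delays_ps, new_delays_ps):
    """Percentage increase in throughput, where throughput = 1 / slowest stage."""
    old_period = max(old_delays_ps)
    new_period = max(new_delays_ps)
    # (1/new - 1/old) / (1/old) simplifies to old/new - 1
    return (old_period / new_period - 1) * 100


old = [800, 500, 400, 300]
new = [600, 350, 500, 400, 300]   # first stage replaced by two stages
print(round(throughput_gain_percent(old, new), 2))
```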
 Question 89
A processor has 40 distinct instructions and 24 general purpose registers. A 32-bit instruction word has an opcode, two register operands and an immediate operand. The number of bits available for the immediate operand field is ____________ [This Question was originally a Fill-in-the-blanks Question]
 A 16 B 8 C 4 D 32
Computer Organization and Architecture    GATE-CS-2016 (Set 2)

Question 89 Explanation:
6 bits are needed for 40 distinct instructions ( because, 32 < 40 < 64 ) 5 bits are needed for 24 general purpose registers( because, 16< 24 < 32) 32-bit instruction word has an opcode(6 bit), two register operands(total 10 bits) and an immediate operand (x bits). The number of bits available for the immediate operand field => x = 32 - ( 6 + 10 ) = 16 bits
 Question 90
Suppose the functions F and G can be computed in 5 and 3 nanoseconds by functional units UF and UG, respectively. Given two instances of UF and two instances of UG, it is required to implement the computation F(G(Xi)) for 1 <= i <= 10. ignoring all other delays, the minimum time required to complete this computation is ________________ nanoseconds [Note that this is originally a Fill-in-the-Blanks Question]
 A 28 B 20 C 18 D 30
Computer Organization and Architecture    GATE-CS-2016 (Set 2)

Question 90 Explanation:
Background: Pipelining is an implementation technique where multiple instructions are overlapped in execution. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end. Pipelining does not decrease the time for individual instruction execution. Instead, it increases instruction throughput. The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline.
Explanation: The same concept is used here. The bottleneck is UF, as it takes 5 ns while UG takes only 3 ns. We have to do 10 such calculations and we have 2 instances each of UF and UG. Since there are two instances of each functional unit, each instance gets 5 inputs to compute on. Suppose computation starts at time 0, which means G starts at 0 and F starts at 3 ns, since G finishes computing the first element at 3 ns. The UF work takes 5*10/2 = 25 ns. At the start UF needs to wait 3 ns for the first UG output; everything after that is pipelined, so there is no more waiting. So, the answer is 3 + 25 = 28.
This solution is contributed by Nitika Bansal.
Another solution: Since there are two instances of each functional unit, each instance gets 5 inputs to compute on. Suppose computation starts at time 0, which means G starts at 0 and F starts at 3 ns, since G finishes computing the first element at 3 ns. Time at which F ends computing = 3 + 5*5 = 28
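
The schedule can be simulated with a simple greedy assignment: each input goes to the earliest free UG instance, and its F step starts as soon as both the G result and a UF instance are available. This Python sketch is only one way to model it, and the function name makespan_ns is hypothetical.

```python
import heapq


def makespan_ns(n_items, g_time, f_time, g_units, f_units):
    """Finish time of F(G(x)) for n items using a greedy earliest-free-unit schedule."""
    g_free = [0] * g_units   # times at which each UG instance becomes free
    f_free = [0] * f_units   # times at which each UF instance becomes free
    finish = 0
    for _ in range(n_items):
        g_done = heapq.heappop(g_free) + g_time
        heapq.heappush(g_free, g_done)
        f_start = max(g_done, heapq.heappop(f_free))
        f_done = f_start + f_time
        heapq.heappush(f_free, f_done)
        finish = max(finish, f_done)
    return finish


print(makespan_ns(10, g_time=3, f_time=5, g_units=2, f_units=2))
```

No schedule can do better here: the first F cannot start before 3 ns, and ten F computations on two units need at least 25 ns after that.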
 Question 91
Consider a processor with 64 registers and an instruction set of size twelve. Each instruction has five distinct fields, namely, opcode, two source register identifiers, one destination register identifier, and a twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, the amount of memory (in bytes) consumed by the program text is ____________ [Note that this was originally a Fill-in-the-Blanks question]
 A 100 B 200 C 400 D 500
Computer Organization and Architecture    GATE-CS-2016 (Set 2)

Question 91 Explanation:
One instruction is divided into five parts,
1) The opcode- As we have instruction set of size 12,
an instruction opcode can be identified by 4 bits,
as 2^4=16 and we cannot go any less.

2) & (3) Two source register identifiers- As there
are total 64 registers, they can be identified by
6 bits. As they are two i.e. 6 bit + 6 bit.

4) One destination register identifier- Again it will
be 6 bits.

5) A twelve bit immediate value- 12 bit.

4 + 6 + 6 + 6 + 12 = 34 bits = 34/8 bytes = 4.25 bytes.

As each instruction must be stored byte-aligned, it occupies
ceil(4.25) = 5 bytes.

As there are 100 instructions,
the program text takes 100 * 5 = 500 bytes.

Hence (D) 500 is the answer.
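
The field widths and the byte-aligned size can be computed as follows. This is a sketch with invented helper names; it assumes each field uses the minimum number of bits and each instruction is padded up to a whole byte.

```python
def bits_for(count):
    """Minimum bits needed to distinguish `count` values."""
    return max(1, (count - 1).bit_length())


def instruction_size(num_opcodes, num_registers, register_fields, immediate_bits):
    """Return (bits per instruction, bytes per instruction when byte-aligned)."""
    total_bits = (bits_for(num_opcodes)
                  + register_fields * bits_for(num_registers)
                  + immediate_bits)
    return total_bits, (total_bits + 7) // 8


bits, size_bytes = instruction_size(12, 64, register_fields=3, immediate_bits=12)
print(bits, size_bytes, 100 * size_bytes)
```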
 Question 92
The width of the physical address on a machine is 40 bits. The width of the tag field in a 512 KB 8-way set associative cache is ____________ bits
 A 24 B 20 C 30 D 40
Computer Organization and Architecture    GATE-CS-2016 (Set 2)

Question 92 Explanation:
An easy approach: we know that the physical address is 40 bits.
We know cache size = no.of.sets*
lines-per-set*
block-size

Let us assume no of sets = 2^x
And block size= 2^y

So applying it in the formula,
2^19 = 2^x * 2^3 * 2^y;
So x + y = 19 - 3 = 16

Now we know that to address block size and
set number we need 16 bits so remaining bits
must be for tag
i.e., 40 - 16 = 24
The answer is 24 bits 
If the question were extended to ask for the size of the comparator, a 24-bit comparator is needed. The above explanation is contributed by Sumanth Sunny.
Alternate Explanation:
Physical Address Bits = T(Tag Bits) + S(Set Bits) + O(Offset Bits) = 40 bits    (given)
Associativity = 8 lines per set    (given)
Size of cache = 512 KB     (given)

Size of each way = 512 KB / 8 = 64 KB = 2^16 bytes
Each way holds one line from every set, so 2^S * 2^O = 2^16

So, S + O = 16 bits
Hence, T = 40 - 16 = 24 bits

This explanation is contributed by Mohit Gupta.
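
Because the set-index and offset bits together address one way of the cache, the tag width does not depend on the block size. A Python sketch of this shortcut, assuming power-of-two sizes and using a made-up function name:

```python
def tag_bits(address_bits, cache_bytes, ways):
    """Tag width for a set-associative cache with power-of-two sizes."""
    bytes_per_way = cache_bytes // ways
    # log2 of a power of two; this equals set-index bits + block-offset bits
    index_plus_offset = bytes_per_way.bit_length() - 1
    return address_bits - index_plus_offset


print(tag_bits(40, 512 * 1024, 8))
```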


 Question 93
Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies v1, v2, and v3 such that v1 = 3v2/4 = 2v3. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is _________ GHz, ignoring delays in the pipeline registers
 A 2 B 4 C 8 D 16
Computer Organization and Architecture    GATE-CS-2016 (Set 2)

Question 93 Explanation:
The answer is B.
Consider this pipeline
(V1) --> (V2) --> (V3)
It can be written as
(V) --> (4V/3) --> (V/2)
where V = V1 = 3V2/4 = 2V3
The longest stage is stage 2, which needs 4V/3 seconds. The clock period of the processor is set by this stage. The given frequency is 3 GHz, which means the processor executes
3 G clock cycles in 1 second
or
1 clock cycle in 1/(3G) seconds
(G for giga)
Since the latency of the longest pipeline stage sets the time of 1 clock cycle,
4V/3 = 1 clock cycle = 1/(3G) seconds
V = 1/(4G) seconds ...........(1)
Now the longest stage, stage 2, is split into two equal stages, so the new pipeline is
(V)-->(2V/3)-->(2V/3)-->(V/2)
Now the longest stage takes V seconds. Hence,
1 clock cycle takes V seconds
In 1 second there are 1/V clock cycles
But V = 1/(4G) from (1)
So the new frequency is 4 GHz.
 Question 94
A file system uses an in-memory cache to cache disk blocks. The miss rate of the cache is shown in the figure. The latency to read a block from the cache is 1 ms and to read a block from the disk is 10 ms. Assume that the cost of checking whether a block exists in the cache is negligible. Available cache sizes are in multiples of 10 MB. The smallest cache size required to ensure an average read latency of less than 6 ms is _______ MB.
 A 10 B 20 C 30 D 40
Computer Organization and Architecture    GATE-CS-2016 (Set 2)

Question 94 Explanation:
When a block is requested and found in the cache, it is called a HIT; otherwise it is a MISS. If a block is not found in the cache, it is read from the disk. Let x be the MISS ratio; then (1-x) is the HIT ratio. The latency is 1 ms on a hit and 10 ms on a miss. Time to read from disk for all misses = x * 10 ms. Time to read from cache for all hits = (1-x) * 1 ms. Average time = 10x + 1 - x = 9x + 1. As asked in the question, the average read latency should be less than 6 ms.
9x +1 < 6
9x < 5
x < 0.5556
For 20 MB, miss rate is 60% and for 30 MB, it is 40%. Thus, the smallest cache size required to ensure an average read latency of less than 6 ms is 30 MB.
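
The threshold test can be sketched in Python. Only the two miss rates quoted in the explanation (60% at 20 MB and 40% at 30 MB) are used, since the figure itself is not reproduced here; the names below are illustrative.

```python
def average_latency_ms(miss_rate, hit_ms=1.0, miss_ms=10.0):
    """Average read latency for a given miss rate."""
    return (1 - miss_rate) * hit_ms + miss_rate * miss_ms


# Miss rates quoted in the explanation; other cache sizes are not given in the text
miss_rates = {20: 0.60, 30: 0.40}
good_sizes = [size for size, rate in sorted(miss_rates.items())
              if average_latency_ms(rate) < 6.0]
print(good_sizes)
```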
 Question 95
Which of the following DMA transfer modes and interrupt handling mechanisms will enable the highest I/O band-width?
 A Transparent DMA and Polling interrupts B Cycle-stealing and Vectored interrupts C Block transfer and Vectored interrupts D Block transfer and Polling interrupts
Process Management    Input Output Systems    Computer Organization and Architecture    GATE IT 2006

 Question 96
 A It enables reduced instruction size B It allows indexing of array elements with same instruction C It enables easy relocation of data D It enables faster address calculations than absolute addressing
Computer Organization and Architecture    GATE IT 2006

Question 96 Explanation:
 Question 97
A cache line is 64 bytes. The main memory has latency 32 ns and bandwidth 1 GBytes/s. The time required to fetch the entire cache line from the main memory is
 A 32 ns B 64 ns C 96 ns D 128 ns
Computer Organization and Architecture    GATE IT 2006

Question 97 Explanation:
For 1 GBps bandwidth, it takes 1 sec to load 10^9 bytes on the line,
so for 64 bytes it will take 64 * 1 / 10^9 sec = 64 ns.
The main memory latency given is 32 ns,
so the total time required to fetch the cache line is 64 + 32 = 96 ns.
 Question 98
A computer system has a level-1 instruction cache (I-cache), a level-1 data cache (D-cache) and a level-2 cache (L2-cache) with the following specifications:

The length of the physical address of a word in the main memory is 30 bits. The capacity of the tag memory in the I-cache, D-cache and L2-cache is, respectively,
 A 1 K x 18-bit, 1 K x 19-bit, 4 K x 16-bit B 1 K x 16-bit, 1 K x 19-bit, 4 K x 18-bit C 1 K x 16-bit, 512 x 18-bit, 1 K x 16-bit D 1 K x 18-bit, 512 x 18-bit, 1 K x 18-bit
Computer Organization and Architecture    GATE IT 2006

Question 98 Explanation:
Number of blocks in cache = Capacity / Block size = 2^m
Bits to represent blocks = m
Number of words in a block = 2^n words
Bits to represent a word = n
Tag bits = (length of the physical address of a word) - (bits to represent blocks) - (bits to represent a word)
For a set-associative cache, the index selects a set rather than a block, so use the number of sets (blocks / associativity) in place of the number of blocks when computing m.
Each block has its own tag. So total tag bits = number of blocks x tag bits.
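As a rough illustration of the formula above, here is a small Python sketch. The specification table is not reproduced here, so the capacity and block size in the example call are hypothetical values chosen only for illustration, and the `ways` parameter is an assumption added to cover set-associative caches.

```python
def tag_memory(address_bits, capacity_words, block_words, ways=1):
    """Return (number of tags, bits per tag) for a cache.

    ways=1 models a direct-mapped cache; larger values model
    set-associative caches, where the index selects a set.
    All sizes are assumed to be powers of two.
    """
    num_blocks = capacity_words // block_words
    num_sets = num_blocks // ways
    index_bits = (num_sets - 1).bit_length()     # m
    word_bits = (block_words - 1).bit_length()   # n
    tag_bits = address_bits - index_bits - word_bits
    return num_blocks, tag_bits


# Hypothetical example: 30-bit word address, 8 K-word direct-mapped
# cache with 8-word blocks (NOT the values from the missing table).
print(tag_memory(30, 8 * 1024, 8))
```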
 Question 99
Which of the following systems is a most likely candidate example of a pipe and filter architecture?
 A Expert system B DB repository C Aircraft flight controller D Signal processing
Computer Organization and Architecture    Gate IT 2007

Question 99 Explanation:

Reference: Software Architecture: A Case Based Approach, by Vasudeva Varma
 Question 100
A processor takes 12 cycles to complete an instruction I. The corresponding pipelined processor uses 6 stages with the execution times of 3, 2, 5, 4, 6 and 2 cycles respectively. What is the asymptotic speedup assuming that a very large number of instructions are to be executed?
 A 1.83 B 2 C 3 D 6
Computer Organization and Architecture    Gate IT 2007

Question 100 Explanation:
For the non-pipelined processor:
It takes 12 cycles to complete 1 instruction.
So n instructions take 12n cycles.
For the pipelined processor:
Cycle time per stage = max{stage cycles} = max{3, 2, 5, 4, 6, 2} = 6 cycles
So n instructions take 6*6 + (n-1)*6 cycles (6*6 for the first instruction, and 6 more for each of the remaining n-1).
For a large number of instructions:
lim n→∞ 12n / (36 + (n-1)*6) = 12/6 = 2
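To see the limit numerically, the following Python sketch evaluates the same ratio for growing n. The function name is illustrative, and it assumes every stage is clocked at the slowest stage's time, as in the explanation.

```python
def pipeline_speedup(n, unpipelined_cycles, stage_cycles):
    """Speedup of a pipeline over the non-pipelined design for n instructions."""
    cycle = max(stage_cycles)  # every stage runs at the slowest stage's pace
    # First instruction fills all stages; each later one adds one cycle time.
    pipelined_total = len(stage_cycles) * cycle + (n - 1) * cycle
    return unpipelined_cycles * n / pipelined_total


stages = [3, 2, 5, 4, 6, 2]
for n in (1, 10, 1000, 10**6):
    print(n, pipeline_speedup(n, 12, stages))
```

The printed ratio starts below 1 for a single instruction and approaches 2 as n grows.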
 Question 101
A processor that has carry, overflow and sign flag bits as part of its program status word (PSW) performs addition of the following two 2's complement numbers 01001101 and 11101001. After the execution of this addition operation, the status of the carry, overflow and sign flags, respectively will be:
 A 1, 1, 0 B 1, 0, 0 C 0, 1, 0 D 1, 0, 1
Digital Logic & Number representation    Computer Organization and Architecture    Gate IT 2008

Question 101 Explanation:
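  01001101
+ 11101001
------------
 100110110
The ninth bit is the carry out, so carry flag = 1.
Overflow can occur only when two numbers of the same sign are added and the sign of the result differs from the sign of the operands.
Here the operands have different signs, so overflow flag = 0.
The 8-bit result is 00110110, so sign flag = 0.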
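The flag values can be reproduced with a short Python sketch. The function name and the 8-bit default width are assumptions for illustration; the overflow rule is the usual one (operands share a sign and the result's sign differs).

```python
def add_flags(a, b, bits=8):
    """Add two `bits`-wide two's complement values; return (result, carry, overflow, sign)."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    result = total & mask
    carry = (total >> bits) & 1           # carry out of the most significant bit
    sign = (result >> (bits - 1)) & 1     # most significant bit of the result
    sign_a = (a >> (bits - 1)) & 1
    sign_b = (b >> (bits - 1)) & 1
    # Overflow: operands have the same sign, result has the other sign.
    overflow = int(sign_a == sign_b and sign != sign_a)
    return result, carry, overflow, sign


result, carry, overflow, sign = add_flags(0b01001101, 0b11101001)
print(format(result, "08b"), carry, overflow, sign)
```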
 Question 102
The exponent of 11 in the prime factorization of 300! is
 A 27 B 28 C 29 D 30
Computer Organization and Architecture    Gate IT 2008

Question 102 Explanation:
The exponent of 11 in 300! is the number of factors of 11 contributed by the numbers 1 to 300.
Numbers below 11 contribute no factor of 11.
The multiples of 11 up to 300 are 11, 22, 33, ..., 297, which is 27 numbers, each contributing one factor of 11.
Of these, 121 (11*11) and 242 (11*11*2) are multiples of 11^2, and each contributes one additional factor.
Total count of 11's = 27 + 2 = 29
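The same count follows from Legendre's formula, which sums floor(n / p^i) over all powers of p not exceeding n. A minimal Python sketch, with an illustrative function name:

```python
def prime_exponent_in_factorial(n, p):
    """Exponent of the prime p in n!, by Legendre's formula."""
    exponent = 0
    power = p
    while power <= n:
        exponent += n // power  # multiples of p, then p^2, then p^3, ...
        power *= p
    return exponent


print(prime_exponent_in_factorial(300, 11))
```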
 Question 103
Assume that EA = (X)+ is the effective address equal to the contents of location X, with X incremented by one word length after the effective address is calculated; EA = −(X) is the effective address equal to the contents of location X, with X decremented by one word length before the effective address is calculated; EA = (X)− is the effective address equal to the contents of location X, with X decremented by one word length after the effective address is calculated. The format of the instruction is (opcode, source, destination), which means (destination ← source op destination). Using X as a stack pointer, which of the following instructions can pop the top two elements from the stack, perform the addition operation and push the result back to the stack?
 A ADD (X)−, (X) B ADD (X), (X)− C ADD −(X), (X)+ D ADD −(X), (X)
Computer Organization and Architecture    Gate IT 2008

Question 103 Explanation:

Effective address is the address of the operand. In the given question the format is (opcode, source, destination), meaning destination ← source op destination. For example, ADD (X), (Y) has source = location X and destination = location Y, so operand at Y = operand at X + operand at Y.

Here, −(X) = decrement pointer X and then use the new location pointed to by X for the operand. +(X) = increment pointer X and then use the new location pointed to by X for the operand. (X)− = use the location currently pointed to by X, then decrement X. (X)+ = use the location currently pointed to by X, then increment X.

Now the stack pointer is X. Say X is 100, location 100 holds 10 and location 99 holds 5.

Then our instruction should pop the top two elements, i.e. 10 and 5, and put the result in memory location 99.

1. ADD (X)−, (X): Take operand 1 from memory location X and then decrement X. Operand 1 is the data at location 100, which is 10; then X = X − 1 = 99. Operand 2 is taken from the new location X, so operand 2 = 5. The sum is written to location X, which is still 99. So location 99 holds 15, which is the desired result.

2. ADD (X), (X)−: Take operand 1 from memory location X. Operand 1 is the data at location 100, which is 10. Operand 2 is taken from location X, which is still 100, so operand 2 = 10. The sum is written to location 100. So location 100 holds 20, which is not the desired result.

3. ADD −(X), (X)+: Decrement X first, so X = 99, and take operand 1 from location 99, which is 5. Operand 2 is taken from location X, which is still 99, so operand 2 = 5; X is then incremented back to 100. The sum is written to location 99. So location 99 holds 10 and X is back at 100, which is not the desired result.

4. ADD −(X), (X): Decrement X first, so X = 99, and take operand 1 from location 99, which is 5. Operand 2 is taken from location X, which is 99, so operand 2 = 5. The sum is written to location 99. So location 99 holds 10, which is not the desired result.

This solution is contributed by Shashank Shanker khare.
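The four candidate instructions can also be traced with a small simulation. This is only a sketch under stated assumptions: X is modelled as a plain integer pointer, the initial memory (location 100 holds 10, location 99 holds 5) follows the example values used in the explanation, the source operand is resolved before the destination, and each addressing mode follows the definition given in the question. The mode strings use an ASCII hyphen, and all names are illustrative.

```python
def resolve(mode, x):
    """Return (effective_address, updated_x) for one operand."""
    if mode == "(X)":
        return x, x
    if mode == "(X)+":      # use X, then increment
        return x, x + 1
    if mode == "(X)-":      # use X, then decrement
        return x, x - 1
    if mode == "-(X)":      # decrement, then use X
        return x - 1, x - 1
    raise ValueError(f"unknown addressing mode: {mode}")


def run_add(src_mode, dst_mode, x=100, memory=None):
    """Execute ADD src, dst (destination <- source + destination)."""
    mem = dict(memory if memory is not None else {100: 10, 99: 5})
    src_ea, x = resolve(src_mode, x)   # source is resolved first
    dst_ea, x = resolve(dst_mode, x)
    mem[dst_ea] = mem[src_ea] + mem[dst_ea]
    return x, mem


candidates = [("(X)-", "(X)"), ("(X)", "(X)-"), ("-(X)", "(X)+"), ("-(X)", "(X)")]
for src, dst in candidates:
    x, mem = run_add(src, dst)
    # Desired outcome: X points at 99 and location 99 holds 10 + 5.
    ok = x == 99 and mem[99] == 15
    print(f"ADD {src}, {dst}: X={x}, memory={mem}, desired={ok}")
```

Only the first candidate leaves X at 99 with the sum 15 stored there.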
 Question 104
Consider a CPU where all the instructions require 7 clock cycles to complete execution. There are 140 instructions in the instruction set. It is found that 125 control signals are needed to be generated by the control unit. While designing the horizontal microprogrammed control unit, single address field format is used for branch control logic. What is the minimum size of the control word and control address register?
 A 125, 7 B 125, 10 C 135, 7 D 135, 10
Computer Organization and Architecture    Gate IT 2008

Question 104 Explanation:
As each instruction takes 7 cycles,
=> 140 instructions need 140*7 = 980 control words (one microinstruction per cycle)
=> 2^m >= 980
=> m >= 10
=> control word = 125 + 10 = 135 bits and control address register = 10 bits
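The sizing above can be sketched in Python. The function name is illustrative, and it assumes what the explanation assumes: one control word per clock cycle of every instruction, one bit per control signal (horizontal format), and a single address field wide enough to address every control word.

```python
def control_unit_sizes(instructions, cycles_per_instruction, control_signals):
    """Return (control word bits, control address register bits)."""
    control_words = instructions * cycles_per_instruction
    # Smallest m with 2**m >= control_words.
    address_bits = (control_words - 1).bit_length()
    # Horizontal format: one bit per signal plus the address field.
    return control_signals + address_bits, address_bits


print(control_unit_sizes(140, 7, 125))
```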
 Question 105
A non pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is 0.5 nsec. The speedup of the pipeline processor for a large number of instructions is
 A 4.5 B 4 C 3.33 D 3
Computer Organization and Architecture    Gate IT 2008

Question 105 Explanation:
For the non-pipelined system, time per instruction = 2.5 + 1.5 + 2.0 + 1.5 + 2.5 = 10 ns (which matches the 10 ns cycle of a 100 MHz clock)
For the pipelined system, time per instruction = max(stage delay) + latch delay
=> 2.5 + 0.5 = 3.0 ns
Speedup = time in non-pipelined system / time in pipelined system
= 10/3 = 3.33
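A quick Python check of this calculation, using illustrative names and assuming the non-pipelined time per instruction is one clock period of the 100 MHz processor:

```python
def latched_pipeline_speedup(clock_mhz, stage_ns, latch_ns):
    """Asymptotic speedup of a pipeline with inter-stage latches."""
    unpipelined_ns = 1000 / clock_mhz          # one clock period in ns
    pipelined_ns = max(stage_ns) + latch_ns    # slowest stage plus latch delay
    return unpipelined_ns / pipelined_ns


speedup = latched_pipeline_speedup(100, [2.5, 1.5, 2.0, 1.5, 2.5], 0.5)
print(round(speedup, 2))
```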
 Question 106
Consider a computer with a 4-ways set-associative mapped cache of the following characteristics: a total of 1 MB of main memory, a word size of 1 byte, a block size of 128 words and a cache size of 8 KB. The number of bits in the TAG, SET and WORD fields, respectively are:
 A 7, 6, 7 B 8, 5, 7 C 8, 6, 6 D 9, 4, 7
Memory Management    Computer Organization and Architecture    Gate IT 2008

Question 106 Explanation:
According to the question:
Number of bytes in a word = 1 byte
Number of words per block of memory = 128 words
Total size of the cache memory = 8 KB
So the total number of cache blocks = cache size / (words per block * size of 1 word) = 8 KB / (128 * 1) = 64.
Since the cache is 4-way set associative, the number of sets = number of cache blocks / 4 = 64 / 4 = 16.
So the number of SET bits required = 4, as 16 = power(2, 4). Thus, 4 bits can select any of the 16 sets.
As the question gives only physical memory information, we can assume that the cache is physically tagged. So the main memory can be divided into 16 regions, one per set.
Size of the region a single set can address = 1 MB / 16 = power(2, 16) bytes = power(2, 16) / 128 = power(2, 9) blocks.
Thus, to uniquely identify these power(2, 9) blocks we need 9 bits. Thus, TAG = 9.
A cache block is 128 words, so identifying a particular word within a block needs 7 bits, as 128 = power(2, 7). Thus, WORD = 7.
Hence the answer is (TAG, SET, WORD) = (9, 4, 7).
This solution is contributed by Namita Singh.
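The field widths can be recomputed with a short Python sketch. The function name is illustrative, and it assumes byte-sized words (as stated in the question) and power-of-two sizes throughout.

```python
def cache_fields(memory_bytes, cache_bytes, block_bytes, ways):
    """Return (TAG, SET, WORD) bit widths for a set-associative cache."""
    address_bits = (memory_bytes - 1).bit_length()
    word_bits = (block_bytes - 1).bit_length()      # offset within a block
    num_sets = cache_bytes // block_bytes // ways
    set_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - set_bits - word_bits  # whatever is left over
    return tag_bits, set_bits, word_bits


# 1 MB main memory, 8 KB cache, 128-byte blocks, 4-way set associative.
print(cache_fields(2**20, 8 * 1024, 128, 4))
```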
 Question 107
Consider a computer with a 4-ways set-associative mapped cache of the following characteristics: a total of 1 MB of main memory, a word size of 1 byte, a block size of 128 words and a cache size of 8 KB. While accessing the memory location 0C795H by the CPU, the contents of the TAG field of the corresponding cache line is
 A 000011000 B 110001111 C 00011000 D 110010101
Memory Management    Computer Organization and Architecture    Gate IT 2008

Question 107 Explanation:
From the previous question, TAG takes 9 bits, SET takes 4 bits and WORD takes 7 bits of the 20-bit address.
The memory location 0C795H can be written as 0000 1100 0111 1001 0101.
TAG (9 bits) = 0000 1100 0 = 000011000
SET (4 bits) = 111 1 = 1111
WORD (7 bits) = 001 0101 = 0010101
Therefore, the matching option is option A.
This solution is contributed by Namita Singh.
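The split of the address into its three fields can be sketched with bit operations in Python. The function name is illustrative, and the default widths are the 9/4/7 split derived in the previous question.

```python
def split_address(address, tag_bits=9, set_bits=4, word_bits=7):
    """Split a physical address into (TAG, SET, WORD) bit strings."""
    word = address & ((1 << word_bits) - 1)                   # lowest bits
    set_index = (address >> word_bits) & ((1 << set_bits) - 1)
    tag = address >> (word_bits + set_bits)                   # highest bits
    return (
        format(tag, f"0{tag_bits}b"),
        format(set_index, f"0{set_bits}b"),
        format(word, f"0{word_bits}b"),
    )


print(split_address(0x0C795))
```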