Computer Organization and Design - Chapter 1 - Book solutions - 4th edition - Hennessy, Patterson

Exercise 1.1
Find the word or phrase from the list below that best matches the description in the
following questions. Use the numbers to the left of words in the answer. Each answer should be used only once.

1.1.1 [2] <1.1> Computer used to run large problems and usually accessed via a
network

1.1.2 [2] <1.1> 10e15 or 2e50 bytes

1.1.3 [2] <1.1> A class of computers composed of hundred to thousand processors
and terabytes of memory and having the highest performance and cost

1.1.4 [2] <1.1> Today’s science fiction application that probably will be available
in the near future

1.1.5 [2] <1.1> A kind of memory called random access memory

1.1.6 [2] <1.1> Part of a computer called central processor unit

1.1.7 [2] <1.1> Thousands of processors forming a large cluster

1.1.8 [2] <1.1> Microprocessors containing several processors in the same chip

1.1.9 [2] <1.1> Desktop computer without a screen or keyboard usually accessed
via a network

1.1.10 [2] <1.1> A computer used to running one predetermined application or
collection of software

1.1.11 [2] <1.1> Special language used to describe hardware components

1.1.12 [2] <1.1> Personal computer delivering good performance to single users
at low cost

1.1.13 [2] <1.2> Program that translates statements in high-level language to
assembly language

1.1.14 [2] <1.2> Program that translates symbolic instructions to binary
instructions

1.1.15 [2] <1.2> High-level language for business data processing

1.1.16 [2] <1.2> Binary language that the processor can understand

1.1.17 [2] <1.2> Commands that the processors understand

1.1.18 [2] <1.2> High-level language for scientific computation

1.1.19 [2] <1.2> Symbolic representation of machine instructions

1.1.20 [2] <1.2> Interface between user’s program and hardware providing a
variety of services and supervision functions

1.1.21 [2] <1.2> Software/programs developed by the users

1.1.22 [2] <1.2> Binary digit (value 0 or 1)

1.1.23 [2] <1.2> Software layer between the application software and the hardware
that includes the operating system and the compilers

1.1.24 [2] <1.2> High-level language used to write application and system
software

1.1.25 [2] <1.2> Portable language composed of words and algebraic expressions
that must be translated into assembly language before run in a computer

1.1.26 [2] <1.2> 10e12 or 2e40 bytes




Exercise 1.2
Consider the different configurations shown in the table


1.2.1 [10] <1.3> For a color display using 8 bits for each of the primary colors
(red, green, blue) per pixel, what should be the minimum size in bytes of the frame
buffer to store a frame?

1.2.2 [5] <1.3> How many frames could it store, assuming the memory contains
no other information?

1.2.3 [5] <1.3> If a 256 Kbytes file is sent through the Ethernet connection, how
long it would take?
For problems below, use the information about access time for every type of memory
in the following table.



1.2.4 [5] <1.3> Find how long it takes to read a file from a DRAM if it takes 2
microseconds from the cache memory.

1.2.5 [5] <1.3> Find how long it takes to read a file from a disk if it takes 2 microseconds
from the cache memory.

1.2.6 [5] <1.3> Find how long it takes to read a file from a flash memory if it
takes 2 microseconds from the cache memory.

Exercise 1.3
Consider three different processors P1, P2, and P3 executing the same instruction
set with the clock rates and CPIs given in the following table.

1.3.1 [5] <1.4> Which processor has the highest performance expressed in
instructions per second?
1.3.2 [10] <1.4> If the processors each execute a program in 10 seconds, find the
number of cycles and the number of instructions.
1.3.3 [10] <1.4> We are trying to reduce the time by 30% but this leads to
an increase of 20% in the CPI. What clock rate should we have to get this time
reduction?
For problems below, use the information in the following table.


1.3.4 [10] <1.4> Find the IPC (instructions per cycle) for each processor.

1.3.5 [5] <1.4> Find the clock rate for P2 that reduces its execution time to
that of P1.

1.3.6 [5] <1.4> Find the number of instructions for P2 that reduces its execution
time to that of P3.








Exercise 1.4
Consider two different implementations of the same instruction set architecture.
There are four classes of instructions, A, B, C, and D. The clock rate and CPI of each
implementation are given in the following table.





1.4.1 [10] <1.4> Given a program with 10e6 instructions divided into classes as
follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation
is faster?

1.4.2 [5] <1.4> What is the global CPI for each implementation?
1.4.3 [5] <1.4> Find the clock cycles required in both cases.


The following table shows the number of instructions for a program.


1.4.4 [5] <1.4> Assuming that arith instructions take 1 cycle, load and store 5
cycles, and branches 2 cycles, what is the execution time of the program in a 2 GHz
processor?

1.4.5 [5] <1.4> Find the CPI for the program.
1.4.6 [10] <1.4> If the number of load instructions can be reduced by one half,
what is the speedup and the CPI?
Exercise 1.5
Consider two different implementations, P1 and P2, of the same instruction set.
There are five classes of instructions (A, B, C, D, and E) in the instruction set. The
clock rate and CPI of each class is given below.

1.5.1 [5] <1.4> Assume that peak performance is defined as the fastest rate that
a computer can execute any instruction sequence. What are the peak performances
of P1 and P2 expressed in instructions per second?

1.5.2 [10] <1.4> If the number of instructions executed in a certain program
is divided equally among the classes of instructions except for class A, which
occurs twice as often as each of the others, which computer is faster? How much
faster is it?

1.5.3 [10] <1.4> If the number of instructions executed in a certain program
is divided equally among the classes of instructions except for class E, which occurs
twice as often as each of the others, which computer is faster? How much
faster is it?
The table below shows instruction-type breakdown for different programs. Using
this data, you will be exploring the performance trade-offs for different changes
made to an MIPS processor.



1.5.4 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions
take 10 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS
processor.

1.5.5 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions
take 2 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS
processor.

1.5.6 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions
take 2 cycles, and branches take 3 cycles, what is the speedup if the number of
compute instruction can be reduced by one-half?

Exercise 1.6
Compilers can have a profound impact on the performance of an application on
given a processor. This problem will explore the impact compilers have on execution
time.




1.6.1 [5] <1.4> For the same program, two different compilers are used. The table
above shows the execution time of the two different compiled programs. Find the
average CPI for each program given that the processor has a clock cycle time of 1 ns.

1.6.2 [5] <1.4> Assume the average CPIs found in 1.6.1, but that the compiled programs run on two different processors. If the execution times on the two processors
are the same, how much faster is the clock of the processor running compiler
A’s code versus the clock of the processor running compiler B’s code?

1.6.3 [5] <1.4> A new compiler is developed that uses only 600 million instructions
and has an average CPI of 1.1. What is the speedup of using this new compiler
versus using Compiler A or B on the original processor of 1.6.1?
Consider two different implementations, P1 and P2, of the same instruction set.
There are five classes of instructions (A, B, C, D, and E) in the instruction set. P1
has a clock rate of 4 GHz, and P2 has a clock rate of 6 GHz. The average number
of cycles for each instruction class for P1 and P2 are listed in the following table.


1.6.4 [5] <1.4> Assume that peak performance is defined as the fastest rate that
a computer can execute any instruction sequence. What are the peak performances
of P1 and P2 expressed in instructions per second?

1.6.5 [5] <1.4> If the number of instructions executed in a certain program is divided
equally among the five classes of instructions except for class A, which occurs
twice as often as each of the others, how much faster is P2 than P1?

1.6.6 [5] <1.4> At what frequency does P1 have the same performance of P2 for
the instruction mix given in 1.6.5?


Exercise 1.7
The following table shows the increase in clock rate and power of eight generations
of Intel processors over 28 years.


1.7.1 [5] <1.5> What is the geometric mean of the ratios between consecutive
generations for both clock rate and power? (The geometric mean is described in
Section 1.7.)

1.7.2 [5] <1.5> What is the largest relative change in clock rate and power
between generations?

1.7.3 [5] <1.5> How much larger is the clock rate and power of the last generation
with respect to the first generation?



Consider the following values for voltage in each generation.


1.7.4 [5] <1.5> Find the average capacitive loads, assuming a negligible static
power consumption.

1.7.5 [5] <1.5> Find the largest relative change in voltage between generations.

1.7.6 [5] <1.5> Find the geometric mean of the voltage ratios in the generations
since the Pentium.





Exercise 1.8
Suppose we have developed new versions of a processor with the following characteristics.


1.8.1 [5] <1.5> How much has the capacitive load varied between versions if the
dynamic power has been reduced by 10%?

1.8.2 [5] <1.5> How much has the dynamic power been reduced if the capacitive
load does not change?

1.8.3 [10] <1.5> Assuming that the capacitive load of version 2 is 80% the
capacitive load of version 1, find the voltage for version 2 if the dynamic power of
version 2 is reduced by 40% from version 1.
Suppose that the industry trends show that a new process generation varies as
follows.



1.8.4 [5] <1.5> Find the scaling factor for the dynamic power.

1.8.5 [5] <1.5> Find the scaling of the capacitance per unit area unit.

1.8.6 [5] <1.5> Assuming a Core 2 processor with a clock rate of 2.667 GHz, a
power consumption of 95 W, and a voltage of 1.1 V, find the voltage and clock rate
of this processor for the next process generation.




Exercise 1.9
Although the dynamic power is the primary source of power dissipation in CMOS,
leakage current produces a static power dissipation V × Ileak. The smaller the onchip
dimensions, the more significant is the static power. Assume the figures shown
in the following table for static and dynamic power dissipation for several generations
of processors.




1.9.1 [5] <1.5> Find the percentage of the total dissipated power comprised by
static power.

1.9.2 [5] <1.5> If the total dissipated power is reduced by 10% while maintaining
the static to total power rate of problem 1.9.1, how much should the voltage be
reduced to maintain the same leakage current?

1.9.3 [5] <1.5> Determine the ratio of static power to dynamic power for each
technology.
Consider now the dynamic power dissipation of different versions of a given processor
for three different voltages given in the following table.



1.9.4 [5] <1.5> Determine the static power at 0.8 V, assuming a static to dynamic
power ratio of 0.6.

1.9.5 [5] <1.5> Determine the static and dynamic power dissipation assuming
the rates obtained in problem 1.9.1.

1.9.6 [10] <1.5> Determine the geometric mean of the power variations between
versions.






Exercise 1.10
The table below shows the instruction type breakdown of a given application
executed on 1, 2, 4, or 8 processors. Using this data, you will be exploring the speedup
of applications on parallel processors.



1.10.1 [5] <1.4, 1.6> The table above shows the number of instructions required
per processor to complete a program on a multiprocessor with 1, 2, 4, or 8 processors.
What is the total number of instructions executed per processor? What is the
aggregate number of instructions executed across all processors?

1.10.2 [5] <1.4, 1.6> Given the CPI values on the right of the table above, find
the total execution time for this program on 1, 2, 4, and 8 processors. Assume that
each processor has a 2 GHz clock frequency.

1.10.3 [10] <1.4, 1.6> If the CPI of the arithmetic instructions was doubled,
what would the impact be on the execution time of the program on 1, 2, 4, or 8
processors?

The table below shows the number of instructions per processor core on a multicore
processor as well as the average CPI for executing the program on 1, 2, 4, or 8 cores.
Using this data, you will be exploring the speedup of applications on multicore
processors.


1.10.4 [10] <1.4, 1.6> Assuming a 3 GHz clock frequency, what is the execution
time of the program using 1, 2, 4, or 8 cores?

1.10.5 [10] <1.5, 1.6> Assume that the power consumption of a processor core
can be described by the following equation:
Power = 5.0mA /MHz  . Voltage2
where the operation voltage of the processor is described by the following equation:
Voltage = 1/5 Frequency + 0.4
with the frequency measured in GHz. So, at 5 GHz, the voltage would be 1.4 V. Find
the power consumption of the program executing on 1, 2, 4, and 8 cores assuming
that each core is operating at a 3 GHz clock frequency. Likewise, find the power
consumption of the program executing on 1, 2, 4, or 8 cores assuming that each
core is operating at 500 MHz.

1.10.6 [10] <1.5, 1.6> If using a single core, find the required CPI for this core
to get an execution time equal to the time obtained by using the number of cores
in the table above (execution times in problem 1.10.4). Note that the number of
instructions should be the aggregate number of instructions executed across all
the cores.



Exercise 1.11
The following table shows manufacturing data for various processors.


1.11.1 [10] <1.7> Find the yield.
1.11.2 [5] <1.7> Find the cost per die.

1.11.3 [10] <1.7> If the number of dies per wafer is increased by 10% and the
defects per area unit increases by 15%, find the die area and yield.
Suppose that, with the evolution of the electronic devices manufacturing technology,
the yield varies as shown in the following table.


1.11.4 [10] <1.7> Find the defects per area unit for each technology given a die
area of 200 mm2.

1.11.5 [5] <1.7> Represent graphically the variation of the yield together with
the variation of defects per unit area.

Exercise 1.12
The following table shows results for SPEC CPU2006 benchmark programs
running on an AMD Barcelona.





1.12.1 [5] <1.7> Find the CPI if the clock cycle time is 0.333 ns.

1.12.2 [5] <1.7> Find the SPECratio.
1.12.3 [5] <1.7> For these two benchmarks, find the geometric mean of the
SPECratio.


1.12.4 [5] <1.7> Find the increase in CPU time if the number of instructions of
the benchmark is increased by 10% without affecting the CPI.

1.12.5 [5] <1.7> Find the increase in CPU time if the number of instructions of
the benchmark is increased by 10% and the CPI is increased by 5%.

1.12.6 [5] <1.7> Find the change in the SPECratio for the change described in
1.12.5.




Exercise 1.13
Suppose that we are developing a new version of the AMD Barcelona processor
with a 4 GHz clock rate. We have added some additional instructions to the
instruction set in such a way that the number of instructions has been reduced by
15% from the values shown for each benchmark in Exercise 1.12. The execution
times obtained are shown in the following table.



1.13.1 [10] <1.8> Find the new CPI.
1.13.2 [10] <1.8> In general, these CPI values are larger than those obtained in
previous exercises for the same benchmarks. This is due mainly to the clock rate
used in both cases, 3 GHz and 4 GHz. Determine whether the increase in the CPI
is similar to that of the clock rate. If they are dissimilar, why?

1.13.3 [5] <1.8> How much has the CPU time been reduced?
The following table shows data for further benchmarks.



1.13.4 [10] <1.8> If the execution time is reduced by an additional 10% without
affecting to the CPI and with a clock rate of 4 GHz, determine the number of
instructions.

1.13.5 [10] <1.8> Determine the clock rate required to give a further 10% reduction
in CPU time while maintaining the number of instructions and with the CPI
unchanged.
1.13.6 [10] <1.8> Determine the clock rate if the CPI is reduced by 15% and the
CPU time by 20% while the number of instructions is unchanged.

Exercise 1.14
Section 1.8 cites as a pitfall the utilization of a subset of the performance equation
as a performance metric. To illustrate this, consider the following data for the
execution of a program in different processors.





1.14.1 [5] <1.8> One usual fallacy is to consider the computer with the largest
clock rate as having the largest performance. Check if this is true for P1 and P2.

1.14.2 [10] <1.8> Another fallacy is to consider that the processor executing the
largest number of instructions will need a larger CPU time. Considering that processor
P1 is executing a sequence of 106 instructions and that the CPI of processors
P1 and P2 do not change, determine the number of instructions that P2 can
execute in the same time that P1 needs to execute 106 instructions.

1.14.3 [10] <1.8> A common fallacy is to use MIPS (millions of instructions per
second) to compare the performance of two different processors, and consider that
the processor with the largest MIPS has the largest performance. Check if this is
true for P1 and P2.
Another common performance figure is MFLOPS (million of floating-point
operations per second), defined as
MFLOPS = No. FP operations / (execution time × 106)
but this figure has the same problems as MIPS. Consider the program in the following
table, running on the two processors below.

1.14.4 [10] <1.8> Find the MFLOPS figures for the programs.
1.14.5 [10] <1.8> Find the MIPS figures for the programs.
1.14.6 [10] <1.8> Find the performance for the programs and compare it with
MIPS and MFLOPS.

 Exercise 1.15
Another pitfall cited in Section 1.8 is expecting to improve the overall performance
of a computer by improving only one aspect of the computer. This might be true,
but not always. Consider a computer running programs with CPU times shown in
the following table.



1.15.1 [5] <1.8> How much is the total time reduced if the time for FP operations
is reduced by 20%?

1.15.2 [5] <1.8> How much is the time for INT operations reduced if the total
time is reduced by 20%?
1.15.3 [5] <1.8> Can the total time can be reduced by 20% by reducing only the
time for branch instructions?
The following table shows the instruction type breakdown per processor of given
applications executed in different numbers of processors.



Assume that each processor has a 2 GHz clock rate.
1.15.4 [10] <1.8> How much must we improve the CPI of FP instructions if we
want the program to run two times faster?

1.15.5 [10] <1.8> How much must we improve the CPI of L/S instructions if we
want the program to run two times faster?

1.15.6 [5] <1.8> How much is the execution time of the program improved if the
CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch
is reduced by 30%?

Exercise 1.16
Another pitfall, related to the execution of programs in multiprocessor systems, is
expecting improvement in performance by improving only the execution time of
part of the routines. The following table shows the execution time of five routines
of a program running on different numbers of processors.



1.16.1 [10] <1.8> Find the total execution time and by how much it is reduced if
the time of routines A, C, and E is improved by 15%.

1.16.2 [10] <1.8> How much is the total time reduced if routine B is improved
by 10%?

1.16.3 [10] <1.8> How much is the total time reduced if routine D is improved
by 10%?
Execution time in a multiprocessor system can be split into computing time for
the routines plus routing time spent sending data from one processor to another.
Consider the execution time and routing time given in the following table. In this
case, the routing time is an important component of the total time.


1.16.4 [10] <1.8> For each doubling of the number of processors, determine the
ratio of new to old computing time and the ratio of new to old routing time.

1.16.5 [5] <1.8> Using the geometric means of the ratios, extrapolate to find the
computing time and routing time in a 128-processor system.
1.16.6 [10] <1.8> Find the computing time and routing time for a system with
one processor.