Solutions Computer Organization and Design - 4th edition - Hennessy, Patterson: Computer Organization and Design - Chapter 1 - Book solutions - 4th edition

Exercise 1.1
Find the word or phrase from the list below that best matches the description in the
following questions. Use the numbers to the left of words in the answer. Each answer should be used only once.

1.1.1 [2] <1.1> Computer used to run large problems and usually accessed via a
network

Get 1.1.1 exercise solution

1.1.2 [2] <1.1> 10e15 or 2e50 bytes

Get 1.1.2 exercise solution

1.1.3 [2] <1.1> A class of computers composed of hundred to thousand processors
and terabytes of memory and having the highest performance and cost

Get 1.1.3 exercise solution

1.1.4 [2] <1.1> Today’s science fiction application that probably will be available
in the near future

Get 1.1.4 exercise solution

1.1.5 [2] <1.1> A kind of memory called random access memory

Get 1.1.5 exercise solution

1.1.6 [2] <1.1> Part of a computer called central processor unit

Get 1.1.6 exercise solution

1.1.7 [2] <1.1> Thousands of processors forming a large cluster

Get 1.1.7 exercise solution

1.1.8 [2] <1.1> Microprocessors containing several processors in the same chip

Get 1.1.8 exercise solution

1.1.9 [2] <1.1> Desktop computer without a screen or keyboard usually accessed
via a network

Get 1.1.9 exercise solution

1.1.10 [2] <1.1> A computer used to running one predetermined application or
collection of software

Get 1.1.10 exercise solution

1.1.11 [2] <1.1> Special language used to describe hardware components

Get 1.1.11 exercise solution

1.1.12 [2] <1.1> Personal computer delivering good performance to single users
at low cost

Get 1.1.12 exercise solution

1.1.13 [2] <1.2> Program that translates statements in high-level language to
assembly language

Get 1.1.13 exercise solution

1.1.14 [2] <1.2> Program that translates symbolic instructions to binary
instructions

Get 1.1.14 exercise solution

1.1.15 [2] <1.2> High-level language for business data processing

Get 1.1.15 exercise solution

1.1.16 [2] <1.2> Binary language that the processor can understand

Get 1.1.16 exercise solution

1.1.17 [2] <1.2> Commands that the processors understand

Get 1.1.17 exercise solution

1.1.18 [2] <1.2> High-level language for scientific computation

Get 1.1.18 exercise solution

1.1.19 [2] <1.2> Symbolic representation of machine instructions

Get 1.1.19 exercise solution

1.1.20 [2] <1.2> Interface between user’s program and hardware providing a
variety of services and supervision functions

Get 1.1.20 exercise solution

1.1.21 [2] <1.2> Software/programs developed by the users

Get 1.1.21 exercise solution

1.1.22 [2] <1.2> Binary digit (value 0 or 1)

Get 1.1.22 exercise solution

1.1.23 [2] <1.2> Software layer between the application software and the hardware
that includes the operating system and the compilers

Get 1.1.23 exercise solution

1.1.24 [2] <1.2> High-level language used to write application and system
software

Get 1.1.24 exercise solution

1.1.25 [2] <1.2> Portable language composed of words and algebraic expressions
that must be translated into assembly language before run in a computer

Get 1.1.25 exercise solution

1.1.26 [2] <1.2> 10e12 or 2e40 bytes

Get 1.1.26 exercise solution

Exercise 1.2
Consider the different configurations shown in the table

1.2.1 [10] <1.3> For a color display using 8 bits for each of the primary colors
(red, green, blue) per pixel, what should be the minimum size in bytes of the frame
buffer to store a frame?

Get 1.2.1 exercise solution

1.2.2 [5] <1.3> How many frames could it store, assuming the memory contains
no other information?

Get 1.2.2 exercise solution

1.2.3 [5] <1.3> If a 256 Kbytes file is sent through the Ethernet connection, how
long it would take?
For problems below, use the information about access time for every type of memory
in the following table.

Get 1.2.3 exercise solution

1.2.4 [5] <1.3> Find how long it takes to read a file from a DRAM if it takes 2
microseconds from the cache memory.

Get 1.2.4 exercise solution

1.2.5 [5] <1.3> Find how long it takes to read a file from a disk if it takes 2 microseconds
from the cache memory.

Get 1.2.5 exercise solution

1.2.6 [5] <1.3> Find how long it takes to read a file from a flash memory if it
takes 2 microseconds from the cache memory.

Get 1.2.6 exercise solution

Exercise 1.3
Consider three different processors P1, P2, and P3 executing the same instruction
set with the clock rates and CPIs given in the following table.

1.3.1 [5] <1.4> Which processor has the highest performance expressed in
instructions per second?

Get 1.3.1 exercise solution

1.3.2 [10] <1.4> If the processors each execute a program in 10 seconds, find the
number of cycles and the number of instructions.

Get 1.3.2 exercise solution

1.3.3 [10] <1.4> We are trying to reduce the time by 30% but this leads to
an increase of 20% in the CPI. What clock rate should we have to get this time
reduction?
For problems below, use the information in the following table.

Get 1.3.3 exercise solution

1.3.4 [10] <1.4> Find the IPC (instructions per cycle) for each processor.

Get 1.3.4 exercise solution

1.3.5 [5] <1.4> Find the clock rate for P2 that reduces its execution time to
that of P1.

Get 1.3.5 exercise solution

1.3.6 [5] <1.4> Find the number of instructions for P2 that reduces its execution
time to that of P3.

Get 1.3.6 exercise solution

Exercise 1.4
Consider two different implementations of the same instruction set architecture.
There are four classes of instructions, A, B, C, and D. The clock rate and CPI of each
implementation are given in the following table.

1.4.1 [10] <1.4> Given a program with 10e6 instructions divided into classes as
follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation
is faster?

Get 1.4.1 exercise solution

1.4.2 [5] <1.4> What is the global CPI for each implementation?

Get 1.4.2 exercise solution

1.4.3 [5] <1.4> Find the clock cycles required in both cases.

Get 1.4.3 exercise solution

The following table shows the number of instructions for a program.

1.4.4 [5] <1.4> Assuming that arith instructions take 1 cycle, load and store 5
cycles, and branches 2 cycles, what is the execution time of the program in a 2 GHz
processor?

Get 1.4.4 exercise solution

1.4.5 [5] <1.4> Find the CPI for the program.

Get 1.4.5 exercise solution

1.4.6 [10] <1.4> If the number of load instructions can be reduced by one half,
what is the speedup and the CPI?

Get 1.4.6 exercise solution

Exercise 1.5
Consider two different implementations, P1 and P2, of the same instruction set.
There are five classes of instructions (A, B, C, D, and E) in the instruction set. The
clock rate and CPI of each class is given below.

1.5.1 [5] <1.4> Assume that peak performance is defined as the fastest rate that
a computer can execute any instruction sequence. What are the peak performances
of P1 and P2 expressed in instructions per second?

Get 1.5.1 exercise solution

1.5.2 [10] <1.4> If the number of instructions executed in a certain program
is divided equally among the classes of instructions except for class A, which
occurs twice as often as each of the others, which computer is faster? How much
faster is it?

Get 1.5.2 exercise solution

1.5.3 [10] <1.4> If the number of instructions executed in a certain program
is divided equally among the classes of instructions except for class E, which occurs
twice as often as each of the others, which computer is faster? How much
faster is it?
The table below shows instruction-type breakdown for different programs. Using
this data, you will be exploring the performance trade-offs for different changes
made to an MIPS processor.

Get 1.5.3 exercise solution

1.5.4 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions
take 10 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS
processor.

Get 1.5.4 exercise solution

1.5.5 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions
take 2 cycles, and branches take 3 cycles, find the execution time on a 3 GHz MIPS
processor.

Get 1.5.5 exercise solution

1.5.6 [5] <1.4> Assuming that computes take 1 cycle, loads and store instructions
take 2 cycles, and branches take 3 cycles, what is the speedup if the number of
compute instruction can be reduced by one-half?

Get 1.5.6 exercise solution

Exercise 1.6
Compilers can have a profound impact on the performance of an application on
given a processor. This problem will explore the impact compilers have on execution
time.

1.6.1 [5] <1.4> For the same program, two different compilers are used. The table
above shows the execution time of the two different compiled programs. Find the
average CPI for each program given that the processor has a clock cycle time of 1 ns.

Get 1.6.1 exercise solution

1.6.2 [5] <1.4> Assume the average CPIs found in 1.6.1, but that the compiled programs run on two different processors. If the execution times on the two processors
are the same, how much faster is the clock of the processor running compiler
A’s code versus the clock of the processor running compiler B’s code?

Get 1.6.2 exercise solution

1.6.3 [5] <1.4> A new compiler is developed that uses only 600 million instructions
and has an average CPI of 1.1. What is the speedup of using this new compiler
versus using Compiler A or B on the original processor of 1.6.1?
Consider two different implementations, P1 and P2, of the same instruction set.
There are five classes of instructions (A, B, C, D, and E) in the instruction set. P1
has a clock rate of 4 GHz, and P2 has a clock rate of 6 GHz. The average number
of cycles for each instruction class for P1 and P2 are listed in the following table.

Get 1.6.3 exercise solution

1.6.4 [5] <1.4> Assume that peak performance is defined as the fastest rate that
a computer can execute any instruction sequence. What are the peak performances
of P1 and P2 expressed in instructions per second?

Get 1.6.4 exercise solution

1.6.5 [5] <1.4> If the number of instructions executed in a certain program is divided
equally among the five classes of instructions except for class A, which occurs
twice as often as each of the others, how much faster is P2 than P1?

Get 1.6.5 exercise solution

1.6.6 [5] <1.4> At what frequency does P1 have the same performance of P2 for
the instruction mix given in 1.6.5?

Get 1.6.6 exercise solution

Exercise 1.7
The following table shows the increase in clock rate and power of eight generations
of Intel processors over 28 years.

1.7.1 [5] <1.5> What is the geometric mean of the ratios between consecutive
generations for both clock rate and power? (The geometric mean is described in
Section 1.7.)

Get 1.7.1 exercise solution

1.7.2 [5] <1.5> What is the largest relative change in clock rate and power
between generations?

Get 1.7.2 exercise solution

1.7.3 [5] <1.5> How much larger is the clock rate and power of the last generation
with respect to the first generation?

Get 1.7.3 exercise solution

Consider the following values for voltage in each generation.

1.7.4 [5] <1.5> Find the average capacitive loads, assuming a negligible static
power consumption.

Get 1.7.4 exercise solution

1.7.5 [5] <1.5> Find the largest relative change in voltage between generations.

Get 1.7.5 exercise solution

1.7.6 [5] <1.5> Find the geometric mean of the voltage ratios in the generations
since the Pentium.

Get 1.7.6 exercise solution

Exercise 1.8
Suppose we have developed new versions of a processor with the following characteristics.

1.8.1 [5] <1.5> How much has the capacitive load varied between versions if the
dynamic power has been reduced by 10%?

Get 1.8.1 exercise solution

1.8.2 [5] <1.5> How much has the dynamic power been reduced if the capacitive
load does not change?

Get 1.8.2 exercise solution

1.8.3 [10] <1.5> Assuming that the capacitive load of version 2 is 80% the
capacitive load of version 1, find the voltage for version 2 if the dynamic power of
version 2 is reduced by 40% from version 1.
Suppose that the industry trends show that a new process generation varies as
follows.

Get 1.8.3 exercise solution

1.8.4 [5] <1.5> Find the scaling factor for the dynamic power.

Get 1.8.4 exercise solution

1.8.5 [5] <1.5> Find the scaling of the capacitance per unit area unit.

Get 1.8.5 exercise solution

1.8.6 [5] <1.5> Assuming a Core 2 processor with a clock rate of 2.667 GHz, a
power consumption of 95 W, and a voltage of 1.1 V, find the voltage and clock rate
of this processor for the next process generation.

Get 1.8.6 exercise solution

Exercise 1.9
Although the dynamic power is the primary source of power dissipation in CMOS,
leakage current produces a static power dissipation V × Ileak. The smaller the onchip
dimensions, the more significant is the static power. Assume the figures shown
in the following table for static and dynamic power dissipation for several generations
of processors.

1.9.1 [5] <1.5> Find the percentage of the total dissipated power comprised by
static power.

Get 1.9.1 exercise solution

1.9.2 [5] <1.5> If the total dissipated power is reduced by 10% while maintaining
the static to total power rate of problem 1.9.1, how much should the voltage be
reduced to maintain the same leakage current?

Get 1.9.2 exercise solution

1.9.3 [5] <1.5> Determine the ratio of static power to dynamic power for each
technology.
Consider now the dynamic power dissipation of different versions of a given processor
for three different voltages given in the following table.

Get 1.9.3 exercise solution

1.9.4 [5] <1.5> Determine the static power at 0.8 V, assuming a static to dynamic
power ratio of 0.6.

Get 1.9.4 exercise solution

1.9.5 [5] <1.5> Determine the static and dynamic power dissipation assuming
the rates obtained in problem 1.9.1.

Get 1.9.5 exercise solution

1.9.6 [10] <1.5> Determine the geometric mean of the power variations between
versions.

Get 1.9.6 exercise solution

Exercise 1.10
The table below shows the instruction type breakdown of a given application
executed on 1, 2, 4, or 8 processors. Using this data, you will be exploring the speedup
of applications on parallel processors.

1.10.1 [5] <1.4, 1.6> The table above shows the number of instructions required
per processor to complete a program on a multiprocessor with 1, 2, 4, or 8 processors.
What is the total number of instructions executed per processor? What is the
aggregate number of instructions executed across all processors?

Get 1.10.1 exercise solution

1.10.2 [5] <1.4, 1.6> Given the CPI values on the right of the table above, find
the total execution time for this program on 1, 2, 4, and 8 processors. Assume that
each processor has a 2 GHz clock frequency.

Get 1.10.2 exercise solution

1.10.3 [10] <1.4, 1.6> If the CPI of the arithmetic instructions was doubled,
what would the impact be on the execution time of the program on 1, 2, 4, or 8
processors?

The table below shows the number of instructions per processor core on a multicore
processor as well as the average CPI for executing the program on 1, 2, 4, or 8 cores.
Using this data, you will be exploring the speedup of applications on multicore
processors.

Get 1.10.3 exercise solution

1.10.4 [10] <1.4, 1.6> Assuming a 3 GHz clock frequency, what is the execution
time of the program using 1, 2, 4, or 8 cores?

Get 1.10.4 exercise solution

1.10.5 [10] <1.5, 1.6> Assume that the power consumption of a processor core
can be described by the following equation:
Power = 5.0mA /MHz . Voltage2
where the operation voltage of the processor is described by the following equation:
Voltage = 1/5 Frequency + 0.4
with the frequency measured in GHz. So, at 5 GHz, the voltage would be 1.4 V. Find
the power consumption of the program executing on 1, 2, 4, and 8 cores assuming
that each core is operating at a 3 GHz clock frequency. Likewise, find the power
consumption of the program executing on 1, 2, 4, or 8 cores assuming that each
core is operating at 500 MHz.

Get 1.10.5 exercise solution

1.10.6 [10] <1.5, 1.6> If using a single core, find the required CPI for this core
to get an execution time equal to the time obtained by using the number of cores
in the table above (execution times in problem 1.10.4). Note that the number of
instructions should be the aggregate number of instructions executed across all
the cores.

Get 1.10.6 exercise solution

Exercise 1.11
The following table shows manufacturing data for various processors.

1.11.1 [10] <1.7> Find the yield.

Get 1.11.1 exercise solution

1.11.2 [5] <1.7> Find the cost per die.

Get 1.11.2 exercise solution

1.11.3 [10] <1.7> If the number of dies per wafer is increased by 10% and the
defects per area unit increases by 15%, find the die area and yield.
Suppose that, with the evolution of the electronic devices manufacturing technology,
the yield varies as shown in the following table.

Get 1.11.3 exercise solution

1.11.4 [10] <1.7> Find the defects per area unit for each technology given a die
area of 200 mm2.

Get 1.11.4 exercise solution

1.11.5 [5] <1.7> Represent graphically the variation of the yield together with
the variation of defects per unit area.

Get 1.11.5 exercise solution

Exercise 1.12
The following table shows results for SPEC CPU2006 benchmark programs
running on an AMD Barcelona.

1.12.1 [5] <1.7> Find the CPI if the clock cycle time is 0.333 ns.

Get 1.12.1 exercise solution

1.12.2 [5] <1.7> Find the SPECratio.

Get 1.12.2 exercise solution

1.12.3 [5] <1.7> For these two benchmarks, find the geometric mean of the
SPECratio.

Get 1.12.3 exercise solution

1.12.4 [5] <1.7> Find the increase in CPU time if the number of instructions of
the benchmark is increased by 10% without affecting the CPI.

Get 1.12.4 exercise solution

1.12.5 [5] <1.7> Find the increase in CPU time if the number of instructions of
the benchmark is increased by 10% and the CPI is increased by 5%.

Get 1.12.5 exercise solution

1.12.6 [5] <1.7> Find the change in the SPECratio for the change described in
1.12.5.

Get 1.12.6 exercise solution

Exercise 1.13
Suppose that we are developing a new version of the AMD Barcelona processor
with a 4 GHz clock rate. We have added some additional instructions to the
instruction set in such a way that the number of instructions has been reduced by
15% from the values shown for each benchmark in Exercise 1.12. The execution
times obtained are shown in the following table.

1.13.1 [10] <1.8> Find the new CPI.

Get 1.13.1 exercise solution

1.13.2 [10] <1.8> In general, these CPI values are larger than those obtained in
previous exercises for the same benchmarks. This is due mainly to the clock rate
used in both cases, 3 GHz and 4 GHz. Determine whether the increase in the CPI
is similar to that of the clock rate. If they are dissimilar, why?

Get 1.13.2 exercise solution

1.13.3 [5] <1.8> How much has the CPU time been reduced?
The following table shows data for further benchmarks.

Get 1.13.3 exercise solution

1.13.4 [10] <1.8> If the execution time is reduced by an additional 10% without
affecting to the CPI and with a clock rate of 4 GHz, determine the number of
instructions.

Get 1.13.4 exercise solution

1.13.5 [10] <1.8> Determine the clock rate required to give a further 10% reduction
in CPU time while maintaining the number of instructions and with the CPI
unchanged.

Get 1.13.5 exercise solution

1.13.6 [10] <1.8> Determine the clock rate if the CPI is reduced by 15% and the
CPU time by 20% while the number of instructions is unchanged.

Get 1.13.6 exercise solution

Exercise 1.14
Section 1.8 cites as a pitfall the utilization of a subset of the performance equation
as a performance metric. To illustrate this, consider the following data for the
execution of a program in different processors.

1.14.1 [5] <1.8> One usual fallacy is to consider the computer with the largest
clock rate as having the largest performance. Check if this is true for P1 and P2.

Get 1.14.1 exercise solution

1.14.2 [10] <1.8> Another fallacy is to consider that the processor executing the
largest number of instructions will need a larger CPU time. Considering that processor
P1 is executing a sequence of 106 instructions and that the CPI of processors
P1 and P2 do not change, determine the number of instructions that P2 can
execute in the same time that P1 needs to execute 106 instructions.

Get 1.14.2 exercise solution

1.14.3 [10] <1.8> A common fallacy is to use MIPS (millions of instructions per
second) to compare the performance of two different processors, and consider that
the processor with the largest MIPS has the largest performance. Check if this is
true for P1 and P2.
Another common performance figure is MFLOPS (million of floating-point
operations per second), defined as
MFLOPS = No. FP operations / (execution time × 106)
but this figure has the same problems as MIPS. Consider the program in the following
table, running on the two processors below.

Get 1.14.3 exercise solution

1.14.4 [10] <1.8> Find the MFLOPS figures for the programs.

Get 1.14.4 exercise solution

1.14.5 [10] <1.8> Find the MIPS figures for the programs.

Get 1.14.5 exercise solution

1.14.6 [10] <1.8> Find the performance for the programs and compare it with
MIPS and MFLOPS.

Get 1.14.6 exercise solution

Exercise 1.15
Another pitfall cited in Section 1.8 is expecting to improve the overall performance
of a computer by improving only one aspect of the computer. This might be true,
but not always. Consider a computer running programs with CPU times shown in
the following table.

1.15.1 [5] <1.8> How much is the total time reduced if the time for FP operations
is reduced by 20%?

Get 1.15.1 exercise solution

1.15.2 [5] <1.8> How much is the time for INT operations reduced if the total
time is reduced by 20%?

Get 1.15.2 exercise solution

1.15.3 [5] <1.8> Can the total time can be reduced by 20% by reducing only the
time for branch instructions?
The following table shows the instruction type breakdown per processor of given
applications executed in different numbers of processors.

Get 1.15.3 exercise solution

Assume that each processor has a 2 GHz clock rate.
1.15.4 [10] <1.8> How much must we improve the CPI of FP instructions if we
want the program to run two times faster?

Get 1.15.4 exercise solution

1.15.5 [10] <1.8> How much must we improve the CPI of L/S instructions if we
want the program to run two times faster?

Get 1.15.5 exercise solution

1.15.6 [5] <1.8> How much is the execution time of the program improved if the
CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch
is reduced by 30%?

Get 1.15.6 exercise solution

Exercise 1.16
Another pitfall, related to the execution of programs in multiprocessor systems, is
expecting improvement in performance by improving only the execution time of
part of the routines. The following table shows the execution time of five routines
of a program running on different numbers of processors.

1.16.1 [10] <1.8> Find the total execution time and by how much it is reduced if
the time of routines A, C, and E is improved by 15%.

Get 1.16.1 exercise solution

1.16.2 [10] <1.8> How much is the total time reduced if routine B is improved
by 10%?

Get 1.16.2 exercise solution

1.16.3 [10] <1.8> How much is the total time reduced if routine D is improved
by 10%?
Execution time in a multiprocessor system can be split into computing time for
the routines plus routing time spent sending data from one processor to another.
Consider the execution time and routing time given in the following table. In this
case, the routing time is an important component of the total time.

Get 1.16.3 exercise solution

1.16.4 [10] <1.8> For each doubling of the number of processors, determine the
ratio of new to old computing time and the ratio of new to old routing time.

Get 1.16.4 exercise solution

1.16.5 [5] <1.8> Using the geometric means of the ratios, extrapolate to find the
computing time and routing time in a 128-processor system.

Get 1.16.5 exercise solution

1.16.6 [10] <1.8> Find the computing time and routing time for a system with
one processor.

Get 1.16.6 exercise solution

Computer Organization and Design - Chapter 1 - Book solutions - 4th edition - Hennessy, Patterson