\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{geometry}
\usepackage{multirow}
\usepackage{sectsty}
\usepackage{lipsum}
\usepackage{multicol}
\usepackage{tabularx}
\usepackage{enumitem}
\renewcommand{\thesubsection}{\Alph{subsection}}
\geometry{
a4paper,
left=20mm,
top=20mm,
bottom=20mm,
right=20mm
}
\usepackage{listings}
\setlength{\columnsep}{0cm}
\title{Architecture Problem Set 2}
\author{Taylor King }
\date{February 2016}
\sectionfont{\Large}
\subsectionfont{\large}
\usepackage{changepage}
\begin{document}
\maketitle
\section{}
Figure 1.23 presents the power consumption of several computer system components. In this exercise, we will explore how the hard drive affects power consumption for the system.
\vspace{1cm}
\begin{minipage}{\textwidth}
\begin{tabular}{|l|l|l|l|}
\hline
\textbf{Component Type} & \textbf{Product} & \textbf{Performance} & \textbf{Power}\\
\hline
Processor & Sun Niagara 8-core & 1.2 GHz & 72-79 W peak \\
& Intel Pentium 4 & 2 GHz & 48.9-66 W \\
DRAM & Kingston X64C3AD2 1 GB & 184-pin & 3.7 W \\
& Kingston D2N3 1 GB & 240-pin & 2.3 W\\
Hard Drive & DiamondMax 16 & 5400 RPM & 7.0 W read/seek, 2.9 W idle \\
& DiamondMax 9 & 7200 RPM & 7.9 W read/seek, 4.0 W idle\\
\hline
\end{tabular}\bigskip
\par{\textit{Figure 1.23}}
\end{minipage}
\begin{adjustwidth}{2.5em}{0pt}
\subsection{}
Assuming the maximum load for each component, and a power supply efficiency of 80\%, what wattage must the server's power supply deliver to a system with an Intel Pentium 4 chip, 2 GB 240-pin Kingston DRAM, and one 7200 RPM hard drive?

\vspace{5mm}

$Power_{CPU}=66\ W$

$Power_{Memory}=2\times2.3\ W=4.6\ W$

$Power_{Hard Drive}=7.9\ W$

\vspace{3mm}

$Power_{Total}=Power_{CPU}\ +\ Power_{Memory}\ +\ Power_{Hard Drive}=78.5\ W$

\vspace{3mm}

$Power_{Required}=\frac{Power_{Total}}{Efficiency_{Power Supply}}=\frac{78.5\ W}{80\%}=\textbf{98.13\ W}$
\subsection{}
How much power will the 7200 rpm disk consume if it is idle roughly 60\% of the time?
\vspace{5mm}
$Power_{Total}\ =\ Power_{Idle}\times{}Time_{idle}\ +\ Power_{read/seek}\times{}Time_{read/seek}$
\vspace{3mm}
$Power_{Total}\ =\ 60\%\times{}4\ W\ +\ 40\%\times{}7.9\ W\ =\ \textbf{5.56\ W}$
\subsection{}
Given that the time to read data off of a 7200 rpm disk drive will be roughly 75\% of a 5400 rpm disk, at what idle time of the 7200 rpm disk will the power consumption be equal on average for the two disks?
\vspace{5mm}
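Let $b$ be the fraction of time the 5400 rpm disk spends reading or seeking, so it is idle for $1-b$ of the time. The 7200 rpm disk does the same work in 75\% of that time, so it is busy for $0.75b$ and idle for $1-0.75b$.

\vspace{3mm}

$Power_{5400}=(1-b)\times{}2.9+b\times{}7.0=2.9+4.1b$

\vspace{3mm}

$Power_{7200}=(1-0.75b)\times{}4.0+0.75b\times{}7.9=4.0+2.925b$

\vspace{3mm}

Setting $Power_{5400}=Power_{7200}$ gives $1.175b=1.1$, so $b\approx0.936$ and $1-0.75b\approx0.298$.

\vspace{3mm}

\textbf{Average power consumption will be equal for the two drives when the 7200 rpm disk is idle for roughly 29.8\% of its operational time (the 5400 rpm disk is then idle for roughly 6.4\% of its time).}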
\vspace{3mm}
\end{adjustwidth}
\pagebreak
\section{}
One challenge for architects is that the design created today will require several years of implementation, verification, and testing before appearing on the market. This means that the architect must project what the technology will be like several years in advance. Sometimes, this is difficult to do.
\begin{adjustwidth}{2.5em}{0pt}
\subsection{}
According to the trend in device scaling observed by Moore's law, the number of transistors on a chip in 2015 should be how many times the number in 2005?
\vspace{5mm}
\textbf{Moore's law predicts that the number of transistors on a chip will double every 2 years. The number of transistors on a chip in 2015 should be 32 times that of a chip in 2005.}
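
\vspace{3mm}

As a quick check of the arithmetic, assuming exactly one doubling every 2 years and writing $N_{year}$ for the transistor count in a given year (notation introduced here for illustration, not taken from the problem):

\vspace{3mm}

$\frac{N_{2015}}{N_{2005}}=2^{\frac{2015-2005}{2}}=2^{5}=32$
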
\subsection{}
The increase in clock rates once mirrored this trend. Had clock rates continued to climb at the same rate as in the 1990s, approximately how fast would clock rates be in 2015?

\vspace{5mm}

\textbf{Moore's law itself describes transistor counts, but in the early 1990s clock rates increased at a comparable pace, doubling roughly every 2 years. Assuming an average clock speed of 10 MHz in 1990, 25 years of growth at that pace gives $10\ MHz\times{}2^{25/2}\approx57{,}900\ MHz$, or about 58 GHz, in 2015.}
\subsection{}
At the current rate of increase, what are clock rates now projected to be in 2015?

\vspace{5mm}

\textbf{The current rate of increase in clock speed is about 1\% per year. At this rate, a 2 GHz CPU in 2012 would reach approximately 2.06 GHz in 2015.}
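
\vspace{3mm}

The projection can be checked by compounding the growth over the three years from 2012 to 2015, taking the 1\% annual rate and the 2 GHz starting point as the assumptions stated in the answer:

\vspace{3mm}

$2\ GHz\times{}1.01^{3}\approx2.06\ GHz$
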
\subsection{}
What has limited the rate of growth of the clock rate, and what are architects doing with the extra transistor performance?

\vspace{5mm}

\textbf{Raising the clock speed of a chip increases its power consumption and therefore the heat it produces. As modern chips continue to get smaller and denser, removing this excess heat becomes impractical. Instead of pushing clock rates higher, architects spend the extra transistors on parallelism, chiefly multiple cores on a CPU.}
\subsection{}
The rate of growth for DRAM capacity has also slowed down. For 20 years, DRAM capacity improved by 60\% each year. That rate dropped to 40\% each year and now improvement is 25-40\% per year. If this trend continues, what will be the approximate rate of growth for DRAM capacity by 2020?
\vspace{5mm}
\textbf{20-40\%}
\pagebreak
\end{adjustwidth}
\section{}
\subsection{}
Create a table similar to the one shown in your book on page 43 (excluding the Opteron and Itanium data) and calculate the SPECRatio using each benchmark and the geometric mean of the SPECRatios.
\begin{multicols}{3}
\begin{tabular}{|l|l|l|}
\hline
\textbf{Benchmark} & \textbf{Processor X (s)} & \textbf{SPECRatio}\\
\hline
wupwise & 53.5 & 29.91\\
swim & 110.0 & 28.18\\
mgrid & 88.3 & 20.39\\
applu & 75.2 & 27.93\\
mesa & 80.4 & 17.41\\
galgel & 61.2 & 47.39\\
art & 60.4 & 43.05\\
equake & 50.7 & 25.64 \\
facerec & 71.2 & 26.69\\
ammp & 81.9 & 26.86 \\
lucas & 110.0 & 18.18 \\
fma3d & 119.0 & 17.65 \\
sixtrack & 110.0 & 10.00\\
apsi & 175.0 & 14.86\\
\hline
\multicolumn{2}{|l|}{\textbf{Geo Mean}} & 23.42 \\
\hline
\end{tabular}
\end{multicols}
\subsection{}
Comparing the geometric means of the SPECRatios, give an ordering of the processors (X, Opteron, Itanium) from best performance to worst performance.
\begin{enumerate}
\item Itanium - 27.12
\item Processor X - 23.42
\item Opteron - 20.86
\end{enumerate}
\pagebreak
\section{}
Your company owns a Pentium dual-core processor and you have been tasked with optimizing your software for this processor. You will run two applications on this dual Pentium, but the resource requirements are not equal. The first application needs 75\% of the resources, and the other only 25\% of the resources.
\begin{adjustwidth}{2.5em}{0pt}
\vspace{1cm}
\subsection{}
Given that 60\% of the first application is parallelizable, how much speedup would you achieve with that application if run in isolation?
\vspace{5mm}
$T_{New}=(1-0.6)\times{}T_{Execution}+\frac{0.6\times{}T_{Execution}}{2}$

$T_{New}=0.7\times{}T_{Execution}$

$Speedup=\frac{T_{Execution}}{T_{New}}=\frac{1}{0.7}\approx\textbf{1.43}$
\vspace{5mm}
\subsection{}
Given that 95\% of the second application is parallelizable, how much speedup would this application observe if run in isolation?
\vspace{5mm}
$T_{New}=(1-0.95)\times{}T_{Execution}+\frac{0.95\times{}T_{Execution}}{2}$

$T_{New}=0.525\times{}T_{Execution}$

$Speedup=\frac{T_{Execution}}{T_{New}}=\frac{1}{0.525}\approx\textbf{1.90}$
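
\vspace{3mm}

Parts A and B are both instances of Amdahl's law. As a compact restatement, with $f$ the parallelizable fraction and $n$ the number of cores (symbols introduced here for illustration, not taken from the problem statement):

\vspace{3mm}

$Speedup=\frac{1}{(1-f)+\frac{f}{n}}$

\vspace{3mm}

With $n=2$, $f=0.6$ gives $\frac{1}{0.7}\approx1.43$ and $f=0.95$ gives $\frac{1}{0.525}\approx1.90$.
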
\vspace{5mm}
\subsection{}
Given that 60\% of the first application is parallelizable, how much overall system speedup would you observe if you parallelized it, but not the second application?
\vspace{5mm}
$Speedup=\frac{1}{75\%\times{}A_{New}+25\%\times{}B_{Old}}=\frac{1}{0.75\times{}0.7+0.25\times{}1}=\frac{1}{0.775}$

\vspace{3mm}

$Speedup\approx\textbf{1.29}$
\vspace{5mm}
\subsection{}
How much overall system speedup would you achieve if you parallelized both applications?
\vspace{5mm}
$Speedup=\frac{1}{75\%A_{New}+25\%B_{New}}$
\vspace{3mm}
$Speedup=\textbf{1.52}$
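
\vspace{3mm}

For reference, the intermediate substitution, taking $A_{New}=0.7$ and $B_{New}=0.525$ as the relative execution times found in parts A and B:

\vspace{3mm}

$Speedup=\frac{1}{0.75\times{}0.7+0.25\times{}0.525}=\frac{1}{0.65625}\approx1.52$
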
\pagebreak
\section{}
Using the integer average column in Figure A.27 on page A-41 of your H\&P text and the table below, calculate the effective (overall) CPI. The table below contains the average CPI for instructions. Assume that 60\% of the conditional branches are taken. All instructions not specifically noted in the table below have a CPI of 1.0.
\vspace{5mm}
\begin{multicols}{2}
\begin{minipage}{\linewidth}
\begin{tabular}{|l|l|l|l|}
\hline
\textbf{Instruction} & \textbf{CPI} & \textbf{Usage} & \textbf{Average} \\
\hline
load & 1.4 & 26\% & .36 \\
store & 1.4 & 10\% & .14 \\
add & 1 & 19\% & .19 \\
sub & 1 & 3\% & .03\\
mul & 1 & 0\% & 0 \\
compare & 1 & 5\% & .05 \\
load imm & 1.4 & 2\% & .03\\
cond branch & 1.8 & 12\% & .22\\
cond move & 1.8 & 1\% & .02\\
jump & 1.2 & 1\% & .01\\
call & 1.2 & 1\% & .01\\
return & 1.2 & 1\% & .01\\
shift & 1 & 2\% & .02 \\
AND & 1 & 4\% & .04 \\
OR & 1 & 9\% & .09 \\
XOR & 1 & 3\% & .03 \\
\hline
\multicolumn{3}{|l|}{Sum:} & \textbf{1.25} \\
\hline
\end{tabular}\par
\bigskip\textit{Figure A.27}
\end{minipage}
\begin{tabular}{|l|l|}
\hline
\textbf{Instruction} & \textbf{Clock Cycles} \\
\hline
All ALU Instructions & 1.0 \\
Load-stores & 1.4 \\
Taken conditional branches & 2.0 \\
Not taken conditional branches & 1.5 \\
Jumps/Calls/Returns & 1.2 \\
\hline
\end{tabular}
\end{multicols}
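
\vspace{5mm}

As a sketch of how the sum in the table is obtained, using the 60\% taken rate stated in the problem and writing $f_{i}$ for the usage fraction of instruction type $i$ (notation introduced here for illustration):

\vspace{3mm}

$CPI_{Cond\ Branch}=0.6\times{}2.0+0.4\times{}1.5=1.8$

\vspace{3mm}

$CPI_{Effective}=\sum_{i}CPI_{i}\times{}f_{i}\approx1.25$
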
\end{adjustwidth}
\section{}
When designing memory systems it becomes useful to know the frequency of memory reads versus writes and also accesses for instructions versus those for data. Using the integer average column in figure A.27 in your H\&P text, find:
\begin{enumerate}[label=\alph*]
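\item the percentage of all memory accesses for data - \textbf{26\%} (every instruction requires one instruction fetch, and the 26\% loads plus 10\% stores add 0.36 data accesses per instruction: $\frac{0.36}{1+0.36}\approx26\%$)
\item the percentage of data accesses that are reads - \textbf{72\%} (loads out of loads plus stores: $\frac{0.26}{0.36}\approx72\%$)
\item the percentage of all memory accesses that are reads - \textbf{93\%} (instruction fetches and loads are reads: $\frac{1+0.26}{1+0.36}\approx93\%$)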
\end{enumerate}
\pagebreak
\section{}
A certain benchmark contains 195,578 floating point operations. The benchmark was run on an embedded processor after compilation with optimization turned on. The embedded processor is based on a current RISC processor that includes floating point functional units, but the embedded processor does not include floating point for reasons of cost, power consumption, and lack of need for floating point by the target applications. The compiler allows floating point instructions to be calculated with the hardware units or using software routines, depending upon compiler flags. The benchmark took 1.08 seconds on the RISC processor and 13.6 seconds using software on its embedded version. Assume that the CPI using the RISC processor was measured to be 10, while the CPI of the embedded version of the processor was measured to be 6. The two machines have the same clock rate.
\vspace{5mm}
\vspace{5mm}
\begin{adjustwidth}{2.5em}{0pt}
\subsection{}
What is the total number of instructions executed for both runs? You'll need to use a variable to represent the value of the clock rate since this value is not stated. (Use a clock rate variable that represents cycles per time rather than time per cycle to avoid a fractional solution.)
\vspace{5mm}
$I_{Hardware}=\frac{Rate_{Clock}\times{1.08}}{10}$

$I_{Software}=\frac{Rate_{Clock}\times{13.6}}{6}$
\vspace{5mm}
\subsection{}
What is the MIPS rating for both runs? Which processor has the higher MIPS rating? Recall that MIPS stands for Millions of Instructions Per Second. (This solution will also have a clock rate variable in it.)
\vspace{5mm}
Assuming $Rate_{Clock}$ is in $\frac{Cycles}{Second}$
\vspace{3mm}
$MIPS_{Hardware}=\frac{Rate_{Clock}}{10\times{}10^{6}}$

\vspace{3mm}

$MIPS_{Software}=\frac{Rate_{Clock}}{6\times{}10^{6}}$
\vspace{3mm}
\textbf{With these parameters, the simpler instructions in the software implementation give it the higher MIPS rating.}
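
\vspace{3mm}

To put a number on the difference, a quick check that relies only on both machines sharing the same $Rate_{Clock}$, which cancels in the ratio:

\vspace{3mm}

$\frac{MIPS_{Software}}{MIPS_{Hardware}}=\frac{10}{6}\approx1.67$
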
\subsection{}
On average, how many integer instructions does it take to perform a floating-point operation in software? Hint: you'll use the instruction counts you determined to calculate the number of non-floating point calculations executed on the RISC processor. Next, you'll use this value to calculate the number of instructions executed on the embedded processor to simulate fp instructions.
\vspace{5mm}
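In the hardware implementation, each floating-point operation requires one instruction, so the number of non-floating-point instructions is

$I_{Non-float}=I_{Hardware}-195{,}578$

The software run executes the same non-floating-point instructions, so the remaining instructions emulate the 195,578 floating-point operations. The average number of integer instructions per floating-point operation is therefore

$I_{Per\ Float}=\frac{I_{Software}-I_{Non-float}}{195{,}578}$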
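
\vspace{3mm}

As a sketch of the next step, substituting the instruction counts from part A into the ratio above leaves a result in terms of $Rate_{Clock}$ alone (in cycles per second, as assumed in part B):

\vspace{3mm}

$\frac{Rate_{Clock}\times{}\left(\frac{13.6}{6}-\frac{1.08}{10}\right)+195{,}578}{195{,}578}\approx\frac{2.159\times{}Rate_{Clock}}{195{,}578}+1$
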
\subsection{}
What do these results indicate about the use of MIPS to measure processor performance? Explain.
\vspace{5mm}
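\textbf{In this case, MIPS is not a good measure of performance. The software implementation achieves the higher MIPS rating because its integer instructions are simpler, yet it needs many instructions to perform each floating-point operation and takes $13.6/1.08\approx12.6$ times as long to finish the benchmark. A MIPS rating says nothing about how much work each instruction accomplishes.}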
\end{adjustwidth}
\end{document}