forked from Apress/data-parallel-CPP
-
Notifications
You must be signed in to change notification settings - Fork 0
/
book_errata.txt
232 lines (176 loc) · 11.3 KB
/
book_errata.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
The following are known errors in the book Data Parallel C++: Mastering DPC++ for
Programming of Heterogeneous Systems using C++ and SYCL by James Reinders, Ben Ashbaugh,
James Brodman, Michael Kinsner, John Pennycook, Xinmin Tian (Apress, 2020).
p.106 - Fig 4-11 - Comment in code:
// Return the offset of this item (if with_offset == true)
should read:
// Return the offset of this item (if WithOffset == true)
p.183 - Fig. 7-6 - In the first column, there should be no capitalization,
therefore "Write" should be "write"
and "read_Write" should be "read_write"
p.187 - Instead, we use the default access mode, which is read-write -> read_write
p.233 - Figure 9-14: defines the template parameter T outside of the code snippet without explanation, reader is left unaware what is T.
p.266 - Figure 11-3: mult_ptr -> multi_ptr (two occurrences)
p.284 - The line "if (dev.is_host()) score += 100;" will need
deleting when implementations drop support for a host_device (see
errata below regarding changes in the SYCL 2020 specification that we
did not anticipate in the book.) The accompanying text needs to
consider the implications of such a change: without it, it is possible
that this device selector will fail to find a device.
p.294 - Mistakenly have wrong definitions for preferred_work_group_size_multiple (we repeated the prior definiton of preferred_work_group_size). The correct definition for preferred_work_group_size_multiple is:
Work-group size should be a multiple of this value (preferred_work_group_size_multiple) for executing a kernel on a particular device for best performance. The value must not be greater than work_group_size.
p.319 - translational unit -> translation unit
p.330 - A scan is said to be exclusive... the range [0, i] -> range [0, i)
p.368 - Figure 15-13: the rightmost "Op" box should be moved to the right since it cannot overlap the bottommost "Long Operation" box
p.373 - it may preferable to save trasnfer costs -> it may be preferable...
p.375 - our somewhat-parallel matrix multiplication kernel in Figure 15-15 -> Figure 15-5
p.414 - Figure 16-17: get_globalid(0) -> get_global_id(0)
p.421 - Fig 17-1 - The text at the top should be “ISA-based” instead of “SA-based”
p.436 - Fig 17-10 - The text "Valdation" should be "Validation"
p.437 - last paragraph... "which" is incorrectly in monospace font
p.438 - Figure 17-11: one "include" filename is red, the other black
p.474 - std:complex -> std::complex
p.475 - Figure 18-2: description of "fmax" overran "fmin" (fmin should be "Return y if y < x, otherwise it returns x")
p.475 - Figure 18-2: Square root of x2 + y2 -> x^2 + y^2
p.475 Figure 18-2: "trunc" should be in bold
p.476 - Figure 18-3: "abs_diff" should be in bold
p.476 - Figure 18-3: "popcount" should be in bold
p.508 - Figure 19-9: scq_rel -> acq_rel (two occurrences)
p.508 - The memory model exposes different memory orders through six values -> five
(because C++'s "consume" is excluded from SYCL)
Some code changes have been needed to keep up with tool updates, and notably final SYCL 2020.
The code affected has been updated in this github. The affected code is as follows:
fig_14_8_one_reduction.cpp:
dropped 'using namespace sycl::ONEAPI'
this allows reduction to use the sycl::reduction,
added sycl::ONEAPI:: to plus
fig_14_11_user_defined_reduction.cpp:
dropped 'using namespace sycl::ONEAPI'
this allows reduction to use the sycl::reduction,
added sycl::ONEAPI:: to minimum.
fig_18_11_std_fill.cpp:
old naming dpstd:: is now oneapi::dpl::
fig_18_13_binary_search.cpp:
dpstd:: is now
oneapi::dpl::
dpstd::execution::default_policy is now
oneapi::dpl::execution::dpcpp_default
fig_18_15_pstl_usm.cpp:
old naming dpstd:: is now oneapi::dpl::
SYCL 2020 was finalized after the book went to press. The largest
change, which was not anticipated in the book, was the dropping of the
host device.
This is a good thing, but we explained and promoted the host device
which is not longer part of the specification. We advise against
usign a host device, because long term we expect it will disappear
from all SYCL compilers.
The SYCL 2020 specification, has an Appendix on "what changed," that
explains it as this:
A SYCL implementation is no longer required to provide a host
device. Instead, an implementation is only required to provide at
least one device. Implementations are still allowed to provide devices
that are implemented on the host, but it is no longer required. The
specification no longer defines any special semantics for a "host
device" and APIs specific to the host device have been removed. The
default constructors for the device and platform classes have been
changed to construct a copy of the default device and a copy of the
platform containing the default device. Previously, they returned a
copy of the host device and a copy of the platform containing the host
device. The default constructor for the event class has also been
changed to construct an event that comes from a default-constructed
queue. Previously, it constructed an event that used the host backend.
Here is a quick explanation of how to interpret the book in light of
this change:
Chapter 2: Method #2 discusses "running device code on the host, which
is often used for debugging." Broadly speaking, the usual answer is
to request a CPU device to get this behavior. We say broadly, because
what you really want is a device that supports the best debugging. In
all implementation that we are aware of, this will be a CPU. However,
SYCL does not guarantee that any device on a given system will support
advanced debugging. The environment where this will occur most
frequently is an embedded systems enviroment. When developing in such
an environment, it is not uncommon to have diminished debugging
options and often have a different system entirely for debugging.
Regardless, the best debugging environment is what we seek in Method
#2 in Chapter 2. Unless we know better, we would expect that would be
from a CPU device.
Chapter 2: "The host device is guaranteed to be always avaialble on
any system." This is no longer true - the some what "magical" host
device is not longer in the SYCL specification. Instead all devices
have properties that can be queried, and SYCL only guarantees that one
device must be offered on a system. We'll note, just to be annoying,
that if you don't install any drivers your program can still fail at
runtime because effectively no device will be present. This is
because the guarantee only holds for a properly setup system
supporting SYCL.
Chapter 12: mentioned host device, and for the same reasons explained
for Chapter 2, the general answer here is to request a CPU device with
the caveat that there may not be such a device. In Figures 12-2 and
12-3, the "host_selector()" would need to be changed to
"cpu_selector()" but that will lose the quality of always being
guaranteed to work.
Chapter 13: Mentioned host device as well, so the guarantee mentioned
on page 299 is no longer correct - as we have mentioned already in
this errata. The mention of "Method#2" from Chapter 2, that is foudn
on pages 305 and 306 (Chapter 13) needs to consider the errata for
Chapter 2 we have already covered.
Chapter 19, page 496, "host device" is now "CPU device."
Chapter 19, page 511, "(i.e., a host device...)" can just be ignored.
Chapter 19, page 513, "host device is required to support all memory
orderings." There is no device with has this requirement any more,
although generally CPU device will have this quality. Properties of a
device must be queried to determine what is actually supported.
An additional subtle change was spliting sycl::vec into sycl::vec and sycl::marray.
This is a very good thing, and makes vec better.
The SYCL 2020 specification, has an Appendix on "what changed," that
explains it as this:
A new math array type marray has been defined to begin disambiguation
of the multiple possible interpretations of how sycl::vec should be
interpreted and implemented.
In the book, we anticipated this coming since parts were in the
early drafts for SYCL 2020, and we tried to avoid covering it in
a way that would become invalid. We succeeded on that, but it
means that we really didn't dive into it as deeply as we could
have. Of course the book was probably long enough! In the book
we simply said: "the need for this section of the book (talking
about interpretations of vectors) highlights that there is
confusion on what a vector means, and that will be solved in sYCL
in the future. There is a hint of this in the SYCL 2020
provisional specification where a math array type (marray) has
been described, which is explicitly the first interpretation from
this section—a convenience type unrelated to vector hardware
instructions. We should expect another type to also eventually
appear to cover the second interpretation, likely aligned with
the C++ std::simd templates. With these two types being clearly
associated with specific interpretations of a vector data type,
our intent as programmers will be clear from the code that we
write. this will be less error prone and less confusing and may
even reduce the number of heated discussions between expert
developers when the question arises “What is a vector?”
So, we hope to write more about that in the future. Stay tuned, we'll
add a note here is we expand upon this in a future blog.
Code samples from Figures -
Error of omission: code samples for 18-10, 18-12, 18-14, and 19-17 are not in the repository.
This does not affect the accuracy of the book, but it means we failed to supply the code,
and we do not currently include it to test compilation.
Some general comments:
(1) use DPC++ (dpcpp) to compile SYCL code and for the final program linking
While any compiler can be used to compile host code (such as g++), dpcpp must be used for kernel
code and to invoke the final linkage. This is because only dpcpp knows that this is a SYCL program.
Using another compiler to do the final linkage will not result in a functional SYCL program.
There are probably several places in the book that we could have pointed this out - and we did not.
Mixing use of compilers with DPC++
Four simple rules exist for using multiple compilers with DPC++:
1. Host code can be compiled with any compiler.
2. main() needs to be compiled with DPC++ or a C++ compiler.
3. Any source files containing device code should be compiled with DPC++.
4. The linkage of the final program needs to be done with the DPC++ compiler.
Mixing the use of another SYCL compiler with DPC++, is not currently supported. As SYCL compilers evolve, this might change.
A simple example of mixing compilers could be:
dpcpp -c michigan.cpp # may have host and device code
g++ -c erie.cpp # host code only
ifx -c ontario.f90 # host code only
icx -c huron.cpp # host code only
dpcpp -c superior.cpp # may have host and device code
# final linkage must be using DPC++
dpcpp