<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>How to print vertically aligned strings in Go</title>
<link href="/2023/02/02/%E5%A6%82%E4%BD%95%E5%9C%A8-golang-%E4%B8%AD%E4%B8%8A%E4%B8%8B%E5%AF%B9%E9%BD%90%E6%89%93%E5%8D%B0%E5%AD%97%E7%AC%A6%E4%B8%B2/"/>
<url>/2023/02/02/%E5%A6%82%E4%BD%95%E5%9C%A8-golang-%E4%B8%AD%E4%B8%8A%E4%B8%8B%E5%AF%B9%E9%BD%90%E6%89%93%E5%8D%B0%E5%AD%97%E7%AC%A6%E4%B8%B2/</url>
<content type="html"><![CDATA[<p>Sometimes we need to print information in neat, table-like columns, for example a person's name, home address and phone number. The output I want looks like this:</p>
<figure class="highlight ada"><table><tr><td class="code"><pre><code class="hljs ada">Name      : <span class="hljs-type">Bob</span><br>Address   : <span class="hljs-type">New</span> York Avenue<br>Phone     : 12345674567<br></code></pre></td></tr></table></figure>
<p>The problem sounds trivial, almost too simple to be a problem at all, and it is easy to come up with the following code:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go">information := <span class="hljs-keyword">map</span>[<span class="hljs-type">string</span>]<span class="hljs-type">string</span>{<br>    <span class="hljs-string">"Name"</span>:    <span class="hljs-string">"Bob"</span>,<br>    <span class="hljs-string">"Address"</span>: <span class="hljs-string">"New York Avenue"</span>,<br>    <span class="hljs-string">"Phone"</span>:   <span class="hljs-string">"12345674567"</span>,<br>}<br><span class="hljs-keyword">for</span> k, v := <span class="hljs-keyword">range</span> information {<br>    <span class="hljs-comment">// %-10s left-aligns the key and pads it to 10 characters</span><br>    fmt.Printf(<span class="hljs-string">"%-10s: %s\n"</span>, k, v)<br>}<br></code></pre></td></tr></table></figure>
<p>The result (Go randomizes map iteration order, so the lines may come out in a different order):</p>
<figure class="highlight ada"><table><tr><td class="code"><pre><code class="hljs ada">Name      : <span class="hljs-type">Bob</span><br>Address   : <span class="hljs-type">New</span> York Avenue<br>Phone     : 12345674567<br></code></pre></td></tr></table></figure>
<p>Now suppose the information may also contain some Chinese characters, and we print it with the same code:</p>
<figure class="highlight golang"><table><tr><td class="code"><pre><code class="hljs golang">Name      : Bob<br>Address   : New York Avenue<br>Phone     : <span class="hljs-number">12345674567</span><br>国籍        : 无国籍<br></code></pre></td></tr></table></figure>
<p>The columns no longer line up, which is a little puzzling.</p>
<p>Before going further: anyone familiar with Go knows the difference between a character and a byte. Rob Pike, one of Go's creators, explains it in detail in this article<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Strings, bytes, runes and characters in Go](https://go.dev/blog/strings)">[1]</span></a></sup>.</p>
<p>For a Chinese character, <code>len()</code> actually returns its length in bytes:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go">fmt.Printf(<span class="hljs-string">"the length of a is %d\n"</span>, <span class="hljs-built_in">len</span>(<span class="hljs-string">"a"</span>)) <span class="hljs-comment">// the length of a is 1</span><br>fmt.Printf(<span class="hljs-string">"the length of 我 is %d\n"</span>, <span class="hljs-built_in">len</span>(<span class="hljs-string">"我"</span>)) <span class="hljs-comment">// the length of 我 is 3</span><br></code></pre></td></tr></table></figure>
<p>To count characters (runes) instead, Go provides <code>utf8.RuneCountInString</code>:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go">text := <span class="hljs-string">"你好,世界"</span><br>fmt.Printf(<span class="hljs-string">"the length of %s is %d\n"</span>, text, utf8.RuneCountInString(text))<br><span class="hljs-comment">// the length of 你好,世界 is 5</span><br></code></pre></td></tr></table></figure>
<p>Could <code>fmt</code> be miscounting the length of characters such as Chinese ones? Let's look at how <code>fmt</code> is implemented:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go"><span class="hljs-comment">// padString appends s to f.buf, padded on left (!f.minus) or right (f.minus).</span><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-params">(f *fmt)</span></span> padString(s <span class="hljs-type">string</span>) {<br>    <span class="hljs-keyword">if</span> !f.widPresent || f.wid == <span class="hljs-number">0</span> {<br>        f.buf.writeString(s)<br>        <span class="hljs-keyword">return</span><br>    }<br>    width := f.wid - utf8.RuneCountInString(s)<br>    <span class="hljs-keyword">if</span> !f.minus {<br>        <span class="hljs-comment">// left padding</span><br>        f.writePadding(width)<br>        f.buf.writeString(s)<br>    } <span class="hljs-keyword">else</span> {<br>        <span class="hljs-comment">// right padding</span><br>        f.buf.writeString(s)<br>        f.writePadding(width)<br>    }<br>}<br></code></pre></td></tr></table></figure>
<p><code>fmt</code> also uses <code>utf8.RuneCountInString</code> to measure length, so the problem can only lie in how the output is displayed. Two concepts need to be told apart here: <code>character length</code> and <code>displayed width</code>. Although each is a single character, the Chinese <code>我</code> and the Latin <code>i</code> do not occupy the same width on screen. For details, see the Unicode Standard Annex on East Asian Width<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Unicode® Standard Annex #11: East Asian Width](https://www.unicode.org/reports/tr11/tr11-40.html)">[2]</span></a></sup>.</p>
<p>To print neatly we therefore need a way to measure the display width of a string. Open-source table-printing libraries such as <a href="https://github.com/olekukonko/tablewriter">tablewriter</a> and <a href="https://github.com/jedib0t/go-pretty">go-pretty</a> both use <a href="https://github.com/mattn/go-runewidth">go-runewidth</a> when computing padding, so let's rewrite our code with it:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go">information := <span class="hljs-keyword">map</span>[<span class="hljs-type">string</span>]<span class="hljs-type">string</span>{<br>    <span class="hljs-string">"Name"</span>:    <span class="hljs-string">"Bob"</span>,<br>    <span class="hljs-string">"Address"</span>: <span class="hljs-string">"New York Avenue"</span>,<br>    <span class="hljs-string">"Phone"</span>:   <span class="hljs-string">"12345674567"</span>,<br>    <span class="hljs-string">"国籍"</span>:      <span class="hljs-string">"无国籍"</span>,<br>}<br><span class="hljs-keyword">for</span> k, v := <span class="hljs-keyword">range</span> information {<br>    <span class="hljs-comment">// measure the display width, then pad with spaces up to 10 columns</span><br>    kWid := runewidth.StringWidth(k)<br>    <span class="hljs-keyword">if</span> kWid <= <span class="hljs-number">10</span> {<br>        k += strings.Repeat(<span class="hljs-string">" "</span>, <span class="hljs-number">10</span>-kWid)<br>    }<br>    fmt.Printf(<span class="hljs-string">"%s: %s\n"</span>, k, v)<br>}<br></code></pre></td></tr></table></figure>
<p>It works like a charm!</p>
<p>Of course, as a comment on a Stack Overflow answer<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Get the width of Chinese strings correctly](https://stackoverflow.com/questions/69559133/get-the-width-of-chinese-strings-correctly)">[3]</span></a></sup> points out:</p>
<blockquote><p>runes do not have a “pixel width”, the font does. Therefore the answer will depend on the tool/package you’re using to render the font. </p></blockquote>
<p>The displayed width sometimes depends on the design of the font, but <a href="https://github.com/mattn/go-runewidth">go-runewidth</a> works well in most cases.</p>
<h2 id="参考"><a href="#参考" class="headerlink" title="References"></a>References</h2>
<section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://go.dev/blog/strings">Strings, bytes, runes and characters in Go</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://www.unicode.org/reports/tr11/tr11-40.html">Unicode® Standard Annex #11: East Asian Width</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://stackoverflow.com/questions/69559133/get-the-width-of-chinese-strings-correctly">Get the width of Chinese strings correctly</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>golang</category>
</categories>
<tags>
<tag>golang</tag>
</tags>
</entry>
<entry>
<title>Implementing AES encryption in Golang</title>
<link href="/2022/12/26/%E4%BD%BF%E7%94%A8-Golang-%E5%AE%9E%E7%8E%B0-AES-%E5%8A%A0%E5%AF%86%E7%AE%97%E6%B3%95/"/>
<url>/2022/12/26/%E4%BD%BF%E7%94%A8-Golang-%E5%AE%9E%E7%8E%B0-AES-%E5%8A%A0%E5%AF%86%E7%AE%97%E6%B3%95/</url>
<content type="html"><![CDATA[<h2 id="写在之前"><a href="#写在之前" class="headerlink" title="Before we start"></a>Before we start</h2>
<h3 id="定位"><a href="#定位" class="headerlink" title="Scope"></a>Scope</h3>
<p>This article does not cover the difference between symmetric and asymmetric encryption, nor does it explain how AES works internally or implement it step by step. You may have noticed that Go ships with the <code>crypto/aes</code> package, but its level of abstraction is not that high (it does not offer an interface such as <code>Encrypt(key, plainText)</code> and <code>Decrypt(key, cipherText)</code>), so there is still a gap between the package and real-world use. This article is about bridging that gap: how to quickly build AES encryption that is usable in your own application on top of the official packages.</p>
<h3 id="注意点"><a href="#注意点" class="headerlink" title="Caveats"></a>Caveats</h3>
<p>I have no professional background in information security and my knowledge of cryptography is limited, so I cannot guarantee that the code here is absolutely safe for production. I will, however, do my best to make it as trustworthy and reliable as the available references allow. If it helps you in any way, I am glad. Feedback is welcome, and I am happy to discuss.</p>
<h3 id="代码仓库参考"><a href="#代码仓库参考" class="headerlink" title="Code repository"></a>Code repository</h3>
<p>Repository: <a href="https://github.com/FLAGLORD/goaes">https://github.com/FLAGLORD/goaes</a></p>
<h2 id="实现"><a href="#实现" class="headerlink" title="Implementation"></a>Implementation</h2>
<h3 id="Padding"><a href="#Padding" class="headerlink" title="Padding"></a>Padding</h3>
<blockquote><p>AES is a <em>block cipher with a block length of 128 bits</em>.</p></blockquote>
<p>A <a href="https://zh.wikipedia.org/wiki/%E5%88%86%E7%BB%84%E5%AF%86%E7%A0%81"><code>block cipher</code></a> splits the plaintext into equal-length blocks and encrypts and decrypts each block with a deterministic algorithm and a symmetric key. In most cases, however, the plaintext is not <code>block-aligned</code>, that is, its length is not a multiple of the block size. We therefore pad the plaintext before encryption and strip the padding after transmission and decryption. How to tell the padding apart from the actual plaintext when stripping it is where the design of a padding scheme shows its cleverness.</p>
<p>I use <a href="https://en.wikipedia.org/wiki/Padding_(cryptography)#PKCS#5_and_PKCS#7"><code>PKCS#7 Padding</code></a>. The idea is simple but neat: if n bytes are missing, append n bytes that each have the value n. To make it unambiguous whether padding was added, the scheme is <em><strong>always-padded</strong></em>: even when the plaintext length is already a multiple of the block size, a full dummy block is appended. The implementation:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">PKCS7Pad</span><span class="hljs-params">(data []<span class="hljs-type">byte</span>, blockSize <span class="hljs-type">int</span>)</span></span> ([]<span class="hljs-type">byte</span>, <span class="hljs-type">error</span>) {<br>    <span class="hljs-keyword">if</span> blockSize < <span class="hljs-number">1</span> || blockSize >= <span class="hljs-number">256</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, fmt.Errorf(<span class="hljs-string">"invalid block size: %d"</span>, blockSize)<br>    }<br><br>    <span class="hljs-comment">// according to https://www.rfc-editor.org/rfc/rfc2315:</span><br>    <span class="hljs-comment">//</span><br>    <span class="hljs-comment">// 2. Some content-encryption algorithms assume the</span><br>    <span class="hljs-comment">// input length is a multiple of k octets, where k > 1, and</span><br>    <span class="hljs-comment">// let the application define a method for handling inputs</span><br>    <span class="hljs-comment">// whose lengths are not a multiple of k octets. For such</span><br>    <span class="hljs-comment">// algorithms, the method shall be to pad the input at the</span><br>    <span class="hljs-comment">// trailing end with k - (l mod k) octets all having value k -</span><br>    <span class="hljs-comment">// (l mod k), where l is the length of the input. In other</span><br>    <span class="hljs-comment">// words, the input is padded at the trailing end with one of</span><br>    <span class="hljs-comment">// the following strings:</span><br>    <span class="hljs-comment">//</span><br>    <span class="hljs-comment">// 01 -- if l mod k = k-1</span><br>    <span class="hljs-comment">// 02 02 -- if l mod k = k-2</span><br>    <span class="hljs-comment">// .</span><br>    <span class="hljs-comment">// .</span><br>    <span class="hljs-comment">// .</span><br>    <span class="hljs-comment">// k k ... k k -- if l mod k = 0</span><br>    <span class="hljs-comment">//</span><br>    <span class="hljs-comment">// The padding can be removed unambiguously since all input is</span><br>    <span class="hljs-comment">// padded and no padding string is a suffix of another. This</span><br>    <span class="hljs-comment">// padding method is well-defined if and only if k < 256;</span><br>    <span class="hljs-comment">// methods for larger k are an open issue for further study.</span><br>    <span class="hljs-comment">//</span><br><br>    <span class="hljs-comment">// calculate the padding length, ranging from 1 to blockSize</span><br>    paddingLen := blockSize - <span class="hljs-built_in">len</span>(data)%blockSize<br><br>    <span class="hljs-comment">// build the padding text</span><br>    padding := bytes.Repeat([]<span class="hljs-type">byte</span>{<span class="hljs-type">byte</span>(paddingLen)}, paddingLen)<br>    <span class="hljs-keyword">return</span> <span class="hljs-built_in">append</span>(data, padding...), <span class="hljs-literal">nil</span><br>}<br></code></pre></td></tr></table></figure>
<p>The matching unpad routine is just as straightforward: read the number stored in the last byte and remove a suffix of that length:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">PKCS7UnPad</span><span class="hljs-params">(data []<span class="hljs-type">byte</span>, blockSize <span class="hljs-type">int</span>)</span></span> ([]<span class="hljs-type">byte</span>, <span class="hljs-type">error</span>) {<br>    length := <span class="hljs-built_in">len</span>(data)<br>    <span class="hljs-keyword">if</span> length == <span class="hljs-number">0</span> { <span class="hljs-comment">// empty</span><br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, errors.New(<span class="hljs-string">"unpad called on zero length byte array"</span>)<br>    }<br>    <span class="hljs-keyword">if</span> length%blockSize != <span class="hljs-number">0</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, errors.New(<span class="hljs-string">"data is not block-aligned"</span>)<br>    }<br><br>    <span class="hljs-comment">// just the number that the last byte represents</span><br>    paddingLen := <span class="hljs-type">int</span>(data[length<span class="hljs-number">-1</span>])<br>    padding := bytes.Repeat([]<span class="hljs-type">byte</span>{<span class="hljs-type">byte</span>(paddingLen)}, paddingLen)<br>    <span class="hljs-keyword">if</span> paddingLen > blockSize || paddingLen == <span class="hljs-number">0</span> || !bytes.HasSuffix(data, padding) {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, errors.New(<span class="hljs-string">"invalid padding"</span>)<br>    }<br>    <span class="hljs-keyword">return</span> data[:length-paddingLen], <span class="hljs-literal">nil</span><br>}<br></code></pre></td></tr></table></figure>
<h3 id="Encrypt"><a href="#Encrypt" class="headerlink" title="Encrypt"></a>Encrypt</h3>
<p>The rough steps are:</p>
<ol><li>Initialize a <code>cipher.Block</code> with the key</li><li>Pad the plaintext</li><li>Generate the Initialization Vector (IV)</li><li>Encrypt the plaintext in CBC mode</li><li>Compute the HMAC</li><li>Return the result (made up of three parts: IV + HMAC + ciphertext)</li></ol>
<p>The code:</p>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Encrypt</span><span class="hljs-params">(key []<span class="hljs-type">byte</span>, plainText []<span class="hljs-type">byte</span>)</span></span> ([]<span class="hljs-type">byte</span>, <span class="hljs-type">error</span>) {<br>    <span class="hljs-comment">// 1. initialize the block cipher with the key</span><br>    block, err := aes.NewCipher(key)<br>    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err<br>    }<br>    blockSize := block.BlockSize()<br><br>    <span class="hljs-comment">// 2. pad the plaintext</span><br>    plainText, err = PKCS7Pad(plainText, blockSize)<br>    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err<br>    }<br><br>    <span class="hljs-comment">// The IV needs to be unique, but not secure. Therefore it's common to</span><br>    <span class="hljs-comment">// include it at the beginning of the ciphertext.</span><br>    <span class="hljs-comment">// output layout: IV | HMAC | encrypted payload</span><br>    cipherText := <span class="hljs-built_in">make</span>([]<span class="hljs-type">byte</span>, blockSize+sha256.Size+<span class="hljs-built_in">len</span>(plainText))<br>    iv := cipherText[:blockSize]<br>    mac := cipherText[blockSize : blockSize+sha256.Size]<br>    payload := cipherText[blockSize+sha256.Size:]<br>    <span class="hljs-comment">// 3. fill the IV with random bytes</span><br>    <span class="hljs-keyword">if</span> _, err = rand.Read(iv); err != <span class="hljs-literal">nil</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err<br>    }<br><br>    <span class="hljs-comment">// 4. encrypt the plaintext</span><br>    mode := cipher.NewCBCEncrypter(block, iv)<br>    mode.CryptBlocks(payload, plainText)<br><br>    <span class="hljs-comment">// we use Encrypt-then-MAC</span><br>    <span class="hljs-comment">// https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac</span><br>    <span class="hljs-comment">// 5. compute the HMAC over the encrypted payload</span><br>    hash := hmac.New(sha256.New, key)<br>    hash.Write(payload)<br>    <span class="hljs-built_in">copy</span>(mac, hash.Sum(<span class="hljs-literal">nil</span>))<br><br>    <span class="hljs-keyword">return</span> cipherText, <span class="hljs-literal">nil</span><br>}<br></code></pre></td></tr></table></figure>
<h4 id="block-cipher-mode-of-operation"><a href="#block-cipher-mode-of-operation" class="headerlink" title="block cipher mode of operation"></a>block cipher mode of operation</h4>
<p>A <code>block cipher</code> on its own can only process data of a fixed length (the block size). When the plaintext is longer than a single block, the method for applying the <code>block cipher</code> repeatedly is called a <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation">mode of operation</a>.</p>
<p>The modes used to mask patterns are mainly the following five:</p>
<ul><li>ECB</li><li>CBC</li><li>CFB</li><li>OFB</li><li>CTR</li></ul>
<p>The first two require padding; the last three turn the block cipher into a stream cipher and need none. We do not have to implement these modes ourselves: Go's <code>crypto/cipher</code> package already provides CBC, CFB, OFB and CTR (ECB is not included). I used CBC in my AES implementation; in principle you can swap in another mode.</p>
<h4 id="Initialization-Vector-IV"><a href="#Initialization-Vector-IV" class="headerlink" title="Initialization Vector(IV)"></a>Initialization Vector(IV)</h4>
<p>Most modes need an <code>Initialization Vector(IV)</code> to introduce randomness, so that the same plaintext encrypted with the same key still yields a different ciphertext each time. The IV does not have to be secret, so it may be exposed (<em>for example, I put it at the head of the ciphertext so that it can be read from a fixed position and used for decryption</em>), but it must never be reused, so we can fill it with random bytes.</p>
<p>To generate the random bytes I use the <code>Read()</code> function from <code>crypto/rand</code>, which uses the package's global, shared random number generator and is cryptographically secure.</p>
<h4 id="HMAC"><a href="#HMAC" class="headerlink" title="HMAC"></a>HMAC</h4>
<p>There are three ways to combine an HMAC with encryption:</p>
<ul><li>encrypt-then-mac: encrypt first, then compute the MAC over the ciphertext</li><li>mac-then-encrypt: compute the MAC over the plaintext, then encrypt the plaintext and the MAC together</li><li>mac-and-encrypt: compute the MAC over the plaintext, encrypt the plaintext, and append the MAC to the ciphertext</li></ul>
<p>This is much debated, but most researchers recommend encrypt-then-mac. For details, see <a href="https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac">the answer</a> linked in the code comment.</p>
<p><code>encrypt-then-mac</code> lets the receiver verify the integrity of the ciphertext before decrypting it, and it also guards against the <a href="https://en.wikipedia.org/wiki/Padding_oracle_attack">Padding oracle attack</a>. Note that in the code above the HMAC covers only the encrypted payload; feeding the IV into the HMAC as well would be stricter, because in CBC mode a modified IV changes the first decrypted block.</p>
<h3 id="Decrypt"><a href="#Decrypt" class="headerlink" title="Decrypt"></a>Decrypt</h3>
<p>The rough steps are:</p>
<ol><li>Initialize a <code>cipher.Block</code> with the key</li><li>Check that the input is not too short (because of the IV and the HMAC, it must be longer than 16 + 32 = 48 bytes) and that the length of the actual ciphertext (what remains after removing the IV and the HMAC) is a multiple of the block size</li><li>Verify the HMAC</li><li>Decrypt in CBC mode, matching the encryption</li><li>Remove the padding</li><li>Return the result</li></ol>
<figure class="highlight go"><table><tr><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Decrypt</span><span class="hljs-params">(key []<span class="hljs-type">byte</span>, cipherText []<span class="hljs-type">byte</span>)</span></span> ([]<span class="hljs-type">byte</span>, <span class="hljs-type">error</span>) {<br>    <span class="hljs-comment">// 1. initialize the block cipher with the key</span><br>    block, err := aes.NewCipher(key)<br>    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err<br>    }<br>    blockSize := block.BlockSize()<br><br>    <span class="hljs-comment">// 2. check that the input is not too short</span><br>    <span class="hljs-keyword">if</span> <span class="hljs-built_in">len</span>(cipherText) <= blockSize+sha256.Size {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, errors.New(<span class="hljs-string">"ciphertext too short"</span>)<br>    }<br><br>    iv := cipherText[:blockSize]<br>    mac := cipherText[blockSize : blockSize+sha256.Size]<br>    cipherText = cipherText[blockSize+sha256.Size:]<br><br>    <span class="hljs-comment">// 2. check that the actual ciphertext length is valid</span><br>    <span class="hljs-keyword">if</span> <span class="hljs-built_in">len</span>(cipherText)%blockSize != <span class="hljs-number">0</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, errors.New(<span class="hljs-string">"ciphertext is not block-aligned, maybe corrupted"</span>)<br>    }<br><br>    hash := hmac.New(sha256.New, key)<br>    hash.Write(cipherText)<br>    <span class="hljs-comment">// 3. verify the HMAC (hmac.Equal compares in constant time)</span><br>    <span class="hljs-keyword">if</span> !hmac.Equal(hash.Sum(<span class="hljs-literal">nil</span>), mac) {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, errors.New(<span class="hljs-string">"hmac failure, message corrupted"</span>)<br>    }<br><br>    plainText := <span class="hljs-built_in">make</span>([]<span class="hljs-type">byte</span>, <span class="hljs-built_in">len</span>(cipherText))<br>    mode := cipher.NewCBCDecrypter(block, iv)<br>    <span class="hljs-comment">// 4. decrypt</span><br>    mode.CryptBlocks(plainText, cipherText)<br><br>    <span class="hljs-comment">// 5. remove the padding</span><br>    plainText, err = PKCS7UnPad(plainText, blockSize)<br>    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {<br>        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>, err<br>    }<br>    <span class="hljs-keyword">return</span> plainText, <span class="hljs-literal">nil</span><br>}<br></code></pre></td></tr></table></figure>
<h2 id="最后"><a href="#最后" class="headerlink" title="Closing thoughts"></a>Closing thoughts</h2>
<p>Putting AES to work is not as easy as it looks, because quite a few points have to be thought through. When I searched Google and read reference implementations on GitHub, I found that many code snippets had small problems: some used no HMAC at all, others combined it in the <code>mac-then-encrypt</code> or <code>mac-and-encrypt</code> style, and so on.</p>
<p>I wrote this post simply as a record.</p>]]></content>
<categories>
<category>golang</category>
</categories>
<tags>
<tag>golang</tag>
</tags>
</entry>
<entry>
<title>Some lessons from using goroutines</title>
<link href="/2022/11/28/Goroutine-%E4%BD%BF%E7%94%A8%E7%9A%84%E4%B8%80%E4%BA%9B%E7%BB%8F%E9%AA%8C%E8%B0%88/"/>
<url>/2022/11/28/Goroutine-%E4%BD%BF%E7%94%A8%E7%9A%84%E4%B8%80%E4%BA%9B%E7%BB%8F%E9%AA%8C%E8%B0%88/</url>
<content type="html"><![CDATA[<p>在工作中 Goroutine 使用得相当多,积累了不少经验,也逐渐学习了一些小 tricks,在此进行一些总结 。</p><h2 id="如何控制-Goroutine-的数量"><a href="#如何控制-Goroutine-的数量" class="headerlink" title="如何控制 Goroutine 的数量"></a>如何控制 Goroutine 的数量</h2><p>关于 Goroutine 的语法不做赘述。</p><p>首先,使用协程池还是直接创建新的 Goroutine 并限制数量(协程池会尽可能复用而不是创建)仁者见仁,智者见智,由于 Goroutine 比较轻量级,即便创建新的 Routine 资源消耗也不会很大,在并发数不高的情况下,可以不用过多在意,详情可以参考下这篇回答<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Does a goroutine pool make sense like thread pools in other languages?](https://stackoverflow.com/questions/48659334/does-a-goroutine-pool-make-sense-like-thread-pools-in-other-languages)">[1]</span></a></sup></p><p>下面会介绍提到的两种限制数量的方法。</p><h3 id="开源协程池"><a href="#开源协程池" class="headerlink" title="开源协程池"></a>开源协程池</h3><p>主流的为以下两个:</p><ul><li><a href="https://github.com/Jeffail/tunny">tunny</a></li><li><a href="https://github.com/panjf2000/ants">ants</a></li></ul><p>以 ants 为例,ants 在 <a href="https://github.com/panjf2000/ants#-how-to-use">README</a> 中其实有比较详细的介绍:</p><h4 id="common-pool"><a href="#common-pool" class="headerlink" title="common pool"></a>common pool</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><code class="hljs go"><span 
class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">demoFunc</span><span class="hljs-params">()</span></span> {<br>time.Sleep(<span class="hljs-number">10</span> * time.Millisecond)<br>fmt.Println(<span class="hljs-string">"Hello World!"</span>)<br>}<br><br><span class="hljs-keyword">defer</span> ants.Release()<br><br>runTimes := <span class="hljs-number">1000</span><br><br><span class="hljs-comment">// Use the common pool.</span><br><span class="hljs-keyword">var</span> wg sync.WaitGroup<br>syncCalculateSum := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span> {<br> demoFunc()<br> wg.Done() <span class="hljs-comment">// signal completion to the WaitGroup</span><br>}<br><span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < runTimes; i++ {<br> wg.Add(<span class="hljs-number">1</span>)<br> _ = ants.Submit(syncCalculateSum)<br>}<br>wg.Wait() <span class="hljs-comment">// wait for all tasks to finish</span><br>fmt.Printf(<span class="hljs-string">"running goroutines: %d\n"</span>, ants.Running())<br>fmt.Printf(<span class="hljs-string">"finish all tasks.\n"</span>)<br></code></pre></td></tr></table></figure><p>As the name suggests, the common pool is general-purpose: <code>Submit()</code> accepts arbitrary, unrelated functions as tasks.</p><p>A few things to watch out for:</p><ol><li><p>Do not <code>Submit</code> the worker function (here <code>demoFunc()</code>) directly. As usual with goroutines, a <code>WaitGroup</code> is needed to wait for completion; the example wraps the worker in <code>syncCalculateSum</code>, and <code>wg.Add(1)</code> must be called before each submission.</p></li><li><p>Although <code>demoFunc()</code> in the example takes no arguments, and the <a href="https://github.com/panjf2000/ants/blob/master/pool.go#L164">signature</a> of <code>Submit()</code> shows that it only accepts a function with no parameters, arguments can still be passed through a <code>closure</code>.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">demoFunc</span><span class="hljs-params">(person <span class="hljs-type">string</span>)</span></span> {<br>fmt.Println(<span class="hljs-string">"Hello "</span> + person)<br>}<br><br><span class="hljs-keyword">defer</span> ants.Release()<br><br><span class="hljs-keyword">var</span> people = []<span class="hljs-type">string</span>{<br> <span class="hljs-string">"Alice"</span>,<br> <span class="hljs-string">"Bob"</span>,<br>}<br><br><span class="hljs-comment">// Use the common pool.</span><br><span class="hljs-keyword">var</span> wg sync.WaitGroup<br><br><span class="hljs-keyword">for</span> _, person := <span class="hljs-keyword">range</span> people {<br> localPerson := person <span class="hljs-comment">// important!</span><br> f := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span>{<br> demoFunc(localPerson)<br> wg.Done()<br> }<br> <br> wg.Add(<span class="hljs-number">1</span>)<br> _ = ants.Submit(f)<br>}<br>wg.Wait()<br></code></pre></td></tr></table></figure></li></ol><h4 id="Pool-with-a-func"><a href="#Pool-with-a-func" class="headerlink" title="Pool with a func"></a>Pool with a func</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">var</span> sum <span class="hljs-type">int32</span><br><span class="hljs-keyword">var</span> wg sync.WaitGroup<br><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">myFunc</span><span class="hljs-params">(i <span class="hljs-keyword">interface</span>{})</span></span> {<br>n := i.(<span class="hljs-type">int32</span>)<br>atomic.AddInt32(&sum, n)<br>fmt.Printf(<span class="hljs-string">"run with %d\n"</span>, n)<br>}<br><br><span class="hljs-keyword">defer</span> ants.Release()<br><br>runTimes := <span class="hljs-number">1000</span><br><br><span class="hljs-comment">// Use the pool with a function,</span><br><span class="hljs-comment">// set 10 to the capacity of goroutine pool and 1 second for expired duration.</span><br>p, _ := ants.NewPoolWithFunc(<span class="hljs-number">10</span>, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(i <span class="hljs-keyword">interface</span>{})</span></span> {<br> myFunc(i)<br> wg.Done()<br>})<br><span class="hljs-keyword">defer</span> p.Release()<br><span class="hljs-comment">// Submit tasks one by 
one.</span><br><span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < runTimes; i++ {<br> wg.Add(<span class="hljs-number">1</span>)<br> _ = p.Invoke(<span class="hljs-type">int32</span>(i))<br>}<br>wg.Wait()<br>fmt.Printf(<span class="hljs-string">"running goroutines: %d\n"</span>, p.Running())<br>fmt.Printf(<span class="hljs-string">"finish all tasks, result is %d\n"</span>, sum)<br></code></pre></td></tr></table></figure><p>Compared with the common pool, this pool runs only one fixed function, but usage is broadly the same, and a <code>WaitGroup</code> is still needed to wait for completion.</p><p>The example passes a single argument, and the <a href="https://github.com/panjf2000/ants/blob/master/pool_func.go#L127">signature</a> of <code>NewPoolWithFunc()</code> shows that the pool function takes exactly one <code>interface{}</code> parameter. There is no variadic variant along the lines of <code>func NewPoolWithFunc(size int, pf func(...interface{}), options ...Option)</code>, so passing several arguments takes a small trick.</p><p>One approach is to bundle everything the worker function needs into a struct and unpack it inside the pool function:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">workerFunc</span><span class="hljs-params">(paramA TypeA, paramB TypeB)</span></span>{<br> <span class="hljs-comment">// ...</span><br>}<br><br><span class="hljs-keyword">type</span> Params <span class="hljs-keyword">struct</span>{<br> A TypeA<br> B TypeB<br>}<br><br>p, _ := ants.NewPoolWithFunc(<span class="hljs-number">10</span>, <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(i <span 
class="hljs-keyword">interface</span>{})</span></span>{<br> p, _ := i.(Params) <span class="hljs-comment">// type assertion</span><br> workerFunc(p.A, p.B)<br> wg.Done()<br>})<br></code></pre></td></tr></table></figure><p>The other approach is the closure technique described above.</p><h4 id="关于性能"><a href="#关于性能" class="headerlink" title="关于性能"></a>A note on performance</h4><p>It is worth repeating that when the number of workers is small, a goroutine pool brings very little speedup, although it may use noticeably less memory thanks to reuse<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[测试发现使用池计算时间并没有缩短,只有内存占用变小了。](https://github.com/panjf2000/ants/issues/1)">[2]</span></a></sup>. A pool also adds complexity, so the trade-off needs weighing.</p><h3 id="直接创建新的-Routine"><a href="#直接创建新的-Routine" class="headerlink" title="直接创建新的 Routine"></a>Spawning new goroutines directly</h3><p>Instead of reusing goroutines through a pool, you can simply spawn new ones and add some logic to cap how many run at once:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">var</span> wg sync.WaitGroup<br><br>workerLimiter := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> <span class="hljs-keyword">struct</span>{}, workerNum)<br><span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < runTimes; i++ {<br> wg.Add(<span class="hljs-number">1</span>)<br> workerLimiter <- <span class="hljs-keyword">struct</span>{}{}<br> <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(data ParamStruct)</span></span>{<br> workerFunc(data)<br> <br> wg.Done()<br> <-workerLimiter<br> 
}(yourData)<br>}<br></code></pre></td></tr></table></figure><p>The core idea is a buffered <code>channel</code> of empty structs: a send blocks once the buffer is full, which makes it easy to cap the number of concurrent workers.</p><div class="note note-warning"> <p>Wrapping the task logic in a <code>workerFunc</code> is strongly recommended here, although you can also inline it like this:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(data ParamStruct)</span></span>{<br> <span class="hljs-comment">// your task logic </span><br> ...<br> <br> wg.Done()<br> <-workerLimiter<br> }(yourData)<br></code></pre></td></tr></table></figure><p>This carries a hidden risk, though. If the logic has several return statements, the easiest mistake is to <strong>put <code><-workerLimiter</code> and <code>wg.Done()</code> only at the end</strong>; an early return then skips them and the program may hang:</p><ul><li>If the worker limit is greater than or equal to the number of tasks, it blocks at <code>wg.Wait()</code></li><li>If the worker limit is smaller than the number of tasks, it can block at <code>workerLimiter <- struct{}{}</code></li></ul><p>The best way to avoid forgetting is <code>defer</code>. Since <code>defer</code> takes a single function call, put the release logic together in one function:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">defer</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span>{<br> wg.Done()<br> <-workerLimiter<br>}()<br></code></pre></td></tr></table></figure> </div><h2 id="收集数据"><a href="#收集数据" class="headerlink" title="收集数据"></a>Collecting results</h2><p>Goroutines are often used to process a large amount of data in batches and speed things up, so the workers' results usually have to be gathered at the end.</p><p>If each result is written to its own fixed, distinct index of a pre-allocated slice or array,
no extra synchronization is needed. A map is different: Go maps are not safe for concurrent writes even when every goroutine writes to its own key, so map writes must be guarded as well. Likewise, if results are simply appended to a slice, so that the position depends on completion order, thread safety has to be handled explicitly.</p><h3 id="使用-sync-Mutex-来保护"><a href="#使用-sync-Mutex-来保护" class="headerlink" title="使用 sync.Mutex 来保护"></a>Protecting with sync.Mutex</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">var</span> (<br> finalRes []ResStruct<br> wg sync.WaitGroup<br> mu sync.Mutex<br>)<br><span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < runTimes; i++ {<br> wg.Add(<span class="hljs-number">1</span>)<br> <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(data YourStruct)</span></span>{<br> <span class="hljs-keyword">defer</span> wg.Done()<br> <br> <span class="hljs-keyword">var</span> res ResStruct<br> <span class="hljs-comment">// your task logic</span><br> ...<br> <br> mu.Lock()<br> finalRes = <span class="hljs-built_in">append</span>(finalRes, res)<br> mu.Unlock()<br> }(yourData)<br>}<br>wg.Wait()<br></code></pre></td></tr></table></figure><p>The most direct approach is to synchronize with <code>sync.Mutex</code>.</p><h3 id="使用-channel-来收集数据"><a href="#使用-channel-来收集数据" class="headerlink" title="使用 channel 来收集数据"></a>Collecting results with a channel</h3><p>A <code>channel</code> is itself thread-safe,
and synchronizing through a <code>channel</code> is also an officially recommended approach:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">var</span> (<br> finalRes []ResStruct<br> wg sync.WaitGroup<br>)<br>resChan := <span class="hljs-built_in">make</span>(<span class="hljs-keyword">chan</span> ResStruct)<br><span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < runTimes; i++ {<br> wg.Add(<span class="hljs-number">1</span>)<br> <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(data YourStruct)</span></span>{<br> <span class="hljs-keyword">defer</span> wg.Done()<br> <br> <span class="hljs-keyword">var</span> res ResStruct<br> <span class="hljs-comment">// your task logic</span><br> <br> resChan <- res<br> }(yourData)<br>}<br><br><span class="hljs-comment">// important</span><br><span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span>{<br> 
 wg.Wait()<br> <span class="hljs-built_in">close</span>(resChan)<br>}()<br><br><span class="hljs-comment">// Position 1</span><br><span class="hljs-keyword">for</span> res := <span class="hljs-keyword">range</span> resChan{<br> finalRes = <span class="hljs-built_in">append</span>(finalRes, res) <br>}<br><span class="hljs-comment">// Position 2</span><br></code></pre></td></tr></table></figure><p>In this code the worker goroutines send their results to the <code>channel</code>, and the main goroutine (as opposed to those started with the <code>go</code> keyword) keeps receiving from the <code>channel</code> in a <code>for</code> loop.</p><div class="note note-info"> <p>So that the main goroutine can leave the loop, <em><strong>a separate goroutine</strong></em> is needed to close the <code>channel</code> once all tasks are done. Putting that logic anywhere in the main goroutine causes a deadlock<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why does the use of an unbuffered channel in the same goroutine result in a deadlock?](https://stackoverflow.com/questions/18660533/why-does-the-use-of-an-unbuffered-channel-in-the-same-goroutine-result-in-a-dead)">[3]</span></a></sup>:</p><ul><li>At Position 1: an unbuffered channel can be thought of as always full, so with nobody receiving from <code>resChan</code> the workers block as senders. <code>wg.Done()</code> is then never reached, <code>wg.Wait()</code> never returns, and the main goroutine is stuck</li><li>At Position 2: nobody closes the channel, so the main goroutine never leaves the <code>for</code> loop</li></ul> </div><p>With these caveats, this style is admittedly more cumbersome and far less direct than the mutex version, which I also recommend in most cases. There are some special situations, though: say you do not want to use <code>goto</code>, and you want to exit early and deliver <code>res</code> without running to the bottom of the function. With the mutex approach you have to write</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs go">mu.Lock()<br>finalRes = <span class="hljs-built_in">append</span>(finalRes, res)<br>mu.Unlock()<br><span class="hljs-keyword">return</span><br></code></pre></td></tr></table></figure><p>whereas the channel approach only needs</p><figure class="highlight go"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs go">resChan <- res<br><span class="hljs-keyword">return</span><br></code></pre></td></tr></table></figure><p>Especially when such a snippet recurs many times in your task logic, the channel form is tidier and more uniform.</p><h2 id="如何在-Goroutine-中处理错误"><a href="#如何在-Goroutine-中处理错误" class="headerlink" title="如何在 Goroutine 中处理错误"></a>Handling errors in goroutines</h2><p>In some scenarios a worker may produce an <code>error</code> while running. For handling these, <a href="https://github.com/golang/sync">Go Sync</a> provides a powerful tool: <a href="https://github.com/golang/sync/blob/master/errgroup/errgroup.go"><code>errgroup.Group</code></a>.</p><p>I may expand on this topic later when time permits. Some code snippets are available for reference<sup id="fnref:4" class="footnote-ref"><a href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[errgroup_example_md5all_test.go](https://github.com/golang/sync/blob/master/errgroup/errgroup_example_md5all_test.go)">[4]</span></a></sup><sup id="fnref:5" class="footnote-ref"><a href="#fn:5" rel="footnote"><span class="hint--top hint--rounded" aria-label="[another errgroup example](https://gist.github.com/pteich/c0bb58b0b7c8af7cc6a689dd0d3d26ef)">[5]</span></a></sup>, and this blog post<sup id="fnref:6" class="footnote-ref"><a href="#fn:6" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why you should be using errgroup.WithContext() in your Golang server handlers](https://www.fullstory.com/blog/why-errgroup-withcontext-in-golang-server-handlers/)">[6]</span></a></sup> is well written and accessible; it is worth a careful read.</p><p><code>errgroup.Group.Go()</code> only accepts a function with no parameters. As in the references<sup id="fnref:4" class="footnote-ref"><a href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[errgroup_example_md5all_test.go](https://github.com/golang/sync/blob/master/errgroup/errgroup_example_md5all_test.go)">[4]</span></a></sup><sup id="fnref:5" class="footnote-ref"><a href="#fn:5" rel="footnote"><span class="hint--top hint--rounded" aria-label="[another errgroup 
example](https://gist.github.com/pteich/c0bb58b0b7c8af7cc6a689dd0d3d26ef)">[5]</span></a></sup><sup id="fnref:6" class="footnote-ref"><a href="#fn:6" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why you should be using errgroup.WithContext() in your Golang server handlers](https://www.fullstory.com/blog/why-errgroup-withcontext-in-golang-server-handlers/)">[6]</span></a></sup>, data can be passed through a <code>channel</code>, or through the closure technique described earlier.</p><h2 id="闭包"><a href="#闭包" class="headerlink" title="闭包"></a>Closures</h2><p>Strictly speaking this is not a goroutine topic, so it is not covered in detail here.</p><p>You may have noticed this snippet in the <a href="#common-pool">common pool</a> section:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">for</span> _, person := <span class="hljs-keyword">range</span> people {<br> <span class="hljs-comment">// is it redundant?</span><br> localPerson := person <span class="hljs-comment">// important!</span><br> f := <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span>{<br> demoFunc(localPerson)<br> wg.Done()<br> }<br> <br> wg.Add(<span class="hljs-number">1</span>)<br> _ = ants.Submit(f)<br>}<br>wg.Wait()<br></code></pre></td></tr></table></figure><p><code>localPerson := person</code> looks redundant. Since Go 1.22 (with a go.mod that declares go 1.22 or later) it is, because each iteration gets its own loop variable. In earlier versions it is not, and anyone with Python experience will recognize the issue: <code>person</code> is a single variable at one address that is reused across iterations, so without the copy it is shared by all goroutines<sup id="fnref:7" class="footnote-ref"><a href="#fn:7" rel="footnote"><span class="hint--top hint--rounded" aria-label="[go vet range variable captured by func literal when using go routine inside of 
for each loop](https://stackoverflow.com/questions/40326723/go-vet-range-variable-captured-by-func-literal-when-using-go-routine-inside-of-f)">[7]</span></a></sup>.</p><p>You can verify this yourself:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">var</span> res []*<span class="hljs-type">int</span><br><span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i <= <span class="hljs-number">3</span>; i++ {<br> res = <span class="hljs-built_in">append</span>(res, &i)<br>}<br><span class="hljs-keyword">for</span> _, p := <span class="hljs-keyword">range</span> res {<br> fmt.Println(*p)<br>}<br></code></pre></td></tr></table></figure><p>Output (with Go versions before 1.22, where the loop variable is shared across iterations):</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs bash">4<br>4<br>4<br></code></pre></td></tr></table></figure><p>After the fix:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">var</span> res []*<span class="hljs-type">int</span><br><span class="hljs-keyword">for</span> i := <span class="hljs-number">1</span>; i <= <span class="hljs-number">3</span>; i++ {<br>local := i<br>res = <span class="hljs-built_in">append</span>(res, &local)<br>}<br><span class="hljs-keyword">for</span> _, p := <span class="hljs-keyword">range</span> 
res {<br>fmt.Println(*p)<br>}<br></code></pre></td></tr></table></figure><p>Output:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs bash">1<br>2<br>3<br></code></pre></td></tr></table></figure><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://stackoverflow.com/questions/48659334/does-a-goroutine-pool-make-sense-like-thread-pools-in-other-languages">Does a goroutine pool make sense like thread pools in other languages?</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://github.com/panjf2000/ants/issues/1">测试发现使用池计算时间并没有缩短,只有内存占用变小了。</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://stackoverflow.com/questions/18660533/why-does-the-use-of-an-unbuffered-channel-in-the-same-goroutine-result-in-a-dead">Why does the use of an unbuffered channel in the same goroutine result in a deadlock?</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:4" class="footnote-text"><span><a href="https://github.com/golang/sync/blob/master/errgroup/errgroup_example_md5all_test.go">errgroup_example_md5all_test.go</a><a href="#fnref:4" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:5" class="footnote-text"><span><a href="https://gist.github.com/pteich/c0bb58b0b7c8af7cc6a689dd0d3d26ef">another errgroup example</a><a href="#fnref:5" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:6" class="footnote-text"><span><a 
href="https://www.fullstory.com/blog/why-errgroup-withcontext-in-golang-server-handlers/">Why you should be using errgroup.WithContext() in your Golang server handlers</a><a href="#fnref:6" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:7" class="footnote-text"><span><a href="https://stackoverflow.com/questions/40326723/go-vet-range-variable-captured-by-func-literal-when-using-go-routine-inside-of-f">go vet range variable captured by func literal when using go routine inside of for each loop</a><a href="#fnref:7" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<tags>
<tag>golang</tag>
<tag>goroutine</tag>
</tags>
</entry>
<entry>
<title>golang file permission mode</title>
<link href="/2021/10/28/golang-file-permission-mode/"/>
<url>/2021/10/28/golang-file-permission-mode/</url>
<content type="html"><![CDATA[<h2 id="导言"><a href="#导言" class="headerlink" title="导言"></a>Introduction</h2><p>While implementing a simple container on my own, I ran into some file-permission problems and ended up using <code>chmod</code> and <code>chown</code>; this post is a brief summary.</p><h2 id="permission-mode-in-chmod"><a href="#permission-mode-in-chmod" class="headerlink" title="permission mode in chmod"></a>permission mode in chmod</h2><h3 id="简要介绍"><a href="#简要介绍" class="headerlink" title="简要介绍"></a>Brief overview</h3><p>My environment is WSL Ubuntu 20.04. Files created with <code>touch</code> and directories created with <code>mkdir</code> typically get the following default permissions:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [1:06:23]</span><br>$ <span class="hljs-built_in">touch</span> a.txt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [1:06:29]</span><br>$ <span class="hljs-built_in">mkdir</span> b<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [1:06:36]</span><br>$ <span class="hljs-built_in">ls</span> -al<br>total 12<br>drwxr-xr-x 3 zyc zyc 4096 Oct 28 01:06 .<br>drwxr-xr-x 20 zyc zyc 4096 Oct 28 01:06 ..<br>-rw-r--r-- 1 zyc zyc 0 Oct 28 01:06 a.txt<br>drwxr-xr-x 2 zyc zyc 4096 Oct 28 01:06 b<br></code></pre></td></tr></table></figure><p>Compared with files, directories get the execute permission by default. A StackExchange question discusses exactly this<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Execute vs Read bit. 
How do directory permissions in Linux work?](https://unix.stackexchange.com/questions/21251/execute-vs-read-bit-how-do-directory-permissions-in-linux-work)">[1]</span></a></sup>:</p><blockquote><ul><li>The <strong>read bit</strong> (<code>r</code>) allows the affected user to list the files within the directory</li><li>The <strong>write bit</strong> (<code>w</code>) allows the affected user to create, rename, or delete files within the directory, and modify the directory’s attributes</li><li>The <strong>execute bit</strong> (<code>x</code>) allows the affected user to enter the directory, and access files and directories inside</li><li>The <strong>sticky bit</strong> (<code>T</code>, or <code>t</code> if the execute bit is set for others) states that files and directories within that directory may only be deleted or renamed by their owner (or root)</li></ul></blockquote><p>The <strong>sticky bit</strong> is covered separately below.</p><p>Remove the directory's execute permission with <code>sudo chmod 644 DIRNAME</code> and run a few tests:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:29:38]</span><br>$ <span class="hljs-built_in">ls</span> <span class="hljs-built_in">test</span><br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:29:41]</span><br>$ <span class="hljs-built_in">cd</span> <span class="hljs-built_in">test</span><br><span class="hljs-built_in">cd</span>: permission denied: <span class="hljs-built_in">test</span><br></code></pre></td></tr></table></figure><p>Because the <strong>read bit</strong> is set, <code>ls</code> can list the directory's contents; because the <strong>execute bit</strong> is not set, <code>cd</code> fails (the directory cannot be entered).</p><p>A few more tests:</p><figure class="highlight 
bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:40:28] C:1</span><br>$ sudo <span class="hljs-built_in">mkdir</span> <span class="hljs-built_in">test</span>/test_sub<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:40:44]</span><br>$ sudo <span class="hljs-built_in">touch</span> <span class="hljs-built_in">test</span>/test_sub/a.txt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:41:42]</span><br>$ sudo <span class="hljs-built_in">chmod</span> 644 <span class="hljs-built_in">test</span><br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:44:41]</span><br>$ <span class="hljs-built_in">ls</span> <span class="hljs-built_in">test</span>/test_sub<br><span class="hljs-built_in">ls</span>: cannot access <span 
class="hljs-string">'test/test_sub'</span>: Permission denied<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:44:46] C:2</span><br>$ tree <span class="hljs-built_in">test</span><br><span class="hljs-built_in">test</span><br><br>0 directories, 0 files<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:45:10]</span><br>$ sudo <span class="hljs-built_in">chmod</span> 755 <span class="hljs-built_in">test</span><br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:45:17]</span><br>$ <span class="hljs-built_in">ls</span> <span class="hljs-built_in">test</span>/test_sub<br>a.txt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [14:45:26]</span><br>$ tree <span class="hljs-built_in">test</span><br><span class="hljs-built_in">test</span><br>└── test_sub<br>    └── a.txt<br><br>1 directory, 1 file<br></code></pre></td></tr></table></figure><p>Because the <code>execute bit</code> is not set, neither <code>tree</code> nor <code>ls</code> returns the expected result (the files and directories inside cannot be accessed).</p><p>Once the <code>execute bit</code> is set again, the results are back to normal.</p><h3 id="7777-到底是什么?"><a href="#7777-到底是什么?" 
class="headerlink" title="7777 到底是什么?"></a>What exactly is 7777?</h3><p>In the discussion below, take care to distinguish a <code>digit</code> from a <code>bit</code>.</p><p>A Super User (Stack Exchange) question discusses the difference between 7777 and 777<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[whats the difference between chmod 777 and chmod 7777](https://superuser.com/questions/592309/whats-the-difference-between-chmod-777-and-chmod-7777)">[2]</span></a></sup>.</p><p>777 is familiar to most of us, so why does a <strong>four digits</strong> mode such as 7777 exist?</p><p>In most cases the <strong>four digits</strong> form is not needed: a <strong>three digits</strong> mode is padded with a leading 0, so <code>777</code> is treated as <code>0777</code>.</p><p>The three bits of the first digit in the <strong>four digits</strong> form correspond to <strong>setuid</strong> (value 4), <strong>setgid</strong> (value 2), and <strong>sticky</strong> (value 1); all three are <strong>unix access right flags</strong>.</p><h4 id="sticky-bit"><a href="#sticky-bit" class="headerlink" title="sticky bit"></a>sticky bit</h4><p>According to the wiki<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Sticky bit](https://en.wikipedia.org/wiki/Sticky_bit)">[3]</span></a></sup>, the <strong>sticky bit</strong> has two meanings:</p><ul><li><strong>For files</strong>: mainly executables. When set, the program's <strong>.text</strong> segment was retained in swap space after the process exited, so the program could be loaded faster the next time it was needed. Later swapping optimizations made this behavior obsolete.</li><li><strong>For directories</strong>: when set, a file in the directory can be renamed or deleted only by the file's owner, the directory's owner, or the root user. The wiki notes that this is commonly set on the <code>/tmp</code> directory to prevent ordinary users from moving or deleting other users' files.</li></ul><h4 id="setuid-amp-setgid"><a href="#setuid-amp-setgid" class="headerlink" title="setuid & setgid"></a>setuid & setgid</h4><p>Quoting the wiki directly<sup id="fnref:4" class="footnote-ref"><a href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[setuid](https://en.wikipedia.org/wiki/Setuid)">[4]</span></a></sup>:</p><blockquote><p>allow users to run an <a href="https://en.wikipedia.org/wiki/Executable">executable</a> with the <a href="https://en.wikipedia.org/wiki/File_system_permissions">file system 
permissions</a> of the executable’s owner or group respectively and to change behaviour in directories. </p></blockquote><p>Here is a simple example written in Go:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">package</span> main<br><br><span class="hljs-keyword">import</span> (<br><span class="hljs-string">"fmt"</span><br><span class="hljs-string">"os"</span><br>)<br><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {<br><span class="hljs-keyword">if</span> err := os.Mkdir(<span class="hljs-string">"/home/testdir"</span>, <span class="hljs-number">0777</span>); err != <span class="hljs-literal">nil</span> {<br>fmt.Println(err)<br>}<br><br>}<br></code></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [15:13:51] </span><br>$ ./main<br><span class="hljs-built_in">mkdir</span> /home/testdir: permission denied<br></code></pre></td></tr></table></figure><p>The call fails because the user lacks permission to create a directory under <code>/home</code>.</p><p>Now change the file's owner to <code>root</code>, use <code>chmod</code> to set <strong>setuid</strong> and <strong>setgid</strong>, and run it again:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [15:21:23] </span><br>$ sudo <span class="hljs-built_in">chown</span> root:root main<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [15:21:33] </span><br>$ ./main<br><span class="hljs-built_in">mkdir</span> /home/testdir: permission denied<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [15:21:35] </span><br>$ sudo <span class="hljs-built_in">chmod</span> 6777 main <br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [15:21:43] </span><br>$ ./main<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [15:21:45] </span><br>$ <span class="hljs-built_in">ls</span> -al<br>total 1748<br>drwxr-xr-x  2 zyc  zyc     4096 Oct 28 15:21 .<br>drwxr-xr-x 20 zyc  zyc     4096 Oct 28 15:21 ..<br>-rw-r--r--  1 zyc  zyc       21 Oct 28 01:20 go.mod<br>-rwsrwsrwx  1 root root 1772990 Oct 28 15:21 main<br>-rwxr--r--  1 zyc  zyc      135 Oct 28 01:24 main.goo<br></code></pre></td></tr></table></figure><p>Changing the owner alone is not enough: the second run still fails. Once the setuid bit is set, however, <code>./main</code> runs with the file system permissions of its owner, root, even without <code>sudo</code>, and the directory is created.</p><div class="note note-info"> <p>Because root's file system permissions are used here, the directory that gets created is owned by root.</p> </div><p>For more information about <code>chmod</code>, see the manual <sup id="fnref:5" class="footnote-ref"><a href="#fn:5" 
rel="footnote"><span class="hint--top hint--rounded" aria-label="[chmod(1)—— Linux manual page](https://man7.org/linux/man-pages/man1/chmod.1.html)">[5]</span></a></sup></p><h2 id="0777-vs-777"><a href="#0777-vs-777" class="headerlink" title="0777 vs 777?"></a>0777 vs 777?</h2><p>As noted above, <code>chmod</code> pads the mode with a leading 0, so for <code>chmod</code> there is no difference between <code>777</code> and <code>0777</code>.</p><p>In C and Go source code, however, <code>0777</code> is an octal literal while <code>777</code> is a decimal literal<sup id="fnref:6" class="footnote-ref"><a href="#fn:6" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Is there any difference between mode value 0777 and 777](https://unix.stackexchange.com/questions/103413/is-there-any-difference-between-mode-value-0777-and-777)">[6]</span></a></sup>.</p><p>Let's verify this with a program:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">package</span> main<br><br><span class="hljs-keyword">import</span> (<br><span class="hljs-string">"fmt"</span><br><span class="hljs-string">"os"</span><br>)<br><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {<br><span class="hljs-keyword">if</span> err := os.Mkdir(<span class="hljs-string">"/home/zyc/testPermission/testdir"</span>, <span class="hljs-number">0777</span>); err != <span class="hljs-literal">nil</span> {<br>fmt.Println(err)<br>}<br><br>}<br></code></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [17:15:13] </span><br>$ <span class="hljs-built_in">ls</span> ~/testPermission -al <br>total 1752<br>drwxr-xr-x 3 zyc zyc 4096 Oct 28 17:15 .<br>drwxr-xr-x 20 zyc zyc 4096 Oct 28 17:15 ..<br>-rw-r--r-- 1 zyc zyc 21 Oct 28 01:20 go.mod<br>-rwxr-xr-x 1 zyc zyc 1772990 Oct 28 17:15 main<br>-rwxr--r-- 1 zyc zyc 154 Oct 28 17:14 main.go<br>drwxr-xr-x 2 zyc zyc 4096 Oct 28 17:15 testdir<br></code></pre></td></tr></table></figure><p>现在改为</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs go">os.Mkdir(<span class="hljs-string">"/home/zyc/testPermission/testdir"</span>, <span class="hljs-number">777</span>)<br></code></pre></td></tr></table></figure><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/testPermission [17:17:00] </span><br>$ <span class="hljs-built_in">ls</span> ~/testPermission -al <br>total 1752<br>drwxr-xr-x 3 zyc zyc 4096 Oct 28 17:17 .<br>drwxr-xr-x 20 zyc zyc 4096 Oct 28 17:17 ..<br>-rw-r--r-- 1 zyc zyc 21 Oct 28 01:20 go.mod<br>-rwxr-xr-x 1 zyc zyc 1772990 Oct 28 17:15 main<br>-rwxr--r-- 1 zyc zyc 153 Oct 28 17:16 main.go<br>dr----x--x 2 zyc zyc 4096 Oct 28 17:17 
testdir<br></code></pre></td></tr></table></figure><p>The decimal literal <code>777</code> equals octal <code>1411</code>. Go's <code>os.Mkdir</code> takes an <code>os.FileMode</code>, in which only the low nine bits are permission bits, so the extra <code>01000</code> bit is not treated as a permission and the directory is created with mode <code>0411</code>, which is exactly the <code>r----x--x</code> shown above. In the earlier run, <code>0777</code> ended up as <code>rwxr-xr-x</code> because the process umask (presumably 022 here) cleared the write bits for group and others.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://unix.stackexchange.com/questions/21251/execute-vs-read-bit-how-do-directory-permissions-in-linux-work">Execute vs Read bit. How do directory permissions in Linux work?</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://superuser.com/questions/592309/whats-the-difference-between-chmod-777-and-chmod-7777">whats the difference between chmod 777 and chmod 7777</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://en.wikipedia.org/wiki/Sticky_bit">Sticky bit</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:4" class="footnote-text"><span><a href="https://en.wikipedia.org/wiki/Setuid">setuid</a><a href="#fnref:4" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:5" class="footnote-text"><span><a href="https://man7.org/linux/man-pages/man1/chmod.1.html">chmod(1), Linux manual page</a><a href="#fnref:5" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:6" class="footnote-text"><span><a href="https://unix.stackexchange.com/questions/103413/is-there-any-difference-between-mode-value-0777-and-777">Is there any difference between mode value 0777 and 777</a><a href="#fnref:6" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
</entry>
<entry>
<title>MyDocker Implementation Pitfalls Guide</title>
<link href="/2021/10/17/MyDocker-%E5%AE%9E%E7%8E%B0%E8%B8%A9%E5%9D%91%E6%8C%87%E5%8D%97/"/>
<url>/2021/10/17/MyDocker-%E5%AE%9E%E7%8E%B0%E8%B8%A9%E5%9D%91%E6%8C%87%E5%8D%97/</url>
<content type="html"><![CDATA[<h2 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h2><p>The book 《自己动手写 Docker》 (Write Your Own Docker) was published several years ago and has not been updated since, and the Issues section of the accompanying project repository is not very active. I ran into a number of problems while working through it, and searching for and reading related material deepened my understanding of some topics, so this post collects a few brief notes.</p><p>Environment:</p><blockquote><p>Golang: 1.17</p><p>Host: WSL Ubuntu 20.04</p></blockquote><h2 id="CLONE-NEWUSER"><a href="#CLONE-NEWUSER" class="headerlink" title="CLONE_NEWUSER"></a>CLONE_NEWUSER</h2><p>For questions about namespaces, see this series of articles<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Namespace](https://medium.com/@teddyking/linux-namespaces-850489d3ccf)">[1]</span></a></sup>, which touches on some of the issues discussed here.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs go">cmd.SysProcAttr = &syscall.SysProcAttr{<br>Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWNET,<br>}<br></code></pre></td></tr></table></figure><p>These <code>Cloneflags</code> do not include <code>CLONE_NEWUSER</code>, so running the program without <code>root</code> privileges fails:</p><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs sh"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [13:47:21] </span><br>$ ./main run /bin/zsh -it<br>fork/exec /proc/self/exe: operation not permitted<br></code></pre></td></tr></table></figure><p>With <code>sudo</code>:</p><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br></pre></td><td class="code"><pre><code class="hljs sh"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [14:06:17] C:255</span><br>$ sudo ./main run /bin/zsh -it<br>/bin/zsh<br>DESKTOP-KK42M35<span class="hljs-comment"># whoami</span><br>root<br>DESKTOP-KK42M35<span class="hljs-comment"># id root</span><br>uid=0(root) gid=0(root) <span class="hljs-built_in">groups</span>=0(root)<br>DESKTOP-KK42M35<span class="hljs-comment"># </span><br></code></pre></td></tr></table></figure><p>After adding <code>CLONE_NEWUSER</code> to <code>Cloneflags</code>, the program can be run without root privileges (note how the user name changes):</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [14:09:48] C:255</span><br>$ ./main run /bin/zsh -it<br>/bin/zsh<br><br><span class="hljs-comment"># nobody @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [14:09:52] </span><br>$ <span class="hljs-built_in">whoami</span><br>nobody<br><br><span class="hljs-comment"># nobody @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [14:10:05] </span><br>$ <span class="hljs-built_in">id</span> nobody<br>uid=65534(nobody) gid=65534(nogroup) <span class="hljs-built_in">groups</span>=65534(nogroup)<br></code></pre></td></tr></table></figure><p>Compared with the previous run, however, the shell inside the new namespace no longer has root privileges, and initializing the container requires mounting <code>/proc</code>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs go">defaultMountFlags := syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV<br>syscall.Mount(<span class="hljs-string">"proc"</span>, <span class="hljs-string">"/proc"</span>, <span class="hljs-string">"proc"</span>, <span class="hljs-type">uintptr</span>(defaultMountFlags), <span class="hljs-string">""</span>)<br></code></pre></td></tr></table></figure><p>For a non-root user, <code>mount</code> with a <code>-t</code> type fails with <code>operation not permitted</code>, so this needs to be fixed.</p><p>The fix relies on <code>UidMappings</code> and <code>GidMappings</code>; this article<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Namespace in Go](https://medium.com/@teddyking/namespaces-in-go-user-a54ef9476f2a)">[2]</span></a></sup> explains them in detail. In short, they allow a process running as a <code>non-root-user</code> to spawn a process that runs as <code>root</code>, but only inside a different namespace.</p><img src="/2021/10/17/MyDocker-%E5%AE%9E%E7%8E%B0%E8%B8%A9%E5%9D%91%E6%8C%87%E5%8D%97/1lY9jQy-ZHnKF1fMEe0W9qQ.jpeg" class="" alt="img"><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><code class="hljs go">cmd.SysProcAttr = &syscall.SysProcAttr{<br>Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,<br>UidMappings: []syscall.SysProcIDMap{<br>{<br>ContainerID: 
<span class="hljs-number">0</span>,<br>HostID: syscall.Getuid(),<br>Size: <span class="hljs-number">1</span>,<br>},<br>},<br>GidMappings: []syscall.SysProcIDMap{<br>{<br>ContainerID: <span class="hljs-number">0</span>,<br>HostID: syscall.Getgid(),<br>Size: <span class="hljs-number">1</span>,<br>},<br>},<br>}<br></code></pre></td></tr></table></figure><blockquote><p>A minor point to note: if <code>HostID</code> is not the current process's own user or group ID, or if Size > 1 (which necessarily covers IDs other than the current process's), this code can only be run with <code>sudo</code>.</p></blockquote><p>The result now looks like this:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [17:56:51] C:255</span><br>$ ./main run /bin/zsh -it<br>/bin/zsh<br><br><span class="hljs-comment"># root @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [17:56:54] </span><br>$ <span class="hljs-built_in">id</span> root <br>uid=0(root) gid=0(root) <span class="hljs-built_in">groups</span>=0(root)<br></code></pre></td></tr></table></figure><h2 id="关于-Storage-Driver"><a href="#关于-Storage-Driver" class="headerlink" title="关于 Storage Driver"></a>About the Storage Driver</h2><p>While implementing the code for Chapter 4, <code>mount</code> fails with <code>unknown filesystem 'aufs'</code>.</p><p>Check the local setup with <code>docker info</code>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><code class="hljs bash">$ docker info<br>Client:<br> Debug Mode: <span class="hljs-literal">false</span><br> Plugins:<br>  compose: Docker Compose (Docker Inc., 2.0.0-beta.4)<br>  scan: Docker Scan (Docker Inc., v0.8.0)<br>  app: Docker Application (Docker Inc., v0.8.0)<br>  buildx: Build with BuildKit (Docker Inc., v0.4.2-tp-docker)<br><br>Server:<br> Containers: 2<br>  Running: 1<br>  Paused: 0<br>  Stopped: 1<br> Images: 28<br> Server Version: 20.10.7<br> Storage Driver: overlay2<br> ...<br></code></pre></td></tr></table></figure><p>Note that the <code>Storage Driver</code> field is <code>overlay2</code>. OverlayFS is similar to AUFS: it is also a union filesystem, but it is faster and has a simpler implementation. See Docker's <a href="https://docs.docker.com/storage/storagedriver/overlayfs-driver/">official documentation</a> and this document<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[overlayfs](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt)">[3]</span></a></sup>.</p><p>The kernel here is presumably recent enough that it uses the successor, OverlayFS, rather than AUFS. The output of <code>cat /proc/filesystems</code> likewise lists only overlay and no aufs.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><code class="hljs bash">$ <span class="hljs-built_in">cat</span> /proc/filesystems<br>nodev sysfs<br>nodev rootfs<br>nodev tmpfs<br>nodev bdev<br>nodev proc<br>nodev cpuset<br>nodev cgroup<br>nodev cgroup2<br>nodev devtmpfs<br>nodev binfmt_misc<br>nodev debugfs<br>nodev tracefs<br>nodev sockfs<br>nodev dax<br>nodev bpf<br>nodev pipefs<br>nodev ramfs<br>nodev hugetlbfs<br>nodev rpc_pipefs<br>nodev devpts<br> ext3<br> ext2<br> ext4<br> squashfs<br> vfat<br> msdos<br> iso9660<br>nodev nfs<br>nodev nfs4<br>nodev nfsd<br>nodev cifs<br>nodev smb3<br>nodev autofs<br> fuseblk<br>nodev fuse<br>nodev fusectl<br>nodev overlay<br> xfs<br>nodev 9p<br>nodev ceph<br>nodev mqueue<br> btrfs<br></code></pre></td></tr></table></figure><h3 id="overlay-example"><a href="#overlay-example" class="headerlink" title="overlay example"></a>overlay example</h3><p>先准备好以下一些文件</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:39:49]</span><br>$ tree<br>.<br>├── container-layer<br>│ └── container-layer.txt<br>├── image-layer1<br>│ └── image-layer1.txt<br>├── image-layer2<br>│ └── image-layer2.txt<br>├── image-layer3<br>│ └── image-layer3.txt<br>├── mnt<br>└── work<br><br>6 directories, 4 files<br></code></pre></td></tr></table></figure><p><code>container-layer.txt</code>以及<code>image-layer_id.txt</code>文件的内容为<code>i'm ${filename}</code>,如</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:39:51]</span><br>$ <span class="hljs-built_in">cat</span> container-layer/container-layer.txt<br>i<span class="hljs-string">'m container-layer</span><br><span class="hljs-string"></span><br><span class="hljs-string"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:41:35]</span><br><span class="hljs-string">$ cat image-layer1/image-layer1.txt</span><br><span class="hljs-string">i'</span>m image-layer1<br></code></pre></td></tr></table></figure><p>使用<code>mount</code>进行挂载:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:50:11]</span><br>$ sudo mount -t overlay overlay -o lowerdir=image-layer1:image-layer2:image-layer3,upperdir=container-layer,workdir=work mnt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:51:50]</span><br>$ tree mnt<br>mnt<br>├── container-layer.txt<br>├── image-layer1.txt<br>├── image-layer2.txt<br>└── image-layer3.txt<br><br>0 directories, 4 files<br></code></pre></td></tr></table></figure><h4 id="写入文件"><a href="#写入文件" class="headerlink" title="写入文件"></a>Writing to a file</h4><p>If we write to <code>image-layer1.txt</code> through the mount point, the change is not reflected in <code>image-layer1/image-layer1.txt</code>. Instead, the file is copied into the <code>container-layer</code> directory and the modification is applied to that copy:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span 
class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:55:31] C:130</span><br>$ <span class="hljs-built_in">echo</span> <span class="hljs-string">"i'm image-layer1 test: hello"</span> >> mnt/image-layer1.txt  <span class="hljs-comment"># write the new content</span><br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:55:35]</span><br>$ <span class="hljs-built_in">cat</span> mnt/image-layer1.txt  <span class="hljs-comment"># view the file under mnt</span><br>i<span class="hljs-string">'m image-layer1</span><br><span class="hljs-string">i'</span>m image-layer1 <span class="hljs-built_in">test</span>: hello<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:55:46]</span><br>$ <span class="hljs-built_in">cat</span> image-layer1/image-layer1.txt  <span class="hljs-comment"># the original file is unchanged</span><br>i<span class="hljs-string">'m image-layer1</span><br><span class="hljs-string"></span><br><span class="hljs-string"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:55:54]</span><br><span class="hljs-string">$ tree  # container-layer now holds a new file</span><br><span class="hljs-string">.</span><br><span class="hljs-string">├── container-layer</span><br><span class="hljs-string">│   ├── container-layer.txt</span><br><span class="hljs-string">│   └── image-layer1.txt</span><br><span class="hljs-string">├── image-layer1</span><br><span class="hljs-string">│   └── image-layer1.txt</span><br><span class="hljs-string">├── image-layer2</span><br><span class="hljs-string">│   └── image-layer2.txt</span><br><span class="hljs-string">├── image-layer3</span><br><span class="hljs-string">│   └── image-layer3.txt</span><br><span class="hljs-string">├── mnt</span><br><span class="hljs-string">│   ├── container-layer.txt</span><br><span class="hljs-string">│   ├── image-layer1.txt</span><br><span class="hljs-string">│   ├── image-layer2.txt</span><br><span 
class="hljs-string">│   └── image-layer3.txt</span><br><span class="hljs-string">└── work</span><br><span class="hljs-string">    └── work [error opening dir]</span><br><span class="hljs-string"></span><br><span class="hljs-string">7 directories, 9 files</span><br><span class="hljs-string"></span><br><span class="hljs-string"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:56:08]</span><br><span class="hljs-string">$ cat container-layer/image-layer1.txt  # contains what was just written</span><br><span class="hljs-string">i'</span>m image-layer1<br>i<span class="hljs-string">'m image-layer1 test: hello</span><br></code></pre></td></tr></table></figure><blockquote><p>Note also that <code>tree</code> reports <code>error opening dir</code> for the <code>work</code> directory. This is presumably because workdir is a directory used internally by overlayfs and is not meant to be read from outside.</p></blockquote><h4 id="创建文件"><a href="#创建文件" class="headerlink" title="创建文件"></a>Creating a file</h4><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [23:56:17]</span><br>$ <span class="hljs-built_in">echo</span> <span class="hljs-string">"new file"</span> >> 
mnt/new_file.txt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:01:22]</span><br>$ tree<br>.<br>├── container-layer<br>│ ├── container-layer.txt<br>│ ├── image-layer1.txt<br>│ └── new_file.txt<br>├── image-layer1<br>│ └── image-layer1.txt<br>├── image-layer2<br>│ └── image-layer2.txt<br>├── image-layer3<br>│ └── image-layer3.txt<br>├── mnt<br>│ ├── container-layer.txt<br>│ ├── image-layer1.txt<br>│ ├── image-layer2.txt<br>│ ├── image-layer3.txt<br>│ └── new_file.txt<br>└── work<br> └── work [error opening <span class="hljs-built_in">dir</span>]<br><br>7 directories, 11 files<br></code></pre></td></tr></table></figure><p>The newly created file is placed in the <code>container-layer</code> folder.</p><h4 id="删除文件"><a href="#删除文件" class="headerlink" title="删除文件"></a>Deleting a file</h4><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span 
class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:01:26]</span><br>$ <span class="hljs-built_in">rm</span> -rf mnt/image-layer1.txt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:02:50]</span><br>$ tree<br>.<br>├── container-layer<br>│ ├── container-layer.txt<br>│ ├── image-layer1.txt<br>│ └── new_file.txt<br>├── image-layer1<br>│ └── image-layer1.txt<br>├── image-layer2<br>│ └── image-layer2.txt<br>├── image-layer3<br>│ └── image-layer3.txt<br>├── mnt<br>│ ├── container-layer.txt<br>│ ├── image-layer2.txt<br>│ ├── image-layer3.txt<br>│ └── new_file.txt<br>└── work<br> └── work [error opening <span class="hljs-built_in">dir</span>]<br><br>7 directories, 10 files<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:02:52]</span><br>$ <span class="hljs-built_in">ls</span> -al container-layer<br>total 16<br>drwxr-xr-x 2 zyc zyc 4096 Oct 27 00:02 .<br>drwxr-xr-x 8 zyc zyc 4096 Oct 26 23:39 ..<br>-rw-r--r-- 1 zyc zyc 20 Oct 20 00:53 container-layer.txt<br>c--------- 1 root root 0, 0 Oct 27 00:02 image-layer1.txt<br>-rw-r--r-- 1 zyc zyc 9 Oct 27 00:01 new_file.txt<br></code></pre></td></tr></table></figure><p>In <code>container-layer</code>, <code>image-layer1.txt</code> has become a character device. The overlayfs documentation<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[overlayfs](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt)">[3]</span></a></sup> explains why:</p><div class="note note-info"> <p>whiteouts and opaque directories</p><p>In order to support rm and rmdir without changing the lower<br>filesystem, an overlay filesystem needs to record in the upper filesystem<br>that files have been removed. 
This is done using whiteouts and opaque<br>directories (non-directories are always opaque).</p><p>A whiteout is created as a character device with 0/0 device number.<br>When a whiteout is found in the upper level of a merged directory, any<br>matching name in the lower level is ignored, and the whiteout itself<br>is also hidden.</p><p>A directory is made opaque by setting the xattr “trusted.overlay.opaque”<br>to “y”. Where the upper filesystem contains an opaque directory, any<br>directory in the lower filesystem with the same name is ignored.</p> </div><h4 id="一些补充"><a href="#一些补充" class="headerlink" title="一些补充"></a>Some additional notes</h4><p>When mounting with <code>mount</code>, <code>upperdir</code> accepts only a single directory; specifying more than one produces an error:</p><figure class="highlight awk"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs awk">mount: <span class="hljs-regexp">/home/</span>zyc<span class="hljs-regexp">/aufs/m</span>nt: special device overlay does not exist.<br></code></pre></td></tr></table></figure><p>To mount a read-only filesystem, specify only <code>lowerdir</code>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><code class="hljs bash"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:36:24]</span><br>$ sudo mount -t overlay overlay -o lowerdir=image-layer1:image-layer2 mnt<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:36:28]</span><br>$ tree<br>.<br>├── container-layer<br>│ ├── container-layer.txt<br>│ ├── image-layer1.txt<br>│ ├── image-layer2.txt<br>│ └── new_file.txt<br>├── image-layer1<br>│ └── image-layer1.txt<br>├── image-layer2<br>│ └── image-layer2.txt<br>├── image-layer3<br>│ └── image-layer3.txt<br>├── mnt<br>│ ├── image-layer1.txt<br>│ └── image-layer2.txt<br>└── work<br> └── work [error opening <span class="hljs-built_in">dir</span>]<br><br>7 directories, 9 files<br><br><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in ~/aufs [0:36:30]</span><br>$ <span class="hljs-built_in">echo</span> <span class="hljs-string">"new file"</span> >> mnt/image-layer1.txt<br>zsh: read-only file system: mnt/image-layer1.txt<br></code></pre></td></tr></table></figure><p>However, in this case <code>lowerdir</code> must list more than one directory; otherwise the mount fails:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs bash">$ sudo mount -t overlay overlay -o lowerdir=image-layer1 mnt<br>mount: /home/zyc/aufs/mnt: wrong fs <span class="hljs-built_in">type</span>, bad option, bad superblock on overlay, missing codepage or helper program, or other error.<br></code></pre></td></tr></table></figure><p>This is because mounting a single directory does not need overlay at all; a bind mount is enough<sup id="fnref:4" class="footnote-ref"><a href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Read only bind-mount?](https://serverfault.com/questions/136515/read-only-bind-mount)">[4]</span></a></sup>.</p><p>In addition, the book implements the <code>-v</code> volume mapping with an aufs mount, although a simple bind mount would meet the need.</p><h2 id="关于-Docker-i、-t、-d-参数"><a 
href="#关于-Docker-i、-t、-d-参数" class="headerlink" title="关于 Docker -i、-t、-d 参数"></a>About the Docker -i, -t, and -d flags</h2><p>See <a href="http://www.jerrymei.cn/docker-run-interactive-tty-detach/">How to correctly use the docker run -i, -t, and -d flags</a>.</p><p>For simplicity, 《自己动手写 Docker》 merges <code>-i</code> and <code>-t</code> into a single <code>-it</code> flag.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://medium.com/@teddyking/linux-namespaces-850489d3ccf">Namespace</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://medium.com/@teddyking/namespaces-in-go-user-a54ef9476f2a">Namespace in Go</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt">overlayfs</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:4" class="footnote-text"><span><a href="https://serverfault.com/questions/136515/read-only-bind-mount">Read only bind-mount?</a><a href="#fnref:4" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>golang</category>
</categories>
<tags>
<tag>golang</tag>
</tags>
</entry>
<entry>
<title>How to Implement Fork() in Golang</title>
<link href="/2021/10/16/%E5%A6%82%E4%BD%95%E5%9C%A8Golang%E4%B8%AD%E5%AE%9E%E7%8E%B0Fork/"/>
<url>/2021/10/16/%E5%A6%82%E4%BD%95%E5%9C%A8Golang%E4%B8%AD%E5%AE%9E%E7%8E%B0Fork/</url>
<content type="html"><![CDATA[<h2 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h2><p>While following 《自己动手写 Docker》 to build a simplified version of Docker, I noticed that the book creates the container process in a rather weird way. After <code>mydocker run Command</code>, it uses <code>exec.Command</code> to invoke <code>/proc/self/exe</code> (that is, the program's own executable) with modified arguments, so that the call is equivalent to <code>mydocker init Command</code>, which then performs the container-related initialization.</p><p>The book uses the <a href="https://github.com/urfave/cli">cli</a> library for its command-line code, so it registers an <code>initCommand</code>. Unfortunately, <code>initCommand</code> is an internal call: it should be invoked by the program that creates the container process, not by a user typing <code>init</code>. Yet <code>mydocker help</code> exposes the command to users, which is not ideal.</p><p>So I wondered why the author did not use something like C's fork, which returns in both processes and lets the parent and the child run different functions.</p><h2 id="原因"><a href="#原因" class="headerlink" title="原因"></a>Reason</h2><p>This blog post<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Golang中实现典型的fork调用](https://jiajunhuang.com/articles/2018_03_08-golang_fork.md.html)">[1]</span></a></sup> offers some explanation: Golang encourages concurrent programming with <code>goroutine</code>s and hides the notions of threads and processes from us. Moreover, in most <code>fork</code>+<code>exec</code> scenarios, Golang's <code>syscall.ForkExec</code> and <code>exec.Command</code> serve well as replacements<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[How do I fork a go process?](https://stackoverflow.com/questions/28370646/how-do-i-fork-a-go-process)">[2]</span></a></sup>.</p><p>If we want C-like <code>fork</code> behavior in Golang, duplicating the current process and running different functions in the parent and the child, some tricks are needed.</p><h2 id="解决"><a href="#解决" class="headerlink" title="解决"></a>Solution</h2><p>Docker's <a href="https://github.com/moby/moby/tree/master/pkg/reexec">reexec</a> implementation is worth a look. It is quite similar to the book's approach, but has the advantage of not exposing the internal command to users.</p><h3 id="Example"><a href="#Example" class="headerlink" title="Example"></a>Example</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">package</span> main<br><br><span class="hljs-keyword">import</span> (<br><span class="hljs-string">"MyDocker/reexec"</span><br><span class="hljs-string">"fmt"</span><br><span class="hljs-string">"log"</span><br><span class="hljs-string">"os"</span><br>)<br><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">init</span><span class="hljs-params">()</span></span> {<br>fmt.Printf(<span class="hljs-string">"os.Args = %+v\n"</span>, os.Args)<br>reexec.Register(<span class="hljs-string">"childProcess"</span>, childProcess)<br><span class="hljs-keyword">if</span> reexec.Init() {<br>os.Exit(<span class="hljs-number">0</span>)<br>}<br>}<br><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">childProcess</span><span class="hljs-params">()</span></span> {<br>fmt.Println(<span class="hljs-string">"ChildProcess"</span>)<br>}<br><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span 
class="hljs-params">()</span></span> {<br>cmd := reexec.Command(<span class="hljs-string">"childProcess"</span>)<br>cmd.Stdin = os.Stdin<br>cmd.Stdout = os.Stdout<br>cmd.Stderr = os.Stderr<br><br><span class="hljs-keyword">if</span> err := cmd.Start(); err != <span class="hljs-literal">nil</span> {<br>log.Fatal(err)<br>}<br>fmt.Println(<span class="hljs-string">"ParentProcess"</span>)<br>cmd.Wait()<br>os.Exit(<span class="hljs-number">0</span>)<br>}<br></code></pre></td></tr></table></figure><p>The output is as follows:</p><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><code class="hljs sh"><span class="hljs-comment"># zyc @ DESKTOP-KK42M35 in /mnt/d/GoProject/src/MyDocker on git:master x [17:29:55] </span><br>$ go run main.go<br>os.Args = [/tmp/go-build3043439324/b001/exe/main]<br>ParentProcess<br>os.Args = [childProcess]<br>ChildProcess<br></code></pre></td></tr></table></figure><h3 id="Explanation"><a href="#Explanation" class="headerlink" title="Explanation"></a>Explanation</h3><p>reexec mainly consists of two files, <code>reexec.go</code> and <code>command_${os}.go</code>.</p><p>The core of <code>reexec.go</code> is as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">package</span> reexec <span class="hljs-comment">// import "github.com/docker/docker/pkg/reexec"</span><br><br><span class="hljs-keyword">import</span> (<br><span class="hljs-string">"fmt"</span><br><span class="hljs-string">"os"</span><br>)<br><br><span class="hljs-keyword">var</span> registeredInitializers = <span class="hljs-built_in">make</span>(<span class="hljs-keyword">map</span>[<span class="hljs-type">string</span>]<span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">()</span></span>)<br><br><span class="hljs-comment">// Register adds an initialization func under the specified name</span><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Register</span><span class="hljs-params">(name <span class="hljs-type">string</span>, initializer <span class="hljs-keyword">func</span>()</span></span>) {<br><span class="hljs-keyword">if</span> _, exists := registeredInitializers[name]; exists {<br><span class="hljs-built_in">panic</span>(fmt.Sprintf(<span class="hljs-string">"reexec func already registered under name %q"</span>, name))<br>}<br><br>registeredInitializers[name] = initializer<br>}<br><br><span class="hljs-comment">// Init is called as the first part of the exec process and returns true if an</span><br><span class="hljs-comment">// initialization function was called.</span><br><span class="hljs-function"><span 
class="hljs-keyword">func</span> <span class="hljs-title">Init</span><span class="hljs-params">()</span></span> <span class="hljs-type">bool</span> {<br>initializer, exists := registeredInitializers[os.Args[<span class="hljs-number">0</span>]]<br><span class="hljs-keyword">if</span> exists {<br>initializer()<br><br><span class="hljs-keyword">return</span> <span class="hljs-literal">true</span><br>}<br><span class="hljs-keyword">return</span> <span class="hljs-literal">false</span><br>}<br></code></pre></td></tr></table></figure><p>Callers use <code>Register</code> to associate a name with a function. <code>Init</code> looks up the function registered under <code>os.Args[0]</code> and, if one exists, calls it.</p><p><code>command_linux.go</code> is as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-keyword">package</span> reexec<br><br><span class="hljs-keyword">import</span> (<br><span class="hljs-string">"os/exec"</span><br>)<br><br><span class="hljs-comment">// Self returns the path to the current process's binary.</span><br><span class="hljs-comment">// Returns "/proc/self/exe".</span><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Self</span><span class="hljs-params">()</span></span> <span class="hljs-type">string</span> {<br><span class="hljs-keyword">return</span> <span 
class="hljs-string">"/proc/self/exe"</span><br>}<br><br><span class="hljs-comment">// Command returns *exec.Cmd which has Path as current binary.</span><br><span class="hljs-comment">// This will use the in-memory version (/proc/self/exe) of the current binary,</span><br><span class="hljs-comment">// it is thus safe to delete or replace the on-disk binary (os.Args[0]).</span><br><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Command</span><span class="hljs-params">(args ...<span class="hljs-type">string</span>)</span></span> *exec.Cmd {<br><span class="hljs-keyword">return</span> &exec.Cmd{<br>Path: Self(),<br>Args: args,<br>}<br>}<br></code></pre></td></tr></table></figure><p><code>reexec.Command</code> wraps this logic and returns an <code>*exec.Cmd</code>. The <code>/proc/self/exe</code> path returned by <code>Self()</code> is a symbolic link to the executable of the current process.</p><p>Normally the first element of <code>os.Args</code> is the name of the executable, such as <code>/tmp/go-build3043439324/b001/exe/main</code> in the example above; that is the convention when a program is started from the command line. In the newly created process, however, the first element is the <code>childProcess</code> we set, which is the name of a registered function rather than a real executable.</p><p>In Golang, <code>cmd.Start()</code> launches the program located at <code>cmd.Path</code>, which is why <code>reexec.Command</code> has to set <code>Path</code> to <code>/proc/self/exe</code>.</p><p>One thing worth mentioning: if <code>reexec.Command</code> were implemented like this instead:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Command</span><span class="hljs-params">(args ...<span class="hljs-type">string</span>)</span></span> *exec.Cmd {<br><span class="hljs-keyword">return</span> exec.Command(Self(), args...)<br>}<br></code></pre></td></tr></table></figure><p>the output would be endless recursion:</p><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><code class="hljs sh">os.Args = [/proc/self/exe childProcess]<br>ParentProcess<br>os.Args = [/proc/self/exe childProcess]<br>ParentProcess<br>os.Args = [/proc/self/exe childProcess]<br>ParentProcess<br>os.Args = [/proc/self/exe childProcess]<br>ParentProcess<br>os.Args = [/proc/self/exe childProcess]<br>ParentProcess<br>......<br></code></pre></td></tr></table></figure><p><code>exec.Command</code> also returns an <code>*exec.Cmd</code>, but look at its implementation:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><code class="hljs go"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">Command</span><span class="hljs-params">(name <span class="hljs-type">string</span>, arg ...<span class="hljs-type">string</span>)</span></span> *Cmd {<br>cmd := &Cmd{<br>Path: name,<br>Args: <span class="hljs-built_in">append</span>([]<span class="hljs-type">string</span>{name}, arg...),<br>}<br><span class="hljs-keyword">if</span> filepath.Base(name) == name {<br><span class="hljs-keyword">if</span> lp, err := LookPath(name); err != <span class="hljs-literal">nil</span> {<br>cmd.lookPathErr = err<br>} <span class="hljs-keyword">else</span> 
{<br>cmd.Path = lp<br>}<br>}<br><span class="hljs-keyword">return</span> cmd<br>}<br></code></pre></td></tr></table></figure><p>Besides setting <code>Path</code> to <code>name</code> (here, the value of <code>Self()</code>), it also uses it as the first element of <code>Args</code>, following the command-line convention mentioned above.</p><p><code>Init()</code> in <code>reexec.go</code> then looks up the registered function by the first argument, which is now <code>/proc/self/exe</code> rather than the <code>childProcess</code> we want. The lookup fails, so execution continues into <code>main</code>, which spawns yet another child.</p><p>Of course, we could change <code>Init()</code> to look up by the second argument instead, but that would require the process to have at least two arguments (one of them being the program name), which is not great either.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://jiajunhuang.com/articles/2018_03_08-golang_fork.md.html">Golang中实现典型的fork调用</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://stackoverflow.com/questions/28370646/how-do-i-fork-a-go-process">How do I fork a go process?</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>golang</category>
</categories>
<tags>
<tag>golang</tag>
</tags>
</entry>
<entry>
<title>UniquePtr C++ implementation</title>
<link href="/2021/10/02/UniquePtr-C-implementation/"/>
<url>/2021/10/02/UniquePtr-C-implementation/</url>
<content type="html"><![CDATA[<p>In a previous post I implemented <a href="https://flaglord.com/2021/09/29/Sharedptr-C-implementation/">SharedPtr</a>. To implement UniquePtr I read some articles<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Smart-Pointer-Shared Pointer](https://lokiastari.com/blog/2015/01/15/c-plus-plus-by-example-smart-pointer-part-ii/)">[1]</span></a></sup> and realized that my earlier implementation had a lot of problems. As the article's author says, a smart pointer is not good learning material: it looks simple but is full of pitfalls. It took roughly nine years for the Boost implementation to become part of the C++11 standard.</p><p>Still, having started down this path, I will push on; it is a learning process after all.</p><p>Here is the code first:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span 
class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-keyword">template</span><<span class="hljs-keyword">typename</span> T><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">MyUniquePtr</span>{<br><span class="hljs-keyword">private</span>:<br> T *data;<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">MyUniquePtr</span>():<span class="hljs-built_in">data</span>(<span class="hljs-literal">nullptr</span>){ <br> }<br> <span class="hljs-comment">// Explicit constructor</span><br> <span class="hljs-function"><span class="hljs-keyword">explicit</span> <span class="hljs-title">MyUniquePtr</span><span class="hljs-params">(T* data)</span> : data(data){</span>}<br><br> ~<span class="hljs-built_in">MyUniquePtr</span>(){<br> <span class="hljs-keyword">delete</span> data;<br> }<br><br> <span class="hljs-comment">// Constructor/Assignment that binds to nullptr</span><br> <span class="hljs-built_in">MyUniquePtr</span>(std::<span class="hljs-type">nullptr_t</span>) : <span 
class="hljs-built_in">data</span>(<span class="hljs-literal">nullptr</span>){<br> }<br><br> MyUniquePtr& <span class="hljs-keyword">operator</span>=(std::<span class="hljs-type">nullptr_t</span>){<br> <span class="hljs-built_in">reset</span>();<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br> }<br><br> <span class="hljs-comment">/** Move Semantics **/</span><br> <span class="hljs-built_in">MyUniquePtr</span>(MyUniquePtr&& moving) <span class="hljs-keyword">noexcept</span> : <span class="hljs-built_in">data</span>(<span class="hljs-literal">nullptr</span>){<br> moving.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br> }<br><br> MyUniquePtr& <span class="hljs-keyword">operator</span>=(MyUniquePtr&& moving) <span class="hljs-keyword">noexcept</span>{<br> moving.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br> }<br><br> <span class="hljs-comment">// Remove compiler generated copy semantics</span><br> <span class="hljs-built_in">MyUniquePtr</span>(MyUniquePtr <span class="hljs-type">const</span>&) = <span class="hljs-keyword">delete</span>;<br> MyUniquePtr& <span class="hljs-keyword">operator</span>=(MyUniquePtr <span class="hljs-type">const</span>&) = <span class="hljs-keyword">delete</span>;<br><br> <span class="hljs-comment">// Const correct access owned object</span><br> T* <span class="hljs-keyword">operator</span>->() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> data;<br> }<br> T& <span class="hljs-keyword">operator</span>*() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> *data;<br> }<br><br> <span class="hljs-comment">// Access to smart pointer state</span><br> <span class="hljs-comment">// it can be used in conditional expression</span><br> <span class="hljs-function">T* <span class="hljs-title">get</span><span class="hljs-params">()</span> <span 
class="hljs-type">const</span></span>{<br> <span class="hljs-keyword">return</span> data;<br> }<br> <span class="hljs-function"><span class="hljs-keyword">explicit</span> <span class="hljs-keyword">operator</span> <span class="hljs-title">bool</span><span class="hljs-params">()</span> <span class="hljs-type">const</span></span>{<br> <span class="hljs-keyword">return</span> data;<br> }<br><br> <span class="hljs-comment">// modify object state</span><br> <span class="hljs-function">T* <span class="hljs-title">release</span><span class="hljs-params">()</span> <span class="hljs-keyword">noexcept</span></span>{<br> T* result = <span class="hljs-literal">nullptr</span>;<br> std::<span class="hljs-built_in">swap</span>(result, data);<br> <span class="hljs-keyword">return</span> result;<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">reset</span><span class="hljs-params">()</span></span>{<br> T *tmp = <span class="hljs-built_in">release</span>();<br> <span class="hljs-keyword">delete</span> tmp;<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">swap</span><span class="hljs-params">(MyUniquePtr& src)</span> <span class="hljs-keyword">noexcept</span></span><br><span class="hljs-function"> </span>{<br> std::<span class="hljs-built_in">swap</span>(data, src.data);<br> }<br><br>};<br></code></pre></td></tr></table></figure><p>To be honest, the code above is almost a copy of this answer<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[My implementation for std::unique_ptr](https://codereview.stackexchange.com/questions/163854/my-implementation-for-stdunique-ptr)">[2]</span></a></sup>, except that for simplicity I dropped the "constructor from derived type" part. Below I annotate a few points.</p><h2 id="Rule-of-Three"><a href="#Rule-of-Three" class="headerlink" title="Rule of Three"></a>Rule of Three</h2><p>rule of three<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" 
rel="footnote"><span class="hint--top hint--rounded" aria-label="[Rule of Three](https://stackoverflow.com/questions/4172722/what-is-the-rule-of-three)">[3]</span></a></sup> </p><blockquote><p>If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.</p></blockquote><p>In most cases the compiler-generated copy constructor and copy assignment operator serve us well, but once a class holds a raw pointer, resource management comes into play. This is the familiar deep-copy versus shallow-copy problem: the compiler's default copy is memberwise, so it copies the pointer itself rather than the object it points to, which causes trouble when the class owns that object.</p><p>If <code>MyUniquePtr</code> used the compiler-generated copy constructor, the code below would cause a double delete and therefore undefined behavior. On Windows such UB typically terminates the process with the heap-corruption exit code <code>0xC0000374</code>; worse, the program may run without any visible error until some later run fails.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function">MyUniquePtr<<span class="hljs-type">int</span>> <span class="hljs-title">u1</span><span class="hljs-params">(<span class="hljs-keyword">new</span> <span class="hljs-type">int</span>(<span class="hljs-number">5</span>))</span></span>;<br><span class="hljs-function">MyUniquePtr<<span class="hljs-type">int</span>> <span class="hljs-title">u2</span><span class="hljs-params">(u1)</span></span>;<br></code></pre></td></tr></table></figure><h3 id="UniquePtr-不应该有-copy-semantics"><a href="#UniquePtr-不应该有-copy-semantics" class="headerlink" title="UniquePtr should not have copy semantics"></a>UniquePtr should not have copy semantics</h3><p>This is a notable difference from <code>shared_ptr</code>: we want <code>MyUniquePtr</code> to be noncopyable while still supporting move semantics. The implementation therefore marks the copy constructor and copy assignment operator with <code>= delete</code>, which both prevents them from being called and suppresses the compiler-generated versions.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-comment">// Remove compiler 
generated copy semantics</span><br><span class="hljs-built_in">MyUniquePtr</span>(MyUniquePtr <span class="hljs-type">const</span>&) = <span class="hljs-keyword">delete</span>;<br>MyUniquePtr& <span class="hljs-keyword">operator</span>=(MyUniquePtr <span class="hljs-type">const</span>&) = <span class="hljs-keyword">delete</span>;<br></code></pre></td></tr></table></figure><h2 id="Why-need-explicit?"><a href="#Why-need-explicit?" class="headerlink" title="Why is explicit needed?"></a>Why is explicit needed?</h2><p>The implementation above uses the <code>explicit</code> keyword in two places:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function"><span class="hljs-keyword">explicit</span> <span class="hljs-title">MyUniquePtr</span><span class="hljs-params">(T* data)</span> : data(data){</span>}<br><br><span class="hljs-function"><span class="hljs-keyword">explicit</span> <span class="hljs-keyword">operator</span> <span class="hljs-title">bool</span><span class="hljs-params">()</span> <span class="hljs-type">const</span></span>{<br> <span class="hljs-keyword">return</span> data;<br>}<br></code></pre></td></tr></table></figure><h3 id="构造函数"><a href="#构造函数" class="headerlink" title="Constructor"></a>Constructor</h3><p>The <code>explicit</code> keyword prevents implicit conversions. Without it, the compiler accepts the following code without reporting any error.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">takeOwner1</span><span class="hljs-params">(MyUniquePtr<<span class="hljs-type">int</span>> x)</span></span>{<br>}<br><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">takeOwner2</span><span class="hljs-params">(MyUniquePtr<<span class="hljs-type">int</span>> <span class="hljs-type">const</span> &x)</span></span>{<br>}<br><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">takeOwner3</span><span class="hljs-params">(MyUniquePtr<<span class="hljs-type">int</span>> &&x)</span></span>{<br>}<br><br><span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-type">void</span>)</span></span>{<br> <span class="hljs-type">int</span> *data3 = <span class="hljs-keyword">new</span> <span class="hljs-built_in">int</span>(<span class="hljs-number">7</span>);<br> <span class="hljs-type">int</span> *data2 = <span class="hljs-keyword">new</span> <span class="hljs-built_in">int</span>(<span class="hljs-number">6</span>);<br> <span class="hljs-type">int</span> *data1 = <span class="hljs-keyword">new</span> <span class="hljs-built_in">int</span>(<span class="hljs-number">5</span>);<br> std::cout << *data1<<std::endl;<br> std::cout << *data2<<std::endl;<br> std::cout << *data3<<std::endl;<br> <span class="hljs-built_in">takeOwner1</span>(data1);<br> <span class="hljs-built_in">takeOwner2</span>(data2);<br> <span class="hljs-built_in">takeOwner3</span>(data3);<br> std::cout << <span 
class="hljs-string">"------------------"</span> << std::endl;<br> std::cout << *data1<<std::endl;<br> std::cout << *data2<<std::endl;<br> std::cout << *data3<<std::endl;<br>}<br></code></pre></td></tr></table></figure><p>But look at the output:</p><figure class="highlight asciidoc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><code class="hljs asciidoc">D:\Desktop\Study\course\cpp_wkspc\Leetcode\cmake-build-debug\Leetcode.exe<br>5<br>6<br><span class="hljs-section">7</span><br><span class="hljs-section">------------------</span><br>8202128<br>1597264<br>1597264<br></code></pre></td></tr></table></figure><p>The data the pointers refer to has clearly been destroyed: the values printed after the calls are garbage. The cause is the implicit conversion: each call passes the raw pointer to the constructor to create a temporary <code>MyUniquePtr</code> for the function to use. When the function returns, the temporary's lifetime ends, its destructor runs and deletes the object, and the code that follows dereferences a dangling pointer, which is undefined behavior.</p><h3 id="bool-重载"><a href="#bool-重载" class="headerlink" title="operator bool"></a>operator bool</h3><p>This, too, is a problem caused by implicit conversion:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function">MyUniquePtr <span class="hljs-title">s1</span><span class="hljs-params">(<span class="hljs-keyword">new</span> <span class="hljs-type">int</span>(<span class="hljs-number">1</span>))</span>, <span class="hljs-title">s2</span><span class="hljs-params">(<span class="hljs-keyword">new</span> <span class="hljs-type">int</span>(<span class="hljs-number">2</span>))</span></span>;<br><span class="hljs-keyword">if</span>(s1 == s2){<br> cout << <span class="hljs-string">"matched"</span> << 
endl;<br>}<br></code></pre></td></tr></table></figure><p>Without <code>explicit</code>, the code above prints matched: the compiler converts both <code>MyUniquePtr</code> objects to <code>bool</code>, and since both are non-null it ends up comparing <code>true</code> with <code>true</code>.</p><p>If a comparison is really needed here, implement <code>operator==</code> yourself.</p><h2 id="nullptr"><a href="#nullptr" class="headerlink" title="nullptr"></a>nullptr</h2><p>In the implementation you may be curious about the constructor and the assignment operator that take a <code>nullptr_t</code> parameter:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-built_in">MyUniquePtr</span>(std::<span class="hljs-type">nullptr_t</span>) : <span class="hljs-built_in">data</span>(<span class="hljs-literal">nullptr</span>){<br>}<br><br>MyUniquePtr& <span class="hljs-keyword">operator</span>=(std::<span class="hljs-type">nullptr_t</span>){<br> <span class="hljs-built_in">reset</span>();<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br></code></pre></td></tr></table></figure><p>As mentioned earlier, the constructor is marked <code>explicit</code> to prevent implicit conversions, so the compiler cannot automatically convert <code>nullptr</code> into the smart pointer; the developer has to do it explicitly<sup id="fnref:5" class="footnote-ref"><a href="#fn:5" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Smart-Pointer - Constructors](https://lokiastari.com/blog/2015/01/23/c-plus-plus-by-example-smart-pointer-part-iii/index.html)">[5]</span></a></sup>:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><code class="hljs 
c++"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">workWithUP</span><span class="hljs-params">(MyUniquePtr<<span class="hljs-type">int</span>>&& up)</span></span>{<br> <span class="hljs-comment">/* STUFF */</span><br>}<br><span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-type">void</span>)</span></span>{<br> <span class="hljs-comment">// This fails to compile</span><br> <span class="hljs-built_in">workWithUP</span>(<span class="hljs-literal">nullptr</span>);<br> <br> <span class="hljs-comment">// Need to be explicit with smart pointers</span><br> <span class="hljs-built_in">workWithUP</span>(<span class="hljs-built_in">MyUniquePtr</span><<span class="hljs-type">int</span>>(<span class="hljs-literal">nullptr</span>));<br>}<br></code></pre></td></tr></table></figure><p>That is cumbersome, so we add a constructor and an assignment operator that take <code>std::nullptr_t</code> to simplify this case.</p><h2 id="copy-and-swap-idiom"><a href="#copy-and-swap-idiom" class="headerlink" title="copy-and-swap idiom"></a>copy-and-swap idiom</h2><p>I strongly recommend reading this answer<sup id="fnref:6" class="footnote-ref"><a href="#fn:6" rel="footnote"><span class="hint--top hint--rounded" aria-label="[What is the copy-and-swap idiom?](https://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom)">[6]</span></a></sup> carefully; it is excellent.</p><p>Thanks to the <em>copy-and-swap idiom</em>, our <em>move semantics</em> are implemented very concisely. One point is worth raising.</p><ul><li><p>Why not use <code>std::swap</code> directly on the whole object?</p><p>The generic <code>std::swap</code> is implemented with the type's own constructor and assignment operator: the copy versions before C++11, the move versions since. Our move constructor and move assignment operator are themselves implemented with <code>swap</code>, so calling <code>std::swap</code> on the whole object would recurse straight back into them. You cannot lift yourself off the ground, so we define our own member <code>swap</code>, which only applies <code>std::swap</code> to the raw <code>data</code> pointer.</p></li></ul><h2 id="overloading-deference-operators"><a href="#overloading-deference-operators" class="headerlink" title="Overloading dereference operators"></a>Overloading dereference operators</h2><p>A note on the return types of these two overloads:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><code class="hljs c++">T* <span class="hljs-keyword">operator</span>->() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> data;<br>}<br>T& <span class="hljs-keyword">operator</span>*() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> *data;<br>}<br></code></pre></td></tr></table></figure><p><code>*</code> returns a reference because we want to be able to modify the object the pointer refers to. If it returned <code>T</code> by value, <code>*ptr = 1</code> would not change the owned object: it would act on a temporary copy, or fail to compile for built-in types.</p><p>For <code>-></code>, see this answer<sup id="fnref:4" class="footnote-ref"><a href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[C++ overloading dereference operators](https://stackoverflow.com/questions/21569483/c-overloading-dereference-operators)">[4]</span></a></sup>, which explains it clearly:</p><blockquote><p><em>When overloading the structure dereference, the type should be <code>T*</code> because this operator is a special case and that is just how it works.</em></p></blockquote><h2 id="Summary"><a href="#Summary" class="headerlink" title="Summary"></a>Summary</h2><p>Using a hand-written smart pointer is not recommended under any circumstances. Frankly, implementing a class that manages resources is not easy. The implementation above already has many points to watch, yet it is still not complete (for example, it lacks the derived-type constructor mentioned earlier<sup id="fnref:5" class="footnote-ref"><a href="#fn:5" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Smart-Pointer - Constructors](https://lokiastari.com/blog/2015/01/23/c-plus-plus-by-example-smart-pointer-part-iii/index.html)">[5]</span></a></sup>).</p><p>C++ really is a somewhat scary language ...</p><h2 id="reference"><a href="#reference" class="headerlink" title="reference"></a>reference</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://lokiastari.com/blog/2015/01/15/c-plus-plus-by-example-smart-pointer-part-ii/">Smart-Pointer-Shared Pointer</a><a href="#fnref:1" 
rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://codereview.stackexchange.com/questions/163854/my-implementation-for-stdunique-ptr">My implementation for std::unique_ptr</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://stackoverflow.com/questions/4172722/what-is-the-rule-of-three">Rule of Three</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:4" class="footnote-text"><span><a href="https://stackoverflow.com/questions/21569483/c-overloading-dereference-operators">C++ overloading dereference operators</a><a href="#fnref:4" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:5" class="footnote-text"><span><a href="https://lokiastari.com/blog/2015/01/23/c-plus-plus-by-example-smart-pointer-part-iii/index.html">Smart-Pointer - Constructors</a><a href="#fnref:5" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:6" class="footnote-text"><span><a href="https://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap-idiom">What is the copy-and-swap idiom?</a><a href="#fnref:6" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>C++</category>
</categories>
<tags>
<tag>C++</tag>
<tag>interview</tag>
</tags>
</entry>
<entry>
<title>RWLock C++ implementation</title>
<link href="/2021/09/30/RWLock-C-implementation/"/>
<url>/2021/09/30/RWLock-C-implementation/</url>
<content type="html"><![CDATA[<p>Boost already provides an implementation of this; in real projects, using the Boost version will likely make the program more robust.</p><p>What follows is a fairly naive implementation whose logic is not complicated.</p><p>The member <code>readerCount</code> records the number of readers currently in the critical section, while <code>mutexReader</code> and <code>mutexWriter</code> implement the control logic.</p><p><code>WLock</code> and <code>WUnlock</code> are simple: they lock and unlock <code>mutexWriter</code> directly.</p><p><code>RLock</code> first acquires the reader lock <code>mutexReader</code>, which protects access to <code>readerCount</code>. The first reader must also acquire the write lock so that no writer can enter the critical section.</p><p><code>RUnlock</code> follows similar logic: the last reader to leave the critical section is responsible for releasing the write lock. Note that this reader may not be the thread that locked <code>mutexWriter</code>, and <code>std::mutex</code> requires <code>unlock</code> to be called by the owning thread, so a robust version would need a different primitive for the write lock.</p><p>Overall this is a reader-preferring read-write lock: it lets readers enter the critical section while a writer is waiting, so a steady stream of readers can starve writers.</p><p>The full code:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> std;<br><span class="hljs-keyword">class</span> <span class="hljs-title class_">RWLOCK</span>{<br><span 
class="hljs-keyword">private</span>:<br> <span class="hljs-type">size_t</span> readerCount;<br> mutex mutexReader, mutexWriter;<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">RWLOCK</span>():<span class="hljs-built_in">readerCount</span>(<span class="hljs-number">0</span>){<br><br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">RLock</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-function">unique_lock<mutex> <span class="hljs-title">ul</span><span class="hljs-params">(mutexReader)</span></span>;<br> readerCount++;<br> <span class="hljs-keyword">if</span>(readerCount == <span class="hljs-number">1</span>){<br> mutexWriter.<span class="hljs-built_in">lock</span>();<br> }<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">RUnlock</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-function">unique_lock<mutex> <span class="hljs-title">ul</span><span class="hljs-params">(mutexReader)</span></span>;<br> readerCount--;<br> <span class="hljs-keyword">if</span>(readerCount == <span class="hljs-number">0</span>){<br> mutexWriter.<span class="hljs-built_in">unlock</span>();<br> }<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">WLock</span><span class="hljs-params">()</span></span>{<br> mutexWriter.<span class="hljs-built_in">lock</span>();<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">WUnlock</span><span class="hljs-params">()</span></span>{<br> mutexWriter.<span class="hljs-built_in">unlock</span>();<br> }<br>};<br></code></pre></td></tr></table></figure>]]></content>
<categories>
<category>C++</category>
</categories>
<tags>
<tag>C++</tag>
<tag>interview</tag>
</tags>
</entry>
<entry>
<title>Vector C++ implementation</title>
<link href="/2021/09/29/Vector-C-implementation/"/>
<url>/2021/09/29/Vector-C-implementation/</url>
<content type="html"><![CDATA[<p>The member <code>cap_</code> records the capacity and <code>size_</code> the number of elements actually stored. <code>iniVal</code> is a <code>const</code> giving the capacity preallocated when a <code>MyVector</code> object is created, and <code>vector</code> is a pointer to the start of the element array.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-keyword">private</span>:<br> T *vector = <span class="hljs-literal">nullptr</span>;<br> <span class="hljs-type">size_t</span> cap_;<br> <span class="hljs-type">size_t</span> size_;<br> <span class="hljs-type">const</span> <span class="hljs-type">int</span> iniVal = <span class="hljs-number">20</span>;<br></code></pre></td></tr></table></figure><p>First, the constructor: it uses <code>malloc</code> to allocate a block of the requested size and stores the start address in <code>vector</code>. Because <code>malloc</code> does not run constructors, this design is only suitable for trivially copyable element types such as <code>int</code>.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-built_in">MyVector</span>(){<br> cap_ = iniVal;<br> size_ = <span class="hljs-number">0</span>;<br> vector = (T*) <span class="hljs-built_in">malloc</span>(<span class="hljs-built_in">sizeof</span> (T) * cap_);<br>}<br></code></pre></td></tr></table></figure><p>The core of the implementation is the <code>push_back</code> function:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">push_back</span><span class="hljs-params">(<span class="hljs-type">const</span> T &data)</span></span>{<br> <span class="hljs-keyword">if</span>(size_ < cap_){<br> *(vector + size_) = data;<br> size_++;<br> }<span class="hljs-keyword">else</span>{<br> vector = (T*)<span class="hljs-built_in">realloc</span>(vector, <span class="hljs-built_in">sizeof</span>(T) * cap_ * <span class="hljs-number">2</span>);<br> cap_ *= <span class="hljs-number">2</span>;<br> <span class="hljs-keyword">if</span>(vector){<br> *(vector + size_) = data;<br> size_ ++;<br> }<br> }<br>}<br></code></pre></td></tr></table></figure><p>If there is enough capacity (<code>size_ < cap_</code>), we simply assign. Otherwise we double the capacity and call <code>realloc</code><sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[cpp vector implementation]( https://www.delftstack.com/howto/cpp/cpp-vector-implementation/)">[1]</span></a></sup> to reallocate the memory. When growing, <code>realloc</code> either extends the block in place or allocates a new block, copies the existing data into it, and frees the old one.</p><p>Note that <code>realloc</code> can fail, in which case it returns <code>NULL</code> and leaves the original block untouched, which is why the result is checked before the assignment. The handling of that failure is missing here, though: by then the code has already overwritten <code>vector</code> with <code>NULL</code>, leaking the old block, and doubled <code>cap_</code>.</p><p><code>pop_back</code> is simple: it only decrements <code>size_</code> and does not touch the data. Whether or not the value at that position is still valid, the block of <code>cap_</code> elements already belongs to <code>MyVector</code>, and any element added later simply overwrites it.</p><p>This implementation has an obvious shortcoming, though: it is not all that "dynamic". Once <code>cap_</code> has grown there is no logic to shrink it, which can waste memory; this is something to improve.</p><p>The full code:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> std;<br><span class="hljs-keyword">template</span><<span class="hljs-keyword">typename</span> T><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">MyVector</span>{<br><span class="hljs-keyword">private</span>:<br> T *vector = <span class="hljs-literal">nullptr</span>;<br> <span class="hljs-type">size_t</span> cap_;<br> <span class="hljs-type">size_t</span> size_;<br> <span class="hljs-type">const</span> <span class="hljs-type">int</span> iniVal = <span class="hljs-number">20</span>;<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">MyVector</span>(){<br> cap_ = iniVal;<br> size_ = <span class="hljs-number">0</span>;<br> vector = (T*) <span class="hljs-built_in">malloc</span>(<span class="hljs-built_in">sizeof</span> (T) * cap_);<br> }<br> ~<span class="hljs-built_in">MyVector</span>(){<br> <span class="hljs-built_in">free</span>(vector);<br> }<br> T& <span 
class="hljs-keyword">operator</span>[](<span class="hljs-type">size_t</span> pos){<br> <span class="hljs-keyword">return</span> *(<span class="hljs-keyword">this</span>->vector + pos);<br> }<br> <span class="hljs-function"><span class="hljs-type">size_t</span> <span class="hljs-title">size</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-keyword">return</span> size_;<br> }<br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">push_back</span><span class="hljs-params">(<span class="hljs-type">const</span> T &data)</span></span>{<br> <span class="hljs-keyword">if</span>(size_ < cap_){<br> *(vector + size_) = data;<br> size_++;<br> }<span class="hljs-keyword">else</span>{<br> vector = (T*)<span class="hljs-built_in">realloc</span>(vector, <span class="hljs-built_in">sizeof</span>(T) * cap_ * <span class="hljs-number">2</span>);<br> cap_ *= <span class="hljs-number">2</span>;<br> <span class="hljs-keyword">if</span>(vector){<br> *(vector + size_) = data;<br> size_ ++;<br> }<br> }<br> }<br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">pop_back</span><span class="hljs-params">()</span></span>{<br> size_--;<br> }<br>};<br></code></pre></td></tr></table></figure><h2 id="参考"><a href="#参考" class="headerlink" title="References"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://www.delftstack.com/howto/cpp/cpp-vector-implementation/">cpp vector implementation</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="http://c.biancheng.net/cpp/html/2859.html">realloc</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>C++</category>
</categories>
<tags>
<tag>C++</tag>
<tag>interview</tag>
</tags>
</entry>
<entry>
<title>Sharedptr C++ implementation</title>
<link href="/2021/09/29/Sharedptr-C-implementation/"/>
<url>/2021/09/29/Sharedptr-C-implementation/</url>
<content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="Preface"></a>Preface</h2><p>Looking back at my first version of <code>shared_ptr</code>, it had quite a few problems: for example, it did not consider <em>constructor failure</em>, and it did not use the <em>copy-and-swap idiom</em> to remove <em>code duplication</em>.</p><p>Here I keep the original version and add notes that correct it.</p><h2 id="Old-Version"><a href="#Old-Version" class="headerlink" title="Old Version"></a>Old Version</h2><p>The member <code>ptr</code> holds the shared pointer and <code>refCount</code> is the counter. Note that <code>refCount</code> is itself a pointer, so that multiple shared pointers share a single count (this is essential).</p><p>For the copy constructor, note that before incrementing <code>refCount</code> we must check whether the object being copied is empty:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-comment">// copy constructor</span><br><span class="hljs-built_in">my_shared_ptr</span>(<span class="hljs-type">const</span> my_shared_ptr & obj){<br> <span class="hljs-keyword">this</span>->ptr = obj.ptr;<br> <span class="hljs-keyword">this</span>->refCount = obj.refCount; <span class="hljs-comment">// share refCount;</span><br> <span class="hljs-keyword">if</span>(obj.ptr != <span class="hljs-literal">nullptr</span>){<br> (*<span class="hljs-keyword">this</span>->refCount) ++;<br> }<br>}<br></code></pre></td></tr></table></figure><p>For the copy assignment operator, note first that it returns a reference (the convention for copy assignment), and second that, unlike the copy constructor, it calls <code>__cleanup__</code> to release the object previously managed by this shared pointer:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-comment">// copy assignment</span><br>my_shared_ptr& <span class="hljs-keyword">operator</span>=(<span class="hljs-type">const</span> my_shared_ptr &obj){<br> <span class="hljs-keyword">if</span>(<span class="hljs-keyword">this</span> == &obj) <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>; <span class="hljs-comment">// self-assignment: nothing to do</span><br> <span class="hljs-comment">// clean up existing data</span><br> __cleanup__();<br> <span class="hljs-comment">// assigning new obj's data to this obj</span><br> <span class="hljs-keyword">this</span>->ptr = obj.ptr;<br> <span class="hljs-keyword">this</span>->refCount = obj.refCount;<br> <span class="hljs-keyword">if</span>(obj.ptr != <span class="hljs-literal">nullptr</span>){<br> (*<span class="hljs-keyword">this</span>->refCount) ++;<br> }<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br></code></pre></td></tr></table></figure><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-type">void</span> __cleanup__(){<br> (*refCount)--;<br> <span class="hljs-keyword">if</span>(*refCount == <span class="hljs-number">0</span>){<br> <span class="hljs-comment">// important</span><br> <span class="hljs-keyword">if</span>(ptr != <span class="hljs-literal">nullptr</span>){<br> <span class="hljs-keyword">delete</span> ptr;<br> }<br> <span class="hljs-keyword">delete</span> refCount;<br> }<br>}<br></code></pre></td></tr></table></figure><p><code>__cleanup__</code> decrements the count and, once it reaches 0, uses <code>delete</code> to release the resources. It also checks whether <code>ptr</code> is null before <code>delete ptr</code>. Strictly speaking, <code>delete</code> on a null pointer is a well-defined no-op, so this check is defensive rather than required; it makes explicit that in the scenario below <code>msp</code> owns no object when it is cleaned up:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs c++">my_shared_ptr<SomeClass> msp;<br><span class="hljs-function">my_shared_ptr<SomeClass> <span class="hljs-title">another_msp</span><span class="hljs-params">(<span class="hljs-keyword">new</span> SomeClass)</span></span><br><span class="hljs-function">msp </span>= another_msp<br></code></pre></td></tr></table></figure><p>下面再使得<code>my_shared_ptr</code>支持<code>move</code>语义:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-comment">/*** Move Semantics ***/</span><br><span class="hljs-comment">// move constructor</span><br><span class="hljs-built_in">my_shared_ptr</span>(my_shared_ptr && dyingObj){<br> <span class="hljs-keyword">this</span>->ptr = dyingObj.ptr;<br> <span class="hljs-keyword">this</span>->refCount = dyingObj.refCount;<br> <span class="hljs-comment">// clean up dyingObj to avoid the moved value being destructed</span><br> dyingObj.ptr = dyingObj.refCount = <span class="hljs-literal">nullptr</span>;<br>}<br><br><span class="hljs-comment">// move assignment</span><br>my_shared_ptr& <span class="hljs-keyword">operator</span>=(my_shared_ptr &&dyingObj){<br> <span class="hljs-comment">// clean up existing data</span><br> __cleanup__();<br> <span class="hljs-comment">// assign new data</span><br> <span 
class="hljs-keyword">this</span>->ptr = dyingObj.ptr;<br> <span class="hljs-keyword">this</span>->refCount = dyingObj.refCount;<br> dyingObj.ptr = <span class="hljs-literal">nullptr</span>; dyingObj.refCount = <span class="hljs-literal">nullptr</span>; <span class="hljs-comment">// clean up</span><br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br></code></pre></td></tr></table></figure><p>We also need to overload a few operators so that the shared pointer can be used just like the raw pointer it manages.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs c++"> <span class="hljs-comment">// overload -> and *</span><br>T* <span class="hljs-keyword">operator</span>->() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> <span class="hljs-keyword">this</span>->ptr;<br>}<br><br>T& <span class="hljs-keyword">operator</span>*() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>->ptr;<br>}<br><br></code></pre></td></tr></table></figure><p>The complete code is as follows:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span 
class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> std;<br><span class="hljs-keyword">typedef</span> <span class="hljs-type">unsigned</span> <span class="hljs-type">int</span> uint;<br><span class="hljs-keyword">template</span><<span class="hljs-keyword">class</span> <span class="hljs-title class_">T</span>><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">my_shared_ptr</span>{<br><span class="hljs-keyword">private</span>:<br> T *ptr = <span class="hljs-literal">nullptr</span>;<br> <span class="hljs-comment">// refCount should be a pointer</span><br> uint *refCount = <span class="hljs-literal">nullptr</span>;<br> <span class="hljs-type">void</span> __cleanup__(){<br> <span class="hljs-keyword">if</span>(refCount == <span class="hljs-literal">nullptr</span>) <span class="hljs-keyword">return</span>; <span class="hljs-comment">// a moved-from object owns nothing</span><br> <span class="hljs-keyword">if</span>(--(*refCount) == <span class="hljs-number">0</span>){<br> <span class="hljs-comment">// important</span><br> <span class="hljs-keyword">if</span>(ptr != <span class="hljs-literal">nullptr</span>){<br> <span class="hljs-keyword">delete</span> ptr;<br> }<br> <span class="hljs-keyword">delete</span> refCount;<br> }<br> }<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-comment">// default constructor</span><br> <span class="hljs-built_in">my_shared_ptr</span>():<span class="hljs-built_in">ptr</span>(<span class="hljs-literal">nullptr</span>), <span class="hljs-built_in">refCount</span>(<span class="hljs-keyword">new</span> <span class="hljs-built_in">uint</span>(<span class="hljs-number">0</span>)){<br><br> }<br><br> <span class="hljs-comment">// constructor</span><br> <span class="hljs-built_in">my_shared_ptr</span>(T* ptr):<span class="hljs-built_in">ptr</span>(ptr), <span class="hljs-built_in">refCount</span>(<span class="hljs-keyword">new</span> <span 
class="hljs-built_in">uint</span>(<span class="hljs-number">1</span>)){<br><br> }<br><br> <span class="hljs-comment">/*** Copy Semantics ***/</span><br> <span class="hljs-comment">// copy constructor</span><br> <span class="hljs-built_in">my_shared_ptr</span>(<span class="hljs-type">const</span> my_shared_ptr & obj){<br> <span class="hljs-keyword">this</span>->ptr = obj.ptr;<br> <span class="hljs-keyword">this</span>->refCount = obj.refCount; <span class="hljs-comment">// share refCount;</span><br> <span class="hljs-keyword">if</span>(obj.ptr != <span class="hljs-literal">nullptr</span>){<br> (*<span class="hljs-keyword">this</span>->refCount) ++;<br> }<br> }<br><br> <span class="hljs-comment">// copy assignment</span><br> <span class="hljs-comment">// return a reference so that assignments can be chained (a = b = c)</span><br> my_shared_ptr& <span class="hljs-keyword">operator</span>=(<span class="hljs-type">const</span> my_shared_ptr &obj){<br> <span class="hljs-keyword">if</span>(<span class="hljs-keyword">this</span> == &obj) <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>; <span class="hljs-comment">// self-assignment: nothing to do</span><br> __cleanup__(); <span class="hljs-comment">// clean up existing data</span><br> <span class="hljs-keyword">this</span>->ptr = obj.ptr; <span class="hljs-comment">// assign obj's data to this object</span><br> <span class="hljs-keyword">this</span>->refCount = obj.refCount;<br> <span class="hljs-keyword">if</span>(obj.ptr != <span class="hljs-literal">nullptr</span>){<br> (*<span class="hljs-keyword">this</span>->refCount) ++;<br> }<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br> }<br><br> <span class="hljs-comment">/*** Move Semantics ***/</span><br> <span class="hljs-comment">// move constructor</span><br> <span class="hljs-built_in">my_shared_ptr</span>(my_shared_ptr && dyingObj){<br> <span class="hljs-keyword">this</span>->ptr = dyingObj.ptr;<br> <span class="hljs-keyword">this</span>->refCount = dyingObj.refCount;<br> dyingObj.ptr = <span class="hljs-literal">nullptr</span>; <span class="hljs-comment">// reset dyingObj so that its destructor</span><br> dyingObj.refCount = <span class="hljs-literal">nullptr</span>; <span class="hljs-comment">// does not release the moved value</span><br> }<br><br> <span class="hljs-comment">// move 
assignment</span><br> my_shared_ptr& <span class="hljs-keyword">operator</span>=(my_shared_ptr &&dyingObj){<br> <span class="hljs-keyword">if</span>(<span class="hljs-keyword">this</span> == &dyingObj) <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>; <span class="hljs-comment">// self-move: nothing to do</span><br> __cleanup__(); <span class="hljs-comment">// clean up existing data</span><br> <span class="hljs-keyword">this</span>->ptr = dyingObj.ptr; <span class="hljs-comment">// assign new data</span><br> <span class="hljs-keyword">this</span>->refCount = dyingObj.refCount;<br> dyingObj.ptr = <span class="hljs-literal">nullptr</span>; <span class="hljs-comment">// clean up dyingObj</span><br> dyingObj.refCount = <span class="hljs-literal">nullptr</span>;<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br> }<br><br> <span class="hljs-comment">// overload -> and *</span><br> T* <span class="hljs-keyword">operator</span>->() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> <span class="hljs-keyword">this</span>->ptr;<br> }<br><br> T& <span class="hljs-keyword">operator</span>*() <span class="hljs-type">const</span>{<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>->ptr;<br> }<br><br> <span class="hljs-function">uint <span class="hljs-title">get_count</span><span class="hljs-params">()</span> <span class="hljs-type">const</span></span>{<br> <span class="hljs-keyword">return</span> refCount ? *refCount : <span class="hljs-number">0</span>;<br> }<br> ~<span class="hljs-built_in">my_shared_ptr</span>(){<br> __cleanup__();<br> }<br>};<br></code></pre></td></tr></table></figure><h2 id="New-Version"><a href="#New-Version" class="headerlink" title="New Version"></a>New Version</h2><p>For the concepts mentioned below, see my blog post on UniquePtr; for the implementation, see<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Smart-Pointer - Shared Pointer](https://lokiastari.com/blog/2015/01/15/c-plus-plus-by-example-smart-pointer-part-ii/index.html)">[2]</span></a></sup></p><h3 id="cleanup"><a href="#cleanup" class="headerlink" title="cleanup"></a>cleanup</h3><p>First, <code>__cleanup__</code> checks whether <code>ptr</code> is null before calling <code>delete</code>. This check is redundant, because <code>delete nullptr</code> is safe: deleting a null pointer does nothing.</p><h3 
id="Constructor-Failure"><a href="#Constructor-Failure" class="headerlink" title="Constructor Failure"></a>Constructor Failure</h3><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-comment">// default constructor</span><br><span class="hljs-built_in">my_shared_ptr</span>():<span class="hljs-built_in">ptr</span>(<span class="hljs-literal">nullptr</span>), <span class="hljs-built_in">refCount</span>(<span class="hljs-keyword">new</span> <span class="hljs-built_in">uint</span>(<span class="hljs-number">0</span>)){<br><br>}<br><br><span class="hljs-comment">// constructor</span><br><span class="hljs-built_in">my_shared_ptr</span>(T* ptr):<span class="hljs-built_in">ptr</span>(ptr), <span class="hljs-built_in">refCount</span>(<span class="hljs-keyword">new</span> <span class="hljs-built_in">uint</span>(<span class="hljs-number">1</span>)){<br><br>}<br></code></pre></td></tr></table></figure><p>The constructors allocate memory with <code>new</code>, and <code>new</code> throws <code>std::bad_alloc</code> when the allocation fails. If an exception escapes from a constructor, the destructor is not called, so the object passed in as <code>ptr</code> is never deleted and memory may leak. We need to fix this:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function"><span class="hljs-keyword">explicit</span> <span class="hljs-title">my_shared_ptr</span><span class="hljs-params">(T* ptr)</span>: ptr(ptr), refCount(new 
(std::nothrow) uint(<span class="hljs-number">1</span>)){</span><br> <span class="hljs-comment">// check whether the counter was allocated</span><br> <span class="hljs-keyword">if</span>(refCount == <span class="hljs-literal">nullptr</span>){<br> <span class="hljs-comment">// If we failed then delete the pointer</span><br> <span class="hljs-comment">// and manually throw the exception</span><br> <span class="hljs-keyword">delete</span> ptr;<br> <span class="hljs-keyword">throw</span> std::<span class="hljs-built_in">bad_alloc</span>();<br> }<br>}<br></code></pre></td></tr></table></figure><p>Here we use the nothrow version of <code>new</code>: when the allocation fails, it returns <code>nullptr</code> instead of throwing <code>std::bad_alloc</code>. In the constructor we must detect this case, release the resource, and then throw the exception ourselves (throwing inside the constructor after cleaning up is different from letting an exception escape the constructor with no cleanup).</p><p>We also mark the constructor <code>explicit</code> to prevent implicit conversions. Because of <code>explicit</code>, to handle <code>nullptr</code> and keep the class easy to use, we can follow the earlier <code>MyUniquePtr</code> implementation and add a constructor and an assignment operator that take a <code>nullptr_t</code> parameter.</p><p>With <code>nullptr</code> handled separately, some of the checks for a null pointer before incrementing the count can be removed.</p><h3 id="copy-and-swap-idiom"><a href="#copy-and-swap-idiom" class="headerlink" title="copy-and-swap idiom"></a>copy-and-swap idiom</h3><p>The <em>copy-and-swap idiom</em> reduces duplicated code while also providing the <em>strong exception guarantee</em>.</p><p>First we need to implement our own <code>swap</code> function:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">swap</span><span class="hljs-params">(my_shared_ptr& other)</span> <span class="hljs-keyword">noexcept</span></span>{<br> std::<span class="hljs-built_in">swap</span>(ptr, other.ptr);<br> std::<span class="hljs-built_in">swap</span>(refCount, other.refCount);<br>}<br></code></pre></td></tr></table></figure><p>Next, we rewrite the <em>copy semantics</em> and <em>move semantics</em> using <code>swap</code>:</p><figure 
class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-comment">/*** Copy Semantics ***/</span><br><span class="hljs-comment">// copy constructor</span><br><span class="hljs-built_in">my_shared_ptr</span>(<span class="hljs-type">const</span> my_shared_ptr & obj):<span class="hljs-built_in">ptr</span>(obj.ptr), <span class="hljs-built_in">refCount</span>(obj.refCount){<br> (*<span class="hljs-keyword">this</span>->refCount) ++;<br>}<br><span class="hljs-comment">// copy assignment</span><br>my_shared_ptr& <span class="hljs-keyword">operator</span>=(my_shared_ptr obj){<br>obj.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br><br>my_shared_ptr& <span class="hljs-keyword">operator</span>=(T* newData){<br> my_shared_ptr <span class="hljs-built_in">tmp</span>(newData);<br> tmp.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br> <span 
class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br><br><br><span class="hljs-comment">/*** Move Semantics ***/</span><br><span class="hljs-comment">// move constructor</span><br><span class="hljs-built_in">my_shared_ptr</span>(my_shared_ptr && dyingObj){<br> dyingObj.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br>}<br><br><span class="hljs-comment">// move assignment</span><br>my_shared_ptr& <span class="hljs-keyword">operator</span>=(my_shared_ptr &&dyingObj){<br>dyingObj.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br></code></pre></td></tr></table></figure><p>One detail worth noting is that the copy assignment operator in the <em>copy semantics</em> takes its parameter <em>by value</em>. A by-value <code>operator=</code> already accepts rvalues, and declaring it together with the <code>my_shared_ptr&&</code> overload makes assignment from an rvalue ambiguous, so keep only one of the two. If you do want to take a reference, use the following code:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><code class="hljs c++">my_shared_ptr& <span class="hljs-keyword">operator</span>=(<span class="hljs-type">const</span> my_shared_ptr &obj){<br> my_shared_ptr <span class="hljs-built_in">tmp</span>(obj); <span class="hljs-comment">// use copy constructor to build a temporary object</span><br>tmp.<span class="hljs-built_in">swap</span>(*<span class="hljs-keyword">this</span>);<br> <span class="hljs-keyword">return</span> *<span class="hljs-keyword">this</span>;<br>}<br></code></pre></td></tr></table></figure><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://medium.com/analytics-vidhya/c-shared-ptr-and-how-to-write-your-own-d0d385c118ad">write your own shared_ptr</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a 
href="https://lokiastari.com/blog/2015/01/15/c-plus-plus-by-example-smart-pointer-part-ii/index.html">Smart-Pointer - Shared Pointer</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>C++</category>
</categories>
<tags>
<tag>C++</tag>
<tag>interview</tag>
</tags>
</entry>
<entry>
<title>BoundedBuffer C++ implementation</title>
<link href="/2021/09/29/BoundedBuffer-C-implementation/"/>
<url>/2021/09/29/BoundedBuffer-C-implementation/</url>
    <content type="html"><![CDATA[<p>This post describes a C++ implementation of <code>BoundedBuffer</code>, a structure commonly used in the producer-consumer pattern.</p><p>The Boost <code>circular_buffer</code> documentation shows how to build a <code>BoundedBuffer</code> on top of <code>circular_buffer</code><sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[boost circular_buffer](https://www.boost.org/doc/libs/1_37_0/libs/circular_buffer/doc/circular_buffer.html)">[1]</span></a></sup>. I rewrote the Boost example; the code is as follows:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span 
class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><mutex></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><thread></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><iostream></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><vector></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><condition_variable></span></span><br><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> 
std;<br><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">BoundedBuffer</span>{<br><span class="hljs-keyword">private</span>:<br> <span class="hljs-type">size_t</span> begin_;<br> <span class="hljs-type">size_t</span> end_;<br> <span class="hljs-type">size_t</span> buffered_;<br> vector<<span class="hljs-type">int</span>> circular_buffer_;<br> condition_variable not_full_cv_;<br> condition_variable not_empty_cv_;<br> mutex mutex_;<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">BoundedBuffer</span>(<span class="hljs-type">size_t</span> size): <span class="hljs-built_in">begin_</span>(<span class="hljs-number">0</span>), <span class="hljs-built_in">end_</span>(<span class="hljs-number">0</span>), <span class="hljs-built_in">buffered_</span>(<span class="hljs-number">0</span>), <span class="hljs-built_in">circular_buffer_</span>(size){<br> circular_buffer_.<span class="hljs-built_in">reserve</span>(size);<br> }<br> <span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">produceData</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-type">int</span> randomNumber = <span class="hljs-built_in">rand</span>() % <span class="hljs-number">10000</span>;<br> cout << <span class="hljs-string">"produce data : "</span> << randomNumber << endl;<br> <span class="hljs-keyword">return</span> randomNumber;<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">Produce</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-keyword">while</span>(<span class="hljs-literal">true</span>) {<br> <span class="hljs-function">unique_lock<std::mutex> <span class="hljs-title">ul</span><span class="hljs-params">(mutex_)</span></span>;<br> not_full_cv_.<span class="hljs-built_in">wait</span>(ul, [=] { <span class="hljs-keyword">return</span> buffered_ < circular_buffer_.<span class="hljs-built_in">size</span>(); });<br> 
circular_buffer_[end_] = <span class="hljs-built_in">produceData</span>();<br> end_ = (end_ + <span class="hljs-number">1</span>) % circular_buffer_.<span class="hljs-built_in">size</span>();<br> ++buffered_;<br> ul.<span class="hljs-built_in">unlock</span>();<br> not_empty_cv_.<span class="hljs-built_in">notify_one</span>();<br> }<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">Consume</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {<br> <span class="hljs-function">unique_lock<mutex> <span class="hljs-title">ul</span><span class="hljs-params">(mutex_)</span></span>;<br> not_empty_cv_.<span class="hljs-built_in">wait</span>(ul, [&]() { <span class="hljs-keyword">return</span> buffered_ > <span class="hljs-number">0</span>; });<br> <span class="hljs-type">int</span> n = circular_buffer_[begin_];<br> cout << <span class="hljs-string">"consume data : "</span> << n << endl;<br> begin_ = (begin_ + <span class="hljs-number">1</span>) % circular_buffer_.<span class="hljs-built_in">size</span>();<br> --buffered_;<br> ul.<span class="hljs-built_in">unlock</span>();<br> not_full_cv_.<span class="hljs-built_in">notify_one</span>();<br> }<br> }<br>};<br><br><span class="hljs-function">BoundedBuffer <span class="hljs-title">boundedBuffer</span><span class="hljs-params">(<span class="hljs-number">4</span>)</span></span>;<br><br><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">ConsumerThread</span><span class="hljs-params">()</span></span>{<br> boundedBuffer.<span class="hljs-built_in">Consume</span>();<br>}<br><br><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">ProducerThread</span><span class="hljs-params">()</span></span>{<br> boundedBuffer.<span class="hljs-built_in">Produce</span>();<br>}<br><br><span class="hljs-function"><span 
class="hljs-type">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-type">void</span>)</span></span>{<br> <span class="hljs-function">thread <span class="hljs-title">t2</span><span class="hljs-params">(ProducerThread)</span></span>;<br> <span class="hljs-function">thread <span class="hljs-title">t1</span><span class="hljs-params">(ConsumerThread)</span></span>;<br> t1.<span class="hljs-built_in">join</span>();<br> t2.<span class="hljs-built_in">join</span>();<br> <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>;<br>}<br><span class="hljs-comment">// another version of ConsumerThread and ProducerThread</span><br><span class="hljs-comment">// void ProducerThread(BoundedBuffer& bb){</span><br><span class="hljs-comment">// bb.Produce();</span><br><span class="hljs-comment">// }</span><br><span class="hljs-comment">// void ConsumerThread(BoundedBuffer& bb){</span><br><span class="hljs-comment">// bb.Consume();</span><br><span class="hljs-comment">// }</span><br><span class="hljs-comment">// int main(void){</span><br><span class="hljs-comment">// BoundedBuffer boundBuffer(4);</span><br><span class="hljs-comment">// thread t1(ProducerThread, ref(boundBuffer)), t2(ConsumerThread, ref(boundBuffer));</span><br><span class="hljs-comment">// t1.join();</span><br><span class="hljs-comment">// t2.join();</span><br><span class="hljs-comment">// return 0;</span><br><span class="hljs-comment">// }</span><br></code></pre></td></tr></table></figure><p>The implementation defines two <code>condition_variable</code> members: <code>not_full_cv_</code> and <code>not_empty_cv_</code>.</p><p>A simpler version with a single <code>condition_variable</code> also works when there is one producer and one consumer. With several producers or consumers, use <code>notify_all</code> instead of <code>notify_one</code>, so that a wake-up is not consumed by a thread of the same kind.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">Produce</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-keyword">while</span>(<span class="hljs-literal">true</span>) {<br> <span class="hljs-function">unique_lock<std::mutex> <span class="hljs-title">ul</span><span class="hljs-params">(mutex_)</span></span>;<br> cv.<span class="hljs-built_in">wait</span>(ul, [=] { <span class="hljs-keyword">return</span> buffered_ < circular_buffer_.<span class="hljs-built_in">size</span>(); });<br> circular_buffer_[end_] = <span class="hljs-built_in">produceData</span>();<br> end_ = (end_ + <span class="hljs-number">1</span>) % circular_buffer_.<span class="hljs-built_in">size</span>();<br> ++buffered_;<br> ul.<span class="hljs-built_in">unlock</span>();<br> cv.<span class="hljs-built_in">notify_one</span>();<br> }<br>}<br><br><span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">Consume</span><span class="hljs-params">()</span></span>{<br> <span class="hljs-keyword">while</span> (<span class="hljs-literal">true</span>) {<br> <span class="hljs-function">unique_lock<mutex> <span class="hljs-title">ul</span><span class="hljs-params">(mutex_)</span></span>;<br> cv.<span class="hljs-built_in">wait</span>(ul, [&]() { <span class="hljs-keyword">return</span> buffered_ > <span class="hljs-number">0</span>; });<br> <span 
class="hljs-type">int</span> n = circular_buffer_[begin_];<br> cout << <span class="hljs-string">"consume data : "</span> << n << endl;<br> begin_ = (begin_ + <span class="hljs-number">1</span>) % circular_buffer_.<span class="hljs-built_in">size</span>();<br> --buffered_;<br> ul.<span class="hljs-built_in">unlock</span>();<br> cv.<span class="hljs-built_in">notify_one</span>();<br> }<br>}<br></code></pre></td></tr></table></figure><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://www.boost.org/doc/libs/1_37_0/libs/circular_buffer/doc/circular_buffer.html">boost circular_buffer</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<categories>
<category>C++</category>
</categories>
<tags>
<tag>C++</tag>
<tag>interview</tag>
</tags>
</entry>
<entry>
    <title>Why you should not call virtual functions in constructors and destructors</title>
<link href="/2021/09/14/%E4%B8%BA%E4%BB%80%E4%B9%88%E4%B8%8D%E5%BA%94%E8%AF%A5%E5%9C%A8%E6%9E%84%E9%80%A0%E5%92%8C%E6%9E%90%E6%9E%84%E5%87%BD%E6%95%B0%E4%B8%AD%E8%B0%83%E7%94%A8%E8%99%9A%E5%87%BD%E6%95%B0/"/>
<url>/2021/09/14/%E4%B8%BA%E4%BB%80%E4%B9%88%E4%B8%8D%E5%BA%94%E8%AF%A5%E5%9C%A8%E6%9E%84%E9%80%A0%E5%92%8C%E6%9E%90%E6%9E%84%E5%87%BD%E6%95%B0%E4%B8%AD%E8%B0%83%E7%94%A8%E8%99%9A%E5%87%BD%E6%95%B0/</url>
    <content type="html"><![CDATA[<p>This comes from Item 9 of <em>Effective C++</em>:</p><div class="note note-info"> <p>Never call virtual functions during construction or destruction</p> </div><p>The main reason is that constructors and destructors run in a special order:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><iostream></span></span><br><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> std;<br><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">Base</span>{<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">Base</span>(){<br> cout << <span class="hljs-string">"Call Base::constructor"</span> << endl;<br> }<br><br> <span class="hljs-keyword">virtual</span> ~<span class="hljs-built_in">Base</span>(){<br> cout << <span class="hljs-string">"Call Base::destructor"</span> << endl;<br> }<br>};<br><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">Derived</span> : <span 
class="hljs-keyword">public</span> Base{<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">Derived</span>(){<br> cout << <span class="hljs-string">"Call Derived::constructor"</span> << endl;<br> }<br> <span class="hljs-keyword">virtual</span> ~<span class="hljs-built_in">Derived</span>(){<br> cout << <span class="hljs-string">"Call Derived::destructor"</span> << endl;<br> }<br>};<br><br><span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-type">void</span>)</span></span>{<br> Base* d = <span class="hljs-keyword">new</span> Derived;<br> <span class="hljs-keyword">delete</span> d;<br>}<br></code></pre></td></tr></table></figure><p>结果如下:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><code class="hljs t">D:\Desktop\Study\course\cpp_wkspc\Leetcode\cmake-build-debug\Leetcode.exe<br>Call Base::constructor<br>Call Derived::constructor<br>Call Derived::destructor<br>Call Base::destructor<br><br>Process finished with exit code 0<br></code></pre></td></tr></table></figure><p>子类<code>Derived</code>对象构造时:</p><ul><li>先调用基类<code>Base</code>构造函数</li><li>再调用<code>Derived</code>构造函数</li></ul><p>而<code>Derived</code>对象析构时:</p><ul><li>先调用子类<code>Derived</code>析构函数</li><li>再调用基类<code>Base</code>析构函数</li></ul><p>所以如果在构造函数中调用虚函数:</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-keyword">class</span> <span class="hljs-title class_">Base</span>{<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">Base</span>(){<br> <span class="hljs-built_in">TestConstruct</span>();<br> cout << <span class="hljs-string">"Call Base::constructor"</span> << endl;<br> }<br> <span class="hljs-function"><span class="hljs-keyword">virtual</span> <span class="hljs-type">void</span> <span class="hljs-title">TestConstruct</span><span class="hljs-params">()</span></span>{<br> cout << <span class="hljs-string">"Call Base::TestConstruct"</span> << endl;<br> }<br> <span class="hljs-function"><span class="hljs-keyword">virtual</span> <span class="hljs-type">void</span> <span class="hljs-title">TestDestruct</span><span class="hljs-params">()</span></span>{<br> cout << <span class="hljs-string">"Call 
Base::TestDestruct"</span> << endl;<br> };<br> <span class="hljs-keyword">virtual</span> ~<span class="hljs-built_in">Base</span>(){<br> <span class="hljs-built_in">TestDestruct</span>();<br> cout << <span class="hljs-string">"Call Base::destructor"</span> << endl;<br> }<br>};<br><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">Derived</span> : <span class="hljs-keyword">public</span> Base{<br><span class="hljs-keyword">public</span>:<br> <span class="hljs-built_in">Derived</span>(){<br> <span class="hljs-built_in">TestConstruct</span>();<br> cout << <span class="hljs-string">"Call Derived::constructor"</span> << endl;<br> }<br> <span class="hljs-function"><span class="hljs-keyword">virtual</span> <span class="hljs-type">void</span> <span class="hljs-title">TestConstruct</span><span class="hljs-params">()</span></span>{<br> cout << <span class="hljs-string">"Call Derived::TestConstruct"</span> << endl;<br> }<br> <span class="hljs-function"><span class="hljs-keyword">virtual</span> <span class="hljs-type">void</span> <span class="hljs-title">TestDestruct</span><span class="hljs-params">()</span></span>{<br> cout << <span class="hljs-string">"Call Derived::TestDestruct"</span> << endl;<br> };<br> <span class="hljs-keyword">virtual</span> ~<span class="hljs-built_in">Derived</span>(){<br> <span class="hljs-built_in">TestDestruct</span>();<br> cout << <span class="hljs-string">"Call Derived::destructor"</span> << endl;<br> }<br>};<br><br><span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-type">void</span>)</span></span>{<br> Base* d1 = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Base</span>();<br> <span class="hljs-keyword">delete</span> d1;<br> cout << <span class="hljs-string">"-----------------------------------"</span><<endl;<br> Base* d2 = <span class="hljs-keyword">new</span> <span 
class="hljs-built_in">Derived</span>();<br> <span class="hljs-keyword">delete</span> d2;<br>}<br></code></pre></td></tr></table></figure><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><code class="hljs t">D:\Desktop\Study\course\cpp_wkspc\Leetcode\cmake-build-debug\Leetcode.exe<br>Call Base::TestConstruct<br>Call Base::constructor<br>Call Base::TestDestruct<br>Call Base::destructor<br>-----------------------------------<br>Call Base::TestConstruct<br>Call Base::constructor<br>Call Derived::TestConstruct<br>Call Derived::constructor<br>Call Derived::TestDestruct<br>Call Derived::destructor<br>Call Base::TestDestruct<br>Call Base::destructor<br><br>Process finished with exit code 0<br></code></pre></td></tr></table></figure><p>So if we hoped that a virtual call made inside a constructor or destructor would dispatch to different versions for the base class and the derived class, we will be disappointed: each constructor and destructor calls its own class's version, so creating and destroying a <code>Derived</code> object ends up running both versions of the virtual function.</p><p>The logic behind this behavior is that during construction of the <code>Base class</code> part of a <code>Derived</code> object, the type of the object is <code>Base</code>, not <code>Derived</code>, so the virtual function call resolves to <code>Base</code>. This is quite reasonable: most virtual functions in <code>Derived</code> use member variables that belong to the <code>Derived</code> part, and those members have not been initialized yet at that point, so using them would be risky.</p><p>The same holds for destructors: once execution enters the <code>Base</code> destructor, the object is treated as a <code>Base</code> rather than a <code>Derived</code>. Because the <code>Derived</code> destructor runs before the <code>Base</code> destructor, by the time the <code>Base</code> destructor is entered the member variables of the <code>Derived</code> part have already been destroyed and hold undefined values, so they should not be used either.</p>]]></content>
<tags>
<tag>C++</tag>
<tag>Effective C++</tag>
<tag>构造函数</tag>
<tag>析构函数</tag>
<tag>虚函数</tag>
</tags>
</entry>
<entry>
<title>Is KMP always faster?</title>
<link href="/2021/09/07/Is-KMP-always-faster/"/>
<url>/2021/09/07/Is-KMP-always-faster/</url>
<content type="html"><![CDATA[<p>This post compares KMP with the <code>find</code> method of C++ <code>string</code> on the problem of checking whether a is a substring of b.</p><p>Even with string lengths on the order of 10<sup>4</sup> and 10<sup>5</sup> test cases, KMP still performed worse than <code>find</code>.</p><p>First, in terms of time complexity, KMP undoubtedly beats the brute-force matching of <code>find</code>. However, KMP spends a fair amount of time building the next table, and its advantage does not show while the strings are not long enough.</p><p>Moreover, in my test scenario the next table is used for only a single test case after it is built, which is clearly wasteful.</p><p>The reason <code>find</code> is implemented with brute-force matching is that it targets more common scenarios, and KMP also has to allocate extra memory to build the next table.</p><figure class="highlight c++"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span 
class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span 
class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br></pre></td><td class="code"><pre><code class="hljs c++"><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><iostream></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><string></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><regex></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><ctime></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><unistd.h></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">include</span> <span class="hljs-string"><chrono></span></span><br><span class="hljs-meta">#<span class="hljs-keyword">define</span> CASESNUM 100000</span><br><span class="hljs-keyword">using</span> <span class="hljs-keyword">namespace</span> std;<br><span class="hljs-function"><span class="hljs-type">bool</span> <span class="hljs-title">checkSubstring</span><span class="hljs-params">(string a, string b)</span></span>;<br>vector<pair<string, string>> <span class="hljs-built_in">generateCases</span>(<span class="hljs-type">int</span> len);<br><br><span class="hljs-keyword">class</span> <span class="hljs-title class_">Solution</span>{<br><span 
class="hljs-keyword">public</span>:<br> vector<<span class="hljs-type">int</span>> next;<br> string p;<br> <span class="hljs-type">int</span> m;<br> <span class="hljs-built_in">Solution</span>(string pattern){<br> p = pattern;<br> m = p.<span class="hljs-built_in">size</span>();<br> p.<span class="hljs-built_in">insert</span>(p.<span class="hljs-built_in">begin</span>(),<span class="hljs-string">' '</span>);<br> next.<span class="hljs-built_in">resize</span>(m + <span class="hljs-number">1</span>);<br> <span class="hljs-comment">// Precompute the next array (resize rather than reserve, so next[i] refers to a real, zero-initialized element)</span><br> <span class="hljs-keyword">for</span>(<span class="hljs-type">int</span> i = <span class="hljs-number">2</span>, j = <span class="hljs-number">0</span>; i <= m; i++){<br> <span class="hljs-keyword">while</span>(j <span class="hljs-keyword">and</span> p[i] != p[j + <span class="hljs-number">1</span>]) j = next[j];<br> <span class="hljs-keyword">if</span>(p[i] == p[j + <span class="hljs-number">1</span>]) j++;<br> next[i] = j;<br> }<br> }<br><br> <span class="hljs-function"><span class="hljs-type">bool</span> <span class="hljs-title">kmpCheckSubstring</span><span class="hljs-params">(string s)</span></span>{<br> <span class="hljs-type">int</span> n = s.<span class="hljs-built_in">size</span>();<br> <span class="hljs-keyword">if</span>(m == <span class="hljs-number">0</span>){<br> <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;<br> }<br> <span class="hljs-comment">// Insert a sentinel so that indices start at 1</span><br> s.<span class="hljs-built_in">insert</span>(s.<span class="hljs-built_in">begin</span>(),<span class="hljs-string">' '</span>);<br> <span class="hljs-comment">// Matching loop</span><br> <span class="hljs-keyword">for</span>(<span class="hljs-type">int</span> i = <span class="hljs-number">1</span>, j = <span class="hljs-number">0</span>; i <= n; i++){<br> <span class="hljs-keyword">while</span>(j <span class="hljs-keyword">and</span> s[i] != p[j + <span class="hljs-number">1</span>]) j = next[j];<br> <span 
class="hljs-keyword">if</span>(s[i] == p[j + <span class="hljs-number">1</span>]) j++;<br> <span class="hljs-keyword">if</span>(j == m) <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;<br> }<br> <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;<br> }<br>};<br><br><span class="hljs-keyword">template</span> <<span class="hljs-keyword">class</span> <span class="hljs-title class_">TimeT</span> = std::chrono::milliseconds,<br> <span class="hljs-keyword">class</span> ClockT = std::chrono::steady_clock><br> <span class="hljs-keyword">class</span> Timer{<br> <span class="hljs-keyword">using</span> <span class="hljs-type">timep_t</span> = <span class="hljs-keyword">typename</span> ClockT::time_point;<br> <span class="hljs-type">timep_t</span> _start = ClockT::<span class="hljs-built_in">now</span>(), _end = {};<br><br> <span class="hljs-keyword">public</span>:<br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">tick</span><span class="hljs-params">()</span> </span>{<br> _end = <span class="hljs-type">timep_t</span>{};<br> _start = ClockT::<span class="hljs-built_in">now</span>();<br> }<br><br> <span class="hljs-function"><span class="hljs-type">void</span> <span class="hljs-title">tock</span><span class="hljs-params">()</span> </span>{ _end = ClockT::<span class="hljs-built_in">now</span>(); }<br><br> <span class="hljs-keyword">template</span> <<span class="hljs-keyword">class</span> <span class="hljs-title class_">TT</span> = TimeT><br> TT <span class="hljs-built_in">duration</span>() <span class="hljs-type">const</span> {<br> <span class="hljs-keyword">return</span> std::chrono::<span class="hljs-built_in">duration_cast</span><TT>(_end - _start);<br> }<br> };<br><span class="hljs-function"><span class="hljs-type">int</span> <span class="hljs-title">main</span><span class="hljs-params">(<span class="hljs-type">void</span>)</span></span>{<br> string a, b;<br> <span 
class="hljs-built_in">srand</span>( (<span class="hljs-type">unsigned</span>) <span class="hljs-built_in">time</span>(<span class="hljs-literal">NULL</span>) * <span class="hljs-built_in">getpid</span>());<br> Timer clock1, clock2;<br><br> vector<pair<string, string>> cases = <span class="hljs-built_in">generateCases</span>(CASESNUM);<br> clock1.<span class="hljs-built_in">tick</span>();<br> <span class="hljs-keyword">for</span>(<span class="hljs-type">int</span> i = <span class="hljs-number">0</span>; i < CASESNUM; i++){<br> <span class="hljs-built_in">checkSubstring</span>(cases[i].first, cases[i].second);<br> }<br> clock1.<span class="hljs-built_in">tock</span>();<br> cout << <span class="hljs-string">"Run time = "</span> << clock1.<span class="hljs-built_in">duration</span>().<span class="hljs-built_in">count</span>() << <span class="hljs-string">" ms\n"</span>;<br> clock2.<span class="hljs-built_in">tick</span>();<br> <span class="hljs-function">Solution <span class="hljs-title">solution</span><span class="hljs-params">(cases[<span class="hljs-number">0</span>].second)</span></span>;<br> <span class="hljs-keyword">for</span>(<span class="hljs-type">int</span> i = <span class="hljs-number">0</span>; i < CASESNUM; i++){<br> solution.<span class="hljs-built_in">kmpCheckSubstring</span>(cases[i].first);<br> }<br> clock2.<span class="hljs-built_in">tock</span>();<br> cout << <span class="hljs-string">"Run time = "</span> << clock2.<span class="hljs-built_in">duration</span>().<span class="hljs-built_in">count</span>() << <span class="hljs-string">" ms\n"</span>;<br>}<br><span class="hljs-function">std::string <span class="hljs-title">gen_random</span><span class="hljs-params">(<span class="hljs-type">const</span> <span class="hljs-type">int</span> len)</span> </span>{<br> std::string tmp_s;<br> <span class="hljs-type">static</span> <span class="hljs-type">const</span> <span class="hljs-type">char</span> alphanum[] =<br> <span 
class="hljs-string">"0123456789"</span><br> <span class="hljs-string">"ABCDEFGHIJKLMNOPQRSTUVWXYZ"</span><br> <span class="hljs-string">"abcdefghijklmnopqrstuvwxyz"</span>;<br> tmp_s.<span class="hljs-built_in">reserve</span>(len);<br> <span class="hljs-keyword">for</span> (<span class="hljs-type">int</span> i = <span class="hljs-number">0</span>; i < len; ++i)<br> tmp_s += alphanum[<span class="hljs-built_in">rand</span>() % (<span class="hljs-built_in">sizeof</span>(alphanum) - <span class="hljs-number">1</span>)];<br> <span class="hljs-keyword">return</span> tmp_s;<br><br>}<br><br><br>vector<pair<string, string>> <span class="hljs-built_in">generateCases</span>(<span class="hljs-type">int</span> len){<br> vector<pair<string, string>> cases;<br> cases.<span class="hljs-built_in">reserve</span>(len);<br> <span class="hljs-type">int</span> substringLen = <span class="hljs-built_in">rand</span>() % <span class="hljs-number">20000</span> + <span class="hljs-number">10000</span>;<br> string substring = <span class="hljs-built_in">gen_random</span>(substringLen);<br> <span class="hljs-keyword">while</span>(len--){<br> <span class="hljs-type">int</span> choose = <span class="hljs-built_in">rand</span>() % <span class="hljs-number">2</span>;<br> <span class="hljs-keyword">if</span>(choose){<br> <span class="hljs-type">int</span> prefixLen = <span class="hljs-built_in">rand</span>() % <span class="hljs-number">10000</span> + <span class="hljs-number">80000</span>;<br> <span class="hljs-type">int</span> suffixLen = <span class="hljs-built_in">rand</span>() % <span class="hljs-number">10000</span> + <span class="hljs-number">80000</span>;<br> string prefixString = <span class="hljs-built_in">gen_random</span>(prefixLen),<br> suffixString = <span class="hljs-built_in">gen_random</span>(suffixLen);<br> cases.<span class="hljs-built_in">emplace_back</span>(prefixString + substring + suffixString, substring);<br> }<span class="hljs-keyword">else</span>{<br> <span 
class="hljs-type">int</span> anotherStringLen = <span class="hljs-built_in">rand</span>() % <span class="hljs-number">20000</span> + <span class="hljs-number">10000</span>;<br> cases.<span class="hljs-built_in">emplace_back</span>(substring, <span class="hljs-built_in">gen_random</span>(anotherStringLen));<br> }<br> }<br> <span class="hljs-keyword">return</span> cases;<br>}<br><br><span class="hljs-comment">// check if a is substring of b;</span><br><span class="hljs-function"><span class="hljs-type">bool</span> <span class="hljs-title">checkSubstring</span><span class="hljs-params">(string b, string a)</span></span>{<br> <span class="hljs-comment">// regex subregex(a);</span><br> <span class="hljs-comment">// return regex_search(b, subregex);</span><br> <span class="hljs-keyword">return</span> b.<span class="hljs-built_in">find</span>(a) != b.npos;<br>}<br></code></pre></td></tr></table></figure>]]></content>
<tags>
<tag>C++</tag>
<tag>KMP</tag>
<tag>algorithm</tag>
</tags>
</entry>
<entry>
<title>How does HTTPS work?</title>
<link href="/2021/09/02/How-https-work/"/>
<url>/2021/09/02/How-https-work/</url>
<content type="html"><![CDATA[<p>There are already plenty of articles explaining HTTPS, so I won't repeat them here. See this article on the Cloudflare website<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[What happens in a TLS handshake?](https://www.cloudflare.com/zh-cn/learning/ssl/what-happens-in-a-tls-handshake/)">[1]</span></a></sup> and <strong>The First Few Milliseconds of an HTTPS Connection</strong><sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[The First Few Milliseconds of an HTTPS Connection](http://www.moserware.com/2009/06/first-few-milliseconds-of-https.html)">[2]</span></a></sup></p><p>This post focuses on why TLS needs the Client Random and the Server Random. </p><h2 id="导言"><a href="#导言" class="headerlink" title="Introduction"></a>Introduction</h2><p>Let's start with how HTTPS defends against MITM (Man-In-The-Middle) attacks.</p><p>Once the client and server complete the TLS handshake, the rest of the session is symmetrically encrypted with keys derived from the negotiated master_secret. As long as the master_secret stays secure, the confidentiality of the communication is guaranteed.</p><p>The problem with symmetric encryption is <a href="https://en.wikipedia.org/wiki/Key_exchange">Key exchange</a>: if the master_secret is stolen, an attacker can encrypt and decrypt the traffic at will, which is why HTTPS uses asymmetric encryption during the TLS handshake.</p><p>During the TLS handshake, the client encrypts the pre_master_secret (a random string chosen by the client) with the server's public key. Since a MITM attacker does not have the server's private key, it cannot recover the pre_master_secret. 
And because the master_secret is derived from the pre_master_secret, the Client Random, and the Server Random, the attacker cannot obtain the master_secret that ultimately protects the communication either.</p><figure class="highlight arcade"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs arcade">master_secret = PRF(pre_master_secret, <br> <span class="hljs-string">"master secret"</span>, <br> ClientHello.<span class="hljs-built_in">random</span> + ServerHello.<span class="hljs-built_in">random</span>)<br></code></pre></td></tr></table></figure><p><em>PRF: Pseudo-Random Function</em></p><h3 id="Does-DNS-posisoning-compromise-TLS"><a href="#Does-DNS-posisoning-compromise-TLS" class="headerlink" title="Does DNS poisoning compromise TLS?"></a>Does DNS poisoning compromise TLS?</h3><p>This used to confuse me: can DNS poisoning trick the client into believing it is talking to the real server, and thereby undermine the security of TLS?<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Can a HTTPS connection be compromised because of a rogue DNS server](https://security.stackexchange.com/questions/3857/can-a-https-connection-be-compromised-because-of-a-rogue-dns-server)">[3]</span></a></sup></p><p>This is essentially the same question as how TLS prevents MITM, and the certificate plays the key role here.</p><p>If DNS is hijacked, the client's requests are redirected to a fake IP. Then:</p><ul><li>If the attacker makes no attempt at disguise and we are browsing the site in a browser, the difference from the original host is obvious</li><li>If the attacker clones the content of the original site, then because HTTPS is in use the client will ask the server for a certificate. Consider two cases: <ul><li>The attacker presents the original host's certificate to the client. Since the attacker does not know the private key, it cannot complete the TLS handshake with the client </li><li>The attacker may try to forge a certificate, for example a Self-Signed Certificate, which can be issued with the OpenSSL command line. But because it is not signed by a CA the browser trusts (signing here means the CA uses its private key to sign the information bundled in the certificate), the browser raises a warning that the site is not secure (although in my everyday experience, many people click through and trust it anyway…)</li></ul></li></ul><p>The certificate system depends heavily on CAs. If the client trusts a CA that does not deserve it, for example because an attacker stole the CA's key or the CA itself is malicious, that CA can in theory issue arbitrary certificates to deceive clients. Alternatively, if an attacker compromises the client and injects a fake CA into the root certificates trusted by the client's browser, then the 
attacker can generate arbitrary certificates for the targeted client<sup id="fnref:4" class="footnote-ref"><a href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[SSL and man-in-the-middle misunderstanding](https://stackoverflow.com/questions/14907581/ssl-and-man-in-the-middle-misunderstanding)">[4]</span></a></sup></p><h2 id="没有被加密传输的-Random-到底扮演着什么样的角色?"><a href="#没有被加密传输的-Random-到底扮演着什么样的角色?" class="headerlink" title="What role do the unencrypted Randoms actually play?"></a>What role do the unencrypted Randoms actually play?</h2><p>As mentioned above, the master_secret is derived from the pre_master_secret, the Client Random, and the Server Random. The Client Random and Server Random are not encrypted in transit, so both can be eavesdropped, and the security of the master_secret rests mainly on the pre_master_secret, which is sent encrypted under the public key. If so, is sending the Client Random and Server Random redundant?<sup id="fnref:5" class="footnote-ref"><a href="#fn:5" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why does the SSL/TLS handshake have a client and server random?](https://security.stackexchange.com/questions/89383/why-does-the-ssl-tls-handshake-have-a-client-and-server-random)">[5]</span></a></sup></p><h3 id="为什么使用-Server-Random"><a href="#为什么使用-Server-Random" class="headerlink" title="Why use a Server Random"></a>Why use a Server Random</h3><p>Without a Server Random, key generation would depend entirely on client generated values (the pre_master_secret is produced by the client), which would leave the client vulnerable to a <a href="https://en.wikipedia.org/wiki/Replay_attack">replay attack</a>.</p><p>The attacker does not need to know what the client's encrypted pre_master_secret actually is: it can first send the same Client Random to the server and then forward the encrypted pre_master_secret unchanged, so the attacker's connection to the server uses the same master_secret as the client's. 
Here the attacker does not know the master_secret itself (the pre_master_secret was never actually cracked), which means it cannot use the master_secret to forge new messages. It can, however, resend the client's encrypted traffic to the server verbatim and cause you some unexpected trouble (for example, you discover that you have somehow bought ten times the quantity you intended)<sup id="fnref:6" class="footnote-ref"><a href="#fn:6" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why using the premaster secret directly would be vulnerable to replay attack?](https://security.stackexchange.com/questions/218491/why-using-the-premaster-secret-directly-would-be-vulnerable-to-replay-attack)">[6]</span></a></sup></p><p>With a Server Random, each connection has a different Server Random and therefore a completely different master_secret, so the attacker can no longer misuse the client's encrypted traffic.</p><div class="note note-info"> <p>This replay attack raises a new question: the Server Random differs because the attacker opened a new connection to the server. What if the attacker reuses the victim's connection after the handshake to resend encrypted packets? In that case it seems that even having the master_secret offers no protection at all…</p><p>The comments under the SE answer to this question raise the same doubt<sup id="fnref:6" class="footnote-ref"><a href="#fn:6" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why using the premaster secret directly would be vulnerable to replay attack?](https://security.stackexchange.com/questions/218491/why-using-the-premaster-secret-directly-would-be-vulnerable-to-replay-attack)">[6]</span></a></sup>: <strong>Why would the MITM need a new connection to replay the attack?</strong></p><blockquote><p>That could almost be a new question in itself, but the short answer is that the MAC for each record includes a sequence number, so re-sending the first record after the 1000th record would cause the MAC verification to fail. This also prevents an attacker from arbitrarily reordering, duplicating, or dropping records without causing the connection to fail. </p></blockquote> </div><p><em>One more thing to note: TLS itself does not stop the client from replaying a request. The server should try to address this at the application level (for example by numbering your purchase operations, or with other measures that guarantee idempotency). 
These measures are independent of TLS itself, yet they also stop the attacker in the scenario described above<sup id="fnref:8" class="footnote-ref"><a href="#fn:8" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Are SSL encrypted requests vulnerable to Replay Attacks?](https://security.stackexchange.com/questions/20105/are-ssl-encrypted-requests-vulnerable-to-replay-attacks)">[8]</span></a></sup></em></p><h3 id="为什么使用-Client-Random"><a href="#为什么使用-Client-Random" class="headerlink" title="Why use a Client Random"></a>Why use a Client Random</h3><p>According to the comments on an SE answer, omitting the Client Random would not directly threaten TLS<sup id="fnref:7" class="footnote-ref"><a href="#fn:7" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Why does the SSL/TLS handshake have a client random?](https://security.stackexchange.com/questions/157684/why-does-the-ssl-tls-handshake-have-a-client-random)">[7]</span></a></sup>: the pre_master_secret comes from the client and the Server Random comes from the server, so the master_secret is still random from the point of view of both parties.</p><p>As for the <em>feed obsolete information</em> argument in the top-voted answer, I don't quite understand how the replay would work. As noted earlier, if the attacker cannot reuse the connection after the handshake, then in a new connection the attacker has no way of predicting which pre_master_secret the client will send.</p><h2 id="补充"><a href="#补充" class="headerlink" title="Addendum"></a>Addendum</h2><p>The discussion above mainly applies to RSA key exchange. DHE is somewhat different, and the differences between the two are covered in:</p><ul><li><a href="https://www.cloudflare.com/zh-cn/learning/ssl/what-happens-in-a-tls-handshake/">What is keyless SSL?</a></li><li><a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange#Description">Diffie–Hellman key exchange</a></li><li><a href="https://security.stackexchange.com/questions/45963/diffie-hellman-key-exchange-in-plain-english">Diffie-Hellman Key Exchange algorithm in plain English</a></li></ul><p>With DHE, the pre_master_secret is the shared secret computed from the Diffie-Hellman exchange, whose public parameters are g (the public base) and p (the public prime modulus). The Client Random and Server Random are not g and p: they are still separate values that are mixed in when the master_secret is derived.</p><h2 id="参考"><a href="#参考" class="headerlink" title="References"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a 
href="https://www.cloudflare.com/zh-cn/learning/ssl/what-happens-in-a-tls-handshake/">What happens in a TLS handshake?</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="http://www.moserware.com/2009/06/first-few-milliseconds-of-https.html">The First Few Milliseconds of an HTTPS Connection</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://security.stackexchange.com/questions/3857/can-a-https-connection-be-compromised-because-of-a-rogue-dns-server">Can a HTTPS connection be compromised because of a rogue DNS server</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:4" class="footnote-text"><span><a href="https://stackoverflow.com/questions/14907581/ssl-and-man-in-the-middle-misunderstanding">SSL and man-in-the-middle misunderstanding</a><a href="#fnref:4" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:5" class="footnote-text"><span><a href="https://security.stackexchange.com/questions/89383/why-does-the-ssl-tls-handshake-have-a-client-and-server-random">Why does the SSL/TLS handshake have a client and server random?</a><a href="#fnref:5" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:6" class="footnote-text"><span><a href="https://security.stackexchange.com/questions/218491/why-using-the-premaster-secret-directly-would-be-vulnerable-to-replay-attack">Why using the premaster secret directly would be vulnerable to replay attack?</a><a href="#fnref:6" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:7" class="footnote-text"><span><a href="https://security.stackexchange.com/questions/157684/why-does-the-ssl-tls-handshake-have-a-client-random">Why does the SSL/TLS handshake have a client random?</a><a href="#fnref:7" 
rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:8" class="footnote-text"><span><a href="https://security.stackexchange.com/questions/20105/are-ssl-encrypted-requests-vulnerable-to-replay-attacks">Are SSL encrypted requests vulnerable to Replay Attacks?</a><a href="#fnref:8" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<tags>
<tag>https</tag>
</tags>
</entry>
<entry>
<title>Consistency and consensus</title>
<link href="/2021/07/31/Consistency-and-consensus/"/>
<url>/2021/07/31/Consistency-and-consensus/</url>
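<content type="html"><![CDATA[<h2 id="Introduction"><a href="#Introduction" class="headerlink" title="Introduction"></a>Introduction</h2><p>The consistency we talk about in distributed systems is different from the consistency (or rather, correctness) in ACID. It describes what the state of the data should look like when we observe it on different nodes of a replicated database at a given moment.</p><p>In the ideal case, if the system provides a strong consistency guarantee, then once an update to a data item succeeds, every user can immediately read the updated value. For performance reasons, however, many distributed databases provide only <em>eventual consistency</em> by default. This means the database may stay inconsistent <strong>temporarily</strong> for a while and will eventually <em>converge</em> to a consistent state.</p><p>DDIA points out that this is actually a very weak guarantee, because it says nothing about <em><strong>when the convergence will happen</strong></em>. If you write a value and read it back immediately, you may get either the old value or the new one. How long the inconsistency lasts usually depends on network delay, and if network delay is unbounded, this becomes a real problem.</p><h2 id="Linearizability"><a href="#Linearizability" class="headerlink" title="Linearizability"></a>Linearizability</h2><p>In an eventually consistent database, the most confusing part is that if you ask different replicas for the value of the same data item at the same moment, you may get two different answers. The main cause is usually replication lag: while the update log is being replicated asynchronously, some replicas have already applied the update and others have not received it yet. This leads to a stronger consistency model: <em>linearizability (also known as atomic consistency, strong consistency, immediate consistency, or external consistency)</em>.</p><img src="/2021/07/31/Consistency-and-consensus/fig9-1.png" class="" title="This system is not linearizable, causing football fans to be confused"><p>The basic idea of <em>linearizability</em> is to give users the illusion of <em>a single copy of data</em>. In the example in the figure above, Alice has already read the new value, so when Bob reads afterwards he expects a result at least as new as Alice's. Returning the old value here violates linearizability.</p><p>Informally, linearizability provides two properties:</p><ul><li><p>Once a write request has completed, all subsequent reads should return the value of that write.</p></li><li><p>Once any read has returned the newly written value, all subsequent reads should also return the new value.</p><div class="note note-secondary"> <p>Given the first property, the second may look redundant, but it is not. In an ideal single copy of data, <em>a write request should be instantaneous</em>. In a real system, however, there is a window between sending a write request and receiving its response, and read requests may overlap with the write during that window. We do not want reads that are routed to different replicas to <em>flip back and forth between the new and old value</em>, so we need this extra constraint to describe the behavior of “a single copy of data” more precisely.</p> </div><img src="/2021/07/31/Consistency-and-consensus/fig9-3.png" class="" title="After any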
one read has returned the new value, all following reads (on the same or other clients) must also return the new value."></li></ul><p>Herlihy and Wing give the precise formal definition of linearizability in their <a href="http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf">paper</a>.</p><h3 id="Linearizability-versus-Serializability"><a href="#Linearizability-versus-Serializability" class="headerlink" title="Linearizability versus Serializability"></a>Linearizability versus Serializability</h3><p>For this topic, see this article by Peter Bailis<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Linearizability versus Serializability](http://www.bailis.org/blog/linearizability-versus-serializability/)">[1]</span></a></sup>.</p><div class="note note-info"> <p><em>Linearizability is a guarantee about <strong>single operations on single objects</strong>.</em> It provides a real-time (i.e., wall-clock) guarantee on the behavior of a set of single operations (often reads and writes) on a single object (e.g., distributed register or data item).</p> </div><p>Linearizability is essentially a <em><strong>recency guarantee</strong></em>.</p><div class="note note-info"> <p><em>Serializability is a guarantee about transactions, or <strong>groups of one or more operations over one or more objects</strong>.</em> It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to <em>some</em> serial execution (total ordering) of the transactions.</p> </div><p>Serializability is about <em>avoiding race conditions</em>; it is more of a <em><strong>concurrency guarantee</strong></em>.</p><p>Unlike linearizability, serializability does not impose any particular order. It only requires that the database state after executing a set of transactions be equivalent to the state after executing them serially in some order.</p><p><em>Strict serializability</em> is like a combination of the two: it does constrain the order. For example, if transaction T1 commits and I start transaction T2 afterwards, <em>strict serializability</em> places T2 after T1, whereas under plain <em>serializability</em> an equivalent serial order in which T1 runs after T2 is also legal.</p><p>From another angle, linearizability is a special case of strict serializability in which every transaction is restricted to a single operation on a single object.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="http://www.bailis.org/blog/linearizability-versus-serializability/">Linearizability versus Serializability</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<tags>
<tag>Distributed System</tag>
<tag>consistency</tag>
<tag>ACID</tag>
</tags>
</entry>
<entry>
<title>DDIA: transaction study notes</title>
<link href="/2021/07/25/transaction/"/>
<url>/2021/07/25/transaction/</url>
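<content type="html"><![CDATA[<h2 id="ACID"><a href="#ACID" class="headerlink" title="ACID"></a>ACID</h2><ul><li><p>Atomicity: All-or-nothing. The operations in a transaction either all complete or have no effect at all; the transaction cannot be left half-finished. Various faults (a crash, a network interruption, a disk failure, and so on) may prevent some of the operations from completing, and atomicity gives a definite guarantee about the database state in that case.</p></li><li><p>Isolation: This concerns concurrency and is the main focus of this post.</p></li><li><p>Consistency: DDIA argues that consistency does not really belong to the database: <span class="label label-danger">it’s a property of the application</span>. The application relies on the database's atomicity and isolation to achieve consistency, but the database cannot provide this guarantee by itself.</p><div class="note note-info"> <p><em>Invariants</em> are a concept that comes up often. For example, in a financial system that only handles transfers, the sum of the balances of all accounts should stay constant. The database merely stores data, so it cannot stop the application from writing bad data and breaking consistency.</p> </div></li><li><p>Durability: Once a transaction has committed, its state should be persisted no matter what fault occurs. On a single machine this usually means that the data has been written to disk, or that the record has been written to the WAL successfully. In a distributed system replication is involved, so the definition is more complicated.</p></li></ul><h2 id="Isolation-level"><a href="#Isolation-level" class="headerlink" title="Isolation level"></a>Isolation level</h2><p>This topic comes up a lot in interviews, and DDIA covers it in much more detail.</p><h3 id="Read-Uncommitted"><a href="#Read-Uncommitted" class="headerlink" title="Read Uncommitted"></a>Read Uncommitted</h3><p>Dirty reads can happen at this isolation level.</p><div class="note note-info"> <p><em>Dirty reads: One client reads another client’s writes before they have been committed</em></p> </div><p>Possible consequences of dirty reads:</p><ul><li>A transaction may see only part of another transaction's updates, which may lead it to make a wrong decision</li><li>A transaction is not guaranteed to succeed, so a dirty read may expose data that is later rolled back</li></ul><p>Many database manuals do not mention dirty writes explicitly, so here is a brief note on them.</p><img src="/2021/07/25/transaction/fig7-5.png" class="" title="Dirty writes can mix up conflicting writes from different transactions"><div class="note note-info"> <p><em>Dirty writes: One client overwrites data that another client has written, but not yet committed</em> </p> </div><p>Dirty writes can be prevented with <em>row-level locks</em>.</p><h3 id="Read-Committed"><a href="#Read-Committed" class="headerlink" title="Read Committed"></a>Read Committed</h3><p>Dirty reads could be prevented the same way as dirty writes, by locking, but then many read-only transactions might have to wait for one long-running write transaction to finish, which badly hurts the response time of read-only transactions.</p><p>In practice, the usual approach is this: for every object being written, the database remembers both the old committed value and the new value set by the transaction that currently holds the write lock. While that transaction has not committed, other transactions read the old committed value; as soon as it commits, reads switch to the new value.</p><p>However, Read Committed still suffers from nonrepeatable reads.</p><div class="note note-info"> <p><em>Read skew (nonrepeatable reads): A client sees different parts of the database at different points in time.</em></p> </div><img src="/2021/07/25/transaction/fig7-6.png" class="" title="Alice observes the database in an inconsistent state"><h3 id="Repeatable-Read"><a href="#Repeatable-Read" class="headerlink" title="Repeatable Read"></a>Repeatable Read</h3><p>The usual way to prevent nonrepeatable reads is <em>snapshot isolation</em>. </p><div class="note note-secondary"> <p><em>A key principle of Snapshot Isolation:</em></p><p><em><strong>Readers never block writers, and writers never block readers.</strong></em></p><p>Reads do not need to acquire any locks.</p> </div><p><em><strong>MVCC (multi-version concurrency control):</strong></em> The database keeps several committed versions of an object, and throughout a transaction the transaction sees a snapshot of the database frozen at a particular point in time.</p><div class="note note-secondary"> <p>MVCC can in fact also be used to implement Read Committed. A typical approach is that <em>read committed uses a separate snapshot for each query, while snapshot isolation uses the same snapshot for an entire transaction.</em></p> </div><h4 id="更多问题……"><a href="#更多问题……" class="headerlink" title="更多问题……"></a>More problems...</h4><p>Read Committed and Repeatable Read are mostly about what a reader sees in the presence of concurrent writes. Two concurrent write transactions, on the other hand, often run into the <em>lost update</em> problem.</p><div class="note note-info"> <p><em>Lost updates: Two clients concurrently perform a read-modify-write cycle. One overwrites the other’s write without incorporating its changes, so data is lost.</em></p> </div><img src="/2021/07/25/transaction/fig7-1.png" class="" title="A race condition between two clients concurrently incrementing a counter"><p>The core of the problem is the <em>read-modify-write cycle</em>: read the current value, compute a new value from it, and write the new value back. MVCC by itself does not prevent this.</p><h5 id="解决方式"><a href="#解决方式" class="headerlink" title="解决方式"></a>Solutions</h5><ul><li><p><strong>Atomic write operations</strong>: Many databases provide atomic update operations, so the read-modify-write cycle no longer has to be implemented in a transaction.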
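One way to implement atomic updates is to take an exclusive lock on the object (the more common approach); another is to force all atomic operations to run on a single thread.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs mysql">UPDATE counters SET value = value + 1 WHERE `key` = 'foo';<br></code></pre></td></tr></table></figure></li><li><p><strong>Explicit locking</strong>: When reading, declare that the result will be used for an update and lock the rows explicitly. Any other transaction that tries to read the same object in this way is then forced to wait until the first read-modify-write cycle has finished.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><code class="hljs mysql">SELECT * FROM child WHERE id > 100 FOR UPDATE;<br># FOR UPDATE tells the database to lock all rows returned by the query<br># In MySQL a plain SELECT normally reads from a snapshot, while FOR UPDATE makes the SELECT read the current value<br></code></pre></td></tr></table></figure></li></ul><p>Some databases that do not support transactions offer an operation like compare-and-set instead. Note, however, that if the database allows the comparison to read from an old snapshot, lost updates can still happen.</p><h4 id="还不够……"><a href="#还不够……" class="headerlink" title="还不够……"></a>Still not enough...</h4><p>In a lost update the two write transactions update the same object, whereas in write skew the two transactions may each update a different object.</p><p>We can regard <em>write skew as a generalization of the lost update problem</em>: two transactions read the same objects and then update some of those objects. In the special case where the transactions update the same object, we get a dirty write or a lost update, depending on the timing.</p><div class="note note-info"> <p><em>Write skew: A transaction reads something, makes a decision based on the value it saw, and writes the decision to the database.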
However, by the time the write is made, the premise of the decision is no longer true.</em></p> </div><p>Some examples:</p><ul><li>Meeting room booking: check whether anyone has booked a given time slot and, if not, insert a booking. If two people book at the same time, both may see the slot as free and the room ends up double-booked</li><li>Unique name: if usernames must be unique, registration has to check whether the name is already taken and insert it if not. If two people register the same name at the same time, both checks may pass</li></ul><p><em>Phantoms cause write skew!</em></p><p><code>SELECT FOR UPDATE</code> alone cannot fully prevent phantoms<sup id="fnref:1" class="footnote-ref"><a href="#fn:1" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Phantom Rows](https://dev.mysql.com/doc/refman/8.0/en/innodb-next-key-locking.html)">[1]</span></a></sup>.</p><div class="note note-info"> <p><em>Phantom reads: A transaction reads objects that match some search condition. Another client makes a write that affects the results of that search.</em></p> </div><p>A Stack Overflow question<sup id="fnref:2" class="footnote-ref"><a href="#fn:2" rel="footnote"><span class="hint--top hint--rounded" aria-label="[difference between Non-Repeatable Read and Phantom Read](https://stackoverflow.com/questions/11043712/what-is-the-difference-between-non-repeatable-read-and-phantom-read)">[2]</span></a></sup> explains the difference between a non-repeatable read and a phantom read:</p><blockquote><p>A non-repeatable read occurs, when during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.</p></blockquote><blockquote><p>A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.</p></blockquote><p><em>While the <strong>Non-Repeatable Read</strong> applies to a single row, the <strong>Phantom Read</strong> is about a range of records which satisfy a given query filtering criteria.</em></p><h5 id="解决方式-1"><a href="#解决方式-1" class="headerlink" title="解决方式"></a>Solutions</h5><h6 id="Materializing-conflicts"><a href="#Materializing-conflicts" class="headerlink" title="Materializing conflicts"></a>Materializing
conflicts</h6><p>I find this approach rather interesting. Because <code>SELECT FOR UPDATE</code> cannot lock rows that do not exist, the idea is to create something like a lock table. Taking meeting room booking as an example, we can create in advance a row for every combination of room and time slot (covering, say, the next six months). To book a room for a time slot, we then lock the corresponding rows, which actually exist, with <code>SELECT FOR UPDATE</code>.</p><p>DDIA describes materializing conflicts as follows: <em>it takes a phantom and turns it into a lock conflict on <strong>a concrete set of rows that exist in the database</strong></em></p><p>This approach is not elegant, though. Working out how to materialize the conflicts is hard in itself, and letting a concurrency control mechanism leak into the application data model is ugly. It should only be considered as a last resort.</p><h6 id="2PL"><a href="#2PL" class="headerlink" title="2PL"></a>2PL</h6><p>2PL (two-phase locking) is implemented with shared/exclusive (S/X) locks; I will not go into the details here.</p><p>Its performance is poor. InnoDB in MySQL uses 2PL to implement the serializable isolation level.</p><h6 id="Predicate-locks"><a href="#Predicate-locks" class="headerlink" title="Predicate locks"></a>Predicate locks</h6><p>As mentioned above, the crux of the problem is that <em>we cannot lock rows that do not exist yet</em>, so the idea here is to <em><strong>lock predicates instead of locking records</strong></em><sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[lock-pdf](http://www.scs.stanford.edu/nyu/02fa/notes/l8.pdf)">[3]</span></a></sup></p><img src="/2021/07/25/transaction/image-20210726125222000.png" class=""><div class="note note-secondary"> <p>Solution 2 in the lecture notes<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded" aria-label="[lock-pdf](http://www.scs.stanford.edu/nyu/02fa/notes/l8.pdf)">[3]</span></a></sup>, the precision lock, is essentially the predicate lock described in DDIA with an S/X distinction added. It is still expensive to implement.</p> </div><h6 id="Index-range-locks"><a href="#Index-range-locks" class="headerlink" title="Index-range locks"></a>Index-range locks</h6><p><em>Index-range locking</em>, also known as <em>next-key locking</em>, is an approximation of <em>predicate locking</em>. It is slightly coarser than a <em>predicate lock</em>. For example, to book room No.123 from noon to 1 p.m., we can lock room No.123 for all time slots, or lock all rooms from noon to 1 p.m. Either lock covers every object matched by the predicate, so it is safe.</p><p>For reference, see <em>Next-Key Locks</em> in MySQL<sup id="fnref:4" class="footnote-ref"><a
href="#fn:4" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Next-Key Locks](https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html)">[4]</span></a></sup></p><div class="note note-secondary"> <p>Three kinds of locks often mentioned for InnoDB:</p><ul><li>Record Lock: a lock on a single index record.</li><li>Gap Lock: a lock on a range (the gap between index records), not including the records themselves.</li><li>Next-Key Lock: Record Lock + Gap Lock, a lock on a range that includes the record itself <em>(To be precise, it’s a combination of a record lock on the index record and a gap lock on the gap <strong>before</strong> the index record)</em></li></ul> </div><p>Some examples<sup id="fnref:5" class="footnote-ref"><a href="#fn:5" rel="footnote"><span class="hint--top hint--rounded" aria-label="[Rules of MySQL Gap-lock/Next-key Locks](https://stackoverflow.com/questions/52399319/rules-of-mysql-gap-lock-next-key-locks)">[5]</span></a></sup>: suppose the table has rows whose <code>id</code> values are [5, 10, 11, 13, 20]. For the following statement</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-operator">*</span> <span class="hljs-keyword">FROM</span> child <span class="hljs-keyword">WHERE</span> id <span class="hljs-operator">=</span> <span class="hljs-number">13</span> <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">UPDATE</span>;<br></code></pre></td></tr></table></figure><p>the search makes the transaction hold the next-key lock (11, 13] and the gap lock (13, 20). This is the behavior when the index on <code>id</code> is not unique; with a unique index, InnoDB locks only the matching record.</p><p>And for the following statement</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-operator">*</span> <span class="hljs-keyword">FROM</span> child <span class="hljs-keyword">WHERE</span> id <span class="hljs-operator">></span> <span class="hljs-number">15</span> <span class="hljs-keyword">FOR</span> <span class="hljs-keyword">UPDATE</span>;<br></code></pre></td></tr></table></figure><p>the scan makes the transaction hold the next-key lock (13, 20] and the gap lock (20, +∞). Without the next-key lock (13, 20], the transaction would only cover the range after 20 and could not stop another transaction from inserting rows with <code>id</code> 16, 17, 18, or 19. That is why <code>SELECT FOR UPDATE</code> was said above to be unable to fully prevent phantoms on its own.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-next-key-locking.html">Phantom Rows</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://stackoverflow.com/questions/11043712/what-is-the-difference-between-non-repeatable-read-and-phantom-read">difference between Non-Repeatable Read and Phantom Read</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="http://www.scs.stanford.edu/nyu/02fa/notes/l8.pdf">lock-pdf</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:4" class="footnote-text"><span><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html">Next-Key Locks</a><a href="#fnref:4" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:5" class="footnote-text"><span><a href="https://stackoverflow.com/questions/52399319/rules-of-mysql-gap-lock-next-key-locks">Rules of MySQL Gap-lock/Next-key Locks</a><a href="#fnref:5" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<tags>
<tag>Distributed System</tag>
<tag>transaction</tag>
</tags>
</entry>
<entry>
<title>2PC and 3PC</title>
<link href="/2021/07/18/2PC-and-3PC/"/>
<url>/2021/07/18/2PC-and-3PC/</url>
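<content type="html"><![CDATA[<p>This post records some thoughts from studying 2PC and 3PC.</p><h2 id="2PC"><a href="#2PC" class="headerlink" title="2PC"></a>2PC</h2><p>There are plenty of articles online introducing 2PC, so I will not repeat them here. Below are a few key points and how they relate to 3PC.</p><p>A distributed transaction must also guarantee atomicity, that is, all-or-nothing, so there has to be a commit point: before it the transaction counts as failed (nothing), and after it the transaction counts as successful (all). In 2PC, after the coordinator has received agree from all participants in the vote phase and before it sends the commit message to the participants, it first writes a record to its log. The successful write of this log record to disk is the commit point. After that, even if the coordinator crashes, it can learn the current state from the log and carry on with the remaining phase (commit). Conversely, if the coordinator crashes before the log record is written, it will choose to abort after restarting, even though every participant agreed.</p><p>2PC is a blocking protocol. In the DDIA example shown in the figure below, if the coordinator crashes, the participants cannot make progress and can only wait for the coordinator to recover. The reason is that Database 1 received the commit and committed the transaction, while Database 2, having received nothing from the coordinator, does not know whether the transaction as a whole was aborted or committed. Whatever it does, it risks diverging from the other participants. In theory, of course, participants that cannot reach the coordinator could talk to each other to move the transaction forward:</p><ol><li>If no participant has received a commit message, nothing has been committed, so the whole transaction can be aborted</li><li>If any participant has received a commit message, a commit has already happened, so the remaining participants can follow and commit the transaction</li></ol><p>None of this, however, is part of 2PC itself.</p><img src="/2021/07/18/2PC-and-3PC/fig9-10.png" class="" title="img"><h3 id="更棘手的情况"><a href="#更棘手的情况" class="headerlink" title="更棘手的情况"></a>A trickier case</h3><p>The workaround above only applies when the coordinator alone fails. Suppose the coordinator and one participant fail together, and the failed participant happens to be the first one the coordinator notified and has already committed. Even if the surviving participants exchange information, they cannot learn the state of the failed participant, so both commit and abort are risky. Even with a new coordinator, the log normally does not record which participants the commit message has already been sent to (logging before every send would hurt performance), so the new coordinator also has to block until it has heard from all participants.</p><img src="/2021/07/18/2PC-and-3PC/2pc-crash.svg" class="" title="img"><h2 id="3PC"><a href="#3PC" class="headerlink" title="3PC"></a>3PC</h2><p>3PC is a nonblocking protocol designed to address this problem of 2PC. For an introduction, see Wikipedia<sup id="fnref:3" class="footnote-ref"><a href="#fn:3" rel="footnote"><span class="hint--top hint--rounded"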
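aria-label="[3PC-wiki](https://en.wikipedia.org/wiki/Three-phase_commit_protocol#:~:text=From%20Wikipedia%2C%20the%20free%20encyclopedia%20In%20computer%20networking,failure-resilient%20refinement%20of%20the%20two-phase%20commit%20protocol%20%282PC%29.)">[3]</span></a></sup> (I personally find it clearer than some of the articles found online)</p><p>3PC can be seen as splitting the commit phase of 2PC into two phases, preCommit and doCommit. doCommit does almost the same thing as the second phase of 2PC, while preCommit acts as a buffer that gives the participants a chance to learn the outcome of the vote.</p><p>If the coordinator crashes before sending the preCommit messages, the participants can unanimously consider the transaction aborted. The coordinator sends the doCommit message only after it has received acks for the preCommit message from all participants.</p><p>In the comments on some articles, people ask what happens if 3PC runs into the situation described above for 2PC. This is exactly where 3PC is clever. In 2PC nobody knows the state of the failed participant: it may have voted abort in the vote phase, or it may already have committed in the commit phase. In 3PC, after the coordinator fails, a new node can take over the transaction and query the state of the remaining participants:</p><ol><li><p>If all of the participants have received the preCommit message, the new coordinator can conclude that the transaction can be committed</p><div class="note note-info"> <p>A preCommit message is only sent if everyone agreed in the vote phase. So if all surviving participants have received preCommit, committing is safe for the coordinator whether or not the failed participant received preCommit (otherwise the failed node would not have voted agree in the vote phase)</p> </div></li><li><p>If any participant has not received the preCommit message, the coordinator can abort the transaction or restart the whole commit procedure</p><div class="note note-info"> <p>Aborting is safe here precisely because the commit phase starts only after all participants have received preCommit. In other words, if any one of them has not received it, no participant can have committed yet, so aborting cannot leave some replica committed while the others abort</p> </div></li></ol><p><em>3PC does not solve every problem, however. If a network partition occurs such that all participants in one partition have received the preCommit message while the participants in the other partition have not, the two sides may commit and abort respectively, leaving the replicas inconsistent once the network heals. In addition, 3PC has one more phase and exchanges more messages, so it may not suit applications with strict latency requirements. Still, 3PC at least does not block on single node failures, so it is worth considering when high availability matters</em></p><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><section class="footnotes"><div class="footnote-list"><ol><li><span id="fn:1" class="footnote-text"><span><a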
href="https://www.the-paper-trail.org/post/2008-11-27-consensus-protocols-two-phase-commit/">2PC</a><a href="#fnref:1" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:2" class="footnote-text"><span><a href="https://www.the-paper-trail.org/post/2008-11-29-consensus-protocols-three-phase-commit/">3PC</a><a href="#fnref:2" rev="footnote" class="footnote-backref"> ↩</a></span></span></li><li><span id="fn:3" class="footnote-text"><span><a href="https://en.wikipedia.org/wiki/Three-phase_commit_protocol#:~:text=From%20Wikipedia%2C%20the%20free%20encyclopedia%20In%20computer%20networking,failure-resilient%20refinement%20of%20the%20two-phase%20commit%20protocol%20%282PC%29.">3PC-wiki</a><a href="#fnref:3" rev="footnote" class="footnote-backref"> ↩</a></span></span></li></ol></div></section>]]></content>
<tags>
<tag>Distributed System</tag>
<tag>2PC</tag>
<tag>3PC</tag>
</tags>
</entry>
<entry>
<title>About git rebase</title>
<link href="/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/"/>
<url>/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/</url>
<content type="html"><![CDATA[<p>Yesterday a senior colleague asked me to rewrite a monitoring script on our tech stack based on his example, and I needed to pick up the latest commits from master. I had some experience collaborating with git at school, so out of habit I ran <code>(MyBranch)$ git merge master</code>. When I checked the history with git log, it was a mess. He pointed out that it is better to use <code>git rebase</code> when picking up changes from master, because many people work on this large repository and master is extremely active, which is nothing like a school group project with 3 or 4 people. So here I study the practical differences between <code>git rebase</code> and <code>git merge</code> more systematically.</p><p>Both merge and rebase integrate changes from one branch into another, but they work differently, as the diagrams below show.</p><img src="/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/02-Merging-main-into-the-feature-branh.svg" class="" title="Merging master into the feature branch"><img src="/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/03-Rebasing-the-feature-branch-into-main.svg" class="" title="Rebasing the feature branch onto master"><p>merge keeps a useful, semantically meaningful history: the existing commits are left untouched and a merge commit ties the two branches together.</p><p>rebase moves the entire feature branch so that it starts from the tip of main, rewriting the feature branch's commits on top of main. The resulting project history is cleaner, more concise, and linear.</p><h2 id="应用场景"><a href="#应用场景" class="headerlink" title="应用场景"></a>Use cases</h2><h3 id="合并提交记录"><a href="#合并提交记录" class="headerlink" title="合并提交记录"></a>Squashing commits</h3><p>Running <code>git rebase -i main</code> with the <code>-i</code> option opens the vi editor and lists the commits that are about to be removed and replayed.</p><figure class="highlight tex"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><code class="hljs tex">pick b3676f70e feat: message1<br>pick 10ed50e1f fix: message2<br>pick a99d36b1d fix: message3<br><br><span class="hljs-params">#</span> Rebase 26c3db1..26c3db1 onto 26c3db1 (3 commands)<br><span class="hljs-params">#</span><br><span class="hljs-params">#</span> Commands:<br><span class="hljs-params">#</span> p, pick <commit> = use commit<br><span class="hljs-params">#</span> r, reword <commit> = use commit, but edit the commit message<br><span class="hljs-params">#</span> e, edit <commit> = use commit, but stop for amending<br><span class="hljs-params">#</span> s, squash <commit> = use commit, but meld into previous commit<br><span class="hljs-params">#</span> f, fixup <commit> = like "squash", but discard this commit's log message<br><span class="hljs-params">#</span> x, exec <command> = run command (the rest of the line) using shell<br><span class="hljs-params">#</span> b, break = stop here (continue rebase later with 'git rebase --continue')<br><span class="hljs-params">#</span> d, drop <commit> = remove commit<br><span class="hljs-params">#</span> l, label <label> = label current HEAD with a name<br><span class="hljs-params">#</span> t, reset <label> = reset HEAD to a label<br><span class="hljs-params">#</span> m, merge [-C <commit> | -c <commit>] <label> [<span class="hljs-params">#</span> <oneline>]<br><span class="hljs-params">#</span> . create a merge commit using the original merge commit's<br><span class="hljs-params">#</span> . message (or the oneline, if no original merge commit was<br><span class="hljs-params">#</span> . specified). 
Use -c <commit> to reword the commit message.<br><span class="hljs-params">#</span><br><span class="hljs-params">#</span> These lines can be re-ordered; they are executed from top to bottom.<br><span class="hljs-params">#</span><br><span class="hljs-params">#</span> If you remove a line here THAT COMMIT WILL BE LOST.<br><span class="hljs-params">#</span><br><span class="hljs-params">#</span> However, if you remove everything, the rebase will be aborted.<br><span class="hljs-params">#</span><br><br></code></pre></td></tr></table></figure><p>You can use the commands listed there to reword commit messages or squash commits, which makes the history of your branch cleaner.</p><p>If you see the following error when saving, the first commit in the list was marked <code>squash</code> or <code>fixup</code>; the first entry has nothing before it to meld into, so it must stay <code>pick</code>:</p><figure class="highlight subunit"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs subunit"><span class="hljs-keyword">error: </span>cannot 'squash' without a previous commit<br></code></pre></td></tr></table></figure><div class="note note-danger"> <p><strong>WARNING:</strong> Do not squash commits that have already been pushed!</p> </div><p>If the vi editor window exits abnormally:</p><p>Use <code>git rebase --edit-todo</code> to continue editing the todo list, then run <code>git rebase --continue</code> to proceed.</p><h3 id="Workflow"><a href="#Workflow" class="headerlink" title="Workflow"></a>Workflow</h3><p>At work we often need to develop new features. We usually do not commit directly to master; instead we create a dedicated branch for the work, as shown below.</p><img src="/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/06-Developing-a-feature-in-a-dedicated-branch.svg" class="" title="Developing a feature in a dedicated branch"><p>When a feature takes a long time to develop, main may receive many updates in the meantime. We can rebase regularly to make sure the feature still makes sense (and to detect potential conflicts early). There are two options for the base:</p><ol><li>Rebase onto the feature's parent branch (for example main). This case was covered above when discussing the difference between merge and rebase</li><li>Rebase onto an earlier commit of the feature branch itself. Running <code>git rebase -i HEAD~3</code> lets you combine the last 3 commits (the base is <code>HEAD~3</code>). This is mainly for squashing commits; it does not incorporate upstream changes</li></ol><img src="/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/07-Rebasing-into-Head-3.svg" class="" title="Rebasing onto Head~3"><h2 id="一些注意点"><a href="#一些注意点" class="headerlink" title="一些注意点"></a>Things to note</h2><h3 id="Note1"><a href="#Note1" class="headerlink"
title="Note1"></a><em>Note1</em></h3><p><strong>Never use rebase on a</strong> <em><strong>public</strong></em> <strong>branch!</strong></p><p>If you rebase main onto feature, git will think that your main branch has diverged from everyone else's main.</p><img src="/2021/07/09/%E5%85%B3%E4%BA%8Egit-rebase/05-Rebasing-the-main-branch.svg" class="" title="Rebasing the master branch"><p>The only way to synchronize the two main branches is to merge them back together, but this produces an extra merge commit and two sets of commits containing the same changes.</p><h3 id="Note2"><a href="#Note2" class="headerlink" title="Note2"></a><em>Note2</em></h3><p>After you open a pull request, it is best not to use <code>git rebase</code>, because other developers can now see your commits, which makes it a <em>public</em> branch.</p><h3 id="Note3"><a href="#Note3" class="headerlink" title="Note3"></a><em>Note3</em></h3><p>When integrating feature into main, we usually use merge rather than rebase. If the feature branch is rebased onto main before the merge, though, the result is a nice linear history.</p><h3 id="Note4"><a href="#Note4" class="headerlink" title="Note4"></a><em>Note4</em></h3><p>A rebase may run into conflicts that we have to resolve. After resolving them, stage the changes with <code>git add</code>; there is no need to run <code>git commit</code>, just run <code>git rebase --continue</code>.</p><p>At any point you can run <code>git rebase --abort</code> to stop the rebase, which returns the branch to its state before the rebase started.</p><h3 id="Note5"><a href="#Note5" class="headerlink" title="Note5"></a><em>Note5</em></h3><p><code>git rebase</code> is a dangerous operation because it rewrites history, so use it with care.</p><p>It is safe to use as long as none of the commits to be rebased have been pushed yet.</p><p>Today some of the commits I rebased had already been pushed, so my branch conflicted with the remote repository. If you are sure the branch is private, you can overwrite the remote with <code>git push -f</code>.</p><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><blockquote><p><a href="https://www.atlassian.com/git/tutorials/merging-vs-rebasing">https://www.atlassian.com/git/tutorials/merging-vs-rebasing</a></p><p><a href="http://jartto.wang/2018/12/11/git-rebase/">http://jartto.wang/2018/12/11/git-rebase/</a></p></blockquote>]]></content>
<tags>
<tag>git</tag>
</tags>
</entry>
<entry>
<title>LVS load balancing</title>
<link href="/2021/07/08/LVS%E8%B4%9F%E8%BD%BD%E5%9D%87%E8%A1%A1/"/>
<url>/2021/07/08/LVS%E8%B4%9F%E8%BD%BD%E5%9D%87%E8%A1%A1/</url>
<content type="html"><![CDATA[<h2 id="负载均衡方案"><a href="#负载均衡方案" class="headerlink" title="负载均衡方案"></a>负载均衡方案</h2><p>服务器集群对外提供服务时,我们希望请求尽可能平均地分散到各台机器上,有一些负载均衡的解决方案</p><h3 id="DNS"><a href="#DNS" class="headerlink" title="DNS"></a>DNS</h3><p>基于 DNS 的负载均衡实现比较简单,成本低:当 DNS 请求到达 DNS Server 解析域名时,Server 可以根据一些调度策略(按地域、按运营商等等)回复 Client 集群中任意一台服务器的 IP 地址,客户端接下来的请求在 TTL 所指定的时间内将一直发送到此服务器进行处理。</p><p>但是基于 DNS 的均衡负载流量调度以主机 IP 为单位而非连接,其粒度过大,不均衡(因为用户的访问模式可能存在差异);而且 DNS Server 从客户端的 IP 地址中其实能获取的信息并不会太多(地域、运营商),这使得其负载均衡的策略极为有限;同时由于客户端在 TTL 时间内都会使用该解析记录,TTL 的值设置上:过小会导致 DNS 流量很高,过大会严重影响负载均衡的效果,同时如果节点发生故障,即使 DNS Server 的维护人员可以通过监测获知并迅速剔除故障节点,但在 TTL 还没过期时,原先分配到故障节点的客户端仍然会继续向其请求服务,会给用户带来很糟糕的体验。</p><h3 id="硬件"><a href="#硬件" class="headerlink" title="硬件"></a>硬件</h3><p>由专门的硬件设备(dispatcher)来实现,dispatcher 对用户来说是透明的,集群只需要对外提供一个虚拟 IP,在集群内部可以以内网 IP 进行通信转发请求,其功能以及性能强大,但是价格昂贵、可扩展性差以及调试维护麻烦</p><h3 id="软件"><a href="#软件" class="headerlink" title="软件"></a>软件</h3><p>例如 Nginx 以及 LVS,简单灵活且便宜,而且可以根据业务特点比较方便地进行扩展以及定制功能</p><h2 id="LVS的组成"><a href="#LVS的组成" class="headerlink" title="LVS的组成"></a>LVS的组成</h2><h3 id="相关术语"><a href="#相关术语" class="headerlink" title="相关术语"></a>相关术语</h3><ul><li>DS(Director Server),前端的负载均衡节点服务器,其接受所有传入的客户端请求</li><li>RS(Real Server),真实服务器是构成 LVS 集群的节点,提供服务</li><li>VIP(Virtual IP),为客户端提供服务的 IP 地址</li><li>RIP(Real IP),真实服务器的 IP 地址</li><li>CIP(Client IP),客户端的 IP 地址</li><li>DIP(Director IP),负载均衡器与后端真实服务器通信的 IP 地址</li></ul><h3 id="组成部分"><a href="#组成部分" class="headerlink" title="组成部分"></a>组成部分</h3><ul><li>IPVS(IP Virtual Server):基于内核态 netfilter 实现,工作在内核态</li><li>IPVSADM(IP Virtual Server Administrator): LVS 用户态的配套管理工具,基于 netlink 或 raw socket 的方式与内核 LVS 通信</li></ul><p>类比:如果 IPVS 为 netfilter,那么 IPVSADM 为 iptables(<em>注:iptables 正是通过 netlink 与 netfilter通信</em>)</p><h3 id="Netfilter-和-IPTABLES"><a href="#Netfilter-和-IPTABLES" class="headerlink" title="Netfilter 和 IPTABLES"></a>Netfilter 和 IPTABLES</h3><p>iptables 在 Docker 以及 K8s 中应用甚广。我的文章<a 
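href="https://flaglord.com/2021/05/26/Kubernetes%E7%BD%91%E7%BB%9C%E5%AD%A6%E4%B9%A0%E6%95%B4%E7%90%86/">Kubernetes networking study notes</a>, where iptables appears again and again.</p><p>Underneath, netfilter essentially places hooks at several points along the packet path and attaches handler functions to each hook.</p><img src="/2021/07/08/LVS%E8%B4%9F%E8%BD%BD%E5%9D%87%E8%A1%A1/netfilter.png" class="" title="netfilter.png"><p>The five hook points at the IP layer are:</p><ul><li>PREROUTING</li><li>POSTROUTING</li><li>FORWARD</li><li>INPUT</li><li>OUTPUT</li></ul><p>DNAT can be done at PREROUTING, SNAT at POSTROUTING, and filtering functions can be attached at FORWARD.</p><h3 id="LVS工作原理"><a href="#LVS工作原理" class="headerlink" title="LVS工作原理"></a>How LVS works</h3><img src="/2021/07/08/LVS%E8%B4%9F%E8%BD%BD%E5%9D%87%E8%A1%A1/lvm.svg" class="" title="img"><p>LVS hooks into the INPUT chain. The flow is roughly as follows:</p><ol><li>After PREROUTING, the route lookup finds that the VIP is a local address, so the packet enters the INPUT chain</li><li>If the VIP and port do belong to an IPVS service, the ipvs hook function attached to INPUT is invoked; it forcibly rewrites the relevant fields of the packet and sends it on to the OUTPUT chain</li></ol><h3 id="工作方式"><a href="#工作方式" class="headerlink" title="工作方式"></a>Working modes</h3><p>There are three IP load-balancing techniques: NAT, IP Tunneling, and DR.</p><h4 id="NAT"><a href="#NAT" class="headerlink" title="NAT"></a>NAT</h4><img src="/2021/07/08/LVS%E8%B4%9F%E8%BD%BD%E5%9D%87%E8%A1%A1/lvs_nat.svg" class="" title="img"><p>Clients reach the service through the VIP returned by DNS. When a packet arrives at the load balancer, the load balancer first checks its destination address and port number. If they match an LVS service, a scheduling algorithm picks an RS from the cluster, and the packet's destination address and port number are rewritten to those of the RS. The packet is then sent to the chosen RS. Because of the routing setup (the load balancer is the gateway of the RS), the reply goes back through the load balancer, where the source IP is rewritten to the VIP, so the whole process stays transparent to the client.</p><p><strong>Advantages</strong>:</p><ul><li>Real servers can run any operating system that supports TCP/IP (including Windows), and port mapping is supported</li><li>Only one IP address is needed for the load balancer; the load balancer and the cluster can communicate over private IPs</li></ul><p><strong>Disadvantages:</strong></p><ul><li>Scalability is poor: the load balancer easily becomes the bottleneck, because both requests and replies have to be rewritten by it</li></ul><p>LVS-NAT reuses a good deal of netfilter and iptables code from the Linux kernel, but it replaces the O(n) chain traversal with an O(1) hash-map lookup, which speeds up forwarding.</p><p>Besides the two modes described next, a DNS hybrid setup can also relieve this bottleneck: several load balancers work side by side, and the DNS server simply resolves the name round-robin across them.</p><h4 id="IP-Tunneling"><a 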
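href="#IP-Tunneling" class="headerlink" title="IP Tunneling"></a>IP Tunneling</h4><img src="/2021/07/08/LVS%E8%B4%9F%E8%BD%BD%E5%9D%87%E8%A1%A1/lvs_iptunneling.svg" class="" title="img"><p>Put simply, the original packet is encapsulated as the payload of a new packet. The tun device on the RS decapsulates it, recognizes the VIP as a local address, and sends the reply straight back to the client without going through the load balancer.</p><p><strong>Advantages:</strong></p><ul><li>Good performance</li><li>Works across data centers</li></ul><p><strong>Disadvantages:</strong></p><ul><li>The servers must support the "IP Tunneling" (IP Encapsulation) protocol</li><li>In China the VIP and the back-end servers may sit on different ISPs, and an ISP's policy may treat the traffic as IP spoofing and block it</li></ul><p><em>Note: in VS/TUN and in the VS/DR clusters described below, the VIP is shared by the load balancer and the RSs. In some setups the load balancer and some RSs are on the same network. If an RS also answers ARP requests for the VIP, there is a race condition: packets go to the load balancer one moment, to one RS the next, and to another RS after that, and the LVS cluster stops working properly. So an RS on the same network as the load balancer must not reply to ARP requests for the VIP, while still accepting packets whose destination IP is the VIP.</em></p><h4 id="DR"><a href="#DR" class="headerlink" title="DR"></a>DR</h4><p>The flow looks much like IP Tunneling, but the packet is not encapsulated again. Instead the load balancer routes it directly to the RS by rewriting the destination MAC address to that of the chosen RS.</p><p><strong>Advantages:</strong></p><ul><li>Best performance: DR > IP Tunneling > NAT</li></ul><p><strong>Disadvantages:</strong></p><ul><li>Because forwarding works by rewriting the MAC address, the load balancer and the RSs must be on the same layer-2 network (behind the same switch), which makes geo-redundant disaster recovery difficult</li></ul><div class="note note-info"> <p><b>Note:</b> IP Tunneling was described above as working across data centers. That is probably most useful for disaster recovery: for the sake of user experience, latency should be as low as possible, so cross-data-center access is best avoided in normal operation.</p> </div><h2 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h2><blockquote><p><a href="http://www.linuxvirtualserver.org/why.html">Why virtual server?</a></p><p><a href="http://www.linuxvirtualserver.org/how.html">How virtual server works?</a></p></blockquote>]]></content>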
<tags>
<tag>负载均衡</tag>
</tags>
</entry>
<entry>
<title>Building a local pseudo ZooKeeper cluster with docker-compose on Windows</title>
<link href="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/"/>
<url>/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/</url>
<content type="html"><![CDATA[<p>The environment is Windows.</p><h2 id="Prerequisites"><a href="#Prerequisites" class="headerlink" title="Prerequisites"></a>Prerequisites</h2><h3 id="开启-Hyper-V"><a href="#开启-Hyper-V" class="headerlink" title="开启 Hyper-V"></a>Enable Hyper-V</h3><p>Make sure Hyper-V is enabled. Run PowerShell or Terminal as administrator, enter <code>bcdedit /set hypervisorlaunchtype auto</code>, and reboot. <em>Note that some versions of VMware and VirtualBox conflict with Hyper-V; to use them, turn Hyper-V off with <code>bcdedit /set hypervisorlaunchtype off</code></em></p><h3 id="搜索-zookeeper-镜像"><a href="#搜索-zookeeper-镜像" class="headerlink" title="搜索 zookeeper 镜像"></a>Search for the zookeeper image</h3><p>Open PowerShell and run <code>docker search zookeeper</code>. If the docker command is not recognized, install Docker and add it to the Path environment variable; the directory is usually <code>#install-path\Docker\Docker\resources\bin</code>.</p><p>If you see the following</p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704121548110.png" class="" title="image-20210704121548110"><p>run cmd as administrator, execute <code>DockerCli.exe -SwitchDaemon</code> in <code>#install-path\Docker\Docker</code>, and try <code>docker search</code> again. If it still fails, reinstall Docker.</p><p>The search returns</p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704122624059.png" class="" title="image-20210704122624059"><p>The official Apache ZooKeeper image is fine; download it with <code>docker pull zookeeper</code>.</p><h2 id="docker-compose-启动集群"><a href="#docker-compose-启动集群" class="headerlink" title="docker-compose 启动集群"></a>Start the cluster with docker-compose</h2><p>The <code>docker-compose.yml</code> file looks like this</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td class="code"><pre><code class="hljs yaml"><span class="hljs-comment"># Configure a network for the zk cluster, named zk-net</span><br><span class="hljs-attr">networks:</span><br>  <span class="hljs-attr">zk-net:</span><br>    <span class="hljs-attr">name:</span> <span class="hljs-string">zk-net</span><br>  <br><span class="hljs-comment"># Configure the containers of the zk cluster:</span><br><span class="hljs-comment"># each entry under services corresponds to the docker container of one zk node</span><br><span class="hljs-attr">services:</span><br>  <span class="hljs-attr">zk1:</span><br>    <span class="hljs-comment"># docker image used by this container</span><br>    <span class="hljs-attr">image:</span> <span class="hljs-string">zookeeper</span><br>    <span class="hljs-attr">hostname:</span> <span class="hljs-string">zk1</span><br>    <span class="hljs-attr">container_name:</span> <span class="hljs-string">zk1</span><br>    <span class="hljs-comment"># port mappings between the container and the host</span><br>    <span class="hljs-attr">ports:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-number">2181</span><span class="hljs-string">:2181</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-number">8081</span><span class="hljs-string">:8080</span><br>    <span class="hljs-comment"># environment variables of the container</span><br>    <span class="hljs-attr">environment:</span><br>      <span class="hljs-comment"># id of this zk instance</span><br>      <span class="hljs-attr">ZOO_MY_ID:</span> <span class="hljs-number">1</span><br>      <span class="hljs-comment"># machines and ports of the whole zk cluster</span><br>      <span class="hljs-attr">ZOO_SERVERS:</span> <span class="hljs-string">server.1=0.0.0.0:2888:3888;2181</span> <span class="hljs-string">server.2=zk2:2888:3888;2181</span> <span class="hljs-string">server.3=zk3:2888:3888;2181</span><br>    <span class="hljs-comment"># bind container paths to host directories so the host and the container share data</span><br>    <span class="hljs-attr">volumes:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">./zk1/data:/data</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">./zk1/datalog:/datalog</span><br>    <span class="hljs-comment"># attach this container to the isolated network named zk-net</span><br>    <span class="hljs-attr">networks:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">zk-net</span><br><br>  <span class="hljs-attr">zk2:</span><br>    <span class="hljs-attr">image:</span> <span 
class="hljs-string">zookeeper</span><br>    <span class="hljs-attr">hostname:</span> <span class="hljs-string">zk2</span><br>    <span class="hljs-attr">container_name:</span> <span class="hljs-string">zk2</span><br>    <span class="hljs-attr">ports:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-number">2182</span><span class="hljs-string">:2181</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-number">8082</span><span class="hljs-string">:8080</span><br>    <span class="hljs-attr">environment:</span><br>      <span class="hljs-attr">ZOO_MY_ID:</span> <span class="hljs-number">2</span><br>      <span class="hljs-attr">ZOO_SERVERS:</span> <span class="hljs-string">server.1=zk1:2888:3888;2181</span> <span class="hljs-string">server.2=0.0.0.0:2888:3888;2181</span> <span class="hljs-string">server.3=zk3:2888:3888;2181</span><br>    <span class="hljs-attr">volumes:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">./zk2/data:/data</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">./zk2/datalog:/datalog</span><br>    <span class="hljs-attr">networks:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">zk-net</span><br><br>  <span class="hljs-attr">zk3:</span><br>    <span class="hljs-attr">image:</span> <span class="hljs-string">zookeeper</span><br>    <span class="hljs-attr">hostname:</span> <span class="hljs-string">zk3</span><br>    <span class="hljs-attr">container_name:</span> <span class="hljs-string">zk3</span><br>    <span class="hljs-attr">ports:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-number">2183</span><span class="hljs-string">:2181</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-number">8083</span><span class="hljs-string">:8080</span><br>    <span class="hljs-attr">environment:</span><br>      <span class="hljs-attr">ZOO_MY_ID:</span> <span class="hljs-number">3</span><br>      <span class="hljs-attr">ZOO_SERVERS:</span> <span 
class="hljs-string">server.1=zk1:2888:3888;2181</span> <span class="hljs-string">server.2=zk2:2888:3888;2181</span> <span class="hljs-string">server.3=0.0.0.0:2888:3888;2181</span><br>    <span class="hljs-attr">volumes:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">./zk3/data:/data</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">./zk3/datalog:/datalog</span><br>    <span class="hljs-attr">networks:</span><br>      <span class="hljs-bullet">-</span> <span class="hljs-string">zk-net</span><br></code></pre></td></tr></table></figure><p><em>Note: if your docker-compose version is older than 1.27.0, specify version at the top of the file</em></p><p>Besides <code>docker-compose.yml</code>, I also created three folders, zk1, zk2, and zk3, for the containers to mount.</p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704123354987.png" class="" title="image-20210704123354987"><p>Start the cluster with <code>docker-compose up -d</code></p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704123633354.png" class="" title="image-20210704123633354"><p>Check its status with <code>docker-compose ps</code></p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704123724798.png" class="" title="image-20210704123724798"><p>Stop it with <code>docker-compose stop</code></p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704123859881.png" class="" title="image-20210704123859881"><p>Use <code>docker exec -it zk1 /bin/bash</code> to get a shell inside the zk1 container (the other containers work the same way)</p><p><code>cd bin</code>-><code>ls</code> to list the contents</p><img src="/2021/07/04/docker-compose%E6%90%AD%E5%BB%BA%E6%9C%AC%E5%9C%B0%E4%BC%AAzookeeper%E9%9B%86%E7%BE%A4/image-20210704124023023.png" class="" title="image-20210704124023023"><p>Run <code>./zkCli.sh</code> to connect as a client; you can enter a few zookeeper commands to verify the cluster</p><h2 id="Reference"><a href="#Reference" class="headerlink" 
title="Reference"></a>Reference</h2><blockquote><p><a href="https://blog.csdn.net/weixin_40943540/article/details/103027246">Resolving the conflict between Hyper-V and virtual machine software</a></p><p><a href="https://stackoverflow.com/questions/67788960/error-during-connect-this-error-may-indicate-that-the-docker-daemon-is-not-runn?r=SearchResults">Fixing "docker daemon is not running"</a></p><p><a href="https://zhuanlan.zhihu.com/p/72467871">Building a zookeeper cluster with docker</a></p></blockquote>]]></content>
<tags>
<tag>zookeeper</tag>
</tags>
</entry>
<entry>
<title>Connecting GoLand to VMware</title>
<link href="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/"/>
<url>/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/</url>
<content type="html"><![CDATA[<p>Here I use GoLand on the host machine to connect to a local VMware virtual machine for development.</p><h2 id="VMWARE-相关设置"><a href="#VMWARE-相关设置" class="headerlink" title="VMWARE 相关设置"></a>VMware settings</h2><p>First use <code>ifconfig</code> to find the virtual machine's IP address</p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/ifconfig.png" class=""><p>Open the Virtual Network Editor</p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/vm-1.png" class="" alt="image-20210626012433145"><p>Click Change Settings to obtain administrator privileges, then click NAT Settings. <em>Note that the network being configured here is VMnet8 (NAT mode)</em></p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/vm-2.png" class="" alt="image-20210626012518570"><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/config.png" class=""><p>Add a port-forwarding rule: host port 22, mapped to port 22 in the virtual machine, with the virtual machine IP address set to the one just obtained from ifconfig</p><p>In a terminal inside the virtual machine, run <code>sudo apt-get install openssh-client openssh-server openssh-sftp-server</code> to install the ssh packages, then run <code>sudo service ssh restart</code> to start the ssh service</p><h2 id="Goland-相关设置"><a href="#Goland-相关设置" class="headerlink" title="Goland 相关设置"></a>GoLand settings</h2><p>Click Tools->Deployment->Configuration</p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/Goland-config-1.png" class="" alt="image-20210626013346971"><p>Choose to add an SFTP server</p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/Goland-config-2.png" class="" alt="image-20210626013432305"><p>Set host to localhost</p><p>Username is the virtual machine user and the password is that user's password. Click Test Connection to check</p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/Goland-config-3.png" class="" alt="image-20210626013605160"><p>In Mappings, set the local project folder that corresponds to the project on the virtual machine</p><img src="/2021/06/26/Goland%E8%BF%9E%E6%8E%A5VMWARE/goland-config-mapping.png" class=""><p>Click Tools->Deployment->Browse Remote Host to browse the virtual machine's folders, right-click the project folder, and choose Download from here</p>]]></content>
<tags>
<tag>工具配置</tag>
</tags>
</entry>
<entry>
<title>Kubernetes networking study notes</title>
<link href="/2021/05/26/Kubernetes%E7%BD%91%E7%BB%9C%E5%AD%A6%E4%B9%A0%E6%95%B4%E7%90%86/"/>
<url>/2021/05/26/Kubernetes%E7%BD%91%E7%BB%9C%E5%AD%A6%E4%B9%A0%E6%95%B4%E7%90%86/</url>
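<content type="html"><![CDATA[<h1 id="Docker-网络模型"><a href="#Docker-网络模型" class="headerlink" title="Docker 网络模型"></a>Docker network model</h1><p>There are four modes:</p><ul><li><strong>Host</strong>: the container gets no virtual NIC or IP of its own and uses the host's IP and ports. The container shares the host's Network Namespace. <em>Network performance is good, but isolation is poor: a crash of the container's network stack affects the host, and the container is constrained by how many host ports are free and which are already taken (port conflicts)</em></li><li><strong>Container</strong>: the new container joins the Network Namespace of an existing container</li><li><strong>None</strong>: the container has its own Network Namespace, but no network configuration is applied to it</li><li><strong>bridge</strong>: the default mode</li></ul><h2 id="bridge-详解"><a href="#bridge-详解" class="headerlink" title="bridge 详解"></a><span id="jump">bridge in detail</span></h2><p>Key terms: <strong>Veth Pair</strong>, <strong>bridge</strong>, <strong>172.xx.xx.xx</strong></p><p>The host creates a bridge named docker0, and containers attach to it through veth pairs. A bridge works much like a switch, so the containers on one host are joined into a single layer-2 network.</p><div class="note note-info"> <p>According to the official <a href="https://docs.docker.com/desktop/networking/#use-cases-and-workarounds">documentation</a>, Mac users will not find a <code>docker0</code> bridge on the host</p> </div><p>Docker picks a subnet from the private IP ranges defined in RFC 1918 for docker0 and the containers. It normally uses 172.17.0.0/16 and assigns 172.17.0.1/16 to the docker0 bridge (the range can be changed with <code>--bip=CIDR</code> when the Docker daemon starts).</p><p>Because the container's Network Namespace is isolated from the host's, the container cannot see the docker0 device. To let it talk to other containers on the same host, Docker creates a veth pair, which forms a data channel: one end goes into the new container and is named eth0; the other end stays on the host, usually named vethxxx, and is added to the docker0 bridge. Docker assigns eth0 an unused IP from the 172.17.0.0/16 range mentioned above and sets the container's default gateway to docker0's IP address (172.17.0.1). Traffic to anything outside the local container subnet is forwarded through the docker0 gateway, while containers on the same host (same subnet) reach each other directly at layer 2, resolving addresses by ARP broadcast (the route table shows an entry with Gateway 0.0.0.0, meaning no routing hop is needed).</p><p>Because docker0 and the containers use private IPs that are unusable on external networks, talking to the outside world requires <strong>NAT</strong> (Network Address Translation). For outbound traffic, SNAT lets the container borrow the host's IP. When a container serves external clients, DNAT is used: a host port is mapped through iptables or some other mechanism so that traffic is steered into the container. Here the Linux host effectively acts as a NAT gateway.</p><h2 id="Docker-网络的优劣分析"><a href="#Docker-网络的优劣分析" class="headerlink" title="Docker 网络的优劣分析"></a>Pros and cons of Docker networking</h2><p>Docker's network model is fairly simple: an internal bridge plus internal reserved IPs, which decouples the container network from the outside world. The downside is that from outside it is hard to tell which networks and traffic belong to containers and which belong to hosts. Take high availability: if 172.16.1.1 and 172.16.1.2 are containers with the same function, we want to bind them into one group behind a single service, yet from outside they have nothing in common, because each borrows its host's IP and port.</p><p><strong>Native</strong> Docker networking is single-host: with the default configuration, containers on different hosts cannot reach each other by IP, while large-scale deployments inevitably need cross-host communication. Docker's answer was twofold. It integrated SocketPlane into its cluster-management project Swarm, and it split network management out of the Docker daemon into Libnetwork, which provides several network drivers and lets third-party tools replace the built-in networking as plugins (the interface is CNM).</p>
<h1 id="Kubernetes-网络模型"><a href="#Kubernetes-网络模型" class="headerlink" title="Kubernetes 网络模型"></a>Kubernetes network model</h1><p><strong>IP-per-Pod</strong>: every Pod has its own IP, and all containers in a Pod share one Network Namespace.</p><p>Compared with Docker, containers in Kubernetes can communicate directly: ① within a Pod through localhost ② between Pods by IP. This avoids the performance cost of NAT and preserves the source address, which makes network troubleshooting easier.</p><p>K8s does not prescribe how this model is implemented, so there are many solutions.</p><h2 id="容器跨主机网络"><a href="#容器跨主机网络" class="headerlink" title="容器跨主机网络"></a>Cross-host container networking</h2><p>The Flannel project is a good way to understand the mainstream implementations of cross-host networking. It supports three backends:</p><ol><li><strong>VXLAN</strong></li><li><strong>UDP</strong></li><li><strong>host-gw</strong></li></ol><p>Flannel's basic model is that the cluster uses one network range, each node is allocated a subnet from it, and when a container's network is created on a host, the container gets an IP from that subnet. This fits the k8s IP-per-Pod model very well.</p><p>With plain Docker, each node assigns container IPs on its own. Seen globally they are like house numbers in different neighborhoods: across a wider scope they may repeat (every host has a 172.17.0.2/16). In k8s, Flannel stores the mapping between subnets and nodes in etcd and configures each node so that it only hands out container IPs from its own subnet.</p><p>Unique addresses alone do not give connectivity. The physical network normally knows nothing about the virtual network's IP and MAC addresses (its routers and switches have no routes or forwarding entries for them), so such packets cannot be routed even if they are sent out. This is why Flannel's early implementations are overlays, that is, tunnel networks. The UDP and VXLAN backends below are both overlays, while host-gw is routing, the second way to make container addresses reachable.</p><h3 id="UDP"><a href="#UDP" class="headerlink" title="UDP"></a>UDP</h3><p>Key device: the <strong>TUN (tunnel) device</strong>, which passes network packets between the kernel and user-space applications</p><p>A packet from Container-1 on Node 1 to Container-2 on Node 2 passes through the bridge and appears on the host. Flannel has already created routing rules on the host, and they steer the packet into the flannel0 device (a TUN device), which hands it to the user-space flanneld process. flanneld matches the destination IP to a subnet (applying the mask is enough), looks up the host IP for that subnet in etcd, and wraps the original IP packet (an IP packet, because TUN works at the network layer) in a UDP packet sent to the target host. The source address of this UDP packet is the address of Node 1.</p><p>flanneld on every host listens on port 8285, so as long as the UDP destination port is 8285, flanneld on Node 2 receives the packet, extracts the original IP packet, and writes it to the flannel0 device on Node 2. The kernel then handles the IP packet and forwards it to the bridge according to the routing rules. The bridge acts as a layer-2 switch and sends the packet to the right port, and it finally reaches the target container through the veth pair.</p><p>UDP performs poorly because the data is copied three times between user space and kernel space; the context switches and repeated copies are why it is so often criticized.</p>
<h3 id="VXLAN"><a href="#VXLAN" class="headerlink" title="VXLAN"></a>VXLAN</h3><p>VXLAN does the encapsulation and decapsulation in the kernel, which improves performance considerably over UDP.</p><p>The special device VXLAN sets up on the host is the <strong>VTEP (VXLAN tunnel end point)</strong>. What it encapsulates and decapsulates is a layer-2 Ethernet frame.</p><p>Suppose Container-1 (IP 10.1.15.2) wants to reach Container-2 (IP 10.1.16.3). The packet from Container-1 appears on the bridge and is routed to the local flannel.1 device (the VTEP), which is the entrance of the tunnel.</p><p>When Node 2 starts and joins the network, the flannel process on every node, including Node 1, adds a route: IP packets destined for Node 2's subnet go out through the flannel.1 device, and the gateway address is the IP address of the flannel.1 device on node 2. (I am not sure this address really is flannel.1's IP. In the examples of some books it looks peculiar: it is 10.1.16.0, exactly the allocated subnet with the host part set to 0. Yet the MAC address found through the ARP entry below is indeed that of flannel.1 on node 2.)</p><p>A packet at the tunnel entrance needs the destination's MAC address before it can be wrapped into a layer-2 frame and sent to the other end. The route has already given the IP address of node 2's VTEP, so the ARP table is used to look up the MAC address that corresponds to this layer-3 IP address. The ARP entry used here is also added to node 1 automatically by flanneld when node 2 joins. (This does not rely on L3 MISS events or ARP learning.)</p><p>The VTEP's MAC address means nothing to the host network, so the frame built so far still cannot travel on the hosts' layer-2 network. It has to be wrapped, as an inner frame, into an ordinary frame of the host network and sent through the host's eth0 NIC. The kernel prepends a VXLAN header to the inner frame. The header carries a VNI, which a VTEP uses to decide whether a frame is meant for it. In flannel the default value is 1, which is why the host's VTEP device is called flannel.1. With the header in place, the kernel puts the result into a UDP packet and sends it.</p><p>Sending the UDP packet requires the IP address of the target host. Here the flannel.1 device plays the role of a "bridge", and its forwarding decisions come from the <strong>FDB (forwarding database)</strong>, <em>which is queried with the MAC address of node 2's VTEP device</em>. The FDB entries are also maintained by flanneld. Once the IP address is known, node 2's MAC address is filled in and the encapsulation is complete (flanneld does not need to maintain this MAC address; it can be obtained through ARP learning).</p><p>The packet arrives at the eth0 NIC of Node 2. The kernel network stack sees a VXLAN header with VNI = 1, so Linux unpacks it, obtains the inner frame, and hands it to the flannel.1 device on node 2 according to the VNI. flannel.1 unpacks it further, takes out the "original IP packet", and processes it as described in the UDP section until it is delivered.</p>
<h3 id="host-gw"><a href="#host-gw" class="headerlink" title="host-gw"></a>host-gw</h3><p>As mentioned above, host-gw is a routing solution. It works by setting the next hop to the IP address of the host where the target Pod runs (this is not an address allocated by flannel but the host's own IP on the node network), so the destination host acts as the gateway on this container path.</p><p>For example, suppose container-1 (IP : 10.244.0.2) on Node-1 (IP : 10.168.0.2/24) wants to reach container-2 (IP : 10.244.1.3) on Node-2 (IP : 10.168.0.3/24). In host-gw mode flannel creates a route on Node-1: IP packets destined for 10.244.1.0/24 leave through the local eth0 device with next hop (Gateway) 10.168.0.3, which is Node-2's IP address. With the next hop known, the link layer uses Node-2's MAC address when building the frame, so the packet travels from Node-1 to Node-2.</p><p>When the kernel on Node-2 extracts the IP packet from the layer-2 frame, it sees the destination 10.244.1.3. Node-2 has a route saying that IP packets for 10.244.1.0/24 are handed to the cni0 bridge, and from there the packet enters container-2.</p><p>In this mode flanneld WATCHes etcd for changes to the host and subnet mappings and updates the route tables accordingly.</p><p>This mode avoids the cost of extra encapsulation and decapsulation. The performance loss is around 10%, whereas other solutions based on VXLAN lose roughly 20%-30%.</p><p>host-gw places a requirement on the underlying network: the cluster hosts must be connected at layer 2. If the hosts are in different subnets they can still reach each other at the IP level (the router of Node1's VLAN connects to the router of Node2's VLAN), but without layer-2 connectivity Node-2 cannot serve as the next hop: even with Node-2's MAC address in hand, it cannot be found inside Node-1's VLAN.</p><p>An intuitive idea is to add a route on Node1 to Node2's container subnet (10.244.1.0/24, note that this is not a public range) via router-1, a route on router-1 via router-2, and so on hop by hop until Node2 is reached. BGP could make this work, but not in the public-cloud environments where k8s is widely used: we can edit the hosts' route tables, but the cloud does not let users configure the gateways between hosts at will.</p><h3 id="补充:-Calico"><a href="#补充:-Calico" class="headerlink" title="补充: Calico"></a>Addendum: Calico</h3><p>Like Flannel host-gw, Calico is a layer-3 solution; in fact its implementation is almost identical to Flannel host-gw. The difference is that Flannel maintains routing information with etcd and the flanneld process on each host, while Calico uses the BGP protocol.</p><p>Note that Calico does not use a bridge, so the host needs a route like the following:</p><figure class="highlight armasm"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs armasm"><span class="hljs-symbol">CONTAINER</span>-<span class="hljs-built_in">IP</span> dev calixxxx scope link<br></code></pre></td></tr></table></figure><p>It means that IP packets for a container on this host should enter the calixxxx device, which is the host end of the veth pair.</p><p>When the container sends a packet, it takes the default route</p><figure class="highlight nginx"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><code class="hljs nginx"><span class="hljs-attribute">default</span> via <span class="hljs-number">169.254.1.1</span> dev eth0<br></code></pre></td></tr></table></figure>
<p>Data entering one end of a veth pair does come out of the other end. But recall that in <a href="#jump">bridge in detail</a> the container's default gateway was the bridge's IP address, so whose address is 169.254.1.1?</p><p>Calico never actually assigns 169.254.1.1 to anything. It relies on <code>proxy_arp</code>: once enabled, the host answers all ARP requests, even for IP addresses that are not its own. Only then are the container and the host network connected.</p><p>Calico also offers an IPIP mode for the situation described at the end of the host-gw section. It adds encapsulation and decapsulation steps, and its performance is roughly comparable to VXLAN.</p><h3 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h3><p>As I see it, the UDP and VXLAN backends use encapsulation to disguise a packet as an ordinary UDP packet between hosts, and each provides a mechanism to recognize it as container traffic. In UDP, flanneld on the destination host listens on port 8285 to make sure it receives the packet, then hands it to the TUN device. In VXLAN, the kernel's VXLAN support uses the VNI to hand it to flannel's VTEP device.</p><p>host-gw, being a routing solution, is more direct: since container IPs cannot be recognized and routed elsewhere, hand the packet straight to the host that does recognize them, the one running the container, and use that host as the gateway.</p>]]></content>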
<tags>
<tag>Kubernetes</tag>
</tags>
</entry>
<entry>
<title>A short talk on distributed systems: starting from the 3Ws (Part 1)</title>
<link href="/2021/04/14/%E5%88%86%E5%B8%83%E5%BC%8F%E7%B3%BB%E7%BB%9F%E6%A6%82%E8%A7%88/"/>
<url>/2021/04/14/%E5%88%86%E5%B8%83%E5%BC%8F%E7%B3%BB%E7%BB%9F%E6%A6%82%E8%A7%88/</url>
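<content type="html"><![CDATA[<h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h1><p>Distributed systems involve a great many concepts. Without a systematic view, what you learn easily turns into isolated islands: you bury yourself in implementation details with no grasp of the whole, and studying becomes painful.</p><p>This article therefore organizes what I have learned and looks at distributed systems through the 3W principle.</p><h1 id="WHY——为什么需要分布式系统?"><a href="#WHY——为什么需要分布式系统?" class="headerlink" title="WHY——为什么需要分布式系统?"></a>WHY: why do we need distributed systems?</h1><p>Before answering what, I think it is worth discussing why: the background in which distributed systems emerged and the problems they solve.</p><p><strong>Computing has two key concepts, data and computation. Data is the object of computation, and computation is the process through which data is used to create value.</strong></p><ul><li>Computation: when the workload on a single server is too large and too slow, we can turn to multithreading and use multiple cores and parallelism to squeeze as much bandwidth and performance out of the CPU as possible. If that is still too slow, we can SCALE UP (vertical scaling), that is, make a single node more capable with faster and more CPUs. But the cost usually grows faster than linearly, and heat dissipation and memory access tend to become bottlenecks, so the node still cannot handle the load</li><li>Storage: data volumes are growing rapidly and place ever higher demands on storage. Once the data far exceeds the limit of SCALE UP, the only choice is to SCALE OUT (horizontal scaling) and share the storage pressure by adding nodes. Running the computation for each piece of data on the node that holds it also exploits data locality and saves bandwidth</li><li>Fault tolerance: when a single server fails it can no longer serve requests; redundancy across multiple machines improves the quality of service</li><li>Latency: if a user on one shore of an ocean wants data from a server on the opposite shore, the packets have to cross half the world. Deploying servers around the world gives lower latency and a better user experience</li><li>Scalability: with a single server, once it can no longer cope with growing users and business demands, replacing it and redeploying is troublesome and may not even solve the problem. A distributed system scales better</li></ul><h1 id="WHAT——什么是分布式系统?"><a href="#WHAT——什么是分布式系统?" class="headerlink" title="WHAT——什么是分布式系统?"></a>WHAT: what is a distributed system?</h1><blockquote><p>A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal.</p><p>• There are several autonomous computational entities, each of which has its own local memory</p><p>• The entities communicate with each other by information passing</p></blockquote><p><strong>In short, a distributed system is a set of computers that communicate and cooperate over a computer network to achieve a common goal or solve a common problem</strong></p><h1 id="HOW——如何去设计分布式系统以及设计面对的挑战?"><a href="#HOW——如何去设计分布式系统以及设计面对的挑战?" 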
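class="headerlink" title="HOW——如何去设计分布式系统以及设计面对的挑战?"></a>HOW: how to design a distributed system, and the challenges involved</h1><h2 id="数据划分(分区)"><a href="#数据划分(分区)" class="headerlink" title="数据划分(分区)"></a>Data partitioning</h2><p>This is also known as the partition or shard problem.</p><p>Since the nodes cooperate to solve a common problem, the core of divide and conquer is how to decompose that problem, so we have to think about the following questions</p><ol><li>Can the data be distributed evenly?</li><li>When nodes are added or removed, how does data migrate between nodes, and how much data has to move?</li><li>How is metadata managed? (Metadata: when data is spread across nodes, we need to record which node holds each piece of data, its state, and so on. It is <strong>data about data</strong>)</li><li>How do we extract the feature of the data used for partitioning?</li></ol><h3 id="哈希方式"><a href="#哈希方式" class="headerlink" title="哈希方式"></a>Hashing</h3><p>This is the most common way to distribute data: compute a hash of some feature value of the data and map it to machines. The feature value can be the key in a key-value system or a value tied to the business logic (such as the ID of the user the data belongs to). Consider the simplest hash function, mod N. N can be the number of servers, <strong>but in practice replicas have to be taken into account: several servers are grouped together, and N is the total number of groups</strong></p><p>In theory, as long as the hash function distributes well, it spreads data fairly evenly across the cluster. The metadata it needs is also simple: the hash function and the modulus N.</p><p>Its drawback is obvious, though: when nodes are added or removed, a large amount of data has to move, and such large-scale migration can disrupt normal service. To reduce the amount of data migrated, the number of nodes is often doubled, so that in expectation at most 50% of the data has to move.</p><p>Another way to improve scalability is to let a dedicated server manage the mapping as metadata. The number of partitions used as the modulus is then usually larger than the number of machines (groups), and each machine (group) is responsible for several remainders. When the cluster grows, some remainders are moved to the new machines. Accessing data, however, requires asking the metadata server which machine is responsible, which places fairly high demands on the metadata server.</p><p>Its other drawback is that the chosen feature value can easily cause <strong>data skew</strong>. For example, when partitioning by user ID, one user may have a huge amount of data, so a large load lands on one partition and turns it into a <strong>hot spot</strong>. Worse, in this case growing the cluster does not spread the hot spot's load to other nodes.</p><p>We can of course choose a new feature value to hash. Like a composite primary key in a database, we can combine the ID with another attribute of the data, or even hash the whole record. But although the data is then fully scattered across the cluster, the association between related data is lost: to read a specific set of values (such as all data for one user) you have to query every node in parallel.</p><div class="note note-info"> <p>Hashing is unfriendly to range queries, especially when the range is over the very key used for hashing. When the range does not involve the partitioning key, there are some special techniques. To query all records of one user within a time window, for example, you can use a <em>compound primary key</em>, formed by concatenating the values of several columns. The first part (such as user_id) determines the partition, and within that partition the user's data is stored in sorted order, like an SST (in string ordering, entries with the same first part are ordered by the latter part). This is far more efficient than scanning everything in the partition and filtering.</p> </div><div class="note note-info"> <p>Unlike a plain key-value store, many relational databases also have secondary indexes, and some requests target a secondary index directly. Take <em>Table Car(ID, color, make)</em> with ID as the partitioning key: to find all silver cars, every partition has to be queried unless something specific is done about it. There are generally two approaches, <em>document-based partitioning</em> and <em>term-based partitioning</em> </p> </div><h3 id="根据数据范围"><a 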
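href="#根据数据范围" class="headerlink" title="根据数据范围"></a>Range-based partitioning</h3><p>Each partition (each group of machines) is assigned a contiguous range of keys.</p><img src="/2021/04/14/%E5%88%86%E5%B8%83%E5%BC%8F%E7%B3%BB%E7%BB%9F%E6%A6%82%E8%A7%88/fig6-2.png" class=""><p>The key ranges are not necessarily evenly spaced. In the figure from the DDIA book, volume 1 contains words starting with A and B, while volume 12 contains words from T to Z. This is because data is not evenly distributed over the key space; one range may hold far more data than another.</p><p>Boundaries can be chosen manually, or chosen automatically from the data volume and adjusted dynamically. For example, when the data in a range reaches a threshold it is split into two ranges, and when new nodes join, ranges can be assigned to them to even out the load</p><p>However, this scheme requires the system to maintain a fairly large amount of metadata (the data distribution), and as the cluster grows the metadata service easily becomes a bottleneck.</p><p>DDIA also points out that Key Range partitioning can produce hot spots because of particular access patterns and key choices. When the key is a timestamp, partitions correspond to time ranges. In some businesses most of the data accessed in a day belongs to that same day (a news site, for example), so today's partition becomes a hot spot. In that case, consider choosing a more suitable key for the ranges.</p><h3 id="一致性哈希"><a href="#一致性哈希" class="headerlink" title="一致性哈希"></a>Consistent hashing</h3><p>Consistent hashing is famous, and many interview books mention it. It maps data by feature value onto a hash ring whose ends are joined, and maps the nodes onto the same ring. Starting from the data's position on the ring and moving clockwise, the first node encountered is the one responsible for it. Consistent hashing has to manage the nodes' positions on the ring as metadata, but overall this is much smaller than for Key Range</p><p>Compared with plain hashing, consistent hashing scales much better: adding or removing a node only affects its neighbor. <strong>But affecting only the neighbor is also its weakness</strong>: when a node fails and is removed from the ring, all the data it was responsible for falls to the next node, whose load suddenly jumps.</p><p>For this reason <strong>virtual nodes</strong> are usually introduced. A fairly large number of virtual nodes is placed on the ring at the start, and each physical node is mapped to several virtual nodes. To access data, the hash first locates the virtual node, and the metadata server then resolves the mapping to the physical node actually responsible. When a physical node fails, several virtual nodes on the ring fail at once and their load is spread over several nodes; the same holds when nodes are added.</p><p>But <strong>There is no free lunch</strong>. With virtual nodes the metadata to maintain grows too: part of it is the positions of the virtual nodes on the ring, the rest is the mapping.</p><h4 id="一些补充"><a href="#一些补充" class="headerlink" title="一些补充"></a>Additional notes</h4><p>Hashing handles hot spots better than Range Based partitioning, because it spreads out locality in the access pattern (frequent access to data within one range). It cannot avoid hot spots entirely, though: as noted earlier, a large share of accesses may well target a single key.</p><p>The strength of hashing also costs it the ability to run range queries efficiently: adjacent keys are scattered over all partitions and their order is lost. A query then has to read from several groups of machines and merge the results. Once again, <strong>There is no free lunch</strong>.</p><p>So the design can be chosen flexibly according to the business model, or follow a hybrid strategy in which only a small number of very hot keys are hashed</p>]]></content>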
<tags>
<tag>Distributed System</tag>
</tags>
</entry>
</search>