NOTE: Below is the original, unmodified 2002 README; it does not fully
reflect the current status... ;-) After reading this file, also see
COMPILING.txt and USAGE.txt for newer instructions.

========================================================================
NOTE:

This file is *not* intended to be comprehensive documentation for the
Tsunami protocol or the programs in the Tsunami suite. We are working
on formal documentation, but it is not yet ready for public release.
Please bear with us -- Tsunami is a young and rapidly evolving
protocol, and we're documenting a moving target.

Tsunami is built using the standard GNU autoconf/automake system. To
install, use the standard './configure', 'make', 'make install'
sequence. (Thanks are due to Jeff Squyres <jsquyres@osl.iu.edu>
of Indiana University's Open Systems Lab for bringing us into the
modern age of automated building and configuration.)

Building Tsunami will create the Tsunami client (tsunami), the Tsunami
server (tsunamid), and two utilities for benchmarking disk subsystem
performance (readtest and writetest).

Later in this file, you'll find details on how Tsunami currently
performs authentication.

Please share with us any Tsunami performance data you can offer!
Ideally, we'd like to have hardware profiles of the client and server
systems (CPU, disk controller, memory size, kernel version, bdflush
settings, and so forth), the output of tsunami and tsunamid during
file transmission, the output of vmstat on both the client and server,
and the protocol parameters used. This data will help us to tune
the protocol and make the next release more robust.

And finally, please read the license agreement found in LICENSE.TXT.

If you have any technical questions about the Tsunami protocol, please
subscribe to the Tsunami LISTSERV. Instructions can be found on
the mailing list home page at:

    http://listserv.indiana.edu/archives/tsunami-l.html

========================================================================
The Tsunami protocol
--------------------

A basic Tsunami conversation works like this:

(1) The client connects to the Tsunamid TCP port (46224 by default).
    The server forks off a child process to deal with the connection.

(2) The client and server exchange protocol revision numbers to make
    sure that they're talking the same language. (The revision number
    is defined in "tsunami.h".)

(3) The client authenticates to the server. This process is described
    later in this file.

(4) The server is now waiting for the name of a file to transfer to
    the client.

(5) Once the file name is received, the server makes sure that it
    can open and read the file. If it can, a positive result byte
    is sent to the client. If it can't, the server reports failure.

(6) The client and server exchange protocol parameter information.

(7) The client sends the server the number of the UDP port on which
    the client will listen for the file data.

(8) The server and client both enter their file transmission loops.
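
As a very rough illustration of steps (1) and (2) -- not the actual
Tsunami source -- the client's side of the opening handshake might
look something like the C sketch below. The PROTOCOL_REVISION value
and the function name are placeholders invented for this example; the
real revision number is defined in "tsunami.h".

    /* Sketch of the client's opening handshake (illustrative only). */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define TSUNAMID_PORT      46224        /* default server TCP port    */
    #define PROTOCOL_REVISION  0x20020000   /* placeholder; see tsunami.h */

    int connect_to_server(const char *server_ip)
    {
        struct sockaddr_in addr;
        u_int32_t          ours, theirs;
        int                fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(TSUNAMID_PORT);
        inet_pton(AF_INET, server_ip, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        /* step (2): exchange protocol revision numbers */
        ours = htonl(PROTOCOL_REVISION);
        write(fd, &ours, sizeof(ours));
        read(fd, &theirs, sizeof(theirs));
        if (theirs != ours) {
            close(fd);
            return -1;                  /* not talking the same language */
        }

        /* steps (3) onward -- authentication, file request, parameter
           exchange, UDP port -- would continue on this TCP connection. */
        return fd;
    }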
========================================================================
The server file transmission loop
---------------------------------

while the whole file hasn't been sent yet:
    see if the client has sent a request over the TCP pipe (*)
    if it has:
        service that request
    otherwise:
        send the next block in the file
        delay for the next packet

(*) There are three kinds of request:
    (1) error rate notification
    (2) retransfer block [nn]
    (3) restart transfer at block [nn]
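
A minimal C sketch of this loop follows, using a zero-timeout select()
to poll the TCP connection. The helpers (service_request,
send_next_block, block_delay) and the simplified end condition are
invented for this sketch; they are not from the Tsunami source.

    /* Sketch of the server transmission loop (illustrative only). */
    #include <stddef.h>
    #include <sys/select.h>
    #include <sys/time.h>
    #include <sys/types.h>

    /* hypothetical helpers, not from the Tsunami source */
    void service_request(int client_fd, u_int32_t *next_block);
    void send_next_block(u_int32_t block);
    void block_delay(void);

    void server_transfer_loop(int client_fd, u_int32_t total_blocks)
    {
        u_int32_t next_block = 1;

        while (next_block <= total_blocks) {     /* simplified condition */
            fd_set         readfds;
            struct timeval timeout = { 0, 0 };   /* poll, don't block    */

            FD_ZERO(&readfds);
            FD_SET(client_fd, &readfds);

            if (select(client_fd + 1, &readfds, NULL, NULL, &timeout) > 0) {
                /* error rate notification, retransfer, or restart request */
                service_request(client_fd, &next_block);
            } else {
                send_next_block(next_block++);   /* one UDP datagram     */
                block_delay();                   /* inter-packet delay   */
            }
        }
    }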
========================================================================
The client file transmission loop
---------------------------------

while the whole file hasn't been received yet:
    try to receive another block
    if it's the last block:
        break out of the loop and notify the server
    otherwise:
        on every 50th iteration, see if it's been [update_period] since
            our last statistics update
        if it has:
            display updated statistics
            notify the server of our current error rate
            transmit our queue of retransmission requests
        save the block
        if the block is later than the one we were expecting:
            put intervening blocks in the retransmission queue
        if the block is earlier than the one we were expecting:
            remove the block from the retransmission queue
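
Again for illustration only, a rough C outline of the receive loop is
below. All helper names (receive_block, is_last_block, and so on) and
the buffer size are invented for this sketch and do not come from the
Tsunami source.

    /* Sketch of the client receive loop (illustrative only). */
    #include <sys/types.h>
    #include <time.h>

    /* hypothetical helpers, not from the Tsunami source */
    u_int32_t receive_block(int udp_fd, char *buffer);
    int       is_last_block(u_int32_t block);
    void      display_statistics(void);
    void      send_error_rate(int tcp_fd);
    void      send_retransmit_requests(int tcp_fd);
    void      save_block(u_int32_t block, const char *buffer);
    void      handle_block_gap(u_int32_t expected, u_int32_t received);

    void client_receive_loop(int tcp_fd, int udp_fd, time_t update_period)
    {
        char      buffer[32768];
        u_int32_t expected   = 1;
        u_int32_t iteration  = 0;
        time_t    last_stats = time(NULL);

        for (;;) {
            u_int32_t block = receive_block(udp_fd, buffer);

            if (is_last_block(block))
                break;                  /* then notify the server        */

            if (++iteration % 50 == 0 &&
                time(NULL) - last_stats >= update_period) {
                display_statistics();
                send_error_rate(tcp_fd);
                send_retransmit_requests(tcp_fd);
                last_stats = time(NULL);
            }

            save_block(block, buffer);
            handle_block_gap(expected, block);  /* queue/dequeue blocks  */
            if (block >= expected)
                expected = block + 1;
        }
    }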
========================================================================
The retransmission queue
------------------------

This is a (potentially) sparse array of block numbers that we may need
to have retransmitted. Each entry is either 0 or a block number. The
size of the array is doubled if it runs out of space. We keep track
of the lowest index used and the highest index used and rehome the
data to the base of the array occasionally.

If the queue is extremely large (over [threshold] entries), instead of
asking for each entry in the queue, we ask to restart the transfer at
the first block in the queue.
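
A data-structure sketch, in C, of roughly what this description
implies. The type and function names are invented here, not taken
from the Tsunami source, and the code assumes the queue starts with a
nonzero capacity.

    /* Sketch of the retransmission queue (illustrative only). */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    typedef struct {
        u_int32_t *table;    /* sparse array: 0 = unused, else a block number */
        size_t     size;     /* current capacity                              */
        size_t     lowest;   /* lowest index in use                           */
        size_t     highest;  /* one past the highest index in use             */
    } retransmit_queue_t;

    /* Append a block number, doubling the array if it runs out of space. */
    void queue_block(retransmit_queue_t *q, u_int32_t block)
    {
        if (q->highest >= q->size) {
            q->size  *= 2;
            q->table  = realloc(q->table, q->size * sizeof(u_int32_t));
            if (q->table == NULL)
                abort();
        }
        q->table[q->highest++] = block;
    }

    /* Occasionally slide the live entries back to the base of the array. */
    void rehome_queue(retransmit_queue_t *q)
    {
        size_t used = q->highest - q->lowest;

        memmove(q->table, q->table + q->lowest, used * sizeof(u_int32_t));
        q->lowest  = 0;
        q->highest = used;
    }

In this picture, when (highest - lowest) grows past [threshold], the
client would send a single "restart transfer at block" request for the
first queued block instead of asking for each entry individually.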
========================================================================
How Tsunami does authentication
-------------------------------

The Tsunami server and Tsunami client both know a shared secret.
(Right now it's coded into the Tsunami server as "kitten", but this
can be overridden with the '--secret' option.) The client learns the
shared secret by giving the user a 'password' prompt and reading it in
with echo turned off.

The following sequence allows the client to prove its knowledge of the
shared secret to the server:

(1) The server reads 512 bits of random data from /dev/random and
    sends this data to the client.

(2) The client XORs copies of the shared secret over the random data.

(3) The client sends an MD5 hash of the resulting buffer back to the
    server.

(4) The server performs the same XOR/MD5 operation on the random data
    and checks that the resulting hash matches the one the client
    sent. If the hashes match, a positive result byte is sent to the
    client. If they don't, the connection is closed.
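
For illustration, here is a minimal C sketch of the client's half of
this exchange. Note the assumptions: OpenSSL's MD5() is used as a
convenient stand-in (the real code bundles its own MD5 implementation),
and the meaning of the result byte (0 = success) is assumed rather than
taken from the source.

    /* Sketch of the client's authentication step (illustrative only). */
    #include <openssl/md5.h>
    #include <string.h>
    #include <unistd.h>

    int authenticate(int server_fd, const char *secret)
    {
        unsigned char challenge[64];    /* 512 bits from /dev/random */
        unsigned char digest[MD5_DIGEST_LENGTH];
        unsigned char result;
        size_t        secret_len = strlen(secret);
        size_t        i;

        /* (1) receive the random challenge from the server */
        if (read(server_fd, challenge, sizeof(challenge)) != sizeof(challenge))
            return -1;

        /* (2) XOR repeated copies of the shared secret over the challenge */
        for (i = 0; i < sizeof(challenge); i++)
            challenge[i] ^= (unsigned char) secret[i % secret_len];

        /* (3) hash the result and send the 16-byte digest back */
        MD5(challenge, sizeof(challenge), digest);
        if (write(server_fd, digest, sizeof(digest)) != sizeof(digest))
            return -1;

        /* (4) read the server's result byte; 0 = success is assumed here */
        if (read(server_fd, &result, 1) != 1 || result != 0)
            return -1;
        return 0;
    }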
========================================================================
Other notes
-----------

(1) Everything is endian-independent except for the MD5 code.

(2) Everything does work okay with 64-bit file sizes, using the
    fopen64() / fseeko64() API. (A short sketch appears at the end of
    this file.)

(3) Porting from Linux shouldn't be hard. The OS-dependent bits are
    the use of /dev/random and the fixed-size data types defined in
    <sys/types.h>. Linux uses "u_int32_t", Solaris uses "uint32_t".
    That sort of thing. Solaris also lacks the getopt_long() function
    found in glibc.

(4) This probably does require gcc to build. I use the GNU "long long"
    datatype quite a bit for manipulating 64-bit values.

(5) The tuning in response to the current error rate is still under
    active research and development. Future releases may change this
    code significantly.

(6) Disk-to-disk on the same box is a bad test platform. The
    scheduling daemon and the behavior of the loopback device make
    everything go to hell.

(7) The client has a limited amount of online help. Use 'help' to
    see it.

(8) The server has a limited amount of usage information. Run it
    with the '--help' option to see it.
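
As promised in note (2), here is a short sketch of the 64-bit file
access described above. It is illustrative only; the function name and
block layout are invented for this example, and it assumes a glibc
system where _LARGEFILE64_SOURCE exposes fopen64() and fseeko64().

    /* Sketch of 64-bit file access via fopen64()/fseeko64()
       (illustrative only; see notes (2) and (4) above). */
    #define _LARGEFILE64_SOURCE
    #include <stdio.h>
    #include <sys/types.h>

    int read_block(const char *filename, unsigned long long block,
                   unsigned long long block_size, char *buffer)
    {
        FILE *file = fopen64(filename, "rb");

        if (file == NULL)
            return -1;

        /* a 64-bit offset lets us seek past the 2 GB boundary */
        if (fseeko64(file, (off64_t) (block * block_size), SEEK_SET) != 0 ||
            fread(buffer, 1, (size_t) block_size, file) != block_size) {
            fclose(file);
            return -1;
        }

        fclose(file);
        return 0;
    }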