Protocol

Disaster Area Network

A disaster area network (DAN) is a hetreogenous wireless mesh network that utilizes the LoRa modulation scheme for point-to-point or point-to-multipoint connections between nodes. The following document describes the Layer 2 protocol developed for use on DANs.

Outline:

Existing Work

A DAN is heavily reliant on existing LoRa technology and development.

Physical properties

LoRa refers to a proprietary modulation scheme developed by Semtech. For more info, the physical modulation was reverse engineered by Matt Knight, https://www.link-labs.com/blog/what-is-lora. The modulation scheme should not be confused with LoRaWAN, the protocol stack for the Things Network, https://www.link-labs.com/blog/what-is-lorawan.

Mesh routing protocols

There are a number of mesh routing protocols and alogrithms that have been implemented for TCP/IP networks. The work done on the DAN routing protocol is particularly inspired by jech's work on the Babel routing protocol.

Layer 2 protocol

There are a few network stacks already being devleoped for LoRa-based networks that worth looking into,

Things Network - star topology, mostly for sensor networks
Reticulum - pre-alpha, may be capabale of ad-hoc style meshing, physical layer agnostic
6LoWPAN - IPv6 network stack for low power wireless personal area networks, "industry" standard (i.e. little publicly-available documentation), related to Thread
IEEE 802.15.4e - MAC amendement to IEEE standard for wireless personal area networks (WPANs)

Goals, challenges, assumptions

Below are some of the initial goals, known limitations, and baseline assumptions for building the DAN stack.

Desired capabilities

Implement a network stack for DANs that is capable of,

Communication between any two nodes on a DAN via LoRa, without the need for a server or gateway.
Distance-vector, loop-avoiding, routing of packets on a DAN with limited knowledge of network topology
Minimal design that could be easily reimplemented on many devices (e.g. Arduino microcontrollers, OpenWrt routers, Linux computers)
Connectionless protocol that does not require acknowledgements, so as not to over use air space

Challenges and limitations

LoRa packets can only be 256 bytes (due to FIFO buffer size, see official documentation https://www.semtech.com/products/wireless-rf/lora-transceivers/SX1276).
LoRa packet transmit time ranges from 200ms to 1s depending on packet size.
LoRa devices operate at 915 MHz, in North America. This is in the 33cm ISM band (902-928MHz), it does not require license to use and allows retransmission of packets and encryption of messages, https://www.itu.int/net/ITU-R/terrestrial/faq/index.html#g013.

Network structure

The structure of a DAN is assumed to be x number of devices (or nodes) with a single radio transceiver interface that transmits messages omnidirectionally such that, under ideal conditions, all nodes should receive every message transmitted by a neighboring node. LoRa packets take a relatively long time (200ms - 1s) to be transmited, therefore the duty-cycle of devices should be proportional to the number of devices on the network (or number of neighbors?) to guarentee a certain RF air quality.

LoRa airtime

The airtime of a LoRa packet means the time it takes for a LoRa transceiver to transmit a single packet. Due to LoRa's modulation scheme it takes a fairly long time to transmit a packet. This transmit time (or packet airtime) is a factor of size of the packet being sent and a few transceiver settings that are set in advance and for our purpose represent constants. LoRa airtime for packet of size N can be calculated as follows,

airtime(N) 8 + (max(ceil((8 * N - 4 * SF + 28 + 16 - 20 * (1 - HD)) / (4 * (SF - 2 * DR))) * CR, 0));

where, N is the number of bytes in the packet's payload
SF is the spreading factor (between 6-12) for our simulations this is currently set to 9
HD is a header bit, 0 for implicit header, 1 for explicit header
DR is a bit for low data optimization, 1 when enabled, 0 when disabled
CR is the coding rate (1 corresponding to 4/5 and 4 to 4/8)

adapted from page 31 of the Semtech's SX1276 documentation

Addressing

There are two possible ways of looking at addressing on a DAN.

single node addressing
multicast addressing

Single node addressing implies that messages are sent with a single destination address. Every disaster radio node has a unique 6 byte address. Intitially, we are using the MAC address of the nodes WiFi interface as its unique DAN address. This will almost certainly unique across a DAN. We plan to implement encryption eventually, so nodes (or users) may be identified by a public key.

Multicast addressing implies that messages are sent with multiple destinations, these destinations have a shared 6 byte multicast address, that look like a MAC address. This would enable nodes to subscribe to a channel by accepting messages for a specific multicast address. This may assume prior knowledge of the multicast address, though we may choose to implement some form of discovery eventually. Initially, the only multicast address is 0xFFFFFFFFFFFF, i.e. limited broadcast to all neighboring nodes.

Packet Structure

Semetech's specifications for LoRa tranceivers include a header (set explicit or implicit). When set in explicit mode, a header will be sent that contains the payload's length, coding rate, and the presence of a CRC, (link to RFM doc) https://revspace.nl/DecodingLora. In implicit mode, only a preamble is sent to intitiate communication.

Currently, we are using an explicit header, though we should consider switching to implicit,

Byte 0	Byte 1	Byte 2 - 7	Byte 8 - 13	Byte 14	Byte 15	Byte 16 - 256
ttl	totalLength	source	destination	sequence	type	data

where,
ttl is the "time to live", i.e. the number of hops allowed before message is discarded by a node
totalLength is the entire length of the paket (header + data)
source is the 6 byte address of node that originated the message
destination is the 6 byte address of intended destination node
sequence is the global message counter that determines packet loss rate
type is a single ASCII character that describes content of payload (e.g. r = route, c = chat, m = maps, etc)
data is the content of message to be interpreted by application layer

Addiontal header space may be allotted for client IDs tied to websocket clients that have logged into the node,
source client ID - a unique identifier of web socket client or the system (default last 4 bytes of MAC?) destination client ID - a unique identifier of intended recipient (0xFF for any recipient).

An optional CRC may be included at the end of the payload (it's unclear how or when this is checked).

Routing

Devices on a DAN are assumed to only have a single radio transceiver (though we plan on adding support for two). This means that every routing decision comes down to a single question, Should the node retransmit a message or not? This decision is based on a few of conditions,

Am I the intended nextHop address?
Is the destination address in the node's routing table?
Has the packet exceed it's time to live?

DAN routing is managed by two tables that answer two questions,

the neighbor table - who's around me?
the routing table - who's around who's around me...?

Neighbor Table

A neighbor table is generated by both regular messages (i.e. chat, map, etc) and hello messages. When a regular message has not been sent after a specified timeout period, a hello packet will be sent to ensure a node is not dropped from a neighboring node's routing table. A neighbor table entry consists of the following,

struct NeighborTableEntry{
    uint8_t address[ADDR_LENGTH];
    uint8_t lastReceived;
    uint8_t packet_success;
    uint8_t metric;
};

address is 6 byte address of neighbor node
lastReceived is the sequence number of last received message from this address used to recalculate packet loss
packet_success is the instanteanous success rate of packets from this address
metric is a 0 to 255 representation of quality of link to the neighbor address

Hello packets

In order to avoid flooding the network, hello packets have a TTL of 1, meaning they are only transmitted a single hop to immeadiate neighbors. This also implements a simple multicast address, such that the address 0xFFFFFFFFFFFF refers to all immeadiate neighbors. A hello packet can seen as any packet addressed to all neighbors using the neighbor multicast address. However, since every neighbor should, ideally, receive every packet transmitted by a neighboring node and every packet contains the global sequence number for the neighboring node, any packet can be considered a hello packet and be used to update the neighbor table.

Routing Table

A routing table entry on a disaster radio node consists of the following,

struct RoutingTableEntry{
    uint8_t destination[6];
    uint8_t nextHop[6];
    uint8_t distance;
    uint8_t lastReceived;
    uint8_t metric;
};

where,
destination is the 6 byte address of end point of route
nextHop is the 6 byte address of neighbor through which end point maybe reached
distance is the number of hops from end point
lastReceived is the sequence number of last received message from destination used to recalculate packet loss
metric is a 0 to 255 representation of quality of link between the destination address and its final hop (should be average metric of entire route?)

Neighbor table entries are routing entries where the destination and next hop are the same and the number of hops is known to be one (zero hops being the localhost). When a neighbor is found and added to the neighbor table, it is also added to the routing table as a route via itself and a single hop.

The routing table is generated by nodes advertising their routes to their neighbors. The advertising of routes requires larger packets to be transmitted and will therefore be sent less often than hello packets.

Routing table packets

Every so often (maybe after a change in the network, or on set intervals) a node transmits a routing table packet to advertise it's routes to it's neighbors. Much like hello packets, routing tables packets have a ttl of 1 and are addressed to 0xFFFFFFFFFF, that is all immediate neighbors. The data of this packet containing n routes is structured like so,

Byte 16 to 20	Byte 21	Byte 22	...	Byte n * 8 to (n * 8)+5	Byte (n * 8)+6	Byte (n * 8)+7
routeTable[0].destination	routeTable[0].distance	routeTable[0]. metric		routeTable[n].destination	routeTable[n].distance	routeTable[n].metric

On reception by a neighboring node, this data is parsed, the distance values are incremented to reflect the hop by which the routes were received, and the routes contained are compared against all exisiting routes in the node's routing table. After compairsion, the node either, adds the route to its table, updates an existing route with a new metric or better nextHop, or drops the route and does nothing because it already has a better route or the route refers to the nodes local address.

Dues to limited packet size (240 bytes of data), a routing table packet is limited to 30 routes (8 bytes/route). After discovering more than 30 routes, a node needs to send multiple packets to share its entire routing table. Rather than assuming that neighboring nodes can receive sequential packet and append them to one another, we assume the worst-case scenario of a lossy network and instead send a random assortment of 30 routes at a time. This way, it is less likely neighboring node will completely miss the existence of a route because of a poor quality link to the transmitting node (e.g. what if the neighboring node is missing every other third packet? then it might miss the entire last third of the transmitting node's routing table).

Metrics

In order to avoid over using air space (exceeding duty cycle), the routing protocol is built into normal packet structure. This way every packet can be used to update the routing table and route metrics. If a node reaches a timeout period, based on acceptable duty cycle, without transmitting it should transmit a "hello, I'm alive" packet, so other nodes do not drop it from their routing table.

For route selection, a disaster.radio node uses two metrics,

hop-count
packet success

A node will almost always prefer the shortest number of hops, unless the packet success rate drops below a certain threshold. This threshold can be tuned on a per node basis.

Packet success is calculated from the sequence number in the header of packets and is represented by a 0 - 255 value. If a node misses a packet from neighbor, the packet success value is decreased by 16, meaning that after 16 packets are missed packet success will be reduced to 0.

It is possible to add additional metrics as weighted averages of packet success. For development purposes, we are implementing a random weighted metric that represents potential packet loss on the network.

Multi-hop metrics are an average of the metric of each hop. They are calculated as follows

hopRatio = 1/route.distance;
metric = neighborTable[entry].metric * (hopRatio) + route.metric * (1-hopRatio)

where, route.distance is the number of hops between node X and route destination, node Z neighborTable[entry] is the neighbor table entry corresponding to node Y which sent routing packet neighborTable[entry].metric is the metric for the link between node X and node Y route.metric is the "rest of route" metric sent in routing packet representing the average metric between node Y and node Z, calculated as shown above

This way the routing metrics are recursviely calculated as a average without node X knowing the metric of every link between Y and Z.

Node states

When a disaster radio node first joins a DAN it must go through a number of steps before it can begin routing.

Discovery

During the discovery phase, a node transmits hello packets to introduce itself to its neighbors and listens for other packets being sent in its range so it can discover its neighbors and add them to its neighbor and routing tables. After a set timeout or some minimum number of neighbors is discovered a node can enter the learning phase.

Learning

During the learning phase a node shares its routing table with its neighbors and listens for routing packets being sent from neighbors. It then parses the routing packets as described above and updates its routing table accordingly. This way it learns about the neighbors of its neighbors, i.e. nodes that are multiple hops away. The learning phase is arguably never complete, as new nodes could appear at any moment, however, after a certain point, a node should have enough routes to begin routing packets and therefore enter the forwarding state

After learning phase has reached an adequate point, a routing table may look like so,

Routing table example:

1 hops from 06cf26a8f7dd via 06cf26a8f7dd metric 195 
1 hops from bd0777339dd7 via bd0777339dd7 metric 215 
1 hops from 4b7536110216 via 4b7536110216 metric 214 
2 hops from 659ae5093e8c via bd0777339dd7 metric 205 
2 hops from b3559bd3a4aa via bd0777339dd7 metric 187 
2 hops from 03a06767e872 via 06cf26a8f7dd metric 228 
2 hops from 8c30ecca8bd9 via 4b7536110216 metric 244 
3 hops from ed3365a48843 via 4b7536110216 metric 244 
3 hops from 072e89d39582 via 4b7536110216 metric 213 
5 hops from 9d3c0bfeb0f4 via 4b7536110216 metric 246 
6 hops from 0888590f4402 via 4b7536110216 metric 210

Forwarding

In the forwarding state, a node has discovered enough neighbors and learned enough about the network to be considered a good route by some number of (at least one) neighboring node(s). This should be consider the normal operating state of a disaster radio node, as it can now be used to send addressed message and retransmit messages addressed to nodes for which it has routes.

Convergence

In a DAN, convergence implies that every node has knowledge of a route to every other node in the network.

Similar to the Babel, a TCP/IP mesh networking protocol, a node in a DAN can be said to know the address of every node in the network and the nextHop to reach that address, but it does not know the entire topology of the network. This reduces the memory required to maintain a routing table and the time needed for convergence.

Given the physical constraints of LoRa modulation, a DAN converges relatively slowly (approx. on order of 1 minute for every 6 hops and perhaps longer under non-ideal conditions). Attempting to increase the convergence time can result in high packet loss or, on a physical network, RF interference from neighboring nodes. For convergence to occur, a route must be relayed from one end of the network to the other. At a ~10s transmission intervals, routes may take more than a minute to traverse a 6 hop wide network.
The ideal convergence time for a DAN can be calculated as follows,

convergence(n) = (airtime + delay) * (width) * 2

where, airtime is the time is takes for node to transmit a single routing table packet delay is the time between a node transmiting routing table packets width is the maximum hops possible on the network, depend on network structure, but often one half the total nodes

For example, in a 15 node network, assuming a network width of 7 hops, a message delay of 10s, and an approximate transmit time of 200ms,

converegence(15) = (0.2 + 10) * (7) * 2 = 142.8s

Routed packets

After entering a forwarding state (ideally once the network has converged), routed packets can be formed by adding a nextHop address after the standard header. A routed message packet is formed like so,

Byte 16 to 21	Byte 22 - 256
routeTable[entry].nextHop	message

The originating node checks their routing table and finds the entry which matches the packet's destination and then appends the corresponding nextHop to the header, followed by the content of the message.

When this packet is received by neighboring nodes, they check if the nextHop specified matches their local address. If it does not match, the packet is dropped. If the nextHop does match their local address, they decrement the ttl in the standard header and then check their routing table to find the nextHop for the packet and replace their local address in the original packet with the nextHop from their routing table. The message is then pushed on to buffer and queued to be transmitted.

Route Selection

Route selection is done on node-by-node basis, there is no intention to implement source-specific routing. Each node maintains only possible routes with lowest distance to a destination in it routing table. When a node receives a routed packet that it is intended to retransmit, it will refer to it's routing table find the best nextHop. By default, a node will select the nextHop that gives the lowest distance route. In the event of a tie, i.e. two routes with the same number of hops, the retransmitting node will break the tie by selecting the route with the better metric.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly