initial draft of the audio login specification

Signed-off-by: alberto tirla <albertotirla@gmail.com>
matrix-org · Nov 15, 2024 · e926b9b · e926b9b
1 parent f6d853c
commit e926b9b
Showing 1 changed file with 68 additions and 83 deletions.
diff --git a/proposals/4227-audio-based-quick-login.md b/proposals/4227-audio-based-quick-login.md
@@ -1,117 +1,102 @@
-# MSC0000: Template for new MSCs
+# MSC4227: Audio based quick login
 
-*Note: Text written in italics represents notes about the section or proposal process. This document
-serves as an example of what a proposal could look like (in this case, a proposal to have a template)
-and should be used where possible.*
+MSC4108 lets the Matrix ecosystem offer users a quicker way to sign in on trusted devices through a QR code based workflow. Many users already know this workflow from other networks.
 
-*In this first section, be sure to cover your problem and a broad overview of the solution. Covering
-related details, such as the expected impact, can also be a good idea. The example in this document
-says that we're missing a template and that things are confusing and goes on to say the solution is
-a template. There's no major expected impact in this proposal, so it doesn't list one. If your proposal
-was more invasive (such as proposing a change to how servers discover each other) then that would be
-a good thing to list here.*
+However, this workflow is painful for visually impaired users, because it is nearly impossible to reliably scan a QR code with current assistive technologies. Any workflow that involves QR codes is therefore unreasonably difficult, to the point that many workarounds have been developed for these situations; some work only some of the time, others not at all.
 
-*If you're having troubles coming up with a description, a good question to ask is "how
-does this proposal improve Matrix?" - the answer could reveal a small impact, and that is okay.*
+Although normal login works for visually impaired people as well as it does for anyone else, a quicker way to log in now exists, and a person should not be barred from using innovations in the Matrix world because of a disability. To that end, this MSC defines an accessible distribution method for the binary string that the QR code decodes to. It is similar in functionality and purpose to the QR code itself, so that everyone, disabled or not, can use it without hindrance, provided they have at least one recording-capable device.
 
-There can never be enough templates in the world, and MSCs shouldn't be any different. The level
-of detail expected of proposals can be unclear - this is what this example proposal (which doubles
-as a template itself) aims to resolve.
+## Proposal
 
+The only thing this MSC changes from the protocol outlined in MSC4108 is the transport of the initial secret, the binary string represented by the QR code. The cryptographic primitives, the insecure session channel, the way in which it is turned into a secure one, and the strings exchanged between the two clients remain unchanged, which means that most of the mechanisms are already in place.
 
-## Proposal
+As in MSC4108, either device can generate the audio signal and either can receive it. This makes the login more likely to succeed: all that is needed is one device with a microphone. Working speakers on the other device are assumed, because this proposal targets visually impaired people, who need working speakers for speech output.
+
+The mechanism used for transmitting the audio signal between devices is Morse code, as standardised by the International Telecommunication Union in [Recommendation ITU-R M.1677-1](https://www.itu.int/rec/R-REC-M.1677-1-200910-I/). The reasons for choosing it, which are also the criteria any other protocol would have to satisfy in order to be considered, are:
+
+* intelligible and decodable, even across moderate interference. Morse code has stood the test of time in this category, because it was designed to be minimal yet decodable even when received over a very noisy radio link. Even if the microphones and speakers of both devices are of poor quality, the signal should still be intelligible enough to decode
+* well understood by many people in the telecommunications industry, so encoder and decoder implementations should exist for nearly every major programming language, and where they do not, there is plenty of documentation on how to encode and parse Morse code
+* well-formed, similar to the QR code. Even if there are errors in transmission, it should be possible to recover the original meaning, because glitches during recording or playback cannot create a beep where there was none or make a short beep appear long. The most they can do is insert a pause between beeps or distort a beep somewhat, and in most cases it should still be recognisable as a beep
+
+The following sections explain the workflow for sending and receiving that binary code, on both the sending device, referred to as device A from here onwards, and the receiving device, referred to as device B.
+
+### initial code preparation
+
+The binary code, the exact same one from which a QR code would have been created, undergoes the following transformations:
+
+* it is compressed using the LZMA2 compression algorithm, so that the Morse transmission is as short as possible. The following additional parameters are used for LZMA:
+  * level, 8
+  * LC, 4
+  * FB, 256
+  * MC, 10000
+  * writeEndMark, 1
+* base64 encoding is applied, so that the Morse code encoder does not fail on the binary bytes produced by the compression
+* the Morse code signal is generated and kept in memory, in case login does not succeed the first time. Note, however, that the signal should be deleted from memory immediately after its corresponding session times out, like its QR equivalent. The following options should be used to generate the signal:
+  * the beep must be a sine wave, sampled at a rate between 50000 and 80000 Hz
+  * the volume of the produced sine wave should not be clamped in normal conditions; if it has to be clamped to avoid clipping, the result should not be less than 50% of the current device's volume
+  * the pause between beeps should be no less than 150 milliseconds and no greater than 1 second, in order to make room for detecting and mitigating interference
+
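The compression and base64 steps listed above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the proposal's reference implementation: the parameter names in the list look like LZMA SDK encoder properties, and mapping them onto Python's `lzma` filter options (`level` to `preset`, `LC` to `lc`, `FB` to `nice_len`, `MC` to `depth`) is an assumption. The `.xz` container is also an assumption, and `prepare_payload` is a hypothetical name.

```python
import base64
import lzma

# Assumed mapping from the parameter names in the list above to Python's
# lzma filter options: level -> preset, LC -> lc, FB -> nice_len, MC -> depth.
# writeEndMark has no counterpart here, because the .xz container used in
# this sketch marks the end of the stream itself.
LZMA2_FILTERS = [
    {"id": lzma.FILTER_LZMA2, "preset": 8, "lc": 4, "nice_len": 256, "depth": 10000}
]


def prepare_payload(code: bytes) -> str:
    """Compress the binary code and return base64 text for the Morse encoder."""
    compressed = lzma.compress(code, format=lzma.FORMAT_XZ, filters=LZMA2_FILTERS)
    return base64.b64encode(compressed).decode("ascii")
```

Calling `prepare_payload(code)` with the same bytes that would otherwise be rendered as a QR code yields the ASCII text that is then handed to the Morse encoder.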
+### device A, code transmission procedure
 
-*Here is where you'll reinforce your position from the introduction in more detail, as well as cover
-the technical points of your proposal. Including rationale for your proposed solution and detailing
-why parts are important helps reviewers understand the problem at hand. Not including enough detail
-can result in people guessing, leading to confusing arguments in the comments section. The example
-here covers why templates are important again, giving a stronger argument as to why we should have
-a template. Afterwards, it goes on to cover the specifics of what the template could look like.*
+First, the client should warn the user to disconnect headphones during audio login. If the platform on which the client runs allows it, audio login should not start until every output that identifies as headphones, a multimedia device, or anything else accepting audio from the device, other than the speakers, is disconnected, where this can be determined.
 
-Having a default template that everyone can use is important. Without a template, proposals would be
-all over the place and the minimum amount of detail may be left out. Introducing a template to the
-proposal process helps ensure that some amount of consistency is present across multiple proposals,
-even if each author decides to abandon the template.
+Then, once the user initiates audio login, the client should wait for at least 3 seconds, displaying or verbalising an accessible countdown, depending on the platform. This gives the user time to silence the screen reader and anything else which might interfere with the recording on device B, as well as to pick up the second device and press record.
 
-The default template should be a markdown document because the MSC process requires authors to write
-a proposal in markdown. Using other formats wouldn't make much sense because that would prevent authors
-from copy/pasting the template.
+Finally, the device plays the signal. If key mismatches are detected at any time during the authentication flow inside the insecure channel, this device must offer to restart the audio login process, either with the same code or, if that one has expired, with a new one.
 
-The template should have the following sections:
+### device B, receiving the recording
 
-* **Introduction** - This should cover the primary problem and broad description of the solution.
-* **Proposal** - The gory details of the proposal.
-* **Potential issues** - This is where problems with the proposal would be listed, such as changes
-  that are not backwards compatible.
-* **Alternatives** - This section lists alternative solutions to the same
-  problem which have been considered and dismsissed.
-* **Security considerations** - Discussion of what steps were taken to avoid security issues in the
-  future and any potential risks in the proposal.
+Audio login must have been initiated on device A before device B can continue. On platforms which require microphone permissions, these should have been requested before this point, at which the user is expected to start recording. On mobile devices, this most likely has to be declared in static permissions, since microphone access is required so early in the flow.
 
-Furthermore, the template should not be required to be followed. However it is strongly recommended to
-maintain some sense of consistency between proposals.
+After the record action has been initiated, the device should play a short beep, about 128 ms long, to indicate that it is recording. The user is advised to time this short beep after the last number in the countdown has been spoken or shown, but before the Morse code starts playing.
 
+The device decodes the live recording stream from Morse code, then from base64, until the LZMA end marker is found. At that point, recording can be stopped immediately. To make the recording clearer and eliminate some ambient noise, the client can optionally apply a high-pass filter to remove everything below 50000 Hz and a low-pass filter to remove everything above 80000 Hz.
+
+Finally, decompression, with the same parameters as used for compression, is applied to obtain the initial code. The rest of the login flow follows MSC4108 exactly, so it is not repeated here.
+
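Once the Morse decoder on device B has produced text, the last two steps could look roughly like this. It is a sketch that assumes the sender wrapped the LZMA2 stream in an `.xz` container, which this proposal does not specify; in that container the filter settings travel in the stream header, so the decoder does not need to be given them again. `recover_code` is a hypothetical name.

```python
import base64
import lzma


def recover_code(morse_text: str) -> bytes:
    """Undo the base64 and LZMA2 steps on text already decoded from Morse."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    compressed = base64.b64decode(morse_text, validate=True)
    return lzma.decompress(compressed, format=lzma.FORMAT_XZ)
```

The returned bytes are the same binary code a QR scanner would have produced, so the client can continue with the MSC4108 flow from there.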
+### note regarding accessibility and user convenience
+
+A situation where one client only supports QR codes while the other only supports Morse code is undesirable and should be avoided, so client implementors should do the following:
+
+* if a client implements QR code login, it is strongly recommended that it also implements audio login, for accessibility reasons: even if the other client supports audio login, a visually impaired person still cannot get the information out of the QR code
+* if a client implements audio login, it is not required to also implement QR login, because audio login is accessible to everyone
 
 ## Potential issues
 
-*Not all proposals are perfect. Sometimes there's a known disadvantage to implementing the proposal,
-and they should be documented here. There should be some explanation for why the disadvantage is
-acceptable, however - just like in this example.*
+This proposal may not work well, or at all, in very noisy environments. However, it should be apparent to a user about to use audio login that the audio needs to be audible, similar to trying to scan a QR code in bright sunlight. In most circumstances this is therefore a non-issue, at least for the moment. If it turns out that even slight noise disturbs the recording and causes the code to be transcribed wrongly, this should be revisited, because the problem would be worse than initially anticipated.
 
-Someone is going to have to spend the time to figure out what the template should actually have in it.
-It could be a document with just a few headers or a supplementary document to the process explanation,
-however more detail should be included. A template that actually proposes something should be considered
-because it not only gives an opportunity to show what a basic proposal looks like, it also means that
-explanations for each section can be described. Spending the time to work out the content of the template
-is beneficial and not considered a significant problem because it will lead to a document that everyone
-can follow.
+This proposal does not work at all if neither device has a functional microphone. Very few devices on which one would typically use Matrix are in this situation nowadays. While this is a problem, it is one this MSC cannot solve; a login that works for most visually impaired people is still better than a login which works for none.
 
+This proposal does not work if the sending device has no speakers. This is highly unlikely, considering that the overwhelming majority of visually impaired users have their devices configured for TTS, even if that is not their primary way of consuming information, so the speakers are most likely working.
 
 ## Alternatives
 
-*This is where alternative solutions could be listed. There's almost always another way to do things
-and this section gives you the opportunity to highlight why those ways are not as desirable. The
-argument made in this example is that all of the text provided by the template could be integrated
-into the proposals introduction, although with some risk of losing clarity.*
+Before settling on Morse code, other methods were considered, each ultimately rejected for relatively simple reasons.
 
-Instead of adding a template to the repository, the assistance it provides could be integrated into
-the proposal process itself. There is an argument to be had that the proposal process should be as
-descriptive as possible, although having even more detail in the proposals introduction could lead to
-some confusion or lack of understanding. Not to mention if the document is too large then potential
-authors could be scared off as the process suddenly looks a lot more complicated than it is. For those
-reasons, this proposal does not consider integrating the template in the proposals introduction a good
-idea.
+The first alternative was Bluetooth based login, followed by typing a short code on one of the devices, similar to how smartphones are paired with smart watches. This does not work because many target devices for Matrix users, for example desktops, still do not have Bluetooth.
 
+Another idea was sending the code over NFC, where the two devices are brought into contact at the spot where their NFC chips are, so that the sender can pass the binary string along. As with the first alternative, the problem is availability, as NFC is pretty much only available on mobile devices, and perhaps tablets.
 
-## Security considerations
+Another interesting method is file sharing, where the code would be put in a file which the user would have to transfer to the other device. This is only feasible at all when the two devices are a phone and a computer, and even then there are multiple issues:
+
+* it takes time: plugging in a USB cable and finding the file among hundreds of others, if you even know where the app put it in the first place, takes a lot of time when reading line by line with a screen reader
+* on some devices, it might not work at all: with an iPhone and a non-Apple device, unless something major has changed since iOS 10, the computer is still heavily restricted in what it can access, so the file may not be accessible outside the phone at all
+* not everyone has a USB cable on them all the time: this is by far the biggest issue. Not everyone carries one in their pocket, so someone who has to log in to a device quickly while on the go will not be able to
 
-**All proposals must now have this section, even if it is to say there are no security issues.**
+A last method would be using a security key, but that would not work broadly because few people have one. Furthermore, passkeys, security key authentication and similar should be handled by OpenID Connect, not quick login.
+
+## Security considerations
 
-*Think about how to attack your proposal, using lists from sources like
-[OWASP Top Ten](https://owasp.org/www-project-top-ten/) for inspiration.*
+A serious issue that could compromise the account of a user who tries to log in this way is someone nearby recording the Morse code exchange between the devices. A QR code is two-dimensional, so an attacker has to be right next to the person, whereas audio travels in all directions, so in ideal circumstances even someone at the next table can hear and record it clearly. However, this proposal inherits all the security protections of MSC4108, which already considers shoulder surfing, or in this case the recording of the Morse code by an unauthorised device, and all of its mitigations apply here as well.
 
-*Some proposals may have some security aspect to them that was addressed in the proposed solution. This
-section is a great place to outline some of the security-sensitive components of your proposal, such as
-why a particular approach was (or wasn't) taken. The example here is a bit of a stretch and unlikely to
-actually be worthwhile of including in a proposal, but it is generally a good idea to list these kinds
-of concerns where possible.*
+Furthermore, a client could send invalid Morse code, or valid Morse code which lasts for a very long time, in an attempt to trigger a buffer overflow or inject bad input. Clients are recommended to stop at the first invalid Morse symbol received by their decoder, and to stop recording after half a minute if the end-of-stream mark has not been encountered by then.
 
-MSCs can drastically affect the protocol. The authors of MSCs may not have a security background. If they
-do not consider vulnerabilities with their design, we rely on reviewers to consider vulnerabilities. This
-is easy to forget, so having a mandatory 'Security Considerations' section serves to nudge reviewers
-into thinking like an attacker.
+If a client repeatedly waits too long before sending the next batch of Morse-encoded samples, similar to a slowloris attack meant to overwhelm the listening device, the receiving client should stop recording whenever the detected silence lasts longer than the maximum pause between beeps specified above (1 second).
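
The receiver-side limits from the two paragraphs above could be tracked with a small helper like the one below. It is a sketch, not a prescribed design: `ReceiveGuard` and its methods are hypothetical names, timestamps are plain seconds supplied by the caller, and applying the silence limit only after the first beep is an assumption made so that the quiet moment before playback starts does not abort the recording.

```python
from typing import Optional

MAX_RECORDING_SECONDS = 30.0  # "half a minute" from this section
MAX_SILENCE_SECONDS = 1.0     # longest pause between beeps allowed by this proposal
VALID_SYMBOLS = {".", "-"}


class ReceiveGuard:
    """Tracks timing for one recording and says when to abort it."""

    def __init__(self, started_at: float) -> None:
        self.started_at = started_at
        self.last_beep_at: Optional[float] = None  # no beep heard yet

    def on_symbol(self, symbol: str, now: float) -> None:
        """Record a decoded symbol, rejecting anything that is not Morse."""
        if symbol not in VALID_SYMBOLS:
            raise ValueError(f"invalid Morse symbol: {symbol!r}")
        self.last_beep_at = now

    def should_abort(self, now: float) -> bool:
        """True once the total time or the current silence exceeds its limit."""
        too_long = now - self.started_at > MAX_RECORDING_SECONDS
        too_quiet = (
            self.last_beep_at is not None
            and now - self.last_beep_at > MAX_SILENCE_SECONDS
        )
        return too_long or too_quiet
```

A decoder loop would call `on_symbol` for every dot or dash it recognises and poll `should_abort` between audio frames, stopping the recording as soon as it returns true or `on_symbol` raises.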
 
 ## Unstable prefix
 
-*If a proposal is implemented before it is included in the spec, then implementers must ensure that the
-implementation is compatible with the final version that lands in the spec. This generally means that
-experimental implementations should use `/unstable` endpoints, and use vendor prefixes where necessary.
-For more information, see [MSC2324](https://github.com/matrix-org/matrix-doc/pull/2324). This section
-should be used to document things such as what endpoints and names are being used while the feature is
-in development, the name of the unstable feature flag to use to detect support for the feature, or what
-migration steps are needed to switch to newer versions of the proposal.*
+Not applicable here.
 
 ## Dependencies
 
-This MSC builds on MSCxxxx, MSCyyyy and MSCzzzz (which at the time of writing have not yet been accepted
-into the spec).
+This MSC builds on MSC4108 (which at the time of writing has not yet been accepted into the spec).