Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement factory reset #71

Closed
wants to merge 2 commits into from

Conversation

robinkrahl
Copy link
Collaborator

No description provided.

After performing the factory reset, we also build the AES key so that
the device is fully usable.  Due to timing issue, we have to add a delay
between the factory reset and building the AES key.
@d-e-s-o d-e-s-o self-assigned this Jan 19, 2019
@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 19, 2019

Thanks for implementing this feature, Robin. I seem to have some problems with this change. I always get asked to repeat my admin password. What seems to be happening is that build_aes_key indicates that the password was wrong after the factory reset, not sure why. If I comment that out everything is fine. I tried increasing the sleep to 10s without avail.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 19, 2019

Interestingly enough that seems to be a problem that I only hit on the Pro stick. And even the test seems to catch that.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 19, 2019

Two last (?) comments: I can't really reproduce the issue when removing the sleep and the build_aes_key and then running a C++ program using libnitrokey doing essentially the exact same call right after the reset. That could be purely coincidental, though. I also tried reconnecting the device right after the factory_reset, but that did not work.

Also, what I originally had envisioned was for us to check whether the AES keys need to be built before every command that requires them, and doing so if they have not been built yet. That may "solve" the issue. It would make things a bit more complex in that we now would need to maintain a list of those commands and add logic to check whether the keys have been built, but it would have the advantage that it would work if a user resets the device using gpg.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 19, 2019

Two last (?) comments

Fuck it :D

I also noticed that this problem seems to exist in the nitrokey crate. With only a Pro stick connected I see:

running 31 tests
test build_aes_key_pro ... ok
test build_aes_key_storage ... ok
test change_admin_pin_pro ... ok
test change_update_pin ... ok
test change_admin_pin_storage ... ok
test change_user_pin_pro ... ok
test change_user_pin_storage ... ok
test clear_new_sd_card_warning ... ok
test config_pro ... ok
test config_storage ... ok
test connect_no_device ... ok
test connect_pro ... ok
test connect_storage ... ok
test disconnect_pro ... ok
test disconnect_storage ... ok
test encrypted_volume ... ok
test export_firmware ... ok
test factory_reset_pro ... FAILED
test factory_reset_storage ... ok
test get_firmware_version ... ok
test get_production_info ... ok
test get_retry_count_pro ... ok
test get_retry_count_storage ... ok
test get_serial_number_pro ... ok
test get_serial_number_storage ... ok
test get_storage_status ... ok
test hidden_volume ... ok
test lock ... ok
test set_unencrypted_volume_mode ... ok
test unlock_user_pin_pro ... ok
test unlock_user_pin_storage ... ok

failures:

---- factory_reset_pro stdout ----
thread 'factory_reset_pro' panicked at 'assertion failed: `(left == right)`
  left: `Ok(())`,
 right: `Err(WrongPassword)`', tests/device.rs:305:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.


failures:
    factory_reset_pro

test result: FAILED. 30 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

Line 305 maps to:

    assert_eq!(Ok(()), device.build_aes_key(ADMIN_PASSWORD));

@robinkrahl
Copy link
Collaborator Author

Also, what I originally had envisioned was for us to check whether the AES keys need to be built before every command that requires them, and doing so if they have not been built yet.

I was thinking about that too, but: The AES key status is currently only reported in the Storage status. So for the Pro, there is no way to find out whether the AES key exists (AFAIK).

Interestingly enough that seems to be a problem that I only hit on the Pro stick.

I only tested this on the Storage as I didn’t want to reset my production stick. Just to have a minimal example reproducing the problem, could you please try to run this code with, let’s say, 2, 3 and 10 seconds?

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 20, 2019

Also, what I originally had envisioned was for us to check whether the AES keys need to be built before every command that requires them, and doing so if they have not been built yet.

I was thinking about that too, but: The AES key status is currently only reported in the Storage status. So for the Pro, there is no way to find out whether the AES key exists (AFAIK).

Oh. That's bad then :-|

Interestingly enough that seems to be a problem that I only hit on the Pro stick.

I only tested this on the Storage as I didn’t want to reset my production stick. Just to have a minimal example reproducing the problem, could you please try to run this code with, let’s say, 2, 3 and 10 seconds?

I figured. The example fails with 2, 3, and 10s.

$ # With sleep(10)
$ ./test
> [Sun Jan 20 06:19:07 2019][DEBUG_L1]    Connection success: 1 ()
> [Sun Jan 20 06:19:07 2019][DEBUG_L1]    Connection success: 0 ()
> [Sun Jan 20 06:19:07 2019][DEBUG_L1]    Disconnection: handle already freed: 1 ()
> [Sun Jan 20 06:19:07 2019][DEBUG]       -------------------
> [Sun Jan 20 06:19:07 2019][DEBUG]       Outgoing HID packet:
> [Sun Jan 20 06:19:07 2019][DEBUG]       Contents:
> Command ID:     FACTORY_RESET
> CRC:    ccd3b413
> Payload:
>  admin_password:        ***********
> 
> [Sun Jan 20 06:19:07 2019][DEBUG_L1]    => FACTORY_RESET
> ..........
> [Sun Jan 20 06:19:08 2019][DEBUG]       Status busy, decreasing receiving_retry_counter counter: 4, current delay:200
> [Sun Jan 20 06:19:08 2019][DEBUG_L1]    Busy retry: status 0, 200ms, counter 4, progress: 0
> ...........
> [Sun Jan 20 06:19:11 2019][DEBUG]       Status busy, decreasing receiving_retry_counter counter: 3, current delay:300
> [Sun Jan 20 06:19:11 2019][DEBUG_L1]    Busy retry: status 0, 300ms, counter 3, progress: 0
> .
> [Sun Jan 20 06:19:11 2019][DEBUG_L1]    <= FACTORY_RESET 0 0
> [Sun Jan 20 06:19:11 2019][DEBUG]       Incoming HID packet:
> [Sun Jan 20 06:19:11 2019][DEBUG]       Device status:  0 OK
> Command ID:     FACTORY_RESET hex: 13
> Last command CRC:       ccd3b413
> Last command status:    0 STICK10::COMMAND_STATUS::OK
> CRC:    32405835
> Payload:
> Empty Payload.
> [Sun Jan 20 06:19:11 2019][DEBUG_L1]    Packet received with receiving_retry_counter count: 2
> [Sun Jan 20 06:19:21 2019][DEBUG]       -------------------
> [Sun Jan 20 06:19:21 2019][DEBUG]       Outgoing HID packet:
> [Sun Jan 20 06:19:21 2019][DEBUG]       Contents:
> Command ID:     NEW_AES_KEY
> CRC:    52a99af0
> Payload:
>  admin_password:        ***********
> 
> [Sun Jan 20 06:19:21 2019][DEBUG_L1]    => NEW_AES_KEY
> ..........
> [Sun Jan 20 06:19:22 2019][DEBUG]       Status busy, decreasing receiving_retry_counter counter: 4, current delay:200
> [Sun Jan 20 06:19:22 2019][DEBUG_L1]    Busy retry: status 0, 200ms, counter 4, progress: 0
> .....
> [Sun Jan 20 06:19:23 2019][DEBUG_L1]    <= NEW_AES_KEY 0 0
> [Sun Jan 20 06:19:23 2019][DEBUG]       Incoming HID packet:
> [Sun Jan 20 06:19:23 2019][DEBUG]       Device status:  0 OK
> Command ID:     NEW_AES_KEY hex: 6b
> Last command CRC:       52a99af0
> Last command status:    4 STICK10::COMMAND_STATUS::WRONG_PASSWORD
> CRC:    2b5d073e
> Payload:
> Empty Payload.
> [Sun Jan 20 06:19:23 2019][DEBUG_L1]    Throw: CommandFailedException 4
> [Sun Jan 20 06:19:23 2019][DEBUG]       CommandFailedException, status: 4
> test: test.cpp:35: int main(): Assertion `NK_build_aes_key("12345678") == 0' failed.
> Aborted

@robinkrahl
Copy link
Collaborator Author

Thanks for checking that. Should we remove the AES stuff before this is fixed upstream, or should we postpone the reset command?

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 20, 2019

Thanks for checking that. Should we remove the AES stuff before this is fixed upstream, or should we postpone the reset command?

Well, I could not let my hands off of this problem. I got things working with:

--- nitrocli/src/commands.rs
+++ nitrocli/src/commands.rs
@@ -308,6 +308,7 @@ pub fn reset(ctx: &mut args::ExecCtx<'_>) -> Result<()> {
     // Workaround for a timing issue between factory_reset and build_aes_key, see
     // https://github.com/Nitrokey/nitrokey-storage-firmware/issues/80
     thread::sleep(time::Duration::from_secs(3));
+    let _ = device.build_aes_key(NITROKEY_DEFAULT_ADMIN_PIN);
     device.build_aes_key(NITROKEY_DEFAULT_ADMIN_PIN)
   })
 }

(I had the test run >10 iterations without failure)

Not sure what is happening. It's more than a stale command exit code being stashed somewhere: I tried just inquiring the serial number and that did not resolve the problem.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 20, 2019

It's more than a stale command exit code being stashed somewhere: I tried just inquiring the serial number and that did not resolve the problem.

Actually, it may be exactly that. Inquiring the serial number is probably just no "real" command (I traced it to some hidapi call but no further). Here is what works as well:

--- nitrocli/src/commands.rs
+++ nitrocli/src/commands.rs
@@ -308,6 +308,7 @@ pub fn reset(ctx: &mut args::ExecCtx<'_>) -> Result<()> {
     // Workaround for a timing issue between factory_reset and build_aes_key, see
     // https://github.com/Nitrokey/nitrokey-storage-firmware/issues/80
     thread::sleep(time::Duration::from_secs(3));
+    let _ = device.get_user_retry_count();
     device.build_aes_key(NITROKEY_DEFAULT_ADMIN_PIN)
   })
 }

Ugh. One hack follows the next.

@robinkrahl
Copy link
Collaborator Author

robinkrahl commented Jan 20, 2019 via email

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 20, 2019

Ah, that would make sense too.

What do we do here? Go with such a hack? I probably should report that in the issue you filed. Do you have a better suggestion than get_user_retry_count?

@robinkrahl
Copy link
Collaborator Author

I’d rather not ship a workaround until we know what actually causes the problem and whether this fix works on all devices. – I think get_user_retry_count is a good example as it does not require a PIN.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 20, 2019

I've created Nitrokey/nitrokey-storage-firmware#84.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 26, 2019

My vote is for getting this change in, in part because you mentioned that it is blocking the addition of tests for #77.

<rant>
I tried understanding the problem by reading the Pro firmware code but it's the second most horrible piece of code I've encountered (on par with the storage firmware). In the paths that I looked at alone I found

All that without proper comments other than those explaining the obvious. I am obviously not familiar with the code base so all this could be a problem of my understanding, but none of that builds any confidence in a context where code should be to the point and hopefully reviewed/audited.
</rant>

I did not want to look at that because I knew what would await me, but I wanted to see if the problem is something "obvious".

Since the Nitrokey team hasn't commented on the issue it does not seem to be treated with priority.
As for the change at hand: I had the test run dozens of times and they all succeeded, no spurious failures, so while the root cause is not understood, there are way more unknowns than this, except they are not exposed in our code. So I am willing to take this risk in order to move forward. I may opt for not releasing this code, though, but I believe it should be checked in for now.

Let me know if you object or think I am overreacting :D or I shall merge that by tomorrow (after rebasing and everything).

@robinkrahl
Copy link
Collaborator Author

Well, now you know the reason why I started looking into embedded Rust. ;) It’s also the reason why I wanted to know the error cause – it sounds logical that a smart card command is the solution, but honestly, it could be everything.

But I don’t have a testing Pro at hand, so if you are confident that the workaround is indeed working, I’m fine with merging it into devel.

@d-e-s-o
Copy link
Owner

d-e-s-o commented Jan 26, 2019

Thanks for your quick response, Robin. I've merged the request with minor adjustments.

@d-e-s-o d-e-s-o closed this Jan 26, 2019
@robinkrahl robinkrahl deleted the feature/factory-reset branch January 26, 2019 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants