Rare "unable to mount data partition" on boot #281

Open
adeebshihadeh opened this issue Aug 2, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@adeebshihadeh
Contributor

adeebshihadeh commented Aug 2, 2024

It's not easy to see what's going on in this state since SSH isn't enabled. It's fairly easy to repro in the new testing closet though. Touch also doesn't work in this state.

These would all help debug this: #14, #158, #156

IMG_3927

@adeebshihadeh adeebshihadeh added the bug Something isn't working label Aug 2, 2024
@andiradulescu
Collaborator

In this state I can login via serial.

Serial works, right, via jungle v2? It might be difficult to find which of the 6 devices is the right one, but I guess they can be checked one by one.

Some simple commands to check what happened:
dmesg | grep "mount"
journalctl | grep "mount"

My first (naive) assumption would be that the filesystem is corrupt and fsck is disabled in fstab (the second 0, which is the sixth field, fs_passno):

/dev/disk/by-partlabel/userdata /data auto discard,noatime,nodiratime,nosuid,nodev,nofail 0 0
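
To make the "second 0" concrete, here is a minimal sketch that splits the fstab line above into its six fields and reports whether a boot-time fsck is requested. The `FstabEntry` type and `parse_fstab_line` helper are hypothetical names for illustration, not part of AGNOS.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    """One fstab line, using the field order documented in fstab(5)."""
    device: str        # fs_spec
    mountpoint: str    # fs_file
    fstype: str        # fs_vfstype
    options: List[str] # fs_mntops, split on commas
    dump: int          # fs_freq (the first trailing 0)
    passno: int        # fs_passno (the second trailing 0); 0 means "never fsck"


def parse_fstab_line(line: str) -> FstabEntry:
    """Hypothetical helper: parse a single non-comment fstab line."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    device, mountpoint, fstype, options = fields[:4]
    # The last two fields are optional and default to 0 when absent.
    dump = int(fields[4]) if len(fields) > 4 else 0
    passno = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(device, mountpoint, fstype, options.split(","), dump, passno)


entry = parse_fstab_line(
    "/dev/disk/by-partlabel/userdata /data auto "
    "discard,noatime,nodiratime,nosuid,nodev,nofail 0 0"
)
fsck_enabled = entry.passno != 0
print(f"{entry.mountpoint}: fsck on boot = {fsck_enabled}, nofail = {'nofail' in entry.options}")
```

With `nofail` set and `fs_passno` at 0, a failed mount of /data would not stop the boot and nothing would try to repair the filesystem first, which is consistent with the guess above but does not prove it.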

Very interesting though why touch doesn't work.

@adeebshihadeh
Contributor Author

Got one connected over serial. No console and it spams this:

(openpilot) macbookair:tests adeebshihadeh$ ./som_debug.sh 
Failed to locate modem.mdt(rc:-11)
[FAILED] Failed to start Remote Storage Service.
[ 1633.090419] pil-q6v5-mss 4080000.qcom,mss: modem: Failed to locate modem.mdt(rc:-11)
[FAILED] Failed to start Remote Storage Service.
[ 1723.584538] pil-q6v5-ms080000.qcom,mss: modem: Failed to locate modem.mdt(rc:-11)
[FAILED] Failed to start Remote Storage Service.
[ 1814.082480] pil-q6v5-mss 4080000.qcom,mss: modem: Failed to locate modem.mdt(rc:-11)
[FAILED] Failed to start Remote Storage Service.
[ 1904.583378] pil-q6v5-mss 4080000.qcom,mss: modem: Failed to locate modem.mdt(rc:-11)
[FAILED] Failed to start Remote Storage Service.
[ 1995.081874] pil-q6v5-mss 4080000.qcom,mss: modem: Failed to locate modem.mdt(rc:-11)
[FAILED] Failed to start Remote Storage Service.
[ 2085.585541] pil-q6v5-mss 4080000.qcom,mss: modem: Failed to locate modem.mdt(rc:-11)

@andiradulescu
Collaborator

What happened after you restarted it?

Can I reproduce this somehow? When does it usually happen?

@andiradulescu
Collaborator

andiradulescu commented Sep 15, 2024

I tested activating fsck in fstab for userdata, as explained here.

After “resetting” userdata by writing COMMA_RESET, fsck repaired userdata successfully.

If you agree on this change (activating fsck on boot for userdata), I can:

  • switch comma/flash to erase userdata by flashing a very small valid ext4 partition with just "__system_reset__" on it (RESET_TRIGGER)
  • undo this (again) 052d991

A second possibility is a race condition: for some reason, the partition may have been mounted only after the mountpoint check ran. Knowing whether the device rebooted fine would help confirm or rule this out.
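
To illustrate the race hypothesis, here is a rough sketch of a mountpoint check that polls with a timeout instead of checking once. It is not the actual AGNOS check; `is_mounted`, `wait_for_mount`, and the timeout values are assumptions for illustration only.

```python
import time
from typing import Callable


def is_mounted(mountpoint: str, mounts_text: str) -> bool:
    """Return True if `mountpoint` appears as a mount target in /proc/mounts text.

    Note: /proc/mounts escapes spaces as \\040; that does not matter for /data.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mountpoint:
            return True
    return False


def read_proc_mounts() -> str:
    with open("/proc/mounts") as f:
        return f.read()


def wait_for_mount(
    mountpoint: str,
    timeout_s: float = 10.0,   # assumed value, not taken from AGNOS
    poll_s: float = 0.5,       # assumed value, not taken from AGNOS
    read_mounts: Callable[[], str] = read_proc_mounts,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `mountpoint` is mounted or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_mounted(mountpoint, read_mounts()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    print("/data mounted:", wait_for_mount("/data", timeout_s=2.0))
```

A single check that runs slightly before the mount completes would report a failure, while a polling check like this would not. That only matters if the race is real.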

@adeebshihadeh
Contributor Author

No, we don't want to erase and format without user action. That's an extremely risky bug to be open to.

This is super rare, so I'm planning on fixing this myself. It's hard to repro without our rack of 50+ devices.

@andiradulescu
Collaborator

> No, we don't want to erase and format without user action. That's an extremely risky bug to be open to.

This is only related to how comma flash does userdata erase. Maybe I’m missing something.

> This is super rare, so I'm planning on fixing this myself. It's hard to repro without our rack of 50+ devices.

Sure, I totally understand. I’ll try at least getting #14 done.

> Got one connected over serial. No console and it spams this:

“Failed to locate modem.mdt(rc:-11)” seems like /firmware didn’t get mounted, since modem.mdt is in /firmware/image/modem.mdt. So what I’m saying above might be completely unrelated to the real issue.
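
One way to test that guess from the serial console is to check whether /firmware is a mountpoint and whether the modem image is present under it. This is a small sketch, assuming Python is available on the device; `diagnose_firmware` and its return strings are hypothetical.

```python
import os


def diagnose_firmware(firmware_dir: str = "/firmware",
                      rel_path: str = "image/modem.mdt") -> str:
    """Classify the state of the firmware partition.

    Returns one of three illustrative labels:
      "not-mounted"          firmware_dir is missing or is not a mountpoint
      "mounted-but-missing"  mounted, but rel_path is not a regular file
      "ok"                   mounted and the file exists
    """
    if not os.path.ismount(firmware_dir):
        return "not-mounted"
    if not os.path.isfile(os.path.join(firmware_dir, rel_path)):
        return "mounted-but-missing"
    return "ok"


if __name__ == "__main__":
    print("/firmware:", diagnose_firmware())
```

A result of "not-mounted" would fit the idea that the modem errors are a symptom of a broader mount problem rather than something specific to userdata.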

@ccdunder

I just encountered this within the first hour of using a new Comma 3X with stock openpilot.

Steps to repro:

  • First boot.
  • Install openpilot stock.
  • Take 2 drives.
  • Software -> Uninstall

Now stuck at this screen and touch doesn't work. How should I proceed?

@ccdunder

Simply powering off and rebooting made touch work again. Then I pressed Confirm and everything worked as expected. But it doesn't inspire confidence...
