Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A really bizarre bug in emacs. #423

Closed
Mihara opened this issue Aug 31, 2016 · 13 comments
Closed

A really bizarre bug in emacs. #423

Mihara opened this issue Aug 31, 2016 · 13 comments
Labels
bug report Something is not working properly

Comments

@Mihara
Copy link
Contributor

Mihara commented Aug 31, 2016

Frankly, I'm not sure whom exactly is to blame for this one, but since it only manifests on android, and apparently, not just under Termux, I think I'll start here.

The symptoms are like this: You have your init.el setup, which works fine on Linux or Windows or Mac or wherever, and attempt to use it under Android. Everything seems to be working, until you start using a mode that does something clever with files. In my case, it was tramp-mode in some cases, and magit in others. You get a bizarre error message, saying that it choked somewhere deep inside the standard library, and the offending function is always load-history-filename-element with the error (wrong-type-argument stringp (...)) -- the argument is usually different, in my case it was always (require . bytecomp) If you delete all your compiled elc files and start again, a different package starts choking on it. Loading with emacs -q presents no issue, however, though probably it's because the packages that tripped over it are not being loaded anymore.

I really am not sure who's at fault, considering that on desktop Linux the problem never appears. I almost gave up, but then I stumbled onto this StackExchange discussion: https://emacs.stackexchange.com/questions/5552/emacs-on-android-org-mode-error-wrong-type-argument-stringp-require-t

The logic is sound. Applying the patch effectively shuts the problem down! And yet, on desktop Linux, the load-history list looks almost the same and contains the same types of non-string values, but the crash never occurs. Even though the code of the function is identical.

I have a suspicion that since the problem is Android specific, the real cause is somewhere in the C code, and on desktop Linux the problem never triggers because the ordering of the elements in load-history is somehow different, so that the regexp always finds a matching file before it stumbles into any of the non-string elements of the list. Unfortunately I don't have the expertise to go diving into C code.

For the record here's my patch:

;; patch-load-history.el -- Patch load-history-filename-element

;;; Commentary:

;; On Termux and apparently, certain other Android builds of Emacs,
;; something weird happens to load-history.

;; Which causes crashes all over, centering around this function.
;; This is a bandaid that prevents those crashes without addressing
;; the root cause of the problem.

;;; Code:

(defun l-h-f-e-patch (orig-fun file-regexp)
  "Get the first elt of `load-history' whose car matches FILE-REGEXP.
        Return nil if there isn't one."
  (let* ((loads load-history)
         (load-elt (and loads (car loads))))
    (save-match-data
      (while (and loads
                  (or (null (car load-elt))
                      (not (and (stringp (car load-elt)) ; new condition
                                (string-match file-regexp (car load-elt))))))
        (setq loads (cdr loads)
              load-elt (and loads (car loads)))))
    load-elt))

(advice-add 'load-history-filename-element :around #'l-h-f-e-patch)

(provide 'patch-load-history)

;;; patch-load-history.el ends here
@Neo-Oli Neo-Oli added the bug report Something is not working properly label Aug 31, 2016
@Mihara
Copy link
Contributor Author

Mihara commented Sep 3, 2016

Further discoveries:

This is by far not the only place where the load-history variable is being used. In particular, I have encountered it triggering a similar error when installing packages, against which the above patch obviously didn't help.

@Mihara
Copy link
Contributor Author

Mihara commented Sep 3, 2016

...and some more research: I have been able to isolate the offending load-history entry:

(let* ((loads load-history)
       (load-elt (and loads (car loads))))
  (while (and loads (car load-elt))
    (unless (stringp (car load-elt))
      (print load-elt))
    (setq loads (cdr loads)
          load-elt (and loads (car loads))))
  nil)

The structure of the load-history alist is something like (("absolute filename" symbols ... symbols)) and one of the entries is malformed:

((require . bytecomp) (require . session) (require . company-capf))

It turns up like this every time, and it's obviously missing the filename as the first element. I know this is one of the files in company-mode, but I haven't figured out which one yet, or what exactly causes it to get saved to load-history like that, but the problem is definitely android-specific.

Some followup: After much googling, I stumbled on another user struggling with the same problem:
https://groups.google.com/forum/#!topic/gnu.emacs.help/s9k8wJC3DAA

The presented workaround about (package-initialize) being present in init.el rather than any other file actually has no effect at all in my case, though.

Some more followup: Starting with emacs -q -l ~/.emacs.d/init.el passes the above test: there is no borked load-history entry when you load like that, but that's probably because much less packages end up being loaded in this case.

@Mihara
Copy link
Contributor Author

Mihara commented Sep 3, 2016

And here's one more reference:

syl20bnr/spacemacs#6262

@Mihara
Copy link
Contributor Author

Mihara commented Sep 4, 2016

Curioser and curioser.

While investigating just what file exactly could possibly cause the broken entry in load-history and whether that hypothetical file is in any way special, I stumbled on the idea that it might be safe to just edit it out of load-history myself. After all, if it's broken, the information has already been lost. So I tried this:

(defun mihara/fix-broken-history ()
  "Attempt to remove a broken load-history element."
  (let* ((loads load-history)
         (edited-history ())
         (load-elt (and loads (car loads))))
    (while (and loads (car load-elt))
      (when (stringp (car load-elt))
        (push load-elt edited-history))
      (setq loads (cdr loads)
            load-elt (and loads (car loads))))
    (setq load-history edited-history)))

(add-hook 'after-init-hook #'mihara/fix-broken-history t)

I was quite surprised to find that the entry was still there, even though this bit of code should be the very last thing in the initialization to execute. Running it after initialization removes the offending entry, but it does seem to me that whatever actually produces this entry is not actually a package.

EDIT: window-setup-hook executes last, actually. But when using that one, the results are identical anyway.

Now I just need to figure out how to trigger code to run from init.el after initialization is complete, and I at least will have a more generally applicable workaround...

@Mihara
Copy link
Contributor Author

Mihara commented Sep 4, 2016

There we go, this workaround should be much more reliable than the one given at the top of the page -- at least, it no longer trips paradox:

https://gist.github.com/Mihara/021329452c8aaddca657cee46a84f3e3

Now I just hope someone who can properly debug it will eventually turn up.

@Mihara
Copy link
Contributor Author

Mihara commented Sep 5, 2016

And, one last followup: I found the relevant bug in emacs bugtracker, this is an upstream bug.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=14120

They say it's 'minor'. They also say it's not actually a problem in C code, but now I have no clue how to fix it properly. :) I imagine there's a reason emacs is built undumped?

If there is (I think there's probably no other way with cross-compiling...) the most sensible way that I can think of is to simply patch loadup.el to ensure no broken load-history entry exists after (eval top-level) and leave the upstream be.

Another solution that comes to mind is to dump emacs as part of post-installation deb scripts.

@fornwall
Copy link
Member

fornwall commented Sep 5, 2016

@Mihara Really interesting and informative investigation!

The reason that emacs is built undumped is that Android (as of version 5.0) only supports running position-independent executables (PIE) - and that is only possible undumped. See for e.g. http://www.openwall.com/lists/oss-security/2015/03/13/13 for more information.

I'm not an emacs user myself, so I'm unsure of the exact patch to be done for loadup.el - could you create one?

@Mihara
Copy link
Contributor Author

Mihara commented Sep 6, 2016

@fornwall There is a possibility I can get away with commenting just one line in loadup.el (once I understand better what exactly is going on -- the upstream report refers to setting current-load-list to nil as the root cause) and even if that doesn't work, just sticking the body of my patch function after (eval top-level) will definitely do it, because that's when it ends up being executed anyway. But to get a patch right I would need to be able to build the package myself.

For some reason this hasn't been working -- build-package.sh complains there are missing libraries if I try to just build emacs by itself, (I'm using the docker builder) and building the entire set in order stumbles on a package before it gets to emacs. (I forget which one, it's been a few days since I tried this) Is there a shorter sequence of package building I could use to get it to build just emacs itself?

EDIT: Never mind, building ncurses was the missing piece. I'll be back with a pull request soonish...

@fornwall
Copy link
Member

fornwall commented Sep 6, 2016

@Mihara The following command should build emacs

for p in libandroid-support ca-certificates libgmp libidn libnettle libgnutls li
blzma libxml2 ncurses emacs; do ./build-package.sh $p; done

If the build fails with the above command, could you paste the error?

This will build for aarch64, do a export TERMUX_ARCH=arm or (i686/x86_64) before if you want to build for a different architecture.

If you have a patch to try out I could also build the package and put a link to it for testing!

@Mihara
Copy link
Contributor Author

Mihara commented Sep 6, 2016

@fornwall It worked with the above command, and inspecting the deb shows that my patch is in.

Now let me doublecheck everything. :)

Mihara added a commit to Mihara/termux-packages that referenced this issue Sep 6, 2016
fornwall pushed a commit that referenced this issue Sep 6, 2016
@fornwall
Copy link
Member

fornwall commented Sep 6, 2016

@Mihara Thanks a lot! The patch has now been released in version 25.0.95-3 of the emacs package, available with apt update && apt upgrade.

@andyleejordan
Copy link

Wow, years later, and folks are seeing this crop up on MacOS...I can't figure what the heck is causing it either, but the same workaround of removing the line (setq current-load-list nil) in loadup.el works...on Emacs 28. What is going on?!

@ryankask
Copy link

I'm also seeing this problem on macOS after upgrading to emacs 27. Commenting out that line does fix the problem.

@ghost ghost locked and limited conversation to collaborators Oct 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug report Something is not working properly
Projects
None yet
Development

No branches or pull requests

5 participants