Use libxml and dom libraries? #95
Hi @alphapapa! Thanks for filing this ticket!
I didn't know about that, thanks!
org-cliplink calls out to an external process only when Line 428 in 6c134fd
So, here are my questions:
Unfortunately, I'm not actively developing org-cliplink anymore because I don't have the resources for that, but I'm open to any discussions and pull requests. :) |
My main concern is the use of temp files, gzipping and gunzipping, that I see in the minibuffer every time I use org-cliplink. This other code doesn't have to do any of that. I haven't done benchmarks, but it seems faster. I don't know about the broken HTML issue, but I'm guessing that the simplicity of the HTML->HEAD->TITLE structure will work pretty well. I seem to recall looking at the Emacs source code for that function, and I remember seeing something about handling malformed documents, but don't quote me on that. ;) By the way, I looked at that link you gave, and I noticed something: will it work with capitalized HTML tags? e.g. if it's |
Now THAT concern should've been in the title of the ticket! Please feel free to raise any concerns you have. I completely agree with you. I even tried to get rid of all of that once in #85. As far as I know modern version of But I slightly remember that cURL was needed for #55, because So #89 should resolve your concerns. I'm not sure if I can find any time to work on it. But I'm open to any contributions. :)
No. org-cliplink currently doesn't parse such titles. I thought that was exactly what I said in my previous message. :)
Please correct me if my wording was wrong. |
Well, I was just wondering if some of the code in this project could essentially be replaced with code that's now in Emacs itself.
I guess that's a cool feature, but I'm not sure it's necessary. It doesn't usually take but a moment, and I'm not going to be doing anything else with Emacs while waiting for it to complete. :)
Is it common for Emacs to be built without zlib support? I'm guessing that the vast majority of users have it built-in.
Guess I failed to parse your message. ;) Anyway, that could be fixed very easily by setting
(defun org-cliplink-extract-title-from-html (html)
  ;; Match case-insensitively so <TITLE> is found as well as <title>.
  (let ((case-fold-search t))
    (when (string-match (rx "<title>" (group (minimal-match (0+ anything))) "</title>") html)
      ;; Group 1 holds the text between the tags.
      (match-string 1 html))))
|
Yeah, I guess I wanted this feature for myself. I have a couple of pretty intense use cases. :D
You know what. I think I lied to you, sorry. I just double checked and org-cliplink actually parses capitalized title tags. I guess I had that problem before but forgot that I fixed it. Sorry again. :) Ok, I think you motivated me to work on all of that. I propose the following:
The risks I see at the moment: I haven't run CI for a long time and it may be broken. It may take considerable effort to recover it. I will probably be merging PRs bypassing the CI step for a while. |
No matter really, everything still works fine. No need to fix what ain't broken, just thought I'd mention it. :) I mean, the regex match works fine (though you might consider using the function in my last comment, which does it in a simpler way), so there's no need to parse the whole DOM anyway. While it may be faster overall (maybe) to avoid writing temp files, just getting the title is surely faster with the regexp. |
Hey, I know this issue is old, but using |
@d12frosted The attribute issue is a known one, #72. For upper-case titles I created a separate one, #101. Both issues can be solved without libxml. Have you encountered anything else? |
Yeah, both of them can be fixed without P. S. thanks for lightning fast response! |
@d12frosted Not to poach, but you might find this package useful too: https://github.com/alphapapa/org-web-tools |
@d12frosted here is why I like the current approach:
What do you think? |
|
Hi there,
I've been using org-cliplink for years, and it's great, been very useful.
With Emacs 25 and eww-readable, I discovered that libxml and the dom library make it fairly easy to get the HTML title of a page. Here's some example code.
Given this, I wonder if org-cliplink should make use of this code instead of calling out to external processes. Now, I wouldn't be surprised if there are some edge cases that I haven't found that org-cliplink already handles. :) But if so, I would like to find them and fix them, because I'm using this code in some other things, and it seems to work well.
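For readers without access to the linked example, here is a minimal sketch of the general idea. It is not the code referred to above and not anything from org-cliplink. The function name `my/html-title` is hypothetical, and the sketch assumes an Emacs built with libxml2 support, where `libxml-parse-html-region` is defined.

```elisp
;; A sketch only: `my/html-title' is a hypothetical name, not part of
;; org-cliplink or of the example linked in this post.
(require 'dom)
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my/html-title (html)
  "Return the trimmed text of the first <title> element in HTML, or nil.
Returns nil when Emacs lacks `libxml-parse-html-region'."
  (when (fboundp 'libxml-parse-html-region)
    (with-temp-buffer
      (insert html)
      ;; Parse the whole buffer into a DOM tree of nested lists.
      (let* ((dom (libxml-parse-html-region (point-min) (point-max)))
             ;; `dom-by-tag' returns every matching node; take the first.
             (title (car (dom-by-tag dom 'title))))
        (when title
          (string-trim (dom-text title)))))))
```

Calling `(my/html-title "<html><head><title> Foo </title></head></html>")` parses the string in a temporary buffer, so no temp files or external processes are involved. How well this copes with badly malformed pages would still need to be checked against real-world HTML.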
Thanks!