Some bugs I ran into after importing 10,000's of bookmarks from Delicious #172

Closed
nick-s-b opened this issue Jul 17, 2017 · 9 comments

Comments

@nick-s-b

nick-s-b commented Jul 17, 2017

Now that Delicious is officially dead and you can actually export bookmarks (they held them for ransom for months by disabling export!), I decided to look into alternatives again. I immediately rejected Pinb---d because I refuse to pay blackmailers! They bought Delicious and immediately killed it. Screw those guys!

Since I've had Buku installed for a year and used it occasionally, I decided to finally import and combine bookmarks from a few machines into a single database. I started off with a fresh bookmarks.db and just ran buku -i delicious.html.

I ran into some issues. The first crash was due to how Delicious.com formats some bookmarks. Apparently, their (new? old?) exporter occasionally inserts <a> </a> after the first line of the comment.

<DT><A HREF="https://github.com/j" ADD_DATE="1360951967" PRIVATE="1" TAGS="tag1,tag2">GitHub</A>
<DD>comment for the bookmark here 
<a> </a>

and sometimes I'd see this:

<DT><A HREF="https://github.com/j" ADD_DATE="1360951967" PRIVATE="1" TAGS="tag1,tag2">GitHub</A>
<DD>comment for the bookmark here 
<a>second line of the comment here</a>

buku crashed on that <a></a> pair of tags. If anyone comes across this while importing, just remove the <a></a> tags from the comment.
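For anyone cleaning a large export, the manual workaround above can be scripted. This is only a minimal sketch and not part of buku: it assumes the stray tags are always attribute-less <a>...</a> pairs that open and close on one line, as in the two samples, and the name strip_bare_anchors is made up for illustration. Real bookmark links carry attributes such as HREF, so they are left alone.

```python
import re

# Matches an attribute-less <a>...</a> pair on one line (case-insensitive).
# Bookmark links such as <A HREF="..."> have attributes and do not match.
BARE_ANCHOR = re.compile(r"<a>(.*?)</a>", re.IGNORECASE)


def strip_bare_anchors(html: str) -> str:
    """Drop bare <a></a> wrappers but keep any text found inside them."""
    cleaned = []
    for line in html.splitlines():
        new_line = BARE_ANCHOR.sub(lambda m: m.group(1).strip(), line)
        if new_line != line and not new_line.strip():
            continue  # the line held nothing but an empty <a> </a> pair
        cleaned.append(new_line)
    return "\n".join(cleaned)
```

Running the export text through this function and saving the result to a new file before buku -i should leave the comment text intact while removing the tags that triggered the crash.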

In fact, this is the biggest issue with the importer: buku imports only the first line of a comment and ignores the rest. For example, I have a lot of bookmarks like this:

<DT><A HREF="https://github.com/j" ADD_DATE="1360951967" PRIVATE="1" TAGS="tag1,tag2">GitHub</A>
<DD>comment for the bookmark here 
second line of the comment here
third line of the comment here
<DT><A HREF="https://news.com/" ADD_DATE="1360951967" PRIVATE="1" TAGS="tag1,tag2,tag3">News</A>
...

The lines "second line of the comment here" and "third line of the comment here" will not be imported by buku. The final version of the Delicious exporter works this way: if a comment has multiple lines, they are listed below the <DD> line, and everything after it, up to the next <DT> line, is part of the comment.
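To make the expected behaviour concrete, here is a rough line-based sketch of that rule: a comment starts at <DD> and runs until the next <DT>. It is not buku's importer, the function name parse_comments is hypothetical, and it assumes each <DT> and <DD> starts its own line, as in the export shown above.

```python
import re

HREF = re.compile(r'HREF="([^"]*)"', re.IGNORECASE)


def parse_comments(export: str) -> list:
    """Return (url, comment) pairs; a comment runs from <DD> to the next <DT>."""
    bookmarks = []      # each entry is [url, list_of_comment_lines]
    current = None      # the bookmark that a <DD> line would attach to
    in_comment = False
    for raw in export.splitlines():
        line = raw.strip()
        upper = line.upper()
        if upper.startswith("<DT>"):
            match = HREF.search(line)
            current = [match.group(1), []] if match else None
            if current is not None:
                bookmarks.append(current)
            in_comment = False
        elif upper.startswith("<DD>") and current is not None:
            current[1].append(line[4:].strip())
            in_comment = True
        elif upper.startswith(("<DL", "</DL")):
            in_comment = False  # list markup ends any open comment
        elif in_comment and line:
            current[1].append(line)  # continuation line of the comment
    return [(url, "\n".join(lines)) for url, lines in bookmarks]
```

Fed the two-bookmark sample above, this returns the GitHub entry with all three comment lines joined by newlines and the News entry with an empty comment.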

My second request is about time metadata. Would it be too hard to import the timestamp from Delicious so I know when I added a bookmark? Links added directly from buku would benefit from this too, since you could then sort or search by date.
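For reference, the timestamp is already present in the export as the ADD_DATE attribute. The sketch below shows one way to read it outside of buku; it assumes ADD_DATE holds seconds since the Unix epoch (the usual convention for this bookmark file format), and added_on is a made-up helper name, not a buku function.

```python
import re
from datetime import datetime, timezone

ADD_DATE = re.compile(r'ADD_DATE="(\d+)"', re.IGNORECASE)


def added_on(dt_line: str):
    """Return the ADD_DATE of a <DT> line as an aware UTC datetime, or None."""
    match = ADD_DATE.search(dt_line)
    if match is None:
        return None  # the line carries no ADD_DATE attribute
    # Assumption: the value is seconds since 1970-01-01 00:00:00 UTC.
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
```

For the sample line with ADD_DATE="1360951967", added_on(line).isoformat() gives 2013-02-15T18:12:47+00:00.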

Anyway, buku is definitely the best option for people who don't want to get locked into another online service that will just betray them sooner or later! Thanks go to @jarun and the other contributors... you have my immense gratitude. I have no idea what I'd do had I not found buku.

@jarun
Owner

jarun commented Jul 18, 2017

I'll check this.

@nick-s-b
Author

@jarun thank you so much!

@jarun
Owner

jarun commented Jul 18, 2017

@Mohammadkhalifa the single line comment import is happening due to the following lines in importdb():

if comment_tag:
    desc = comment_tag.text[0:comment_tag.text.find('\n')]

I could reproduce it by adding a multiline description to one of my Firefox bookmarks and using the exported HTML. I have uploaded it here. Look for SearchPreview in the text.

However, I can't find an immediate way around it: if I remove the .text[0:comment_tag.text.find('\n')] part, the text (Language Tools) from the next <DT> tag gets appended. Can you please take a look?
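The truncation can be reproduced on a plain string, without buku or an HTML parser. The sketch below applies the same slice expression to ordinary strings; first_line_only is a hypothetical name used only for this illustration. It also shows a side effect of the expression: when the text has no newline, find() returns -1 and the slice drops the last character. Whether that case actually occurs in buku depends on what the parser hands back, which is not shown here.

```python
def first_line_only(text: str) -> str:
    """Apply the same slice as the quoted importdb() lines to a plain string."""
    return text[0:text.find('\n')]


multi = "comment for the bookmark here\nsecond line\nthird line"
print(first_line_only(multi))          # comment for the bookmark here

# No newline at all: find() returns -1, so the slice becomes text[0:-1].
print(first_line_only("single line"))  # single lin
```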

@nick-s-b regarding the timestamp, it's intentional and mentioned in the readme:

Buku is too busy to track you - no history, obsolete records, usage analytics or homing.

jarun added a commit that referenced this issue Jul 18, 2017
Please refer to #172. The spurious '<a></a>' tag leads to a crash; with or
without any text within.
@jarun jarun closed this as completed in 730b80f Jul 18, 2017
@jarun
Owner

jarun commented Jul 18, 2017

@nick-s-b can you please test and confirm if this issue is fixed on master?

@rachmadaniHaryono rachmadaniHaryono mentioned this issue Jul 18, 2017
37 tasks
@nick-s-b
Author

@jarun Just installed the new version! Everything works great. The importer imported comments perfectly. I've also noticed that the import was about 100x faster (not even kidding... it finished so fast, I thought it was an error)! Amazing job! Thanks!

@jarun
Owner

jarun commented Jul 19, 2017

Perfect! Thanks for confirming!

@jarun
Owner

jarun commented Jul 19, 2017

@nick-s-b sorry for bothering you again. I made some further changes for optimization. Can you please test the latest master?

@nick-s-b
Author

@jarun no bother at all! Just installed the latest version (noticed the new option for unique tags... that should be useful) and I imported all the bookmarks. It took a few seconds and it was done. Comments are imported correctly! Let me know if you want me to test anything else.

@jarun
Owner

jarun commented Jul 19, 2017

Thanks a lot! I think we are good with import. 👍

@github-actions github-actions bot locked and limited conversation to collaborators Jun 17, 2020