Paperless-ng is here. Thoughts on merging into master. #711
Comments
Hey! I am personally very excited by paperless-ng. Several weeks ago I was wondering whether I should migrate from paperless to papermerge (https://github.com/ciur/papermerge), but your project seems to be a good competitor (and will save me from writing a papermerge/paperless mapping)! Thanks for your amazing work! |
Up to you. This fork will see some active development in the foreseeable future and I'm pushing for a first stable release. The last thing I want to get into there before that is the ability to add selectable text to scanned documents, both for new documents as well as documents that are already in the system. |
Wow, it's so pretty! This is some really nice work Jonas. I don't know what your preference is here, whether you'd like paperless-ng to supplant this project (take over the name, merge into this repo, etc) or if you're just promoting the project as a literal next-generation, but I just wanted to congratulate you on a nice job. I haven't had time for a technical assessment (assuming you wanted one?) as I've got my hands full with presentations, another side project, and a 2 year old, but as far as I'm concerned this is a community project now. If there's strong support for full adoption of paperless-ng over the current core for v3.0, I'm cool with it. The one thing I'd mention though is that one of the strengths of the current system is that it runs well on low-powered (read Raspberry Pi) systems. If -ng requires more than that and can't be stripped down for such cases, that'd be a good argument for keeping yours as a separate fork. |
My opinion is just as an end user and not a dev (Edit: am now contributing, still feel strongly should some day become the next version of paperless) on this project but I have to say Jonas’ work and enthusiasm suggest to me paperless-ng should be merged into the core. There’s a lot of work on that fork under the hood that I think is important to the longevity of the project too. Very valid concern regarding low powered devices but just my +1 for adopting paperless-ng for v3.0. Bravo Jonas 👏🏼 |
Thank you :) The entire process of making this pretty has been incredibly fun. Also learned a couple things. I've never done any kind of UX work or front end design, I just took a couple libraries, mixed them together and tried to make it work. This bootstrap css framework has some pretty nifty stuff. Oh, I certainly did not expect a technical assessment, that would be quite a task. I should have made that clear. I'd rather want to get a feel for what the community feels is best for the future of the project and respect that. I'm fine either way! Edit for the statement above: This is especially true since the new project does a couple things quite differently and I've chopped off a few things, such as encryption. Regarding low-powered devices. I've got some good and some not-so-good news. The good news is that the new front end runs entirely in the browser and just uses the API to fetch data. Therefore, the server has to do much less work when serving the pages. The not-so-good news is that one of the new features does occasionally require a little bit more computing power, but that could be scheduled to run during the night. I've made this with the RPi in mind, but haven't extensively tested it on that platform. Someone got it running on an RPi 4, but I haven't heard anything about performance yet. |
Hi @jonaswinkler |
Thanks. I really need some more feedback on what's workable and what needs improvement. We're currently working on making the central filtering tools nice; the present implementation is rather bulky. |
This is a little bit cheesy, isn't it? @danielquinn, you set the rule that two (2) people have to approve a pull request. How many people in your 'community' project have permission to approve? You included three (3), but two of you never approve. Calling it a community project doesn't make it so. I think this is unfair towards people who spent time writing PRs. |
In total 8 people can approve, as I see it. But I've got the same feeling. I'd like to write a PR at times, but since I feel like we can't make it over the limit of 2 people if one of 2 (sometimes) active reviewers writes the PR, I refrain from doing so. So yeah, it's not so much fun, if you can't fix anything yourself and are limited to only looking at other people's code all the time. |
So all the things said here make me support the idea of paperless-ng replacing this project, which of course would mean making Jonas the owner. All NG is missing is the userbase of paperless. I kind of feel sorry for all the users who find paperless today and start with it, not knowing there's NG. Or would there be any reason for anybody to prefer paperless over paperless-ng? |
Maybe only the better (?) support of low-powered devices and the use of encryption via GPG? |
Yes, we should figure out if it's really better in every meaning:
I hope, Daniel, you don't get me wrong when I say that NG might be better in every meaning! I absolutely adore what you have created, but I am super happy that Jonas continued your work instead of starting from scratch like many others. I am sure that is why this is the best solution from my point of view. |
If you don't use "Auto" matching, the logic in question won't be invoked at all. I don't run this on a Pi, so I have no idea about performance. My gut feeling is that the web UI should be much more responsive.
Apart from that, the database stores unencrypted content for searching, even if encryption was enabled. That contains all your personal information from your documents, credit card numbers, addresses, maybe even passwords if sent via postal mail, all the things you purchased, your bank account history, etc. The way you'd implement security in a system like this would be as follows
A system like that has to be designed with this concept in mind from the very beginning. It's very unlikely I'll add something like that to paperless. For example, we can't just encrypt all the database fields as well, since
There's lots of things involved in doing this properly. |
I have only started reading up on paperless and intend to start using it, but I'd like to comment on the encryption topic. There are multiple attack vectors; here are four off the top of my head:
To protect against 1), you could use an encrypted filesystem so that someone stealing your computer could not mount it to read the contents. This can be done by everyone already without needing any change in paperless. For 2) however, an encrypted filesystem does not help, because when the filesystem is mounted, the contents are nicely decrypted. To protect against this, you would need to encrypt the files themselves separately (also the database storage). You would need to decrypt them in memory only and you would need to make sure that the encryption key is not available to the attacker, e.g. by keeping the key only in memory (if at all). It might still be possible to read the key from memory, but that's a different topic. You would need to ask for the key on every start of paperless, of course. To protect against 3), you could encrypt the database, so that the contents are unreadable without access to the key. This also covers the database part of 2). See e.g. https://stackoverflow.com/a/5877130 for SQLite encryption. Protection against 4) at the encryption level is hard. You would need to use separate keys per user, essentially making it impossible for paperless itself to access the data (as you mentioned yourself). IMHO, an encrypted filesystem (e.g. https://en.wikipedia.org/wiki/EncFS) for the documents and an encrypted database would be sensible options with a "master key" to be provided on startup. If you don't want to protect against 2), you could even store the password for the database encryption inside the encrypted filesystem. That way the user would not need to provide the password for starting paperless (only when mounting the encrypted filesystem). EncFS also encrypts filenames, btw. Good encryption also comes with the price of making sure to never lose the master key, of course. |
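The "master key provided on startup" idea in the comment above could look roughly like the following. This is only a sketch under assumptions, not something paperless or paperless-ng implements: the function names `derive_master_key` and `new_salt` are hypothetical, and the choice of PBKDF2-HMAC-SHA256 with these parameters is an illustration. It shows how a passphrase typed at startup could be turned into a key that lives only in memory; the actual file or database encryption would still need a separate, vetted library.

```python
import hashlib
import os


def new_salt() -> bytes:
    """Return a random 16-byte salt (assumed to be stored next to the encrypted data)."""
    return os.urandom(16)


def derive_master_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a passphrase using PBKDF2-HMAC-SHA256.

    The key is returned to the caller and never written to disk here;
    keeping it in memory only is the caller's responsibility.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# Hypothetical startup flow (not executed here):
#   import getpass
#   salt = ...  # load the stored salt, or call new_salt() on first run
#   key = derive_master_key(getpass.getpass("Master passphrase: "), salt)
```

As the comment notes, losing the passphrase (or the stored salt) would make the data unrecoverable, so this approach trades convenience for protection.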
Let's not derail the conversation too much. The discussion of "proper" encryption is a big (separate) one, but I think anyone who looks at this closely would agree the encryption as it stands in paperless is in fact a false sense of security, which is why @jonaswinkler chose to remove it (a decision I agree with). The point is IMHO that -ng having removed encryption should not be a barrier to using -ng as the continuation of the project; it's not a feature removal if the feature wasn't truly implemented in the first place. As for the other apparent issue, does someone who uses an RPi as their primary host want to try it out? Seems like we're so worried about low-resourced systems but most of the people commenting here aren't actually using one 😄. If it's a major part of the user base then we should be able to find some folks and find out! |
Well, I am doing that right now... :-) I am fully aware of the existence of NG, however.
Just after learning about paperless, I found this issue and decided to try out NG directly on my RPi4. I didn't manage to set it up, however. I tried it with and without a virtual environment. Version 0.9.11 would not work at all, apparently due to some Python dependency hell. The dependency problems disappeared in versions 0.9.12 and later, but it throws a missing-module error in PIL when importing documents. After battling with it for some time, I gave up and installed paperless instead. I was able to get it up and running perfectly within 10 minutes. Now I should say that this RPi4 is running Debian sid and Python 3.9, so this might be the source of the problems with NG. Paperless works perfectly, however, so I am sticking with it for the time being. I am not well versed in Python programming, but it seems strict PR review does have its benefits after all. |
Just wanted to add I love this :) Big thanks for sharing @jonaswinkler 👍 I've been using Paperless OG for quite a while but have just switched. Running on a RPi4 via the latest multi-arch image through K8S and working perfectly. |
Hello fellow paperless users, avid paperless user and dev here.
I'm running a fairly big paperless instance with about 2500 documents over here and so far, paperless has been a life saver in many situations. I've recently had to search for and submit various documents for the past 10 years, and finding them was a breeze. So first of all, thanks for the great project.
I'm running a personal fork of paperless over here, which has seen some improvements over the years. For instance, I'm doing machine learning based assignment of selected tags and correspondents, and it works great for me. I've got multiple bank accounts and all my bank statements are in paperless. I've got tags for all of the accounts and paperless assigns them with very high reliability. No need to manually enter matching patterns. I made no attempts to merge this because it was quite experimental and hacky and didn't work alongside the conventional matching algorithms up until now.
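To give a rough idea of what learning-based tag assignment can look like, here is a small sketch. It is not the code from the fork, which is not described in detail here; it assumes a plain multinomial naive Bayes classifier over words, and the names `TagClassifier`, `train`, and `predict` are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TagClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> number of training documents
        self.vocabulary = set()

    def train(self, text, tag):
        """Record the words of one document that already carries the given tag."""
        words = tokenize(text)
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        """Return the most likely tag for the text, or None if nothing was trained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n_docs in self.doc_counts.items():
            counts = self.word_counts[tag]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of every word.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag
```

Trained on documents that were tagged by hand, such a classifier picks up distinguishing words like account numbers on its own, which is why no manual matching patterns are needed.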
I've had some free time on my hands lately and modified paperless quite a bit. Most of the code has been changed, improved, made more stable and more flexible. Both because I wanted to get into this open source thing and, well, I'm using paperless and want it to operate properly. The gist of the changes is as follows:
If you're interested, head over to https://github.com/jonaswinkler/paperless-ng. The documentation at https://paperless-ng.readthedocs.io/en/latest/ is also updated and contains some screenshots, a complete changelog and instructions on how to use it with your existing setup. It's easy to set up with Docker, but the docs also contain information about what you need to take care of if you're running it without Docker. No step-by-step guides though, since I cannot possibly cover every scenario. Migration from paperless to paperless-ng and backwards is tested.
Anyway, here's why I am creating a ticket over here. I wanted to somehow share my work with other people, but I feel the changes are way too big to be just merged into the main repository. I've also realized that merging individual parts is not possible. For example, the new email consumer depends on mime type checking and on the task processing queue, which itself depends on the reworked consumer code. The front end also depends on the changes to the API, so running that on top of the old back end is a big no-no. That's why I published this under a new name, for now. Gives me more freedom with changes and all that.
Maybe we can have this running as an experimental branch of paperless and get it into the main repository as version 3.0 or something at some point. What are your thoughts?