Exclude URLs from a specific time range #212
Comments
To me, it would make more sense to block the snapshots within a range given in the exclusion file, rather than to show the snapshots in that range.
Thanks a lot for your review, Lauren. That is what I meant, and I have modified it. One more question: what if the range is not continuous? For instance, exclude from 2001 to 2002 and from 2004 to 2006. Should we write two lines in the exclusion file, with different start and end dates, for the same URL?
I like one range per line, so in your example, yes: two lines in the file, with different start and end dates, for the same URL.
Is there a way to make it work as an inclusion list as well? That is, the inclusion file would contain the list of URLs that are accessible, and everything else in the index would be blocked. It is easier for us to create a list of allowed sites than a list of blocked ones. Our use case for public access would be to allow (include) the range of sites we have a contract with and exclude a few specific URLs from that range, for example because of copyright violations. For onsite access from the library, we would like to allow everything and exclude only URLs with copyright violations and the like. Regarding the optional date for the inclusion file, it would be useful to set more than one date range, e.g. allow a site for 2002-2008 and 2010-2015, with all other dates excluded. Maybe SURTs would be useful too. I tried to use oracle access, but ended up with log errors I was unable to resolve. Our colleagues wrote a custom class for inclusion behavior years ago for Wayback 1.14/15. I am trying to build this old class to work with OpenWayback right now, but I am not sure whether I will succeed. In any case, it does not deal with dates or SURTs. A simple URL exclusion/inclusion class with at least one date range would be great for us.
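To make the two access modes described above concrete, here is a minimal Python sketch. It is not OpenWayback code: the function names, the dictionary layout (URL prefix mapped to a list of date ranges), plain string prefix matching, and the use of full 14-digit timestamps are all assumptions made for illustration. Public access would use the inclusion list with a few exclusions on top, while onsite access would allow everything except the exclusions.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: URL prefix -> list of (start, end) 14-digit timestamps.
# An empty list means "all dates".
RangeTable = Dict[str, List[Tuple[str, str]]]


def matches(table: RangeTable, url: str, timestamp: str) -> bool:
    """True if any prefix in the table covers this URL at this timestamp."""
    for prefix, ranges in table.items():
        if not url.startswith(prefix):
            continue
        # No ranges listed: the rule applies to every date.
        if not ranges or any(start <= timestamp <= end for start, end in ranges):
            return True
    return False


def is_accessible(url: str, timestamp: str, include: RangeTable,
                  exclude: RangeTable, default_allow: bool = False) -> bool:
    """Exclusions always win; otherwise fall back to the inclusion list.

    default_allow=True models the onsite case (everything allowed unless
    excluded); default_allow=False models the public case (only included
    URLs and dates are allowed).
    """
    if matches(exclude, url, timestamp):
        return False
    if default_allow:
        return True
    return matches(include, url, timestamp)


# Example: a site allowed for 2002-2008 and 2010-2015, with one excluded path.
include = {"example.org/": [("20020101000000", "20081231235959"),
                            ("20100101000000", "20151231235959")]}
exclude = {"example.org/pirated/": []}
```

With these tables, a 2005 capture of `example.org/page` would be served to the public and a 2009 capture would not, while anything under `example.org/pirated/` stays blocked in both modes.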
As stated in wayback.xml, we can use the following configuration to block URLs from the ResourceIndex by creating a plain text file (e.g. /tmp/exclude.txt) that contains URL prefixes:
Can we change the exclusion file format to allow a start and end date next to each URL where needed? OpenWayback (ResourceIndex) would check whether a start and end date are present. If they are, it would block only the snapshots within that range; otherwise (no start and end date) it would behave as it does now and block the URL entirely. The proposed 3-column exclusion file format is as follows:
1st column is the URL prefix that should be blocked. (required)
2nd and 3rd columns are the start and end date, respectively. (optional)
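As a rough illustration of the proposed format, the following Python sketch parses such a file and decides whether a given snapshot is blocked. It is only a sketch under stated assumptions, not the OpenWayback implementation: columns are assumed to be whitespace-separated, dates are assumed to be digit strings that are prefixes of a 14-digit timestamp (such as `2001` or `20010615`), matching is a plain string prefix test, and the names `Rule`, `parse_rules`, and `is_blocked` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    prefix: str            # URL prefix to block (column 1)
    start: Optional[str]   # inclusive start, padded to 14 digits, or None
    end: Optional[str]     # inclusive end, padded to 14 digits, or None


def parse_rules(text: str) -> List[Rule]:
    """Parse exclusion lines of the form 'prefix' or 'prefix start end'."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 1:
            # No dates: block every snapshot under this prefix.
            rules.append(Rule(fields[0], None, None))
        elif len(fields) == 3:
            # Pad partial dates so that '2002' as an end date covers all of 2002.
            start = fields[1].ljust(14, "0")
            end = fields[2].ljust(14, "9")
            rules.append(Rule(fields[0], start, end))
        else:
            raise ValueError(f"expected 1 or 3 columns, got: {line!r}")
    return rules


def is_blocked(rules: List[Rule], url: str, timestamp: str) -> bool:
    """True if any rule covers this URL at this 14-digit capture timestamp."""
    for rule in rules:
        if not url.startswith(rule.prefix):
            continue
        if rule.start is None or rule.start <= timestamp <= rule.end:
            return True
    return False
```

A non-continuous range is then expressed as one range per line, for example `example.org/private 2001 2002` followed by `example.org/private 2004 2006`; a 2003 capture under that prefix would remain visible, while a line containing only a prefix blocks every capture under it.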