Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue #116 #215

Closed
wants to merge 13 commits into from
Closed

Fix for issue #116 #215

wants to merge 13 commits into from

Conversation

MohammedElsayyed
Copy link
Member

No description provided.

Add "org.apache.httpcomponents.httpclient" dependency.
Add a bugfix at the end of "Bug Fixes" list of "OpenWayback 2.1.0 Release" section.
Add a RemoteLiveWebCache2-related comment.
Include myself in developers list.
@kris-sigur
Copy link
Member

Having a classname like RemoteLiveWebCache2 is unacceptable.

If RemoteLiveWebCache2 is fully functionally equivalent, but more robust than RemoteLiveWebCache, it should simply replace the old class. If this is the case, I'd like input from IA as they are the other big user of this facility.

Otherwise, if RemoteLiveWebCache2 is in some way different from the old class, making it sensible to retain both, then two things need to happen.

  1. RemoteLiveWebCache2 needs to be renamed to reflect this difference. It may be acceptable to also rename the old class if that helps to clarify their difference.
  2. The JavaDoc for both classes must be updated to include a description of their differences so that users can choose the appropriate one. This can be added to the class level JavaDoc comment (currently empty on both except for an author tag).

In this scenario (where both classes are retained) I think we should retain the old one as the default option in LiveWeb.xml unless there is a strong argument for changing it.

@MohammedElsayyed
Copy link
Member Author

Rethinking about it, it is perhaps better to describe this class as an alternate implementation.

The new class is a modified version of the old that works with a standard proxy server (e.g., Squid), not with the ArcRecordingProxy. We went that route initially because our experience using the ArcRecordingProxy was not a good one, and we kept on bumping into connection issues, making the Wayback unusable.

I could suggest renaming the old and new classes to the following, respectively:

ArcRemoteLiveWebCache
StdRemoteLiveWebCache

I realize the new class needs a good deal of cleaning up, but it works so far, and it is what we have in production at the BA right now.

Thank you for any feedback.

@kris-sigur
Copy link
Member

That's starting to make sense to me. Perhaps start with adding the JavaDoc descriptions on the classes functionality?

@MohammedElsayyed
Copy link
Member Author

OK, I will work on it next days. Should I push more commits on the same branch or create a new branch with modified contents?

@kris-sigur
Copy link
Member

You can push the commits to the same branch. They will then become part of this pull request.

@MohammedElsayyed
Copy link
Member Author

I finished adding JavaDoc descriptions on both classes (StdRemoteLiveWebCache.java and ArcRemoteLiveWebCache.java) and updating LiveWeb.xml to fit with the changes.

As shown above, Travis CI failed to build my first commits. This is because LiveRobotsNoCache.java is extending ArcRemoteLiveWebCache.java and I should've committed it first. Anyway, do you think you could rebuild 'Update ArcRemoteLiveWebCache.java' (e980049) and 'Update LiveWeb.xml' (4815ab1) instead of committing them again? Thank you.

@MohammedElsayyed
Copy link
Member Author

Please don't merge stdRemoteLiveWebCache because after doing some tests, I noticed it is not working properly. I have added 2 features to the current version of stdRemoteLiveWebCache.

Lines 122-126 set connection time out to 10 seconds which increases openwayback responsiveness. This part is working perfectly normal.

As you know that Openwayback gets robots.txt from the live web. If any domain is down (or is not responding) because its http status code is 404, then Openwayback will fail to get robots.txt and will throw a connection time out exception (Openwayback unable to get the robots.txt document to display this page). Lines 133-142 should fix this issue, but it is not working as expected.

Openwayback is stable somehow if we excluded lines 133-142 from stdRemoteLiveWebCache, but it will not display contents of any down domain (http status is 404), which breaks the idea of web archiving. I will work on it and commit it again if it worked gracefully.

@kris-sigur kris-sigur mentioned this pull request May 8, 2015
@kris-sigur
Copy link
Member

Closing in favor of PR #251

@kris-sigur kris-sigur closed this May 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants