Skip to content
This repository has been archived by the owner on Sep 10, 2024. It is now read-only.

Many bug fixes and misc. improvements #35

Open
wants to merge 40 commits into
base: master
Choose a base branch
from

Conversation

TylerRick
Copy link

  • Using Thread.new instead of Process::detach fork for render_pdf because Process fork was causing
    database connections to disconnect when the forked process ended
  • Fixing file corrupted when rendering_time is small
  • Make it possible to add a header/footer with page numbers
  • Add some new tests, including some tests that check PDF strings
  • Allow content disposition (inline or attachment) to be specified with a
    X-Pdf-Disposition header.
  • Allow rendering to an output format other than '.pdf', like '.png' images (PhantomJS isn't just
    for PDFs!)
  • Make the Middleware thread-safe
  • Add a simpler middleware, SynchronousMiddleware, that runs the phantomjs
    command synchronously and doesn't use any of the fancy polling that
    Shrimp::Middleware uses. It also avoids showing the user an ugly
    "Preparing pdf..." page, which doesn't even go away after the PDF file has
    been downloaded.
  • Merge the options that were passed in to the middleware with the options
    configured in Shrimp.config, so that you can override an option from
    Shrimp.config by passing in a different value to the middleware.
  • Add config.debug option which outputs additional debugging information like the the phantomjs
    command being run in case you want to manually run it
  • Change the default value for the tmpdir option to save to a shrimp* subdir under /tmp instead of
    just saving everything directly under /tmp.
  • Remove some duplication and clean up some code
  • Improvements to the Readme file

This caused problems if, for example, the URL contained a '&' character, since that has special
meaning to the shell.

Changed it to pass an array to IO.popen instead of using `#{cmd}`.
…utput the command exactly as it

will be run for easier command-line debugging.
…ee details such as the

phantomjs command line that it's about to execute.
…red?, and up_to_date?

Add a test for the Content-Type header.

Change rendering_in_progress? to return false when it would have returned nil.
…o that they can be reused from

various test files.

Enclose the definition of mock_app within a shared_context block so that we can have different
mock_app methods for different middleware test files.
…Dir.tmpdir (which is a fixed

path like '/tmp').

This prevents intermittent test failures like this one that are caused by the file already existing
because it was leftover from the previous test run:

  1) Shrimp::Middleware matching pdf should return 503 the first time
     Failure/Error: last_response.status.should eq 503

       expected: 503
            got: 200
…eady has a writer method.

Rename the misleading variable name @default_options to just @options, because the writer methods
(such as margin=) actually *modify* this hash, so they are not just the default values but rather
the current values.

Replace attr_accessor :default_options with a read-only to_h method for when you need to convert the
Configuration option to a Hash.

Add a Shrimp.config alias for Shrimp.configuration.
…directory within /tmp like

/tmp/shrimp20131120-28880-1bll5ur) instead of just saving everything directly under /tmp
(Dir.tmpdir).

This way you can easily list or delete all shrimp-generated temp files with something like:

    rm -r /tmp/shrimp*
…thods like render_as_pdf and

already_rendered? that use the verb "render".
…d by both the current

middleware and other middlewares.

Added log_render_pdf_start and log_render_pdf_completion, which give useful debugging information if
config.debug is true.
…ons configured in

Shrimp.config, so that you can override an option from Shrimp.config by passing in a different value
to the middleware.
Change run! to simply call run and then check the @error status afterwards.
… get the redirect URL, etc.

Add response and redirect_to methods to Phantom to make it easy to get at that data.

Add a .chomp to @Result to get rid of the extra newline at the end.
…mjs command synchronously and

doesn't use any of the fancy polling that Shrimp::Middleware uses.  It also avoids showing the user
an ugly "Preparing pdf..." page, which doesn't even go away after the PDF file has been downloaded.

The caveat is that you must make sure that your web server is able to process any number of
concurrent requests (or at least enough concurrent requests that you aren't worried).  If your web
server reaches its concurrency limit and then receives a request for a .pdf resource, then when
phantomjs attempts to request the .html version of the resource, this new HTTP request will have to
wait until the server finishes some other request and is able to serve this new HTTP request.

That should generally be okay as long as most of the other requests are quick enough.  But in the
worst case, where each of the requests currently being served is a request for a .pdf resource
(which must generate an additional request for the .html request), then it will create a deadlock,
because request A blocks while waiting for request B to complete, while at the same time request B
can't complete because it is blocking while waiting for request A to finish (in the simple case
where the max concurrency is 2).

In Rails, if you are using a multi-threaded web server such as puma, then you *must* set
config.allow_concurrency = true to allow concurrent requests.  Otherwise, the Rack::Lock middleware
will prevent the 2nd request (that PhantomJS generates) from being processed until the 1st request
(for the .pdf file) has completed, which immediately creates a deadlock!
…o instance variables on a copy

of the self.

By making a copy of the middleware object, even if two web server threads started out with the same
middleware object, when they call call(), a new copy is created in each thread, and only that copy
is modified, so that the threads do not sharing any data.

This is one of the approaches mentioned at http://railscasts.com/episodes/151-rack-middleware.
TylerRick and others added 10 commits November 27, 2013 16:39
…nloaded PDF file, by having the

HTML resource respond with a X-Pdf-Filename header containing the desired file name.

(This is only possible for the SynchronousMiddleware.  In the asynchronous Middleware, the code that
sends the PDF file to the user doesn't have access to the Phantom object, since that object was
created in the separate forked process.)
…f renaming from '_tmp.pdf'.

This allows us to specify an output format other than '.pdf', like '.png'.
- ensure this is checking phantomjs is done with the rendering by
  writing to a done file when the phantomjs command is done
- before this, the code was just checking if the file exists and if the
  file size is not 0
To match changes made in c3f524f
* master:
  bump version
  allow phantomjs to use any encryption protocol
  Do not count first page load as redirect
  v0.0.5
  Updated README
  Added :max_redirects_num config option
  Handle redirects
  Code cleanup
  Test against rbx-2 and 2.1.1. Dropped tests for 1.9.2
  v0.0.4
  Escape URIs before passing them to the shell
  fix travis-ci urls

Conflicts:
       README.md
       lib/shrimp/configuration.rb
       lib/shrimp/phantom.rb
       lib/shrimp/rasterize.js
       spec/shrimp/phantom_spec.rb
Copy link

@tevanraj tevanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@TylerRick
Copy link
Author

Thanks for reviewing the changes, @tevanraj . Is there anything needed before this can get merged?

@brendon
Copy link

brendon commented Dec 14, 2018

These fixes look great! :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants