Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#427 API for getting the last y position and more #666

Merged
merged 4 commits into from
Mar 19, 2021

Conversation

danfickle
Copy link
Owner

@danfickle danfickle commented Mar 9, 2021

#427

Including the page positions of all layers. By popular demand! With test.

Adds the following on PdfBoxRenderer:

   /**
     * Returns the bottom Y postion in bottom-up PDF units
     * on the last page of content.
     * 
     * <strong>WARNING:</strong> NOT transform aware.
     */
    public float getLastContentBottom();
    /**
     * Returns a list of page positions for all layers in the document.
     * The page positions are sorted from first page to last and then top to bottom.
     * The page position values are in bottom-up PDF units.
     *
     * <strong>WARNING:</strong> NOT transform aware. Transformed layers will return page
     * positions that are not correct.
     */
     public List<PagePosition> getLayersPositions();

    /**
     * Returns a list of page positions for a single layer.
     * The page positions are sorted from first page to last and then top to bottom.
     * The page position values are in bottom-up PDF units.
     *
     * Compare to {@link #getLayersPositions()} which will return page
     * positions for all layers.
     *
     * <strong>WARNING:</strong> NOT transform aware. A transformed layer will return page
     * positions that are not correct.
     */
    public List<PagePosition<Layer>> getLayerPositions(Layer layer);

PagePosition is an object containing x, y, width, height, page number and in this case Layer under element.

Including the page positions of all layers. By popular demand!
@stechio
Copy link

stechio commented Mar 9, 2021

I got a try against my test case (previously implemented for #427) and it seems to work well, thanks.

Just a suggestion: won't it be smoother if PagePosition referenced its Layer directly rather than through an awkward, purposely-generated identifier (id property)? What's the actual benefit of indirectly referencing a Layer through an identifier?

As kindly suggested by @stechio.
Also:
+ Make PagePosition immutable.
+ Rename getPagePositions to getAllLayerPagePositions in case we later want to add methods for boxes, etc.
@stechio
Copy link

stechio commented Mar 10, 2021

That's super-handy, kudos!

Recapping, now these are the available position-related methods of PdfBoxRenderer:
findPagePositionsByID (deprecated)
getAllLayerPagePositions
getLastYPositionOfContent
getLayerPagePositions

Some impressions:

  1. IMHO, findPagePositionsByID shouldn't be deprecated (I actually found it useful to retrieve the rendering position of XHTML elements, such as paragraphs or divs, I had previously marked by id)
  2. contrary to venerable old-day Java best practices, I detest interface verbosity (IMHO a neat, succinct, well-documented API is much more productive and enjoyable to work with than long, boring, entropy-wasteful camel sequences...). In the high-level context of PdfBoxRenderer (where the user is primarily interested to the final result of the rendering, not its inner workings), the only relevant positions exposed to the API are PagePosition, so fully-qualifying each and every method seems needlessly redundant ("Position" would fit nicely). Here are my picks:
    findPositionsById instead of findPagePositionsByID
    getLastContentBottom instead of getLastYPositionOfContent ("Bottom" precisely conveys the meaning of that value, instead of the slightly ambiguous "LastYPosition" which could be interpreted as the Y coordinate of the last content)
    getLayerPositions instead of getLayerPagePositions
    getLayersPositions instead of getAllLayerPagePositions

@stechio
Copy link

stechio commented Mar 10, 2021

Another suggestion related to PdfBoxRenderer.findPagePositionsByID(..): if (as I root for) this method survives deprecation, it would be cool to generalize PagePosition to reference not just a Layer object, but a Box object too (this way, references to both types would be consolidated into a common interface):

package com.openhtmltopdf.pdfboxout;

import com.openhtmltopdf.layout.Layer;
import com.openhtmltopdf.render.Box;

public class PagePosition<T> {
. . .
    private final T _element; // instead of 'private final Layer _layer';
. . .

    public PagePosition(T element, String id, int pageNo, float x, float y, float width, float height) {
. . .
        _element = element;
. . .
    }
. . .
    /**
     * Element associated to this position.
     * 
     * @return {@link Box} or {@link Layer}
     */
    public T getElement() {
        return _element;
    }
}

Consequently:

public class PdfBoxRenderer implements Closeable, PageSupplier {
. . .
    public List<PagePosition<Box>> findPagePositionsByID(Pattern pattern) {
. . .
    }

    public List<PagePosition<Layer>> getAllLayerPagePositions() {
. . .
    }

That solution could be enhanced subclassing PagePosition:

public abstract class PagePosition<T> {
. . .
    private final T _element; // instead of 'private final Layer _layer';
. . .

    protected PagePosition(T element, String id, int pageNo, float x, float y, float width, float height) {
. . .
        _element = element;
. . .
    }
. . .
    /**
     * Element associated to this position.
     */
    public T getElement() {
        return _element;
    }
}

public class LayerPosition extends PagePosition<Layer> {
    public LayerPosition(Layer element, int pageNo, float x, float y, float width, float height) {
        super(element, null, pageNo, x, y, width, height);
    }
}

public class BoxPosition extends PagePosition<Box> {
    public BoxPosition(Box element, int pageNo, float x, float y, float width, float height) {
        super(element, element.getElement().getAttribute("id"), pageNo, x, y, width, height);
    }
}

Implementing excellent suggestions made by @stechio in #666.

Note: I did not rename findPagePositionsByID as others may be using.
@stechio
Copy link

stechio commented Mar 12, 2021

It feels more solid now, thanks.

About findPagePositionsByID, I think it could be smoothly migrated to the new name making it deprecated (as your initial intent, BTW) and adding the newly-named method along (getBoxPositionsByID would be even better, as initially-proposed findPositionsByID now sounds ambiguous considering the new Layer-related position methods; furthermore, I would drop the "find" verb in favor of "get" for semantic affinity with the Layer-related position methods, whose filter is a Layer instead of a Pattern). Here it is my proposal:

public class PdfBoxRenderer implements Closeable, PageSupplier {
. . .
    // ** OLD method **
    /**
     * @deprecated Use {@link #getBoxPositionsByID(Pattern)} instead.
     */
    @Deprecated
    public List<PagePosition<Box>> findPagePositionsByID(Pattern pattern) {
        return getBoxPositionsByID(pattern);
    }

    // ** NEW method **
    public List<PagePosition<Box>> getBoxPositionsByID(Pattern pattern) {
        /* NOTE: The same naming has to be mirrored to PdfBoxOutputDevice and its implementations. */
        return _outputDevice.getBoxPositionsByID(newLayoutContext(), pattern);
    }
. . .
    public List<PagePosition<Layer>> getLayerPositions(Layer layer) {
. . .
    }

Extending further the filter analogy to the Layer-related methods, they could be considered just overloads of the same retrieval functionality (with Layer filter, gets the positions of that layer; without filters, gets the positions of all the layers):

public class PdfBoxRenderer implements Closeable, PageSupplier {
. . .
    /**
     * Returns a list of page positions for all layers in the document.
     * The page positions are sorted from first page to last and then top to bottom.
     * The page position values are in bottom-up PDF units.
     *
     * <strong>WARNING:</strong> NOT transform aware. A transformed layer will return page
     * positions that are not correct.
     */
    // ** This is the previously-named getLayersPositions() **
    public List<PagePosition<Layer>> getLayerPositions() {
        return getLayerPositions(null);
    }

    /**
     * Returns a list of page positions for the given layer (or all layers in the document if unspecified).
     * The page positions are sorted from first page to last and then top to bottom.
     * The page position values are in bottom-up PDF units.
     *
     * <strong>WARNING:</strong> NOT transform aware. A transformed layer will return page
     * positions that are not correct.
     */
    public List<PagePosition<Layer>> getLayerPositions(Layer layer) {
        if (layer != null) {
            // ** HERE the original getLayerPositions(Layer) implementation.
        } else {
            // ** HERE the original getLayersPositions() implementation.
        }
    }

These would be the ultimately polished names:
getBoxPositionsByID(Pattern)
getLayerPositions()
getLayerPositions(Layer)

@danfickle
Copy link
Owner Author

Hi @stechio,

I have to do a long promised release and I think we are getting into diminishing returns, so I'm going to merge now.

I'd like to give you a huge thanks for reviewing my code (and hopefully making me a slightly better coder). I hope you stay involved with this project, be it code review, finding issues and possibly PRs. Anyway, thank you!

@danfickle danfickle merged commit c21acd5 into open-dev-v1 Mar 19, 2021
@danfickle danfickle deleted the fix_427_api_get_last_y branch March 19, 2021 08:36
@stechio
Copy link

stechio commented Mar 19, 2021

Sorry that I'm late in the party, but today I discovered the current implementation of #666 doesn't work with absolutely-positioned elements -- please take a look at my comment (it includes a fully-reproducible case).
thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants