Ability to get the last rendered page of a paragraph/element #670

Open
Czechh opened this issue Dec 21, 2023 · 2 comments

Comments

Czechh commented Dec 21, 2023

Is your feature request related to a problem? Please describe.

When reading a docx file, it is useful to know which page a paragraph is on. That makes it possible to move a renderer to that point and to generate references and quotes that point back into the docx document.

Describe the solution you'd like

The page number is really a product of the rendering engine rather than of the docx file itself, but I believe editors like MS Word insert <w:lastRenderedPageBreak/> markers (more info). Using this XML element to infer the page while constructing the document, and adding that value to each Paragraph and Table, should suffice.

Something like:

impl FromXML for Document {
    fn from_xml<R: Read>(reader: R) -> Result<Self, ReaderError> {
        let mut parser = EventReader::new(reader);
        // Number of <w:lastRenderedPageBreak/> markers seen so far.
        let mut last_rendered_page_index = 0;
        let mut doc = Self::default();
        loop {
            let e = parser.next();
            match e {
                Ok(XmlEvent::StartElement {
                    attributes, name, ..
                }) => {
                    let e = XMLElement::from_str(&name.local_name).unwrap();
                    match e {
                        XMLElement::Paragraph => {
                            let mut p = Paragraph::read(&mut parser, &attributes)?;
                            // Proposed setter: tag the paragraph with the page index reached so far.
                            p = p.last_rendered_page_break_number(last_rendered_page_index);
                            doc = doc.add_paragraph(p);
                            continue;
                        }
                        ...
                        // Proposed variant: each marker moves the counter to the next page.
                        XMLElement::LastRenderedPageBreak => {
                            last_rendered_page_index += 1;
                            continue;
                        }
                        _ => {}
                        ...
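
One detail that may matter for a PR: in WordprocessingML, <w:lastRenderedPageBreak/> is emitted inside runs (<w:r>), which are themselves inside paragraphs, so a counter that is only bumped in the top-level loop could miss markers that the paragraph reader has already consumed. Below is a minimal, std-only sketch of the counting idea, independent of this crate. The names `PageSpan`, `tag_name` and `paragraph_page_spans` are hypothetical, and the tag scanner is deliberately naive (no comments, CDATA, nested paragraphs, or `>` inside attribute values). The markers are only a hint left by the application that last saved the file, so they may be missing or stale.

```rust
/// Page span of one paragraph, as 0-based indices inferred from
/// `<w:lastRenderedPageBreak/>` markers. Hypothetical type, for illustration.
#[derive(Debug, PartialEq)]
struct PageSpan {
    start: usize,
    end: usize,
}

/// Returns the tag name of the markup that follows a `<`,
/// e.g. "w:p" for `w:p w:rsidR="1">` and "/w:p" for `/w:p>`.
fn tag_name(after_lt: &str) -> &str {
    let end = after_lt
        .char_indices()
        .find(|&(i, c)| c == '>' || c.is_whitespace() || (c == '/' && i > 0))
        .map(|(i, _)| i)
        .unwrap_or(after_lt.len());
    &after_lt[..end]
}

/// Walks every tag in document order and records, for each paragraph,
/// the page index at its opening tag and at its closing tag.
fn paragraph_page_spans(xml: &str) -> Vec<PageSpan> {
    let mut spans = Vec::new();
    let mut page = 0usize;
    // Page index at which the currently open paragraph started, if any.
    let mut open_start: Option<usize> = None;
    let mut rest = xml;

    while let Some(lt) = rest.find('<') {
        let after = &rest[lt + 1..];
        match tag_name(after) {
            "w:p" => {
                // `<w:p/>` is an empty paragraph with no closing tag.
                let self_closing = after
                    .find('>')
                    .map_or(false, |gt| after[..gt].ends_with('/'));
                if self_closing {
                    spans.push(PageSpan { start: page, end: page });
                } else {
                    open_start = Some(page);
                }
            }
            "/w:p" => {
                if let Some(start) = open_start.take() {
                    spans.push(PageSpan { start, end: page });
                }
            }
            // The marker sits inside a run, so it is seen while a paragraph is open.
            "w:lastRenderedPageBreak" => page += 1,
            _ => {}
        }
        rest = after;
    }
    spans
}
```

Keeping both `start` and `end` is a design choice for the sketch: a paragraph that contains a marker straddles two pages, and which of the two the API should report is an open question for the actual implementation.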

Describe alternatives you've considered

I have considered estimating the element sizes and doing a rough calculation of the likely page number, but this would probably be buggier and hackier than the approach above.

Additional context

I'm happy to work on this, if the author agrees!

Owner

bokuweb commented Dec 24, 2023

@Czechh Thanks for your proposal. Also, thanks for sponsoring.
I am interested. Could you try making a PR?

Author

Czechh commented Dec 26, 2023

Of course! I'll get a PR going! Thank you for the response.
