Feature Request: Paragraph.get_listnum() #471

Open
nikeqiang opened this issue Feb 7, 2018 · 7 comments

@nikeqiang

Following on from #180, #25, #217 ... especially #25.

Proposed feature: a Paragraph method that computes and returns the paragraph number or heading as a string, exactly as it would appear in the open Microsoft Word document (e.g. '9.1', '(a)', '(i)', 'Article III', etc.).

To obtain the above, I assume (based on a very preliminary review of this helpful article: https://msdn.microsoft.com/en-us/library/office/ee922775(v=office.14).aspx) that the following need to be obtained first (let me know what's missing!):
(i) the abstract numbering format
(ii) the level of the current paragraph within the numbered list, if nested (e.g. '9.' in the example below would be level 1 and '9.1' would be level 2)
(iii) the index of the current paragraph within the immediate nested list ('(a)' is index 0, '(b)' is index 1, etc.)
(iv) overrides?
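
As a rough illustration of items (i) and (ii): in WordprocessingML a directly numbered paragraph carries a w:numPr element holding a w:numId (which numbering definition applies) and a w:ilvl (the zero-based level, so "level 2" above is ilvl 1). The sketch below uses only the standard library on a hand-written XML fragment; read_num_pr and SAMPLE are hypothetical names for illustration, not part of python-docx. It assumes the numbering is applied directly to the paragraph. Numbering inherited from a paragraph style, which may well be the case for "Style1" to "Style3" in the example below, is not resolved here.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = "{%s}val" % W


def read_num_pr(p_xml):
    """Return (numId, ilvl) for a <w:p> element, or None if it has no w:numPr.

    Assumption: only direct paragraph formatting is inspected; numbering
    inherited from the paragraph style is ignored in this sketch.
    """
    p = ET.fromstring(p_xml)
    num_pr = p.find("w:pPr/w:numPr", NS)
    if num_pr is None:
        return None
    num_id = num_pr.find("w:numId", NS)
    if num_id is None:
        return None
    ilvl = num_pr.find("w:ilvl", NS)
    # A missing w:ilvl is treated as the top level (0) here.
    level = int(ilvl.get(VAL)) if ilvl is not None else 0
    return int(num_id.get(VAL)), level


# Hand-written fragment for illustration only.
SAMPLE = (
    '<w:p xmlns:w="%s"><w:pPr><w:numPr>'
    '<w:ilvl w:val="1"/><w:numId w:val="3"/>'
    "</w:numPr></w:pPr>"
    "<w:r><w:t>Calculation of interest</w:t></w:r></w:p>" % W
)

print(read_num_pr(SAMPLE))  # (3, 1)
```

The numId then has to be looked up in word/numbering.xml to find the abstract numbering definition and any overrides, which this sketch does not attempt.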

If preferred, the get_listnum() method could return a tuple of (numText, format, level, index), e.g. ("9.1", '%1.%2', 2, 0).
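
To make the format string concrete: a template such as '%1.%2' can be read as "counter of level 1, a dot, counter of level 2", with each counter rendered in that level's own number format. Below is a minimal sketch of that substitution, assuming the counters are already known. render_level_text, to_letter, to_roman and FORMATTERS are hypothetical helpers, and only a handful of number formats (decimal, lowerLetter, upperLetter, lowerRoman, upperRoman) are covered.

```python
import re


def to_roman(n):
    """Lower-case Roman numeral for a positive integer."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letter(n):
    """a..z for 1..26; beyond that the letter is repeated (aa, bb, ...).

    Assumption: the repeat-the-letter behaviour past 'z' is how this sketch
    handles large values; verify against Word before relying on it.
    """
    letter = chr(ord("a") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


# Subset of number formats handled by this sketch.
FORMATTERS = {
    "decimal": str,
    "lowerLetter": to_letter,
    "upperLetter": lambda n: to_letter(n).upper(),
    "lowerRoman": to_roman,
    "upperRoman": lambda n: to_roman(n).upper(),
}


def render_level_text(lvl_text, counters, num_fmts):
    """Replace %1..%9 in lvl_text with the formatted counter of that level."""
    def substitute(match):
        i = int(match.group(1)) - 1  # %1 refers to the first (top) level
        return FORMATTERS[num_fmts[i]](counters[i])
    return re.sub(r"%([1-9])", substitute, lvl_text)


print(render_level_text("%1.%2", [9, 1], ["decimal", "decimal"]))  # 9.1
print(render_level_text("(%3)", [9, 1, 2],
                        ["decimal", "decimal", "lowerLetter"]))  # (b)
print(render_level_text("Article %1", [3], ["upperRoman"]))  # Article III
```

Note that the counters here are one-based display values, whereas the index in the tuple above is zero-based.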

For example,

when opened in Microsoft Word, the docx appears as below:


SECTION 5
INTEREST PAYMENTS

9. INTEREST

9.1 Calculation of interest

The rate of interest on each Loan for each period is the percentage rate per annum which is the aggregate of the applicable:

(a) margin; and
(b) LIBOR.


currently, (paragraph.text, paragraph.style) yields:

SECTION 5 INTEREST PAYMENTS , _ParagraphStyle("Normal")
INTEREST , _ParagraphStyle("Style1")
Calculation of interest , _ParagraphStyle("Style2")
The rate of interest on each Loan for each period is the percentage rate per annum which is the aggregate of the applicable:
margin; and , _ParagraphStyle("Style3")
LIBOR. , _ParagraphStyle("Style3")

paragraph.listnum_text() would yield:
''
'9.'
'9.1'
''
'(a)'
'(b)'
etc.
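
The bookkeeping behind that output can be sketched independently of any docx parsing: walk the paragraphs in document order, keep one running counter per level for each list, and restart the deeper levels whenever a shallower level advances. The function below is a simplified sketch under stated assumptions: every level starts at 1, and start values, overrides, and custom restart rules are ignored (so the example list begins at 1 rather than 9). number_paragraphs is a hypothetical name, and its input is a plain list of (num_id, ilvl) pairs, or None for unnumbered paragraphs.

```python
def number_paragraphs(items):
    """Return per-paragraph counter tuples for a sequence of list positions.

    items: one entry per paragraph, either None (not numbered) or a
    (num_id, ilvl) pair with a zero-based level.
    Assumptions: each level starts at 1; overrides and restarts are ignored.
    """
    counters = {}  # num_id -> list of current counters, one per level
    result = []
    for item in items:
        if item is None:
            result.append(None)
            continue
        num_id, ilvl = item
        levels = counters.setdefault(num_id, [])
        # Moving to this level restarts every deeper level.
        del levels[ilvl + 1:]
        # Parent levels that were skipped are assumed to sit at 1.
        while len(levels) < ilvl:
            levels.append(1)
        if len(levels) == ilvl:
            levels.append(1)      # first paragraph seen at this level
        else:
            levels[ilvl] += 1     # next sibling at this level
        result.append(tuple(levels))
    return result


# Same shape as the example: unnumbered heading, level 0, level 1,
# unnumbered body text, then two level 2 items.
doc = [None, (1, 0), (1, 1), None, (1, 2), (1, 2)]
print(number_paragraphs(doc))
# [None, (1,), (1, 1), None, (1, 1, 1), (1, 1, 2)]
```

Each tuple would then be rendered through the level's own format to get strings such as '9.', '9.1' and '(a)'.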

Thoughts, comments? Interest in a concerted effort to finally solve this issue and add it to the library? I realise a lot of people have already spent significant time on this and that there are a bunch of pitfalls I'm not considering (if it were that easy, we'd have the feature already...)

Still, I feel like if you solved this you could basically circumvent Microsoft Word entirely!

@amucunguzi

This would be a wonderful feature to have. I have spent the last two hours trying to find a way to get the numbering of a paragraph with "List Paragraph" style. I am only able to get the text.

@tarekbazine

tarekbazine commented Mar 5, 2019

Is there any way to achieve this? Even a workaround?

@amucunguzi

Is there any solution to achieve this ? even a work around !

I spent way too much time on that, and as far as I can tell, python-docx cannot solve it (for now, the logic isn't implemented). I ended up using Java (Apache POI).

@Amin-Azar

Amin-Azar commented Mar 16, 2019

I need to add this feature for my work! I will work on it this weekend and share the result with you guys.

@longbowking

@Amin1360 did you make it work?

@PinakiChat1

I also need this feature. Has it been implemented?
When I run print(block.listnum_text()) in the following code snippet, it prints an empty string.
I take this to mean listnum_text() has been implemented, since otherwise I would have got an error. Does it return the heading number? If so, how is it used? If not, how can we get the heading number?

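from docx.table import Table
from docx.text.paragraph import Paragraph

# iter_block_items() is assumed to be defined elsewhere.
def iter_block_text(doc):
    """Yield the text of each paragraph, including those inside tables."""
    for block in iter_block_items(doc):
        if isinstance(block, Paragraph):
            try:
                print(block.listnum_text())
            except AttributeError:
                # Raised when Paragraph has no listnum_text() method;
                # a bare except would silently hide this.
                pass
            yield block.text
        elif isinstance(block, Table):
            for row in block.rows:
                prior_tc = None
                for cell in row.cells:
                    # Merged cells repeat the same underlying element; skip repeats.
                    if cell._tc == prior_tc:
                        continue
                    prior_tc = cell._tc
                    for paragraph in cell.paragraphs:
                        yield paragraph.text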
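
One caveat on reading the result, offered as a sketch rather than a diagnosis: a try/except that swallows the exception cannot distinguish a missing method from one that returns an empty string, because the AttributeError raised by a missing method is swallowed along with everything else. The example below shows the effect using a stand-in class. StubParagraph and probe are hypothetical and do not come from python-docx. hasattr gives a direct answer about whether the method exists.

```python
class StubParagraph:
    """Hypothetical stand-in for a paragraph object with no listnum_text()."""
    text = "Calculation of interest"


def probe(obj):
    """Call listnum_text() but swallow every exception, as a bare except does."""
    try:
        return obj.listnum_text()
    except:  # noqa: E722 (deliberately broad, to show what gets hidden)
        return ""


p = StubParagraph()
print(repr(probe(p)))               # ''  (looks like an empty result)
print(hasattr(p, "listnum_text"))   # False (the method does not exist)
```

Running hasattr(block, "listnum_text") on a real paragraph object would settle the question for the installed library version.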

@natmayak

natmayak commented May 16, 2021

It would be interesting to dig into how this is implemented in MS Word itself.
What I mean is this: if you copy some numbered text and use Paste Special to paste it as unformatted text, you get all the text together with the paragraph numbers, like "1. Some copied text", where "1." is literal text.

I'm a beginner in programming and it's still difficult for me to go deep into classes (I'm working on it), but maybe the paste special thing will help someone develop the idea.
