In some cases PyMuPDF adds newline characters in the middle of words. These newlines do not exist if you simply copy/paste the text from the PDF or extract the text using other libraries.
pdftotext:
'Table 5.14: TDR Modifications\nPrimary modification type (1)\n\nPrincipal\nforgiveness\n\n($ in millions)\n\nInterest\nrate\nreduction\n\nOther\nconcessions (2)\n\nTotal\n\nFinancial effects of modifications\n\nChargeoffs (3)\n\nWeighted\naverage\ninterest\nrate\nreduction\n\nRecorded\ninvestment\nrelated to\ninterest rate\nreduction (4)\n\nYear Ended December 31, 2022\n24\n\n24\n\n349\n\n397\n\n—\n\n10.69%\n\nCommercial real estate\n\nCommercial and industrial\n\n$\n\n—\n\n12\n\n112\n\n124\n\n—\n\n0.92\n\nLease financing\n\n—\n\n—\n\n2\n\n2\n\n—\n\n—\n\n—\n\n24\n\n36\n\n463\n\n523\n\n—\n\n7.51\n\n36\n\nTotal commercial\n\n$\n\n24\n12\n\nResidential mortgage\n\n1\n\n369\n\n1,357\n\n1,727\n\n6\n\n1.61\n\n369\n\nCredit card\n\n—\n\n311\n\n—\n\n311\n\n—\n\n20.33\n\n311\n\nAuto\n\n2\n\n7\n\n63\n\n72\n\n16\n\n4.33\n\n7\n\nOther consumer\n\n—\n\n19\n\n3\n\n22\n\n1\n\n11.48\n\n19\n\nTrial modifications (5)\n\n—\n\n—\n\n228\n\n228\n\n—\n\n—\n\n—\n\nTotal consumer\n\n3\n\n706\n\n1,651\n\n2,360\n\n23\n\n10.14\n\n706\n\n27\n\n742\n\n2,114\n\n2,883\n\n23\n\n10.02%\n\n$\n\n2\n\n9\n\n879\n\n890\n\n20\n\n0.81%\n\n$\n\n41\n\n15\n\n259\n\n315\n\n—\n\n1.28\n\nTotal\n\n$\n\n742\n\nYear Ended December 31, 2021\nCommercial and industrial\n\n$\n\nCommercial real estate\nLease financing\n\n9\n14\n\n—\n\n—\n\n7\n\n7\n\n—\n\n—\n\n—\n\nTotal commercial\n\n43\n\n24\n\n1,145\n\n1,212\n\n20\n\n1.11\n\n23\n\nResidential mortgage\n\n—\n\n70\n\n1,324\n\n1,394\n\n3\n\n1.80\n\n70\n\nCredit card\n\n—\n\n106\n\n—\n\n106\n\n—\n\n19.12\n\n106\n\nAuto\n\n1\n\n4\n\n131\n\n136\n\n54\n\n3.82\n\n4\n\nOther consumer\n\n—\n\n18\n\n1\n\n19\n\n—\n\n11.83\n\n18\n\nTrial modifications (5)\n\n—\n\n—\n\n(3)\n\n(3)\n\n—\n\n—\n\n—\n\nTotal consumer\n\n1\n\n198\n\n1,453\n\n1,652\n\n57\n\n12.01\n\n198\n\n44\n\n222\n\n2,598\n\n2,864\n\n77\n\n10.84%\n\n$\n\n221\n\n$\n\n48\n\nTotal\n\n$\n\nYear Ended December 31, 2020\n24\n\n47\n\n2,971\n\n3,042\n\n162\n\n0.74%\n\nCommercial real estate\n\nCommercial and 
industrial\n\n10\n\n35\n\n684\n\n729\n\n5\n\n1.11\n\nLease financing\n\n—\n\n—\n\n1\n\n1\n\n—\n\n—\n\n—\n\nTotal commercial\n\n34\n\n82\n\n3,656\n\n3,772\n\n167\n\n0.90\n\n83\n\nResidential mortgage\n\n—\n\n25\n\n4,277\n\n4,302\n\n7\n\n1.93\n\n51\n\nCredit card\n\n—\n\n272\n\n—\n\n272\n\n—\n\n14.12\n\n272\n\n(2)\n(3)\n(4)\n(5)\n\n35\n\nAuto\n\n4\n\n6\n\n166\n\n176\n\n93\n\n4.65\n\n6\n\nOther consumer\n\n—\n\n23\n\n34\n\n57\n\n1\n\n8.28\n\n23\n\nTrial modifications (5)\n\n—\n\n—\n\n3\n\n3\n\n—\n\n—\n\n—\n\nTotal consumer\n\n4\n\n326\n\n4,480\n\n4,810\n\n101\n\n11.80\n\n352\n\n38\n\n408\n\n8,136\n\n8,582\n\n268\n\n9.73%\n\nTotal\n(1)\n\n$\n\n$\n\n$\n\n435\n\nAmounts represent the recorded investment in loans after recognizing the effects of the TDR, if any. TDRs may have multiple types of concessions, but are presented only once in the first\nmodification type based on the order presented in the table above. The reported amounts include loans remodified of $445 million, $737 million, and $1.5 billion for the years ended December 31,\n2022, 2021 and 2020, respectively.\nOther concessions include loans with payment (principal and/or interest) deferral, loans discharged in bankruptcy, loan renewals, term extensions and other interest and noninterest adjustments, but\nexclude modifications that also forgive principal and/or reduce the contractual interest rate. The reported amounts include loans that are new TDRs that may have COVID-19-related payment\ndeferrals and exclude COVID-19-related payment deferrals on loans previously reported as TDRs given limited current financial effects other than payment deferral.\nCharge-offs include write-downs of the investment in the loan in the period it is contractually modified. The amount of charge-off will differ from the modification terms if the loan has been charged\ndown prior to the modification based on our policies. 
In addition, there may be cases where we have a charge-off/down with no legal principal modification.\nRecorded investment related to interest rate reduction reflects the effect of reduced interest rates on loans with an interest rate concession as one of their concession types, which includes loans\nreported as a principal primary modification type that also have an interest rate concession.\nTrial modifications are granted a delay in payments due under the original terms during the trial payment period. However, these loans continue to advance through delinquency status and accrue\ninterest according to their original terms. Any subsequent permanent modification generally includes interest rate related concessions; however, the exact concession type and resulting financial\neffect are usually not known until the loan is permanently modified. Trial modifications for the period are presented net of previously reported trial modifications that became permanent in the\ncurrent period.\n\nWells Fargo & Company\n\n123\n\n\x0c'
pdfplumber:
'Table 5.14: TDR Modifications\nPrimary modification type (1) Financial effects of modifications\nWeighted Recorded\naverage investment\nInterest interest related to\nPrincipal rate Other Charge- rate interest rate\n($ in millions) forgiveness reduction concessions (2) Total offs (3) reduction reduction (4)\nYear Ended December 31, 2022\nCommercial and industrial $ 24 24 349 397 — 10.69% $ 24\nCommercial real estate — 12 112 124 — 0.92 12\nLease financing — — 2 2 — — —\nTotal commercial 24 36 463 523 — 7.51 36\nResidential mortgage 1 369 1,357 1,727 6 1.61 369\nCredit card — 311 — 311 — 20.33 311\nAuto 2 7 63 72 16 4.33 7\nOther consumer — 19 3 22 1 11.48 19\nTrial modifications (5) — — 228 228 — — —\nTotal consumer 3 706 1,651 2,360 23 10.14 706\nTotal $ 27 742 2,114 2,883 23 10.02% $ 742\nYear Ended December 31, 2021\nCommercial and industrial $ 2 9 879 890 20 0.81% $ 9\nCommercial real estate 41 15 259 315 — 1.28 14\nLease financing — — 7 7 — — —\nTotal commercial 43 24 1,145 1,212 20 1.11 23\nResidential mortgage — 70 1,324 1,394 3 1.80 70\nCredit card — 106 — 106 — 19.12 106\nAuto 1 4 131 136 54 3.82 4\nOther consumer — 18 1 19 — 11.83 18\nTrial modifications (5) — — (3) (3) — — —\nTotal consumer 1 198 1,453 1,652 57 12.01 198\nTotal $ 44 222 2,598 2,864 77 10.84% $ 221\nYear Ended December 31, 2020\nCommercial and industrial $ 24 47 2,971 3,042 162 0.74% $ 48\nCommercial real estate 10 35 684 729 5 1.11 35\nLease financing — — 1 1 — — —\nTotal commercial 34 82 3,656 3,772 167 0.90 83\nResidential mortgage — 25 4,277 4,302 7 1.93 51\nCredit card — 272 — 272 — 14.12 272\nAuto 4 6 166 176 93 4.65 6\nOther consumer — 23 34 57 1 8.28 23\nTrial modifications (5) — — 3 3 — — —\nTotal consumer 4 326 4,480 4,810 101 11.80 352\nTotal $ 38 408 8,136 8,582 268 9.73% $ 435\n(1) Amounts represent the recorded investment in loans after recognizing the effects of the TDR, if any. 
TDRs may have multiple types of concessions, but are presented only once in the first\nmodification type based on the order presented in the table above. The reported amounts include loans remodified of $445 million, $737 million, and $1.5 billion for the years ended December 31,\n2022, 2021 and 2020, respectively.\n(2) Other concessions include loans with payment (principal and/or interest) deferral, loans discharged in bankruptcy, loan renewals, term extensions and other interest and noninterest adjustments, but\nexclude modifications that also forgive principal and/or reduce the contractual interest rate. The reported amounts include loans that are new TDRs that may have COVID-19-related payment\ndeferrals and exclude COVID-19-related payment deferrals on loans previously reported as TDRs given limited current financial effects other than payment deferral.\n(3) Charge-offs include write-downs of the investment in the loan in the period it is contractually modified. The amount of charge-off will differ from the modification terms if the loan has been charged\ndown prior to the modification based on our policies. In addition, there may be cases where we have a charge-off/down with no legal principal modification.\n(4) Recorded investment related to interest rate reduction reflects the effect of reduced interest rates on loans with an interest rate concession as one of their concession types, which includes loans\nreported as a principal primary modification type that also have an interest rate concession.\n(5) Trial modifications are granted a delay in payments due under the original terms during the trial payment period. However, these loans continue to advance through delinquency status and accrue\ninterest according to their original terms. Any subsequent permanent modification generally includes interest rate related concessions; however, the exact concession type and resulting financial\neffect are usually not known until the loan is permanently modified. 
Trial modifications for the period are presented net of previously reported trial modifications that became permanent in the\ncurrent period.\nWells Fargo & Company 123'
The footnote text in this example looks okay when extracted with pdfplumber and pdftotext, but PyMuPDF outputs text that looks like '(1) \nAmounts r\n epresent \n the r\n ecorded \n investment \n in loa\n \nns a\n fter \n recognizing \n the effect\n \ns of t\n \n he TD\n \nR, \n if a\n ny.' with '\n' scattered throughout.
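To check for this kind of breakage programmatically, one option is to compare the suspect output against a reference extraction of the same page. The sketch below is an illustration only and not part of the original report: `find_split_words` and its `min_len` parameter are hypothetical names, and the check is a heuristic that flags reference words which are missing from the suspect text as whole tokens but reappear once all whitespace is removed.

```python
import re


def find_split_words(reference: str, suspect: str, min_len: int = 4) -> list[str]:
    """Return words from `reference` that look broken apart in `suspect`.

    A word is reported when it is absent from `suspect` as a whole
    whitespace-delimited token, yet present once every whitespace
    character has been stripped from `suspect`.
    """
    suspect_tokens = set(suspect.split())
    squashed = re.sub(r"\s+", "", suspect)  # suspect text with no whitespace at all
    broken = []
    seen = set()
    for word in reference.split():
        # Skip very short words (too many accidental matches) and repeats.
        if len(word) < min_len or word in seen:
            continue
        seen.add(word)
        if word not in suspect_tokens and word in squashed:
            broken.append(word)
    return broken


# Fragments taken from the outputs quoted in this report.
reference = "(1) Amounts represent the recorded investment in loans after recognizing"
suspect = (
    "(1) \nAmounts r\n epresent \n the r\n ecorded \n investment \n"
    " in loa\n \nns a\n fter \n recognizing"
)
print(find_split_words(reference, suspect))
```

Because the match runs on whitespace-stripped text, a short word can occasionally be found across a genuine word boundary, so the result is best treated as a list of candidates to inspect rather than a definitive count.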
PyMuPDF version
1.24.9
Operating system
Linux
Python version
3.10
Here is a script that can be used as a workaround while the team is working on a final solution.
This will definitely be known to the team, but I am noting it here for completeness and for anyone who searches: this issue is more fundamental than words, as even the rawdict format has it. Thanks!
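Since the rawdict format exposes individual characters with coordinates, one conceivable way around bad line grouping is to ignore the reported lines and regroup characters by position. The following is a minimal sketch under several assumptions, and it is not the script mentioned above: `regroup_chars`, `y_tol` and `gap_factor` are hypothetical names, the text is assumed to be horizontal, left-to-right and single-column, and the character coordinates are assumed to be reliable even where the line grouping is not, which this thread does not confirm. The function only relies on the `blocks` / `lines` / `spans` / `chars` nesting and the `c`, `origin`, `bbox` and `size` keys of the rawdict structure.

```python
def regroup_chars(rawdict: dict, y_tol: float = 2.0, gap_factor: float = 0.25) -> str:
    """Rebuild text rows from rawdict characters, ignoring the reported lines.

    Whitespace characters are discarded; spaces are re-derived from the
    horizontal gap between neighbouring glyphs on the same baseline.
    """
    chars = []
    for block in rawdict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks carry no "lines"
            for span in line["spans"]:
                for ch in span["chars"]:
                    if ch["c"].isspace():
                        continue
                    x0, _, x1, _ = ch["bbox"]
                    # (baseline y, left x, right x, font size, character)
                    chars.append((ch["origin"][1], x0, x1, span["size"], ch["c"]))

    # Group characters into rows whose baselines lie within y_tol of each other.
    chars.sort(key=lambda c: c[0])
    rows = []
    for c in chars:
        if rows and abs(c[0] - rows[-1][0][0]) <= y_tol:
            rows[-1].append(c)
        else:
            rows.append([c])

    out = []
    for row in rows:
        row.sort(key=lambda c: c[1])  # left to right within the row
        text = row[0][4]
        for prev, cur in zip(row, row[1:]):
            # A gap wider than a fraction of the font size counts as a space.
            if cur[1] - prev[2] > gap_factor * cur[3]:
                text += " "
            text += cur[4]
        out.append(text)
    return "\n".join(out)
```

With PyMuPDF installed, the input would come from `page.get_text("rawdict")`. The tolerance values are guesses and would need tuning per document, and pages with multiple columns or rotated text would need extra handling.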
How to reproduce the bug
wellsfargo-2022-annual-report.pdf