T.P.Class: shortcut for base64 data URIs in downloadOrRead.
This avoids calling the slow URI parser from network-uri on
data URIs, instead calling our own parser.

Benchmarks on an html -> docx conversion with large base64 image:
GCs from 7942 to 6695, memory in use from 3781M to 2351M,
GC time from 7.5 to 5.6.

See #10075.
jgm committed Dec 19, 2024
1 parent 449824c commit 26b5c95
Showing 1 changed file with 7 additions and 2 deletions.
9 changes: 7 additions & 2 deletions src/Text/Pandoc/Class/PandocMonad.hs
@@ -77,7 +77,8 @@ import Text.Pandoc.Logging
 import Text.Pandoc.MIME (MimeType, getMimeType)
 import Text.Pandoc.MediaBag (MediaBag, lookupMedia, MediaItem(..))
 import Text.Pandoc.Shared (safeRead, makeCanonical, tshow)
-import Text.Pandoc.URI (uriPathToPath)
+import Text.Pandoc.URI (uriPathToPath, pBase64DataURI)
+import qualified Data.Attoparsec.Text as A
 import Text.Pandoc.Walk (walkM)
 import qualified Text.Pandoc.UTF8 as UTF8
 import Data.ByteString.Base64 (decodeLenient)
@@ -333,7 +334,11 @@ fetchItem s = do
 downloadOrRead :: PandocMonad m
                => T.Text
                -> m (B.ByteString, Maybe MimeType)
-downloadOrRead s = do
+downloadOrRead s
+  | "data:" `T.isPrefixOf` s,
+    Right (bs, mt) <- A.parseOnly pBase64DataURI s
+      = pure (bs, Just mt)
+  | otherwise = do
   sourceURL <- getsCommonState stSourceURL
   case (sourceURL >>= parseURIReference' . ensureEscaped, ensureEscaped s) of
     (Just u, s') -> -- try fetching from relative path at source
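
The fast path in this change can be illustrated with a small standalone sketch. It is not pandoc's code: `sketchBase64DataURI` is a hypothetical stand-in for `pBase64DataURI` (whose actual definition in Text.Pandoc.URI is not shown in this diff and may well differ), and `slowPath` and `fetchSketch` are made-up names standing in for the general URI handling and for `downloadOrRead`. The sketch assumes the `text`, `bytestring`, `attoparsec`, and `base64-bytestring` packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Attoparsec.Text as A
import qualified Data.ByteString as B
import Data.ByteString.Base64 (decodeLenient)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical stand-in for pandoc's pBase64DataURI (assumption, not the
-- real parser). Accepts "data:<mime>;base64,<payload>" and returns the
-- decoded bytes together with the MIME type.
sketchBase64DataURI :: A.Parser (B.ByteString, T.Text)
sketchBase64DataURI = do
  _ <- A.string "data:"
  mime <- A.takeWhile1 (\c -> c /= ';' && c /= ',')
  _ <- A.string ";base64,"
  payload <- A.takeText
  pure (decodeLenient (TE.encodeUtf8 payload), mime)

-- Hypothetical placeholder for the general (slower) URI handling.
slowPath :: T.Text -> Either T.Text (B.ByteString, Maybe T.Text)
slowPath s = Left ("would fetch or read: " <> s)

-- Same guard shape as the diff: a cheap prefix test, then a pattern guard
-- that only succeeds if the dedicated parser accepts the input. Anything
-- else falls through to the general path.
fetchSketch :: T.Text -> Either T.Text (B.ByteString, Maybe T.Text)
fetchSketch s
  | "data:" `T.isPrefixOf` s
  , Right (bs, mt) <- A.parseOnly sketchBase64DataURI s
  = Right (bs, Just mt)
  | otherwise = slowPath s

main :: IO ()
main = do
  print (fetchSketch "data:text/plain;base64,aGVsbG8=")
  print (fetchSketch "images/logo.png")
```

Because the pattern guard falls through on a parse failure, a data URI that the dedicated parser rejects (in this sketch, for example, one with extra media-type parameters before `;base64`) still reaches the general path rather than producing an error.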