Can pictures read from Excel be identified as hidden objects? #414

Open
RexH0 opened this issue Jan 6, 2025 · 12 comments

Comments

@RexH0

RexH0 commented Jan 6, 2025

List<Drawings.Picture> pictures = reader.listPictures();
Is there a way to tell whether the pictures returned here are hidden objects?

@wangguanquan
Owner

How do you hide a picture? If you can give me the steps to reproduce it, I can test locally and then propose a solution.

@RexH0
Author

RexH0 commented Jan 8, 2025

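The file has been sent to your email.
When I import this file and read it with List<Drawings.Picture> pictures = reader.listPictures(); the program hangs and never proceeds.
I don't know what is wrong with the Excel file. It is also very slow to open in WPS. The file was created by a customer. Is there a way to report that the file is problematic?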

@RexH0
Author

RexH0 commented Jan 8, 2025

image
It hangs because more than a hundred thousand pictures are being read, and none of them are visible inside Excel.
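
One possible way to flag such a file before calling listPictures() is a cheap pre-flight count of picture nodes. The sketch below uses only the JDK (StAX and java.util.zip) and is not part of the library's API. The class name DrawingPreflight and its methods are hypothetical, and it assumes the usual xlsx layout where drawing parts live under xl/drawings/.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not an API of the library discussed in this issue
final class DrawingPreflight {
    // Namespace bound to the "xdr" prefix in spreadsheet drawing parts
    static final String XDR_NS =
        "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing";

    /** Counts xdr:pic elements in one drawing part by streaming, without building a DOM. */
    static int countPictures(InputStream in) {
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            int count = 0;
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "pic".equals(xml.getLocalName())
                        && XDR_NS.equals(xml.getNamespaceURI())) {
                    count++;
                }
            }
            xml.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed drawing XML", e);
        }
    }

    /** Sums the picture count over every xl/drawings/*.xml part of a workbook. */
    static int countPictures(String xlsxPath) throws IOException {
        int total = 0;
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // .rels and .vml files in this folder do not end with ".xml"
                if (name.startsWith("xl/drawings/") && name.endsWith(".xml")) {
                    try (InputStream in = zip.getInputStream(entry)) {
                        total += countPictures(in);
                    }
                }
            }
        }
        return total;
    }
}
```

A caller could compare the result of DrawingPreflight.countPictures(path) against a threshold of its own choosing and reject or warn about the file before reading any pictures.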

@wangguanquan
Owner

wangguanquan commented Jan 8, 2025 via email

@RexH0
Author

RexH0 commented Jan 8, 2025

This spreadsheet is probably faulty. I can't see where these pictures are in the sheet either. Is there a way for the program to locate them?

@wangguanquan
Owner

wangguanquan commented Jan 8, 2025 via email

@wangguanquan
Owner

wangguanquan commented Jan 8, 2025 via email

@RexH0
Author

RexH0 commented Jan 8, 2025

Then is there a way to detect duplicate nodes in the XML file?
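
As a rough illustration of one way to answer this, the sketch below tallies how often each r:embed relationship id is referenced by a:blip elements in a drawing part. It assumes that "duplicate nodes" means many picture nodes pointing at the same embedded image. The class DuplicateBlipFinder is hypothetical and uses only JDK StAX, not the library's own parser.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not an API of the library discussed in this issue
final class DuplicateBlipFinder {
    // Namespace bound to the "a" prefix (DrawingML main)
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    // Namespace bound to the "r" prefix (relationships)
    static final String R_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    /** Returns relationship id -> number of a:blip elements that embed it. */
    static Map<String, Integer> countEmbeds(InputStream in) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "blip".equals(xml.getLocalName())
                        && A_NS.equals(xml.getNamespaceURI())) {
                    String id = xml.getAttributeValue(R_NS, "embed");
                    if (id != null) {
                        counts.merge(id, 1, Integer::sum);
                    }
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed drawing XML", e);
        }
        return counts;
    }
}
```

Any entry whose count is greater than 1 marks an image that is referenced by more than one picture node. A handful of ids with counts in the tens of thousands would match the symptom described in this thread.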

@wangguanquan
Owner

wangguanquan commented Jan 8, 2025

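The following approach can greatly speed up reading:

  1. Subclass XMLDrawings and override parseDrawings to handle "hidden" pictures and duplicate nodes
  2. Subclass ExcelReader and override init so that it uses the XMLDrawings subclass from step 1
  3. Use the custom ExcelReader
// Custom XMLDrawings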
public static class MyXMLDrawings extends XMLDrawings {

    public MyXMLDrawings(ExcelReader reader) {
        super(reader);
    }

    // Parse drawings.xml
    protected List<Picture> parseDrawings(ZipFile zipFile, ZipEntry entry, Path imagesPath) {
        int i = entry.getName().lastIndexOf('/');
        String relsKey;
        if (i > 0)
            relsKey = entry.getName().substring(0, i) + "/_rels" + entry.getName().substring(i);
        else if ((i = entry.getName().lastIndexOf('\\')) > 0)
            relsKey = entry.getName().substring(0, i) + "\\_rels" + entry.getName().substring(i);
        else relsKey = entry.getName();
        String key = relsKey + ".rels";
        ZipEntry entry1 = getEntry(zipFile, key);
        if (entry1 == null) return null; //throw new ExcelReadException("The file format is incorrect or corrupted. [" + key + "]");
        SAXReader reader = SAXReader.createDefault();
        Document document;
        try {
            document = reader.read(zipFile.getInputStream(entry1));
        } catch (DocumentException | IOException e) {
            throw new ExcelReadException("The file format is incorrect or corrupted. [" + key + "]");
        }
        List<Element> list = document.getRootElement().elements();
        Relationship[] rels = new Relationship[list.size()];
        i = 0;
        for (Element e : list) {
            rels[i++] = new Relationship(e.attributeValue("Id"), e.attributeValue("Target"), e.attributeValue("Type"));
        }
        RelManager relManager = RelManager.of(rels);

        try {
            document = reader.read(zipFile.getInputStream(entry));
        } catch (DocumentException | IOException e) {
            throw new ExcelReadException("The file format is incorrect or corrupted. [" + entry.getName() + "]");
        }

        Element root = document.getRootElement();
        Namespace xdr = root.getNamespaceForPrefix("xdr"), a = root.getNamespaceForPrefix("a");

        List<Element> elements = root.elements();
        List<Picture> pictures = new ArrayList<>(elements.size());
        // Cache of already-extracted images, keyed by zip target, to handle large numbers of duplicate nodes
        Map<String, Path> localPathMap = new HashMap<>(Math.min(1 << 10, elements.size()));
        for (Element e : root.elements()) {
            Element pic = e.element(QName.get("pic", xdr));
            // Not a picture
            if (pic == null) continue;

            Element blipFill = pic.element(QName.get("blipFill", xdr));
            if (blipFill == null) continue;

            Element blip = blipFill.element(QName.get("blip", a));
            if (blip == null) continue;

            // FIXME Check the hidden flag; whether "hidden" pictures should be read is a business decision
            Element nvPicPr = pic.element(QName.get("nvPicPr", xdr));
            if (nvPicPr != null) {
                Element cNvPr = nvPicPr.element(QName.get("cNvPr", xdr));
                // Hidden pictures are skipped by default
                if (cNvPr != null && "1".equals(cNvPr.attributeValue("hidden"))) {
                    continue;
                }
            }

            Namespace r = blip.getNamespaceForPrefix("r");
            String embed = blip.attributeValue(QName.get("embed", r));
            Relationship rel = relManager.getById(embed);
            if (rel != null && Const.Relationship.IMAGE.equals(rel.getType())) {
                Picture picture = new Picture();
                pictures.add(picture);
                // Copy image to tmp path
                String target = toZipPath(rel.getTarget());
                // FIXME Change: look up the cache first; if the image was already extracted, reuse its path
                Path targetPath = localPathMap.get(target);
                if (targetPath == null && (entry = getEntry(zipFile, "xl/" + target)) != null) {
                    // Copy image to tmp path
                    try {
                        targetPath = imagesPath.resolve(rel.getTarget());
                        Files.copy(zipFile.getInputStream(entry), targetPath, StandardCopyOption.REPLACE_EXISTING);
                        localPathMap.put(target, targetPath);
                    } catch (IOException ioException) {
                        // Copy failed: the path is not cached, so the next reference to this image retries the copy
                    }
                }
                picture.localPath = targetPath;

                int[][] ft = parseDimension(e, xdr);
                picture.dimension = new Dimension(ft[0][2] + 1, (short) (ft[0][0] + 1), ft[1][2] + 1, (short) (ft[1][0] + 1));
                picture.padding = new short[] { (short) ft[0][3], (short) ft[1][1], (short) ft[1][3], (short) ft[0][1] };
                String editAs = e.attributeValue("editAs");
                int property = -1;
                if (StringUtil.isNotEmpty(editAs)) {
                    switch (editAs) {
                        case "twoCell" : property = 0; break;
                        case "oneCell" : property = 1; break;
                        case "absolute": property = 2; break;
                        default:
                    }
                }
                picture.property = property;
                Element spPr = pic.element(QName.get("spPr", xdr));
                if (spPr != null) {
                    Element xfrm = spPr.element(QName.get("xfrm", a));
                    String rot;
                    if (xfrm != null && StringUtil.isNotBlank(rot = xfrm.attributeValue("rot"))) {
                        try {
                            picture.revolve = Integer.parseInt(rot) / 60000;
                        } catch (Exception ex) {
                            // Ignore
                        }
                    }

                    // TODO Attach picture effects
                }

                Element extLst = blip.element(QName.get("extLst", a));
                if (extLst == null) continue;

                for (Element ext : extLst.elements()) {
                    Element srcUrl = ext.element("picAttrSrcUrl");
                    // hyperlink
                    if (srcUrl != null) {
                        rel = relManager.getById(srcUrl.attributeValue(QName.get("id", r)));
                        if (rel != null && Const.Relationship.HYPERLINK.equals(rel.getType())) {
                            picture.srcUrl = rel.getTarget();
                        }
                    }
                }
            }
        }
        return !pictures.isEmpty() ? pictures : null;
    }
}
// Custom ExcelReader
public static class MyExcelReader extends ExcelReader {
    public MyExcelReader(Path path) throws IOException {
        super(path);
    }

    public MyExcelReader(InputStream stream) throws IOException {
        super(stream);
    }

    @Override
    protected ExcelReader init(Path path) throws IOException {
        super.init(path);
        // FIXME Use MyXMLDrawings
        if (drawings != null) {
            drawings = new MyXMLDrawings(this);
            for (Sheet sheet : sheets) {
                ((XMLSheet) sheet).setDrawings(drawings);
            }
        }
        return this;
    }
}
// Use the custom ExcelReader
@Test public void testMyExcelReader() throws IOException {
    // FIXME Use the custom MyExcelReader
    try (ExcelReader reader = new MyExcelReader(Paths.get("F:/excel/1836921628.xlsx"))) {
        List<Drawings.Picture> list = reader.listPictures();
    }
}
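
If the goal is to report how many pictures are hidden rather than to skip them silently, a standalone count along the following lines might help. This is a sketch under assumptions: HiddenPictureCounter is a hypothetical class built on JDK StAX only. It also accepts hidden="true" in addition to hidden="1", on the assumption that the attribute follows the XML Schema boolean type and may be written either way.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not an API of the library discussed in this issue
final class HiddenPictureCounter {
    static final String XDR_NS =
        "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing";

    /** Returns a two-element array: {visible pictures, hidden pictures}. */
    static int[] count(InputStream in) {
        int visible = 0;
        int hidden = 0;
        boolean inPic = false; // true while the cursor is inside an xdr:pic element
        try {
            XMLInputFactory factory = XMLInputFactory.newFactory();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && XDR_NS.equals(xml.getNamespaceURI())) {
                    String local = xml.getLocalName();
                    if ("pic".equals(local)) {
                        inPic = true;
                    } else if (inPic && "cNvPr".equals(local)) {
                        // Only cNvPr inside a picture counts; shapes are ignored
                        String flag = xml.getAttributeValue(null, "hidden");
                        if ("1".equals(flag) || "true".equals(flag)) {
                            hidden++;
                        } else {
                            visible++;
                        }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "pic".equals(xml.getLocalName())
                        && XDR_NS.equals(xml.getNamespaceURI())) {
                    inPic = false;
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed drawing XML", e);
        }
        return new int[] { visible, hidden };
    }
}
```

Unlike parseDrawings above, which only inspects pictures that are direct children of an anchor, this sketch counts every xdr:pic it encounters, including any nested inside groups, so the two may report different totals on some files.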

@RexH0
Author

RexH0 commented Jan 9, 2025

I'll try your approach first.

wangguanquan pushed a commit that referenced this issue Jan 9, 2025
@wangguanquan
Owner

> I'll try your approach first.

Did the approach above work?

@RexH0
Author

RexH0 commented Jan 10, 2025

I haven't tried it yet; I'm busy with other requirements. I'll report back once I have.
