
Parsing line Error: list index out of range #5101

Closed
WZMIAOMIAO opened this issue Dec 28, 2021 · 2 comments
WZMIAOMIAO commented Dec 28, 2021

  • System Environment: Ubuntu 18.04
  • Version: Paddle 2.2.1
  • PaddleOCR: release2.4
  • Related components: pyclipper
  • Dataset: https://paddleocr.bj.bcebos.com/dataset/det_data_lesson_demo.tar
  • Command: !python tools/train.py -c configs/det/det_mv3_db.yml
  • Complete Error Message:
[2021/12/23 12:41:26] root INFO: train dataloader has 94 iters
[2021/12/23 12:41:26] root INFO: valid dataloader has 250 iters
[2021/12/23 12:41:26] root INFO: During the training process, after the 0th iteration, an evaluation is run every 500 iterations
[2021/12/23 12:41:26] root INFO: Initialize indexs of datasets:['/home/aistudio/work/data/det_data_lesson_demo/train.txt']
[2021/12/23 12:41:54] root INFO: epoch: [1/100], iter: 10, lr: 0.000027, loss: 9.582685, loss_shrink_maps: 4.681584, loss_threshold_maps: 3.961636, loss_binary_maps: 0.939466, reader_cost: 1.81348 s, batch_cost: 2.77511 s, samples: 88, ips: 3.17105
[2021/12/23 12:41:58] root ERROR: When parsing line mtwi/train/TB1_5H8n3vD8KJjy0FlXXagBFXa_!!0-item_pic.jpg.jpg	[{"transcription": "\u6d53\u7f29\u9664\u81ed\u6db2", "points": [[473.55, 99.64], [456.18, 41.82], [778.73, 39.82], [777.73, 105.82]]}, {"transcription": "1000ml", "points": [[476.27, 158.73], [477.27, 129.09], [618.55, 124.09], [618.55, 158.73]]}, {"transcription": "\u62b510\u74f6", "points": [[647.55, 121.64], [652.55, 165.64], [771.09, 166.64], [773.09, 121.64]]}, {"transcription": "\u9001", "points": [[691.82, 347.45], [690.82, 437.36], [768.55, 426.36], [777.0, 345.36]]}, {"transcription": "YaHo\u4e9a\u79be", "points": [[94.0, 289.0], [94.0, 305.73], [164.73, 305.73], [164.73, 287.0]]}, {"transcription": "YaHo\u4e9a\u79be", "points": [[242.55, 290.0], [242.55, 303.27], [317.45, 303.27], [316.45, 287.0]]}, {"transcription": "YaHo\u4e9a\u79be", "points": [[650.55, 476.36], [651.55, 485.82], [694.91, 486.82], [695.91, 477.36]]}, {"transcription": "Disiuf", "points": [[48.36, 325.55], [46.55, 359.09], [154.55, 363.45], [156.55, 330.55]]}, {"transcription": "spray", "points": [[61.45, 362.73], [61.45, 378.91], [123.27, 377.0], [121.27, 361.91]]}, {"transcription": "spray", "points": [[211.73, 360.27], [214.73, 377.73], [272.64, 377.73], [269.64, 360.27]]}, {"transcription": "Disiufectaut", "points": [[198.73, 324.55], [199.73, 361.82], [387.0, 357.82], [390.0, 328.55]]}, {"transcription": "\u5ba0\u7269\u9664\u81ed\u6db2", "points": [[271.64, 379.64], [272.64, 400.0], [369.82, 399.0], [371.82, 381.64]]}, {"transcription": "\u5ba0\u7269", "points": [[125.73, 379.64], [122.73, 399.27], [153.82, 401.27], [154.82, 381.64]]}, {"transcription": "\u6d53\u7f29\u578b", "points": [[63.82, 380.82], [67.82, 397.55], [116.73, 399.55], [114.73, 382.82]]}, {"transcription": "\u6d53\u7f29\u578b", "points": [[216.91, 382.55], [214.91, 401.27], [267.27, 399.27], [269.27, 383.55]]}, {"transcription": "\u8309\u8389\u82b1\u82ac\u82b3", "points": [[63.18, 422.09], [61.18, 
429.36], [104.82, 429.36], [105.82, 422.09]]}, {"transcription": "\u8309\u8389\u82b1\u82ac\u82b3", "points": [[211.73, 421.09], [211.73, 429.82], [256.64, 430.82], [254.64, 421.09]]}, {"transcription": "\u51c0\u542b\u91cf\uff1a1000ML", "points": [[63.18, 442.91], [61.18, 454.55], [144.36, 453.55], [143.36, 442.91]]}, {"transcription": "\u51c0\u542b\u91cf\uff1a1000ML", "points": [[216.18, 444.64], [213.18, 457.27], [294.91, 454.27], [296.91, 445.64]]}, {"transcription": "\u51c0\u542b\u91cf", "points": [[703.8, 615.47], [705.8, 621.8], [721.4, 620.8], [723.4, 615.47]]}, {"transcription": "500ML", "points": [[704.2, 622.4], [702.2, 626.67], [720.2, 627.67], [720.2, 622.4]]}, {"transcription": "\u8309\u8389\u82b1\u82ac\u82b3", "points": [[661.07, 546.13], [661.07, 553.53], [689.93, 553.53], [691.93, 547.13]]}, {"transcription": "\u5ba0\u7269\u795b\u5473\u55b7\u96fe", "points": [[640.8, 537.13], [642.8, 526.87], [713.2, 526.87], [710.2, 537.13]]}, {"transcription": "Healthy", "points": [[62.27, 408.93], [64.27, 419.6], [101.07, 417.6], [103.07, 408.93]]}, {"transcription": "Antiscptic", "points": [[104.27, 408.53], [104.27, 420.67], [152.33, 420.67], [154.33, 408.53]]}, {"transcription": "###", "points": [[213.13, 406.93], [215.13, 417.0], [249.33, 417.0], [249.33, 406.93]]}, {"transcription": "Antiscptic&DeodorantForPet", "points": [[253.13, 417.0], [253.6, 408.47], [408.2, 407.93], [401.27, 418.07]]}, {"transcription": "Healthy", "points": [[224.2, 406.4], [224.2, 407.93], [221.67, 406.93], [223.67, 407.4]]}, {"transcription": "DEODORANT", "points": [[627.0, 505.07], [627.0, 496.47], [724.53, 495.47], [724.53, 505.07]]}, {"transcription": "SPRAY", "points": [[651.47, 519.87], [651.47, 509.2], [698.0, 508.2], [702.0, 518.87]]}, {"transcription": "\u4e70\u4e00\u9001\u4e00", "points": [[27.07, 790.8], [12.93, 645.2], [484.67, 631.93], [450.8, 786.27]]}, {"transcription": "###", "points": [[123.93, 700.2], [121.93, 700.73], [124.47, 700.73], [124.47, 700.2]]}, 
{"transcription": "\u5206\u89e3\u81ed\u5473", "points": [[515.0, 793.07], [514.0, 710.53], [786.0, 700.53], [793.0, 784.07]]}, {"transcription": "\u9001\u9664\u81ed\u55b7\u96fe500ml", "points": [[522.2, 698.0], [522.2, 666.47], [786.87, 662.47], [788.87, 697.0]]}]
, error happened with msg: Traceback (most recent call last):
  File "/home/aistudio/work/PaddleOCR/ppocr/data/simple_dataset.py", line 119, in __getitem__
    outs = transform(data, self.ops)
  File "/home/aistudio/work/PaddleOCR/ppocr/data/imaug/__init__.py", line 43, in transform
    data = op(data)
  File "/home/aistudio/work/PaddleOCR/ppocr/data/imaug/make_border_map.py", line 60, in __call__
    self.draw_border_map(text_polys[i], canvas, mask=mask)
  File "/home/aistudio/work/PaddleOCR/ppocr/data/imaug/make_border_map.py", line 81, in draw_border_map
    padded_polygon = np.array(padding.Execute(distance)[0])
IndexError: list index out of range
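For reference, each line in a PaddleOCR detection label file is an image path and a JSON list of annotations separated by a tab, as seen in the failing line above. A minimal sketch of how such a line parses (the sample line here is a shortened stand-in, not the full failing line):

```python
import json

# Stand-in label line: image path, a tab, then a JSON list of annotations.
line = ('test.jpg\t[{"transcription": "1000ml", "points": '
        '[[476.27, 158.73], [477.27, 129.09], [618.55, 124.09], [618.55, 158.73]]}]')

img_path, label = line.split('\t', 1)
annotations = json.loads(label)

print(img_path)                          # test.jpg
print(annotations[0]["transcription"])   # 1000ml
print(len(annotations[0]["points"]))     # 4
```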

Problem Description

I previously opened a related issue #5029, which raised two problems:

  • The first was an image-reading problem in Paddle (I have already submitted a PR for that).
  • The second was an error while parsing label data, whose cause I only tracked down in the last couple of days.

Through debugging I found that for some especially small target regions (or badly annotated data), the result returned by pyclipper after shrinking is an empty list. Here is a minimal test script:

import numpy as np
import pyclipper

# Degenerate quadrilateral taken from the failing label line above
subject = [(179.67479414146868, 330.7079846112311),
           (179.72345774838013, 331.9309867710583),
           (177.66925958423326, 331.2121355291577),
           (179.2829373650108, 331.5241968142549)]
distance = 0.07327365999299185

padding = pyclipper.PyclipperOffset()
padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
result = padding.Execute(distance)
print(result)  # []
padded_polygon = np.array(padding.Execute(distance)[0])  # IndexError: list index out of range
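The subject polygon above is essentially degenerate: it encloses only about half a square pixel, which is consistent with pyclipper offsetting it to nothing. The shoelace formula makes this easy to check (a quick verification script, not PaddleOCR code):

```python
def shoelace_area(points):
    # Signed polygon area via the shoelace formula; near-zero means degenerate.
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

# The same quadrilateral from the failing label line
subject = [(179.67479414146868, 330.7079846112311),
           (179.72345774838013, 331.9309867710583),
           (177.66925958423326, 331.2121355291577),
           (179.2829373650108, 331.5241968142549)]

print(abs(shoelace_area(subject)))  # roughly 0.52 square pixels
```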

The official PaddleOCR source does not check the result of padding.Execute, so when the result is an empty list an Error is raised:

padding = pyclipper.PyclipperOffset()
padding.AddPath(subject, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
padded_polygon = np.array(padding.Execute(distance)[0])


A Simple Fix

Change:

padded_polygon = np.array(padding.Execute(distance)[0])

to:

result = padding.Execute(distance)
if len(result) == 0:
    return
padded_polygon = np.array(result[0])

If the maintainers think this is acceptable, I can open a PR; if there is a better solution, I will wait for the official fix.


Steps to Reproduce

  • First, create a test.txt label file in the project root containing a single line (the line that failed to parse):
./test.jpg	[{"transcription": "\u6d53\u7f29\u9664\u81ed\u6db2", "points": [[473.55, 99.64], [456.18, 41.82], [778.73, 39.82], [777.73, 105.82]]}, {"transcription": "1000ml", "points": [[476.27, 158.73], [477.27, 129.09], [618.55, 124.09], [618.55, 158.73]]}, {"transcription": "\u62b510\u74f6", "points": [[647.55, 121.64], [652.55, 165.64], [771.09, 166.64], [773.09, 121.64]]}, {"transcription": "\u9001", "points": [[691.82, 347.45], [690.82, 437.36], [768.55, 426.36], [777.0, 345.36]]}, {"transcription": "YaHo\u4e9a\u79be", "points": [[94.0, 289.0], [94.0, 305.73], [164.73, 305.73], [164.73, 287.0]]}, {"transcription": "YaHo\u4e9a\u79be", "points": [[242.55, 290.0], [242.55, 303.27], [317.45, 303.27], [316.45, 287.0]]}, {"transcription": "YaHo\u4e9a\u79be", "points": [[650.55, 476.36], [651.55, 485.82], [694.91, 486.82], [695.91, 477.36]]}, {"transcription": "Disiuf", "points": [[48.36, 325.55], [46.55, 359.09], [154.55, 363.45], [156.55, 330.55]]}, {"transcription": "spray", "points": [[61.45, 362.73], [61.45, 378.91], [123.27, 377.0], [121.27, 361.91]]}, {"transcription": "spray", "points": [[211.73, 360.27], [214.73, 377.73], [272.64, 377.73], [269.64, 360.27]]}, {"transcription": "Disiufectaut", "points": [[198.73, 324.55], [199.73, 361.82], [387.0, 357.82], [390.0, 328.55]]}, {"transcription": "\u5ba0\u7269\u9664\u81ed\u6db2", "points": [[271.64, 379.64], [272.64, 400.0], [369.82, 399.0], [371.82, 381.64]]}, {"transcription": "\u5ba0\u7269", "points": [[125.73, 379.64], [122.73, 399.27], [153.82, 401.27], [154.82, 381.64]]}, {"transcription": "\u6d53\u7f29\u578b", "points": [[63.82, 380.82], [67.82, 397.55], [116.73, 399.55], [114.73, 382.82]]}, {"transcription": "\u6d53\u7f29\u578b", "points": [[216.91, 382.55], [214.91, 401.27], [267.27, 399.27], [269.27, 383.55]]}, {"transcription": "\u8309\u8389\u82b1\u82ac\u82b3", "points": [[63.18, 422.09], [61.18, 429.36], [104.82, 429.36], [105.82, 422.09]]}, {"transcription": "\u8309\u8389\u82b1\u82ac\u82b3", 
"points": [[211.73, 421.09], [211.73, 429.82], [256.64, 430.82], [254.64, 421.09]]}, {"transcription": "\u51c0\u542b\u91cf\uff1a1000ML", "points": [[63.18, 442.91], [61.18, 454.55], [144.36, 453.55], [143.36, 442.91]]}, {"transcription": "\u51c0\u542b\u91cf\uff1a1000ML", "points": [[216.18, 444.64], [213.18, 457.27], [294.91, 454.27], [296.91, 445.64]]}, {"transcription": "\u51c0\u542b\u91cf", "points": [[703.8, 615.47], [705.8, 621.8], [721.4, 620.8], [723.4, 615.47]]}, {"transcription": "500ML", "points": [[704.2, 622.4], [702.2, 626.67], [720.2, 627.67], [720.2, 622.4]]}, {"transcription": "\u8309\u8389\u82b1\u82ac\u82b3", "points": [[661.07, 546.13], [661.07, 553.53], [689.93, 553.53], [691.93, 547.13]]}, {"transcription": "\u5ba0\u7269\u795b\u5473\u55b7\u96fe", "points": [[640.8, 537.13], [642.8, 526.87], [713.2, 526.87], [710.2, 537.13]]}, {"transcription": "Healthy", "points": [[62.27, 408.93], [64.27, 419.6], [101.07, 417.6], [103.07, 408.93]]}, {"transcription": "Antiscptic", "points": [[104.27, 408.53], [104.27, 420.67], [152.33, 420.67], [154.33, 408.53]]}, {"transcription": "###", "points": [[213.13, 406.93], [215.13, 417.0], [249.33, 417.0], [249.33, 406.93]]}, {"transcription": "Antiscptic&DeodorantForPet", "points": [[253.13, 417.0], [253.6, 408.47], [408.2, 407.93], [401.27, 418.07]]}, {"transcription": "Healthy", "points": [[224.2, 406.4], [224.2, 407.93], [221.67, 406.93], [223.67, 407.4]]}, {"transcription": "DEODORANT", "points": [[627.0, 505.07], [627.0, 496.47], [724.53, 495.47], [724.53, 505.07]]}, {"transcription": "SPRAY", "points": [[651.47, 519.87], [651.47, 509.2], [698.0, 508.2], [702.0, 518.87]]}, {"transcription": "\u4e70\u4e00\u9001\u4e00", "points": [[27.07, 790.8], [12.93, 645.2], [484.67, 631.93], [450.8, 786.27]]}, {"transcription": "###", "points": [[123.93, 700.2], [121.93, 700.73], [124.47, 700.73], [124.47, 700.2]]}, {"transcription": "\u5206\u89e3\u81ed\u5473", "points": [[515.0, 793.07], [514.0, 710.53], [786.0, 700.53], 
[793.0, 784.07]]}, {"transcription": "\u9001\u9664\u81ed\u55b7\u96fe500ml", "points": [[522.2, 698.0], [522.2, 666.47], [786.87, 662.47], [788.87, 697.0]]}]
  • Then run the following code:
from PIL import Image
from tools.program import load_config, get_logger
from ppocr.data.simple_dataset import SimpleDataSet

# Create an image the same size as mtwi/train/TB1_5H8n3vD8KJjy0FlXXagBFXa_!!0-item_pic.jpg.jpg, so the dataset does not need to be downloaded
img = Image.new('RGB', size=(800, 800))
img.save("test.jpg")

config_path = "./configs/det/det_mv3_db.yml"
global_config = load_config(config_path)
global_config["Train"]["dataset"]["data_dir"] = "./"
global_config["Train"]["dataset"]["label_file_list"] = ["test.txt"]
logging = get_logger(name="root")
dataset = SimpleDataSet(global_config, "Train", logging, seed=0)
result = dataset[0]
  • Running the code above sometimes raises the error and sometimes does not, presumably because of the randomness introduced by data augmentation. Further analysis of the transforms:
transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - DetLabelEncode: # Class handling label
      - IaaAugment:
#          augmenter_args:
#            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
#            - { 'type': Affine, 'args': { 'rotate': [-10, 10] } }
#            - { 'type': Resize, 'args': { 'size': [0.5, 3] } }
      - EastRandomCropData:
          size: [640, 640]
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
          order: 'hwc'

First I disabled augmenter_args, and the problem persisted. But once I disabled EastRandomCropData, the problem could no longer be reproduced, so the randomness in EastRandomCropData must be the cause. The np.random calls in EastRandomCropData's __call__ method do not use a fixed seed, so I fixed the seed to 1 in random_crop_data.py, which makes the problem reproducible on every run. As a side suggestion, it would be helpful if the code offered a way to fix the random seed, so that errors like this are easier to reproduce.

# Top of ppocr/data/imaug/random_crop_data.py, with the seed fixed for reproducibility
import numpy as np
import cv2
import random

np.random.seed(1)
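Since the pipeline mixes stdlib and NumPy randomness, a reproducible run needs every RNG in play seeded, not only NumPy's. A minimal sketch (the helper name `seed_everything` is mine, not PaddleOCR API):

```python
import random
import numpy as np

def seed_everything(seed: int = 1) -> None:
    # Fix both the stdlib and NumPy RNGs so augmentation randomness is reproducible.
    random.seed(seed)
    np.random.seed(seed)

seed_everything(1)
a = np.random.randint(0, 100)
seed_everything(1)
b = np.random.randint(0, 100)
print(a == b)  # True: identical draws after reseeding
```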

@vineethbabu

Hi, I am currently facing this issue. I was able to run table structure recognition without any errors a few days ago, but now I get this error:

Traceback (most recent call last):
  File "table/predict_table.py", line 221, in <module>
    main(args)
  File "table/predict_table.py", line 197, in main
    pred_html = text_sys(img)
  File "table/predict_table.py", line 88, in __call__
    rec_res, elapse = self.text_recognizer(img_crop_list)
  File "/home/vineeth/Documents/PaddleOCR/tools/infer/predict_rec.py", line 368, in __call__
    rec_result = self.postprocess_op(preds)
  File "/home/vineeth/Documents/PaddleOCR/ppocr/postprocess/rec_postprocess.py", line 97, in __call__
    text = self.decode(preds_idx, preds_prob, is_remove_duplicate=True)
  File "/home/vineeth/Documents/PaddleOCR/ppocr/postprocess/rec_postprocess.py", line 69, in decode
    idx])])
IndexError: list index out of range

Command run:
python3 table/predict_table.py --det_model_dir=inference/en_ppocr_mobile_v2.0_table_det_infer --rec_model_dir=inference/en_ppocr_mobile_v2.0_table_rec_infer --table_model_dir=inference/en_ppocr_mobile_v2.0_table_structure_infer --image_dir=/home/vineeth/Downloads/J01_crop1.jpg --rec_char_dict_path=../ppocr/utils/dict/table_dict.txt --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --rec_char_dict_path=../ppocr/utils/dict/en_dict.txt --det_limit_side_len=736 --det_limit_type=min --output ../output/table

Thanks in advance.

This was referenced Dec 29, 2021
@paddle-bot-old

Since you haven't replied for more than 3 months, we have closed this issue/PR.
If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up.
It is recommended to pull and try the latest code first.

an1018 pushed a commit to an1018/PaddleOCR that referenced this issue Aug 17, 2022