Commit

Modify picture numbers
Modify picture numbers in `文本检测实践篇.ipynb` and `text_detection_practice.ipynb`
RangeKing committed Jan 27, 2022
1 parent 713c4ea commit 5b89f58
Showing 2 changed files with 16 additions and 16 deletions.
文本检测实践篇.ipynb
@@ -63,7 +63,7 @@
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/4e31d512f7e147d4847cb1a0ee27a8260ef05506c9254fc1b19137bab1831ac8\"\n",
"width=\"200\", height=\"400\" ></center>\n",
"\n",
"<br><center>图0 12.jpg </center>\n",
"<br><center>图1 ./12.jpg </center>\n",
"\n",
"```\n",
"[[79.0, 555.0], [398.0, 542.0], [399.0, 571.0], [80.0, 584.0]]\n",
@@ -237,7 +237,7 @@
"\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/5eabdb59916a4267a049e5440f5093a63b6bfac9010844fb971aad0607d455a1\" width = \"600\"></center>\n",
"<center><br>图1 DB模型与其他方法的区别</br></center>\n",
"<center><br>图2 DB模型与其他方法的区别</br></center>\n",
"<br></br>\n",
"\n",
"基于分割的普通文本检测算法其流程如上图中的蓝色箭头所示,此类方法得到分割结果之后采用一个固定的阈值得到二值化的分割图,之后采用诸如像素聚类的启发式算法得到文本区域。\n",
@@ -280,15 +280,15 @@
"可以发现,增强因子会放大错误预测的梯度,从而优化模型得到更好的结果。**图2(b)** 中,$x<0$ 的部分为正样本预测为负样本的情况,可以看到,增益因子k将梯度进行了放大;而 **图2(c)** 中$x>0$ 的部分为负样本预测为正样本时,梯度同样也被放大了。\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/29255d870bd74403af37c8f88cb10ebca0c3117282614774a3d607efc8be8c84\" width = \"600\"></center>\n",
"<center><br>图3DB算法示意图</br></center>\n",
"<center><br>图3 DB算法示意图</br></center>\n",
"<br></br>\n",
"\n",
"\n",
"\n",
"DB算法整体结构如下图所示:\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/6e1f293e9a1f4c90b6c26919f16b95a4a85dcf7be73f4cc99c9dc5477bb956e6\" width = \"1000\"></center>\n",
"<center><br>图3 DB模型网络结构示意图</br></center>\n",
"<center><br>图4 DB模型网络结构示意图</br></center>\n",
"<br></br>\n",
"\n",
"输入的图像经过网络Backbone和FPN提取特征,提取后的特征级联在一起,得到原图四分之一大小的特征,然后利用卷积层分别得到文本区域预测概率图和阈值图,进而通过DB的后处理得到文本包围曲线。\n"
@@ -1001,7 +1001,7 @@
"本次实验选取了场景文本检测和识别(Scene Text Detection and Recognition)任务最知名和常用的数据集ICDAR2015。icdar2015数据集的示意图如下图所示:\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/e1b06e0c8e904a2aa412e9eea41f45cce3d58543232948fa88200298fd3cd2e4\" width = \"600\"></center>\n",
"<center><br>图4 icdar2015数据集示意图 </br></center>\n",
"<center><br>图5 icdar2015数据集示意图 </br></center>\n",
"<br></br>\n",
"\n",
"该项目中已经下载了icdar2015数据集,存放在 /home/aistudio/data/data96799 中,可以运行如下指令完成数据集解压,或者从链接中自行下载。"
@@ -2768,7 +2768,7 @@
"2. 实验场景中遇到此类问题,该如何优化模型?\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/e23f47c7c39f4b92bb494444d3724758401cd9810a8d469690093857c7f05d9e\" width = \"600\"></center>\n",
"<center><br>图5 GT框与预测框的标注示例 </br></center>\n",
"<center><br>图6 GT框与预测框的标注示例 </br></center>\n",
"<br></br>\n",
"\n",
"\n"
@@ -3323,7 +3323,7 @@
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/ccf08d89e0974a848e1a929eefbc1e7176ba211121cb4b76a6b6501cb27b1c9f\" width = \"500\"></center>\n",
"\n",
"<center><br>图6 文字区域漏检测 </br></center>\n",
"<center><br>图7 文字区域漏检测 </br></center>\n",
"<br></br>\n",
"\n",
"上述问题表现检测了一部分文字,但是文本预测框和GT框的IOU大于阈值0.5,检测指标无法正常反馈出来,如果此类结果较多,建议增大IOU阈值。另外,漏检测的本质原因在于,一部分文字的特征没有响应,归根结底是网络没有学习到漏检测部分文字的特征。建议具体问题具体分析,可视化预测结果分析漏检测的原因,是否是因为光照,形变,文字较长等因素导致的,然后针对性的使用数据增强、调整网络、或者调整后处理等方法优化检测结果。\n",
@@ -3349,7 +3349,7 @@
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/1b68e58548f94603854cd342602686bc4f0ada68b6214e3986f332816164c518\"\n",
"width = \"700\"></center>\n",
"\n",
"<center><br>图7 det_data_lesson_demo训练数据示例 </br></center>\n",
"<center><br>图8 det_data_lesson_demo训练数据示例 </br></center>\n",
"<br></br>\n",
"\n"
]
text_detection_practice.ipynb
@@ -56,7 +56,7 @@
"Using the installed `paddleocr` to predict the input image `./12.jpg`, the following results will be obtained:\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/4e31d512f7e147d4847cb1a0ee27a8260ef05506c9254fc1b19137bab1831ac8\" width=\"200\", height=\"400\" ></center>\n",
"<br><center>Figure 0: 12.jpg </center>\n",
"<br><center>Figure 1: ./12.jpg </center>\n",
"\n",
"```\n",
"[[79.0, 555.0], [398.0, 542.0], [399.0, 571.0], [80.0, 584.0]]\n",
@@ -238,7 +238,7 @@
"\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/5eabdb59916a4267a049e5440f5093a63b6bfac9010844fb971aad0607d455a1\" width = \"600\"></center>\n",
"<center><br>Figure 1: The difference between DB model and other methods</br></center>\n",
"<center><br>Figure 2: The difference between DB model and other methods</br></center>\n",
"<br></br>\n",
"\n",
"The process of the segmentation-based ordinary text detection algorithm is shown by the blue arrow in the above figure. After this method obtains the segmentation result, a fixed threshold is used to obtain the binarized segmentation map, and then heuristic algorithms such as pixel clustering are used to obtain Text area.\n",
@@ -281,15 +281,15 @@
"It can be found that the enhancement factor will amplify the gradient of the error prediction, thereby optimizing the model to obtain better results. **Figure 2(b)**, the part of $x<0$ is the case where positive samples are predicted to be negative samples. It can be seen that the gain factor k amplifies the gradient; while **Figure 2(c)** The part where $x>0$ is a negative sample is predicted to be a positive sample, the gradient is also magnified.\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/29255d870bd74403af37c8f88cb10ebca0c3117282614774a3d607efc8be8c84\" width = \"600\"></center>\n",
"<center><br>Figure 2: Schematic diagram of DB algorithm</br></center>\n",
"<center><br>Figure 3: Schematic diagram of DB algorithm</br></center>\n",
"<br></br>\n",
"\n",
"\n",
"\n",
"The overall structure of the DB algorithm is shown in the figure below:\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/6e1f293e9a1f4c90b6c26919f16b95a4a85dcf7be73f4cc99c9dc5477bb956e6\" width = \"1000\"></center>\n",
"<center><br>Figure 3: Schematic diagram of DB model network structure</br></center>\n",
"<center><br>Figure 4: Schematic diagram of DB model network structure</br></center>\n",
"<br></br>\n",
"\n",
"The input image is extracted through the network Backbone and FPN to extract features, and the extracted features are cascaded together to obtain a quarter-size feature of the original image. Then, the convolutional layer is used to obtain the text area prediction probability map and the threshold map respectively, and then through the DB The post-processing to get the text enclosing curve.\n"
@@ -1008,7 +1008,7 @@
"This experiment selected ICDAR2015, the most well-known and commonly used data set for Scene Text Detection and Recognition tasks. The schematic diagram of the icdar2015 data set is shown in the figure below:\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/e1b06e0c8e904a2aa412e9eea41f45cce3d58543232948fa88200298fd3cd2e4\" width = \"600\"></center>\n",
"<center><br>Figure 4: icdar2015 data set schematic diagram </br></center>\n",
"<center><br>Figure 5: icdar2015 data set schematic diagram </br></center>\n",
"<br></br>\n",
"\n",
"The icdar2015 data set has been downloaded in this project and stored in /home/aistudio/data/data96799. You can run the following command to decompress the data set, or download it yourself from the link."
@@ -2803,7 +2803,7 @@
"2. When encountering such problems in the experimental scenario, how to optimize the model?\n",
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/e23f47c7c39f4b92bb494444d3724758401cd9810a8d469690093857c7f05d9e\" width = \"600\"></center>\n",
"<center><br>Figure 5: Example of labeling of GT box and prediction box</br></center>\n",
"<center><br>Figure 6: Example of labeling of GT box and prediction box</br></center>\n",
"<br></br>\n",
"\n",
"\n"
@@ -3359,7 +3359,7 @@
"\n",
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/ccf08d89e0974a848e1a929eefbc1e7176ba211121cb4b76a6b6501cb27b1c9f\" width = \"500\"></center>\n",
"\n",
"<center><br>Figure 6: Text area missing detection </br></center>\n",
"<center><br>Figure 7: Text area missing detection </br></center>\n",
"<br></br>\n",
"\n",
"The above problem shows that a part of the text is detected, but the IOU of the text prediction box and the GT box is greater than the threshold 0.5, and the detection indicators cannot be fed back normally. If there are many such results, it is recommended to increase the IOU threshold. In addition, the essential reason for the missed detection is that the features of some texts do not respond. In the final analysis, the network has not learned the features of the missed detection of some texts. It is recommended to analyze specific problems in detail, visualize the prediction results to analyze the reasons for the missed detection, whether it is caused by factors such as lighting, deformation, long text, etc., and then use data enhancement, network adjustment, or post-processing adjustments to optimize the detection results .\n",
@@ -3383,7 +3383,7 @@
"<center><img src=\"https://ai-studio-static-online.cdn.bcebos.com/1b68e58548f94603854cd342602686bc4f0ada68b6214e3986f332816164c518\"\n",
"width = \"700\"></center>\n",
"\n",
"<center><br>Figure 7: det_data_lesson_demo training data example </br></center>\n",
"<center><br>Figure 8: det_data_lesson_demo training data example </br></center>\n",
"<br></br>\n"
]
}