# Add paddlemodel #45


Status: Closed (wants to merge 2 commits)
30 changes: 29 additions & 1 deletion README.md
@@ -39,13 +39,30 @@ pip install layoutparser[ocr]

**For Windows Users:** Please read [installation.md](installation.md) for details about installing Detectron2.

## Recent Updates

2021.6.8: Added new detection models (PaddleDetection) and OCR models (PaddleOCR).

```bash
# Install PaddlePaddle
# CUDA10.1
python -m pip install paddlepaddle-gpu==2.1.0.post101 -f https://paddlepaddle.org.cn/whl/mkl/stable.html
# CPU
python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple

# Install the PaddleOCR components if needed
pip install layoutparser[paddleocr]
```

For quick installation with other CUDA versions or environments, please refer to the [PaddlePaddle Quick Installation document](https://www.paddlepaddle.org.cn/install/quick).
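
After installing, you can verify that PaddlePaddle works; a minimal check using PaddlePaddle's built-in `run_check` utility (available in Paddle 2.x):

```python
# Sanity-check the PaddlePaddle installation; run_check() verifies that
# Paddle can run a small program on the available device (CPU or GPU).
import paddle

paddle.utils.run_check()
print(paddle.__version__)  # e.g. 2.1.0
```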

## Quick Start

We provide a series of examples to help you start using the layout parser library:

1. [Table OCR and Results Parsing](https://github.com/Layout-Parser/layout-parser/blob/master/examples/OCR%20Tables%20and%20Parse%20the%20Output.ipynb): `layoutparser` can be used to conveniently OCR documents and convert the output into structured data.

2. [Deep Layout Parsing Example](https://github.com/Layout-Parser/layout-parser/blob/master/examples/Deep%20Layout%20Parsing.ipynb): With the help of deep learning, `layoutparser` supports the analysis of very complex documents and can process the hierarchical structure in their layouts.
3. [Deep Layout Parsing using Paddle](examples/Deep%20Layout%20Parsing%20using%20Paddle.ipynb): the same analysis of complex documents and their hierarchical layout structure, using Paddle models.


## DL Assisted Layout Prediction Example

@@ -63,6 +80,17 @@

With only 4 lines of code in `layoutparser`, you can unlock the information from complex documents:

```python
# (the first lines of this example are collapsed in the diff view)
>>> lp.draw_box(image, layout,) # With extra configurations
```

Use a PaddleDetection model:

```python
>>> import layoutparser as lp
>>> model = lp.PaddleDetectionLayoutModel('lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config')
>>> layout = model.detect(image) # You need to load the image somewhere else, e.g., image = cv2.imread(...)
>>> lp.draw_box(image, layout,) # With extra configurations
```

If you want to train a PaddleDetection model yourself, please refer to [Train_PaddleDetection_model.md](docs/notes/Train_PaddleDetection_model.md).

## Contributing

We encourage you to contribute to Layout Parser! Please check out the [Contributing guidelines](.github/CONTRIBUTING.md) for guidelines about how to proceed. Join us!
4 changes: 3 additions & 1 deletion dev-requirements.txt
@@ -10,4 +10,6 @@ sphinx_rtd_theme
google-cloud-vision==1
pytesseract
pycocotools
git+https://github.com/facebookresearch/detectron2.git@v0.4#egg=detectron2
paddlepaddle==2.1.0
paddleocr>=2.0.1
142 changes: 142 additions & 0 deletions docs/notes/Train_PaddleDetection_model.md
@@ -0,0 +1,142 @@
## Install Requirements

> **@lolipopshock** (Member, Jun 8, 2021): Do you have external documentation for training the Paddle models? We can add a link to that, but this should not be included in the layout parser documentation.


- PaddlePaddle 2.1
- 64-bit OS
- Python 3 (3.5.1+/3.6/3.7/3.8/3.9), 64 bit
- pip/pip3 (9.0.1+), 64 bit
- CUDA >= 10.1
- cuDNN >= 7.6

## Install PaddleDetection

```bash
# Clone PaddleDetection repository
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git

cd PaddleDetection
# Install other dependencies
pip install -r requirements.txt
```

## Prepare Dataset

Download [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet):

```bash
cd PaddleDetection/dataset/
mkdir publaynet
# Download the dataset (quote the URL and name the output file explicitly,
# since the URL carries query parameters)
wget -O publaynet.tar.gz "https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/publaynet.tar.gz?_ga=2.104193024.1076900768.1622560733-649911202.1622560733"

tar -xvf publaynet.tar.gz
```

Folder structure:

| File or Folder | Description | Count |
| :------------- | :----------------------------------------------- | ------: |
| `train/` | Images in the training subset | 335,703 |
| `val/` | Images in the validation subset | 11,245 |
| `test/` | Images in the testing subset | 11,405 |
| `train.json` | Annotations for training images | |
| `val.json` | Annotations for validation images | |
| `LICENSE.txt` | Plaintext version of the CDLA-Permissive license | |
| `README.txt` | Text file with the file names and description | |
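
The annotation files use the standard COCO schema, so a quick sanity check of the download is straightforward; a minimal sketch, assuming the archive unpacked into `dataset/publaynet/` as above:

```python
# Inspect the PubLayNet annotations (standard COCO format:
# "images", "annotations", "categories").
import json

with open("dataset/publaynet/train.json") as f:
    coco = json.load(f)

print(len(coco["images"]))                      # expected: 335,703
print(len(coco["annotations"]))                 # layout instances
print([c["name"] for c in coco["categories"]])  # the 5 layout classes
```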

## Modify Config Files

Use the `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` config for training:

<div align='center'>
<img src='../../examples/data/PaddleDetection_config.png' width='600px'/>
</div>


As the figure above shows, the `ppyolov2_r50vd_dcn_365e_coco.yml` config depends on several other config files:

- `coco_detection.yml`: mainly defines the paths of the training and validation data.
- `runtime.yml`: defines common runtime parameters, such as whether to use a GPU and how often (in epochs) to save checkpoints.
- `optimizer_365e.yml`: mainly defines the learning rate and the optimizer.
- `ppyolov2_r50vd_dcn.yml`: mainly defines the model and the backbone network.
- `ppyolov2_reader.yml`: mainly defines the data reader configuration, such as the batch size and the number of concurrent loading subprocesses, as well as post-read preprocessing such as resizing and data augmentation.

You will need to modify these configuration files to match your dataset and environment.
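
For example, pointing the dataset config at PubLayNet could look like the sketch below. This is an illustration only: the exact keys and file layout depend on your PaddleDetection version, so compare against the `coco_detection.yml` shipped with the repository.

```yaml
# configs/datasets/coco_detection.yml (adapted for PubLayNet; illustrative sketch)
metric: COCO
num_classes: 5          # text, title, list, table, figure

TrainDataset:
  !COCODataSet
    image_dir: train
    anno_path: train.json
    dataset_dir: dataset/publaynet

EvalDataset:
  !COCODataSet
    image_dir: val
    anno_path: val.json
    dataset_dir: dataset/publaynet
```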

## Train

* Run evaluation during training:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval
```

Note: if you encounter an "`Out of memory`" error, try reducing the batch size in the `ppyolov2_reader.yml` file.
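
For reference, the batch size is set in the reader config; lowering it might look like this (an illustrative excerpt; the exact path and default value depend on your PaddleDetection version):

```yaml
# ppyolov2_reader.yml (excerpt)
TrainReader:
  batch_size: 4   # reduced from the default to avoid out-of-memory errors
```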

* Fine-tuning on another task

When using a pre-trained model to fine-tune on another task, `pretrain_weights` can be set directly; parameters with mismatched shapes are ignored automatically. For example:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
# If the shape of parameters in program is different from pretrain_weights,
# then PaddleDetection will not use such parameters.
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml \
    -o pretrain_weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final
```

## Inference

- Write results to a specified output directory and set the detection threshold:

```bash
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml \
--infer_img=demo/000000570688.jpg \
--output_dir=infer_output/ \
--draw_threshold=0.5 \
-o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final \
                      --use_vdl=True
```

`--draw_threshold` is an optional argument; the default is 0.5. Different thresholds produce different results, depending on the [NMS](https://ieeexplore.ieee.org/document/1699659) calculation.

## Inference and Deployment

### Export model for inference

```bash
python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference \
-o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams
```

* `-c`: config file
* `--output_dir`: directory where the exported model is saved

The inference model is exported to the directory `inference/ppyolov2_r50vd_dcn_365e_coco` and consists of `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.

More info: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md
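
The exported files can also be loaded directly with the `paddle.inference` API. The snippet below is a minimal sketch only: it feeds dummy inputs and skips the preprocessing and box decoding that `deploy/python/infer.py` (next section) performs, and the input names and shapes are assumptions based on typical PP-YOLOv2 exports.

```python
import numpy as np
from paddle.inference import Config, create_predictor

# Load the exported model files from the previous step.
config = Config(
    "inference/ppyolov2_r50vd_dcn_365e_coco/model.pdmodel",
    "inference/ppyolov2_r50vd_dcn_365e_coco/model.pdiparams",
)
predictor = create_predictor(config)

# PP-YOLOv2 exports typically expect several inputs (e.g. image,
# im_shape, scale_factor); feed dummy values for all of them here.
for name in predictor.get_input_names():
    handle = predictor.get_input_handle(name)
    if name == "image":
        handle.copy_from_cpu(np.zeros((1, 3, 640, 640), dtype=np.float32))
    else:
        handle.copy_from_cpu(np.ones((1, 2), dtype=np.float32))

predictor.run()
out = predictor.get_output_handle(predictor.get_output_names()[0])
print(out.copy_to_cpu().shape)  # raw detections, e.g. (N, 6): class, score, box
```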

### Python inference

```bash
python deploy/python/infer.py --model_dir=./inference/ppyolov2_r50vd_dcn_365e_coco --image_file=./demo/road554.png --use_gpu=True
```

* `--model_dir`: the model directory exported in the previous step
* `--image_file`: the image to run inference on
* `--use_gpu`: whether to use the GPU

More info: https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1/deploy/python

C++ inference: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/cpp



63 changes: 50 additions & 13 deletions docs/notes/modelzoo.md
@@ -2,7 +2,7 @@

We provide a spectrum of pre-trained models on different datasets.

## Example Usage with Detectron2:

```python
import layoutparser as lp
# (middle of this example collapsed in the diff view: @@ -14,22 +14,59 @@)
model = lp.Detectron2LayoutModel(
    ...
)
model.detect(image)
```

## Example Usage with PaddleDetection:

```python
import layoutparser as lp
model = lp.PaddleDetectionLayoutModel(
    config_path="lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config",  # in the model catalog
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},  # in the model's `label_map`
    threshold=0.5,  # optional
)
model.detect(image)
```

## Model Catalog

| Dataset | Model | Config Path | Eval Result (mAP) |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| [HJDataset](https://dell-research-harvard.github.io/HJDataset/) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/j4yseny2u0hn22r/config.yml?dl=1) | lp://HJDataset/faster_rcnn_R_50_FPN_3x/config | |
| [HJDataset](https://dell-research-harvard.github.io/HJDataset/) | [mask_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/4jmr3xanmxmjcf8/config.yml?dl=1) | lp://HJDataset/mask_rcnn_R_50_FPN_3x/config | |
| [HJDataset](https://dell-research-harvard.github.io/HJDataset/) | [retinanet_R_50_FPN_3x](https://www.dropbox.com/s/z8a8ywozuyc5c2x/config.yml?dl=1) | lp://HJDataset/retinanet_R_50_FPN_3x/config | |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/f3b12qc4hc0yh4m/config.yml?dl=1) | lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config | |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [mask_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/u9wbsfwz4y0ziki/config.yml?dl=1) | lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config | |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [mask_rcnn_X_101_32x8d_FPN_3x](https://www.dropbox.com/s/nau5ut6zgthunil/config.yaml?dl=1) | lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config | 88.98 [eval.csv](https://www.dropbox.com/s/15ytg3fzmc6l59x/eval.csv?dl=0) |
| [PubLayNet](https://github.com/ibm-aur-nlp/PubLayNet) | [ppyolov2_r50vd_dcn_365e_publaynet](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_publaynet.tar) | lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config | 93.6 [eval.csv](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/eval_publaynet.csv) |
| [PrimaLayout](https://www.primaresearch.org/dataset/) | [mask_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/yc92x97k50abynt/config.yaml?dl=1) | lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config | 69.35 [eval.csv](https://www.dropbox.com/s/9uuql57uedvb9mo/eval.csv?dl=0) |
| [NewspaperNavigator](https://news-navigator.labs.loc.gov/) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/wnido8pk4oubyzr/config.yml?dl=1) | lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config | |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | [faster_rcnn_R_50_FPN_3x](https://www.dropbox.com/s/7cqle02do7ah7k4/config.yaml?dl=1) | lp://TableBank/faster_rcnn_R_50_FPN_3x/config | 89.78 [eval.csv](https://www.dropbox.com/s/1uwnz58hxf96iw2/eval.csv?dl=0) |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | [faster_rcnn_R_101_FPN_3x](https://www.dropbox.com/s/h63n6nv51kfl923/config.yaml?dl=1) | lp://TableBank/faster_rcnn_R_101_FPN_3x/config | 91.26 [eval.csv](https://www.dropbox.com/s/e1kq8thkj2id1li/eval.csv?dl=0) |
| [TableBank](https://doc-analysis.github.io/tablebank-page/index.html) | [ppyolov2_r50vd_dcn_365e_tableBank_word](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/ppyolov2_r50vd_dcn_365e_tableBank_word.tar) | lp://TableBank/ppyolov2_r50vd_dcn_365e_tableBank_word/config | 96.2 [eval.csv](https://paddle-model-ecology.bj.bcebos.com/model/layout-parser/eval_tablebank.csv) |

* For PubLayNet models, we suggest using the `mask_rcnn_X_101_32x8d_FPN_3x` model, as it is trained on the whole training set, while the others are trained only on the validation set (around 1/50 the size). You can expect roughly a 15% AP improvement with `mask_rcnn_X_101_32x8d_FPN_3x`.
* []()

> **Review comment** (translated from Chinese): If this line isn't needed, just delete it.
* Comparison of the inference time cost of **Detectron2** and **PaddleDetection** (the ppyolov2_* models in the table above):

PubLayNet Dataset:

| Model | CPU time cost | GPU time cost |
| --------------- | ------------- | ------------- |
| Detectron2 | 16545.5ms | 209.5ms |
| PaddleDetection | 1713.7ms | 66.6ms |

TableBank Dataset:

| Model | CPU time cost | GPU time cost |
| --------------- | ------------- | ------------- |
| Detectron2      | 7623.2ms      | 104.2ms       |
| PaddleDetection | 1968.4ms | 65.1ms |

**Environment:**

**GPU:** a single NVIDIA Tesla P40

**CPU:** Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 24 cores
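
The PR does not include the benchmarking script itself; below is a minimal sketch of how such per-image latencies could be measured with `layoutparser` (warm-up runs excluded from the average, which matters especially on GPU; the image file name is hypothetical):

```python
import time
import cv2
import layoutparser as lp

model = lp.PaddleDetectionLayoutModel(
    "lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config"
)
image = cv2.imread("sample_page.png")  # hypothetical test image

for _ in range(5):              # warm-up runs (not timed)
    model.detect(image)

n = 50
start = time.perf_counter()
for _ in range(n):
    model.detect(image)
elapsed_ms = (time.perf_counter() - start) / n * 1000
print(f"average detect time: {elapsed_ms:.1f} ms")
```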

## Model `label_map`
