Exporter Custom Models Fix #77

Open · wants to merge 3 commits into base: main
Conversation

@rjwb1 commented Nov 30, 2022

Correctly applies params from the model cfg to the ONNX exporter.

@GuillaumeAnoufa commented Dec 15, 2022

You reversed VOXEL_SIZE_X and VOXEL_SIZE_Y in the definition of simplify_preprocess.

Defined in simplifier_onnx.py as:
def simplify_preprocess(onnx_model, VOXEL_SIZE_Y, VOXEL_SIZE_X, MAX_POINTS_PER_VOXEL):
Called in exporter.py with:
onnx_final = simplify_preprocess(onnx_simp, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL)
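One way to make this kind of mix-up impossible, whatever order the definition uses, is to bind the grid sizes by keyword at the call site (a minimal sketch against the signatures quoted above):

# Keyword arguments bind by parameter name, so a swapped order in the
# definition can no longer silently transpose the X/Y grid sizes.
onnx_final = simplify_preprocess(
    onnx_simp,
    VOXEL_SIZE_X=VOXEL_SIZE_X,
    VOXEL_SIZE_Y=VOXEL_SIZE_Y,
    MAX_POINTS_PER_VOXEL=MAX_POINTS_PER_VOXEL,
)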

@rjwb1 (Author) commented Dec 15, 2022

@GuillaumeAnoufa I will update this. I was rushing and didn't notice, as my point cloud is square.

@rjwb1 (Author) commented Dec 22, 2022

@GuillaumeAnoufa I have fixed this. I reversed it twice, so it actually should not have affected the final model, but better to have correctly named variables/args...

@Allamrahul commented

Hi, I have used this commit to successfully export my model to ONNX format. However, when I perform predictions using TensorRT, I see different results compared to when I just run eval on the trained pth file. More about my issue can be found in #82. Please let me know if I am missing something.

@rjwb1 (Author) commented Feb 6, 2023

Hi @Allamrahul, have you verified the point cloud data is being loaded correctly and in the right order?

@Allamrahul commented Feb 6, 2023

Could you elaborate further, if possible? What do you mean by the right order? I was able to use my custom data, train the model to detect a single object, and validate the results using the demo.py file: the boxes look right on the eval set and the results look really good. After that, I tried to export, but realized that everything in the export script was hardcoded for 3 classes. I then referred to your PR, made those changes, and thankfully they unblocked me and allowed me to export the model. I later moved the generated params.h to the include folder and the .onnx file to the model folder, and followed the instructions in https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars under "compile and run".

If the point cloud information were not being loaded correctly, I think my results on the eval set would have been terrible. I compared exporter.py (the file responsible for exporting to ONNX) with demo.py (the script that performs the eval and helps me visualize predictions on my eval set): both process the data in the same manner.

I am using the following command for export:
[screenshot of the export command]

I have also changed line 157 in main.py to let me predict on .npy files instead of .bin files.
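For anyone making the same change, the swap is roughly this (a sketch only; it assumes the points are stored as an (N, 4) float32 array, and the exact code around line 157 of main.py may differ):

import numpy as np

# .bin files are raw float32 records; .npy files carry their own header and
# dtype, so np.load reads them directly without a reshape.
def load_points(path):
    if path.endswith('.bin'):
        return np.fromfile(path, dtype=np.float32).reshape(-1, 4)
    if path.endswith('.npy'):
        return np.load(path)
    raise NotImplementedError(path)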

If you need further information to guide in the right direction, please let me know.

@Allamrahul commented Feb 6, 2023

exporter.py file


import glob
import onnx
import torch
import argparse
import numpy as np

from pathlib import Path
from onnxsim import simplify
from pcdet.utils import common_utils
from pcdet.models import build_network
from pcdet.datasets import DatasetTemplate
from pcdet.config import cfg, cfg_from_yaml_file

from exporter_paramters import export_paramters as export_paramters
from simplifier_onnx import simplify_preprocess, simplify_postprocess

class DemoDataset(DatasetTemplate):
    def __init__(self, dataset_cfg, class_names, training=True, root_path=None, logger=None, ext='.bin'):
        """
        Args:
            root_path:
            dataset_cfg:
            class_names:
            training:
            logger:
        """
        super().__init__(
            dataset_cfg=dataset_cfg, class_names=class_names, training=training, root_path=root_path, logger=logger
        )
        self.root_path = root_path
        self.ext = ext
        data_file_list = glob.glob(str(root_path / f'*{self.ext}')) if self.root_path.is_dir() else [self.root_path]

        data_file_list.sort()
        self.sample_file_list = data_file_list

    def __len__(self):
        return len(self.sample_file_list)

    def __getitem__(self, index):
        if self.ext == '.bin':
            points = np.fromfile(self.sample_file_list[index], dtype=np.float32).reshape(-1, 4)
        elif self.ext == '.npy':
            points = np.load(self.sample_file_list[index])
        else:
            raise NotImplementedError

        input_dict = {
            'points': points,
            'frame_id': index,
        }

        data_dict = self.prepare_data(data_dict=input_dict)
        return data_dict

def parse_config():
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--cfg_file', type=str, default='cfgs/kitti_models/pointpillar.yaml',
                        help='specify the config for demo')
    parser.add_argument('--data_path', type=str, default='demo_data',
                        help='specify the point cloud data file or directory')
    parser.add_argument('--ckpt', type=str, default=None, help='specify the pretrained model')
    parser.add_argument('--ext', type=str, default='.bin', help='specify the extension of your point cloud data file')

    args = parser.parse_args()

    cfg_from_yaml_file(args.cfg_file, cfg)

    return args, cfg

def main():
    args, cfg = parse_config()
    export_paramters(cfg)
    logger = common_utils.create_logger()
    logger.info('------ Convert OpenPCDet model for TensorRT ------')
    demo_dataset = DemoDataset(
        dataset_cfg=cfg.DATA_CONFIG, class_names=cfg.CLASS_NAMES, training=False,
        root_path=Path(args.data_path), ext=args.ext, logger=logger
    )

    model = build_network(model_cfg=cfg.MODEL, num_class=len(cfg.CLASS_NAMES), dataset=demo_dataset)
    model.load_params_from_file(filename=args.ckpt, logger=logger, to_cpu=True)
    model.cuda()
    model.eval()
    np.set_printoptions(threshold=np.inf)
    with torch.no_grad():

      MAX_VOXELS = 10000
      NUMBER_OF_CLASSES = len(cfg.CLASS_NAMES)
      MAX_POINTS_PER_VOXEL = None

      DATA_PROCESSOR = cfg.DATA_CONFIG.DATA_PROCESSOR
      POINT_CLOUD_RANGE = cfg.DATA_CONFIG.POINT_CLOUD_RANGE
      for i in DATA_PROCESSOR:
          if i['NAME'] == "transform_points_to_voxels":
              MAX_POINTS_PER_VOXEL = i['MAX_POINTS_PER_VOXEL']
              VOXEL_SIZES = i['VOXEL_SIZE']
              break

      if MAX_POINTS_PER_VOXEL is None:
          logger.info('Could Not Parse Config... Exiting')
          import sys
          sys.exit()

      # These are grid dimensions (number of voxels along X/Y over the point
      # cloud range), despite the VOXEL_SIZE_* names inherited from the script.
      VOXEL_SIZE_X = abs(POINT_CLOUD_RANGE[0] - POINT_CLOUD_RANGE[3]) / VOXEL_SIZES[0]
      VOXEL_SIZE_Y = abs(POINT_CLOUD_RANGE[1] - POINT_CLOUD_RANGE[4]) / VOXEL_SIZES[1]

      # BEV feature-map size at the detection head (grid downsampled by stride 2)
      FEATURE_SIZE_X = VOXEL_SIZE_X / 2
      FEATURE_SIZE_Y = VOXEL_SIZE_Y / 2

      dummy_voxels = torch.zeros(
          (MAX_VOXELS, MAX_POINTS_PER_VOXEL, 4),
          dtype=torch.float32,
          device='cuda:0')

      dummy_voxel_idxs = torch.zeros(
          (MAX_VOXELS, 4),
          dtype=torch.int32,
          device='cuda:0')

      dummy_voxel_num = torch.zeros(
          (1),
          dtype=torch.int32,
          device='cuda:0')

      dummy_input = dict()
      dummy_input['voxels'] = dummy_voxels
      dummy_input['voxel_num_points'] = dummy_voxel_num
      dummy_input['voxel_coords'] = dummy_voxel_idxs
      dummy_input['batch_size'] = torch.tensor(1)

      torch.onnx.export(model,       # model being run
          dummy_input,               # model input (or a tuple for multiple inputs)
          "./pointpillar_raw.onnx",  # where to save the model (can be a file or file-like object)
          export_params=True,        # store the trained parameter weights inside the model file
          opset_version=11,          # the ONNX version to export the model to
          do_constant_folding=True,  # whether to execute constant folding for optimization
          keep_initializers_as_inputs=True,
          input_names = ['voxels', 'voxel_num', 'voxel_idxs'],   # the model's input names
          output_names = ['cls_preds', 'box_preds', 'dir_cls_preds'], # the model's output names
          )

      onnx_raw = onnx.load("./pointpillar_raw.onnx")  # load onnx model
      onnx_trim_post = simplify_postprocess(onnx_raw, FEATURE_SIZE_X, FEATURE_SIZE_Y, NUMBER_OF_CLASSES)
      
      onnx_simp, check = simplify(onnx_trim_post)
      assert check, "Simplified ONNX model could not be validated"

      onnx_final = simplify_preprocess(onnx_simp, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL)
      onnx.save(onnx_final, "pointpillar.onnx")
      print('finished exporting onnx')

    logger.info('[PASS] ONNX EXPORTED.')

if __name__ == '__main__':
    main()
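
As a quick sanity check (a sketch, not part of the PR), the final ONNX file can be reloaded to print its input/output shapes before moving it to the C++ side:

import onnx

# Structural validation plus a dump of the exported I/O signature.
model = onnx.load("pointpillar.onnx")
onnx.checker.check_model(model)
for t in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_value for d in t.type.tensor_type.shape.dim]
    print(t.name, dims)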

@Allamrahul commented Feb 6, 2023

simplifier_onnx.py

import onnx
import numpy as np
import onnx_graphsurgeon as gs

@gs.Graph.register()
def replace_with_clip(self, inputs, outputs,  voxel_array):
    for inp in inputs:
        inp.outputs.clear()

    for out in outputs:
        out.inputs.clear()

    op_attrs = dict()
    op_attrs["dense_shape"] =  voxel_array

    return self.layer(name="PPScatter_0", op="PPScatterPlugin", inputs=inputs, outputs=outputs, attrs=op_attrs)

def loop_node(graph, current_node, loop_time=0):
  for i in range(loop_time):
    next_node = [node for node in graph.nodes if len(node.inputs) != 0 and len(current_node.outputs) != 0 and node.inputs[0] == current_node.outputs[0]][0]
    current_node = next_node
  return next_node

def simplify_postprocess(onnx_model, FEATURE_SIZE_X, FEATURE_SIZE_Y, NUMBER_OF_CLASSES):
  print("Use onnx_graphsurgeon to adjust postprocessing part in the onnx...")
  graph = gs.import_onnx(onnx_model)

  # Head output shapes assume 2 anchor rotations per class:
  # cls = (2*C)*C, box = (2*C)*7, dir = (2*C)*2 channels.
  cls_preds = gs.Variable(name="cls_preds", dtype=np.float32, shape=(1, int(FEATURE_SIZE_Y), int(FEATURE_SIZE_X), 2 * NUMBER_OF_CLASSES * NUMBER_OF_CLASSES))
  box_preds = gs.Variable(name="box_preds", dtype=np.float32, shape=(1, int(FEATURE_SIZE_Y), int(FEATURE_SIZE_X), 14 * NUMBER_OF_CLASSES))
  dir_cls_preds = gs.Variable(name="dir_cls_preds", dtype=np.float32, shape=(1, int(FEATURE_SIZE_Y), int(FEATURE_SIZE_X), 4 * NUMBER_OF_CLASSES))

  tmap = graph.tensors()
  new_inputs = [tmap["voxels"], tmap["voxel_idxs"], tmap["voxel_num"]]
  new_outputs = [cls_preds, box_preds, dir_cls_preds]

  for inp in graph.inputs:
    if inp not in new_inputs:
      inp.outputs.clear()

  for out in graph.outputs:
    out.inputs.clear()

  first_ConvTranspose_node = [node for node in graph.nodes if node.op == "ConvTranspose"][0]
  concat_node = loop_node(graph, first_ConvTranspose_node, 3)
  assert concat_node.op == "Concat"

  first_node_after_concat = [node for node in graph.nodes if len(node.inputs) != 0 and len(concat_node.outputs) != 0 and node.inputs[0] == concat_node.outputs[0]]

  for i in range(3):
    transpose_node = loop_node(graph, first_node_after_concat[i], 1)
    assert transpose_node.op == "Transpose"
    transpose_node.outputs = [new_outputs[i]]

  graph.inputs = new_inputs
  graph.outputs = new_outputs
  graph.cleanup().toposort()
  
  return gs.export_onnx(graph)


def simplify_preprocess(onnx_model, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL):
  print("Use onnx_graphsurgeon to modify onnx...")
  graph = gs.import_onnx(onnx_model)

  tmap = graph.tensors()
  MAX_VOXELS = tmap["voxels"].shape[0]

  # dense_shape attribute consumed by the PPScatterPlugin (grid cells in X and Y)
  VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])

  # New graph inputs: decorated pillar features, voxel coordinates, voxel count
  input_new = gs.Variable(name="voxels", dtype=np.float32, shape=(MAX_VOXELS, MAX_POINTS_PER_VOXEL, 10))
  X = gs.Variable(name="voxel_idxs", dtype=np.int32, shape=(MAX_VOXELS, 4))
  Y = gs.Variable(name="voxel_num", dtype=np.int32, shape=(1,))

  first_node_after_pillarscatter = [node for node in graph.nodes if node.op == "Conv"][0]

  first_node_pillarvfe = [node for node in graph.nodes if node.op == "MatMul"][0]

  # Walk forward through the PillarVFE subgraph; the sixth node is the
  # ReduceMax that collapses the per-point axis.
  next_node = current_node = first_node_pillarvfe
  for i in range(6):
    next_node = [node for node in graph.nodes if node.inputs[0] == current_node.outputs[0]][0]
    if i == 5:              # ReduceMax
      current_node.attrs['keepdims'] = [0]
      break
    current_node = next_node

  last_node_pillarvfe = current_node

 
  # Splice the PPScatter plugin in between the PillarVFE output and the first Conv
  graph.inputs.append(Y)
  inputs = [last_node_pillarvfe.outputs[0], X, Y]
  outputs = [first_node_after_pillarscatter.inputs[0]]
  graph.replace_with_clip(inputs, outputs, VOXEL_ARRAY)

  graph.cleanup().toposort()

  graph.inputs = [first_node_pillarvfe.inputs[0], X, Y]
  graph.outputs = [tmap["cls_preds"], tmap["box_preds"], tmap["dir_cls_preds"]]
  graph.cleanup()

  # Rebind the first MatMul to the new "voxels" input
  graph.inputs = [input_new, X, Y]
  first_add = [node for node in graph.nodes if node.op == "MatMul"][0]
  first_add.inputs[0] = input_new

  graph.cleanup().toposort()

  return gs.export_onnx(graph)

if __name__ == '__main__':
    model_file = "pointpillar-native-sim.onnx"
    # Example values for the KITTI pointpillar config (432 x 496 grid cells,
    # 32 points per voxel); substitute the sizes computed from your own cfg.
    simplify_preprocess(onnx.load(model_file), 432, 496, 32)

@Allamrahul commented

Hi @Allamrahul, have you verified the point cloud data is being loaded correctly and in the right order?

By this, do you mean how main.py is loading the .npy file? The script is meant for .bin files, but it should work for .npy files as well. Please let me know if I am missing something.

@Allamrahul commented

@GuillaumeAnoufa I have fixed this. I reversed it twice, so it actually should not have affected the final model, but better to have correctly named variables/args...

Hi, I used this commit, but when I compared my results from the pth file vs. TensorRT inference, my predictions matched in box sizes, z dimension, and confidence, but not in the X and Y coordinates. I tweaked the code the following way:
In exporter.py, I kept the following line unchanged:
onnx_final = simplify_preprocess(onnx_simp, VOXEL_SIZE_X, VOXEL_SIZE_Y, MAX_POINTS_PER_VOXEL)
But in simplifier_onnx.py,
I swapped the order: def simplify_preprocess(onnx_model, VOXEL_SIZE_Y, VOXEL_SIZE_X, MAX_POINTS_PER_VOXEL)
and made VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)]).

This at least lets me get the same results across both eval using the pth file and TensorRT inference using the ONNX file. Not sure why this works. That being said, I am getting slightly fewer predictions when I run TensorRT inference. Not sure why. I would really like some help to understand whether what I am doing is right.

@rjwb1 (Author) commented Feb 9, 2023

Hi, this is the same as my original commit before the change suggested by @GuillaumeAnoufa. I guess I was right all along, as I had inspected the model in Netron. I'll revert the commit suggested by @GuillaumeAnoufa.

@Allamrahul commented Feb 9, 2023

Hi, I tried your 1st commit, but that's not working. The following is the analysis:

In your 1st iteration:

Call: X, Y
Fn def: Y, X
VOXEL_ARRAY: Y, X

Conclusion:
Call's X maps to VOXEL_ARRAY[0]
Call's Y maps to VOXEL_ARRAY[1]

2nd iteration (according to the commit suggested by @GuillaumeAnoufa):

Call: X, Y
Fn def: X, Y
VOXEL_ARRAY: X, Y

Conclusion:
Call's X maps to VOXEL_ARRAY[0]
Call's Y maps to VOXEL_ARRAY[1]

What works for me:

Call: X, Y
Fn def: Y, X
VOXEL_ARRAY: X, Y

Conclusion:
Call's X maps to VOXEL_ARRAY[1]
Call's Y maps to VOXEL_ARRAY[0]


I have just retried iterations 1 and 2 again, and they don't solve the issue because they both inherently do the same thing. The mapping only gets reversed if I do it the way I suggested. Could you confirm this?

@rjwb1 (Author) commented Feb 9, 2023

That seems right. In my implementation my voxel grid is (600, 600), so I would not notice this issue. I will fix this as soon as I can.

@Allamrahul commented

One more question: the boxes I get during TensorRT inference are just a subset of the boxes I get during the evaluation phase using the pth file. For example, for a .npy file where I get 4 bounding boxes during eval, I get 1, 2, or 3 during TensorRT inference, and the number changes every time I run it. Is there any way to get all the detections during TensorRT inference?

@rjwb1 (Author) commented Feb 9, 2023

@Allamrahul are you using the same score and NMS thresholds? I would start by adjusting these in Params.h. I haven't directly compared my PyTorch results to the ones from TensorRT, but they seem the same to me.

@rjwb1 (Author) commented Feb 9, 2023

I just removed that entirely as I require very fast performance. I also implemented a better way of loading params, from a YAML file that exporter.py generates, if you'd be interested.

For guidance: in my Params.h I find that a score threshold of 0.3-0.4 and an NMS threshold of 0.01 work well.

@Allamrahul commented

Will check that. Additionally, when I enable FP16, I am getting hundreds of bounding boxes (anywhere from 5 to 350) during TensorRT inference. When I disable FP16, recompile, and run, the number of detections is back to normal.

Let me know the right way of doing it and whether I am missing something here.

@rjwb1 (Author) commented Feb 9, 2023

This worked for me. Obviously FP16 can incur an accuracy penalty.

@Allamrahul commented

Could you specify what worked for you? It's not clear from your comment. Thanks.
Also, currently my score threshold is 0.25 and my NMS threshold is 0.01 in params.h. I am just using the params.h that exporter.py generates during ONNX model generation.

@rjwb1 (Author) commented Feb 9, 2023

I mean FP16 worked normally for me when commenting out the lines you suggested above.

@rjwb1 (Author) commented Feb 9, 2023

Perhaps try a score_thresh of 0.4.

@Allamrahul commented

By normally, do you mean you too are getting hundreds of detections? Sorry, I don't have much experience in deployment, and this is the first time I am dealing with FP16.

@rjwb1 (Author) commented Feb 9, 2023

No worries. I meant that I did not observe hundreds of detections with FP16, but my confidence threshold is set to 0.3. Perhaps look at the detections you are getting, and if you are receiving lots of low scores, increase the threshold.

@Allamrahul commented

Got it, let me check that.

@Allamrahul commented Feb 9, 2023

Also, one more thing: I am using .npy files since I am using a custom dataset. I observed that there is a 32-byte offset when I load the same .npy file via Python/NumPy vs. when I load it through C++. Could this be a factor?
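For context, .npy files begin with a header (magic bytes, a version, and a dict describing dtype and shape), so reading one as raw floats in C++ is shifted by that header size. One workaround I could try is converting to the raw .bin layout the C++ demo already expects (a minimal sketch, with placeholder file names):

import numpy as np

# np.load parses the .npy header, so the returned array has no offset;
# tofile() then writes the bare float32 payload with no header at all.
points = np.load("frame_000.npy").astype(np.float32)
points.tofile("frame_000.bin")  # raw x, y, z, intensity records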

@rjwb1 (Author) commented Feb 9, 2023

I am using ROS, so I do not have to load any files and can't fully recommend a solution. However, I do write the binary files I use for training with OpenPCDet. The only thing I could recommend is trying to make the dtype of the numpy array you are using np.float16, although I seem to get good results in my implementation without explicitly using float16 when I convert from the ROS msg.

@Allamrahul commented Feb 10, 2023

@rjwb1, could you point me to the exact TensorRT inference files you are using at the moment? As mentioned before, my FP16 numbers are out of whack: 300 detections in some cases and 5 in others, when I expect between 3 and 5 for every point cloud. I would like to cross-reference the exact commit or group of commits you are using for inference, just to make sure I am not missing anything important. After analyzing the results, I found that the model is overconfident on some examples, giving confidence values of 90 to 100% for many of the detections, while on other examples it gives the right output.

@Allamrahul commented

@rjwb1, in regard to #85, I see that MAX_VOXELS is hardcoded to 10000 in the export script exporter.py. But when I examine pointpillar.yaml, I see this:
MAX_NUMBER_OF_VOXELS: {
'train': 16000,
'test': 40000
}
So shouldn't MAX_VOXELS in the export script be 40000?

I tried this out: when I set MAX_VOXELS to 40000 and exported the ONNX file, the multiple false positives I was getting with FP16 TensorRT inference went away. Can anyone confirm that what I did makes sense?
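Along those lines, the value could be read from the config instead of hardcoded (a sketch only, reusing the cfg object and the processor loop exporter.py already has; 'test' matches inference-time settings):

# Pull MAX_NUMBER_OF_VOXELS from the transform_points_to_voxels processor
# instead of hardcoding MAX_VOXELS = 10000.
MAX_VOXELS = 10000  # fallback if the processor entry is missing
for proc in cfg.DATA_CONFIG.DATA_PROCESSOR:
    if proc['NAME'] == 'transform_points_to_voxels':
        MAX_VOXELS = proc['MAX_NUMBER_OF_VOXELS']['test']
        break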

@big773 commented Mar 20, 2023

Correctly applies params from the model cfg to the ONNX exporter.

Hi, I can export my custom ONNX model, but the results seem wrong. Can you give me some help?

rjwb1 mentioned this pull request Apr 3, 2023
@HSqure commented Apr 14, 2023

Hello, thank you for your work on the custom model conversion. I found that the dense shape of PPScatter_0 in the model converted with your code is reversed compared to the official ONNX model.

Currently, everything works fine after a modification that swaps the positions of VOXEL_SIZE_Y and VOXEL_SIZE_X in the section below.

In simplifier_onnx.py, line 83, before the modification:

VOXEL_ARRAY = np.array([int(VOXEL_SIZE_X), int(VOXEL_SIZE_Y)])

after the modification:

VOXEL_ARRAY = np.array([int(VOXEL_SIZE_Y), int(VOXEL_SIZE_X)])

Hope this helps to solve the problem!
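To check which orientation an exported model actually carries, the dense_shape attribute can be read back with onnx-graphsurgeon (a small sketch, assuming the plugin node is named PPScatter_0 as in the code above):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("pointpillar.onnx"))
scatter = [n for n in graph.nodes if n.op == "PPScatterPlugin"][0]
# Compare against (grid X, grid Y) vs. (grid Y, grid X) to see which fix applies.
print(scatter.name, scatter.attrs["dense_shape"])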

@zzt007 commented Apr 30, 2023

@Allamrahul
Hello, have you solved the TensorRT mismatch problem? I have also met this problem; could you tell me how to solve it?
Thanks for your guidance.

@Acuno41 commented Aug 9, 2023

Hello everyone, and thanks @rjwb1 for the amazing updates.

I successfully trained my custom data with 3 classes like KITTI (vehicle, pedestrian, cyclist) in OpenPCDet, and the results looked fine on the Python side. Then I converted the model with exporter.py and also ran it successfully in my C++ code.

Then I trained the same custom data with 12 classes (I separated the vehicle class into bus, van, truck, etc.), and the results also looked fine on the Python side, but after exporting the model with exporter.py, the results on the C++ side were completely random and produced lots of large false detections.

Has anyone encountered a problem like this before, or trained with different class counts? Could there be some class-dependent parameter in exporter.py?

I would be glad if anyone can help.
Thank you.

@Acuno41 commented Aug 9, 2023

I found out that the MAX_POINTS_PER_VOXEL parameter in the pointpillar.yaml file is the problem. When I change it from the default of 32 to something different, it causes the problem I described above.
I am looking for a solution.

@zzt007 commented Aug 9, 2023 via email

@Acuno41 commented Aug 9, 2023

Hi @zzt007,

The point cloud in my dataset is very dense at close range, so I set the MAX_POINTS_PER_VOXEL parameter to 128. But after I trained my data with that parameter and exported with these functions, the bounding-box results were completely random on the C++ side. Then I started training with the default MAX_POINTS_PER_VOXEL: 32, and after a couple of epochs the model started to detect objects with correct bounding-box sizes.

I'm still in the early stages of training and optimizing parameters, but as soon as I get proper results I will compare them.

@rjwb1 (Author) commented Aug 9, 2023

Hi guys, I had to make some small changes as I work in a different private repository, so I haven't fully tested everything. For my application I use a single class; however, I have tried with multiple. I also use a custom voxel size and count (XYZ), and this works for me. I'm not at my computer right now, but when I return I'd be happy to help 👍🏼

@rjwb1 (Author) commented Aug 9, 2023

Just to confirm: are you correctly copying the Params.h header over? In my version I generate a config file so that nothing needs to be rebuilt, but I haven't done this here.

@rjwb1 (Author) commented Aug 9, 2023

@Acuno41 I have discovered that MAX_POINTS_PER_VOXEL is also hardcoded in kernel.h. Did you change it here?

const int POINTS_PER_VOXEL = 32; // depends on "params.h"
const int WARP_SIZE = 32; // one warp (32 threads) for one pillar
const int WARPS_PER_BLOCK = 4; // four warps for one block
const int FEATURES_SIZE = 10; // feature map count, depends on "params.h"
const int PILLARS_PER_BLOCK = 64; // one thread deals with one pillar and a block has PILLARS_PER_BLOCK threads
const int PILLAR_FEATURE_SIZE = 64; // feature count for one pillar, depends on "params.h"

@zzt007 commented Aug 10, 2023 via email

@Acuno41 commented Aug 10, 2023

Hi @rjwb1,
thanks for the response.

Just to confirm: are you correctly copying the Params.h header over? In my version I generate a config file so that nothing needs to be rebuilt, but I haven't done this here.

Yes, I correctly copied params.h to the C++ side and checked that it loads correctly in the C++ code.

@Acuno41 I have discovered that MAX_POINTS_PER_VOXEL is also hardcoded in kernel.h. Did you change it here?

const int POINTS_PER_VOXEL = 32; // depends on "params.h"
const int WARP_SIZE = 32; // one warp (32 threads) for one pillar
const int WARPS_PER_BLOCK = 4; // four warps for one block
const int FEATURES_SIZE = 10; // feature map count, depends on "params.h"
const int PILLARS_PER_BLOCK = 64; // one thread deals with one pillar and a block has PILLARS_PER_BLOCK threads
const int PILLAR_FEATURE_SIZE = 64; // feature count for one pillar, depends on "params.h"

I also updated kernel.h a little to remove the hardcoded params.h-dependent parameters; kernel.h looks like this in my code:

const int THREADS_FOR_VOXEL = 256;    // threads number for a block
const int POINTS_PER_VOXEL = Params::max_num_points_per_pillar;      // depends on "params.h"
const int WARP_SIZE = 32;             // one warp (32 threads) for one pillar
const int WARPS_PER_BLOCK = 4;        // four warps for one block
const int FEATURES_SIZE = 10;         // feature map count, depends on "params.h"
const int PILLARS_PER_BLOCK = 64;     // one thread deals with one pillar and a block has PILLARS_PER_BLOCK threads
const int PILLAR_FEATURE_SIZE = Params::num_feature_scatter;   // feature count for one pillar, depends on "params.h"

And I changed max_num_points_per_pillar and num_feature_scatter to static const in params.h.

Considering that the MAX_POINTS_PER_VOXEL parameter is used in the preprocessing step, I suspect something there might be causing the problem while preparing the data to feed the model.

@soo4826 commented Aug 29, 2023

Hi @rjwb1,

I also followed your forked repository and this PR (#77), but it does not show the same results compared to my PyTorch (*.pth) inference.

Here's my overall procedure:

1. Train my custom model with a custom dataset
  • INPUT_RANGE: [-80, -80, -10, 80, 80, 10] (square!)
  • VOXEL_SIZE: [0.4, 0.4, 20]
2. Convert my custom model *.pth into *.onnx with exporter.py
3. Change include/param.h
  • Apply the newly created param.h from step 2
4. Modify the hard-coded value in kernel.h
  • POINTS_PER_VOXEL
5. Build and infer
  • Visualize with Open3D

[screenshot: PyTorch + ROS inference result]

[screenshot: CUDA-PointPillars inference result]

Can you give me some advice?

Also, have you wrapped this package into ROS?

@zzt007 commented Aug 30, 2023 via email
