Question on Entity Extraction from Raw Outputs of Agent #20

@enochii

Hi, thanks for your excellent work! I ran into the issue described below.

LocAgent uses a function called get_module_from_line_number to extract function entities, which are ultimately added to found_entities in location/loc_outputs.jsonl. The function is shown below, with a few print statements I added for debugging.

def get_module_from_line_number(line, file_path, searcher):
    assert file_path in searcher.G.nodes
    file_node = searcher.get_node_data([file_path])[0]
    print(f'file_path:line -> {file_path}:{line}')
    print(f'file_node -> {file_node}')
    cur_start_line = file_node['start_line']
    cur_end_line = file_node['end_line']
    cur_node = None
    
    for nid in searcher.G.nodes():
        # if not nid.startswith(file_path) or ':' not in nid:
        #     continue
        node = searcher.G.nodes[nid]
        if node['type'] != NODE_TYPE_FUNCTION: continue
        
        # to do: strict matching
        # if file_node['node_id'] not in nid: continue
        
        if 'start_line' in node and 'end_line' in node:
            if node['start_line'] < cur_start_line or node['end_line'] > cur_end_line:
                continue
            if line >= node['start_line'] and line <= node['end_line']:
                cur_node = node
                cur_node['name'] = nid
                cur_start_line = node['start_line']
                cur_end_line = node['end_line']
    if cur_node:
        print(f'cur_node -> {cur_node}')
        return (cur_node, cur_end_line)
    return (None, None)

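From my understanding, it should return the function that contains line number line in the file file_path. If that is the intent, the implementation is not entirely correct: the loop visits every function node in the graph and compares only line ranges, while both checks that would restrict candidates to file_path are commented out. A function from a different file whose line range happens to contain line can therefore be selected. Since found_entities is used to compute the evaluation metrics, the reported results may be inaccurate.

With the print statements I added (see the code above), I got the output shown in the figure below, where the file path of the found entity (cur_node) does not match file_path.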
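To make the failure mode concrete, here is a minimal toy reproduction. It is only a sketch: the plain dict stands in for searcher.G.nodes, the constant values, the 'path:qualified_name' node-id format, and all file names and line ranges are hypothetical, and unfiltered_lookup is my simplified copy of the loop above rather than the repository code.

```python
# Toy stand-in for searcher.G.nodes: node id -> attributes.
# The constant values, id format, and all data below are hypothetical.
NODE_TYPE_FILE = "file"
NODE_TYPE_FUNCTION = "function"

nodes = {
    "pkg/a.py": {"type": NODE_TYPE_FILE, "start_line": 1, "end_line": 50},
    "pkg/a.py:foo": {"type": NODE_TYPE_FUNCTION, "start_line": 10, "end_line": 20},
    "pkg/b.py": {"type": NODE_TYPE_FILE, "start_line": 1, "end_line": 50},
    "pkg/b.py:bar": {"type": NODE_TYPE_FUNCTION, "start_line": 12, "end_line": 18},
}


def unfiltered_lookup(line, file_path, nodes):
    """Mirror the loop above: only line ranges are compared, never the file."""
    cur_start = nodes[file_path]["start_line"]
    cur_end = nodes[file_path]["end_line"]
    match = None
    for nid, node in nodes.items():
        if node["type"] != NODE_TYPE_FUNCTION:
            continue
        # Candidate must fit inside the current best range.
        if node["start_line"] < cur_start or node["end_line"] > cur_end:
            continue
        if node["start_line"] <= line <= node["end_line"]:
            match = nid
            cur_start, cur_end = node["start_line"], node["end_line"]
    return match


# Line 15 of pkg/a.py lies inside foo, yet the narrower range of
# pkg/b.py:bar also contains 15, so the function from the other file wins.
result = unfiltered_lookup(15, "pkg/a.py", nodes)
print(result)  # pkg/b.py:bar
```
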

[Image: screenshot of the print output, showing a cur_node whose file path differs from file_path]

Thanks for your time. If my analysis is correct, I am happy to submit a short PR that addresses this by restoring one of the commented-out file-path checks in the code above.
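For reference, a minimal sketch of the kind of fix I have in mind. It assumes function node ids have the form 'file_path:qualified_name', which is what the commented-out startswith check suggests but which I have not verified across the code base. The function name find_enclosing_function, the plain dict used in place of searcher.G.nodes, and the sample data are all hypothetical.

```python
NODE_TYPE_FUNCTION = "function"  # hypothetical value of the repository constant


def find_enclosing_function(line, file_path, nodes):
    """Return (node_id, end_line) of the innermost function in file_path
    whose line range contains line, or (None, None) if there is none."""
    # The trailing ':' prevents 'pkg/a.py' from also matching 'pkg/a.pyx:...'.
    prefix = file_path + ":"
    best_id, best_end, best_span = None, None, None
    for nid, node in nodes.items():
        if not nid.startswith(prefix):
            continue  # skip entities that belong to other files
        if node.get("type") != NODE_TYPE_FUNCTION:
            continue
        if "start_line" not in node or "end_line" not in node:
            continue
        start, end = node["start_line"], node["end_line"]
        if not (start <= line <= end):
            continue
        span = end - start
        # Prefer the narrowest enclosing range, i.e. the innermost function.
        if best_span is None or span < best_span:
            best_id, best_end, best_span = nid, end, span
    return best_id, best_end


# Hypothetical sample data: a nested function in a.py and an unrelated one in b.py.
sample_nodes = {
    "pkg/a.py:foo": {"type": NODE_TYPE_FUNCTION, "start_line": 10, "end_line": 20},
    "pkg/a.py:foo.inner": {"type": NODE_TYPE_FUNCTION, "start_line": 12, "end_line": 16},
    "pkg/b.py:bar": {"type": NODE_TYPE_FUNCTION, "start_line": 14, "end_line": 15},
}

print(find_enclosing_function(15, "pkg/a.py", sample_nodes))  # ('pkg/a.py:foo.inner', 16)
```

Filtering by prefix before comparing line ranges also means the lookup no longer depends on the order in which the graph yields its nodes.
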
