Performance Improvement #144
Can you give an example of a specific performance problem you're having? jsonpath_ng doesn't parse JSON for you, but there are many faster parsers than Python's json module if that's your bottleneck. If you have a specific performance problem with jsonpath_ng, it would help to have more details.
I used your library to write a CSV-to-JSON converter with the row headers being JSONPath expressions. It worked well except for the performance. I'm probably using it wrong, but it looks like the parse step is particularly expensive (I also have a lot of queries). I used cProfile and snakeviz to display this.
I did more profiling to see if I had specific expensive queries, but in fact I'm doing 80 path queries, and each of them takes about 34 ms. In total that ends up being ~2765.30 ms.
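For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of collecting a cProfile report sorted by cumulative time, using only the standard library. The helper name `profile_call` and the `build_rows` workload are hypothetical stand-ins, not code from this thread; in practice the profiled function would be the one that parses and applies the path expressions.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the statistics into a string, sorted by cumulative time
    # and limited to the `top` most expensive entries.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def build_rows(n):
    # Hypothetical stand-in workload; replace with the real conversion code.
    return [str(i) * 3 for i in range(n)]


if __name__ == "__main__":
    _, report = profile_call(build_rows, 1000)
    print(report)
```

A report where one function dominates the cumulative column, as the parse step does in this thread, points at where caching or reuse is likely to pay off.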
@jpetersen23 Can you post the code that's giving you performance problems?
I can't share my actual code or data, but I made a toy example from my data/code:

from jsonpath_ng.ext import parse
import time
pairs = [
("$.metadata.content_release_version", "taco"),
("$.id", "taco"),
("$.config.priority", "taco"),
("$.created_at", "taco"),
("$.update_at", "taco"),
("$.event_type", "taco"),
("$.event_state", "taco"),
("$.config.requires_one_of.token[0].thingy_id", "taco"),
("$.config.requires_one_of.token[0].amount", "taco"),
("$.config.asset_map.event_icon", "taco"),
("$.config.asset_map.key_art", "taco"),
("$.config.loc_map.desc.namespace", "taco"),
("$.config.loc_map.desc.key", "taco"),
("$.config.loc_map.title.namespace", "taco"),
("$.config.loc_map.title.key", "taco"),
("$.config.loc_map.something_desc.namespace", "taco"),
("$.config.loc_map.something_desc.key", "taco"),
("$.config.challenges.BANANAS_01.event_progress", "taco"),
("$.config.challenges.BANANAS_02.event_progress", "taco"),
("$.config.challenges.BANANAS_03.event_progress", "taco"),
("$.config.challenges.BANANAS_04.event_progress", "taco"),
("$.config.challenges.BANANAS_05.event_progress", "taco"),
("$.config.challenges.BANANAS_06.event_progress", "taco"),
("$.config.challenges.BANANAS_07.event_progress", "taco"),
("$.config.challenges.BANANAS_08.event_progress", "taco"),
("$.config.challenges.BANANAS_09.event_progress", "taco"),
("$.config.challenges.BANANAS_10.event_progress", "taco"),
("$.config.challenges.BANANAS_11.event_progress", "taco"),
("$.config.challenges.BANANAS_12.event_progress", "taco"),
("$.config.challenges.BANANAS_13.event_progress", "taco"),
("$.config.challenges.BANANAS_14.event_progress", "taco"),
("$.config.challenges.BANANAS_15.event_progress", "taco"),
("$.config.challenges.BANANAS_16.event_progress", "taco"),
("$.config.challenges.BANANAS_17.event_progress", "taco"),
("$.config.challenges.BANANAS_18.event_progress", "taco"),
("$.config.challenges.BANANAS_19.event_progress", "taco"),
("$.config.challenges.BANANAS_20.event_progress", "taco"),
("$.config.challenges.BANANAS_01.auto_assign", "taco"),
("$.config.challenges.BANANAS_02.auto_assign", "taco"),
("$.config.challenges.BANANAS_03.auto_assign", "taco"),
("$.config.challenges.BANANAS_04.auto_assign", "taco"),
("$.config.challenges.BANANAS_05.auto_assign", "taco"),
("$.config.challenges.BANANAS_06.auto_assign", "taco"),
("$.config.challenges.BANANAS_07.auto_assign", "taco"),
("$.config.challenges.BANANAS_08.auto_assign", "taco"),
("$.config.challenges.BANANAS_09.auto_assign", "taco"),
("$.config.challenges.BANANAS_10.auto_assign", "taco"),
("$.config.challenges.BANANAS_11.auto_assign", "taco"),
("$.config.challenges.BANANAS_12.auto_assign", "taco"),
("$.config.challenges.BANANAS_13.auto_assign", "taco"),
("$.config.challenges.BANANAS_14.auto_assign", "taco"),
("$.config.challenges.BANANAS_15.auto_assign", "taco"),
("$.config.challenges.BANANAS_16.auto_assign", "taco"),
("$.config.challenges.BANANAS_17.auto_assign", "taco"),
("$.config.challenges.BANANAS_18.auto_assign", "taco"),
("$.config.challenges.BANANAS_19.auto_assign", "taco"),
("$.config.challenges.BANANAS_20.auto_assign", "taco"),
("$.config.tiers.\"00\".threshold", "taco"),
("$.config.tiers.\"00\".array_type[0].thingy_id", "taco"),
("$.config.tiers.\"00\".array_type[0].amount", "taco"),
("$.config.tiers.\"01\"", "taco"),
("$.config.tiers.\"02\"", "taco"),
("$.config.tiers.\"03\"", "taco"),
("$.config.tiers.\"04\"", "taco"),
("$.config.tiers.\"05\"", "taco"),
("$.config.tiers.\"06\"", "taco"),
("$.config.tiers.\"07\"", "taco"),
("$.config.tiers.\"08\"", "taco"),
("$.config.tiers.\"09\"", "taco"),
("$.config.tiers.\"10\"", "taco"),
("$.config.tiers.\"11\"", "taco"),
("$.config.tiers.\"12\"", "taco"),
("$.config.tiers.\"13\"", "taco"),
("$.config.tiers.\"14\"", "taco"),
("$.config.tiers.\"15\"", "taco"),
("$.config.tiers.\"16\"", "taco"),
("$.config.tiers.\"17\"", "taco"),
("$.config.tiers.\"18\"", "taco"),
("$.config.tiers.\"19\"", "taco")
]
json_output = {}
parse_total_time = 0
start_time = time.process_time()
for pair in pairs:
    parse_start_time = time.process_time()
    jsonpath_expr = parse(pair[0])
    duration = 1000 * (time.process_time() - parse_start_time)
    parse_total_time += duration
    jsonpath_expr.update_or_create(json_output, pair[1])
total_time = 1000 * (time.process_time() - start_time)
print(f"Parse Time: {parse_total_time}ms. Total Time: {total_time}ms")

Here is cProfile output for a run of it: Parse Time: 2665.3660000000023ms. Total Time: 2672.836ms

I also made a follow-up test comparing it to a Python jq setup. The Python jq version produced equivalent JSON, with the following times:
Not sure how viable this is, but I changed the parser class to set up the parser table only once and reused the parser, and I got the time from:
@lukasjesche That should work, although it does also require slight code changes to the example. Instead of calling the
Wow, great findings here. Having read this thread, I decided to start caching my parsers where applicable and went down from 14 minutes processing time to 7 seconds.
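One way the caching idea mentioned here could be sketched is to memoize the parse function so that each distinct path string is parsed only once. This is not the code any commenter actually used: `make_cached_parser` and `slow_parse` are hypothetical names, and `slow_parse` is a stand-in so the example runs without third-party packages.

```python
from functools import lru_cache


def make_cached_parser(parse_fn, maxsize=None):
    """Wrap parse_fn so each distinct path string is parsed only once."""
    return lru_cache(maxsize=maxsize)(parse_fn)


# Hypothetical stand-in for an expensive parse function; it records each
# call so the effect of the cache is visible.
parse_calls = []


def slow_parse(path):
    parse_calls.append(path)
    # Drop the leading "$" segment and keep the remaining field names.
    return tuple(path.split(".")[1:])


if __name__ == "__main__":
    cached_parse = make_cached_parser(slow_parse)
    for _ in range(3):  # e.g. three CSV rows sharing the same headers
        for path in ["$.id", "$.config.priority"]:
            cached_parse(path)
    print(f"lookups: 6, actual parses: {len(parse_calls)}")
    print(cached_parse.cache_info())
```

With jsonpath_ng, this might look like `cached_parse = make_cached_parser(parse)`, using the `parse` imported from `jsonpath_ng.ext` in the toy example earlier in the thread. That assumes the parsed expression objects can safely be shared between calls, which is worth verifying for your use case. Note that this only helps when the same path strings recur, for example across many rows; it does not reduce the cost of the first parse of each path.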
Hi @lukasjesche, I'm facing a similar performance issue. Could you please post the refactoring you did in this example?
Is there any way to improve performance/cache responses to make it faster to parse and query large json files?