Slow XML datasource (especially when using xpath) #762
Did some further investigating / debugging. Interestingly, the check returns false. It looks like data set caching is not enabled, which might explain the above behavior. I see some references to appContext parameters like "org.eclipse.birt.data.cache.memory", but I have no clue how to set those parameters in the BIRT designer / preview viewer. Any help is appreciated.
Nope, in my example project this "needs cache for data-engine" property is set to true as well, so it looks unrelated. I found this document: https://www.eclipse.org/birt/release20specs/BPS7_Data_Set_Caching.pdf BTW, how do you use a birt.ini within the development environment?
I think you should try to debug this.
By temporarily changing the preview viewer, I was able to enable caching. Unfortunately, this did not solve my performance issue. What we would need here is a data cache that fits the data source (in my case the 300 kB XML) in memory, so that it only needs to be parsed once and the actual, individual queries are executed against the data stored in the cache.
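A minimal sketch of that idea, parse the source once and hand the same in-memory representation to every subsequent query, could look like this (the class and method names are hypothetical, not part of BIRT's API):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical cache: parse the XML source exactly once and reuse the
// resulting Document for every data set execution, instead of re-parsing
// the file each time a query runs.
public class XmlSourceCache {
    private static Document cached;

    public static synchronized Document get(String xml) throws Exception {
        if (cached == null) {
            cached = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
        }
        return cached; // all later calls return the already-parsed tree
    }
}
```

For a 300 kB file, holding the parsed tree in memory is cheap compared to re-parsing it for every child query.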
To be more precise: If the same SQL query with the same combination of DataSet parameter values is accessed a second time from the layout, then the results are fetched from the cache.
Yes. This prevents costly/slow queries from being sent to the database again - it is a very important performance feature. Unfortunately, that doesn't help in your case...
The root cause is that the whole XML really has to be parsed again and again.
The report could probably be much faster if one used an XML parser to create a Java object structure from the XML once, and then used POJO or scripted data sets to further process this Java object structure.
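The "parse once into Java objects" approach could be sketched as follows. The element and attribute names (`parent`, `child`, `name`) are made up for illustration; they are not taken from the attached example project:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Parse the XML a single time into a plain Java structure; a POJO or
// scripted data set can then iterate over it without touching the file again.
public class PojoLoader {

    /** Returns a map of parent name -> child text values. */
    public static Map<String, List<String>> load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList parents = doc.getElementsByTagName("parent");
        for (int i = 0; i < parents.getLength(); i++) {
            Element parent = (Element) parents.item(i);
            List<String> children = new ArrayList<>();
            NodeList childNodes = parent.getElementsByTagName("child");
            for (int j = 0; j < childNodes.getLength(); j++) {
                children.add(childNodes.item(j).getTextContent());
            }
            result.put(parent.getAttribute("name"), children);
        }
        return result;
    }
}
```

A scripted data set would then simply walk this map instead of issuing an XPath-filtered query per parent row.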
Thanks @hvbtup for your comment, with which I mostly agree. You're right that a custom "XML to Java object transformer" would solve most of the performance problems. Ideally, we should think about using an alternative XML DataSource, e.g. a DOM-based one (of course, users need to be careful when dealing with large XMLs), and use standard JAXP features (e.g. XPath) executed on the DOM tree instead of SAX-oriented parsing and custom XPath code. A couple of months ago, when preparing the sample project and collecting profiling data, …
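Using standard JAXP XPath against a DOM tree, as suggested above, would look roughly like this (a sketch; the sample XML and expressions are illustrative):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Evaluate a standard JAXP XPath expression against an in-memory DOM tree,
// instead of re-running a SAX parse with custom XPath matching per query.
public class DomXPathQuery {

    public static int countMatches(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc,
                XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```

In a real DOM-based data source, the `Document` would of course be parsed once and reused across queries, and compiled `XPathExpression`s could be cached as well.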
I agree mostly. I'd like to add that XML is an overkill format for data serialization compared to e.g. JSON. The code base is definitely complicated, and not only the data engine part. But let's face it: your idea of a DOM-based XML data source sounds quite reasonable to me. It will need more memory to hold the DOM representation, but it should reduce processing time dramatically. Even DOM is overkill in many cases, IMHO.
Following up discussion #759, I've created an example project which reproduces the problem:
- We have one 300 kB XML file, one XML data source, and two data sets (parent and child).
- We iterate through the parent data set and, for each entry, iterate through each of its children (multiple times, just to make the issue clearly visible).
- As BIRT does not support tree-like data sets, we use XPath and data set parameters to filter the correct children.
Even though it is a simple report with a tiny input file, report generation (PDF) takes ~10 seconds on a high-end workstation.
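To make the setup concrete, the two data sets' row mappings might look something like this (the element names and the parameter marker are hypothetical, not taken from the attached project):

```
Parent data set XPath:                /root/parent
Child data set XPath (parameterized): /root/parent[@id=?]/child
```

Because the child data set runs once per parent row, and each run re-parses the XML with the SAX-based source, the total cost grows roughly as (number of parents) × (parse time of the whole file).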
BIRT_Slow_xml_processing_with_filters_and_xpath.zip