How to get currency name(USD,IND) with the DocQuery? #51

Open
ChandraReddy97 opened this issue Nov 9, 2022 · 9 comments

Comments

@ChandraReddy97

@ankrgyl
I have tried multiple ways to extract the currency name, but no luck.
Can you suggest a query (question) that would extract the currency from the document?

@ankrgyl
Contributor

ankrgyl commented Nov 9, 2022

Can you share the document and value you are trying to extract?

@ChandraReddy97
Author

Document links:
https://drive.google.com/file/d/1sZo5yGA6NkMNSCVm7fgtuYnhPMQAJFH3/view?usp=share_link
https://drive.google.com/file/d/1ZzLgqo1OyZTpnXnQwr74RQHMLKbEi8DE/view?usp=share_link

In these documents you can see the $ and € symbols.

Based on those, can we get the currency name, such as USD or EUR?

@ChandraReddy97
Author

9240725072.pdf
In this document you can see "amount(USD)".
Can you suggest a query (question) that returns the currency name as USD?

@ankrgyl
Contributor

ankrgyl commented Nov 9, 2022

DocQuery is extractive -- meaning it will only return text that is present in the document. Neither of the first two documents contains the string "USD" or "EUR", so there's no way to get DocQuery to return that value.

I would suggest using something simpler, like a regular expression that searches for terms like "USD" or "$" to suggest "USD", and "EUR" or "€" to suggest "EUR", unless there is a consistent pattern for where this information lies in the document itself.
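
A minimal sketch of that regex approach, assuming the document text has already been extracted to a plain string (for example by OCR or a PDF text extractor). The `CURRENCY_PATTERNS` mapping and the `guess_currencies` helper are hypothetical names for illustration and are not part of DocQuery. Note that "$" is ambiguous (it is also used for CAD, AUD, and others), so treating it as USD is only a guess.

```python
import re

# Hypothetical mapping from currency code to the markers that suggest it.
# "$" is treated as USD here, which is an assumption: other currencies use it too.
CURRENCY_PATTERNS = {
    "USD": re.compile(r"\bUSD\b|\$"),
    "EUR": re.compile(r"\bEUR\b|€"),
}


def guess_currencies(text):
    """Return the currency codes whose markers appear in text,
    ordered by where the first marker for each code occurs."""
    hits = []
    for code, pattern in CURRENCY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            hits.append((match.start(), code))
    return [code for _, code in sorted(hits)]


if __name__ == "__main__":
    print(guess_currencies("Amount (USD) 1,200.00"))
    print(guess_currencies("Total: € 35,00"))
```

If a document mixes several symbols, the first entry in the returned list is the marker that appears earliest in the text, which may or may not be the currency of the total.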

@ChandraReddy97
Author

In the PDF file above (the uploaded file), we do have the string "USD", right?

@ChandraReddy97
Author

@ankrgyl
Is there a Jupyter notebook available for training custom document classification and document question answering models? If so, please share it with me; that would be very helpful.

@ankrgyl
Contributor

ankrgyl commented Nov 9, 2022

> In the PDF file above (the uploaded file), we do have the string "USD", right?

Yes, but not the other two.

> Is there a Jupyter notebook available for training custom document classification and document question answering models? If so, please share it with me; that would be very helpful.

Unfortunately not, but if you'd be willing to contribute one, we'd be happy to include it!

@ChandraReddy97
Author

ChandraReddy97 commented Nov 10, 2022

@ankrgyl
Is there any way to extract line items or table items using DocQueryEngine?

@ankrgyl
Contributor

ankrgyl commented Nov 10, 2022

Not at this point.
