Just first regex hit is shown if multiple regex patterns match the same input string #1897
Comments
I can take a look at this, @lfcnassif. |
If you have available time @tc-wleite, that would be great! Thank you very much for all your volunteer work on this project! |
I was able to reproduce and fix the issue reported by @paulobreim (#1745 (comment)). It doesn't seem to be the same situation reported by @milcent-CVM (#1745 (comment)). @milcent-CVM, can you provide a couple of sample strings that should match the regex you created? |
Sure! Thank you! Let me just post the regex pattern (still evolving, developed on regex101.com) that got the most hits. It doesn't yet account for all Latin characters in a person's name, and it expects case-insensitive matching, which is the default in IPED:
I will join here many parts of the articles in one piece that would result in many hits, OK? """ DURVAL JOSÉ SOLEDADE SANTOS JOSÉ LUIZ OSORIO DE ALMEIDA FILHO Dos fatos Diretor Relator: Alexandre Costa Rangel Relatório de Julgamento (1251150) SEI 19957.010729/2019-31 / pg. 1 Data do julgamento: 23/06/2020 Relator: Diretor Henrique Machado Acusados: LEONARDO BRUNET MENDES DE MORAES Diretor-Relator FRANCISCO AUGUSTO DA COSTA E SILVA Presidente RELATÓRIO Relator: Leonardo Brunet Mendes De Moraes DOS FATOS WLADIMIR CASTELO BRANCO CASTRO Presidente da Sessão RELATÓRIO Relator : Diretor Wladimir Castelo Branco Castro Rio de Janeiro, 04 de abril de 2007. Maria Helena de Santana Marcelo Fernandez Trindade Rio de Janeiro, 18 de dezembro de 2007. Durval Soledade Maria Helena dos Santos Fernandes de Santana Rio de Janeiro, 28 de agosto de 2007. Marcos Barbosa Pinto Maria Helena dos Santos Fernandes de Santana Eli Loria |
@milcent-CVM, using the regex and the text posted, regex101 is not showing any matches (screenshot below). |
@tc-wleite, this is probably because regex101 is case-sensitive, while IPED is not (at least according to RegexConfig.txt).
|
@milcent-CVM, it seems that the syntax used by regex101 and the one used by IPED (which relies on dk.brics.automaton) are not the same. Trying simpler expressions first in IPED should help.

    import dk.brics.automaton.*;

    public class Test {
        public static void main(String[] args) {
            // Compile the expression into an automaton.
            RegExp r = new RegExp("([^\\-]relatora?)");
            Automaton a = r.toAutomaton();
            // Lowercase the input before matching.
            String input = " relator".toLowerCase();
            // run() returns true only if the whole input is accepted.
            System.out.println(a.run(input));
        }
    }

This program's output is "true". |
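Two details of the test program above can explain results that differ from regex101: `Automaton.run` accepts an input only when the whole string matches, and the input is lowercased before matching. The sketch below illustrates both points with `java.util.regex` from the standard library, not with dk.brics.automaton, whose expression syntax differs. The class and method names are hypothetical and are not part of IPED.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for dk.brics.automaton here.
class MatchSemanticsSketch {
    // Same expression as in the test program, written in lowercase.
    static final Pattern P = Pattern.compile("[^\\-]relatora?");

    // Whole-string match, analogous to Automaton.run(input).
    static boolean wholeMatch(String s) {
        return P.matcher(s.toLowerCase(Locale.ROOT)).matches();
    }

    // Substring search, closer to what regex101 highlights in a long text.
    static boolean anywhere(String s) {
        return P.matcher(s.toLowerCase(Locale.ROOT)).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch(" relator"));            // true
        System.out.println(wholeMatch(" Relator: Alexandre")); // false: trailing text
        System.out.println(anywhere(" Relator: Alexandre"));   // true
        System.out.println(anywhere("-relator"));              // false: '-' is excluded
    }
}
```

Because the input is lowercased, a pattern written with uppercase letters would never match in this setup, so writing test patterns in lowercase is the safer choice.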
Fix regex processing with multiple hits for the same string (#1897)
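For readers unfamiliar with the symptom, here is a minimal sketch of the intended behavior: when several named patterns match the same input string, every hit is collected instead of stopping at the first pattern that matches. It uses `java.util.regex` rather than dk.brics.automaton, and the pattern names, expressions, and class are hypothetical. It is not the code changed by the fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: not IPED's implementation.
class MultiHitSketch {
    // Hypothetical named patterns; both can hit the same stretch of text.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("RELATOR", Pattern.compile("relatora?"));
        PATTERNS.put("DIRETOR_RELATOR", Pattern.compile("diretor-relatora?"));
    }

    // Returns one "NAME:hit" entry per match of every pattern.
    static List<String> allHits(String text) {
        String input = text.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            // Keep scanning: a hit from one pattern must not hide the others.
            while (m.find()) {
                hits.add(e.getKey() + ":" + m.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [RELATOR:relator, DIRETOR_RELATOR:diretor-relator]
        System.out.println(allHits("Diretor-Relator FRANCISCO AUGUSTO DA COSTA E SILVA"));
    }
}
```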
OK!! Thank you very much!!
I will start with simpler patterns and work my way up, and try to find some testing framework more compatible with IPED’s Regex.
Cheers!
Marcel Milcent
Closed #1897 as completed via #1900.
|
Reported on #1745