Fail over retry attempt #250

Open
MuhammadhAadhil opened this issue Dec 29, 2019 · 10 comments · May be fixed by #624

Comments

@MuhammadhAadhil

Hi All,

I'm new to this community and this is my first comment. We use the QuickFIX/J library in our application, and we have received a requirement to handle the failover scenario for an initiator session as follows:

  1. If the connection to the primary host (SocketConnectHost) is lost, we should retry the same host a configurable number of times.

  2. Only if all retry attempts fail should we start trying the other hosts (SocketConnectHost1, SocketConnectHost2, ..., SocketConnectHost[N]).

As far as I understand, the existing code does not support this requirement, but it seems doable by introducing a new parameter in the class "IoSessionInitiator". Please advise me further on this.
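The requested policy can be sketched as the order in which the initiator would try its addresses. This is a minimal, hypothetical sketch: FailoverPlan and attemptOrder are illustrative names, not QuickFIX/J classes, the host names are placeholders, and it assumes the configurable value counts retries after the first attempt on the primary host.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of QuickFIX/J): builds the order in which an
 * initiator would try its configured addresses under the requested policy.
 */
class FailoverPlan {

    static List<String> attemptOrder(String primary, List<String> alternates, int primaryRetries) {
        if (primaryRetries < 0) {
            throw new IllegalArgumentException("primaryRetries must be >= 0");
        }
        List<String> order = new ArrayList<>();
        // 1. First attempt on the primary host plus the configured number of retries.
        for (int i = 0; i <= primaryRetries; i++) {
            order.add(primary);
        }
        // 2. Only after all of those fail, fall back to the alternates in configuration order.
        order.addAll(alternates);
        return order;
    }

    public static void main(String[] args) {
        System.out.println(attemptOrder("primary:9876", List.of("backup1:9876", "backup2:9876"), 2));
        // prints [primary:9876, primary:9876, primary:9876, backup1:9876, backup2:9876]
    }
}
```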

As I mentioned, I'm new to this community and I'm not sure this is the right place to discuss this. If it is not, please point me in the right direction. Thanks in advance.

@chrjohn
Member

chrjohn commented Dec 30, 2019

Maybe you can outline the changes that you want to introduce into the IoSessionInitiator? IMHO it would be good to introduce some kind of strategy that can easily be changed without changing QFJ itself.
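One way to read the suggestion of a pluggable strategy is a small interface that maps the number of consecutive failed connection attempts to the index of the address to try next. The sketch below is hypothetical: AddressSelection, Strategy, roundRobin and primaryFirst are illustrative names, not an existing QuickFIX/J API, and the host names are placeholders. It assumes index 0 is SocketConnectHost and indices 1..N are the SocketConnectHost<n> alternates.

```java
import java.util.List;

class AddressSelection {

    /** Hypothetical extension point (not an existing QuickFIX/J API). */
    @FunctionalInterface
    interface Strategy {
        /** Returns the index of the address to try, given the number of consecutive failed attempts so far. */
        int nextIndex(int consecutiveFailures, int addressCount);
    }

    /** Plain round-robin over all configured addresses, for comparison: one attempt each, then wrap around. */
    static Strategy roundRobin() {
        return (failures, count) -> failures % count;
    }

    /**
     * Requested behaviour: primaryAttempts tries on index 0, then one try on each
     * alternate (indices 1..count-1), then the cycle starts over with the primary.
     */
    static Strategy primaryFirst(int primaryAttempts) {
        if (primaryAttempts < 1) {
            throw new IllegalArgumentException("primaryAttempts must be >= 1");
        }
        return (failures, count) -> {
            int cycleLength = primaryAttempts + (count - 1);
            int position = failures % cycleLength;
            return position < primaryAttempts ? 0 : 1 + (position - primaryAttempts);
        };
    }

    public static void main(String[] args) {
        List<String> addresses = List.of("primary:9876", "backup1:9876", "backup2:9876");
        Strategy strategy = primaryFirst(3);
        for (int failures = 0; failures < 7; failures++) {
            int index = strategy.nextIndex(failures, addresses.size());
            System.out.println("after " + failures + " failure(s) try " + addresses.get(index));
        }
    }
}
```

With primaryFirst(3) and three addresses, the primary is tried three times, then each backup once, then the cycle restarts. Swapping in roundRobin() changes the policy without touching the code that opens connections.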

@MuhammadhAadhil
Author

MuhammadhAadhil commented Jan 6, 2020

Hi @chrjohn,
Thanks a lot for the response. I've made the change and tested it, and it works as expected. As you mentioned, the main change is in the class IoSessionInitiator, but a couple of other classes also need to be modified to introduce the new configuration.
I've attached those classes here. I would be grateful if you could review my changes. I made this change on top of version 2.0.0.
If this change is acceptable, please guide me on how to add it to the latest code.
ChangedClasses.zip
Configuration&TestEvidence.zip

@philipwhiuk
Contributor

To submit a change so we can modify/merge it:

@PetteriPertola

Hi, was this ever added as a PR or merged into a release version by any chance? I looked but couldn't find anything suggesting it had.
Thanks!

@chrjohn
Member

chrjohn commented Oct 26, 2020

Hi @PetteriPertola , no this has not been merged since there was no PR submitted.

@PetteriPertola

PetteriPertola commented Oct 26, 2020


Thanks. We're seeing a similar issue: if the primary host is down at startup, the failover to SocketConnectHost1/SocketConnectPort1 does not work; the initiator just keeps retrying SocketConnectHost over and over again.

@chrjohn
Member

chrjohn commented Nov 24, 2020

@PetteriPertola Maybe you can take a stab at a PR?

@suguiura

suguiura commented Apr 8, 2022

Hi guys,

I found a funny thing: the failover feature works sometimes, but not always.

For the failover test, I created a working acceptor on port 9998 and a TCP server on port 9999 that sends a reset (RST) back to the initiator as soon as it connects:

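import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ResetServer {
    public static void main(String[] args) throws IOException {
        try (final ServerSocket serverSocket = new ServerSocket(9999)) {
            while (true) {
                try (final Socket socket = serverSocket.accept()) {
                    // SO_LINGER on with a 0s timeout: close() aborts the connection with an RST
                    socket.setSoLinger(true, 0);
                    System.out.println("accepted " + socket);
                } finally {
                    System.out.println("done");
                }
            }
        }
    }
}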

The TCP server above makes the initiator try the next host:port. However, if we add a Thread.sleep(20) before closing the connection, the failover stops working.

The divergence depends on timing: the call to Net.pollConnectNow(fd) in the finishConnect() method of sun.nio.ch.SocketChannelImpl happens sooner or later relative to the server's response.

A workaround for this case is to add a Thread.sleep(MILLIS) (with a reasonable MILLIS value) at the beginning of the finishConnect(handle) method of org.apache.mina.transport.socket.nio.NioSocketConnector in the org.apache.mina:mina-core:2.1.4 dependency.
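For readers unfamiliar with the call sequence mentioned above: with a non-blocking java.nio SocketChannel, connect() returns immediately (or completes at once), and finishConnect() has to be called once the channel is ready, returning true when the handshake has completed or throwing (for example ConnectException) if the attempt failed. The sketch below shows that pattern with plain JDK classes against a local ServerSocket. It is only an illustration under that assumption: NonBlockingConnectDemo and its connect helper are hypothetical names, and it neither reproduces the timing race described here nor mirrors MINA's actual code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

class NonBlockingConnectDemo {

    /** Connects without blocking, waits up to timeoutMillis for OP_CONNECT, then calls finishConnect(). */
    static boolean connect(InetSocketAddress address, long timeoutMillis) throws IOException {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);
            // connect() may complete immediately (e.g. on loopback) or return false while the handshake is in progress.
            if (channel.connect(address)) {
                return true;
            }
            channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMillis) == 0) {
                return false; // timed out waiting for the handshake
            }
            // Completes the connection, or throws an IOException if the attempt failed.
            return channel.finishConnect();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            InetSocketAddress address = new InetSocketAddress("127.0.0.1", server.getLocalPort());
            System.out.println("connected: " + connect(address, 1000));
        }
    }
}
```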

@chrjohn
Member

chrjohn commented Aug 12, 2022

@suguiura
Thanks for the comment, but if you are suggesting a change to MINA, your best bet is to open an issue in their issue tracker: http://issues.apache.org/jira/browse/DIRMINA

MuhammadhAadhil pushed a commit to MuhammadhAadhil/quickfixj that referenced this issue Mar 24, 2023
@MuhammadhAadhil
Author

Hi All,
I apologize for my late response. I've submitted the change as a pull request; please review it.
#624

@chrjohn chrjohn linked a pull request Apr 6, 2023 that will close this issue
5 participants