Driver fail to reconnect if cluster changes #150

Closed
marvinkite opened this issue Sep 15, 2020 · 2 comments · Fixed by #151

@marvinkite

  • Neo4j version: Neo4j Aura (4 Enterprise)
  • Neo4j Mode: HA cluster with 3 members / Causal cluster with 1 core and 2 read replicas
  • Driver version: Go driver v1.8.3
  • Operating system: Linux Alpine
  • Steps to reproduce
    Execute some QueryTransaction from time to time.
  • Expected behavior
    The driver always returns a result and recovers from errors where possible.
  • Actual behavior
    The application was not used overnight, and I assume the routing table still contained servers 4, 5 and 6. The first query to hit the server failed, but the second query succeeded.
    Running CALL dbms.cluster.routing.getRoutingTable({}) showed that the cluster had been reconfigured to server 7 (WRITE) and servers 8 and 9 (READ). The operation was executed with a session from driver.Session(neo4j.AccessModeRead). Following the numbering logic, the lowest number might be the write node, so it seems odd that the operation tried to connect to server 4; the log shows it may have checked servers 5 and 6 first. My application received the EOF error instead of the driver recovering and returning the result.
  1. router 1:Reading routing table for '' from any of [neo4j-core-xxxxx-6.production-orch-0001.neo4j.io:7687 neo4j-core-xxxxx-5.production-orch-0001.neo4j.io:7687 neo4j-core-xxxxxxxx-4.production-orch-0001.neo4j.io:7687]

  2. pool 1:No server connection available to any of [neo4j-core-xxxxx-6.production-orch-0001.neo4j.io:7687]

  3. pool 1:No server connection available to any of [neo4j-core-xxxxx-5.production-orch-0001.neo4j.io:7687]

  4. bolt-209632@neo4j-core-xxxxxx-4.production-orch-0001.neo4j.io:7687/v4:write tcp 10.xxxxx:43874->xxx:7687: write: broken pipe

    github.com/neo4j/neo4j-go-driver/neo4j.adaptorLogger.Error
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/logging.go:121
    github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).sendMsg
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:139
    github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).Close
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:663
    
  5. pool 1:No server connection available to any of [neo4j-core-xxxxxx-4.production-orch-0001.neo4j.io:7687]

  6. router 1:Unable to retrieve routing table from neo4j-core-xxxxxx-4.production-orch-0001.neo4j.io:7687: EOF

    github.com/neo4j/neo4j-go-driver/neo4j.adaptorLogger.Error
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/logging.go:121
    github.com/neo4j/neo4j-go-driver/neo4j/internal/router.(*Router).getTable
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/router/router.go:108
    github.com/neo4j/neo4j-go-driver/neo4j/internal/router.(*Router).Readers
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/router/router.go:122
    github.com/neo4j/neo4j-go-driver/neo4j.(*session).borrowConn
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/session.go:379
    github.com/neo4j/neo4j-go-driver/neo4j.(*session).Run
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/session.go:456
    
  7. bolt-209494@xxxxxx.databases.neo4j.io:7687/v4:write tcp 10.8.1.168:43868->35.187.125.190:7687: write: broken pipe

    github.com/neo4j/neo4j-go-driver/neo4j.adaptorLogger.Error
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/logging.go:121
    github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).sendMsg
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:139
    github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).Close
        /vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:663
    
@2hdddg
Contributor

2hdddg commented Sep 15, 2020

Thanks for the very well written issue!
I can see that the driver doesn't fall back to the original URL when the last known routers fail. A patch is on the way...

2hdddg pushed a commit to 2hdddg/neo4j-go-driver that referenced this issue Sep 15, 2020
The initial URI was only used when there were no routers from the
previous table. This fix makes sure that initial URI is used to
retrieve routers when previous set of routers fail.

fix neo4j#150
2hdddg pushed a commit that referenced this issue Sep 15, 2020
The initial URI was only used when there were no routers from the
previous table. This fix makes sure that initial URI is used to
retrieve routers when previous set of routers fail.

fix #150
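
The fallback described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: `fetchFunc`, `readRoutingTable`, `demoFetch` and the address `initial.example:7687` are hypothetical names, and the stub simply simulates a cluster whose previously known routers are all gone.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable simulates a router that is no longer part of the cluster.
var errUnreachable = errors.New("router unreachable")

// fetchFunc is a hypothetical stand-in for requesting a routing table
// from a single address.
type fetchFunc func(addr string) ([]string, error)

// readRoutingTable asks each previously known router in turn. If none of
// them answers, it falls back to the initial address instead of giving up.
func readRoutingTable(known []string, initial string, fetch fetchFunc) ([]string, error) {
	for _, addr := range known {
		if table, err := fetch(addr); err == nil {
			return table, nil
		}
	}
	// Fallback step: per the commit message, this was previously only
	// attempted when there were no known routers at all.
	table, err := fetch(initial)
	if err != nil {
		return nil, fmt.Errorf("no routing table from %v or %s: %w", known, initial, err)
	}
	return table, nil
}

// demoFetch simulates the reported situation: the old servers are gone
// and only the (assumed) initial address still returns a table.
func demoFetch(addr string) ([]string, error) {
	if addr == "initial.example:7687" {
		return []string{"server-7", "server-8", "server-9"}, nil
	}
	return nil, errUnreachable
}

func main() {
	known := []string{"server-6", "server-5", "server-4"}
	table, err := readRoutingTable(known, "initial.example:7687", demoFetch)
	fmt.Println(table, err)
}
```

In this sketch the stale routers are still tried first, so a reconfigured cluster costs one failed attempt per stale router before the initial address is used.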
@marvinkite
Author

Hi,

I'm using the driver with this fix, still connecting to the Aura service, and I regularly see EOF error messages.

There are only a few hundred requests per hour or fewer, and I see this same error message 1-3 times per hour. Compared to the number of calls, that is very often.

  1. router 1:Reading routing table for '' from previously known routers: [neo4j-core-xxxxx-12.production-orch-0001.neo4j.io:7687 neo4j-core-xxxxx-11.production-orch-0001.neo4j.io:7687 neo4j-core-xxxxx-10.production-orch-0001.neo4j.io:7687]

  2. bolt-206756@neo4j-core-xxxxx-12.production-orch-0001.neo4j.io:7687/v4:Connected

  3. bolt-206443@neo4j-core-xxxxx-10.production-orch-0001.neo4j.io:7687/v4:EOF

  4. pool 1:Pruning neo4j-core-xxxxx-10.production-orch-0001.neo4j.io:7687 for connections that might be dead

  5. pool 1:Unregistering dead or too old connection to neo4j-core-d86dc369-10.production-orch-0001.neo4j.io:7687

My application catches the io.EOF error here and invokes the same operation again.

  1. bolt-206981@neo4j-core-xxxxx-10.production-orch-0001.neo4j.io:7687/v4:Connected
github.com/indykite/jarvis/pkg/cypher.(*logger).Errorf
	/pkg/cypher/client.go:297
github.com/neo4j/neo4j-go-driver/neo4j.adaptorLogger.Error
	/vendor/github.com/neo4j/neo4j-go-driver/neo4j/logging.go:121
github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).receiveMsg
	/vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:148
github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).receiveSuccess
	/vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:172
github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).run
	/vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:431
github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt.(*bolt4).Run
	/vendor/github.com/neo4j/neo4j-go-driver/neo4j/internal/bolt/bolt4.go:480
github.com/neo4j/neo4j-go-driver/neo4j.(*session).Run
	/vendor/github.com/neo4j/neo4j-go-driver/neo4j/session.go:466
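
For reference, the application-side workaround mentioned above (catch io.EOF and invoke the same operation again) could look roughly like the sketch below. It uses only the standard library. `retryOnEOF` and `flakyQuery` are hypothetical names, the query is replaced by a stub, and the sketch assumes the error reaching the application satisfies `errors.Is(err, io.EOF)`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// retryOnEOF runs op and repeats it while it fails with io.EOF,
// making at most maxAttempts attempts in total (at least one).
func retryOnEOF(maxAttempts int, op func() (string, error)) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		result, err := op()
		if err == nil {
			return result, nil
		}
		if !errors.Is(err, io.EOF) {
			return "", err // errors other than EOF are not retried
		}
		lastErr = err
	}
	return "", fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

// flakyQuery returns a hypothetical operation that fails with io.EOF
// for the first `failures` calls and succeeds afterwards.
func flakyQuery(failures int) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", io.EOF
		}
		return "ok", nil
	}
}

func main() {
	result, err := retryOnEOF(2, flakyQuery(1))
	fmt.Println(result, err)
}
```

Retrying like this is only safe when the operation can be repeated without side effects, which is normally the case for the read queries in this report.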
