-
Notifications
You must be signed in to change notification settings - Fork 2
Redis crashing using module #20
Comments
Most of my testing will never see "done" reported as redis crashes out ps i love the module. ive been using hard coded sourcetree change version of this for several years |
@stopforumspam Hi, This log is quite big. Mind pointing out the exact command that is causing issues? |
It's really only one command, iset.add run 450,000 times in a large multi pipeline. The php script in the ticket that I used to generate the report. It will crash redis most of the time it runs, in libpthread it seems
…________________________________
From: Dani Tseitlin ***@***.***>
Sent: Friday, June 4, 2021 6:16:57 PM
To: danitseitlin/redis-interval-sets ***@***.***>
Cc: stopforumspam ***@***.***>; Mention ***@***.***>
Subject: Re: [danitseitlin/redis-interval-sets] Redis crashing using module (#20)
@stopforumspam<https://github.com/stopforumspam> Hi, This log is quite big. Mind pointing out the exact command that is causing issues?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#20 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AKT2M3OQWKUI4IJ2ZL6R4A3TRBVVTANCNFSM4572BRUA>.
|
I should add that the same code using zsets instead of isets will run continuously without error
…________________________________
From: Paul Lush ***@***.***>
Sent: Friday, June 4, 2021 7:21:01 PM
To: danitseitlin/redis-interval-sets ***@***.***>; danitseitlin/redis-interval-sets ***@***.***>
Cc: Mention ***@***.***>
Subject: Re: [danitseitlin/redis-interval-sets] Redis crashing using module (#20)
It's really only one command, iset.add run 450,000 times in a large multi pipeline. The php script in the ticket that I used to generate the report. It will crash redis most of the time it runs, in libpthread it seems
________________________________
From: Dani Tseitlin ***@***.***>
Sent: Friday, June 4, 2021 6:16:57 PM
To: danitseitlin/redis-interval-sets ***@***.***>
Cc: stopforumspam ***@***.***>; Mention ***@***.***>
Subject: Re: [danitseitlin/redis-interval-sets] Redis crashing using module (#20)
@stopforumspam<https://github.com/stopforumspam> Hi, This log is quite big. Mind pointing out the exact command that is causing issues?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#20 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AKT2M3OQWKUI4IJ2ZL6R4A3TRBVVTANCNFSM4572BRUA>.
|
@gavrie Any chance you have any insights of how to debug this issue? |
@danitseitlin the first step would be to create the simplest possible setup that reproduces the bug reliably. Have you succeeded in reproducing it? |
I compiled a debug build and ran it in gdb, and guess what... no crash. I'll try to replicate a crash in a debugger |
OK, a crash `Count of items: 169386 ` |
it hangs there. if you run up another terminal and issue a |
the lack of crashing was because i forgot to run |
`... Count of items before new item: 169392 Count of items: 169393 Count of items before new item: 169393 Count of items: 169394 Count of items before new item: 169394 Count of items: 169395 Count of items before new item: 169395 Count of items: 169396 Count of items before new item: 169396 Count of items: 169397` the insert process hangs here, during the ->exec() , so I issue a `Thread 1 "redis-server" received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () (gdb) backtrace #0 0x0000000000000000 in ?? () #1 0x00005555555d04d7 in rdbSaveObject (rdb=0x7fffffffd080, o=, key=0x7fffffffd000) at rdb.c:1046 #2 0x00005555555d0d39 in rdbSaveKeyValuePair (rdb=rdb@entry=0x7fffffffd080, key=key@entry=0x7fffffffd000, val=val@entry=0x7ffff74a5d90, expiretime=) at rdb.c:1110 #3 0x00005555555d14b8 in rdbSaveRio (rdb=0x7fffffffd080, error=0x7fffffffd07c, rdbflags=, rsi=0x0) at rdb.c:1261 #4 0x00005555555d3cf3 in rdbSave (filename=0x7ffff744a530 "dump.rdb", rsi=0x0) at rdb.c:1377 #5 0x00005555555d56fc in saveCommand (c=0x7ffff2f1d6c0) at rdb.c:2863 #6 0x00005555555a2126 in call (c=0x7ffff2f1d6c0, flags=15) at server.c:3713 #7 0x00005555555a2fc3 in processCommand (c=c@entry=0x7ffff2f1d6c0) at server.c:4238 #8 0x00005555555b71d0 in processCommandAndResetClient (c=c@entry=0x7ffff2f1d6c0) at networking.c:2010 #9 0x00005555555b9a4a in processInputBuffer (c=0x7ffff2f1d6c0) at networking.c:2111 #10 0x000055555565109c in callHandler (handler=, conn=0x7ffff7415200) at connhelpers.h:79 #11 connSocketEventHandler (el=, fd=, clientData=0x7ffff7415200, mask=) at connection.c:295 #12 0x000055555559a80a in aeProcessEvents (eventLoop=eventLoop@entry=0x7ffff74230a0, flags=flags@entry=27) at ae.c:427 #13 0x000055555559aa9d in aeMain (eventLoop=0x7ffff74230a0) at ae.c:487 #14 0x0000555555596ed7 in main (argc=, argv=) at server.c:6392` rdb.c:1046 `/* Then write the module-specific representation + EOF marker. */ mt->rdb_save(&io,mv->value);` sp it looks like two issues are happening, the interval set insert is hanging and then redis is crashing while the insert is hanging when it goes to save a db |
I added more debug logging and I'm now thinking that this is a bug Redis that your code has happened upon I removed the process from using a multi insert to just iterating over the CSV and everything completes processing without a crash. What are your thoughts? |
further testing with the iteration vs multi still gets segfaults in save, first with this when issuing a flush
and then the segfault when issuing a |
Thanks @stopforumspam! @danitseitlin I suggest that you try to replicate the crash and then we can take it from there. The fact that it works fine with zset leads me to think that the problem is in the module, not in Redis. My first guess would be that it's due to some resource constraint, so it may be worthwhile to use a memory profiler and see if there's a leak somewhere. |
@gavrie Thanks, i will contact you further for more help on the subject. |
if you disable save then it doesnt seem to crash, and the script will complete however if you then issue a save command, redis will segfault |
@stopforumspam Hi, is this the error:
? |
So I've replicated it easily.
And the redis dump
and rust
|
and from docker on ubuntu 20.04 running the same flush/iadd/save
|
and then built on ubuntu 20.04 using the dockerfile in the repo tested with flsuhb/iadd/set as above
and
|
and then in gdb
rdb.c line 1054
|
@stopforumspam I see, so the issue was save after setting an interval set. I will see what's up with it. Thanks |
Yes, it's saving. If you run redis empherically in memory with persistence then it runs without crashing
|
@stopforumspam Does this also reproduce when using bgsave instead of save? |
sadly yes
And the output (run from docker container)
|
Could it be because you're setting data in Redis and when it saves, it has a null/None call pointer that should be returning a string? eg and then again for loading I would add these but my Rust really is about zero |
@stopforumspam Hi, thanks for all the help! I also noticed this myself and I am working on support for rdb_save and rdb_load :) |
@stopforumspam hi, wanted to sync you i finished fixing the bug. I just need to fix the tests and i will merge the fix. |
i've just started testing, so far so good :) |
so far its looking fantastic. memory usage vs my super memory-lite implementation using zsets for ip-to-country, ip-to-asn, and is-ip-in-blacklist is reduced by about 50% usage with your iset code, and with reduced overhead for maintaining them. when you have to run redis on a 512mb VPS, that really counts. superb 👍 |
I spoke too soon. a flush/import/save works, but if you shutdown redis and then load the rdb, then save it, it segfaults `$ redis-cli
and the redis dump `$ src/redis-server redis.conf interval_set_traceinterval_set_trace: 0: redis_module::base_info_func so adding some console logging in the import loop, and it shows this when printing i Notice 37227263} having a curly bracket? i was wondering where this was coming from. I wonder if its because I'm binary packing the member (using php pack function) here, the member is i shouldnt imagine a lot of people would be using non-ascii member names but you never know, and redis has always been binary safe. maybe split the member name off and then trim the min/max scores? edit - just fixing some data as its giving me strange results on each run. I'll get a reportable command to you ASAP |
OK, I think I have it. A member with a binary encoded member that has a } in it, one that is then followed by another member with a } in it, will generate invalid JSON (in the loop
if you can, you might want to look at using msgpack libs for handling all that string processing, as having a member name with a } will break your character parsing/building. having a member name with a ' character in it will have that stripped, resulting in data integrity failure. attached in a dump that will segfault on load TL/DR - you cant have a { or } in member names |
I didn't merge the fix yet, tests are throwing some errors getting help by the original |
i cherry picked the commits in your testing branch. it saves/loads OK with ascii member names, but I havent done any serious load testing on it yet |
@stopforumspam Cool, thanks! |
`2781896:M 03 Jun 2021 02:04:06.846 # Redis 255.255.255 crashed by signal: 11, si_code: 1
2781896:M 03 Jun 2021 02:04:06.846 # Accessing address: (nil)
------ STACK TRACE ------
Backtrace:
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fbdb76423c0]
------ REGISTERS ------
2781896:M 03 Jun 2021 02:04:06.846 #
RAX:0000000000000009 RBX:00007fbdb6e22080
RCX:0000000000000000 RDX:00007ffc6c639120
RDI:00007ffc6c638e10 RSI:00007fbdb6e4d0d0
RBP:00007ffc6c6391a0 RSP:00007ffc6c638df8
R8 :0000000000000008 R9 :00007ffc6c638d00
R10:0000000000000068 R11:00007fbdb7000900
R12:00007fbdb6ea5f10 R13:00007ffc6c639120
R14:0000000000000000 R15:0000000000000000
RIP:0000000000000000 EFL:0000000000010217
CSGSFS:002b000000000033
2781896:M 03 Jun 2021 02:04:06.846 # (00007ffc6c638e07) -> 0000000000000008
2781896:M 03 Jun 2021 02:04:06.846 # (00007ffc6c638e06) -> 0000007000000101
2781896:M 03 Jun 2021 02:04:06.846 # (00007ffc6c638e05) -> 0000000000000040
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638e04) -> 0000000000001030
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638e03) -> 0000564515ebb808
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638e02) -> 00007ffc00000063
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638e01) -> 0000000000001000
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638e00) -> 00007ffc6c639120
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638dff) -> 0000000000000000
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638dfe) -> 0000000000000000
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638dfd) -> 00007fbdb6e22080
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638dfc) -> 00007ffc6c6391a0
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638dfb) -> 0000000000000009
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638dfa) -> 00007ffc6c639120
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638df9) -> 00007ffc6c63913e
2781896:M 03 Jun 2021 02:04:06.847 # (00007ffc6c638df8) -> 0000564515853477
------ INFO OUTPUT ------
Server
redis_version:255.255.255
redis_git_sha1:a049f629
redis_git_dirty:0
redis_build_id:f789c5cf2d7c81a8
redis_mode:standalone
os:Linux 5.4.0-70-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:9.3.0
process_id:2781896
process_supervised:no
run_id:1b2fc434d8343e97f3887fc1655bd3b199c64462
tcp_port:6379
server_time_usec:1622685846846416
uptime_in_seconds:106
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:12073110
executable:/usr/bin/redis-server
config_file:/etc/redis/redis.conf
io_threads_active:0
Clients
connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:48
client_recent_max_output_buffer:2665520
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
Memory
used_memory:15222824
used_memory_human:14.52M
used_memory_rss:17379328
used_memory_rss_human:16.57M
used_memory_peak:19737152
used_memory_peak_human:18.82M
used_memory_peak_perc:77.13%
used_memory_overhead:3353152
used_memory_startup:810568
used_memory_dataset:11869672
used_memory_dataset_perc:82.36%
allocator_allocated:15557712
allocator_active:16375808
allocator_resident:19271680
total_system_memory:16790601728
total_system_memory_human:15.64G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.05
allocator_frag_bytes:818096
allocator_rss_ratio:1.18
allocator_rss_bytes:2895872
rss_overhead_ratio:0.90
rss_overhead_bytes:-1892352
mem_fragmentation_ratio:1.14
mem_fragmentation_bytes:2199320
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:2542512
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:1
rdb_bgsave_in_progress:0
rdb_last_save_time:1622685787
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
Stats
total_connections_received:2
total_commands_processed:447419
instantaneous_ops_per_sec:0
total_net_input_bytes:36275763
total_net_output_bytes:9035623
instantaneous_input_kbps:0.01
instantaneous_output_kbps:12.21
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:1
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:280664
dump_payload_sanitizations:0
total_reads_processed:2218
total_writes_processed:560
io_threaded_reads_processed:0
io_threaded_writes_processed:0
Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:4c03be0c6868d319686a6f617e80115f2454c272
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:0
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
CPU
used_cpu_sys:0.557793
used_cpu_user:1.810808
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.552498
used_cpu_user_main_thread:1.806709
Modules
module:name=interval_set,ver=1,api=1,filters=0,usedby=[],using=[],options=[]
Commandstats
cmdstat_iset.del:calls=1,usec=22,usec_per_call=22.00,rejected_calls=0,failed_calls=0
cmdstat_command:calls=1,usec=1485,usec_per_call=1485.00,rejected_calls=0,failed_calls=0
cmdstat_iset.add:calls=447416,usec=1623358,usec_per_call=3.63,rejected_calls=0,failed_calls=280664
cmdstat_flushall:calls=1,usec=2009,usec_per_call=2009.00,rejected_calls=0,failed_calls=0
Errorstats
errorstat_UTF8:count=280664
Cluster
cluster_enabled:0
Keyspace
db0:keys=1,expires=0,avg_ttl=0
------ CLIENT LIST OUTPUT ------
id=3 addr=127.0.0.1:49856 laddr=127.0.0.1:6379 fd=7 name= age=59 idle=4 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=122 omem=2501488 tot-mem=2522016 events=rw cmd=iset.add user=default redir=-1
id=4 addr=127.0.0.1:49858 laddr=127.0.0.1:6379 fd=8 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=14 qbuf-free=40940 argv-mem=4 obl=0 oll=0 omem=0 tot-mem=61452 events=r cmd=save user=default redir=-1
------ CURRENT CLIENT INFO ------
id=4 addr=127.0.0.1:49858 laddr=127.0.0.1:6379 fd=8 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=14 qbuf-free=40940 argv-mem=4 obl=0 oll=0 omem=0 tot-mem=61452 events=r cmd=save user=default redir=-1
argv[0]: 'save'
------ MODULES INFO OUTPUT ------
------ FAST MEMORY TEST ------
2781896:M 03 Jun 2021 02:04:06.848 # Bio thread for job type #0 terminated
2781896:M 03 Jun 2021 02:04:06.848 # Bio thread for job type #1 terminated
2781896:M 03 Jun 2021 02:04:06.848 # Bio thread for job type #2 terminated
2781896:M 03 Jun 2021 02:04:06.848 # IO thread(tid:140452758603520) terminated
2781896:M 03 Jun 2021 02:04:06.849 # IO thread(tid:140452750210816) terminated
2781896:M 03 Jun 2021 02:04:06.849 # IO thread(tid:140452741818112) terminated
*** Preparing to test memory region 5645159e8000 (2277376 bytes)
*** Preparing to test memory region 564515ebb000 (135168 bytes)
*** Preparing to test memory region 7fbdb0a89000 (35651584 bytes)
*** Preparing to test memory region 7fbdb2c8a000 (8388608 bytes)
*** Preparing to test memory region 7fbdb348b000 (8388608 bytes)
*** Preparing to test memory region 7fbdb3c8c000 (8388608 bytes)
*** Preparing to test memory region 7fbdb448d000 (8388608 bytes)
*** Preparing to test memory region 7fbdb4c8e000 (8388608 bytes)
*** Preparing to test memory region 7fbdb6a00000 (8388608 bytes)
*** Preparing to test memory region 7fbdb729d000 (24576 bytes)
*** Preparing to test memory region 7fbdb742e000 (8192 bytes)
*** Preparing to test memory region 7fbdb7629000 (16384 bytes)
*** Preparing to test memory region 7fbdb764c000 (16384 bytes)
*** Preparing to test memory region 7fbdb76fe000 (4096 bytes)
*** Preparing to test memory region 7fbdb7854000 (8192 bytes)
*** Preparing to test memory region 7fbdb7899000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Suspect RAM error? Use redis-server --test-memory to verify it.
2782056:C 03 Jun 2021 02:04:07.906 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2782056:C 03 Jun 2021 02:04:07.906 # Redis version=255.255.255, bits=64, commit=a049f629, modified=0, pid=2782056, just started
2782056:C 03 Jun 2021 02:04:07.906 # Configuration loaded
2782056:M 03 Jun 2021 02:04:07.907 * monotonic clock: POSIX clock_gettime
.
.-
__ ''-._ _.-
.
. ''-._ Redis 255.255.255 (a049f629/0) 64 bit.-
.-```. ```\/ _.,_ ''-._ ( ' , .-` | `, ) Running in standalone mode |`-._`-...-` __...-.
-.|'_.-'| Port: 6379 |
-.._ / _.-' | PID: 2782056
-._-._
-./ .-' .-'|
-._
-.-.__.-' _.-'_.-'| |
-.-._ _.-'_.-' | https://redis.io
-._-._
-..-'.-' .-'|
-._
-.-.__.-' _.-'_.-'| |
-.-._ _.-'_.-' |
-._-._
-..-'_.-' _.-'-._
-..-' _.-'-._ _.-'
-..-'2782056:M 03 Jun 2021 02:04:07.908 # Server initialized
2782056:M 03 Jun 2021 02:04:07.909 * <interval_set> Created new data type 'IntervlSt'
2782056:M 03 Jun 2021 02:04:07.909 * Module 'interval_set' loaded from /usr/local/bin/libintervalsets.so
2782056:M 03 Jun 2021 02:04:07.910 * Loading RDB produced by version 255.255.255
2782056:M 03 Jun 2021 02:04:07.910 * RDB age 60 seconds
2782056:M 03 Jun 2021 02:04:07.910 * RDB memory usage when created 0.83 Mb
2782056:M 03 Jun 2021 02:04:07.910 * DB loaded from disk: 0.000 seconds
2782056:M 03 Jun 2021 02:04:07.910 * Ready to accept connections`
This only happens with the interval set usage. The server itself is a E5 xeon with ECC memory which tests out fine. The redis binary was compiled on Ubuntu 20.04 x64 from the 6.23 release and the git pull from Jun 2nd 2021
I've replicated it with this small PHP script that pulls in ASN, country and IP data. Most of the time the script fails to execute. running up the cli and issuing a SAVE will generate the dump
`<?php
$redis = new Redis();
$redis->connect("127.0.0.1");
function ip2long32($ip) {
return ( ip2long($ip) & 0xFFFFFFFF );
}
$handle = fopen("ip2asn-combined.tsv", "r");
if (!$handle) {
\n");
}
$redis->flushall();
$m=$redis->pipeline();
echo "Reading ip data...\n";
while (($line = fgetcsv($handle, 0 , "\t")) !== false) {
}
echo "Executing...";
$m->exec();
echo "Done...\n";
fclose($handle);`
Rerunning the script will randomly throw this exception as well
PHP Fatal error: Uncaught RedisException: MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk. Commands that may modify the data set are disabled, because this instance is configured to report errors during writes if RDB snapshotting fails (stop-writes-on-bgsave-error option). Please check the Redis logs for details about the RDB error. in iset.php:24
The text was updated successfully, but these errors were encountered: