Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Multi DB #20305

Open
wants to merge 41 commits into
base: master
Choose a base branch
from
Open

Enable Multi DB #20305

wants to merge 41 commits into from

Conversation

Pan-XT
Copy link

@Pan-XT Pan-XT commented Sep 19, 2024

Why I did it

Enable Multi DB

How I did it

modify [dockers/docker-database/Dockerfile.j2] and add [multi_database_config.json.j2]
modify [dockers/docker-database/critical_processes.j2] and [dockers/docker-database/supervisord.conf.j2] to resolve the formatting issues caused by multiple Redis processes.
modify [dockers/docker-database/docker-database-init.sh] to enable Multi DB if /etc/sonic/enable_multidb exists and ensure that the redis service has the necessary permissions to access /var/lib/$inst
modify [Makefile.work] [slave.mk] [/files/build_templates/sonic_debian_extension.j2 ] for setting ENABLE_MULTIDB=y when building SONiC image.

@eddieruan-alibaba
Copy link
Collaborator

Double committed to Phoenixwing eddieruan-alibaba@9f9f4d8

and verified on vSONiC

admin@PE3:~$ ps aux | grep redis
message+ 1011 1.4 0.2 86764 22372 pts/0 Sl 04:30 0:31 /usr/bin/redis-server 127.0.0.1:6379
message+ 1012 0.6 0.2 77548 22316 pts/0 Sl 04:30 0:14 /usr/bin/redis-server 127.0.0.1:6380
message+ 1015 0.5 0.2 64748 17696 pts/0 Sl 04:30 0:11 /usr/bin/redis-server 127.0.0.1:6381
message+ 1016 1.9 0.3 71404 22532 pts/0 Sl 04:30 0:43 /usr/bin/redis-server 127.0.0.1:6382
message+ 1018 0.2 0.2 62188 16456 pts/0 Sl 04:30 0:05 /usr/bin/redis-server 127.0.0.1:6383

@eddieruan-alibaba
Copy link
Collaborator

/azpw ms_conflict

@eddieruan-alibaba
Copy link
Collaborator

@StormLiangMS, @liushilongbuaa Can you trigger the following command?

Github Branch not ready
Please wait a few minutes to run again!
'/azpw ms_conflict'

Copy link
Collaborator

@eddieruan-alibaba eddieruan-alibaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is enabled in Phoenix wing and verified in Phoenix wing's daily sanity.

http://phoenixwing.com.cn/vsonic

@eddieruan-alibaba
Copy link
Collaborator

Validated with phoenix wing vsonic image

admin@PE3:$ ls -l /etc/sonic/enable_multidb
-rw-r--r-- 1 root root 0 Oct 11 01:46 /etc/sonic/enable_multidb
admin@PE3:
$ ps aux | grep redis
message+ 990 0.8 0.5 86764 39228 pts/0 Sl 06:28 7:06 /usr/bin/redis-server 127.0.0.1:6379
message+ 991 0.8 0.2 71404 21248 pts/0 Rl 06:28 6:50 /usr/bin/redis-server 127.0.0.1:6380
message+ 992 0.3 0.2 64748 17384 pts/0 Sl 06:28 2:28 /usr/bin/redis-server 127.0.0.1:6381
message+ 994 2.9 0.3 71404 24000 pts/0 Sl 06:28 23:44 /usr/bin/redis-server 127.0.0.1:6382
message+ 996 0.1 0.2 62188 16904 pts/0 Sl 06:28 1:22 /usr/bin/redis-server 127.0.0.1:6383
admin 111122 0.0 0.0 6868 648 pts/0 S+ 19:51 0:00 grep redis

@kperumalbfn
Copy link
Contributor

kperumalbfn commented Nov 4, 2024

@saiarcot895 @kcudnik could you review this PR

@kcudnik
Copy link
Contributor

kcudnik commented Nov 5, 2024

@saiarcot895 @kcudnik could you review this PR

Can you fix build errors ?

@eddieruan-alibaba
Copy link
Collaborator

@saiarcot895 @kcudnik could you review this PR

Can you fix build errors ?

The build failure was due to yesterday's connection outage

2024-11-04T08:09:16.9633070Z get_url_version https://sonicstorage.blob.core.windows.net/public/fips/bookworm/1.4.3-1/armhf/symcrypt-openssl_1.4.3-1_armhf.deb failed
2024-11-04T08:09:16.9633607Z make[1]: *** [Makefile:12: /sonic/target/debs/bookworm/symcrypt-openssl_1.4.3-1_armhf.deb] Error 1

Could someone help to trigger a rebuild? @kperumalbfn @kcudnik

},
"redis1": {
"hostname" : "{{HOST_IP}}",
"port" : 6380,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The port selection conflicts with both the chassis DB and Smart Switch DB service and must be changed. Port 6380 is used by the chassis Redis server, while ports 6381-6388 are assigned to the DPU databases.

Copy link
Collaborator

@eddieruan-alibaba eddieruan-alibaba Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oleksandrivantsiv where does chassis DB's port get defined?

This multi-db approach is enabled only on the platform which needs performance and controlled via a compile flag. ENABLE_MULTIDB=y The detail information could be found at https://lists.sonicfoundation.dev/g/sonic-wg-routing/wiki/37961

During the discussion in routing WG, we feel there is NO use case to enable it for chassis based platform for the following reasons.

  1. The scale for chassis based approach is limited. MSFT uses it as T1, and Alibaba uses it as gateway. The benefit is less.
  2. Even chassis case wants to use it later. It may only wants to enable it for main switch asic, not enable all 5 for each DPU. That means we need to create a new j2 for that case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Guahan on this one. 6380 will be kept for Chassis DB. It is not easy to change chassis's port number.

We use 6379, 6378, 6377, ... a.k.a going down for new redis instances.

@Pan-XT can you update your diff?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update redis1,reids2,redis3,redis4 port number to 6378,6377,6376,6375

@@ -31,6 +31,8 @@ dependent_startup=true
{% if INSTANCES %}
{% for redis_inst, redis_items in INSTANCES.items() %}
{%- if redis_inst != 'remote_redis' %}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest remove unnecessary change in this file

Copy link
Author

@Pan-XT Pan-XT Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is the same as #20545 , so I remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#Closed

"unix_socket_path" : "/var/run/redis{{DEV}}/redis.sock",
"persistence_for_warm_boot" : "yes"
{% if DATABASE_TYPE is defined and DATABASE_TYPE != "" %}
,"database_type": "{{DATABASE_TYPE}}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find code in sonic-swss-common to parse and handle 'database_type', is there a PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I search that PR, not find any code read "database_type"
This attribute is an optional attribute, and this file need be parsed by SonicDBConfig::parseDatabaseConfig
https://github.com/sonic-net/sonic-swss-common/blob/master/common/dbconnector.cpp

Otherwise, sonic-swss-common lib can't handle multi-DB, so please make sure sonic-swss-common also change to handle it.

Copy link
Author

@Pan-XT Pan-XT Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liuh-80

This field was added via
9f08f88
by @Pterosaur

It is current SONiC behavior
https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-database/database_config.json.j2#L10

Can you help to check with him to see what the need for this field?

"DPU_APPL_DB" : {
"id" : 15,
"separator": ":",
"instance" : {% if include_remote_db %} "remote_redis" {% else %} "redis" {% endif %},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest define a variable for all these dupe template

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the multi_database_config.json.j2 is consistent with the database_config.json.j2, except for the added content related to redis1,redis2,redis3,redis4. Defining a variable for all these dupe templates will be implemented in future modifications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#Closed

@@ -125,6 +129,12 @@ do
else
echo -n > /var/lib/$inst/dump.rdb
fi

chown -R redis:redis /var/lib/$inst
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments to this file, help other understand.
Also, this seems not related with enable multi-DB, can this move to another PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the necessary permissions for enable_multidb, see /etc/supervisor/conf.d/supervisord.conf.j2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Pan-XT , can you add your comments to this file? that will help other understand the propose of this code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,I add commnets “the Redis process is operating under the 'redis' user in supervisord and make redis user own /var/lib/$inst inside db container.”

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#Closed

@@ -1,5 +1,6 @@
{% if INSTANCES %}
{% for redis_inst, redis_items in INSTANCES.items() %}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this unnecessary change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no blank line here, all program:{{ redis_inst }} of critical_processes will be concentrated on one line, which will cause subsequent regular expression matching to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does changing {%- endfor %} to {% endfor %} work? That will make the intention a bit clearer.

and revert chmod -f 0760 $sock_file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants