Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New p4orch development changes #3066

Merged
merged 2 commits into from
May 15, 2024
Merged

Conversation

mint570
Copy link
Contributor

@mint570 mint570 commented Feb 29, 2024

New p4orch development changes.

This is the first PR for upstreaming the recent P4Orch changes (has been a while since the last time).
The main changes include:

  1. Add new status code SWSS_RC_NOT_EXECUTED
  2. Enable redis pipeline in response publisher for P4Orch (performance improvement)
  3. Enable background thread of write to db in response publisher for P4Orch (performance improvement)
  4. P4Orch writes to APPL DB instead of APPL STATE DB for responses (prepare for the zmq change, APPL DB will be used as APPL STATE DB in P4Orch for performance improvement)

More details on the P4Orch response path changes (changes are only done on P4Orch, other orchs has no behavior changes):

  1. Enable P4Orch to use redis pipeline in response publisher.
    The redis pipeline feature in response publisher is done by the SONiC communicate to improve performance. We enabled P4Orch to use the pipeline feature. This will increase performance in batched request processing. This change has no behavior changes.

  2. Enable background thread in response publisher to write to DB for P4Orch.
    This change will move the APPL STATE DB write into a background thread, which will improve performance. This change has no behavior changes. The original change was done for APPL STATE DB since the response path writes into the APPL STATE DB. With the next change, APPL STATE DB is deprecated and replaced with APPL DB. So the overall change is that the P4Orch response path will write into APPL DB in a background thread.

  3. P4Orch writes to APPL DB instead of APPL STATE DB in response path.
    The APPL STATE DB is supposed to have successful entries programmed in the ASIC, while the APPL DB has the intend. However, for P4 tables, P4RT needs to clean up APPL DB to match with APPL STATE DB to keep them in sync when a request fails. This is done for warmboot purpose as we don't want to program the failed intend during warmboot. So for P4RT, the APPL STATE DB is redundant information. This change will replace APPL STATE DB with APPL DB. In later PR, we will also upstream the change for P4Orch to enable zmq. This will reduce half of the DB write operations. This change will require the later P4RT change to be in sync.

@mint570 mint570 requested a review from prsunny as a code owner February 29, 2024 04:47
@mint570
Copy link
Contributor Author

mint570 commented Feb 29, 2024

This PR requires sonic-net/sonic-swss-common#828

@mint570 mint570 force-pushed the new_p4orch_upstream branch from 090bf88 to 048cc63 Compare March 6, 2024 03:11
@mint570 mint570 force-pushed the new_p4orch_upstream branch from 048cc63 to 75ac844 Compare March 19, 2024 21:21
@mint570 mint570 mentioned this pull request Mar 20, 2024
@mint570 mint570 force-pushed the new_p4orch_upstream branch from 75ac844 to ca95fe9 Compare April 1, 2024 22:54
@mint570 mint570 force-pushed the new_p4orch_upstream branch 2 times, most recently from f263fcf to df4e004 Compare April 9, 2024 00:42
@@ -64,26 +64,52 @@ void RecordResponse(const std::string &response_channel, const std::string &key,

} // namespace

ResponsePublisher::ResponsePublisher(bool buffered)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide the details of response_publisher in the description? What was the behavior before and whats the new change? Is it breaking any previous implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR description.

@prsunny
Copy link
Collaborator

prsunny commented Apr 15, 2024

Why change the return path to APP_DB instead of APPL_STATE_DB?

@mint570
Copy link
Contributor Author

mint570 commented Apr 15, 2024

Why change the return path to APP_DB instead of APPL_STATE_DB?

I updated the PR description.
This is to prepare for the zmq change that we are going to upstream. P4Orch will only update APPL DB (in a background thread) on response path. APPL STATE DB is redundant for P4Orch (as we need to keep them in sync for warmboot purpose).

@prsunny
Copy link
Collaborator

prsunny commented Apr 22, 2024

@qiluo-msft , could you please review the response publish section? As per description there is no change to other orch.

@prsunny prsunny requested a review from qiluo-msft April 22, 2024 17:21
@mint570 mint570 force-pushed the new_p4orch_upstream branch 2 times, most recently from 8066625 to 03f0722 Compare April 24, 2024 17:40
@mint570
Copy link
Contributor Author

mint570 commented Apr 24, 2024

Got vs test failure on fabric port test.

@mint570
Copy link
Contributor Author

mint570 commented Apr 25, 2024

Got vs test failure on fabric port test.

Looks like this is not related to my change. I tried a test PR with no change it still failed the vs test.

@mint570 mint570 force-pushed the new_p4orch_upstream branch from 03f0722 to 4e81549 Compare April 26, 2024 16:32
@prsunny
Copy link
Collaborator

prsunny commented Apr 26, 2024

@mint570 , there has been a VS issue in buildimage and is fixed now. I'll trigger all PRs to rerun

@mint570 mint570 force-pushed the new_p4orch_upstream branch 3 times, most recently from 9c01751 to fa0efe5 Compare April 30, 2024 23:32
@qiluo-msft
Copy link
Contributor

@bocon13, could you please review the response publish section?

// Thread to write to DB.
std::unique_ptr<std::thread> m_update_thread;
std::queue<entry> m_queue;
mutable std::mutex m_lock;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutable

why mutable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a common practice for mutex object.

The key word "mutable" means that the variable can be changed in a const method.
If we have a method to read a variable that is protected by a mutex, it is nature to declare the method as const since it only does read operation. But the mutex object needs to be mutable for the read method to get the lock.

In this case, we might not really need it to be mutable. But we should follow the common practice.

Copy link
Contributor

@qiluo-msft qiluo-msft May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might not really need it to be mutable -> let's just remove it.

Thanks for the explanation! (not a blocking issue)

@@ -57,8 +61,29 @@ class ResponsePublisher : public ResponsePublisherInterface
void setBuffered(bool buffered);

private:
struct entry
Copy link
Contributor

@qiluo-msft qiluo-msft May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entry

Where can I find HLD? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just internal implementation details. We don't have HLD for this.

The overall design is to put the DB update operation into a different thread for the response publisher.

In the detailed implementation here, we use a FIFO queue to store the DB update events. The main thread will queue up the event into the queue, and the "DB update thread" will read from the queue and process the DB update.
The "entry" struct here is the "DB update event".

Let me know if you need more details on this.

}
if (e.flush)
{
m_pipe->flush();
Copy link
Contributor

@qiluo-msft qiluo-msft May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_pipe

You should not share DBConnector between threads. It is not thread-safe. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pipe object is only used in the "DB update thread". It is not used in the main thread beside constructor.

The DB write operations are completed handled by the "DB update thread" if thread mode is enabled.
(If thread mode is disabled, the main thread will update the DB. There is no "DB update thread" in that case.)

@mint570 mint570 force-pushed the new_p4orch_upstream branch 2 times, most recently from f3af0da to 9155503 Compare May 2, 2024 20:47
@mint570 mint570 force-pushed the new_p4orch_upstream branch 3 times, most recently from fb27335 to 883fd32 Compare May 13, 2024 16:38
mint570 added 2 commits May 14, 2024 16:42
Change-Id: I867ce9d1d4a641b35493fb81d60813083d4404f9
Change-Id: Ib028780e726a9480e6049514bc776ccf27b84496
@mint570 mint570 force-pushed the new_p4orch_upstream branch from 883fd32 to a43e01a Compare May 14, 2024 23:42
@@ -241,26 +235,6 @@ def test_AclRulesAddUpdateDelPass(self, dvs, testlog):
assert status == True
util.verify_attr(fvs, attr_list)

# query application state database for ACL tables
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since its deleted, where is the check for APPL_STATE_DB entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P4Orch will no longer write to APPL_STATE_DB, so there is no APPL_STATE_DB entries for P4RT table.
(P4Orch writes to APPL_DB instead, which is no change at the moment as APPL_DB is already written when orchagent pops the request entries.)

But P4Orch will still send the response to the Redis channel, which is checked and verified in util.verify_response().
And the test also checks the APPL_DB entries. For now it doesn't mean much. But in the future when we upstream the ZMQ changes, P4Orch will not write to APPL_DB in pop; it will write to APPL_DB in response path instead. At that time the P4RT table in APPL_DB will represent the state (not the intent).

@qiluo-msft
Copy link
Contributor

LGTM.

@prsunny prsunny merged commit c36333c into sonic-net:master May 15, 2024
17 checks passed
@mint570 mint570 deleted the new_p4orch_upstream branch May 15, 2024 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants