Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature][Alert] Support PagerDuty Plugin && Alert module judging strategy #7992

Closed
3 tasks done
qingwli opened this issue Jan 13, 2022 · 11 comments · Fixed by #8636
Closed
3 tasks done

[Feature][Alert] Support PagerDuty Plugin && Alert module judging strategy #7992

qingwli opened this issue Jan 13, 2022 · 11 comments · Fixed by #8636
Assignees
Labels
discussion discussion feature new feature

Comments

@qingwli
Copy link
Member

qingwli commented Jan 13, 2022

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Hi, All.

We want to add an alert plugin named PagerDuty for DolphinScheduler. And we would like to contribute to the community.

There is one question that needs to be discussed.

There is two alert strategies success & fail for process or task. One record will be inserted in t_ds_alert when triggered. And the record doesn't contain the alert strategy. Other alert plugins don't care about alert strategy, they all send the title and content to where they are configured.

And there is a param named severity in PagerDuty API, which in the official documentation means :

The perceived severity of the status the event is describing with respect to the affected system. This can be critical, error, warning or info.

We think that when somebody configured the alert strategy of process or task is success, it means when the process or task succeeded, they just want a message, this situation should use severity: info calls the PagerDuty API.

When they configured the strategy as fail, calling the PagerDuty Api with critical or error is better.

But if support this case, we need to change the code. Add a column in the t_ds_alert table and change the logic insert the strategy field in this table when the alert strategy is triggered.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@qingwli qingwli added feature new feature Waiting for reply Waiting for reply labels Jan 13, 2022
@github-actions
Copy link

Hi:

@CalvinKirs
Copy link
Member

Hi, I have a question, is this a plugin? and if so, what is its main function on the plugin side? Or is it just that the granularity of alert services is more detailed?

@CalvinKirs CalvinKirs added discussion discussion and removed Waiting for reply Waiting for reply labels Jan 13, 2022
@qingwli
Copy link
Member Author

qingwli commented Jan 13, 2022

It's just an alert plugin that has more detailed granularity.

@CalvinKirs
Copy link
Member

It's just an alert plugin that has more detailed granularity.

Do we need to specify it when the task is executed? This granularity seems to be the general granularity of alert, right?

@qingwli
Copy link
Member Author

qingwli commented Jan 13, 2022

DolphinScheduler has specified the refined notification strategy when the task is executed.
We want to bind DolphinScheduler to PagerDuty's granularity.
DolphinScheduler's success -> PagerDuty's info
DolphinScheduler's fail -> PagerDuty's critical or error

@qingwli
Copy link
Member Author

qingwli commented Jan 17, 2022

There is a situation in our scenario. Defined a task and when the task succeeds, send a message to the WeChat group, and call the owner via PagerDuty when the task fails. Call different notification plug-ins with different task statuses.

DolphinScheduler is very suitable for this kind of scenario. It relies on saving the alert strategy in t_ds_alert when triggered like above.

1. Here is the design:

1. Add alert_strategy in t_ds_alert table

alert_strategy tinyint(4) NOT NULL COMMENT 'Alert strategy: 0 process is successfully, 1 process is failed'.

Save the strategy when the alert is triggered.

2. Add alert_strategy in t_ds_alert_plugin_instance table

alert_strategy tinyint(4) NOT NULL COMMENT 'Alarm strategy: 1 process is sent successfully, 2 process is sent failed, 3 process is sent successfully and all failures are sent'.

Add strategy choice when creating alarm instance. The default value is 3 processes is sent successfully and all failures are sent. For compatibility with older versions.

3. Change the code to bind the alert alarm instance strategy to process strategy

Bind alarm instance's strategy to t_ds_process_instance's warning_type and t_ds_schedules's warning_type.

When a process with a configured notification strategy triggers an alert. A new record will insert in t_ds_alert table saving the triggered strategy, whether the process is successful or failure. The configured warning group will be triggered. The alarm instance will be triggered one by one.

And there is a new logic. Only process's strategy is matched alarm instance's strategy. The alert will be sent, otherwise, the alert will be ignored.

4. For example

When the process's warning_type is 1 process is sent successfully.

And the alarm group has three alarm instances a & b & c. The a's alert_strategy is 1 process is sent successfully. The b's alert_strategy is 2 process is sent failed. The c's alert_strategy is 3 process is sent successfully and all failures are sent.

  • When the process is successful, the alarm instance a and c will send, the alarm instance b will ignore.
  • When the process fails, the alarm instances b and c will send, the alarm instance a will ignore.

2. Here is the plan:

We want to part this solution into three-part.

  1. Add PargerDuty plugin and call API with severity: critical as above
  2. Add alert_strategy column in t_ds_alert table & Save the alert strategy when triggered
  3. Add alert_strategy column in t_ds_alert_plugin_instance table & Change the logic to match strategy

@qingwli qingwli changed the title [Feature][Alter] Support PagerDuty Plugin [Feature][Alert] Support PagerDuty Plugin Jan 17, 2022
@qingwli qingwli changed the title [Feature][Alert] Support PagerDuty Plugin [Feature][Alert] Support PagerDuty Plugin && Alert module judging strategy Jan 17, 2022
@CalvinKirs
Copy link
Member

There is a situation in our scenario. Defined a task and when the task succeeds, send a message to the WeChat group, and call the owner via PagerDuty when the task fails. Call different notification plug-ins with different task statuses.

DolphinScheduler is very suitable for this kind of scenario. It relies on saving the alert strategy in t_ds_alert when triggered like above.

1. Here is the design:

1. Add alert_strategy in t_ds_alert table

alert_strategy tinyint(4) NOT NULL COMMENT 'Alert strategy: 0 process is successfully, 1 process is failed'.

Save the strategy when the alert is triggered.

2. Add alert_strategy in t_ds_alert_plugin_instance table

alert_strategy tinyint(4) NOT NULL COMMENT 'Alarm strategy: 1 process is sent successfully, 2 process is sent failed, 3 process is sent successfully and all failures are sent'.

Add strategy choice when creating alarm instance. The default value is 3 processes is sent successfully and all failures are sent. For compatibility with older versions.

3. Change the code to bind the alert alarm instance strategy to process strategy

Bind alarm instance's strategy to t_ds_process_instance's warning_type and t_ds_schedules's warning_type.

When a process with a configured notification strategy triggers an alert. A new record will insert in t_ds_alert table saving the triggered strategy, whether the process is successful or failure. The configured warning group will be triggered. The alarm instance will be triggered one by one.

And there is a new logic. Only process's strategy is matched alarm instance's strategy. The alert will be sent, otherwise, the alert will be ignored.

4. For example

When the process's warning_type is 1 process is sent successfully.

And the alarm group has three alarm instances a & b & c. The a's alert_strategy is 1 process is sent successfully. The b's alert_strategy is 2 process is sent failed. The c's alert_strategy is 3 process is sent successfully and all failures are sent.

  • When the process is successful, the alarm instance a and c will send, the alarm instance b will ignore.
  • When the process fails, the alarm instances b and c will send, the alarm instance a will ignore.

2. Here is the plan:

We want to part this solution into three-part.

  1. Add PargerDuty plugin and call API with severity: critical as above
  2. Add alert_strategy column in t_ds_alert table & Save the alert strategy when triggered
  3. Add alert_strategy column in t_ds_alert_plugin_instance table & Change the login to match strategy

+1, Good job

@lenboo
Copy link
Contributor

lenboo commented Jan 17, 2022

@liqingwang Good job!

Add alert_strategy in t_ds_alert table

According to my understanding, this field should be "process/task state"?
When process/task finish, alert can only be triggered once, and 't_ds_alert' will only have one data about this process/task.

  1. Change the code to bind the alert alarm instance strategy to process strategy
    Bind alarm instance's strategy to t_ds_process_instance's warning_type and t_ds_schedules's > warning_type.

When a process with a configured notification strategy triggers an alert. A new record will insert > in t_ds_alert table saving the triggered strategy, whether the process is successful or failure. The > configured warning group will be triggered. The alarm instance will be triggered one by one.

And there is a new logic. Only process's strategy is matched alarm instance's strategy. The alert > will be sent, otherwise, the alert will be ignored._

The alert instances can be sent according to state and strategy.
At the same time, we need to consider not only success and failure, but also others, such as pause and stop, and possible increase in the future

@qingwli
Copy link
Member Author

qingwli commented Jan 18, 2022

@lenboo Thanks for your reply!

Add alert_strategy in t_ds_alert table
According to my understanding, this field should be "process/task state"?

Yes, you are right. Maybe change the COMMENT to below is more clear.
alert_strategy tinyint(4) NOT NULL COMMENT 'Alert strategy: 0 process/task is successfully, 1 process/task is failed'.

When process/task finish, alert can only be triggered once, and 't_ds_alert' will only have one data about this process/task.

That's right. Only one record will be inserted in 't_ds_alert' when process/task is finished. But process/task binds the alert group, and multiple instances of multiple alert methods can be created in an alert group. The design document is here. The source code is here.

At the same time, we need to consider not only success and failure, but also others, such as pause and stop, and possible increase in the future

Yes, I agree with you. And I think we can do step-by-step iterations. In the further, we can support more scenes, such as pause and stop you said, and maybe online and offline. The current architecture is supported.

@qingwli
Copy link
Member Author

qingwli commented Jan 19, 2022

The first part is ready to be reviewed. #8120

@qingwli
Copy link
Member Author

qingwli commented Mar 1, 2022

Hi All.

I have finished the code and the design needs to be changed a bit.

1. Add warning_type in t_ds_alert table

ALTER TABLE `t_ds_alert` ADD COLUMN `warning_type` tinyint(4) DEFAULT '2' COMMENT '1 process is successfully, 2 process/task is failed';

Save the warning type when the alert is triggered. And the default value is 2.

The reason why use warning_type because of we have this enum warning_type in the code, and we don't need to create the new enum.

And in the future when we want to add new alert situation we just needs to change one place code.

2. Don't need to add column in t_ds_alert_plugin_instance table

I add a new input param named warningType in AlertPluginManager class. And installPlugin function will add this params to all plugins.

Has three choice, success fail all. The default value is all.

The judging strategy is same as above, when the plugin params has no warningType field and the default value is all to compatible with historical logic.

And I merged Plan 2 Save the alert strategy when triggered & Plan 3 Change the login to match strategy.

Please to be reviewed. #8636

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion discussion feature new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants