The aim is to create a knowledge database around the CRISP (DM/ML(Q)) framework by modelling the CRISP phases as a graph in which each action becomes a node. Best practices will then be collected and attached to each action node as further nodes. They are planned to be gathered both by asking industry experts and by scraping and analysing scientific papers.
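A minimal sketch of how such a graph could be represented, assuming plain Python with no graph database. The class `KnowledgeGraph`, the node kinds, the `origin` attribute (recording whether a best practice came from an expert or a paper) and the relation labels `HAS_ACTION` / `HAS_BEST_PRACTICE` are hypothetical choices for illustration, not an existing schema or library API. The example practice text is invented.

```python
from collections import defaultdict
from typing import Optional


class KnowledgeGraph:
    """Minimal directed graph with labeled edges (hypothetical schema, not a graph-DB API)."""

    def __init__(self) -> None:
        # node name -> attributes, e.g. {"kind": "best_practice", "origin": "expert"}
        self.nodes: dict[str, dict[str, str]] = {}
        # edge source -> list of (relation, target) pairs
        self.edges: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, name: str, kind: str, origin: Optional[str] = None) -> None:
        """Add a phase, action or best-practice node; `origin` records provenance."""
        self.nodes[name] = {"kind": kind}
        if origin is not None:
            self.nodes[name]["origin"] = origin

    def add_edge(self, source: str, relation: str, target: str) -> None:
        """Connect two existing nodes; unknown endpoints raise instead of dangling."""
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name!r}")
        self.edges[source].append((relation, target))

    def neighbors(self, source: str, relation: str) -> list[str]:
        """Targets reachable from `source` via edges labeled `relation`."""
        return [t for r, t in self.edges.get(source, []) if r == relation]


kg = KnowledgeGraph()
phase, action = "Business Understanding", "Define minimum acceptable performance"
practice = "Agree on KPI thresholds with stakeholders"  # hypothetical example entry
kg.add_node(phase, "phase")
kg.add_node(action, "action")
kg.add_node(practice, "best_practice", origin="expert")
kg.add_edge(phase, "HAS_ACTION", action)
kg.add_edge(action, "HAS_BEST_PRACTICE", practice)
print(kg.neighbors(action, "HAS_BEST_PRACTICE"))
# ['Agree on KPI thresholds with stakeholders']
```

Keeping nodes and labeled edges separate means the same structure could later be loaded into a property-graph database without changing the data model; a single best practice can also be linked to several actions.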
- Business Needs
- Purpose and Success Criteria from a business point of view
- Define the minimum acceptable level of performance to meet the business goals
- Objective definition of ML success through KPI(s)
- Robustness
- Scalability
- Explainability
- Resource Demand
- In terms of cost and time needed to collect enough consistent data
- Since data is collected iteratively, (planned) modifications of the dataset should be documented.
- Expert knowledge about the datasets, such as expected value ranges of features and the maximum number of missing values, serves as a guide to identify non-plausible data.
- Initial data, added data and production data must be checked against these requirements.
- Filter Methods
- Wrapper Methods
- Embedded Methods
- It would be beneficial to have a domain expert review the feature selection again.
- Discarding features/samples should be well documented and strictly based on objective quality criteria.
- Robustness
- Explainability
- Scalability
- Resource Demand
- Model Complexity
- Experimental Documentation
- Develop a plan to validate performance.
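The data requirements listed above (expected value ranges of features, a maximum number of missing values, and checking initial, added and production data against them) lend themselves to an automated check. A minimal sketch in plain Python, assuming each dataset arrives as a list of dicts and that missing values are `None` or NaN; `FeatureRequirement`, `check_dataset` and the feature `temperature_c` are hypothetical names, and the limits are illustrative, not expert values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureRequirement:
    """Expert-defined plausibility rule for one feature (hypothetical structure)."""
    min_value: float
    max_value: float
    max_missing: int  # maximum number of missing values tolerated


def is_missing(value: Optional[float]) -> bool:
    """Treat both None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_dataset(rows: list[dict[str, Optional[float]]],
                  requirements: dict[str, FeatureRequirement]) -> list[str]:
    """Return human-readable violations; an empty list means every rule passed."""
    violations = []
    for feature, req in requirements.items():
        values = [row.get(feature) for row in rows]  # absent keys count as missing
        n_missing = sum(is_missing(v) for v in values)
        if n_missing > req.max_missing:
            violations.append(f"{feature}: {n_missing} missing values (max {req.max_missing})")
        for i, v in enumerate(values):
            if not is_missing(v) and not req.min_value <= v <= req.max_value:
                violations.append(
                    f"{feature}: row {i} value {v} outside [{req.min_value}, {req.max_value}]")
    return violations


requirements = {"temperature_c": FeatureRequirement(min_value=-40.0, max_value=125.0, max_missing=1)}
rows = [{"temperature_c": 21.5}, {"temperature_c": None}, {"temperature_c": 300.0}]
print(check_dataset(rows, requirements))
# ['temperature_c: row 2 value 300.0 outside [-40.0, 125.0]']
```

Running the same function on initial, added and production data keeps the checks consistent across the lifecycle, and the returned messages can be stored as part of the documentation of dataset modifications.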
1. Non-stationary data distributions / data drift
2. Degradation of hardware
3. System updates

1. Monitor all input signals and compare them to the training data to catch changes in the input data.
2. Determine actions for anomalies in the input.
3. Monitor the history of performance.
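Monitoring input signals against the training data needs a concrete drift statistic. The sketch below swaps in the Population Stability Index (PSI), a common choice that this outline does not prescribe; the function names, the 10 equal-width bins, the sensor names and the 0.2 alert threshold are assumptions (0.2 is a frequently quoted rule of thumb, not a standard). Features it flags are candidates for the actions defined for input anomalies.

```python
import math


def psi(train: list[float], live: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `live` relative to `train`.

    Bins are equal-width over the training range; live values outside that
    range are clipped into the outermost bins.
    """
    lo, hi = min(train), max(train)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the training feature is constant

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(train), shares(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drifted_features(train: dict[str, list[float]], live: dict[str, list[float]],
                     threshold: float = 0.2) -> list[str]:
    """Names of features whose live distribution drifted beyond `threshold`."""
    return [name for name in train if psi(train[name], live[name]) > threshold]


train = {"sensor_a": [float(i % 100) for i in range(1000)],
         "sensor_b": [float(i % 100) for i in range(1000)]}
live = {"sensor_a": [float(i % 100) + 50 for i in range(1000)],  # shifted by +50
        "sensor_b": [float(i % 100) for i in range(1000)]}       # unchanged
print(drifted_features(train, live))
# ['sensor_a']
```

Logging the PSI value per feature on every monitoring run would also build up a drift history that can be read alongside the history of model performance.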