What features are being used to make the classification?
| Type of Classification | Description | Examples |
| --- | --- | --- |
| Categorical (Nominal) | Classification of entities into particular categories. | That thing is a dog. That thing is a car. |
| Ordinal | Classification of entities in some kind of ordered relationship. | You are stronger than him. It is hotter today than yesterday. |
| Adjectival or Predicative | Classification based on some quality of an entity. | That car is fast. She is smart. |
| Cardinal | Classification based on a numerical value. | He is six feet tall. It is 25.3 degrees today. |
Categorical classification is also called nominal classification because it classifies an entity in terms of the name of the class it belongs to. This is the type of classification we focus on in this document.
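The difference between the first two rows of the table shows up once such labels become model inputs. Below is a minimal Python sketch that assumes one common convention, not something the table prescribes. Nominal values are one-hot encoded because their names carry no order. Ordinal values are mapped to ranks that preserve their order. The category lists and the helpers `one_hot` and `ordinal_rank` are hypothetical.

```python
# Hypothetical categories: nominal names have no order; ordinal ones do.
NOMINAL = ["dog", "car", "tree"]   # no category is "greater" than another
ORDINAL = ["cold", "mild", "hot"]  # listed from lowest to highest

def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector with a single 1 (no implied order)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

def ordinal_rank(value, ordered_categories):
    """Encode an ordinal value as its position, preserving the order relation."""
    return ordered_categories.index(value)

print(one_hot("car", NOMINAL))                                       # [0, 1, 0]
print(ordinal_rank("hot", ORDINAL) > ordinal_rank("mild", ORDINAL))  # True
```

Encoding the nominal values as 0, 1, 2 would let a model treat "tree" as greater than "dog". That is an ordering the data does not contain.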
Why are those features important?
Let’s imagine that you’ve landed a consulting gig with a bank that has asked you to identify customers who are likely to default on next month’s bill. Armed with the machine learning techniques that you’ve learned and practiced, you analyze the dataset your client provided and train a random forest model that achieves reasonably high accuracy. Your next task is to explain to the business stakeholders on the client’s team how you achieved these results. What would you say to them? Will they understand all the hyperparameters you tuned to arrive at your final model? How will they react when you start talking about the number of estimators and the Gini criterion of the random forest?
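The "Gini criterion" is exactly the kind of term those stakeholders would find opaque. Below is a minimal stdlib Python sketch of the quantity it refers to. Gini impurity is `1 - sum(p_k ** 2)`, where `p_k` is the fraction of samples in a tree node that belong to class `k`. A candidate split is scored by how much it lowers the size-weighted impurity of the two child nodes. This follows the standard definition, but it is an illustration, not scikit-learn's implementation. The labels "default" and "paid" and the helper names are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k**2 of a node containing `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Drop in impurity when `parent` is split into `left` and `right`,
    weighting each child by its share of the parent's samples."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children

# Hypothetical labels for a node holding four customers.
node = ["default", "default", "paid", "paid"]
print(gini_impurity(node))                          # 0.5
print(impurity_decrease(node, node[:2], node[2:]))  # 0.5 (a perfect split)
```

scikit-learn's impurity-based feature importances accumulate decreases like these, weighted by node size, over every split made on a feature.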
Although it is important to understand the inner workings of the algorithm, it is far more essential to be able to communicate the findings to an audience that may have no theoretical or practical knowledge of machine learning. Showing that the algorithm predicts well is not enough: you have to attribute the predictions to the elements of the input data that contribute to the model’s accuracy. Thankfully, scikit-learn’s random forest implementation exposes “feature importances” (the `feature_importances_` attribute), which help explain the predictive power of each feature in the dataset. However, this method has certain drawbacks. In this post we explore them, along with an alternative technique for assessing feature importance that overcomes them.
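Below is a minimal sketch of reading `feature_importances_` in scikit-learn. It uses a synthetic stand-in dataset from `make_classification`, not the bank's data. Columns 0 and 1 are informative, and the remaining three are pure noise, because `shuffle=False` keeps the informative columns first. scikit-learn's documentation notes two limitations of impurity-based importances: they are derived from training-set statistics, and they can favor high-cardinality features. Whether these are the drawbacks this post has in mind is an assumption. Permutation importance on held-out data is shown only as one widely used alternative. It is not necessarily the technique described here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 2 informative columns (0 and 1) followed by 3 noise columns.
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=2, n_redundant=0,
    n_repeated=0, shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The hyperparameters mentioned above: number of trees and the split criterion.
model = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: mean decrease in Gini impurity over the trees,
# computed from the training data and normalized to sum to 1.
impurity_importances = model.feature_importances_

# One common alternative: permutation importance on held-out data, i.e. the
# mean drop in test accuracy when a single column is randomly shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm_importances = perm.importances_mean

for i, (imp, pimp) in enumerate(zip(impurity_importances, perm_importances)):
    print(f"feature {i}: impurity={imp:.3f}  permutation={pimp:.3f}")
```

Permutation importance is measured on data the model did not train on, so noise columns tend to score near zero there. Impurity-based scores are not measured that way, so noise columns can still receive some importance.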
What does that say about the problem domain?
A problem domain is the area of expertise or application that needs to be examined to solve a problem. Defining a problem domain means looking only at the topics relevant to the problem and excluding everything else. For example, when developing a system to measure good practice in medicine, the carpet designs in hospitals would not be included in the problem domain. In this example, the domain covers only the relevant topics within the delimited area of interest: medicine. This points to a limitation of an overly specific, or overly bounded, problem domain. Someone may think they are interested in medicine and not interior design, yet a better solution may exist outside the problem domain as it was initially conceived. For example, IDEO researchers noticed that hospital patients spent a great deal of time staring at acoustic ceiling tiles, which "became a symbol of the overall ambiance: a mix of boredom and anxiety from feeling lost, uninformed, and out of control."