I received a task asking me to analyze data for inflammatory bowel disease (IBD) patients.
The data contains information about 33K patients. The data columns are as follows:
- pt_ID: unique ID for each patient
- gender: integer value indicating the gender of each patient
  - 1 means male
  - 2 means female
- diagnosis: integer value indicating the type of IBD
  - 1 means Crohn's
  - 2 means UC
  - 3 means Unspecified
  - 4 means Other
- hosp_admin: integer value indicating whether the patient was recorded in a hospital
  - 1 means Yes
  - 2 means No
  - 3 means Unknown
- age_band_jan1: the age band of each patient
- bmi: the body mass index (BMI) of each patient
- smoker: whether the patient is a smoker
- past_smoker: whether the patient was previously a smoker
- diet_preference: which diet the patient follows
- cancer: whether the patient has cancer
- heart_disease: whether the patient has heart disease
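
The integer codes listed above can be turned into readable labels before any grouping. Here is a minimal sketch in plain Python, assuming each record is a dict keyed by the column names above. The sample rows are made up for illustration, and `decode_row` and `CODEBOOK` are hypothetical names, not part of the task:

```python
# Code-to-label mappings taken from the column descriptions above.
GENDER = {1: "Male", 2: "Female"}
DIAGNOSIS = {1: "Crohn's", 2: "UC", 3: "Unspecified", 4: "Other"}
HOSP_ADMIN = {1: "Yes", 2: "No", 3: "Unknown"}

CODEBOOK = {"gender": GENDER, "diagnosis": DIAGNOSIS, "hosp_admin": HOSP_ADMIN}


def decode_row(row):
    """Return a copy of row with coded columns replaced by their labels.

    Codes that are not in the codebook are left unchanged.
    """
    decoded = dict(row)
    for column, mapping in CODEBOOK.items():
        if column in decoded:
            decoded[column] = mapping.get(decoded[column], decoded[column])
    return decoded


# Made-up sample records, only to show the shape of the data.
rows = [
    {"pt_ID": 1, "gender": 1, "diagnosis": 2, "hosp_admin": 3},
    {"pt_ID": 2, "gender": 2, "diagnosis": 1, "hosp_admin": 1},
]
decoded_rows = [decode_row(r) for r in rows]
```

Keeping the mappings in one place makes it easy to add the remaining columns once their encodings are known.
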
The task requires me to answer some questions:
- What is the percentage of each BMI category in each age group?
- Which type of IBD tends to have sufferers of a higher BMI?
- Which type of IBD has more sufferers that smoke or previously smoked?
- Which type of IBD has more cancer sufferers?
- Which type of IBD has more heart disease sufferers?
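
The questions above all reduce to grouping patients and computing shares within each group. Here is a rough sketch in plain Python, under several assumptions the task does not state: `bmi` is numeric and is binned with the standard WHO cut-offs, `smoker` and `past_smoker` use 1 for yes, and age bands look like `"20-29"`. The sample rows and the helper names (`bmi_category`, `percent_by_group`, `share_by_diagnosis`) are hypothetical:

```python
from collections import Counter, defaultdict


def bmi_category(bmi):
    """Bin a numeric BMI using the standard WHO cut-offs."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def percent_by_group(rows, group_key, value_fn):
    """For each group, return the percentage of rows falling in each value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][value_fn(row)] += 1
    return {
        group: {value: 100 * n / sum(c.values()) for value, n in c.items()}
        for group, c in counts.items()
    }


def share_by_diagnosis(rows, predicate):
    """Percentage of patients in each diagnosis for whom predicate is true."""
    totals, hits = Counter(), Counter()
    for row in rows:
        totals[row["diagnosis"]] += 1
        hits[row["diagnosis"]] += bool(predicate(row))
    return {d: 100 * hits[d] / totals[d] for d in totals}


# Made-up rows; the age band format and the 1 = yes encoding are assumptions.
rows = [
    {"age_band_jan1": "20-29", "bmi": 22.0, "diagnosis": 1, "smoker": 1, "past_smoker": 2},
    {"age_band_jan1": "20-29", "bmi": 31.5, "diagnosis": 2, "smoker": 2, "past_smoker": 1},
    {"age_band_jan1": "30-39", "bmi": 27.0, "diagnosis": 1, "smoker": 2, "past_smoker": 2},
    {"age_band_jan1": "30-39", "bmi": 28.0, "diagnosis": 2, "smoker": 2, "past_smoker": 2},
]

# Percentage of each BMI category within each age group.
bmi_pct = percent_by_group(rows, "age_band_jan1", lambda r: bmi_category(r["bmi"]))

# Percentage of current or past smokers within each IBD type.
ever_smoked = share_by_diagnosis(
    rows, lambda r: r["smoker"] == 1 or r["past_smoker"] == 1
)
```

The same `share_by_diagnosis` call could be reused for `cancer` and `heart_disease` once their encodings are confirmed. Comparing shares rather than raw counts may be preferable if the diagnosis groups differ in size.
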