# Difference between Train and Test
| CV Name | Train Null Values (%) | Test Null Values (%) | Difference (Train - Test, pp) |
|---|---|---|---|
| CV_GFR_latest | 76.5 | 3.4 | 73.1 |
| CV_MAP_cuff_mean | 38.8 | 18.3 | 20.5 |
| CV_MAP_cuff_max | 38.8 | 18.3 | 20.5 |
| CV_MAP_cuff_min | 38.8 | 18.3 | 20.5 |
| CV_MAP_cuff_latest | 27.9 | 12.3 | 15.6 |
- Drop these features because their null rates differ substantially between the train and test datasets (15.6 to 73.1 percentage points)
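The comparison above can be sketched in pandas as follows. This is a minimal illustration, not the code used for the table: the function name `null_rate_gap`, the toy data, and the 15-point cutoff `GAP_THRESHOLD` are all assumptions, since the notes do not state an explicit cutoff.

```python
import pandas as pd

def null_rate_gap(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Per-column null percentage in train and test, plus train minus test."""
    cols = train.columns.intersection(test.columns)
    report = pd.DataFrame({
        "train_null_pct": train[cols].isna().mean() * 100,
        "test_null_pct": test[cols].isna().mean() * 100,
    })
    report["difference"] = report["train_null_pct"] - report["test_null_pct"]
    # Largest absolute gap first, so drift candidates are at the top.
    order = report["difference"].abs().sort_values(ascending=False).index
    return report.loc[order].round(1)

# Toy data (hypothetical): "a" is mostly null in train only, "b" is similar in both.
train = pd.DataFrame({"a": [1.0, None, None, None], "b": [1.0, 2.0, 3.0, None]})
test = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, 2.0, None, 4.0]})

report = null_rate_gap(train, test)

GAP_THRESHOLD = 15.0  # hypothetical cutoff, in percentage points
to_drop = report.index[report["difference"].abs() >= GAP_THRESHOLD].tolist()
print(report)
print(to_drop)
```

Sorting by the absolute gap keeps features that are more null in test than in train visible as well, even though every feature in the table above happens to be more null in train.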
# Null Percentage
- Drop CV_temperature_skin_latest and CV_troponin_HS_latest because of their high null percentage (over 95% in both train and test)
| CV Name | Train Null Values (%) | Test Null Values (%) | Difference (Train - Test, pp) |
|---|---|---|---|
| CV_temperature_skin_latest | 99.8 | 99.9 | -0.1 |
| CV_troponin_HS_latest | 97.8 | 95.4 | 2.4 |
| CV_temperature_bladder_latest | 85.2 | 86.3 | -1.1 |
| CV_PO2_art_istat_latest | 85.1 | 88.3 | -3.2 |
| CV_TCO2_art_latest | 85.1 | 88.3 | -3.2 |
| CV_PCO2_art_latest | 85.1 | 88.2 | -3.1 |
| CV_pH_art_istat_latest | 85.1 | 88.2 | -3.1 |
| CV_HCO3_latest | 85.1 | 88.1 | -3 |
| CV_FiO2_latest | 85.1 | 88.1 | -3 |
| CV_base_excess_istat_latest | 85.1 | 88.1 | -3 |
| CV_MAP_invasive_latest | 84.7 | 86.8 | -2.1 |
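A null-percentage filter of this kind might look like the sketch below. The 95% cutoff and the requirement that a column exceed it in both splits are assumptions chosen to match the two dropped features; the notes only say "high null %". The function name and toy columns are hypothetical.

```python
import pandas as pd

NULL_PCT_THRESHOLD = 95.0  # hypothetical cutoff, not stated in the notes

def high_null_columns(train: pd.DataFrame, test: pd.DataFrame,
                      threshold: float = NULL_PCT_THRESHOLD) -> list:
    """Columns whose null percentage exceeds the threshold in both splits."""
    cols = train.columns.intersection(test.columns)
    train_pct = train[cols].isna().mean() * 100
    test_pct = test[cols].isna().mean() * 100
    mask = (train_pct > threshold) & (test_pct > threshold)
    return mask[mask].index.tolist()

# Toy data (hypothetical): "x" is almost entirely null, "y" is not.
train = pd.DataFrame({"x": [1.0] + [None] * 39, "y": [1.0, None] * 20})
test = pd.DataFrame({"x": [None] * 40, "y": [1.0] * 40})

high_null = high_null_columns(train, test)
print(high_null)
```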
# Feature Correlation Matrix
- Manually look at highly correlated features (defined by abs(correlation) >= 0.7)
  - Used Spearman correlation (instead of Pearson) because the relationship may not be linear.
- To Do
  - Remove time series feature engineering for GCS components (motor_response, verbal_response, eye_opening)
    - The min, max, mean, and latest tend to be highly correlated
    - However, I think it’s a good idea to keep min
  - Remove GCS as a feature
    - Since we have the individual components, having GCS does not add more information
  - Other highly correlated features:

| Features | Correlation | Action |
|---|---|---|
| GFR, Creatinine | -0.79 | Drop GFR (higher null rate); keep Creatinine |
| HCO3, TCO2, Base Excess | 0.99 and 0.96 | Drop HCO3 and TCO2; keep Base Excess (highest scaled variance) |
| INR, prothrombin | 0.99 | Drop prothrombin (lower scaled variance); keep INR |
| systolic_blood_pressure, MAP | 0.75 | Keep both |
- In short, drop one feature from each group whose abs(correlation) is 0.9 or higher
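The pair listing described in this section could be produced roughly as follows. This is a sketch under assumptions: `correlated_pairs` and the toy columns are hypothetical names, and the final choice of which feature to drop stays manual, as in the table above.

```python
from itertools import combinations

import pandas as pd

def correlated_pairs(df: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """List feature pairs whose absolute Spearman correlation meets the threshold."""
    corr = df.corr(method="spearman")
    rows = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)  # each unordered pair once
        if abs(corr.loc[a, b]) >= threshold
    ]
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "spearman"])
    return pairs.sort_values(
        "spearman", key=lambda s: s.abs(), ascending=False
    ).reset_index(drop=True)

# Toy data (hypothetical): y is a nonlinear but monotonic function of x,
# so Spearman reports a perfect rank correlation; z is only weakly related.
x = list(range(1, 11))
df = pd.DataFrame({
    "x": x,
    "y": [v ** 3 for v in x],
    "z": [5, 1, 9, 3, 7, 2, 10, 4, 8, 6],
})

pairs = correlated_pairs(df, threshold=0.7)   # pairs to review manually
strong = pairs[pairs["spearman"].abs() >= 0.9]  # candidates for dropping one side
print(pairs)
```

The toy `y` column shows why a rank-based measure is used here: the relationship is cubic, yet the rank correlation is exactly 1.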
# PCA
- Rank each feature by importance and explained variance
- Interestingly, many of the “mean” time series features have low variance and low importance
  - We may consider dropping these
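One way to rank features with PCA is sketched below. The notes do not say how importance was computed, so the scoring here (absolute loadings weighted by each component's explained variance ratio) is an assumption and only one of several common conventions. The function name and toy columns are hypothetical, and the sketch assumes nulls are already imputed and no column is constant.

```python
import numpy as np
import pandas as pd

def pca_feature_ranking(df: pd.DataFrame):
    """Return (explained variance ratio per component, importance per feature)."""
    X = df.to_numpy(dtype=float)
    # Standardize so that features on different scales are comparable.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via SVD: rows of vt are the principal axes (loadings).
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Assumed scoring: |loading| summed over components, weighted by explained variance.
    importance = np.abs(vt).T @ explained
    ranking = pd.Series(importance, index=df.columns).sort_values(ascending=False)
    return explained, ranking

# Toy data (hypothetical): two near-duplicate features and one independent one.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "hr_mean": base + 0.1 * rng.normal(size=200),
    "hr_max": base + 0.1 * rng.normal(size=200),
    "temp": rng.normal(size=200),
})

explained, ranking = pca_feature_ranking(df)
print(explained)
print(ranking)
```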