
EDA: https://docs.google.com/spreadsheets/d/1CxsfVNbub2SrAsxHV5_0btlO-qTjPww0/edit?usp=drive_link&rtpof=true&sd=true


Difference between Train and Test

| CV Name | Train Percentage of Null Values | Test Percentage of Null Values | Difference |
|---|---|---|---|
| CV_GFR_latest | 76.5 | 3.4 | 73.1 |
| CV_MAP_cuff_mean | 38.8 | 18.3 | 20.5 |
| CV_MAP_cuff_max | 38.8 | 18.3 | 20.5 |
| CV_MAP_cuff_min | 38.8 | 18.3 | 20.5 |
| CV_MAP_cuff_latest | 27.9 | 12.3 | 15.6 |
  • Drop these features because their null percentages differ substantially between the train and test datasets
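
The comparison above can be sketched in pandas as follows. This is a minimal sketch, not the code used for the EDA: the function name `null_rate_gap`, the toy frames, the column `example_feature`, and the 15-point cutoff `MAX_GAP` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def null_rate_gap(train, test):
    """Per-column null percentage in train and test, plus train minus test."""
    cols = train.columns.intersection(test.columns)
    out = pd.DataFrame({
        "train_null_pct": train[cols].isna().mean() * 100,
        "test_null_pct": test[cols].isna().mean() * 100,
    })
    out["difference"] = out["train_null_pct"] - out["test_null_pct"]
    # Largest absolute gap first, so the worst offenders are at the top.
    return out.sort_values("difference", key=lambda s: s.abs(), ascending=False)


# Toy data standing in for the real train/test frames (values are made up).
train = pd.DataFrame({
    "CV_GFR_latest": [np.nan, np.nan, np.nan, 60.0],   # 75% null
    "example_feature": [80.0, 75.0, np.nan, 90.0],     # 25% null (hypothetical column)
})
test = pd.DataFrame({
    "CV_GFR_latest": [55.0, 70.0, 65.0, 80.0],         # 0% null
    "example_feature": [82.0, np.nan, 77.0, 88.0],     # 25% null
})

gap = null_rate_gap(train, test)
MAX_GAP = 15.0  # assumed cutoff, in percentage points
to_drop = gap.index[gap["difference"].abs() > MAX_GAP].tolist()
print(gap)
print("Drop:", to_drop)
```

Using the absolute difference means a feature is flagged whether it is sparser in train or in test.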

Null Percentage

  1. Drop CV_temperature_skin_latest and CV_troponin_HS_latest because of their high null percentage (above 95% in both train and test)
| CV | Train Percentage of Null Values | Test Percentage of Null Values | Difference |
|---|---|---|---|
| CV_temperature_skin_latest | 99.8 | 99.9 | -0.1 |
| CV_troponin_HS_latest | 97.8 | 95.4 | 2.4 |
| CV_temperature_bladder_latest | 85.2 | 86.3 | -1.1 |
| CV_PO2_art_istat_latest | 85.1 | 88.3 | -3.2 |
| CV_TCO2_art_latest | 85.1 | 88.3 | -3.2 |
| CV_PCO2_art_latest | 85.1 | 88.2 | -3.1 |
| CV_pH_art_istat_latest | 85.1 | 88.2 | -3.1 |
| CV_HCO3_latest | 85.1 | 88.1 | -3.0 |
| CV_FiO2_latest | 85.1 | 88.1 | -3.0 |
| CV_base_excess_istat_latest | 85.1 | 88.1 | -3.0 |
| CV_MAP_invasive_latest | 84.7 | 86.8 | -2.1 |
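
A null-percentage filter of this kind might look like the sketch below. The helper `high_null_columns`, the toy frame, the column `example_feature`, and the 95% cutoff are assumptions for illustration; the notes above do not state an exact threshold.

```python
import numpy as np
import pandas as pd


def high_null_columns(df, max_null_pct):
    """Return the names of columns whose null percentage exceeds max_null_pct."""
    null_pct = df.isna().mean() * 100
    return null_pct[null_pct > max_null_pct].index.tolist()


# Toy frame standing in for the real data (values are made up).
df = pd.DataFrame({
    "CV_temperature_skin_latest": [np.nan] * 10,       # 100% null
    "CV_FiO2_latest": [np.nan] * 8 + [0.21, 0.40],     # 80% null
    "example_feature": [1.0] * 10,                     # 0% null (hypothetical column)
})

MAX_NULL_PCT = 95.0  # assumed cutoff
dropped = high_null_columns(df, MAX_NULL_PCT)
reduced = df.drop(columns=dropped)
print("Dropped:", dropped)
print("Remaining:", reduced.columns.tolist())
```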

# Feature Correlation Matrix

  1. Manually look at highly correlated features (defined by abs(correlation) >= 0.7)

    • Used Spearman correlation (instead of Pearson) because the relationships between features may be monotonic but not linear.
  2. To Do

    • Remove time series feature engineering for GCS components (motor_response, verbal_response, eye_opening)

      • The min, max, mean, latest tend to be highly correlated
      • However, I think it’s a good idea to keep min
    • Remove GCS as a feature

      • Since we have the individual components – having GCS does not add more information
    • Other highly correlated features

| Features | Correlation | Action |
|---|---|---|
| GFR, Creatinine | -0.79 | Drop GFR (higher null rate); keep Creatinine |
| HCO3, TCO2, Base Excess | 0.99 and 0.96 | Drop HCO3 and TCO2; keep Base Excess (highest scaled variance) |
| INR, prothrombin | 0.99 | Drop prothrombin (lower scaled variance); keep INR |
| systolic_blood_pressure, MAP | 0.75 | Keep both |
  • In general, drop one feature from each group whose abs(correlation) is 0.9 or higher
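
The pair screening described in this section could be sketched as below. This is an illustration under assumptions, not the original analysis code: the function `correlated_pairs` and the toy columns `feat_a`, `feat_b`, `feat_c` are hypothetical. The toy data makes `feat_b` a cubic function of `feat_a`, a monotonic but non-linear relationship that Spearman scores as 1.0.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.7):
    """Feature pairs whose absolute Spearman correlation is >= threshold."""
    corr = df.corr(method="spearman")
    # Keep only the upper triangle so each pair appears once and the
    # diagonal (self-correlation) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data (made up): feat_b is a monotonic, non-linear function of feat_a.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
df = pd.DataFrame({
    "feat_a": a,
    "feat_b": a ** 3,
    "feat_c": [2.0, 6.0, 1.0, 5.0, 3.0, 4.0],
})

to_review = correlated_pairs(df, threshold=0.7)  # pairs to inspect manually
to_drop = correlated_pairs(df, threshold=0.9)    # pairs where one feature is dropped
print(to_review)
```

Which member of a pair to drop is still a manual decision; the table above uses null rate and scaled variance as tie-breakers.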

PCA

  • Rank each feature by importance and explained variance
  • Interestingly, many of the “mean” time series features have low variance and low importance

  • We may consider dropping these
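
The notes do not say how importance was computed, so the following is only one possible reading: run PCA on standardized features and score each feature by its absolute loadings weighted by each component's explained variance ratio. The function `pca_feature_importance`, the scoring rule, and the synthetic data are all assumptions, and the sketch expects missing values to be imputed beforehand.

```python
import numpy as np


def pca_feature_importance(X, names):
    """Rank features by |loading| weighted by explained variance ratio.

    X is an (n_samples, n_features) array with no missing values.
    Returns (ranking, explained), where ranking is a list of
    (feature_name, score) sorted from most to least important.
    """
    # Standardize so that no feature dominates purely because of its units.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # PCA via SVD: rows of vt are the principal axes (loadings).
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Score per feature: sum over components of |loading| * explained ratio.
    score = np.abs(vt).T @ explained
    order = np.argsort(score)[::-1]
    ranking = [(names[i], float(score[i])) for i in order]
    return ranking, explained


# Synthetic data (made up): two strongly related features plus pure noise.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([
    base,
    base + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
])
names = ["feat_a", "feat_b", "feat_noise"]  # hypothetical feature names

ranking, explained = pca_feature_importance(X, names)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
print("Explained variance ratio:", np.round(explained, 3))
```

Features that rank near the bottom under a scheme like this would be the candidates for dropping.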