Predictive Analytics

July 25-28, 2017
(Instructors: Samir Abdelrahman and Andrew Redd)

Course Aim: To link the conceptual and practical views of integrating machine learning and statistical techniques for predicting clinical outcomes.

Course Objectives are to:

  • Understand the methodology for developing and validating predictive models for clinical outcomes.
  • Learn the main state-of-the-art machine learning and statistical techniques commonly used in the predictive modeling literature.
  • Apply the methodology to different use cases.
  • Practice these use cases in Python/R on MIMIC, a publicly available dataset.

Initial Course Contents:

  • Day 1: Introduction:
    1. Research Question: The audience will learn to distinguish among questions related to classification, prediction, and clustering.
    2. Data Quality Methods: They include basic statistical methods and distributions for identifying data noise, outliers, and missing data.
    3. Predictor Selection Methods: They include machine learning feature selection methods and hypothesis tests.
    4. Machine Learning Techniques: They primarily include:
      1. Classification: Rule-based, tree-based, function-based, and Bayesian methods.
      2. Clustering: K-means and hierarchical clustering.
    5. Validation Methods and Metrics: They include:
      1. Cross-validation and bootstrapping methods, and how to validate clustering.
      2. Metrics: AUC, PPV, NPV, F-measure, purity (clustering), p-values, and confidence intervals.
    6. Result Interpretation:
      1. Threshold setting
      2. Black-box versus interpretable machine learning techniques.
  • Days 2 and 3: Use Cases: [Apply individual techniques from the above]
    1. Readmission
    2. Mortality
    3. Disease Diagnosis
    4. Clustering symptoms and comorbidities
  • Day 4: Combining Modeling Techniques: [Apply combining approaches for the above]
    1. Super Learner
    2. Bagging and Boosting
    3. Voting and Stacking
    4. Meta Classification: Classifier that uses another classifier.
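
As a rough illustration of the Day 1 metrics and threshold-setting items above, here is a minimal sketch in plain Python. It is not taken from the course materials: the function names, the toy labels, and the toy risk scores are all hypothetical, and in practice a library such as scikit-learn would normally be used. It computes PPV, NPV, and the F-measure at a chosen threshold, and AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Dict[str, float]:
    """PPV, NPV, and F-measure (F1) for a binary outcome at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    nan = float("nan")  # returned when a denominator is zero
    return {
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
        "npv": tn / (tn + fn) if (tn + fn) else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Ties count as half a win. Quadratic in the sample size, so this is
    suitable only for small illustrative examples.
    """
    pos: List[float] = [s for y, s in zip(y_true, scores) if y == 1]
    neg: List[float] = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = outcome occurred (e.g., readmitted), 0 = it did not.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

if __name__ == "__main__":
    print("AUC:", auc(y_true, scores))
    for t in (0.3, 0.5):
        print("threshold", t, threshold_metrics(y_true, scores, t))
```

Lowering the threshold in this toy example raises NPV and lowers PPV, which is the trade-off that threshold setting has to weigh for each use case.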

Common Aspects:

  • We will work on binary outcomes for classification.
  • No time-based or longitudinal analysis.
  • For each use case, we will present an overview of the predictive modeling literature and select the best-performing approaches (if any).
  • Based on point 3 above and further discussion, we will continue refining the course materials.