Predictive Analytics

July 31 - August 3, 2018
(Instructors: Samir Abdelrahman, Lynd Bacon)

Course Aim: To describe the link between the conceptual and practical views of integrating machine learning and statistical techniques to predict clinical outcomes.

Course Prerequisite: Programming in Python; all examples will run in Jupyter Notebook.

Course Objectives are to:

  • Understand the methodology of developing and validating predictive models for clinical outcomes.
  • Learn the main state-of-the-art machine learning and statistical techniques commonly used in the predictive modeling literature.
  • Apply the methodology to different use cases.
  • Practice these use cases in Python on MIMIC data and on individual-level satisfaction, choice and preference, and response data, using publicly available or provided datasets.

Initial Course Contents:

  • Behavioral Use Cases:
    • Covariates of patient satisfaction
    • Structure of stated preferences
    • Predictors of response to an outbound communication campaign
  • MIMIC Use Cases:
    • Readmission
    • Mortality
    • Disease Diagnosis
    • Clustering symptoms and comorbidities
  • Day 1: Introduction
    Part 1: Basics
    1. Research Question: Participants should learn to distinguish among questions related to classification, prediction, and clustering.
    2. Descriptive Analysis:
      1. Summary statistics.
      2. Statistical hypothesis testing.
      3. Bayesian approach: frequentist perspective vs. Bayesian perspective.
    3. Machine Learning Techniques, primarily:
      1. Regression
      2. Classification: rule-based, tree-based, function-based, and Bayesian categories
      3. Clustering: K-means and hierarchical clustering
    4. Validation Methods and Metrics, including:
      1. Cross validation and bootstrapping methods and how to validate clustering.
      2. Metrics: AUC, PPV, NPV, F-measure, purity (clustering), p-values, and confidence intervals.
    5. Result Interpretation:
      1. Threshold settings for minimizing classification errors.
      2. Black-box versus interpretable machine learning techniques.
    6. Health Data Problems: Debates, Principles, and Methods
      1. Missing data
      2. Imbalanced classes

    Part 2: Pandas and sklearn practice
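
As an illustration of the kind of exercise this practice part could involve, here is a minimal sketch that ties together several Day 1 topics: stratified cross-validation, AUC, threshold settings, PPV/NPV, and imbalanced classes. It is not taken from the course materials. The data are synthetic (generated with sklearn, not MIMIC), and the column names, the helper ppv_npv, the choice of logistic regression, and the thresholds are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for a clinical dataset (NOT MIMIC): roughly 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
df = pd.DataFrame(X, columns=[f"x{i}" for i in range(X.shape[1])])
df["outcome"] = y  # hypothetical binary outcome column

features = df.drop(columns="outcome")
labels = df["outcome"]

# class_weight="balanced" is one simple way to account for class imbalance.
model = LogisticRegression(max_iter=1000, class_weight="balanced")

# Out-of-fold predicted probabilities from stratified 5-fold cross-validation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(
    model, features, labels, cv=cv, method="predict_proba"
)[:, 1]

auc = roc_auc_score(labels, proba)


def ppv_npv(y_true, scores, threshold):
    """Return (PPV, NPV) after thresholding predicted probabilities."""
    pred = (np.asarray(scores) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, pred, labels=[0, 1]).ravel()
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


print(f"cross-validated AUC: {auc:.3f}")
for t in (0.3, 0.5, 0.7):
    ppv, npv = ppv_npv(labels, proba, t)
    print(f"threshold={t:.1f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Computing the metrics on out-of-fold predictions keeps each estimate based on data the model did not see during fitting. Varying the threshold shows the trade-off between PPV and NPV that the threshold-setting topic refers to.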

  • Day 2: Regression and Classification
    Part 1: Single Methods [Regression, Logistic Regression, Decision Tree, KNN, SVM]
    Part 2: Combining Methods [Boosting, Bagging, Voting]
  • Day 3: Clustering and Deep Learning Introduction
    Part 1: Clustering [K-means, Hierarchical Clustering, and Model-Based Clustering]
    Part 2: Deep Learning History and Neural Networks
  • Day 4: Keras (Deep Learning in Python)
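
For the clustering topics, a toy example of the sort described under Common Aspects might look like the sketch below. It runs K-means on synthetic, well-separated groups and scores the result with purity, one of the clustering metrics listed for Day 1. The purity helper, the cluster centers, and the sample size are assumptions chosen for illustration; sklearn does not ship a purity function, so it is written by hand here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def purity(true_labels, cluster_ids):
    """Fraction of points that belong to the majority true class of their cluster."""
    true_labels = np.asarray(true_labels)
    cluster_ids = np.asarray(cluster_ids)
    majority_total = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        # Size of the largest true class inside this cluster.
        majority_total += np.bincount(members).max()
    return majority_total / len(true_labels)


# Toy data: three synthetic groups; the true groups are used only for evaluation.
X, groups = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],  # hypothetical, well-separated centers
    cluster_std=1.0,
    random_state=0,
)

km = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = km.fit_predict(X)

print(f"purity: {purity(groups, clusters):.3f}")
```

Purity needs reference labels and can be pushed toward 1 simply by increasing the number of clusters, so it is best read alongside the chosen cluster count rather than on its own.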

Common Aspects:

  • We will work on binary outcomes for classification.
  • No time-based or longitudinal analysis.
  • For each use case or algorithm, we will start with a toy example to demonstrate the related basics.
  • Based on course discussion and student feedback, we will continue refining the course materials.

Course Fee (4-day course):
Students (Undergraduate, Graduate & Post Doc): $60.00
Faculty & Non-Academic: $180.00
Entire Summer Course (can choose up to eleven courses):
Students (Undergraduate, Graduate & Post Doc): $125.00
Faculty & Non-Academic: $400.00