Deep Neural Networks Using Demographic Data to Predict Thyroid Disorder
Anaelia Ovalle, ShihYin Chen, Yasser Attiga, John LaGue
Table of Contents
I. Introduction to Project & Overall Goal
II. Introduction to Neural Networks & TensorFlow
III. Methods Used and Results
IV. Discussion
V. Concluding Thoughts
Thyroid Receptor Beta in Complex With Inhibitor
Thyroid Disease
Hyper & Hypothyroidism
Thyroiditis
Thyroid Cancer
Goal: Using solely demographic data, identify a
subset of the population more likely to have a thyroid
disorder
Machine Learning Algorithms
Neural Networks
A Neural Network or ‘Artificial’ Neural Network (ANN):
● Machine learning technique that attempts to mimic behaviors of human neurons.
● Unlike the more algorithmic approaches we have seen this semester.
● In this way an ANN can extract patterns and uncover complex trends in a data set.
● Implemented for both supervised and unsupervised learning problems.
● Applications: from classifying fraudulent credit card transactions to YouTube images of cats.
● Or more importantly to health issues such as determining populations at risk for a certain disease.
● Yet, each ANN must be tailored to each specific problem set and, as our group discovered, the necessary computation time and resources for an ANN can rapidly increase with complexity.
Basic ANNs have only one or two hidden layers of neurons
DNNs have 3 or more hidden layers
Basic ANN
Deep Neural Network
● DNN can be far more effective at extracting hidden trends in a discombobulated data set
● Yet increased complexity does not necessarily mean increased network performance
TensorFlow from Google
● Open source machine learning software library for Python
● Much of our project involved running numerous models to determine important and optimal parameters
● hidden_units and activation_fn
tf.contrib.learn.DNNClassifier.__init__(hidden_units, feature_columns, model_dir=None, n_classes=2, weight_column_name=None, optimizer=None, activation_fn=relu, dropout=None, gradient_clip_norm=None, enable_centered_bias=False, config=None, feature_engineering_fn=None, embedding_lr_multipliers=None)
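To make hidden_units and activation_fn concrete, here is a minimal sketch of what those two parameters control in a fully connected network. It is plain Python, not TensorFlow and not the project's model: the helper names build_network and forward, the random weights, and the layer sizes are all assumptions chosen for illustration.

```python
import math
import random

def relu(x):
    """Rectified linear unit, the default activation in the signature above."""
    return max(0.0, x)

def build_network(n_inputs, hidden_units, n_classes, rng):
    """Create random weights: one layer per entry in hidden_units, plus an output layer."""
    sizes = [n_inputs] + list(hidden_units) + [n_classes]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs, activation_fn=relu):
    """Run one example through the network and return class probabilities."""
    values = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        values = [sum(w * v for w, v in zip(row, values)) + b
                  for row, b in zip(weights, biases)]
        if index < len(layers) - 1:          # activation on hidden layers only
            values = [activation_fn(v) for v in values]
    peak = max(values)                       # softmax, shifted for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
# hidden_units=[10, 20, 10] means three hidden layers, i.e. a DNN by the deck's definition.
net = build_network(n_inputs=4, hidden_units=[10, 20, 10], n_classes=2, rng=rng)
probs = forward(net, [0.5, -1.2, 3.0, 0.0])
print(len(net) - 1, "hidden layers; class probabilities:", probs)
```

Changing the hidden_units list changes the depth and width of the network; swapping activation_fn (for example math.tanh) changes the nonlinearity applied in each hidden layer.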
Unbalanced Data
Status               Number
No Thyroid Disease   701850
Thyroid Disease      45451
➢ Only 6.082% of the patients have thyroid disease.
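The 6.082% figure follows directly from the two class counts. A short check, using only the numbers in the table (the variable names are ours):

```python
# Class counts taken from the table above.
no_disease = 701850
disease = 45451

total = no_disease + disease
prevalence = disease / total              # fraction of patients with thyroid disease
imbalance_ratio = no_disease / disease    # majority instances per minority instance

print(f"total patients: {total}")
print(f"prevalence: {prevalence:.3%}")
print(f"majority-to-minority ratio: {imbalance_ratio:.1f} to 1")
```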
Sampling Method for Imbalanced Data
● Downsampling:
○ Randomly drop instances from the over-represented class
● Upsampling with Bootstrap:
○ Randomly add copies of instances from the under-represented class
● Upsampling with SMOTE (Synthetic Minority Over-sampling Technique):
○ Create synthetic samples from the minority class instead of creating copies.
○ Depending upon the amount of over-sampling required, neighbors from the k nearest neighbors are randomly chosen.
○ Select two or more similar instances and perturb an instance one attribute at a time by a random amount within the difference to the neighboring instances.
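The first two sampling methods can be sketched in a few lines of standard-library Python. This is an illustration under simple assumptions (each class is a plain list, the majority class is the larger one, and both methods balance the classes to equal size); the function names and toy data are hypothetical.

```python
import random

def downsample(majority, minority, rng):
    """Randomly drop majority instances until both classes are the same size."""
    kept = rng.sample(majority, k=len(minority))   # sampling without replacement
    return kept, list(minority)

def upsample_bootstrap(majority, minority, rng):
    """Randomly add copies of minority instances until both classes are the same size."""
    n_extra = len(majority) - len(minority)
    copies = rng.choices(minority, k=n_extra)      # sampling with replacement
    return list(majority), list(minority) + copies

rng = random.Random(0)
majority = [("neg", i) for i in range(20)]   # toy over-represented class
minority = [("pos", i) for i in range(3)]    # toy under-represented class

down_major, down_minor = downsample(majority, minority, rng)
up_major, up_minor = upsample_bootstrap(majority, minority, rng)
print(len(down_major), len(down_minor))  # 3 3
print(len(up_major), len(up_minor))      # 20 20
```

Downsampling discards information from the majority class, while bootstrap upsampling repeats minority instances exactly, which is the motivation for generating synthetic samples instead.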
Sample Code: SMOTE sampling
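A minimal sketch of the SMOTE idea described on the sampling slide, written in standard-library Python. It is not the project's code: the function name smote, the choice of k, and the toy points are assumptions, and it only handles numeric attributes. Each synthetic point starts from a minority instance and moves each attribute a random fraction of the way toward one of its k nearest minority neighbors.

```python
import math
import random

def smote(minority, n_synthetic, k=5, rng=None):
    """Generate n_synthetic points by interpolating between minority neighbors."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of base, by Euclidean distance.
        others = [point for point in minority if point is not base]
        neighbors = sorted(others, key=lambda point: math.dist(base, point))[:k]
        neighbor = rng.choice(neighbors)
        # Perturb one attribute at a time by a random amount within the
        # difference between base and the chosen neighbor.
        new_point = tuple(b + rng.random() * (n - b) for b, n in zip(base, neighbor))
        synthetic.append(new_point)
    return synthetic

rng = random.Random(0)
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy data
synthetic_points = smote(minority_points, n_synthetic=10, k=2, rng=rng)
print(len(synthetic_points), "synthetic points, e.g.", synthetic_points[0])
```

Because every synthetic attribute lies between a real instance and a real neighbor, the new points stay inside the region already occupied by the minority class.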
Sample Code: Dynamic Neural Network
Results
● Sampling methods and activation functions:
○ 28 models
● Include variety of hidden layers / nodes:
○ Limitless number of models
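The way the model count grows can be sketched with a small configuration grid. The sampling method names come from the slides; the activation function list and hidden layer layouts below are hypothetical and are not meant to reproduce the 28 models reported above.

```python
from itertools import product

sampling_methods = ["downsampling", "bootstrap upsampling", "SMOTE"]  # from the slides
activation_fns = ["relu", "tanh", "sigmoid"]       # hypothetical list
hidden_layouts = [[10], [10, 10], [20, 20, 20]]    # hypothetical hidden_units values

# Sampling method x activation function only.
base_grid = list(product(sampling_methods, activation_fns))

# Adding hidden layer layouts multiplies the count again.
full_grid = [
    {"sampling": s, "activation_fn": a, "hidden_units": h}
    for s, a, h in product(sampling_methods, activation_fns, hidden_layouts)
]

print(len(base_grid))  # 9
print(len(full_grid))  # 27
```

Every extra layout added to hidden_layouts adds one more model per sampling and activation pair, which is why the number of candidate models has no natural upper bound.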
Sensitivity = TP / (TP + FN)
Precision = TP / (TP + FP)
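As a quick worked example of the two formulas, with made-up confusion counts (the numbers are illustrative only, not project results):

```python
def sensitivity(tp, fn):
    """Share of actual positives that the model finds: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Share of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical counts: 30 true positives, 70 false positives, 20 false negatives.
tp, fp, fn = 30, 70, 20
print(sensitivity(tp, fn))  # 0.6
print(precision(tp, fp))    # 0.3
```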
Precision vs. Sensitivity
● Explored thresholds of 0.4, 0.5, and 0.65
● As precision improved, sensitivity decreased
Becomes a game of tug-of-war.
Do we want to reach more people with thyroid disorders or be more precise with the people we do reach?
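The tug-of-war can be seen on a toy example. The scores and labels below are invented for illustration (they are not project data); only the three thresholds come from the slide. Raising the threshold flags fewer people, so precision goes up while sensitivity goes down.

```python
def evaluate(scored, threshold):
    """Return (precision, sensitivity) when scores >= threshold are flagged positive."""
    tp = sum(1 for score, label in scored if score >= threshold and label == 1)
    fp = sum(1 for score, label in scored if score >= threshold and label == 0)
    fn = sum(1 for score, label in scored if score < threshold and label == 1)
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    return prec, sens

# Hypothetical (model score, true label) pairs; label 1 = thyroid disorder.
scored = [
    (0.90, 1), (0.80, 1), (0.70, 0), (0.66, 1), (0.60, 0), (0.55, 1), (0.52, 0),
    (0.45, 0), (0.42, 1), (0.41, 0), (0.30, 0), (0.20, 1), (0.10, 0),
]

results = []
for threshold in (0.4, 0.5, 0.65):   # thresholds explored in the project
    prec, sens = evaluate(scored, threshold)
    results.append((threshold, prec, sens))
    print(f"threshold {threshold}: precision {prec:.2f}, sensitivity {sens:.2f}")
```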
Introducing Lift!
● Lift compares the effectiveness of using a model versus selecting targets randomly
● Can be expressed in two ways:
○ Lift ratio [p/P]
○ Percent increment: [(p-P) / P] * 100%
■ p = positives in target samples from model
■ P = prevalence in general population
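Both forms of lift are one-line calculations. In the sketch below, P is computed from the class counts on the Unbalanced Data slide, while the value of p is a hypothetical placeholder, not a project result.

```python
def lift_ratio(p, P):
    """Lift ratio: positive rate among model-selected targets over prevalence."""
    return p / P

def lift_percent_increment(p, P):
    """Percent increment of the model's positive rate over prevalence."""
    return (p - P) / P * 100

P = 45451 / (701850 + 45451)   # prevalence in the general population (from the class counts)
p = 0.12                       # hypothetical positive rate among targets chosen by a model

print(f"lift ratio: {lift_ratio(p, P):.2f}")
print(f"percent increment: {lift_percent_increment(p, P):.1f}%")
```

A lift ratio of 1 (a 0% increment) means the model does no better than selecting targets at random.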
Results: Lift
** Original RF = random forest classifier using original, unbalanced data
Implications
Risk factors for various thyroid disorders include:
1. Family history
2. Iodine intake deficiency
3. Age
4. Sex
5. Type 1 diabetes status
6. Radiation history
Concluding Thoughts...
● Addressing unbalanced data was key
● Chose parameters to tweak such as activation functions and hidden nodes
● Analyzed different models for effectiveness
● Evaluated models based on precision, sensitivity (recall), and lift
● Demographic data can offer some insight into detecting thyroid disorders in a population
Thank You