I study Computer Science and Data Science at the Milwaukee School of Engineering.
With hundreds of thousands of rows in their customer database, Scott Forge sought the help of a team of data scientists from MSOE to use machine learning and clustering to better understand their customers.
Because the data was not collected with data science in mind, we spent several weeks cleaning it and merging two incompatible datasets, which turned out to share only a few percent of their records. I remember spending 20 hours searching for a workaround before we had to break the news to the stakeholders: either work from a single dataset, or merge the two and discard the majority of rows that did not overlap. They chose to merge and discard.
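The overlap problem can be sketched with pandas; the column names and toy rows below are hypothetical stand-ins for the real customer data:

```python
import pandas as pd

# Hypothetical stand-ins for the two incompatible customer datasets.
a = pd.DataFrame({"customer_id": [1, 2, 3, 4], "region": list("NSEW")})
b = pd.DataFrame({"customer_id": [3, 4, 5], "spend": [10.0, 20.0, 30.0]})

# An outer merge with indicator=True measures how small the overlap is.
merged = pd.merge(a, b, on="customer_id", how="outer", indicator=True)
overlap = (merged["_merge"] == "both").mean()

# Keeping only the overlap discards every row present in just one dataset.
kept = pd.merge(a, b, on="customer_id", how="inner")
```

Here `overlap` comes out to 0.4 on the toy data; on the real datasets the equivalent figure was only a few percent, which is why merging meant discarding most rows.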
With the dataset cleaned, we chose DBSCAN to cluster the data. With hundreds of features we could not simply guess which ones mattered, so we restricted the search to the numerical features and ran a grid search over both feature subsets and hyperparameters. In total we generated thousands of candidate models across 50 CPU cores, producing results in a few minutes and allowing us to iterate on training quickly.
The results from all 50 Python processes were combined into a single CSV file ranked by silhouette score, which measures how well separated the clusters are.
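A minimal sketch of one worker's share of that search, assuming scikit-learn's `DBSCAN` and `silhouette_score`; the parameter grid and the `make_blobs` stand-in data are illustrative, not the project's actual values:

```python
import itertools

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the cleaned numerical features.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

results = []
for eps, min_samples in itertools.product([0.2, 0.3, 0.5], [5, 10]):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    if n_clusters < 2:
        continue  # silhouette score needs at least two clusters
    mask = labels != -1  # score only the non-noise points
    score = silhouette_score(X[mask], labels[mask])
    results.append({"eps": eps, "min_samples": min_samples, "score": score})

# Rank candidate models by silhouette score, best first.
results.sort(key=lambda r: r["score"], reverse=True)
```

In the real pipeline each of the 50 processes ran a slice of the grid and appended rows like these, which were then concatenated and ranked in one CSV.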
Our key deliverables to Scott Forge were a cleaned, merged dataset and a rapid training pipeline they could use for further analysis. Each week we presented updates and kept the stakeholders informed of our decisions.
We also provided a report with our findings and recommendations for further development.
The initial training pipeline took 24 hours to complete a single epoch, making iteration and experimentation prohibitively slow. By implementing intelligent caching strategies and leveraging 16 H100 GPUs for distributed training, we reduced training time to under 10 minutes per epoch. This dramatic speedup allowed us to rapidly experiment with different architectures and hyperparameters.
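The multi-GPU side of that speedup can be sketched with PyTorch's `DistributedDataParallel`; the single-process, CPU/gloo setup below is a hypothetical illustration, whereas the real run launched one process per GPU (e.g. via `torchrun`) with the `nccl` backend:

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_and_wrap(model, rank=0, world_size=1):
    """Initialize a process group and wrap the model in DDP.

    On the GPU cluster this would use backend="nccl" with one
    process per H100; gloo lets the sketch run on CPU.
    """
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{init_file}",
        rank=rank,
        world_size=world_size,
    )
    return DDP(model)


# Toy model standing in for the real dementia-risk network.
model = setup_and_wrap(torch.nn.Linear(4, 2))
```

DDP synchronizes gradients across processes during `backward()`, so each of the 16 GPUs trains on its own shard of every batch while the model stays consistent.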
Working with a 300 GB database of 3D PET scans required careful engineering to efficiently load and process the data. I built a custom PyTorch data pipeline that could handle the massive dataset, implementing efficient data loading, preprocessing, and batching strategies to keep the GPUs fully utilized during training.
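A simplified sketch of such a pipeline, assuming a PyTorch `Dataset` with an in-memory preprocessing cache; the tiny random volumes and the normalization step are hypothetical stand-ins for the real scans and preprocessing:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class PETScanDataset(Dataset):
    """Hypothetical dataset of 3D PET volumes.

    Each scan is preprocessed once and cached, so later epochs
    skip the expensive preprocessing step.
    """

    def __init__(self, volumes):
        self.volumes = volumes  # list of raw numpy arrays
        self._cache = {}        # index -> preprocessed tensor

    def __len__(self):
        return len(self.volumes)

    def _preprocess(self, vol):
        # Stand-in for the real resampling + intensity normalization.
        vol = vol.astype(np.float32)
        return torch.from_numpy((vol - vol.mean()) / (vol.std() + 1e-6))

    def __getitem__(self, idx):
        if idx not in self._cache:
            self._cache[idx] = self._preprocess(self.volumes[idx])
        return self._cache[idx]


# Toy stand-ins for the 300 GB of scans.
scans = [np.random.rand(8, 8, 8) for _ in range(4)]
loader = DataLoader(PETScanDataset(scans), batch_size=2, num_workers=0)
```

One caveat this sketch glosses over: with `num_workers > 0` each worker process holds its own copy of an in-memory cache, so at real scale the cache belongs on fast local disk rather than in RAM.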
To make the model accessible to medical professionals, I created a full-stack application with a web-based frontend and backend server. Users can easily upload brain scan images and receive dementia risk predictions, making the research model practical for real-world clinical use.
The Milwaukee Dome Alliance is a non-profit organization that is dedicated to preserving and showcasing plants from around the world. In Fall 2025 they sponsored a hackathon at the Milwaukee School of Engineering to seek innovative ways to engage guests.
We set out to create a scavenger hunt game in which guests use a web app to find various plants in the domes, with a deep learning model verifying each find.
We fine-tuned MobileNetV4 on 300 plants inside the tropical dome, training on a dataset of 1.3 million images and using data augmentation to improve validation accuracy.
To train the model within the time constraints of the hackathon, we focused on accelerating training, which would normally have taken a full day. Using PyTorch's distributed data loading and storing preprocessed images, we cut training to 3 minutes per epoch; converting the images to the WebDataset format sped up loading from network-attached storage. We trained for 50 epochs in total.
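WebDataset shards are plain tar archives in which the files belonging to one sample share a key and sit adjacently, so they stream sequentially from network storage. A standard-library sketch of writing one shard (the real pipeline would read them back with the `webdataset` package; the key and label layout here is a hypothetical example):

```python
import io
import tarfile


def write_shard(path, samples):
    """Write (key, image_bytes, label) samples as a WebDataset-style tar shard.

    WebDataset groups files by shared key: 000000.jpg and 000000.cls
    belong to the same sample and are stored next to each other,
    so the whole sample streams in one sequential read.
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, label in samples:
            for suffix, payload in ((".jpg", image_bytes), (".cls", str(label).encode())):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
```

Because each shard is read front to back, a loader pointed at many shards gets large sequential reads instead of millions of small random file accesses, which is what made loading from network-attached storage fast.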
The pre-trained MobileNetV4 weights came from the Hugging Face model hub.
Create a scattering plugin for Godot that achieves real-time, high-density foliage on consumer hardware.
Multithreading was used to build the indexes of where to render the grass. A recursive function generated a quadtree, used both to cull grass outside the camera's view and to drive a level-of-detail system so that grass farther away is rendered less densely.
GPU instancing was used to render the grass at high density.
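The recursive subdivision and level-of-detail rule can be sketched as follows (in Python rather than the plugin's actual GDScript; the split threshold `dist < size * 2` is an illustrative choice, not the plugin's exact heuristic):

```python
from dataclasses import dataclass, field


@dataclass
class Quad:
    x: float
    z: float
    size: float
    children: list = field(default_factory=list)


def build_quadtree(x, z, size, camera, max_depth):
    """Recursively subdivide the ground plane.

    Cells near the camera keep splitting, so leaves close by are
    small (dense grass) and leaves far away are large (sparse grass).
    """
    node = Quad(x, z, size)
    cx, cz = x + size / 2, z + size / 2
    dist = ((cx - camera[0]) ** 2 + (cz - camera[1]) ** 2) ** 0.5
    # Split only when the cell is close relative to its size (LOD rule).
    if max_depth > 0 and dist < size * 2:
        half = size / 2
        for dx in (0, half):
            for dz in (0, half):
                node.children.append(
                    build_quadtree(x + dx, z + dz, half, camera, max_depth - 1)
                )
    return node


def leaf_cells(node):
    """Yield the leaf cells, i.e. the regions that actually get grass."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaf_cells(child)
```

In the plugin, each leaf then becomes one batch of GPU-instanced grass blades, with instance count scaled by the leaf's size, and leaves outside the camera frustum are skipped entirely.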
A LEGO Mindstorms robot that follows lines using computer vision techniques. Code
A timer application with randomized intervals for varied timing needs. Code - Demo
Email: kevin@kevinbgray.com
X (Twitter): @graykevinb