Pablo X Zumba
About me
Hello! I'm a versatile and driven professional who recently graduated with a Master's degree in Business Analytics and Information Systems.
My background includes a Bachelor's degree in Electronic Engineering and an Associate's degree as a Biomedical Technician, showcasing my multidisciplinary approach to problem-solving and innovation.
Passionate about data, I am particularly intrigued by data engineering, analytics, and machine learning. I'm eager to apply my skills to uncover valuable insights and drive impactful change across various industries.
My thirst for knowledge doesn't end there: I am an avid reader with a strong interest in nonfiction covering topics such as history, artificial intelligence, and Stoic philosophy. A firm believer in the importance of work-life balance, I love exploring the great outdoors and participating in activities like hiking, biking, and swimming. I also enjoy the extreme sport of "emotional self-control", as it helps me cultivate resilience and adaptability in both my personal and professional life.
I am excited to connect with like-minded professionals and collaborate on projects that leverage my unique skill set while expanding my horizons in the ever-evolving world of data and technology. Let's get in touch and create something extraordinary together!
Download Resume
Technical Skills
Programming Languages
Spark, Python, R, SQL (SQLite, Microsoft SQL Server, MySQL), NoSQL (Cassandra, Hive, Pig, Impala), HTML, CSS, JavaScript
Frameworks and APIs
Databricks, Snowflake, Flask, Google BigQuery, Chart.js, Microsoft Machine Learning Studio, Cloudera
Software Tools
Tableau, Power BI, Wireshark, Jupyter, MS Office, Git & GitHub, draw.io, VS Code, DB Browser, Canva
Statistical Proficiencies
Descriptive and Inferential Statistics in R and Python; Excel (VLOOKUP and Pivot Tables)
Python Libraries
TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, NumPy, SciPy, Seaborn, BeautifulSoup
Cybersecurity
Risk management, security best practices, security law and ethics, and secure software development and testing
Core Competencies
Designing advanced system processes using UML and OOP; identifying IT-related product lifecycle challenges; in-depth case analysis and communication; managing technology-focused businesses; and presenting data analytics projects
Portfolio

Attendance Prediction ML Models
Developed machine learning models in Databricks (Apache Spark) to predict attendance for the 2016 baseball season and improve profitability. The best model reached about 83% accuracy.
View Project
Inmate Reintegration Program Database Design
Database design, development, and implementation for an inmate reintegration program using draw.io for modeling and Microsoft SQL Server & Azure for implementation.
View on GitHub
Intelligent Tourist Website
Created a website using HTML, CSS, JavaScript, chart.js, and Python Flask to suggest tours and excursions based on solo travelers' preferences.
Visit Website
Watch the video walkthrough

Divorce in the US
Created a detailed report on the possible causes of divorce in the United States using SQLite, RStudio, and Tableau. The purpose was to help a psychologist concentrate on significant factors during marital therapy.
View Project Report
E-commerce Database Design
A database design, development, and implementation project for an e-commerce company utilizing Draw.io for ERD modeling and Microsoft SQL Server & Azure for implementation.
View Project Report
Life Expectancy and GDP Project
The purpose of this project is to investigate the relationship between life expectancy and Gross Domestic Product (GDP) in six countries using Python and JupyterLab.
View Project on GitHub
Patient Demographic and Health Analysis
A study of a patient dataset with several objectives. The primary goal was to extract meaningful information about population distribution and health-related costs while considering lifestyle and family attributes.
View Project on GitHub
Biodiversity in National Parks
This project performs an analysis of data on the conservation status of endangered species in different national parks and investigates whether there are patterns or issues related to the types of species that are endangered.
View Project on GitHub
Web-Scraping with Beautiful Soup
This project shows how to scrape data from a given website using the Matplotlib pyplot, NumPy, pandas, requests, and BeautifulSoup libraries. The website is hosted on AWS and has over 1,700 reviews of chocolate bars from all around the world.
View Project on GitHub
Design of Medical Big Data Systems
This is a research paper about the non-functional requirements for the design of Medical Big Data Systems.
View Project
R code to determine whether a variable is normally distributed
The goal of this project is to demonstrate how to use R to determine whether a variable is normally distributed, using histograms, boxplots, Q-Q plots, and Q-Q lines.
View Project on GitHub
Quantitative Analysis of Credit Unions: An Application of Statistical Methods in R
Investigating Member Sizes and Total Assets in Florida, California, New York, and New Jersey Credit Unions using Confidence Intervals and Hypothesis Testing
View Project on GitHub
Exploratory Data Analysis and Regression Modeling of U.S. Domestic Airfares
Analyzing Airfare and Passenger Data for Selected U.S. Domestic Routes with R: Data Preprocessing, Descriptive Statistics, and Regression Analysis
View Project on GitHub
Time Series Analysis and Forecasting of US Domestic Flight Passengers
Modeling Seasonality in Passenger Volume: Applying Regression and Durbin-Watson Test to De-seasonalize and Reseasonalize US Flight Data
View Project on GitHub
Comparative Analysis of Vehicle Listings on Craigslist: Applying Stratified Sampling and ANOVA in R
An analysis of how asking prices and odometer readings vary by region and cylinder count across five U.S. regions
View Project on GitHub
Predictive Modeling of Home Sales in Hunters Green: Analyzing Days on Market and Sale Prices
Exploring the Key Predictors of Property Sale Duration and Price through Regression Models and Hypothesis Testing in R
View Project on GitHub
Predictive Analysis of Customer Churn in Telecom Services: A Multimodel Approach in R
Examining Predictors of Churn Among Telephone, Internet, and Dual-Service Subscribers: Logistic Regression Models, Performance Metrics, and Classifier Tuning
View Project on GitHub
Survival Analysis in Medical Trials: Comparing Patient Outcomes on Standard vs. Test Treatments
Utilizing Kaplan-Meier Estimates and Parametric Models to Assess Survival Probabilities and Influence of Age and Diagnosis Months on Treatment Efficacy
View Project on GitHub
Analyzing Retail Sales and Promotions: A Comprehensive Study on Pricing, Product Categories, and Store Segments
Unraveling the Impact of Promotions and Pricing on Product Sales and Identifying the Most Elastic Products in a Large Retail Chain
View Project on GitHub
The Dynamics of MLB Game Attendance: An In-depth Analysis Using Machine Learning Approaches
Evaluating and Comparing Predictive Models for Attendance at Major League Baseball Games
View Project on GitHub
Predicting Salaries from Job Postings: A Machine Learning Approach
Evaluating Predictive Models for Job Salaries Using Text-based and Numeric Features
View Project on GitHub
Predicting Heartbeat Anomalies: A Comparative Study of Neural Networks
Evaluating Multiclass Classification Models for Heartbeat Measurements
View Project on GitHub
A Template for Exploratory Data Analysis and Preparation
Understanding and Preprocessing Data for Advanced Analytics
View Project on GitHub
House Price Prediction in King County, WA using Regression Techniques
Building and Comparing Various Regression Models to Predict House Sale Prices
View Project on GitHub
Loan Default Prediction in the Banking Industry using SVM Models
Preventing Bad Loans by Predicting Loan Default Using Various SVM Techniques and Hyperparameter Optimization
View Project on GitHub
Predicting Hospital Readmission in Diabetic Patients Using Decision Tree Models
Improving Healthcare Performance Metrics through Enhanced Predictive Modelling and Hyperparameter Optimization
View Project on GitHub
Predicting Underage Drinking in High School Students Using Ensemble Machine Learning Methods
Harnessing the Power of Ensemble Learning to Detect and Intervene in Underage Drinking Cases
View Project on GitHub
Predicting Patients' Smoker Status Through Text Mining and Machine Learning | scikit-learn
Leveraging Latent Semantic Analysis and Stochastic Gradient Descent Classifier for Healthcare Predictive Analytics
View Project on GitHub
Preventing Bad Loans with Machine Learning: An Exercise in Binary Classification and Feature Engineering in the Banking Industry
Leveraging Neural Networks to Predict Loan Outcomes
View Project on GitHub
Predicting Loan Default Risk in the Banking Industry: An Exercise in Binary Classification, Feature Engineering and Neural Networks using Keras
Designing Shallow and Deep Neural Networks for Risk Assessment in Home Loans
View Project on GitHub
Convolutional Neural Network for Lego Brick Classification
Image Recognition Model for Differentiating Types of Lego Bricks
View Project on GitHub
Power Grid Stress Prediction using Recurrent Neural Networks
Implementing and Comparing Different RNN Architectures for Time Series Data
View Project on GitHub