Week 6 was all about Exploratory Data Analysis (EDA) & Visualization!
I cleaned, analyzed, and visualized an uncleaned dataset of Data Science jobs using Pandas, Matplotlib & Seaborn. Let's break it down! 🧵
@TDataImmersed #TDI @DabereNnamani
✅ Cleaned messy data
✅ Uncovered job trends
✅ Created powerful visuals
EDA & Visualization are 🔑 for Data Science!
Want to see everything? Check out my notebook:
anaconda.cloud/share/noteboo…
Which visualization do you use most? Let's discuss!
Seaborn for Advanced Plots
Heatmap: Correlation between key variables
Box Plot: Job title vs company ratings
Pair Plot: Relationships between salary, rating & founding year
Aesthetics + Insights = 💡
Matplotlib for EDA
Histogram: Salary distribution 💰
Bar Chart: Top locations for Data Science jobs 🗺️
Line Plot: Salary trends by company size 🏢
Visualizing data brings numbers to life!
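Here is a minimal sketch of the histogram and bar chart, not the notebook's actual code. The column names `Salary` and `Location` and all the values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample of job postings (salary in thousands of USD).
jobs = pd.DataFrame({
    "Salary": [70, 85, 90, 110, 120, 150],
    "Location": ["New York", "New York", "New York",
                 "San Francisco", "San Francisco", "Boston"],
})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: how salaries are distributed.
ax_hist.hist(jobs["Salary"], bins=5)
ax_hist.set_title("Salary distribution")
ax_hist.set_xlabel("Salary (thousand USD)")

# Bar chart: the locations with the most postings.
top_locations = jobs["Location"].value_counts().head(3)
ax_bar.bar(top_locations.index, top_locations.values)
ax_bar.set_title("Top locations")
ax_bar.set_ylabel("Number of postings")

fig.tight_layout()
```

In a notebook you would drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.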
EDA = Knowing Your Data
Summary stats for Rating, Salary, and Revenue
Identified top job titles & their average ratings
Analyzed salary trends by company size
EDA helps spot patterns & anomalies fast!
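A rough sketch of these three EDA steps with Pandas. The column names (`Job Title`, `Rating`, `Salary`, `Size`) and the numbers are assumptions for illustration, not the real dataset.

```python
import pandas as pd

# Hypothetical job postings (salary in thousands of USD).
jobs = pd.DataFrame({
    "Job Title": ["Data Scientist", "Data Scientist", "Data Analyst", "Data Engineer"],
    "Rating": [4.0, 3.0, 3.5, 4.5],
    "Salary": [120, 100, 70, 110],
    "Size": ["Large", "Small", "Small", "Large"],
})

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = jobs[["Rating", "Salary"]].describe()

# Most common job titles and their average rating.
title_stats = (
    jobs.groupby("Job Title")
    .agg(postings=("Rating", "size"), avg_rating=("Rating", "mean"))
    .sort_values("postings", ascending=False)
)

# Average salary for each company size.
salary_by_size = jobs.groupby("Size")["Salary"].mean()

print(summary)
print(title_stats)
print(salary_by_size)
```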
🧼 Data Cleaning is the foundation of good analysis!
Handled missing values 🕵️
Extracted & cleaned Salary Estimate 💰
Standardized Company Names & Locations
Data cleaning = better insights! ✅
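A sketch of what the salary extraction and name cleanup could look like. It assumes `Salary Estimate` holds strings like `$53K-$91K (Glassdoor est.)`. That format, the sample rows, and the new column names are assumptions, not taken from the notebook.

```python
import pandas as pd

# Hypothetical raw rows; the string format is an assumption.
jobs = pd.DataFrame({
    "Salary Estimate": ["$53K-$91K (Glassdoor est.)", "$80K-$120K (Employer est.)"],
    "Company Name": ["  acme corp ", "GLOBEX"],
})

# Pull the lower and upper bounds (in thousands) out of the text.
bounds = jobs["Salary Estimate"].str.extract(r"\$(\d+)K-\$(\d+)K").astype(float)
jobs["MinSalary"] = bounds[0]
jobs["MaxSalary"] = bounds[1]
jobs["AvgSalary"] = (jobs["MinSalary"] + jobs["MaxSalary"]) / 2

# Standardize names: trim stray whitespace and use consistent casing.
jobs["Company Name"] = jobs["Company Name"].str.strip().str.title()

print(jobs)
```

Title-casing is a blunt tool (an acronym like "IBM" would become "Ibm"), so real data may need a lookup table for exceptions.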
Week 5 was all about Data Cleaning & Transformation with Pandas!
From handling missing values to merging DataFrames, this was a deep dive into real-world data prep. Let's break it down! 🧵
Wrap-Up & Full Notebook
✅ Data cleaned
✅ New features created
✅ Data merged
✅ Insights uncovered
This was real-world data prep at its finest! Check out my full notebook here:
https://anaconda.cloud/share/notebooks/bab3f1ea-092c-4be5-ac0d-4b16fad8224e/overview
String Cleaning & Deck Extraction
💡 Text manipulation in Pandas
I extracted the deck from the Cabin column to analyze survival rates by deck.
📷 Question ➡️ 📷 My Solution
Text data isn't always clean, but Pandas makes cleaning it easy!
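One way the deck extraction might look, assuming the deck is the first letter of the Cabin value (for example "C85" is deck C). The sample rows and the "Unknown" label for missing cabins are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the Titanic data.
df = pd.DataFrame({
    "Cabin": ["C85", None, "E46", "C123"],
    "Survived": [1, 0, 1, 0],
})

# Take the first character of Cabin as the deck; label missing cabins explicitly.
df["Deck"] = df["Cabin"].str[0].fillna("Unknown")

# Survival rate per deck (mean of the 0/1 Survived column).
survival_by_deck = df.groupby("Deck")["Survived"].mean()

print(survival_by_deck)
```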
Merge vs. Concatenate?
merge() = Joins datasets on a key (like PassengerId)
concat() = Stacks datasets (vertically or horizontally)
📷 Question ➡️ 📷 My Solution
These techniques help when dealing with multiple data sources!
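A small sketch of the difference, using two made-up tables that share a PassengerId key. The table names and values are hypothetical.

```python
import pandas as pd

passengers = pd.DataFrame({"PassengerId": [1, 2, 3], "Name": ["A", "B", "C"]})
tickets = pd.DataFrame({"PassengerId": [1, 2, 4], "Fare": [7.25, 71.28, 8.05]})

# merge(): join on a key; an inner join keeps only IDs present in both tables.
merged = passengers.merge(tickets, on="PassengerId", how="inner")

# concat() vertically: stack rows from tables with the same columns.
extra = pd.DataFrame({"PassengerId": [5], "Name": ["D"]})
stacked = pd.concat([passengers, extra], ignore_index=True)

# concat() horizontally: place columns side by side, aligned on the index.
side_by_side = pd.concat([passengers, tickets[["Fare"]]], axis=1)

print(merged)
print(stacked)
print(side_by_side)
```

Note that horizontal `concat()` lines rows up by index position, not by key, so it only makes sense when both tables are already in the same row order.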
Creating New Features
🛠️ Feature Engineering
I added:
✅ FamilySize = (sibsp + parch + 1)
✅ FarePerPerson = Fare ÷ FamilySize
📷 Question ➡️ 📷 My Solution
Why? These features give new insights into passengers' social & economic backgrounds!
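A minimal sketch of the two features, assuming columns named `sibsp`, `parch`, and `Fare`. The values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "sibsp": [1, 0, 3],   # siblings/spouses aboard
    "parch": [0, 0, 2],   # parents/children aboard
    "Fare": [14.0, 8.0, 30.0],
})

# Family size = siblings/spouses + parents/children + the passenger themselves.
df["FamilySize"] = df["sibsp"] + df["parch"] + 1

# Spread the fare evenly across the family.
df["FarePerPerson"] = df["Fare"] / df["FamilySize"]

print(df)
```

The `+ 1` means FamilySize is never zero, so the division is always safe.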
💰 Outliers distort averages!
I detected extreme fare prices using the IQR method and capped them instead of removing them.
📷 Question ➡️ 📷 My Solution
Capping ensures we keep all data while limiting extreme values!
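A sketch of IQR capping on made-up fares. The 1.5 × IQR fences are the usual convention and an assumption here, since the thread does not state which multiplier was used.

```python
import pandas as pd

fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 512.33], name="Fare")

# Interquartile range: the spread of the middle 50% of values.
q1 = fares.quantile(0.25)
q3 = fares.quantile(0.75)
iqr = q3 - q1

# Conventional outlier fences at 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Cap instead of drop: values outside the fences are pulled back to them.
capped = fares.clip(lower=lower, upper=upper)

print(capped)
```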
Data transformation step!
Instead of 1, 2, 3, I converted Pclass into "1st Class", "2nd Class", "3rd Class" for better readability.
📷 Question ➡️ 📷 My Solution
Why? Clear labels improve data storytelling!
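A sketch of the relabeling with `map()`. Writing to a new column called `PclassLabel` is a choice made for this example, so the numeric codes stay available.

```python
import pandas as pd

df = pd.DataFrame({"Pclass": [3, 1, 2, 3]})

# Lookup table from numeric class to a readable label.
class_labels = {1: "1st Class", 2: "2nd Class", 3: "3rd Class"}

# map() replaces each value using the dictionary.
df["PclassLabel"] = df["Pclass"].map(class_labels)

print(df)
```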
Duplicate records skew analysis!
Using drop_duplicates(), I checked for and removed any duplicates in the Titanic data.
📷 Question ➡️ 📷 My Solution
Have you ever encountered duplicate headaches? 🤯
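A sketch of the check-then-remove step on made-up rows, using `duplicated()` for the check (an assumption about how the check was done).

```python
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 2, 3],
    "Name": ["A", "B", "B", "C"],
})

# Check first: duplicated() flags rows that repeat an earlier row exactly.
n_duplicates = int(df.duplicated().sum())

# Then remove them, keeping the first occurrence of each row.
deduped = df.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(deduped)
```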
You may not know what to do with missing values...
🤔 Drop or Fill?
dropna() → Remove missing data (good if there's little missing)
fillna() → Replace missing values (mean, median, etc.)
I used the median for Age because it is less sensitive to outliers than the mean. 📷
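A sketch of both options on made-up rows. Dropping on `Embarked` is only an example of a column with little missing data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 80.0],
    "Embarked": ["S", "C", None, "S"],
})

# Option 1: drop rows where a rarely-missing column is empty.
dropped = df.dropna(subset=["Embarked"])

# Option 2: fill missing ages with the median, which one extreme age barely moves.
age_median = df["Age"].median()
filled = df.assign(Age=df["Age"].fillna(age_median))

print(dropped)
print(filled)
```

Here the mean age would be pulled up by the 80-year-old, while the median stays at the middle value.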
Finding Missing Data
Identifying missing values in the Titanic dataset using Pandas:
📷 Question ➡️ 📷 My Solution
Missing values can break analysis, so step 1 is always detection!
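A sketch of the detection step on a tiny made-up stand-in for the Titanic data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [None, "C85", None, "C123"],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# Number of missing values in each column.
missing_counts = df.isna().sum()

# Share of missing values in each column, as a percentage.
missing_pct = df.isna().mean() * 100

print(missing_counts)
print(missing_pct)
```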
Week 3 of my Python journey was all about diving into File Handling, CSVs, and NumPy!
From reading Titanic data to exploring arrays with NumPy, this week was packed with exciting tasks. Let's break it down: 🧵
@DabereNnamani @TDataImmersed @JacobAjala #TDI
That wraps up my Week 3 highlights! Want to explore the complete code and dive into more details?
Check it out here:
anaconda.cloud/share/noteboo…
What was your favorite part? Let's discuss! ✨
NumPy Adventures
NumPy made math magical! I:
Built and manipulated 1D/2D arrays
Found fare stats (min, max, mean) for Titanic data
Explored indexing and random arrays 🎲✨
📷 Questions ➡️ 📷 My Solutions
How do YOU use NumPy? Let me know!
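A quick sketch of those NumPy tasks. The fare values are made up, and the dice-roll example is just one way to build a random array.

```python
import numpy as np

# 1D array of fares and its basic statistics.
fares = np.array([7.25, 71.28, 7.92, 53.10])
fare_min, fare_max, fare_mean = fares.min(), fares.max(), fares.mean()

# 2D array: reshape the numbers 0..5 into 2 rows and 3 columns.
grid = np.arange(6).reshape(2, 3)

# Indexing: first row, and the last column of every row.
first_row = grid[0]
last_col = grid[:, -1]

# Random array: five dice rolls from a seeded generator (upper bound is exclusive).
rng = np.random.default_rng(seed=0)
random_ints = rng.integers(1, 7, size=5)

print(fare_min, fare_max, fare_mean)
print(first_row, last_col)
print(random_ints)
```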