Introduction
In this tutorial, we'll explore how to analyze and visualize workforce data using Python and popular data science libraries. This is particularly relevant given recent corporate restructuring events like Meta's layoffs. We'll learn how to process employee data, identify trends, and create meaningful visualizations that can help organizations understand their workforce dynamics.
Prerequisites
To follow along with this tutorial, you'll need:
- Python 3.7 or higher installed on your system
- Basic understanding of Python programming
- Installed libraries: pandas, matplotlib, seaborn, and numpy
You can install the required packages using pip:
```bash
pip install pandas matplotlib seaborn numpy
```
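To confirm the packages are importable before you start, a short check that uses only the standard library's `importlib.util.find_spec` is enough. This is a convenience sketch rather than part of the tutorial's workflow, and `check_packages` is a hypothetical helper name. Because it never actually imports the packages, it runs even when some of them are missing.

```python
import importlib.util

# Import names of the packages this tutorial relies on
REQUIRED = ["pandas", "matplotlib", "seaborn", "numpy"]


def check_packages(names):
    """Map each package name to True if Python can locate it, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name:<12}{'OK' if found else 'MISSING'}")

# Suggest a pip command covering only the missing packages
missing = [name for name, found in status.items() if not found]
if missing:
    print("Install with: pip install " + " ".join(missing))
```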
Step-by-step instructions
Step 1: Setting up the data environment
First, we need to create a Python environment and import our required libraries. This step establishes the foundation for all our data analysis work.
```python
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
```
**Why this step**: These imports cover data manipulation (pandas, NumPy), plotting (matplotlib), and statistical visualization (seaborn). Setting the seaborn style and a default figure size up front gives every later chart a consistent, clean look.
Step 2: Creating sample workforce data
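Before working with real company data, let's generate a synthetic dataset with the kinds of fields a large tech company such as Meta might track: department, role, tenure, salary, location, and employment status. The values are random, so they demonstrate the techniques without saying anything about Meta's actual workforce.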
```python
# Generate sample employee data
np.random.seed(42)  # For reproducible results

# Define departments and roles
departments = ['Engineering', 'Marketing', 'Sales', 'Product', 'Operations', 'Research']
roles = ['Software Engineer', 'Data Scientist', 'Product Manager', 'Marketing Specialist', 'Sales Representative', 'Research Scientist']

# Generate sample data
n_employees = 10000
employee_data = {
    'employee_id': range(1, n_employees + 1),
    'department': np.random.choice(departments, n_employees),
    'role': np.random.choice(roles, n_employees),
    'tenure_years': np.random.exponential(2, n_employees),
    'salary': np.random.normal(75000, 25000, n_employees),
    'location': np.random.choice(['San Francisco', 'New York', 'Seattle', 'Austin', 'Remote'], n_employees),
    'status': np.random.choice(['Active', 'Terminated'], n_employees, p=[0.95, 0.05])
}

# Create DataFrame
df = pd.DataFrame(employee_data)
print(df.head())
```
**Why this step**: Synthetic data lets us practice analysis techniques without access to sensitive employee records. The numbers themselves are meaningless, but the same code applies unchanged to a real HR dataset with the same columns.
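One limitation of the generator above is that `department` and `role` are sampled independently, so a "Research Scientist" can end up in Sales. If you want more plausible combinations, you could sample each employee's role from a per-department list. The mapping below, including the "Operations Analyst" role, is a made-up assumption for illustration, and the sketch uses NumPy's `default_rng` Generator instead of the legacy `np.random.seed`. Variable names are chosen so they don't overwrite the tutorial's `departments`, `roles`, or `df`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # Generator-based RNG, reproducible

# Hypothetical department -> plausible roles mapping (an assumption, not real org data)
ROLES_BY_DEPT = {
    "Engineering": ["Software Engineer", "Data Scientist"],
    "Marketing": ["Marketing Specialist"],
    "Sales": ["Sales Representative"],
    "Product": ["Product Manager"],
    "Operations": ["Operations Analyst"],
    "Research": ["Research Scientist", "Data Scientist"],
}

n_demo = 1000
dept_sample = rng.choice(list(ROLES_BY_DEPT), size=n_demo)
# Draw each employee's role only from the roles allowed in their department
role_sample = [rng.choice(ROLES_BY_DEPT[dept]) for dept in dept_sample]

df_roles = pd.DataFrame({"department": dept_sample, "role": role_sample})
# Cross-tabulation makes it easy to confirm no role appears outside its department
print(pd.crosstab(df_roles["department"], df_roles["role"]))
```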
Step 3: Data exploration and cleaning
Before performing any analysis, we need to understand our data structure and identify any issues that need correction.
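```python
# Examine data structure and clean if necessary
# Check basic information about the dataset
print("Dataset shape:", df.shape)
print("\nData types:")
print(df.dtypes)
print("\nMissing values:")
print(df.isnull().sum())

# Basic statistics
print("\nBasic statistics:")
print(df.describe())

# Check for any duplicate employee IDs
print("\nDuplicate employee IDs:", df['employee_id'].duplicated().sum())

# Check for impossible values: a normal distribution can produce negative salaries
print("Non-positive salaries:", (df['salary'] <= 0).sum())
# Drop those rows so they don't distort the salary statistics later on
df = df[df['salary'] > 0].reset_index(drop=True)
```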
**Why this step**: Understanding the data's structure is a prerequisite for accurate analysis. Checking data types, missing values, and duplicate IDs up front ensures that later results aren't distorted by problems in the data itself.
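A common follow-up during exploration is reviewing dtypes. Columns such as `department`, `location`, and `status` hold a handful of distinct strings repeated thousands of times, which suits pandas' `category` dtype (integer codes plus a small lookup table). The sketch below builds a fresh stand-in frame rather than touching the tutorial's `df`, and compares memory use before and after conversion with `DataFrame.memory_usage(deep=True)`.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same low-cardinality columns as the tutorial's df
rng = np.random.default_rng(42)
n_demo = 10000
demo = pd.DataFrame({
    "department": rng.choice(["Engineering", "Marketing", "Sales", "Product", "Operations", "Research"], n_demo),
    "location": rng.choice(["San Francisco", "New York", "Seattle", "Austin", "Remote"], n_demo),
    "status": rng.choice(["Active", "Terminated"], n_demo, p=[0.95, 0.05]),
})

# deep=True includes the memory used by the string values themselves
before = demo.memory_usage(deep=True).sum()
# Convert every column to the categorical dtype
categorical = demo.astype("category")
after = categorical.memory_usage(deep=True).sum()

print(f"string columns:   {before / 1024:.0f} KiB")
print(f"category columns: {after / 1024:.0f} KiB")
print(categorical.dtypes)
```

To apply the same idea to the tutorial data, you could run `df[['department', 'role', 'location', 'status']] = df[['department', 'role', 'location', 'status']].astype('category')`. Later `groupby` calls behave the same, although on categorical keys you may want to pass `observed=True` so that unused categories don't show up as empty groups.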
Step 4: Analyzing workforce trends
Now we'll explore key workforce metrics that are particularly relevant in the context of corporate restructuring.
```python
# Calculate key workforce metrics
# Calculate workforce distribution by department
dept_distribution = df['department'].value_counts()
print("Workforce distribution by department:")
print(dept_distribution)

# Calculate termination rates by department
termination_rates = df.groupby('department')['status'].value_counts().unstack(fill_value=0)
termination_rates['termination_rate'] = termination_rates['Terminated'] / (termination_rates['Active'] + termination_rates['Terminated'])
print("\nTermination rates by department:")
print(termination_rates[['termination_rate']])
```
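**Why this step**: Departmental headcount and termination rates show which parts of an organization are most affected by restructuring. Keep in mind that in this synthetic dataset every employee has the same 5% chance of being marked Terminated regardless of department, so differences between departments here are random noise; on real data, they would point to where workforce reductions are concentrated.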
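The `groupby`/`value_counts`/`unstack` chain works, but pandas offers shorter routes to the same proportions. The sketch below uses a tiny hand-made frame (not tutorial output) to show `pd.crosstab(..., normalize="index")`, the mean-of-a-boolean idiom, and a termination rate by tenure band built with `pd.cut`. The band edges are an arbitrary choice for illustration.

```python
import pandas as pd

# Tiny hand-made frame for illustration only
df_small = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "Sales", "Research", "Research"],
    "status": ["Active", "Active", "Active", "Terminated", "Active", "Active"],
    "tenure_years": [0.5, 2.0, 6.0, 0.8, 3.5, 1.2],
})

# Share of each status within each department; every row sums to 1
shares = pd.crosstab(df_small["department"], df_small["status"], normalize="index")
print(shares)

# The mean of a boolean flag is the fraction of True values, i.e. the termination rate
is_terminated = df_small["status"].eq("Terminated")
rate_by_dept = is_terminated.groupby(df_small["department"]).mean()
print(rate_by_dept)

# Termination rate by tenure band (band edges are an arbitrary choice)
bands = pd.cut(df_small["tenure_years"], bins=[0, 1, 3, float("inf")],
               labels=["0-1y", "1-3y", "3y+"], include_lowest=True)
rate_by_band = is_terminated.groupby(bands, observed=False).mean()
print(rate_by_band)
```

On this toy frame, Sales comes out at 0.25 (one terminated employee out of four) and Research at 0.0. On the tutorial's `df`, the equivalent one-liner would be `(df['status'] == 'Terminated').groupby(df['department']).mean()`.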
Step 5: Creating visualizations
Visualizations make complex workforce data more accessible and understandable, especially when presenting findings to stakeholders.
```python
# Create workforce distribution charts
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Department distribution
dept_counts = df['department'].value_counts()
axes[0, 0].pie(dept_counts.values, labels=dept_counts.index, autopct='%1.1f%%')
axes[0, 0].set_title('Workforce Distribution by Department')

# Salary distribution
axes[0, 1].hist(df['salary'], bins=30, alpha=0.7, color='skyblue')
axes[0, 1].set_xlabel('Salary')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Salary Distribution')

# Termination rates by department: plot only the rate column, not the Active/Terminated counts
termination_rates['termination_rate'].plot(kind='bar', ax=axes[1, 0])
axes[1, 0].set_title('Termination Rates by Department')
axes[1, 0].set_ylabel('Rate')
axes[1, 0].tick_params(axis='x', rotation=45)

# Tenure distribution
axes[1, 1].hist(df['tenure_years'], bins=30, alpha=0.7, color='lightgreen')
axes[1, 1].set_xlabel('Tenure (years)')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].set_title('Employee Tenure Distribution')

plt.tight_layout()
plt.show()
```
**Why this step**: Charts make patterns visible much faster than tables of numbers. On real data, this set of charts would show stakeholders at a glance how headcount, pay, and tenure are distributed and which departments are losing the most staff.
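Termination rates are small fractions, so they read more naturally as percentages. Here is a minimal sketch of a standalone, sorted rate chart using matplotlib's `PercentFormatter`, built on a small random frame shaped like the tutorial's data (not real results). The `Agg` backend and the `termination_rates.png` filename are assumptions for running outside a notebook; drop the `matplotlib.use("Agg")` line if you want an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
import pandas as pd

# Small random frame shaped like the tutorial's data (not real results)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "department": rng.choice(["Engineering", "Marketing", "Sales", "Research"], 2000),
    "status": rng.choice(["Active", "Terminated"], 2000, p=[0.95, 0.05]),
})
rates = demo["status"].eq("Terminated").groupby(demo["department"]).mean()

fig, ax = plt.subplots(figsize=(8, 4))
# Sorting puts the hardest-hit departments first
rates.sort_values(ascending=False).plot(kind="bar", ax=ax)
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))  # show 0.05 as 5%
ax.set_xlabel("")
ax.set_ylabel("Termination rate")
ax.set_title("Termination Rate by Department")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
fig.savefig("termination_rates.png", dpi=150)
```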
Step 6: Advanced analysis with correlation
Let's examine how different workforce factors correlate with each other to identify potential patterns.
```python
# Perform correlation analysis
# Build a numeric frame; employee_id is only an identifier, so it is left out
numeric_df = df[['tenure_years', 'salary']].copy()
# Encode status as 0/1 so it can be correlated with the numeric columns
numeric_df['is_terminated'] = (df['status'] == 'Terminated').astype(int)

# Calculate correlation matrix
correlation_matrix = numeric_df.corr()
print("Correlation Matrix:")
print(correlation_matrix)

# Create heatmap on a fixed -1..1 color scale
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, vmin=-1, vmax=1)
plt.title('Correlation Matrix of Workforce Variables')
plt.show()
```
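**Why this step**: Correlation analysis measures how strongly pairs of variables move together. On real data, a relationship between tenure or salary and termination could flag groups that need a closer look or extra support during restructuring, though correlation alone neither establishes cause nor predicts which individual employees will leave. Because every column in this synthetic dataset is generated independently, expect the off-diagonal values to be close to zero.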
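`DataFrame.corr()` defaults to Pearson correlation, which measures linear association. Tenure here is drawn from a right-skewed exponential distribution, so a rank-based measure such as Spearman's (`method="spearman"`) is a useful cross-check. In the sketch below, salary is assumed to grow exponentially with tenure (a hypothetical relationship, not something in the tutorial's data): Spearman's coefficient comes out at 1 because the relationship is perfectly monotonic, while Pearson's is noticeably lower because the relationship is not linear.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
tenure = rng.exponential(2, 500)  # right-skewed, like the tutorial's tenure_years
# Hypothetical relationship: salary rises with tenure, monotonically but not linearly
salary = 60000 * np.exp(0.2 * tenure)
demo = pd.DataFrame({"tenure_years": tenure, "salary": salary})

pearson = demo.corr(method="pearson").loc["tenure_years", "salary"]
# Spearman = Pearson correlation computed on the ranks of each column
spearman = demo.corr(method="spearman").loc["tenure_years", "salary"]
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

A large gap between the two coefficients is a hint that a relationship exists but is not a straight line, which a Pearson-only heatmap would understate.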
Step 7: Generating insights report
Finally, we'll compile our findings into a structured report that could be used for strategic decision-making.
```python
# Create summary report
print("=== WORKFORCE ANALYSIS REPORT ===")
print(f"Total workforce: {len(df)} employees")
print(f"Active employees: {(df['status'] == 'Active').sum()}")
print(f"Terminated employees: {(df['status'] == 'Terminated').sum()}")

# Department insights
print("\n=== DEPARTMENT INSIGHTS ===")
for dept in departments:
    dept_data = df[df['department'] == dept]
    active_count = (dept_data['status'] == 'Active').sum()
    # Mean of a boolean mask = share of terminated employees in this department
    termination_rate = (dept_data['status'] == 'Terminated').mean() * 100
    print(f"{dept}: {active_count} active employees, {termination_rate:.1f}% termination rate")

# Salary insights
print("\n=== SALARY INSIGHTS ===")
print(f"Average salary: ${df['salary'].mean():,.0f}")
print(f"Salary range: ${df['salary'].min():,.0f} - ${df['salary'].max():,.0f}")

# Tenure insights
print("\n=== TENURE INSIGHTS ===")
print(f"Average tenure: {df['tenure_years'].mean():.1f} years")
print(f"Tenure range: {df['tenure_years'].min():.1f} - {df['tenure_years'].max():.1f} years")
```
**Why this step**: A short plain-text summary of headline figures is easy for decision-makers to scan, and the same numbers can go straight into a slide deck or dashboard for leadership teams.
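The loop prints one line per department. If you'd rather have the same figures as a table you can sort, filter, or export, pandas' named aggregation builds it in a single `groupby(...).agg(...)` call. The sketch below runs on its own small random frame with the tutorial's column names; the choice of median salary and the `department_summary.csv` filename are assumptions.

```python
import numpy as np
import pandas as pd

# Small random frame with the tutorial's column names (values are random, not real data)
rng = np.random.default_rng(1)
n_demo = 300
df_demo = pd.DataFrame({
    "department": rng.choice(["Engineering", "Sales", "Research"], n_demo),
    "status": rng.choice(["Active", "Terminated"], n_demo, p=[0.95, 0.05]),
    "salary": rng.normal(75000, 25000, n_demo),
    "tenure_years": rng.exponential(2, n_demo),
})

# One row per department, built with named aggregation instead of a Python loop
summary = (
    df_demo.assign(is_active=df_demo["status"].eq("Active"),
                   is_terminated=df_demo["status"].eq("Terminated"))
    .groupby("department")
    .agg(
        headcount=("status", "count"),
        active=("is_active", "sum"),
        termination_rate=("is_terminated", "mean"),
        median_salary=("salary", "median"),
        mean_tenure=("tenure_years", "mean"),
    )
    .sort_values("termination_rate", ascending=False)
)
print(summary.round(2))
summary.to_csv("department_summary.csv")  # shareable file for readers without Python
```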
Summary
In this tutorial, we've learned how to work with workforce data using Python. We created sample employee data, explored key metrics, visualized workforce trends, and generated insights that would be valuable during corporate restructuring. The techniques demonstrated here can be applied to real company data to understand workforce dynamics and support strategic decision-making processes.
With these methods, you have a practical starting point for analyzing workforce data in your own organization, whether you're preparing for or responding to restructuring events like the layoffs at Meta. The same skills carry over to HR analytics, workforce planning, and organizational development.
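When you move from synthetic to real company data, tenure is usually not stored directly; HR exports more often record hire and termination dates. Here is a minimal sketch of deriving the tutorial's `status` and `tenure_years` columns from such dates. The column names `hire_date` and `termination_date`, the inline CSV, and the snapshot date are all hypothetical.

```python
import io

import pandas as pd

# Stand-in for a real HR export; column names and values are hypothetical
csv_text = """employee_id,department,hire_date,termination_date
1,Engineering,2019-03-01,
2,Sales,2021-07-15,2023-01-31
3,Research,2015-11-02,
"""

# Empty termination dates become NaT (missing) after parsing
df_real = pd.read_csv(io.StringIO(csv_text), parse_dates=["hire_date", "termination_date"])

as_of = pd.Timestamp("2024-01-01")  # assumed snapshot date for the analysis
# An employee counts as terminated if a termination date is present
df_real["status"] = df_real["termination_date"].notna().map({True: "Terminated", False: "Active"})
# Tenure runs to the termination date if there is one, otherwise to the snapshot date
end = df_real["termination_date"].fillna(as_of)
df_real["tenure_years"] = (end - df_real["hire_date"]).dt.days / 365.25
print(df_real[["employee_id", "department", "status", "tenure_years"]])
```

Once a real dataset has these columns, the exploration, metrics, and reporting steps above can be run on it as written.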



