Introduction
As companies such as General Motors shift their workforce toward AI-native development skills, demand is growing for engineers who can combine data engineering, prompt engineering, and language model integration. This tutorial walks you through building a simple AI workflow pipeline in Python that loads and cleans data, generates prompts from a data summary, and sends those prompts to a language model.
Prerequisites
- Basic Python programming knowledge
- Understanding of REST APIs and HTTP requests
- Knowledge of data processing concepts (pandas, data frames)
- Access to an OpenAI API key or similar LLM service
- Python libraries: requests, pandas, openai (json and os are part of the standard library)
Step-by-step Instructions
Step 1: Set Up Your Development Environment
Install Required Libraries
First, create a virtual environment and install the necessary packages. This ensures your project dependencies don't interfere with other Python projects.
python -m venv ai_workflow_env
source ai_workflow_env/bin/activate # On Windows: ai_workflow_env\Scripts\activate
pip install requests pandas openai
Configure API Keys
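Store your API key in an environment variable rather than in your source code. The safest approach is to export OPENAI_API_KEY in your shell before starting Python. For quick local experiments you can also set it from Python, but never commit a real key to version control.
import os
# Local experimentation only: keeps an already exported key, otherwise uses the placeholder
os.environ.setdefault('OPENAI_API_KEY', 'your_openai_api_key_here')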
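A small helper can make a missing key fail early with a clear message instead of surfacing later as an authentication error. The following is a minimal sketch rather than part of the workflow itself; the function name `get_api_key` is hypothetical.

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is unset or blank.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running."
        )
    return key
```

Calling `get_api_key()` once at startup keeps the check in one place, and the returned value can be passed to whichever client you construct.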
Step 2: Create Data Processing Module
Build Data Engineering Framework
Our AI workflow needs to process data before generating prompts. This module will handle data ingestion and preprocessing.
import json
import pandas as pd

class DataProcessor:
    def __init__(self):
        self.data = None

    def load_data(self, file_path):
        """Load data from CSV file"""
        self.data = pd.read_csv(file_path)
        print(f"Loaded {len(self.data)} records")
        return self.data

    def clean_data(self):
        """Clean and prepare data for AI processing"""
        # Remove rows that contain any null value
        self.data = self.data.dropna()
        # Remove exact duplicate rows
        self.data = self.data.drop_duplicates()
        print("Data cleaning completed")
        return self.data

    def generate_summary(self):
        """Generate data summary for prompt engineering"""
        summary = {
            "total_records": len(self.data),
            "columns": list(self.data.columns),
            # Convert dtypes to strings; dtype objects are not JSON-serializable
            "data_types": self.data.dtypes.astype(str).to_dict()
        }
        return json.dumps(summary, indent=2)
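The cleaning and summary steps can be checked on a small in-memory frame before pointing the module at a real file. The sketch below is standalone and illustrative: the function name `summarize_frame` and the sample values are made up. It mirrors the dropna and drop_duplicates sequence and stores each dtype as a string, since pandas dtype objects cannot be passed to `json.dumps` directly.

```python
import json

import pandas as pd


def summarize_frame(df: pd.DataFrame) -> str:
    """Drop incomplete and duplicate rows, then return a JSON summary."""
    cleaned = df.dropna().drop_duplicates()
    summary = {
        "total_records": len(cleaned),
        "columns": list(cleaned.columns),
        # dtype objects are not JSON-serializable, so store their names
        "data_types": {col: str(dtype) for col, dtype in cleaned.dtypes.items()},
    }
    return json.dumps(summary, indent=2)


# Hypothetical sample: one duplicate row and one row with a missing product
raw = pd.DataFrame(
    {
        "product": ["Laptop", "Laptop", "Phone", None],
        "price": [1200, 1200, 800, 500],
    }
)
print(summarize_frame(raw))
```

Of the four input rows, the duplicate and the incomplete row are removed, so the summary reports two records.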
Step 3: Implement Prompt Engineering System
Design Prompt Templates
Prompt engineering is crucial for effective AI interactions. This module creates dynamic prompts based on data insights.
class PromptEngineer:
    def __init__(self):
        self.templates = {
            "data_insight": "Based on the following data summary, provide insights about the dataset:\n{data_summary}",
            "analysis_request": "Analyze the following data and suggest business implications:\n{data_summary}",
            "report_generation": "Generate a comprehensive report based on this data:\n{data_summary}"
        }

    def create_prompt(self, template_name, data_summary):
        """Generate prompt using template and data"""
        if template_name in self.templates:
            prompt = self.templates[template_name].format(data_summary=data_summary)
            return prompt
        else:
            raise ValueError(f"Template {template_name} not found")
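As templates gain more placeholders, it helps to check that every field has a value before formatting. The sketch below is one possible approach using the standard library's `string.Formatter`; the helper names `placeholders` and `render` and the `audience` field are hypothetical and not part of the PromptEngineer class.

```python
from string import Formatter


def placeholders(template: str) -> set:
    """Return the named fields used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, **values: str) -> str:
    """Fill a template, raising ValueError if any field has no value."""
    missing = placeholders(template) - set(values)
    if missing:
        raise ValueError(f"Missing template values: {sorted(missing)}")
    return template.format(**values)


template = "Summarize this dataset for a {audience} reader:\n{data_summary}"
prompt = render(
    template,
    audience="business",
    data_summary='{"total_records": 3}',
)
print(prompt)
```

Braces inside the substituted value, such as those in a JSON summary, are safe because `str.format` only interprets braces in the template itself.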
Step 4: Integrate Language Model Interaction
Build LLM Client
This module handles communication with the language model, with basic error handling and response parsing. The code uses the client interface of openai version 1.0 and later, which replaced the older openai.ChatCompletion call.
from openai import OpenAI

class LLMClient:
    def __init__(self, api_key=None):
        # With api_key=None the client reads OPENAI_API_KEY from the environment
        self.client = OpenAI(api_key=api_key)

    def generate_response(self, prompt, model="gpt-3.5-turbo"):
        """Send prompt to LLM and return response"""
        try:
            response = self.client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=500,
                temperature=0.7
            )
            content = response.choices[0].message.content
            return content.strip() if content else None
        except Exception as e:
            # Report the failure and signal it to the caller with None
            print(f"Error generating response: {e}")
            return None
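Because `generate_response` returns None on failure, transient problems such as rate limits or timeouts can be handled by retrying with a growing delay. The following is a minimal sketch of exponential backoff, not part of the tutorial's client; the name `with_retries` and its parameters are assumptions chosen for illustration.

```python
import time
from typing import Callable, Optional


def with_retries(
    call: Callable[[], Optional[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `call` until it returns a non-None value or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    """
    for attempt in range(attempts):
        result = call()
        if result is not None:
            return result
        # Only wait if another attempt is still to come
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

With the client above, a call might look like `with_retries(lambda: client.generate_response(prompt))`. The `sleep` parameter exists so that tests can substitute a function that does not actually wait.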
Step 5: Create Complete Workflow Pipeline
Assemble All Components
Now we'll combine all components into a cohesive workflow that demonstrates AI-native development practices.
class AIWorkflow:
    def __init__(self, data_file):
        self.data_processor = DataProcessor()
        self.prompt_engineer = PromptEngineer()
        self.llm_client = LLMClient()
        self.data_file = data_file

    def run_workflow(self):
        """Execute complete AI workflow"""
        # Step 1: Load and process data
        self.data_processor.load_data(self.data_file)
        self.data_processor.clean_data()

        # Step 2: Generate data summary
        summary = self.data_processor.generate_summary()
        print("Data Summary:", summary)

        # Step 3: Create prompt
        prompt = self.prompt_engineer.create_prompt("data_insight", summary)
        print("Generated Prompt:", prompt)

        # Step 4: Get AI response
        response = self.llm_client.generate_response(prompt)

        # Step 5: Output results
        if response:
            print("AI Response:", response)
            return response
        else:
            print("Failed to get AI response")
            return None
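Since the workflow only ever calls `generate_response` on its client, the pipeline logic can be exercised without network access by substituting a stand-in object with the same method. The sketch below is an illustrative assumption rather than part of the tutorial's code: `StubLLMClient` and `run_steps` are hypothetical names, and `run_steps` reproduces only the prompt and response steps.

```python
from typing import Optional


class StubLLMClient:
    """Stand-in for an LLM client that records prompts and returns canned text."""

    def __init__(self, canned_response: str = "stub insight"):
        self.canned_response = canned_response
        self.prompts = []

    def generate_response(self, prompt, model="stub-model"):
        # Record the prompt so a test can inspect what would have been sent
        self.prompts.append(prompt)
        return self.canned_response


def run_steps(summary: str, client) -> Optional[str]:
    """Build a prompt from a summary and pass it to the injected client."""
    prompt = (
        "Based on the following data summary, provide insights about the dataset:\n"
        + summary
    )
    return client.generate_response(prompt)


stub = StubLLMClient("looks fine")
print(run_steps('{"total_records": 3}', stub))
```

The same idea applies to the full class: assigning a stub to `workflow.llm_client` before calling `run_workflow()` lets you check data loading and prompt construction without spending API credits.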
Step 6: Test Your Workflow
Create Sample Data and Run Test
Create a sample CSV file to test your workflow and ensure everything works correctly.
# Create sample data
sample_data = {
    "product": ["Laptop", "Phone", "Tablet"],
    "price": [1200, 800, 500],
    "rating": [4.5, 4.2, 3.8]
}
sample_df = pd.DataFrame(sample_data)
sample_df.to_csv('sample_data.csv', index=False)

# Run workflow
workflow = AIWorkflow('sample_data.csv')
result = workflow.run_workflow()
Summary
This tutorial demonstrated how to build an AI-native workflow system that combines data engineering, prompt engineering, and language model integration. You've learned to create modular components that work together to process data, generate prompts, and interact with AI models. These skills are highly valued in today's job market as companies like GM seek professionals who can bridge traditional IT with modern AI capabilities. The modular approach shown here can be extended to include more sophisticated data processing, multiple AI model integrations, and production-ready error handling.



