Top 10 Feature Engineering Questions for Interviews
Master feature engineering techniques to enhance machine learning models and prepare for technical interviews with practical insights and examples.

Feature engineering is crucial for improving machine learning models and acing technical interviews. This guide covers the top 10 feature engineering topics you need to know, including practical techniques and real-world examples. Here's a quick overview:
- Feature Engineering Basics: Transform raw data into useful features through transformation, creation, and selection.
- Feature Selection vs. Feature Extraction: Learn when to choose between selecting key features or creating new ones.
- Handling Missing Data: Techniques like deletion, imputation, and advanced methods (e.g., KNN or model-based imputation).
- Encoding Categorical Variables: Use methods like one-hot, label, or target encoding to handle nominal and ordinal data.
- Scaling Numerical Features: Apply scaling methods (e.g., Min-Max, Standard, Robust) to improve model performance.
- Feature Binning: Simplify continuous variables into categories using equal-width, equal-frequency, or custom bins.
- Feature Interactions: Combine features (e.g., multiplicative, additive) to uncover relationships.
- Dimensionality Reduction: Use PCA, autoencoders, or feature selection to reduce high-dimensional datasets.
- Time Series Feature Engineering: Extract time-based features like lags, rolling statistics, and seasonal trends.
- Testing Feature Quality: Validate features using statistical tests, feature importance metrics, and cross-validation.
Quick Comparison Table
Topic | Key Methods/Techniques | Best For |
---|---|---|
Feature Selection | Filter, Wrapper, Embedded | Simplifying datasets, improving models |
Feature Extraction | PCA, LDA, Autoencoders | Reducing dimensions, creating new features |
Handling Missing Data | Deletion, Imputation, KNN, Model-based | Managing incomplete datasets |
Encoding Categorical Data | One-Hot, Label, Target, Binary Encoding | Handling nominal/ordinal variables |
Scaling Numerical Features | Min-Max, Standard, Robust Scaling, Log Transform | Normalizing numerical data |
Feature Binning | Equal-Width, Equal-Frequency, Custom, Tree-based | Simplifying continuous variables |
Feature Interactions | Multiplicative, Additive, Ratios, Polynomial | Capturing relationships between features |
Dimensionality Reduction | PCA, Autoencoders, Feature Selection | High-dimensional datasets |
Time Series Features | Lag, Rolling Stats, Seasonal Decomposition | Temporal datasets |
Testing Feature Quality | Correlation, ANOVA, Feature Importance | Validating feature impact |
Mastering these concepts will prepare you for machine learning interviews and improve your ability to build effective models. Let’s dive deeper into each topic.
1. What Is Feature Engineering?
Feature engineering is the process of turning raw data into features that help algorithms make better predictions. Think of it as preparing raw ingredients for a recipe - data scientists refine and shape the data so it works well with machine learning models.
Here’s what the process typically involves:
- Data Transformation: Converting raw data into a format that models can use, like scaling numerical values or encoding categorical variables.
- Feature Creation: Modifying or combining data to highlight important relationships, such as creating new columns from existing ones.
- Feature Selection: Picking the most useful attributes while removing those that add noise or redundancy.
- Applying Domain Knowledge: Using industry-specific insights to create features that reflect meaningful patterns.
For example, you might transform a timestamp into features like:
- Day of the week
- Hour of the day
- Whether it’s a weekend
- Holiday status
- Days since the last purchase
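Here's a minimal pandas sketch of those timestamp-derived features; the column names (`timestamp`, `last_purchase`) and the holiday list are illustrative assumptions, not a fixed recipe:

```python
import pandas as pd

# Toy data standing in for real event logs
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-03-08 14:30", "2024-03-10 09:15"]),
    "last_purchase": pd.to_datetime(["2024-02-20", "2024-03-01"]),
})

df["day_of_week"] = df["timestamp"].dt.dayofweek                  # 0 = Monday
df["hour_of_day"] = df["timestamp"].dt.hour
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5
holidays = pd.to_datetime(["2024-01-01", "2024-07-04"])           # example holiday calendar
df["is_holiday"] = df["timestamp"].dt.normalize().isin(holidays)
df["days_since_last_purchase"] = (df["timestamp"] - df["last_purchase"]).dt.days
```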
When discussing feature engineering in interviews, explain your choices and reasoning clearly. Highlight why certain features were created and how they improved the model.
To excel at feature engineering, focus on:
- A deep understanding of the problem you’re solving
- Familiarity with data transformation techniques
- The ability to spot patterns in data
- Experience with validating and testing features
2. Feature Selection vs. Feature Extraction
When working with feature engineering, it's important to understand the distinction between feature selection and feature extraction. While feature selection focuses on picking the most relevant features from the original dataset, feature extraction creates entirely new features. Both approaches aim to improve model performance, but they do so in different ways.
Feature Selection
Feature selection is about identifying and keeping the most important features. Common methods include:
- Filter Methods: Use statistical tests like correlation or chi-square to evaluate feature relevance.
- Wrapper Methods: Assess subsets of features by testing their impact on model performance.
- Embedded Methods: Combine feature selection with model training, such as in LASSO regression.
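To make the three families concrete, here's a small scikit-learn sketch on a synthetic dataset; the estimators and `k=5` are illustrative choices, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression, Lasso

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Filter: score each feature independently (ANOVA F-test) and keep the top k
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper: recursively drop the weakest features according to a model
X_wrapper = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit_transform(X, y)

# Embedded: L1 regularization zeroes out unhelpful coefficients during training
X_embedded = SelectFromModel(Lasso(alpha=0.01)).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```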
Feature Extraction
Feature extraction involves transforming existing features into new ones. Popular techniques include:
- Principal Component Analysis (PCA): Reduces dimensionality while retaining as much variance as possible.
- Linear Discriminant Analysis (LDA): Creates features that maximize separation between classes.
- Autoencoders: Neural networks that learn compressed, meaningful representations of data.
Comparison Table
Here’s a quick breakdown of when to use each approach:
Aspect | Feature Selection | Feature Extraction |
---|---|---|
Data Interpretability | High - Original features remain intact | Lower - Features are transformed |
Computational Cost | Lower | Higher |
Dimensionality | Limited by original features | Can create fewer dimensions |
Domain Knowledge Use | Easier to incorporate | Harder to incorporate directly |
Practical Example: Text Classification
- Feature Selection: Selecting key words based on frequency or importance scores.
- Feature Extraction: Generating dense vector representations with methods like Word2Vec.
Choosing the Right Approach
Your decision will depend on several factors:
- How much interpretability you need for your features.
- The computational resources at your disposal.
- The specific requirements of your machine learning task.
- The quality and quantity of your training data.
Both methods play a key role in simplifying data and improving model performance. Next, we’ll dive into handling missing values, another critical aspect of feature engineering.
3. Methods to Handle Missing Data
Missing data in datasets can affect how well your model performs. Here’s a breakdown of the main approaches and when to use them.
Types of Missing Data
- Missing Completely at Random (MCAR): No pattern exists in why data is missing.
- Missing at Random (MAR): Missing values are related to other observed data.
- Missing Not at Random (MNAR): Missing values depend on unobserved data.
Common Handling Techniques
- Deletion Methods
  These involve removing rows with missing values:
  - Complete Case Analysis: Deletes rows with any missing values.
  - Pairwise Deletion: Removes rows only for specific analyses.
  Useful when less than 5% of the data is missing and it follows the MCAR pattern.
- Simple Imputation
  Replaces missing values with basic statistics:
  - Mean/Median Imputation: For numerical data.
  - Mode Imputation: For categorical data.
  - Forward/Backward Fill: Effective for time series data.
- Advanced Imputation
Method | Advantages | Best For |
---|---|---|
KNN Imputation | Considers relationships between features | Small to medium datasets |
Multiple Imputation | Reflects uncertainty in missing data | Complex missing patterns |
Model-based Imputation | Produces precise estimates | Large datasets with patterns |
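As a quick illustration, here's a hedged scikit-learn sketch of simple versus KNN imputation on a toy array:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [8.0, 9.0]])

# Simple imputation: replace missing values with the column median
X_median = SimpleImputer(strategy="median").fit_transform(X)

# KNN imputation: fill each gap using the nearest complete rows
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

print(X_median)
print(X_knn)
```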
Choosing the Right Approach
When deciding how to handle missing data, consider these factors:
- Data Volume: How much data can you afford to lose?
- Missing Pattern: Is there an identifiable pattern in the missing data?
- Feature Importance: How critical is the feature with missing values?
- Resources Available: Do you have the computational power for advanced methods?
Best Practices
- Investigate Missing Patterns: Understand why data is missing before taking action.
- Document Your Process: Keep a record of the method used for transparency.
- Validate Your Approach: Test how different methods affect model performance.
- Leverage Domain Expertise: Missing values might carry specific meaning in certain contexts.
Monitoring Model Performance
When dealing with missing data, keep an eye on these metrics to evaluate the impact:
- Accuracy before and after addressing missing data.
- Changes in the distribution of imputed features.
- Shifts in feature importance.
- Cross-validation scores.
How you handle missing data can directly influence your model's success. Treat it as a crucial step in your feature engineering process. Up next, we’ll dive into managing categorical variables effectively.
4. Working with Categorical Variables
Now that we've covered handling missing data, let's dive into encoding categorical variables. Properly managing these variables can have a big impact on your model's performance.
Understanding Categorical Data Types
Categorical variables generally fall into two groups:
- Nominal: Categories with no specific order (e.g., colors, product types)
- Ordinal: Categories that follow a natural order (e.g., education levels, satisfaction ratings)
Common Encoding Techniques
Encoding Method | Best For | Pros | Cons |
---|---|---|---|
Label Encoding | Ordinal data | Saves memory, keeps category order | May suggest false relationships |
One-Hot Encoding | Nominal data | Avoids implying order | Can create very large matrices |
Target Encoding | High-cardinality features | Captures category-target links | Prone to overfitting |
Binary Encoding | High-cardinality nominal | Reduces memory usage | Can reduce interpretability |
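A short pandas/scikit-learn sketch of the two most common cases; the columns and the category order are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue"],                        # nominal
    "education": ["high_school", "bachelor", "master", "bachelor"],   # ordinal
})

# Ordinal (label-style) encoding with an explicit order, so the ranking is preserved
edu_order = [["high_school", "bachelor", "master"]]
df["education_encoded"] = OrdinalEncoder(categories=edu_order).fit_transform(df[["education"]]).ravel()

# One-hot encoding for the nominal feature, so no order is implied
df = pd.concat([df, pd.get_dummies(df["color"], prefix="color")], axis=1)
print(df)
```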
Handling High Cardinality
Features with many unique categories need special care:
- Frequency-Based Encoding: Combine less common categories into an "Other" group when they appear in less than 1% of the data or when there are more than 30 unique values.
- Feature Hashing: Lowers the number of dimensions while maintaining acceptable model performance.
- Embedding Techniques: Useful in deep learning, these methods capture complex relationships between categories.
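For frequency-based grouping, a minimal pandas sketch might look like this (the 1% threshold mirrors the guideline above; the data is synthetic):

```python
import pandas as pd

s = pd.Series(["A"] * 500 + ["B"] * 480 + ["C"] * 15 + ["D"] * 5)

freq = s.value_counts(normalize=True)
rare = freq[freq < 0.01].index            # categories seen in less than 1% of rows
s_grouped = s.where(~s.isin(rare), "Other")
print(s_grouped.value_counts())
```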
Best Practices for Encoding
- Analyze Category Distribution: Look at the frequency of categories before choosing an encoding method.
- Plan for Unseen Categories: Decide how to handle categories not present in the training data.
- Check Feature Interactions: Some encoding methods work better when paired with specific features.
- Keep an Eye on Memory Usage: Encoding can significantly increase memory requirements.
Common Pitfalls to Avoid
- Information Leakage: Be careful with target encoding during cross-validation to avoid data leakage.
- Feature Explosion: One-hot encoding can create too many features, leading to inefficiency.
- Encoding Missing Values: When appropriate, treat missing values as their own category.
- Sparse Matrices: If memory is limited, consider alternatives to sparse matrices.
A solid validation strategy is key to ensuring your encoding choices work well for both performance and resource efficiency.
Validation Strategy
- Test different encoding methods to compare model performance and memory use.
- Look for multicollinearity in the encoded features.
- Verify how the model handles unseen categories during testing.
The way you encode categorical variables affects both how well your model performs and how easy it is to interpret. Aim for a balance between efficiency and effectiveness.
5. Scaling Numerical Features
After encoding categorical variables, the next step is to scale numerical features. This step ensures your model doesn't favor features with larger ranges, which could skew training results. Mastering scaling techniques is a crucial skill for machine learning professionals and often comes up in interviews.
Why Scaling Is Important
When numerical features have vastly different ranges - like income ($30,000–$200,000) compared to age (18–80) - algorithms can unintentionally prioritize larger values. Scaling helps level the playing field.
Common Scaling Methods
Method | Formula | Best For | Key Notes |
---|---|---|---|
Min-Max Scaling | (x - min)/(max - min) | Data with defined bounds | Sensitive to outliers |
Standard Scaling | (x - mean)/std | General use | Doesn't limit values to a range |
Robust Scaling | (x - median)/IQR | Data with outliers | Requires more computation |
Log Transform | log(x) | Right-skewed data | Only works for positive values |
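Here's a small scikit-learn sketch comparing these scalers on a toy income column that deliberately includes an outlier:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

x = np.array([[30_000], [45_000], [60_000], [80_000], [200_000]])  # income with an outlier

print(MinMaxScaler().fit_transform(x).ravel())    # squeezes values into [0, 1]
print(StandardScaler().fit_transform(x).ravel())  # zero mean, unit variance
print(RobustScaler().fit_transform(x).ravel())    # centers on median, scales by IQR
print(np.log(x).ravel())                          # log transform for right-skewed, positive data
```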
How to Choose the Right Scaler
The best scaling method depends on several factors:
- Algorithm needs: Some models, like neural networks, rely heavily on scaled inputs.
- Data distribution: Check if your data is skewed or has outliers.
- Outliers: Robust scaling or log transformation can handle these better.
- Interpretability: Consider how scaling affects the readability of your features.
Best Practices for Implementation
- Fit scalers only on training data to avoid data leakage during validation or testing.
- Handle missing values before scaling and document the parameters used.
- Ensure scaled features retain their original relationships and relevance.
Special Notes
- Tree-based models: These models, like random forests, don’t require scaling because they’re invariant to monotonic transformations.
- Neural networks: These models perform better when features are scaled.
- Distance-based algorithms: Scaling is critical for accurate distance calculations.
Building a Scaling Pipeline
A good pipeline should:
- Validate inputs and handle missing values.
- Apply the same scaling parameters to new data during inference.
- Ensure consistency across training and testing datasets.
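One way to get those guarantees is a scikit-learn Pipeline, sketched below on synthetic data; the Ridge model is just a placeholder estimator:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = make_pipeline(StandardScaler(), Ridge())
pipeline.fit(X_train, y_train)            # scaler statistics come from X_train only
print(pipeline.score(X_test, y_test))     # X_test is scaled with the training parameters
```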
Avoid These Mistakes
- Don’t scale target variables unless explicitly required.
- Avoid using the wrong scaling method for skewed data.
- Never apply log transformations to non-positive values.
- Always scale new data using the parameters derived from training data.
Why It Matters
Scaling improves model performance by enhancing convergence, accuracy, and numerical stability while reducing the impact of outliers. Instead of blindly applying a single scaling method, tailor your approach to the specific needs of your data and model.
6. Feature Binning Methods
Feature binning, or discretization, is the process of converting continuous variables into categorical bins. This approach can help improve model performance by reducing noise and highlighting non-linear patterns.
Types of Binning Methods
Method | Description | Best Use Case | Considerations |
---|---|---|---|
Equal-Width | Divides the range into equal intervals | Works well with evenly distributed data | Highly sensitive to outliers |
Equal-Frequency | Creates bins with the same number of observations | Ideal for skewed distributions | May combine very different ranges |
Custom | Uses manually defined boundaries based on domain knowledge | Fits specific business needs | Requires expertise |
Decision Tree | Splits bins using decision tree algorithms | Handles complex non-linear relationships | Can be computationally heavy |
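A compact pandas sketch of the first three methods; the age values and bin edges are illustrative:

```python
import pandas as pd

ages = pd.Series([18, 22, 25, 31, 38, 45, 52, 61, 70, 83])

equal_width = pd.cut(ages, bins=4)        # equal-width intervals
equal_freq = pd.qcut(ages, q=4)           # equal-frequency (quartile) bins
custom = pd.cut(ages, bins=[0, 25, 40, 65, 120],
                labels=["young", "adult", "middle_age", "senior"])  # domain-defined bins

binned = pd.DataFrame({
    "age": ages,
    "equal_width": equal_width,
    "equal_freq": equal_freq,
    "custom": custom,
})
print(binned)
```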
When to Use Feature Binning
- To simplify high-cardinality features by reducing unique values
- To capture non-linear patterns without adding polynomial features
- To reduce the influence of outliers
- To align features with meaningful, domain-specific categories
Implementation Best Practices
- Analyze Your Data: Look at the distribution, outliers, and natural breaks before deciding on binning.
- Choose the Right Number of Bins: Aim for 5 to 10 bins. Too few can oversimplify, while too many might lead to overfitting.
Common Pitfalls to Watch Out For
- Oversimplification can cause loss of important information.
- Be cautious of data leakage when setting binning parameters.
- Address outliers and missing values before binning to avoid edge-case issues.
- Ensure bins are meaningful and interpretable for stakeholders.
Advanced Binning Techniques
- Monotonic Binning: Creates bins that maintain a consistent relationship between the feature and the target variable. This is particularly useful in credit scoring.
- Dynamic Binning: Adjusts bin boundaries based on the target variable's distribution, aiming to enhance predictive accuracy.
How Binning Impacts Model Performance
The effect of binning varies by model type:
- Linear Models: Benefit from binning as it helps capture non-linear patterns.
- Tree-Based Models: Usually handle non-linear relationships on their own, so binning might not be necessary.
- Neural Networks: Often work better with normalized continuous variables rather than binned features.
Validation Strategy
- Test model performance both with and without binning to evaluate its impact.
- Check the distribution of observations across bins to avoid imbalance.
- Ensure that the bins align with business logic and objectives.
- Apply the same binning strategy consistently to both training and test datasets.
With validated binned features, you can shift focus to creating meaningful feature interactions for your model.
7. Creating Feature Interactions
Feature interactions allow you to create new predictors by combining multiple features, helping to uncover relationships that improve model performance. Knowing how to build and use these interactions can make a big difference in your results.
Types of Feature Interactions
Interaction Type | Formula | Example Use | Purpose |
---|---|---|---|
Multiplicative | A × B | Total revenue (price × quantity) | Captures scaling relationships |
Additive | A + B | Combined risk scores | Aggregates related metrics |
Ratio | A ÷ B | Body Mass Index (weight ÷ height²) | Normalizes data |
Polynomial | A² or A × B² | Distance calculations | Models non-linear relationships |
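Beyond hand-crafted combinations, scikit-learn's PolynomialFeatures can generate pairwise interaction terms automatically, as in this illustrative sketch:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0, 5.0],
              [1.0, 4.0, 6.0]])

# interaction_only=True produces the A×B terms without the squared columns
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interactions = poly.fit_transform(X)

print(poly.get_feature_names_out(["a", "b", "c"]))  # ['a' 'b' 'c' 'a b' 'a c' 'b c']
print(X_interactions)
```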
Examples of Domain-Specific Interactions
Financial Data
- Debt-to-Income Ratio
- Price-to-Earnings Ratio
- Current Ratio
E-commerce
- Click-through Rate
- Conversion Rate
- Average Order Value
Healthcare
- Body Mass Index
- Blood Pressure Ratios
- Drug Dosage per Body Weight
Guidelines for Implementation
- Start Simple: Pair features that logically make sense together.
- Manage Complexity: Be cautious - creating too many interactions can lead to an explosion of features; the number of second-order interactions grows as n(n-1)/2.
- Validate Effectiveness:
- Test correlation with the target variable.
- Check for multicollinearity.
- Use cross-validation to confirm value.
- Monitor performance metrics to ensure improvement.
Advanced Techniques for Interaction Creation
Automated Discovery
- Use tree-based models to detect important feature combinations.
- Apply statistical tests to identify meaningful interactions.
- Use regularization techniques to avoid overfitting.
Domain-Specific Adjustments
- Time-based interactions for temporal datasets.
- Geographic interactions for spatial data.
- Hierarchical combinations for categorical variables.
Best Practices
- Document Everything: Clearly label and explain each interaction.
- Version Control: Keep track of all feature engineering changes.
- Stay Logical: Ensure interactions are understandable to stakeholders.
- Scale Thoughtfully: Scale interaction terms separately from original features if needed.
Watch Out For These Pitfalls
- Adding redundant interactions that don't improve results.
- Ignoring missing values in interaction terms.
- Overcomplicating the model without meaningful gains.
- Skipping validation on test data.
Example: Technical Implementation
# Creating interaction features (assumes a pandas DataFrame `df` with these columns)
import numpy as np

df['price_per_sqft'] = df['price'] / df['square_feet']            # ratio interaction
df['distance'] = np.sqrt(df['x']**2 + df['y']**2)                 # polynomial interaction
df['location_time'] = df['location'] + '_' + df['time_of_day']    # categorical combination
When creating feature interactions, focus on logical combinations that align with your business goals. The aim is to highlight relationships that enhance model accuracy while keeping the model easy to interpret.
Next, we’ll dive into dimensionality reduction to handle the complexity of large feature sets.
8. Dimensionality Reduction
Dimensionality reduction simplifies your feature space, making it easier to work with high-dimensional data while improving model performance. Let’s break down the key techniques and considerations.
Principal Component Analysis (PCA)
PCA is a method that converts correlated features into uncorrelated components, ordered by the amount of variance they explain. This technique reduces complexity while retaining as much data variability as possible.
Key Points About PCA
- Variance Explained: Aim to select components that account for 80-95% of the total variance.
- Interpretability: Principal components can be hard to interpret in their transformed state.
Feature Selection Methods
Feature selection focuses on identifying the most relevant features for your model. Here’s a comparison of common approaches:
Method | Description | Best Use Case | Drawback |
---|---|---|---|
Filter | Uses statistical measures (e.g., correlation, chi-square) | Quick initial screening | May overlook feature interactions |
Wrapper | Evaluates subsets of features with model performance | Thorough optimization | Resource-intensive |
Embedded | Selects features during model training (e.g., LASSO, Elastic Net) | Automatic integration | Results depend on the model |
Autoencoder Dimensionality Reduction
Autoencoders are neural networks designed to compress data into a smaller representation and then reconstruct it. They are particularly useful for non-linear relationships in data.
How to Use Autoencoders
- Architecture Design
  - Match the input layer to your feature count.
  - Gradually reduce the size of hidden layers.
  - Use a bottleneck layer to define the reduced dimensions.
- Training Tips
  - Choose a suitable loss function (e.g., Mean Squared Error for continuous data).
  - Monitor reconstruction error to assess performance.
  - Apply regularization techniques to avoid overfitting.
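Here's a minimal Keras sketch of that architecture, assuming 30 already-scaled input features; the layer sizes, epochs, and random data are illustrative, not tuned values:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 30).astype("float32")   # stand-in for scaled features

inputs = keras.Input(shape=(30,))
encoded = layers.Dense(16, activation="relu")(inputs)
bottleneck = layers.Dense(5, activation="relu")(encoded)      # reduced representation
decoded = layers.Dense(16, activation="relu")(bottleneck)
outputs = layers.Dense(30, activation="linear")(decoded)

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, bottleneck)                     # use this part to produce new features

autoencoder.compile(optimizer="adam", loss="mse")             # reconstruction error
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

X_reduced = encoder.predict(X)                                # 30 dims -> 5 dims
print(X_reduced.shape)
```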
Domain-Specific Approaches
Dimensionality reduction methods often depend on the type of data you're working with:
- Text Data: Use techniques like topic modeling or word embeddings.
- Image Data: Employ convolutional autoencoders for better feature extraction.
- Time Series: Account for temporal patterns when reducing dimensions.
- Categorical Data: Try multiple correspondence analysis for effective compression.
Monitoring Performance
Keep an eye on these metrics to evaluate the effectiveness of your dimensionality reduction:
- Information Retention: Check how much variance is preserved.
- Model Performance: Compare accuracy before and after reduction.
- Computational Efficiency: Measure training and inference times.
- Memory Usage: Track how much storage the reduced data requires.
Example: PCA in Action
Here’s a Python snippet to apply PCA:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Scale the features
X_scaled = StandardScaler().fit_transform(X)
# Apply PCA to retain 95% of variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(f"Number of components: {pca.n_components_}")
print(f"Total variance explained: {pca.explained_variance_ratio_.sum():.2%}")
Common Mistakes to Avoid
- Overreduction: Cutting too many dimensions can result in losing critical information.
- Skipping Scaling: PCA and other methods often require normalized data.
- Ignoring Context: Always consider the specific needs of your domain and data.
- Weak Validation: Test how dimensionality reduction impacts downstream tasks to ensure it’s effective.
Dimensionality reduction is a powerful tool, but it’s crucial to balance simplification with preserving meaningful information.
9. Time Series Feature Engineering
Time series feature engineering focuses on extracting patterns from time-based data to improve predictive models. It builds on standard techniques but emphasizes the unique aspects of temporal data.
Basic Time Components
Start by pulling out key time-related elements:
- Hour of day
- Day of the week
- Month
- Quarter
- Year
- Weekend or weekday indicator
- Holiday flags
Rolling Window Features
Summarize trends over specific time periods using rolling window calculations:
Window Type | Common Metrics | Example Use Case |
---|---|---|
Simple Moving Average | Mean, Max, Min | Smooth short-term fluctuations |
Exponential Moving Average | Weighted mean | Highlight recent changes |
Rolling Standard Deviation | Volatility | Assess stability over time |
Rolling Quantiles | 25th, 75th percentiles | Track distribution shifts |
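A short pandas sketch of these rolling-window features; the 3-step window is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 12, 9, 15, 14, 18, 20, 17, 22, 25]})

df["sma_3"] = df["value"].rolling(window=3).mean()                  # simple moving average
df["ema_3"] = df["value"].ewm(span=3, adjust=False).mean()          # exponential moving average
df["rolling_std_3"] = df["value"].rolling(window=3).std()           # rolling volatility
df["rolling_q75_3"] = df["value"].rolling(window=3).quantile(0.75)  # rolling 75th percentile
print(df)
```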
Lag Features
Lag features help capture the influence of past values on the current state:
# Example of creating lag features
df['lag_1'] = df['value'].shift(1) # Yesterday's value
df['lag_7'] = df['value'].shift(7) # Value from one week ago
df['lag_30'] = df['value'].shift(30) # Value from one month ago
Seasonal Decomposition
Break down a time series into its key components: trend, seasonality, and residuals. This helps uncover underlying patterns.
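With statsmodels, a hedged sketch of additive decomposition on a synthetic monthly series (period=12 is an assumption about the seasonality) looks like this:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: upward trend + yearly cycle + noise
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
values = np.arange(48) * 0.5 + 10 * np.sin(2 * np.pi * np.arange(48) / 12) + np.random.randn(48)
series = pd.Series(values, index=idx)

result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.tail())      # long-run direction
print(result.seasonal.tail())   # repeating yearly pattern
print(result.resid.tail())      # what's left after trend + seasonality
```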
Domain-Specific Time Features
Customize features based on your industry or application:
- Finance: Trading days, market hours
- Retail: Shopping seasons, promotional events
- Web Traffic: Peak browsing times, scheduled downtimes
- Manufacturing: Production cycles, maintenance schedules
Date Difference Features
Calculate time intervals between events to uncover meaningful patterns:
# Example of date difference calculations
df['days_since_last_event'] = (df['current_date'] - df['last_event_date']).dt.days
df['days_until_next_event'] = (df['next_event_date'] - df['current_date']).dt.days
Time-Based Ratios
Use ratios to compare current values with past periods:
- Current value vs. previous day's value
- Current value vs. same day last week
- Current value vs. same month last year
Best Practices
- Handle Missing Data: Fill gaps using forward-fill or backward-fill methods.
- Avoid Data Leakage: Ensure that features only use information available up to the prediction point.
- Consider Scaling: Account for the cyclical nature of time-based features when scaling.
- Check Stationarity: Apply transformations to stabilize non-stationary time series.
Feature Selection Tips
- Begin with simple time-based features.
- Incorporate industry-specific features as needed.
- Experiment with different window sizes to find the optimal fit.
- Use your model to test feature importance.
- Keep an eye on computational efficiency.
These strategies help set the stage for building strong predictive models using time series data.
10. Testing Feature Quality
Testing feature quality ensures that the features you engineer actually improve your model's performance. Here's how you can do it:
Statistical Tests
Use these statistical methods to evaluate your features:
- Correlation Analysis: Identify multicollinearity with Pearson or Spearman correlation.
- Chi-Square Tests: Examine relationships between categorical features.
- ANOVA: Test how features differ across target classes.
- Information Gain: Quantify feature relevance in classification tasks.
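A brief scikit-learn sketch of scoring features with an ANOVA F-test and mutual information (an information-gain-style measure) on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)

f_scores, p_values = f_classif(X, y)                     # ANOVA F-test per feature
mi_scores = mutual_info_classif(X, y, random_state=0)    # mutual information per feature

for i, (f, p, mi) in enumerate(zip(f_scores, p_values, mi_scores)):
    print(f"feature_{i}: F={f:.1f}, p={p:.3f}, MI={mi:.3f}")
```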
Feature Importance Metrics
Different models provide tools to measure feature importance. Here's a quick overview:
Model Type | Importance Metric | What It Shows |
---|---|---|
Random Forest | Gini Importance | Reduction in node impurity |
XGBoost | Feature Score | Contribution to split gain |
Linear Models | Coefficient Values | Magnitude of feature weights |
LASSO/Ridge | Regularization Path | Order of feature selection |
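Here's an illustrative sketch of reading importance scores from two of these model families; the dataset and hyperparameters are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

# Gini importance from a random forest
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("Gini importance:", rf.feature_importances_.round(3))

# Coefficient magnitudes from a linear model (scale features first so weights are comparable)
X_scaled = StandardScaler().fit_transform(X)
lr = LogisticRegression(max_iter=1000).fit(X_scaled, y)
print("Linear coefficients:", lr.coef_[0].round(3))
```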
Cross-Validation Impact
Check feature impact using cross-validation:
# Example: Evaluating feature impact (model, X_base, X_with_new, and y defined elsewhere)
from sklearn.model_selection import cross_val_score

baseline_score = cross_val_score(model, X_base, y).mean()
new_feature_score = cross_val_score(model, X_with_new, y).mean()
improvement = ((new_feature_score - baseline_score) / baseline_score) * 100
Stability Analysis
Test features under varying conditions to ensure reliability:
- Time Stability: Does the feature perform consistently over time?
- Population Stability: Does it behave similarly across different groups of data?
- Missing Value Impact: How does it handle missing data?
- Outlier Sensitivity: Does it remain robust against extreme values?
After confirming stability, weigh the costs and benefits of using each feature.
Feature Cost-Benefit Analysis
Think about practical considerations when implementing features:
- Computation Time: How much processing power is needed?
- Storage Requirements: How much memory does it take up?
- Maintenance Effort: How complex is it to update?
- Performance Gain: How much does it improve the model?
Common Pitfalls
Avoid these common mistakes when testing features:
- Data Leakage: Accidentally including future data in your features.
- Selection Bias: Testing only on data splits that favor the feature.
- Overfitting: Creating too many features that don't generalize well.
- Redundancy: Adding features that are highly correlated with existing ones.
Documentation Requirements
Keep detailed records for every feature:
- How it was created and its dependencies.
- Validation results and performance metrics.
- How often it needs updates.
- Known limitations or edge cases.
- Its impact on overall model performance.
Conclusion
Excelling in feature engineering is key to thriving in machine learning interviews and roles. From managing missing data to evaluating feature quality, these skills highlight your technical knowledge and problem-solving abilities. Strong feature engineering expertise not only equips you for tough interviews but also makes landing the job more achievable.
While technical preparation is essential, job hunting can be time-consuming. Shubham Dhakle, Outcome Manager at Scale.jobs, emphasizes:
"Focus on interview prep - we handle the rest"
Here are some effective strategies to prepare:
- Brush Up on Core Concepts: Understand selection, scaling, and dimensionality reduction - key topics for tech interviews.
- Practice Real-World Applications: Work on handling missing data, creating feature interactions, scaling data, and validating features using actual datasets.
- Anticipate Common Challenges: Be ready to discuss how you choose techniques, handle different data types, validate features, and tackle edge cases.
These steps not only enhance your technical proficiency but also make your job search more efficient. As Scale.jobs user Anuva Agarwal shares:
"I would recommend trying out Scale.jobs to anyone looking to make more time in their schedule for interview prep and networking, so that the repetitive portion of the job application process can be outsourced"
Feature engineering combines both theory and hands-on skills. Gaining this balance through consistent practice and preparation will set you up for success in machine learning roles.