Exam Practice | Data Science

L01 Python Setup

Q1 Easy L01

Which of the following is NOT a valid Python data type?

A. decimal
B. float
C. str
D. int

Solution

Answer: A
Python's basic built-in types include int, float, str, and bool. There is no built-in type named 'decimal'; decimal is a standard-library module that provides the Decimal class.

Q2 Easy L01

What is the result of 7 // 2 in Python?

A. 3.0
B. 4
C. 3.5
D. 3

Solution

Answer: D
// is floor division: it rounds the quotient down toward negative infinity. With two int operands the result is an int, so 7 // 2 is 3, not 3.0.
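A minimal sketch of the floor-division rule, comparing // with / on a few operands. The dictionary name results is only for this illustration.

```python
# Floor division rounds toward negative infinity, not toward zero.
results = {
    "7 // 2": 7 // 2,      # int operands give an int
    "-7 // 2": -7 // 2,    # floors down to -4, not -3
    "7.0 // 2": 7.0 // 2,  # a float operand gives a float
    "7 / 2": 7 / 2,        # true division always gives a float
}

for expression, value in results.items():
    print(f"{expression} = {value!r}")
```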

Q3 Easy L01

Which operator is used for exponentiation in Python?

A. **
B. exp()
C. ^^
D. ^

Solution

Answer: A
** is the exponentiation operator. ^ is bitwise XOR in Python.
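A short sketch contrasting the two operators on the same operands. The variable names are illustrative only.

```python
# ** raises to a power; ^ works on the binary digits of integers.
power = 2 ** 3   # 2 * 2 * 2
xor = 2 ^ 3      # 0b10 XOR 0b11 -> 0b01

print(power, xor)
```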

L02 Data Structures

Q4 Easy L02

What is the output of [1, 2, 3][1]?

A. 3
B. 2
C. [2]
D. 1

Solution

Answer: B
Python uses zero-based indexing, so index 1 returns the second element.

Q5 Easy L02

Which method adds an element to the end of a list?

A. extend()
B. append()
C. add()
D. insert()

Solution

Answer: B
append() adds a single element to the end of a list.
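The sketch below contrasts append() with extend(), the distractor most often confused with it. The ticker values are made up for illustration.

```python
tickers = ["AAPL", "GOOGL"]
tickers.append("MSFT")             # adds one element to the end
tickers.extend(["AMZN", "META"])   # adds each element of the iterable

nested = ["AAPL"]
nested.append(["MSFT", "AMZN"])    # the whole list becomes a single element

print(tickers)
print(nested)
```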

Q6 Easy L02

What does stocks[-1] return for stocks = ['AAPL', 'GOOGL', 'MSFT']?

A. 'GOOGL'
B. Error
C. 'MSFT'
D. 'AAPL'

Solution

Answer: C
Negative indexing starts from the end; -1 is the last element.

L03 Control Flow

Q7 Easy L03

What keyword starts a conditional statement in Python?

A. case
B. if
C. switch
D. when

Solution

Answer: B
Python uses 'if' for conditional statements.

Q8 Easy L03

What is the output of: for i in range(3): print(i)?

A. 1 2
B. 0 1 2 3
C. 1 2 3
D. 0 1 2

Solution

Answer: D
range(3) produces 0, 1, 2.

Q9 Easy L03

Which keyword is used for the 'otherwise' condition?

A. default
B. elif
C. else
D. otherwise

Solution

Answer: C
else is used for the default case when no conditions match.

L04 Functions

Q10 Easy L04

Which keyword is used to define a function in Python?

A. func
B. def
C. define
D. function

Solution

Answer: B
Python uses 'def' to define functions.

Q11 Easy L04

What does a function return if no return statement is used?

A. Error
B. None
C. 0
D. Empty string

Solution

Answer: B
A function that finishes without executing a return statement returns None.
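A small sketch of this behavior. The function log_price is a hypothetical helper written for this example.

```python
def log_price(price):
    """Print a price; the body has no return statement."""
    print(f"price = {price}")


result = log_price(101.5)  # the call still evaluates to a value
print(result is None)
```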

Q12 Easy L04

What is the purpose of a docstring?

A. To document the function
B. To import modules
C. To define variables
D. To run code

Solution

Answer: A
Docstrings document what a function does.

L05 DataFrames Introduction

Q13 Easy L05

Which library provides the DataFrame data structure?

A. Matplotlib
B. Pandas
C. Scikit-learn
D. NumPy

Solution

Answer: B
Pandas provides DataFrames for tabular data.

Q14 Easy L05

What is the difference between a Series and a DataFrame?

A. Series is for text only
B. No difference
C. Series is 1D, DataFrame is 2D
D. Series is 2D, DataFrame is 1D

Solution

Answer: C
Series is a single column; DataFrame is a table.

Q15 Easy L05

Which function reads a CSV file into a DataFrame?

A. pd.read_csv()
B. pd.import_csv()
C. pd.open_csv()
D. pd.load_csv()

Solution

Answer: A
pd.read_csv() reads CSV files into DataFrames.

L06 Selection Filtering

Q16 Easy L06

Which method selects rows by label?

A. select
B. loc
C. at
D. iloc

Solution

Answer: B
loc uses labels; iloc uses integer positions.

Q17 Easy L06

What does df.iloc[0] return?

A. Error
B. First cell
C. First column
D. First row

Solution

Answer: D
iloc[0] selects the first row by position.

Q18 Easy L06

How do you filter df where column 'price' > 100?

A. df[df['price'] > 100]
B. df.where(price > 100)
C. df.filter(price > 100)
D. df[price > 100]

Solution

Answer: A
Boolean indexing: df[condition].
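A hedged sketch of boolean indexing, assuming pandas is installed. The DataFrame contents are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"ticker": ["AAPL", "GOOGL", "MSFT"], "price": [95.0, 140.0, 410.0]}
)

mask = df["price"] > 100   # boolean Series, one value per row
expensive = df[mask]       # keeps only the rows where mask is True

print(expensive)
```

The comparison runs first and yields the mask; the outer df[...] then selects rows. That is why df[price > 100] fails: price is not a defined name on its own.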

L07 Missing Data

Q19 Easy L07

What value represents missing data in pandas?

A. NA
B. NaN
C. All of the above
D. None

Solution

Answer: C
Pandas treats None, NaN (np.nan), and pd.NA as missing values.

Q20 Easy L07

Which method checks for missing values?

A. df.empty()
B. df.null()
C. df.missing()
D. df.isna()

Solution

Answer: D
isna() or isnull() checks for missing values.

Q21 Easy L07

What does df.dropna() do?

A. Removes rows with any missing values
B. Replaces with zeros
C. Counts missing values
D. Fills missing values

Solution

Answer: A
dropna() removes rows containing NaN by default.
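The following sketch combines isna() and dropna() on a tiny frame, assuming pandas and NumPy are installed. The column names and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"price": [100.0, np.nan, 120.0], "volume": [10, 20, None]}
)

missing_per_column = df.isna().sum()  # count of missing values per column
complete_rows = df.dropna()           # drops rows with any missing value

print(missing_per_column)
print(complete_rows)
```

dropna() returns a new DataFrame; the original df is unchanged unless the result is assigned back.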

L08 Basic Operations

Q22 Easy L08

What does df['price'] * 2 do?

A. Creates 2 columns
B. Multiplies each value by 2
C. Error
D. Doubles the column name

Solution

Answer: B
Operations are applied element-wise.

Q23 Easy L08

How do you add a new column 'total' as 'price' * 'quantity'?

A. df.create('total')
B. df.total = df.price * df.quantity
C. df['total'] = df['price'] * df['quantity']
D. df.add('total', 'price' * 'quantity')

Solution

Answer: C
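Assigning the result to df['total'] creates the new column. Attribute assignment (df.total = ...) does not create a column that does not already exist.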

Q24 Easy L08

What does df.sort_values('price') do?

A. Sorts by price ascending
B. Counts values
C. Sorts columns
D. Sorts index

Solution

Answer: A
sort_values('price') sorts the rows by the 'price' column, in ascending order by default.

L09 GroupBy Operations

Q25 Easy L09

What does df.groupby('sector') do?

A. Filters by sector
B. Creates groups by sector
C. Counts sectors
D. Sorts by sector

Solution

Answer: B
groupby() creates groups for aggregation.

Q26 Easy L09

What is the split-apply-combine pattern?

A. Import pattern
B. File operations
C. Data cleaning steps
D. GroupBy workflow: split into groups, apply function, combine results

Solution

Answer: D
GroupBy splits data, applies aggregation, combines results.

Q27 Easy L09

What does df.groupby('sector')['price'].mean() return?

A. Series with mean price per sector
B. List
C. Single number
D. DataFrame

Solution

Answer: A
Returns Series indexed by group keys.
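A minimal split-apply-combine sketch, assuming pandas is installed. The sectors and prices are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sector": ["Tech", "Tech", "Energy"],
        "price": [100.0, 200.0, 50.0],
    }
)

# Split by sector, apply mean to the price column, combine into a Series.
mean_by_sector = df.groupby("sector")["price"].mean()

print(mean_by_sector)
```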

L10 Merging Joining

Q28 Easy L10

What is the default join type in pd.merge()?

A. inner
B. right
C. outer
D. left

Solution

Answer: A
Inner join is the default, keeping only matching rows.

Q29 Easy L10

What does left join do?

A. Keeps all rows from both
B. Keeps only matching rows
C. Keeps all rows from left DataFrame
D. Keeps all rows from right DataFrame

Solution

Answer: C
Left join keeps all left rows, fills NaN for non-matches.
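The sketch below compares the default inner join with a left join, assuming pandas is installed. Both frames hold invented data.

```python
import pandas as pd

prices = pd.DataFrame(
    {"ticker": ["AAPL", "MSFT", "XOM"], "price": [190.0, 410.0, 110.0]}
)
sectors = pd.DataFrame(
    {"ticker": ["AAPL", "MSFT"], "sector": ["Tech", "Tech"]}
)

inner = pd.merge(prices, sectors, on="ticker")              # default how="inner"
left = pd.merge(prices, sectors, on="ticker", how="left")   # keep all left rows

print(inner)
print(left)
```

XOM has no match in sectors, so it disappears from the inner join and gets NaN in the sector column of the left join.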

Q30 Easy L10

What is pd.concat() used for?

A. Stacking DataFrames vertically or horizontally
B. Aggregating data
C. Filtering data
D. Joining on keys

Solution

Answer: A
concat() stacks DataFrames along an axis.

L11 NumPy Basics

Q31 Easy L11

What is the main data structure in NumPy?

A. DataFrame
B. ndarray
C. Dictionary
D. List

Solution

Answer: B
NumPy's core is the n-dimensional array (ndarray).

Q32 Easy L11

What does np.array([1, 2, 3]) create?

A. Matrix
B. 2D array
C. List
D. 1D array

Solution

Answer: D
Creates a 1-dimensional NumPy array.

Q33 Easy L11

What is broadcasting in NumPy?

A. Automatic size matching for operations
B. Copying arrays
C. Printing arrays
D. Sending arrays over network

Solution

Answer: A
Broadcasting allows operations on arrays of different shapes.
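A hedged sketch of broadcasting, assuming NumPy is installed. The prices and weights are illustrative numbers.

```python
import numpy as np

prices = np.array([[100.0, 200.0, 300.0],
                   [110.0, 210.0, 310.0]])  # shape (2, 3)
weights = np.array([0.5, 0.3, 0.2])         # shape (3,)

# weights is stretched across both rows without being copied by hand.
weighted = prices * weights
portfolio_value = weighted.sum(axis=1)

print(weighted.shape, portfolio_value)
```

Shapes are compared from the trailing dimension backward; here (2, 3) and (3,) are compatible because the last dimensions match.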

L12 Time Series

Q34 Easy L12

What does pd.to_datetime() do?

A. Creates time delta
B. Converts to datetime object
C. Formats date
D. Converts to date string

Solution

Answer: B
to_datetime() parses strings/numbers to datetime objects.

Q35 Easy L12

What is a DatetimeIndex?

A. Time zone
B. Date column
C. List of dates
D. Index with datetime values

Solution

Answer: D
DatetimeIndex is pandas index type for time series.

Q36 Easy L12

What does df.resample('M') do?

A. Groups by month for aggregation
B. Sorts by month
C. Creates monthly dates
D. Filters monthly

Solution

Answer: A
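resample('M') groups a time series into monthly bins for aggregation and requires a datetime-like index. Recent pandas releases (2.2 and later) prefer the alias 'ME' for month-end frequency.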

L13 Descriptive Statistics

Q37 Easy L13

What does df.describe() return?

A. Column names
B. Summary statistics
C. Shape
D. Data types

Solution

Answer: B
describe() returns count, mean, std, min, quartiles, max.

Q38 Easy L13

What is the median?

A. Range midpoint
B. Average value
C. Most frequent value
D. Middle value when sorted

Solution

Answer: D
Median is the 50th percentile.

Q39 Easy L13

What is the mode?

A. Average value
B. Largest value
C. Most frequent value
D. Middle value

Solution

Answer: C
Mode is the most commonly occurring value.
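A short sketch of mean, median, and mode using Python's standard statistics module. The sample data is arbitrary.

```python
import statistics

data = [3, 1, 4, 1, 5]

mean_value = statistics.mean(data)      # arithmetic average
median_value = statistics.median(data)  # middle of the sorted values [1, 1, 3, 4, 5]
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```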

L14 Distributions

Q40 Easy L14

What characterizes a normal distribution?

A. Uniform
B. Bell-shaped, symmetric
C. Bimodal
D. Skewed right

Solution

Answer: B
Normal distribution is symmetric and bell-shaped.

Q41 Easy L14

What parameters define $N(\mu, \sigma^2)$?

A. Min and max
B. Mode and range
C. $\mu$ only
D. $\mu$ and $\sigma^2$

Solution

Answer: D
A normal distribution is defined by its mean $\mu$ and variance $\sigma^2$ (equivalently, its standard deviation $\sigma$).

Q42 Easy L14

What is the $68$-$95$-$99.7$ rule?

A. % within $\pm 1\sigma$, $\pm 2\sigma$, $\pm 3\sigma$
B. Sample sizes
C. Confidence levels
D. Percentile ranks

Solution

Answer: A
Empirical rule: about $68\%$ of values lie within $\pm 1\sigma$ of the mean, $95\%$ within $\pm 2\sigma$, and $99.7\%$ within $\pm 3\sigma$.
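The percentages in the rule can be checked numerically. This sketch uses statistics.NormalDist from the standard library; the dictionary name coverage is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

for k, probability in coverage.items():
    print(f"within {k} sigma: {probability:.4f}")
```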

L15 Hypothesis Testing

Q43 Easy L15

What is the null hypothesis ($H_0$)?

A. Alternative claim
B. Default assumption of no effect
C. Research hypothesis
D. What we want to prove

Solution

Answer: B
$H_0$ is the default position we test against.

Q44 Easy L15

What is a $p$-value?

A. Effect size
B. Confidence level
C. Probability $H_0$ is true
D. $P(\text{data} \mid H_0)$

Solution

Answer: D
$p = P(\text{data at least as extreme as observed} \mid H_0 \text{ true})$.

Q45 Easy L15

What does $p < 0.05$ mean?

A. Result unlikely under $H_0$
B. $H_1$ is proven
C. Effect is large
D. $H_0$ is false

Solution

Answer: A
Low $p$-value suggests data unlikely under null.

L16 Correlation

Q46 Easy L16

What does correlation measure?

A. Difference
B. Linear relationship strength
C. Variability
D. Causation

Solution

Answer: B
Correlation measures strength and direction of linear relationship.

Q47 Easy L16

What is the range of Pearson correlation $r$?

A. $-\infty$ to $+\infty$
B. $0$ to $+\infty$
C. $0$ to $1$
D. $-1$ to $+1$

Solution

Answer: D
$r \in [-1, +1]$.

Q48 Easy L16

What does $r = 0$ indicate?

A. No linear relationship
B. Causation
C. Perfect positive
D. Perfect negative

Solution

Answer: A
$r = 0$ means no linear correlation (may have nonlinear).
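To illustrate that $r = 0$ does not rule out a nonlinear relationship, the sketch below computes Pearson's $r$ by hand for a linear and a quadratic dependence. The helper pearson_r is written for this example, not taken from a library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from the textbook covariance formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]
r_linear = pearson_r(xs, [2 * x + 1 for x in xs])  # y depends linearly on x
r_quadratic = pearson_r(xs, [x ** 2 for x in xs])  # y is fully determined by x

print(r_linear, r_quadratic)
```

The quadratic case has zero linear correlation even though y is an exact function of x.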

L17 Matplotlib Basics

Q49 Easy L17

What does plt.figure() create?

A. Plot
B. New figure container
C. Data
D. Axes

Solution

Answer: B
figure() creates a new figure (canvas) for plotting.

Q50 Easy L17

What does plt.subplots(2, 3) return?

A. 3 figures
B. Single axis
C. 6 figures
D. Figure and 2x3 array of axes

Solution

Answer: D
Returns (figure, axes array) with 2 rows, 3 columns.

Q51 Easy L17

How do you set figure size?

A. plt.figure(figsize=(10, 6))
B. fig.set_size(10, 6)
C. plt.figsize(10, 6)
D. plt.size(10, 6)

Solution

Answer: A
figsize parameter sets width, height in inches.

L18 Seaborn Plots

Q52 Easy L18

What is Seaborn built on?

A. Bokeh
B. Matplotlib
C. D3.js
D. Plotly

Solution

Answer: B
Seaborn is built on top of matplotlib.

Q53 Easy L18

What does sns.set_theme() do?

A. Exports theme
B. Resets theme
C. Creates theme
D. Sets default styling

Solution

Answer: D
set_theme() applies seaborn's aesthetic defaults.

Q54 Easy L18

What does sns.histplot() create?

A. Histogram with optional KDE
B. Line plot
C. Bar chart
D. Scatter plot

Solution

Answer: A
histplot() creates histogram, can overlay KDE.

L19 Multi Panel Figures

Q55 Easy L19

What does plt.subplots(2, 2) create?

A. 2 figures with 2 axes each
B. 2x2 grid of axes in one figure
C. Error
D. 4 separate figures

Solution

Answer: B
Creates figure with 2 rows and 2 columns of axes.

Q56 Easy L19

How do you access an axes object in a 2x2 grid?

A. axes(0, 0)
B. axes.get(0, 0)
C. axes[0]
D. axes[row, col]

Solution

Answer: D
Index with [row, col] for 2D axes array.

Q57 Easy L19

What does sharex=True do?

A. Subplots share x-axis limits
B. Links x labels
C. Copies x data
D. Shares data

Solution

Answer: A
sharex synchronizes x-axis limits across subplots.

L20 Data Storytelling

Q58 Easy L20

What is data storytelling?

A. Fictional stories
B. Combining data, visuals, and narrative
C. Database queries
D. Raw data display

Solution

Answer: B
Data storytelling communicates insights through narrative.

Q59 Easy L20

What should a chart title convey?

A. Author name
B. Variable names
C. Data source
D. Key insight or finding

Solution

Answer: D
Titles should highlight the main takeaway.

Q60 Easy L20

What is the data-ink ratio?

A. Share of a graphic's ink that presents data
B. Resolution
C. Color ratio
D. Print quality

Solution

Answer: A
Tufte defines the data-ink ratio as data-ink divided by the total ink used in the graphic. Maximize it by removing chartjunk.

L21 Linear Regression

Q61 Easy L21

What does linear regression predict?

A. Probabilities only
B. Continuous values
C. Binary outcomes
D. Categories

Solution

Answer: B
Linear regression predicts continuous target variables.

Q62 Easy L21

What is the simple linear regression equation?

A. $y = \log(x)$
B. $y = x^2$
C. $y = mx$
D. $y = \beta_0 + \beta_1 x$

Solution

Answer: D
The equation is $y = \beta_0 + \beta_1 x$ where $\beta_0$ is the intercept and $\beta_1$ is the slope.

Q63 Easy L21

What does the slope $\beta_1$ represent?

A. Change in $y$ per unit change in $x$
B. $R^2$ value
C. Error term
D. Y-intercept

Solution

Answer: A
The slope $\beta_1$ represents the rate of change of $y$ with respect to $x$.
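A minimal sketch of fitting $y = \beta_0 + \beta_1 x$ by ordinary least squares with the closed-form formulas. The function fit_line and the sample points are assumptions for this example.

```python
def fit_line(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    beta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2x
beta0, beta1 = fit_line(xs, ys)

print(beta0, beta1)
```

A slope of 2 means each one-unit increase in x raises the predicted y by 2.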

L22 Regularization

Q64 Easy L22

What is regularization?

A. Data normalization
B. Adding penalty to prevent overfitting
C. Feature selection only
D. Data cleaning

Solution

Answer: B
Regularization adds a penalty term to reduce overfitting by constraining coefficient magnitudes.

Q65 Easy L22

What does $L_2$ regularization penalize?

A. Residuals
B. Number of features
C. Absolute values $|\beta|$
D. Sum of squared coefficients $\sum \beta_j^2$

Solution

Answer: D
$L_2$ (Ridge) penalty is $\lambda \sum_{j=1}^{p} \beta_j^2$.

Q66 Easy L22

What does $L_1$ regularization penalize?

A. Sum of absolute coefficients $\sum |\beta_j|$
B. Variance
C. Feature count
D. Squared values

Solution

Answer: A
$L_1$ (Lasso) penalty is $\lambda \sum_{j=1}^{p} |\beta_j|$.

L23 Regression Metrics

Q67 Easy L23

What is MSE?

A. Maximum Squared Error
B. Mean Squared Error
C. Minimum Square Estimate
D. Mean Standard Error

Solution

Answer: B
$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, the mean of squared residuals.

Q68 Easy L23

What is RMSE?

A. Ratio Mean Squared Error
B. Random Mean Square
C. Root Mean Squared Error
D. Relative Mean Standard Error

Solution

Answer: C
$RMSE = \sqrt{MSE}$, has same units as target variable $y$.

Q69 Easy L23

What is MAE?

A. Mean Absolute Error
B. Mean Average Error
C. Median Absolute Error
D. Maximum Absolute Error

Solution

Answer: A
$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$, mean of absolute residuals.
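The three metrics can be computed directly from their definitions. The values of y_true and y_pred below are made up for illustration.

```python
import math

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

residuals = [t - p for t, p in zip(y_true, y_pred)]

mse = sum(r ** 2 for r in residuals) / len(residuals)  # mean of squared residuals
rmse = math.sqrt(mse)                                  # same units as y
mae = sum(abs(r) for r in residuals) / len(residuals)  # mean of absolute residuals

print(mse, rmse, mae)
```

Because residuals are squared, MSE and RMSE weight the single large error more heavily than MAE does.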

L24 Factor Models

Q70 Easy L24

What is a factor model?

A. Division model
B. Model explaining returns via common factors
C. Single variable model
D. Multiplication model

Solution

Answer: B
Factor models attribute returns to systematic factors.

Q71 Easy L24

What is the CAPM single factor?

A. Momentum
B. Value
C. Size
D. Market excess return

Solution

Answer: D
CAPM uses market return minus risk-free rate.

Q72 Easy L24

What does beta measure in CAPM?

A. Sensitivity to market
B. Return
C. Volatility
D. Alpha

Solution

Answer: A
$\beta$ is systematic risk exposure.

L25 Logistic Regression

Q73 Easy L25

What does logistic regression predict?

A. Clusters
B. Probabilities $P(y=1|x)$ for classification
C. Rankings
D. Continuous values

Solution

Answer: B
Logistic regression predicts class probabilities $P(y=1|x)$.

Q74 Easy L25

What is the sigmoid function?

A. Exponential function
B. Step function
C. Linear function
D. S-shaped function mapping to $(0,1)$

Solution

Answer: D
Sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$.

Q75 Easy L25

What is the range of sigmoid output?

A. $(0, 1)$
B. $[0, \infty)$
C. $(-\infty, \infty)$
D. $[-1, 1]$

Solution

Answer: A
Sigmoid outputs probabilities in the range $(0, 1)$.
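A sketch of the sigmoid function in plain Python. The two-branch form is an implementation choice made here to avoid overflow for large negative inputs; it computes the same function as the formula above.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # safe: z is negative, so exp(z) < 1
    return exp_z / (1.0 + exp_z)


for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```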

L26 Decision Trees

Q76 Easy L26

What is a decision tree?

A. Neural network
B. Tree of if-then rules
C. Clustering algorithm
D. Linear model

Solution

Answer: B
Decision trees split data using hierarchical rules.

Q77 Easy L26

What is a leaf node?

A. Branch
B. Split point
C. Root
D. Terminal node with prediction

Solution

Answer: D
Leaf nodes contain final predictions.

Q78 Easy L26

What is Gini impurity?

A. Probability of misclassification
B. Entropy
C. Information gain
D. Purity measure

Solution

Answer: A
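Gini impurity, $\text{Gini} = 1 - \sum_i p_i^2$, is the probability of misclassifying a randomly drawn sample if it were labeled at random according to the node's class proportions $p_i$.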
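A small sketch of the Gini formula applied to class counts in a node. The function name gini_impurity is chosen for this example.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_i^2) from a list of per-class counts."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


pure_node = gini_impurity([10, 0])          # only one class present
balanced_node = gini_impurity([5, 5])       # two classes, evenly mixed
four_classes = gini_impurity([1, 1, 1, 1])  # four classes, evenly mixed

print(pure_node, balanced_node, four_classes)
```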

L27 Classification Metrics

Q79 Easy L27

What is accuracy?

A. Precision
B. Correct predictions / total predictions
C. Recall
D. True positives only

Solution

Answer: B
$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$

Q80 Easy L27

When is accuracy misleading?

A. Never misleading
B. With balanced classes
C. Always reliable
D. With imbalanced classes

Solution

Answer: D
With imbalanced classes, a model that always predicts the majority class can still score high accuracy.

Q81 Easy L27

What is precision?

A. $\text{TP} / (\text{TP} + \text{FP})$
B. Accuracy
C. $\text{TN} / (\text{TN} + \text{FP})$
D. $\text{TP} / (\text{TP} + \text{FN})$

Solution

Answer: A
$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$
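The sketch below computes accuracy and precision from confusion-matrix counts and compares a model with an always-negative baseline on imbalanced data. All counts are invented for illustration.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


# 100 samples: 95 negative, 5 positive.
model = {"tp": 3, "tn": 94, "fp": 1, "fn": 2}
baseline = {"tp": 0, "tn": 95, "fp": 0, "fn": 5}  # always predicts negative

model_accuracy = accuracy(**model)
model_precision = precision(model["tp"], model["fp"])
baseline_accuracy = accuracy(**baseline)
# Baseline precision is undefined (0 / 0): it never predicts the positive class.

print(model_accuracy, model_precision, baseline_accuracy)
```

The baseline reaches 95% accuracy without finding a single positive, which is why accuracy alone misleads on imbalanced classes.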

L28 Class Imbalance

Q82 Easy L28

What is class imbalance?

A. Missing classes
B. Unequal distribution of classes
C. Too many classes
D. Equal class sizes

Solution

Answer: B
Class imbalance occurs when one class is much more frequent than the others.

Q83 Easy L28

Why is imbalance problematic?

A. Better accuracy
B. More data
C. Models train faster
D. Model biased toward majority class

Solution

Answer: D
Models can achieve high accuracy by ignoring minority.

Q84 Easy L28

What is oversampling?

A. Replicating or synthesizing minority samples
B. Sampling less
C. Collecting more data
D. Removing majority samples

Solution

Answer: A
Oversampling increases minority class representation.

L29 KMeans Clustering

Q85 Easy L29

What type of learning is clustering?

A. Reinforcement
B. Unsupervised
C. Semi-supervised
D. Supervised

Solution

Answer: B
Clustering has no target labels.

Q86 Easy L29

What does K-Means minimize?

A. Variance
B. Between-cluster distance
C. Number of clusters
D. Within-cluster sum of squared distances

Solution

Answer: D
$k$-Means minimizes inertia $\sum_{i} \sum_{x \in C_i} \|x - \mu_i\|^2$.

Q87 Easy L29

What is a centroid?

A. Cluster center (mean)
B. Boundary
C. Outlier
D. Edge point

Solution

Answer: A
Centroid is the mean of cluster points.
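A minimal sketch that computes centroids and the within-cluster sum of squared distances (inertia) for a fixed assignment of points to clusters. The helper names and the points are assumptions for this example; no clustering library is used.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def inertia(clusters):
    """Sum over clusters of squared distances from points to their centroid."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        total += sum(
            sum((a - b) ** 2 for a, b in zip(point, mu)) for point in points
        )
    return total


clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]

first_centroid = centroid(clusters[0])
total_inertia = inertia(clusters)

print(first_centroid, total_inertia)
```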

L30 Hierarchical Clustering

Q88 Easy L30

What is hierarchical clustering?

A. K-Means variant
B. Tree-based nested clusters
C. Single level clustering
D. Flat partitioning

Solution

Answer: B
Hierarchical clustering builds a nested hierarchy of clusters.

Q89 Easy L30

What is a dendrogram?

A. Distance matrix
B. Scatter plot
C. Cluster label
D. Tree diagram showing cluster merges

Solution

Answer: D
Dendrogram visualizes hierarchical structure.

Q90 Easy L30

What is agglomerative clustering?

A. Bottom-up merging
B. Random clustering
C. K-Means
D. Top-down splitting

Solution

Answer: A
Agglomerative clustering starts with each point as its own cluster and repeatedly merges the closest clusters.

L31 PCA

Q91 Easy L31

What does PCA stand for?

A. Partial Correlation Analysis
B. Principal Component Analysis
C. Predictive Clustering Algorithm
D. Primary Component Analysis

Solution

Answer: B
Principal Component Analysis.

Q92 Easy L31

What is the goal of PCA?

A. Regression
B. Clustering
C. Classification
D. Dimensionality reduction preserving variance

Solution

Answer: D
PCA reduces the number of dimensions while retaining as much variance as possible.

Q93 Easy L31

What are principal components?

A. Orthogonal directions of maximum variance
B. Random directions
C. Cluster centers
D. Original features

Solution

Answer: A
PCs are uncorrelated directions capturing variance.

L32 ML Pipeline

Q94 Easy L32

What is an sklearn Pipeline?

A. Visualization tool
B. Sequence of transforms + estimator
C. Database connection
D. Data flow

Solution

Answer: B
Pipeline chains preprocessing and model.

Q95 Easy L32

Why use pipelines?

A. Optional convenience
B. More verbose
C. Slower processing
D. Prevents data leakage, cleaner code

Solution

Answer: D
Pipelines ensure proper fit/transform separation.

Q96 Easy L32

What is data leakage?

A. Test data influencing training
B. Security breach
C. Data loss
D. Memory leak

Solution

Answer: A
Leakage lets information from the test set influence training, which inflates evaluation scores.

L33 Perceptron

Q97 Easy L33

What is a perceptron?

A. Clustering algorithm
B. Single artificial neuron
C. Optimization method
D. Deep network

Solution

Answer: B
Perceptron is simplest neural network unit.

Q98 Easy L33

What was the perceptron inspired by?

A. Economics
B. Statistics
C. Computers
D. Biological neurons

Solution

Answer: D
Modeled after biological neural processing.

Q99 Hard L33

What does a perceptron compute?

A. Weighted sum + activation
B. Correlation
C. Distance
D. Mean

Solution

Answer: A
Output = activation(weighted_sum + bias).
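A sketch of a single perceptron with a step activation. The weights and bias are hand-picked here (not learned) so that the unit behaves like a logical AND gate.

```python
def perceptron(inputs, weights, bias):
    """Step activation applied to the weighted sum plus bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias > 0 else 0


weights = [1.0, 1.0]
bias = -1.5  # only fires when both inputs are 1

outputs = {
    pair: perceptron(pair, weights, bias)
    for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]
}

print(outputs)
```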

L34 MLP Activations

Q100 Easy L34

What is an MLP?

A. Maximum Likelihood Predictor
B. Multi-Layer Perceptron
C. Multiple Linear Predictor
D. Machine Learning Program

Solution

Answer: B
An MLP is a feedforward neural network with one or more hidden layers.

Q101 Easy L34

What are hidden layers?

A. Missing layers
B. Input layers
C. Output layers
D. Layers between input and output

Solution

Answer: D
Hidden layers are internal processing layers.

Q102 Easy L34

Why do we need hidden layers?

A. Learn non-linear patterns
B. For linear problems
C. Reduce computation
D. Simplicity

Solution

Answer: A
Hidden layers with non-linear activations enable non-linear decision boundaries.

L35 Backpropagation

Q103 Easy L35

What is the forward pass?

A. Weight update
B. Computing output from input
C. Error propagation
D. Gradient calculation

Solution

Answer: B
Forward pass computes predictions.

Q104 Easy L35

What is backpropagation?

A. Data preprocessing
B. Random update
C. Forward computation
D. Backward gradient propagation

Solution

Answer: D
Backprop calculates gradients via chain rule.

Q105 Easy L35

What mathematical concept underlies backpropagation?

A. Chain rule of calculus
B. Multiplication
C. Addition
D. Integration

Solution

Answer: A
Chain rule propagates gradients through layers.

L36 Overfitting Prevention

Q106 Easy L36

What is overfitting?

A. Perfect generalization
B. Model memorizes training data
C. Underfitting
D. Model too simple

Solution

Answer: B
Overfitting: low training error, high test error.

Q107 Easy L36

What is underfitting?

A. Overfitting
B. Perfect fit
C. Too complex model
D. Model too simple to capture patterns

Solution

Answer: D
Underfitting: high error on both train and test.

Q108 Easy L36

What is dropout?

A. Randomly zeroing neurons during training
B. Data removal
C. Dropping layers
D. Removing features

Solution

Answer: A
Dropout randomly deactivates neurons.

L37 Text Preprocessing

Q109 Easy L37

What is tokenization?

A. Translation
B. Splitting text into tokens (words/subwords)
C. Compression
D. Encryption

Solution

Answer: B
Tokenization breaks text into units.

Q110 Easy L37

What is lowercasing for?

A. Speed
B. Encryption
C. Aesthetics
D. Reducing vocabulary size

Solution

Answer: D
Lowercasing treats 'The' and 'the' as the same token, which shrinks the vocabulary.

Q111 Easy L37

What are stop words?

A. Common words with little meaning (the, is, a)
B. Technical terms
C. Rare words
D. Important words

Solution

Answer: A
Stop words are often filtered out during preprocessing.

L38 BOW TFIDF

Q112 Easy L38

What is Bag of Words (BOW)?

A. Grammar model
B. Document as word frequency vector
C. Translation model
D. Word order model

Solution

Answer: B
BOW ignores order, counts word occurrences.

Q113 Easy L38

What does BOW ignore?

A. Documents
B. Frequency
C. Words
D. Word order

Solution

Answer: D
BOW treats a document as an unordered bag (multiset) of words: counts are kept, order is discarded.
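A minimal bag-of-words sketch using collections.Counter from the standard library. Whitespace splitting stands in for a real tokenizer, and the sentences are invented.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token."""
    return Counter(text.lower().split())


doc_a = bag_of_words("The cat chased the dog")
doc_b = bag_of_words("The dog chased the cat")

print(doc_a)
print(doc_a == doc_b)
```

The two sentences mean different things but produce identical counts, which is exactly the word-order information BOW discards.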

Q114 Easy L38

What does CountVectorizer produce?

A. Document-term matrix with counts
B. Word2vec
C. TF-IDF
D. Word embeddings

Solution

Answer: A
CountVectorizer creates a document-term matrix of token counts: one row per document, one column per vocabulary term.

L39 Word Embeddings

Q115 Easy L39

What are word embeddings?

A. Word counts
B. Dense vector representations of words
C. TF-IDF
D. One-hot vectors

Solution

Answer: B
Embeddings map words to dense vectors.

Q116 Easy L39

What is the key property of word embeddings?

A. High dimensional
B. Random
C. Sparse
D. Similar words have similar vectors

Solution

Answer: D
Semantic similarity = vector similarity.

Q117 Easy L39

What is Word2Vec?

A. Neural network for learning embeddings
B. Document encoder
C. TF-IDF variant
D. Word counter

Solution

Answer: A
Word2Vec learns embeddings from context.

L40 Sentiment Analysis

Q118 Easy L40

What is sentiment analysis?

A. Translation
B. Determining emotional tone of text
C. Summarization
D. Grammar checking

Solution

Answer: B
Sentiment analysis classifies positive/negative/neutral.

Q119 Easy L40

What are common sentiment categories?

A. Languages
B. Colors
C. Numbers
D. Positive, negative, neutral

Solution

Answer: D
Basic sentiment: positive, negative, neutral.

Q120 Easy L40

What is lexicon-based sentiment analysis?

A. Using word sentiment dictionaries
B. Random classification
C. Deep learning
D. Machine learning

Solution

Answer: A
Lexicon: predefined word sentiment scores.

L41 Model Serialization

Q121 Easy L41

What is model serialization?

A. Model evaluation
B. Saving model to file
C. Data processing
D. Model training

Solution

Answer: B
Serialization converts model to saveable format.

Q122 Easy L41

Why serialize models?

A. More data
B. Better accuracy
C. Faster training
D. Reuse without retraining

Solution

Answer: D
Save trained models for later use.

Q123 Easy L41

What is pickle?

A. Python object serialization module
B. Data format
C. Model type
D. Vegetable

Solution

Answer: A
pickle serializes Python objects.
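A hedged sketch of a pickle round trip. A plain dictionary stands in for a trained model here; the keys and values are illustrative.

```python
import pickle

# Stand-in for a trained model object.
model = {"name": "toy_linear_model", "coefficients": [1.0, 2.0], "intercept": 0.5}

payload = pickle.dumps(model)     # serialize to bytes
restored = pickle.loads(payload)  # rebuild an equal, independent object

print(type(payload).__name__, restored == model)
```

Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.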

L42 FastAPI

Q124 Easy L42

What is FastAPI?

A. ML library
B. Modern Python web framework for APIs
C. Visualization
D. Database

Solution

Answer: B
FastAPI builds fast, modern APIs.

Q125 Easy L42

What makes FastAPI 'fast'?

A. Manual coding
B. No features
C. Slow actually
D. Async support, Starlette, Pydantic

Solution

Answer: D
FastAPI is built on Starlette (async web layer) and Pydantic (data validation).

Q126 Easy L42

What is a REST API?

A. Representational State Transfer interface
B. Random API
C. Real-time API
D. Sleep API

Solution

Answer: A
REST: HTTP-based stateless interface.

L43 Streamlit Dashboards

Q127 Easy L43

What is Streamlit?

A. ML library
B. Python framework for data apps
C. API framework
D. Database

Solution

Answer: B
Streamlit creates interactive data apps.

Q128 Easy L43

What is the main advantage of Streamlit?

A. Slow
B. Requires HTML/CSS
C. Complex setup
D. Simple Python scripts become apps

Solution

Answer: D
Pure Python, no frontend knowledge needed.

Q129 Easy L43

How do you run a Streamlit app?

A. streamlit run app.py
B. uvicorn app
C. flask run
D. python app.py

Solution

Answer: A
Run the app from a terminal with the command streamlit run app.py.

L44 Cloud Deployment

Q130 Easy L44

What is cloud deployment?

A. Desktop app
B. Hosting on remote servers
C. USB drive
D. Local installation

Solution

Answer: B
Cloud runs apps on remote infrastructure.

Q131 Easy L44

What is Docker?

A. Database
B. Programming language
C. Cloud provider
D. Containerization platform

Solution

Answer: D
Docker packages apps in containers.

Q132 Easy L44

What is a Docker container?

A. Lightweight isolated environment
B. Database
C. Physical server
D. Virtual machine

Solution

Answer: A
Containers share the host OS kernel but isolate each application's processes and filesystem.

L45 Project Work 1

Q133 Easy L45

What is the first step in a data science project?

A. Deployment
B. Problem definition
C. Visualization
D. Model training

Solution

Answer: B
Start with clear problem statement.

Q134 Easy L45

What is EDA?

A. Efficient Data Algorithm
B. External Data Access
C. Error Detection Algorithm
D. Exploratory Data Analysis

Solution

Answer: D
EDA explores and understands data.

Q135 Easy L45

What does EDA typically include?

A. Summary stats, visualizations, distributions
B. Production code
C. Deployment
D. Model training

Solution

Answer: A
EDA reveals data patterns and issues.

L46 Project Work 2

Q136 Easy L46

What is model selection?

A. Using all models
B. Choosing best model for task
C. Ignoring models
D. Random choice

Solution

Answer: B
Select model based on performance and requirements.

Q137 Easy L46

What is hyperparameter tuning?

A. Data cleaning
B. Feature selection
C. Training
D. Optimizing model configuration

Solution

Answer: D
Tuning finds optimal hyperparameters.

Q138 Easy L46

What is ensemble learning?

A. Combining multiple models
B. Feature engineering
C. Model deletion
D. Single model

Solution

Answer: A
Ensembles aggregate predictions.

L47 ML Ethics

Q139 Easy L47

What is algorithmic bias?

A. User bias
B. Systematic unfairness in model outputs
C. No bias
D. Algorithm preference

Solution

Answer: B
Bias leads to unfair treatment of groups.

Q140 Easy L47

Where does bias in ML come from?

A. Hardware
B. Only algorithms
C. Perfect data
D. Training data, features, labels

Solution

Answer: D
Bias from biased data and design choices.

Q141 Easy L47

What is fairness in ML?

A. Equitable treatment across groups
B. Complex models
C. Fast predictions
D. High accuracy

Solution

Answer: A
Fairness: equal treatment regardless of group.

L48 Final Presentations

Q142 Easy L48

What is the purpose of the final presentation?

A. Data entry
B. Communicate project results
C. Model training
D. Code review

Solution

Answer: B
Present findings to stakeholders.

Q143 Easy L48

Who is the audience for data science presentations?

A. No audience
B. Only executives
C. Only technical
D. Mixed technical and business

Solution

Answer: D
Adapt to mixed audiences.

Q144 Easy L48

What should an executive summary include?

A. Key findings and recommendations
B. Raw data
C. Technical details only
D. All code

Solution

Answer: A
Executive summary: high-level takeaways.