(SEM V) THEORY EXAMINATION 2023-24 STATISTICAL COMPUTING
SECTION A – Short Answers (2 × 10 = 20 Marks)
(a) Significance of Measures of Dispersion
Measures of dispersion describe how data values are spread around the mean.
Help identify variability, reliability, and consistency of data.
Examples:
Range = Max − Min → simple but affected by outliers.
Standard Deviation (SD) → average distance from mean; best for normally distributed data.
Difference: the SD uses every observation, while the range depends only on the two extreme values, so the SD is less distorted by a single outlier.
(b) Concept of Mean
The mean (average) is the sum of all values divided by the number of observations:
X̄ = ΣXᵢ / n
Advantages: Simple, widely used, uses all data points.
Limitations: Affected by extreme values (outliers).
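The outlier sensitivity can be seen in a short Python sketch (hypothetical values; the median is shown for contrast):

```python
from statistics import mean, median

values = [10, 12, 11, 13, 12]        # hypothetical observations
with_outlier = values + [100]        # one extreme value added

m1, m2 = mean(values), mean(with_outlier)          # 11.6 vs ~26.3
med1, med2 = median(values), median(with_outlier)  # both 12
```

A single extreme value drags the mean far from the bulk of the data, while the median barely moves.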
(c) Correlation and Its Significance
Correlation measures the degree of relationship between two variables (X & Y).
Positive correlation: both increase together (e.g., height & weight).
Negative correlation: one increases while the other decreases (e.g., speed & travel time).
Significance: Helps in prediction and understanding relationships.
(d) Inference Procedure for Correlation Coefficient
Steps:
State H₀: ρ = 0 (no correlation) and compute the sample correlation r.
Calculate the test statistic: t = r√(n−2) / √(1−r²), which follows a t-distribution with n − 2 degrees of freedom.
Compare with the t critical value → accept/reject H₀.
Importance: Ensures correlation isn’t due to chance.
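The test statistic can be computed directly; a minimal Python sketch with hypothetical values r = 0.8, n = 12:

```python
import math

def corr_t_test(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = 0.8 from n = 12 observations
t = corr_t_test(0.8, 12)
# Compare |t| against the t critical value with n - 2 = 10 degrees of freedom.
```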
(e) Bivariate vs Simple Correlation
Simple correlation: between exactly two variables (X, Y).
Bivariate correlation: also refers to the analysis of two variables, so the two terms are largely interchangeable; when more than two variables are analysed simultaneously, it is called multiple correlation.
Example: height–weight (simple/bivariate) vs height–weight–age (multiple correlation).
(f) Linear Regression
Regression estimates the relationship between dependent (Y) and independent (X) variable:
Y = a + bX
where
b = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²],
a = Ȳ − bX̄.
Slope (b) shows rate of change in Y per unit X.
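The slope and intercept formulas can be sketched in Python (illustrative data, not from the paper):

```python
def regression_coeffs(X, Y):
    """Least-squares a and b for Y = a + bX, using the sums formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n          # a = Ybar - b * Xbar
    return a, b

# Illustrative data lying exactly on Y = 2X
a, b = regression_coeffs([1, 2, 3], [2, 4, 6])
```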
(g) Simple vs Multiple Regression
| Feature | Simple | Multiple |
|---|---|---|
| Variables | 1 dependent, 1 independent | 1 dependent, ≥2 independent |
| Equation | Y = a + bX | Y = a + b₁X₁ + b₂X₂ + … |
| Use | Simple relations | Multivariate impact |
Reason to use multiple regression: to study the influence of several predictors simultaneously.
(h) Correlation Coefficient (X: 10,15,20,25; Y: 60,75,80,90)
r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
After calculation:
r ≈ 0.98 → strong positive correlation.
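Evaluating the formula with the given data in a short Python sketch (the computed value rounds to r ≈ 0.98):

```python
import math

def pearson_r(X, Y):
    """Pearson correlation via the raw-sums formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)
    syy = sum(y * y for y in Y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

r = pearson_r([10, 15, 20, 25], [60, 75, 80, 90])
```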
(i) Regression Line (Y on X)
Given X = [2, 4, 6, 8], Y = [5, 8, 11, 14]:
Slope: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 30/20 = 1.5
Intercept: a = Ȳ − bX̄ = 9.5 − 1.5 × 5 = 2
Equation: Y = 2 + 1.5X
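The same calculation in a short Python sketch of the deviation-form formulas:

```python
X = [2, 4, 6, 8]
Y = [5, 8, 11, 14]
xbar = sum(X) / len(X)   # 5.0
ybar = sum(Y) / len(Y)   # 9.5

# Slope from deviations, intercept from the means
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar
# b = 1.5, a = 2.0, so Y = 2 + 1.5X
```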
(j) Probability of Queen given Face Card
Face cards = 12 (J, Q, K of 4 suits), of which 4 are queens.
P(Q|F) = 4/12 = 1/3
SECTION B – Descriptive Questions (Any 3 × 10 = 30 Marks)
(a) Singular Value Decomposition (SVD)
Decomposes a matrix B as B = UΣVᵀ
Σ (Sigma): diagonal matrix of singular values. U, V: orthogonal matrices.
Significance:
Helps in dimensionality reduction, noise removal, and data compression.
Retaining top-k singular values ≈ low-rank approximation.
(b) Multiple Regression Analysis
Model: Y = β₀ + β₁X₁ + β₂X₂ + ε
Interpretation: βᵢ is the effect of Xᵢ on Y, holding the other predictors constant.
Goodness of fit is measured by R².
Example: predicting house price using area & location.
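A minimal sketch of two-predictor OLS via the normal equations (hypothetical data generated exactly from Y = 1 + 2X₁ + 3X₂; a real house-price data set would work the same way):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(X1, X2, Y):
    """OLS estimates for Y = b0 + b1*X1 + b2*X2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(X1, X2)]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(3)]
    return solve(XtX, Xty)

# Hypothetical data lying exactly on Y = 1 + 2*X1 + 3*X2
b0, b1, b2 = multiple_regression([0, 1, 0, 1, 2], [0, 0, 1, 1, 1], [1, 3, 4, 6, 8])
```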
(c) Randomization Test (Two Teaching Methods)
Steps:
Combine all scores from both groups. Randomly reassign into new groups (15 & 20 students).
Calculate mean difference repeatedly (e.g., 10,000 trials).
Compare observed difference vs simulated distribution.
→ If p < 0.05, the teaching methods differ significantly.
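The steps above can be sketched in Python; the scores below are hypothetical:

```python
import random
random.seed(42)  # reproducible shuffles

# Hypothetical scores for the two teaching methods (15 and 20 students)
group_a = [72, 75, 68, 80, 77, 74, 69, 81, 76, 73, 78, 70, 79, 71, 75]
group_b = [65, 70, 66, 72, 68, 64, 71, 67, 69, 63,
           70, 66, 68, 65, 72, 64, 67, 69, 66, 68]

obs_diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
trials, count = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)                 # random reassignment of all scores
    new_a, new_b = pooled[:15], pooled[15:]
    diff = sum(new_a) / 15 - sum(new_b) / 20
    if abs(diff) >= abs(obs_diff):         # as extreme as observed
        count += 1
p_value = count / trials
# Reject H0 (no difference between methods) if p_value < 0.05
```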
(d) Principal Component Analysis (PCA)
Standardize data. Compute covariance matrix. Find eigenvalues & eigenvectors.
Principal components = eigenvectors with largest eigenvalues.
Use: Reduces variables → keeps most variance.
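For two variables the eigen-decomposition has a closed form, so the steps above can be sketched without a linear-algebra library (illustrative data):

```python
import math

def pca_2d(data):
    """First principal component of 2-variable data via the 2x2 covariance matrix."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # Sample covariance matrix entries
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form
    mid = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + root, mid - root
    # Eigenvector for the largest eigenvalue (assumes sxy != 0)
    v = (sxy, lam1 - sxx)
    norm = math.hypot(*v)
    return lam1, lam2, (v[0] / norm, v[1] / norm)

lam1, lam2, pc1 = pca_2d([(2, 0), (0, 2), (3, 1), (1, 3), (4, 2), (2, 4)])
share = lam1 / (lam1 + lam2)   # proportion of variance kept by PC1
```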
(e) 95% Confidence Interval
Given:
n = 50, X̄ = 15, s = 3
CI = X̄ ± z(α/2) × s/√n = 15 ± 1.96 × 3/7.07 = 15 ± 0.83 ⇒ (14.17, 15.83)
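A quick check of the interval in Python:

```python
import math

n, xbar, s = 50, 15, 3
half = 1.96 * s / math.sqrt(n)     # z(alpha/2) = 1.96 for 95% confidence
ci = (xbar - half, xbar + half)    # approximately (14.17, 15.83)
```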
SECTION C – Long Questions (Any 1 from each)
3(a) Monte Carlo Simulation
Generate 1000 random samples (n=30, σ=10) under H₀: μ=50.
Compute the sample mean each time; the proportion of simulated means at least as extreme as the observed mean gives the estimated p-value.
Reject H₀ if p < 0.05.
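A minimal sketch of the simulation, assuming a hypothetical observed sample mean of 53.2:

```python
import random
random.seed(1)  # reproducible

mu0, sigma, n, trials = 50, 10, 30, 1000
observed_mean = 53.2   # hypothetical observed sample mean

count = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]  # data under H0
    m = sum(sample) / n
    if abs(m - mu0) >= abs(observed_mean - mu0):           # as extreme as observed
        count += 1
p_value = count / trials
# Reject H0: mu = 50 if p_value < 0.05
```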
3(b) Markov Chains in MCMC
Markov Chain: Process where next state depends only on current state.
MCMC (e.g., Metropolis-Hastings, Gibbs sampling): draws samples from complex distributions.
Convergence: chain must reach stationary distribution for valid inference.
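A minimal Metropolis sketch targeting the standard normal (symmetric random-walk proposal; the burn-in length here is an arbitrary illustrative choice):

```python
import math
import random
random.seed(0)

def metropolis_normal(n_samples, proposal_sd=1.0):
    """Metropolis sampler targeting the standard normal density."""
    x = 0.0
    out = []
    for _ in range(n_samples):
        cand = x + random.gauss(0, proposal_sd)      # symmetric proposal
        # Acceptance probability min(1, pi(cand)/pi(x)) for pi ~ exp(-x^2/2)
        if random.random() < math.exp((x * x - cand * cand) / 2):
            x = cand
        out.append(x)
    return out

chain = metropolis_normal(20_000)
burned = chain[5_000:]   # discard burn-in so the chain is near stationarity
mean = sum(burned) / len(burned)
var = sum((v - mean) ** 2 for v in burned) / len(burned)
```

After burn-in, the sample mean and variance should be close to the target's 0 and 1, a simple convergence sanity check.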
4(a) Monte Carlo Hypothesis Testing
Simulate sampling distribution of test statistic under H₀ to estimate p-value.
Advantages:
Works with non-normal or small samples. No strict parametric assumptions.
4(b) Jackknife Resampling
Systematically leave one observation out at a time, compute the statistic θ̂₍₋ᵢ₎ on each reduced sample, and combine:
θ̂_jack = n·θ̂ − (n − 1)·θ̄(·)
where θ̂ is the full-sample estimate and θ̄(·) is the average of the leave-one-out estimates θ̂₍₋ᵢ₎.
Provides estimates for bias and variance.
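The jackknife can be sketched generically; applied to the divide-by-n (plug-in) variance, the bias correction recovers the unbiased divide-by-(n−1) variance exactly:

```python
def jackknife(data, stat):
    """Jackknife bias-corrected estimate and bias estimate for a statistic."""
    n = len(data)
    theta_full = stat(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    bias = (n - 1) * (theta_bar - theta_full)
    theta_jack = n * theta_full - (n - 1) * theta_bar
    return theta_jack, bias

def var_n(xs):
    """Biased (divide-by-n) plug-in variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative sample
theta_jack, bias = jackknife(data, var_n)
# theta_jack equals the unbiased sample variance sum((x - xbar)^2) / (n - 1)
```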
5(a) Permutation Test
Randomly shuffle group labels (A, B) and recompute mean difference repeatedly.
Count how often simulated |Δ| ≥ observed 2.5 → gives p-value.
6(a) 5-Fold Cross Validation
Mean Squared Errors = 12, 15, 10, 18, 14
MSE_avg = (12 + 15 + 10 + 18 + 14) / 5 = 13.8
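The cross-validation estimate is just the average of the per-fold errors:

```python
mse_folds = [12, 15, 10, 18, 14]            # MSE from each of the 5 folds
cv_mse = sum(mse_folds) / len(mse_folds)    # 13.8
```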
6(b) History of R Language
Created by Ross Ihaka and Robert Gentleman (1993). Inspired by S language (Bell Labs).
1997: R Core Team formed. 2000: R 1.0.0 released.
Now maintained by R Foundation, used for data science, stats, and ML.
7(a) R Workspace Commands
```r
# Save vector
x <- c(2, 4, 6, 8, 10)
save(x, file = "mydata.RData")

# List variables in workspace
ls()

# Load and display values
load("mydata.RData")
print(x)
```
7(b) Vector and Matrix Creation in R
```r
# Define vector (six elements, so it fills a 3x2 matrix exactly)
v <- c(1, 2, 3, 4, 5, 6)

# Create 3x2 matrix from vector (filled column by column by default)
m <- matrix(v, nrow = 3, ncol = 2)

# Display both
print(v)
print(m)
```