(SEM V) THEORY EXAMINATION 2023-24 STATISTICAL COMPUTING
SECTION A – Short Answers (2 × 10 = 20 Marks)
(a) Significance of Measures of Dispersion
Measures of dispersion describe how data values are spread around the mean.
Help identify variability, reliability, and consistency of data.
Examples:
Range = Max − Min → simple but affected by outliers.
Standard Deviation (SD) → average distance from mean; best for normally distributed data.
Difference: the SD uses every observation, while the range depends only on the two extreme values, so the SD is less distorted by a single outlier.
(b) Concept of Mean
The mean (average) is the sum of all values divided by the number of observations:
X̄ = ΣXᵢ / n
Advantages: Simple, widely used, uses all data points.
Limitations: Affected by extreme values (outliers).
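The outlier sensitivity can be seen in a short Python sketch (hypothetical values; the median is shown for contrast):

```python
from statistics import mean, median

values = [10, 12, 11, 13, 12]        # hypothetical observations
with_outlier = values + [100]        # one extreme value added

m1, m2 = mean(values), mean(with_outlier)          # 11.6 vs ~26.3
med1, med2 = median(values), median(with_outlier)  # both 12
```

A single extreme value drags the mean far from the bulk of the data, while the median barely moves.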
(c) Correlation and Its Significance
Correlation measures the degree of relationship between two variables (X & Y).
Positive correlation: both increase together (e.g., height & weight).
Negative correlation: one increases while the other decreases (e.g., speed & travel time).
Significance: Helps in prediction and understanding relationships.
(d) Inference Procedure for Correlation Coefficient
Steps:
State H₀: ρ = 0 (no correlation) and compute the sample correlation r.
Calculate the test statistic: t = r√(n−2) / √(1−r²), which follows a t-distribution with n − 2 degrees of freedom.
Compare with the t critical value → accept/reject H₀.
Importance: Ensures correlation isn’t due to chance.
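The test statistic can be computed directly; a minimal Python sketch with hypothetical values r = 0.8, n = 12:

```python
import math

def corr_t_test(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = 0.8 from n = 12 observations
t = corr_t_test(0.8, 12)
# Compare |t| against the t critical value with n - 2 = 10 degrees of freedom.
```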
(e) Bivariate vs Simple Correlation
Simple correlation: between exactly two variables (X, Y).
Bivariate correlation: also refers to the analysis of two variables, so the two terms are largely interchangeable; when more than two variables are analysed simultaneously, it is called multiple correlation.
Example: height–weight (simple/bivariate) vs height–weight–age (multiple correlation).
(f) Linear Regression
Regression estimates the relationship between dependent (Y) and independent (X) variable:
Y = a + bX
where
b = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²],
a = Ȳ − bX̄.
Slope (b) shows rate of change in Y per unit X.
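The slope and intercept formulas can be sketched in Python (illustrative data, not from the paper):

```python
def regression_coeffs(X, Y):
    """Least-squares a and b for Y = a + bX, using the sums formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n          # a = Ybar - b * Xbar
    return a, b

# Illustrative data lying exactly on Y = 2X
a, b = regression_coeffs([1, 2, 3], [2, 4, 6])
```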
(g) Simple vs Multiple Regression
| Feature | Simple | Multiple |
|---|---|---|
| Variables | 1 dependent, 1 independent | 1 dependent, ≥2 independent |
| Equation | Y = a + bX | Y = a + b₁X₁ + b₂X₂ + … |
| Use | Simple relations | Multivariate impact |
Reason to use multiple regression: to study the influence of several predictors simultaneously.
(h) Correlation Coefficient (X: 10,15,20,25; Y: 60,75,80,90)
r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}
After calculation:
r ≈ 0.98 → strong positive correlation.
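Evaluating the formula with the given data in a short Python sketch (the computed value rounds to r ≈ 0.98):

```python
import math

def pearson_r(X, Y):
    """Pearson correlation via the raw-sums formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)
    syy = sum(y * y for y in Y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

r = pearson_r([10, 15, 20, 25], [60, 75, 80, 90])
```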
(i) Regression Line (Y on X)
Given X = [2, 4, 6, 8], Y = [5, 8, 11, 14]:
Slope: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 30/20 = 1.5
Intercept: a = Ȳ − bX̄ = 9.5 − 1.5 × 5 = 2
Equation: Y = 2 + 1.5X
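The same calculation in a short Python sketch of the deviation-form formulas:

```python
X = [2, 4, 6, 8]
Y = [5, 8, 11, 14]
xbar = sum(X) / len(X)   # 5.0
ybar = sum(Y) / len(Y)   # 9.5

# Slope from deviations, intercept from the means
b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
a = ybar - b * xbar
# b = 1.5, a = 2.0, so Y = 2 + 1.5X
```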
(j) Probability of Queen given Face Card
Face cards = 12 (J, Q, K of 4 suits), of which 4 are queens.
P(Q|F) = 4/12 = 1/3
SECTION B – Descriptive Questions (Any 3 × 10 = 30 Marks)
(a) Singular Value Decomposition (SVD)
Decomposes a matrix B as B = UΣVᵀ
Σ (Sigma): diagonal matrix of singular values. U, V: orthogonal matrices.
Significance:
Helps in dimensionality reduction, noise removal, and data compression.
Retaining top-k singular values ≈ low-rank approximation.
(b) Multiple Regression Analysis
Model: Y = β₀ + β₁X₁ + β₂X₂ + ε
Interpretation: βᵢ is the effect of Xᵢ on Y, holding the other predictors constant.
Goodness of fit is measured by R².
Example: predicting house price using area & location.
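A minimal sketch of two-predictor OLS via the normal equations (hypothetical data generated exactly from Y = 1 + 2X₁ + 3X₂; a real house-price data set would work the same way):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def multiple_regression(X1, X2, Y):
    """OLS estimates for Y = b0 + b1*X1 + b2*X2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(X1, X2)]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, Y)) for i in range(3)]
    return solve(XtX, Xty)

# Hypothetical data lying exactly on Y = 1 + 2*X1 + 3*X2
b0, b1, b2 = multiple_regression([0, 1, 0, 1, 2], [0, 0, 1, 1, 1], [1, 3, 4, 6, 8])
```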
(c) Randomization Test (Two Teaching Methods)
Steps:
Combine all scores from both groups. Randomly reassign into new groups (15 & 20 students).
Calculate mean difference repeatedly (e.g., 10,000 trials).
Compare observed difference vs simulated distribution.
→ If p < 0.05, the teaching methods differ significantly.
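The steps above can be sketched in Python; the scores below are hypothetical:

```python
import random
random.seed(42)  # reproducible shuffles

# Hypothetical scores for the two teaching methods (15 and 20 students)
group_a = [72, 75, 68, 80, 77, 74, 69, 81, 76, 73, 78, 70, 79, 71, 75]
group_b = [65, 70, 66, 72, 68, 64, 71, 67, 69, 63,
           70, 66, 68, 65, 72, 64, 67, 69, 66, 68]

obs_diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
trials, count = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)                 # random reassignment of all scores
    new_a, new_b = pooled[:15], pooled[15:]
    diff = sum(new_a) / 15 - sum(new_b) / 20
    if abs(diff) >= abs(obs_diff):         # as extreme as observed
        count += 1
p_value = count / trials
# Reject H0 (no difference between methods) if p_value < 0.05
```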
(d) Principal Component Analysis (PCA)
Standardize data. Compute covariance matrix. Find eigenvalues & eigenvectors.
Principal components = eigenvectors with largest eigenvalues.
Use: Reduces variables → keeps most variance.
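For two variables the eigen-decomposition has a closed form, so the steps above can be sketched without a linear-algebra library (illustrative data):

```python
import math

def pca_2d(data):
    """First principal component of 2-variable data via the 2x2 covariance matrix."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    # Sample covariance matrix entries
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] in closed form
    mid = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + root, mid - root
    # Eigenvector for the largest eigenvalue (assumes sxy != 0)
    v = (sxy, lam1 - sxx)
    norm = math.hypot(*v)
    return lam1, lam2, (v[0] / norm, v[1] / norm)

lam1, lam2, pc1 = pca_2d([(2, 0), (0, 2), (3, 1), (1, 3), (4, 2), (2, 4)])
share = lam1 / (lam1 + lam2)   # proportion of variance kept by PC1
```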
(e) 95% Confidence Interval
Given:
n = 50, X̄ = 15, s = 3
CI = X̄ ± z(α/2) × s/√n = 15 ± 1.96 × 3/7.07 = 15 ± 0.83 ⇒ (14.17, 15.83)
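A quick check of the interval in Python:

```python
import math

n, xbar, s = 50, 15, 3
half = 1.96 * s / math.sqrt(n)     # z(alpha/2) = 1.96 for 95% confidence
ci = (xbar - half, xbar + half)    # approximately (14.17, 15.83)
```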
SECTION C – Long Questions (Any 1 from each)
3(a) Monte Carlo Simulation
Generate 1000 random samples (n=30, σ=10) under H₀: μ=50.
Compute the sample mean each time; the proportion of simulated means at least as extreme as the observed mean gives the estimated p-value.
Reject H₀ if p < 0.05.
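A minimal sketch of the simulation, assuming a hypothetical observed sample mean of 53.2:

```python
import random
random.seed(1)  # reproducible

mu0, sigma, n, trials = 50, 10, 30, 1000
observed_mean = 53.2   # hypothetical observed sample mean

count = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]  # data under H0
    m = sum(sample) / n
    if abs(m - mu0) >= abs(observed_mean - mu0):           # as extreme as observed
        count += 1
p_value = count / trials
# Reject H0: mu = 50 if p_value < 0.05
```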
3(b) Markov Chains in MCMC
Markov Chain: Process where next state depends only on current state.
MCMC (e.g., Metropolis-Hastings, Gibbs sampling): draws samples from complex distributions.
Convergence: chain must reach stationary distribution for valid inference.
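A minimal Metropolis sketch targeting the standard normal (symmetric random-walk proposal; the burn-in length here is an arbitrary illustrative choice):

```python
import math
import random
random.seed(0)

def metropolis_normal(n_samples, proposal_sd=1.0):
    """Metropolis sampler targeting the standard normal density."""
    x = 0.0
    out = []
    for _ in range(n_samples):
        cand = x + random.gauss(0, proposal_sd)      # symmetric proposal
        # Acceptance probability min(1, pi(cand)/pi(x)) for pi ~ exp(-x^2/2)
        if random.random() < math.exp((x * x - cand * cand) / 2):
            x = cand
        out.append(x)
    return out

chain = metropolis_normal(20_000)
burned = chain[5_000:]   # discard burn-in so the chain is near stationarity
mean = sum(burned) / len(burned)
var = sum((v - mean) ** 2 for v in burned) / len(burned)
```

After burn-in, the sample mean and variance should be close to the target's 0 and 1, a simple convergence sanity check.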
4(a) Monte Carlo Hypothesis Testing
Simulate sampling distribution of test statistic under H₀ to estimate p-value.
Advantages:
Works with non-normal or small samples. No strict parametric assumptions.
4(b) Jackknife Resampling
Systematically leave one observation out at a time, compute the statistic θ̂₍₋ᵢ₎ on each reduced sample, and combine:
θ̂_jack = n·θ̂ − (n − 1)·θ̄(·)
where θ̂ is the full-sample estimate and θ̄(·) is the average of the leave-one-out estimates θ̂₍₋ᵢ₎.
Provides estimates for bias and variance.
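The jackknife can be sketched generically; applied to the divide-by-n (plug-in) variance, the bias correction recovers the unbiased divide-by-(n−1) variance exactly:

```python
def jackknife(data, stat):
    """Jackknife bias-corrected estimate and bias estimate for a statistic."""
    n = len(data)
    theta_full = stat(data)
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    bias = (n - 1) * (theta_bar - theta_full)
    theta_jack = n * theta_full - (n - 1) * theta_bar
    return theta_jack, bias

def var_n(xs):
    """Biased (divide-by-n) plug-in variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative sample
theta_jack, bias = jackknife(data, var_n)
# theta_jack equals the unbiased sample variance sum((x - xbar)^2) / (n - 1)
```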
5(a) Permutation Test
Randomly shuffle group labels (A, B) and recompute mean difference repeatedly.
Count how often simulated |Δ| ≥ observed 2.5 → gives p-value.
6(a) 5-Fold Cross Validation
Mean Squared Errors = 12, 15, 10, 18, 14
MSE_avg = (12 + 15 + 10 + 18 + 14) / 5 = 13.8
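The cross-validation estimate is just the average of the per-fold errors:

```python
mse_folds = [12, 15, 10, 18, 14]            # MSE from each of the 5 folds
cv_mse = sum(mse_folds) / len(mse_folds)    # 13.8
```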
6(b) History of R Language
Created by Ross Ihaka and Robert Gentleman (1993). Inspired by S language (Bell Labs).
1997: R Core Team formed. 2000: R 1.0.0 released.
Now maintained by R Foundation, used for data science, stats, and ML.
7(a) R Workspace Commands
```r
# Save vector
x <- c(2, 4, 6, 8, 10)
save(x, file = "mydata.RData")

# List variables in workspace
ls()

# Load and display values
load("mydata.RData")
print(x)
```
7(b) Vector and Matrix Creation in R
```r
# Define vector (six elements, so it fills a 3x2 matrix exactly)
v <- c(1, 2, 3, 4, 5, 6)

# Create 3x2 matrix from vector (filled column by column by default)
m <- matrix(v, nrow = 3, ncol = 2)

# Display both
print(v)
print(m)
```