Data Cleaning MCQs (25)
1) What is Data Cleaning?
A. Creating new data
B. Removing errors and improving data quality
C. Encrypting data
D. Uploading data to cloud
2) What are missing values also called?
A. Null/NA values
B. Unique values
C. Primary keys
D. Index values
3) Which is a common way to handle missing values?
A. Imputation (mean/median/mode)
B. Increase duplicates
C. Convert to images
D. Disable dataset
4) Removing duplicate rows helps to?
A. Reduce data quality
B. Improve accuracy
C. Increase noise
D. Change data type
5) Outliers are?
A. Typical values
B. Extreme/unusual values
C. Duplicate values
D. Null values
6) Which method helps detect outliers visually?
A. Box plot
B. Pie chart
C. Word document
D. Image filter
7) Standardization means?
A. Always convert to the 0-1 range
B. Rescale data to mean 0 and standard deviation 1
C. Replace text with numbers only
D. Remove columns
8) Normalization usually means?
A. Scaling to a fixed range like 0 to 1
B. Only removing duplicates
C. Only removing nulls
D. Only converting datatypes
9) Which is an example of data type conversion?
A. “25” (text) → 25 (number)
B. 25 → “apple”
C. Image → PDF only
D. Text → video
10) Trimming is used to remove?
A. Spaces before/after text
B. Numbers
C. Dates
D. Rows
11) Handling inconsistent categories means?
A. Making labels consistent (e.g., mapping “M” and “male” to “Male”)
B. Adding more random labels
C. Encrypting labels
D. Removing dataset
12) Data validation checks?
A. Correctness and allowed values
B. Only file size
C. Only colors
D. Only font style
13) Removing irrelevant columns helps to?
A. Reduce noise
B. Increase errors
C. Increase file size
D. Reduce accuracy
14) Handling wrong date formats is called?
A. Data formatting
B. Data encryption
C. Data backup
D. Data duplication
15) What is a common issue in text data?
A. Typos/spelling mistakes
B. CPU overheating
C. Internet speed
D. Screen resolution
16) Removing special characters is part of?
A. Text preprocessing
B. Cloud hosting
C. Networking
D. Hardware setup
17) What is data profiling?
A. Examining data to summarize its quality and statistics
B. Designing posters
C. Creating passwords
D. Installing software
18) Which is an example of a range check?
A. Age must be between 0 and 120
B. Age must be a color
C. Age must be a file
D. Age must be a photo
19) Which is an example of a consistency check?
A. City and state must match
B. Watching videos
C. Changing wallpapers
D. Playing games
20) Data deduplication means?
A. Removing duplicates
B. Adding duplicates
C. Encrypting duplicates
D. Hiding duplicates
21) Which is a common Excel feature for data cleaning?
A. Remove Duplicates
B. Paint
C. Notepad only
D. Camera
22) In Power BI, data cleaning is mostly done in?
A. Power Query
B. DAX only
C. Dashboard view only
D. Report export
23) In Python, data cleaning commonly uses?
A. Pandas
B. MS Paint
C. Calculator
D. Windows Media Player
24) Why do we clean data before analysis?
A. To improve accuracy and reliability
B. To reduce internet cost
C. To increase errors
D. To avoid visualization
25) Which is a best practice in data cleaning?
A. Keep original raw data backup
B. Delete raw data immediately
C. Never validate data
D. Ignore missing values always
Answer Key
1) B
2) A
3) A
4) B
5) B
6) A
7) B
8) A
9) A
10) A
11) A
12) A
13) A
14) A
15) A
16) A
17) A
18) A
19) A
20) A
21) A
22) A
23) A
24) A
25) A
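Worked Example (illustrative)
Several of the ideas tested above (trimming, type conversion, consistent categories, deduplication, median imputation, a range check, standardization, and normalization) can be sketched in a few lines. This is a minimal sketch, not a definitive pipeline. It uses plain standard-library Python rather than Pandas so that it runs without installing anything. The sample records, the GENDER_LABELS mapping, and all function names are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
# The raw list is never modified (best practice: keep the original data).
raw_rows = [
    {"name": "  Asha ", "gender": "Male", "age": "25"},
    {"name": "Ben", "gender": "M", "age": None},
    {"name": "Ben", "gender": "M", "age": None},  # exact duplicate
    {"name": "Chen ", "gender": "male", "age": "31"},
    {"name": "Dia", "gender": "F", "age": "40"},
]

# Assumed mapping from label variants to one consistent label.
GENDER_LABELS = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def clean_row(row):
    """Return a cleaned copy of one record; the input dict is not changed."""
    name = row["name"].strip()  # trimming: drop leading/trailing spaces
    gender = GENDER_LABELS.get(row["gender"].strip().lower(), "Unknown")
    age = int(row["age"]) if row["age"] is not None else None  # "25" -> 25
    return {"name": name, "gender": gender, "age": age}


def deduplicate(rows):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"], row["gender"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median_age(rows):
    """Fill missing ages with the median of the known ages."""
    known = [r["age"] for r in rows if r["age"] is not None]
    median_age = statistics.median(known)
    return [{**r, "age": median_age if r["age"] is None else r["age"]} for r in rows]


def in_range(value, low=0, high=120):
    """Range check: is the value inside the allowed interval?"""
    return low <= value <= high


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def normalize(values):
    """Min-max scaling to the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


cleaned = impute_median_age(deduplicate([clean_row(r) for r in raw_rows]))
ages = [r["age"] for r in cleaned]
all_ages_valid = all(in_range(a) for a in ages)
z_scores = standardize(ages)
scaled = normalize(ages)

print(cleaned)
print(all_ages_valid, scaled)
```

The order of steps is a design choice in this sketch: rows are cleaned before deduplication so that records differing only in stray spaces or label spelling are recognized as duplicates, and imputation runs last so the median is computed from distinct records only.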