
Which data deduplication method increases the chance of identifying duplicate data even when there is only a minor difference between two documents?

A. Variable-length segment
B. Single-instance
C. File level
D. Fixed-block

Explanation:

Data Deduplication Methods
File-level deduplication (also called single-instance storage) detects and removes redundant
copies of identical files. Only one copy of each file is stored; subsequent copies are replaced with
a pointer to the original file. File-level deduplication is simple and fast, but it does not address
duplicate content inside files. For example, two 10-MB PowerPoint presentations that differ only in
the title page are not considered duplicate files, and each file is stored separately.
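
As a minimal sketch of the idea above (not part of the original explanation), file-level deduplication can be modeled by hashing each whole file and keeping a single stored copy per hash. The function name store_file, the in-memory dictionaries, and the sample file names are hypothetical illustrations.

```python
import hashlib

# Hypothetical in-memory model of file-level (single-instance) deduplication.
stored_blobs = {}   # content hash -> file contents (one physical copy per hash)
catalog = {}        # file name -> content hash (pointer to the stored copy)

def store_file(name, data):
    """Store a file; byte-identical contents are kept only once."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in stored_blobs:
        stored_blobs[digest] = data          # first copy: keep the data
    catalog[name] = digest                   # every copy: keep only a pointer

# Two identical files consume a single stored copy...
store_file("report_v1.pptx", b"identical presentation bytes")
store_file("report_copy.pptx", b"identical presentation bytes")
# ...but changing even one byte (e.g. the title page) forces a full second copy.
store_file("report_v2.pptx", b"Identical presentation bytes")
print(len(stored_blobs))  # 2 physical copies for 3 catalog entries
```

This mirrors the PowerPoint example: the two nearly identical presentations hash differently, so neither benefits from file-level deduplication.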
Subfile deduplication breaks the file into smaller chunks and then uses a specialized algorithm to
detect redundant data within and across files. As a result, subfile deduplication eliminates
duplicate data across files. There are two forms of subfile deduplication: fixed-length block and
variable-length segment. Fixed-length block deduplication divides files into fixed-length blocks and
uses a hash algorithm to find duplicate data. Although simple in design, fixed-length blocks may
miss many opportunities to discover redundant data because the block boundaries of similar data
might differ. Consider the addition of a person's name to a document's title page: this shifts the
whole document, so all the blocks appear to have changed and the method fails to detect the
equivalent data. In variable-length segment deduplication, if there is a change in a segment, only
that segment's boundary is adjusted, leaving the remaining segments unchanged. This method
vastly improves the ability to find duplicate data segments compared to fixed-block deduplication.
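
The contrast between the two subfile methods can be shown with a short Python sketch. The chunking functions, sample text, and window/divisor values below are hypothetical toy choices; real products use rolling hashes (e.g. Rabin fingerprints) rather than rehashing each window, but the boundary behavior is the same.

```python
import hashlib

def fixed_blocks(data, size=8):
    """Fixed-length block deduplication: cut at every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_segments(data, window=4, divisor=7):
    """Toy content-defined chunking: cut wherever a hash of the trailing
    byte window hits a chosen value, so boundaries follow the content."""
    segments, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha256(data[i - window:i]).digest()[:4], "big")
        if h % divisor == 0:
            segments.append(data[start:i])
            start = i
    segments.append(data[start:])
    return segments

def chunk_hashes(chunks):
    return {hashlib.sha256(c).hexdigest() for c in chunks}

original = (b"Storage arrays move block data between hosts and disks. "
            b"Deduplication keeps only unique chunks on the backup target, "
            b"which sharply reduces the capacity needed for retention.")
edited = b"J. Doe " + original   # a name added to the title page shifts everything

# Fixed blocks: the 7-byte insertion shifts every block boundary,
# so few (if any) block hashes are shared between the two versions.
print(len(chunk_hashes(fixed_blocks(original)) & chunk_hashes(fixed_blocks(edited))))

# Variable segments: boundaries re-synchronize just after the insertion,
# so almost all segment hashes are still shared.
print(len(chunk_hashes(variable_segments(original)) & chunk_hashes(variable_segments(edited))))
```

The two printed counts illustrate why variable-length segmentation finds far more duplicate data than fixed-length blocks when a small edit shifts the rest of the document.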

