

The number of bits you can correct with forward error correction is also limited by the Hamming distance of the polynomial. For example, if the Hamming distance is three, you have to flip three bits to change from a set of bits that represents one valid message with a matching CRC to another valid message with its own matching CRC. If that is the case, you can correct one bit with confidence.
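As a sketch, single-bit correction can be done by brute force: flip each bit in turn and see which flip restores a matching CRC. This is a minimal illustration, assuming a CRC-8 with polynomial 0x07 and a packet short enough that the polynomial's Hamming distance is at least 3; the polynomial choice and function names are mine, not from the text.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (MSB-first, zero init) -- illustrative, not optimized."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def correct_single_bit(msg: bytes, expected_crc: int):
    """Return msg with the single erroneous bit flipped back, or None.

    Only valid while the polynomial's Hamming distance is >= 3 over this
    packet length; otherwise two different flips could both match."""
    if crc8(msg) == expected_crc:
        return msg                         # nothing to correct
    for i in range(len(msg) * 8):
        trial = bytearray(msg)
        trial[i // 8] ^= 0x80 >> (i % 8)   # flip bit i
        if crc8(bytes(trial)) == expected_crc:
            return bytes(trial)
    return None  # more than one bit in error, or the CRC itself corrupted
```

For example, transmitting `b"hello"` with its CRC-8, corrupting one bit in transit, and running `correct_single_bit` recovers the original bytes.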
In any event, a CRC should NEVER be used as the ONLY form of comparison, just as a quick form of comparison, for indexing. You can use a CRC-8 to index the whole internet, and divide everything into one of N categories. Doing a CRC-32 forwards and backwards (reverse/byte or reverse/bits, or bit-offsets) on the same blob of data is more reliable than using a CRC-64 in just one direction. Now, with those pre-sorted, you only have to check one of N directories, looking for "file-size", or "reverse-CRC", or whatever other comparison you can do to that smaller data-set, fast.

You can detect a single bit error with a CRC in any size packet. Detecting double bit errors, or correcting single bit errors, is limited to the number of distinct values the CRC can take (2^n: 256 for 8 bits, 65536 for 16 bits, etc.). In practice, though, CRCs actually take on fewer distinct values for single bit errors. For example, what I call the 'Y5' polynomial, 0x5935, only takes on up to 256 different values before they repeat going back farther; on the other hand, it is able to correct double bit errors within that distance, which is 30 bytes plus 2 bytes for errors in the CRC itself.
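The indexing idea can be sketched with the standard library's CRC-32, using the low bits of the CRC to pick one of N buckets (acting like a CRC-8 category index) and keeping (data-size, CRC) pairs for the quick comparison; the bucket scheme and helper names here are my own illustration.

```python
import zlib

# One of N "directories", keyed by the low 8 bits of the CRC-32.
index: dict[int, list[tuple[int, int]]] = {}

def bucket_of(blob: bytes) -> int:
    return zlib.crc32(blob) & 0xFF

def add(blob: bytes) -> None:
    index.setdefault(bucket_of(blob), []).append((len(blob), zlib.crc32(blob)))

def probably_stored(blob: bytes) -> bool:
    """Quick check: scan only one bucket, matching on data-size AND CRC.
    A hit is NOT proof -- per the text, confirm with a full comparison."""
    return (len(blob), zlib.crc32(blob)) in index.get(bucket_of(blob), [])
```

A lookup touches only one of the 256 buckets, so the expensive byte-for-byte confirmation runs against a much smaller candidate set.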


When doing comparison, you should ALSO be checking "data-sizes". You will rarely have a collision of the same data-size, with a matching CRC, within the correct sizes. Purposely manipulated data, to fake a match, is usually done by adding extra data until the CRC matches a target. However, that results in a data-size that no longer matches. Attempting to brute-force, or to cycle through random or sequential data of the same exact size, would leave a real narrow collision-rate. You can also have collisions within the data-size, just from the generic limits of the formulas used, and the constraints of using bits/bytes and base-ten systems, which depend on floating-point values that get truncated and clipped. The point where you would want to think about going larger is when you start to see many collisions which can not be "confirmed" as "originals" (when they both have the same data-size and, when tested backwards, they have a matching CRC). I use CRC-32 and file-size comparison and have NEVER, in the billions of files checked, run into a matching CRC-32 and file-size collision. But I know a few exist, when not purposely forced to exist.

Here is a nice "real world" evaluation of CRC-N: Given two inputs which are massively different, the possibility of a false match will be about 1/256 with most forms of 8-bit check value (including CRC), 1/65536 with most forms of 16-bit check value (including CRC), etc. The advantage of CRC comes from its treatment of inputs which are very similar. With an 8-bit CRC whose polynomial generates two periods of length 128, the fraction of single, double, or triple bit errors in a packet shorter than that which go undetected won't be 1/256; it will be zero. Likewise with a 16-bit CRC of period 32768, using packets of 32768 bits or less.

If packets are longer than the CRC period, however, then a double-bit error will go undetected if the distance between the erroneous bits is a multiple of the CRC period. While that might not seem like a terribly likely scenario, a CRC8 will be somewhat worse at catching double-bit errors in long packets than at catching "packet is totally scrambled" errors. If double-bit errors are the second most common failure mode (after single-bit errors), that would be bad. If anything that corrupts some data is likely to corrupt a lot of it, however, the inferior behavior of CRCs with double-bit errors may be a non-issue.

The choice of CRC length versus file size is mainly relevant in cases where one is more likely to have an input which differs from the "correct" input by three or fewer bits than to have one which is massively different.
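The period argument can be checked by brute force: because a CRC is linear, a two-bit error goes undetected exactly when the bit separation is a multiple of the polynomial's period, so the first undetected separation reveals the period. A sketch using an illustrative CRC-8 polynomial (0x07, whose period works out to 127; both the polynomial choice and helper names are mine, not from the text):

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 (MSB-first, zero init)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def first_undetected_separation(packet_bytes: int = 600):
    """Smallest d where flipping two bits d apart leaves the CRC unchanged,
    i.e. the CRC period (the packet must be longer than the period)."""
    base = bytes(packet_bytes)           # all-zero packet, CRC == 0
    ref = crc8(base)
    for d in range(1, packet_bytes * 8):
        msg = bytearray(base)
        msg[0] ^= 0x80                   # flip bit 0
        msg[d // 8] ^= 0x80 >> (d % 8)   # flip bit d
        if crc8(bytes(msg)) == ref:
            return d                     # double-bit error went undetected
    return None
```

For 0x07 this finds a separation of 127 bits, so any packet longer than that can hide a double-bit error at exactly that spacing, matching the discussion above; packets shorter than the period cannot.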
