# MD5 vs SHA1 for data integrity checking!

Hope you are well folks. 🙂

In this article, I am going to discuss hash functions, their properties, attacks on them, which algorithm we use to validate the integrity of file contents, and which hashing algorithms are actively being developed next.

Before I go further, let me explain what hash functions are and why on earth they are needed!

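What is a hash function?

A hash function is an algorithm that converts an input of any size into a fixed-size value (the hash) that is, for practical purposes, unique to that input and cannot feasibly be reverse-engineered back to it. That is why hashes are handy for integrity checks: if the content changes, the hash changes too.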

A more formal definition can be found here.

Now let’s try to express this concept mathematically.

Suppose the input we give to the hash function is **“Let me know what is my hash”**, which we represent with the constant **“M”**, and the hash of **“M”** is **“9F5746A67A7FE4F15333381F00250431”**. Then we can write:

Hash(M) = 9F5746A67A7FE4F15333381F00250431, if M = “Let me know what is my hash”

If we abstract further and represent both the input and the generated hash with constants, we get:

Hash(M) = H, where M is the input and H is the generated hash
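To make the Hash(M) = H notation concrete, here is a minimal sketch using Java's standard `MessageDigest` class with MD5. The class name `HashBasics` and the helper `md5Hex` are illustrative names only, UTF-8 is an assumed string encoding, and the input `"abc"` is used because its MD5 value is a well-known test vector.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashBasics {

    /** Returns the MD5 digest of the message as an uppercase hex string. */
    static String md5Hex(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02X", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the running JVM ships an MD5 provider.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        String m = "abc";
        String h = md5Hex(m);
        System.out.println("Hash(M) = " + h); // 900150983CD24FB0D6963F7D28E17F72

        // The same input always produces the same hash.
        System.out.println("Deterministic: " + h.equals(md5Hex(m)));

        // MD5 output is always 128 bits (32 hex digits), whatever the input size.
        System.out.println("Hex length: " + md5Hex("").length());

        // A one-character change in the input gives a different hash.
        System.out.println("Differs after edit: " + !h.equals(md5Hex("abd")));
    }
}
```

The fixed output length is what makes a hash convenient to store next to a file: the stored value stays the same size no matter how large the file is.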

Basically, there are 3 properties that a cryptographic hash function should satisfy. Only a function or algorithm that satisfies all of them can be called a cryptographic hash function.

**1) Preimage resistance:** Given a hash value *h*, it should be hard to find any message *m* such that *h = hash(m)*.

**2) Second preimage resistance:** Given a message *m₁*, it should be hard to find a different message *m₂* such that *hash(m₁) = hash(m₂)*.

**3) Collision resistance:** It should be “hard” to find any two different messages *m₁* and *m₂* such that *hash(m₁) = hash(m₂)*. For a keyed hash with key *k*, the condition becomes *hash(k, m₁) = hash(k, m₂)*.
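Finding a collision on full MD5 takes a dedicated attack, but the idea itself can be illustrated on a deliberately weakened hash. The sketch below keeps only the first 2 bytes (16 bits) of the MD5 digest, which is an assumption made purely for demonstration, and searches for two different messages with the same truncated hash. The names `CollisionDemo`, `truncatedHash`, and `findCollision` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {

    /** First 16 bits of MD5(message), as an int in the range 0..65535. */
    static int truncatedHash(String message) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 8) | (d[1] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the running JVM ships an MD5 provider.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Returns two different messages with the same truncated hash.
     * Only 65536 truncated values exist, so the loop must stop
     * within 65537 candidates (pigeonhole principle).
     */
    static String[] findCollision() {
        Map<Integer, String> seen = new HashMap<>();
        for (int i = 0; ; i++) {
            String candidate = "message-" + i;
            // put() returns the message previously stored under this hash, if any.
            String previous = seen.put(truncatedHash(candidate), candidate);
            if (previous != null) {
                return new String[] { previous, candidate };
            }
        }
    }

    public static void main(String[] args) {
        String[] pair = findCollision();
        System.out.println("m1 = " + pair[0] + ", m2 = " + pair[1]);
        System.out.println("truncated hash of both = " + truncatedHash(pair[0]));
    }
}
```

With only 16 bits of output a collision usually turns up after a few hundred attempts. Longer hashes make this brute-force search impractical, which is why real attacks on MD5 and SHA1 exploit weaknesses in the algorithms rather than simply trying inputs.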

MD5 and SHA1 were both designed to satisfy these properties, although practical collision attacks have since been found against both.

Our use case was to find an algorithm that can generate a hash for a file, and we were okay with a few collisions. But we could not compromise on speed.
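For this file use case, a minimal sketch of hashing a file and later checking whether it has changed could look like the following. It streams the content through `MessageDigest` in chunks so that large files do not have to fit in memory. The class `FileIntegrity`, its method names, and the 8 KB buffer size are illustrative assumptions, not the code we run in production.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileIntegrity {

    /** Computes the MD5 digest of everything readable from the stream. */
    static byte[] md5Of(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read); // feed only the bytes actually read
            }
            return md.digest();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the running JVM ships an MD5 provider.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Computes the MD5 digest of a file. */
    static byte[] md5OfFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Of(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the file still hashes to the digest recorded earlier. */
    static boolean isUnchanged(Path file, byte[] recordedDigest) {
        return MessageDigest.isEqual(recordedDigest, md5OfFile(file));
    }

    /** Uppercase hex representation of a digest. */
    static String toHex(byte[] digest) {
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("integrity-demo", ".txt");
        try {
            Files.write(file, "original content".getBytes(StandardCharsets.UTF_8));
            byte[] recorded = md5OfFile(file);
            System.out.println("Recorded hash: " + toHex(recorded));
            System.out.println("Unchanged: " + isUnchanged(file, recorded)); // true

            Files.write(file, "modified content".getBytes(StandardCharsets.UTF_8));
            System.out.println("Unchanged: " + isUnchanged(file, recorded)); // false
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

To hash with SHA1 instead, only the algorithm name passed to `MessageDigest.getInstance` needs to change (`"SHA-1"` is the standard name).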

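MD5 is roughly 40% faster than SHA1, but it is known to be broken and can produce collisions. (source) On the other hand, Google has demonstrated that SHA1 can be broken as well, by producing a practical collision. Since we could tolerate a few collisions, speed was the deciding factor, so MD5 became our obvious choice.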
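If you would like to check the speed difference on your own machine, a rough timing sketch is shown below. It is not a rigorous benchmark (a harness such as JMH would be the better tool for that), and the numbers depend heavily on the JVM, the hardware, and the input size. The class `DigestSpeed`, the 1 MB buffer, and the round counts are arbitrary assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

class DigestSpeed {

    /** Time in nanoseconds to hash the data `rounds` times with the given algorithm. */
    static long timeNanos(String algorithm, byte[] data, int rounds) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                md.update(data);
            }
            md.digest(); // finish the computation so all the work is counted
            return System.nanoTime() - start;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20]; // 1 MB of pseudo-random bytes
        new Random(42).nextBytes(data);

        // Warm-up runs so the JIT compiler has a chance to optimize both code paths.
        timeNanos("MD5", data, 50);
        timeNanos("SHA-1", data, 50);

        long md5 = timeNanos("MD5", data, 200);
        long sha1 = timeNanos("SHA-1", data, 200);

        System.out.printf("MD5:   %d ms%n", md5 / 1_000_000);
        System.out.printf("SHA-1: %d ms%n", sha1 / 1_000_000);
        System.out.printf("SHA-1 time / MD5 time: %.2f%n", (double) sha1 / md5);
    }
}
```

Treat the printed ratio as an indication only, and measure with your real file sizes before relying on it.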

If you want to know which hashing algorithms are being developed next, you can visit these pages for more information.

I am also including sample code at the end of this post that calculates an MD5 or SHA1 hash in Java. Do check it out!

Cheers. 🙂

    package com.causecode.sample;

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Scanner;

    /**
     * Demo program to show how to calculate the hash for the given input.
     */
    public class Main {

        /**
         * Supported algorithms, with the standard names MessageDigest expects.
         */
        private enum Algorithm {
            MD5("MD5"), SHA1("SHA-1");

            private final String standardName;

            Algorithm(String standardName) {
                this.standardName = standardName;
            }
        }

        /**
         * Main entry point: reads the input and the algorithm choice, prints the hash.
         */
        public static void main(String[] args) throws NoSuchAlgorithmException {
            Scanner scanner = new Scanner(System.in);
            printToConsole("Enter the input to calculate the hash:");
            String input = scanner.nextLine();
            printToConsole("----------------------");
            printToConsole("Choose algorithm:");
            printToConsole("1) MD5");
            printToConsole("2) SHA1");
            int choice = scanner.nextInt();
            scanner.close();

            // Anything other than 1 falls back to SHA1, as in the original menu logic.
            Algorithm algorithm = (choice == 1) ? Algorithm.MD5 : Algorithm.SHA1;
            byte[] hash = calculateHash(input, algorithm);

            printToConsole("----------------------");
            printToConsole(String.format("Calculated hash for given input is: \"%s\"",
                    convertHashToHexString(hash)));
        }

        /**
         * Converts the raw digest bytes to an uppercase hex string.
         */
        private static String convertHashToHexString(byte[] hash) {
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        }

        /**
         * Calculates the hash of the message (encoded as UTF-8) using the supplied algorithm.
         * Returns the raw digest bytes; converting them with String.valueOf would only
         * print the array reference, not the hash.
         */
        static byte[] calculateHash(String message, Algorithm algorithm) throws NoSuchAlgorithmException {
            return MessageDigest.getInstance(algorithm.standardName)
                    .digest(message.getBytes(StandardCharsets.UTF_8));
        }

        /**
         * Utility method to print a line to the console.
         */
        static void printToConsole(String message) {
            System.out.println(message);
        }
    }

About CauseCode: We are a technology company specializing in Healthtech related Web and Mobile application development. We collaborate with passionate companies looking to change health and wellness tech for good. If you are a startup, enterprise or generally interested in digital health, we would love to hear from you! Let's connect at bootstrap@causecode.com