Removing redundant images while uploading from the Phimpme photo app

Uploading images is an important feature of the Phimpme photo app. Because the upload step performed no check, there was a high probability that a user would upload the same image more than once. To tackle this problem, each image is hashed so that no two identical images get uploaded. The hash of a file is generated with the MD5 and SHA-1 algorithms.

What is the MD5 algorithm?

The MD5 hashing algorithm is a one-way cryptographic function that accepts a message of any length as input and returns as output a fixed-length digest value to be used for authenticating the original message. It produces a 128 bit hash value. 

 – WhatIs.com
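As a small illustration of the fixed-length output described above, the sketch below hashes two strings of different lengths with MD5 and prints the digests as hexadecimal text. The class name Md5Demo and the helper md5Hex are made up for this example and are not part of Phimpme.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Demo {
    // Returns the MD5 digest of the text as 32 lowercase hexadecimal characters.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on the Java platform, so this is not expected.
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex(""));             // d41d8cd98f00b204e9800998ecf8427e
        System.out.println(md5Hex("abc"));          // 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex("abc").length()); // 32
    }
}
```

Whatever the input length, the digest is 16 bytes, which is why the hexadecimal form is always 32 characters.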

What is the SHA-1 algorithm?

SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function designed by the United States National Security Agency and is a U.S. Federal Information Processing Standard published by the United States NIST. SHA-1 produces a 160-bit (20-byte) hash value known as a message digest. A SHA-1 hash value is typically rendered as a hexadecimal number, 40 digits long.  

 – Wikipedia
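The 20-byte digest and its 40-digit hexadecimal rendering can be checked with a few lines of Java. This is only a sketch; Sha1Demo and its two helpers are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Demo {
    // Returns the raw SHA-1 digest (20 bytes) of the text.
    static byte[] sha1(String text) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on the Java platform, so this is not expected.
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    // Renders each byte as two lowercase hexadecimal digits.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] digest = sha1("abc");
        System.out.println(digest.length);          // 20
        System.out.println(toHex(digest));          // a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(toHex(digest).length()); // 40
    }
}
```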

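Reasons for using MD5 + SHA-1 instead of SHA-256

  1. The combined digest identifies a file by its content, so the file's integrity can be verified and an accidental clash between two different images is extremely unlikely. The goal here is duplicate detection, not protection against deliberately crafted collisions.
  2. MD5 and SHA-1 are each generally cheaper to compute in software than SHA-256, although the combined cost depends on the device and is worth measuring.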
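The speed claim is easy to check on a given device. The following is a rough timing sketch, not a rigorous benchmark: the class name HashTimingSketch, the 8 MiB zero-filled buffer, and the single warm-up pass are all assumptions made for illustration, and the numbers will differ between devices and Java runtimes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashTimingSketch {
    // Hashes the whole buffer once with the named algorithm and returns the elapsed nanoseconds.
    static long timeNanos(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            long start = System.nanoTime();
            md.update(data);
            md.digest();
            return System.nanoTime() - start;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: 8 MiB of zeros stands in for the bytes of an image file.
        byte[] data = new byte[8 * 1024 * 1024];

        // One untimed pass per algorithm so that first-use setup does not skew the numbers.
        for (String algorithm : new String[] {"MD5", "SHA-1", "SHA-256"}) {
            timeNanos(algorithm, data);
        }

        long md5PlusSha1 = timeNanos("MD5", data) + timeNanos("SHA-1", data);
        long sha256 = timeNanos("SHA-256", data);
        System.out.println("MD5 + SHA-1: " + md5PlusSha1 / 1_000_000 + " ms");
        System.out.println("SHA-256:     " + sha256 / 1_000_000 + " ms");
    }
}
```

Running it several times and on the target device gives a fairer picture than a single run.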

Implementation

Create a separate utility class in FileUtils.java. The class contains static methods that compute the hash of any file. Both algorithms are accessed through Java's MessageDigest class.

The hash is the MD5 hash string and the SHA-1 hash string joined by an underscore, which gives 32 + 1 + 40 = 73 characters. MessageDigest.getInstance takes the algorithm name as its argument to select the hashing function. A FileInputStream is opened on the input file, and the message digest is updated with the file's bytes, one buffer at a time. A StringBuffer, which is mutable, is then used to build the fixed-length hexadecimal string from the digest bytes, and its string value is returned.
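Before the full class, the byte-to-hex step it relies on can be tried in isolation. The sketch below uses a hypothetical HexConversionDemo class and shows why the conversion masks each byte with 0xff and adds 0x100 before formatting: without those steps, leading zeros are dropped and negative bytes are sign-extended.

```java
final class HexConversionDemo {
    // Mask the byte to 0..255, add 0x100 so the value always has three hex digits,
    // then drop the leading "1" to keep exactly two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0f, (byte) 0xff};
        System.out.println(toHex(sample));                    // 000fff
        System.out.println(Integer.toHexString(0x0f));        // f (leading zero lost)
        System.out.println(Integer.toHexString((byte) 0xff)); // ffffffff (sign-extended)
    }
}
```

Keeping two digits per byte is what makes the final string a fixed length regardless of the digest's contents.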

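import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileUtils {

 // Returns the combined hash of the file in the form <md5>_<sha1>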
 public static String getHash(final File file) throws NoSuchAlgorithmException, IOException {

  return md5(file) + "_" + sha1(file);
 }

 // Computes the MD5 hash of the file as a 32-character hexadecimal string
 public static String md5(final File file) throws NoSuchAlgorithmException, IOException {

  MessageDigest md = MessageDigest.getInstance("MD5");
  // try-with-resources closes the stream even if reading fails
  try (FileInputStream fis = new FileInputStream(file)) {
   byte[] dataBytes = new byte[1024];
   int nread;
   while ((nread = fis.read(dataBytes)) != -1) {
    md.update(dataBytes, 0, nread);
   }
  }

  // convert each digest byte to two hex digits
  StringBuffer sb = new StringBuffer();
  for (byte mdbyte : md.digest()) {
   sb.append(Integer.toString((mdbyte & 0xff) + 0x100, 16).substring(1));
  }
  return sb.toString();
 }

 // Computes the SHA-1 hash of the file as a 40-character hexadecimal string
 public static String sha1(final File file) throws NoSuchAlgorithmException, IOException {

  MessageDigest md = MessageDigest.getInstance("SHA-1");
  // try-with-resources closes the stream even if reading fails
  try (FileInputStream fis = new FileInputStream(file)) {
   byte[] dataBytes = new byte[1024];
   int nread;
   while ((nread = fis.read(dataBytes)) != -1) {
    md.update(dataBytes, 0, nread);
   }
  }

  // convert each digest byte to two hex digits
  StringBuffer sb = new StringBuffer();
  for (byte mdbyte : md.digest()) {
   sb.append(Integer.toString((mdbyte & 0xff) + 0x100, 16).substring(1));
  }
  return sb.toString();
 }
}
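How the hash is used to skip a duplicate upload is not shown above, so the following is only a sketch under assumptions. UploadRegistry and markIfNew are hypothetical names, and the in-memory set stands in for whatever persistent store the app actually uses to remember uploaded hashes.

```java
import java.util.HashSet;
import java.util.Set;

final class UploadRegistry {
    // Assumption: hashes of already uploaded files are kept in memory only.
    private final Set<String> uploadedHashes = new HashSet<>();

    // Returns true the first time a hash is seen and false for every repeat.
    boolean markIfNew(String fileHash) {
        return uploadedHashes.add(fileHash);
    }

    public static void main(String[] args) {
        UploadRegistry registry = new UploadRegistry();
        // Example value in the <md5>_<sha1> format (digests of the string "abc").
        String hash = "900150983cd24fb0d6963f7d28e17f72_a9993e364706816aba3e25717850c26c9cd0d89d";

        System.out.println(registry.markIfNew(hash)); // true  -> upload the image
        System.out.println(registry.markIfNew(hash)); // false -> skip, already uploaded
    }
}
```

In the app, a check like this would sit just before the upload call, with the argument coming from FileUtils.getHash(file).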

Conclusion

Storage space is limited, so it is important to identify each file by a unique, fixed-length hash value. This not only avoids wasting space on duplicate uploads, it also makes a file easier to track if necessary.

To learn more about hash algorithms, visit: https://www.tutorialspoint.com/cryptography/cryptography_hash_functions.htm