Why should hashtable size be prime number?

  • Lets say our hashtable size is 100 (which is NOT a prime number).
  • Say we wish to hash some value x and its multiples 2x, 3x, 4x and so on. If x was = 2 then {2, 4, 6, 8, .., inf} would be our values. Since our hashtable is of size 100, only 50% buckets would be occupied.
  • If x was = 3 then {3, 6, 9, 12, …, inf} would be our values. Only 33% would be occupied.
  • I hope you see the problem there ^. We want a situation where for any key would be spread across buckets in the whole table and not just some % of the table.
  • To formalize, for any x, your values are going to be spread in s number of buckets (given by the formula below).
    spread (s) = table_length / GreatestCommonFactor(table_length, x)

    table_length = 10
    x = 2

    s = 10 / GreatestCommonFactor(10, 2)
    s = 10 / 2
    s = 5

    Meaning values will be spread only in 5 out of 10 buckets (i.e. 50% of the table)

  • We need GreatestCommonFactor(table length, x) to be 1 if we want to spread (s) to be table length (meaning map to all buckets in hashtable). So lets make x and table length co-prime1 for which we need to make table length prime!

[1] In number theory, two integers a and b are said to be coprime if the only positive integer that divides both of them is 1.