jack: (Default)
[personal profile] jack
What's the simplest hash function which will take an integer, and return an integer from 0 to N-1, where simple arithmetic cycles in the input don't produce visually obvious cycles in the output? Eg. if the input is 0,2,4,6,8... the output should not all be even. I assumed something simple like "multiply by blah, raise to the something power mod N" would work, but I wasn't sure.

A few caveats:

* Obviously using a non-trivial but non-cryptographic hash (eg. md5) will work, as will seeing a random number generator with the input and using the first number generated. The question is, is there a SIMPLER solution or not.

* I'm happy to contrain N to be prime >2 (or at least, odd), then the trivial hash "output = input % N" works fine.

The use case is, in a tile-based game, many floor types (grass, floorboards, etc) are not drawn with the same image in each tile, because it looks ridiculous to see regemented rows of grass with exactly the same pattern in each square, but rather using a random image from a small set of similar but different tile images.

But when I say "random", it's useful if that's determined by the tile's position, because (a) I don't have to worry about generating the values and caching them (b) each tile on the map uses the same image even if they're generated in a different order, so I don't have to keep rechecking that it looks ok (especially if I expand the concept to include more varied tiles that are still applied randomly).

This seemed like a sensible question, but I tried asking on stack overflow, hoping for "yes, just use X" or "no, I don't think you can do better than the random version", but all I got was:

* I was stupid to even consider it, I should bite the bullet and generate a random value for each tile in the map when it was loaded
* I was stupid to even consider it, I should bite the bullet and make sure I always design prime numbers of tiles whenever I'm designing the art.

OK, yeah, that will work, but once I've separated the parts of code that "draw a tile" from the parts that decide "what's the probability of each tile", I don't want mush them together again (a) wasting MY time in order to save a few cycles doing rand() and (b) meaning a trivial error in designing the art will create giant moire patterns across the whole landscape.

Date: 2012-08-08 10:48 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
The first thing that springs to mind is exponentiation mod a prime, i.e. compute g^input mod p, where p is prime and g is a primitive root. If p is of the order of a machine word then that shouldn't be too expensive (modular exponentiation is the slow step in asymmetric crypto but that's because the modulus is thousands of bits long). Of course that still has a period of p-1, so if you eventually want to narrow down to a number from 0,...,N-1 where N is the number of distinct grass tiles you could be bothered to draw, I'd imagine a sensible plan would be to pick p such that it's congruent to 1 mod N but about the size of a machine word rather than just N+1. Then you compute (g^input mod p) mod N, and your results should be equidistributed across all outputs but (due to discrete log problem being a pain) lack obvious patterns.

There may be a simpler and faster option still, but that's what I'd try first.

Date: 2012-08-08 10:49 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
(That said, perhaps it would make life noticeably easier if you picked p to be half the size of a machine word. Then your modular multiplication step doesn't have to worry about overflow, so the implementation is much closer to foolproof.)

Date: 2012-08-08 10:59 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
Wait, that's obviously foolish, isn't it? Because reducing mod p then mod N will probably reintroduce patterns mod N due to taking the low rather than high bits of the mod-p value. (If N divided p, that would be literally true, but even though it doesn't, it's probably not ideal.)

So it would probably make more sense to take your value in 1,...,p-1 and reduce it to one of N possible outputs by division rather than modular reduction, i.e. compute ((g^input mod p)-1)/k, where k = (p-1)/N.

Date: 2012-08-08 11:11 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
Here's a quick piece of Python that I think more or less proves the concept:

#!/usr/bin/env python
import sys
N = 7
p = 65267
k = (p-1)/N
g = 12345
def hash(x):
    return (pow(g, x, p) - 1) / k
for w in range(20):
    for y in range(N):
        for x in range(w):
            sys.stdout.write("%d" % hash(y*w+x))
        sys.stdout.write("\n")
    sys.stdout.write("\n")

The loop at the end tests what happens if you take the sequence hash(x) for x=0,1,... and wrap it across lines of various widths (which I assume is more or less what your use case for grass tiles was going to do). I certainly can't see any obviously distracting pattern there.

As a minor refinement, I chose my p there to be a safe prime on the general principle that it would probably have less risk of patterns and it doesn't cost any extra run-time effort to pick your fixed compile-time prime more carefully.

Of course, prototyping in Python is all very well but if Python isn't your target language then some poor muggins is going to have to actually implement a repeated-squaring exponentiator :-) but that isn't too difficult.

Date: 2012-08-08 11:16 am (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
I am three-quarters asleep! Of course, you can't pick p to be both a safe prime and congruent to 1 mod N. So my k there was accidentally rounded down from a non-integer, and there's a bug in that code which means that it will occasionally output 7. Ahem. I should of course have picked a prime such that p-1 had factors 7 (which we actually want), 2 (which we can't avoid), and a large prime, such as 65423.

(Alternatively, you could pick a safe prime, round k up rather than down, and live with the tiny imbalance in the frequency of the different tiles.)

Date: 2012-08-08 12:24 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
In that case I'd be inclined to think it's simplest just to pick one good fixed prime (my 65267 above is the largest safe prime less than 2^16), compute k as (p+N-2)/N (i.e. ceil rather than floor of the probable-non-integer (p-1)/N), and live with the small imbalance, which will be of the order of N/2^16, i.e. not noticeable.

Date: 2012-08-08 01:40 pm (UTC)
ciphergoth: (Default)
From: [personal profile] ciphergoth
This may be a bit more complex than you want, but is nearly as fast as CityHash and with much stronger assurance:

http://eprint.iacr.org/2012/351