Simple hash without obvious periodicities
Aug. 8th, 2012 11:14 amWhat's the simplest hash function which will take an integer, and return an integer from 0 to N-1, where simple arithmetic cycles in the input don't produce visually obvious cycles in the output? Eg. if the input is 0,2,4,6,8... the output should not all be even. I assumed something simple like "multiply by blah, raise to the something power mod N" would work, but I wasn't sure.
A few caveats:
* Obviously using a non-trivial but non-cryptographic hash (eg. md5) will work, as will seeing a random number generator with the input and using the first number generated. The question is, is there a SIMPLER solution or not.
* I'm happy to contrain N to be prime >2 (or at least, odd), then the trivial hash "output = input % N" works fine.
The use case is, in a tile-based game, many floor types (grass, floorboards, etc) are not drawn with the same image in each tile, because it looks ridiculous to see regemented rows of grass with exactly the same pattern in each square, but rather using a random image from a small set of similar but different tile images.
But when I say "random", it's useful if that's determined by the tile's position, because (a) I don't have to worry about generating the values and caching them (b) each tile on the map uses the same image even if they're generated in a different order, so I don't have to keep rechecking that it looks ok (especially if I expand the concept to include more varied tiles that are still applied randomly).
This seemed like a sensible question, but I tried asking on stack overflow, hoping for "yes, just use X" or "no, I don't think you can do better than the random version", but all I got was:
* I was stupid to even consider it, I should bite the bullet and generate a random value for each tile in the map when it was loaded
* I was stupid to even consider it, I should bite the bullet and make sure I always design prime numbers of tiles whenever I'm designing the art.
OK, yeah, that will work, but once I've separated the parts of code that "draw a tile" from the parts that decide "what's the probability of each tile", I don't want mush them together again (a) wasting MY time in order to save a few cycles doing rand() and (b) meaning a trivial error in designing the art will create giant moire patterns across the whole landscape.
A few caveats:
* Obviously using a non-trivial but non-cryptographic hash (eg. md5) will work, as will seeing a random number generator with the input and using the first number generated. The question is, is there a SIMPLER solution or not.
* I'm happy to contrain N to be prime >2 (or at least, odd), then the trivial hash "output = input % N" works fine.
The use case is, in a tile-based game, many floor types (grass, floorboards, etc) are not drawn with the same image in each tile, because it looks ridiculous to see regemented rows of grass with exactly the same pattern in each square, but rather using a random image from a small set of similar but different tile images.
But when I say "random", it's useful if that's determined by the tile's position, because (a) I don't have to worry about generating the values and caching them (b) each tile on the map uses the same image even if they're generated in a different order, so I don't have to keep rechecking that it looks ok (especially if I expand the concept to include more varied tiles that are still applied randomly).
This seemed like a sensible question, but I tried asking on stack overflow, hoping for "yes, just use X" or "no, I don't think you can do better than the random version", but all I got was:
* I was stupid to even consider it, I should bite the bullet and generate a random value for each tile in the map when it was loaded
* I was stupid to even consider it, I should bite the bullet and make sure I always design prime numbers of tiles whenever I'm designing the art.
OK, yeah, that will work, but once I've separated the parts of code that "draw a tile" from the parts that decide "what's the probability of each tile", I don't want mush them together again (a) wasting MY time in order to save a few cycles doing rand() and (b) meaning a trivial error in designing the art will create giant moire patterns across the whole landscape.
no subject
Date: 2012-08-08 10:48 am (UTC)There may be a simpler and faster option still, but that's what I'd try first.
no subject
Date: 2012-08-08 10:49 am (UTC)no subject
Date: 2012-08-08 10:59 am (UTC)So it would probably make more sense to take your value in 1,...,p-1 and reduce it to one of N possible outputs by division rather than modular reduction, i.e. compute ((g^input mod p)-1)/k, where k = (p-1)/N.
no subject
Date: 2012-08-08 11:04 am (UTC)I hope there may be a simpler constant-time solution, but I'm not sure if there is or not, and I think this is good enough.
I agree you probably want to divide at the end rather than take mods.
no subject
Date: 2012-08-08 11:11 am (UTC)The loop at the end tests what happens if you take the sequence
hash(x)forx=0,1,...and wrap it across lines of various widths (which I assume is more or less what your use case for grass tiles was going to do). I certainly can't see any obviously distracting pattern there.As a minor refinement, I chose my p there to be a safe prime on the general principle that it would probably have less risk of patterns and it doesn't cost any extra run-time effort to pick your fixed compile-time prime more carefully.
Of course, prototyping in Python is all very well but if Python isn't your target language then some poor muggins is going to have to actually implement a repeated-squaring exponentiator :-) but that isn't too difficult.
no subject
Date: 2012-08-08 11:16 am (UTC)(Alternatively, you could pick a safe prime, round k up rather than down, and live with the tiny imbalance in the frequency of the different tiles.)
no subject
Date: 2012-08-08 11:43 am (UTC)FWIW, N is going to vary: probably the most common case is 2[*], but there'll probably a bunch of small integer values, so we may choose several p in advance or need to generate it.
[*] Well, ok, the most common case is 1, but that's a special case :)
no subject
Date: 2012-08-08 12:24 pm (UTC)no subject
Date: 2012-08-08 01:03 pm (UTC)no subject
Date: 2012-08-08 01:40 pm (UTC)http://eprint.iacr.org/2012/351