jack

What's the simplest hash function which will take an integer, and return an integer from 0 to N-1, where simple arithmetic cycles in the input don't produce visually obvious cycles in the output? Eg. if the input is 0,2,4,6,8... the output should not all be even. I assumed something simple like "multiply by blah, raise to the something power mod N" would work, but I wasn't sure.

A few caveats:

* Obviously using a non-trivial but non-cryptographic hash (eg. md5) will work, as will seeing a random number generator with the input and using the first number generated. The question is, is there a SIMPLER solution or not.

* I'm happy to contrain N to be prime >2 (or at least, odd), then the trivial hash "output = input % N" works fine.

The use case is, in a tile-based game, many floor types (grass, floorboards, etc) are not drawn with the same image in each tile, because it looks ridiculous to see regemented rows of grass with exactly the same pattern in each square, but rather using a random image from a small set of similar but different tile images.

But when I say "random", it's useful if that's determined by the tile's position, because (a) I don't have to worry about generating the values and caching them (b) each tile on the map uses the same image even if they're generated in a different order, so I don't have to keep rechecking that it looks ok (especially if I expand the concept to include more varied tiles that are still applied randomly).

This seemed like a sensible question, but I tried asking on stack overflow, hoping for "yes, just use X" or "no, I don't think you can do better than the random version", but all I got was:

* I was stupid to even consider it, I should bite the bullet and generate a random value for each tile in the map when it was loaded
* I was stupid to even consider it, I should bite the bullet and make sure I always design prime numbers of tiles whenever I'm designing the art.

OK, yeah, that will work, but once I've separated the parts of code that "draw a tile" from the parts that decide "what's the probability of each tile", I don't want mush them together again (a) wasting MY time in order to save a few cycles doing rand() and (b) meaning a trivial error in designing the art will create giant moire patterns across the whole landscape.

#!/usr/bin/env python import sys N = 7 p = 65267 k = (p-1)/N g = 12345 def hash(x): return (pow(g, x, p) - 1) / k for w in range(20): for y in range(N): for x in range(w): sys.stdout.write("%d" % hash(y*w+x)) sys.stdout.write("\n") sys.stdout.write("\n")

Flat | Top-Level Comments Only

From:

simont

The first thing that springs to mind is exponentiation mod a prime, i.e. compute g^input mod p, where p is prime and g is a primitive root. If p is of the order of a machine word then that shouldn't be too expensive (modular exponentiation is the slow step in asymmetric crypto but that's because the modulus is thousands of bits long). Of course that still has a period of p-1, so if you eventually want to narrow down to a number from 0,...,N-1 where N is the number of distinct grass tiles you could be bothered to draw, I'd imagine a sensible plan would be to pick p such that it's congruent to 1 mod N but about the size of a machine word rather than just N+1. Then you compute (g^input mod p) mod N, and your results should be equidistributed across all outputs but (due to discrete log problem being a pain) lack obvious patterns.

There may be a simpler and faster option still, but that's what I'd try first.

(That said, perhaps it would make life noticeably easier if you picked p to be half the size of a machine word. Then your modular multiplication step doesn't have to worry about overflow, so the implementation is much closer to foolproof.)

Wait, that's obviously foolish, isn't it? Because reducing mod p then mod N will probably reintroduce patterns mod N due to taking the low rather than high bits of the mod-p value. (If N divided p, that would be literally true, but even though it doesn't, it's probably not ideal.)

So it would probably make more sense to take your value in 1,...,p-1 and reduce it to one of N possible outputs by division rather than modular reduction, i.e. compute ((g^input mod p)-1)/k, where k = (p-1)/N.

jack

Thank you! Yes, that's just the sort of thing I was thinking of, but couldn't bring to mind exactly how you'd do it.

I hope there may be a simpler constant-time solution, but I'm not sure if there is or not, and I think this is good enough.

I agree you probably want to divide at the end rather than take mods.

Here's a quick piece of Python that I think more or less proves the concept:

The loop at the end tests what happens if you take the sequence hash(x) for x=0,1,... and wrap it across lines of various widths (which I assume is more or less what your use case for grass tiles was going to do). I certainly can't see any obviously distracting pattern there.

As a minor refinement, I chose my p there to be a safe prime on the general principle that it would probably have less risk of patterns and it doesn't cost any extra run-time effort to pick your fixed compile-time prime more carefully.

Of course, prototyping in Python is all very well but if Python isn't your target language then some poor muggins is going to have to actually implement a repeated-squaring exponentiator :-) but that isn't too difficult.

I am three-quarters asleep! Of course, you can't pick p to be both a safe prime and congruent to 1 mod N. So my k there was accidentally rounded down from a non-integer, and there's a bug in that code which means that it will occasionally output 7. Ahem. I should of course have picked a prime such that p-1 had factors 7 (which we actually want), 2 (which we can't avoid), and a large prime, such as 65423.

(Alternatively, you could pick a safe prime, round k up rather than down, and live with the tiny imbalance in the frequency of the different tiles.)

Oh, shiny! Thank you! Yes, it's aimed at python, although the only other likely candidate is C, and I assume I could implement the hash in C if I needed to :)

FWIW, N is going to vary: probably the most common case is 2[*], but there'll probably a bunch of small integer values, so we may choose several p in advance or need to generate it.

[*] Well, ok, the most common case is 1, but that's a special case :)

In that case I'd be inclined to think it's simplest just to pick one good fixed prime (my 65267 above is the largest safe prime less than 2^16), compute k as (p+N-2)/N (i.e. ceil rather than floor of the probable-non-integer (p-1)/N), and live with the small imbalance, which will be of the order of N/2^16, i.e. not noticeable.

That seems sensible. I'm annoying if the error is 50%, but 10% is fine, and we can safely ignore anything over 2^16 :)

ciphergoth

This may be a bit more complex than you want, but is nearly as fast as CityHash and with much stronger assurance:

http://eprint.iacr.org/2012/351

Simple hash without obvious periodicities

Simple hash without obvious periodicities

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject

Active Recent Entries