Ads

Oct. 3rd, 2012 12:43 pm
[personal profile] jack
I've recently been seeing ads for "Muslim dating site" and "mature dating site". I've no idea what Google thinks I'm doing...

Date: 2012-10-03 12:15 pm (UTC)
From: [personal profile] ptc24
It would be amusing to get these things to show their working. When I say "amusing", I mean that it would take quite a while for people to understand what was being presented to them, and in the meantime the opportunities for misunderstanding would be many and varied.

I expect in a nice big data set there are going to be lots and lots of correlations. These will include:

1) Correlations which are large and strong and meaningful, which everyone understands.
2) Correlations which look like noxious stereotypes, and which would be noxious stereotypes if taken as strong correlations, but which are actually weak correlations.
3) "The crud factor" - weak but statistically significant correlations which represent real true facts about the world, but where there's no better explanation beyond "A can cause B can cause C can cause D can cause E, F can cause G can cause H can also cause E, therefore A is weakly correlated with F". See this (page 204 onwards) for more details: "everything correlates to some extent with everything else".
4) Lots and lots of statistically insignificant correlations, individually far too unreliable to be meaningful, but if you take a great big huge pile of them and let them "average out" in some way, they can collectively be useful.

I include 4 from personal experience with machine learning systems: when I told one system "OK, only use statistically significant correlations in your working" (at the standard 5% level), it gave substantially worse results than when I let it use statistically insignificant correlations too.
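To make point 4 concrete, here is a toy sketch in Python. It is not the system I was describing, just an illustration under made-up assumptions: 300 synthetic features, each shifted very slightly by a +1/-1 label, 100 training rows, and a crude classifier that weights each feature by its training correlation with the label. All the names (`run_demo`, `effect`, and so on) are invented for this example, and the 5% significance check uses the Fisher z approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_data(rng, n_rows, n_features, effect):
    """Labels are +1/-1; every feature is a tiny label shift plus unit noise."""
    labels = [rng.choice((-1, 1)) for _ in range(n_rows)]
    rows = [[effect * y + rng.gauss(0, 1) for _ in range(n_features)]
            for y in labels]
    return rows, labels


def accuracy(rows, labels, weights):
    """Classify by the sign of the weighted feature sum."""
    hits = 0
    for row, y in zip(rows, labels):
        score = sum(w * x for w, x in zip(weights, row))
        hits += (score > 0) == (y > 0)
    return hits / len(rows)


def run_demo(seed=0, n_train=100, n_test=1000, n_features=300, effect=0.1):
    rng = random.Random(seed)
    train_x, train_y = make_data(rng, n_train, n_features, effect)
    test_x, test_y = make_data(rng, n_test, n_features, effect)

    # Correlation of each feature with the label, on the training rows only.
    corrs = [pearson([row[j] for row in train_x], train_y)
             for j in range(n_features)]

    # Two-sided 5% test via the Fisher z approximation (an assumption of
    # this sketch, chosen to avoid needing anything outside the stdlib).
    z_crit = NormalDist().inv_cdf(0.975)
    keep = [abs(math.atanh(r)) * math.sqrt(n_train - 3) > z_crit
            for r in corrs]

    w_all = corrs                                       # use everything
    w_sig = [r if k else 0.0 for r, k in zip(corrs, keep)]  # significant only

    return {
        "n_significant": sum(keep),
        "acc_all": accuracy(test_x, test_y, w_all),
        "acc_sig": accuracy(test_x, test_y, w_sig),
    }


if __name__ == "__main__":
    print(run_demo())
```

In this setup most features individually fail the 5% test, yet the classifier that keeps all of them should score noticeably better on the held-out rows than the one restricted to the "significant" ones, because the discarded features still carry a little signal each and the noise averages out across the pile.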

I have a feeling that people's unconscious/implicit/associative/intuitive thought processes are sort of like this, only with even more nasty complications.

Upshot: if you let someone know what Google was actually thinking, they'd probably have a fit.
