Update 6/15: After several days, I returned to the top three out of the 656 competitors (or teams). 3.74428 would have been enough to lead a week ago but times are changing. We are dangerously approaching the 3.8 territory at which I am likely to lose a $100 bet that the final score won't surpass 3.8, and I am contributing to this potential loss myself. ;-)
...and some relativistic kinematics and statistics... In the ATLAS machine learning contest, somebody jumped above me yesterday, so I am in fourth place (out of nearly 600 athletes) right now. Mathieu Cliche has made Dorigo's kind article about me (yes, some lying anti-Lumo human trash has instantly and inevitably joined the comments) a little bit less justifiable. The leader's advantage is 0.02 relative to my score. I actually believe that up to 0.1 or so may easily change by flukes, so the top ten if not the top hundred could be in a statistical tie – which means that the final score, computed from a different part of the dataset, may bring anyone from that group to the top.
(Correction in the evening: it's fifth place now; BlackMagic got an incredible 3.76 or so. I am close to giving up because the standard deviation in the final score is about 0.04, I was told.)
I have both "experimental" and theoretical reasons to think that a 0.1 score difference may be noise. Please skip this paragraph if it becomes too technical. Concerning the "experimental" case, well, I have run several modified versions of my code which were extremely similar to my near-record at AMS=3.709 but which seemed locally better, faster, and less overfitted. The expected improvement of the score was up to 0.05 but instead, I got a 0.15 deterioration. Concerning the theoretical case, I believe that there may be around 5,000 false positives among the 80,000 or so events (out of 550,000) that the leaders like me are probably labeling as "signal". The root mean square deviation for 5,000 is \(\sqrt{5,000}\sim 70\), so statistically, \(5,000\) really means \(5,000\pm 70\), i.e. a relative error of about \(1.5\%\). That translates to almost \(1\%\) error in \(\sqrt{b}\), i.e. almost \(1\%\) error in \(s/\sqrt{b}\) (the quantity \(s\) probably has a much smaller relative statistical error because it's taken from the 75,000 base), which is a 0.04 difference in the score.
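Spelled out as one chain of estimates (a back-of-the-envelope sketch using the rough numbers above, with the score taken to be about 3.7):
\[
\frac{\delta b}{b}\approx \frac{1}{\sqrt{5{,}000}}\approx 1.4\%,\qquad
\frac{\delta(s/\sqrt{b})}{s/\sqrt{b}}\approx \frac{1}{2}\,\frac{\delta b}{b}\approx 0.7\%,\qquad
0.007\times 3.7\approx 0.03.
\] Adding the smaller fluctuation of \(s\) itself, one lands in the few-hundredths ballpark – the same order as the 0.04 standard deviation mentioned above.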
It may be a good time to try to review some basics of the contest. Because the contest is extremely close to what the statisticians among the experimental particle physicists are doing (it's likely that any programming breakthrough you would make would be directly applicable), this review is also a review of basic particle physics and special relativity.
The basic purpose of the contest is simple to state. It combines particle physics with machine learning, something that computer programmers focusing on statistics and data mining know very well and that is arguably more important for winning than particle physics. (A huge majority of the contestants are recidivists and mass Kagglers, often earning huge salaries in data-mining departments of banks and corporations. Some of these "similar people" came from experimental particle physics, but it's less than 10%, so I estimate that about 5% of the world's "best statistical programmers" of this kind are working in experimental particle physics.) You download the data, especially two large files, "training" and "test".
The training file contains 250,000 events with weights (the weights are really telling you "how many times each event should be copied" so that the events cover all the possibilities with the right measure, sort of). Each event is labeled as "s" (signal) or "b" (background). You and your computer are being told 250,000 times: look at this event, baby, and remember and learn and learn and learn – events like this (whatever it means) are "s", events like that are "b". Then you are asked 550,000 times whether another event is "b" or "s" and you must make as many correct guesses as possible. It's easy, isn't it?
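If you want to see what this looks like in practice, here is a minimal sketch of loading the two files with pandas – assuming the CSV layout of the Kaggle download, i.e. an EventId column, the kinematic feature columns, and (in the training file only) the Weight and Label columns:

```python
import pandas as pd

# Paths assumed to match the Kaggle download; adjust to your local copy.
training = pd.read_csv("training.csv")   # 250,000 labeled, weighted events
test = pd.read_csv("test.csv")           # 550,000 events to classify

# In the training file, the truth and the weight of every event are given;
# the feature columns are everything except EventId, Weight and Label.
labels = training["Label"]      # "s" or "b"
weights = training["Weight"]    # how much each event "counts" in s or b
features = training.drop(columns=["EventId", "Weight", "Label"])

print(len(training), len(test))  # 250000 550000
```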
Well, your score isn't really the absolute number of correct guesses, or the absolute number of wrong guesses, or their ratio. None of these "simple" quantities behaves nicely. Instead, your score is essentially \[
{\rm AMS}_2 = \frac{s}{\sqrt{b}}
\] It's the "signal over the square root of noise".
(The exact formula is more complicated and involves logarithms, but I believe that the difference between the simplified formula above and the exact one doesn't impact any participant at all – and you couldn't even find out "experimentally" which one is being used – except that the exact formula with the logarithms tells you to avoid tricks that try to guess just a few "s" events and make \(b=0\). That could give you an infinite \({\rm AMS}_2\) score, but the regulated formula with the logarithms would punish you by effectively adding some false positives anyway. The exact formula reduces to my approximate one in the \(b\gg s\) limit, which you can check by a Taylor expansion up to the second order, as sketched below. Check this PDF file with the detailed, technical, paper-like documentation for the contest, which is an extended version of this blog post.)
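For completeness, the exact formula from the documentation involves a regularization constant \(b_{\rm reg}\) (equal to 10 in the contest, if I remember the documentation correctly):
\[
{\rm AMS} = \sqrt{2\left[(s+b+b_{\rm reg})\,\ln\!\left(1+\frac{s}{b+b_{\rm reg}}\right)-s\right]}.
\] In the \(b\gg s\) limit, \(\ln(1+s/b)\approx s/b-s^2/(2b^2)\), so the bracket becomes approximately \(s^2/(2b)\) and \({\rm AMS}\approx s/\sqrt{b}\), i.e. the simplified \({\rm AMS}_2\) above.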
The signal \(s\) is the number of true positives – events that you label as "s" and they are indeed "s"; the background \(b\) is the number of false positives – events that you label as "s" but they are disappointingly "b". More precisely, the events are not counted as "one" each: every event contributes to \(s\) or to \(b\) according to its weight. The weights of the training b-events are about \(2\)–\(5\) or so while the weights of the training s-events are about \(0.001\)–\(0.01\) or so, i.e. two or three orders of magnitude smaller. These asymmetric weights are why the actual numbers \(s,b\) substituted into the score formula obey \(s\ll b\).
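To make the definition completely explicit, here is a minimal sketch of how \(s\), \(b\), and the score would be evaluated on the training file, where the truth and the weights are known (the exact formula above is used, with \(b_{\rm reg}=10\) again taken from memory; the classifier output clf_output is a hypothetical placeholder):

```python
import numpy as np

def ams(s, b, b_reg=10.0):
    """Exact AMS score; b_reg = 10 is the regularization constant
    assumed from the contest documentation."""
    return np.sqrt(2.0 * ((s + b + b_reg) * np.log(1.0 + s / (b + b_reg)) - s))

def evaluate(labels, weights, selected):
    """labels: array of "s"/"b" truths, weights: event weights,
    selected: boolean array, True where the event is called "signal"."""
    s = weights[selected & (labels == "s")].sum()  # weighted true positives
    b = weights[selected & (labels == "b")].sum()  # weighted false positives
    return s, b, ams(s, b)

# Hypothetical usage with the arrays loaded above: call everything with a
# high classifier output "s" and see what score that threshold gives.
# s, b, score = evaluate(labels.values, weights.values, clf_output > 0.85)
```

Because the test file carries no Label or Weight columns, this kind of evaluation can only be done on the training data (or a held-out chunk of it).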