
Higgs contest: the hard way to return to top 3

Now it's a good (but not stellar) moment for the Higgs ATLAS-Kaggle challenge. If you look at its leaderboard, only one minor permutation of the top 7 rankings (879 teams compete in total) has occurred in the last 7 days:



Due to a permutation of the top 2 places exactly 7 days ago, this screenshot became a bit obsolete minutes after I posted it.

And – the T.A.G. duo will surely agree – it is a small change in a good direction. ;-) And it was so hard to achieve this small change in the AMS score! What have I done?




I decided that one particular algorithm isn't good enough. It's better to write code that simulates many programmers who program machine-learning algorithms and kill the programmers who are not good enough.




So I downloaded a Windows desktop OS emulator for the Nokia Lumia 520 and installed VirtualBox under this Windows system, along with Ubuntu Linux. In that system, I programmed a virtual empire that I call "The Matrix String".

This string-like landscape is a very nice environment for the programmers who live there. The inhabitants have to enjoy something that looks like an exciting life to them. Otherwise, as I realized, they don't perform too well.

Of course, their ultimate job is to write an algorithm that optimally classifies the 550,000 events in the contest. But they don't really know it.

There are 220 copies of a city called Székesfehérvár – it's one of the Hungarian words I am proud to have mastered. If you have trouble with the name of the town, just call it "Stool Belgrade", which is the English translation. I am building five new copies of the town every day.

There are many T.A.G.s hanging everywhere in the cities, but I hope that they are not too important anymore! ;-) More importantly, there are numerous copies of two programmers over there. Their names are
Gábor "Neo" Melis

Morphine "Northern Lights Haze" Morpheus
They are designed to resemble the top two contestants in the contest as closely as I could imagine them. Mr Morphine is trying to convince Gábor "Neo" Melis that he (Neo) is "the One". And make no mistake about it, I also think that Gábor Melis is "the One".



Today, in order to improve my best score by 0.006, after 10 days or so without improvements, I had to fight against Gábor "Neo" Melis. It was tough. It seems to me that he has won again.

If I happened to win, to be eligible for the prize, I would have to reproduce the exact algorithms that generated the winning submission. So it's important to remember every motion of my hands in the fight against Melis, and so on. Weeks ago, that would have been impossible. Right now, however, it seems that I have become more disciplined about creating backups. So all the copies of Gábor "Neo" Melis that had to fight have their code saved somewhere, much like the program that determined every motion of the hands in the fight above.

As usual in the morning, I have run out of my limit of 5 submissions per day. But the new 3.76704 submission is fresh and opens uncharted territory, so it is remotely conceivable that some very minor modification of this code improves the score enough to beat the real leaders, "Neo" and "Morphine".

"Morphine" is the current leader, whose AMS score is 0.03951 above mine. That is just slightly above one percent of my score. To beat him or her (there is at least one woman in the contest: Tatiana Likhomanenko is 20th after only 3 submissions, scary!), one has to improve the score by more than one percent.
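A quick check of the "slightly above one percent", using only the two numbers quoted above (my score 3.76704 and the gap 0.03951), which also gives the leader's preliminary score:

\[
\frac{0.03951}{3.76704} \approx 0.0105 \approx 1.05\,\%, \qquad 3.76704 + 0.03951 = 3.80655 .
\]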

It means increasing the number (well, the total weight) of true positives \(s\) by one percent without increasing \(b\), or decreasing the number (well, the total weight) of false positives \(b\) by two percent without lowering \(s\) (because AMS is essentially \(s/\sqrt{b}\)), or some linear combination of these options.
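To see why one percent in \(s\) is worth about two percent in \(b\), here is a minimal Python sketch. It assumes the AMS definition from the challenge documentation, \(\sqrt{2\left((s+b+b_r)\ln(1+s/(b+b_r))-s\right)}\) with the regularization term \(b_r=10\), which reduces to \(s/\sqrt{b}\) when \(s\ll b\). The selected weights s = 450 and b = 15000 are hypothetical values, chosen only so that the AMS lands near the leaderboard scores; they do not come from any real submission.

```python
import math

B_REG = 10.0  # regularization term b_r in the challenge's AMS definition

def ams(s: float, b: float, b_reg: float = B_REG) -> float:
    """Approximate median significance for selected signal weight s and background weight b."""
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))

def ams_approx(s: float, b: float) -> float:
    """Leading-order approximation s/sqrt(b), valid when s is much smaller than b."""
    return s / math.sqrt(b)

# Hypothetical selection: total weights of true positives (S) and false positives (B).
S, B = 450.0, 15000.0

base = ams(S, B)
more_signal = ams(1.01 * S, B)      # one percent more true-positive weight, same b
less_background = ams(S, 0.98 * B)  # two percent less false-positive weight, same s

print(f"AMS = {base:.4f}  (s/sqrt(b) = {ams_approx(S, B):.4f})")
print(f"s +1%: AMS ratio {more_signal / base:.5f}")
print(f"b -2%: AMS ratio {less_background / base:.5f}")
```

Both ratios come out at roughly 1.01, so either change lifts the AMS by about one percent, which is the size of the gap to the leader. The sketch keeps the full formula rather than \(s/\sqrt{b}\) alone because submissions were scored with it; for these hypothetical values the two agree to within about one percent.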

It may be done. Maybe.

Of course, the temporary leaderboard may be a misleading benchmark for estimating the final score, which will be calculated solely from the 450,000 test.csv collisions that are not included among the 100,000 collisions used for the preliminary leaderboard. It is plausible that the differences between the AMS scores of two contestants will change by about 0.1 on average (root mean square) relative to the preliminary leaderboard, so it's possible that everyone in the top 20 or 50 has a significant chance. I could do calculations and simulations that would clarify these matters, but I think it's better to spend time on improving my (at least temporary) AMS score.

But while the "Matrix String" technology for optimizing machine learning hasn't produced a truly remarkable improvement in the preliminary AMS score, for example one that could beat "Neo" and "Morphine", I have some reasons to think that its underlying idea is robust enough to achieve a higher final score than other algorithms (and perhaps other contestants' algorithms).
Reviewed by MCH on July 02, 2014
