Testing ANN libraries

This week I validated the benchmark and corrected it following my mentor's comments. I applied it to a test dataset of 1000 maps and added a linear fit to see how it scales. I also installed some ANN libraries in NeuroVault's Docker image, tested them, and tried to get an idea of how they work. I ran experiments with different distance metrics, both through the libraries and outside them, and I coded a dictionary-based index that lets us keep track of the indexed maps.
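
As a rough idea of what the dictionary-based index looks like (a minimal sketch, assuming the ANN library only returns plain row positions; the names here are illustrative, not the actual NeuroVault code):

    # Map the row position of each vector in the ANN index back to the
    # NeuroVault image it came from, so query results (plain integers)
    # can be translated into maps again.
    index_to_image_id = {}
    vectors = []

    def add_map(image_id, flat_map):
        index_to_image_id[len(vectors)] = image_id
        vectors.append(flat_map)

    # If a query returns row positions such as [12, 3, 57], they resolve to:
    #   [index_to_image_id[i] for i in [12, 3, 57]]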

Correcting the benchmark

Last week I worked on implementing a benchmark that will allow us to compare the previous implementation with the new one. This week I made some minor corrections and improvements thanks to my mentor's comments, and here I'm showing the plots that reflect them. I removed the "busy server" option, added a linear fit to the results and the download of a test set of 1000 maps, and changed the query function from the specific one currently in use to "find_similar", the generic entry point, regardless of which function is called underneath.
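
Roughly, the query side of the benchmark boils down to timing the generic view over HTTP and fitting a line to the timings (a sketch, assuming a local instance; the exact URL of the "find_similar" view is an assumption here):

    import time

    import numpy as np
    import requests

    BASE_URL = "http://localhost:8000"  # local NeuroVault instance (assumption)

    def time_find_similar(image_id):
        # The URL pattern below is illustrative, not the real routing.
        start = time.time()
        requests.get("%s/images/%d/find_similar/" % (BASE_URL, image_id))
        return time.time() - start

    timings = [time_find_similar(i) for i in range(1, 1001)]  # 1000-map test set

    # Linear fit of time vs. number of maps, as in the plots below.
    slope, intercept = np.polyfit(np.arange(1, 1001), timings, 1)
    print("time ~ %.2e * n + %.2e" % (slope, intercept))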

The new plot for indexing:

[Figure: 5adc5ea0-2e59-11e6-9128-bd3fc03f8730.png]

I had problems when trying to fit an exponential curve; it was not working for me, but I don't think an exponential fit makes much sense for this kind of growth anyway.
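
For reference, this is roughly what the exponential fit attempt looked like; with growth that is basically linear, curve_fit is very sensitive to the initial guess and often just fails to converge (synthetic data here, only to illustrate the shape of the call):

    import numpy as np
    from scipy.optimize import curve_fit

    def exponential(x, a, b, c):
        return a * np.exp(b * x) + c

    x = np.arange(1, 1001, dtype=float)
    y = 0.05 * x + np.random.normal(scale=0.5, size=x.size)  # roughly linear growth

    try:
        params, _ = curve_fit(exponential, x, y, p0=(1.0, 1e-3, 0.0), maxfev=5000)
        print("fitted parameters:", params)
    except RuntimeError as exc:
        print("exponential fit did not converge:", exc)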

On the other hand, the query benchmark showed very good performance:

[Figure: 5adcac66-2e59-11e6-88f0-5808a0d7cc4a.png]

Since we are now measuring how long "find_similar" takes to return the page, and the page is limited to 100 similar maps, the benchmark stops growing at 100 indexed maps, so I fitted the line to the first 100 results. What may actually be happening is that the query time itself is negligible compared with the time Django needs to build the HTML, so the real fit should perhaps have been done from the 101st value on:

[Figure: d2de0900-2e5c-11e6-926f-2a71ed9a0657.png]
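
If that interpretation is right, the correction is just the same linear fit restricted to the tail of the measurements (a sketch, reusing the timings array from the sketch above):

    import numpy as np

    # Fit only from the 101st measurement on, where the page is always full
    # (100 similar maps) and the page-rendering overhead should be constant.
    n_maps = np.arange(1, len(timings) + 1)
    slope_tail, intercept_tail = np.polyfit(n_maps[100:], timings[100:], 1)
    print("fit from the 101st value on: time ~ %.2e * n + %.2e"
          % (slope_tail, intercept_tail))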

ANN libraries

With respect to the ANN libraries, I installed and tested NearPy, panns, FLANN, and the LSHForest, BallTree and KDTree nearest-neighbor searches from scikit-learn in NeuroVault's Docker image. I ran a number of experiments, building, fitting and querying each of them, to understand their similarities and specific pros and cons, and to get to know the constraints that have to be taken into account to make a proper choice.
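
To give an idea of what "building, fitting and querying" means for each of them, this is roughly the shape of the calls (a sketch with random stand-in data; panns is omitted, and note that LSHForest has since been removed from recent scikit-learn releases):

    import numpy as np
    from pyflann import FLANN
    from sklearn.neighbors import LSHForest, NearestNeighbors

    rng = np.random.RandomState(0)
    maps = rng.rand(1000, 2000)   # stand-in for 1000 flattened brain maps
    query = maps[:1]              # query with the first map itself

    # Exact neighbors with scikit-learn's BallTree / KDTree backends.
    for algorithm in ("ball_tree", "kd_tree"):
        nn = NearestNeighbors(n_neighbors=5, algorithm=algorithm).fit(maps)
        distances, indices = nn.kneighbors(query)

    # Approximate neighbors with scikit-learn's LSHForest.
    lshf = LSHForest(n_estimators=10, random_state=0).fit(maps)
    distances, indices = lshf.kneighbors(query, n_neighbors=5)

    # Approximate neighbors with FLANN (pyflann bindings).
    flann = FLANN()
    indices, distances = flann.nn(maps, query, num_neighbors=5)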

Results

Here is a toy benchmark of NearPy; I built the engine with 10 hashes of random binary projections of size 20.
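
The engine construction looks roughly like this (a sketch with random stand-in data; in the real test the vectors are the downloaded maps and the stored data is the NeuroVault image id):

    import numpy as np
    from nearpy import Engine
    from nearpy.distances import EuclideanDistance
    from nearpy.hashes import RandomBinaryProjections

    dimension = 2000   # stand-in for the size of a flattened map

    # 10 hashes, each a random binary projection of size 20, and the
    # Euclidean distance mentioned below.
    hashes = [RandomBinaryProjections('rbp_%d' % i, 20) for i in range(10)]
    engine = Engine(dimension, lshashes=hashes, distance=EuclideanDistance())

    # Store each vector together with its image id.
    for image_id, vector in enumerate(np.random.rand(1000, dimension)):
        engine.store_vector(vector, data=image_id)

    # Returns (vector, data, distance) tuples for the approximate neighbours.
    neighbours = engine.neighbours(np.random.rand(dimension))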

Results from queries: they can be checked manually in NeuroVault using the neurovault.org/images/number/ address. For example, after querying this image the engine gives this image as a result, which is PRETTY IMPRESSIVE, considering that I've made no effort to optimize the engine and that the distance used (Euclidean) needs a normalization of the data (after performing it, the results would probably be better).
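
By normalization I mean something like standardizing each map before indexing, so that scale differences between maps do not dominate the Euclidean distance (a sketch of one possible choice, not something that is in the code yet):

    import numpy as np

    def zscore_rows(maps):
        # Standardize each flattened map (row) to zero mean and unit variance.
        means = maps.mean(axis=1, keepdims=True)
        stds = maps.std(axis=1, keepdims=True)
        stds[stds == 0] = 1.0      # leave constant maps untouched
        return (maps - means) / stds

    normalized_maps = zscore_rows(np.random.rand(1000, 2000))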

The results are not "complete": the engine returns only the closest maps, not the rest. These results must be taken with care, as this is just a toy benchmark: it's not real data, not real results, and I haven't checked whether the matches are accurate or just random. Still, it is useful to get an idea of the scale and magnitudes we are talking about. I am super impressed by the benchmark results (and NearPy was one of the worst ANN libraries in the global benchmarks…).

Benchmark plot: [Figure: 4688e14a-2f02-11e6-90b9-e5993af8b08c.png]

Next week I will keep testing the libraries and reading their documentation to evaluate which solution fits us best.