> What is the reason for adding this layer? Is it to keep one NN that is trained on all input features, as opposed to the 1/4 feature list that goes into the EnsembleNet architecture?
No, this is _before_ I discovered the EnsembleNet architecture, so at that point no EnsembleNet existed yet.
And this is not an “added” layer; it is the very basic idea of taking the pre-last-layer activations and using them as a descriptor, e.g. the fc7 features in AlexNet. So it is post-processing, the same as L2Norm.
I'll try to explain again.
Before the PCB/Ensemble idea:
Training: backboneCNN -> linear(2048) -> ReLU -> clf (== linear(5004)).
Testing:
1) predictionA (same as in training) = backboneCNN -> linear(2048) -> ReLU -> clf (== linear(5004)).
2) predictionB (nearest neighbours): backboneCNN -> linear(2048) -> L2Norm().
Save this descriptor for all the training data and the test set, then find the nearest neighbours of each test image among the training images.
3) predictionFinal = predictionA + predictionB.
(See the sketch after this list.)
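To make the flow concrete, here is a minimal PyTorch sketch of that pipeline, under my own assumptions: the names (`WhaleNet`, `build_gallery`, `predict_final`), the softmax on predictionA, the top-k value, and the way neighbour similarities are turned into per-class scores are all hypothetical illustration choices, not the actual competition code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WhaleNet(nn.Module):
    """backboneCNN -> linear(2048) -> ReLU -> clf (== linear(5004))."""
    def __init__(self, backbone, backbone_dim, feat_dim=2048, num_classes=5004):
        super().__init__()
        self.backbone = backbone                     # any CNN returning (B, backbone_dim)
        self.fc = nn.Linear(backbone_dim, feat_dim)  # linear(2048)
        self.clf = nn.Linear(feat_dim, num_classes)  # clf == linear(5004)

    def forward(self, x):
        desc = self.fc(self.backbone(x))  # pre-ReLU linear(2048) output = the descriptor
        logits = self.clf(F.relu(desc))   # ReLU only on the classification path
        return logits, desc

@torch.no_grad()
def build_gallery(model, train_loader):
    """Step 2: save L2-normalised descriptors for all training images."""
    feats, labels = [], []
    for x, y in train_loader:
        _, desc = model(x)
        feats.append(F.normalize(desc, dim=1))  # L2Norm()
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

@torch.no_grad()
def predict_final(model, x_test, gallery_feats, gallery_labels, k=5):
    logits, desc = model(x_test)
    pred_a = logits.softmax(dim=1)           # predictionA: the trained classifier
    q = F.normalize(desc, dim=1)             # predictionB: L2Norm'ed descriptor
    sims = q @ gallery_feats.t()             # cosine similarity to the train set
    top_sim, top_idx = sims.topk(k, dim=1)   # k nearest training neighbours
    pred_b = torch.zeros_like(pred_a)
    for i in range(q.size(0)):               # give each neighbour's class its best
        for s, j in zip(top_sim[i], top_idx[i]):  # similarity as the class score
            c = gallery_labels[j]
            pred_b[i, c] = torch.maximum(pred_b[i, c], s)
    return pred_a + pred_b                   # predictionFinal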
With PCB/Ensemble it is the same, but concatenating all 4 heads.
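For the descriptor side, that concatenation could look like the sketch below. Only the 4-head structure comes from the post; treating each head as its own linear(2048) branch and L2-normalising the concatenated vector is my assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_descriptor(heads, x):
    # heads: 4 branches, each ending in its own linear(2048);
    # the nearest-neighbour search then runs on the concatenated vector.
    feats = [head(x) for head in heads]                 # four 2048-d descriptors
    return F.normalize(torch.cat(feats, dim=1), dim=1)  # concat -> 8192-d -> L2Norm()
```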