Convolution filters are decomposed into filter atoms and atom coefficients for efficient model tuning. The convolution then splits into two steps: a spatial-only convolution with the filter atoms, followed by cross-channel mixing in which the atom coefficients combine the intermediate features into output features. The operation is still executed as a single convolutional layer, so processing stays efficient, and tuning adjusts only the filter atoms for the target task while the coefficients stay fixed at their pre-trained values.
This approach decomposes each convolutional layer F into two standard convolutional layers: a filter atom layer D and an atom coefficient layer α, whose 1 × 1 filters encode the rules for combining filter atoms.
During model tuning, α is obtained from the pre-trained model and remains unchanged, while only filter atoms D adapt to the target task.
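To make the tuning budget concrete, the following sketch counts parameters for one layer. It assumes, for illustration only, that a small set of m atoms of size k × k is shared by all input/output channel pairs; the function name `count_params` and the layer sizes are hypothetical and do not come from the source.

```python
def count_params(c_out, c_in, k, m):
    """Parameter counts for one conv layer under the assumed decomposition.

    Assumption (not stated in the source): m atoms of size k x k are shared
    by every (output channel, input channel) pair.
    """
    full_filter = c_out * c_in * k * k   # ordinary convolution filter F
    atoms = m * k * k                    # filter atoms D, updated during tuning
    coeffs = c_out * c_in * m            # atom coefficients alpha, kept frozen
    return {
        "full_filter": full_filter,
        "atoms_trainable": atoms,
        "coeffs_frozen": coeffs,
    }


# Hypothetical layer: 64 -> 64 channels, 3 x 3 kernels, 9 atoms.
counts = count_params(c_out=64, c_in=64, k=3, m=9)
print(counts)
```

Under these assumptions only the atom entries are trained, which is a small fraction of the full filter's parameters.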
The convolution operation is still performed as one layer, without materializing intermediate features, which avoids the extra memory cost of storing them.
In practice, the two-step convolution operation allows filter atoms D to focus exclusively on spatial convolution, while atom coefficients α handle cross-channel mixing.
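The equivalence between the two-step view and the single-layer view can be checked numerically. The NumPy sketch below is an illustration under stated assumptions, not the authors' implementation: the tensor shapes, the helper `correlate2d_valid`, and the choice to share m atoms across all channel pairs are all assumptions made for the example.

```python
import numpy as np


def correlate2d_valid(x, k):
    """Valid 2-D cross-correlation of one channel x (H, W) with one kernel k (kh, kw)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, k.shape)  # (H', W', kh, kw)
    return np.einsum("ijhw,hw->ij", windows, k)


rng = np.random.default_rng(0)
c_in, c_out, m, ksize = 3, 4, 2, 3              # hypothetical sizes
x = rng.standard_normal((c_in, 8, 8))           # input feature map
D = rng.standard_normal((m, ksize, ksize))      # filter atoms (assumed shared)
alpha = rng.standard_normal((c_out, c_in, m))   # atom coefficients

# Step 1: spatial-only convolution, each input channel with each atom.
Z = np.array([[correlate2d_valid(x[i], D[a]) for a in range(m)]
              for i in range(c_in)])            # (c_in, m, H', W')

# Step 2: cross-channel mixing with alpha (a 1 x 1 convolution over Z).
y_two_step = np.einsum("oim,imhw->ohw", alpha, Z)

# Single-layer view: rebuild the full filter F from alpha and D, convolve once.
F = np.einsum("oim,mhw->oihw", alpha, D)        # (c_out, c_in, k, k)
y_one_step = np.array([
    sum(correlate2d_valid(x[i], F[o, i]) for i in range(c_in))
    for o in range(c_out)
])

print(np.allclose(y_two_step, y_one_step))
```

Because convolution is linear in the filter, both paths give the same output; the single-layer path never stores the intermediate tensor `Z`.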
#convolutional-neural-networks #filter-decomposition #model-tuning #feature-extraction #deep-learning