Is the Matrix Neural Network an Alternative to the Convolutional Neural Network?
Currently (2025), deep learning is the most important and popular methodology in artificial intelligence (AI), and the artificial neural network (ANN) is the foundation of deep learning. The main drawback of the ANN is the explosion in the number of parametric weights when a deep network stacks a large number of hidden layers. This explosion can be alleviated by high-performance computing, but it becomes severe for high-dimensional input data such as images. Within deep learning, the standard solution for image processing is to replace the large parametric weight vector with a much smaller window encoded by a so-called filtering kernel, often a 3×3 or 5×5 matrix, which is convolved over the entire image. An ANN equipped with such filtering kernels is called a convolutional neural network (CNN). Much research has shown that the CNN is feasible and effective in image processing. A hidden cause of this effectiveness is that the visual structure of an image is aggregated in such a way that the filtering kernel is well suited to extracting image features. However, it cannot be asserted that a matrix-based filtering kernel is appropriate for high-dimensional data other than images. Another solution to the parameter explosion is to organize the large parametric weight vector as a matrix, matching the structure of two-dimensional data such as images; this leads to the so-called matrix neural network (MNN), whose parameters are weight matrices. The computational cost of the MNN is significantly lower than that of the ANN, but its effectiveness relative to the CNN must still be tested. This is the main hypothesis tested in this research, hinted at by the title: whether the MNN is an alternative to the CNN. Moreover, the transformer, the current (2025) trend in AI and deep learning, aims to improve on or replace the traditional ANN through self-supervised learning, in which attention is the key mechanism.
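To make the parameter-count argument above concrete, the following sketch (plain NumPy; the 28×28 input and layer sizes are illustrative assumptions, not taken from the paper) contrasts a fully connected layer, a single 3×3 filtering kernel, and an MNN-style layer whose weights are two matrices acting on the 2-D input from left and right:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((28, 28))  # a single 28x28 "image"

# ANN: flatten to 784 features and map to a hidden layer of 784 units,
# requiring 784*784 = 614,656 weights -- the parameter explosion.
W_dense = rng.standard_normal((784, 784))
h_dense = W_dense @ X.reshape(784)
print("dense weights:", W_dense.size)   # 614656

# CNN: one 3x3 filtering kernel convolved over the whole image
# needs only 9 weights, shared across every position.
K = rng.standard_normal((3, 3))
h_conv = np.array([[np.sum(K * X[i:i + 3, j:j + 3])
                    for j in range(26)] for i in range(26)])
print("conv weights:", K.size)          # 9

# MNN: weights organized as matrices matching the 2-D structure,
# h = U @ X @ V, so 2*28*28 = 1,568 weights instead of 614,656.
U = rng.standard_normal((28, 28))
V = rng.standard_normal((28, 28))
h_mnn = U @ X @ V
print("MNN weights:", U.size + V.size)  # 1568
```

The bilinear form `U @ X @ V` is one common way to define an MNN layer; the paper's exact formulation may differ, but the parameter-count contrast holds for any matrix-weighted layout.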
Indeed, attention, the cornerstone of the transformer, is a representation of the internal structure and relationships inside high-dimensional data such as images. Therefore, the implicit deep meanings of attention and the filtering kernel are similar: both represent features of the data, and neither goes beyond parametric weights. In general, the research has two goals: 1) explaining and implementing the ANN, CNN, and transformer (attention), and 2) applying analysis of variance (ANOVA) to evaluate the effectiveness of the ANN, CNN, and transformer (attention) in the context of image classification. The ultimate result is that it cannot be asserted that the MNN is an alternative to the CNN, but the MNN can be an optional choice for implementing an ANN for image processing instead of focusing on the CNN as the unique solution. Moreover, incorporating the MNN and attention when implementing a transformer yields a compromise between high performance and computational cost.
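As a minimal illustration of the attention mechanism discussed above, the sketch below implements scaled dot-product attention in plain NumPy (the token count and feature dimension are illustrative assumptions): each output row is a weighted mix of value rows, with weights given by query-key similarity, which is exactly the "internal relationship" representation described in the text.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: similarity scores between
    queries and keys are normalized into weights that mix the
    rows of V."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 8))   # 16 tokens/patches, 8 features each
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)  # (16, 8): one attended vector per token
```

Note that, like a filtering kernel, the attention weights couple each position to its context, but the coupling is computed from the data itself rather than fixed by a learned local window.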