traditional neural network classify an image based on optimal weighs and bias. It does not learn the semantics of the image.
Convuluted neural network uses kernels to highlight (pick up, understand) the structure of an image like vertical edges, horizontal edges, orientation, texture and colour.
Traditional neural network learns as weak even if image is scrambled (transposing pixels in the same way for every input picture). CNN could not as this will destroy the structural information of the picture.
CNN “convert” the original input into a set of structural information representational input to a traditional neural network to do the final classification. Similar class would generate a similar representation by the convolution layers before input to the dense layer ( ie the traditional neural network).
CNN can give the ability of the model to see.