Adacompress: Adaptive compression for online computer vision services
Big data and deep learning have together driven the recent success of artificial intelligence, but they also increase the demands on network bandwidth, computation, and storage in many applications. Image classification is one of the most important computer vision tasks, and its performance in many applications depends heavily on deep neural networks (DNNs). Recently, image classification models have increasingly been hosted in the cloud so that computational power can be shared among many users, as is done in this paper. Most researchers in the literature improve the structure and increase the depth of DNNs to achieve better performance, focusing on how features are represented and learned by convolutional neural networks (CNNs). However, the most well-known image classification datasets are compressed with JPEG, which is optimized for the Human Visual System (HVS) rather than for machines (DNNs).
This review paper covers approaches that optimize the image representation for the machine, increasing the compression ratio of the image while maintaining the same classification accuracy. It focuses on two stages of the compression pipeline: quantization table design and the color space. The motivation is to find the most effective way to optimize an image representation for both the machine and the HVS, so that both can recognize the image class at the same image quality.
One of the major parameters that can be changed in the JPEG pipeline is the quantization table, which is the main source of the artifacts introduced into the image and is what makes the compression lossy. The authors were motivated to change the JPEG configuration to reduce the upload traffic to different cloud computer vision services without requiring prior knowledge of the original model or dataset. This contrasts with other papers in the literature, which adjust the JPEG configuration by retraining the parameters or changing the structure of the model. They exploit the fact that changing the quantization level can reduce the image bit rate and quality while the deep learning model still recognizes the image. The authors used deep reinforcement learning (DRL) in an online manner to choose the quantization level for each image uploaded to the cloud computer vision model, and this is the only approach that designs an adaptive JPEG configuration based on an RL mechanism.
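To make the link between a quality factor and the quantization table concrete, the following minimal sketch scales the standard JPEG luminance table by a quality factor. It assumes the scaling convention used by the IJG libjpeg encoder; the function name `scale_table` is hypothetical, and this is not code from the paper.

```python
# Standard JPEG luminance quantization table (Annex K of the JPEG standard),
# stored row by row. Larger entries mean coarser quantization.
BASE_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def scale_table(base, quality):
    """Scale a base table by a quality factor in 1..100 (assumed IJG convention)."""
    if not 1 <= quality <= 100:
        raise ValueError("quality must be in 1..100")
    # Quality below 50 enlarges the steps; quality above 50 shrinks them.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Round to the nearest integer and clamp to the baseline range 1..255.
    return [min(255, max(1, (q * scale + 50) // 100)) for q in base]


if __name__ == "__main__":
    for qf in (10, 50, 90):
        print(qf, scale_table(BASE_LUMA, qf)[:8])
```

Under this convention a quality factor of 50 reproduces the base table, lower values enlarge every step (smaller files, more artifacts), and 100 collapses every step to 1.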
The approach is built around an interactive training environment that can represent any cloud computer vision service. To evaluate and predict the effect of a quantization level on an uploaded image, the authors used a deep Q-network agent. The agent is given a reward function that considers two optimization objectives, accuracy and image size, and it learns iteratively by interacting with the environment. The environment is exposed to different images containing different amounts of redundant information, so an adaptive solution is needed to select a suitable compression level for each image and model. Thus, they designed an explore-exploit mechanism to train the agent on different scenery, implemented in the deep Q agent as an inference-estimate-retrain mechanism that controls when to restart the training procedure as the input images change. The authors verify their approach with analysis and insight from Grad-CAM, showing the patterns for each image together with its corresponding quality factor. The images draw different responses from the deep model: it is more sensitive to compression in images with large smooth areas and more robust to compression in images with complex textures.
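The reward and the explore-exploit choice described above can be sketched as follows. This is only an illustration under stated assumptions: the exact reward form, the weight `alpha`, the discrete quality-factor set `QF_CHOICES`, and both function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical discrete action set: candidate JPEG quality factors.
QF_CHOICES = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]


def reward(correct, compressed_bytes, reference_bytes, alpha=1.0):
    """Trade accuracy against size: reward a correct prediction, penalize bytes.

    The form alpha * accuracy - size_ratio is an assumption for illustration.
    """
    size_ratio = compressed_bytes / reference_bytes
    accuracy = 1.0 if correct else 0.0
    return alpha * accuracy - size_ratio


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore a random action with probability epsilon,
    otherwise exploit the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    # Stand-in value estimates; a deep Q-network would predict these per image.
    q_values = [0.1, 0.4, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    action = choose_action(q_values, epsilon=0.1)
    print("chosen QF:", QF_CHOICES[action])
    print("reward:", reward(True, 20_000, 100_000))
```

In a full agent, the list of value estimates would come from the Q-network evaluated on features of the current image, and `epsilon` would be raised again whenever retraining is triggered.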
The authors used a pre-trained model as a feature extractor to select a quality factor (QF) for JPEG. I think what is missing is that they did not report the distribution of the QFs selected across their range, which is important for understanding which QFs are expected to contribute most. In my video, I ran one experiment using Inception-V3 to see whether better accuracy is possible. I found that it is possible when the Inception model is used as the pre-trained model to choose the QF, but the mobile models are shallower than the Inception models, which makes them less complex and better suited to edge devices. I think it is possible to achieve at least the same accuracy, or even higher, if the mobile model is replaced with Inception.
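The QF distribution mentioned above could be reported with a few lines of code. The sketch below assumes the agent's selected quality factors have been logged as a list of integers; the function name `qf_distribution` and the sample log are hypothetical.

```python
from collections import Counter


def qf_distribution(chosen_qfs):
    """Return the fraction of images assigned to each quality factor."""
    counts = Counter(chosen_qfs)
    total = sum(counts.values())
    return {qf: counts[qf] / total for qf in sorted(counts)}


if __name__ == "__main__":
    # Made-up log of selected quality factors, for illustration only.
    log = [5, 5, 15, 95]
    for qf, share in qf_distribution(log).items():
        print(f"QF {qf}: {share:.0%}")
```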