Generative Models in Machine Learning and their Applications to Digital Image Generation

Authors

  • Oscar Contreras Carrasco, Universidad Privada del Valle

DOI:

https://doi.org/10.52428/20758944.v17i51.110

Keywords:

Neural networks, Deep Learning, Generative models, Convolutional neural networks, GAN, VAE

Abstract

Within the field of Machine Learning, two types of algorithms can be distinguished according to the nature of their outputs: discriminative models, which associate input data with a response, and generative models, which create new data from a probability distribution over latent variables. In recent years, significant progress has been reported in Deep Learning, the branch of Machine Learning devoted to the study of deep neural networks. As a result, convolutional neural networks have gained considerable ground in image analysis and processing tasks, with applications that include image classification, object detection, instance segmentation, and facial recognition. Progress in Deep Learning, however, has not been limited to these areas; models have also become able to generate new images, and a wide variety of generative models has been developed for different purposes, such as the generation of facial images of people who do not exist. Accordingly, this article analyzes different generative models for digital image processing and the theoretical aspects that define generative models within Deep Learning, developing two essential models in detail: Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE).
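To make the two model families concrete, the minimal sketch below outlines their structure: a GAN, in which a generator maps latent noise to images while a discriminator tries to tell real images from generated ones, and a VAE, in which an encoder infers a latent distribution and a decoder reconstructs images from samples of it. The use of PyTorch, the layer sizes, and the latent dimensionality are illustrative assumptions and do not reflect the author's implementation.

# Illustrative sketches of the two generative models discussed in the article:
# a GAN (Goodfellow et al., 2014) and a VAE (Kingma & Welling, 2014).
# PyTorch, layer sizes, and latent dimensionality are assumptions for illustration.
import torch
import torch.nn as nn

LATENT_DIM = 64    # dimensionality of the latent variable z (assumed)
IMG_DIM = 28 * 28  # flattened image size, e.g. handwritten digits (assumed)

# GAN: the generator maps latent noise to an image; the discriminator
# outputs the probability that an image is real rather than generated.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# VAE: the encoder predicts the parameters (mean, log-variance) of the
# approximate posterior over z; the decoder reconstructs the image from z.
class VAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, LATENT_DIM)
        self.to_logvar = nn.Linear(256, LATENT_DIM)
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, IMG_DIM), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z = mu + sigma * epsilon.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

# In both cases, new images are produced by sampling z from a simple prior
# (a standard Gaussian) and passing it through the generator or decoder.
z = torch.randn(16, LATENT_DIM)
fake_images = generator(z)        # GAN samples
vae_samples = VAE().decoder(z)    # VAE samples (untrained, for illustration only)

This sampling step reflects the latent-variable view of generative models described in the abstract: the learned network transforms a simple probability distribution over latent variables into new data.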

Published

10-12-2021

How to Cite

Contreras Carrasco, O. (2021). Generative Models in Machine Learning and their Applications to Digital Image Generation. Journal Boliviano De Ciencias, 17(51), 79–109. https://doi.org/10.52428/20758944.v17i51.110

Issue

Vol. 17 No. 51 (2021)

Section

Review Paper