Generative Models in Machine Learning and their Applications to Digital Image Generation
DOI: https://doi.org/10.52428/20758944.v17i51.110

Keywords: Neural networks, Deep Learning, Generative models, Convolutional neural networks, GAN, VAE

Abstract
Within the field of Machine Learning, two types of algorithms can be distinguished according to the nature of their outputs: discriminative models, which associate data with a response, and generative models, which create new data based on a probability distribution over latent variables. In recent years, significant progress has been published in Deep Learning, the study of deep neural networks. As a result, convolutional neural networks have gained considerable ground in tasks involving image analysis and processing. Applications of convolutional neural networks include image classification, object detection, instance segmentation, and facial recognition, among others. However, the field of Deep Learning has progressed not only in these areas but also in the ability of models to generate new images. Consequently, a wide variety of generative models has been developed for different purposes; the generation of facial images of people who do not exist in real life is one example. Accordingly, this article aims to analyze different generative models for digital image processing, as well as the theoretical foundations that define generative models in the field of Deep Learning. Two essential models are developed in this article: Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE).
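To make the idea of a latent-variable generative model concrete, the sketch below shows a minimal Variational Autoencoder, one of the two models analyzed in the article, written in PyTorch. The architecture shown here (fully connected layers, a 20-dimensional latent space, and 784-pixel inputs such as flattened MNIST digits) is an illustrative assumption, not the configuration used in the article; it only illustrates how the encoder, the reparameterization step, and the decoder fit together under the negative ELBO loss.

```python
# Minimal sketch of a Variational Autoencoder (VAE) in PyTorch, assuming
# 28x28 grayscale inputs (e.g., MNIST-like images) flattened to 784 values.
# Layer sizes and the latent dimension are illustrative choices, not taken
# from the article.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        # Encoder maps an image to the parameters of q(z|x).
        self.enc = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder maps a latent sample back to pixel space.
        self.dec1 = nn.Linear(latent_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I); keeps sampling differentiable.
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon_x, x, mu, logvar):
    # Negative ELBO: reconstruction term + KL divergence to the N(0, I) prior.
    recon = F.binary_cross_entropy(recon_x, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Usage sketch: new images are generated by sampling z ~ N(0, I) and decoding.
model = VAE()
with torch.no_grad():
    samples = model.decode(torch.randn(16, 20))  # 16 synthetic 784-pixel images
```

Sampling latent vectors from the standard normal prior and passing them through the decoder is precisely the sense in which such a model creates new data from a probability distribution over latent variables.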
License
Copyright (c) 2021 Oscar Contreras Carrasco
This work is licensed under a Creative Commons Attribution 4.0 International License.