Modern supervised machine learning algorithms classify objects from their feature descriptions. Such a description may contain a large number of features when the task demands it. This paper describes genetic-algorithm-based feature selection as part of a software complex for bibliographic data processing. The problem is analyzed within the subject area with respect to the size of the feature description of bibliographic data objects, and a method of solving it by means of genetic-algorithm feature selection is proposed. The paper presents the general principles of the software model and its implementation details in the Python programming language. The problem of feature description size and retraining in bibliographic data processing has been solved: it is shown that training and retraining accelerate without loss of classification quality. The developed software for genetic-algorithm feature selection can be applied within the software complex for bibliographic data processing. The computational experiment gave the following results: the number of features used decreased from 26 to 15, and classification quality increased by 3% owing to the elimination of features that contribute to overfitting.
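To make the idea of genetic-algorithm feature selection concrete, the following is a minimal sketch and not the implementation described in the paper. Each individual is a Boolean mask over the 26 features; selection, one-point crossover, and bit-flip mutation evolve the masks toward higher fitness. The function names, the parameter defaults, and the toy fitness function are all assumptions made for illustration. In a real setting the fitness would be something like the cross-validated quality of the classifier trained on the selected features.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=30, generations=40,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve Boolean feature masks; return the fittest mask found.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Random initial masks, plus the full feature set as a baseline individual.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size - 1)]
    population.append([True] * n_features)

    def tournament(scored):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        best = max(scored, key=lambda pair: pair[0])[1]
        next_population = [best[:]]  # elitism: the best mask always survives
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [(not gene) if rng.random() < mutation_rate else gene
                     for gene in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)


# Hypothetical toy fitness: pretend the even-indexed features are informative
# and charge a small cost for every feature kept. This stands in for a real
# classifier-quality estimate and is not taken from the paper.
INFORMATIVE = set(range(0, 26, 2))


def toy_fitness(mask):
    hits = sum(1 for i, on in enumerate(mask) if on and i in INFORMATIVE)
    return hits - 0.5 * sum(mask)


best_mask = genetic_feature_selection(26, toy_fitness)
selected = [i for i, on in enumerate(best_mask) if on]
print(f"{len(selected)} of 26 features kept: {selected}")
```

Because the full feature set is placed in the initial population and the best individual is carried over unchanged in every generation, the returned mask is never worse under the chosen fitness than using all features. This is a design choice of the sketch, not a claim about the paper's software.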