Searching for relevant information in unstructured data is a pressing problem, because unstructured information holds unique potential for extracting new knowledge. Processing such data is difficult because of its variety, strong context dependence, and dynamic nature. The volume of stored and transmitted data grows every year, as does the number of parameters that characterize it. The algorithms used in existing information retrieval systems do not offer flexible functionality for searching across diverse collections of documents or web pages. Subject search within a given document segment is complicated by the need to pre-configure the parameters of the mathematical models underlying the search systems. This study determines parameter values that improve the relevance of search query results. It considers a genetic algorithm, its mutation and crossover operations, and the probability assigned to each operation. The chromosomes are the numerical values of the coefficients, encoded in binary form. Using the genetic algorithm, coefficients were determined for three families of search systems: Apache Lucene, Xapian, and Sphinx. On test samples, each search system was evaluated on the following performance metrics: accuracy, recall (completeness), precision (exactness), F-measure, and error. According to the evaluation, the metric values increase by 7% to 15%, and the search error decreases by 15% to 50%.
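The genetic search over binary-encoded coefficients described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the bit width, the coefficient bounds, tournament selection, elitism, the probability values, and the toy fitness function are all assumptions chosen for illustration. In a real setting the fitness function would run the search engine with the decoded coefficients and return a quality metric such as the F-measure.

```python
import random

BITS = 8  # bits per coefficient (assumed for illustration)


def decode(chromosome, n_coeffs, lo=0.0, hi=2.0):
    """Map a list of bits to n_coeffs real coefficients in [lo, hi]."""
    scale = (hi - lo) / (2 ** BITS - 1)
    coeffs = []
    for i in range(n_coeffs):
        bits = chromosome[i * BITS:(i + 1) * BITS]
        value = int("".join(str(b) for b in bits), 2)
        coeffs.append(lo + value * scale)
    return coeffs


def crossover(a, b, rng, p_cross):
    """Single-point crossover, applied with probability p_cross."""
    if rng.random() < p_cross:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chromosome, rng, p_mut):
    """Flip each bit independently with probability p_mut."""
    return [bit ^ 1 if rng.random() < p_mut else bit for bit in chromosome]


def evolve(fitness, n_coeffs, pop_size=30, generations=60,
           p_cross=0.8, p_mut=0.02, seed=0):
    """Return the best decoded coefficients found (higher fitness is better)."""
    rng = random.Random(seed)
    length = n_coeffs * BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def score(chromosome):
        return fitness(decode(chromosome, n_coeffs))

    def pick():
        # Binary tournament selection (an assumed selection scheme).
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best chromosome forward
        while len(nxt) < pop_size:
            c1, c2 = crossover(pick(), pick(), rng, p_cross)
            nxt.append(mutate(c1, rng, p_mut))
            if len(nxt) < pop_size:
                nxt.append(mutate(c2, rng, p_mut))
        pop = nxt
        best = max(pop, key=score)
    return decode(best, n_coeffs)


def toy_fitness(coeffs):
    """Hypothetical stand-in for a search-quality metric: peak at (1.2, 0.75)."""
    target = [1.2, 0.75]
    return -sum((c - t) ** 2 for c, t in zip(coeffs, target))


if __name__ == "__main__":
    print(evolve(toy_fitness, n_coeffs=2))
```

Binary encoding keeps the crossover and mutation operators trivial (slice and bit flip), at the cost of fixing the resolution of each coefficient to the chosen bit width.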