1C full-text search. Full-text search and its features

3.4.9 full-text search: automated document search in which the full text of a document, or significant parts of it, is used as the document's search image (en: full-text searching; fr: recherche en texte intégral). Source: GOST 7.73-96.

Full text index

The first versions of full-text search programs scanned the entire contents of all documents for a given word or phrase. With this approach, search time grew with the size of the database, and at the scale of the Internet it would have been impossible. Modern algorithms build a so-called full-text index in advance: a dictionary that lists every word together with the places where it occurs. With such an index, it is enough to look up the required words in it, and the list of documents in which they occur is obtained immediately.
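The idea of a full-text index can be illustrated with a minimal sketch. It assumes a plain whitespace tokenizer and no morphology, and the sample documents and the function names `build_index` and `lookup` are made up for illustration; a real search engine does considerably more.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to {doc_id: [positions of the word]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def lookup(index, word):
    """Return the sorted ids of the documents that contain the word."""
    return sorted(index.get(word.lower(), {}))


# Hypothetical sample data, not taken from any real database.
documents = {
    1: "invoice for the fan",
    2: "closing of the month",
    3: "fan delivery invoice",
}
index = build_index(documents)

print(lookup(index, "fan"))    # [1, 3]
print(lookup(index, "month"))  # [2]
```

The lookup touches only the dictionary entry for the word, so its cost does not depend on the total amount of text, unlike a scan of every document.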


Wikimedia Foundation. 2010 .


Full text search engine

Basic features of full-text search

  • transliteration support (Russian words written in Latin characters in accordance with GOST);
  • substitution support (some characters of Russian words typed as the Latin characters that share the same key);
  • fuzzy search (the letters in the found words may differ from the query) with a configurable fuzziness threshold;
  • the ability to restrict the search scope to selected metadata objects;
  • presentation of search results in XML and HTML formats with the found words highlighted;
  • full-text indexing of the names of standard fields ("Code", "Name", etc.) in all configuration languages;
  • search that takes into account synonyms in Russian, English and Ukrainian;
  • a morphological dictionary of the Russian language that contains a number of specific words from the business areas automated with the 1C:Enterprise software system;
  • the ability to use additional full-text search dictionaries;
  • supplied dictionaries that include dictionary bases and thesaurus and synonym dictionaries for Russian, Ukrainian and English provided by Informatics.

Full text search in the database

The full-text search mechanism of the 1C:Enterprise 8 system lets you search the database using search operators (AND, OR, NOT, NEAR, etc.).

The full-text search mechanism is based on the use of two components:

  • a full-text index that is created for the current database and then periodically updated as needed;
  • full text search tools.

Creating and updating a full-text index can be done interactively, in 1C:Enterprise 8 mode, or programmatically, using the built-in language. In 1C:Enterprise mode, full-text indexing is managed through a dedicated dialog.

To search for data in a database, a data processor such as Data Search can be used.

For example, a query for "Complete" and "vent" finds documents whose details contain values starting with those strings: the counterparty "Complete TD" and details containing various forms of the word "fan".

The 1C:Enterprise 8 system allows you to selectively include the data of application objects and their details in full-text search. It is also possible to limit the search scope to only specified configuration objects.

Full text search in the help system

The 1C:Enterprise 8 help system also implements a full-text search that allows you to use the search operators AND, OR, NOT, NEAR, etc. In this case, the found words are highlighted.

Software interface

The following application objects are used:

  • FullTextSearchManager
  • FullTextSearchList
  • FullTextSearchListItem

The FullTextSearchManager has methods for building the search index, checking whether it is up to date, and creating a search list of the FullTextSearchList type for a given query.

The FullTextSearchManager is available as the FullTextSearch property of the global context.

The FullTextSearchList provides access to search results. You can also specify the search scope as an array of configuration metadata objects.

Each search result is a FullTextSearchListItem.

Search string operators

The following search operators are allowed in the input line:

AND (AND or #) - search for data containing all of the words; example: "posting AND document" - the details must contain both "posting" and "document" (taking morphology into account);

OR (OR, | or ,) - search for at least one of the listed words; example: "posting OR document" - at least one of the words "posting" or "document" must be in the details;

NOT (NOT or ~) - search for data whose details contain the first word but not the second; example: "closing NOT month" - everything that contains "closing" but does not contain "month" will be found. Using "~" at the beginning of a line is not allowed;

NEAR/n (NEAR/[+/-]n) - search for data that contains the specified words in one attribute, taking morphology into account, within n words of each other.

The sign indicates in which direction from the first word the second word is searched for ("+" - after the first word; "-" - before the first word).

If the sign is not specified, data containing the specified words within n words of each other is found, and the order of the words does not matter.

  • "hair dryer NEAR/3 air" - finds data in which "air" is no more than 3 words before or after "hair dryer";
  • "hair dryer NEAR/+3 air" - finds data in which "air" is no more than 3 words after "hair dryer";
  • "hair dryer NEAR/-3 air" - finds data in which "air" is no more than 3 words before "hair dryer".
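The behavior of the distance operator can be sketched over a list of word positions. This is only an illustration under stated assumptions: the function name `near` is hypothetical, "within n words" is read as a position offset of at most n, each search term is a single token, and morphology is ignored.

```python
def near(words, first, second, n, direction=0):
    """Check whether `second` occurs within n words of `first`.

    direction: 0 = either side, +1 = only after `first`, -1 = only before.
    """
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    for p in first_positions:
        for q in second_positions:
            offset = q - p
            if offset == 0 or abs(offset) > n:
                continue  # same position or too far apart
            if direction == 0 or (direction > 0) == (offset > 0):
                return True
    return False


# Hypothetical sample text; "dryer" stands in for the single-word search term.
words = "the hair dryer blows hot air".split()

print(near(words, "dryer", "air", 3))      # True: "air" is 3 words after
print(near(words, "dryer", "air", 3, +1))  # True: searched after the first word
print(near(words, "dryer", "air", 3, -1))  # False: "air" is not before "dryer"
print(near(words, "dryer", "air", 2))      # False: more than 2 words apart
```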

NEAR - a simplified distance operator: the two words must be no more than 8 words apart; example: "posting NEAR document";

"" (text in quotation marks) - search for an exact phrase, taking morphology into account; example: "posting document" is equivalent to: posting /1 document;

() - word grouping (any number of nesting levels); example: "(posting | statement) # (invoice, document)";

* - search using a wildcard that replaces the end of a word. More than 1 significant character must be entered; example: "docu*" finds "document", "documents", "documentary", etc.;

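# - fuzzy search for words that differ from the specified one by a given number of characters (1 if the number is not specified); example: the query "#System" finds "system" and words that differ from it by one character; the query "System#2" finds words with up to two differences, such as "sittama" or "settema" (distortions of the Russian word "sistema");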
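One common way to count "differences" between words is the Levenshtein edit distance. The sketch below assumes that measure purely for illustration; the source does not say how 1C:Enterprise counts differences, and the names `edit_distance` and `fuzzy_find` and the vocabulary are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_find(vocabulary, word, max_differences=1):
    """Return the vocabulary words within the allowed number of differences."""
    target = word.lower()
    return [w for w in vocabulary if edit_distance(w, target) <= max_differences]


# Hypothetical vocabulary of indexed words.
vocabulary = ["system", "sistem", "stem", "symptom"]

print(fuzzy_find(vocabulary, "System"))     # ['system', 'sistem']
print(fuzzy_find(vocabulary, "System", 2))  # ['system', 'sistem', 'stem']
```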

! - search that takes into account synonyms in Russian, English and Ukrainian; the "!" is placed before the word in question; example: the query "!red tile" also finds "scarlet tile" and "coral tile".

If no operators are specified (the words are separated by spaces), the program searches for all words of the query, as if they were joined by the AND operator.
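On top of a word-to-documents index, the logical operators reduce to set operations. The following is a rough sketch, not the platform's query parser: the index contents and the helper names `docs_with` and `search_all` are invented for illustration, and morphology is ignored.

```python
def docs_with(index, word):
    """Return the set of document ids that contain the word."""
    return index.get(word.lower(), set())


def search_all(index, query):
    """Treat space-separated words as if they were joined by AND."""
    words = query.split()
    if not words:
        return set()
    result = set(docs_with(index, words[0]))
    for word in words[1:]:
        result &= docs_with(index, word)
    return result


# Hypothetical index: word -> ids of the documents that contain it.
index = {
    "closing": {1, 2},
    "month": {2, 3},
    "document": {1, 3, 4},
}

closing, month = docs_with(index, "closing"), docs_with(index, "month")
print(sorted(closing & month))  # closing AND month -> [2]
print(sorted(closing | month))  # closing OR month  -> [1, 2, 3]
print(sorted(closing - month))  # closing NOT month -> [1]
print(sorted(search_all(index, "closing document")))  # implicit AND -> [1]
```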

Examples

    // Create a search list with an empty query and a portion size of 20.
    SearchList = FullTextSearch.CreateList("", 20);
    SearchList.GetDescription = True;

    // Restrict the search scope to selected metadata objects.
    ArrayMD = New Array();
    ArrayMD.Add(Metadata.Catalogs.Products);
    ArrayMD.Add(Metadata.Documents.CashReceipt);

    SearchList.SearchArea = ArrayMD;
    SearchList.SearchString = SearchInputField;
    SearchList.PortionSize = PortionSize;
    SearchList.FirstPart();

    If SearchList.TotalCount() = 0 Then
        If SearchList.TooManyResults() Then
            DoMessageBox("Too many results, please refine your query.");
        EndIf;
        Return;
    EndIf;

    Count = SearchList.TotalCount();

    // Show the current portion as HTML with the found words highlighted.
    StrHTML = SearchList.GetRepresentation(FullTextSearchRepresentationType.HTMLText);
    Message(StrHTML);

    // Walk through the items of the current portion.
    For Index = 0 To SearchList.Count() - 1 Do
        Element = SearchList.Get(Index);
        Message(Element.Presentation);
    EndDo;

Peculiarities

Full-text search runs over the entire data set, so its results must be passed through a security filter before they are shown to the user.

For example, in a multi-database system, objects from other databases must be cut off.

Such filtering is closely tied to access control: search mechanisms are very often exactly where the security "hole" turns out to be.
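The filtering step described above might look like the sketch below. Everything in it is an assumption made for illustration: the `Hit` record, its fields, and the idea that access rights are expressed as a set of allowed object kinds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    object_id: int
    base: str  # database the found object belongs to
    kind: str  # object kind, e.g. "Catalog.Products"


def filter_hits(hits, current_base, allowed_kinds):
    """Keep only the hits from the current base that the user may see."""
    return [
        hit for hit in hits
        if hit.base == current_base and hit.kind in allowed_kinds
    ]


# Hypothetical raw results returned by a search over the whole data set.
raw_hits = [
    Hit(1, "trade", "Catalog.Products"),
    Hit(2, "payroll", "Document.Salary"),
    Hit(3, "trade", "Document.CashReceipt"),
]

visible = filter_hits(raw_hits, "trade", {"Catalog.Products"})
print([hit.object_id for hit in visible])  # [1]
```

Applying the filter after the search, rather than trusting the index, means a stale or over-broad index cannot leak objects the user has no rights to.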

The functionality of the new search is based on two mechanisms:
- full-text search (works very fast and requires a minimum of computing resources);
- search by means of a DBMS (in the general case, the duration of the search and the cost of computing resources are proportional to the amount of information in the table).

In the current implementation, the list is searched without full-text search in the following cases:
- the full-text index is disabled at the infobase level;
- the object of the main table is not indexed by the full-text index;
- the full-text search returned an error.
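The choice between the two mechanisms can be summarized in a short sketch. It only models the three conditions listed above; the function `search_list`, its parameters and the stand-in index functions are hypothetical and do not reflect the platform's internals.

```python
def search_list(query, rows, *, fts_enabled, table_indexed, fts_search):
    """Return (mechanism, matching rows) for a list search."""
    if fts_enabled and table_indexed:
        try:
            return "full-text", fts_search(query)
        except Exception:
            pass  # an index error falls through to the DBMS search
    needle = query.lower()
    # DBMS-style search: its cost grows with the number of rows scanned.
    return "dbms", [row for row in rows if needle in row.lower()]


rows = ["Complete TD", "Fan supplier", "Office fans"]


def working_index(query):
    # Stand-in for an index lookup that matches whole words only.
    return [row for row in rows if query.lower() in row.lower().split()]


def broken_index(query):
    raise RuntimeError("index unavailable")


print(search_list("fan", rows, fts_enabled=True, table_indexed=True,
                  fts_search=working_index))
print(search_list("fan", rows, fts_enabled=True, table_indexed=True,
                  fts_search=broken_index))
print(search_list("fan", rows, fts_enabled=False, table_indexed=True,
                  fts_search=working_index))
```

In this toy data the two mechanisms also return different rows for the same query, which is one more reason the fallback is visible to users.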

If full-text search is enabled in the infobase but the index is not updated at all, or only partially (in my experience, this is the case in 95% of customers' infobases), the user will get either an unreliable or an empty search result.

We asked 1C: what should be done? How can search results be guaranteed to always be valid?
The answer: yes, for search results to be up to date when full-text search is enabled, you need to make sure that the full-text search index is up to date. There are no other options for an efficient and up-to-date search yet.

But is there such a thing as an "up-to-date full-text index" at all? That depends on the number of users, on how intensively information in the database changes, and on how often the index is updated. Typically, an index update runs every 60 seconds. That is fine if only a few objects were changed and the procedure managed to process all the changes within those 60 seconds. But what if a group of documents was reposted, or a catalog was rewritten in bulk? In that case, no one can guarantee how long it will take before searching the index gives reliable data again.
In principle, this is not especially critical, except in a few situations. A common way for users to work is to filter the list by some value, for example "Counterparty", then enter a new document or copy an existing one and save it. With the old search, the new document was instantly visible in the list. Now the user will see it only after N seconds at best, where N is closer to 50-60 seconds than to 2-3.
If the user does not notice that the new document is missing and passes information to someone based on the filtered results, that information will be unreliable.
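The lag between saving a document and seeing it in index-based search can be reproduced with a toy model. The class `Infobase` and its methods are hypothetical; the index is modeled as a snapshot that only a separate update step refreshes, which stands in for the periodic index update job.

```python
class Infobase:
    """Toy model: documents are saved at once, the index is refreshed later."""

    def __init__(self):
        self.documents = {}  # current data
        self.indexed = {}    # snapshot seen by full-text search

    def write(self, doc_id, text):
        self.documents[doc_id] = text

    def update_index(self):
        # Stands in for the periodic index update job.
        self.indexed = dict(self.documents)

    def full_text_search(self, word):
        return sorted(
            doc_id for doc_id, text in self.indexed.items()
            if word in text.lower().split()
        )

    def direct_search(self, word):
        return sorted(
            doc_id for doc_id, text in self.documents.items()
            if word in text.lower().split()
        )


base = Infobase()
base.write(1, "Invoice for Complete TD")
base.update_index()
base.write(2, "Second invoice for Complete TD")  # saved after the last update

print(base.full_text_search("complete"))  # [1]: the new document is missing
print(base.direct_search("complete"))     # [1, 2]
base.update_index()
print(base.full_text_search("complete"))  # [1, 2]
```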

That was the case of normal operation of the infobase. What happens in special situations? Here are a couple of examples.
1) In the production database, the full-text index is enabled and updated frequently. A user asks for a copy of the production database to be deployed so that he can analyze the data in it.
We restore the backup and grant access. But full-text search will not work, because the index is stored not in the DBMS but in separate files (in both the file and the client-server versions). The index is not included in the dt file.
That is, for the user to be able to use list search, the full-text index in this database must either be turned off (and the user will be somewhat surprised that searches take much longer) or be rebuilt across the entire database.

2) (Relevant for more or less large databases.) In the production database, the full-text index is enabled and updated frequently. The end of the month comes and the closing of the period begins. We start loading and reposting documents in bulk. To reduce the load on the system, we block the execution of scheduled jobs, and index updates stop along with them. Users will be, to put it mildly, puzzled about why new or changed documents do not appear in the lists. The only way out is to disable full-text search for the infobase and, accordingly, accept a heavier load on the hardware caused by the expensive search across all details.

Thus, it seems to me that the index update operation will become another headache for infobase administrators.
A system that previously guaranteed 100% accuracy and currency of information at any moment is now turning into a reference system that cannot be fully trusted.
And users get one more reason to reproach the IT staff: "your system is not working properly."


