Showing 1923 of 1923 factors (100% of all factors)
P R
#0Page rank. The factor is remapped.
T R
#1Textual relevance (maxfreq is the frequency of the most frequent word, which serves as a proxy for the length of the document).
L R
#2Link Relevance. The factor is remapped.
Pr Bonus
#3Priority bonus, priority 7 - text priority. Factor is binary, has value 0 for all single word queries, and value 1 for almost all two or more word queries, except for a very small number of responses, for which there are no links that passed the quorum, and the text did not pass the quorum either.
T Rp1
#4Strict priority for TR (text priority): all query words occur somewhere in the document (and they satisfy the contextual restrictions of the query, for example, both words must be in the same sentence).
T Rp2
#5Phrase priority for TR (text priority): all query words occur consecutively in the document.
L Rp1
#6(strict) All query words occur in one link.
L Rp2
#7(phrase) All query words occur consecutively in one link.
T Rtitle
#8The presence of the exact phrase (query text) in the title (to be exact, in the first sentence of the document). Context constraints and stop words are taken into account exactly as in TRp2, i.e. factor[8] never exceeds factor[5]
T Rhr
#9A quorum location was found in which all word positions are marked as having BEST_RELEV relevance (header or meta keywords).
Removed_10
#10
News
#11This is news (determined by distinctive ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/Klassificacionnye?v=tkd#h45859-3 patterns in url)) ).
Shop
#12This is a store offer (determined by the characteristic ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/Klassificacionnye?v=tkd#h45859-4 patterns in url)) ). Not used (deprecated)
Cat
#13This is a directory (determined by characteristic ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/Klassificacionnye?v=tkd#h45859-2 patterns in the url)) or by the Yandex directory).
Ya Bar
#14Traffic according to Yandex.Bar - ((http://wiki.yandex-team.ru/AndrejjKostjagin/YaBarLog/HostStat Data Description)). The factor is remapped.
Long
#15Long document (the longer the document, the greater the value of the factor).
T Rhitw
#16Hitweight is a variant of textual relevance in which the weights of all hits are considered equal (i.e. no bonuses for title and word proximity are taken into account). The relevant hits must still pass the constraints of the syntactic wizard, i.e. we can assume that the TRhitw factor is 0 if and only if SoftAndOk is 0.
Long Query
#17The sum of the idf of the query words. The name does not reflect the essence: for example, for the query 'Gadyach' this factor will be greater than for the query 'Moscow Peter Yekaterinburg Samara'.
Pure Text
#18Long text without links.
Root
#19This is the site's main page (root).
Removed20
#20
Removed21
#21
Geo
#22Indicates a match between the user's region and the site at the country level. The factor is binary: 1-match, 0-no. Based on ((http://wiki.yandex-team.ru/ЯндексПоиск/КлассификацияСайтовИСтраниц/Географическая/ИспользованиеВПоиске geoclassification of sites))
Matching thematic spectra of the query and the document. Subject of the query is the result of work ((http://wiki.yandex-team.ru/EvgenijjKroxalev/subquery SubquerySearch wizard rules)) Subjects of the document are taken from the Yandex catalog
S R
#24A complex static rank, assembled from static components by a separate formula((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/#oftnd1 *)).
T Rref
#25Factor about the number of refines. The query language has a user refine feature (a word preceded by a percent sign). It is supposed to mean something like 'it would be nice to have this word in the document'. The only valuable use of this feature known to ((http://staff.yandex-team.ru/gulin Andrey Gulin)) is querying [%official %site FirmName]. Users are unaware of this feature, since it is not described in any documentation. It is planned to remove it from the query language, but words with USER_REFINE priority will remain in the wizard. The factor tells how many USER_REFINE words at most were encountered simultaneously within a single quorum hit. The count is taken to be between 0 and 3 (values > 3 are treated as 3). This number is mapped to the half-interval [0,1)
T Rboost
#26The number by which some link factors (namely, factors number 6, 7, 47, 66) are multiplied if the textual relevance is 0 and there are few links
T R L Rlemma
#27In textual relevance, a lemma match occurred.
Remapped mascot feature TrafgraphOutAll_share_d
Relev Sents Dssm
#29Dssm model, trained on reformulations, uses relevant sentences in the document part
The value of the news detector calculated in behemoth. Always 0 when the detector value is less than the threshold.
L R Hit Num100
#31Remapped number of query word occurrences in all links to the URL.
L R Hit Num Gt16
#32For documents with LR > 20: the number of occurrences of the query words in the links is > 16. A factor about LR.
Pct Links
#33For documents with high LR: normalized link relevance without regard to proximity; for documents with low LR: 0.
Has L R
#34The URL has a high LR.
Link Quality
#35Quality of incoming links (Leschiner's classifier) - broken, see [405]
CosineMatchMaxPrediction factor value for the AliceMusic stream
Num Links
#37Number of incoming links. The factor is remapped.
Popular Q
#38Popularity of the request
T R Unmapped
#39TR divided by the cube of the number of words in the query and converted by the standard remapTR.
Rus Lang
#40The language of the document is Russian.
Add Time
#41Time since the page was added; a larger value means an older document. The root of the time is mapped to the interval [0,1] so that 3+ years gives 1.
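A minimal sketch of the AddTime remap described above. It assumes that "root" means a square root, that the age is measured in days, and that 3 years is taken as 1095 days; none of these details are stated in the source, and the function name is illustrative.

```python
import math

# Assumption: the source only says "3+ years gives 1"; the unit (days) is a guess.
THREE_YEARS_DAYS = 3 * 365


def add_time_factor(age_days: float) -> float:
    """Map page age to [0, 1] via a square root, saturating at 3 years."""
    if age_days <= 0:
        return 0.0
    return min(math.sqrt(age_days / THREE_YEARS_DAYS), 1.0)
```

With these assumptions a one-year-old page lands strictly between 0 and 1, and anything older than three years is clamped to 1.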
Is Main Page
#42If this is the main page of the owner (most often a second-level domain, such as xxxx.ru), the factor is 1. For free hosting services, personal blogs, etc. (e.g. LiveJournal, narod.ru), third-level domains (such as xxxxx.narod.ru) will also have a factor of 1.
Add Time M P
#43The owner (host?) main page addition time, remaps in the same way as AddTime.
The value of the AnnotationMaxValueWeighted factor for the AliceMusic streamer
How often the URL is clicked on this query - CTR multiplied by the correction factor
Text B M25
#46Simple BM25 by text.
Link B M25
#47Simple BM25 by links, link weights are not taken into account.
T L B M25
#48Simple BM25 by text and links at the same time.
T Lp1
#49All query words are in the text + links.
Adv
#50There are ads on the site.
Yandex Adv
#51There are Yandex ads on the site.
No Spam
#52The spam classifier based on anti-spam features recognized the site as NOT(!) spam, i.e. 0 = spam, 1 = good.
Txt Pair
#53Simple BM25 by word pairs: we take all pairs of query words and count their occurrences in the text of the document. The pair weight is the sum of the word weights. Note: does not work if the query contains a stop word.
Lnk Pair
#54Same as TxtPair, but for links; link weights are not taken into account.
Txt Break
#55BM25 over the number of sentences in the document in which the query word occurs.
Txt Head
#56BM25 by the words in the title only.
Txt Hi Rel
#57BM25 on words only with high rel bits ('significant', with highlighting (<b>, etc.)).
Removed_58
#58
Word Count
#59Min(number of query words/10, 1.f)
Inv Word Count
#601 / number_words_in_request.
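The two query-length factors above (WordCount and InvWordCount) can be sketched directly from their formulas. The function names are illustrative and do not come from the source.

```python
def word_count_factor(num_query_words: int) -> float:
    """WordCount: Min(number of query words / 10, 1)."""
    return min(num_query_words / 10, 1.0)


def inv_word_count_factor(num_query_words: int) -> float:
    """InvWordCount: 1 / number of words in the query."""
    if num_query_words <= 0:
        raise ValueError("a query must contain at least one word")
    return 1.0 / num_query_words
```

Both values stay in [0, 1]: the first grows with query length and saturates at ten words, the second shrinks with it.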
Has No T R
#61The document does not have a TR.
Has No L R
#62The document does not have LR.
There is no information about clickability for this url for this request 1 - request or request-url is not in the clickbase, 0 - request-url is in the clickbase
For this query there is no information about clickability 1 - the query is not in the clickbase, 0 - the query is in the clickbase.
Hops
#65The number of hops to the URL in the crawl: the closer to the main page, the smaller the value (0 - main page; 1 - cannot be reached from the main page; 0 < can be reached from the main page < 1). Normal value for nost root is 0.0039.
Log L R
#66Logarithm from LR, linearly mapped in [0,1].
Txt Pair Ex
#67presence of word pairs in exact form
Txt Break Ex
#68the number of sentences in which there are many words in the exact form
Txt Head Ex
#69the presence of words in the title in the exact form
Txt Hi Rel Ex
#70BM25 in exact form
Txt Bm25 Ex
#71A simple BM25 in precise form.
Txt Pair Sy
#72presence of word pairs with synonyms (>=TxtPair)
Txt Break Sy
#73the number of sentences in which there are many words with synonyms taken into account
Txt Head Sy
#74the presence of words in the title, taking into account synonyms
Txt Hi Rel Sy
#75BM25 including synonyms
Txt Bm25 Sy
#76Simple BM25 with synonyms in mind.
How often the URLs of the given domainId are clicked on the given query - CTR domainId multiplied by the correction factor
For this domainId for this query there is no information about clickability 1 - request or request-owner is not in the clickbase, 0 - request-owner is in the clickbase
Clickability of the owner regardless of the request
Megafon
#80Relative frequency of query words in links (1 - query words often occur in links, 0.3 - rarely); more precisely, the value of this factor is pessimized if: TR=0 && LR=0 && (no links with all query words) && (no quorum) && (at least one pair of query words occurs in the text)
X L Rp0
#81The links have all the words of the query
X L Rp1
#82One link has all the words of the query
X L Rp2
#83There is a link that passed the quorum
X L Rgood
#84What proportion of links are "good"
X L Rmany Bad
#85How many "bad" links (bad = dpr = 0)
X L Rmax Dpr
#86Maximum dpr over the links
X L Rtfidf
#87TfIdf is the usual TF*IDF over links. The word frequency in the links is multiplied by the inverse document frequency and summed over all words, then normalized by the document length.
X L Rrelev
#88Link relevance by Gulin
X L Rrelev200
#89Link Relevance by Gulin
X L Rlog Relev
#90Link relevance by Gulin
B Fexact
#91There is an exact form of all query words in the text/links
B Flemma
#92There is a lemma of all query words in the text/links
Soft And Ok
#93The document passed softand by the syntax wizard's constraints. Only for documents with textual relevance. For single-word queries it is always 1.
New Link Quality
#94Incoming link quality classifier 2 - broken, see [407]
Ukrainian
#95equals one if the site has a Ukrainian geo-attribute (ie, 1 - Ukrainian site)
Is Blog
#96Blog page
Is Livejournal
#97Page from livejournal.com
Removed_98
#98
Spam2
#99Alexeyev's automatic spam classifier, probability that the site is spam (0 not spam, 1-spam)
Text Features
#100Text quality. Calculated according to a rather complicated formula
Text Like
#101Text quality (Alekseev's classifier)
Removed_102
#102
Removed_103
#103
Ya Bar Core Owner
#104The core audience of owners according to Yandex.Browsing
Ya Bar Core Host
#105The core audience of the host according to Yandex.Browsing
Has Ya Bar Core
#106Does the host have a core audience
Spam Karma
#107Spam name karma of antispammers - probability that the host is spam; based on whois information
Music Q
#108Musicality of the query. Output of Anton Konygin's wizard.
X L Exact Matches
#109the number of links that exactly match the query
Doc Len
#110Document length in sentences
Url Len
#111URL length divided by 5
The commerciality of the query according to the Direct phrase dictionary: 0 - maximum commerciality, 1 - minimum commerciality.
Host Size
#113Host size in documents (by Raskovalov), without deduplication: each duplicate is counted as a separate document.
Is H T M L
#114Document type - HTML
Link Speed
#115Inverse of the variance of the appearance times of links containing the query words.
X Th L Rrelev
#116Link relevance with thematicity
X Th L Rrelev200
#117Link relevance with thematicity
X Th L Rlog Relev
#118Link relevance with thematicity
X Lerf L Rrelev
#119Link relevance taking into account the quality of each link
X Lerf L Rrelev200
#120Link relevance taking into account the quality of each link
X Lerf L Rlog Relev
#121Link relevance taking into account the quality of each link
Link relevance, taking into account the quality of each link and the thematicity of each link
Link relevance, taking into account the non-commerciality of each link
Link relevance, taking into account the non-commerciality of each link and thematicity
Link relevance, taking into account the non-commerciality of each link and the quality of each link
Link relevance, taking into account the non-commerciality of each link, the quality of each link and thematicity
Geo City Proxim
#127Means matching the region mentioned in the query and the found sites at the region level. The factor is binary: 1-match, 0-no. Based on ((http://wiki.yandex-team.ru/ЯндексПоиск/КлассификацияСайтовИСтраниц/Географическая/ИспользованиеВПоиске geoclassification of sites))
Percentage of incoming links with query words
Percentage of incoming links with all query words
Porno Query
#130Does the query contain words from yweb/pornofilter/porno.query.
Is Porno
#131Pornographic document
Is Comm
#132A document from a commercially-available book. Not used (deprecated)
Is Fake
#133fake document
Is S E O
#134The title of the page contains commercial vocabulary. Not used (deprecated)
Is Wiki
#135page from ru.wikipedia.org
Is E Shop
#136commercial page (Savin's classifier)
Geo Region Proxim
#137the document does not contain all query words (up to synonyms)
Num Words T R Sy
#139Percentage of query words in the document (up to synonyms)
Has All Words T R Sy
#140the document has all the words of the query (up to synonyms)
Num Words L R
#141Percentage of query words in links (up to synonyms)
Has All Words L R
#142the links have all the words of the query (up to synonyms)
Pay Detector Predict
#143The value of the commerce detector calculated in the behemoth.
Txt Inv Pair
#144TR by pairs of query words in reverse order
Lnk Inv Pair
#145LR by pairs of query words in reverse order
Txt Skip Pair
#146TR by pairs of query words through one word in texts
Lnk Skip Pair
#147LR by pairs of query words through one word in the texts
Num Words T R Fm
#148percentage of all query words in the text (in exact form)
Has All Words T R Fm
#149the document has all the words of the query (in exact form)
Q Diversity
#150Degree of geographic concentration of the locations from which the query is issued
Q Blog
#151Does the query contain blog language?
X Geo L Rlog Relev
#152log(LR, narrowed by the user's country)
log(LerfLR, narrowed to the user's country)
Non Commercial Query
#154Binary non-commerciality: QueryNonCommerciality > 0.965.
Number of links that match the query text (other remap)
XLerfLRlogRelev (normalized by the sum of the Lerf-weights of all links, not by the sum of their initial weights)
XNonCommLRlogRelev (normalized by the sum of the NonComm weights of all links, not by the sum of their initial weights)
Link relevance, taking into account the non-commerciality of each link and thematicity
XNonCommLerfNormLRlogRelev (normalized to the sum of NonCommLerf-weights of all links, not the sum of their original weights)
Link relevance, taking into account the non-commerciality of each link, the quality of each link and thematicity
Nevasca1
#161Not used Content Duplication. The 'goodness' of a host (0 to 1), calculated based on how many and which hosts borrow content from this one.
Nevasca2
#162Not used Content Duplication. Host 'badness' (0 to 1) - proportional to the number of secondary content on the host.
Link Age
#163Average age of the links that contributed to LR: LinkAge = Min(log(average link age)/7, 1); an age of 3 years is taken as 1.
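A small sketch of the LinkAge formula above. The source does not give the unit or the base of the logarithm; the sketch assumes days and a natural logarithm, because ln(3 * 365) is roughly 7, which matches the remark that 3 years corresponds to 1. The function name is hypothetical.

```python
import math


def link_age_factor(avg_link_age_days: float) -> float:
    """LinkAge = Min(log(average link age) / 7, 1).

    Assumptions (not stated in the source): age in days, natural log.
    Ages below one day would give a negative value; the source formula
    has no lower clamp, so none is applied here.
    """
    return min(math.log(avg_link_age_days) / 7, 1.0)
```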
T Len
#164Page text length in words: TLen = Map(number of words, 1/400), where Map(x, y) = x*y / (1 + x*y)
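The Map function in the TLen formula can be written out as follows; only the function names are invented here, the formula itself is the one given above.

```python
def remap(x: float, y: float) -> float:
    """Map(x, y) = x*y / (1 + x*y): squashes a non-negative x into [0, 1)."""
    return x * y / (1 + x * y)


def tlen_factor(num_words: int) -> float:
    """TLen = Map(number of words, 1/400)."""
    return remap(num_words, 1 / 400)
```

With the scale 1/400, a page of 400 words maps to about 0.5, and longer pages approach 1 without reaching it.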
Is Unreachable
#165The page is unreachable via links from the main page.
X Lang L Rlog Relev
#166LR taking into account whether the link language matches the query language
LR taking into account the coincidence of the language of the link and the request and the tipping
The ratio of the number of clicks on the given url to all clicks on the request
The ratio of the number of clicks on the given domainId to all clicks on the query
[Bug: Copy Factor 45] How often a given URL is clicked on - CTR multiplied by the correction factor
Share of clicks (averaged per session) that this URL receives for this query with the user's city appended to it. Computed over user sessions.
How often a given URL is clicked on for a given query - CTR multiplied by the correction factor, by small regions from relev_regions.web.txt
How often the URLs of a given domainId are clicked on for a given query - CTR domainId multiplied by the correction factor, by small regions from relev_regions.web.txt
the ratio of the number of clicks on the given url to all clicks on the query, by small regions from relev_regions.web.txt
the ratio of the number of clicks on the given domainId to all clicks on the query, by small regions from relev_regions.web.txt
Query URL Clicks Combo, by small regions from relev_regions.web.txt
Query DOwner Clicks Combo, by minor regions from relev_regions.web.txt
X L R Catalog Relev
#178LR by catalog descriptions
LR by unsubscription in Yandex.Catalog
Exact Word Order Len
#180Length of maximum matching forms in text and query
Weight of the maximum form match in the text and query
Word Order Len
#182Length of maximal lemma match in the text and query
Word Order Weight
#183Weight of the maximum lemma match in the text and query
Link Max Age
#184Maximum age of a significant accumulation of links that have contributed something to the LR
T Rp1 All
#185Variants of the relevant factors, taking into account the stop words
L Rp1 All
#186Variants of the relevant factors, taking into account the stop words
T Lp1 All
#187Variants of the relevant factors, taking into account the stop words
B Fexact All
#188Variants of the relevant factors, taking into account the stop words
B Flemma All
#189Variants of the relevant factors, taking into account the stop words
Passage Legacy T R
#190TR best passages - how good a snippet can get
Txt B M25 Atten Syn
#191TR with attenuation by sentence number
Max Word Host Rank
#192Host rank by the most expressed query word (usually the name of the site)
Max Word Host Clicks
#193Clickability of domAttr by the most expressed query word. For example, for all queries that contain the word wikipedia, users click on wikipedia.
Dom Phrase Rank
#194HostRank by individual words
Clickability of the domain by words
Is Forum
#196The URL matches the FORUM_DETECTOR regular expression
AnnotationMatchWeightedValue factor value for the AliceMusic streamer
Is Obsolete
#198There is an ancient date in the URL. Ancient news are recognized. Factor 1 if url has year <=2007.
T R With Stops
#199Weight of the maximum form match in the text and query
L R With Stops
#200Weight of the maximum form match in the text and query
Has Payments
#201The page is about 'paying for SMS'.
Is Link Pessimised
#202Anti-spammers have pessimized the site - all dynamic link factors are zeroed. zerolnk.flt
Eshop Value
#203The e-shop nature of the page
Porno Value
#204The pornographic nature of the page
Remapped mascot feature TrafgraphOutAll_share_m
Remapped mascot feature TrafgraphOutAllSE_share_d
Remapped mascot feature TrafgraphOutAllSE_share_m
No Ext Clicks Share
#208Remapped mascot feature NoExtClicksShare
Search engine traffic - conversions from search engines to the site (2nd formula)
Search engine traffic - conversions from search engines to the site (2nd formula)
Dom Phrase Yabar
#211Visits to the site from search engines for individual words, according to the bar
The value of the BclmMixPlainK000001 factor for the AliceMusic stream
Query Url L C S
#213The largest common substring of the url and query, normalized by the length of the url
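One way to compute the QueryUrlLCS value described above is a standard dynamic-programming longest common substring, divided by the URL length. This is only a sketch: the case folding and the function names are assumptions, and the source does not say how the URL or query is preprocessed.

```python
def longest_common_substring_len(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def query_url_lcs(url: str, query: str) -> float:
    """Longest common substring of url and query, normalized by len(url)."""
    if not url:
        return 0.0
    # Assumption: comparison is case-insensitive.
    return longest_common_substring_len(url.lower(), query.lower()) / len(url)
```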
Only Url
#214All matches are in the URL only, no matches in the text of the page
Three levels of matching user and page geography
Three levels of link and query region matching
Geo Country Proxim
#221Geographic proximity
Is Nav Query
#222Is the query navigational, judging by the clickability of the answers
Max Word Host Ya Bar
#223The most characteristic query word corresponding to the site, according to the bar
Clickability of the host by the first query word. Quite often the first (last) word of the query is an explicit indication of the site where the information should be searched for.
The value of the CMMatchTop5AvgMatch factor for the AliceMusic stream
The average continuous user time (in seconds) on the host pages after clicking on the query from a search engine (the factor depends on the pair (query,domAttr)).
the average continuous time (in seconds) of user's stay on the host's pages after the query from the search engine (the factor depends on the pair (query,domAttr)). According to Yandex.Bar/Elements/Browser internal counter
the average number of active actions (clicks, keystrokes) by users during a user's continuous presence on the host's pages after switching from the search engine (the factor depends on the pair (query,domAttr)). According to Yandex.Bar/Elements/Browser internal counter
Number of unique visitors from search engines for a particular query
the average continuous time (in seconds) a user is on a page after clicking on a query from a search engine (the factor depends on the pair (query,url)).
the average continuous time (in seconds) a user is on a page after he goes from a search engine (the factor depends on the pair (query,url)). According to the internal counter of Yandex.Bar/Elements/Browser
the average number of active actions (clicks, keystrokes) by users on the page after clicking on the query from the search engine (the factor depends on the pair (query,url))
A pool of PRS logs is tagged using Bert trained on sinsig. The dssm model is trained on this pool, using BaseRegionChain
A pool of PRS logs is tagged using Bert trained for relevance. The dssm model is trained on this pool, using BaseRegionChain
PerWordCMMaxMatchMin factor value for the AliceMusic stream
Value of the AttenV1_Bm15_K05 factor for the AliceMusic stream
The value of the AnnotationMaxValueWeighted factor for the AliceMusic streamer
Is Foreign Query
#241The request is not in Russian
Is Foreign Cluster
#242document from a foreign cluster
Page Region Size In
#243Page region size
The factor is inversely proportional to the size of the page region
Query Region Size
#245Request region size
The factor is inversely proportional to the size of the region of the request
Geo Geometry Proxim
#247Geographical proximity of the user and the site
Characterizes the promotion of the site by link rings. The value is the share of external links that are included in the link rings and link exchanges.
Yabar Host Visitors
#249the number of unique visitors, remaps exponentially
Share of traffic from search engines
the share of visits to the site not through links (typed in manually or opened from bookmarks)
Yabar Host Avg Time
#252the average continuous active user time (in sec) on the host pages
Yabar Host Avg Time2
#253the average continuous user time (in sec) on the host's pages. According to the internal counter of Yandex.Bar/Elements/Browser
The average number of active actions (clicks, keystrokes) by users when a user is continuously on the host pages (in sec).
implementation of the algorithm described in the article ((http://wiki.yandex-team.ru//h.yandex.net/?http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fpeople%2Ftyliu%2Ffp032-liu.pdf http://research.microsoft.com/en-us/people/tyliu/fp032-liu.pdf))
Yabar Url Visits
#256Visits to the URL according to Yandex.Bar data
Yabar Url Visitors
#257Number of unique visitors to the url
Yabar Url Avg Time
#258The average time of user's presence on the page. It is counted as the difference between adjacent transitions.
This is SEA factor = s4_r/ (k_r+10) where s4_r - number of clicks > 180 sec, k_r - total number of clicks. It is calculated taking into account the reformulations.
This is SEA factor = s4_r/ (k_r+10) where s4_r - number of clicks > 180 sec, k_r - total number of clicks. It is calculated taking into account the reformulations. Localized version
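The SEA formula above is a smoothed ratio: the constant 10 in the denominator keeps URLs with very few clicks from getting a high score. A minimal sketch, with hypothetical parameter names:

```python
def sea_factor(long_clicks: int, total_clicks: int, smoothing: float = 10.0) -> float:
    """SEA = s4_r / (k_r + 10).

    long_clicks  -- s4_r, number of clicks longer than 180 seconds
    total_clicks -- k_r, total number of clicks
    """
    return long_clicks / (total_clicks + smoothing)
```

For example, a single long click out of one click gives 1/11 rather than 1, while 30 long clicks out of 90 gives 0.3.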
Url Query Variety
#261The degree of diversity of queries clicked on this url
Is Comm By Keywords
#262The page is commercial by keyword. Not used (deprecated)
Doc Idf Sum_broken
#263Idf by different parts of the document, broken, not used
Title Idf Sum_broken
#264Idf by different parts of the document, broken, not used
Idf by different parts of the document, broken, not used
Idf by different parts of the document, broken, not used
X L R Video Relev
#267The link factor about having a video on the page.
Aux Text B M25
#268BM25 by user region for localizable queries, for non-localizable queries in CUBE - country. Texts of queries sent for regions can be viewed in relev_regions.txt in the wizard
Aux Link B M25
#269Same for link relevance
Share of incoming paid links. An algorithm for recognizing commercial links has been implemented. The factor is remapped to [0,1] if the share of such links > 50%, otherwise 0. ((http://wiki.yandex-team.ru/SvetlanaShorina/topseolinks sample of artificially boosted sites))
The previous factor multiplied by PornoQuery
CommLinksSEOHosts factor multiplied by NonCommercialQuery
Tovar Category Query
#273The query mentions a product category. Not used (deprecated)
The query mentions a vendor. Not used (deprecated)
Diversity2
#275Geographical distribution of the request
Night Query
#276The request is mostly made at night
Morning Query
#277The request is made mostly in the morning
Day Query
#278The request is mostly made during the day
Evening Query
#279The request is mostly made in the evening
Hour Diversity
#280How pronounced the variation in query volume is across times of day
L Cor
#281Characterizes the frequency of words in links. The factor is large if the word that contributed to the link relevance is rare in links.
Subquery Th Match A
#282Matching thematic spectra of the query and the document. The subject of the query is the result of the work ((http://wiki.yandex-team.ru/EvgenijjKroxalev/subquery SubquerySearch wizard rules)) The topic of the document is determined by the automatic classifier
T R Doc Quorum
#283Weight of query words that are in the text
L R Doc Quorum
#284Weight of query words that are in the links
T R L R Doc Quorum
#285Weight of query words that are in the text and links
Entropy - click distribution
Entropy - distribution of displays
Entropy - distribution of clicks/shows ratio
X Porno L Rlog Relev
#289Pornographic nature of the document based on link text
Pornographic nature of the document based on link text, with a different normalization
X Porno Query
#291PornoQuery classifier, a different dictionary than PornoQuery
Value of the AttenV1_Bm15_K05 factor for the AliceMusic stream
Geographical proximity of the country of the site and the country of the request
Url Domain Fraction
#294Coverage of the domain by letter trigrams from the query. (Example: Chelyabinsk lottery - chelloto. Transliterate the query, find the trigrams that are covered (che, hel, lot, olo), and compute what proportion of all trigrams is covered.)
Same as the previous factor, but about the whole url except the domain
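The trigram coverage idea behind these two factors can be sketched as below. Several details are assumptions, not facts from the source: the query is already transliterated to Latin, trigrams are taken per word, and the denominator is the set of distinct trigrams of the target (the domain, or the rest of the URL). The transliteration "chelyabinsk loto" in the usage note is also only an illustration.

```python
def trigrams(text: str) -> set[str]:
    """Distinct letter trigrams of each whitespace-separated word."""
    grams = set()
    for word in text.lower().split():
        for i in range(len(word) - 2):
            grams.add(word[i:i + 3])
    return grams


def trigram_coverage(target: str, translit_query: str) -> float:
    """Share of the target's trigrams that also occur in the query."""
    target_grams = trigrams(target)
    if not target_grams:
        return 0.0
    covered = target_grams & trigrams(translit_query)
    return len(covered) / len(target_grams)
```

Under these assumptions, `trigram_coverage("chelloto", "chelyabinsk loto")` covers che, hel, lot and oto out of six domain trigrams.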
Specifical Query
#296The query is locale-specific. The query is often reformulated with an explicit region assignment. ((https://ml.yandex-team.ru/archive/thread1433892/#message1433892 more info))
Joker Len
#297We count text features, assuming that the page's title is attached to each of its sentences, i.e. the distance between a word from the title and any other word is 1 sentence. Len - maximum ratio of query words found in some sentence of the text (with the attached title) to the length of the query. Example: [Хармс цирк Вертунов] for ((http://wiki.yandex-team.ru//h.yandex.net/?http%3A%2F%2Fwww.wikilivres.info%2Fwiki%2F%25D0%25A6%25D0%25B8%25D1%2580%25D0%25BA_%25D0%25A8%25D0%25B0%25D1%2580%25D0%25B4%25D0%25B0%25D0%25BC_%28%25D0%25A5%25D0%25B0%25D1%2580%25D0%25BC%25D1%2581%29 this document))
Joker Weight
#298The ratio of the sum of the idf of the encountered words in the sentence+title to all words.
Exact Joker Len
#299The same as JokerLen, on the exact forms
Exact Joker Weight
#300Same as JokerWeight, on the exact forms
Remapped mascot feature More120SecVisitsNotSearchShare
Lnk Break
#302Analogs to the corresponding text factors for links. BM25 from the number of links in which there was a match.
Lnk Bm25 Ex
#303Simple BM25 on the exact form in the link texts
Lnk Pair Sy
#304The presence of word pairs in links, taking into account synonyms
Lnk Brk Sy
#305Number of links that passed the threshold
Lnk Bm25 Sy
#306Simple BM25 by links with synonyms
Video Query
#307Video request
Clickability of the owner regardless of the request, separately by region
Entropy - click distribution. Regionalized
Entropy - distribution of displays. Regionalized
Entropy - distribution of clicks/shows ratio. Regionalized
Adultness
#312equals 2 * NastyContent
Host Adultness
#313equals 2 * NastyContent
K C Host Adultness
#314always zero
Is Com
#315.com domain
Is Ua
#316Domain in the zone .ua
Is Not Ru
#317The domain is not in the .ru zone
X L R Market Relev
#318LR by links from Yandex.Market
Poetry
#319The poetry of the document
Poetry Quad
#320Maximum poetry of the quatrain
Eng Lang
#321The language of the document is English
The query is completely covered by two exact groups consisting of exact match words of the query in a row ((http://wiki.yandex-team.ru/poiskovajaplatforma/tr/CoverageByGroups Progroup coverage))
There is a group consisting of exact match words of the query, covering the query (possibly with an omission, addition or substitution of a word)
The fraction of the query covered by the longest group consisting of any hits (including word forms and synonyms). Possibly with omission, addition or substitution of a word
Characterizes the proximity of time profiles of the request and documents on business days
Characterizes the proximity of time profiles of the request and documents on weekends
Cyr Lang
#327The language of the document is Cyrillic
Geo Regionality U
#328Query factors - the result of the ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/GeoRegionality query geolocalization classifier)). U - geo-irrelevant: regional results for the query are meaningless
Geo Regionality R
#329R - geo-relevant: regional results in the output could be useful, but no more than that
Geo Regionality V
#330V - geo-vital: regional results are essential
Url Has No Digits
#331There are no numbers in the url
The value of the AllWcmMaxMatch factor for the AliceMusic stream
CosineMatchMaxPrediction factor value for the AliceMusic stream
Syn S1
#334Indicates how unnatural the text is from the point of view of the Russian language. Evaluate how much of the document text can be considered synonymizer-generated or automatic at all. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=1il#h58953-2 more details))
Syn F Lremap1
#335Indicates how unnatural the text is from the point of view of the Russian language. Evaluate how much of the document text can be considered synonymizer-generated or automatic at all. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=1il#h58953-2 more details))
Syn F Lremap2
#336Indicates how unnatural the text is from the point of view of the Russian language. Evaluate how much of the document text can be considered synonymizer-generated or automatic at all. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=1il#h58953-2 more details))
nd/k normalized time to click
nd/i
nd/k
w/k
o/i
selected formula
r_s4b/(r_k + 10)
Synt Quality
#344Does the query have a complete syntactic parse
Page Date
#345The date of the document, which is written on the page, is remapped by the square root
Visits P Visitors
#346Remapped mascot feature VisitsPVisitors
Additional factor about promotion of the site by link rings, ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=181r#h58953-4 more info))
Additional factor about promotion of the site by link rings, ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=181r#h58953-4 more info))
Additional factor about promotion of the site by link rings, ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=181r#h58953-4 more info))
Has Text Pos
#350The document has textual relevance
Q Segments B M25
#351BM25, where the 'words' are the highlighted query segments
Q Segments Weight
#352The 'weight' of the query segments in the text
Indicator of the unnaturalness of the text from the point of view of the Russian language. The number of bad word pairs in the text, normalized to the interval [0,1] by the formula z/(z+10)
Proportion of bad pairs among all pairs found in the table: z/(x+1), where z is the number of bad pairs in the text, and x is the number ((http://wiki.yandex-team.ru/EvgenijjGrechnikov/TestSynonimizers 2000-relevant)) of pairs
Num Latin Letters
#355The number of Latin letters in the text (not counting the markup), mapped to [0,1] by the formula n/(n+100)
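The remaps quoted above, z/(z+10) and n/(n+100), share one shape: a non-negative count divided by itself plus a constant. A minimal sketch of that shape follows; the function name `saturating_remap` and the argument checks are illustrative assumptions, not part of the factor code.

```python
def saturating_remap(x: float, a: float) -> float:
    """Map a non-negative count x into [0, 1) as x / (x + a).

    `a` is the count at which the remapped value reaches 0.5.
    """
    if x < 0 or a <= 0:
        raise ValueError("expected x >= 0 and a > 0")
    return x / (x + a)


# 10 bad pairs with the z/(z+10) remap, 300 Latin letters with n/(n+100)
bad_pairs_value = saturating_remap(10, 10)       # 0.5
latin_letters_value = saturating_remap(300, 100)  # 0.75
```

The constant controls how quickly the factor saturates: a small constant makes the factor sensitive only to small counts.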
Additional factors on site promotion via link rings, ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=181r#h58953-4 more info))
Doc Idf Sum Fixed
#357Previous factors - corrected
Title Idf Sum Fixed
#358Previous factors - corrected
Previous factors - corrected
Previous factors - corrected
factor, cleverly combined from FRC and pseudo-CTR
factor, cleverly combined from FRC and pseudo-CTR
L R Amortized By Age
#363Link relevance with pessimization for link age
Rus Words In Text
#364The number of words in the text (a word is whatever the lemmer identified), mapped to [0,1] by the formula x/(x+A)
Rus Words In Title
#365Number of Russian words in the title
Mean Word Length
#366Average word length
Percentage of words inside the <a>...</a> tag of all words
Percentage of words outside the tags (outside the <> brackets) of all words
Percent Freq Words
#369Share of the words in the text that are among the 200 most frequent words of the language
Number of the 500 most popular language words used in the text, divided by 500
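The two frequent-word factors above count different things: the first is a share of tokens, the second a share of the frequency list. A small sketch of both, assuming the text is already tokenized and a frequency list is supplied by the caller (the function names and the toy list are hypothetical):

```python
def percent_freq_words(words, top_words):
    """Share of tokens in `words` that belong to the frequent-word list."""
    if not words:
        return 0.0
    top = set(top_words)
    return sum(1 for w in words if w in top) / len(words)


def top_words_used(words, top_words):
    """Distinct frequent-list words present in the text, divided by list size."""
    top = set(top_words)
    if not top:
        return 0.0
    return len(top & set(words)) / len(top)


# Toy data: a 4-word "frequency list" instead of the 200- or 500-word lists
text = "the cat and the dog".split()
toy_top = ["the", "and", "of", "a"]
share_of_tokens = percent_freq_words(text, toy_top)  # 3 of 5 tokens
share_of_list = top_words_used(text, toy_top)        # 2 of 4 list words
```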
Trigrams Prob
#371The logarithm of the geometric mean probability of trigrams in the text (the probability of a trigram is the number of its occurrences in the text divided by the number of all trigrams), mapped to [0,1] by the formula -x(x+A)
Trigrams Cond Prob
#372The logarithm of the geometric mean of the conditional probabilities of trigrams. The conditional probability of a trigram is its probability divided by the probability of the bigram formed by its first two words
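The two trigram factors can be sketched directly from their descriptions, using the fact that the logarithm of a geometric mean equals the mean of the logarithms. The sketch assumes word trigrams, averages over trigram occurrences, and omits the final remap to [0,1]; none of these choices is confirmed by the descriptions. On very short texts the probability ratio can exceed 1, because trigram and bigram probabilities are normalized by different totals.

```python
import math
from collections import Counter


def ngrams(words, n):
    """All consecutive n-word tuples in the list."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def trigram_log_geo_mean(words):
    """Mean log-probability of the trigram occurrences in the text."""
    tri = ngrams(words, 3)
    if not tri:
        return 0.0
    counts = Counter(tri)
    total = len(tri)
    return sum(math.log(counts[t] / total) for t in tri) / total


def trigram_cond_log_geo_mean(words):
    """Mean log of P(trigram) / P(bigram of its first two words)."""
    tri, bi = ngrams(words, 3), ngrams(words, 2)
    if not tri:
        return 0.0
    tri_counts, bi_counts = Counter(tri), Counter(bi)
    logs = [
        math.log((tri_counts[t] / len(tri)) / (bi_counts[t[:2]] / len(bi)))
        for t in tri
    ]
    return sum(logs) / len(logs)
```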
Dopp D Owner P C T R
#373An analogue of the QueryDOwnerClicksPCTR factor, differs from it in that queries are normalized by doppelgangers (details of such normalization are at ((http://staff.yandex-team.ru/finder by Andrei Plakhov)), code -ysite/yandex/doppelgangers)
An analogue of QueryDOwnerClicksPCTR factor, differs from it in that queries are normalized by doppelgangers (details of such normalization are in ((http://staff.yandex-team.ru/finder Andrei Plakhov)), code -ysite/yandex/doppelgangers). Localized to relev_regions.web.txt
Dopp Url P C T R
#375An analogue of the QueryUrlClicksPCTR factor, differs from it in that queries are normalized by doppelgangers (details of such normalization are at ((http://staff.yandex-team.ru/finder by Andrei Plakhov)), code - ysite/yandex/doppelgangers)
An analogue of QueryUrlClicksPCTR factor, differs from it in that queries are normalized by doppelgangers (details of such normalization are in ((http://staff.yandex-team.ru/finder Andrei Plakhov)), code - ysite/yandex/doppelgangers). Localized to relev_regions.web.txt
Url B M25
#377BM25 by URL
Has Big Picture
#378There is a big picture on the page
Matrix Net
#379A MatrixNet formula is applied to all factors (TG_UNUSED - to prevent entering any formulas)
Dater Age
#380The difference between the current date and the document date, as determined by DaterAge. 1 means the document date is today; 0 means the document is 10 years old or more. If no date is determined, the value is 0. Note: ((1 - DaterAge)*60)^2 = page age in days.
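The relation ((1 - DaterAge)*60)^2 = page age in days can be inverted to see how the factor compresses age. The sketch below implements both directions; the function names are illustrative, and the clamp at 0 reflects the "10 years old or more" wording. A value of 0 is ambiguous, because it is also used when no date is determined.

```python
import math


def dater_age_to_days(dater_age: float) -> float:
    """Page age in days from a DaterAge value in [0, 1]."""
    return ((1.0 - dater_age) * 60.0) ** 2


def days_to_dater_age(age_days: float) -> float:
    """Inverse mapping; ages of 3600 days or more clamp to 0."""
    return max(0.0, 1.0 - math.sqrt(age_days) / 60.0)


# DaterAge = 0 corresponds to 60^2 = 3600 days, roughly ten years
oldest_days = dater_age_to_days(0.0)
# A page that is 900 days old gets DaterAge = 1 - 30/60 = 0.5
mid_value = days_to_dater_age(900)
```

Because of the square root, the factor changes quickly for fresh pages and slowly for old ones.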
hard pessimization (aka PR=0), binary factor, counts in anti-spam
C In Degree1
#382Host factors, determine link-stuffed sites - second and third inbound degrees ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=181rh58953-4#cindegree12 more info))
C In Degree2
#383Host factors, determine link-stuffed sites - second and third inbound degrees ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/antispam?v=181rh58953-4#cindegree12 more info))
Number of incoming links without Russian letters. Remapped.
Text Max Forms
#385Maximum number of forms over all query words: max over query words of number_form_for_word/64
Text Weighted Forms
#386Sum of the number of forms weighted by word weights: sum over all query words of number_form_for_word/64*word_weight, remapped as x/(1 + x).
Text Forms
#387Unweighted sum of the number of forms: sum over all query words of number_form_for_word/64/number_words_query
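The three text-forms formulas above can be written out as follows. This is a sketch under the assumption that the per-word form counts and word weights are already available as plain lists; the function names mirror the factor names but are not the production code.

```python
def text_max_forms(form_counts):
    """max over query words of number_form_for_word / 64."""
    return max(n / 64 for n in form_counts)


def text_weighted_forms(form_counts, word_weights):
    """Weighted sum of number_form_for_word / 64, remapped as x / (1 + x)."""
    x = sum(n / 64 * w for n, w in zip(form_counts, word_weights))
    return x / (1 + x)


def text_forms(form_counts):
    """Sum of number_form_for_word / 64, divided by the number of query words."""
    return sum(n / 64 for n in form_counts) / len(form_counts)


# Hypothetical two-word query: 64 and 32 matched forms, equal word weights
forms = [64, 32]
weights = [0.5, 0.5]
max_forms = text_max_forms(forms)   # 1.0
avg_forms = text_forms(forms)       # 0.75
weighted = text_weighted_forms(forms, weights)
```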
Link Max Forms
#388Maximum number of forms for all query words
Link Weighted Forms
#389Weighted by word weights, the sum of the number of forms
Link Forms
#390Unweighted sum of the number of forms
T R_ W1
#391Analogs of factors of the same name, word weight = 1
X L R_ W1
#392Analogs of factors of the same name, word weight = 1
Text B M25_ Fm_ W1
#393Analogs of factors of the same name, word weight = 1
Text B M25_ Sy_ W1
#394Analogs of factors of the same name, word weight = 1
Link B M25_ W1
#395Analogs of factors of the same name, word weight = 1
T L B M25_ W1
#396Analogs of factors of the same name, word weight = 1
Q Segments Breaks
#397Query segments are parts of a query that are themselves frequent queries. The factor shows how often the segments are broken in the text. 0: all words occur only within the designated segments; 1: all occurrences break segments
The value of the CMMatchTop5AvgMatch factor for the AliceMusic stream
Numerals Portion
#399Proportion of different parts of speech in the text. proportion of numerals (among all words in which we were able to recognize the part of speech)
Particles Portion
#400particle fraction
Adj Pronouns Portion
#401Proportion of pronominal adjectives
Adv Pronouns Portion
#402Proportion of pronominal adverbs
Verbs Portion
#403verb proportion
the proportion of words that can be both masculine and feminine nouns, but not neuter, among all nouns (examples: 'hummingbird' is an example of indefinite gender, which can be defined in two ways, 'Alexandra' is a homonym).
Link Quality Fixed
#405Quality of incoming links (Leschiner's classifier), corrected
Whether or not LinkQuality was counted for this page (not counted if there are few links) corrected
Incoming link quality classifier 2 corrected
Is Org
#408In the query, the name of the organization (example: Gazprom, Gazprom) ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares Description))
CMMatchTop5AvgMatchValue factor value for the AliceMusic stream
Longest Text
#410The size of the largest text segment of the page (from the [18] PureText factor)
Smart Ukrainian
#411
Smart Belorussian
#412
L R Without Rare
#413Link relevance without regard to rare words
Number of different internal links per page
A city is defined for the site
Query factors - result of the ((http://wiki.yandex-team.ru/PoiskovajaPlatforma/Lingvistika/ZaprosnyjeFactory/LocalizovannyjeZaprosy query geolocalization classifier)) - new version of factors [328]-[330]: U - not geo-relevant: regional results are meaningless for this query;
Query factors - result of the ((http://wiki.yandex-team.ru/PoiskovajaPlatforma/Lingvistika/ZaprosnyjeFactory/LocalizovannyjeZaprosy query geolocalization classifier)) - new version of factors [328]-[330]: R - geo-relevant: regional results in the output could be useful, but no more than that;
Query factors - result of the ((http://wiki.yandex-team.ru/PoiskovajaPlatforma/Lingvistika/ZaprosnyjeFactory/LocalizovannyjeZaprosy query geolocalization classifier)) - new version of factors [328]-[330]: V - geolocal: regional results are essential.
PerWordCMMaxPredictionMin factor value for the AliceMusic stream
Ukrain Page Rank
#420Ukrainian Page rank
Q Class Download
#421=1 - on Download formula. Class queries: download/view online/play/photo/listen
Q Class Brandnames
#422Query classifier result - the query has words from the appropriate dictionary. brand
Q Class Disease
#423medical dictionary
Q Class Kak
#424Question
Q Class Moscow
#425a request specific to Moscow
Q Class O A O
#426organization
Q Class Porno
#427porn
Q Class Travel
#428travels
Video Rating
#429The popularity of video, comes from video
Frequency of links to the site
Link Almost Period
#431Number of almost-periodic links
Q D Owner Stat Power
#432The number of impressions for the query, normalized as x/(100 + x).
Q Url Stat Power
#433The number of impressions of the URL for the query, normalized as x/(100 + x).
Has Li Ru Counter
#434LiveInternet counter
Popularity of the owner in queries
DSSM model with early binding, trained on reformulations, and pre-trained on ASR hypotheses of musical queries to Alice
Model trained on PRS-log pool on Bert's prediction trained on sinsig_ce with a threshold of 0.5, using a chain of regions to country
DSSM model with early binding, trained on reformulations and retrained on music requests to Alice
Eleven factors based on statistical properties of the distributions of incoming vertex degrees referring to a fixed vertex of the hostgraph.((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/hostdegree details))
The value of the pirate detector calculated in behemoth.
Yandex music canonized url type - album
Calculated as (10-x) where x is the return of the document in days (continuous) relative to the validity time of the document in the samovar
Host In Query
#443Document host recognized in query
Vital Host In Query
#444The URL consists only of the host that is recognized in the request
URL is a Yandex news story
URL feature computed from rapid clicks spy_log counters with decay of 1 day
URL feature computed from rapid clicks spy_log counters with decay of 1 day
URL feature computed from rapid clicks spy_log counters with decay of 0.5 days
URL feature computed from rapid clicks spy_log counters with decay of 0.5 days
Timestamp
#450They are calculated as (80 - x) / 80, where x is the age of the document in hours. The factors make sense only for the quickbot base (the last 80 hours). They are not used in ranking. They are used in reranking.
Add Time Full
#451They are calculated as (80 - x) / 80, where x is the age of the document in hours. The factors make sense only for the quickbot base (the last 80 hours). They are not used in ranking. They are used in reranking.
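The (80 - x) / 80 formula for Timestamp and AddTimeFull is simple enough to state in code. The sketch below adds a clamp to [0, 1] for documents outside the 80-hour window; the clamp and the function name are assumptions, since the description only says the factors make sense within the last 80 hours.

```python
def quick_freshness(age_hours: float, horizon_hours: float = 80.0) -> float:
    """(horizon - age) / horizon, clamped to [0, 1].

    The clamp is an assumption for ages outside the 80-hour window.
    """
    value = (horizon_hours - age_hours) / horizon_hours
    return min(1.0, max(0.0, value))


just_crawled = quick_freshness(0)    # 1.0
two_days_old = quick_freshness(48)   # 0.4
```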
Swbm25
#452A refined BM25 in a sliding window. The window size is set in sentences. "Jokers" are used for titles and the beginning of the document. Morphological proximity and text structure are taken into account. The weight of a window decays with its distance from the beginning of the document.
Factor about how good a snippet can get.
Txt Pair_ W1
#454Simple BM25 by word pairs: take all pairs of query words and count their occurrences in the document text. Weight = 1. Note: does not work if the query contains a stop word
Aura Doc Log Shared
#455The logarithm of the number of shingles on which a given document is not unique
Aura Doc Log Author
#456The logarithm of the number of shingles on which a given document owner is recognized as an author
Average weight of non-unique shingles of this document
Mascot feature MarketQualityRating
Medical host quality for new marks.
Medical host quality for new marks for experiments.
Fin Law Host Quality
#461Finance or law host quality for new marks.
Finance or law host quality for new marks for experiments.
Sos Host Quality
#463Finance or law host quality for new marks.
Finance or law host quality for new marks for experiments.
Factor for host in list of documentation cs hosts for experiments
Removed_466
#466
Reg Host Rank
#467It is calculated in the same way as HostRank factor, but not on the whole owner-graph, but on its subgraph consisting of owners of the given region. Region belonging is determined by TLD, or by presence in index of pages from given owner which geo or geoa classifier says that they are from given region. Mapped in the same way as the HostRank factor, to a number from 0 to 1 with 256 gradations
Reg Is Wiki
#468Document from the language section of wikipedia corresponding to the user region
Language Compliance
#469The language of the document corresponds to the language of the request
Country Popular Q
#470Popularity of the request within the country
Country Q Diversity
#471Degree of centralization of the points from which the request is made (within the country)
Country Q Diversity2
#472Geographical distribution of the request within the country
Country Hour
#473The hour in which this request is most frequently asked
The severity of querying at different times of the day (within the country)
Removed_475
#475
National Domain
#476The country of the document (domain) and the country of the user are the same ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/OpisanijaFaktorov#nationaldomain details))
Is Porno Advert
#477There's a porn ad on the page
URL feature computed from rapid clicks spy_log counters with decay of 3 days
Country localizability classifier - how much the query implies the country context
Num Slashes
#480Number of slashes in the url
BM25 with different parameters for different fields, including incoming anchortext. The text weights of incoming links to the page are normalized according to the delta page rank of the link
Watch Video
#482Built-in video player on the page
Download Video
#483Video for download
URL feature computed from rapid clicks spy_log counters with decay of 3 days
URL feature computed from rapid clicks spy_log counters with decay of 14 days
Sub Relevance
#486A service factor that was needed to search the site, and will still be needed in the future.
Gsk Url Model
#487The factor is calculated from the text of the url using the quality/seq/gsk sequence classifier
Url Trigrams
#488Model with learning each trigram on '+' and '-' urls. It does not depend on the query.
URL feature computed from rapid clicks spy_log counters with decay of 14 days
Rc Spylog Age
#490Age of rapid clicks spy_log update, in seconds
Rc Spylog Freshness
#491Freshness of rapid clicks spy_log update
Ymw Full
#492The size of the minimum chunk of text that includes all of the query words in the document. Not currently used. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/YMW more info))
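The quantity YmwFull is built on, the shortest chunk of text containing every query word, can be found with a standard two-pointer sliding window. The sketch below returns the length in words; how the factor normalizes or remaps that length is not described here, so the code stops at the raw window size. The function name and the exact-token matching are assumptions.

```python
from collections import Counter


def min_covering_window(doc_words, query_words):
    """Length in words of the shortest span containing every query word.

    Returns None if some query word never occurs in the document.
    """
    need = set(query_words)
    if not need:
        return None
    counts = Counter()
    covered = 0   # distinct query words currently inside the window
    best = None
    left = 0
    for right, word in enumerate(doc_words):
        if word in need:
            counts[word] += 1
            if counts[word] == 1:
                covered += 1
        # Shrink from the left while the window still covers the query
        while covered == len(need):
            size = right - left + 1
            if best is None or size < best:
                best = size
            left_word = doc_words[left]
            if left_word in need:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    covered -= 1
            left += 1
    return best
```

Each document word enters and leaves the window at most once, so the scan is linear in the document length.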
Bclm
#493Factor named after Buettcher, Clarke, and Lushman (modified) ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/BCLm more info))
A measure of the 'commerciality' of a query. It is a complex factor calculated by MatrixNet using a formula based on the purchasing dictionary in directe-mail + logs of user requests + additional intent dictionaries. For queries with an intent to buy the factor tends to 1; for product queries, to 0.6; for queries without an intent to buy (reviews, etc.), to 0. ((http://wiki.yandex-team.ru/AntonNeljubin/FaktorydljaNovogoKlassifikatorazaprosov factors classifier))((http://wiki.yandex-team.ru/JandeksPoisk/Antispam/AntiSEO/KlassifikatorKommercheskixZaprosov more about it))
Field L M
#495Unigram language model. The language model is built from the document and smoothed by the general language model. When building the document model, information about which field of the document the query word occurred in (title, head, or plain text) is used
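A query-likelihood score with a field-aware document model could look like the sketch below. Everything specific in it is an assumption made for illustration: the field weights, the linear (Jelinek-Mercer style) interpolation with the background model, the mixing coefficient, and the floor for words missing from the background model. The description above does not say which smoothing or weighting the factor actually uses.

```python
import math

# Hypothetical field weights: a hit in the title counts more than in plain text
FIELD_WEIGHTS = {"title": 3.0, "head": 2.0, "text": 1.0}


def field_lm_score(query, doc_fields, background, lam=0.5):
    """Log-probability of the query under a smoothed unigram document model.

    doc_fields: dict mapping field name to a list of words
    background: dict mapping word to its probability in the general model
    lam:        weight of the general model in the mixture (assumed value)
    """
    weighted_tf = {}
    total = 0.0
    for field, words in doc_fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for word in words:
            weighted_tf[word] = weighted_tf.get(word, 0.0) + weight
            total += weight
    score = 0.0
    for word in query:
        p_doc = weighted_tf.get(word, 0.0) / total if total else 0.0
        p_bg = background.get(word, 1e-9)  # small floor for unseen words
        score += math.log((1 - lam) * p_doc + lam * p_bg)
    return score


doc = {"title": ["cat"], "text": ["cat", "dog"]}
general = {"cat": 0.1, "dog": 0.1}
cat_score = field_lm_score(["cat"], doc, general)
dog_score = field_lm_score(["dog"], doc, general)
```

With these toy numbers the title hit makes "cat" score higher than "dog", even though both occur once in the plain text.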
Matching geography defined from document url and query city (ip or lr)
Matching geography defined from document url and query area (ip or lr)
Coincidence of geography, defined from document url and country of request (ip or lr). Relevant for Russia and Ukraine.
Match the geography defined from the document url and the city in the query (GeoCity rule)
The value of the forked commerce detector calculated in behemoth.
Title Trigrams Query
#501Calculates the query coverage by the letter trigrams of the document title
Title Trigrams Title
#502Calculates the header coverage by the alphabetic trigrams of the document title
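Trigram coverage of one string by another can be sketched as the share of the first string's letter trigrams that also occur in the second. The sketch assumes trigrams are taken within words, after lowercasing and dropping non-letters, and it treats TitleTrigramsTitle as the mirror image of TitleTrigramsQuery (title covered by query trigrams); the descriptions above do not state either point clearly.

```python
def letter_trigrams(text):
    """Set of 3-letter substrings taken inside each word of the text."""
    grams = set()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - 2):
            grams.add(letters[i:i + 3])
    return grams


def trigram_coverage(target, source):
    """Share of the target's trigrams that also occur in the source."""
    target_grams = letter_trigrams(target)
    if not target_grams:
        return 0.0
    return len(target_grams & letter_trigrams(source)) / len(target_grams)


query, title = "cats", "cat food"
query_covered = trigram_coverage(query, title)  # {cat, ats} vs title: 1 of 2
title_covered = trigram_coverage(title, query)  # {cat, foo, ood} vs query: 1 of 3
```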
Inlinks Model
#503Probabilistic model based on the texts of incoming links
Counts the sum of occurrences of the following: a sequence of query words longer than two occurring in one sentence; normalized to the length of the document.
Counts the sum of occurrences of the following form: a sequence of query words longer than two, occurring in one link; normalized to the number of links.
Owner Nav Quota
#506Share of clicks on navigation requests
Geo Relev Alien City
#507The result has a geo-reference that does not match the user's geography at the city level ([415]==1 && [215]==0)
Geovitability of the query for results from the user's region
Geovitability of the query for results not from the user's region
Host Reliability
#510the percentage of URLs that respond without errors
Dmoz Theme Match All
#511Matching thematic spectrum (by DMOZ) of the query and the document. The subject of the query is determined by ((http://wiki.yandex-team.ru/JandeksPoisk/ZarubezhnyjjInternet/DMOZqueryClassifier1 DMOZTheme wizard rule)) Document subject is determined by the automatic classifier
Matching thematic spectrum (by DMOZ) of the query and the document. Subject of the query is determined by the best result ((http://wiki.yandex-team.ru/JandeksPoisk/ZarubezhnyjjInternet/DMOZqueryClassifier1 DMOZTheme wizard rules)) Document subject is determined by automatic classifier
Mpsa
#513Estimates the minimum distance between pairs of query words, taking into account the distance of the pair from the beginning of the document (Minimal Pair Size with Attenuation). By pairs we mean all consecutive bigrams of query words. Thus, the number of pairs is equal to the number of words in the query reduced by 1. Accordingly, the factor makes sense for queries consisting of more than one word.((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/MPSA MPSA))
Bclm2
#514It differs from BCLm in that the weights of all words are counted equally. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/BCLm2 BCLm2))
Absolute P L M
#515Text relevance based on the language model, taking into account the absolute position. A 20-word window is moved over the text; for each window a language model is built (that is, a probability distribution over the words of the Russian language) and the probability of generating the query is calculated. The model is penalized for distance from the beginning of the document.
Page Region Coverage
#516
Page Region Size
#517Page region size
Freshness of rapid clicks spy_log update, calculated at the request time
Is Geo
#520Passes to the base searches, under the name isgeo, the maximum weight of a geo-object found in the query. A geo-object is an object of category Geo, Geo1, GeoAddr, GeoAddr1, LandMark, or LandMark1 (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares Details))
Is Music
#521Passes to the base searches, under the name ismusic, the maximum weight of an object of category Music or Music1 found in the query (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares Details))
Bclm Lite
#522A modification of the Bclm2 factor, lightened for use in FastRank. The main difference is that BclmLite does not use absolute word offsets relative to the beginning of the document. Instead the factor works with regular positions of the form <sentence_number, position_in_sentence>. The proximity between words is only taken into account within a sentence. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/BCLmLite BCLmLite))
Nearby Query
#523Results in the immediate vicinity ([pharmacies], [children's polyclinic]) are important when answering the query
City Query
#524When answering a query, the results within the city are important (the bulk of localizable queries)
Adm Query
#525When answering a query, the results from the user's area, region ([airport], [dairy]) are important
Num Links From M P
#526Number of incoming links from mordas
Ymw Full2
#527Corrected YmwFull. Only differs from previous version by behavior on 2-word queries. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/YMW more info))
Full Quorum
#528Binary factor, every word of the query is in the text or in the links
Aux C Text B M25
#529uses 'country aux tree' (auxqc)
Aux C Link B M25
#530uses 'country aux tree' (auxqc)
Soft404
#531Page - '404' (share of '404' tokens in relation to the total number of tokens on the page)
URL feature computed at the request time from rapid clicks spy_log counters with decay of 1 day
D B M25
#533BM25, in which the weight of the word is machine-like
Factor evaluates how query words are grouped with each other in the text of the document without regard to their order. ((http://wiki.yandex-team.ru/SergejjKrylov/QueryWordCohesionTR description))
nd/k normalized time to click
URL feature computed at the request time from rapid clicks spy_log counters with decay of 0.5 days
nd/k
w/k
o/i
selected formula
r_s4b/(r_k + 10)
Number of letters in the Aux segment
Number of gaps in the Aux segment
Number of commas in the Content segment
Is Shop
#545The page is a store. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/OpisanijaFaktorov#isshop description)). Not used (deprecated)
Aura Doc Log Origin
#547The logarithm of the number of shingles in the document added by the site host as original texts in ((http://wiki.yandex-team.ru/JandeksPoisk/Jekosistema/MarketingPR/Webmasters/plan/vtorcontect Originality plugin)). Not included in the formula, needed for re-ranking of duplicates
Average filtered number of sources of document authorship. Not included in the formula, needed for re-ranking of duplicates
((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/OpisanijaFaktorov#queryreftrigrams description))
((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/OpisanijaFaktorov#queryreftrigrams description))
Idf Variance
#551IDF variance of query words if there are text hits in the document (mixed query-text factor)
Url N Grams Model
#552UrlNGramsModel ranking factor in erf
National Language
#553The language of the document corresponds to the country of the request
Owner Is Commercial
#554
Locm
#558Word order in links.
The degree of diversity of queries clicked by this url is counted by region
nd/i
Filtration Segments
#561Proportion of query segments present in the text
The language of the document is one of the allowed for Turkey (Turkish, English, German, French, Arabic, Azeri) or the document has zero length. At the search stage it is calculated only for IsRealGeoLocal queries.
D B M25_2
#563A variation on the theme ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/DBM25 DBM25)), see ysite/yandex/relevance/dbm25.cpp
Geo Dispersion
#564Dispersion of document reference regions
Number of clicks on the owner and the number of clicks on the request more than 5
B M25 Fd P R Fixed
#566BM25FdPR with normalization to the average document length depending on the document language. ((http://wiki.yandex-team.ru/BM25FRework Test Results.))
Language Popularity
#567The popularity of the document language. A number from 0 to 1. ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/LanguagePopularity LanguagePopularity))
The sum of factors QueryDOwnerClicksFRC and BM25FdPRFixed with weights 0.358449 and 0.184922 respectively. The '565' in the factor name should not be taken literally, it is either a legacy or a typo.
The sum of factors 192 and 341 with weights of 0.298942 and 0.454625, respectively.
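The two combined factors above are plain weighted sums of two other factors. A minimal sketch, using the weights quoted for QueryDOwnerClicksFRC and BM25FdPRFixed; the input factor values are hypothetical and the helper name is illustrative.

```python
def weighted_factor_sum(values, weights):
    """Linear combination of factor values with fixed weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


# Weights from the description; the factor values 0.5 and 0.8 are made up
frc_bm25_weights = [0.358449, 0.184922]
combined = weighted_factor_sum([0.5, 0.8], frc_bm25_weights)
```

Since the weights sum to about 0.54, the combined value stays below 1 whenever both inputs lie in [0, 1].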
URL feature computed at the request time from rapid clicks spy_log counters with decay of 3 days
URL feature computed at the request time from rapid clicks spy_log counters with decay of 14 days
Tocm
#572Factor evaluates the difference between the positions of words in the header and the positions of words in the query
Lang Dispersion
#574Dispersion of languages in xmap
Has Misspell
#575There is a typo in the query
D B M30 Smerch
#576A variation on the theme ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/DBM25 DBM25)), see ysite/yandex/relevance/dbm25.cpp
The url is known to show up too often with very low relevance (by bert and/or by bm25)
Url Link Percent
#578The ratio of the number of incoming links whose text is a URL to the number of all incoming links
Dssm Bert Distill L2
#579A pool of PRS logs is tagged using Bert trained on sinsig. The dssm model is trained on this pool, using BaseRegionChain
The number of 'notbooks' in the url
Url Len2
#581URL length to within a character. Disabled in production.
Is Hub
#582Hubness of the page
Static Title Comm
#583Degree of commerciality of the page header. Not used (deprecated)
BM25 of the page title by its text
BM25 page title by the text of the links to it
Seo In Pay Links
#586Number of incoming seo-trash links between hosts
Static URL factor by search sessions for 1600 days calculated by mobile sessions. Average DwellTime, and DwellTime from session is truncated if more than 180 seconds
Static URL factor by search sessions for 1600 days calculated by mobile sessions. Probability that the click on the URL will be more than 120 seconds
Static URL factor by search sessions for 1600 days calculated by mobile sessions. The probability that the URL will not be clicked given that at least one URL below it is clicked.
Static URL factor by search sessions for 1600 days calculated by mobile sessions. Average DwellTime, and DwellTime from session is truncated if more than 3600 seconds. Localization to country level.
Static URL factor by search sessions for 1600 days calculated by mobile sessions. Average DwellTime, and DwellTime from session is truncated if more than 180 seconds. Localization to country level.
Hp Detector Predict
#592The value of the health detector calculated in behemoth.
Is Feed Listing
#593OffersBase feature for ecoboost.
Is Feed Main
#594OffersBase feature for ecoboost.
Is Feed Stratocaster
#595OffersBase feature for ecoboost.
Is Feed Any
#596OffersBase feature for ecoboost.
Share of unique title trigrams in the link trigrams
Share of unique link trigrams in the title trigrams
Trash Adv
#599The amount of advertising on the page
Metrika Url Visits
#600Similar to YabarUrlVisits
Url Geo Adms
#601The document URL corresponds to the user's region(s) ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/geo/RegNavQueries /JandeksPoisk/KachestvoPoiska/geo/RegNavQueries))
Url Geo City
#602The URL of the document corresponds to the user's city
Reg Nav Query
#603Regional navigational query - there are one or more navigational results in the user's region
Yabar Url Lc Ac
#604Number of sessions in which the url was the last, divided by the number of sessions in which the url appeared
The sum of the maximum SourceRank values for each incoming link, taking into account the uniqueness of the owner.
D B M35
#606BM25 by texts and links with special scales by level of matching (form, lemma, synonym)
T R L R Quorum Fm
#607Weight of query words that are in the text in exact form
T R L R Quorum Lemma
#608Weight of query words that are in the text with exact lemma
T R L R Quorum Syn
#609Weight of query words that are in the text
Is Hum
#610Passes to the base searches, under the name ishum, the maximum weight of an object of category Hum or Hum1 found in the query (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares#ishum Details))
Is Text
#611Passes to the base searches, under the name istext, the maximum weight of an object of category Text or Text1 found in the query (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares#istext Details))
Is Picture
#612Passes to the base searches, under the name ispicture, the maximum weight of an object of category Picture or Picture1 found in the query (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares#ispicture Details))
Max One
#613Returns, under the name wmaxone, the maximum degree of naming of the objects found in the query (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares#maxone More))
Min One
#614Returns, under the name wminone, the minimum degree of naming of the objects found in the query (see ((http://wiki.yandex-team.ru/AlekseySokirko/QueryObjects som's markup))). ((http://wiki.yandex-team.ru/ArsenGadzhikurbanov/Wares#minone More))
Oq Bm25 Str
#615Bm25 by query index for domAttr
Oq Bm25 Lem
#616Bm25 by query index for domAttr
Oq Bm25 Syn
#617Bm25 by query index for domAttr
Oq Bclm Weighted
#618BCLM by query index for domAttr
Oq Bclm Plain
#619BCLM by query index for owners
Links Alive
#620Assesses whether a document is 'live' in terms of new incoming links to it.
Small Window
#621Maximum sum of query word weights in a window of 50 words
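The SmallWindow description maps onto a fixed-size sliding window over the document words. The sketch below assumes that each distinct query word contributes its weight once per window, however many times it repeats; that reading, the function name, and the exact-token matching are assumptions.

```python
from collections import Counter


def small_window(doc_words, word_weights, window=50):
    """Maximum total weight of distinct query words inside any `window`-word span.

    word_weights: dict mapping query word to its weight
    """
    counts = Counter()
    current = 0.0
    best = 0.0
    for i, word in enumerate(doc_words):
        if word in word_weights:
            counts[word] += 1
            if counts[word] == 1:
                current += word_weights[word]
        # Drop the word that just slid out of the window on the left
        if i >= window:
            old = doc_words[i - window]
            if old in word_weights:
                counts[old] -= 1
                if counts[old] == 0:
                    current -= word_weights[old]
        best = max(best, current)
    return best


# Toy example with a 3-word window instead of 50
doc = ["a", "x", "x", "b", "a"]
weights = {"a": 0.5, "b": 0.25}
best_sum = small_window(doc, weights, window=3)  # window [x, b, a] wins
```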
Metrika Url Visitors
#622Similar to YabarUrlVisitors
Metrika Url Avg Time
#623Similar to YabarUrlAvgTime
The core audience of pages that have a Metrika counter
Share of clicks on this url among all clicks on similar requests
Regex Ctr
#626corrected CTR of this url for all similar queries
Clickability of the domain by bigrams (without taking into account thesaurus query extensions)
Dom Phrase Yabar Bi
#628Visits to the site from search engines by bigrams, according to Bar (without taking into account thesaurus query extensions)
Clickability of the host for the last word of the query (without taking into account thesaurus query extensions)
Host Has Feed Urls
#630OffersBase feature for ecoboost.
Is Feed Offer
#631OffersBase feature for ecoboost.
Host Ecom Kernel1
#632Business kernel.
Host Ecom Kernel2
#633Business kernel.
Host Ecom Kernel3
#634Business kernel.
URL feature computed at the request time from rapid clicks search counters with decay of 1 day
Syn Set Locm
#636A copy of the ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/Locm LOCM)) factor for ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/Synset synsets)).
Syn Set Link B M25
#637Copy of LinkBM25 factor for ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/Synset synsets)).
URL feature computed at the request time from rapid clicks search counters with decay of 30 days
Removed_639
#639
The most probable topic of the query, as defined by the ((http://wiki.yandex-team.ru/JandeksPoisk/ZarubezhnyjjInternet/DMOZqueryClassifier1 DMOZTheme wizard rule)); only the most popular topics are taken into account (but more of them than in the DmozQueryThemes factor). The factor contains the probability that the query matches the theme, with each theme assigned its own sub-interval of [0...1].
Dmoz Query Themes
#641Query topic, as defined by the ((http://wiki.yandex-team.ru/JandeksPoisk/ZarubezhnyjjInternet/DMOZqueryClassifier1 DMOZTheme wizard rule)); only a few of the most popular topics are taken into account.
0 or 1 depending on the presence of an explicit need_photo intent in the request from the variety
0 or 1 depending on whether the query has an explicit need_map intent of the variety
Long Query Syn
#644Factor is analogous to LongQuery (sum of query word idf), but with 'correct' synonyms: for each query word, the minimum idf (i.e. that of the most frequent form) among the word and its synonyms is used.
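A minimal sketch of the LongQuerySyn description above, assuming idf values and synonym sets are available as plain dictionaries. The function name, the `idf` mapping and the `synonyms` mapping are hypothetical illustrations, not the production interface.

```python
def long_query_syn(query_words, idf, synonyms):
    """Sum over query words of the minimum idf among the word and its synonyms.

    idf      -- hypothetical mapping word -> idf value
    synonyms -- hypothetical mapping word -> list of synonyms
    """
    total = 0.0
    for word in query_words:
        # The word itself competes with its synonyms; the most frequent
        # form (lowest idf) represents the word in the sum.
        candidates = [word] + list(synonyms.get(word, []))
        total += min(idf[c] for c in candidates)
    return total


idf = {"car": 2.0, "auto": 1.5, "red": 3.0}
synonyms = {"car": ["auto"]}
result = long_query_syn(["red", "car"], idf, synonyms)
```

Here "car" contributes the idf of its more frequent synonym "auto", while "red" has no synonyms and contributes its own idf.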
The url contains a token that matches the short name of the user's country. The factor counts only on the EU thread.
Turkey Page Rank
#646Personalized Turkish PageRank
Expected Found
#647Expected number of searches on the query
Share of unique trigrams of the footer fragment in the link trigrams
Share of unique link trigrams among the fragment of footer trigrams
Binary logarithm of the query probability by the erratum service language model
Url Is Market Offer
#651The url is an offer in the latest version of the marketplace base.
D B M40
#652A variation on the theme ((http://wiki.yandex-team.ru/JandeksPoisk/KachestvoPoiska/ObshayaFormula/TekushhieKomponenty/DBM25 DBM25)), see ysite/yandex/relevance/dbm25.cpp
Removed_653
#653B M25_0
#654BM25 variation
B M25_1
#655BM25 variation
B M25_0123
#656BM25 variation
'Fixed' clicks counted with RequestAggregateLib
'Fixed' clicks counted with RequestAggregateLib. Regional Version
Regional attendance of the url according to Yandex.Bar data
Average time of user's stay on the host in case of external (from another nonsearch site) access from a particular URL
Average 'depth' (number of hits within a host) of user's stay on the host during external (from another nonsearching site) visits from a particular URL
D B M Numbers
#662DBM separately by number
D B M Geo
#663DBM separately by geo-request objects
D B M Substantive
#664DBM separately for nouns
Avg Session Len
#665Average length of the logical session in which there was a request
Bclm (weighted) by texts from hops.
Yabar Url Downloads
#667bounce rate
Bocm
#668Assesses whether the positions of words in the document sentences match the positions of words in the query.
Host User Leakage
#669The churn rate of users from search after a visit to the site
Fio Match
#670The document contains the name from the request.
Is Index Page
#671This is index.(html/php/aspx?/...), without cgi parameters. Computed over all duplicates.
Is Index Page Soft
#672This is index.(html/php/aspx?/...), possibly with cgi parameters. Computed over all duplicates.
Is Owner
#673Whether the host is its own owner, conditionally Host == Owner(Host).
Min Path Len
#674Minimum PathAndQuery length over all semi-duplicates.
Regionalized version of XLerfGeoLRlogRelev factor (only links from the country of the request are taken)
Regionalized (only links from request country are taken) variant of XNonCommLerfNormLRlogRelev factor
Locm Cnt
#677Regionalized (only links from the country of the request are taken) variant of Locm factor
X L Rrelev Cnt
#678Regionalized (only links from the country of request are taken) variant of XLRrelev factor
Regionalized (only links from the country of request are taken) variant of XLerfLRrelev200 factor
Nav Linear
#680((http://wiki.yandex-team.ru/JandeksPoisk/Antispam/polunavigacionnyezaprosy#faktornavigacionnostiparyurl-zapros classifier)) of vital [query-url] pairs: the url is vital for the query if the value is >0.5
Rank Com Goodness
#681Classifier for commercial site evaluations
There is a direct link to the file on the document
The document has a link to filehosting
0 or 1 - whether the request matches the regular expressions from the ticket
0 or 1 - whether the request matches the regular expressions from the ticket
0 or 1 - whether the request matches the regular expressions from the ticket
Qr Tur
#687Predicting the proportion of "good" (at least with two different cities and frequency>=10) mentions of the query with geography in Turkey
The result of the lexical query classifier that predicts the probability of a click on the 3561 subject page
The result of the lexical query classifier that predicts the probability of a click on the 3973 subject page
Is Nav Mx Query
#690Query 'navigability' rank
Regional traffic from search engines for a particular query
Clicks on the urls shown in the output for queries that have gone to other search engines
Showing urls in the output for queries that have gone to other search engines
Classification of commerciality of the site
Host Is Market Offer
#695In the latest version of the base of the marketplace there are offers from this host.
Bclm Max
#696The proximity of the query words to the hardest word.
The url satisfies the regexp-expression defined in the prone
Has User Reviews
#698The document contains user feedback/comments
Share of clicks on a given url among all clicks on similar queries, country version, see ((http://wiki.yandex-team.ru/Development/Poisk/arcadia/indexregex indexregex))
Regex Ctr Reg
#700corrected CTR of this url for all similar queries, country version, see ((http://wiki.yandex-team.ru/Development/Poisk/arcadia/indexregex indexregex))
Found
#701Average amount found by query
Angle in the Depth Nodes space, counted by words only (Min for all)
D B M15 Wares
#703Classifier that approximates the quality of commercial sites based on user behavior data
Doc Create Month
#705Document creation time with month accuracy 1.0 -- current month, 0 -- 10 years ago and older. Temporarily disabled
Doc Update Month
#706Document update time with month accuracy 1.0 -- current month, 0 -- 10 years ago and older. Temporarily disabled
X L R Source Rank
#707X L R Main Page
#708The year distribution likelihood function in the document. Temporarily disabled
Host Num Sovetnik
#710Num of Sovetnik urls
Lcm Var
#711The variance of the number of query words in the links.
The arithmetic average of date positions in the document. Temporarily disabled
D B M15 Wares2
#713Cabm
#714BM with fading in the text of the catalog references.
Average url position by normalized query
Average domAttr position by normalized query
Beast Url Mean Pos
#717Average url position for all queries
Beast Host Mean Pos
#718Average host position for all queries
Number of requests per url
Number of requests per host
implementation of the algorithm described in the article ((http://wiki.yandex-team.ru//h.yandex.net/?http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fpeople%2Ftyliu%2Ffp032-liu.pdf http://research.microsoft.com/en-us/people/tyliu/fp032-liu.pdf)) by major regions (TRUBK)
Removed_722
#722Proportion of document words from segments with score > 2.
Total Dups
#725Rank Boost Goodness
#726Site quality rank used for Moscow commercial formula boosts
Factor is used in SelectionRank. TG_UNUSED: should not be included in formulas to avoid feedback
URL feature computed at the request time from rapid clicks search counters with decay of 3 days
Comm Rus
#737Weight of the document according to the one-word dictionary of commercial vocabulary
Wiki Link Count
#738Ukr Is Query Lang
#741Shows that the request is in Ukrainian
Queries Avg C M2
#742Average query commerciality
Qi Query Count
#743Number of queries in the group of frequency queries similar to the specified one
FRC group of frequency queries similar to the specified one, with averaging through the sum of clicks and impressions
FRC group of frequency queries similar to the specified one, with averaging through the sum of clicks and impressions, according to regional statistics
URL feature computed from rapid clicks search frozen counters with decay of 1 day
Word Host Wiki Sum
#747The relative popularity of the word-host pair, where word is the word in the title of the Wikipedia article and host is the host referenced in the article.
Relative clickability of countryId-word-host triplets according to Yandex searches.
Relative clickability of countryId-word-host triplets according to data from popular search engines, based on Bar and SimilarGroup logs.
Share of clicks on this url among all clicks on similar queries, calculated by popular search engine
Petal length Depth Nodes, calculated for hosts
Angle variance in Nodes Time space, calculated for hosts
0.9-quantile of the lobe length in Nodes Time space, calculated for hosts
Average by the words of the query the probability of downloading a file from the host after a click.
Nasty Content
#755The nastiness factor of content.
CTR by click data, request normalized by synset
Regional CTR by click data, query normalized by synset
Static trigram intersection of the url and the queries by which users visited the url.
Adv Aspam
#759Has Porno Query
#760The result of the wizard's rule.
Q U Bm15 Weighted
#761Weighted BM15 for a query by index document - a list of queries to which it has been navigated.
Probability of downloading from the host after the click (according to Bar logs).
Number of chains by request / (number of chains in which the url participated + number of chains by request).
N Hop Is Final
#766The number of chains in which the url was last, normalized to the total number of chains in which the url was.
Visits From Wiki
#767Number of hits to the Wikipedia url
URL feature computed from rapid clicks search frozen counters with decay of 30 days
Reg Browser User Hub
#769Indicator of the page as a hub (how many pages Bar users go to from it).
Aux Title B M25
#770Computes TextBM25 in the title against the text of the user's region name - similar to factor 268.
Bclmf
#771BCLM for Annotation index, doc text and links.
Dssm probability prediction by url + title that there are no products on the page.
FRC of a popular search engine by browser logs
Log Ctr Mean
#774Weighted mean of log(query_clicks)/log(query_shows) for given host. Weights are proportional to log(query_shows) + 0.2.
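The LogCtrMean description above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the `(clicks, shows)` input layout are hypothetical, natural logarithms are assumed, and queries with fewer than 1 click or fewer than 2 shows are skipped because the ratio of logarithms is undefined there. The source does not say how such cases are handled.

```python
import math


def log_ctr_mean(queries):
    """Weighted mean of log(clicks)/log(shows) over a host's queries.

    queries -- hypothetical iterable of (query_clicks, query_shows) pairs.
    Each weight is log(query_shows) + 0.2, as in the factor description.
    """
    numerator = 0.0
    denominator = 0.0
    for clicks, shows in queries:
        # Assumption: skip queries where log(clicks) is undefined
        # or log(shows) is zero.
        if clicks < 1 or shows < 2:
            continue
        weight = math.log(shows) + 0.2
        numerator += weight * math.log(clicks) / math.log(shows)
        denominator += weight
    # Assumption: return 0.0 when no query qualifies.
    return numerator / denominator if denominator else 0.0


value = log_ctr_mean([(100, 100), (10, 100)])
```

Because both queries in the usage line have the same number of shows, they receive equal weights and the result is the plain average of the two log ratios.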
The number of hits on the url occurring in the chain of hops, normalized to the total number of hits on the request.
The probability of the url being the last on the request in the chain of hops.
Dssm probability prediction by url + title that there is one product on the page.
Dssm probability prediction by url + title that there are many products on the page.
URL feature computed from rapid clicks search frozen counters with decay of 3 days
The geo-referencing of the city level is defined for the url according to BUKI-1125 rules
Country level geo-referencing is defined for the url according to BUKI-1125 rules
Factor GeoRelevRegionCity by geoa attribute
Factor GeoRelevRegionRegion by attribute geoa
GeoGeometryProxim factor by geoa attribute
GeoRelevAlienCity factor by geoa attribute
GeoVQueryInUserCity factor by geoa attribute
GeoVQueryInAlienCity factor by geoa attribute
Page Region Size Geo
#788Factor PageRegionSize by attribute geo
PageRegionCoverage factor by geo attribute
The PageRegionCoverage factor by the adresa attribute
Factor GeoRelevRegionCity by attribute adresa
The fraction (averaged over sessions) of the urls clicked for this query that this url accounts for. Calculated from user sessions.
Owner Is Actual Shop
#793Owner is a store
Owner Is Service
#794Owner is a service
Bclm (plain) by the texts from the hops.
FRC on transitions from queries that were set by the user several times
Average weight of impressions on the first page; a click weighs 1, a non-click weighs 1 according to the SBM_GAMMAS table
Average weight of impressions on the first page; click weights 1, non-click weights 1 according to the SBM_GAMMAS table. Regional version
the half sum of the evaluation of the url position with the median position for all similar queries by bist
Host feature computed at the request time from rapid clicks spy_log counters with decay of 3 days
Host feature computed at the request time from rapid clicks spy_log counters with decay of 3 days
Host feature computed at the request time from rapid clicks spy_log counters with decay of 14 days
Host feature computed at the request time from rapid clicks spy_log counters with decay of 14 days
Host feature computed at the request time from rapid clicks spy_log counters with decay of 3 days
Host feature computed at the request time from rapid clicks spy_log counters with decay of 14 days
Host feature computed from rapid clicks spy_log counters with decay of 3 days
Host feature computed from rapid clicks spy_log counters with decay of 3 days
Host feature computed from rapid clicks spy_log counters with decay of 14 days
Host feature computed from rapid clicks spy_log counters with decay of 14 days
Host feature computed from rapid clicks spy_log counters with decay of 3 days
Host feature computed from rapid clicks spy_log counters with decay of 14 days
Finetuned reformulations DSSM to commercial clicked bargain odd-like target from visit log
Distributor Hosts
#813Is video distributor legal
Average value of feature OneProductProbability
Average value of feature ManyProductsProbability
Average value of feature PayDetectorPredict
Owner Is Partner
#817Owner is a partner
Shop In Shop Url
#818The document is ShopInShop
The value of the conversion rate of the query calculated in behemoth.
Factor by name from the original query. Computed from the contents of the document. Algorithm: Chain0Wcm.
At least one of the offers from the distributed scheme has a status of availability.
There is not a single offer in the unraveled scheme.
Bad Ytier Url
#823For the url from ytier it is known that it has low quality content
Norm Ytier Url
#824For the url from ytier we know that its content is of acceptable quality
Good Ytier Url
#825For the url from ytier it is known that it has good quality content
Best Ytier Url
#826For the url from ytier it is known that it has excellent quality content
On the host there is a purchase on the EUOM.
There is a VISIT LOG purchase on the host.
The URL is a product on the Marketplace.
The URL is a product on the Marketplace and has an offerid.
The URL is ShopInShopCPA.
At least one of the offers from the distributed scheme has the status of unavailability.
There is a purchase on the EUOM.
There is a VISIT LOG purchase on the owner.
Nav Parasites
#835Dssm probability prediction by url + title that the document is a sponger.
The PartnerOfferContent available field in the new parser.
Offer Availability
#837In the offer from the new parser, the field PartnerOfferContent available == true.
Normalized corrected clicks count by query with user's city(gc=) mentioned
Normalized corrected clicks maximum ratio by query with user's city(gc=) mentioned
Normalized corrected clicks maximum ratio by query with not user's city(gc=) mentioned
Fast Mx
#841The value of PurchaseTotalPredict calculated in the behemoth.
The value of SerpSummarySurplusPredict calculated in behemoth.
Yabar Url Revisits
#844User return rate to the url
The value of RequestWith120D3ClickPartPredict calculated in behemoth.
Value of the query detector of the spongers calculated in the behemoth.
Logarithm of the average time a user was on a host with localization by country; calculated from Yabar logs
Ratio of dwell time on a host in a given region to dwell time on a host in all regions
Ratio of dwell time on the page in the given region to the dwell time on the page for all regions
The more users add to bookmarks a url, the more factor value it has
Sos Dssm
#851Predicting sos.dssm model by url + title.
Med Dssm
#852Predicting med.dssm model by url + title.
Fin Law Dssm
#853Predicting fin_law.dssm model by url + title.
Wiki Infobox
#854This url has a link from Infoboxes on Wikipedia.
Cruelty Dssm
#855Predicting cruelty.dssm model by url + title.
Half Ecom Predict
#856The value of HalfEcomPredict, calculated in behemoth.
A factor similar to RegexMaxClickPercentReg, but calculated by prefix-suffix generalization.
A factor similar to RegexMaxClickPercentYabarReg, but calculated by prefix-suffix generalization.
Dssm Navigation L2
#859A request-document model of navigability.
Average slope angle in the vertex-hanging plane
QueryUrl factor. Value - result of collaborative data filtering for the QueryUrlCorrectedCtr factor
Full Matrix Net
#862The value of the MatrixNet slow ranking model.
Fast Matrix Net
#863The value of the MatrixNet fast ranking model.
Filter Matrix Net
#864The value of the MatrixNet filter model.
Factor in the text of the query and the title of the document, assessing the correspondence of the numeric ranges at the marker words
Full Polynom
#867The Polynom value of the slow ranking model.
Fast Polynom
#868The value of Polynom fast ranking model.
Filter Polynom
#869The value of the Polynom filter ranking model.
An indication that the document was received by machine translation
Med Dssm With Trash
#871Predicting med_with_trash.dssm (medical document model with trash mixed into the training set) model by url + title.
Predicting fin_law_with_trash.dssm (fin_law_with_trash.dssm) model by url + title.
Factor by name from the original query. Computed from the content of the document. Minimum window size that includes all the words of the query. Normalized by the number of words in the query.
Factor by name from the original query. Document text. CosineMatchMaxPrediction algorithm.
Factor by all names from the original query. Aggregation by all extensions. Aggregation type by extensions: largest factor value. Computed from the document content. Algorithm: Chain0Wcm.
Factor by all names from the original query. Aggregation by all extensions. Aggregation type by extensions: largest factor value. Computed from the document content. Minimum window size that includes all query words. Normalized by the number of words in the query.
Share of the url in the total number of clicks per session on the request (synnorm).
The average share of clicks on this url for this query among all clicks on this query (synnorm) during the day.
The average share of clicks on this url for this query among all clicks on this query (qnorm) during the day.
QI version of factor 861. MaxValue over the set of popular similar queries.
QI version of factor 798. MaxValue over the set of popular similar queries.
Factor by all names from the original query. Aggregation by all extensions. Aggregation type by extensions: largest factor value. Document text. CosineMatchMaxPrediction algorithm.
Dssm Page Quality
#883Dssm, predicting page quality score for a document
Has Turbo Ecom
#884Memorandum Url Type
#885Query-url factor. The value is the result of collaborative data filtering for the SamplePeriodDayFrc factor
The value of the MatrixNet fast filter model.
Fast Filter Polynom
#888The value of Polynom fast filter ranking model.
QI version of factor 879.
Meta Matrix Net
#890The value of MatrixNet on the meta.
Meta Polynom
#891The value of Polynom on the meta.
Short Video
#892The document is a short video (TikTok, Reels, Shorts).
The document is a Telegram channel in web format.
Telegram Post
#894The document is a post in Telegram.
CorrectedCtrReg factor in the annotation index, AnnotationMatchPrediction factor
CorrectedCtrReg factor in the annotation index, QueryMatchPrediction factor
CorrectedCtrReg factor in the annotation index, ValueWcmAvg factor
CorrectedCtrReg factor in the annotation index, factor Bm15V4K5
Is Not Cgi
#899Factor about presence of '?' symbol in url. Equals zero if url has cgi parameters (more precisely: all duplicates have '?' symbol in url).
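The IsNotCgi rule above can be illustrated with a short sketch. The function name and the list-of-duplicates input are hypothetical, and the result for a url without any known duplicates is an assumption rather than something the description states.

```python
def is_not_cgi(duplicate_urls):
    """Return 0 if every duplicate of the url contains '?', otherwise 1.

    duplicate_urls -- hypothetical list of all duplicate urls of a document.
    Assumption: an empty list is treated as "no cgi parameters" (returns 1).
    """
    if duplicate_urls and all("?" in url for url in duplicate_urls):
        return 0
    return 1


flag = is_not_cgi(["http://example.com/a?id=1", "http://example.com/a"])
```

In the usage line one duplicate has no '?' symbol, so the factor stays at 1 even though another duplicate carries cgi parameters.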
Alice Click Dssm
#900DSSM click prediction from Alice-specific data
Factor by phone attributes tel_full from the original query Text document. Bocm15 word weight aggregation algorithm. The normalization coefficient is 0.01.
Removed_902
#902SamplePeriodDayFrc factor in the annotation index, QueryMatchPrediction factor
SamplePeriodDayFrc factor in the annotation index, AnnotationMatchPrediction factor
OneClick factor in the annotation index, QueryMatchPrediction factor
OneClick factor in the annotation index, AnnotationMatchPrediction factor
One Click Bm15 A K4
#907OneClick factor in the annotation index, factor Bm15AK4
OneClick factor in the annotation index, factor BocmWeightedW1K3
LongClick factor in the annotation index, QueryMatchPrediction factor
LongClick factor in the annotation index, AnnotationMatchPrediction factor
Long Click Bm15 A K4
#911LongClick factor in the annotation index, factor Bm15AK4
LongClick factor in the annotation index, factor BocmWeightedW1K3
SplitDwellTime factor in the annotation index, QueryMatchPrediction factor
SplitDwellTime factor in the annotation index, AnnotationMatchPrediction factor
BQPR factor in the annotation index, QueryMatchPrediction factor
BQPR factor in the annotation index, AnnotationMatchPrediction factor
YabarVisits factor in the annotation index, QueryMatchPrediction factor
YabarVisits factor in the annotation index, AnnotationMatchPrediction factor
YabarTime factor in the annotation index, QueryMatchPrediction factor
YabarTime factor in the annotation index, AnnotationMatchPrediction factor
SimpleClick factor in the annotation index, QueryMatchPrediction factor
SimpleClick factor in the annotation index, AnnotationMatchPrediction factor
LongClick factor in the annotation index, BocmPlain factor
Collaborative filtering result for factor FI_DBM35 from random log in annotation index, FullMatchPrediction factor
Collaborative filtering result for factor FI_DBM35 from random log in the annotation index, factor AnnotationMatchPrediction
OneClick factor in the annotation index, SynonymMatchPrediction factor
OneClick factor in the annotation index, FullMatchPrediction factor
OneClick factor in the annotation index, ValueWcmAvg factor
OneClick factor in the annotation index, BocmWeightedMaxK1 factor
OneClick factor in the annotation index, factor Bm15StrictK2
OneClick factor in the annotation index, factor Bm15MaxK3
OneClick factor in the annotation index, factor BclmPlainW1K3
OneClick factor in the annotation index, ValueWcmMax factor
OneClick factor in the annotation index, ValueWcmPrediction factor
OneClick factor in the annotation index, BclmWeightedK3 factor
BQPR factor in the annotation index, factor BocmWeightedW1K3
BQPR factor in the annotation index, factor Bm15StrictK2
SplitDwellTime factor in the annotation index, factor BocmWeightedMaxK1
SplitDwellTime factor in the annotation index, FullMatchPrediction factor
SplitDwellTime factor in the annotation index, ValueWcmAvg factor
CorrectedCtrReg factor in the annotation index, factor Bm15StrictK2
Predicting the proportion of queries with geography by the bag of words built for a query
Is Exact Url
#943The query is a url, exact up to dots and space characters; the wizard's isurl rule is used
Collaborative filtering result for factor FI_DBM35 from random log in annotation index, factor ValueWcmMax
Collaborative filtering result for factor FI_DBM35 from random log in the annotation index, factor ValueWcmAvg
Collaborative filtering result for factor FI_DBM35 from random log in annotation index, factor Bm15StrictK2
Collaborative filtering result for factor FI_DBM35 from random log in the annotation index, factor BclmPlainW1K3
Collaborative filtering result for factor FI_DBM35 from random log in annotation index, factor BclmWeightedK3
Collaborative filtering result for factor FI_DBM35 from random log in annotation index, factor BocmWeightedW1K3
CorrectedCtrXfactor in the annotation index, AnnotationMatchPrediction factor
CorrectedCtrXfactor in the annotation index, QueryMatchPrediction factor
CorrectedCtrXfactor in the annotation index, ValueWcmMax factor
CorrectedCtrXfactor in the annotation index, ValueWcmAvg factor
CorrectedCtrXfactor in the annotation index, BocmWeightedW1K3 factor
CorrectedCtrXfactor in the annotation index, factor BclmPlainK3
CorrectedCtrXfactor in the annotation index, factor BclmMixPlainW1K1
Predicting the total time spent up to the end of the session if this request-document pair is implemented
Alice Timespent
#958Predicting the contribution of this query-document pair to the timespent
SamplePeriodDayFrc factor in the annotation index, ValueWcmAvg factor
SamplePeriodDayFrc factor in the annotation index, factor Bm15MaxK3
SamplePeriodDayFrc factor in the annotation index, factor BocmWeightedK3
SamplePeriodDayFrc factor in the annotation index, factor BocmDoubleK5
SplitDwellTime factor in the annotation index, factor Bm15MaxK3
SimpleClick factor in the annotation index, factor BclmWeightedK3
Predicting the percentage of track length that will be played if this request-track pair is implemented
The probability that the region predicted by the yweb/robot/urlgeo_ml model is correct, assuming the predicted city
PopularSEFRCBrowser factor in the annotation index, AnnotationMatchPrediction factor
PopularSEFRCBrowser factor in the annotation index, SynonymMatchPrediction factor
PopularSEFRCBrowser factor in annotation index, ValueWcmPrediction factor
PopularSEFRCBrowser factor in the annotation index, factor BclmWeightedV2K3
PopularSEFRCBrowser factor in the annotation index, factor BclmMixPlainW1K1
Calculated by the link index. Max(sum(idf)) over all links that are subsets of query / sum(idf) for query
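The link-index formula above, Max(sum(idf)) over all links that are subsets of the query divided by sum(idf) for the query, can be sketched like this. The function name, the representation of links as word lists and the `idf` dictionary are hypothetical. Treating empty links as non-matching is also an assumption.

```python
def link_idf_coverage(query_words, links, idf):
    """Best idf mass of a link fully contained in the query, over query idf.

    links -- hypothetical list of link texts, each given as a list of words.
    idf   -- hypothetical mapping word -> idf value (covers all query words).
    """
    query = set(query_words)
    query_idf = sum(idf[w] for w in query)
    if query_idf == 0:
        return 0.0
    best = 0.0
    for link_words in links:
        link = set(link_words)
        # Only links whose words all occur in the query are considered.
        if link and link <= query:
            best = max(best, sum(idf[w] for w in link))
    return best / query_idf


idf = {"buy": 1.0, "red": 2.0, "car": 3.0}
links = [["red", "car"], ["car", "blue"], ["buy"]]
coverage = link_idf_coverage(["buy", "red", "car"], links, idf)
```

The link "car blue" is ignored because "blue" is not a query word. The link "red car" is the best subset and covers 5 of the 6 idf units of the query.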
OneClick factor in the annotation index, AnnotationMatchPredictionWeighted factor
LongClick factor in the annotation index, AnnotationMatchPredictionWeighted factor
YabarTime factor in the annotation index, AnnotationMatchPredictionWeighted factor
Page Has Maps Api
#976Equals one if the page includes the js-api of any geo-data provider
LongClickSamplePeriod factor in the annotation index, AnnotationMatchPrediction factor
LongClickSamplePeriod factor in the annotation index, QueryMatchPrediction factor
LongClickSamplePeriod factor in the annotation index, ValueWcmAvg factor
LongClickSamplePeriod factor in the annotation index, ValueWcmPrediction factor
LongClickSamplePeriod factor in the annotation index, factor BclmPlainW1K3
LongClickSamplePeriod factor in the annotation index, factor BclmWeightedK3
LongClickSamplePeriod factor in the annotation index, factor BocmWeightedW1K3
LongClickSamplePeriod factor in the annotation index, factor BclmPlainK5
LongClickSamplePeriod factor in the annotation index, factor BclmWeightedV2K3
LongClickSamplePeriod factor in the annotation index, factor BocmDoubleK5
LongClickSamplePeriod factor in the annotation index, factor Bm15StrictK2
Normalized corrected clicks maximum ratio by query with user's city(gc=) mentioned equal by region
Normalized corrected clicks maximum ratio by query with user's city(gc=) mentioned equal to user's region
BQPR on the sampled period. Annotation Index. Factor WcmCoverageMax
BQPR on the sampled period. Annotation Index. FullMatchPrediction Factor
BQPR on the sampled period. Annotation Index. Factor AnnotationMatchPredictionWeighted
BQPR on the sampled period. Annotation Index. Factor ValuePcmAvg
BQPR on the sampled period. Annotation Index. Factor ValueWcmAvg
BQPR on the sampled period. Annotation Index. Factor Bm15V4K8
BQPR on the sampled period. Annotation Index. Factor BocmWeightedV4K8
BQPR on the sampled period. Annotation Index. SampleWcmMax factor
BQPR on the sampled period. Annotation Index. SynonymMatchPrediction factor
BQPR on the sampled period. Annotation Index. AnnotationMatchPrediction factor
BQPR on the sampled period. Annotation Index. Factor SuffixMatchCount
BQPR on the sampled period. Annotation Index. Factor WcmCoveragePrediction
DoubleFrc in the annotation index, FullMatchPrediction factor
DoubleFrc in the annotation index, SynonymMatchPrediction factor
DoubleFrc in the annotation index, AnnotationMatchPrediction factor
DoubleFrc in the annotation index, AnnotationMatchPredictionWeighted factor
DoubleFrc in the annotation index, QueryMatchPrediction factor
Double Frc Value Wcm Avg
#1007DoubleFrc in the annotation index, ValueWcmAvg factor
DoubleFrc in the annotation index, factor BocmWeightedMaxK1
Double Frc Bm15 V4 K5
#1009DoubleFrc in the annotation index, factor Bm15V4K5
DoubleFrc in the annotation index, factor BocmWeightedV4K5
DoubleFrc in the annotation index, factor BocmDoubleK1
R E M O V E D_1012
#1012R E M O V E D_1013
#1013R E M O V E D_1014
#1014R E M O V E D_1015
#1015R E M O V E D_1016
#1016R E M O V E D_1017
#1017R E M O V E D_1018
#1018R E M O V E D_1019
#1019R E M O V E D_1020
#1020R E M O V E D_1021
#1021R E M O V E D_1022
#1022R E M O V E D_1023
#1023Xf Dt Show All Min W
#1024Linguistic Boosting Factor. Extension type: XfDtShow. Factor: minimum extension weight.
Linguistic Boosting Factor. Type of extensions: XfDtShow. Factor: Bm15 by stream group 2. Maximum value of the factor by extensions.
Linguistic Boosting Factor. Type of extensions: XfDtShow. Factor: BclmWeightedFLogW0 by stream group 3. Maximum value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: Bm15FLogW0 by url and title. Maximal value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: CosineMaxMatchPrediction by text and title. Maximal value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: Bm15 by url. Maximal value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: FullMatchValue by stream LongClickSP. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: FullMatchValue by OneClick stream. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Type of extensions: XfDtShow. Factor: Bm15FLog by stream group 1. Weighted average of the factor values multiplied by the weight (\frac{\sum_i W_i \cdot (W_i \cdot F_i)}{\sum_i W_i}) by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: Bm15FLogW0 by url and title. Weighted average of factor values multiplied by weight (\frac{\sum_i W_i \cdot (W_i \cdot F_i)}{\sum_i W_i}) by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: MinWindowSize by text. Weighted average of the factor values by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: bag OriginalRequestFractionExact by the stream group for bag factors (text, title, annotation streams).
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: bag CosineMaxMatchPrediction by stream LongClickSP.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: bag CosineMatchWeightedValue by stream LongClickSP.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: bag AnnotationMatchAvgValue by stream SimpleClick.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: bag CosineMaxMatch by title.
Linguistic Boosting Factor. Type of extensions: XfDtShow. Factor: BclmWeightedFLogW0 by stream group 3. Minimum weighted value of the factor by extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: AnnotationMatchWeightedValue by stream LongClickSP. Minimum weighted value of the factor by extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: AnnotationMatchWeightedValue by stream LongClickSP. Minimum weighted value of the factor on the extension top normalized to the maximum weight on the extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: Chain0WCM by text. Weighted average of the factor values multiplied by the weight (\frac{\sum_i W_i \cdot (W_i \cdot F_i)}{\sum_i W_i}) of the extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: FullMatchValue by stream LongClickSP. Weighted average of the factor values multiplied by the weight (\frac{\sum_i W_i \cdot (W_i \cdot F_i)}{\sum_i W_i}) of the extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: FullMatchValue by stream OneClick. Weighted average of the factor values multiplied by the weight (\frac{\sum_i W_i \cdot (W_i \cdot F_i)}{\sum_i W_i}) of the extension top.
Linguistic Boosting Factor. Type of extensions: XfDtShow. Factor: BclmWeightedFLogW0 by stream group 3. Weighted average of the factor values by the extension top.
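Several of the Linguistic Boosting entries above aggregate per-extension values as a weighted average of the factor values multiplied by the weight, i.e. the sum of W_i * (W_i * F_i) divided by the sum of W_i. A minimal sketch of that aggregation follows. The function name and the list inputs are hypothetical, and returning 0.0 for an empty or zero-weight set of extensions is an assumption.

```python
def weighted_avg_of_weighted_values(weights, values):
    """Compute sum(W_i * (W_i * F_i)) / sum(W_i) over query extensions.

    weights -- hypothetical list of extension weights W_i
    values  -- hypothetical list of per-extension factor values F_i
    """
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: no extensions (or all-zero weights) yields 0.0.
        return 0.0
    weighted_sum = sum(w * (w * f) for w, f in zip(weights, values))
    return weighted_sum / total_weight


aggregated = weighted_avg_of_weighted_values([1.0, 0.5], [2.0, 4.0])
```

Each weight enters the numerator twice, so low-weight extensions are penalized more strongly than in an ordinary weighted average.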
OneClickFrc counted by the sampled period and collaboratively extended, FullMatchPrediction factor
OneClickFrc counted by the sampled period and collaboratively extended, AnnotationMatchPredictionWeighted factor
OneClickFrc calculated from the sampled period and collaboratively extended, ValueWcmAvg factor
OneClickFrc calculated from the sampled period and collaboratively extended, WcmMax factor
OneClickFrc calculated from the sampled period and collaboratively extended, WcmCoveragePrediction factor
OneClickFrc calculated from the sampled period and collaboratively extended, WcmCoverageMax factor
OneClickFrc calculated from the sampled period and collaboratively extended, PcmMax factor
OneClickFrc counted by the sampled period and collaboratively extended, PrefixMatchCount factor
OneClickFrc counted by the sampled period and collaboratively extended, SuffixMatchCount factor
OneClickFrc calculated from the sampled period and collaboratively extended, factor Bm15V0W1K1
Is Local Probability
#1057The value of the locality classifier for the query
Is Relev Locale R U
#1058relev_locale == ru
Is Relev Locale U A
#1059relev_locale == ua
Is Relev Locale B Y
#1060relev_locale == by
Is Relev Locale K Z
#1061relev_locale == kz
Is Relev Locale T R
#1062relev_locale == tr
relev_locale == world
Q Class Porno Vw
#1064Porn query classification result from Wizard (iad_vw flag, based on Vowpal Wabbit)
Full Url Fraction
#1065Coverage of the URL by trigrams from the query. Analogue of the UrlDomainFraction and UrlPathAndParamsFraction factors.
QueryDwellTime, FullMatchPrediction factor
QueryDwellTime, SynonymMatchPrediction factor
QueryDwellTime, factor AnnotationMatchPrediction
QueryDwellTime, factor AnnotationMatchPredictionWeighted
QueryDwellTime, QueryMatchPrediction factor
QueryDwellTime, ValueWcmAvg factor
QueryDwellTime, factor BclmPlainW1K3
QueryDwellTime, factor Bm15CoverageV4K3
QueryDwellTime, factor BclmPlainK4
QueryDwellTime, factor BocmWeightedV4K5
More90 Sec Visits Share
#1076Percentage of visits for which the dwell time on the host during the day is more than 90 sec.
More160 Sec Visits Share
#1077Percentage of visits for which the dwell time on the host during the day is more than 160 sec.
Rank Hacked Nova Php
#1078Rank of hacked sites
Rank Ags4
#1079Rank ags4
Maximum QsRank on the owner
Average QsRank on the main domain
Percentage of users returning within a month
Number of users who returned during the month
Rank Xit Door
#1084Doorway rank
Share of capital letters in Title
Share of incoming traffic from search engines among all incoming traffic
Share of direct visits among all incoming traffic
Average QsRank in the sliding window
Min Owner Qs Rank
#1089Minimum QsRank
Avg Numhops
#1090Average number of hops
Url Bm15 K01
#1091Bm15K01 factor over hits from Url
Title Bm15 K01
#1092Bm15K01 factor over hits from Title
Title Bocm15 K001
#1093Bocm15K001 factor over hits from Title
Text Bm11 Norm16384
#1094Bm11Norm16384 factor over hits from Text
Text Bocm11 Norm256
#1095Bocm11Norm256 factor over hits from Text
CosineMatchMaxPrediction factor over hits from Text
Bm15FLogK0001 factor over hits from FieldSet1 stream
Bm15FLogK0001 factor over hits from FieldSet2 stream
BclmWeightedFLogW0K0001 factor over hits from FieldSet3 stream
Bm15FLogW0K00001 factor over hits from FieldSetUT stream
Body Chain0 Wcm
#1101Chain0Wcm factor over hits from Body
Body Pair Min Proximity
#1102PairMinProximity factor over hits from Body
Body Min Window Size
#1103MinWindowSize factor over hits from Body
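The source does not define MinWindowSize, so the following is only a sketch under one plausible reading: the length (in tokens) of the shortest span of the body that contains at least one hit for every query word. The function name, the exact-token matching and the `None` result for an uncovered query are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def min_window_size(doc_tokens: Sequence[str],
                    query_words: Sequence[str]) -> Optional[int]:
    """Return the length of the shortest token window covering all query words."""
    needed = set(query_words)
    if not needed:
        return 0
    counts: Counter = Counter()
    covered = 0          # number of distinct query words inside the window
    best: Optional[int] = None
    left = 0
    for right, token in enumerate(doc_tokens):
        if token in needed:
            counts[token] += 1
            if counts[token] == 1:
                covered += 1
        # Shrink from the left while the window still covers every query word.
        while covered == len(needed):
            size = right - left + 1
            if best is None or size < best:
                best = size
            left_token = doc_tokens[left]
            if left_token in needed:
                counts[left_token] -= 1
                if counts[left_token] == 0:
                    covered -= 1
            left += 1
    return best
```

A real implementation would work on index hits (with lemmas and synonyms) rather than raw tokens; the sliding-window idea stays the same.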
CosineMatchMaxPrediction factor over hits from PopularSeFrcBrowser stream
MixMatchWeightedValue factor over hits from DoubleFrc stream
AnnotationMaxValueWeighted factor over hits from DoubleFrc stream
AnnotationMaxValue factor over hits from DoubleFrc stream
AnnotationMatchWeightedValue factor over hits from DoubleFrc stream
AllWcmWeightedValue factor over hits from DoubleFrc stream
AllWcmMatch95AvgValue factor over hits from DoubleFrc stream
AllWcmWeightedPrediction factor over hits from DoubleFrc stream
AllWcmMatch80AvgValue factor over hits from DoubleFrc stream
FullMatchValue factor over hits from DoubleFrc stream
FullMatchAnyValue factor over hits from DoubleFrc stream
ExactQueryMatchAvgValue factor over hits from DoubleFrc stream
BclmMixPlainKE5 factor over hits from OneClickFrcXfSp stream
Bm15StrictAnnotationK01 factor over hits from OneClickFrcXfSp stream
AllWcmWeightedValue factor over hits from OneClickFrcXfSp stream
AllWcmWeightedPrediction factor over hits from OneClickFrcXfSp stream
AllWcmMatch80AvgValue factor over hits from OneClickFrcXfSp stream
MixMatchWeightedValue factor over hits from OneClickFrcXfSp stream
AnnotationMatchWeightedValue factor over hits from OneClickFrcXfSp stream
BclmPlaneProximity1Bm15W0Size1K0001 factor over hits from OneClickFrcXfSp stream
BclmWeightedProximity1Bm15Size1K001 factor over hits from OneClickFrcXfSp stream
BclmMixPlainKE5 factor over hits from BQPRSample stream
AllWcmWeightedValue factor over hits from BQPRSample stream
AllWcmWeightedPrediction factor over hits from BQPRSample stream
AllWcmMaxPrediction factor over hits from BQPRSample stream
AllWcmMatch80AvgValue factor over hits from BQPRSample stream
MixMatchWeightedValue factor over hits from BQPRSample stream
CosineMatchMaxPrediction factor over hits from BQPRSample stream
AnnotationMaxValueWeighted factor over hits from BQPRSample stream
AnnotationMaxValue factor over hits from BQPRSample stream
AnnotationMatchWeightedValue factor over hits from BQPRSample stream
Bocm15K001 factor over hits from BQPRSample stream
BclmPlaneProximity1Bm15W0Size1K0001 factor over hits from BQPRSample stream
BclmWeightedProximity1Bm15Size1K001 factor over hits from BQPRSample stream
BclmPlaneProximity1Bm15W0Size1K0001 factor over hits from LongClickSP stream
Bm15MaxAnnotationK001 factor over hits from LongClickSP stream
FullMatchValue factor over hits from LongClickSP stream
MixMatchWeightedValue factor over hits from LongClickSP stream
CosineMatchMaxPrediction factor over hits from LongClickSP stream
AnnotationMaxValue factor over hits from LongClickSP stream
AnnotationMaxValueWeighted factor over hits from LongClickSP stream
AnnotationMatchWeightedValue factor over hits from LongClickSP stream
AllWcmMatch95AvgValue factor over hits from LongClickSP stream
AllWcmWeightedValue factor over hits from LongClickSP stream
AllWcmMaxMatch factor over hits from LongClickSP stream
AllWcmWeightedPrediction factor over hits from LongClickSP stream
Bocm15K001 factor over hits from LongClickSP stream
QueryPrefixMatchOriginalWordValue factor over hits from LongClickSP stream
BclmPlaneProximity1Bm15W0Size1K0001 factor over hits from SamplePeriodDayFrc stream
AttenV1Bm15K05 factor over hits from SamplePeriodDayFrc stream
FullMatchValue factor over hits from SamplePeriodDayFrc stream
FullMatchAnyValue factor over hits from SamplePeriodDayFrc stream
AllWcmWeightedValue factor over hits from SamplePeriodDayFrc stream
AllWcmWeightedPrediction factor over hits from SamplePeriodDayFrc stream
AllWcmMatch95AvgValue factor over hits from SamplePeriodDayFrc stream
AllWcmMatch80AvgValue factor over hits from SamplePeriodDayFrc stream
MixMatchWeightedValue factor over hits from SamplePeriodDayFrc stream
AnnotationMatchWeightedValue factor over hits from SamplePeriodDayFrc stream
AnnotationMaxValue factor over hits from SamplePeriodDayFrc stream
AnnotationMaxValueWeighted factor over hits from SamplePeriodDayFrc stream
Bocm15K001 factor over hits from SamplePeriodDayFrc stream
AllWcmWeightedValue factor over hits from CorrectedCtrXFactor stream
AllWcmMaxPrediction factor over hits from CorrectedCtrXFactor stream
AllWcmWeightedPrediction factor over hits from CorrectedCtrXFactor stream
AllWcmMatch80AvgValue factor over hits from CorrectedCtrXFactor stream
MixMatchWeightedValue factor over hits from CorrectedCtrXFactor stream
AnnotationMatchWeightedValue factor over hits from CorrectedCtrXFactor stream
BclmPlaneProximity1Bm15W0Size1K001 factor over hits from CorrectedCtrXFactor stream
BclmWeightedProximity1Bm15Size1K001 factor over hits from CorrectedCtrXFactor stream
AllWcmMaxPrediction factor over hits from LongClick stream
MixMatchWeightedValue factor over hits from LongClick stream
AnnotationMaxValueWeighted factor over hits from LongClick stream
FullMatchValue factor over hits from LongClick stream
AnnotationMatchWeightedValue factor over hits from LongClick stream
AllWcmWeightedValue factor over hits from SimpleClick stream
AllWcmWeightedPrediction factor over hits from SimpleClick stream
AllWcmMaxPrediction factor over hits from SimpleClick stream
MixMatchWeightedValue factor over hits from SimpleClick stream
AnnotationMatchWeightedValue factor over hits from SimpleClick stream
AnnotationMaxValueWeighted factor over hits from BrowserPageRank stream
AnnotationMatchWeightedValue factor over hits from BrowserPageRank stream
AnnotationMaxValue factor over hits from BrowserPageRank stream
Bocm15K001 factor over hits from BrowserPageRank stream
MixMatchWeightedValue factor over hits from OneClick stream
FullMatchValue factor over hits from OneClick stream
AnnotationMatchWeightedValue factor over hits from OneClick stream
AllWcmWeightedPrediction factor over hits from SplitDwellTime stream
Bm15MaxAnnotationK001 factor over hits from SplitDwellTime stream
BclmWeightedProximity1Bm15Size1K0001 factor over hits from QueryDwellTime stream
AttenV1Bm15K001 factor over hits from QueryDwellTime stream
MixMatchWeightedValue factor over hits from QueryDwellTime stream
AnnotationMaxValueWeighted factor over hits from QueryDwellTime stream
AnnotationMaxValue factor over hits from QueryDwellTime stream
AnnotationMatchWeightedValue factor over hits from QueryDwellTime stream
AllWcmWeightedValue factor over hits from QueryDwellTime stream
AllWcmMatch80AvgValue factor over hits from QueryDwellTime stream
BclmPlaneProximity1Bm15W0Size1K0001 factor over hits from RandomLogDBM35 stream
Bm15StrictAnnotationK001 factor over hits from RandomLogDBM35 stream
MixMatchWeightedValue factor over hits from RandomLogDBM35 stream
AnnotationMaxValueWeighted factor over hits from RandomLogDBM35 stream
AnnotationMatchWeightedValue factor over hits from RandomLogDBM35 stream
AllWcmWeightedValue factor over hits from RandomLogDBM35 stream
FullMatchValue factor over hits from RandomLogDBM35 stream
ExactQueryMatchAvgValue factor over hits from RandomLogDBM35 stream
Is Relev Locale I D
#1208relev_locale == id
Is Mobile Beauty
#1209Binary factor indicating whether the document is adapted for mobile devices. Taken from erf.
Foreign Domain
#1210Set to 1 when FI_NATIONAL_DOMAIN is 0 and herf.NationalDomainId is filled in.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: MixMatchWeightedValue by stream QueryDwellTime. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: MixMatchWeightedValue by stream QueryDwellTime. Weighted average value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: MixMatchWeightedValue by stream QueryDwellTime. Minimum weighted value of the factor by extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: AnnotationMatchWeightedValue by stream QueryDwellTime. Minimum weighted value of the factor by extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: AnnotationMatchWeightedValue by stream QueryDwellTime. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: AllWcmMatch95AvgValue by stream QueryDwellTime. Weighted average value of the factor by extension top.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: MixMatchWeightedValue by stream BQPRSample. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: MixMatchWeightedValue by Stream DoubleFrc. Maximum weighted value of the factor by extensions.
DSSM model trained on clicks.
DSSM model trained on clicks.
DSSM model trained on clicks.
DSSM model trained on clicks.
DSSM model trained on clicks.
DSSM model trained on clicks.
DSSM model trained on clicks.
DSSM model trained on clicks.
Medical2 Url Quality
#1227Neural model of content quality for medical topics
Is Desktop Request
#1228request came from yandsearch (rearr.is_desktop == 1)
Is Mobile Request
#1229request came from touchsearch (rearr.is_mobile == 1)
Is Tablet Request
#1230request came from padsearch (rearr.is_tablet == 1)
Request Is From Android
#1231request came from device with Android OS (rearr.dd_osfamily == Android)
Request Is From I O S
#1232request came from device with iOS (rearr.dd_osfamily == iOS)
Request Is From Windows
#1233request came from device with Windows OS (rearr.dd_osfamily == Windows)
request does not come from devices with Android, iOS or Windows OS (rearr.dd_osfamily != [Android, iOS, Windows])
Embed Video Broken
#1235A broken embedded video on the page.
FullMatchValue factor over hits from CorrectedCtrLongPeriod stream
MixMatchWeightedValue factor over hits from CorrectedCtrLongPeriod stream
AnnotationMaxValueWeighted factor over hits from CorrectedCtrLongPeriod stream
AnnotationMatchWeightedValue factor over hits from CorrectedCtrLongPeriod stream
AllWcmMatch95AvgValue factor over hits from CorrectedCtrLongPeriod stream
AllWcmMatch80AvgValue factor over hits from CorrectedCtrLongPeriod stream
AllWcmWeightedValue factor over hits from CorrectedCtrLongPeriod stream
AllWcmWeightedPrediction factor over hits from CorrectedCtrLongPeriod stream
Neural model of content quality for medical topics (for experiments)
BclmMixPlainKE5 factor over hits from NHopSumDwellTime stream
Match80AvgValue factor over hits from NHopSumDwellTime stream
Fin Law Url Quality
#1247Neural model of content quality for financial and legal topics
MixMatchWeightedValue factor over hits from NHopSumDwellTime stream
Neural model of content quality for financial and legal topics (for experiments)
BclmMixPlainKE5 factor over hits from FirstClickDtXf stream
FullMatchValue factor over hits from FirstClickDtXf stream
AnnotationMaxValueWeighted factor over hits from FirstClickDtXf stream
AnnotationMatchWeightedValue factor over hits from FirstClickDtXf stream
BclmPlaneProximity1Bm15W0Size1K001 factor over hits from FirstClickDtXf stream
Linguistic Boosting Factor. Type of extensions: RequestWithRegionName. Bm11 by document text and title
Linguistic Boosting Factor. Type of extensions: RequestWithRegionName. CosineMatchMaxPrediction by document text and title
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: AnnotationMatchWeightedValue by Stream LongClick.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: FullMatchValue by Stream OneClick.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: AnnotationMatchValue by Stream OneClick.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: AnnotationMatchWeightedValue by stream LongClickSP.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: FullMatchValue by stream LongClickSP.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: AnnotationMaxValueWeighted by Stream BQPRSample.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: Bm15 by stream group 1.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: Bm15 by stream group 2.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: BclmWeightedFLogW0 by stream group 3.
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Chain0Wcm factor by document text
Query Doc Random
#1267Random float in [0,1] by user request and document
Sos Url Quality
#1268Neural model of content quality for sos topics
Sum Flash Area
#1269the ratio of the total area of all Flash blocks to the screen area
Sos Url Quality Fresh
#1270Neural model of content quality for sos topics (for experiments)
Url Host Fraction
#1271Copy of the old version of factor No. 294. Added for use at the L3 stage only. Coverage of the domain by trigrams from the query. (Example: query "Chelyabinsk lottery", domain chelloto. Transliterate the query, find the trigrams that are covered (che, hel, lot, olo), and compute what proportion of all trigrams is covered.)
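The trigram coverage described for Url Host Fraction can be sketched roughly as follows. This is a minimal illustration, not the production formula: it assumes the query has already been transliterated, that trigrams are taken per word, and that the result is the share of the domain's distinct trigrams that also occur in the query. The source does not say which side forms the denominator, and the names `trigrams` and `url_host_fraction` are hypothetical.

```python
def trigrams(word: str) -> set:
    """Return the set of distinct three-letter substrings of a word."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def url_host_fraction(translit_query: str, domain: str) -> float:
    """Share of the domain's trigrams that are covered by query trigrams."""
    query_trigrams = set()
    for word in translit_query.lower().split():
        query_trigrams |= trigrams(word)
    domain_trigrams = trigrams(domain.lower())
    if not domain_trigrams:
        # Assumption: domains shorter than three letters get a zero factor.
        return 0.0
    return len(domain_trigrams & query_trigrams) / len(domain_trigrams)
```

With the illustrative transliteration `"chelyabinsk loto"` and the domain `"chelloto"`, the domain trigrams are che, hel, ell, llo, lot, oto; four of them (che, hel, lot, oto) occur in the query, so this sketch yields 4/6.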
Url Hits Coverage
#1272Fast version of FI_URL_DOMAIN_FRACTION
Alice Timespent Sum
#1273Predicted time spent in the session, given that this query-document pair occurs
Dssm Dog L3
#1274Request-document dssm that predicts document sobriety
Tiktok Tag
#1275The document is a collection page from the /tag section of TikTok
Tiktok Discovery
#1276The document is a collection page from the /discovery section of TikTok
Tiktok Music
#1277The document is a collection page from the /music section of TikTok
Dssm Sinsig L2
#1278The request-document model of synsig.
Factor on the original query. Computed over the tokenized URL. CosineMatchMaxPrediction algorithm.
Factor on the original query. Computed over the tokenized URL. The hit weight is multiplied by 1 / (1 + position of the word in the sentence). Word-weight aggregation algorithm: Bm15. Normalization coefficient: 0.5.
Factor on the original query. Computed over the document title. Word-weight aggregation algorithm: BclmMixPlain, a linear mixture of the annotation Bclm weight and the weighted Positionless word weight; the word counters are then aggregated through Bm15. Normalization coefficient: 10^(-5).
Factor on the original query. Computed over the document title. CMMatchTop5AvgMatchValue algorithm.
Factor on the original query. Computed over the document title. Degree of coverage of the query words by exact forms (without synonyms).
Factor on the original query. Computed over the document title. The hit weight is multiplied by 1 / (1 + position of the word in the sentence). Word-weight aggregation algorithm: Bm15. Normalization coefficient: 0.5.
Factor on the original query. Computed over the document content. Word-weight aggregation algorithm: BclmMixPlain, a linear mixture of the annotation Bclm weight and the weighted Positionless word weight; the word counters are then aggregated through Bm15. Normalization coefficient: 10^(-5).
Factor on the original query. Computed over the document content. CosineMatchMaxPrediction algorithm.
Factor on the original query. Computed over the document content. AllWcmWeightedPrediction algorithm.
Factor on the original query. Computed over the document content. Word-weight aggregation algorithm: Bocm15. Normalization coefficient: 0.01.
Factor on the original query. Computed over the document content. Algorithm: QueryPartMatchSumValueAny.
Factor on the original query. Computed over the document content. Degree of coverage of the query words by exact forms (without synonyms).
Factor on the original query. Computed over the document content. Degree of coverage of the query words by exact forms.
Factor on the original query. Computed over the document content. Word-weight aggregation algorithm: Bm15MaxAnnotation. Normalization coefficient: 0.01.
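Several entries above combine a position attenuation of 1 / (1 + position of the word in the sentence) with Bm15 aggregation and a normalization coefficient. The source does not spell out the Bm15 formula, so the sketch below assumes the common saturating form tf / (tf + k), summed over query words with their weights and divided by the total word weight. The function name, the input layout and that formula are all assumptions made for illustration.

```python
from typing import Dict, List


def attenuated_bm15(hit_positions: Dict[str, List[int]],
                    word_weights: Dict[str, float],
                    k: float = 0.5) -> float:
    """Aggregate position-attenuated hits per query word with a BM15-like formula.

    hit_positions maps a query word to the in-sentence positions of its hits;
    word_weights maps a query word to its weight (for example an IDF-like value).
    """
    total_weight = sum(word_weights.values())
    if total_weight == 0:
        return 0.0
    score = 0.0
    for word, weight in word_weights.items():
        # Each hit contributes 1 / (1 + position), so early hits count more.
        tf = sum(1.0 / (1 + pos) for pos in hit_positions.get(word, []))
        # Assumed Bm15 saturation with normalization coefficient k.
        score += weight * tf / (tf + k)
    return score / total_weight
```

For two equally weighted words with hits at positions 0 and 1 and k = 0.5, the per-word values are 1 / 1.5 and 0.5 / 1.0, giving 7/12 after normalization.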
Has Cloaking
#1293
Dssm Full Split Bert
#1294
Social Url Is Verified
#1295Url is a channel/post from a verified social network account
Dssm Mimicration Url
#1296Dssm, predicting whether a site is a mimicry
Removed_1297
#1297
Removed_1298
#1298
MetaPolyGen8
CMMatch80AvgValue factor over hits from QueryDwellTime stream
CMMatchTop5AvgMatch factor over hits from DoubleFrc stream
PerWordCMMaxMatchMin factor over hits from OneClickFrcXfSp stream
PerWordCMMaxMatchMin factor over hits from FirstClickDtXf stream
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: PerWordCMMaxMatchMin by stream LongClickSP. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: PerWordCMMaxMatchMin by Stream OneClick. Maximum weighted value of the factor by extensions.
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: PerWordCMMaxMatchMin by stream FirstClickDtXf. Minimum weighted value of the factor by extension top.
Removed_1307
#1307
Removed_1308
#1308
Distance To Ankara
#1309Distance from the city from which the request was made to Ankara
Distance To Magadan
#1310Distance from the city where the request was made to Magadan
Latitude
#1311Geographical latitude of the city from where the request was made
Longitude
#1312Geographical longitude of the city from where the query was made
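The four geographic factors above depend only on the coordinates of the city the request came from. The source does not state how the distance is measured or in which units, so the sketch below assumes a great-circle (haversine) distance in kilometres; the city coordinates are approximate and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0        # mean Earth radius; the real unit is not stated in the source
ANKARA = (39.93, 32.86)         # approximate (latitude, longitude), assumed for illustration
MAGADAN = (59.56, 150.80)       # approximate (latitude, longitude), assumed for illustration


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geo_features(city_lat: float, city_lon: float) -> dict:
    """Build the four request-geography values for one request city."""
    return {
        "DistanceToAnkara": haversine_km(city_lat, city_lon, *ANKARA),
        "DistanceToMagadan": haversine_km(city_lat, city_lon, *MAGADAN),
        "Latitude": city_lat,
        "Longitude": city_lon,
    }
```

Two fixed reference points plus raw coordinates give a ranking model a coarse, continuous encoding of where the user is, without a per-city categorical feature.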
FullMatchValue factor over hits from LongClick stream (Mobile sessions filtered)
CosineMatchMaxPrediction factor over hits from LongClick stream (Mobile sessions filtered)
AnnotationMatchWeightedValue factor over hits from LongClick stream (Mobile sessions filtered)
AllWcmMatch95AvgValue factor over hits from LongClick stream (Mobile sessions filtered)
AllWcmWeightedValue factor over hits from LongClick stream (Mobile sessions filtered)
AllWcmWeightedPrediction factor over hits from LongClick stream (Mobile sessions filtered)
CMMatchTop5AvgValue factor over hits from LongClick stream (Mobile sessions filtered)
Bm15MaxAnnotationK001 factor over hits from LongClick stream (Mobile sessions filtered)
Linguistic Boosting Factor. Extension type: XfDtShow. Factor: PerWordCMMaxMatchMin on incoming links. Maximum weighted value of the factor by extensions.
Removed_1322
#1322
Removed_1323
#1323
U S Long Period Url Ctr
#1324Static URL factor by search sessions over 1600 days. Normal CTR.
Static URL factor by search sessions over 1600 days. Average DwellTime, where the DwellTime of a session is truncated if it exceeds 3600 seconds.
Removed_1326
#1326
Static URL factor by search sessions over 1600 days. Average DwellTime, where the DwellTime of a session is truncated if it exceeds 180 seconds.
Static URL factor by search sessions over 1600 days. Probability that a click on the URL lasts longer than 120 seconds.
Static URL factor by search sessions over 1600 days. Logarithm of the number of impressions.
Removed_1330
#1330
Static URL factor by search sessions over 1600 days. Probability that the URL is clicked, given that at least one URL above it was not clicked.
Static URL factor by search sessions over 1600 days. Probability that the URL is not clicked, given that at least one URL below it was clicked.
Static URL factor by search sessions over 1600 days. Normal CTR. Localization to the country level.
Static URL factor by search sessions over 1600 days. Average DwellTime, where the DwellTime of a session is truncated if it exceeds 3600 seconds. Localization to the country level.
Static URL factor by search sessions over 1600 days. Probability that a click on the URL lasts longer than 120 seconds. Localization to the country level.
Static URL factor by search sessions over 1600 days. Average URL position over all queries. Localization to the country level.
Static URL factor by search sessions over 1600 days. Logarithm of the number of impressions. Localization to the country level.
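The dwell-time statistics in this group can be illustrated with a short sketch. It assumes that "truncated" means the per-session dwell time is capped at the limit (rather than the session being dropped), and that the long-click probability is the plain share of clicks longer than the threshold. Both readings, and the function names, are assumptions.

```python
from typing import Sequence


def truncated_avg_dwell_time(dwell_times: Sequence[float],
                             cap_seconds: float = 3600.0) -> float:
    """Average dwell time with each session's value capped at cap_seconds."""
    if not dwell_times:
        return 0.0
    return sum(min(dt, cap_seconds) for dt in dwell_times) / len(dwell_times)


def long_click_probability(dwell_times: Sequence[float],
                           threshold_seconds: float = 120.0) -> float:
    """Share of clicks whose dwell time exceeds threshold_seconds."""
    if not dwell_times:
        return 0.0
    long_clicks = sum(1 for dt in dwell_times if dt > threshold_seconds)
    return long_clicks / len(dwell_times)
```

For dwell times of 10, 200 and 5000 seconds, the 3600-second cap gives an average of 1270 seconds, the 180-second cap gives about 123.3 seconds, and two of the three clicks count as longer than 120 seconds.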
DSSM model trained on clicks. Takes bigrams into account.
MixMatchWeightedValue factor over hits from FirstLastClick stream (Mobile sessions filtered)
CosineMatchMaxPrediction factor over hits from FirstLastClick stream (Mobile sessions filtered)
FullMatchValue factor over hits from FirstLastClick stream (Mobile sessions filtered)
AllWcmMatch95AvgValue factor over hits from FirstLastClick stream (Mobile sessions filtered)
CMMatchTop5AvgValue factor over hits from FirstLastClick stream (Mobile sessions filtered)
AllWcmWeightedValue factor over hits from FirstLastClick stream (Mobile sessions filtered)
Is Qvoice
#1345Was the request made by voice
AllWcmWeightedValue factor over hits from AvgDTWeightedByRankMobile stream (Mobile sessions filtered)
AllWcmMatch95AvgValue factor over hits from AvgDTWeightedByRankMobile stream (Mobile sessions filtered)
CMMatchTop5AvgValue factor over hits from AvgDTWeightedByRankMobile stream (Mobile sessions filtered)
AnnotationMatchWeightedValue factor over hits from AvgDTWeightedByRankMobile stream (Mobile sessions filtered)
FullMatchValue factor over hits from AvgDTWeightedByRankMobile stream (Mobile sessions filtered)
MixMatchWeightedValue factor over hits from AvgDTWeightedByRankMobile stream (Mobile sessions filtered)
Linguistic Boosting Factor. Type of extensions: XfDtShow. Factor: AvgPerTrigramMaxValueAny by stream group 5. Weighted average value of the factor by the top of extensions.
AvgPerTrigramAvgValueAny factor by CorrectedCtrLongPeriod Stream
DSSM model trained on clicks. Takes bigrams into account. Embeddings for documents are computed offline.
Rank Artroz
#1355Rank of text quality on the host. The higher the value, the more likely it is that the host is full of rewritten articles and poor copywriting ordered on content exchanges. It burns harder as query aggregation.
Minimum of the gradients according to the bigram LogDwelltime model.
Maximum of the gradients according to the bigram LogDwelltime model.
The second central moment (variance) of the gradients according to the bigram LogDwelltime model.
The third central moment of the gradients according to the bigram LogDwelltime model.
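The four gradient statistics above (minimum, maximum, variance and third central statistic) can be computed from a list of gradient values as in the sketch below. How the gradients are obtained from the bigram LogDwelltime model is not described in the source; the sketch assumes they are already available as floats and uses population statistics (division by n), which is an assumption.

```python
from typing import Dict, Sequence


def gradient_stats(gradients: Sequence[float]) -> Dict[str, float]:
    """Return min, max, variance and third central moment of the gradients."""
    if not gradients:
        raise ValueError("at least one gradient value is required")
    n = len(gradients)
    mean = sum(gradients) / n
    # Central moments are averages of powers of deviations from the mean.
    second = sum((g - mean) ** 2 for g in gradients) / n
    third = sum((g - mean) ** 3 for g in gradients) / n
    return {
        "min": min(gradients),
        "max": max(gradients),
        "variance": second,
        "third_central_moment": third,
    }
```

For the values 1, 2, 3, 6 the mean is 3, the variance is 3.5 and the third central moment is 4.5; a positive third moment indicates a tail towards large gradients.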
Dssm Vk Popularity
#1360The probability that vk.com host is popular for this query according to the corresponding dssm model.
Dssm Onliner Popularity
#1361The probability that the onliner.by host is popular for this query according to the corresponding dssm-model.
Removed_1362
#1362
Removed_1363
#1363
Dssm Rambler Popularity
#1364The probability that the host rambler.ru is popular for this query according to the corresponding dssm-model.
The probability that the host expertcen.ru is popular for this query according to the corresponding dssm-model.
Dssm Sunhome Popularity
#1366The probability that the host sunhome.ru is popular for this query according to the corresponding dssm-model.
Static URL factor by browser logs for a maximum period. Percentage of traffic from social networks in all traffic from other hosts and search.
Static URL factor by browser logs for maximum period. Average number of direct descendants on the host on which more than 90 seconds were spent. A descendant is direct only if there is a link from our page to the descendant and it was clicked.
Static URL factor by browser logs over maximum period. The average maximum tree depth with the root in the current URL when the URL is visited from other hosts.
Static URL factor by browser logs over maximal period. The number of times the page has been accessed from the serp divided by the total number of pages accessed from the serp. The closer to 1, the more times the page was opened as the only page in the session.
Static URL factor by browser logs for the maximum period. Average length of search sessions, when the page was navigated to from the serp
Static URL factor by browser logs for maximal period. See the wiki for the formula to calculate the factor.
Static URL factor by browser logs for maximal period. See the wiki for the formula to calculate the factor.
Static URL factor by browser logs for maximal period. Probability that the user will spend > 120 seconds on the page.
Static URL factor by browser logs for maximum period. The number of leaves in the URL subtree. In this case leaves are pages from which there were no jumps.
Static URL factor from browser logs for the maximum period. The average time spent on the page and in all the descendants of the page (URLs to which were navigated) from the host. Cut if total Dt is more than 10 minutes
Static URL factor by browser logs for maximal period. Minimum unix time when page first appeared in logs.
Static URL factor by browser logs for maximal period. Difference between average and minimum unix time when page appeared in logs.
U B Long Period Latitude
#1379Static URL factor by browser logs for maximum period. The average latitude from where the page was viewed.
Static URL factor by browser logs for maximum period. Average longitude from where the page was viewed.
Static URL factor by browser logs for maximum period. Probability of download from page
Static URL factor by browser logs for maximum period. Probability of image download from page
Static URL factor by browser logs for maximum period. Probability of downloading torrent file from page
Static URL factor by browser logs for maximal period. See wiki for factor calculation formula. Localization to country level.
Static URL factor by browser logs for maximum period. The number of leaves in the URL subtree. In this case leaves are pages from which there were no jumps. Localization to the country level.
Static URL factor from browser logs for the maximum period. The average time spent on the page and in all the descendants of the page (URLs to which were navigated) from the host. Cut if total Dt is more than 10 minutes. Localization to country level.
The sum of the scores of the query words according to the 3grams-yandex-direct language model.
The sum of the scores of the query words according to the web-mt language model.
U B Long Period Rank
#1389Static URL factor based on browser logs over a maximum period. A rank based only on UBLP counters, which makes it possible to find many SBR losses.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: BclmWeightedFLogW0_K0.001 by FieldSet3. Weighted average of factor values by top-10 extensions.
Linguistic Boosting Factor. Extension type: QueryToText. Factor: MinWindowSize by document content. Weighted average of factor values by extensions.
Query To Text All Avg W
#1392Linguistic Boosting Factor. Average weight of extensions of QueryToText type.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: MixMatchWeightedValue by QueryDwellTime stream. Weighted average of the factor values by extensions.
Linguistic Boosting Factor. Extension type: QueryToText. Factor: MinWindowSize by document content. Weighted average of the factor values by the top 10 extensions.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: Bm15FLogW0_K0.0001 by url and header. Maximum value of the factor by extensions.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: BclmWeightedFLogW0_K0.001 by FieldSet3. Weighted average of factor values by extensions.
Qfuf All Avg W
#1397Linguistic Boosting Factor. The average weight of Qfuf-type extensions.
Linguistic Boosting Factor. Extension type: QueryToText. Factor: PairMinProximity by document content. Average of factor values by extensions.
Qfuf All Total W
#1399Linguistic Boosting Factor. Type of extensions: Qfuf. The renormalized total weight of the extensions.
Linguistic Boosting Factor. Extension type: QueryToText. Factor: Bocm11_Norm256 by document text. Average value of the factor by extensions.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: CosineMatchMaxPrediction by document text. Maximal value of the factor by extensions.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: Bm15FLog_K0.001 by FieldSet1. Weighted average of factor values with quadratic weight by the top 10 extensions by factor value.
Linguistic Boosting Factor. Type of extensions: Qfuf. Factor: Bocm11_Norm256 by document text. Maximal value of the factor by extensions.
Linguistic Boosting Factor. Extension type: Qfuf. Factor: Bm15FLogW0_K0.0001 by url and header. Weighted average of factor values by extensions.
DSSM model trained on clicks, target=OneClicks/Clicks. Takes bigrams into account.
Dssm Query Dwell Time
#1406DSSM model trained on clicks, target=QueryDwellTime stream value. Takes bigrams into account.
Normalized sum of the weights of the query words that occur in the document text or in links to it.
Normalized sum of the weights of the query words that match EQUAL_BY_STRING in the document text or in links to it.
Normalized sum of the weights of the query words that occur in the document text.
Normalized sum of the weights of the query words that occur in links to the document.
Normalized sum of the weights of the query words that match EQUAL_BY_STRING in links to the document.
Normalized sum of the IFiltrationModel weights of the query words that occur in the document text or in links to it.
Normalized sum of the IFiltrationModel weights of the query words that match EQUAL_BY_STRING in the document text or in links to it.
Normalized sum of the IFiltrationModel weights of the query words that match EQUAL_BY_LEMMA in the document text or in links to it.
Normalized sum of the IFiltrationModel weights of the query words that occur in links to the document.
Normalized sum of the IFiltrationModel weights of the query words that match EQUAL_BY_STRING in links to the document.
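The "normalized sum of weights" entries above share one shape: sum the weights of the query words that were found, then normalize. A minimal sketch follows, assuming the normalization divides by the total weight of all query words (the source does not say what the sum is normalized by). The function name and the dictionary representation are hypothetical.

```python
def matched_weight_fraction(query_weights, matched_words):
    """query_weights: dict mapping query word -> weight.
    matched_words: set of query words found in the text and/or links.

    Assumption: the result is the matched weight divided by the
    total weight of the query.
    """
    total = sum(query_weights.values())
    if total == 0:
        return 0.0
    matched = sum(weight for word, weight in query_weights.items()
                  if word in matched_words)
    return matched / total
```

The text-only, link-only, EQUAL_BY_STRING and EQUAL_BY_LEMMA variants would differ only in how `matched_words` is built, and the IFiltrationModel variants in where the weights come from.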
Linguistic Boosting Factor. Type of extensions: Qfuf. Aggregation by all extensions. Highest factor value. By stream from LinkAnnIndicator link index. Algorithm AnnotationMaxValueWeighted - maximum weight (by MainWeights word weights) of annotation coverage, weighted by annotation weight
Linguistic Boosting Factor. Type of extensions: Qfuf. Aggregation by all extensions. Highest factor value. By stream from LinkAnnIndicator link index. Algorithm AnnotationMaxValueWeighted - maximum weight (by MainWeights word weights) of annotation coverage, weighted by annotation weight
Linguistic Boosting Factor. Type of extensions: XfDtShow. Aggregation by all extensions. Largest weighted value of the factor. Normalized to the maximum weight of the extension. Based on stream from LinkAnnIndicator link index. PerWordCMMaxMatchMin algorithm: minimum CMMaxMatch weight by words.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn (closest by dssm-model, trained to predict XfDtShow extensions). Aggregation over all extensions. Highest weighted factor value. A mixture of multiple streams; the weight is computed from a fixed polynomial of the component weights on a given annotation. The word weights aggregation algorithm is BclmMixPlain: a linear mixture of the annotation Bclm weight and the weighted Positionless word weight, after which the word counters are aggregated via bm15. The normalization factor is 10^(-5).
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn (closest by dssm-model, trained to predict XfDtShow extensions). Aggregation over all extensions. Highest weighted factor value. Normalized to the maximum weight of the extension. Stream: CorrectedCtrLongPeriod. Degree of query word coverage with exact form (without synonyms).
Linguistic Boosting Factor. Type of extensions: Qfuf. Aggregation over all extensions. Largest weighted value of the factor. Normalized to the maximum weight of the extension. Vpcg result for long long period, data: CorrectedClicks. Average weight of the annotations among those in which the query turned out to be an exact substring.
Xf Dt Show Knn All Max W F Max W Corrected Ctr Long Period Bclm Plane Proximity1 Bm15 W0 Size1 K0001
#1423Linguistic Boosting Factor. Type of extensions: XfDtShowKnn (closest by dssm-model, trained to predict XfDtShow extensions). Aggregation over all extensions. Largest weighted factor value. Normalized to the maximum weight of the extension. Stream: CorrectedCtrLongPeriod. Algorithm BclmPlaneProximity1Bm15W0Size1: uses bclm with weightless weighting if there are multiple query words, if there is one word then the match-weighted sum of hits is used. The normalization coefficient is 0.001.
Xf Dt Show Knn All Avg W
#1424Linguistic Boosting Factor. Type of extensions: XfDtShowKnn (closest by dssm-model, trained to predict XfDtShow extensions). Aggregation over all extensions. Average weight of extensions.
Document dssm model language classifier rus.
Document dssm model language classifier eng.
Document dssm model language classifier other.
Removed_1428
#1428Removed_1429
#1429alice_aramusic_dssm
#1430Prediction of the DSSM model for detecting irrelevant Alice responses
Average value of News per query over the year. Calculated offline.
Average value of AddTime per query over the year. Calculated offline.
Average value of TxtHiRelSy per query over the year. Calculated offline.
Average value of TextLike per query over the year. Calculated offline.
Average value of HasNoAllWordsTRSy per query over the year. Calculated offline.
Average value of IsForum per query over the year. Calculated offline.
Average value of HasPayments per query over the year. Calculated offline.
Average value of YabarHostAvgTime2 per query over the year. Calculated offline.
Average value of YabarUrlVisitors per query over the year. Calculated offline.
Average value of QueryDOwnerOnlyClickRate per query over the year. Calculated offline.
Average value of DaterAge per query over the year. Calculated offline.
Average value of LongestText per query over the year. Calculated offline.
Average value of DifferentInternalLinks per query over the year. Calculated offline.
Average value of QueryDOwnerOnlyClickRate_Reg per query over the year. Calculated offline.
Average value of IsHub per query over the year. Calculated offline.
Removed_1447
#1447Average value of BM25_0 per query over the year. Calculated offline.
Average value of Bocm per query over the year. Calculated offline.
Average value of IsIndexPage per query over the year. Calculated offline.
Average value of QueriesAvgCM2 per query over the year. Calculated offline.
Average value of BrowserHostDownloadProbability per query over the year. Calculated offline.
Average value of RegBrowserUserHub per query over the year. Calculated offline.
Average value of AuxTitleBM25 per query over the year. Calculated offline.
Average value of QueryUrlCorrectedCtrXfactor per query over the year. Calculated offline.
Average value of QueryToDocAllSumFCountTextBm11Norm16384 per query over the year. Calculated offline.
Average value of XfDtShowAllSumWFSumWBodyMinWindowSize per query over the year. Calculated offline.
Click-weighted average of IsMainPage per query over the year. Calculated offline.
Click-weighted average of YabarUrlAvgTime per query over the year. Calculated offline.
Click-weighted average of DifferentInternalLinks per query over the year. Calculated offline.
Dwell-time-weighted average of UrlDomainFraction per query over the year. Calculated offline.
BM25FdPR with normalization to the average document length depending on the document language. Only text hits are used.
Domain Has Metrika
#1463Whether the owner has Metrika
Has Sideblock
#1464The document has a turbo page for the mobile platform.
Document annotations count in the whole history of the Search (DSSM AnnReg models helper)
Document annotation words count in the whole history of the Search (DSSM AnnReg models helper)
Document annotation regions count in the whole history of the Search (DSSM AnnReg models helper)
Removed_1468
#1468Removed_1469
#1469Query-MainContentKeywords similarity, target: logDwellTime
Yellowness Max
#1473Maximum value of domain yellowness (based on Toloka)
Yellowness Mean
#1474Mean value of domain yellowness (based on Toloka)
Yellowness Median
#1475Median of domain yellowness (based on Toloka)
Yellowness Min
#1476Minimum value of domain yellowness (based on Toloka)
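The yellowness factors above are summary statistics of one underlying score. A minimal sketch, assuming the domain-level values are computed from a list of per-document yellowness scores (the source only says they are "based on Toloka") and using the standard library `statistics` module. The function name and the per-document granularity are assumptions.

```python
import statistics


def yellowness_summary(scores):
    """scores: non-empty list of yellowness scores for one domain.

    Assumption: each score belongs to one assessed document of the domain.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return {
        "max": max(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
    }
```

A large gap between the maximum and the median would indicate a domain where only a few documents are strongly "yellow".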
Dssm Boosting query self similarity for XfWeight model.
Dssm Boosting AvgTop02Score aggregation for XfWeight model over 5-means centroids.
Dssm Boosting AvgTop04Score aggregation for XfWeight model over 5-means centroids.
Dssm Boosting AvgTop02ScoreAvgClusterTop3Weighted aggregation for XfWeight model over 5-means centroids.
Dssm Boosting AvgTop02Score aggregation for XfWeight model over 5-means centroids (query as expansion).
Dssm Boosting AvgTop02ScoreAvgClusterTop3Weighted aggregation for XfWeight model over 5-means centroids (query as expansion).
Dssm Boosting query self similarity for XfOne model.
Dssm Boosting Score aggregation for XfOne model over 1-means centroids.
Dssm Boosting ScaledSumWeight aggregation for XfOne model over 1-means centroids.
Dssm Boosting Score aggregation for XfOne model over 1-means centroids (query as expansion).
Dssm Boosting ScoreAvgNearest1Weighted aggregation for XfOne model over 1-means centroids (query as expansion).
Dssm Boosting ScoreAvgNearest5Weighted aggregation for XfOne model over 1-means centroids (query as expansion).
Dssm Boosting Score aggregation for XfOneSe model over 1-means centroids.
Dssm Boosting ScoreScaledSumWeighted aggregation for XfOneSe model over 1-means centroids.
Dssm Boosting ScoreAvgNearest5Weighted aggregation for XfOneSe model over 1-means centroids.
Dssm Boosting query self similarity for Ctr model.
Dssm Boosting Score aggregation for Ctr model over 1-means centroids.
Dssm Boosting Score aggregation for Ctr model over 1-means centroids (query as expansion).
Dssm Boosting ScoreScaledSumWeighted aggregation for Ctr model over 1-means centroids (query as expansion).
Dssm Boosting ScoreAvgNearest1Weighted aggregation for Ctr model over 1-means centroids (query as expansion).
Yellowness Dispersion
#1497Dispersion of the domain's yellowness distribution (based on Toloka)
The vpcg result for the long long period, data: CorrectedClicks. Factor FullMatchPrediction
The vpcg result for the long long period, data: CorrectedClicks. Factor AllWcmMatch95AvgValue
The vpcg result for the long long period, data: CorrectedClicks. Factor CMMatchTop5AvgValue
The vpcg result for the long long period, data: CorrectedClicks. Factor AnnotationMaxValueWeighted
The vpcg result for the long long period, data: CorrectedClicks. Factor MixMatchWeightedValue
The vpcg result for the long long period, data: CorrectedClicks. Factor CMMatchTop5AvgPrediction
Dssm Ctr No Miner
#1504DSSM model trained on CTRs without miner.
Prediction of a dssm (url + title) trained on the page_quality signal and embedded in RTHub, first slot.
Prediction of a dssm (url + title) trained on the page_quality signal and embedded in RTHub, second slot.
Principal components of the query embedding from the DssmCtrNoMiner model
Principal components of the query embedding from the DssmCtrNoMiner model
Principal components of the query embedding from the DssmCtrNoMiner model
Principal components of the query embedding from the DssmCtrNoMiner model
Principal components of the query embedding from the DssmCtrNoMiner model
Principal components of the query embedding from the DssmCtrNoMiner model
DSSM model trained on click odd pool
DSSM model trained on click personalization pool
DSSM model trained on click triangle pool
Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: CMMatchTop5AvgMatchValue by Stream FloatMultiplicity of the LinkAnn index
Removed_1517
#1517Linguistic Boosting Factor. Factor: PerWordAMMaxValueMin by Stream FloatMultiplicity of the LinkAnn index
Linguistic Boosting Factor. Factor: AttenV1Bm15K001 by Stream FloatMultiplicity of the LinkAnn index
Linguistic Boosting Factor. Factor: Bocm11Norm256 by Stream IsExternal of the LinkAnn index
Removed_1521
#1521Linguistic Boosting Factor. Extension type: RequestWithRegionName. Factor: AnnotationMaxValue by Stream FloatMultiplicity of the LinkAnn index
DSSM model trained on clicks without miner (with no-clicks and AM-hard negatives). Takes bigrams into account.
AVG aggregation of HasPayments web factor using random log
AVG aggregation of VideoQuery web factor using random log
AVG aggregation of SyntQuality web factor using random log
PERCENTALE_90 aggregation of GeoRegionalityVNew web factor using random log
AVG aggregation of QClassDownload web factor using random log
AVG aggregation of IsMusic web factor using random log
PERCENTALE_25 aggregation of QueryThEncyclopedic web factor using random log
AVG aggregation of CommercialOwnerRank_Reg web factor using random log
PERCENTALE_25 aggregation of YabarWordDepthNodesGradientMin web factor using random log
AVG aggregation of PopularSEFRCBrowser web factor using random log
AVG aggregation of URLClicksMaxGeoRegionFRCRatio web factor using random log
PERCENTALE_90 aggregation of UBLongPeriodDirectHChildren90CntFromExtHost web factor using random log
PERCENTALE_90 aggregation of UBLongPeriodDtUrlHChildrenCut600Reg web factor using random log
AVG aggregation of IsPicture web factor using random log
AVG aggregation of ErratumLogQueryProbability web factor using random log
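The entries above aggregate a web factor over a sample of logged values. A minimal sketch of two of the named aggregation kinds follows, assuming the "random log" yields a list of observed values of the web factor for one query, and using the nearest-rank definition for the percentile. The source does not state which percentile definition is used, and both function names are hypothetical.

```python
def avg(values):
    """Arithmetic mean of the logged factor values."""
    if not values:
        raise ValueError("no logged values")
    return sum(values) / len(values)


def percentile_nearest_rank(values, q):
    """q-th percentile (integer q in 0..100) by the nearest-rank method.

    Assumption: nearest-rank is used here only for illustration.
    """
    if not values:
        raise ValueError("no logged values")
    ordered = sorted(values)
    # Integer ceiling of q * n / 100, clamped to at least rank 1.
    rank = max(1, -(-q * len(ordered) // 100))
    return ordered[rank - 1]
```

Under these assumptions, a 90th-percentile aggregation is far less sensitive to a handful of outlying documents than a maximum would be.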
Removed_1539
#1539Removed_1540
#1540Removed_1541
#1541Click length from the given country, predicted by a dssm-model from the query and the country.
Neural-network prediction of the average value of News per query over the year.
Neural-network prediction of the average value of AddTime per query over the year.
Neural-network prediction of the average value of TxtHiRelSy per query over the year.
Neural-network prediction of the average value of TextLike per query over the year.
Neural-network prediction of the average value of HasNoAllWordsTRSy per query over the year.
Neural-network prediction of the average value of IsForum per query over the year.
Neural-network prediction of the average value of HasPayments per query over the year.
Neural-network prediction of the average value of YabarHostAvgTime2 per query over the year.
Neural-network prediction of the average value of YabarUrlVisitors per query over the year.
Neural-network prediction of the average value of QueryDOwnerOnlyClickRate per query over the year.
Neural-network prediction of the average value of DaterAge per query over the year.
Neural-network prediction of the average value of LongestText per query over the year.
Neural-network prediction of the average value of DifferentInternalLinks per query over the year.
Neural-network prediction of the average value of QueryDOwnerOnlyClickRate_Reg per query over the year.
Removed_1557
#1557Removed_1558
#1558Type of the canonicalized Yandex Music url: track
Neural-network prediction of the average value of Bocm per query over the year.
Neural-network prediction of the average value of IsIndexPage per query over the year.
Neural-network prediction of the average value of QueriesAvgCM2 per query over the year.
Neural-network prediction of the average value of BrowserHostDownloadProbability per query over the year.
Neural-network prediction of the average value of RegBrowserUserHub per query over the year.
Neural-network prediction of the average value of AuxTitleBM25 per query over the year.
Neural-network prediction of the average value of QueryUrlCorrectedCtrXfactor per query over the year.
Neural-network prediction of the average value of QueryToDocAllSumFCountTextBm11Norm16384 per query over the year.
Neural-network prediction of the average value of XfDtShowAllSumWFSumWBodyMinWindowSize per query over the year.
Neural-network prediction of the click-weighted average of IsMainPage per query over the year.
Neural-network prediction of the click-weighted average of YabarUrlAvgTime per query over the year.
Neural-network prediction of the click-weighted average of DifferentInternalLinks per query over the year.
Neural-network prediction of the dwell-time-weighted average of UrlDomainFraction per query over the year.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: BclmWeightedFLogW0 by stream group 3. Maximum weighted value of the factor.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: Bm15FLog by stream group 2. Maximum weighted value of the factor.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: Bag OriginalRequestFraction by Stream FieldSetBagOfWords.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: MixMatchWeightedValue by stream QueryDwellTime. Maximum weighted value of the factor normalized to the total weight.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: Bm15 by Title stream. Total weighted values of the factor multiplied by the weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) normalized by the total weight.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: BclmWeightedFLogW0 by stream group 3. Minimum value of the factor by extension top.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: BclmWeightedFLogW0 by stream group 3. Total weighted values of the factor multiplied by the weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) normalized by the total weight.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: Bm15FLog by stream group 1. Maximum weighted value of the factor.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: Bm15FLog by stream group 1. Total weighted value of the factor normalized to the total weight.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: Bag AnnotationMatchAvgValue by Stream LongClickSP.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: Bm15FLog by stream group 1. Total weighted values of the factor multiplied by the weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) on the expansion group normalized by the total weight on the expansion group.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: Bm15FLog by stream group 1. Minimum weighted value of the factor on the extension top normalized to the maximum weight on the extension top.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: PairMinProximity by Stream Body. Maximum weighted value of the factor normalized to the total weight.
Linguistic Boosting Factor. Type of extensions: XfDtShowKnn. Factor: Bm15FLog by stream group 1. Total weighted values of the factor multiplied by the weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) normalized by the total weight.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: Bag AnnotationMatchAvgValue by Stream SimpleClick.
Linguistic Boosting Factor. Extension type: XfDtShowKnn. Factor: Bag CosineMaxMatch by Stream Title.
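Several XfDtShowKnn entries above describe the same aggregation: the sum over expansions of the weight times the weighted factor value, divided by the sum of the weights. A minimal sketch of that formula follows, assuming W_i is the weight of expansion i and F_i its factor value. The function name and the list-based inputs are hypothetical.

```python
def weight_scaled_weighted_sum(weights, values):
    """Compute sum_i W_i * (W_i * F_i) / sum_i W_i.

    weights: expansion weights W_i.
    values: factor values F_i, in the same order as the weights.
    """
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    numerator = sum(w * (w * f) for w, f in zip(weights, values))
    return numerator / total_weight
```

Because each weight enters the numerator twice, heavily weighted expansions dominate the result more strongly than in an ordinary weighted average.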
Predicted probability that the query is localizable according to the Regionality5 rule.
Removed_1590
#1590Removed_1591
#1591Removed_1592
#1592Removed_1593
#1593The document contains the full name (FIO) from the original query
Page Quality Experiment1
#1595Factor for Page Quality 1 experiments
DSSM model trained on clicks without miner (with no-clicks and am_hard negatives 50/50 and then on am_hard negatives only). Takes bigrams into account.
Dssm Boosting Score aggregation for XfOneSeAmSsHard model over 1-means centroids.
Dssm Boosting ScoreAvgClusterTop3Weighted aggregation for XfOneSeAmSsHard model over 1-means centroids.
Page Quality Experiment2
#1599Factor for Page Quality 2 experiments
Yellowness Img Max
#1600Maximum yellowness of the teaser images per url, averaged over urls
Yellowness Img Avg
#1601Average yellowness of the teaser images per url, averaged over urls
Yellow Img Share
#1602Ratio of yellow images in teasers on host
Yellow Img Count
#1603Average yellow images count on host
Teasers Count
#1604Average teasers count on host
Teasers Area
#1605Average teasers area on host
Yellowness Txt Min
#1606Minimum yellowness of the teaser text per url, averaged over urls
Yellowness Txt Avg
#1607Average yellowness of the teaser text per url, averaged over urls
Has Adv Clickable B G
#1608Background is clickable advertisement
Adv Nets Area
#1609Average ratio of adverts on screen
Adv Nets Area First Page
#1610Ratio of adverts on screen on main page
Adv Nets Count
#1611Average count of adverts on screen
Ratio of outgoing advertisement traffic to all traffic (desktop)
Ratio of outgoing real-time bidding traffic to all traffic (desktop)
News Agency Rating
#1614Rating of news agency from agencies.json (Yandex.News resource)
Linguistic Boosting Factor. Extension type: QueryToTextByXfDtShowKnn. Factor: Norm256 by stream Bocm11. Total weighted factor values multiplied by weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}).
Linguistic Boosting Factor. Extension type: QueryToTextByXfDtShowKnn. Factor: MinWindowSize by Stream Body. Total weighted factor values multiplied by weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) by the extension top normalized by the total weight by the extension top.
Linguistic Boosting Factor. Extension type: QueryToTextByXfDtShowKnn. Factor: MinWindowSize by Stream Body. Total weighted factor values multiplied by weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) normalized by total weight.
Linguistic Boosting Factor. Extension type: QueryToTextByXfDtShowKnn. Factor: Norm256 by stream Bocm11. Total weighted factor values multiplied by weight (\frac{\sum_i W_i (W_i F_i)}{\sum_i W_i}) by extension top.
Linguistic Boosting Factor. Extension type: QueryToTextByXfDtShowKnn. Minimal extension weight.
Linguistic Boosting Factor. Type of extensions: QueryToTextByXfDtShowKnn. The arithmetic mean of the weights of the extensions.
Linguistic Boosting Factor. Type of extensions: QueryToTextByXfDtShowKnn. Total weight of extensions.
Linguistic Boosting Factor. Extension type: QueryToTextByXfDtShowKnn. Factor: Bag OriginalRequestFraction by Stream FieldSetBagOfWords.
Page Quality Experiment3
#1623Factor for Page Quality 3 experiments
Characterizes the query by the degree of change from adding a fixed word (number of some year), uses dssm model DssmBoostingXfOneSeAmSsHard
Characterizes the query by the degree of change from adding a fixed word ('online' for Cyrillic), uses the dssm model DssmBoostingXfOneSeAmSsHard
Characterizes the query by the degree of change from removing a fixed word ('site' for Cyrillic), uses the dssm model DssmBoostingXfOneSeAmSsHard
Doc Source Fresh
#1627The document came from the fresh shards
For each word, the average HasNoTr value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all query words.
For each word, the average IsLJ value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all query words.
Removed_1630
#1630For each word, the average BclmLite value over the queries for 3 months is calculated offline. Then the minimum of this value is taken over all query words.
For each word, the average DBM40 value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all non-stop query words.
For each word, the average IsDesktopRequest value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all non-stop query words.
For each word, the average RLQAvgHasNoAllWordsTrSyn value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all query words.
For each word, the average DssmAggregatedAnnReg value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all query words.
For each word, the average MetaNumUrlsPerHostFixed value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all query words.
For each word, the average MaxSDIsNavMxQueryMax value over the queries for 3 months is calculated offline. Then the maximum of this value is taken over all non-stop query words.
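The per-word entries above follow a two-step scheme: a per-word average is precomputed offline, and at query time the maximum (or minimum) over the query words is taken, in some cases skipping stop words. A minimal sketch of the query-time step follows, assuming the offline averages are available as a dictionary. The function name, the treatment of unknown words (skipped) and the default value for an empty result are assumptions.

```python
def aggregate_word_statistic(query_words, word_averages,
                             stop_words=frozenset(), reducer=max,
                             default=0.0):
    """query_words: list of words of the query.
    word_averages: dict word -> offline 3-month average of some factor.
    stop_words: words to ignore (for the "non-stop query words" variants).
    reducer: max for most factors, min for the BclmLite variant.
    """
    values = [word_averages[word] for word in query_words
              if word not in stop_words and word in word_averages]
    return reducer(values) if values else default
```

Passing a stop-word set reproduces the "non-stop query words" variants, and `reducer=min` the minimum-based one.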
AVG aggregation of VisitsFromWiki web factor using random log
Page Quality Experiment4
#1639Factor for Page Quality 4 experiments
PERCENTALE_25 aggregation of NavLinear web factor using random log
PERCENTALE_90 aggregation of Found web factor using random log
AVG aggregation of SubqueryThMatch web factor using random log
Page Quality Experiment5
#1643Factor for Page Quality 5 experiments
AVG aggregation of SegmentWordPortionFromMainContent web factor using random log
AVG aggregation of XfDtShowAllMaxFFieldSet2Bm15FLogK0001 web factor using random log
AVG aggregation of QueryRegionSize web factor using random log
Doc From Web Tier1
#1647The document came from WebTier1
AVG aggregation of IsRelevLocaleUA web factor using random log
PERCENTALE_90 aggregation of QfufAllSumWFSumWFieldSet3BclmWeightedFLogW0K0001 web factor using random log
PERCENTALE_90 aggregation of DssmBoostingCtrQuerySelfSimilarity web factor using random log
AVG aggregation of QueryToDocAllSumFCountTextBocm11Norm256 web factor using random log. NOTE: QueryToDocAllSumFCountTextBocm11Norm256 has been removed.
PERCENTALE_90 aggregation of IsNavMxQuery web factor using random log
Doc From Platinum0
#1653The document came from Platinum0
AVG aggregation of DBM15Wares2 web factor using random log
PERCENTALE_90 aggregation of UrlNGramsModel web factor using random log
A neural document model for detecting unexpected shock content
Medical host quality fresh.
PERCENTALE_25 aggregation of DssmBoostingCtrKMeans1ScoreScaledSumWeightedQE web factor using random log
PERCENTALE_90 aggregation of LongClickMobileAllWcmWeightedValue web factor using random log
PERCENTALE_25 aggregation of DssmVkPopularity web factor using random log
AVG aggregation of UBLongPeriodVisitsSNProb web factor using random log
PERCENTALE_90 aggregation of CountryQueryRegionality web factor using random log
PERCENTALE_90 aggregation of TRhitw web factor using random log
PERCENTALE_90 aggregation of UBLongPeriodAvgSearchDuration600 web factor using random log
AVG aggregation of RequestIsFromIOS web factor using random log
PERCENTALE_90 aggregation of DssmQueryEmbeddingCtrNoMinerPca4 web factor using random log
AVG aggregation of XfDtShowAllMaxFFieldSetUTBm15FLogW0 web factor using random log
PERCENTALE_25 aggregation of UrlTrigrams web factor using random log
PERCENTALE_90 aggregation of DssmQueryEmbeddingCtrNoMinerPca1 web factor using random log
AVG aggregation of IsRelevLocaleKZ web factor using random log
PERCENTALE_90 aggregation of TextFeatures web factor using random log
1 if the host includes js from marketgid.com
Has Js From Rfity Com
#16731 if the host includes js from rfity.com
Dssm Google Specificity
#1674DSSM prediction of google specificity for query
Owner Website Attention
#1675Site owner pays attention to site details (at least once in quarter)
Removed1676
#1676Chat Score
#1677Chat info. positive / events or zero
Host Player View Depth
#1678Host player info. Ratio of view time to video duration
1 if the host includes js from google-analytics.com
1 if the host includes js from googleapis.com
Has Js From Facebook Net
#16811 if the host includes js from facebook.net
Has Js From Mc Yandex Ru
#16821 if the host includes js from mc.yandex.ru
Average value of RandomLogQueryAvgAddTime over the nearest knn queries.
Average value of RandomLogQueryAvgTxtHiRelSy over the nearest knn queries.
Average value of RandomLogQueryAvgTextLike over the nearest knn queries.
Average value of RandomLogQueryAvgIsForum over the nearest knn queries.
Average value of RandomLogQueryAvgHasPayments over the nearest knn queries.
Average value of RandomLogQueryAvgDifferentInternalLinks over the nearest knn queries.
Average value of RandomLogQueryAvgIsTargetBussinessCard over the nearest knn queries.
Average value of RandomLogQueryAvgQueryToDocAllSumFCountTextBm11Norm16384 over the nearest knn queries.
Average value of RandomLogQueryAvgXfDtShowAllSumWFSumWBodyMinWindowSize over the nearest knn queries.
Host Speed From Spylog
#1692Host speed estimation
Host Official
#1693Whether the site is official
Removed_1694
#1694Host Cy100log
#1695Estimate of the quality of links from good sites
Weight sum of each non-unique nevasca shingle
Host Nevasca2 Fresh Week
#1697Nevasca shingle quantity in last week
Greentraffic share (aka direct visits). Desktop
Greentraffic share (aka direct visits). Mobile
Greentraffic absolute (desktop)
Host Return Rate Month
#1701Visits averaged by user
Host Biz Kernel
#1702Host Biz Kernel Quantile
#1703Has Video
#17041 if there is a video on the page
Stream PCtrNew from yandex video
Stream PCtrNew from yandex video
Stream PCtrNew from yandex video
Stream PCtrNew from yandex video
Stream PCtrNew from yandex video
Stream PCtrNew from yandex video
Has Turbo
#1711The document has a turbo page. It depends on the platform
Medical host quality for metric.
Original query with verbs removed. Computed over the document title. Word weights aggregation algorithm: Bm15. The normalization coefficient is 0.1.
Original query with verbs removed. Computed over a composite stream consisting of the tokenized url and the document title. Word weights aggregation algorithm: Bm15FLogW0. The normalization coefficient is 0.0001.
Original query with verbs removed. Computed over the document content. The minimum size of the window that contains all query words, normalized by the number of words in the query.
Original query with verbs removed. Computed over the tokenized url. Word weights aggregation algorithm: Bm15. The normalization coefficient is 0.1.
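The minimum-window entry in the group above can be illustrated with a standard sliding-window search. This is a sketch under assumptions: the document is a list of tokens, words are matched by exact string equality (the real factor presumably matches word forms), and the function names are hypothetical.

```python
from collections import Counter


def min_window_size(doc_words, query_words):
    """Length of the shortest span of doc_words containing every
    distinct query word, or None if no such span exists."""
    needed = set(query_words)
    if not needed:
        return None
    counts = Counter()   # occurrences of needed words inside the window
    covered = 0          # number of distinct needed words in the window
    best = None
    left = 0
    for right, word in enumerate(doc_words):
        if word in needed:
            counts[word] += 1
            if counts[word] == 1:
                covered += 1
        # Shrink from the left while the window still covers the query.
        while covered == len(needed):
            size = right - left + 1
            if best is None or size < best:
                best = size
            left_word = doc_words[left]
            if left_word in needed:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    covered -= 1
            left += 1
    return best


def normalized_min_window(doc_words, query_words):
    """Minimum window size divided by the number of query words."""
    size = min_window_size(doc_words, query_words)
    if size is None:
        return None
    return size / len(query_words)
```

A value of 1.0 means the query words appear adjacent to each other (in any order); larger values mean they are spread further apart.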
RMSE aggregation of Long web factor using random log
RMSE aggregation of IsOrg web factor using random log
RMSE aggregation of GskUrlModel web factor using random log
RMSE aggregation of DaterStatsAverageSourceSegment web factor using random log
RMSE aggregation of VisitsFromWiki web factor using random log
RMSE aggregation of XfDtShowBagOfWordsTitleCosineMaxMatch web factor using random log
RMSE aggregation of UBLongPeriodDownloadsProb web factor using random log
RMSE aggregation of MetaAvgIsNotCgi meta factor using random log
RMSE aggregation of MetaRmsSynPercentBadWordPairs meta factor using random log
RMSE aggregation of MetaPosTrigramsProb meta factor using random log
PERCENTALE_90 aggregation of Bocm web factor using random log
PERCENTALE_90 aggregation of SegmentWordPortionFromMainContent web factor using random log
PERCENTALE_90 aggregation of IsMobileBeauty web factor using random log
PERCENTALE_90 aggregation of USLongPeriodUrlWinsProb web factor using random log
PERCENTALE_90 aggregation of DssmBoostingXfWeightKMeans5AvgTop02ScoreQE web factor using random log
PERCENTALE_90 aggregation of DssmBoostingCtrKMeans1Score web factor using random log
PERCENTALE_90 aggregation of SDIsNavMxQueryMax meta factor using random log
PERCENTALE_90 aggregation of MetaWeb764Web1076ProductInvAvg meta factor using random log
PERCENTALE_90 aggregation of MetaWeb1099Web1219ProductInvPos meta factor using random log
PERCENTALE_90 aggregation of MetaMaxDssmMiddleVsShortLongHardNoClicks meta factor using random log
MAX aggregation of NumLinksFromMP web factor using random log
MAX aggregation of NavLinear web factor using random log
MAX aggregation of DaterStatsAverageSourceSegment web factor using random log
MAX aggregation of WeightedSumIsIndexPageIsNavMxQuery web factor using random log
MAX aggregation of QueryToDocAllSumFCountTextBocm11Norm256 web factor using random log. NOTE: QueryToDocAllSumFCountTextBocm11Norm256 has been removed.
MAX aggregation of DssmBigramsQueryDerivativeMax web factor using random log
MAX aggregation of DssmQueryCountryToUrlEstimatedDistance web factor using random log
MAX aggregation of MetaWeb764Web1076ProductInvAvg meta factor using random log
LOGAVG aggregation of TextFeatures web factor using random log
LOGAVG aggregation of DocLen web factor using random log
LOGAVG aggregation of IsHTML web factor using random log
LOGAVG aggregation of HasLevensht1QueryFragment web factor using random log
LOGAVG aggregation of HeadingIdfSumFixed web factor using random log
LOGAVG aggregation of AdvPronounsPortion web factor using random log
LOGAVG aggregation of LongestText web factor using random log
LOGAVG aggregation of CountryHour web factor using random log
LOGAVG aggregation of MetrikaUrlAvgTime web factor using random log
LOGAVG aggregation of WikiLinkCount web factor using random log
LOGAVG aggregation of BrowserUrlDwellTimeRegionFrc web factor using random log
LOGAVG aggregation of WikiInfobox web factor using random log
LOGAVG aggregation of QueryDocTitleRangesMatchingScore web factor using random log
LOGAVG aggregation of IsMobileBeauty web factor using random log
LOGAVG aggregation of QueryToTextAllSumWFSumWBodyMinWindowSize web factor using random log
LOGAVG aggregation of DssmRandomLogQueryAvgDifferentInternalLinks web factor using random log
LOGAVG aggregation of MetaUrlDirectChildrenCnt meta factor using random log
LOGAVG aggregation of MetaWeb1241Web1299ProductInvPos meta factor using random log
LOGAVG aggregation of MetaEpsHashShareNationalLanguage meta factor using random log
Is Https
#1764The document has the https protocol
The Levenshtein distance between the query and a url of the form youtubecom/watch, normalized by the larger of the query length and the url length
The length of the longest common substring of the url and the query, normalized by the query length
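The two string-similarity factors above can be sketched with textbook dynamic programming. The code below assumes plain character-level comparison with no url preprocessing (the source does not describe any), and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute; unit costs)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def normalized_levenshtein(query, url):
    """Edit distance divided by the longer of the two lengths."""
    longest = max(len(query), len(url))
    return levenshtein(query, url) / longest if longest else 0.0


def longest_common_substring_length(a, b):
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0] * (len(b) + 1)
        for j, char_b in enumerate(b, 1):
            if char_a == char_b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def common_substring_fraction(query, url):
    """Longest common substring length divided by the query length."""
    if not query:
        return 0.0
    return longest_common_substring_length(url, query) / len(query)
```

Both values lie in [0, 1]: 0 for the normalized distance means identical strings, while 1 for the substring fraction means the whole query occurs verbatim in the url.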
The sigmoid normalized value of the porn text query classifier as estimated from Toloka
Binarized value of the porn text query classifier, based on estimates from Toloka
The [0,1] value of the porn text query classifier as estimated by the web classifier and additional dictionaries
Value of the porn text query classifier, based on web classifier estimates and additional dictionaries, binarized using fxlists
Dirty Language In Query
#1771The presence of foul language in the query. 0 - absent, 0.5 - not hard, 1 - hard
Porn Markers In Query
#1772Presence of porn markers in the query (0 - yes, 1/3 - no, 1 - query 'gray')
Dssm Panther Terms
#1773Adultness Prod
#1774Document pornography classifier, features based on document text
Adultness Url
#1775Document pornography classifier, document url based features
Nasty Image Value
#1776Document classifier for pornography, document image-based features (information is taken from the Picture Index)
Nasty Video
#1777Document pornography classifier, features based on the document's video (information is taken from the Video index)
Nasty Host
#1778Host pornography classifier, features based on porn queries for which the host was shown and clicked
Official In Query
#1779The presence of the word official in a lemmatized query
Wiki In Query
#1780Presence of the word wikipedia in a lemmatized query
Not In Query
#1781The presence in a lemmatized query of the word not and similar in meaning
Price In Query
#1782The presence in the lemmatized query of the words buy, price and similar in meaning
Return factor on the host. Percentile aggregation with a factor of 0.25f of the DwellTimeSumFraction feature
Doc From Quick Med
#1784The document came from QuickMed
Return factor on the host. Percentile aggregation with a factor of 0.99f of the AverageReturnTime feature
Return factor on the host. Percentile aggregation with a factor of 0.97f of the AverageReturnTime feature
Return factor on the host. GreaterFraction aggregation with a factor of 0.99f of the AverageReturnTime feature
Return factor on the host. Percentile aggregation with a factor of 0.99f of the AverageLogReturnTime feature
Return factor on the host. GreaterFraction aggregation with a factor of 0.9f of the AverageLogReturnTime feature
Return factor on the host. LessFraction aggregation with a factor of 0.05f of the FirstClickDwellTime feature
Return factor on the host. WeightedAverage aggregation of the AverageVisitsPer3Hours feature
Medical Host Quality
#1792Medical host quality.
Has Turbo App
#1793The document has a turbo page for the desktop platform. Updates on top of the base are delivered via saas.
Return factor on the host. WeightedAverage aggregation of the AverageDwellTimePerHour feature
Return factor on the host. LessFraction aggregation with a factor of 0.1f of the AverageDwellTimePer3Hours feature
Return factor on the host. Max aggregation of the AverageDwellTimePerWeek feature
The median dwell time of the query over the entire history. The dwell time is capped at 6000. The query is normalized by doppelgangers
The number of query impressions with more than one click over the entire history. The query is normalized by doppelgangers
Share of query impressions with more than one click among all impressions over the entire history. The query is normalized by doppelgangers
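The three query-history statistics above can be sketched as follows. This is only an illustration: the function names and input shapes are hypothetical, the unit of the 6000 cap is not given in the source, and doppelganger normalization of the query is assumed to have happened before these functions are called.

```python
from statistics import median


def median_dwell_time(dwell_times, cap=6000):
    """Median of the dwell times after capping each value at `cap`.

    Returns None when there is no history for the query.
    """
    capped = [min(t, cap) for t in dwell_times]
    return median(capped) if capped else None


def multi_click_count(clicks_per_impression):
    """Number of impressions of the query that received more than one click."""
    return sum(1 for clicks in clicks_per_impression if clicks > 1)


def multi_click_share(clicks_per_impression):
    """Share of impressions with more than one click among all impressions."""
    if not clicks_per_impression:
        return 0.0
    return multi_click_count(clicks_per_impression) / len(clicks_per_impression)
```

Capping before taking the median keeps a few very long sessions from dominating the statistic for rarely seen queries.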
Owner aggregation of RandomLogWordMaxMetaNumUrlsPerHostFixed web factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of MetaWeb1099Web1219ProductInvPos meta factor using random log, aggregation type is LOGAVG
Owner aggregation of DssmDwelltimeRegChainTrainedEmbedding meta factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of DssmRandomLogQueryAvgHasPayments web factor using random log, aggregation type is LOGAVG
Owner aggregation of UBLongPeriodBrowseFrc web factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of MetaUrlChildrenCnt meta factor using random log, aggregation type is LOGAVG
Owner aggregation of MetaRmsDifferentInternalLinks meta factor using random log, aggregation type is PERCENTALE_25
Owner aggregation of RandomLogWordMaxHasNoTr web factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of MetaResidUSLongPeriodUrlWinsProb meta factor using random log, aggregation type is RMSE
Owner aggregation of PornoQuery web factor using random log, aggregation type is LOGAVG
Owner aggregation of NationalLanguage web factor using random log, aggregation type is LOGAVG
Owner aggregation of PercentVisibleContent web factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of MetaWeb1241Web1299ProductInvPos meta factor using random log, aggregation type is PERCENTALE_25
Owner aggregation of LinkAnnFloatMultiplicityAttenV1Bm15K001 web factor using random log, aggregation type is LOGAVG
Owner aggregation of UBLongPeriodLeavesCnt web factor using random log, aggregation type is RMSE
Owner aggregation of NumLinksFromMP web factor using random log, aggregation type is LOGAVG
Owner aggregation of DssmRandomLogQueryAvgDifferentInternalLinks web factor using random log, aggregation type is PERCENTALE_25
Owner aggregation of IsOrg web factor using random log, aggregation type is RMSE
Owner aggregation of QSegmentsBM25 web factor using random log, aggregation type is MAX
Owner aggregation of SegmentAuxAlphasInText web factor using random log, aggregation type is RMSE
Owner aggregation of RandomLogQueryDwelltimeWeightedAvgUrlDomainFraction web factor using random log, aggregation type is LOGAVG
Owner aggregation of RandomLogWordSkipStopWordsMaxIsDesktopRequest web factor using random log, aggregation type is LOGAVG
Owner aggregation of VisitsFromWiki web factor using random log, aggregation type is RMSE
Owner aggregation of IsText web factor using random log, aggregation type is RMSE
Owner aggregation of DBMSubstantive web factor using random log, aggregation type is MAX
Owner aggregation of DaterStatsAverageSourceSegment web factor using random log, aggregation type is RMSE
Owner aggregation of IsMobileBeauty web factor using random log, aggregation type is LOGAVG
Owner aggregation of LongClickSPMixMatchWeightedValue web factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of FemAndMasNounsPortion web factor using random log, aggregation type is LOGAVG
Owner aggregation of TrigramsProb web factor using random log, aggregation type is PERCENTALE_90
Owner aggregation of DaterStatsYearNormLikelihood web factor using random log, aggregation type is PERCENTALE_25
Owner aggregation of UrlPathAndParamsFraction web factor using random log, aggregation type is MAX
Query To Text All Avg
#1832The average value for the query factor according to QueryToText linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
The average value for the query factor according to QueryToTextByXfDtShowKnn linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
Xf Dt Show All Total W
#1834sum / (sum + 10) for the query factor according to XfDtShow linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
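The sum / (sum + 10) form maps an unbounded non-negative total into [0, 1). A minimal sketch, assuming that `sum` is the total weight of the query's extensions (the source does not say what is summed) and using a hypothetical function name:

```python
def total_weight(weights, smoothing=10.0):
    """Squash a non-negative total into [0, 1) as s / (s + smoothing).

    `weights` is assumed to hold the per-extension weights for one query.
    """
    s = sum(weights)
    return s / (s + smoothing)
```

With the constant 10, a total of 10 gives 0.5, and larger totals approach 1 without reaching it.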
Xf Dt Show Quantile01
#1835Quantile 0.1 for the query factor according to XfDtShow linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
Quantile 0.1 for the query factor according to XfDtShowKnn linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
Quantile 0.9 for the query factor according to XfDtShowKnn linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
Qfuf All Total Weight
#1838sum / (sum + 10) for the query factor according to Qfuf linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
Qfuf All Avg
#1839The average value for the query factor according to Qfuf linguistic boosting, calculated in the LingBoostQueryFeatures behemoth rule
Is Tas Ix
#1840The site is located in the Tas-IX network (relevant to Uzbekistan)
Dssm Boosting Score for SerpSimilarityHard model over 1-means centroids.
Page Quality Host
#1842Page quality aggregated by host (avg).
Is Relev Locale UZ
#1843relev_local == uz
25% quantile of the time elapsed between the previous query and the current query. The query is normalized by doppelgangers
The result of applying a neural model trained to distinguish long clicks from other events. The model takes as input word and bigram counters computed from the text streams (Title, Body, Url).
Is Mobile Beauty Host
#1846Is this host adapted for mobile devices
Linguistic Boosting Factor. Extension type: QfufFilteredByXfOneSe (qfuf, filtered by dssm-model XfOneSe). Aggregation over all extensions. Highest factor value. Weighted stream aggregation of Url, Title, Body, CorrectedCtr, LongClick, OneClick, BrowserPageRank, SplitDwellTime, SamplePeriodDayFrc, SimpleClick, YabarVisits, YabarTime. Word weights aggregation algorithm: Bm15FLog (Bm15 aggregation of word occurrence logarithms). The normalization coefficient is 0.001.
Linguistic Boosting Factor. Extension type: QfufFilteredByXfOneSe (qfuf, filtered by dssm-model XfOneSe). Aggregation over all extensions. Highest factor value. Weighted aggregation of the Title, Body, LongClick, LongClickSP, OneClick streams. Word weights aggregation algorithm: BclmWeightedFLogW0. The normalization coefficient is 0.001.
Linguistic Boosting Factor. Extension type: QfufFilteredByXfOneSe (qfuf, filtered by dssm-model XfOneSe). Aggregation over all extensions. Highest factor value. Computed over a composite stream consisting of the tokenized URL and the document title. Word weights aggregation algorithm: Bm15FLogW0. The normalization coefficient is 0.0001.
Linguistic Boosting Factor. Extension type: QfufFilteredByXfOneSe (qfuf, filtered by dssm-model XfOneSe). Aggregation over all extensions. Highest factor value. Computed over the document title. Word weights aggregation algorithm: Bm15. The normalization coefficient is 0.1.
The linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (qfuf, filtered by dssm-model XfOneSe). Aggregation by top-10 (by factor value) extensions. Weighted sum of factor weights. Normalized by total weight of extensions. Weighted stream aggregation of Url, Title, Body, CorrectedCtr, LongClick, OneClick, BrowserPageRank, SplitDwellTime, SamplePeriodDayFrc, SimpleClick, YabarVisits, YabarTime. Word weights aggregation algorithm: Bm15FLog (Bm15 aggregation of word occurrence logarithms). The normalization coefficient is 0.001.
The linguistic boosting factor. Type of extensions: QfufFilteredByXfOneSe (qfuf, filtered by dssm-model XfOneSe). Aggregation by top-10 (by factor value) extensions. Weighted sum of factor weights. Normalized by total weight of extensions. Calculated by document content. Minimum window size that includes all query words. Normalized by the number of words in the query.
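The "minimum window size that includes all query words" part of this factor can be illustrated with a sliding-window sketch. The function names are hypothetical, tokenization and lemmatization are assumed to be done already, and dividing the window length by the query word count is only one reading of "normalized by the number of words in the query".

```python
def min_window_size(doc_words, query_words):
    """Length (in words) of the shortest document span containing every
    distinct query word; None if some query word never occurs."""
    need = set(query_words)
    if not need:
        return None
    counts = {}   # query word -> occurrences inside the current window
    best = None
    left = 0
    for right, word in enumerate(doc_words):
        if word in need:
            counts[word] = counts.get(word, 0) + 1
        # Shrink from the left while the window still covers all query words.
        while len(counts) == len(need):
            size = right - left + 1
            if best is None or size < best:
                best = size
            left_word = doc_words[left]
            if left_word in counts:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    del counts[left_word]
            left += 1
    return best


def normalized_min_window(doc_words, query_words):
    """Minimum window size divided by the number of query words (assumed reading)."""
    size = min_window_size(doc_words, query_words)
    return None if size is None else size / len(query_words)
```

Under this reading a value of 1.0 means the query words appear adjacent to each other in the document, and larger values mean they are spread further apart.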
Factor on the filtered original query: the dssm-distance from the query with words removed to the original query is calculated, followed by a threshold cutoff. Weighted stream aggregation of Url, Title, Body, Links, CorrectedCtr, LongClick, OneClick, BrowserPageRank, SplitDwellTime, SamplePeriodDayFrc, SimpleClick, YabarVisits, YabarTime. Word weights aggregation algorithm: Bm15FLog (Bm15 aggregation of word occurrence logarithms). The normalization coefficient is 0.001.
Factor on the filtered original query: the dssm-distance from the query with words removed to the original query is calculated, followed by a threshold cutoff. Computed over a composite stream consisting of the tokenized URL and the document title. Word weights aggregation algorithm: Bm15FLogW0. The normalization coefficient is 0.0001.
Dssm Ctr Eng Ss Hard
#1855DSSM model trained on cross language CTRs using serp similarity hard miner.
Removed_1856
#1856For every query word, a weight is calculated by the query-mutation method (the distance between the query with the word present and the query with the word absent). The factor is the sum of the weights of the words found in the title, divided by the sum of the weights of all words.
For every query word, a weight is calculated by the query-mutation method (the distance between the query with the word present and the query with the word absent). The factor is the maximum weight among the words missing from the document title.
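A rough sketch of how the two title factors above could be derived once per-word weights are available. The weights themselves come from a distance between the full query and the query with one word removed. That distance model is not described in the source, so the weights are taken here as a ready-made dictionary, and the function names are hypothetical.

```python
def title_weight_share(word_weights, title_words):
    """Sum of weights of query words present in the title,
    divided by the sum of weights of all query words."""
    total = sum(word_weights.values())
    if total == 0:
        return 0.0
    title = set(title_words)
    found = sum(w for word, w in word_weights.items() if word in title)
    return found / total


def max_missing_weight(word_weights, title_words):
    """Largest weight among query words absent from the title (0.0 if none are missing)."""
    title = set(title_words)
    return max((w for word, w in word_weights.items() if word not in title),
               default=0.0)
```

For example, with weights `{"buy": 0.2, "red": 0.3, "shoes": 0.5}` and the title `["red", "shoes", "online"]`, the share is about 0.8 and the largest missing weight is 0.2.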
Neuro Text Model Long Click Predictor By Word And Bigram Counters Without Title With SS Hards
#1859The result of applying a neural model trained to distinguish long clicks from other events. The model takes as input word and bigram counters computed from the text streams (Body, Url).
Removed_1860
#1860Dater Add Time80 Hours
#1861Calculated as (80-x) where x is the document's age in hours (continuous). Uses data from the RobotAddTime dater
Dater Add Time10 Days
#1862Calculated as (10-x) where x is the document's age in days (continuous). Uses data from the RobotAddTime dater
Dater Age10 Days
#1863The difference between the current date and the document date defined by RobotAddTime: 1 - the date is equal to the current date, 0 - the document is 10 days old or older, or the date is not defined
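The three dater factors above can be sketched as below. The function names are hypothetical. The source gives only the end points of the last factor (1 for a document dated today, 0 at 10 days or more or when the date is unknown), so the linear decay between them is an assumption, and no clipping is applied to the first two because none is mentioned.

```python
def dater_add_time_80_hours(age_hours):
    """(80 - x), where x is the document age in hours."""
    return 80.0 - age_hours


def dater_add_time_10_days(age_days):
    """(10 - x), where x is the document age in days."""
    return 10.0 - age_days


def dater_age_10_days(age_days):
    """1.0 for a document dated today, 0.0 at 10 days or more or when the
    date is unknown (None). Linear decay in between is an assumption."""
    if age_days is None:
        return 0.0
    return min(1.0, max(0.0, 1.0 - age_days / 10.0))
```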
Linguistic Boosting Factor. Type of extensions: XfOneSeKnn (closest by dssm-model, trained to predict XfDtShow extensions). Aggregation over all extensions. Highest weighted factor value. Normalized to the maximum weight of the extension. Weighted aggregation of stream Url,Title,Body,Links,CorrectedCtr,LongClick,OneClick,BrowserPageRank,SplitDwellTime,SamplePeriodDayFrc,SimpleClick,YabarVisits,YabarTime. Word weights aggregation algorithm: Bm15FLog (Bm15 aggregation of word occurrence logarithms). The normalization coefficient is 0.001.
Linguistic Boosting Factor. Type of extensions: XfOneSeKnn (closest by dssm-model, trained to predict XfDtShow extensions). Aggregation over all extensions. Highest weighted factor value. Normalized to the maximum weight of the extension. TODO Algorithm: Maximum weight of the fully matched query annotation. Calculated by OneClick stream.
Linguistic Boosting Factor. Type of extensions: QueryToTextByXfOneSeKnn (QueryToText extensions XfOneSeKnn). Aggregation by top-10 (by factor value) extensions. Weighted sum of factor weights. Normalized by total weight of extensions. Calculated by document content. Minimum window size that includes all query words. Normalized by the number of words in the query.
Linguistic Boosting Factor. Extension type: QueryToTextByXfOneSeKnn (QueryToText extensions XfOneSeKnn). Aggregation over all extensions. Weighted sum of factor weights. Normalized by total weight of extensions. Weighted aggregation of the Title, Body, LongClick, LongClickSP, OneClick streams. Word weights aggregation algorithm: BclmWeightedFLogW0. The normalization coefficient is 0.001.
Is International Domain
#1868Domain in the international zone
Is Memorandum Query
#1869The query was recognized as seeking copyrighted works protected by the Anti-Piracy Memorandum.
Host Video Stevenson
#1870The host contains pirated videos protected by the Anti-Piracy Memorandum.
Host Video Distributor
#1871The host contains videos protected by the Anti-Piracy Memorandum.
Average host freshness over 30 days
Proportion of documents with positive freshness surplus from the host in 30 days
Host Stevenson Binary
#1874Stevenson
Stevenson
Stevenson
Stevenson
Host Stevenson Weight
#1878Stevenson
Video Intendance Predict
#1879The renormalized prediction of the ethos classifier trained on video relevance markup.
Piracy Predict
#1880The renormalized prediction of the ethos classifier trained on the synthetic sample 'query is typical for a pirate site' vs 'query is typical for a site far from it'
FREE_SLOT_1881
#1881There has never been a non-zero feature in this slot
Stevenson Dssm Predictor
#1882Regression on dssm embeddings to separate memorandum and non-memorandum requests
Memorandum Predict
#1883The renormalized prediction of the ethos classifier trained to distinguish memorandum queries from random ones
Piracy Predict Dssm
#1884Regression on DSSM embeddings to separate pirate-specific and non-pirate-specific queries
DSSM model that predicts the logarithm of the longest click on the SERP. Negative examples are urls from past queries of the same user, with no more than 7 minutes between queries (superhards on reformulations)
Doc From Quick
#1886The document came from Quick but not from QuickRt
Doc From Quick Rt
#1887The document came from QuickRt
Doc From Callisto
#1888The document came from Callisto
Legal Players
#1889Feature LegalPlayers from VideoIndex
Social Networks Players
#1890Feature SocialNetworksPlayers from VideoIndex
Stevenson Players
#1891Feature StevensonPlayers from VideoIndex
DSSM model with early binding, trained on reformulations, which predicts the logarithm of the longest click on the SERP.
Has News Agency Rating
#1893Rating of news agency from agencies.json > 0 (Yandex.News resource)
Weekday query probability
Indicator of the quality of the site in terms of factors about user behavior, aggregated to the owners.
Hit Contexts Dssm
#1896Neural network value for contexts of query hits in document text. Predicts relevance-all-8-years. Uses formula ussr-dump-20190719 prs-20190720 all-8-years [t > 0.25] CrossEntropy 20k 0.25 -S 0.8 -Z 1 predictions for learning.
Antispam Ban
#1897Bans of Antispam from erf
DSSM model trained on the reformulation pool, which in the query part besides the query itself receives 4 XfDt extensions with the highest weight
LogAvg-statistics of the IsMobileRequest factor aggregated by the closest urls on the host
LogAvg-statistics of the NanobtaniumQueryWordTitle5nDist2maxXMax factor aggregated by the closest urls on the host
Antispam Ban Gsm
#1901Bans on gsm of Antispam from erf
Antispam Ban Fresh
#1902Bans on fresh of Antispam from erf
The average IsBlog by query for the year. Calculated offline.
Has Turbo Mobile
#1904The document has a turbo page for the mobile platform. Updates on top of the base are delivered via saas.
Has Turbo Desktop
#1905The document has a turbo page for the desktop platform. Updates on top of the base are delivered via saas.
Model trained on prediction estimate formula ussr-dump-20190719 prs-20190720 all-8-years [t > 0.25] CrossEntropy 20k 0.25 -S 0.8 -Z 1.
Removed_1907
#1907Random Commercial
#1908The 'random' factor for commercial sites.
Neural document model for finding unexpected shocking content (for experiments)
Features calculated on url with request multitokens expansion
Features calculated on url with request multitokens expansion
Model trained on prediction estimates by formula ussr-dump-20190719 prs-20190720 all-8-years [t > 0.25] CrossEntropy 20k 0.25 -S 0.8 -Z 1 and pre-trained on relevance estimates.
Queries Ratio Morda2
#1913The share of queries that showed the owner's main page among all queries that showed the owner in the last week.
Percentage of visits to the document from the SERP that are at 0 hops. Over 30 days.
Queries Avg Top
#1915The average position of the owner on the queries for the last week.
The ratio of mobile to desktop by search engine traffic.
Mobile-to-desktop ratio for all outbound traffic.
Avg Is Org
#1918The average value of the query factor isorg for queries with the given owner for the last week.
The average ratio of punctuation to all separators in the owner's documents.
Fresh Detector Predict
#1920The value of the freshness detector calculated in behemoth. Always 0 when the detector value is less than the threshold.
The host contains videos protected by the Anti-Piracy Memorandum.
Host Memorandum Weight
#1922Stevenson