What started as a convo with Dave Davies, and was then taken further when Bill Slawski and I were chatting, is apparently now a thing. LOL. Most of the SEO world doesn’t really have an interest in information retrieval, never mind (Google) search patent awards (unless they can glean something to manipulate). But hey… let’s do this. (Be sure to read Bill’s take on the year.)
Below are some of the more interesting (to me) Google search patent awards from 2019. Bear in mind these are just the ones I personally found interesting… out of the 115 I collected over the year.
Top 10 Google Patent Awards
Let’s get into it. I’ll give my own thoughts, the Abstract, and a few lines of interest in case you’d like to dig in deeper.
Identifying a primary version of a document
I found this one particularly interesting because how Google handles attribution, and how often they get it wrong, has long been a talking point in the industry. And really, it’s a bit notable because reading it you can sort of see how they might actually get it wrong.
Abstract;
“A system and method identifies a primary version out of different versions of the same document. The system selects a priority of authority for each document version based on a priority rule and information associated with the document version, and selects a primary version based on the priority of authority and information associated with the document version.”
Interesting;
“… it is typical that a particular document or portion thereof, appears in a number of different versions or forms in various online repositories.”
“search results including different versions of the same document may crowd out diverse contents that should be included.”
“For each document version, a priority of authority is selected based on the metadata information associated with the document version, such as the source, exclusive right to publish, licensing right, citation information, keywords, page rank, and the like. The document versions are then determined for length qualification using a length measure. The version with a high priority of authority and a qualified length is deemed the primary version of the document. If none of the document versions has both a high priority and a qualified length, then the primary version is selected based on the totality of information associated with each document version.”
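To make that selection flow a bit more concrete, here’s a minimal sketch of how a priority of authority plus a length qualification might pick a primary version. This is my own interpretation, not Google’s code; the metadata fields, weights, and the 300-word length threshold are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class DocVersion:
    url: str
    # Hypothetical metadata fields suggested by the patent text
    source_score: float = 0.0       # e.g. publisher reputation
    exclusive_rights: bool = False  # exclusive right to publish / licensing
    citations: int = 0
    page_rank: float = 0.0
    length: int = 0                 # document length in words

def priority_of_authority(v: DocVersion) -> float:
    """Toy priority rule combining the metadata the patent lists."""
    score = v.source_score + 0.1 * v.citations + v.page_rank
    if v.exclusive_rights:
        score += 1.0
    return score

def pick_primary(versions: list[DocVersion], min_length: int = 300) -> DocVersion:
    """Prefer the version with high authority AND a qualified length;
    otherwise fall back to the totality of the signals."""
    ranked = sorted(versions, key=priority_of_authority, reverse=True)
    for v in ranked:
        if v.length >= min_length:
            return v
    # No version qualifies on length: fall back to the overall best
    return ranked[0]

versions = [
    DocVersion("https://example.com/original", source_score=2.0,
               exclusive_rights=True, citations=12, page_rank=0.8, length=1800),
    DocVersion("https://mirror.example.net/copy", source_score=0.5,
               citations=1, page_rank=0.3, length=1750),
]
print(pick_primary(versions).url)
```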
Resource scoring adjustment based on entity selections
This one was a good read not only because of the systems they discuss in it, but also because of the mentions of “boosting/dampening” and even an “authority score”. We (the SEO Dojo group) were discussing EAT and whether there’s any real documentation around expertise/authority/trust out there. The mention of an authority score definitely caught my attention.
Abstract;
“Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are provided for resource scoring adjustment based on entity selection. In one aspect, a method includes the actions of accessing resource data that specifies, for each of a plurality of resources, a resource identifier and one or more referenced entities, and accessing search term data that specifies a plurality of search terms, and for each search term, a selection value for each resource, each selection value being based on user selections of search results that referenced the resource to which the selection value corresponds. From the resource data and search term data, for each search term and each entity, a search term-entity selection value is determined that is based on the selection values of resources that reference the entity and that were referenced by search results in response to a query that included the search term.”
Interesting;
“Search engines score resources based on a variety of factors. Examples of such factors include an information retrieval score and an authority score. The information retrieval score is a measure of relevance of a query to resource content, and the authority score is a measure of importance of a resource relative to other resources.”
“Resources that tend to satisfy users’ informational needs are typically selected more frequently than resources that do not tend to satisfy users’ informational needs for certain queries. Search and selection data are stored by the search engine, and thus the search engine can determine, for certain queries, which resources tend to better satisfy users’ informational needs. Based on this information, the search scores can be adjusted so that better performing resources receive a scoring “boost.””
“…a search term-entity selection value that is based on the selection values of resources that reference the entity and that were referenced by search results in response to a query that included the search term; and storing, by the data processing apparatus, the search term-entity selection values in a data storage.”
“Resources that do not have sufficient search and selection data from which a reliable performance inference can be drawn can still be boosted or demoted with a high degree of accuracy based on the search term-entity selection values. This also increases coverage of behavioral based scoring adjustments to resources for which very little behavioral data is available.”
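Here’s a rough sketch of how those search term-entity selection values might be aggregated and then used to boost a resource that has almost no behavioral data of its own. Again, my own toy interpretation; the data shapes, weights, and example numbers are all made up.

```python
from collections import defaultdict

# Hypothetical inputs, roughly mirroring the patent's "resource data" and
# "search term data" (all names and numbers are illustrative).
resource_entities = {
    "site-a.com/page1": {"espresso machines"},
    "site-b.com/review": {"espresso machines", "coffee grinders"},
    "site-c.com/new":    {"espresso machines"},   # little behavioral data of its own
}

# Selection value of each resource for a given search term (e.g. click share).
selection_values = {
    "best espresso maker": {"site-a.com/page1": 0.6, "site-b.com/review": 0.3},
}

def term_entity_selection_values(resource_entities, selection_values):
    """Aggregate resource selection values up to (search term, entity) pairs."""
    agg = defaultdict(float)
    for term, per_resource in selection_values.items():
        for resource, value in per_resource.items():
            for entity in resource_entities.get(resource, ()):
                agg[(term, entity)] += value
    return agg

def entity_boost(term, resource, agg, resource_entities, weight=0.5):
    """Boost a resource by the selection values of the entities it references,
    which lets a new resource with no clicks of its own inherit a boost."""
    entities = resource_entities.get(resource, ())
    return 1.0 + weight * sum(agg.get((term, e), 0.0) for e in entities)

agg = term_entity_selection_values(resource_entities, selection_values)
print(entity_boost("best espresso maker", "site-c.com/new", agg, resource_entities))
```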
Question answering using entity references in unstructured data
Obviously, a search engine is, to a large degree, about answering questions. For this one I enjoyed not only seeing how they might approach this, but also the use of entities, something that’s become more and more a part of the system over the last few years. I also enjoy when they include some real-world examples; it makes it a bit easier for peeps to understand.
Abstract;
“Methods, systems, and computer-readable media are provided for collective reconciliation. In some implementations, a query is received, wherein the query is associated at least in part with a type of entity. One or more search results are generated based at least in part on the query. Previously generated data is retrieved associated with at least one search result of the one or more of search results, the data comprising one or more entity references in the at least one search result corresponding to the type of entity. The one or more entity references are ranked, and an entity result is selected from the one or more entity references based at least in part on the ranking. An answer to the query is provided based at least in part on the entity result.”
Interesting;
“In some implementations, a system receives a natural language query such as a “who” question. For example, “Who is the President?” or “Who was the first person to climb Mt. Everest?” In some implementations, the system retrieves a number of search results, for example, a list of references to webpages on the Internet. In some implementations, the system retrieves additional, preprocessed information associated each respective webpage of at least some of the search results. In some implementations, the additional information includes, for example, names of people that appear in the webpages. In an example, in order to answer a “who” question, the system compiles names appearing in the first ten search results, as identified in the additional information. The system identifies the most commonly appearing name as the answer, and returns that answer to the user. It will be understood that in some implementations, the system answers questions other than “who” questions using the above-described technique, such as “what” or “where” questions.”
“For example, the entity reference [George Washington] may have a higher topicality score on a history webpage than on a current news webpage. In another example, the entity reference [Barak Obama] may have a higher topicality score on a politics website than on a law school website.”
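The “who” question example is easy to sketch: compile the entity references found in the top results (weighted here by a topicality score) and return the best-supported name. This is an assumed interpretation of the technique, with made-up data.

```python
from collections import Counter

# Hypothetical preprocessed data: entity references (with a topicality score)
# found in each of the top search results for "Who was the first person to
# climb Mt. Everest?"
top_results = [
    {"url": "r1", "entities": {"Edmund Hillary": 0.9, "Tenzing Norgay": 0.85}},
    {"url": "r2", "entities": {"Edmund Hillary": 0.8}},
    {"url": "r3", "entities": {"George Mallory": 0.4, "Edmund Hillary": 0.7}},
]

def answer_who_question(results, top_n=10):
    """Tally entity references across the top N results, weighted by
    topicality, and return the best-supported entity as the answer."""
    tally = Counter()
    for result in results[:top_n]:
        for entity, topicality in result["entities"].items():
            tally[entity] += topicality
    entity, _score = tally.most_common(1)[0]
    return entity

print(answer_who_question(top_results))  # -> "Edmund Hillary"
```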
System and method for confirming authorship of documents
We’re all fairly in tune with the value of authorship as a form of citation when it comes to various known EAT concepts. We’re also familiar with entities. In this patent I enjoyed delving into multiple angles of how these types of signals can play into the Google search ecosystem.
Abstract;
“A system, computer-readable storage medium storing at least one program, and a computer-implemented method for confirming authorship of documents is presented. A document hosted on a website of a domain is accessed, where the document includes an authorship identifier asserting authorship of the document by an entity. Authorship of the document by the entity is conditionally confirmed when a profile for the entity is associated with the authorship identifier and when the profile for the entity indicates that the entity has confirmed that the authorship identifier is included in documents authored by the entity that are hosted on the first website of the first domain. Responsive to confirming authorship of the document by the entity, application of a confirmed authorship process to the document is permitted. Responsive to failing to confirm authorship of the document by the entity, application of the confirmed authorship process to the document is barred.”
Interesting;
“For example, an article may include a byline listing entities that authored the article. However, the authorship information may not be accurate or may not be valid. For example, an article may include authorship information asserting that a particular entity authored the article when in fact the article was not authored by the particular entity”
“In some embodiments, the authorship identifier includes an email link to an email address for a purported author of or contributor to the first document, where the email link includes a predefined authorship attribute (e.g., the authorship attribute 220). For example, the email link may be <a href="mailto:johndoe@example.com" rel="author">Email me</a>. In these embodiments, prior to conditionally confirming the authorship of the first document, the authorship confirmation module 618 requests that the entity confirm that the email address associated with the profile for the entity is the email address that the entity includes in documents authored by the entity”
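Here’s a small sketch of that confirmation step: find the rel="author" mailto link in a document and check it against a profile that has confirmed both the email and the domain. The profile shape is my own assumption; only Python’s standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class AuthorLinkFinder(HTMLParser):
    """Collect mailto: links carrying the rel="author" authorship attribute."""
    def __init__(self):
        super().__init__()
        self.author_emails = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("rel") == "author" and a.get("href", "").startswith("mailto:"):
            self.author_emails.append(a["href"][len("mailto:"):])

def authorship_confirmed(document_html, document_url, profile):
    """Conditionally confirm authorship: the email in the document must match
    the profile, and the profile must list the document's domain as one the
    entity has confirmed it publishes on (a made-up profile shape)."""
    finder = AuthorLinkFinder()
    finder.feed(document_html)
    domain = urlparse(document_url).netloc
    return any(email == profile["email"] and domain in profile["confirmed_domains"]
               for email in finder.author_emails)

profile = {"email": "johndoe@example.com", "confirmed_domains": {"example.com"}}
html = '<p>By John Doe. <a href="mailto:johndoe@example.com" rel="author">Email me</a></p>'
print(authorship_confirmed(html, "https://example.com/article", profile))  # True
```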
Interpreting user queries based on nearby locations
This is definitely a fun one. They discuss various situations, including an example of looking at a monument in a foreign land and asking “what is that?”. The system would use geo-location to better understand what you might be looking at. There are other implementations, but it’s great to see how far search has moved over the years. From the desktop, to the real world.
Abstract;
“Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a query provided from a user device, and determining that the query is implicitly about some entity, and in response: obtaining an approximate location of the user device when the user device provided the query, obtaining a set of entities including one or more entities, each entity in the set of entities being associated with the approximate location, and determining that the query is implicitly about an entity in the set of entities, and in response: providing a revised query based on the query and the entity, the revised query explicitly referencing the entity.”
Interesting;
“implementations of the present disclosure are directed to identifying a set of entities based on an approximate location of a user device that submits a query, and rewriting the query to explicitly reference an entity of the set of entities, which the query is determined to implicitly reference.”
“For example, the user can be standing near a monument and can submit the query [what is this monument], without having to first determine the name of the monument. In some examples, the user does not need to know how to properly pronounce and/or spell the name of the entity. For example, a user that does not speak German can be on vacation in Zurich, Switzerland and can submit the query [opening hours], while standing near a restaurant called “Zeughauskeller,” which may be difficult to pronounce and/or spell for the user.”
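A minimal sketch of the query rewrite idea: take the device’s approximate location, pull nearby entities of the type the query implies, and rewrite the query to name the nearest one explicitly. The entity index, trigger patterns, and distance math are stand-ins; a real system would obviously use a proper places index.

```python
import math

# Hypothetical entity index keyed by coordinates — a stand-in for a real
# geo lookup service.
NEARBY_ENTITIES = [
    {"name": "Zeughauskeller", "type": "restaurant", "lat": 47.3699, "lon": 8.5391},
    {"name": "Grossmünster",   "type": "monument",   "lat": 47.3702, "lon": 8.5441},
]

IMPLICIT_PATTERNS = {           # crude trigger -> entity type mapping (assumed)
    "opening hours": "restaurant",
    "what is this monument": "monument",
}

def distance_m(lat1, lon1, lat2, lon2):
    """Rough planar distance in metres, fine at city scale."""
    dy = (lat2 - lat1) * 111_000
    dx = (lon2 - lon1) * 111_000 * math.cos(math.radians(lat1))
    return math.hypot(dx, dy)

def rewrite_query(query, device_lat, device_lon, radius_m=150):
    """If the query is implicitly about a nearby entity, return a revised
    query that references the entity explicitly; otherwise return it as-is."""
    wanted_type = IMPLICIT_PATTERNS.get(query.lower())
    if not wanted_type:
        return query
    candidates = [e for e in NEARBY_ENTITIES
                  if e["type"] == wanted_type
                  and distance_m(device_lat, device_lon, e["lat"], e["lon"]) <= radius_m]
    if not candidates:
        return query
    nearest = min(candidates,
                  key=lambda e: distance_m(device_lat, device_lon, e["lat"], e["lon"]))
    return f'{nearest["name"]} {query}'

print(rewrite_query("opening hours", 47.3698, 8.5392))  # -> "Zeughauskeller opening hours"
```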
Systems and methods for modifying search results based on a user’s history
While most who know me understand my stance on implicit user feedback (as a direct ranking mechanism), I do actually like seeing some of it used in an instance such as this. From the old days of personalization, to patents such as this one that discuss past sessions, it’s certainly something we’ve seen a lot of over the years. A worthy addition that brings some diversity to the list.
Abstract;
“A user’s prior searching and browsing activities are recorded for subsequent use. A user may examine the user’s prior searching and browsing activities in a number of different ways, including indications of the user’s prior activities related to advertisements. A set of search results may be modified in accordance with the user’s historical activities. The user’s activities may be examined to identify a set of preferred locations. The user’s set of activities may be shared with one or more other users. The set of preferred locations presented to the user may be enhanced to include the preferred locations of one or more other users. A user’s browsing activities may be monitored from one or more different client devices or client application. A user’s browsing volume may be graphically displayed.”
Interesting;
“Over time, a user will have executed a history of search queries, results which were examined, advertisements that were clicked on, and other various browsing activities which reflect the user’s preferences and interests. Oftentimes a user may be interested in examining the user’s such prior activities. It would be desirable to permit the user to use the prior activities to enhance the user’s searching and browsing experience.”
“At least one of the search results is identified as having been returned to a previous search requester in response to a previous search query. The associated search result ranking value of the identified search result is modified and the obtained search results are ordered in accordance with the modified search result ranking value.”
“In some embodiments, the activities may be one or more of various types of user activity, including, but not limited to) submitting search queries to a search engine, selecting (e.g., by clicking on) results returned from the search engine, selecting various advertisements returned with the results from the search engine, selecting other informational items presented on a search results page, browsing various web pages or locations, clicking through on advertisements on the browsed pages, reviewing product reviews and other user browsing activities monitored via a number of different ways, or other activities associated with various client applications such as (but not limited to) instant messaging, chat rooms participation, email management, document creation and editing, or any generalized file activity (such activities collectively referred to as “prior activities”). According to some embodiments, the collected history is used to create one or more derived pieces of information.”
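Here’s a toy version of the “previously returned/selected result gets its ranking value modified” idea: boost results the history says were visited before, then re-order. The boost size and cap are made-up numbers, not anything from the patent.

```python
def rerank_with_history(results, prior_visits, boost=0.25):
    """Adjust each result's ranking value when it appears in the recorded
    history, then re-order by the modified score.

    results      -- list of (url, base_score) pairs from the search engine
    prior_visits -- dict mapping url -> number of past visits/selections
    """
    adjusted = []
    for url, base_score in results:
        visits = prior_visits.get(url, 0)
        modified = base_score * (1 + boost * min(visits, 4))  # cap the boost
        adjusted.append((url, modified))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

results = [("a.com/guide", 0.82), ("b.com/intro", 0.80), ("c.com/ref", 0.78)]
prior_visits = {"b.com/intro": 3}
for url, score in rerank_with_history(results, prior_visits):
    print(f"{score:.2f}  {url}")
```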
Predictive information retrieval
Predictive stuff has been on the rise as far as the number of awards I’ve seen over the last few years. Beyond that, I just find this kinda stuff quite interesting: how Google tries to think ahead of the user and where the session is going. This type of system is also important for Google’s conversational search ambitions.
Abstract;
“A computer-implemented method for generating results for a client-requested query involves receiving a query produced by a client communication device, generating a result for the query in response to reception of the query, determining one or more predictive follow-up requests before receiving an actual follow-up request from the client device, and initiating retrieval of information associated with the one or more predictive follow-up requests, and transmitting at least part of the result to the client device, and then transmitting to the client device at least part of the information associated with the one or more predictive follow-up requests.”
Interesting;
“…to provide predictive information so that the information can be downloaded to a user’s device before the user makes an explicit request for the information. As a result, the user will not need to wait for the download to occur once they make an explicit request for the information.”
“In another embodiment, advertising matches for the predictive follow-up requests may be identified before an actual follow-up request from the client device is received.”
“the predictive follow-up requests may comprise information at locations associated with the hyperlink information, and the predictive follow-up requests may comprise requests for a plurality of search types, such as web search, news search, shopping search, image search, blog search, or news group search. In addition, an advertising selector may be provided to identify advertising content associated with the predictive follow-up requests before a follow-up request is made.”
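A quick sketch of the prefetch idea: answer the query, predict a couple of likely follow-up requests, and start retrieving them in the background so they’re ready if asked. The follow-up table and helper functions are placeholders of my own, not anything the patent specifies.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "what users ask next" table — a stand-in for whatever model might
# actually predict follow-up requests.
FOLLOW_UPS = {
    "flights to zurich": ["zurich hotels", "zurich weather", "zurich airport train"],
}

def run_search(query):
    """Placeholder for the real retrieval call."""
    return f"results for [{query}]"

def search_with_prefetch(query, max_follow_ups=2):
    """Answer the query, then kick off retrieval for the predicted follow-up
    requests so their results are (probably) ready before they're asked."""
    result = run_search(query)
    predicted = FOLLOW_UPS.get(query, [])[:max_follow_ups]
    with ThreadPoolExecutor() as pool:
        prefetched = {q: pool.submit(run_search, q) for q in predicted}
    # Results for the original query go back immediately; the prefetched
    # results are held until (if) the user actually issues the follow-up.
    cache = {q: f.result() for q, f in prefetched.items()}
    return result, cache

result, cache = search_with_prefetch("flights to zurich")
print(result)
print(list(cache))
```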
Systems and methods for improving the ranking of news articles
This patent was a bit of a personal one for me. I’ve done a fair bit of work with some larger media websites, and this re-release was a must-read when it popped up on my radar. To be honest I haven’t had the time to look at the original to see what changed; I’ll do that in the new year. Anyway, for anyone using Google News (or even Discover), this is a good read.
Abstract;
“A system ranks results. The system may receive a list of links. The system may identify a source with which each of the links is associated and rank the list of links based at least in part on a quality of the identified sources. “
Interesting;
“one or more of a number of articles produced by the first source during a first time period, an average length of an article produced by the first source, an amount of important coverage that the first source produces in a second time period, a breaking news score, network traffic to the first source, a human opinion of the first source, circulation statistics of the first source, a size of a staff associated with the first source, a number of bureaus associated with the first source, a breadth of coverage by the first source, a number of different countries from which traffic to the first source originates, and a writing style used by the first source.”
“While each of the hits in the ranked list may relate to the desired topic, the news sources associated with these hits, however, may not be of uniform quality. For example, CNN and BBC are widely regarded as high quality sources of accuracy of reporting, professionalism in writing, etc., while local news sources, such as hometown news sources, may be of lower quality. Therefore, there exists a need for systems and methods for improving the ranking of news articles based on the quality of the news source with which the articles are associated.”
“The method includes receiving a list of links, identifying, for each of the links, a source with which the link is associated, and ranking the list of links based at least in part on a quality of the identified sources.”
“a method for determining a quality of a news source is provided. The method may include determining one or more metric values for the news source based at least in part on at least one of a number of articles produced by the news source during a first time period, an average length of an article produced by the news source, an amount of important coverage that the news source produces in a second time period, a breaking news score, an amount of network traffic to the news source, a human opinion of the news source, circulation statistics of the news source, a size of a staff associated with the news source, a number of bureaus associated with the news source, a number of original named entities in a group of articles associated with the news source, a breadth of coverage by the news source, a number of different countries from which network traffic to the news source originates, and the writing style used by the news source. The method may further include calculating a quality value for the news source based at least in part on the determined one or more metric values.”
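The patent lists the source metrics but not how they’re combined, so here’s one assumed way to fold them into a quality value and blend that with relevance when ranking links. All weights and example numbers are purely illustrative.

```python
# Illustrative metric weights — the patent names the metrics but not how
# they're combined, so these numbers are pure assumption.
WEIGHTS = {
    "articles_per_week": 0.2,
    "avg_article_length": 0.1,
    "breaking_news_score": 0.2,
    "network_traffic": 0.2,
    "human_opinion": 0.2,
    "breadth_of_coverage": 0.1,
}

def source_quality(metrics):
    """Weighted combination of (already normalised, 0..1) source metrics."""
    return sum(WEIGHTS[name] * metrics.get(name, 0.0) for name in WEIGHTS)

def rank_links(links, source_metrics, relevance_weight=0.6):
    """Blend topical relevance with the quality of the source behind each link."""
    def score(link):
        quality = source_quality(source_metrics.get(link["source"], {}))
        return relevance_weight * link["relevance"] + (1 - relevance_weight) * quality
    return sorted(links, key=score, reverse=True)

source_metrics = {
    "bbc.com":       {"articles_per_week": 0.9, "breaking_news_score": 0.9,
                      "network_traffic": 0.95, "human_opinion": 0.9},
    "hometown.news": {"articles_per_week": 0.2, "breaking_news_score": 0.1,
                      "network_traffic": 0.05, "human_opinion": 0.5},
}
links = [
    {"url": "hometown.news/story", "source": "hometown.news", "relevance": 0.85},
    {"url": "bbc.com/story",       "source": "bbc.com",       "relevance": 0.80},
]
for link in rank_links(links, source_metrics):
    print(link["url"])
```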
Query suggestions based on entity collections of one or more past queries
Over the years there have been more than a few folks who have tried to reverse-engineer Google Suggest to figure out how the larger set of algorithms is doing things. While I ain’t a fan of that approach, I do find predictive/suggest systems interesting. Add in some entities and it’s one that was a must-read for me this year.
Abstract;
“Methods and apparatus for providing query suggestions to a user based on one or more past queries submitted by the user. Candidate query suggestions responsive to a current query may be identified. A candidate query similarity measure may be determined for a given candidate query suggestion based on matching entities related to the given candidate query suggestion and the one or more past queries. In some implementations, the similarity measure of the given candidate query suggestion may be based on a comparison of current entities of the given candidate query suggestion that match entities of one or more past queries, to a group of the current entities that includes entities that do not match the entities of one or more past queries. In some implementations a ranking of the candidate query suggestions may be determined based on the similarity measure.”
Interesting;
“When entering a query, a user may desire to have one or more suggestions provided that are based on the entered query. For example, a user may enter a partial query and desire query suggestions to be provided that are based on the partial query.”
“For example, a user may have issued a past query of “perseus”. The user may be interested in learning more about Greek mythology and may start typing a current query such as “an” after issuing the past query of “persues”. “andromeda” may not by default be provided as a query suggestion in response to the partial query “an” due to, for example, a low ranking associated with the query suggestion “andromeda” based on traditional or other query suggestion ranking techniques. Based on techniques described herein, one or more entity collections associated with the query suggestion “andromeda” may be identified as past entity collections that are also associated with the past query “perseus” such as, for example, the entity collections of “art subjects”, “fictional characters”, “film characters”, “constellations”, etc. Based on identification of the entity collections that are associated with the query suggestion “andromeda” and that are also associated with the past query “perseus”, the query suggestion “andromeda” may be promoted as a query suggestion for the partial query “an”. For example, the ranking associated with the query suggestion “andromeda” may be promoted so that “andromeda” is more likely to be provided as a query suggestion.”
“The method may further include: identifying rankings of each of the current entity collections for the given candidate query suggestion; wherein the comparison of the current entity collections that match the past entity collections to the current entity collections of the group includes: comparing the rankings of the current entity collections that match the past entity collections to the rankings of the current entity collections of the group. Comparing the rankings of the current entity collections that match the past entity collections to the rankings of the current entity collections of the group may include: comparing the sum of the rankings of the current entity collections that match the past entity collections to the sum of the rankings of the entity collections of the group.”
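Here’s a sketch of the “perseus”/“andromeda” promotion using entity-collection overlap: a candidate suggestion gets a bump when its collections match the collections of a past query. The collection data and the overlap weighting are my own hand-made stand-ins.

```python
# Hand-made entity-collection data for the patent's own example.
COLLECTIONS = {
    "perseus":   {"art subjects", "fictional characters", "constellations"},
    "andromeda": {"art subjects", "fictional characters", "film characters", "constellations"},
    "android":   {"operating systems", "software"},
}

def rerank_suggestions(candidates, past_queries, overlap_weight=0.3):
    """Promote candidate suggestions whose entity collections overlap with the
    collections of the user's past queries.

    candidates -- list of (suggestion, base_score) from the usual suggest ranking
    """
    past_collections = set()
    for q in past_queries:
        past_collections |= COLLECTIONS.get(q, set())

    def score(item):
        suggestion, base = item
        current = COLLECTIONS.get(suggestion, set())
        if not current:
            return base
        overlap = len(current & past_collections) / len(current)
        return base + overlap_weight * overlap

    return sorted(candidates, key=score, reverse=True)

# Partial query "an": "android" normally outranks "andromeda", but the past
# query "perseus" tips the balance.
candidates = [("android", 0.90), ("andromeda", 0.75)]
print(rerank_suggestions(candidates, past_queries=["perseus"]))
```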
Deriving and using interaction profiles
What can I say… maybe, I told you so? Seriously though, I’ve long debated the use of implicit user feedback within Google (CTR, dwell time, scrolling, etc.). To me it’s more about what I’ve been told by Googlers over the years: it’s a tool for grading algos and search quality. This patent delves into that very topic. Get it into ya.
Abstract;
“Systems and methods for deriving and using an interaction profile are described. In one described method, a plurality of metrics indicating a level of satisfaction for search results is determined. The metrics comprise at least one of click-duration data, multiple-click data, and query-refinement data. The values of the metrics for a plurality of instances of an object, such as search results from a search engine, are determined. An interaction profile for the object, based at least in part on the values of the metrics for a plurality of instances of the first object, is then determined. This interaction profile may be used in a variety of ways, such as determining the quality of ranking algorithms and detecting undesirable search results.”
Interesting;
“…a first variation of a scoring algorithm for scoring electronic documents responsive to search queries in the class with a second variation of the scoring algorithm for scoring the electronic documents responsive to search queries in the class,”
“The present invention relates generally to systems and methods for data analysis. The present invention relates particularly to systems and methods for deriving and using an interaction profile, such as a click profile.”
“When a user performs a search on a commercial search engine and then clicks on the results, the commercial search engine may gather information about which results were presented to the user and about the particular results the user clicked. The commercial search engine operators may then use this information to evaluate the quality of the search, to improve the search, and to perform machine learning to improve the quality of the search results.”
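A small sketch of what an interaction profile built from click-duration, multiple-click, and query-refinement data might look like, and how it could be used to grade two variations of a scoring algorithm. The satisfaction heuristic at the end is one possible interpretation, not the patent’s.

```python
from statistics import mean

def interaction_profile(events):
    """Summarise logged interactions for one variation of a scoring algorithm.

    events -- list of dicts with 'click_duration' (seconds), 'clicks' (count
              per query), and 'refined' (did the user reformulate the query?)
    """
    return {
        "avg_click_duration": mean(e["click_duration"] for e in events),
        "avg_clicks_per_query": mean(e["clicks"] for e in events),
        "refinement_rate": mean(1.0 if e["refined"] else 0.0 for e in events),
    }

def better_variation(profile_a, profile_b):
    """Crude comparison: longer clicks, fewer pogo-style multi-clicks, and
    fewer refinements are read here as higher satisfaction (an assumption)."""
    score = lambda p: (p["avg_click_duration"]
                       - 10 * p["avg_clicks_per_query"]
                       - 60 * p["refinement_rate"])
    return "A" if score(profile_a) >= score(profile_b) else "B"

variation_a = [{"click_duration": 95, "clicks": 1, "refined": False},
               {"click_duration": 40, "clicks": 2, "refined": True}]
variation_b = [{"click_duration": 12, "clicks": 3, "refined": True},
               {"click_duration": 25, "clicks": 2, "refined": True}]

print(interaction_profile(variation_a))
print("better variation:", better_variation(interaction_profile(variation_a),
                                             interaction_profile(variation_b)))
```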
BONUS ROUND
Again we’re seeing some personalization, with predictive elements based on past sessions. Add to that some categorization, and it further cements some concepts that a lot of SEOs might not be aware of. It’s amazing when folks talk about “testing” when they don’t really get that these kinds of systems can REALLY mess with the results. LOL.
Abstract;
“A system and method are disclosed for categorizing search terms. The system accesses search history for the search terms. The system also categorizes each of the search terms based on the number of times that the respective search term appears in the search history. If the number of times the search term appears in the search history exceeds a first threshold, a search result of the search term is determined and the search term is categorized as a type that is provided for registration to a user with recognized association with the search term or a type that is excluded from registration, where the categorizing based on a ratio of a number of times the search result was selected subsequent to receiving the search term to the number of times that the search term appears in the search history.”
Interesting;
“The method further includes categorizing each of the one or more search terms based on the determined number of times the respective search term appears in the search history. For each of the one or more search terms, the categorizing includes in a case where the number of times that the search term appears in the search history exceeds a first threshold value, identifying, from the search history, a search result of the search term, and categorizing, the search term as a type that is provided for registration to a user with recognized association with the search term or a type that is excluded from registration, wherein the categorizing is based on a ratio of a number of times the search result was selected subsequent to receiving the search term to the number of times that the search term appears in the search history.”
“The search term may be categorized as a type that is provided for registration to a user with recognized association with the search term if the ratio of the number of times the search result was selected subsequent to receiving the search term to the number of times that the search term appears in the search history exceeds a third threshold value. The search term may be categorized as a type that is excluded from registration if the ratio of the number of times the search result was selected subsequent to receiving the search term to the number of times that the search term appears in the search history does not exceed the third threshold value. The search result may include a character string for a uniform resource locator, where the character string includes the search term.”
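And one last sketch for the bonus patent: a term has to clear a frequency threshold, and then the ratio of result selections to appearances decides whether it’s offered for registration or excluded. The thresholds and counts are invented for the example.

```python
def categorize_term(term, search_history, selections,
                    freq_threshold=100, ratio_threshold=0.3):
    """Categorise a search term along the lines the claim describes.

    search_history -- dict: term -> number of times it appears in the history
    selections     -- dict: term -> times its search result was selected after
                      the term was received
    """
    appearances = search_history.get(term, 0)
    if appearances <= freq_threshold:
        return "not categorised (too rare)"
    ratio = selections.get(term, 0) / appearances
    if ratio > ratio_threshold:
        return "offered for registration"
    return "excluded from registration"

search_history = {"acme widgets": 5000, "blue widgets": 120}
selections     = {"acme widgets": 3200, "blue widgets": 10}

for term in search_history:
    print(term, "->", categorize_term(term, search_history, selections))
```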