Find Similar Advanced Keyword Analysis Tool (Multiple Documents)
The Find Similar Advanced keyword analysis tools differ from the Keyword Analyzer in several ways:
- Find Similar Advanced uses a powerful algorithm called TF/IDF (Term Frequency/Inverse Document Frequency) to find "important" terms in the source document.
- Find Similar Advanced defaults to the Boolean OR operator, whereas the Keyword Tool defaults to the Boolean AND.
- Terms are weighted when using Find Similar Advanced keyword analysis tools.
An "important" term is defined as a term that appears in the patent with a high frequency but does not appear very often in the entire patent corpus. As a result, TF/IDF doesn't need stop terms. Terms like the, and, but, with, etc. appear so often in the corpus that they are never identified as important terms. Even patentese terms such as method, apparatus, or system appear so frequently in patent data that the algorithm never deems them important.
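To make the idea of an "important" term concrete, here is a minimal sketch of a TF/IDF score in Python. The tool's exact formula is not documented here, so the smoothed logarithm, the function name tf_idf, and the three-document toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tf_idf(source_tokens, corpus):
    """Score each term in the source document: term frequency times
    inverse document frequency (illustrative formula, not the tool's)."""
    n_docs = len(corpus)
    tf = Counter(source_tokens)
    doc_sets = [set(doc) for doc in corpus]
    scores = {}
    for term, count in tf.items():
        # Number of documents in the corpus that contain the term.
        df = sum(1 for doc in doc_sets if term in doc)
        # Smoothed IDF: a term found in every document scores log(1) = 0.
        idf = math.log((1 + n_docs) / (1 + df))
        scores[term] = (count / len(source_tokens)) * idf
    return scores

# Hypothetical toy corpus; real patent text would be far larger.
corpus = [
    "the battery tray and the method".split(),
    "the method and the apparatus".split(),
    "the system and the method".split(),
]
source = corpus[0]
scores = tf_idf(source, corpus)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, "the," "and," and "method" appear in every document, so they score zero without any stop list, while "battery" and "tray" rise to the top.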
Because of the weighting, and the nature of the Boolean OR operator, this analysis tool does a much better job of creating inclusive lists. This does mean that the lists can be very large. However, the most relevant patents will be displayed at the top of the search results list.
Term weighting boosts a term's relevance in the search results and increases the overall precision of your query. Patents with higher weighted terms appear higher in the search results. Weighting terms does NOT impact the recall, which means the same number of documents will be returned by the search engine no matter what weighting you use. But the order in which they appear will be affected.
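The effect of weighting on order, but not on recall, can be seen in a small toy model. The sketch below is an assumption for illustration only: it scores a document as the sum of the weights of the OR'd terms it contains, which is much simpler than a real search engine's scoring. The search function and the four sample documents are hypothetical.

```python
def search(docs, weights):
    """Return IDs of documents containing any query term (Boolean OR),
    ranked by the sum of the weights of the terms they match."""
    hits = []
    for doc_id, text in docs.items():
        tokens = set(text.split())
        matched = [t for t in weights if t in tokens]
        if matched:
            hits.append((sum(weights[t] for t in matched), doc_id))
    # Highest score first; ties broken by document ID for a stable order.
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits]

# Hypothetical documents.
docs = {
    "A": "battery tray for electric vehicle",
    "B": "sliding chassis guide",
    "C": "battery charging station",
    "D": "coffee grinder",
}
flat = search(docs, {"battery": 1.0, "chassis": 1.0})
boosted = search(docs, {"battery": 1.0, "chassis": 5.0})
```

Both searches return the same three documents, but boosting "chassis" moves document B to the top of the list.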
The Find Similar Advanced Keyword Analysis tool for multiple documents operates in much the same way as the single document version. The main differences are where you find the tool and that you can analyze up to 10 documents at a time, as opposed to just a single document.
To access the Find Similar analysis tools for multiple documents, first run a search. Then, from the Search Results, select up to 10 documents and go to Selected-->Find Similar.
From here, you can choose either Find Similar (Limit 10), which runs the search automatically, or Find Similar Advanced (Limit 10), which allows you to modify the search in the same way as the single patent version.
Note that the document numbers are all included in the bar at the top of the Find Similar Advanced window. The rest of the instructions are exactly the same as the instructions for the Single Patent version.
The Find Similar Advanced window will ultimately help you create a sophisticated query that looks something like this:
(TAC:(battery OR tray^0.77 OR conversion^0.76 OR electric^0.76 OR vehicle^0.74 OR motor^0.74 OR batteries^0.74 OR slider^0.71 OR chassis^0.71 OR enclosures^0.61 OR floating^0.61 OR charging^0.6 OR 25.1^0.6 OR cabinet^0.59 OR swapping^0.59 OR track^0.58 OR 102139622^0.58 OR cabin^0.58 OR linear^0.57 OR double^0.57 OR sliding^0.56 OR pack^0.56 OR synchronous^0.56 OR guide^0.55 OR exchange^0.55 OR compartment^0.55 OR moving^0.54 OR station^0.54 OR socket^0.54 OR frontal^0.54 OR securely^0.53 OR belt^0.53 OR drive^0.53 OR laterally^0.53 OR subframe^0.53)) AND APD:([NOW-10YEARS TO NOW])
This query could take an hour or more to build manually, but you can do it here in seconds!
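To show how a query of this shape is put together, here is a minimal Python sketch that joins term and weight pairs into the same syntax as the example above. The build_query function and its parameters are hypothetical; this is not the tool's own code, only an illustration of the string format.

```python
def build_query(terms, field="TAC", date_range="NOW-10YEARS TO NOW"):
    """Join (term, weight) pairs with OR inside one field, then add a date filter.
    A weight of None leaves the term without a ^boost."""
    parts = []
    for term, weight in terms:
        parts.append(term if weight is None else f"{term}^{weight}")
    return f"({field}:({' OR '.join(parts)})) AND APD:([{date_range}])"

# The first three terms from the example query above.
terms = [("battery", None), ("tray", 0.77), ("conversion", 0.76)]
query = build_query(terms)
print(query)
```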
The Find Similar Advanced window shown here reflects the same query, but in a form that is much easier for you to interpret and modify.
The default window has the following presets:
- The Boolean OR operator is used by default in the Operator column (1).
- Keywords are listed in the Term column (2) in descending order of importance.
- "Importance," which becomes the weighting or "term boost," is listed in the Weight column (3).
- The Source is, by default, TAC (Title, Abstract, and Claims), but you can change it. Here I changed it to the Specification.
The query that you generate is completely modifiable:
- To delete a term, click the trash icon in the rightmost column.
- To require a term (i.e., switch to a Boolean AND), double-click the word "Favored (OR)" and switch it to the "Required (AND)" operator choice.
- To exclude a term (i.e., switch to a Boolean NOT), double-click the word "Favored (OR)" and switch it to the "Excluded (NOT)" operator choice.
- To add synonyms, double-click the term and enter synonyms for the term displayed, joined by another Boolean OR operator. You can join them with a Boolean AND instead, but then both terms are required.
- To add terms not found in the patent, click Add Term in the toolbar. You can then add a term and give it an operator and a weight.
The original suggested query was manually modified in the figure above. Each modification causes a small red triangle to appear in the cell to let you know it was modified.
The query generated in the example above will look like this:
(SPEC:atomizer^100 AND SPEC:((atomization AND bottle)^97 OR nicotine^97 OR cigarette^95 OR mm-1.3^94 OR smoking^92 OR mouthpiece^92 OR bottle^90 OR ripple^89 OR reed^89 OR smoker^88 OR colpitts^87 OR separator^86 OR piezoelectric^86 OR supplying^85 OR ejection^83 OR cavity^83 OR postponed^81 OR vapor^80 OR substance^80 OR smoker's^79 OR shell^79 OR aerosol^79 OR substitutes^77 OR sensor^77 OR microswitch^76 OR foam^75 OR exhilarant^74 OR liquid^72 OR inebriety^72 OR stream^72 OR porous^71 OR magnetic^71 OR 0.1-3.1^70)) AND APD:([NOW-20YEARS TO NOW])
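The structure of this modified query can be sketched in a few lines of Python. The sketch assumes each row of the window is an (operator, term, weight) triple and mirrors the format shown above: Required (AND) terms become their own clauses, and Favored (OR) terms share one OR group. The function name build_modified_query is hypothetical, and Excluded (NOT) rows are left out because the example above does not show how they are rendered.

```python
def build_modified_query(rows, field="SPEC", date_range="NOW-20YEARS TO NOW"):
    """Assemble a query from (operator, term, weight) rows.
    Handles only "AND" (required) and "OR" (favored) rows."""
    required = [f"{field}:{term}^{weight}" for op, term, weight in rows if op == "AND"]
    favored = [f"{term}^{weight}" for op, term, weight in rows if op == "OR"]
    clauses = list(required)
    if favored:
        # All favored terms share a single OR group in the chosen field.
        clauses.append(f"{field}:({' OR '.join(favored)})")
    return f"({' AND '.join(clauses)}) AND APD:([{date_range}])"

# The first few rows of the example above.
rows = [
    ("AND", "atomizer", 100),
    ("OR", "(atomization AND bottle)", 97),
    ("OR", "nicotine", 97),
    ("OR", "cigarette", 95),
]
query = build_modified_query(rows)
print(query)
```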
Once you have the query the way you want it, simply click Search.
Note that you can also generate a matrix directly from this window, as opposed to clicking Search and then selecting Analyze-->Concept Landscape Matrix from the Search Results.
The search results above clearly demonstrate the power of keyword-rich, weighted, and broad OR'd queries. But it is important to notice a few key options here.
First, by default, the Class Fingerprint option is on. This option takes the CPC and IPC classes from the original patent and adds them to the search. This narrows the search to what we call the Class Neighborhood. Basically, you are allowing the examiners to help you focus your search on relevant patents.
Also notice that the Max No. Documents is set to 200 by default. You can use this dropdown to limit the number returned even further, or to see every single document that might match your search. How many you want to view and work with is up to you.
Although not shown above, you can also scroll down in the left-hand panel and use the facets, just as you would in any other set of Search Results. That way, you can reduce the set further to just a particular assignee or some other facet that you want to see.