This article explains how the search algorithm works behind the scenes in both the Data Portal plugin and the QueryApp API.
Please note that we regularly adjust and tweak this algorithm. When a keyword search is used and the results are sorted by “best”, we score the results so that the most relevant appear first. By default, every resource starts with a score of 50. When a matching search (taxonomy or other fields) is used without search terms, no scoring takes place and every record is given a fixed score of 250.
Words shorter than 3 characters are ignored, as are some very commonly used words such as “service”, “program”, “resource”, etc. The score is increased by a set amount if any of the following are true (listed from the largest increase to the smallest):
- The search words appear in the public name
- The search words appear in the alternative name
- The search words appear in the taxonomy terms
- The search words appear in the description
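The field-by-field scoring above can be sketched roughly as follows. This is only an illustration: the article gives the order of the increases, not their size, so the boost amounts, the field names, and the choice to add the boosts together are all assumptions. The stop-word set contains only the three examples named above.

```python
# Illustrative sketch only; not the actual implementation.
BASE_SCORE = 50  # default score stated in the article

# Hypothetical boost amounts; only their ordering comes from the article.
FIELD_BOOSTS = [
    ("public_name", 40),
    ("alternative_name", 30),
    ("taxonomy_terms", 20),
    ("description", 10),
]

# Only the examples named in the article; the real list is longer.
STOP_WORDS = {"service", "program", "resource"}


def usable_words(query):
    """Drop words shorter than 3 characters and very common words."""
    return [
        word
        for word in query.lower().split()
        if len(word) >= 3 and word not in STOP_WORDS
    ]


def score_record(record, query):
    """Start at the base score and add a boost for each field a word appears in."""
    score = BASE_SCORE
    for word in usable_words(query):
        for field, boost in FIELD_BOOSTS:
            if word in record.get(field, "").lower():
                score += boost
    return score


record = {"public_name": "Food Bank", "description": "Free food for families"}
print(score_record(record, "food service"))  # "service" is ignored
```

With these placeholder amounts, the word “food” matches the public name and the description, so the record scores 50 + 40 + 10.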
If a phrase or multiple words are used to search (instead of a single word), the above scoring takes place for each word. Additionally, if all or part of the phrase matches, the score is increased. The algorithm also looks at how many of the words were matched overall and increases or decreases the score accordingly. In theory, when you search for a multi-word agency name, the record for that agency receives a very high score of 300 or more, while poor matches fall below 50. Records that are farther away receive a slight decrease in their score.
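As a rough sketch of the multi-word adjustments described above, the function below applies a phrase bonus, a bonus or penalty based on how many words matched, and a small distance penalty. Every constant, the function name, and the idea of measuring distance in kilometers are assumptions made for illustration, so the totals will not match real scores.

```python
# Illustrative sketch only; all amounts below are hypothetical placeholders.
FULL_PHRASE_BONUS = 100    # the entire phrase appears in the text
PARTIAL_PHRASE_BONUS = 40  # two consecutive search words appear together
ALL_WORDS_BONUS = 30       # every search word matched
FEW_WORDS_PENALTY = 20     # fewer than half of the search words matched
MAX_DISTANCE_PENALTY = 10  # cap on the "slight decrease" for distant records


def adjust_for_phrase(score, words, text, distance_km=0):
    """Adjust a per-word score for phrase matches, word coverage, and distance."""
    text = text.lower()
    words = [word.lower() for word in words]

    if len(words) >= 2:
        # Phrase bonus: whole phrase first, otherwise any adjacent pair of words.
        if " ".join(words) in text:
            score += FULL_PHRASE_BONUS
        elif any(" ".join(pair) in text for pair in zip(words, words[1:])):
            score += PARTIAL_PHRASE_BONUS

        # Coverage: reward matching every word, penalize matching under half.
        matched = sum(1 for word in words if word in text)
        if matched == len(words):
            score += ALL_WORDS_BONUS
        elif matched * 2 < len(words):
            score -= FEW_WORDS_PENALTY

    # Distance: one point per full 10 km, up to the cap (assumed scale).
    score -= min(MAX_DISTANCE_PENALTY, int(distance_km // 10))
    return score


query = ["northside", "family", "shelter"]
print(adjust_for_phrase(50, query, "Northside Family Shelter"))
print(adjust_for_phrase(50, query, "Family counseling"))
```

In this sketch an exact agency-name match climbs well above the starting score, while a record that matches only one of three words drops below 50, which mirrors the behavior the paragraph describes.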
The Data Portal plugin supports a “Low Score Filter” (if advanced options are enabled), which can be used to remove poorly matching resources from the results. If it is used, the setting should never be 50 or greater.
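Conceptually, a low score filter behaves like the sketch below. The function name, the shape of the result records, and the rule that a record exactly at the threshold is kept are assumptions; the only detail taken from the article is that the threshold should stay below 50.

```python
# Illustrative sketch only; the plugin's actual filter may differ.
def apply_low_score_filter(results, threshold):
    """Keep only results whose score is at or above the threshold."""
    if threshold >= 50:
        # Follows the article's guidance that the setting stay below 50.
        raise ValueError("Low Score Filter should be set below 50")
    return [result for result in results if result["score"] >= threshold]


results = [
    {"name": "Northside Family Shelter", "score": 180},
    {"name": "Family counseling", "score": 30},
    {"name": "Community pantry", "score": 50},
]
for result in apply_low_score_filter(results, threshold=40):
    print(result["name"], result["score"])
```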
The API returns the internal score in its response. If you are curious to see the scores when using the Data Portal plugin, add &score=true to the search results URL.
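If you would rather build the URL in a script than edit it by hand, a small helper such as the one below can append the parameter. The helper name and the example.org address are placeholders; only the score=true parameter comes from this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_score_param(url):
    """Return the URL with score=true added to its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier "score" value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "score"
    ]
    params.append(("score", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder address; use your own search results URL.
print(add_score_param("https://example.org/search-results/?keyword=food"))
```

The helper uses ? when the URL has no query string yet and & when it already has one, so the result stays valid either way.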