Search API Solr: How to index fulltext fields with and without stemming?

January 1

In short: Is there any way to index both the stemmed and non-stemmed versions of Fulltext fields in Drupal Search API Solr without resorting to hardcoding field data into copyField declarations in the schema file?

(and if not, what's the safest, most Drupal / Search API friendly approach to doing this? e.g. using Drupal field machine names in the Solr schema file, maybe?)



Background: A common practice when working with Apache Solr is to use a stemmer such as SnowballPorterFilterFactory (stemmers make searches match on the grammatical 'stem' of a word, so, for example, searches on "walking" match content with "walked"). The fulltext fields are then duplicated (e.g. using copyField), and one of the two copies is stemmed while the other is not.

This popular approach has two advantages (at the cost of more processing):

  • Exact matches rank more highly than close matches - a search on "walking" matches content with "walking" twice (stemmed and unstemmed) and content with "walked" once (stemmed only)
  • You're guaranteed to not have any awkward cases where a search on a stemmable term fails to match content that contains that original exact term*.

As I understand it, when Solr is used outside of Drupal, this is usually done by hardcoding <copyField> declarations into the Solr schema file for each field that is to be stemmed.
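
For illustration, a minimal sketch of that hard-coded pattern might look something like the following. The field names (body, body_exact) and the type names (text_stemmed, text_unstemmed) are made up for this example, and the two fieldType definitions are assumed to exist elsewhere in the schema.

```xml
<!-- Hypothetical names throughout; the fieldTypes text_stemmed and
     text_unstemmed are assumed to be defined elsewhere in the schema. -->
<field name="body" type="text_stemmed" indexed="true" stored="true" />
<field name="body_exact" type="text_unstemmed" indexed="true" stored="false" />

<!-- copyField passes the raw source text to the destination field,
     which then applies its own (unstemmed) analysis chain. -->
<copyField source="body" dest="body_exact" />
```

Each additional fulltext field would need its own pair of field declarations and its own copyField line, which is the part that sits awkwardly with fields configured dynamically from Drupal.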



(for the sake of this question, imagine a Search API Solr search indexing nodes, processing as fulltext the fields Title, Body, Teaser, and one custom field named Notes)



The problem: With Drupal and the Search API Solr module, the fields are configured dynamically by Drupal rather than being hardcoded in the Solr schema.xml file. It's not clear how to make the two approaches play nicely together.

Is there a better, more Drupal-friendly way to index stemmed and unstemmed content than hard-coding the field names into something based on copyField in the schema.xml file?

If hard-coding is the answer, what names should be used, and what care needs to be taken to avoid namespace problems?



Another popular approach with Solr outside of Drupal is to create a composite field of all your fulltext fields, combined, and treat it as a string rather than fulltext - so it bypasses filters and simply mops up and boosts any words that are searched for exactly as they appear in the text. Search API has two features under Workflow that look promising for this, Aggregated fields and Complete entity view - but (as far as I can tell) both can only index as Fulltext and would therefore be stemmed just like the regular field.



I can't find anything on this that is specific to Search API Solr. The closest I can find is this issue for D8 core search, but that's rather different.



*(for example, I find that with stemming, searches on 'unravelling' don't match content containing 'unravelling', but searches on 'unravel' do. It's being stemmed down to 'unravel', but for some reason it's not recognising 'unravelling' as a valid extension of the stem 'unravel'. 'Unpublished' is another example (unpublish matches, unpublished doesn't). I've seen various reports of similar language-specific stemming issues. Keeping an unstemmed copy seems to be the standard approach, but I can't see any clean way to do this in Drupal)



Also posted as a support issue on the Search API Solr queue (yes, cross posting like this is okay, actually encouraged to make life easier for maintainers)

Answers

Good question, and also a topic that should probably be brought up as a feature request for both Apache Solr Integration and Search API Solr at some point. Searching both stemmed and unstemmed content can often lead to significantly better results.

For Search API Solr, there's no way to do this without modifying the schema.xml bundled with the module. But you're probably already aware of that since you're using stemming and the bundled schema does not have stemming enabled by default.

The key to doing this without lots of copyField definitions is to use a combination of dynamicField and copyField. You'll also need two different fieldType definitions, one for unstemmed and one for stemmed text.

Try these steps:

  1. Duplicate the entire definition for <fieldType name="text"> as <fieldType name="stemmed_text"> and uncomment the SnowballPorterFilterFactory filter.
  2. Add a dynamicField definition using this new fieldType. The data does not need to be stored, only indexed. Example: <dynamicField name="stemmed_*" type="stemmed_text" termVectors="true" stored="false" />
  3. Add a wildcard copyField definition that copies all text fields into corresponding stemmed text fields. Example: <copyField source="t_*" dest="stemmed_*" />
  4. Reindex.
  5. Add "qf" params to the Solr query for all stemmed fields you want to search. You can do this via hook_search_api_solr_query_alter(). If you want all text fields, you can look for field names prefixed with "t_".
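
Steps 1 to 3 above could be sketched roughly as the following schema.xml fragment. Treat it as an outline rather than a drop-in: the analyzer chain shown for stemmed_text is a simplified stand-in (in practice it should be a copy of the bundled "text" fieldType with the stemmer enabled), and language="English" is an assumption.

```xml
<!-- Step 1: stemmed variant of the text type. The tokenizer and filters here
     are a simplified stand-in; copy the bundled "text" definition instead. -->
<fieldType name="stemmed_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- language is an assumption; pick the one matching your content -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>

<!-- Step 2: any field named stemmed_* is indexed with the stemmed type, not stored -->
<dynamicField name="stemmed_*" type="stemmed_text" indexed="true" stored="false" termVectors="true" />

<!-- Step 3: copy every t_* text field into a matching stemmed_* field -->
<copyField source="t_*" dest="stemmed_*" />
```

In the older schema layout the fieldType belongs inside the types element and the dynamicField inside the fields element. If any of the source text fields can hold multiple values, the dynamicField will probably also need multiValued="true". After reindexing, it is worth checking in the Solr admin schema browser which stemmed_ field names were actually created before adding them to qf.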

Bonus: Sometimes it helps to boost matches against the unstemmed field in order to make more exact results appear first. You can add boost factors to the field names in the qf param.

Bonus #2: The pf param ("phrase field") is another good way to boost exact matches. This lets you give priority to results where terms are in the same order as what the user entered.
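
To make the two bonus tips more concrete, here is one way boosted qf and pf values could look, written as request handler defaults in solrconfig.xml instead of via the hook from step 5. This is only a sketch: the handler name, the choice of the edismax parser, the field names t_title and stemmed_title, and the boost values are all assumptions for illustration.

```xml
<!-- Hypothetical handler; field names and boost values are illustrative only -->
<requestHandler name="/stemmed_search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Unstemmed field weighted above its stemmed copy so exact matches rank first -->
    <str name="qf">t_title^2.0 stemmed_title^1.0</str>
    <!-- Phrase boost: documents containing the terms together as a phrase score higher -->
    <str name="pf">t_title^3.0</str>
  </lst>
</requestHandler>
```

Parameters sent with a request take precedence over handler defaults, so if the module sends its own qf these defaults will not apply, and the query alter hook remains the place to set them. The value syntax (field name, caret, boost factor, space separated) is the same either way.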

Apache Solr Integration module:

For those using apachesolr.module, matching against both stemmed/unstemmed can be done without changing schema.xml. The bundled schema includes lots of dynamicFields and is quite flexible. By default, searchable fields are stemmed. You can simply copy content fields to unstemmed dynamicFields in hook_apachesolr_index_documents_alter() and then add "qf" params for those fields via hook_apachesolr_query_alter().

