
PINQ - Querying datasets. Faceted search. How to build a faceted search using facet counters? Benefits of Faceted Search

In the first part, we took a quick look at the installation and basic syntax of PINQ, a port of LINQ to PHP. In this article, we'll look at how to use PINQ to simulate faceted search on top of MySQL.

This article does not cover every aspect of faceted search. Interested readers can find more detailed material online.

A typical faceted search works like this:

  • The user enters a keyword, or several keywords to search. For example, “router” to search for products in which the word “router” appears in the description, keywords, category name, tags, etc.
  • The site returns a list of products that match these criteria.
  • The site provides several links to customize your search terms. For example, it may allow you to specify specific router manufacturers, or set a price range, or other features.
  • The user can continue to specify additional search criteria in order to obtain the data set of interest.

Faceted search is a popular and powerful tool that can be seen on almost any e-commerce website.

Unfortunately, faceted search is not built into MySQL. So what should we do if we use MySQL but still want to offer users this feature?

With PINQ, we can take a similarly powerful yet simple approach and get the same behavior we would get from other database engines.

Expanding the demo from the first part

Note: All code from this part and from the first part can be found in the repository.

In this article, we'll expand on the demo from Part 1 with a significant improvement in the form of faceted search.

Let's start with index.php and add the following lines to it:

    $app->get("demo2", function () use ($app) {
        global $demo;
        $test2 = new pinqDemo\Demo($app);
        return $test2->test2($app, $demo->test1($app));
    });

    $app->get("demo2/facet/{key}/{value}", function ($key, $value) use ($app) {
        global $demo;
        $test3 = new pinqDemo\Demo($app);
        return $test3->test3($app, $demo->test1($app), $key, $value);
    });

The first route takes us to a page that shows all records matching the keyword search. To keep the example simple, we select all books from the book_book table. The page also displays the resulting data set and a set of links for narrowing the search.

In real applications, after the user clicks such a link, all facet filters adjust to the boundary values of the resulting data set. The user can thus add new search conditions one after another: first select a manufacturer, then specify a price range, and so on.

In this example we will not implement that behavior: all filters will reflect the boundary values of the original data set. This is the first limitation of our demo and the first candidate for improvement.

As you can see in the code above, the actual functions live in another file, pinqDemo.php. Let's take a look at the code that provides the faceted search feature.

Facet class

First, let's create a class that represents a facet. In general, a facet should have several properties:

  • The data it operates on ($data)
  • The key by which the grouping is performed ($key)
  • The key type ($type), which can be one of the following:
    • the full string, for an exact match
    • part of the string (usually the beginning), for a pattern match
    • a range of values, for grouping by range
  • A range parameter ($range): if the key type is a range of values, this is the step that determines the lower and upper bounds of each range; if the key type is part of a string, this is the number of leading characters used for grouping

Grouping is the most critical part of a facet. All aggregated information that a facet can return depends on the grouping criteria. The most commonly used criteria are “Full String”, “Part of String”, and “Value Range”.

    namespace classFacet
    {
        use Pinq\ITraversable,
            Pinq\Traversable;

        class Facet
        {
            public $data;  // Original data set
            public $key;   // Field to group by
            public $type;  // F: full string; S: start of string; R: range
            public $range; // Only relevant if $type != F

            ...

            public function getFacet()
            {
                $filter = "";

                if ($this->type == "F") // full string
                {
                    ...
                }
                elseif ($this->type == "S") // start of string
                {
                    ...
                }
                elseif ($this->type == "R") // range of values
                {
                    $filter = $this->data
                        ->groupBy(function ($row) {
                            return floor($row[$this->key] / $this->range) * $this->range;
                        })
                        ->select(function (ITraversable $data) {
                            return ["key" => $data->last()[$this->key], "count" => $data->count()];
                        });
                }

                return $filter;
            }
        }
    }

The main job of this class is to return a grouped data set based on the original data set and the facet's properties. As the code shows, each key type uses its own way of grouping the data. The code above shows what the grouping looks like for a range of values with the step given in $range.
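To make the three grouping types concrete, here is a minimal sketch in plain PHP, without PINQ. The helper names facetGroupKey() and facetCounts() and the sample rows are assumptions made for illustration; they are not part of the demo, and the elided "F" and "S" branches of the Facet class may well be written differently. Unlike the range branch above, which reports the last value found in each group as the key, this sketch reports the lower bound of the bucket.

```php
<?php
// Hypothetical helper: builds the grouping key for one row, following the
// three key types described above (F, S, R).
function facetGroupKey(array $row, string $key, string $type, int $range = 0)
{
    if ($type === 'F') {
        // Full string: rows group together only on an exact match.
        return $row[$key];
    }
    if ($type === 'S') {
        // Start of string: group by the first $range characters.
        return substr($row[$key], 0, $range);
    }
    // Range: snap a numeric value down to the lower bound of its bucket.
    return (int) (floor($row[$key] / $range) * $range);
}

// Hypothetical helper: counts how many rows fall into each group.
function facetCounts(array $rows, string $key, string $type, int $range = 0): array
{
    $counts = [];
    foreach ($rows as $row) {
        $group = facetGroupKey($row, $key, $type, $range);
        $counts[$group] = ($counts[$group] ?? 0) + 1;
    }
    return $counts;
}

// Made-up sample rows, for illustration only.
$books = [
    ['author' => 'Ann Lee', 'title' => 'Learning PHP',  'price' => 9.5],
    ['author' => 'Ann Lee', 'title' => 'Learning SQL',  'price' => 14.0],
    ['author' => 'Bob Roy', 'title' => 'Mastering Git', 'price' => 27.9],
];

$byAuthor = facetCounts($books, 'author', 'F');     // exact author name
$byTitle  = facetCounts($books, 'title', 'S', 6);   // first 6 characters
$byPrice  = facetCounts($books, 'price', 'R', 10);  // buckets of width 10
```

With these rows, the two "Learning ..." titles share the six-character prefix and land in one group, while the three prices fall into three separate buckets of width 10. For multibyte titles, mb_substr() would be the safer choice than substr().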

Setting up facets and displaying the source data

    public function test2($app, $data)
    {
        $facet = $this->getFacet($data);
        return $app["twig"]->render("demo2.html.twig", array("facet" => $facet, "data" => $data));
    }

    private function getFacet($originalData)
    {
        $facet = array();

        $data = \Pinq\Traversable::from($originalData);

        // Three examples of creating different facet objects and returning the facets
        $filter1 = new \classFacet\Facet($data, "author", "F");
        $filter2 = new \classFacet\Facet($data, "title", "S", 6);
        $filter3 = new \classFacet\Facet($data, "price", "R", 10);

        $facet[$filter1->key] = $filter1->getFacet();
        $facet[$filter2->key] = $filter2->getFacet();
        $facet[$filter3->key] = $filter3->getFacet();

        return $facet;
    }

In the getFacet() method we do the following:

  • Convert the original data into a Pinq\Traversable object for further processing
  • Create three facets. The 'author' facet groups by the author field using the full string; the 'title' facet groups by the title field using part of the string (the first 6 characters); the 'price' facet groups by the price field using a range (in steps of 10)
  • Finally, collect the facets and return them to the test2 function so that they can be passed to the template for display

Outputting facets and filtered data

In most cases, filters are displayed as links that lead to the filtered result.

We've already created a route ("demo2/facet/{key}/{value}") to display faceted search results and filter links.

The route takes two parameters: the key being filtered on and the value for that key. The test3 function bound to this route is shown below:

    public function test3($app, $originalData, $key, $value)
    {
        $data = \Pinq\Traversable::from($originalData);
        $facet = $this->getFacet($data);

        $filter = null;

        if ($key == "author")
        {
            $filter = $data
                ->where(function ($row) use ($value) {
                    return $row["author"] == $value;
                })
                ->orderByAscending(function ($row) use ($key) {
                    return $row["price"];
                });
        }
        elseif ($key == "price")
        {
            ...
        }
        else // $key == title
        {
            ...
        }

        return $app["twig"]->render("demo2.html.twig", array("facet" => $facet, "data" => $filter));
    }

Basically, depending on the key, we filter by the passed value (the anonymous function in the where clause) and get the filtered data set. We can also set the sort order of the filtered data.
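For readers without PINQ at hand, the same filter-then-sort step can be sketched with PHP's built-in array functions. The function filterByFacet() and the sample rows are hypothetical; this mirrors only the "author" branch shown above and says nothing about how the elided "price" and "title" branches are implemented.

```php
<?php
// Hypothetical stand-in for the where()/orderByAscending() chain above.
function filterByFacet(array $rows, string $key, string $value, string $sortBy): array
{
    // Keep only the rows whose $key column equals the requested value.
    $matches = array_values(array_filter($rows, function (array $row) use ($key, $value) {
        return (string) $row[$key] === $value;
    }));

    // Sort the surviving rows in ascending order of the $sortBy column.
    usort($matches, function (array $a, array $b) use ($sortBy) {
        return $a[$sortBy] <=> $b[$sortBy];
    });

    return $matches;
}

// Made-up sample rows, for illustration only.
$books = [
    ['author' => 'Ann Lee', 'title' => 'Learning SQL',  'price' => 14.0],
    ['author' => 'Bob Roy', 'title' => 'Mastering Git', 'price' => 27.9],
    ['author' => 'Ann Lee', 'title' => 'Learning PHP',  'price' => 9.5],
];

// Books by one author, cheapest first.
$result = filterByFacet($books, 'author', 'Ann Lee', 'price');
```

The value arrives from the URL as a string, which is why the comparison casts the column to a string before the strict check.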

Finally, we display the resulting data (along with the filters) in the template. This route uses the same template we used for "demo2".

Search Bar

    <ul>
    {% for k, v in facet %}
      <li>{{ k|capitalize }}
        <ul>
        {% for vv in v %}
          <li><a href="/demo2/facet/{{ k }}/{{ vv.key }}">{{ vv.count }}</a> {{ vv.key }}</li>
        {% endfor %}
        </ul>
      </li>
    {% endfor %}
    </ul>

Remember that the facets generated by our application are nested arrays. At the first level there is an array of all facets; in our case there are three of them (for author, title and price, respectively).

Each facet is an array of key/count pairs, so we can iterate over it in the usual way.

Notice how we build the URLs for our links. We use both the outer loop key (k) and the inner loop key (vv.key) as parameters for the route ("demo2/facet/{key}/{value}"). The size of each group (vv.count) is shown in the template.

The first image shows the original data set, and the second image is filtered by price range from $0 to $10, and sorted by author.

Great, we were able to simulate faceted search in our application!

Before finishing this article, we need to take a final look at our example and determine what can be improved and what limitations we have.

Possible improvements

Overall, this is a very basic example. We've gone over the basic syntax and concepts and implemented them as a working example. As stated earlier, several areas could be improved for greater flexibility.

We need to implement stacked ("overlay") search criteria. The current example only lets us filter the original data set; we cannot apply a facet to an already filtered result. This is the biggest improvement I can think of.
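One way such stacking could work is to keep every facet the user has selected and apply them one after another, each narrowing the previous result. The sketch below is only an assumption about how this might be done in plain PHP; applyFacets(), the shape of the $selected array, and the sample rows are all hypothetical and not part of the demo.

```php
<?php
// Hypothetical sketch: apply every selected facet in turn, each one
// narrowing the result of the previous one.
function applyFacets(array $rows, array $selected): array
{
    foreach ($selected as $key => $value) {
        $rows = array_values(array_filter($rows, function (array $row) use ($key, $value) {
            if (is_array($value)) {
                // Range facet: [lower bound, upper bound), upper bound excluded.
                return $row[$key] >= $value[0] && $row[$key] < $value[1];
            }
            // Full-string facet: exact match.
            return $row[$key] === $value;
        }));
    }
    return $rows;
}

// Made-up sample rows, for illustration only.
$books = [
    ['author' => 'Ann Lee', 'title' => 'Learning PHP',  'price' => 9.5],
    ['author' => 'Ann Lee', 'title' => 'Learning SQL',  'price' => 14.0],
    ['author' => 'Bob Roy', 'title' => 'Mastering Git', 'price' => 27.9],
];

// First narrow by author, then by price range on the already filtered rows.
$narrowed = applyFacets($books, ['author' => 'Ann Lee', 'price' => [10, 20]]);
```

To get facet counters that follow the user's choices, the facets would then be rebuilt from $narrowed rather than from the original data set.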

Limitations

The faceted search implemented in this article has serious limitations (which may also apply to other faceted search implementations):

We fetch data from MySQL every time

This application uses the Silex framework. As with any single-entry-point framework, such as Silex, Symfony or Laravel, its index.php (or app.php) file is executed every time a route is resolved and a controller function runs.

If you look at the code in our index.php, you will notice that the following line of code:

$demo = new pinqDemo\Demo($app);

is called every time the application page is displayed, which means the following lines of code are executed every time:

    class Demo
    {
        private $books = "";

        public function __construct($app)
        {
            $sql = "select * from book_book order by id";
            $this->books = $app["db"]->fetchAll($sql);
        }

        // ...
    }

Would it be better without a framework? Leaving aside that developing applications without a framework is rarely a good idea, we would run into the same problem: data (and state) is not preserved between HTTP requests. This is a fundamental characteristic of HTTP. It can be worked around with a caching mechanism.
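As a rough illustration of such a mechanism, the sketch below stores the fetched rows in a JSON file and reuses them while the file is still fresh. The helper cachedRows(), the cache file location and the five-minute lifetime are assumptions for illustration; a real application would more likely rely on a dedicated cache store, and the loader closure here only stands in for the fetchAll() call.

```php
<?php
// Hypothetical helper: returns cached rows while the cache file is fresh,
// otherwise calls $loader (for example, the SQL query) and stores the result.
function cachedRows(string $cacheFile, int $ttlSeconds, callable $loader): array
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttlSeconds) {
        return json_decode(file_get_contents($cacheFile), true);
    }
    $rows = $loader();
    file_put_contents($cacheFile, json_encode($rows), LOCK_EX);
    return $rows;
}

// Assumed cache location; uniqid() keeps this demo run isolated.
$cacheFile = sys_get_temp_dir() . '/book_book_' . uniqid() . '.json';

$queries = 0;
$loader = function () use (&$queries) {
    $queries++; // Stands in for $app["db"]->fetchAll($sql).
    return [['id' => 1, 'title' => 'Learning PHP']];
};

$first  = cachedRows($cacheFile, 300, $loader); // runs the loader
$second = cachedRows($cacheFile, 300, $loader); // served from the file
unlink($cacheFile);
```

The counter $queries shows how often the loader actually ran, so the second call can be checked to have come from the cache.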

We did save some SQL queries by using facets. Instead of issuing one select query to retrieve the data plus three group by queries to get the aggregated information, we ran just one select query and used PINQ to compute the aggregates.

Conclusion

In this part, we implemented faceted search over a collection of books. As I said, this is just a small example that has room for improvement and a number of limitations.

A smart filter, or faceted search, is a filter within a product category of the kind seen in large online stores such as Yandex.Market. It lets users narrow products down step by step to those with the properties they need, weeding out everything else. It is a very convenient way to quickly find the desired product or content on a site.

Let's move on to installing and configuring the modules we need.

First, download and install the following modules: Search API, Search API Database Search, Entity API and Views.

On the modules page we enable:

  • Search API
  • Search views
  • Database search
  • Entity API
  • Views
  • Views UI
  • Ctools

Creating a search server

Go to Configuration > Search and metadata > Search API (/admin/config/search/search_api) and click Add server.
Then enter the server name, choose Database service in the Service class drop-down list, and save.

Creating an Index

Go to Configuration > Search and metadata > Search API (/admin/config/search/search_api) and click Add index.
Enter the name of the index, select 'Material' in the Item type field, choose Database server in the Server field, and click Create index.


In the form that opens, check the fields you want to be able to sort by, and save.
To be able to sort by node title, enable the title field and select the type string, not fulltext, in the drop-down list next to it. You cannot sort by a fulltext field.

In the next form, Filters (Workflow), I left everything at the defaults. Go to the View (Status) tab and click Index now.
After indexing is complete, we will create a search page.

Creating a Search Page

Go to Structure > Views and click Add new view.
In the new view, select the index we created earlier in the Show drop-down list, and fill in the remaining fields (name, title and path) as needed.


Next, click Continue & edit and set up the view as usual. In the filter criteria, I restricted the view to published content of the desired node type and configured the display of the necessary fields (these fields must be added to the index before you can filter by them).

That completes the view setup. Now let's move on to the facet filter itself.

    --- a/search_api_ranges.module
    +++ b/search_api_ranges.module
    @@ -144,11 +144,8 @@ function search_api_ranges_minmax($variables, $order = "ASC") {
       // otherwise our min/max would always equal user input.
       $filters = &$query->getFilter()->getFilters();
       foreach ($filters as $key => $filter) {
    -
    -    // Check for array: old style filters are objects which we can skip.
    -    if (is_array($filter)) {
    -      if ($filter == $variables["range_field"] || ($filter != $variables["range_field"] && $filter == "")) {
    -        $current_filter = $filters[$key];
    +    if(isset($filter->tags) && is_array($filter->tags)){
    +      if(in_array("facet:".$variables["range_field"], $filter->tags)){
             unset($filters[$key]);
           }
         }

Patching JQuery UI Slider: setting up a redirect

In version 7.x-1.5 of the module, I found that if the slider widget was placed on a page other than the search page, then after the price range was changed the user was redirected to the current page rather than to the search page.
The error is in the function search_api_ranges_block_slider_view_form_submit() (file search_api_ranges.module, line 364).
I didn't dig into the details of what it does and why; I just changed the code slightly on line 427:

    - drupal_goto($path, array("query" => array($params), "language" => $language));
    + drupal_goto($values["path"], array("query" => array($params), "language" => $language));

after which the problem was solved.

Faceted navigation is a problem for all e-commerce sites. An excessive number of pages that are used for different variations of the same element poses a threat to search efficiency. This can negatively impact SEO and user experience. Experts from the SEO Hacker blog explained what faceted navigation is and how to improve it.

Faceted Navigation: Definition

This type of navigation is usually found in the sidebars of e-commerce sites and contains filters and facets: parameters that the user configures as desired. It lets online store customers search for the product they want using a combination of attributes that narrows the product list until they find what they need.

Facets and filters are different from each other. Here's the difference:

  • Facets are indexed categories. They help refine product listings and act as extensions of core categories. Facets add unique meaning to each choice the user makes. Since facets are indexed, they must send relevant signals to the search engine, ensuring that the page contains all the important attributes.

  • Filters are used to sort and refine items within lists. They are necessary for users, but not for search engines. Filters are not indexed because they do not change the content of the page, but only sort it in a different order. This results in multiple URLs having duplicate content.

Potential problems

Each possible facet combination has its own unique URL. This can cause several problems from an SEO perspective. Here are the main ones:

  • Duplicate content.
  • Wasted crawl budget.
  • Diluted link equity.

As your site grows, so does the number of duplicate pages. Incoming links may go to various duplicate pages. This reduces the value of links and limits the ability of pages to rank.

The likelihood of keyword cannibalization also increases. Multiple pages try to rank for the same keywords, resulting in less consistent and lower rankings. This problem could be avoided if each keyword was targeted only to a single page.

Faceted Navigation Solutions

When choosing a solution for faceted navigation, consider your end goal: increasing the number of pages you index or reducing the number of pages you don't want indexed. Here are some solutions that may be useful for you:

AJAX

If you use AJAX, a new URL is not created when the user clicks on a facet or filter. Since there are no unique URLs for every possible facet combination, the problems of duplicate content, keyword cannibalization and wasted crawl budget are potentially eliminated.

AJAX is only effective if it is built in before the e-commerce site is launched. It is not a fix for problems on existing sites. This method also requires a certain investment on your part.

noindex tag

The noindex tag tells bots to exclude a specific page from the index, so the page will not show up in Google search results. This helps reduce the amount of duplicate content that appears in the index and in search results.

It won't solve the crawl budget problem, because bots will still visit the page. It also doesn't help consolidate link equity.
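As a small illustration, a page template could emit the tag only when the request carries filter parameters. The function robotsMetaTag() and the parameter names "sort" and "order" are hypothetical, and pairing noindex with "follow" is one common choice rather than something the article prescribes.

```php
<?php
// Hypothetical helper: returns a robots meta tag when any of the given
// filter parameters is present in the query string, otherwise nothing.
function robotsMetaTag(array $queryParams, array $filterParams): string
{
    $hasFilter = count(array_intersect(array_keys($queryParams), $filterParams)) > 0;

    // Assumption: keep links followable while keeping the page out of the index.
    return $hasFilter ? '<meta name="robots" content="noindex, follow">' : '';
}

// A sorted listing is marked noindex; a plain paginated listing is not.
$sortedPage = robotsMetaTag(['sort' => 'price'], ['sort', 'order']);
$plainPage  = robotsMetaTag(['page' => '2'], ['sort', 'order']);
```

In a real page, $queryParams would typically be the $_GET array and the returned string would be printed inside the HEAD block.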

The rel=canonical attribute

With this attribute, you tell Google that you have one main preferred page to index and rank, and all other versions of content from that page are just duplicates that don't need to be indexed.

Sofia Ibragimova

Content Marketer

If the same page on your site can be reached from multiple URLs, search robots will treat each URL as a separate page. Bots will decide that the content on your site is not unique, and this will hurt your rankings and lower your position in search results. To avoid this, specify the main canonical page by adding a link element with the rel=canonical attribute to the HEAD block.

Canonical pages solve the duplicate content problem, and link equity is consolidated on your main page. But there is a chance that bots will still crawl the duplicate pages, which wastes crawl budget.
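A minimal sketch of how a canonical address might be derived: strip the filter parameters from the current URL and point the link element at what remains. The helpers canonicalUrl() and canonicalTag(), the domain shop.example and the parameter names are assumptions for illustration; the sketch also assumes an absolute URL without a port or fragment.

```php
<?php
// Hypothetical helper: drops the listed query parameters from a URL and
// returns the remaining URL to use as the canonical address.
function canonicalUrl(string $url, array $filterParams): string
{
    $parts = parse_url($url);
    parse_str($parts['query'] ?? '', $query);

    foreach ($filterParams as $param) {
        unset($query[$param]);
    }

    $canonical = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '/');
    if ($query) {
        $canonical .= '?' . http_build_query($query);
    }
    return $canonical;
}

// Hypothetical helper: wraps the canonical URL in a link element for HEAD.
function canonicalTag(string $url, array $filterParams): string
{
    $href = htmlspecialchars(canonicalUrl($url, $filterParams), ENT_QUOTES);
    return '<link rel="canonical" href="' . $href . '">';
}

// Sorting and color filters are removed; the category URL itself remains.
$canonical = canonicalUrl('https://shop.example/routers?sort=price&color=red', ['sort', 'color']);
$tag = canonicalTag('https://shop.example/routers?sort=price&color=red', ['sort', 'color']);
```

Which parameters count as filters, and whether pagination should be kept in the canonical URL, depends on the site and is left open here.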

Robots.txt

Closing some pages from indexing gives good results. It's a simple, fast and reliable way. The easiest way to do this is to set a custom parameter that marks all the combinations of facets and filters you want to block. Append it to each URL you want to hide and block it in the robots.txt file (http://full page address/robots.txt), or use the Robots meta tag in the HEAD area of the page code.

When making changes to the URL, keep in mind that it takes 3-4 weeks for robots to notice and respond to these changes.

There are also certain problems. The value of links will be limited, and a blocked URL may be indexed due to the presence of external links.

Google Search Console

This is a good way to fix your problems temporarily while you work on a better, more user-friendly navigation system. You can use Google Search Console to tell the search engine how to crawl your site.

  • Sign in to your Search Console account and select the “Crawl” section.

  • Click the “URL Parameters” button.

  • Indicate the effect each of your parameters has on the page and how you want Google to treat those pages.

Remember that this method only hides duplicate content from Google's robots. The pages will still appear in Bing and Yahoo.

How to Improve Faceted Navigation

Let's briefly review the methods that help you build faceted navigation correctly:

  • Use AJAX.
  • Remove or hide links to category or filter pages that have no content.
  • Allow indexing of certain facet combinations that have a high volume of search traffic.
  • Set up the site hierarchy through breadcrumbs in categories and subcategories.
  • Create canonical (main) pages for duplicate content.
  • Consolidate indexing properties across the component pages of a paginated series using rel="next" and rel="prev" markup.

Conclusion

Each of the solutions mentioned has its own advantages and disadvantages. There is no universal solution; it all depends on the specifics of your business and the specific case. Optimized faceted navigation will allow your site to target a wider range of keywords. To avoid risk, make sure that the navigation not only meets the requirements of search robots, but also provides a good user experience.


