{"id":9090,"date":"2025-10-25T15:58:18","date_gmt":"2025-10-25T13:58:18","guid":{"rendered":"https:\/\/www.codecon.sk\/?p=9090"},"modified":"2026-03-11T16:03:37","modified_gmt":"2026-03-11T15:03:37","slug":"search-is-not-magic-with-postgresql","status":"publish","type":"post","link":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/","title":{"rendered":"Search is not magic with PostgreSQL"},"content":{"rendered":"<div class=\"gn go gp gq gr\">\n<div class=\"ab cb\">\n<div class=\"ci bh fz ga gb gc\">\n<p id=\"614d\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Here I want to show you several approaches making search better for your data. I will show you powerful tools and some tips that I know thanks to days and weeks in it. In order not to be pure SQL I will also show implementation of approaches in the Elixir language with the great Ecto library. It\u2019s important to say that this \u201cproblem\u201d can not be solved only in the database. It has to be solved on both places where you have logic and data.<\/p>\n<p id=\"344b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">A long time ago, when someone said \u201csearch\u201d to me, the first thing I thought of was:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"12ac\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> <span class=\"hljs-operator\">*<\/span> <span class=\"hljs-keyword\">from<\/span> products <span class=\"hljs-keyword\">where<\/span> name ilike <span class=\"hljs-string\">'%banana%`;<\/span><\/span><\/pre>\n<p id=\"34d4\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s easy with Ecto:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"a25c\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">text = \"banana\"\r\nfrom(p in Products, where: ilike(p.name, \"%#{text}%\")) \r\n|&gt; Repo.all()<\/span><\/pre>\n<p id=\"5e3e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Is it good enough? Is it safe? Well, it worked. We can still say that it works in some cases. But people are more lazy at these days, they don\u2019t pay attention to what they write. So if customers write\u00a0<code class=\"cx nt nu nv nk b\">ba<strong class=\"mi gv\">nna<\/strong>na<\/code>\u00a0it will not work. Because all your products with the name\u00a0<code class=\"cx nt nu nv nk b\">banana<\/code>\u00a0will just not match. You can say:<\/p>\n<blockquote class=\"nw nx ny\">\n<p id=\"c714\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Yeah, I don\u2019t care, it\u2019s just a search\u2026<\/p>\n<\/blockquote>\n<p id=\"3ebd\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Well, if you think it doesn\u2019t matter you can Google it or ask someone. For example some random statistics from the internet:<\/p>\n<blockquote class=\"nw nx ny\">\n<p id=\"9a1c\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><em><strong class=\"mi gv\">43% of retail customers\u00a0<\/strong>head directly to the search bar on a website.<\/em><\/p>\n<p id=\"b633\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><em>On-site search is used more frequently than the navigation menu (49%), filter feature (37%), and homepage recommendations (30%), with\u00a0<strong class=\"mi gv\">78%.<\/strong><\/em><\/p>\n<p id=\"291b\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><em>As many as<strong class=\"mi gv\">\u00a068% of customers would not return<\/strong>\u00a0to a site with a\u00a0<strong class=\"mi gv\">poor search<\/strong>\u00a0experience.<\/em><\/p>\n<\/blockquote>\n<p id=\"df4f\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><em class=\"nz\">Source:\u00a0<\/em><a class=\"af oa\" href=\"https:\/\/www.mailmodo.com\/guides\/ecommerce-site-search-statistics\/\" target=\"_blank\" rel=\"noopener ugc nofollow\"><em class=\"nz\">https:\/\/www.mailmodo.com\/guides\/ecommerce-site-search-statistics\/<\/em><\/a><\/p>\n<p id=\"302f\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">And I know a lot of sites where you can see that more than 80% of customers are using search instead of categories.<\/p>\n<p id=\"80ec\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><em class=\"nz\">Let\u2019s do a very simple calculation with fake data: You have 1000 visitors (per day\/month, doesn\u2019t matter). Based on the text below 790 visitors will use search. Let\u2019s say that 30% will get wrong results in search. That\u2019s 237 visitors who will get wrong results. What does it mean? Let\u2019s say you have a 20% conversion rate for converting regular visitors to paying customers (who have bought something). That\u2019s 47.4 customers you lost.<br \/>\nLet\u2019s say your average shopping cart is 80 (EUR, Dollars, \u2026). Basically this means that you have lost 3 792 per day\/month\/\u2026 . Take your statistics and work this out for yourself, or ask your boss to do it. Have I hot your attention?<\/em><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ab cb ob oc od oe\" role=\"separator\"><\/div>\n<div class=\"gn go gp gq gr\">\n<div class=\"ab cb\">\n<div class=\"ci bh fz ga gb gc\">\n<h2 id=\"b250\" class=\"oj no gu bf ok ol om on oo op oq or os ot ou ov ow ox oy oz pa pb pc pd pe pf bk\">Are there other options?<\/h2>\n<p id=\"0e43\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">You probably already know the answer, so take a look at what we have on our plate for today with Elixir\/Ecto and Postgresql:<\/p>\n<ol class=\"\">\n<li id=\"43eb\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Similarity<\/li>\n<li id=\"28c4\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">TS Vectors<\/li>\n<li id=\"963b\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Semantic Search<\/li>\n<li id=\"7ed0\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Hybrid Search<\/li>\n<li id=\"5a4d\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">3rd party engines\/providers<\/li>\n<\/ol>\n<h2 id=\"acd6\" class=\"oj no gu bf ok ol pt on oo op pu or os ot pv ov ow ox pw oz pa pb px pd pe pf bk\">Similarity<\/h2>\n<p id=\"6da7\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s a function or logic in Postgresql thanks to the\u00a0<em class=\"nz\">pg_trgm<\/em>\u00a0extension, definition from\u00a0<a class=\"af oa\" href=\"https:\/\/www.postgresql.org\/docs\/current\/pgtrgm.html\" target=\"_blank\" rel=\"noopener ugc nofollow\">docs<\/a>\u00a0is:<\/p>\n<blockquote class=\"nw nx ny\">\n<p id=\"8ab2\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The\u00a0<code class=\"cx nt nu nv nk b\">pg_trgm<\/code>\u00a0module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.<\/p>\n<\/blockquote>\n<p id=\"fc3c\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Ok, but what about trigrams? What are they? It\u2019s important to understand what you\u2019re doing, not just to use a function without knowing what it does. Even more so when it\u2019s not that complicated to understand it.<\/p>\n<blockquote class=\"nw nx ny\">\n<p id=\"6222\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">A trigram is a group of three consecutive characters taken from a string. We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages.<\/p>\n<\/blockquote>\n<p id=\"d7ab\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">That\u2019s much better but let\u2019s show you some examples to understand it better.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"59f2\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">CREATE<\/span> EXTENSION IF <span class=\"hljs-keyword\">NOT<\/span> <span class=\"hljs-keyword\">EXISTS<\/span> pg_trgm;<\/span><\/pre>\n<pre class=\"py nj nk nl bp nm bb bk\"><span id=\"09d3\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> show_trgm(<span class=\"hljs-string\">'banana'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> {  b, ba,ana,ban,na ,nan}\r\n\r\n<span class=\"hljs-keyword\">select<\/span> show_trgm(<span class=\"hljs-string\">'effective for measuring the similarity'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> {  e,  f,  m,  s,  t, ef, fo, me, si, th,ari,asu,cti,eas,ect,eff,fec,ffe,<span class=\"hljs-keyword\">for<\/span>,he ,ila,imi,ing,ity,ive,lar,mea,mil,ng ,<span class=\"hljs-keyword\">or<\/span> ,rin,rit,sim,sur,the,tiv,ty ,uri,ve }<\/span><\/pre>\n<p id=\"77be\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Basically super simple explanation: it splits text into multiple small texts and compares them with each other and counts how many of these small parts are the same.<\/p>\n<p id=\"2721\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">First, this function just shows how text is split into multiple strings that are compared and ranked for the actual search. Another function is used for the search:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"6f7e\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> name, similarity(name, <span class=\"hljs-string\">'banana'<\/span>)\r\n<span class=\"hljs-keyword\">from<\/span> products\r\n<span class=\"hljs-keyword\">order<\/span> <span class=\"hljs-keyword\">by<\/span> <span class=\"hljs-number\">2<\/span> <span class=\"hljs-keyword\">desc<\/span>;\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------+----------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name         <span class=\"hljs-operator\">|<\/span>similarity<span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------+----------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Banana Bread <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.54545456<\/span><span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Banan (lime) <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.5<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Banana Milk. <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.46153846<\/span><span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------+----------+<\/span><\/span><\/pre>\n<p id=\"0a83\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">As you can see, it\u2019s quite simple. Thanks to the ranking you can sort by it, filter because it will show you ranking for everything in the table. So you will also have products with a ranking of 0. It depends on your dataset which sensitivity you want to achieve. Ideally you should try to limit it like this:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"e324\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> name, similarity(name, <span class=\"hljs-string\">'banana'<\/span>)\r\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">from<\/span> products\r\n<span class=\"hljs-keyword\">where<\/span> <span class=\"hljs-title\">similarity<\/span>(<span class=\"hljs-params\">name, <span class=\"hljs-string\">'banana'<\/span><\/span>) &gt; 0.241\r\norder <span class=\"hljs-keyword\">by<\/span> 2 desc<\/span>;<\/span><\/pre>\n<p id=\"fe8f\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">This allows you to see how many results you will get and which results are at the end of the results. If it makes sense to show them last, try to reduce the number and see results even later. This will help you find the ideal number for you. Don\u2019t be afraid to use a longer number with more decimals.<\/p>\n<p id=\"cd84\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">What about Elixir and Ecto? There is nothing special to use if you allow extension in your database.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"d2fb\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">text<\/span> = <span class=\"hljs-string\">\"banana\"<\/span>\r\n<span class=\"hljs-keyword\">from<\/span>(p <span class=\"hljs-keyword\">in<\/span> Products,\r\n  <span class=\"hljs-keyword\">where<\/span>: fragment(<span class=\"hljs-string\">\"similarity(?, ?)\"<\/span>, p.name, ^<span class=\"hljs-keyword\">text<\/span>) &gt; <span class=\"hljs-number\">0.241<\/span>,\r\n  order_by: [desc: fragment(<span class=\"hljs-string\">\"similarity(?, ?)\"<\/span>, p.name, ^acc_query)]\r\n)\r\n|&gt; Repo.all()<\/span><\/pre>\n<p id=\"1bad\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Don\u2019t afraid to use similarity on multiple attributes together and combine them to get the best results. Thanks to\u00a0<a class=\"af oa\" href=\"https:\/\/www.postgresql.org\/docs\/current\/pgtrgm.html#PGTRGM-INDEX\" target=\"_blank\" rel=\"noopener ugc nofollow\">indexes<\/a>\u00a0you can get really fast responses. If you have several attributes or even a table merged with\u00a0<code class=\"cx nt nu nv nk b\">UNION<\/code>\u00a0or something like that, you can add constants as weights to lower some results. Because maybe it\u2019s more important to search in the product name than in ingredients, brands, shops, etc.<\/p>\n<p id=\"512b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Are we done? Is this the best what we can do for the search? Again, it depends on your dataset and your usage. It may be sufficient, but may also have a lot of false positive search results. Just because some words contain the same trigrams, \u201csmall parts\u201d. It can be enough thanks to speed and a lot of results. But human languages are complex, there are different words with same meaning, changing letters in words like 1 banana and 2 bananas, etc. This can play an important role in search.<\/p>\n<h2 id=\"2687\" class=\"oj no gu bf ok ol pt on oo op pu or os ot pv ov ow ox pw oz pa pb px pd pe pf bk\">TS Vectors<\/h2>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"50e7\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">CREATE<\/span> EXTENSION IF <span class=\"hljs-keyword\">NOT<\/span> <span class=\"hljs-keyword\">EXISTS<\/span> vector;<\/span><\/pre>\n<p id=\"5e7f\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">To move forward and a little speed up reading a little, we can simplify it. Search with TS Vectors parses text into tokens based on the language configuration. Easier! Text is parsed into small words based on language, so for example\u00a0<code class=\"cx nt nu nv nk b\">cakes<\/code>\u00a0is parsed to\u00a0<code class=\"cx nt nu nv nk b\">cake<\/code>\u00a0with English language. Have a look:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"624a\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'banana cakes for birthday parties'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> <span class=\"hljs-string\">'banana'<\/span>:<span class=\"hljs-number\">1<\/span> <span class=\"hljs-string\">'birthday'<\/span>:<span class=\"hljs-number\">4<\/span> <span class=\"hljs-string\">'cake'<\/span>:<span class=\"hljs-number\">2<\/span> <span class=\"hljs-string\">'parti'<\/span>:<span class=\"hljs-number\">5<\/span><\/span><\/pre>\n<p id=\"9409\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">See how sensitive it is when you change the language:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"d9eb\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">select<\/span> <span class=\"hljs-title\">to_tsvector<\/span>(<span class=\"hljs-params\"><span class=\"hljs-string\">'norwegian'<\/span>, <span class=\"hljs-string\">'banana cakes for birthday parties'<\/span><\/span>)<\/span>;\r\n=&gt; <span class=\"hljs-string\">'banan'<\/span>:<span class=\"hljs-number\">1<\/span> <span class=\"hljs-string\">'birthday'<\/span>:<span class=\"hljs-number\">4<\/span> <span class=\"hljs-string\">'cak'<\/span>:<span class=\"hljs-number\">2<\/span> <span class=\"hljs-string\">'parti'<\/span>:<span class=\"hljs-number\">5<\/span><\/span><\/pre>\n<p id=\"e2b0\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><code class=\"cx nt nu nv nk b\">cakes<\/code>\u00a0is not cake but\u00a0<code class=\"cx nt nu nv nk b\">cak<\/code>\u00a0because the system used a different dictionary to define the tokens and created a list of lexemes which are similar but different. It\u2019s a simple example but imagine some complications in your language which the change the way the word looks. You know better than me. But look at some examples in other languages:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"79a8\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> to_tsvector(<span class=\"hljs-string\">'french'<\/span>, <span class=\"hljs-string\">'\u00e9conomie fran\u00e7aise est en croissance rapide.'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> <span class=\"hljs-string\">'croissanc'<\/span>:<span class=\"hljs-number\">5<\/span> <span class=\"hljs-string\">'fran\u00e7ais'<\/span>:<span class=\"hljs-number\">2<\/span> <span class=\"hljs-string\">'rapid'<\/span>:<span class=\"hljs-number\">6<\/span> <span class=\"hljs-string\">'\u00e9conom'<\/span>:<span class=\"hljs-number\">1<\/span>\r\n\r\n<span class=\"hljs-keyword\">SELECT<\/span> to_tsvector(<span class=\"hljs-string\">'german'<\/span>, <span class=\"hljs-string\">'Die wirtschaftliche Lage in Deutschland ist stabil und wachsend.'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> <span class=\"hljs-string\">'deutschland'<\/span>:<span class=\"hljs-number\">5<\/span> <span class=\"hljs-string\">'lag'<\/span>:<span class=\"hljs-number\">3<\/span> <span class=\"hljs-string\">'stabil'<\/span>:<span class=\"hljs-number\">7<\/span> <span class=\"hljs-string\">'wachsend'<\/span>:<span class=\"hljs-number\">9<\/span> <span class=\"hljs-string\">'wirtschaft'<\/span>:<span class=\"hljs-number\">2<\/span>\r\n\r\n<span class=\"hljs-keyword\">SELECT<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'The quick brown fox jumps over the lazy dog.'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> <span class=\"hljs-string\">'brown'<\/span>:<span class=\"hljs-number\">3<\/span> <span class=\"hljs-string\">'dog'<\/span>:<span class=\"hljs-number\">9<\/span> <span class=\"hljs-string\">'fox'<\/span>:<span class=\"hljs-number\">4<\/span> <span class=\"hljs-string\">'jump'<\/span>:<span class=\"hljs-number\">5<\/span> <span class=\"hljs-string\">'lazi'<\/span>:<span class=\"hljs-number\">8<\/span> <span class=\"hljs-string\">'quick'<\/span>:<span class=\"hljs-number\">2<\/span><\/span><\/pre>\n<p id=\"5e43\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s important to say how the results are cleaned up for unimportant words. See the last example with English. All\u00a0<code class=\"cx nt nu nv nk b\">the<\/code>\u00a0are ignored. See also examples in other languages. They\u2019re called\u00a0<code class=\"cx nt nu nv nk b\">stop words<\/code>\u00a0and are basically discarded to make the search more clearer.<\/p>\n<p id=\"9475\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Ok first part is done, now the query itself, that\u2019s a little bit complicated. Simple example:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"43af\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span>  to_tsquery(<span class=\"hljs-string\">'english'<\/span>,<span class=\"hljs-string\">'cooking'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> <span class=\"hljs-string\">'cook'<\/span><\/span><\/pre>\n<p id=\"5122\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Nice, but not usable, let\u2019s see in action:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"656d\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> name\r\n<span class=\"hljs-keyword\">from<\/span> products\r\n<span class=\"hljs-keyword\">where<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>);\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name                           <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Beef <span class=\"hljs-operator\">-<\/span> Cooked, Corned          <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Ham <span class=\"hljs-operator\">-<\/span> Cooked Italian           <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Ham <span class=\"hljs-operator\">-<\/span> Cooked                   <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Oil <span class=\"hljs-operator\">-<\/span> Cooking Spray            <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pork <span class=\"hljs-operator\">-<\/span> Bacon Cooked Slcd       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Lobster <span class=\"hljs-operator\">-<\/span> Cooked               <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Shrimp <span class=\"hljs-operator\">-<\/span> <span class=\"hljs-number\">16<\/span> <span class=\"hljs-operator\">-<\/span> <span class=\"hljs-number\">20<\/span> Cooked, Peeled<span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------+<\/span><\/span><\/pre>\n<p id=\"375c\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Well, it works but it would be nice to add sorting so that we are sure that the best results are at the top.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"eba3\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">select<\/span> name, ts_rank_cd(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>))\r\n<span class=\"hljs-keyword\">from<\/span> products\r\n<span class=\"hljs-keyword\">where<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>)\r\n<span class=\"hljs-keyword\">order<\/span> <span class=\"hljs-keyword\">by<\/span> ts_rank_cd(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>));\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> \r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------+----------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name                           <span class=\"hljs-operator\">|<\/span>ts_rank_cd<span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------+----------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Beef <span class=\"hljs-operator\">-<\/span> Cooked, Corned          <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Ham <span class=\"hljs-operator\">-<\/span> Cooked Italian           <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Ham <span class=\"hljs-operator\">-<\/span> Cooked                   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Oil <span class=\"hljs-operator\">-<\/span> Cooking Spray            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pork <span class=\"hljs-operator\">-<\/span> Bacon Cooked Slcd       <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Lobster <span class=\"hljs-operator\">-<\/span> Cooked               <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Shrimp <span class=\"hljs-operator\">-<\/span> <span class=\"hljs-number\">16<\/span> <span class=\"hljs-operator\">-<\/span> <span class=\"hljs-number\">20<\/span> Cooked, Peeled<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span>       <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------+----------+<\/span><\/span><\/pre>\n<blockquote class=\"nw nx ny\">\n<p id=\"3ab4\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">This function (<code class=\"cx nt nu nv nk b\"><em class=\"gu\">ts_rank_cd<\/em><\/code>) computes the cover density ranking for the given document vector and query, as described in Clarke, Cormack, and Tudhope\u2019s \u201cRelevance Ranking for One to Three Term Queries\u201d in the journal \u201cInformation Processing and Management\u201d, 1999. Cover density is similar to\u00a0<code class=\"cx nt nu nv nk b\">ts_rank<\/code>\u00a0ranking except that the proximity of matching lexemes to each other is taken into consideration.<\/p>\n<\/blockquote>\n<p id=\"5274\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I\u2019ve added the<code class=\"cx nt nu nv nk b\">ts_rank_cd<\/code>\u00a0column just to show you what\u2019s happening. This function uses weights to give a rank to the results. The defaults are<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"bed3\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">{<span class=\"hljs-number\">0.1<\/span>, <span class=\"hljs-number\">0.2<\/span>, <span class=\"hljs-number\">0.4<\/span>, <span class=\"hljs-number\">1.0<\/span>}<\/span><\/pre>\n<p id=\"54b8\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Weights are used for labelled words\/vectors. Labeled words, vectors? What? Let\u2019s explain how it works by default.<\/p>\n<p id=\"4b9e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Weights are defined by label, labels are\u00a0<code class=\"cx nt nu nv nk b\">{D, C, B, A}<\/code>\u00a0so weight for label\u00a0<code class=\"cx nt nu nv nk b\">D<\/code>\u00a0is\u00a0<code class=\"cx nt nu nv nk b\">0.1<\/code>\u00a0, etc. If weight is not defined for word\/column, label\u00a0<code class=\"cx nt nu nv nk b\">D<\/code>\u00a0is used. To make it even simpler, our search term\u00a0<code class=\"cx nt nu nv nk b\">cooking<\/code>\u00a0is (after being converted to ts vectors) is only present once in the result column, so the is\u00a0<code class=\"cx nt nu nv nk b\">1 * 0.1<\/code>. So in this case it would be wise to add another sorting based on your business logic, for example sorting by how many times the product has been bought.<\/p>\n<p id=\"ccd3\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">However, search only by product name is most often used in conjunction with searching by description.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"3b6d\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       ts_rank_cd(\r\n               to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) <span class=\"hljs-operator\">||<\/span>\r\n               to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description),\r\n               to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> products\r\n<span class=\"hljs-keyword\">WHERE<\/span> (to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) <span class=\"hljs-operator\">||<\/span>\r\n       to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description)) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>;<\/span><\/pre>\n<p id=\"4a14\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">This takes vectors for\u00a0<code class=\"cx nt nu nv nk b\">name<\/code>\u00a0and\u00a0<code class=\"cx nt nu nv nk b\">description<\/code>\u00a0and concat vectors. Or you can use regular\u00a0<code class=\"cx nt nu nv nk b\">or<\/code>\u00a0expression:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"e54a\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       ts_rank_cd(\r\n               to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) <span class=\"hljs-operator\">||<\/span>\r\n               to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description),\r\n               to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> products\r\n<span class=\"hljs-keyword\">WHERE<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>)\r\n   <span class=\"hljs-keyword\">OR<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'cooking'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>;\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+----+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name                             <span class=\"hljs-operator\">|<\/span>rank<span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+----+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Handcrafted Italian Pasta        <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.7<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Penne Primavera, Single  <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Cappellini, Dry          <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Tortellini, Fresh        <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fusili, Dry              <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Penne, Rigate, Dry       <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fusili Tri <span class=\"hljs-operator\">-<\/span> Coloured    <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Rotini, Colour, Dry      <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Shells, Medium, Dry      <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Cannelloni, Sheets, Fresh<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Orzo, Dry                <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Lasagna, Dry             <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fettuccine, Egg, Fresh   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Lasagna Noodle, Frozen   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Linguini, Dry            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+----+<\/span><\/span><\/pre>\n<p id=\"6601\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">But it\u2019s important to note that if you use two columns to search, you can have two ranks. There are many ways to combine two ranks. By default you can concat both vectors and do a rank based on that, like in the previous example, so one rank. These ranks are just numbers so you can do whatever your maths skills will allow you.<\/p>\n<p id=\"8be1\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So now we have a search in two columns but it\u2019s more important to show products that match the name better than description. So now we can use this weight.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"abd3\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       ts_rank_cd(\r\n           setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), <span class=\"hljs-string\">'D'<\/span>) <span class=\"hljs-operator\">||<\/span>\r\n           setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description), <span class=\"hljs-string\">'D'<\/span>),\r\n           to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'pasta'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> products\r\n<span class=\"hljs-keyword\">WHERE<\/span> setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), <span class=\"hljs-string\">'D'<\/span>) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'pasta'<\/span>)\r\n   <span class=\"hljs-keyword\">OR<\/span> setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description), <span class=\"hljs-string\">'D'<\/span>) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'pasta'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>\r\nLIMIT <span class=\"hljs-number\">20<\/span>;\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+----+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name                             <span class=\"hljs-operator\">|<\/span>rank<span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+----+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Handcrafted Italian Pasta        <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.7<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Penne Primavera, Single  <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Cappellini, Dry          <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Tortellini, Fresh        <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fusili, Dry              <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Penne, Rigate, Dry       <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fusili Tri <span class=\"hljs-operator\">-<\/span> Coloured    <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Rotini, Colour, Dry      <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Shells, Medium, Dry      <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Cannelloni, Sheets, Fresh<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Orzo, Dry                <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Lasagna, Dry             <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fettuccine, Egg, Fresh   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Lasagna Noodle, Frozen   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Linguini, Dry            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.1<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+----+<\/span><\/span><\/pre>\n<p id=\"55b2\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">You can see the exact same results. But we want the match with the product name to be at the top and the description to be a little lower.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"11de\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       ts_rank_cd(\r\n               <span class=\"hljs-string\">'{0.1, 0.2, 0.9, 1.0}'<\/span>,\r\n               setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), <span class=\"hljs-string\">'A'<\/span>) <span class=\"hljs-operator\">||<\/span>\r\n               setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description), <span class=\"hljs-string\">'B'<\/span>),\r\n               to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'pasta'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> products\r\n<span class=\"hljs-keyword\">WHERE<\/span> setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), <span class=\"hljs-string\">'A'<\/span>) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'pasta'<\/span>)\r\n   <span class=\"hljs-keyword\">OR<\/span> setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, description), <span class=\"hljs-string\">'B'<\/span>) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'pasta'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>\r\nLIMIT <span class=\"hljs-number\">20<\/span>;\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+---------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name                             <span class=\"hljs-operator\">|<\/span>rank     <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+---------+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Handcrafted Italian Pasta        <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">6.3999996<\/span><span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Penne Primavera, Single  <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Cappellini, Dry          <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Tortellini, Fresh        <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fusili, Dry              <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Penne, Rigate, Dry       <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fusili Tri <span class=\"hljs-operator\">-<\/span> Coloured    <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Rotini, Colour, Dry      <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Shells, Medium, Dry      <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Cannelloni, Sheets, Fresh<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Orzo, Dry                <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Lasagna, Dry             <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Fettuccine, Egg, Fresh   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Lasagna Noodle, Frozen   <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Pasta <span class=\"hljs-operator\">-<\/span> Linguini, Dry            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>        <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">---------------------------------+---------+<\/span><\/span><\/pre>\n<p id=\"be8d\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So this query will set weight to name with label\u00a0<code class=\"cx nt nu nv nk b\">A<\/code>\u00a0and label\u00a0<code class=\"cx nt nu nv nk b\">B<\/code>\u00a0for description. It also sets actual numbers for these labels. The results are the same, but the ranking is different. You can see howthe ranking has changed. But the current dataset doesn\u2019t show big differences.<\/p>\n<p id=\"44ee\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s important to have correct data for the search to work correctly.<\/p>\n<p id=\"2bb6\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Let\u2019s change the dataset. I found on good old internet (without AI) this dataset with products (bottles, nothing better, sorry) from\u00a0<a class=\"af oa\" href=\"https:\/\/data.world\/aewart\/project-2-data\" target=\"_blank\" rel=\"noopener ugc nofollow\">data.world<\/a>.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"274d\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       category,\r\n       ts_rank_cd(\r\n               <span class=\"hljs-string\">'{0.1, 0.2, 0.9, 1.0}'<\/span>,\r\n               setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), <span class=\"hljs-string\">'A'<\/span>) <span class=\"hljs-operator\">||<\/span>\r\n               setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, category), <span class=\"hljs-string\">'B'<\/span>),\r\n               to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> bottle_products\r\n<span class=\"hljs-keyword\">WHERE<\/span> setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name), <span class=\"hljs-string\">'A'<\/span>) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters'<\/span>)\r\n   <span class=\"hljs-keyword\">OR<\/span> setweight(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, category), <span class=\"hljs-string\">'B'<\/span>) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>\r\nLIMIT <span class=\"hljs-number\">20<\/span>;\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> \r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------------------+------------------------------+----+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>name                                       <span class=\"hljs-operator\">|<\/span>category                      <span class=\"hljs-operator\">|<\/span>rank<span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------------------+------------------------------+----+<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Giori Casino Decanter                      <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Evan Williams Master <span class=\"hljs-keyword\">Select<\/span> Decanter       <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Giori Two Dolphins Decanter                <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Courvoisier Erte Vigne Cognac Decanter <span class=\"hljs-number\">1993<\/span><span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Giori Octopus Decanter                     <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Giori Butterfly Decanter                   <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Courvoisier Erte Decanter <span class=\"hljs-number\">1994<\/span>             <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Giori Unicorn Decanter                     <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Drambuie Liqueur W<span class=\"hljs-operator\">\/<\/span><span class=\"hljs-keyword\">empty<\/span> Decanter          <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jack Daniels <span class=\"hljs-number\">1954<\/span> Gold Medal Decanter      <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jack Daniels <span class=\"hljs-number\">1914<\/span> Gold Medal Decanter      <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jack Daniels <span class=\"hljs-number\">1913<\/span> Gold Medal Decanter      <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Giori Golf Decanter                        <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jim Beam Black Holiday Decanter            <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jack Daniels <span class=\"hljs-number\">1904<\/span> Gold Medal Decanter      <span class=\"hljs-operator\">|<\/span>TENNESSEE WHISKIES            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>   <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jack Daniels <span class=\"hljs-number\">1905<\/span> Gold Medal Decanter      <span class=\"hljs-operator\">|<\/span>TENNESSEE WHISKIES            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>   <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Jack Daniels <span class=\"hljs-number\">150<\/span>th Anniversary Decanter    <span class=\"hljs-operator\">|<\/span>TENNESSEE WHISKIES            <span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">1<\/span>   <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Cutty Sark W<span class=\"hljs-operator\">\/<\/span>decanter                      <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Cutty Sark Scotch W<span class=\"hljs-operator\">\/<\/span>ceramic Tumblers       <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">|<\/span>Glenmorangie <span class=\"hljs-number\">10<\/span>yo W<span class=\"hljs-operator\">\/<\/span>glass                  <span class=\"hljs-operator\">|<\/span>DECANTERS <span class=\"hljs-operator\">&amp;<\/span> SPECIALTY PACKAGES<span class=\"hljs-operator\">|<\/span><span class=\"hljs-number\">0.9<\/span> <span class=\"hljs-operator\">|<\/span>\r\n<span class=\"hljs-operator\">+<\/span><span class=\"hljs-comment\">-------------------------------------------+------------------------------+----+<\/span><\/span><\/pre>\n<p id=\"f5c9\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Nothing big but\u2026 So we search for word\u00a0<code class=\"cx nt nu nv nk b\">dencaters<\/code>\u00a0. You can see a better ranking for products containing word\u00a0<code class=\"cx nt nu nv nk b\">decanter<\/code>\u00a0in the name and also in the category. So thanks to this we will see all products in this order:<\/p>\n<ol class=\"\">\n<li id=\"1594\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">with name with search term and products from relevant category,<\/li>\n<li id=\"24f5\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">with name with search term from other categories,<\/li>\n<li id=\"c834\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">other products from the relevant category.<\/li>\n<\/ol>\n<p id=\"d2ea\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Let\u2019s see the Elixir code for this:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"c6e5\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">text <span class=\"hljs-operator\">=<\/span> \"decanters\"\r\nweights <span class=\"hljs-operator\">=<\/span> \"{0.1, 0.2, 0.9, 1.0}\"\r\n<span class=\"hljs-keyword\">language<\/span> <span class=\"hljs-operator\">=<\/span> \"english\"\r\nlimit <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-number\">20<\/span>\r\n<span class=\"hljs-keyword\">offset<\/span> <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-number\">0<\/span>\r\n\r\n<span class=\"hljs-keyword\">from<\/span>(bp <span class=\"hljs-keyword\">in<\/span> BottleProducts,\r\n  <span class=\"hljs-keyword\">where<\/span>:\r\n    fragment(\r\n      \"setweight(to_tsvector(?::text::regconfig', ?), 'A') @@ to_tsquery(?::text::regconfig, ?)\",\r\n      <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">language<\/span>,\r\n      bp.name,\r\n      <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">language<\/span>,\r\n      <span class=\"hljs-operator\">^<\/span>text\r\n    ) <span class=\"hljs-keyword\">or<\/span>\r\n      fragment(\r\n        \"setweight(to_tsvector(?::text::regconfig', ?), 'B') @@ to_tsquery(?::text::regconfig, ?)\",\r\n        <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">language<\/span>,\r\n        bp.category,\r\n        <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">language<\/span>,\r\n        <span class=\"hljs-operator\">^<\/span>text\r\n      ),\r\n  limit: <span class=\"hljs-operator\">^<\/span>limit,\r\n  <span class=\"hljs-keyword\">offset<\/span>: <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">offset<\/span>,\r\n  order_by: [\r\n    <span class=\"hljs-keyword\">desc<\/span>:\r\n      fragment(\r\n        \"ts_rank_cd(?, setweight(to_tsvector(?::text::regconfig, ?), 'A') || setweight(to_tsvector(?::text::regconfig, ?), 'B'))\",\r\n        <span class=\"hljs-operator\">^<\/span>weights,\r\n        <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">language<\/span>,\r\n        bp.name,\r\n        <span class=\"hljs-operator\">^<\/span><span class=\"hljs-keyword\">language<\/span>,\r\n        bp.category\r\n      )\r\n  ]\r\n)\r\n<span class=\"hljs-operator\">|<\/span><span class=\"hljs-operator\">&gt;<\/span> Repo.all()<\/span><\/pre>\n<p id=\"434c\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s a pretty long query and basically all the logic is in fragment where we have raw SQL. There is no other way to do it. You can write some macros which will show it much nicer in case you will use it on multiple places. Maybe there are some libraries for Ecto but basically it will just be a wrapper around fragment as in our case.<\/p>\n<p id=\"1d9e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">This type of query does a lot of computation and it\u2019s possible to optimise it with a better query or simply by creating indexes. For example:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"ccca\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">CREATE<\/span> INDEX idx_bottle_products_name_tsvector <span class=\"hljs-keyword\">ON<\/span> bottle_products\r\n<span class=\"hljs-keyword\">USING<\/span> gin(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name));<\/span><\/pre>\n<p id=\"8295\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you search with only one column:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"e6c3\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       category,\r\n       ts_rank_cd(\r\n               to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name),\r\n               to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> bottle_products\r\n<span class=\"hljs-keyword\">WHERE<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>\r\nLIMIT <span class=\"hljs-number\">20<\/span>;<\/span><\/pre>\n<p id=\"74f0\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I got results in 50 to 80ms without index and 20 to 30ms with index. It\u2019s important to say that you create an index for specific language. So if you have records that are in multiple languages, you need to create an index for each language. You can have multiple indexes for the same column. You just add a condition at the end. For example:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"5a92\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">CREATE<\/span> INDEX idx_bottle_products_name_english_tsvector <span class=\"hljs-keyword\">ON<\/span> bottle_products\r\n<span class=\"hljs-keyword\">USING<\/span> gin(to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name)) <span class=\"hljs-keyword\">where<\/span> language_id <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-number\">1<\/span>;\r\n\r\n<span class=\"hljs-keyword\">CREATE<\/span> INDEX idx_bottle_products_name_german_tsvector <span class=\"hljs-keyword\">ON<\/span> bottle_products\r\n<span class=\"hljs-keyword\">USING<\/span> gin(to_tsvector(<span class=\"hljs-string\">'german'<\/span>, name)) <span class=\"hljs-keyword\">where<\/span> language_id <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-number\">2<\/span>;<\/span><\/pre>\n<p id=\"3189\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">When you do search with these ts vectors with multiple columns it can be a little bit difficult to make a proper and fast query. Personally, I think that similarity shows better results than ts vectors, so I haven\u2019t gone deeper. But again, it\u2019s very dependent on your data. It may be much better in your cases.<\/p>\n<p id=\"96a2\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">In real life customers, visitors, users don\u2019t search with just one word. Usually two or sometimes even more.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"2437\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> name,\r\n       category,\r\n       ts_rank_cd(\r\n               to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name),\r\n               to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters bottle'<\/span>)\r\n       ) <span class=\"hljs-keyword\">AS<\/span> rank\r\n<span class=\"hljs-keyword\">FROM<\/span> bottle_products\r\n<span class=\"hljs-keyword\">WHERE<\/span> to_tsvector(<span class=\"hljs-string\">'english'<\/span>, name) @@ to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters bottle'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> rank <span class=\"hljs-keyword\">DESC<\/span>\r\nLIMIT <span class=\"hljs-number\">20<\/span>;\r\n<span class=\"hljs-operator\">=<\/span><span class=\"hljs-operator\">&gt;<\/span> \r\nERROR: syntax error <span class=\"hljs-keyword\">in<\/span> tsquery: \"decanters bottle\"<\/span><\/pre>\n<p id=\"9ec4\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">That\u2019s because\u00a0<code class=\"cx nt nu nv nk b\">to_tsquery<\/code>\u00a0allows you to use a single word or query. This will be a bit wild. You have several options what to do. You can see a lots of examples in the\u00a0<a class=\"af oa\" href=\"https:\/\/www.postgresql.org\/docs\/current\/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES\" target=\"_blank\" rel=\"noopener ugc nofollow\">documentation<\/a>:<\/p>\n<blockquote class=\"nw nx ny\">\n<p id=\"d639\" class=\"mg mh nz mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><code class=\"cx nt nu nv nk b\">tsquery<\/code>\u00a0operators\u00a0<code class=\"cx nt nu nv nk b\">&amp;<\/code>\u00a0(AND),\u00a0<code class=\"cx nt nu nv nk b\">|<\/code>\u00a0(OR),\u00a0<code class=\"cx nt nu nv nk b\">!<\/code>\u00a0(NOT), and\u00a0<code class=\"cx nt nu nv nk b\">&lt;-&gt;<\/code>\u00a0(FOLLOWED BY)<\/p>\n<\/blockquote>\n<p id=\"cad5\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Let\u2019s look at two basic examples:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"0c73\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">to_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters | bottle'<\/span>)\r\n=&gt; decanters <span class=\"hljs-variable.constant\">OR<\/span> bottle\r\nto_tsquery(<span class=\"hljs-string\">'english'<\/span>, <span class=\"hljs-string\">'decanters &amp; bottle'<\/span>)\r\n=&gt; decanters <span class=\"hljs-variable.constant\">AND<\/span> bottle\r\n<\/span><\/pre>\n<p id=\"a78b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">To move this problem to real code, let\u2019s look at the updated function for making query:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"cf91\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">text = <span class=\"hljs-string\">\"decanters bottle\"<\/span> |<span class=\"hljs-params\">&gt; String.split(\" \") <\/span>|&gt; Enum.join(<span class=\"hljs-string\">\" &amp; \"<\/span>)\r\nweights = <span class=\"hljs-string\">\"{0.1, 0.2, 0.9, 1.0}\"<\/span>\r\nlanguage = <span class=\"hljs-string\">\"english\"<\/span>\r\nlimit = <span class=\"hljs-number\">20<\/span>\r\noffset = <span class=\"hljs-number\">0<\/span>\r\n\r\nfrom(bp <span class=\"hljs-keyword\">in<\/span> BottleProducts,\r\n  <span class=\"hljs-symbol\">where:<\/span>\r\n    fragment(\r\n      <span class=\"hljs-string\">\"setweight(to_tsvector(?::text::regconfig', ?), 'A') @@ to_tsquery(?::text::regconfig, ?)\"<\/span>,\r\n      ^language,\r\n      bp.name,\r\n      ^language,\r\n      ^text\r\n    ) <span class=\"hljs-keyword\">or<\/span>\r\n      fragment(\r\n        <span class=\"hljs-string\">\"setweight(to_tsvector(?::text::regconfig', ?), 'B') @@ to_tsquery(?::text::regconfig, ?)\"<\/span>,\r\n        ^language,\r\n        bp.category,\r\n        ^language,\r\n        ^text\r\n      ),\r\n  <span class=\"hljs-symbol\">limit:<\/span> ^limit,\r\n  <span class=\"hljs-symbol\">offset:<\/span> ^offset,\r\n  <span class=\"hljs-symbol\">order_by:<\/span> [\r\n    <span class=\"hljs-symbol\">desc:<\/span>\r\n      fragment(\r\n        <span class=\"hljs-string\">\"ts_rank_cd(?, setweight(to_tsvector(?::text::regconfig, ?), 'A') || setweight(to_tsvector(?::text::regconfig, ?), 'B'))\"<\/span>,\r\n        ^weights,\r\n        ^language,\r\n        bp.name,\r\n        ^language,\r\n        bp.category\r\n      )\r\n  ]\r\n)\r\n|<span class=\"hljs-params\">&gt; Repo.all()<\/span><\/span><\/pre>\n<p id=\"9655\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">That\u2019s a very small change:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"3901\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">text = <span class=\"hljs-string\">\"decanters bottle\"<\/span> |<span class=\"hljs-params\">&gt; String.split(\" \") <\/span>|&gt; Enum.join(<span class=\"hljs-string\">\" &amp; \"<\/span>)<\/span><\/pre>\n<p id=\"db77\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">But it\u2019s not good because:<\/p>\n<ol class=\"\">\n<li id=\"50d9\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">input can be different and it can lead to wrong query<\/li>\n<li id=\"17f3\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">it assume we want to use only &amp; operator<\/li>\n<\/ol>\n<p id=\"e773\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Let\u2019s see the first problem:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"6059\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"> <span class=\"hljs-string\">\"decanters   bottle\"<\/span> |&gt; <span class=\"hljs-type\">String<\/span>.split(<span class=\"hljs-string\">\" \"<\/span>) |&gt; <span class=\"hljs-keyword\">Enum<\/span>.<span class=\"hljs-keyword\">join<\/span>(<span class=\"hljs-string\">\" &amp; \"<\/span>)\r\n<span class=\"hljs-string\">\"decanters &amp;  &amp;  &amp; bottle\"<\/span><\/span><\/pre>\n<p id=\"eb95\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">How can we fix it? It can be complex and handle more use cases but this is very simple solution:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"25f6\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"> <span class=\"hljs-string\">\"decanters   |bottle &amp; whisky  \"<\/span>\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.trim()\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.downcase()\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"\\&amp;\"<\/span>, <span class=\"hljs-string\">\"\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"\\|\"<\/span>, <span class=\"hljs-string\">\"\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(~r\/\\s+\/, <span class=\"hljs-string\">\" \"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.split(<span class=\"hljs-string\">\" \"<\/span>)\r\n    |&gt; <span class=\"hljs-keyword\">Enum<\/span>.<span class=\"hljs-keyword\">join<\/span>(<span class=\"hljs-string\">\" &amp; \"<\/span>)\r\n\r\n<span class=\"hljs-string\">\"decanters &amp; bottle &amp; whisky\"<\/span><\/span><\/pre>\n<p id=\"0d0a\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The point is to normalise the input, remove the chars which the query uses and do the actual query with word parsing. It\u2019s very simple, probably can be done with one RegEx or another way. It\u2019s up to you to make it work with your logic and skills.<\/p>\n<p id=\"63f3\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s important to mention why I made\u00a0<code class=\"cx nt nu nv nk b\">String.downcase<\/code>. It\u2019s not necessary but it\u2019s nice to use some kind of common format and in the case of using caching search results, it can save a lot of hints when visitors enter something with capital letters.<\/p>\n<p id=\"d65f\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The second problem is more complex, because if the search term consists of several words, we assume that it\u2019s an \u201cand\u201d operator. So each search result must contain all the words. It\u2019s very complex and really depends on the use of this search. I\u2019m not going to write code on how to do it, because basically it can be done in a whole new article. I think there are a lot of articles about parsing queries and making logic. Very simple code which I suggest not to use in production, can be something like this:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"d2d6\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-string\">\"decanters  or whisky\"<\/span>\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.trim()\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.downcase()\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"\\&amp;\"<\/span>, <span class=\"hljs-string\">\"\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"\\|\"<\/span>, <span class=\"hljs-string\">\"\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(~r\/\\s+\/, <span class=\"hljs-string\">\" \"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.split(<span class=\"hljs-string\">\" \"<\/span>)\r\n    |&gt; <span class=\"hljs-keyword\">Enum<\/span>.<span class=\"hljs-keyword\">join<\/span>(<span class=\"hljs-string\">\" &amp; \"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"&amp; or &amp;\"<\/span>, <span class=\"hljs-string\">\"|\"<\/span>)\r\n\r\n<span class=\"hljs-string\">\"decanters | whisky\"<\/span><\/span><\/pre>\n<p id=\"298e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It will only work with one operator, when you enter multiple operators it may cause create logical problems.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"3c76\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-string\">\"decanters  or whisky or glass and water\"<\/span>\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.trim()\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.downcase()\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"\\&amp;\"<\/span>, <span class=\"hljs-string\">\"\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"\\|\"<\/span>, <span class=\"hljs-string\">\"\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(~r\/\\s+\/, <span class=\"hljs-string\">\" \"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.split(<span class=\"hljs-string\">\" \"<\/span>)\r\n    |&gt; <span class=\"hljs-keyword\">Enum<\/span>.<span class=\"hljs-keyword\">join<\/span>(<span class=\"hljs-string\">\" &amp; \"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"&amp; or &amp;\"<\/span>, <span class=\"hljs-string\">\"|\"<\/span>)\r\n    |&gt; <span class=\"hljs-type\">String<\/span>.replace(<span class=\"hljs-string\">\"&amp; and &amp;\"<\/span>, <span class=\"hljs-string\">\"&amp;\"<\/span>)\r\n\r\n<span class=\"hljs-string\">\"decanters | whisky | glass &amp; water\"<\/span><\/span><\/pre>\n<p id=\"3490\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">You may have the problem that you will have products with similar names but really different meanings. For example chocolate and plate in chocolate colour. Similarity search will find similar parts and assume it\u2019s similar, ts vectors as well. Maybe it will be lower in results but still. It will show plates when customer want to buy chocolate for girl.<\/p>\n<h2 id=\"1666\" class=\"oj no gu bf ok ol pt on oo op pu or os ot pv ov ow ox pw oz pa pb px pd pe pf bk\">Semantic Search<\/h2>\n<p id=\"4606\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">Big player in the game. Semantic search. Sounds modern, sounds like something with AI. Well, we can easily simplify it or define it as \u201cusing the meaning of words to search\u201d. People, especially developers, realised a very long time ago that search uses human languages which are much more complex and not as precise as programming languages. You can read more about the history of \u201cusing meaningful search\u201d in this nice article from Algolia (a really good 3rd party search provider):<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<p><a href=\"https:\/\/www.algolia.com\/blog\/ai\/the-past-present-and-future-of-semantic-search\/\"><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-5828 aligncenter\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.51.18-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.51.18-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.51.18-300x75.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.51.18-768x191.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.51.18.png 1360w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<\/div>\n<p id=\"6d2d\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">You can read in the article AI search in 2007, vector search in 2013. So nothing to do with current (2024) hype about AI. These technologies are so old that they can already walk to school alone without their parents. But they still need to be guided and shown the right way to live.<\/p>\n<p id=\"b234\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s important to mention that Semantic Search it\u2019s a slightly broader concept, not anexact way how to do it. Main goal is to use tools that take meanings of what search term want to achieve and execute it with understanding of content and context with relevant data.<\/p>\n<p id=\"7225\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">For this purpose I will show the vector search. PostgreSQL is wondeful for using real data with vectors thanks to the\u00a0<code class=\"cx nt nu nv nk b\">pgvector<\/code>\u00a0extension.<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\">\n<div class=\"qr l\">\n<p class=\"bf b dv z qj qk ql qm qn qo qp du\"><a href=\"https:\/\/github.com\/pgvector\/pgvector?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5831\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.57.07-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.57.07-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.57.07-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.57.07-768x190.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.57.07.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"56f8\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The definition what is a vector, or vector search is quite complex. I\u2019m not the right guy to explain it properly. I will give you just very simple definition which I used to explain my non-technical colleagues.<\/p>\n<p id=\"336c\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">First of all, the following visualisation of the vectors is really good.<\/p>\n<figure class=\"ne nf ng nh ni rc qz ra paragraph-image\">\n<div class=\"rd re fj rf bh rg\" tabindex=\"0\" role=\"button\">\n<div class=\"qz ra rb\"><a href=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/0_PByO52eGBjg6v3XH.webp\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/0_PByO52eGBjg6v3XH.webp\" alt=\"Words plotted in 3-dimensional space. Embeddings can have hundreds or thousands of dimensions\u2013too many for humans to visualize.\" width=\"700\" height=\"267\" \/><\/a><\/div>\n<\/div><figcaption class=\"ri ff rj qz ra rk rl bf b bg z du\" data-selectable-paragraph=\"\">Words plotted in 3-dimensional space. Embeddings can have hundreds or thousands of dimensions \u2014 too many for humans to visualise.<\/figcaption><\/figure>\n<p id=\"1820\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I borrowed this image from already mentioned Algolia article, and they borrowed it from this article:<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\">\n<h3 class=\"bf gv ij z qj qk ql qm qn qo qp gt bk\"><a href=\"https:\/\/cloud.google.com\/blog\/topics\/developers-practitioners\/meet-ais-multitool-vector-embeddings?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5832\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.58.13-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.58.13-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.58.13-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.58.13-768x190.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.58.13.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/h3>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"a84e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">You can see words and connections between them in this picture. It\u2019s in 3-dimensional space. Each dimension has its own meaning. It can be distance from North Pole, or 0 can be black 1 can be white and all other colours can be in between. It really depends on the author of this \u201cspace\u201d. These spaces are called embedding models. I explain it to non-technical people as \u201cGPS\u201d coordinates in fake word. Where vectors are several numbers similar to latitude and longitude in our world. And model is map. So basically vectors are coordinates on map. Yes, it\u2019s not correct and my university teachers would probably try to physically attack me for that definition, but it\u2019s as it is, I\u2019m a practical guy.<\/p>\n<p id=\"6f65\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Nice example is model from OpenAI. They have model\u00a0<strong class=\"mi gv\"><em class=\"nz\">text-embedding-3-large<\/em><\/strong>\u00a0with size of 3072 dimensions. This means that if you put some text into this model, it will return you an array\/list of 3072 numbers. These numbers are basically coordinates in their world named \u201ctext-embedding-3-large\u201d. They have other models too. You can read more about how this model is created and some comparisons in their\u00a0<a class=\"af oa\" href=\"https:\/\/openai.com\/index\/new-embedding-models-and-api-updates\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">article<\/a>.<\/p>\n<p id=\"24aa\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So let\u2019s move on to the practical and technical. Using vectors for search consists of two parts:<\/p>\n<ol class=\"\">\n<li id=\"28b8\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Obtaining and storing Vectors<\/li>\n<li id=\"d610\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Calculations with Vectors<\/li>\n<\/ol>\n<h3 id=\"ebb0\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Obtaining and storing Vectors<\/h3>\n<p id=\"18c5\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">First you need to know your data. Getting the right vectors is very important. It\u2019s different for getting vectors for articles and for products or small text or big text. The problem with big or huge text like book or article is that it can contain a lot of meanings. One part of article can be about bad experience with A and second part can be about good experience with B. So embedding model will have hard work to define which vectors to use, where is the position on every dimension. Or product description can also contain tutorial how to cook with it. Thanks to this, the model can make a decision and understand that the meaning of this content is about this meal not about the product.<\/p>\n<p id=\"ad36\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><strong class=\"mi gv\">Context<\/strong><\/p>\n<p id=\"b79d\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">For example for articles, books, documentation or basically any large text. It\u2019s common practice to split text into several blocks and vectorize only those blocks. You will get better vectors thanks to smaller possible meanings and you can also address results to more details. For example instead of showing the result that your topic is mentioned in book A, you can show the result that your topic is mentioned in book A in chapter 23, paragraph 5.<\/p>\n<p id=\"6eee\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you have a product with description, you need to make sure that you don\u2019t have any directly related text in it. How to cook with this product, history of the company that made this product, or whatever is not directly related to product itself. It\u2019s important because we\u2019re not in the state of models\/AI (2024) where it can automatically define which context is main and important and which is irrelevant and should be ignored. It\u2019s possible but you have to take the right steps first. For example you can take only a few of sentences from the description or use summarised description if you have one. Or you can use Other AI models to generate the right description from your big description. It\u2019s important to focus on the right context.<\/p>\n<p id=\"53aa\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Sometimes it\u2019s not about about removing text, It\u2019s about adding detail. Imagine you are an online shop for coffee. You have names of all products, brands, etc\u2026 You have a good search for products. But you also want to add a search for brands. So you want to use brand name and description, maybe location. Great. It will work, but it can work better. Imagine that some brands don\u2019t just sell coffee, but also equipment to make coffee, some other ingredients or specialise in some special categories of coffee. So , for example, adding the top 5 categories of products that this brand sells in your store will help create a better context for the model to decide the right vectors.<\/p>\n<p id=\"1433\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Important note! Do not forget to remove html or other tags from your text. They\u2019re not needed to get proper vectors and can waste some potential.<\/p>\n<p id=\"64e6\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><strong class=\"mi gv\">Even more context<\/strong><\/p>\n<p id=\"7c87\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So we have proper data, big text is broken down to small, product contains only directly relevant data, brands contains top categories. Great, are we done with proper context? NO!<\/p>\n<p id=\"2011\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Imagine that you just met some random friend and you tell that person:<\/p>\n<ul class=\"\">\n<li id=\"8c35\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\">Dark Roasted Peru, Bean Lovers, sweat taste from South America<\/li>\n<li id=\"9d9b\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\">Serrano Superior, Ebenica, exotic taste from Kuba<\/li>\n<\/ul>\n<p id=\"14b4\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Well, it can go several ways, but probably the person will not understand what you want to say and the feedback will be different. So let\u2019s give some context:<\/p>\n<ul class=\"\">\n<li id=\"8640\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\">I want to buy coffee Dark Roasted Peru from brand Bean Lovers, it should have sweet taste and it was gown in South America<\/li>\n<li id=\"635f\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\">I want to buy coffee Serrano Superior by brand Ebenica. It should have exotic taste from Kuba.<\/li>\n<\/ul>\n<p id=\"61ea\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I don\u2019t think it needs any further explanation. It\u2019s similar with machines. You have to give special context to tell machine that we are talking about products from the store, which part of the text is brand, which is the description, etc.<\/p>\n<p id=\"2a29\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Let\u2019s see some example:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"5bf3\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-string\">\"A <span class=\"hljs-subst\">#{product_category}<\/span> product named <span class=\"hljs-subst\">#{product_name}<\/span> from brand <span class=\"hljs-subst\">#{product_brand}<\/span>. Description: <span class=\"hljs-subst\">#{product_description}<\/span>\"<\/span>\r\n<span class=\"hljs-string\">\"A brand named <span class=\"hljs-subst\">#{brand_name}<\/span> selling <span class=\"hljs-subst\">#{top_categories}<\/span> from <span class=\"hljs-subst\">#{brand_location}<\/span>. Description: <span class=\"hljs-subst\">#{brand_description}<\/span>\"<\/span><\/span><\/pre>\n<p id=\"e2e8\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you have data in a different language or multiple languages, you should mention this in this \u201cprompt\u201d and the text itself can be in that language.<\/p>\n<p id=\"f7f8\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Ok, so basically we took very simple stuff and ame it a little bit more complex. I have bad news. I\u2019m not done yet with the context.<\/p>\n<p id=\"3dce\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Just as you need to add context to your data, you need to add context to search terms. Because the same as I wrote for products or brands. So to go on, there is anexample:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"8c05\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-string\">\"Search for <span class=\"hljs-subst\">#{term_string}<\/span> in coffee products or related equipment. Include related terms, brands, categories or similar products \"<\/span><\/span><\/pre>\n<p id=\"10ea\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Ok, we have prepared text to get vectors, now we need to get vectors. You can use local offline models or try some 3rd party api. I use OpenAI, it\u2019s easy.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"0168\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"> <span class=\"hljs-variable\">@spec<\/span> text_to_vector(String.t() |<span class=\"hljs-params\"> list(String.t())) ::\r\n          {:ok, list(float()) <\/span>| list(list(float()))} |<span class=\"hljs-params\"> {:error, any()}\r\n def text_to_vector(text) <span class=\"hljs-keyword\">do<\/span>\r\n  <span class=\"hljs-keyword\">case<\/span> OpenAI.embeddings(model: \"text-embedding-3-large\", input: text, dimensions: 2000) <span class=\"hljs-keyword\">do<\/span>\r\n    {:ok, %{data: [%{\"embedding\" =&gt; e}]}} -&gt;\r\n      {:ok, e}\r\n    {:ok, %{data: embeddings}} -&gt;\r\n      {:ok, Enum.map(embeddings, fn %{\"embedding\" =&gt; e} -&gt; e <span class=\"hljs-keyword\">end<\/span>)}\r\n    {:error, reason} -&gt;\r\n      {:error, reason}\r\n  <span class=\"hljs-keyword\">end<\/span>\r\n<span class=\"hljs-keyword\">end<\/span><\/span><\/span><\/pre>\n<p id=\"6232\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s a simple use of the API, you can send a text or multiple texts and you will get vectors for them. I\u2019m using this OpenAI library\u00a0<a class=\"af oa\" href=\"https:\/\/hex.pm\/packages\/openai\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/hex.pm\/packages\/openai<\/a><\/p>\n<p id=\"2f7b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">You don\u2019t have to use a paid 3rd party service. You can for example use the great Elixir library\u00a0<a class=\"af oa\" href=\"https:\/\/hex.pm\/packages\/bumblebee\" target=\"_blank\" rel=\"noopener ugc nofollow\">Bumblebee<\/a>\u00a0and use some models from the wonderful\u00a0<a class=\"af oa\" href=\"https:\/\/huggingface.co\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Hugging Face<\/a>. See code example in\u00a0<a class=\"af oa\" href=\"https:\/\/hexdocs.pm\/bumblebee\/llms_rag.html#generating-embeddings\" target=\"_blank\" rel=\"noopener ugc nofollow\">docs<\/a>. It may look complicated, but basically it\u2019s copy&amp;paste and it really works.<\/p>\n<p id=\"e3cb\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><strong class=\"mi gv\">Storage<\/strong><\/p>\n<p id=\"ada5\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Now we have vectors, what now? First you have to decide, based on your usage, whether you want to have a separate table where you store, for example product_id and vector itself. Or you will store them in the same table.<\/p>\n<p id=\"9e23\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">To store it in PostgreSQL, don\u2019t forget to have the\u00a0<a class=\"af oa\" href=\"https:\/\/github.com\/pgvector\/pgvector\" target=\"_blank\" rel=\"noopener ugc nofollow\">pgvector<\/a>\u00a0extension installed and enabled. They also have their own Docker image, so if you use Docker for your PostgreSQL yes, many people do ) you can use it. But more importantly, if you have running tests in your CI\/CD, you can use this Docker image for example in the GitHub Action\u2019s job.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"d5ee\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-attr\">services:<\/span>\r\n  <span class=\"hljs-attr\">postgres:<\/span>\r\n    <span class=\"hljs-attr\">image:<\/span> <span class=\"hljs-string\">pgvector\/pgvector:pg16<\/span><\/span><\/pre>\n<p id=\"22f6\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So it\u2019s easy to enable this extension if it\u2019s available\/installed on your server:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"33d8\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">CREATE<\/span> EXTENSION IF <span class=\"hljs-keyword\">NOT<\/span> <span class=\"hljs-keyword\">EXISTS<\/span> vector;<\/span><\/pre>\n<p id=\"3d1e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">or Ecto migration version:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"07cc\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">  <span class=\"hljs-built_in\">execute<\/span> <span class=\"hljs-string\">\"CREATE EXTENSION IF NOT EXISTS vector;\"<\/span><\/span><\/pre>\n<p id=\"5c87\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">In order for your Elixir application to work with Ecto and vectors you need to have a new attribute type. You can create it yourself or use the\u00a0<a class=\"af oa\" href=\"https:\/\/hex.pm\/packages\/pgvector\" target=\"_blank\" rel=\"noopener ugc nofollow\">pgvector<\/a>\u00a0hex package for it. Thanks to the great work in this package you have type and also query macros that you can use to make nicer queries, but we will talk about that later.<\/p>\n<p id=\"875a\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you want to create a new table or add a new column, it\u2019s easy with Ecto migration. You can also do it in your language, so it depends on your framework what you use.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"68f2\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">add <span class=\"hljs-symbol\">:vector<\/span>, <span class=\"hljs-symbol\">:vector<\/span>, <span class=\"hljs-symbol\">size:<\/span> <span class=\"hljs-number\">2_000<\/span><\/span><\/pre>\n<p id=\"70d9\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">First \u201cvector\u201d is the name of your field, you can use whatever you want. Second is a new type thanks to the hex package and PostgreSQL\u2019s extension. Size is the number of dimensions for this vector. It\u2019s important. It has to be the same size as the output of your model. My example uses OpenAI\u2019s api which allows to specify number of dimensions in output. In case you can not change it as easy as me. There are some code examples on internet how to do it by yourself. It\u2019s out of my current \u201cvector\u2019s knowledge\u201d so I\u2019m not going to play with it.<\/p>\n<p id=\"26e2\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I could use the default dimension size, but the current PostgreSQL extension only supports indexing only up to 2000 dimensions for vectors. See more details on their\u00a0<a class=\"af oa\" href=\"https:\/\/github.com\/pgvector\/pgvector?tab=readme-ov-file#indexing\" target=\"_blank\" rel=\"noopener ugc nofollow\">GitHub Page in Index section<\/a>.<\/p>\n<p id=\"2996\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Creating index on this field is important for me. It makes query much faster, it\u2019s probably because of large number of dimensions than number of entries in table. There are more types of indexes, read about them and choose what you think is best for your data. Or just try it. My winner is HNSW.<\/p>\n<p id=\"44fb\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Creating an index is easy:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"45ee\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-function\">CREATE INDEX CONCURRENTLY your_table_vectors_vector_idx\r\n      ON your_table_vectors\r\n      USING <span class=\"hljs-title\">hnsw<\/span> <span class=\"hljs-params\">(vector vector_ip_ops)<\/span><\/span>;<\/span><\/pre>\n<p id=\"6d8a\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">But there is a very important part vector_ip_ops. It\u2019s the name of the distance function you want to use for queries. I use OpenAI\u2019s model and in their documentation it is suggested to use function for inner product distance function instead of cosine distance. So based on your data you have to decide which function to use for your queries. It\u2019s easy to change, you can even play around with it later. But don\u2019t forget that if you have a lot of data, indexing can take a long time.<\/p>\n<p id=\"c001\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you really have a lot of data, you should read this article about the new\u00a0<a class=\"af oa\" href=\"https:\/\/github.com\/timescale\/pgvectorscale\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">pgvectorscale<\/a>\u00a0extension from the Timescale team.<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\">\n<div class=\"qr l\">\n<p class=\"bf b dv z qj qk ql qm qn qo qp du\"><a href=\"https:\/\/thenewstack.io\/make-pgvector-faster-than-pinecone-and-75-cheaper-with-this-new-open-source-extension\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5833\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.59.18-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.59.18-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.59.18-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.59.18-768x190.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-20.59.18.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"ef7c\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">One important note when you store vector in a table with other data. When you use Elixir Ecto schema, you really don\u2019t want to load vector every time when you work with that product\/schema. It can be a problem because imagine that you have products and every time when you show product on index page or detail page, your system will load all vectors for every product. You really don\u2019t want that. Ecto has an incredible solution for it. Just add this option to your schema definition and it\u2019s done.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"5af8\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">field <span class=\"hljs-symbol\">:vector<\/span>, Pgvector.Ecto.Vector, <span class=\"hljs-symbol\">load_in_query:<\/span> <span class=\"hljs-literal\">false<\/span><\/span><\/pre>\n<p id=\"7f8f\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It will ignore this field when Ecto loads the record from the database. You have to explicitly ask for it to load it. Thanks to this, you don\u2019t have to worry about downloading all vectors, parsing them and loading all vectors during regular operations. You may not use these vectors in your application at all, only in the database. I do it that way. Of course only creating them and storing them in the database. But this can and should be done asynchronously. Some services like Supabase or some extensions support it directly from database. You just put the entry in the database and it will call the relevant api and store it in the database for you. Is it good? Hard to say, I don\u2019t like the idea of having that kind of logic in the database. But it might be great for someone. Timescale has published a big tutorial on how to use these tools.<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"se l qu qv qw qs qx ln qe\"><a href=\"https:\/\/www.timescale.com\/blog\/build-a-fully-local-rag-app-with-postgresql-mistral-and-ollama\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5834\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.00.24-1024x260.png\" alt=\"\" width=\"1024\" height=\"260\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.00.24-1024x260.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.00.24-300x76.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.00.24-768x195.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.00.24.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"8dd7\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The output of the model, the vector, is basically a list of numbers. So no further steps are required. Just store it in your database as returned by the model\u2019s api.<\/p>\n<h3 id=\"a5ed\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Calculations with Vectors<\/h3>\n<p id=\"82e9\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">Well, I say calculations because it\u2019s really about calculations, but it\u2019s really about searching.<\/p>\n<p id=\"a69e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">First, some important technical notes. If you use Index like HNSW for your vector\u2019s column. There are some important information. By default index only returns first 40 results only. You can read about this on\u00a0<a class=\"af oa\" href=\"https:\/\/github.com\/pgvector\/pgvector?tab=readme-ov-file#query-options\" target=\"_blank\" rel=\"noopener ugc nofollow\">pgvector\u2019s GitHub<\/a>. You need to change configuration for your database or whole server. This depends on your PostgreSQL instance. So you can do it on your current database:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"5cf1\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">ALTER<\/span> DATABASE your_database <span class=\"hljs-keyword\">SET<\/span> hnsw.ef_search <span class=\"hljs-keyword\">TO<\/span> <span class=\"hljs-number\">750<\/span>;<\/span><\/pre>\n<p id=\"c8e0\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Or you can add it to the configuration for all databases, as I did locally for the PostgreSQL installed by Homebrew.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"d026\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-comment\"># \/opt\/homebrew\/var\/postgresql@16\/postgresql.conf<\/span>\r\n<span class=\"hljs-attr\">hnsw.ef_search<\/span> = <span class=\"hljs-number\">750<\/span><\/span><\/pre>\n<p id=\"2389\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I have a configuration for 750 results. It\u2019s a lot, but I have a reason for it. The usual approach is to make a query that returns you products (movies, names or other entries which you have) and in this case you don\u2019t need 750 results. But if you need to have more complex filters, the query can get really big and complex. Complexity was one of the more reasons why I went with the multiple queries approach:<\/p>\n<ol class=\"\">\n<li id=\"9cc1\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Query to check the existence of a search term in our database<\/li>\n<li id=\"0f5d\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Query that returns search results with ONLY ids<\/li>\n<li id=\"5a58\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">Query that returns actual results which are used for visitors<\/li>\n<\/ol>\n<p id=\"cd65\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So you can see step number 2 where the database query returns only ids. And later I do my business logic filtering and using these results. So even if I need 10 products\/results, my business logic, filters, sorters will know much more results than 10. Because maybe the first 100 of those results are not suitable for results. Imagine category filters, price filters, tag filters, delivery locations, etc\u2026. Based on some tests for me number 750 was fine for me. I used a technique where I searched with many more results and checked results at the bottom. When the didn\u2019t make sense I went -50% off that number. I searched again, checked at the bottom, if the last results made sense I added +50%, etc.<\/p>\n<p id=\"f1ad\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><strong class=\"mi gv\">Query to check existence of a search term in our database<\/strong><\/p>\n<p id=\"2e49\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">First of all, remember, that this type of search doesn\u2019t use pure pure input from the html input field for the actual search. It uses this input as text which is transformed into vectors. The transformation can be done locally or with 3rd party API. But it doesn\u2019t matter, it will still be slow and more importantly, customers don\u2019t search for unique search terms. It\u2019s common for them to search for the same things quite often. So why should we create vectors for the same search every time, even if we have multiple visitors searching for same thing. So that\u2019s the reason why I store search term with vector. So first search is \u201cslow\u201d, because it has to get proper vector thanks to nice prompt which I mentioned before, and store it in database. After that I don\u2019t use this vector at all. I use search term query which returns relevant vector as a subquery.<\/p>\n<p id=\"02f3\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">But the point is that I have to check the database to see if anyone has already searched for this search term. If not, I have to get the vector from the model and store it in the system.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"d8e4\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> <span class=\"hljs-literal\">TRUE<\/span> <span class=\"hljs-keyword\">FROM<\/span> <span class=\"hljs-string\">\"search_terms\"<\/span> <span class=\"hljs-keyword\">AS<\/span> s0 <span class=\"hljs-keyword\">WHERE<\/span> (s0.<span class=\"hljs-string\">\"term\"<\/span> = <span class=\"hljs-comment\">'banana') LIMIT 1<\/span><\/span><\/pre>\n<pre class=\"py nj nk nl bp nm bb bk\"><span id=\"07f1\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">from(st <span class=\"hljs-keyword\">in<\/span> SearchTerm, <span class=\"hljs-symbol\">where:<\/span> st.term == ^string)\r\n|<span class=\"hljs-params\">&gt; Repo.exists?()<\/span><\/span><\/pre>\n<p id=\"5d4d\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">To be honest, I don\u2019t use this query. I use a slightly more complex one that I use in several places.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"b382\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span>(st <span class=\"hljs-keyword\">in<\/span> SearchTerm,\r\n  <span class=\"hljs-keyword\">where<\/span>: st.term == ^<span class=\"hljs-built_in\">string<\/span>,\r\n  limit: <span class=\"hljs-number\">1<\/span>,\r\n  <span class=\"hljs-keyword\">select<\/span>: st.vector\r\n )<\/span><\/pre>\n<p id=\"72b1\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you call it with\u00a0<code class=\"cx nt nu nv nk b\">Repo.one(term_query)<\/code>\u00a0it will give you vector, but you don\u2019t need it. When you call it with\u00a0<code class=\"cx nt nu nv nk b\">Repo.exists?(term_query)<\/code>, it will remove select part of query and add TRUE as you can see in SQL example before. But I use same query as subquery for search itself in next step.<\/p>\n<p id=\"55bb\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s important to say, that search term, string from html input must be normalised. It will save some pain in the future. For example just simple one:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"3445\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">string\r\n|<span class=\"hljs-params\">&gt; String.trim()\r\n<\/span>|&gt; String.downcase()\r\n|<span class=\"hljs-params\">&gt; String.replace(~r\/\\s+\/, \" \")<\/span><\/span><\/pre>\n<p id=\"7eba\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">When I know that I have normalised the search term in the database with the correct vector. I can proceed to the next step. This step is a compromise. It costs us a few of ms in the query, but thanks to it we can be sure that our search doesn\u2019t (falsely) return zero results or that we don\u2019t call a 3rd party API for every search.<\/p>\n<p id=\"7d6b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><strong class=\"mi gv\">Query that returns search results with ONLY ids<\/strong><\/p>\n<p id=\"c46a\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Well, headliner of our party. The main query that performs the actual search. To be honest, it\u2019s boringly simple. First you have to choose which distance function you want to use. Let\u2019s use \u201cinner product distance\u201d which is suggested by my embeddings\/vectors model (OpenAI).<\/p>\n<p id=\"3089\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So first of all, in the previous step, we have the search term check and we have term query. We\u2019re going to use that as a subquery.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"b925\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-selector-tag\">from<\/span>(\r\n    s in <span class=\"hljs-built_in\">subquery<\/span>(\r\n    <span class=\"hljs-built_in\">from<\/span>(pv in ProductVector,\r\n      <span class=\"hljs-attribute\">as<\/span>: :product_vector,\r\n      <span class=\"hljs-attribute\">order_by<\/span>: [<span class=\"hljs-attribute\">asc<\/span>: <span class=\"hljs-built_in\">selected_as<\/span>(:distance)],\r\n      <span class=\"hljs-attribute\">select<\/span>: %{\r\n        <span class=\"hljs-attribute\">product_id<\/span>: pv.product_id,\r\n        <span class=\"hljs-attribute\">distance<\/span>:\r\n          <span class=\"hljs-built_in\">selected_as<\/span>(\r\n            <span class=\"hljs-built_in\">fragment<\/span>(<span class=\"hljs-string\">\"(? &lt;#&gt; ?)\"<\/span>, pv.vector, <span class=\"hljs-built_in\">subquery<\/span>(term_query)),\r\n            :distance\r\n          ),\r\n        <span class=\"hljs-attribute\">rank<\/span>:\r\n          <span class=\"hljs-built_in\">selected_as<\/span>(\r\n            <span class=\"hljs-number\">1<\/span> \/ (<span class=\"hljs-number\">60<\/span> + <span class=\"hljs-built_in\">fragment<\/span>(<span class=\"hljs-string\">\"(? &lt;#&gt; ?)\"<\/span>, pv.vector, <span class=\"hljs-built_in\">subquery<\/span>(term_query))),\r\n            :rank\r\n          )\r\n      },\r\n      <span class=\"hljs-attribute\">limit<\/span>: <span class=\"hljs-number\">350<\/span>\r\n    )\r\n  ),\r\n  <span class=\"hljs-attribute\">as<\/span>: :search,\r\n  <span class=\"hljs-attribute\">where<\/span>: s.distance &lt; -<span class=\"hljs-number\">0.4<\/span>,\r\n  <span class=\"hljs-attribute\">select<\/span>: %{product_id: s.product_id, rank: s.rank, distance: s.distance}\r\n)<\/span><\/pre>\n<p id=\"2501\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">There is an interesting function from Ecto,\u00a0<code class=\"cx nt nu nv nk b\">selected_as<\/code>. Basically two functions, thanks to it it\u2019s possible to use value from select part in order by for example. It\u2019s common in sql but not easy at all in Ecto or functional language. More about it in the\u00a0<a class=\"af oa\" href=\"https:\/\/hexdocs.pm\/ecto\/3.12.4\/Ecto.Query.API.html#selected_as\/1\" target=\"_blank\" rel=\"noopener ugc nofollow\">docs<\/a>.<\/p>\n<p id=\"72bc\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">SQL version:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"7fd5\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> s0.\"product_id\", s0.\"rank\", s0.\"rank\"\r\n<span class=\"hljs-keyword\">FROM<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> sp0.\"product_id\"                        <span class=\"hljs-keyword\">AS<\/span> \"product_id\",\r\n             (sp0.\"vector\" <span class=\"hljs-operator\">&lt;<\/span>#<span class=\"hljs-operator\">&gt;<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> sss0.\"vector\" <span class=\"hljs-keyword\">AS<\/span> \"vector\"\r\n                                <span class=\"hljs-keyword\">FROM<\/span> \"search_terms\" <span class=\"hljs-keyword\">AS<\/span> sss0\r\n                                <span class=\"hljs-keyword\">WHERE<\/span> (sss0.\"term\" <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-string\">'chocolate'<\/span>)\r\n                                LIMIT <span class=\"hljs-number\">1<\/span>))            <span class=\"hljs-keyword\">AS<\/span> \"distance\",\r\n             <span class=\"hljs-number\">1<\/span> <span class=\"hljs-operator\">\/<\/span> (<span class=\"hljs-number\">60<\/span> <span class=\"hljs-operator\">+<\/span> (sp0.\"vector\" <span class=\"hljs-operator\">&lt;<\/span>#<span class=\"hljs-operator\">&gt;<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> sss0.\"vector\" <span class=\"hljs-keyword\">AS<\/span> \"vector\"\r\n                                          <span class=\"hljs-keyword\">FROM<\/span> \"search_terms\" <span class=\"hljs-keyword\">AS<\/span> sss0\r\n                                          <span class=\"hljs-keyword\">WHERE<\/span> (sss0.\"term\" <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-string\">'chocolate'<\/span>)\r\n                                          LIMIT <span class=\"hljs-number\">1<\/span>))) <span class=\"hljs-keyword\">AS<\/span> \"rank\"\r\n      <span class=\"hljs-keyword\">FROM<\/span> \"product_vectors\" <span class=\"hljs-keyword\">AS<\/span> sp0\r\n      <span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> \"distance\"\r\n      LIMIT <span class=\"hljs-number\">350<\/span>) <span class=\"hljs-keyword\">AS<\/span> s0\r\n<span class=\"hljs-keyword\">WHERE<\/span> (s0.\"distance\" <span class=\"hljs-operator\">&lt;<\/span> <span class=\"hljs-number\">-0.4<\/span>)<\/span><\/pre>\n<p id=\"0a81\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s not that simple, you could say I\u2019ve over-complicated the query. Yes, but wait for the explanation.<\/p>\n<p id=\"ae96\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The index on the vector column is only used if the limit is used.<\/p>\n<p id=\"cc5b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">To avoid getting very weird results at the end, we need to filter results based on distance. The subquery where I later filtered out results with a distance greater than -0.4 was faster. At that time index was not used when I used filtering inside.<\/p>\n<p id=\"aba8\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The vectors are very large, there are 2 000 (1 999) numbers in the list. So downloading it to your application and immediately uploading it as part of a query is a waste of resources. So instead you can use a query that returns directly in the database within your main query. That\u2019s the reason for subquery with term_query.<\/p>\n<p id=\"aec6\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">You can use the macro\u00a0<code class=\"cx nt nu nv nk b\">max_inner_product(column, vector)<\/code>\u00a0from Elixir\u2019s\u00a0<code class=\"cx nt nu nv nk b\">pgvector<\/code>\u00a0package instead of fragment. It may look nicer this way but in the end it\u2019s still fragment and I could not get it to work with subquery instead of vector.<\/p>\n<p id=\"e3a8\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The calculated distance is a negative number. The range of this number is still between 0 and -1. These distances can be very different and it really depends on your data, the model you are using and the prompt you are using to get these vectors. I got a big difference between distances on the same data, with the same query when I changed the prompt that generates vectors from OpenAI\u2019s model. Before the change I had numbers like -0.2345 and now it\u2019s much lower around -0.1292949494949 (just example numbers).<\/p>\n<p id=\"bd84\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Using the same approach as when I found 750 good number for my data, I found out that I can remove all results that have a distance lower than -0.4. This means that if the distance is -0.5 it is lower and I don\u2019t want to see it. So -0.1 is a great result and -0.9 is terrible.<\/p>\n<p id=\"4adf\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I also calculated the rank. That\u2019s because I want to use this query for hybrid search as well. It is described in the next chapter. There is a very simple calculation called\u00a0<a class=\"af oa\" href=\"https:\/\/safjan.com\/implementing-rank-fusion-in-python\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Reciprocal Rank Fusion (RRF)<\/a>. Thanks to this approach you can merge results from different (but similar) ranges and compare them.<\/p>\n<p id=\"7ede\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">So this query returns sorted results containing\u00a0<code class=\"cx nt nu nv nk b\">product_id<\/code>, distance and ranks. You don\u2019t need to export distance or rank in production, but it\u2019s great in development to have all the pieces configured correctly. Or later when you want to merge results with other queries.<\/p>\n<p id=\"b258\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Basically that\u2019s all, the output of with ids needs to be used in your search and filter logic. Here is a very simple example:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"5cae\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">ids = VectorSearch.search(term_string)\r\n<span class=\"hljs-keyword\">from<\/span>(p <span class=\"hljs-keyword\">in<\/span> Product,\r\n  where: p.<span class=\"hljs-built_in\">id<\/span> <span class=\"hljs-keyword\">in<\/span> ^ids,\r\n  order_by: fragment(<span class=\"hljs-string\">\"array_position(?, ?)\"<\/span>, ^ids, p.<span class=\"hljs-built_in\">id<\/span>)\r\n)\r\n|&gt; Repo.<span class=\"hljs-built_in\">all<\/span>()<\/span><\/pre>\n<pre class=\"py nj nk nl bp nm bb bk\"><span id=\"c6a0\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> <span class=\"hljs-operator\">*<\/span>\r\n<span class=\"hljs-keyword\">FROM<\/span> products p\r\n<span class=\"hljs-keyword\">WHERE<\/span> p.id <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-keyword\">ANY<\/span> (<span class=\"hljs-string\">'{1,2,3,4,5}'<\/span>)\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> array_position(<span class=\"hljs-string\">'{1,2,3,4,5}'<\/span>)<\/span><\/pre>\n<p id=\"325b\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The\u00a0<code class=\"cx nt nu nv nk b\">VectorSearch.search(term_string)<\/code>function returns a sorted list of product_ids from the semantic search. So this list of ids is used to filter only products from found product ids. The last step is the most important one. The results are sorted by the position of the product id in this list from the semantic search, thanks to the super nice function\u00a0<code class=\"cx nt nu nv nk b\">array_position<\/code>. It\u2019s a very simple query to show how it works. You can add more complex filters. In the project I mentioned earlier, we changed the similarity search to semantic search. This means that we split the query from one to two. The first one returns results with product ids and the second one does filtering and other stuff. Basically we changed one line of the query where we called the similarity function to a function that filters products by id and sorts them by array position. I didn\u2019t expect it but old big second query went much more faster thanks to product id filtering which is properly indexed. So we added two more queries but the actual response time of the search is now almost the same and in some cases even faster than before without vectors. Another side effect of this change is that you can change the logic there in the future with minimal changes. You can do the search in 3rd party service and you don\u2019t have to implement complex filtering in their system. You just do a search that returns product ids and the complex filtering will be in your PostgreSQL database without any changes.<\/p>\n<p id=\"4484\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\"><em class=\"nz\">NOTE: You can see that I haven\u2019t added more sql examples and data examples in this part of the article. The reason is that I don\u2019t have a good database (lots of real data) that I can share publicly. If I find that kind of data, I will update this article later.<\/em><\/p>\n<h2 id=\"9be5\" class=\"oj no gu bf ok ol pt on oo op pu or os ot pv ov ow ox pw oz pa pb px pd pe pf bk\">Hybrid search<\/h2>\n<p id=\"d75b\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">I have to mention that I am pretty sure that in a couple of years, the term or concept of \u201cHybrid search\u201d will be renamed or merged with Semantic Search. And nobody will use only vector search or something like that. Because basically hybrid search does what the word \u201csemantic\u201d means by English dictionaries.<\/p>\n<p id=\"13c3\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I would like to mention that there is a nice Elixir Hex package for Hybrid Search in PostgreSQL\u00a0<a class=\"af oa\" href=\"https:\/\/hex.pm\/packages\/vecto\" target=\"_blank\" rel=\"noopener ugc nofollow\">vecto<\/a>\u00a0by cpursley (Chase Pursley).<\/p>\n<p id=\"2538\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Semantic search with \u201cfulltext\u201d search. Again, there is no exact definition that forces you to use an exact approach for this. You can use\u00a0<code class=\"cx nt nu nv nk b\">ilike<\/code>\u00a0,\u00a0<code class=\"cx nt nu nv nk b\">similarity<\/code>\u00a0or\u00a0<code class=\"cx nt nu nv nk b\">ts_vector<\/code>\u00a0to make it work. But the logic is the, same, using vectors along with a more precise technique. I see it as a very good practice to help you solve problems with black box vectors generated by some model. Yes, if you have your own process to generate proper vectors for your data, you may not need this hybrid combination.<\/p>\n<p id=\"29ae\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Basically, we need to combine one from the \u201cfull text\u201d approaches from previous chapters with semantic search with vectors into one big query. Sounds simple, it is, until you want to order the results properly.<\/p>\n<p id=\"2aea\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">There is a problem, which I have already mentioned. Almost all the approaches mentioned above have their own way of sorting. That\u2019s why I introduced RRF in semantic search query. Thanks to this, it\u2019s possible to unify the ranking of the results on a scale, so we can merge the results and these ranks to one result table and sort it. So you can have mixed results.<\/p>\n<p id=\"044a\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Query is much bigger and can feel very difficult to read, but if you take it step by step it\u2019s easy.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"8934\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">similarity_query =\r\n      from(p <span class=\"hljs-keyword\">in<\/span> Product,\r\n        <span class=\"hljs-symbol\">where:<\/span> fragment(<span class=\"hljs-string\">\"SIMILARITY(?, ?)\"<\/span>, p.name, ^string) &gt; <span class=\"hljs-number\">0.17<\/span>,\r\n        <span class=\"hljs-symbol\">order_by:<\/span> [<span class=\"hljs-symbol\">desc:<\/span> selected_as(<span class=\"hljs-symbol\">:rank<\/span>)],\r\n        <span class=\"hljs-symbol\">limit:<\/span> <span class=\"hljs-number\">350<\/span>,\r\n        <span class=\"hljs-symbol\">select:<\/span> <span class=\"hljs-string\">%{\r\n          product_id: p.id,\r\n          rank:\r\n            selected_as(fragment(\"(1\/ (60 - SIMILARITY(?, ?)))\", p.name, ^\"%<span class=\"hljs-subst\">#{string}<\/span>%\"), :rank),\r\n          distance: selected_as(fragment(\"SIMILARITY(?, ?)\", p.name, ^\"%<span class=\"hljs-subst\">#{string}<\/span>%\"), :distance)\r\n        }<\/span>\r\n      )\r\n \r\nterm_query =\r\n      from(st <span class=\"hljs-keyword\">in<\/span> SearchTerm,\r\n        <span class=\"hljs-symbol\">where:<\/span> is_nil(st.term),\r\n        <span class=\"hljs-symbol\">limit:<\/span> <span class=\"hljs-number\">1<\/span>,\r\n        <span class=\"hljs-symbol\">select:<\/span> st.vector\r\n      )\r\n\r\nvector_query =\r\n  from(\r\n    s <span class=\"hljs-keyword\">in<\/span> subquery(\r\n      from(psv <span class=\"hljs-keyword\">in<\/span> ProductVector,\r\n        <span class=\"hljs-symbol\">as:<\/span> <span class=\"hljs-symbol\">:product_vector<\/span>,\r\n        <span class=\"hljs-symbol\">order_by:<\/span> [<span class=\"hljs-symbol\">asc:<\/span> selected_as(<span class=\"hljs-symbol\">:distance<\/span>)],\r\n        <span class=\"hljs-symbol\">select:<\/span> <span class=\"hljs-string\">%{\r\n          product_id: psv.product_id,\r\n          distance:\r\n            selected_as(\r\n              fragment(\"(? &lt;#&gt; ?)\", psv.vector, subquery(term_query)),\r\n              :distance\r\n            ),\r\n          rank:\r\n            selected_as(\r\n              1 \/ (60 + fragment(\"(? &lt;#&gt; ?)\", psv.vector, subquery(term_query))),\r\n              :rank\r\n            )\r\n        }<\/span>,\r\n        <span class=\"hljs-symbol\">limit:<\/span> <span class=\"hljs-number\">350<\/span>\r\n      )\r\n    ),\r\n    <span class=\"hljs-symbol\">as:<\/span> <span class=\"hljs-symbol\">:search<\/span>,\r\n    <span class=\"hljs-symbol\">where:<\/span> s.distance &lt; ^maximal_inner_product_distance,\r\n    <span class=\"hljs-symbol\">select:<\/span> <span class=\"hljs-string\">%{product_id: s.product_id, rank: s.rank, distance: s.distance}<\/span>\r\n  )\r\n\r\nfrom(s <span class=\"hljs-keyword\">in<\/span> subquery(union_all(similarity_query, ^vector_query)),\r\n  <span class=\"hljs-symbol\">order_by:<\/span> [<span class=\"hljs-symbol\">desc:<\/span> s.rank],\r\n  <span class=\"hljs-symbol\">select:<\/span> {s.product_id, s.rank}\r\n)\r\n|<span class=\"hljs-params\">&gt; Repo.all()\r\n<\/span>|&gt; Enum.uniq_by(&amp;elem(&amp;<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">0<\/span>))<\/span><\/pre>\n<p id=\"d439\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Before we look at SQL version, let\u2019s take a deep dive. We have two main queries:<\/p>\n<ol class=\"\">\n<li id=\"4823\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">similarity query<\/li>\n<li id=\"9158\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd pl pm pn bk\" data-selectable-paragraph=\"\">vector query<\/li>\n<\/ol>\n<p id=\"7c5c\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Both have the same result structure:<\/p>\n<ul class=\"\">\n<li id=\"ab07\" class=\"mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\"><code class=\"cx nt nu nv nk b\">product_id<\/code><\/li>\n<li id=\"a469\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\"><code class=\"cx nt nu nv nk b\">distance<\/code><\/li>\n<li id=\"7934\" class=\"mg mh gu mi b mj po ml mm mn pp mp mq mr pq mt mu mv pr mx my mz ps nb nc nd sc pm pn bk\" data-selectable-paragraph=\"\"><code class=\"cx nt nu nv nk b\">rank<\/code><\/li>\n<\/ul>\n<p id=\"fc2e\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Distance is basically a number that was the result of one of the functions used for a specific search. As mentioned before, it can be in different ranges and the same number doesn\u2019t mean that it is the same in the context of both search results.<\/p>\n<p id=\"f181\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Rank is very important, thanks to the RRF calculation we can compare and sort both results together. Basically thanks to this calculation, the differences between these numbers are minimised, so that the numbers are not equal but, similar.<\/p>\n<p id=\"8486\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If we have the same number for the ranking, it is easy to merge these two queries is simple thanks to\u00a0<code class=\"cx nt nu nv nk b\">UNION<\/code>\u00a0.<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"628b\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">SELECT<\/span> s0.\"product_id\", s0.\"rank\"\r\n<span class=\"hljs-keyword\">FROM<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> sp0.\"id\"                                           <span class=\"hljs-keyword\">AS<\/span> \"product_id\",\r\n             (<span class=\"hljs-number\">1<\/span> <span class=\"hljs-operator\">\/<\/span> (<span class=\"hljs-number\">60<\/span> <span class=\"hljs-operator\">-<\/span> SIMILARITY(sp0.\"name\", <span class=\"hljs-string\">'%chocolate%'<\/span>))) <span class=\"hljs-keyword\">AS<\/span> \"rank\",\r\n             SIMILARITY(sp0.\"name\", <span class=\"hljs-string\">'%chocolate%'<\/span>)              <span class=\"hljs-keyword\">AS<\/span> \"distance\"\r\n      <span class=\"hljs-keyword\">FROM<\/span> \"products\" <span class=\"hljs-keyword\">AS<\/span> sp0\r\n      <span class=\"hljs-keyword\">WHERE<\/span> (SIMILARITY(sp0.\"name\", <span class=\"hljs-string\">'chocolate'<\/span>) <span class=\"hljs-operator\">&gt;<\/span> <span class=\"hljs-number\">0.17<\/span>::<span class=\"hljs-type\">float<\/span>)\r\n      <span class=\"hljs-keyword\">UNION<\/span> <span class=\"hljs-keyword\">ALL<\/span>\r\n      (<span class=\"hljs-keyword\">SELECT<\/span> ss0.\"product_id\", ss0.\"rank\", ss0.\"distance\"\r\n       <span class=\"hljs-keyword\">FROM<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> ssp0.\"product_id\"                        <span class=\"hljs-keyword\">AS<\/span> \"product_id\",\r\n                    (ssp0.\"vector\" <span class=\"hljs-operator\">&lt;<\/span>#<span class=\"hljs-operator\">&gt;<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> ssss0.\"vector\" <span class=\"hljs-keyword\">AS<\/span> \"vector\"\r\n                                        <span class=\"hljs-keyword\">FROM<\/span> \"search_terms\" <span class=\"hljs-keyword\">AS<\/span> ssss0\r\n                                        <span class=\"hljs-keyword\">WHERE<\/span> (ssss0.\"term\" <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-string\">'chocolate'<\/span>)\r\n                                        LIMIT <span class=\"hljs-number\">1<\/span>))            <span class=\"hljs-keyword\">AS<\/span> \"distance\",\r\n                    <span class=\"hljs-number\">1<\/span> <span class=\"hljs-operator\">\/<\/span> (<span class=\"hljs-number\">60<\/span> <span class=\"hljs-operator\">+<\/span> (ssp0.\"vector\" <span class=\"hljs-operator\">&lt;<\/span>#<span class=\"hljs-operator\">&gt;<\/span> (<span class=\"hljs-keyword\">SELECT<\/span> ssss0.\"vector\" <span class=\"hljs-keyword\">AS<\/span> \"vector\"\r\n                                                  <span class=\"hljs-keyword\">FROM<\/span> \"search_terms\" <span class=\"hljs-keyword\">AS<\/span> ssss0\r\n                                                  <span class=\"hljs-keyword\">WHERE<\/span> (ssss0.\"term\" <span class=\"hljs-operator\">=<\/span> <span class=\"hljs-string\">'chocolate'<\/span>)\r\n                                                  LIMIT <span class=\"hljs-number\">1<\/span>))) <span class=\"hljs-keyword\">AS<\/span> \"rank\"\r\n             <span class=\"hljs-keyword\">FROM<\/span> \"product_vectors\" <span class=\"hljs-keyword\">AS<\/span> ssp0\r\n             <span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> \"distance\"\r\n             LIMIT <span class=\"hljs-number\">350<\/span>) <span class=\"hljs-keyword\">AS<\/span> ss0\r\n       <span class=\"hljs-keyword\">WHERE<\/span> (ss0.\"distance\" <span class=\"hljs-operator\">&lt;<\/span> <span class=\"hljs-number\">-0.4<\/span>))\r\n      <span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> \"rank\" <span class=\"hljs-keyword\">DESC<\/span>\r\n      LIMIT <span class=\"hljs-number\">350<\/span>) <span class=\"hljs-keyword\">AS<\/span> s0\r\n<span class=\"hljs-keyword\">ORDER<\/span> <span class=\"hljs-keyword\">BY<\/span> s0.\"rank\" <span class=\"hljs-keyword\">DESC<\/span>;<\/span><\/pre>\n<p id=\"ea8d\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">The results can look like this:<\/p>\n<pre class=\"ne nf ng nh ni nj nk nl bp nm bb bk\"><span id=\"74f2\" class=\"nn no gu nk b bg np nq l nr ns\" data-selectable-paragraph=\"\">+----------+--------------------+\r\n|product<span class=\"hljs-emphasis\">_id|rank                |\r\n+----------+--------------------+\r\n|11833     |0.01694915254237288 |\r\n|4394      |0.016867469884363277|\r\n|892       |0.016867469884363277|\r\n|6552      |0.01685393258991333 |\r\n|2345      |0.01685393258991333 |\r\n|9035      |0.01685393258991333 |\r\n+----------+--------------------+<\/span><\/span><\/pre>\n<p id=\"b101\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">All these queries are fast in PostgreSQL thanks to indexes and some approaches. But still, don\u2019t forget that you have to call a 3rd party API which can be down or take too long to return vector for the search term. I have already written about this. But depending on your use case, it may even be possible to store product ids for that stored search term. Basically, when a product is updated, you need to update vectors for that product and in that case you need to expire the cache. So maybe you can just remove the stored product_ids for that search term. It all depends on the data what you have and the code what you run.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ab cb ob oc od oe\" role=\"separator\"><\/div>\n<div class=\"gn go gp gq gr\">\n<div class=\"ab cb\">\n<div class=\"ci bh fz ga gb gc\">\n<h2 id=\"5f79\" class=\"oj no gu bf ok ol om on oo op oq or os ot ou ov ow ox oy oz pa pb pc pd pe pf bk\">3rd party services<\/h2>\n<p id=\"72b8\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">If you can and want to use 3rd party services for search, there are a few options which I would like to mention that I would consider as a good solution.<\/p>\n<h3 id=\"5aff\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Supabase<\/h3>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sf l qu qv qw qs qx ln qe\"><a href=\"https:\/\/supabase.com\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5835\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.01.28-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.01.28-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.01.28-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.01.28-768x190.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.01.28.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"fb35\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s an open source project that can be self-hosted or used as a paid cloud solution. It\u2019s a whole ecosystem built on PostgreSQL with a lot of features. They have also some solutions for\u00a0<a class=\"af oa\" href=\"https:\/\/supabase.com\/docs\/guides\/ai\/hybrid-search\" target=\"_blank\" rel=\"noopener ugc nofollow\">Hybrid Search<\/a>.<\/p>\n<h3 id=\"d509\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Meilisearch<\/h3>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sg l qu qv qw qs qx ln qe\"><a href=\"https:\/\/www.meilisearch.com\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5836\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.02.17-1024x253.png\" alt=\"\" width=\"1024\" height=\"253\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.02.17-1024x253.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.02.17-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.02.17-768x189.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.02.17.png 1362w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"dd91\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">It\u2019s another open source with a similar cloud model to Supabase. But it\u2019s built on different technologies and with focus directly on search. This model is based on a server that needs to be fed with data to be usable. There is a huge development in modern search approaches and I really like it. For me, if I wanted to have \u201conly\u201d search, this would be the first option. They have made good progress recently with\u00a0<a class=\"af oa\" href=\"https:\/\/www.meilisearch.com\/solutions\/hybrid-search\" target=\"_blank\" rel=\"noopener ugc nofollow\">Hybrid search<\/a>. It\u2019s also important to mention that they have some news about\u00a0<a class=\"af oa\" href=\"https:\/\/blog.meilisearch.com\/what-is-federated-search\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Federated Search\u00a0<\/a>, which may be a topic for a new article of mine.<\/p>\n<h3 id=\"35ef\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">ParadeDB<\/h3>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sh l qu qv qw qs qx ln qe\"><a href=\"https:\/\/www.paradedb.com\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5837\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.13-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.13-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.13-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.13-768x190.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.13.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"4eb4\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">This is another open source solution that has recently been supported by some of cloud managed database providers. Basically it\u2019s PostgreSQL with some extensions and custom \u201cindexing\u201d logic. It\u2019s new to me, but it looks really promising. For example\u00a0<a class=\"af oa\" href=\"https:\/\/tembo.io\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Tembo<\/a>. Here is nice article on how to use this database with Elixir and Ecto.<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"si l qu qv qw qs qx ln qe\"><a href=\"https:\/\/moosie.us\/parade_db_ecto?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5838\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.59-1024x255.png\" alt=\"\" width=\"1024\" height=\"255\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.59-1024x255.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.59-300x75.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.59-768x191.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.03.59.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sj l qu qv qw qs qx ln qe\"><a href=\"https:\/\/tembo.io\/blog\/paradedb-search-analytics?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5839\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.04.42-1024x255.png\" alt=\"\" width=\"1024\" height=\"255\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.04.42-1024x255.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.04.42-300x75.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.04.42-768x191.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.04.42.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sk l qu qv qw qs qx ln qe\"><a href=\"https:\/\/www.ubicloud.com\/docs\/managed-postgresql\/paradedb-on-ubicloud?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5840\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.05.24-1024x255.png\" alt=\"\" width=\"1024\" height=\"255\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.05.24-1024x255.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.05.24-300x75.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.05.24-768x191.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.05.24.png 1366w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"5389\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Timescale<\/h3>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\">\n<div class=\"qr l\">\n<p class=\"bf b dv z qj qk ql qm qn qo qp du\"><a href=\"https:\/\/www.timescale.com\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5841\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.06.08-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.06.08-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.06.08-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.06.08-768x190.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.06.08.png 1364w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"37da\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">I have mentioned this company several times. They really do a great job with PostgreSQL and open source extensions and modern AI requirements. They have a managed cloud service that supports their extensions, which can really improve search. For example, they have an\u00a0<a class=\"af oa\" href=\"https:\/\/github.com\/timescale\/pgai\" target=\"_blank\" rel=\"noopener ugc nofollow\">extension\u00a0<\/a><code class=\"cx nt nu nv nk b\"><a class=\"af oa\" href=\"https:\/\/github.com\/timescale\/pgai\" target=\"_blank\" rel=\"noopener ugc nofollow\">pgai<\/a><\/code>\u00a0which automatically loads vectors from a selected api and stores them in the database. So you don\u2019t need to have that logic in your application. To be honest I don\u2019t like that kind of logic in a database layer, but it might work for someone else. See nice article about vectors from Timescale.<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sm l qu qv qw qs qx ln qe\"><a href=\"https:\/\/www.timescale.com\/blog\/rag-is-more-than-just-vector-search\/?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5842\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.05-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.05-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.05-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.05-768x191.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.05.png 1362w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"ab cb ob oc od oe\" role=\"separator\"><\/div>\n<div class=\"gn go gp gq gr\">\n<div class=\"ab cb\">\n<div class=\"ci bh fz ga gb gc\">\n<h3 id=\"7364\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Final note<\/h3>\n<p id=\"db4d\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">This article has taken me a little bit longer than I expected because I wanted to mention all the points, steps I had to take. It\u2019s not perfect and there\u2019s a lot of room for improvement. I will try to implement all relevant feedback what I will get, so do not hesitate to contact me in if you see a bug or have a suggestion for improvement.<\/p>\n<h3 id=\"5c27\" class=\"rn no gu bf ok ro rp dy oo rq rr ea os mr rs rt ru mv rv rw rx mz ry rz sa sb bk\">Links<\/h3>\n<p id=\"d60f\" class=\"pw-post-body-paragraph mg mh gu mi b mj pg ml mm mn ph mp mq mr pi mt mu mv pj mx my mz pk nb nc nd gn bk\" data-selectable-paragraph=\"\">There are a lot of people who are smarter than me and have more experience or time for the topics I mentioned in this article. All these people write great articles focusing on more details. I will try to add links to these articles in the future to keep this as a fresh source of information.<\/p>\n<div class=\"pz qa qb qc qd qe\">\n<div class=\"qf ab is\">\n<div class=\"qg ab co cb qh qi\"><\/div>\n<div class=\"qs l\">\n<div class=\"sn l qu qv qw qs qx ln qe\"><a href=\"https:\/\/www.thenile.dev\/blog\/pgvector_myth_debunking?source=post_page-----613069cb2f21--------------------------------\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-5843\" src=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.43-1024x254.png\" alt=\"\" width=\"1024\" height=\"254\" srcset=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.43-1024x254.png 1024w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.43-300x74.png 300w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.43-768x191.png 768w, https:\/\/www.codecon.sk\/wp-content\/uploads\/2024\/10\/Snimka-obrazovky-2024-10-23-o-21.07.43.png 1362w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"55b1\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Added 16.10.2024:<\/p>\n<p id=\"4581\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">Thank you for reading and have a nice day<\/p>\n<p id=\"2bf4\" class=\"pw-post-body-paragraph mg mh gu mi b mj mk ml mm mn mo mp mq mr ms mt mu mv mw mx my mz na nb nc nd gn bk\" data-selectable-paragraph=\"\">If you want to get in touch with me or stay informed about Elixir or web development news, you can find me on\u00a0<a class=\"af oa\" href=\"https:\/\/x.com\/quatermain32\" target=\"_blank\" rel=\"noopener ugc nofollow\">X\/Twitter<\/a>.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Well, I lied. Doing a good search with the right results for your data is indeed magic. But thanks to the people who work on PostgreSQL and extensions it\u2019s not really magic. It can be a lot easier than you think.<\/p>\n","protected":false},"author":5,"featured_media":9092,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[56],"tags":[],"class_list":["post-9090","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Search is not magic with PostgreSQL &#8212; CODECON<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Search is not magic with PostgreSQL &#8212; CODECON\" \/>\n<meta property=\"og:description\" content=\"Well, I lied. Doing a good search with the right results for your data is indeed magic. But thanks to the people who work on PostgreSQL and extensions it\u2019s not really magic. It can be a lot easier than you think.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/\" \/>\n<meta property=\"og:site_name\" content=\"CODECON\" \/>\n<meta property=\"article:published_time\" content=\"2025-10-25T13:58:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-11T15:03:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codecon.sk\/wp-content\/uploads\/2025\/10\/magazin-2-1024x512-1.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"512\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Eva Kalinov\u00e1\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Eva Kalinov\u00e1\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"41 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/\"},\"author\":{\"name\":\"Eva Kalinov\u00e1\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/#\\\/schema\\\/person\\\/815e94ce93756757859939501e018711\"},\"headline\":\"Search is not magic with PostgreSQL\",\"datePublished\":\"2025-10-25T13:58:18+00:00\",\"dateModified\":\"2026-03-11T15:03:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/\"},\"wordCount\":7164,\"image\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.codecon.sk\\\/wp-content\\\/uploads\\\/2025\\\/10\\\/magazin-2-1024x512-1.webp\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/\",\"url\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/\",\"name\":\"Search is not magic with PostgreSQL &#8212; CODECON\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.codecon.sk\\\/wp-content\\\/uploads\\\/2025\\\/10\\\/magazin-2-1024x512-1.webp\",\"datePublished\":\"2025-10-25T13:58:18+00:00\",\"dateModified\":\"2026-03-11T15:03:37+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/#\\\/schema\\\/person\\\/815e94ce93756757859939501e018711\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.codecon.sk\\\/wp-content\\\/uploads\\\/2025\\\/10\\\/magazin-2-1024x512-1.webp\",\"contentUrl\":\"https:\\\/\\\/www.codecon.sk\\\/wp-content\\\/uploads\\\/2025\\\/10\\\/magazin-2-1024x512-1.webp\",\"width\":1024,\"height\":512},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/search-is-not-magic-with-postgresql\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Domovsk\u00e1 str\u00e1nka\",\"item\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Search is not magic with PostgreSQL\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/\",\"name\":\"CODECON\",\"description\":\"Najv\u00e4\u010d\u0161ie komunitn\u00e9 stretnutie v\u00fdvoj\u00e1rov na Slovensku\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.codecon.sk\\\/en\\\/#\\\/schema\\\/person\\\/815e94ce93756757859939501e018711\",\"name\":\"Eva Kalinov\u00e1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Search is not magic with PostgreSQL &#8212; CODECON","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/","og_locale":"en_US","og_type":"article","og_title":"Search is not magic with PostgreSQL &#8212; CODECON","og_description":"Well, I lied. Doing a good search with the right results for your data is indeed magic. But thanks to the people who work on PostgreSQL and extensions it\u2019s not really magic. It can be a lot easier than you think.","og_url":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/","og_site_name":"CODECON","article_published_time":"2025-10-25T13:58:18+00:00","article_modified_time":"2026-03-11T15:03:37+00:00","og_image":[{"width":1024,"height":512,"url":"https:\/\/www.codecon.sk\/wp-content\/uploads\/2025\/10\/magazin-2-1024x512-1.webp","type":"image\/webp"}],"author":"Eva Kalinov\u00e1","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Eva Kalinov\u00e1","Est. reading time":"41 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#article","isPartOf":{"@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/"},"author":{"name":"Eva Kalinov\u00e1","@id":"https:\/\/www.codecon.sk\/en\/#\/schema\/person\/815e94ce93756757859939501e018711"},"headline":"Search is not magic with PostgreSQL","datePublished":"2025-10-25T13:58:18+00:00","dateModified":"2026-03-11T15:03:37+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/"},"wordCount":7164,"image":{"@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codecon.sk\/wp-content\/uploads\/2025\/10\/magazin-2-1024x512-1.webp","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/","url":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/","name":"Search is not magic with PostgreSQL &#8212; CODECON","isPartOf":{"@id":"https:\/\/www.codecon.sk\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#primaryimage"},"image":{"@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codecon.sk\/wp-content\/uploads\/2025\/10\/magazin-2-1024x512-1.webp","datePublished":"2025-10-25T13:58:18+00:00","dateModified":"2026-03-11T15:03:37+00:00","author":{"@id":"https:\/\/www.codecon.sk\/en\/#\/schema\/person\/815e94ce93756757859939501e018711"},"breadcrumb":{"@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#primaryimage","url":"https:\/\/www.codecon.sk\/wp-content\/uploads\/2025\/10\/magazin-2-1024x512-1.webp","contentUrl":"https:\/\/www.codecon.sk\/wp-content\/uploads\/2025\/10\/magazin-2-1024x512-1.webp","width":1024,"height":512},{"@type":"BreadcrumbList","@id":"https:\/\/www.codecon.sk\/en\/search-is-not-magic-with-postgresql\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Domovsk\u00e1 str\u00e1nka","item":"https:\/\/www.codecon.sk\/en\/"},{"@type":"ListItem","position":2,"name":"Search is not magic with PostgreSQL"}]},{"@type":"WebSite","@id":"https:\/\/www.codecon.sk\/en\/#website","url":"https:\/\/www.codecon.sk\/en\/","name":"CODECON","description":"Najv\u00e4\u010d\u0161ie komunitn\u00e9 stretnutie v\u00fdvoj\u00e1rov na Slovensku","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codecon.sk\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.codecon.sk\/en\/#\/schema\/person\/815e94ce93756757859939501e018711","name":"Eva Kalinov\u00e1"}]}},"_links":{"self":[{"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/posts\/9090","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/comments?post=9090"}],"version-history":[{"count":1,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/posts\/9090\/revisions"}],"predecessor-version":[{"id":9091,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/posts\/9090\/revisions\/9091"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/media\/9092"}],"wp:attachment":[{"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/media?parent=9090"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/categories?post=9090"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codecon.sk\/en\/wp-json\/wp\/v2\/tags?post=9090"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}