Your keywords still matter. They're just not enough anymore.
The same shift is hitting Amazon, Shopify, and Etsy all at once. Here's what changed on each platform, and what it means for whether your products get found.
Evidence tiers (read this first)
This piece draws from sources with different levels of strength, and we keep those distinctions clear on purpose:
- Amazon. Backed by a peer-reviewed paper. This is the strongest evidence.
- Etsy. Based on Etsy's own engineering blog, including production metrics from a live system.
- Shopify. Based on product documentation, changelog updates, and earnings.
- Answer engines. Based on platform statements and third-party analytics. This is the softest layer of evidence.
Where the evidence is stronger, we lean on it more heavily. Where it's softer, we say so.
Search now has a second step
For roughly the last 13 years, product discovery was mostly driven by keyword matching. A shopper typed words. The marketplace found listings containing those words, then ranked them based on performance.
The old playbook was straightforward:
- Research high-volume keywords.
- Put them in your title, bullets, and backend fields.
- Build sales velocity.
Tools like Helium 10 and Jungle Scout were built for exactly that model. For a long time, it worked.
That model hasn't disappeared. But it has been layered over.
Across every major platform, search now includes a second step. After matching the words in the query, the system tries to infer the intent behind them:
- Why is this person searching?
- What situation are they in?
- Who is this product for?
- What would actually solve their problem?
That changes the unit of competition.
Under keyword matching, you competed on coverage: did your listing contain the right words? Under intent matching, you compete on something much harder to fake. Does your listing communicate the buyer's real need, in language the system can connect to the query even when the wording doesn't overlap exactly?
The same structure now shows up across platforms, so it's useful to name it once:
- The candidate pool. Keywords and structured data still determine whether you're even eligible to appear. Ignore this layer and you're invisible. It has not gone away.
- The intent layer. Among the listings that qualify, the system increasingly favors the ones whose content best matches the inferred intent behind the query. This is the newer layer. It's where the old keyword playbook starts to break down.
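The two layers can be sketched as a filter followed by a re-rank. This is a toy illustration, not any platform's actual algorithm: the `Listing` class, the word-overlap scoring, and the intent terms ("stable", "quiet", "compact") are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    description: str


def tokens(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    cleaned = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in cleaned if w}


def candidate_pool(query: str, listings: list[Listing]) -> list[Listing]:
    """Layer 1 (toy): eligible only if the title contains every query word."""
    wanted = tokens(query)
    return [item for item in listings if wanted <= tokens(item.title)]


def intent_score(intent_terms: set[str], listing: Listing) -> float:
    """Layer 2 (toy): share of inferred-intent terms found in the description."""
    if not intent_terms:
        return 0.0
    return len(intent_terms & tokens(listing.description)) / len(intent_terms)


def rank(query: str, intent_terms: set[str], listings: list[Listing]) -> list[Listing]:
    """Filter by keywords first, then order the survivors by intent match."""
    pool = candidate_pool(query, listings)
    return sorted(pool, key=lambda item: intent_score(intent_terms, item), reverse=True)


# Hypothetical catalog and hypothetical inferred intent for "standing desk".
catalog = [
    Listing("Standing Desk 48 inch", "Steel frame. Dual motor. 48 x 24 top."),
    Listing("Standing Desk for Small Apartments",
            "Stable at full height, quiet motor for video calls, compact footprint."),
    Listing("Office Chair", "Stable, quiet, compact."),
]
ranked = rank("standing desk", {"stable", "quiet", "compact"}, catalog)
print([item.title for item in ranked])
```

The chair never enters the pool, however well its description reads, and the spec-only desk stays eligible but sorts below the listing that speaks to the inferred need.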
The rest of this breakdown tells that same two-layer story three times, once for Amazon, once for Shopify, once for Etsy, plus the answer-engine layer now sitting on top of all of them.
Amazon: this change is published, not rumored
Amazon's version of this shift is the best-documented case in e-commerce, because Amazon published it directly.
The system is called COSMO, the Common Sense Knowledge Generation and Serving System. It appears in a peer-reviewed SIGMOD-Companion 2024 paper hosted on Amazon Science. This is not a leaked patent or a guru theory. It is published research describing a production system.
COSMO is a commonsense knowledge graph built from real shopper behavior:
- Roughly 6.3 million nodes and 29 million edges.
- 18 product categories.
- Built from search-buy pairs (what people searched for, then bought) and co-buy pairs (what people bought together).
Its job is to learn the intent behind a query. The paper's own opening example makes the idea concrete:
A shopper searches "shoes for pregnant women." Those shoppers went on to buy slip-resistant shoes. So COSMO learns the link (pregnant → requires → slip-resistant), even though the query never included the phrase "slip-resistant."
That's the whole idea: connect the query to the need behind it.
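As a rough sketch of that idea, the snippet below turns search-buy pairs into (query, relation, attribute) links by simple counting. It is not the paper's pipeline: the log data, the `min_support` threshold, and the hard-coded "requires" relation are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical search-buy log: (query, attribute of the product that was bought).
search_buy_pairs = [
    ("shoes for pregnant women", "slip-resistant"),
    ("shoes for pregnant women", "slip-resistant"),
    ("shoes for pregnant women", "wide-fit"),
    ("running shoes", "lightweight"),
]


def learn_links(pairs, min_support=2):
    """Keep (query, 'requires', attribute) links seen at least min_support times."""
    counts = defaultdict(Counter)
    for query, attribute in pairs:
        counts[query][attribute] += 1
    return {
        (query, "requires", attribute)
        for query, attributes in counts.items()
        for attribute, seen in attributes.items()
        if seen >= min_support
    }


links = learn_links(search_buy_pairs)
print(links)
```

The learned link contains a phrase ("slip-resistant") that never appeared in the query, which is the point of the example in the text.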
Under the hood, COSMO organizes this information into 15 relation types, described in the paper's Table 2 and Figure 4. These relations cover:
- Who a product is for.
- What it does or is capable of.
- What it fundamentally is.
- Where, when, and on what part of the body it is used.
- What it pairs with.
If your listing only describes surface-level features, it helps the keyword layer but gives the intent layer very little to work with.
Two qualifications matter here, because precision matters:
- Keywords still affect ranking. The paper does not say COSMO replaces keyword matching. COSMO knowledge is combined with product text inside the relevance model. Keywords still help get you into the candidate pool. Intent helps determine relevance once you're there.
- COSMO is not a listing grader. The paper describes COSMO being used in three applications:
  - Search relevance scoring.
  - Session-based recommendation.
  - Search navigation.
  It does not describe a per-listing "readiness score." So if a tool sells you a COSMO score, that score is a vendor-created layer built on top of the paper, not something quoted from it. The 15 relation types are real and citable. A listing grade is a productized interpretation.
The customer-facing companion to all this is Rufus, Amazon's AI shopping assistant. In Amazon's Q4 2025 earnings, reported in February 2026, the company said:
- More than 300 million customers used Rufus during 2025.
- Rufus drove nearly $12 billion in incremental annualized sales.
- Rufus users were about 60% more likely to complete a purchase.
The COSMO paper predates Rufus and does not mention it, so any claim that "COSMO powers Rufus" should be treated as industry inference, not direct evidence.
What is not inference is this: Amazon has already shifted meaningful traffic and revenue toward intent-based discovery.
Shopify: the shift hits both on your storefront and beyond it
On Shopify, the same shift is happening in two places at once, and Shopify now documents both.
On your own storefront
Shopify has added AI-powered semantic search to native online-store search through its Search & Discovery app.
In Shopify's own documentation, the feature is described as going beyond keyword matching to understand buyer intent by using related words, concepts, and categories. It also reads product attributes to do that, including the product description and image data, such as colors and text inside images.
Shopify's own example is simple and useful:
A shopper searches "christmas party shoes." The store has no products using the words "christmas" or "party." But it does sell red pumps. The system connects christmas with red, and party shoes with pumps, so the red pumps still appear.
That's the same shift Amazon and Etsy are making: match the intent, not just the string.
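The difference between string matching and concept matching can be shown with a toy query expander. Shopify's documentation does not describe the implementation, so the hand-written `RELATED_CONCEPTS` map below is only a stand-in assumption for whatever the real system has learned.

```python
# Hypothetical concept map; a production system would learn these associations.
RELATED_CONCEPTS = {
    "christmas": {"red", "festive"},
    "party shoes": {"pumps", "heels"},
}


def expand(query: str) -> set[str]:
    """Return the query words plus any related concepts triggered by the query."""
    lowered = query.lower()
    terms = set(lowered.split())
    for phrase, related in RELATED_CONCEPTS.items():
        if phrase in lowered:
            terms |= related
    return terms


def search(query: str, products: list[str], semantic: bool = True) -> list[str]:
    """Match products whose title shares at least one term with the query."""
    terms = expand(query) if semantic else set(query.lower().split())
    return [p for p in products if terms & set(p.lower().split())]


products = ["Red Pumps", "Black Leather Boots"]
keyword_hits = search("christmas party shoes", products, semantic=False)
semantic_hits = search("christmas party shoes", products)
print(keyword_hits, semantic_hits)
```

With literal matching the store returns nothing. With the expanded query, the red pumps surface even though no product uses the words "christmas" or "party."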
A few specifics matter here:
- It applies to storefront search, not predictive search.
- It is available only on select plans (Shopify names Grow, Advanced, and Plus).
- It covers catalogs with fewer than 200,000 products.
So no, it isn't active on every Shopify store by default. But it is clearly the direction of Shopify's own search stack, and the mechanism is documented rather than guessed at.
And because the system reads your descriptions and image data, thin, vague, or spec-only product content gives it less to match against.
Off your storefront
The second shift is even bigger: buyers increasingly never reach your store through a traditional search bar at all. They find you through an answer engine.
In Shopify's Q4 2025 earnings, the company reported that orders coming from AI search sources (ChatGPT, Google AI Mode, AI Overviews, Gemini, and Perplexity) grew roughly 15x year over year. According to Shopify's merchant documentation:
- Shopify products are automatically discoverable in ChatGPT through the Shopify Catalog, with no opt-in required.
- Merchants can opt into Google AI Mode through Agentic Storefronts in the admin.
- That channel began rolling out by default for stores in late March 2026.
These engines parse your product data and try to match it to the buyer's stated need. A creative product title like "The Luna" matches nothing when the buyer asks for "an organic cotton sleep mask for sensitive skin." The title is seller language. The request is buyer language.
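A minimal way to see the gap is to check which parts of the buyer's request appear anywhere in the product data. Real answer engines do not work by substring matching; the `matched_constraints` helper and both product records below are hypothetical and only illustrate how little a title like "The Luna" gives a reader to work with.

```python
def matched_constraints(constraints: list[str], product: dict[str, str]) -> list[str]:
    """Return the requested constraints that appear in any product text field."""
    text = " ".join(product.values()).lower()
    return [c for c in constraints if c in text]


# The buyer's request, split by hand into the needs it states.
constraints = ["organic cotton", "sleep mask", "sensitive skin"]

seller_language = {"title": "The Luna"}
buyer_language = {
    "title": "Organic Cotton Sleep Mask",
    "description": "Soft, undyed fabric chosen for sensitive skin.",
}

print(matched_constraints(constraints, seller_language))
print(matched_constraints(constraints, buyer_language))
```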
And that's the key point: both doors lead to the same requirement.
Whether the shopper is searching on your storefront or discovering you through an external answer engine, both systems are reading your product data for intent. Both reward clear, descriptive, intent-rich product language.
This is the same intent layer Amazon and Etsy are building toward. It's just arriving on the one platform where you control the storefront but still can't opt out of the broader shift.
Etsy: popularity is not the same as relevance
Etsy is actually the second-best-documented example here, not the weakest.
In January 2026, Etsy's Search Relevance team published a detailed write-up on the company's engineering blog, Code as Craft. It described a deployed LLM-powered system called the Semantic Relevance Evaluation and Enhancement Framework. This is Etsy's own primary documentation. It is not seller-forum speculation or SEO folklore.
The most useful part is Etsy's explanation of why the system was built.
Etsy says its search historically relied heavily on engagement signals (clicks, add-to-carts, purchases) as proxies for relevance. But those signals are biased. Popular listings naturally attract more clicks, even when they aren't the best fit for a specific query.
Semantic relevance was introduced as a corrective. Etsy defines it as how well a listing aligns with the buyer's intent as expressed in the query. That matters. It means a major marketplace is publicly acknowledging that "what gets engagement" and "what the buyer actually meant" are not the same thing, and is engineering toward the second.
How it works, in Etsy's own framing:
- An LLM evaluates query-listing pairs as relevant, partially relevant, or irrelevant.
- The process is anchored to human-labeled "golden" data.
- That judgment layer is then distilled into a fast model that can run in production with less than 10 milliseconds of added latency.
Crucially, the model uses much more than titles and tags. It reads the title, images, text description, attributes, variations, and extracted entities. So if your listing language is thin, vague, or overly generic, the model has less signal to work with.
And Etsy says this system does more than just reshuffle rankings. In production, it does three separate things:
- Filters out listings predicted to be irrelevant before ranking happens.
- Feeds relevance scores into the ranking model.
- Boosts listings judged to be highly relevant.
That means a listing can match the keywords and still get removed from real contention if it reads as a weak intent match. The consequence is bigger than "show up lower." In some cases, it means not really showing up at all.
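The three integration points can be sketched in a few lines. This is an illustration under stated assumptions, not Etsy's code: the `Candidate` fields, the scoring formula, the `relevance_weight` and `boost` values, and the example listings are all invented for the sketch.

```python
from dataclasses import dataclass

RELEVANT, PARTIAL, IRRELEVANT = "relevant", "partial", "irrelevant"
RELEVANCE_VALUE = {RELEVANT: 1.0, PARTIAL: 0.5, IRRELEVANT: 0.0}


@dataclass
class Candidate:
    listing_id: str
    engagement: float  # hypothetical normalized click/purchase signal, 0 to 1
    relevance: str     # label predicted by a (hypothetical) relevance model


def rank_listings(candidates, relevance_weight=0.5, boost=0.2):
    # 1. Filter: drop listings predicted irrelevant before ranking happens.
    kept = [c for c in candidates if c.relevance != IRRELEVANT]

    def score(c: Candidate) -> float:
        # 2. Feature: relevance is blended into the ranking score.
        blended = ((1 - relevance_weight) * c.engagement
                   + relevance_weight * RELEVANCE_VALUE[c.relevance])
        # 3. Boost: highly relevant listings get an extra lift.
        return blended + (boost if c.relevance == RELEVANT else 0.0)

    return sorted(kept, key=score, reverse=True)


# Hypothetical candidates for the query "fall decor".
candidates = [
    Candidate("popular-sweater", engagement=0.9, relevance=IRRELEVANT),
    Candidate("generic-candle", engagement=0.7, relevance=PARTIAL),
    Candidate("new-seller-pumpkin-wreath", engagement=0.2, relevance=RELEVANT),
]
order = [c.listing_id for c in rank_listings(candidates)]
print(order)
```

In this toy run the most-engaged listing is filtered out entirely, and the low-engagement but fully relevant listing ends up first.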
This system is live, and Etsy reports measurable results:
The share of fully relevant listings increased from 58% to 62% between August and October 2025. A search for "fall decor" now surfaces seasonal decor and deprioritizes loosely related items like clothing that previously slipped through.
Etsy also notes that this shift can help small or newer sellers surface more often, because relevance now has more weight than raw engagement history alone.
One detail neatly ties this whole breakdown together. Etsy says its work on fine-grained relevance was inspired by Amazon's ESCI framework (Exact, Substitute, Complement, Irrelevant). That is the same relevance scheme Amazon used to validate COSMO in its paper. So two of the largest marketplaces in e-commerce are now building on related relevance machinery. That is one of the clearest signals available that this is a direction of travel, not a passing vendor fad.
The practical takeaway is the same as on the other platforms: write listing copy that clearly explains what the item is, who it's for, and how it's used, in natural language. An LLM is now reading for intent match. It is not simply counting tags.
The layer above all of this: answer engines choose the shortlist
On top of these marketplaces, there is now another discovery layer: answer engines.
More shoppers are beginning their journey in places like ChatGPT, Google AI Overviews, Google AI Mode, Perplexity, and Gemini instead of a conventional search bar.
These systems do not return a page with 50 links for the shopper to sift through. They read product data across the web and return a curated shortlist, often just a handful of named products.
That compression changes the game:
- Ranking eighth on a results page used to still generate clicks.
- Being excluded from a five-product AI shortlist generates nothing.
And how do these systems build the shortlist? By reading which products best match the intent and constraints in the request. That's the intent layer again, except now it's influencing not just order, but the entire consideration set.
Adobe's retail analytics, cited in our flagship report, found that AI-driven traffic converts materially better than traditional channels. That helps explain why every platform above is now racing to expose cleaner, more intent-rich product data to these engines.
The surfaces may be fragmenting. But the requirement underneath them is converging.
The pattern underneath all three platforms
Strip away the platform names and the same picture remains. Product discovery now has two layers almost everywhere.
1. The candidate-pool layer rewards coverage and structure: keywords, completed fields, schema, clean feeds. Keyword tools like Helium 10 and Jungle Scout are still useful here. This layer still matters. If you miss it, nothing else saves you.
2. The intent layer rewards language match: who the product is for, what situation it fits, what problem it solves, what it gets compared against, and what it pairs with.
This layer does not respond to keyword volume. It responds to whether your listing reflects how buyers actually think about the category. And increasingly, it acts as a gate, not just a tiebreaker:
- Etsy can filter out listings it predicts are irrelevant before ranking even begins.
- Answer engines can leave weak matches off the shortlist entirely.
So a keyword match no longer guarantees that you're still in contention.
That's where the old keyword playbook runs into a ceiling. It was built for the first layer. It has very little to say about the second.
Knowing that 40,000 people a month search for "standing desk" tells you to include the phrase "standing desk." It does not tell you what those shoppers are quietly worried about:
- Wobble at full height.
- Noise during video calls.
- Whether it will fit in a small apartment.
That second category of information belongs to the intent layer. And it does not live in a keyword tool. It lives in what buyers say.
What it actually takes to compete on intent
If you want to compete on the intent layer, you need access to the buyer's own language for the category:
- The criteria they use to judge options.
- The objections they raise.
- The use cases they mention.
- The outcomes they want.
- The products they compare yours against.
- The words they naturally use.
That is not the kind of data keyword tools generate. It is the kind of data buyers generate, scattered across Reddit threads, YouTube comments, marketplace reviews, Q&A pages, and category-specific forums.
There are two ways to get it.
1. Read buyer language manually. This works. It's also slow. Our companion guide, How to Research Buyer Voice for Your Product Category (The Manual Way), walks through the process in full: how to define a buyer-intent query, where to look, how to structure what you find into a Voice Map, and how to turn that into listing copy. Budget eight hours per category. The output is real. Most competitors will never do this work.
2. Automate the reading. That's what DecodeIQ does. A Category Scan reads buyer conversations across more than 20 networks. It extracts the same entity types the manual process teaches you to find, validates them across sources, and produces a Voice Map you can actually write from. The manual guide explains the method. DecodeIQ runs the same method in under 15 minutes.
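Whichever route you take, the output has the same rough shape: buyer phrases grouped by entity type and kept only when more than one source confirms them. The sketch below is a hypothetical structure, not DecodeIQ's implementation or the guide's exact template; the entity-type names, the `min_sources` threshold, and the sample mentions are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Entity types loosely following the list of buyer-language signals above.
ENTITY_TYPES = {"criterion", "objection", "use_case", "outcome", "comparison", "vocabulary"}


@dataclass(frozen=True)
class Mention:
    entity_type: str
    phrase: str
    source: str  # e.g. "reddit", "reviews", "youtube"


def build_voice_map(mentions, min_sources=2):
    """Group phrases by entity type, keeping those seen in enough distinct sources."""
    sources = defaultdict(set)
    for m in mentions:
        if m.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {m.entity_type}")
        sources[(m.entity_type, m.phrase.lower())].add(m.source)

    voice_map = defaultdict(list)
    for (entity_type, phrase), seen_in in sorted(sources.items()):
        if len(seen_in) >= min_sources:
            voice_map[entity_type].append(phrase)
    return dict(voice_map)


# Hypothetical mentions for the "standing desk" category.
mentions = [
    Mention("objection", "Wobbles at full height", "reddit"),
    Mention("objection", "wobbles at full height", "reviews"),
    Mention("objection", "too loud on video calls", "youtube"),
    Mention("use_case", "small apartment", "reddit"),
    Mention("use_case", "small apartment", "forum"),
]
voice_map = build_voice_map(mentions)
print(voice_map)
```

The single-source phrase is dropped here, which is one simple reading of "validated across sources."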
One distinction is worth keeping clear, because it's easy to blur. You can ask an LLM to "imagine" the intent behind a query, and it will usually produce something plausible. But plausible is not the same as observed.
When a model guesses buyer intent, it is still starting from the model's assumptions. That is seller language wearing an intent costume. Reading what buyers actually said this week is different in kind, not just degree.
Whether you do that reading by hand or with a tool, the input needs to come from observed buyer language, not invented intent. That distinction is the whole point.
Where to start
- Want the paradigm-level argument with the full data? Read the flagship report: Invisible to AI: Why Your Product Listings Are Disappearing from the Search That Converts 42% Better.
- Want to build buyer intelligence yourself? Start with How to Research Buyer Voice for Your Product Category (The Manual Way).
- Want the framework behind it? See The Buyer Voice Gap and the buyer intelligence dossier.
- Want to skip the manual reading? Run a Category Scan.
The platforms are converging on the same requirement faster than most sellers are adapting. The keyword layer got you this far. The intent layer is where visibility will be won or lost over the next few years, and the input that feeds it is buyer language.
Start from the buyer. Not the spreadsheet.
Frequently asked questions
What is the difference between keyword matching and intent matching in e-commerce search?
Keyword matching finds listings that contain the words a shopper typed, then ranks them by performance. Intent matching adds a second step: after matching the words, the system infers the need behind them and favors listings whose content fits that need. Keywords still decide eligibility, but intent increasingly decides relevance.
What is Amazon COSMO?
COSMO is the Common Sense Knowledge Generation and Serving System that Amazon described in a peer-reviewed SIGMOD-Companion 2024 paper. It is a commonsense knowledge graph of roughly 6.3 million nodes and 29 million edges across 18 product categories, built from real search-buy and co-buy behavior. Its purpose is to learn the intent behind a query, not just match its words.
Does COSMO replace keywords on Amazon?
No. The paper describes COSMO knowledge being combined with your product text inside the relevance model, not replacing keyword matching. Keywords still get a listing into the candidate pool, and intent signals help decide relevance once it qualifies.
Is there a COSMO score for my product listing?
No. The COSMO paper describes the system deployed in three applications (search relevance scoring, session-based recommendation, and search navigation) and never describes a per-listing readiness score. Any COSMO score sold by a tool is a vendor-built layer on top of the paper, not a metric quoted from it.
How does Etsy decide which listings are relevant?
Etsy runs an LLM-powered system that grades query-listing pairs as relevant, partially relevant, or irrelevant, anchored to human-labeled data. In production it filters out listings predicted to be irrelevant before ranking, feeds relevance scores into ranking, and boosts strong matches. A keyword-matched listing can be removed from contention if it reads as a weak intent match.
Why are keyword tools like Helium 10 and Jungle Scout not enough anymore?
Those tools are strong at the candidate-pool layer: the keyword volume, fields, and structure that decide eligibility. They do not capture the intent layer, which is how buyers actually describe their needs, objections, and use cases. Keyword volume tells you which phrase to use, not what buyers are worried about behind it.
How do I optimize a product listing for AI search and answer engines?
Write copy that states clearly what the item is, who it is for, and how it is used, in natural language. Answer engines like ChatGPT and Google AI Mode read product data and return a shortlist based on intent match, so thin or spec-only text gives them little to work with. The input that feeds the intent layer is observed buyer language from real buyer conversations.
What is the buyer voice gap?
The buyer voice gap is the distance between how sellers describe products and how buyers actually talk about them when deciding. Sellers write in feature and spec language, while buyers reason in needs, objections, and use cases. Intent-based search rewards listings that close that gap by reflecting real buyer language.
Sourcing notes
- Amazon COSMO: "COSMO: A Large-Scale E-commerce Common Sense Knowledge Generation and Serving System at Amazon," SIGMOD-Companion 2024 (Amazon Science). Node and edge counts, category coverage, the 15 relation types, the Figure 1 pregnant-women example, and the three deployed applications are drawn directly from the paper.
- Amazon Rufus: Amazon Q4 2025 earnings, reported February 2026 (300M+ users, ~$12B incremental annualized sales, ~60% higher purchase completion). Rufus is not mentioned in the COSMO paper; any direct link between them remains industry inference.
- Shopify: Shopify Help Center documentation on semantic understanding in the Search & Discovery app (including the "christmas party shoes" example, the use of description and image data, and the plan, catalog-size, and storefront-only requirements), plus the Shopify changelog entry "Semantic Search is now available on more plans" (June 24, 2024). Off-storefront figures come from Shopify Q4 2025 earnings (~15x AI-search order growth) and Shopify merchant documentation covering ChatGPT Catalog discovery and Google AI Mode via Agentic Storefronts.
- Etsy: "How Etsy Uses LLMs to Improve Search Relevance," Etsy Code as Craft engineering blog, January 16, 2026 (Yuqing Zhang, Congzhe Su, Susan Liu). The Semantic Relevance Evaluation and Enhancement Framework, the engagement-bias rationale, the relevance categories, the listing features used, the filtering / feature-enrichment / boosting integration points, the 58% to 62% deployment metric (August to October 2025), the "fall decor" example, and the ESCI reference all come from Etsy directly.
- Answer-engine layer and conversion data: Adobe retail analytics as cited in the DecodeIQ flagship report, plus the platform statements above.
Jack Metalle is the Founding Technical Architect of DecodeIQ, a buyer intelligence platform that helps e-commerce sellers understand how their customers actually think, compare, and decide. His M.Sc. thesis (2004) predicted the shift from keyword-based to semantic retrieval systems. He has spent two decades building systems that extract structured meaning from unstructured data.