What AI gets right (and wrong) about discovery
Looking for inspiration on top AI answer engines
I skipped a week of writing, and a week turned into a month. In that time, I missed a lot of AI e-commerce updates. OpenAI announced plans to integrate checkout into ChatGPT via Shopify. OpenAI introduced a ChatGPT agent. And Daydream launched their beta for fashion/apparel LLM-enabled search. Clearly, these teams are not taking Summer Fridays!
Many people warned me that traveling with a toddler is exhausting. They are 100% correct. The flights to and from Italy tested us (and probably all the poor passengers in our vicinity - sorry!). But it was well worth it. My son loved being in the mountains, trying Italian food, and exploring the playgrounds that are much nicer and much more numerous than I’m used to in NYC. I was also in my happy place with no cell service, not thinking at all about anything digital for a full week. Upon returning, I’ve been playing catch up. But I’m back!
Back in February, I asked AI tools to complete some shopping tasks on my behalf. What followed was a deep dive into the early capabilities of agentic commerce. I would love to revisit these capabilities in the next few weeks. Today, I wanted to take a similar approach and catalog my thoughts on AI answer engines for product search. I have written before about how the multimodal (especially visual) search capabilities inherent in AI answer engines will unlock richer product discovery for shopping. Today, I’ll put this to the test with some queries that are meant to challenge these capabilities.
I’m particularly interested in the fashion category. Fashion choices are highly subjective and nuanced, and traditional search has struggled to provide the curated, personalized results that consumers seek. Preferences around fabric quality, silhouette, or a specific style are hard to distill into words you can type into a search bar. Daydream raised $50M to solve this problem, and Google has invested billions in Shopping capabilities. Unlocking a search experience that feels “magical” could turn traditional search on its head and revive the struggling luxury fashion e-commerce category.
What are AI answer engines?
Now that all e-comm platforms from Pinterest to ThredUp are injecting AI into search, which platforms “count” as AI answer engines? AI answer engines have the following features:
Provide direct, targeted answers rather than a list of links (the old-school Google experience)
Are conversational, allowing the user to refine the response
Demonstrate multi-source reasoning - collecting inputs from multiple sources to provide a response
(Emerging) Answer engines like ChatGPT “remember” things about the user from previous engagements and can use these inputs to further personalize results. Not all answer engines have this ability today, but I believe this will soon become part of the standard as the tech advances.
By this definition, Google is a hybrid. Traditional googling still yields link-based answers, but more and more frequently, users see AI Overviews and other AI-driven content, which resembles what answer engines produce. Google also offers Gemini, a traditional answer engine.
According to Orbit Media, as of May 2025, ChatGPT and Google Gemini are the most widely used answer engines.
Interestingly, the starting point for all of these tools looks very similar. And nothing points users to a shopping use case. I’m curious how these interfaces will evolve as more users turn to these tools to shop.
In contrast, Daydream is fully shopping focused and provides prompting around how to search.
Shopping prompt challenge
To test out the top 3 answer engines (ChatGPT, Gemini, and Copilot) plus the new Daydream tool (which is specifically focused on shopping), I wanted to come up with shopping challenges that I think could stump the tools. Here are some actual things I’m searching for right now that I haven’t had much luck with (yet).
Prompt 1: A wedding challenge
I have a wedding in Telluride in early August. The attire is semi-formal with western-chic attire preferred. Please suggest some wedding guest dresses that are my style. Midi length preferred.
Why I think this could be challenging: Parsing wedding guest dress codes is notoriously difficult for humans. I’m curious how these tools will fare in curating appropriate dresses that also have a “western-chic” element.
Results:
❌ ChatGPT
Several of the recommendations were way too casual for a wedding. The first recommended option was white!
🟡 Copilot
The dresses seem reasonably appropriate, though not my style. I don’t like that the results don’t include images. Also, upon clicking the link, I am directed to a product listing page (PLP) instead of a product detail page (PDP) - so I need to search again on the retailer site to find the recommended style. Lulu’s was recommended multiple times - perhaps they are doing something right with structured data and/or piping product data directly to Microsoft?
🟡 Daydream
They did a good job showing midi dresses that are appropriate for a wedding (nothing white!) but the black options aren’t really appropriate for August. Their tool allows me to follow up and ask about other colors, which helps. I expected more unique options from the “western-chic” ask, since Daydream is all about apparel e-commerce.
❌ Gemini
Gemini described what an ideal dress would look like (fabrics, colors) but didn’t suggest any actual products. Gemini did this for all of the searches, which was surprising since it is Google’s product - I had hoped it would be the best shopping experience because Google Shopping is already very advanced.
Prompt 2: Fit criteria
I am looking for a cocktail dress that can be worn with a normal bra. Can you find some options in black?
Why I think this could be challenging: In traditional e-commerce, you have to use filters to search for items with specific product attributes. Many shoppers have strong preferences for clothes that can be worn with certain undergarments, have a specific shape (“gathered at the waist”) or similar, but traditional search doesn’t do a good job isolating these items.
🟡 Copilot
The options they suggested are mostly consistent with wearing a normal bra, but are all over the place in terms of color and formality, so they miss several of my search criteria. They cite their sources, and I can see that they referenced lists and landing pages focused on the “bra friendly” aspect and didn’t necessarily triangulate beyond this. Lulu’s is again showing up frequently - the “Bra Friendly Dresses” landing page was clearly created to be SEO-friendly and it seems to be working well here, too.
❌ Daydream
The third option is strapless! But these are cocktail appropriate.
❌ ChatGPT
The second one is not normal bra friendly! ChatGPT shared lots of context on bra-friendly shapes, but then selected products that didn’t fit its own criteria. I wonder why.
Prompt 3: Photo inputs - with a twist
A top like this, but in red, blue or a different color (not white)
Why I think this could be challenging: This top has been following me around the internet since I clicked an Instagram ad a few days ago. I love the shape but I don’t like that it only comes in white. I’m curious how well the tools will do with multi-modal input.
🟡 ChatGPT
The suggestions were decent. I like how it talked me through the “thought” process. It is clearly interpreting my prompt accurately. The actual product matches still need work, though.
🟡 Copilot
Several of the options were in line with the ask. The second option was Sold Out.
❌ Daydream
If there were a top that was the polar opposite of the top in the prompt image, it would be one of these.
❌ Gemini
…Did not realize this was a shopping query and got confused.
My observations from testing out LLM search tools
Overall, these tools have a ways to go to crack the fashion code. It is cool that they can handle asks like “midi dress” or color preferences without the need for manual filtering.
Some things I noticed consistently:
Playing telephone: The tools seemed to interpret my prompts well, but lost the thread when they went to seek out options from retailers. Several of the tools walked me through their sound reasoning for their recommendations, then suggested products that violated their own criteria. I suspect that overly broad or missing product data could be a factor. In a similar vein, these tools surfaced Sold Out items. Perhaps this is part of the rationale to push brands to directly share product data.
Inability to handle multidimensional asks: Most tools could identify midi dresses, or tops that are bra-friendly. But finding options that met multiple criteria at once was challenging. The tools seem to be relying on curated lists (landing pages, “best of” lists) and then “trusting” that the things on that list are relevant.
Results get better with more inputs and refinement: For the most part, re-prompting did improve results, e.g. “I like the first option, show me more like that.”
Tools need more inputs to understand my style: There was no magical moment where I thought “wow, that dress is so me!” The dresses checked some objective boxes, but the tools need more inputs on my personal style. Perhaps in the future they will connect to social feeds and/or pull from my email to see what I’ve bought in the past.
Lack of serendipity: In the store when you’re shopping with a skilled sales associate, they might suggest something beyond your specific ask if they feel it would be flattering or work well for your needs. I hope that in the future, these tools will be able to integrate that kind of experience, perhaps by relying on their memory and detecting patterns over time.
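For the more technical readers, here is a minimal Python sketch of the kind of check I wish these tools ran before showing me a product. It illustrates the first two observations above: every candidate is tested against all of my criteria at once (not just the one the source list was built around), and sold-out items are dropped. The Product fields, the sample dresses, and the function names are all made up for illustration. I don’t know how any of these tools actually validate their results, so treat this as a thought experiment rather than a description of their pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Hypothetical structured attributes a retailer feed might expose.
    name: str
    category: str
    color: str
    length: str
    bra_friendly: bool
    in_stock: bool


def failed_criteria(product, criteria):
    """List every criterion the product violates, plus stock status."""
    failures = [
        field for field, wanted in criteria.items()
        if getattr(product, field) != wanted
    ]
    if not product.in_stock:
        failures.append("in_stock")
    return failures


def meets_all(product, criteria):
    """True only if the product is in stock and matches every criterion."""
    return not failed_criteria(product, criteria)


# Made-up candidates, standing in for items pulled from a "bra friendly" list.
candidates = [
    Product("Dress A", "cocktail dress", "black", "midi", True, True),
    Product("Dress B", "cocktail dress", "red", "midi", True, True),
    Product("Dress C", "cocktail dress", "black", "mini", False, True),
    Product("Dress D", "cocktail dress", "black", "midi", True, False),
]

# My Prompt 2 ask, expressed as attribute/value pairs (an assumption about
# how a tool might translate the prompt).
criteria = {"category": "cocktail dress", "color": "black", "bra_friendly": True}

matches = [p.name for p in candidates if meets_all(p, criteria)]
print(matches)

# Show why each rejected candidate was dropped.
for p in candidates:
    if not meets_all(p, criteria):
        print(p.name, "fails on", failed_criteria(p, criteria))
```

Of course, a check like this only works if the product data carries attributes like “bra friendly” and stock status in the first place, which is exactly the gap I suspect these tools are running into.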
This exercise will be fun to revisit in a few months, as these tools will likely improve significantly in a relatively short time.
Have you all tried shopping with AI answer engines? What has been your experience so far?
ICYMI
Last time, I wrote about my predicted secondary effects of AI in e-commerce. Of these predictions, I got the most feedback around my take that human content - models in ads, high-touch customer service, in-store support - will become a key differentiator for luxury brands.
Disclaimer: The views and opinions expressed in this blog are solely my own and do not reflect the official policy or position of my employer or any organization with which I am affiliated. All content is written in a personal capacity.
Stay Curious,
Melina
Great post! I had a similar experience with Daydream... it got better with more input from me but I think it has a long way to go.
Absolutely love this post! I honestly have not tried to find specific products or items through the various LLMs, but I’m definitely going to give it a shot and I will let you know what I find.