Role

Product Designer (shared ownership, 2 designers)

Timeline

Sep 2024 (24 hours)

Team

2 Designers
3 Engineers
1 Cybersecurity Specialist
3 Data Analysts

Skills

UX Research
Competitive Analysis
Ideation
Interaction Design
Prototyping

overview

I had 24 hours to help make Amazon's reviews trustworthy again, and our team won first place.

The "verified purchase" badge was no longer enough. Anyone could buy a product and leave a review. But could you trust it? That was the challenge.

As one of two designers on a nine-person team, I drove research synthesis and interaction design for TrustVine, a system that uses Amazon's existing Vine community to verify reviews at scale without compromising user privacy.

I led the concept from synthesis to interactive prototype and presented the final solution to Amazon's judging team.

Our solution placed first amongst three teams, recognized for creating a genuine trust signal that felt native to Amazon's ecosystem.

the problem

Although AI systems can improve the detection of deceptive reviews through semantic analysis, they are insufficient on their own.

Automated systems now catch bots and paid shills reliably. But LLMs struggle with low-effort comments like "it's fine," with sarcasm, or with an image that isn't even of the right product. Humans, on the other hand, can spot these almost immediately.

LLM-generated reviews can closely resemble human writing, making them difficult for AI systems to detect. Moreover, research on Amazon reviews shows that text-based approaches alone are ineffective, as deceptive behaviour is better identified through network patterns. Together, these limitations reduce the reliability of fully automated detection systems.

Take a typical Amazon product page. Both reviews shown are from verified purchases, but their quality vastly differs. One offers a low-effort, unhelpful comment. The other provides verifiable detail with an image and specific experience.

research & discovery

I pushed for six hours of research before we opened a design tool.

I led a competitive analysis of Yelp's Elite Squad and Google's Local Guides. Every platform used human moderators, but none had a closed loop where human feedback improved the AI over time.

I also saw the Vine Voice program as an underused asset. Their reviews were confined to assigned products, which limited their reach and meant their careful judgment added little value to the wider ecosystem. Each review existed in isolation instead of feeding a smarter system.

So the question then became: "How might we leverage Vine Voices to create a verification layer that makes reviews more trustworthy for shoppers, sellers, and Amazon's own AI models?"

define

Rather than adding to a sea of reviews, what if human judgment could make the entire ecosystem smarter?

I created Amira, a Vine Voice member who has a good eye for spotting inauthentic writing. Mapping her journey surfaced a core insight: Vine members already evaluate products with genuine care, but that judgment only added more noise.

ideation & process

The team aligned early on working within Amazon's existing infrastructure rather than building something new.

Two constraints shaped my design direction: our cybersecurity specialist ruled out surfacing reviewer identities, and our data analysts pushed for judgments that could feed back into model training.

My co-designer and I built the submission flow together, then I took ownership of the dashboard, information architecture, and shopper-facing trust signal.

final design

Verification mechanism

I chose a binary flag (helpful or report) to prevent decision fatigue on a repeated task. After flagging, follow-up questions appear. When a member selects a specific reason, that human judgment becomes training data for Amazon's model. Multiple flags from different Vine members then trigger a final determination.
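
The aggregation step above can be sketched roughly as follows; the threshold, reason labels, and class names are illustrative assumptions, not part of the actual design:

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical sketch of TrustVine's flag aggregation.
# FLAG_THRESHOLD is an assumed value for illustration.
FLAG_THRESHOLD = 3  # distinct Vine members needed before a determination


@dataclass
class ReviewVerification:
    review_id: str
    # member_id -> reason ("helpful" or a specific report reason)
    flags: dict = field(default_factory=dict)

    def add_flag(self, member_id, reason):
        """Record one member's binary judgment plus their follow-up reason."""
        self.flags[member_id] = reason
        if len(self.flags) >= FLAG_THRESHOLD:
            return self.determine()
        return None  # not enough independent judgments yet

    def determine(self):
        """The majority reason becomes the label fed back as training data."""
        reason, _count = Counter(self.flags.values()).most_common(1)[0]
        return reason


rv = ReviewVerification("R123")
rv.add_flag("vine_a", "helpful")
rv.add_flag("vine_b", "misleading_image")
result = rv.add_flag("vine_c", "misleading_image")
# result == "misleading_image" once three members have weighed in
```

The binary choice keeps each individual judgment cheap, while the threshold ensures no single member's flag decides an outcome alone.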

Incentive model

I introduced three reward mechanisms for Vine members. First, a leaderboard showing points earned per verification. Second, weekly goals to encourage consistent participation. Lastly, accuracy tracking to reward members whose judgments consistently aligned with the community consensus.
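
The accuracy-tracking idea above can be expressed as a small sketch; the point values, bonus threshold, and function names are assumptions for illustration only:

```python
# Hypothetical sketch of the incentive model's accuracy tracking.
def accuracy_rate(member_judgments, consensus):
    """Fraction of a member's flags that matched the final community determination."""
    if not member_judgments:
        return 0.0
    matches = sum(m == c for m, c in zip(member_judgments, consensus))
    return matches / len(member_judgments)


def weekly_points(verifications, accuracy, base_points=10, bonus_threshold=0.9):
    """Points per verification, with a bonus for high consensus alignment."""
    points = verifications * base_points
    if accuracy >= bonus_threshold:
        points += 50  # illustrative bonus for consistently accurate members
    return points
```

Rewarding alignment with consensus, rather than raw volume, is what discourages members from flagging indiscriminately to climb the leaderboard.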

Privacy constraints

I designed around anonymization by default. Identities and purchase history remain hidden. This simplified the UI and kept judgments focused on content quality, not on who wrote the review.

reflections

Good alignment at the start is what made fast execution possible at the end.

The six hours we spent on research and cross-discipline discussion before touching any design tool felt like a risk at the time. In hindsight, it's why our solution felt coherent rather than bolted together. Everyone on the team understood the problem the same way, which meant that when we had to make fast calls in the final hours, we were cutting scope, not re-explaining the concept.

what's next

I would invest in the feedback loop next.

The submission flow was the only surface fully built during the hackathon. Given more time, I would design feedback loop visibility. Showing Vine members their accuracy rate and how their flags shaped the AI would turn a transactional task into one worth coming back to.
