DUET: Joint Exploration of User–Item Profiles in Recommendation

Research Paper

What is DUET?

A closed-loop framework for jointly generating aligned user and item profiles for recommendation — without hand-crafted templates.

Trained end-to-end with reinforcement learning using downstream recommendation performance as the reward signal.

3 Real-world Datasets Qwen3-8B & LLaMA3-8B State-of-the-Art Results

Traditional recommendation systems represent users and items as dense vectors and learn to align them in a shared latent space for relevance estimation. Recent LLM-based recommenders instead leverage natural-language representations that are easier to interpret and integrate with downstream reasoning modules.

This paper studies how to construct effective textual profiles for users and items, and how to align them for recommendation. A central difficulty is that the best profile format is not known a priori: manually designed templates can be brittle and misaligned with task objectives. Moreover, generating user and item profiles independently may produce descriptions that are individually plausible yet semantically inconsistent for a specific user–item pair.

We propose DUET, an interaction-aware profile generator that jointly produces user and item profiles conditioned on both user history and item evidence. DUET follows a three-stage procedure: it first turns raw histories and metadata into compact cues, then expands these cues into paired profile prompts to generate profiles, and finally optimizes the generation policy with reinforcement learning using downstream recommendation performance as feedback.

Key result: Experiments on three real-world datasets (Yelp, Amazon Music, Amazon Books) show that DUET consistently outperforms strong baselines, demonstrating the benefits of template-free profile exploration and joint user–item textual alignment.

Core Insight

Why joint profiling matters

Independently generated profiles may amplify incompatible facets of the same user–item pair, obscuring the true relevance signal.

Independent Generation Semantic Mismatch

User Profile

Fan of HEAVY METAL and PUNK. Prefers dark themes.

Item Profile

A highly rated POP-ROCK album with polished production.

The two profiles focus on incompatible aspects, hiding the shared funk/soul connection and producing a misleading relevance signal.

DUET — Joint Alignment Semantic Alignment ✓

User Profile

Enthusiast of FUNK and SOUL. Values technical mastery.

Item Profile

A peak-era FUNK-ROCK album with genre-blurring creativity.

DUET reconciles both sides into a compatible interpretation, surfacing the shared funk affinity and enabling accurate relevance estimation.

Figure 1. DUET aligns raw user and item data by transforming them into textual profiles within a shared semantic space.

Contributions

Three core contributions

01

Text-Based User–Item Alignment

Represent both users and items as natural-language profiles and align them in a shared semantic space, extending the classic vector-based alignment principle to interpretable textual representations fully compatible with LLMs.

02

Template-Free Exploration Framework

Start from cue-based initialization, expand cues into candidate profile prompts, and jointly optimize user and item profiles with downstream RL feedback — no rigid templates or hand-crafted attributes required.

03

State-of-the-Art Performance

Extensive experiments across three real-world datasets with two backbone LLMs show DUET consistently outperforms all baselines, validating both joint profiling and feedback-driven profile optimization.

Framework

The DUET Framework

A closed-loop framework that transforms raw user–item interaction histories into performance-aligned textual profiles through three learned stages — all realized in a single seq-to-seq forward pass at inference time.

1

Cue-Based Initialization

Raw user histories and item metadata are distilled into minimal cues — concise hypotheses highlighting one potential preference or characteristic. These act as lightweight seeds for profile exploration, deliberately underspecified to allow subsequent discovery.

"enjoys retro puzzle games"
"prefers concise product reviews"
"lightweight trail-running shoes"

2

Joint Exploration via Profile Prompt Discovery

Rather than directly summarizing, the model generates an intermediate constructed_prompt — a natural-language instruction defining format, abstraction level, and attribute selection. Conditioned on this prompt, user and item profiles are generated jointly.

Cue → Profile Prompt S → Profile
O = [Cue → S → Profile]
Single-pass seq-to-seq generation

3

On-Policy RL Optimization (GRPO)

Profiles are consumed by a frozen downstream recommender. The continuous fractional reward R_perf measures prediction accuracy and drives GRPO optimization, reinforcing profile constructions that yield better recommendations.

R(u,i) = 1 − |y_ui − ŷ_ui| / M
Optimized with GRPO (DeepSeek-AI, 2025)
Downstream model f is frozen throughout

Figure 2. Overview of the DUET framework. Three stages — Cue-Based Initialization, Joint Exploration via Adaptive Profile Prompt Discovery, and On-Policy Optimization — are unified into a single generation pass. The downstream task environment provides a continuous reward signal for optimizing profile quality.

Single-Pass Efficiency

All three stages — cue extraction, profile prompt construction, and profile generation — are realized in a single sequence-to-sequence forward pass at inference time, introducing no additional latency compared to standard profile generation methods.

Formulation as On-Policy RL

Profile generation is treated as an on-policy RL problem where the state is s = {H_u, H_i}, the action is the joint generation sequence, and quality is evaluated solely by functional utility in a fixed recommendation environment — no textual ground-truth profiles required.

Experiments

Experimental results

Evaluated on Yelp, Amazon Music, and Amazon Books using Qwen3-8B and LLaMA3-8B as both the profile generator and the downstream recommender.

Yelp

61.2%

Accuracy · Qwen3-8B

+5.05 pp vs. best baseline

Amazon Music

68.0%

Accuracy · Qwen3-8B

+5.59 pp vs. best baseline

Amazon Books

64.4%

Accuracy · Qwen3-8B

+5.03 pp vs. best baseline

Method	Yelp				Amazon Music				Amazon Books
Method	MAE↓	RMSE↓	Acc↑	F1↑	MAE↓	RMSE↓	Acc↑	F1↑	MAE↓	RMSE↓	Acc↑	F1↑
Qwen3-8B
10H (History Only)	1.1235	1.9478	23.17	27.54	0.9102	1.4021	39.26	46.58	0.9314	1.4527	37.63	45.19
KAR (Xi et al., 2024)	0.7396	1.2184	55.34	48.67	0.7483	1.1380	58.65	60.29	0.7098	1.0923	56.17	58.78
RLMRec (Ren et al., 2024)	0.8197	1.3312	47.15	42.46	0.7438	1.1069	54.89	57.65	0.7812	1.1584	52.86	55.93
PALR (Yang et al., 2023)	0.7994	1.2876	48.53	43.19	0.6075	0.9531	57.35	56.77	0.7485	1.1187	54.24	56.38
LettinGo (Wang et al., 2025)	0.6632	1.1047	56.18	48.95	0.4737	0.8834	62.37	57.09	0.5821	0.9416	59.35	60.57
Reason4Rec (Fang et al., 2025)	0.7028	1.1523	55.69	47.73	0.5654	0.9635	58.69	54.67	0.6397	1.0098	58.47	56.84
DUET (Ours)	0.5126	0.9485	61.23	55.18	0.3937	0.7564	67.96	63.89	0.4612	0.9089	64.38	59.27
LLaMA3-8B
10H (History Only)	1.0864	1.9532	22.09	27.30	0.7917	1.3346	38.13	46.87	0.8064	1.3866	37.15	45.27
KAR (Xi et al., 2024)	0.6427	1.1668	54.51	47.98	0.5726	0.9033	57.53	59.92	0.5892	0.9614	55.87	58.21
RLMRec (Ren et al., 2024)	0.7428	1.3572	46.74	42.11	0.6076	0.9886	53.78	57.42	0.6226	0.9477	52.12	55.79
PALR (Yang et al., 2023)	0.7238	1.3265	47.72	43.29	0.5823	0.9222	56.73	59.31	0.5977	0.8855	55.06	57.62
LettinGo (Wang et al., 2025)	0.6196	1.1289	56.03	51.24	0.5204	0.9369	61.92	59.50	0.5543	0.7967	58.95	60.39
Reason4Rec (Fang et al., 2025)	0.7586	1.0418	55.80	53.00	0.5442	0.7722	60.86	54.88	0.6029	0.8345	59.70	56.35
DUET (Ours)	0.5367	0.9687	60.87	54.74	0.4680	0.8277	63.30	60.60	0.5092	0.9500	63.42	58.12

Best values shown in green bold. DUET (highlighted rows) consistently outperforms all baselines across both backbone LLMs and all three datasets.

Method	Yelp			Amazon Music			Amazon Books
Method	NDCG@1	NDCG@5	NDCG@10	NDCG@1	NDCG@5	NDCG@10	NDCG@1	NDCG@5	NDCG@10
10H	0.1823	0.2815	0.4928	0.1875	0.3796	0.5153	0.1841	0.3146	0.4263
KAR	0.2156	0.3298	0.5412	0.3018	0.4896	0.6015	0.2965	0.4715	0.5834
RLMRec	0.2419	0.3472	0.5587	0.3371	0.5434	0.6162	0.2748	0.4526	0.5719
PALR	0.2494	0.3563	0.5691	0.3395	0.5247	0.6115	0.2627	0.4634	0.5538
LettinGo	0.3187	0.4685	0.5814	0.4012	0.5674	0.6489	0.3795	0.5189	0.6284
Reason4Rec	0.2575	0.3792	0.5526	0.2928	0.5912	0.6343	0.3013	0.4928	0.5959
DUET (Ours)	0.3390	0.4873	0.6008	0.5123	0.6165	0.7025	0.4288	0.5638	0.6599

Ranking evaluation under EASE-based hard negatives. DUET achieves NDCG@10 of 0.7025 on Amazon Music — the strongest result across all methods and cutoffs.

Configuration	Yelp		Amazon Music		Amazon Books
	MAE↓	Acc↑	MAE↓	Acc↑	MAE↓	Acc↑
10H — History Only	1.1235	23.17	0.9102	39.26	0.9314	37.63
+ Profile Generation	0.7218	55.48	0.6597	58.67	0.6764	57.14
+ Cue & Strategy Layer	0.7085	55.83	0.5708	58.91	0.6389	58.43
+ Joint Optimization (LG-style)	0.6632	56.18	0.4737	62.37	0.5821	59.35
Full DUET — Cue + Strategy + Joint Opt.	0.5126	61.23	0.3937	67.96	0.4612	64.38

Each component contributes. Profile generation alone provides the largest accuracy jump. Combining all three stages achieves the best results on every dataset.

Setting	Yelp		Amazon Music		Amazon Books
	MAE↓	Acc↑	MAE↓	Acc↑	MAE↓	Acc↑
DUET w/o RL	0.8283	48.53	0.7322	57.18	0.8741	51.83
DUET (full, with RL)	0.5126	61.23	0.3937	67.96	0.4612	64.38

RL is essential. Removing the RL optimization causes Yelp accuracy to drop from 61.23% to 48.53% (−12.7 pp), demonstrating that the gains cannot be attributed to prompt design alone. RL enables adaptive exploration of effective profile construction strategies under real recommendation feedback.

History Length	Yelp		Amazon Music		Amazon Books
	MAE↓	Acc↑	MAE↓	Acc↑	MAE↓	Acc↑
10H + 30 profiles	0.5126	61.23	0.3883	67.96	0.4612	65.13
10H + 50 profiles	0.4909	62.43	0.3924	67.88	0.4553	64.62
10H + 70 profiles	0.4987	61.98	0.3937	68.22	0.4608	64.38

Moderate history length (30–50 interactions) achieves competitive or best results on most metrics. Excessive histories can introduce noisy signals that slightly degrade performance on Yelp and Amazon Books.

Case Study

Profile alignment in action

DUET distills fragmented user history and sparse item reviews into semantically aligned profiles — capturing the shared funk/soul connection that raw history alone would miss.

Andre Grindle is a dedicated music enthusiast with a deep appreciation for funk, soul, and progressive rock, particularly drawn to artists who blend technical mastery with genre-defining innovation. His reviews consistently highlight albums that showcase musical complexity, lyrical depth, and historical significance, often referencing artists like Stevie Wonder, Prince, and Rush. He values authenticity and artistic vision, frequently praising albums that resist trend-chasing and prioritize creative integrity. His detailed, introspective reviews suggest a preference for nuanced analysis over superficial praise, with a focus on emotional resonance, production quality, and the artist's evolution.

"Worlds Away" by Pablo Cruise is celebrated as a peak-era funk-rock album that exemplifies the band's ability to merge polished production with genre-blurring creativity. Users consistently praise its tight musicianship, genre-defying sound, and the band's ability to craft catchy, emotionally resonant tracks. The album's appeal lies in its seamless fusion of rock, funk, and pop. The majority highlight its historical significance as a defining work of the 1970s rock-funk movement. The album resonates most with fans of 1970s progressive rock, genre-defying artists, and listeners seeking well-crafted, emotionally engaging music.

Semantic correspondence: The user profile highlights funk/soul affinity and emphasis on historical significance; the item profile independently characterizes the album as a defining funk-rock work of the 1970s. DUET's joint optimization produces this alignment automatically — without any hard-coded templates or attribute lists.

Figure 4. The highlighted regions demonstrate that user preferences summarized in the user profile align with the key attributes extracted in the item profile. DUET captures the meaningful preference–attribute correspondence that is difficult to recover from individual reviews alone.

Semantic Analysis

Beyond task performance

Two complementary metrics confirm that DUET profiles exhibit genuine semantic structure rather than serving as incidental textual artifacts.

Semantic Alignment

Embedding-level cosine similarity between generated user and item profiles using all-mpnet-base-v2. Higher values indicate stronger semantic compatibility between modeled user preferences and item characteristics.

0.638

Yelp

0.595

Music

0.729

Books

Highest across all methods on every dataset.

Coverage — Faithfulness

Token-level overlap between generated profiles and input histories, quantifying how much of the profile is grounded in historical evidence. DUET maintains mid-to-high coverage while achieving superior alignment — the best balance of abstraction and evidence preservation.

40%

User Cov.

45%

Item Cov.

↑

vs. LG

Amazon Music results; comparable or better coverage across all datasets.

Robustness under Preference Diversity

Users are partitioned into three groups by rating variance (stable → diverse). DUET's performance degrades gradually rather than catastrophically as preference diversity increases — from 71.76% accuracy (stable, Yelp) to 51.13% (diverse), indicating that the framework remains stable under heterogeneous or noisy interaction histories. Amazon Music shows particularly robust behavior, suggesting that music domain preferences are less sensitive to history noise.

Citation

Cite this work

If DUET is useful for your research, please consider citing our paper.

@misc{chen2026duetjointexplorationuser,
  title         = {DUET: Joint Exploration of User Item Profiles
                  in Recommendation System},
  author        = {Yue Chen and Yifei Sun and Lu Wang and
                  Fangkai Yang and Pu Zhao and Minjie Hong and
                  Yifei Dong and Minghua He and Nan Hu and
                  Jianjin Zhang and Zhiwei Dai and Yuefeng Zhan and
                  Weihao Han and Hao Sun and Qingwei Lin and
                  Weiwei Deng and Feng Sun and Qi Zhang and
                  Saravan Rajmohan and Dongmei Zhang},
  year          = {2026},
  eprint        = {2604.13801},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url           = {https://arxiv.org/abs/2604.13801},
}