ACL 2026

DUET: Joint Exploration of
User–Item Profiles in Recommendation

Template-free, interaction-aware profile generation via reinforcement learning — aligning users and items in a shared semantic space.

Yue Chen1,*,  Yifei Sun1,*,  Lu Wang2,†,  Fangkai Yang2, Pu Zhao2, Minjie Hong3,
Yifei Dong4, Minghua He2, Nan Hu2, Jianjin Zhang2, Zhiwei Dai2,
Yuefeng Zhan2, Weihao Han2, Hao Sun2, Qingwei Lin2, Weiwei Deng2,
Feng Sun2, Qi Zhang2, Saravan Rajmohan2, Dongmei Zhang2

1Peking University  ·  2Microsoft  ·  3Zhejiang University  ·  4KTH Royal Institute of Technology
*Equal contribution  ·  †Corresponding author

Research Paper

What is DUET?

A closed-loop framework for jointly generating aligned user and item profiles for recommendation — without hand-crafted templates.

Trained end-to-end with reinforcement learning using downstream recommendation performance as the reward signal.

3 Real-world Datasets  ·  Qwen3-8B & LLaMA3-8B  ·  State-of-the-Art Results

Traditional recommendation systems represent users and items as dense vectors and learn to align them in a shared latent space for relevance estimation. Recent LLM-based recommenders instead leverage natural-language representations that are easier to interpret and integrate with downstream reasoning modules.

This paper studies how to construct effective textual profiles for users and items, and how to align them for recommendation. A central difficulty is that the best profile format is not known a priori: manually designed templates can be brittle and misaligned with task objectives. Moreover, generating user and item profiles independently may produce descriptions that are individually plausible yet semantically inconsistent for a specific user–item pair.

We propose DUET, an interaction-aware profile generator that jointly produces user and item profiles conditioned on both user history and item evidence. DUET follows a three-stage procedure: it first turns raw histories and metadata into compact cues, then expands these cues into paired profile prompts to generate profiles, and finally optimizes the generation policy with reinforcement learning using downstream recommendation performance as feedback.

Key result: Experiments on three real-world datasets (Yelp, Amazon Music, Amazon Books) show that DUET consistently outperforms strong baselines, demonstrating the benefits of template-free profile exploration and joint user–item textual alignment.

Core Insight

Why joint profiling matters

Independently generated profiles may amplify incompatible facets of the same user–item pair, obscuring the true relevance signal.

Independent Generation: Semantic Mismatch
User Profile
Fan of HEAVY METAL and PUNK. Prefers dark themes.
Item Profile
A highly rated POP-ROCK album with polished production.

The two profiles focus on incompatible aspects, hiding the shared funk/soul connection and producing a misleading relevance signal.

DUET (Joint Alignment): Semantic Alignment ✓
User Profile
Enthusiast of FUNK and SOUL. Values technical mastery.
Item Profile
A peak-era FUNK-ROCK album with genre-blurring creativity.

DUET reconciles both sides into a compatible interpretation, surfacing the shared funk affinity and enabling accurate relevance estimation.


Figure 1. DUET aligns raw user and item data by transforming them into textual profiles within a shared semantic space.

Contributions

Three core contributions

01

Text-Based User–Item Alignment

Represent both users and items as natural-language profiles and align them in a shared semantic space, extending the classic vector-based alignment principle to interpretable textual representations fully compatible with LLMs.

02

Template-Free Exploration Framework

Start from cue-based initialization, expand cues into candidate profile prompts, and jointly optimize user and item profiles with downstream RL feedback — no rigid templates or hand-crafted attributes required.

03

State-of-the-Art Performance

Extensive experiments across three real-world datasets with two backbone LLMs show DUET consistently outperforms all baselines, validating both joint profiling and feedback-driven profile optimization.

Framework

The DUET Framework

A closed-loop framework that transforms raw user–item interaction histories into performance-aligned textual profiles through three learned stages — all realized in a single seq-to-seq forward pass at inference time.

1

Cue-Based Initialization

Raw user histories and item metadata are distilled into minimal cues — concise hypotheses highlighting one potential preference or characteristic. These act as lightweight seeds for profile exploration, deliberately underspecified to allow subsequent discovery.

"enjoys retro puzzle games"
"prefers concise product reviews"
"lightweight trail-running shoes"
2

Joint Exploration via Profile Prompt Discovery

Rather than directly summarizing, the model generates an intermediate constructed_prompt — a natural-language instruction defining format, abstraction level, and attribute selection. Conditioned on this prompt, user and item profiles are generated jointly.

Cue → Profile Prompt S → Profile
O = [Cue → S → Profile]
Single-pass seq-to-seq generation
3

On-Policy RL Optimization (GRPO)

Profiles are consumed by a frozen downstream recommender. The continuous fractional reward R_perf measures prediction accuracy and drives GRPO optimization, reinforcing profile constructions that yield better recommendations.

R(u,i) = 1 − |y_ui − ŷ_ui| / M
Optimized with GRPO (DeepSeek-AI, 2025)
Downstream model f is frozen throughout

Figure 2. Overview of the DUET framework. Three stages — Cue-Based Initialization, Joint Exploration via Adaptive Profile Prompt Discovery, and On-Policy Optimization — are unified into a single generation pass. The downstream task environment provides a continuous reward signal for optimizing profile quality.
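As a minimal sketch of the Stage 3 reward, the snippet below computes R = 1 − |y − ŷ| / M for a single user–item pair. The page does not define M; the code assumes it is the largest possible rating gap (4 on a 1-to-5 star scale). The clipping to [0, 1] and the name `fractional_reward` are likewise illustrative assumptions, not details taken from the paper.

```python
def fractional_reward(y_true: float, y_pred: float, max_gap: float = 4.0) -> float:
    """Continuous reward R = 1 - |y - y_hat| / M.

    max_gap plays the role of M. Treating it as the largest possible
    rating gap (4 for 1-5 stars) is an assumption made for this sketch.
    """
    if max_gap <= 0:
        raise ValueError("max_gap must be positive")
    error = abs(y_true - y_pred)
    # Clip at zero so out-of-range predictions never yield a negative reward
    # (the clipping is an assumption, not stated on the page).
    return max(0.0, 1.0 - error / max_gap)


print(fractional_reward(5, 5))  # 1.0: exact prediction
print(fractional_reward(5, 3))  # 0.5: off by two stars
print(fractional_reward(1, 5))  # 0.0: maximally wrong
```

Because the reward shrinks linearly with the prediction error, a near miss still earns partial credit, which is what makes the signal continuous rather than a 0/1 hit indicator.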

Single-Pass Efficiency

All three stages — cue extraction, profile prompt construction, and profile generation — are realized in a single sequence-to-sequence forward pass at inference time, introducing no additional latency compared to standard profile generation methods.
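To make the single-pass output O = [Cue → S → Profile] concrete, the sketch below splits one generated sequence into its parts. The section tags (`<cue>`, `<constructed_prompt>`, `<user_profile>`, `<item_profile>`) and the function name `split_generation` are hypothetical; the page names the intermediate constructed_prompt but does not specify the actual output markup.

```python
import re

# Hypothetical section tags: the real output format is not given on the page.
SECTIONS = ("cue", "constructed_prompt", "user_profile", "item_profile")


def split_generation(text: str) -> dict:
    """Split one generated sequence O = [Cue -> S -> Profile] into named parts."""
    parts = {}
    for name in SECTIONS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{name}> section")
        parts[name] = match.group(1).strip()
    return parts


sample = (
    "<cue>enjoys funk and soul</cue>"
    "<constructed_prompt>Describe genre affinity and musicianship.</constructed_prompt>"
    "<user_profile>Enthusiast of funk and soul.</user_profile>"
    "<item_profile>A peak-era funk-rock album.</item_profile>"
)
parts = split_generation(sample)
print(parts["constructed_prompt"])
```

Only the two profile fields would be passed to the downstream recommender; the cue and the constructed prompt are intermediate steps of the same generation.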

Formulation as On-Policy RL

Profile generation is treated as an on-policy RL problem where the state is s = {H_u, H_i}, the action is the joint generation sequence, and quality is evaluated solely by functional utility in a fixed recommendation environment, with no textual ground-truth profiles required.
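GRPO scores each sampled generation relative to the other samples drawn for the same input instead of relying on a learned value function. The sketch below shows that group-relative normalization for one state; the group size, the use of the population standard deviation, and the `eps` constant are assumptions, and the clipped policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of generations sampled for the same state.

    Each advantage is (r - group mean) / (group std + eps). Using the
    population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled joint generations for one user-item pair, scored by the frozen recommender.
rewards = [1.0, 0.75, 0.5, 0.75]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Generations that beat their group's average receive a positive advantage and are reinforced; those below the average are discouraged. When every sample earns the same reward, all advantages are zero and the group contributes no gradient.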

Experiments

Experimental results

Evaluated on Yelp, Amazon Music, and Amazon Books using Qwen3-8B and LLaMA3-8B as both the profile generator and the downstream recommender.

Yelp
61.2%
Accuracy · Qwen3-8B
+5.05 pp vs. best baseline
Amazon Music
68.0%
Accuracy · Qwen3-8B
+5.59 pp vs. best baseline
Amazon Books
64.4%
Accuracy · Qwen3-8B
+5.03 pp vs. best baseline
Method | Yelp (MAE↓ RMSE↓ Acc↑ F1↑) | Amazon Music (MAE↓ RMSE↓ Acc↑ F1↑) | Amazon Books (MAE↓ RMSE↓ Acc↑ F1↑)

Qwen3-8B
10H (History Only) | 1.1235 1.9478 23.17 27.54 | 0.9102 1.4021 39.26 46.58 | 0.9314 1.4527 37.63 45.19
KAR (Xi et al., 2024) | 0.7396 1.2184 55.34 48.67 | 0.7483 1.1380 58.65 60.29 | 0.7098 1.0923 56.17 58.78
RLMRec (Ren et al., 2024) | 0.8197 1.3312 47.15 42.46 | 0.7438 1.1069 54.89 57.65 | 0.7812 1.1584 52.86 55.93
PALR (Yang et al., 2023) | 0.7994 1.2876 48.53 43.19 | 0.6075 0.9531 57.35 56.77 | 0.7485 1.1187 54.24 56.38
LettinGo (Wang et al., 2025) | 0.6632 1.1047 56.18 48.95 | 0.4737 0.8834 62.37 57.09 | 0.5821 0.9416 59.35 60.57
Reason4Rec (Fang et al., 2025) | 0.7028 1.1523 55.69 47.73 | 0.5654 0.9635 58.69 54.67 | 0.6397 1.0098 58.47 56.84
DUET (Ours) | 0.5126 0.9485 61.23 55.18 | 0.3937 0.7564 67.96 63.89 | 0.4612 0.9089 64.38 59.27

LLaMA3-8B
10H (History Only) | 1.0864 1.9532 22.09 27.30 | 0.7917 1.3346 38.13 46.87 | 0.8064 1.3866 37.15 45.27
KAR (Xi et al., 2024) | 0.6427 1.1668 54.51 47.98 | 0.5726 0.9033 57.53 59.92 | 0.5892 0.9614 55.87 58.21
RLMRec (Ren et al., 2024) | 0.7428 1.3572 46.74 42.11 | 0.6076 0.9886 53.78 57.42 | 0.6226 0.9477 52.12 55.79
PALR (Yang et al., 2023) | 0.7238 1.3265 47.72 43.29 | 0.5823 0.9222 56.73 59.31 | 0.5977 0.8855 55.06 57.62
LettinGo (Wang et al., 2025) | 0.6196 1.1289 56.03 51.24 | 0.5204 0.9369 61.92 59.50 | 0.5543 0.7967 58.95 60.39
Reason4Rec (Fang et al., 2025) | 0.7586 1.0418 55.80 53.00 | 0.5442 0.7722 60.86 54.88 | 0.6029 0.8345 59.70 56.35
DUET (Ours) | 0.5367 0.9687 60.87 54.74 | 0.4680 0.8277 63.30 60.60 | 0.5092 0.9500 63.42 58.12

DUET attains the best MAE and accuracy with both backbone LLMs on all three datasets. A few RMSE and F1 entries favor a baseline, for example LettinGo's F1 on Amazon Books with Qwen3-8B (60.57 vs. 59.27).
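For readers who want to reproduce the rating-prediction columns, the sketch below computes MAE, RMSE, and accuracy from paired ratings. It assumes that accuracy means an exact match between predicted and true integer ratings, which the page does not state; F1 is left out because the averaging scheme is not given. The function name `rating_metrics` is illustrative.

```python
from math import sqrt


def rating_metrics(y_true, y_pred):
    """Return MAE, RMSE and exact-match accuracy for paired rating lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    # Exact-match accuracy is an assumption about how Acc is defined.
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"mae": mae, "rmse": rmse, "acc": acc}


metrics = rating_metrics([5, 3, 4, 1], [5, 4, 4, 3])
print(metrics)  # mae = 0.75, acc = 0.5, rmse is about 1.118
```

RMSE penalizes large misses more heavily than MAE, so a method can lead on MAE and accuracy while trailing on RMSE if it makes a few large errors.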

Method | Yelp (NDCG@1 NDCG@5 NDCG@10) | Amazon Music (NDCG@1 NDCG@5 NDCG@10) | Amazon Books (NDCG@1 NDCG@5 NDCG@10)
10H | 0.1823 0.2815 0.4928 | 0.1875 0.3796 0.5153 | 0.1841 0.3146 0.4263
KAR | 0.2156 0.3298 0.5412 | 0.3018 0.4896 0.6015 | 0.2965 0.4715 0.5834
RLMRec | 0.2419 0.3472 0.5587 | 0.3371 0.5434 0.6162 | 0.2748 0.4526 0.5719
PALR | 0.2494 0.3563 0.5691 | 0.3395 0.5247 0.6115 | 0.2627 0.4634 0.5538
LettinGo | 0.3187 0.4685 0.5814 | 0.4012 0.5674 0.6489 | 0.3795 0.5189 0.6284
Reason4Rec | 0.2575 0.3792 0.5526 | 0.2928 0.5912 0.6343 | 0.3013 0.4928 0.5959
DUET (Ours) | 0.3390 0.4873 0.6008 | 0.5123 0.6165 0.7025 | 0.4288 0.5638 0.6599

Ranking evaluation under EASE-based hard negatives. DUET achieves NDCG@10 of 0.7025 on Amazon Music — the strongest result across all methods and cutoffs.
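The ranking metric can be illustrated with a short NDCG@k sketch. It uses the common linear-gain form with a log2 position discount. Whether the paper uses this variant, and how its candidate lists are built from the EASE-based negatives, is not stated on the page, so treat both as assumptions.

```python
from math import log2


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in ranked order: the single positive item sits at position 2.
ranked = [0, 1, 0, 0, 0]
print(ndcg_at_k(ranked, 1))  # 0.0: the top slot holds a negative
print(ndcg_at_k(ranked, 5))  # 1 / log2(3), roughly 0.63
```

With one positive per candidate list, NDCG@1 reduces to a hit-or-miss check on the top slot, while larger cutoffs give partial credit for ranking the positive near the top.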

Configuration | Yelp (MAE↓ Acc↑) | Amazon Music (MAE↓ Acc↑) | Amazon Books (MAE↓ Acc↑)
10H (History Only) | 1.1235 23.17 | 0.9102 39.26 | 0.9314 37.63
+ Profile Generation | 0.7218 55.48 | 0.6597 58.67 | 0.6764 57.14
+ Cue & Strategy Layer | 0.7085 55.83 | 0.5708 58.91 | 0.6389 58.43
+ Joint Optimization (LG-style) | 0.6632 56.18 | 0.4737 62.37 | 0.5821 59.35
Full DUET (Cue + Strategy + Joint Opt.) | 0.5126 61.23 | 0.3937 67.96 | 0.4612 64.38

Each component contributes. Profile generation alone provides the largest accuracy jump. Combining all three stages achieves the best results on every dataset.

Setting | Yelp (MAE↓ Acc↑) | Amazon Music (MAE↓ Acc↑) | Amazon Books (MAE↓ Acc↑)
DUET w/o RL | 0.8283 48.53 | 0.7322 57.18 | 0.8741 51.83
DUET (full, with RL) | 0.5126 61.23 | 0.3937 67.96 | 0.4612 64.38

RL is essential. Removing the RL optimization causes Yelp accuracy to drop from 61.23% to 48.53% (−12.7 pp), demonstrating that the gains cannot be attributed to prompt design alone. RL enables adaptive exploration of effective profile construction strategies under real recommendation feedback.

History Length | Yelp (MAE↓ Acc↑) | Amazon Music (MAE↓ Acc↑) | Amazon Books (MAE↓ Acc↑)
10H + 30 profiles | 0.5126 61.23 | 0.3883 67.96 | 0.4612 65.13
10H + 50 profiles | 0.4909 62.43 | 0.3924 67.88 | 0.4553 64.62
10H + 70 profiles | 0.4987 61.98 | 0.3937 68.22 | 0.4608 64.38

Moderate history length (30–50 interactions) achieves competitive or best results on most metrics. Excessive histories can introduce noisy signals that slightly degrade performance on Yelp and Amazon Books.

Case Study

Profile alignment in action

DUET distills fragmented user history and sparse item reviews into semantically aligned profiles — capturing the shared funk/soul connection that raw history alone would miss.

 Generated User Profile
Andre Grindle is a dedicated music enthusiast with a deep appreciation for funk, soul, and progressive rock, particularly drawn to artists who blend technical mastery with genre-defining innovation. His reviews consistently highlight albums that showcase musical complexity, lyrical depth, and historical significance, often referencing artists like Stevie Wonder, Prince, and Rush. He values authenticity and artistic vision, frequently praising albums that resist trend-chasing and prioritize creative integrity. His detailed, introspective reviews suggest a preference for nuanced analysis over superficial praise, with a focus on emotional resonance, production quality, and the artist's evolution.
 Generated Item Profile
"Worlds Away" by Pablo Cruise is celebrated as a peak-era funk-rock album that exemplifies the band's ability to merge polished production with genre-blurring creativity. Users consistently praise its tight musicianship, genre-defying sound, and the band's ability to craft catchy, emotionally resonant tracks. The album's appeal lies in its seamless fusion of rock, funk, and pop. The majority highlight its historical significance as a defining work of the 1970s rock-funk movement. The album resonates most with fans of 1970s progressive rock, genre-defying artists, and listeners seeking well-crafted, emotionally engaging music.

Semantic correspondence: The user profile highlights funk/soul affinity and emphasis on historical significance; the item profile independently characterizes the album as a defining funk-rock work of the 1970s. DUET's joint optimization produces this alignment automatically — without any hard-coded templates or attribute lists.


Figure 4. The highlighted regions demonstrate that user preferences summarized in the user profile align with the key attributes extracted in the item profile. DUET captures the meaningful preference–attribute correspondence that is difficult to recover from individual reviews alone.

Semantic Analysis

Beyond task performance

Two complementary metrics confirm that DUET profiles exhibit genuine semantic structure rather than serving as incidental textual artifacts.

Semantic Alignment

Embedding-level cosine similarity between generated user and item profiles using all-mpnet-base-v2. Higher values indicate stronger semantic compatibility between modeled user preferences and item characteristics.

0.638
Yelp
0.595
Music
0.729
Books

Highest across all methods on every dataset.
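The alignment score is a cosine similarity between two profile embeddings. The sketch below implements only the similarity step on plain Python lists; in practice the vectors would come from an encoder such as all-mpnet-base-v2, and the toy three-dimensional vectors here are placeholders rather than real embeddings.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector (an assumption)
    return dot / (norm_a * norm_b)


# Placeholder embeddings standing in for encoded user and item profiles.
user_vec = [0.2, 0.7, 0.1]
item_vec = [0.25, 0.6, 0.05]
print(round(cosine_similarity(user_vec, item_vec), 3))
```

Averaging this score over user–item pairs gives a dataset-level figure comparable in spirit to the values reported above.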

Coverage — Faithfulness

Token-level overlap between generated profiles and input histories, quantifying how much of the profile is grounded in historical evidence. DUET maintains mid-to-high coverage while achieving superior alignment — the best balance of abstraction and evidence preservation.

40%
User Cov.
45%
Item Cov.
vs. LG

Amazon Music results; comparable or better coverage across all datasets.
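Coverage can be sketched as the share of profile tokens that also occur in the input history. The lowercase alphanumeric tokenizer and the choice to count repeated profile tokens are assumptions; the page specifies only that the overlap is measured at the token level.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens (a simple stand-in tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def coverage(profile, history):
    """Fraction of profile tokens that also appear in the history text."""
    profile_tokens = tokenize(profile)
    if not profile_tokens:
        return 0.0
    history_vocab = set(tokenize(history))
    hits = sum(tok in history_vocab for tok in profile_tokens)
    return hits / len(profile_tokens)


profile = "Enthusiast of funk and soul."
history = "Great funk record; the soul of the band shines."
print(coverage(profile, history))  # 0.6: "of", "funk", "soul" out of 5 tokens
```

A score near 1 means the profile mostly copies its evidence, while a score near 0 means it has drifted away from it; the mid-to-high range described above sits between those extremes.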

Robustness under Preference Diversity

Users are partitioned into three groups by rating variance (stable → diverse). DUET's performance degrades gradually rather than catastrophically as preference diversity increases — from 71.76% accuracy (stable, Yelp) to 51.13% (diverse), indicating that the framework remains stable under heterogeneous or noisy interaction histories. Amazon Music shows particularly robust behavior, suggesting that music domain preferences are less sensitive to history noise.
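A minimal sketch of the grouping step follows, assuming equal-sized tertiles of per-user rating variance. The page names the "stable" and "diverse" ends of the range but not the cut points or the middle group's label, so the tertile split and the name "moderate" are hypothetical.

```python
from statistics import pvariance


def partition_by_rating_variance(user_ratings):
    """Split users into three groups by the variance of their ratings.

    user_ratings maps a user id to that user's list of ratings.
    Equal-sized tertiles are an assumption of this sketch.
    """
    ranked = sorted(user_ratings, key=lambda u: pvariance(user_ratings[u]))
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "stable": ranked[:cut1],
        "moderate": ranked[cut1:cut2],  # hypothetical label for the middle group
        "diverse": ranked[cut2:],
    }


ratings = {
    "a": [5, 5, 5],
    "b": [4, 5, 4],
    "c": [1, 5, 3],
    "d": [3, 3, 3],
    "e": [2, 4, 3],
    "f": [1, 5, 1],
}
groups = partition_by_rating_variance(ratings)
print(groups)
```

Evaluating accuracy separately within each group then shows how performance changes as preferences become less consistent.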

Citation

Cite this work

If DUET is useful for your research, please consider citing our paper.

@misc{chen2026duetjointexplorationuser,
  title         = {DUET: Joint Exploration of User Item Profiles
                  in Recommendation System},
  author        = {Yue Chen and Yifei Sun and Lu Wang and
                  Fangkai Yang and Pu Zhao and Minjie Hong and
                  Yifei Dong and Minghua He and Nan Hu and
                  Jianjin Zhang and Zhiwei Dai and Yuefeng Zhan and
                  Weihao Han and Hao Sun and Qingwei Lin and
                  Weiwei Deng and Feng Sun and Qi Zhang and
                  Saravan Rajmohan and Dongmei Zhang},
  year          = {2026},
  eprint        = {2604.13801},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url           = {https://arxiv.org/abs/2604.13801},
}