
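Putting "a friend watched this, a friend reacted with love" into the Reels recommender sounds like a front-end feature: fetch a bit more metadata, add a bubble UI, done in half a sprint. Until you discover that doing it right means first rerunning closeness inference every week over trillions of person-to-person edges, and then pinning friend-bubble metadata into a video prefetch window that was already extremely latency-sensitive.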

Multi-layer retrieval in Reel Friends: how Meta combines the social graph and embeddings into a billion-scale candidate set

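Meta's May 13 post, "Reel Friends: Building Social Discovery that Scales to Billions", takes the three-layer retrieval / ranking / online learning architecture that was tucked inside the earlier March engineering post on Friend Bubbles and explains why it is designed that way. The point is not the product story of "Reels added a friend recommendation feature". The point is how a social signal enters a recommender that is already built around video embeddings without degrading into a cheap "if friend then boost" bolted on after the ranker.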

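A one-sentence mental model first: Reel Friends is not a standalone recommender. It is a social-aware bypass that runs through the existing Reels retrieval / ranking funnel. Every stage on that bypass has to inject friend-aware candidates and friend-aware signals without breaking the main funnel's SLA. The interesting engineering is all in the trade-offs hidden behind the word "inject".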

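The system splits into five components: closeness model, retrieval, MTML ranker, video prefetch window, and online learning loop. The architecture diagram below is the single entry point for this article: for each box it lists the component's responsibility and what it deliberately does not know. The remaining sections cover three non-obvious design decisions, the forcing function from which the architecture's shape is derived, and why the contracts between components are the real engineering value.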

Responsibility boundaries of the five components

closeness model · responsibility

Output a per-edge closeness score over the Facebook friend graph. Two complementary models run side by side: survey-based (social-graph structure plus user attributes, a weekly batch over trillions of edges) and context-specific (streaming signals from likes / comments / reshares).

Does not know: which candidates will later be retrieved, or which reels will be surfaced. It only delivers a closeness vector.

The two are split to get structural stability and behavioral freshness at the same time.

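retrieval · responsibility

Take the top-K friends by closeness, fetch the reels they recently interacted with, and assemble the candidate pool. Social hard filter first, embedding soft score second; the order cannot be reversed.

Does not know: the candidates' final ranking, or which features the ranker uses. It is only responsible for recall.

A reel that a top-K friend reacted to enters the pool with probability 1, not ANN's 0.97. That is the essential difference between social discovery and a friend boost.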

MTML ranker · responsibility

Multi-task multi-label: predicts watch time / like / love / share / follow / integrity simultaneously. The same model is deployed at the early stage (cheap features, pool still large) and at the late stage (expensive features, pool already shrunk). The loss takes the conditional form P(engagement | bubble impression), which absorbs selection bias mathematically.

Does not know: why a candidate entered the pool, or how closeness was computed. Friend and non-friend candidates are scored by the same model.

Expressive reactions (love / laughter) are a stronger signal than a simple like, so label weights are not uniform.

video prefetch window · responsibility

Friend-bubble metadata is "pinned to that same prefetch window": it is prefetched together with the video bytes, with no separate request. All server-side ranking has to fit inside this window. Three non-negotiables: smooth scrolling, no load latency regression, low CPU overhead for metadata fetch.

Does not know: upstream model details. It only cares whether the wall-clock budget has been used up.

The latency budget of every upstream component is derived backwards from this window; it is the forcing function.

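online learning loop · responsibility

Interaction events enter a streaming pipeline and update three places: closeness (the context-specific model), the ranker, and the integrity head. The conditional form keeps updates from self-reinforcing the model's own preferences. Integrity is trained jointly with engagement through an auxiliary loss: alignment at the representation level, not a post-filter.

Does not know: how the ranking was produced. It only sees (impression, interaction) pairs.

This decoupling lets the ranker and the learning loop be replaced independently.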

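[Interactive figure: five-layer architecture. The social signal is the first-layer hard filter rather than a trailing boost; that order keeps ANN compute from being wasted.]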

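After reading, you should be able to answer these questions clearly: why the closeness model comes in two versions instead of one; why retrieval is layered instead of a single ANN lookup; why the MTML ranker runs once at the early stage and once at the late stage; and why the engineering bottleneck of the whole system is not GPUs or embedding dimensionality but the few hundred milliseconds of the video prefetch window.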

Decision 1: why closeness is two models

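The structural model (survey-based) gets its labels from user surveys that ask directly "how close are you to this person", and its inputs are social-graph features: mutual friends, connection strength, interaction patterns. The post's phrase "weekly inference over trillions of person-to-person connections" describes this model: once a week it runs over every edge of the Facebook friend graph and materializes the scores into a feature store. At trillions of edges, a weekly cadence is what keeps the cost manageable; moving to daily would scale compute roughly linearly, to about seven times as many inference runs.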

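But weekly is too slow for cases like "added as a friend yesterday" or "interacted heavily over the last three days". The context-specific model reads streaming signals from likes / comments / reshares and updates much faster than weekly, covering the fast-moving part. The two are not substitutes: for a close friend you have not interacted with recently, the context-specific signal is near zero, but the survey-based model still supplies a baseline. Your sister is still your sister even if you did not react to each other's posts last week. A system without that baseline keeps overfitting back and forth between the cold case (little interaction, actually close) and the bursty case (lots of recent interaction, actually transient).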

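This amounts to splitting the representation into two frequency bands and picking the best-suited model class and training cadence for each. The split also helps evaluation: the survey-based model has explicit ground truth (survey answers), so prediction error can be computed directly; the context-specific model has no explicit label, so its quality has to be inferred from downstream metrics. Merge them into one model and the survey ground truth gets diluted by the downstream metrics: the model gradually drifts toward "pleasing engagement" and away from the closeness users actually reported, but because downstream metrics keep improving, the team does not notice the drift right away. Kept separate, the structural model can be monitored directly with supervised metrics and the behavioral model with downstream proxies. Two evaluation pipelines that calibrate each other are far more robust than one.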

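The closeness score that retrieval receives is a fusion of these two signals: not a simple sum, but a conditional prediction of "how close this viewer currently is to this friend". The post does not say how the fusion works. Judging from product behavior, a weighted average with a cold-start fallback is the more likely shape (when the context-specific signal is near zero, fall back to the structural baseline instead of dragging the whole score to zero).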
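A minimal sketch of that guessed fusion, in Python. The post does not describe the real method; the function name, the 0.5 weight, and the 0.05 threshold are all hypothetical values chosen for illustration.

```python
def fuse_closeness(structural: float, behavioral: float,
                   behavioral_weight: float = 0.5,
                   min_signal: float = 0.05) -> float:
    """Blend two closeness scores in [0, 1]. All parameters are assumptions."""
    if behavioral < min_signal:
        # Cold case: almost no recent interaction, so keep the structural
        # baseline instead of letting the missing signal drag the score down.
        return structural
    return (1.0 - behavioral_weight) * structural + behavioral_weight * behavioral


# A sibling with no interaction last week keeps the survey-based baseline.
sister = fuse_closeness(structural=0.9, behavioral=0.0)
# A new friend with a recent burst of interaction gets a blended score.
new_friend = fuse_closeness(structural=0.2, behavioral=0.8)
```

The fallback branch is the part that matters: a plain weighted average would halve the sister's score just because last week was quiet.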

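The trillions-of-edges scale also constrains the model class. One round of GNN training on a trillion-edge graph has an enormous message-passing memory footprint, and converging within a week is hard. So the survey-based model is very unlikely to be a vanilla GNN. A hybrid is more plausible: cheap graph features (mutual friends count, Adamic-Adar, Jaccard similarity) for feature engineering, then an MLP / GBDT for supervised learning. A GNN part, if there is one, would be applied only to denser sub-graphs rather than the full trillion-edge graph.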
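To make the "cheap graph features" concrete, here is a small sketch of the three features named above, computed for one edge. The toy friend lists and degree counts are invented; whether Meta uses exactly these features is the article's speculation, not something the post confirms.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Share of the combined friend set that both people have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adamic_adar(a: set, b: set, degree: dict) -> float:
    """Mutual friends weighted so that low-degree friends count more than hubs."""
    return sum(1.0 / math.log(degree[m]) for m in a & b if degree[m] > 1)


# Hypothetical toy data: friend sets of two users and each friend's degree.
ann_friends = {"bob", "cat", "dan"}
eve_friends = {"bob", "cat", "fay"}
degree = {"bob": 2, "cat": 50, "dan": 3, "fay": 4}

edge_features = {
    "mutual_friends": len(ann_friends & eve_friends),
    "jaccard": jaccard(ann_friends, eve_friends),
    "adamic_adar": adamic_adar(ann_friends, eve_friends, degree),
}
```

Each feature needs only the two neighbor sets, so it can be computed per edge in a batch job and fed to an MLP or GBDT without any message passing.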

Decision 2: why retrieval is layered instead of a single ANN

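The tempting design: compress (viewer, friend) pairs and videos into one embedding space, so retrieval becomes a single approximate nearest neighbor lookup. Clean on paper, quick to die in production. The social graph is sparse and discrete (two people either are friends or are not, and closeness between friends is a bounded scalar), while content similarity is dense and continuous (almost any two reels have a nonzero cosine similarity). Forcing both into one embedding asks the representation to encode two completely different metric structures at once, and it usually ends up doing both badly. The common failure mode in practice is the social signal being drowned out by the content signal: content embeddings have larger magnitude and broader coverage, so the social part sinks to the noise floor of the overall score and stops mattering.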

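Reels uses layers instead: first use the social graph to shrink the candidate set to a bounded subset, "reels recently interacted with by friends you have high closeness with", and only then run content-based embedding matching on that subset. The social graph is the first-layer hard filter and the embedding is the second-layer soft scorer. Reversing the order wastes ANN compute on candidates that the social filter would discard anyway.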

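The first-layer social filter is a hard rule with none of ANN's recall-precision trade-off: if a top-K friend reacted to a reel, that reel enters the candidate pool with probability 1, not 0.97. For a social discovery product this guarantee matters far more than embedding generalization. Users expect to see "what my friends watched", not "things similar to what my friends watched". Layering also decouples evaluation: retrieval is measured by recall@K and the ranker by ranking quality. In a single-stage system those two metrics fight each other; with layers each is optimized on its own.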
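The filter-then-score order can be sketched in a few lines of Python. This is an illustration of the layering only: the function name, the in-memory dictionaries, and the 2-dimensional vectors are hypothetical stand-ins for an index and an embedding store.

```python
def retrieve(viewer_vec, top_k_friends, recent_reels_by_friend, video_vecs, limit):
    """Two-stage retrieval: exact social filter, then embedding soft score."""
    # Stage 1: hard filter. Every reel a top-K friend touched is in the pool,
    # and nothing else is, so recall on friend reels is exactly 1.
    pool = set()
    for friend in top_k_friends:
        pool.update(recent_reels_by_friend.get(friend, ()))

    # Stage 2: soft score. Dot products are computed only for the pool.
    def score(reel):
        return sum(v * w for v, w in zip(viewer_vec, video_vecs[reel]))

    return sorted(pool, key=lambda reel: (-score(reel), reel))[:limit]


# Hypothetical toy data.
video_vecs = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [0.9, 0.1], "r4": [1.0, 0.0]}
recent_reels_by_friend = {"alice": ["r2", "r3"], "bob": ["r3"]}
result = retrieve([1.0, 0.0], ["alice", "bob"], recent_reels_by_friend, video_vecs, limit=2)
```

Note that `r1` and `r4` match the viewer vector perfectly but never appear, because no friend interacted with them; that is the hard-filter guarantee working in the other direction.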

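The slider widget below works out the scale: two knobs, top-K friends and recent reels per friend, multiply into the cartesian size of each viewer's raw social pool. Meta has roughly 3 billion monthly actives, so the platform-wide worst case of viewer-candidate pairs runs from billions into the trillions. The vast majority of those pairs never materialize as actual retrieval requests, but the system design has to be able to handle this footprint.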

[Interactive figure: pool-size sliders. Baseline: top-K close friends per viewer K = 100 × recent reels per friend (last 7 days) R = 20 = 2,000 candidates per viewer; × 3 billion viewers = 6.0 × 10¹² pairs in the global worst case; the pool needs to shrink about 40× before the ranker, so the social filter has to come before ANN.]

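Take K = 100 and R = 20 as the baseline and the raw pool per viewer is 2,000; multiplied by 3 billion monthly actives, the platform-wide worst-case storage / I/O footprint is about 6 × 10¹² pairs, on the order of trillions. The social hard filter has to come first because an indexed lookup resolves each viewer's slice of this cartesian product within milliseconds (the table further down budgets roughly 10 to 30 ms for it); the embedding scorer's compute is then spent only on the small pool that remains. The embedding scorer itself is a two-tower variant: viewer + friend context in one tower, video in the other; all video-tower outputs are precomputed offline, and online only the viewer tower and the dot products are computed. Getting this trade-off backwards, ANN first and social filter second, means paying for billions of dot products up front and then throwing away 99% of the candidates, which does not hold up in production.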
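The arithmetic behind those figures, written out in Python. The four constants are the widget's baseline values and the article's round number for monthly actives; nothing here is measured data.

```python
TOP_K_FRIENDS = 100              # K: close friends considered per viewer
REELS_PER_FRIEND = 20            # R: reels each friend touched in the last 7 days
MONTHLY_ACTIVES = 3_000_000_000  # the article's round figure for Meta
SHRINK_FACTOR = 40               # reduction the widget asks for before the ranker

per_viewer_pool = TOP_K_FRIENDS * REELS_PER_FRIEND    # raw social pool for one viewer
global_pairs = per_viewer_pool * MONTHLY_ACTIVES      # worst case across the platform
ranker_shortlist = per_viewer_pool // SHRINK_FACTOR   # what the fine ranker can afford

print(f"per-viewer pool: {per_viewer_pool:,}")
print(f"global worst case: {global_pairs:.1e} pairs")
print(f"shortlist after shrinking: {ranker_shortlist}")
```

The per-viewer number is what a single request has to handle; the global number is a capacity-planning bound for storage and index fan-out, not something any one request touches.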

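There is one scenario layering does not solve but the system still has to handle: cold-start users, newly created accounts, people with very few friends. For them the social filter returns a tiny pool or even an empty set, and the system has to fall back to pure content-based retrieval. Falling back too eagerly dilutes the social experience; falling back too late shows cold-start users an empty page. Judging from the product's behavior, the bubble UI is probably displayed conditionally: it appears only when the social filter yields enough candidates, and otherwise the whole bubble feature is silently disabled for that user.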

Decision 3: the MTML ranker's conditional loss

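MTML = multi-task multi-label: one model predicts several targets at once (watch time, likes, comments, shares, follows), with one head per task and a shared backbone providing the common representation. Multi-label means a single sample can trigger several labels at once: a reel can be watched to the end, loved, and shared, with all three labels equal to 1. The benefit of combining them is that the shared representation is pulled by several supervision signals simultaneously, making it less prone than a single-objective model to overfitting the noise of any one metric.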

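It is deployed at both the early stage and the late stage with a shared backbone: the early stage uses cheap features to cut the pool from a few thousand to a few dozen, and the late stage uses expensive real-time features for fine ranking. The key benefit of dual deployment is a consistent representation: no cascade drift where "the early stage ranks an item third and the late stage does not recognize it at all". This is the classic cascade-ranking pain point: when upstream and downstream models are trained independently, the downstream model is always working on an input distribution from which the upstream has accidentally removed half the good candidates. A shared backbone buys exactly this consistency.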

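The truly non-obvious part is the loss: P(video engagement | bubble impression). Why conditional rather than joint? Because the joint P(engagement, bubble impression) learns the surface correlation "any reel that makes it into a bubble is more likely to be engaged with", which is just selection bias, not the causal effect of the bubble itself. The conditional form forces the model to learn "given that this was surfaced as a bubble, will this viewer engage under this specific condition", absorbing the selection bias introduced by social filtering into the conditioning variable. In implementation terms this requires the logging system to record every bubble impression in full (both positive and negative), not only the events that had an interaction.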
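A toy count-based illustration of the difference, in Python. The log below is invented: reel "a" was shown four times as often as reel "b". It is a sketch of joint versus conditional rates, not the ranker's actual loss.

```python
from collections import Counter

# Hypothetical impression log: (reel_id, engaged) for EVERY bubble impression,
# including the ones with no interaction.
impressions = (
    [("a", True)] * 2 + [("a", False)] * 6   # shown 8 times, engaged twice
    + [("b", True)] * 1 + [("b", False)] * 1  # shown 2 times, engaged once
)

shown = Counter(reel for reel, _ in impressions)
engaged = Counter(reel for reel, hit in impressions if hit)
total = len(impressions)

# Joint P(engagement, impression): rewards a reel simply for being shown a lot.
joint = {reel: engaged[reel] / total for reel in shown}
# Conditional P(engagement | impression): divides the exposure back out.
conditional = {reel: engaged[reel] / shown[reel] for reel in shown}
```

The joint rate ranks "a" above "b" only because "a" had more exposure; the conditional rate ranks "b" first. Computing `shown` at all is possible only because the negative impressions were logged.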

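One more label-engineering detail: the post specifically notes that "expressive reactions such as love or laughter drive stronger downstream engagement than simple likes". A like costs too little: users tap like almost without thinking, so the noise-to-signal ratio is high. Love / laugh takes one extra step, and that marginal cost filters out many casual likes, leaving the people who "really think this reel is good". In information-theoretic terms, a costlier signal carries more information per event and also correlates more strongly with retention. The multi-label design of MTML lets the model see like and love side by side, learn the difference in information density between them, and put more weight on the rarer but more reliable signal.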
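One common way to express "non-uniform label weights" is a weighted sum of per-label binary cross-entropy terms. The sketch below assumes that form; the weight values are hypothetical, and the post does not say whether the weights are hand-set like this or learned.

```python
import math

# Hypothetical weights: expressive reactions count more than a plain like.
LABEL_WEIGHTS = {"like": 1.0, "love": 3.0, "share": 2.0}


def weighted_bce(preds: dict, labels: dict, weights: dict = LABEL_WEIGHTS) -> float:
    """Sum of per-label binary cross-entropy terms, each scaled by its weight."""
    loss = 0.0
    for task, weight in weights.items():
        p = min(max(preds[task], 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
        y = labels[task]
        loss += -weight * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss


labels = {"like": 1, "love": 1, "share": 0}
# Same-sized mistake on two different heads.
miss_like = weighted_bce({"like": 0.1, "love": 0.9, "share": 0.5}, labels)
miss_love = weighted_bce({"like": 0.9, "love": 0.1, "share": 0.5}, labels)
```

With these weights, mispredicting a love costs roughly twice as much total loss as the same mistake on a like, so gradient pressure concentrates on the rarer signal.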

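Another point outsiders often misread: the real power of MTML is not in the phrase "optimize multiple objectives at once" but in the regularization effect of multi-task learning. Several tasks sharing a backbone force the representation toward a latent space that generalizes across all tasks instead of overfitting the noise of any single one. This matters especially for a ranker, because every single engagement metric carries a reward-hacking risk: to maximize watch time the model may lean toward very long reels; to maximize likes it may lean toward clickbait. Multiple supervision signals hold each other in check, so the representation is harder for any one metric to pull off course.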

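The figure below plots the long-term behavior of the two training setups: under the online learning loop the unconditional model collapses to a fixed point (always pushing what friends have watched), while the conditional model stays stable. These are not real numbers reported in the post; it is a toy model of the feedback-loop dynamics, logistic plus selection bias, run to show what the attractor looks like.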

[Interactive figure: toy simulation of two rankers, each run for 200 online learning steps. x-axis: step; y-axis: surface-content diversity (entropy over reel types in the top-10 surface; 1.0 = max diversity, 0 = all one type). Series: conditional P(engage | bubble impression) and unconditional P(engage, bubble impression). The unconditional ranker's diversity collapses to 0.12 after about 140 steps; the conditional ranker stays stable at 0.85, avoiding a filter bubble.]

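The collapse is not a bug, it is an attractor: wherever there is a feedback loop between sample selection and reward signal, an unconditional model tends to reach this fixed point after enough online steps. In practice it shows up as metrics slowly degrading and quality slowly declining; underneath, it behaves like an irreversible process, in a loose thermodynamic analogy. By writing the ranker as a conditional probability, Reel Friends isolates selection bias from the representation, so online updates do not take the model's own preferences for user preferences and reinforce them.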
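The attractor can be reproduced with a much simpler deterministic toy than the article's figure. The sketch below is my own construction, not the article's simulation, and it will not reproduce the 0.12 / 0.85 values: five reel types with slightly different true engagement rates (invented numbers), and an exposure distribution that is re-derived from the ranker's scores at every step.

```python
import math


def normalized_entropy(p):
    """Entropy of a distribution scaled to [0, 1]; 1.0 means fully diverse."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))


def simulate(conditional: bool, true_rate, steps: int = 200) -> float:
    n = len(true_rate)
    exposure = [1.0 / n] * n  # start by showing every reel type equally
    for _ in range(steps):
        # What the logs contain: P(engage, shown) = P(shown) * P(engage | shown).
        joint = [e * q for e, q in zip(exposure, true_rate)]
        if conditional:
            # Divide the exposure back out: score depends only on the true rate.
            score = [j / e for j, e in zip(joint, exposure)]
        else:
            # Score carries the previous exposure, so popularity feeds itself.
            score = joint
        total = sum(score)
        exposure = [s / total for s in score]
    return normalized_entropy(exposure)


TRUE_RATE = [0.30, 0.28, 0.26, 0.24, 0.22]  # hypothetical per-type engagement
diversity_unconditional = simulate(conditional=False, true_rate=TRUE_RATE)
diversity_conditional = simulate(conditional=True, true_rate=TRUE_RATE)
```

The unconditional loop is a power iteration: each step multiplies exposure by the true rate again, so a 2-point edge compounds until one type owns the surface. The conditional loop has nothing to compound and stays near uniform.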

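Integrity and engagement are trained jointly inside this MTML: on top of the same backbone sit an engagement head and an integrity head. The benefit of joint training over a post-filter is alignment at the representation level: the backbone has to serve both engagement prediction and integrity classification, which keeps it from learning a shortcut representation of "high engagement but integrity risk". The post-filter approach trains a pure engagement model first and then filters with an independent classifier. That fails repeatedly at the edges of the distribution, because the engagement model pushes its representation right up against the integrity classifier's decision boundary. At inference time there is usually also an integrity veto: once the integrity score exceeds a threshold, the reel is not surfaced no matter how high its engagement score.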
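The inference-time veto is simple enough to sketch. The field names and the 0.8 threshold are hypothetical; the only point is that the veto is applied before ranking and is not traded off against engagement.

```python
INTEGRITY_THRESHOLD = 0.8  # hypothetical cutoff on the integrity-risk score


def surface(candidates, limit):
    """Drop vetoed candidates first, then rank the rest by engagement."""
    allowed = [c for c in candidates if c["integrity_risk"] <= INTEGRITY_THRESHOLD]
    return sorted(allowed, key=lambda c: c["engagement"], reverse=True)[:limit]


# Both scores would come out of the same forward pass of the shared backbone.
candidates = [
    {"reel": "r1", "engagement": 0.95, "integrity_risk": 0.90},  # vetoed
    {"reel": "r2", "engagement": 0.70, "integrity_risk": 0.10},
    {"reel": "r3", "engagement": 0.80, "integrity_risk": 0.20},
]
surfaced = [c["reel"] for c in surface(candidates, limit=2)]
```

`r1` has the highest engagement score and still never surfaces, which is the behavior the paragraph describes.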

Forcing function: video prefetch window

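On paper the architecture looks like a layered design chosen for ML clarity; in reality it is forced by the prefetch window, a hard wall-clock budget. The Reels video prefetch window is an existing, finely tuned time window, roughly on the order of hundreds of milliseconds, that covers the ranker deciding the next video, the CDN edge pushing video bytes to the client, and metadata riding along. All server-side computation for the friend bubble has to finish inside this window.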

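The three non-negotiables, worked backwards, determine everything else: smooth scrolling means the main thread cannot block; no regression in load latency means metadata and bubble UI all have to fit into the time window already reserved for video prefetch; low CPU overhead for metadata fetch means client-side parsing has to be minimal. Concretely, friend-bubble metadata is "pinned to that same prefetch window": prefetched together with the video bytes, with no separate request, no separate TCP connection, no separate endpoint. Pin is not the same as attach: pinned means fixed in place and inseparable.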

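This directly constrains every upstream component: closeness cannot be inferred online and has to be a precomputed feature lookup; the first retrieval layer has to be an indexed lookup rather than a graph traversal, because you cannot run a graph query like "BFS over my friends and fetch their recent activity" inside the prefetch window; embedding matching has to go through ANN rather than exact search; the ranker's early stage has to use cheap features. All the layering, all the precomputation, all the freshness trade-offs exist so that the final stretch of online compute fits into this wall clock. The table below lines up each signal's freshness band / storage / compute / latency budget.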

Each signal has to live in the storage / compute layer that matches its freshness; the cost of a mismatch is paid in either latency or cost.
signal / stage	freshness band	storage / lookup	compute pattern	latency budget
closeness · structural	weekly	offline feature store	batch graph inference, trillions of edges	N/A (offline)
closeness · behavioral	minute–hour	streaming feature store	incremental update from interaction stream	N/A (offline)
retrieval · social filter	per-request	indexed lookup on precomputed top-K friends	exact filter, no ANN	~10–30 ms
retrieval · embedding scorer	per-request	ANN index on video embeddings	two-tower dot product, hundreds of candidates	~20–50 ms
ranker · early-stage	per-request	shared MTML backbone, cheap features only	shallow forward pass on full pool	~30–60 ms
ranker · late-stage	per-request	same MTML backbone + real-time features	full forward pass on shortlist	~40–80 ms
integrity head	joint-trained	auxiliary head on shared backbone	same forward pass, no extra inference	0 (free ride)
online learning · update	minute	streaming trainer + parameter server	incremental gradient on (impression, interaction) pairs	N/A (off path)

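[Interactive figure: structural closeness is weekly and offline, behavioral closeness is minute-level streaming, the filter takes about 30 ms, the ranker about 80 ms; freshness determines the storage layer.]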

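The integrity head's latency budget is 0: the same forward pass outputs the integrity score as a by-product, the free lunch that representation-level joint training gives you at serving time. The post-filter approach would need another slice of the window for an independent classifier. Both closeness models are N/A too: what request time gets is always a lookup, never inference. The alternative design that puts closeness inference on the request path would turn that ~10 ms lookup into ~100 ms, eating a sizable fraction of the prefetch window.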

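The latency slider below lays out the per-request budget stage by stage, from closeness lookup to metadata packaging. Push the total close to or past the red line of the prefetch window and see which stage gives way first.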

[Interactive figure: latency budget sliders against a 250 ms red line. Default values: closeness lookup 10 ms, retrieval · social filter 20 ms, embedding scorer 30 ms, ranker · early-stage 40 ms, ranker · late-stage 60 ms, metadata + network 10 ms; total wall clock 170 ms, within the prefetch window.]

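Raise the late-stage ranker to 100 ms and the embedding scorer to 60 ms and the total reaches 240 ms, leaving only 10 ms before the 250 ms red line. The only way to win that back is to cut further upstream: for example reduce top-K friends from 100 to 50 (a smaller pool means the late stage scores fewer candidates with its expensive features), or change closeness from "compute then look up" to "cache then look up" (lookup drops from 10 ms to 3 ms). Every millisecond saved comes from an upstream design decision, not from anything the late stage can solve on its own.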
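The budget arithmetic from the slider, as a Python sketch. The 250 ms limit and the per-stage values are the widget's illustrative defaults, not measurements from Meta; the dictionary keys are my own names.

```python
PREFETCH_BUDGET_MS = 250  # the widget's red line

baseline = {
    "closeness_lookup": 10,
    "social_filter": 20,
    "embedding_scorer": 30,
    "ranker_early": 40,
    "ranker_late": 60,
    "metadata_network": 10,
}


def total_ms(stages: dict) -> int:
    """Stages run back to back, so the wall clock is the plain sum."""
    return sum(stages.values())


def headroom_ms(stages: dict) -> int:
    return PREFETCH_BUDGET_MS - total_ms(stages)


# The stressed scenario from the text: two stages get slower.
stressed = {**baseline, "ranker_late": 100, "embedding_scorer": 60}
# One upstream fix from the text: cached closeness lookup, 10 ms -> 3 ms.
cached = {**stressed, "closeness_lookup": 3}

for name, stages in [("baseline", baseline), ("stressed", stressed), ("cached", cached)]:
    print(f"{name}: total {total_ms(stages)} ms, headroom {headroom_ms(stages)} ms")
```

The baseline leaves 80 ms of headroom, the stressed case 10 ms, and the cached lookup buys back 7 ms without touching either ranker stage.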

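One more easily overlooked detail: the metadata has to be small enough not to measurably slow down the prefetch transfer. Video bytes easily run to several MB, so the metadata has to stay at the KB level to avoid affecting tail-percentile transfer latency. The metadata schema therefore has to be minimal, roughly a wire format of "list of user ids in the friend bubble + each one's reaction type + an ordering hint", which the client renders directly without any further computation. This constraint in turn limits what the server side can put into the metadata: complex ranking scores, complex explanations, and complex per-friend recommendation reasons are all out.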
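A sketch of what such a minimal payload could look like, using Python dataclasses and compact JSON. The class and field names are hypothetical, and JSON is only a stand-in; the post does not describe the real wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FriendReaction:
    user_id: int
    reaction: str  # e.g. "like", "love", "haha"


@dataclass
class BubbleMetadata:
    reel_id: int
    # Server-side ordering: list position is the ordering hint, so the
    # client renders front to back and computes nothing.
    friends: list


def encode(meta: BubbleMetadata) -> bytes:
    """Compact JSON without whitespace, as a stand-in for a real wire format."""
    return json.dumps(asdict(meta), separators=(",", ":")).encode("utf-8")


meta = BubbleMetadata(
    reel_id=123456789,
    friends=[FriendReaction(1001, "love"), FriendReaction(1002, "like")],
)
payload = encode(meta)
decoded = json.loads(payload)
```

Even in verbose JSON this payload is well under 1 KB, against video bytes measured in MB; there is deliberately no score, explanation, or per-friend reason in the schema.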

Online learning and component contracts

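The common implementation of online learning on a ranker is streaming gradient updates: a user interacts with a bubble reel, the event enters the training pipeline, and model weights are updated in near real time. This matters for freshness: if a user suddenly starts liking cooking reels today, the ranker does not have to wait for tomorrow's batch retrain to find out. But an update rate set too high gets pulled around by noise: a user happens to watch a few extra cooking reels and the ranker immediately collapses into pushing only cooking. Set too low it becomes pointless, no different from a daily batch. In practice this is handled with EMA smoothing, or with importance weighting that turns down the weight of streaming samples.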
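The update-rate trade-off shows up even in a one-number EMA. The sketch below tracks a single hypothetical "cooking affinity" value; the starting point and both alpha values are invented for illustration.

```python
def ema_update(current: float, observed: float, alpha: float) -> float:
    """Exponential moving average: alpha is the share given to the new event."""
    return (1.0 - alpha) * current + alpha * observed


def after_burst(alpha: float, start: float = 0.1, events: int = 3) -> float:
    """Affinity after a short burst of positive cooking-reel interactions."""
    value = start
    for _ in range(events):
        value = ema_update(value, 1.0, alpha)
    return value


smoothed = after_burst(alpha=0.05)  # moves, but three events are not decisive
jumpy = after_burst(alpha=0.9)      # three events are enough to saturate
```

With the small alpha the affinity rises from 0.1 to about 0.23 after three events; with the large alpha it is already above 0.99, which is the "collapses into pushing only cooking" behavior.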

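Another failure mode is catastrophic forgetting: newly arriving streaming samples are so dense in representation space that the old representation gets washed away. This is especially dangerous for social discovery: a long-term relationship with a close friend is the "old signal", last week's burst of interaction is the "new signal", and a model that overfits the new signal forgets how much the long-term relationship matters. The usual mitigation is a replay buffer: keep a pool of old samples and mix old and new in every update. The post does not mention it, but it is standard equipment in mature online learning systems.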
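A minimal replay buffer in Python, to show the "mix old and new" step. The class name, capacity, and 50% replay ratio are assumptions; the post does not describe Meta's mechanism, and real systems usually sample with more care than uniform random.

```python
import random
from collections import deque


class ReplayBuffer:
    """Keeps a bounded pool of past samples and mixes them into new batches."""

    def __init__(self, capacity: int, seed: int = 0):
        self.old = deque(maxlen=capacity)  # oldest samples fall off the left
        self.rng = random.Random(seed)

    def make_batch(self, fresh: list, replay_ratio: float = 0.5) -> list:
        # Draw replayed samples before storing the fresh ones, so a batch
        # never replays an event it already contains.
        n_old = min(len(self.old), int(len(fresh) * replay_ratio))
        batch = list(fresh) + self.rng.sample(list(self.old), n_old)
        self.old.extend(fresh)
        return batch


buffer = ReplayBuffer(capacity=1000)
week_1 = ["old-1", "old-2", "old-3", "old-4"]
week_2 = ["new-1", "new-2", "new-3", "new-4"]
first_batch = buffer.make_batch(week_1)   # buffer is empty: fresh samples only
second_batch = buffer.make_batch(week_2)  # four fresh plus two replayed
```

The second batch still contains two week-1 samples, so a burst of new interactions cannot fully displace the older signal in any single update.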

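The coupling between online learning and integrity is also where Reel Friends is much harder engineering than a generic "recommend friends' videos to friends" feature. A purely product-driven implementation would separate the two: one team builds the ranker, another builds the integrity filter. Meta's design joins them. The price is that the model team has to master the training details of both engagement metrics and integrity metrics; the payoff is a consistent representation and online updates that do not oscillate between the two objectives.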

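Looking at the five components together, the key invariant that lets the system keep iterating long-term is "each component deliberately does not know the details of its neighbors". The closeness model does not know how candidates will later be retrieved; retrieval does not know the candidates' final ranking; the ranker does not know why a reel entered the pool and scores friend and non-friend candidates with the same model; the online learning loop does not know how the ranking was produced and only sees (impression, interaction) pairs. These cross-component invariants look like ML clarity, but they are really organizational scalability: once your closeness model knows too much about retrieval internals, it becomes hard to change the closeness model without breaking retrieval; every change needs coordination across three teams, and in the end nobody dares to touch anything.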

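Conversely, each component's internal complexity can be high: the closeness model can be a hybrid GNN internally, retrieval can be a multi-stage cascade, the ranker can be a transformer with hundreds of millions of parameters. High internal complexity is fine; as long as the interfaces are clean, each component can scale up independently without affecting other teams. This is the ML-systems counterpart of the distributed-systems principle "small interface, large implementation".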

The whole thing as a naive vs real comparison

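Naive design · What Meta actually does · Why the naive design collapses

The one-sprint version of "friend boost"

Hand everything to the existing ranker and do a post-hoc rerank at the end.

closeness: binary friend / non-friend, no graded score
retrieval: content-based ANN as before; friend signals play no part in candidate generation
ranking: run the existing model, then multiply by friend_boost = 1.2 at the end
integrity: a separate post-filter classifier, deployed independently
online learning: the same ranker as before, with no conditioning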

Reel Friends' representation-level approach

Enter every model layer from first principles; joint train, joint serve.

closeness: survey-based + context-specific dual models, frequency-orthogonal fusion, weekly + streaming
retrieval: social hard filter as the first layer, embedding soft score as the second; the order cannot be reversed
ranking: MTML deployed at early + late stages, shared backbone, conditional loss P(engagement | bubble impression)
integrity: joint-trained auxiliary head on the shared backbone, consistent representation
online learning: streaming updates into closeness / ranker / integrity, conditional form absorbs selection bias

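Three collapse modes

It is not that "the naive design runs but performs worse"; it is that "a few months later it collapses to a degenerate solution".

collapse 1: binary closeness treats an old friend you have not interacted with in years the same as a coworker you chat with daily, so the bubble turns into noise
collapse 2: a post-hoc boost can only rerank reels already in the pool; friends' reels that are not in the pool never get in, so the recall ceiling is pinned by content ANN
collapse 3: online learning without conditioning accumulates selection bias into the engagement model, and the ranker slowly collapses to "always push what friends have watched"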

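[Interactive figure: the naive friend boost collapses in three ways within a few months; integrating the social signal at the representation level prevents the collapse at its source.]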

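The three collapse modes side by side: binary closeness turns the bubble into noise, post-hoc boost has its ceiling pinned by content ANN recall, and unconditional online learning snowballs selection bias. None of the three shows up in first-week metrics. They are chronic conditions whose shape only becomes visible months later, by which time the representation is already contaminated and no post-hoc patch can rescue it. Reel Friends promotes social from a post-hoc rerank signal to a first-class structural component. The price is that all five components have to be redesigned for the social-aware distribution; the payoff is consistency between social discovery and content discovery at the representation level. The former is engineering effort, the latter is a product moat: easy to underestimate, but only the latter sustains quality as users use the product more.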

The real moat is not in any single component but in the interface contracts between components: between closeness and retrieval (top-K friends list + closeness vector), between retrieval and ranker (candidate pool + pool-internal score), between ranker and serving (the metadata schema pinned to the prefetch window), and between ranker and online learning (logged (impression, interaction, condition) tuples). These four interfaces are the real value of the system design: get them right and each component can iterate independently. In this post Meta is effectively demonstrating how to define these four contracts, and that is worth far more to other teams as a reference than the model-architecture details of any single component.

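What this enables: promoting social from an after-the-fact rerank signal to a first-class component modeled jointly across the retrieval / ranking / online learning layers. Reels that friends watched are no longer a boost switch behind the ranker; the whole distribution is social-aware from upstream on. That is the essential gap between social discovery and a friend boost.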