EgoLife: Towards Egocentric Life Assistant

Jingkang Yang, Shuai Liu, Hongming Guo, Yuhao Dong, Xiamengwei Zhang, Sicheng Zhang, Pengyun Wang, Zitang Zhou, Binzhu Xie, Ziyue Wang, Bei Ouyang, Zhengyu Lin, Marco Cominelli, Zhongang Cai, Yuanhan Zhang, Peiyuan Zhang, Fangzhou Hong, Joerg Widmer, Francesco Gringoli, Lei Yang, Bo Li, Ziwei Liu

2025-03-07

Summary

This paper introduces EgoLife, a project that aims to create a smart personal assistant: AI-powered glasses that can understand your daily activities and help with them.

What's the problem?

Current AI assistants don't fully understand our daily lives from our own perspective, which limits how helpful they can be. They also struggle to remember events over long stretches of time and to recognize the different people we interact with.

What's the solution?

The researchers had six people live together and wear AI glasses for a week, recording everything they did. This produced a 300-hour dataset of video from the wearers' point of view. They then used this data to build two AI systems: EgoGPT, which understands what's happening in each video clip, and EgoRAG, which retrieves past moments so the assistant can answer questions spanning long periods of time. Together, these form the EgoButler system.
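To make that division of labor concrete, here is a minimal sketch of how an EgoButler-style pipeline could be wired together. Everything here is illustrative: the names `EgoButlerSketch`, `caption_clip`, `embed`, and `generate_answer` are hypothetical stand-ins, not the authors' actual API. EgoGPT's role is played by a captioner that turns each short clip into text, and EgoRAG's role by a retriever that searches those captions when a question arrives.

```python
from dataclasses import dataclass


@dataclass
class ClipMemory:
    timestamp: float  # seconds since recording started
    caption: str      # text description of the clip (the "EgoGPT" output)


class EgoButlerSketch:
    """Hypothetical sketch of an EgoButler-style pipeline:
    a captioner summarizes clips, a retriever answers questions over them."""

    def __init__(self, caption_clip, embed, generate_answer):
        # caption_clip: video clip -> text   (stand-in for EgoGPT)
        # embed: text -> vector              (any sentence embedder)
        # generate_answer: (question, context) -> text (stand-in QA model)
        self.caption_clip = caption_clip
        self.embed = embed
        self.generate_answer = generate_answer
        self.memory: list[ClipMemory] = []

    def ingest(self, clip, timestamp: float) -> None:
        """Captioning step: turn each short clip into searchable text."""
        self.memory.append(ClipMemory(timestamp, self.caption_clip(clip)))

    def ask(self, question: str, top_k: int = 5) -> str:
        """Retrieval step: fetch the most relevant clips, then answer."""
        q = self.embed(question)
        scored = sorted(
            self.memory,
            key=lambda m: cosine(q, self.embed(m.caption)),
            reverse=True,
        )
        context = "\n".join(
            f"[t={m.timestamp:.0f}s] {m.caption}" for m in scored[:top_k]
        )
        return self.generate_answer(question, context)


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0
```

The point this sketch illustrates is that, at question time, the assistant never re-watches raw footage; it searches a compact text memory instead, which is what makes recall over a whole week tractable.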

Why it matters?

This matters because it could lead to AI assistants that truly understand our daily lives and can help us in more meaningful ways. Imagine having a personal assistant that can remind you of things you forgot, give you health advice based on your habits, or offer personalized recommendations. By sharing their work, the researchers hope to inspire more advances in this field, bringing us closer to having smart, helpful AI companions in our everyday lives.

Abstract

We introduce EgoLife, a project to develop an egocentric life assistant that accompanies and enhances personal efficiency through AI-powered wearable glasses. To lay the foundation for this assistant, we conducted a comprehensive data collection study where six participants lived together for one week, continuously recording their daily activities - including discussions, shopping, cooking, socializing, and entertainment - using AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references. This effort resulted in the EgoLife Dataset, a comprehensive 300-hour egocentric, interpersonal, multiview, and multimodal daily life dataset with intensive annotation. Leveraging this dataset, we introduce EgoLifeQA, a suite of long-context, life-oriented question-answering tasks designed to provide meaningful assistance in daily life by addressing practical questions such as recalling past relevant events, monitoring health habits, and offering personalized recommendations. To address the key technical challenges of (1) developing robust visual-audio models for egocentric data, (2) enabling identity recognition, and (3) facilitating long-context question answering over extensive temporal information, we introduce EgoButler, an integrated system comprising EgoGPT and EgoRAG. EgoGPT is an omni-modal model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. EgoRAG is a retrieval-based component that supports answering ultra-long-context questions. Our experimental studies verify their working mechanisms and reveal critical factors and bottlenecks, guiding future improvements. By releasing our datasets, models, and benchmarks, we aim to stimulate further research in egocentric AI assistants.
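The abstract does not spell out EgoRAG's internals, but one plausible way a retrieval component could cope with ultra-long-context questions over roughly 300 hours of footage is coarse-to-fine search: score coarse day-level summaries first, then search individual clip captions only within the best-matching days. The sketch below assumes exactly that structure; `embed`, the data layout, and all names are hypothetical illustrations, not the paper's method.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def coarse_to_fine_search(question, days, embed, top_days=2, top_clips=5):
    """Hypothetical coarse-to-fine retrieval over a week of recordings.

    `days` maps a day label to a (day_summary, clip_captions) pair.
    Stage 1 ranks whole-day summaries against the question; stage 2
    ranks individual clip captions, but only inside the selected days,
    so no query ever scans all 300 hours at once.
    """
    q = embed(question)

    # Stage 1: pick the days whose summaries best match the question.
    ranked = sorted(
        days.items(),
        key=lambda kv: cosine(q, embed(kv[1][0])),
        reverse=True,
    )[:top_days]

    # Stage 2: rank clip captions within those days only.
    candidates = [(day, cap) for day, (_, caps) in ranked for cap in caps]
    candidates.sort(key=lambda dc: cosine(q, embed(dc[1])), reverse=True)
    return candidates[:top_clips]
```

Under this assumed design, adding a level (week summaries above day summaries, hour summaries below) extends the same idea to even longer recordings without changing the query-time cost much.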