AI Blog #2: Rapid Listing - Building Multi-modal AI Application [English Blog]


A Technical Journey from Traditional Programming to LLM Integration - Team Collaboration, Challenges, Innovation, and Our User-First Approach.

Transforming Traditional Marketplace Listings

What began as “Garage Sale” (Chợ Phiên) evolved into “Rapid Listing” (Đăng nhanh bằng AI)—a name that better reflected our ambitious vision of integrating AI technology into the core functionality of our Chợ Tốt mobile app. This project represented more than just a new feature; it was a multi-modal AI product, marking a fundamental shift from traditional deterministic programming to the probabilistic world of Large Language Models (LLMs).

At Chợ Tốt, our user-first philosophy drives every innovation. We recognized that traditional listing creation was time-consuming and often intimidating for sellers. This user pain point became the catalyst for Rapid Listing—transforming a complex, multi-step process into something as simple as pointing a camera at an item.

AI Quét Là Bán (“AI Scan to Sell”)

The Technical Foundation: From Idea to Implementation for the iOS App

Brainstorming and Architecture Design

The concept was revolutionary for our marketplace—users could simply point their camera at an item, and our AI would automatically generate a high-quality listing with extracted product information, category mapping, and pricing suggestions.

The initial architecture involved several key components:

  • Real-time camera feed processing using Apple’s AVFoundation and Core Media frameworks
  • Multi-modal data extraction (text, audio, video, image)
  • Hybrid intelligence model combining on-device and cloud AI
  • Probabilistic UI design to handle non-deterministic AI responses

Research and Dataset Preparation

Our Data Engineering team executed a comprehensive data collection and annotation pipeline:

  • Multi-category taxonomy spanning diverse product verticals
  • Large-scale dataset with thousands of labeled video samples
  • Automated quality validation pipelines ensuring consistent annotation standards

This dataset became the foundation for training our AI models to accurately extract attributes and map products to Chợ Tốt’s specific category taxonomy.

Technical Challenges and Solutions

Transitioning from Deterministic to Probabilistic Programming

The Problem: Traditional mobile app development follows a predictable, deterministic pattern: fetch data from an API, decode it into a strict data model, and display it in a predefined UI. LLM-driven development shatters this paradigm.

Note: This blog highlights case studies and architecture decisions for iOS, but the same principles and workflow apply to Android.

The Solution: We developed a flexible data ingestion layer, built on resilient parsing, that could handle:

  • Variable AI outputs with optional fields
  • Unexpected response formats
  • Partial or incomplete data

Instead of “fail-fast” decoding, we embraced graceful degradation and suggestion-based interfaces.
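To make the contrast with “fail-fast” decoding concrete, here is a minimal, language-agnostic sketch written in Python for brevity. The field names (title, category, price, description), the ListingDraft type, and the price format are assumptions for illustration; the post does not describe the real Rapid Listing schema or parser.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical field names; the real schema is not given in this post.
KNOWN_FIELDS = ("title", "category", "price", "description")


@dataclass
class ListingDraft:
    title: Optional[str] = None
    category: Optional[str] = None
    price: Optional[int] = None
    description: Optional[str] = None
    missing: list = field(default_factory=list)  # fields the user still has to fill in
    extras: dict = field(default_factory=dict)   # unknown keys are kept, not rejected


def _extract_json_object(raw: str) -> dict:
    """Return the JSON object embedded in raw text, or {} if nothing parses."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        value = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return value if isinstance(value, dict) else {}


def _to_price(value: Any) -> Optional[int]:
    """Accept 1500000 or "1.500.000 đ" (assumed whole-number amounts); else None."""
    if isinstance(value, bool):
        return None
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    if isinstance(value, int):
        return value if value >= 0 else None
    if isinstance(value, str):
        digits = re.sub(r"\D", "", value)
        return int(digits) if digits else None
    return None


def parse_llm_listing(raw: str) -> ListingDraft:
    """Build a draft from whatever the model returned; never raise on bad input."""
    data = _extract_json_object(raw)
    draft = ListingDraft()
    for name in ("title", "category", "description"):
        value = data.get(name)
        if isinstance(value, str) and value.strip():
            setattr(draft, name, value.strip())
    draft.price = _to_price(data.get("price"))
    draft.missing = [n for n in KNOWN_FIELDS if getattr(draft, n) is None]
    draft.extras = {k: v for k, v in data.items() if k not in KNOWN_FIELDS}
    return draft
```

A response that wraps the JSON in chatty text, omits fields, or returns the price as a formatted string still yields a usable draft. The missing list tells the UI which fields to ask the user for instead of showing an error.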

Real-time Multi-modal Data Processing

The Problem: Unlike our previous straightforward use of AVFoundation for video playback, Rapid Listing demanded a sophisticated real-time data extraction pipeline.

At Chợ Tốt, we have a dedicated Video Hub section where users can upload and share videos related to their listings, enhancing the overall user experience and engagement on the platform. We leveraged this existing engineering knowledge and expertise to build the Rapid Listing feature, applying advanced real-time video processing to connect with our AI services while ensuring a seamless user experience.

For Rapid Listing, we needed more than that:

  • Process CMSampleBuffer objects, the Apple Core Media type that carries media samples, directly from the camera stream
  • Extract high-quality frames without blocking the main thread
  • Synchronize video, audio, and image data streams
  • Intelligently select relevant frames to minimize costs and latency

“Sample buffers are Core Foundation objects that the system uses to move media sample data through the media pipeline.” – CMSampleBuffer documentation


The Solution: We architected a multi-tiered processing system:

Multi-tiered processing system

Hybrid Intelligence Architecture

Relying solely on server-side AI would introduce unacceptable latency and a poor user experience. We implemented a hybrid model to mitigate this:

Tier 1: On-Device Intelligence (TensorFlow Lite)

  • Real-time object detection and user guidance
  • Product category mapping
  • Local feedback enhancement
  • ~26ms inference time on 224x224 image crops
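As a rough worked example of what a ~26 ms inference time means for a live camera feed, the sketch below computes how many frames to skip between inferences. The camera frame rates (30 and 60 fps) and the function name are assumptions, and the calculation ignores pre- and post-processing overhead. It is written in Python for brevity and is not the app's actual scheduling logic.

```python
import math


def inference_stride(inference_ms: float, camera_fps: float) -> int:
    """Run the model on every n-th frame so inference keeps up with the camera."""
    if inference_ms <= 0 or camera_fps <= 0:
        raise ValueError("inference_ms and camera_fps must be positive")
    # Frames that arrive while one inference is running, rounded up.
    return max(1, math.ceil(inference_ms * camera_fps / 1000.0))


print(inference_stride(26, 30))  # 26 ms fits in a 33.3 ms frame interval -> 1
print(inference_stride(26, 60))  # 26 ms exceeds a 16.7 ms frame interval -> 2
```

Under these assumptions, the model can keep pace with every frame at 30 fps, while at 60 fps it would process every second frame.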

Tier 2: Remote LLM for Deep Inference

  • Complex attribute extraction
  • Advanced and fine-grained category mapping
  • Price estimation and validation
  • Quality assurance against platform guidelines

System Architecture Diagrams

System Architecture Breakdown

Real-time Capture and Processing Flow

This diagram shows the real-time multi-modal capture process.

Capture Phase: Users point their camera at items, triggering continuous video streaming and audio recording from the device sensors.

Real-time Processing: The on-device TensorFlow Lite model runs object detection (~26 ms latency) in a continuous loop, while the real-time UI gives users immediate category guidance and visual feedback. The UI also shows question prompts that invite users to describe aloud the product they want to sell; this spoken description is optional.

Intelligent Selection: The system automatically selects the best frames from the video stream and segments relevant audio portions, preparing optimized multi-modal data for further cloud processing and listing generation.
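The post does not say how the “best” frames are chosen, so the following is only one plausible sketch: score each frame by a simple sharpness measure and greedily keep the sharpest ones that are spread out in time. The Frame type, the gradient-based score, and the k and min_gap parameters are all assumptions, written in Python for brevity.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Frame:
    timestamp: float                 # seconds since capture started
    pixels: Sequence[Sequence[int]]  # rectangular grayscale rows, values 0..255


def sharpness(pixels: Sequence[Sequence[int]]) -> float:
    """Mean squared difference between neighbouring pixels (higher = sharper)."""
    total, count = 0, 0
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                total += (row[x + 1] - value) ** 2
                count += 1
            if y + 1 < len(pixels):
                total += (pixels[y + 1][x] - value) ** 2
                count += 1
    return total / count if count else 0.0


def select_frames(frames: List[Frame], k: int = 3, min_gap: float = 0.5) -> List[Frame]:
    """Greedily keep up to k of the sharpest frames, at least min_gap seconds apart."""
    ranked = sorted(frames, key=lambda f: sharpness(f.pixels), reverse=True)
    chosen: List[Frame] = []
    for frame in ranked:
        if len(chosen) == k:
            break
        if all(abs(frame.timestamp - c.timestamp) >= min_gap for c in chosen):
            chosen.append(frame)
    return sorted(chosen, key=lambda f: f.timestamp)
```

Sending a handful of sharp, well-spaced frames instead of the whole stream is what keeps upload size, cloud cost, and latency down. A production pipeline would more likely compute the score on downscaled pixel buffers with a vectorized routine.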

System Overview - Component Interaction

This system enables users to create optimized listings using AI-powered analysis of mobile camera and audio inputs.

The architecture balances on-device processing for speed with cloud-based AI for sophisticated analysis, creating an efficient pipeline from raw mobile content to professional marketplace listings.

Key Technical Hurdles

Our existing Video Hub experience provided the foundation, but Rapid Listing required real-time processing capabilities.

We optimized our video processing pipeline to handle:

  • Continuous frame extraction from camera streams
  • Intelligent frame selection
  • Memory-efficient processing to prevent device overheating
  • Seamless integration with AI inference services

UI/UX for Probabilistic Systems

The non-deterministic nature of LLM responses forced a fundamental rethink of our UI/UX approach:

  • Traditional approach: Fixed data model → Rigid interface
  • AI-first approach: Variable data → Adaptive, suggestion-based UI

We designed the interface to present AI-generated listings as pre-filled, editable drafts, managing user expectations while inviting review and refinement.
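One way to model a pre-filled, editable draft is to track where each field's value came from, so that a later AI refinement never overwrites something the user has typed. The sketch below, in Python for brevity, uses hypothetical names (DraftField, EditableDraft, the "ai" and "user" source labels); it illustrates the idea and is not the app's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DraftField:
    value: Optional[str] = None
    source: str = "empty"  # "empty", "ai", or "user"


@dataclass
class EditableDraft:
    fields: Dict[str, DraftField] = field(default_factory=dict)

    def apply_suggestions(self, suggestions: Dict[str, Optional[str]]) -> None:
        """Pre-fill from AI output, but never overwrite what the user typed."""
        for name, value in suggestions.items():
            current = self.fields.setdefault(name, DraftField())
            if current.source != "user" and value:
                current.value, current.source = value, "ai"

    def edit(self, name: str, value: str) -> None:
        """Record a user edit; it wins over any later AI suggestion."""
        self.fields[name] = DraftField(value=value, source="user")

    def needs_review(self) -> List[str]:
        """Fields the UI should highlight: AI-filled or still empty."""
        return sorted(n for n, f in self.fields.items() if f.source != "user")
```

With this shape, the UI can style AI-filled fields as suggestions and leave user-confirmed fields untouched when a slower, more detailed AI response arrives.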

Integration Challenges

Coordinating on-device ML models, real-time camera processing, and remote LLM services required meticulous optimization:

  • Threading management to prevent UI blocking
  • Memory optimization for ML model efficiency
  • Network resilience for remote AI calls
  • Error handling for probabilistic systems
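For the network resilience point, a common pattern is to retry the remote call with exponential backoff and fall back to the on-device result if every attempt fails. The sketch below assumes that pattern. The function name, retry count, delays, and the exception types treated as retryable are illustrative choices, written in Python for brevity.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    remote_call: Callable[[], T],
    fallback: T,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try remote_call up to `attempts` times, doubling the wait after each failure.

    Returns `fallback` (for example, the on-device category) if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return remote_call()
        except (ConnectionError, TimeoutError):
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return fallback
```

The sleep parameter is injectable so the behaviour can be tested without real delays. In the hybrid model described earlier, the fallback would be whatever the on-device tier has already produced, so the user still gets a draft to edit.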

Project Management: Agile in an AI World

We adopted the Kanban methodology to manage the inherent uncertainty of R&D and AI development.

Prototype -> MVP -> Final Product Development Cycle

This approach proved essential for:

  • Coordination across multiple teams
  • Iterative development with rapid prototyping and testing
  • Stakeholder alignment across multiple work-groups
  • Risk management in uncertain AI development timelines

Collaboration was crucial: this was a full cross-workgroup initiative involving multiple teams.

The Result

Rapid Listing with AI (Đăng nhanh bằng AI) helps sellers create listings more easily and conveniently with AI assistance. The AI feature is integrated directly into our current listing flow, automatically suggesting and optimizing content to save users' time and improve listing quality.

Learn more in our official Help Guide: Đăng nhanh bằng AI

Looking Forward

The success of Rapid Listing reinforces our commitment to user-first innovation. Every AI integration we plan stems from real user needs we've identified across Chợ Tốt's platform. This project proved that when we combine cutting-edge technology with deep user empathy, we can create experiences that truly transform how people interact with our marketplace.


Acronyms and Technical Terms

  • Rapid Listing: The "Đăng nhanh bằng AI" ("Post Quickly with AI") feature of the Chợ Tốt app
  • AI: Artificial Intelligence - Computer systems that can perform tasks typically requiring human intelligence
  • API: Application Programming Interface - A set of protocols for building software applications
  • AVFoundation and Core Media: Apple’s multimedia frameworks for working with audiovisual media.
  • LLM: Large Language Model - AI models trained on vast amounts of text data
  • ML: Machine Learning - A subset of AI that enables computers to learn from data
  • R&D: Research & Development - Work directed toward innovation and new product development
  • TensorFlow Lite: Google’s lightweight machine learning framework for mobile devices, now called LiteRT (short for Lite Runtime). It is cross-platform and runs on both Android and iOS.
  • UI: User Interface - The visual elements through which users interact with an application
  • UX: User Experience - The overall experience a user has when interacting with a product

Thank you!

