Introduction
In September 2024, I embarked on an ambitious project to develop an intelligent traffic assistant system that would enhance driver safety and convenience. The goal was to create a system that could automatically detect when traffic lights turn green, alerting drivers who might be distracted. What started as a seemingly straightforward computer vision project evolved into a deep exploration of modern machine learning architectures, systematic debugging, and infrastructure optimization.
Project Genesis and Initial Architecture
The core concept was simple: a camera-based system that would alert a driver stopped at a red light the moment it turned green. After an extensive literature review of traffic light detection systems (covering 15 research papers), I discovered LightFormer, a cutting-edge model for end-to-end right-of-way detection whose authors had relabeled the Bosch and LISA traffic light datasets for the task.
I began by designing a Finite State Machine (FSM) that would:
- Continuously monitor right-of-way status for both straight and left turns, using LightFormer
- Detect when the vehicle comes to a stop, using a motion detection module
- Monitor the appropriate signal based on the vehicle's lane position
- Trigger an alert when right-of-way becomes available
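The FSM above can be sketched in a few lines. This is an illustrative reduction, not the project's actual implementation: the state and class names are hypothetical, and the `stopped` / `has_right_of_way` inputs stand in for the motion-detection and LightFormer modules respectively.

```python
from enum import Enum, auto


class State(Enum):
    DRIVING = auto()    # vehicle in motion, no alert needed
    WAITING = auto()    # stopped without right of way, watching the signal
    ALERTING = auto()   # right of way just became available


class GreenLightFSM:
    """Minimal sketch of the alerting state machine."""

    def __init__(self):
        self.state = State.DRIVING

    def step(self, stopped: bool, has_right_of_way: bool) -> State:
        if self.state is State.DRIVING:
            if stopped and not has_right_of_way:
                self.state = State.WAITING
        elif self.state is State.WAITING:
            if not stopped:
                self.state = State.DRIVING
            elif has_right_of_way:
                self.state = State.ALERTING  # fire the driver alert here
        elif self.state is State.ALERTING:
            # one-shot alert; return to DRIVING once the vehicle moves
            if not stopped:
                self.state = State.DRIVING
        return self.state
```

Keeping the alert logic in an explicit FSM like this separates "when to alert" from the perception models that feed it.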
Deep Dive into Model Architecture
The project took an interesting turn when I encountered issues with the original LightFormer implementation. Rather than treating these as roadblocks, I saw an opportunity for deeper learning. I made the deliberate choice to reimplement the entire model in pure PyTorch, stripping away the Lightning framework. This decision proved invaluable as it forced me to:
- Dissect and understand each layer of the neural architecture
- Refresh my understanding of fundamental concepts (CNNs, Transformers, RNNs)
- Master advanced concepts like Sub-center ArcFace
- Gain hands-on experience with video classification architecture components
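Stripping away Lightning boils down to writing the training step yourself. The sketch below shows the shape of such a loop with a toy stand-in model (the real LightFormer is far larger); everything here except the general pattern is illustrative.

```python
import torch
from torch import nn

# Toy stand-in for the backbone; the real model is a video transformer.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits, not probabilities


def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One explicit optimization step -- what Lightning hides behind fit()."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = loss_fn(logits, labels)
    loss.backward()   # backprop through the whole graph
    optimizer.step()  # parameter update
    return loss.item()


# Fake batch: 4 image crops with binary right-of-way labels.
images = torch.randn(4, 3, 8, 8)
labels = torch.tensor([0, 1, 0, 1])
losses = [train_step(images, labels) for _ in range(50)]
```

Owning this loop is what makes it possible to inspect gradients, activations, and loss terms at any point, which paid off during debugging later.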
Systematic Debugging and Optimization
When the model exhibited mode collapse during training, I implemented a systematic debugging approach that showcases my problem-solving methodology:
- Data Analysis: Implemented weighted sampling to address class imbalance
- Architecture Validation: Identified and fixed a critical double softmax bug
- Training Dynamics: Monitored activations for gradient issues
- Hyperparameter Optimization: Implemented learning rate finding and scheduling
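Two of the fixes above can be shown concretely. The sketch below (toy data, illustrative shapes) demonstrates inverse-frequency weighted sampling with PyTorch's `WeightedRandomSampler`, and the double-softmax bug: `CrossEntropyLoss` applies log-softmax internally, so feeding it already-softmaxed outputs flattens the loss surface and starves the gradients.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Imbalanced toy labels: 90 negatives vs 10 positives.
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
features = torch.randn(100, 4)

# Weighted sampling: each sample's weight is the inverse of its class
# frequency, so minority-class examples are drawn as often as majority ones.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=10, sampler=sampler)

# The double-softmax bug: CrossEntropyLoss already log-softmaxes its input.
logits = torch.randn(10, 2)
targets = torch.randint(0, 2, (10,))
loss_buggy = nn.CrossEntropyLoss()(torch.softmax(logits, dim=1), targets)  # wrong
loss_fixed = nn.CrossEntropyLoss()(logits, targets)  # correct: raw logits in
```

A tell-tale symptom of the double softmax is a loss that plateaus near its initial value while accuracy collapses to the majority class.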
Infrastructure and Tools Development
Alongside the core ML work, I developed several supporting systems that demonstrate my full-stack capabilities:
Necromancer: Cost-Optimized Training Infrastructure
- Developed a system for running training jobs on GCP spot instances
- Achieved 60% cost reduction in cloud computing expenses
- Implemented automatic training resumption using managed instance groups
- Integrated with Google Cloud Storage for seamless state management
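The resumption mechanism hinges on checkpoints that capture everything needed to continue after a preemption. The sketch below shows the core save/resume pattern; a local temp path stands in for the GCS bucket the real system used, and the names are illustrative.

```python
import os
import tempfile

import torch
from torch import nn

# Local stand-in for the GCS checkpoint location.
ckpt_path = os.path.join(tempfile.gettempdir(), "necromancer_ckpt.pt")

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def save_checkpoint(epoch: int) -> None:
    """Persist everything a replacement instance needs to continue."""
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optim": optimizer.state_dict()}, ckpt_path)


def resume() -> int:
    """Return the epoch to start from: 0 on a fresh run, else saved + 1."""
    if not os.path.exists(ckpt_path):
        return 0
    state = torch.load(ckpt_path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])
    return state["epoch"] + 1


save_checkpoint(epoch=4)   # spot instance preempted after epoch 4...
start_epoch = resume()     # ...the replacement instance picks up at epoch 5
```

With a managed instance group recreating preempted VMs, an unconditional `resume()` at startup makes the training job self-healing.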
Development Tools
- Implemented comprehensive TensorBoard monitoring
- Created a preprocessing pipeline for efficient data handling
- Developed a configuration system for parallel experiment management
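One lightweight way to manage parallel experiments is to express each run as a small variation on a frozen base config. This is a hypothetical sketch of that pattern, not the project's actual config schema.

```python
from dataclasses import asdict, dataclass, replace


# Hypothetical config shape -- the real project's fields differ.
@dataclass(frozen=True)
class ExperimentConfig:
    lr: float = 3e-4
    batch_size: int = 32
    backbone: str = "resnet18"
    run_name: str = "baseline"


# Parallel experiments as named variations on a shared base.
base = ExperimentConfig()
sweep = [
    replace(base, run_name="lr-1e-3", lr=1e-3),
    replace(base, run_name="big-batch", batch_size=128),
]
configs = {cfg.run_name: asdict(cfg) for cfg in sweep}
```

Frozen dataclasses make configs hashable and tamper-proof mid-run, and `asdict` gives a serializable record to log alongside each experiment's TensorBoard output.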
Results and Future Directions
The modified LightFormer architecture achieved ~70% accuracy on both straight and turn classifications. While this fell short of the state-of-the-art result I initially aimed for, the process deepened my understanding of modern ML architectures. The project also pointed me toward new frontiers and approaches that I plan to explore:
- Foundation Model Adaptation: Investigating pre-trained vision models specifically tuned for traffic scenarios
- Multimodal Integration: Exploring models that can incorporate multiple types of traffic-related data
- Novel Architecture Development: Planning a new architecture (ChimeFormer) based on insights gained
Key Learnings and Technical Achievements
This project helped me develop several critical skills:
- Deep understanding of modern ML architectures
- Systematic debugging and optimization
- Infrastructure development and cost optimization
- Practical implementation of academic research