Introduction
In September 2024, I embarked on an ambitious project to develop an intelligent traffic assistant system that would enhance driver safety and convenience. The goal was to create a system that could automatically detect when traffic lights turn green, alerting drivers who might be distracted. What started as a seemingly straightforward computer vision project evolved into a deep exploration of modern machine learning architectures, systematic debugging, and infrastructure optimization.
Project Genesis and Initial Architecture
The core concept was simple: a camera-based system that would alert a driver stopped at a red light the moment it turned green. After an extensive literature review of traffic light detection systems (covering 15 research papers), I discovered LightFormer, a cutting-edge model for end-to-end right-of-way detection whose authors had relabeled the Bosch and LISA traffic light datasets for the task.
I began by designing a Finite State Machine (FSM) that would:
- Continuously monitor right-of-way status for both straight and left turns, using LightFormer
- Detect when the vehicle comes to a stop, using a motion detection module
- Monitor the appropriate signal based on the vehicle's lane position
- Trigger an alert when right-of-way becomes available
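The FSM above can be sketched in a few lines. This is an illustrative reduction, not the project's actual implementation: the state and class names are hypothetical, and the `stopped` / `has_right_of_way` inputs stand in for the motion-detection and LightFormer modules respectively.

```python
from enum import Enum, auto


class State(Enum):
    DRIVING = auto()    # vehicle in motion, no alert needed
    WAITING = auto()    # stopped without right of way, watching the signal
    ALERTING = auto()   # right of way just became available


class GreenLightFSM:
    """Minimal sketch of the alerting state machine."""

    def __init__(self):
        self.state = State.DRIVING

    def step(self, stopped: bool, has_right_of_way: bool) -> State:
        if self.state is State.DRIVING:
            if stopped and not has_right_of_way:
                self.state = State.WAITING
        elif self.state is State.WAITING:
            if not stopped:
                self.state = State.DRIVING
            elif has_right_of_way:
                self.state = State.ALERTING  # fire the driver alert here
        elif self.state is State.ALERTING:
            # one-shot alert; return to DRIVING once the vehicle moves
            if not stopped:
                self.state = State.DRIVING
        return self.state
```

Keeping the alert logic in an explicit FSM like this separates "when to alert" from the perception models that feed it.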
Deep Dive into Model Architecture
The project took an interesting turn when I encountered issues with the original LightFormer implementation. Rather than treating these as roadblocks, I saw an opportunity for deeper learning. I made the deliberate choice to reimplement the entire model in pure PyTorch, stripping away the Lightning framework. This decision proved invaluable as it forced me to:
- Dissect and understand each layer of the neural architecture
- Refresh my understanding of fundamental concepts (CNNs, Transformers, RNNs)
- Master advanced concepts like Sub-center ArcFace
- Gain hands-on experience with video classification architecture components
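Stripping away Lightning boils down to writing the training step yourself. The sketch below shows the shape of such a loop with a toy stand-in model (the real LightFormer is far larger); everything here except the general pattern is illustrative.

```python
import torch
from torch import nn

# Toy stand-in for the backbone; the real model is a video transformer.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits, not probabilities


def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One explicit optimization step -- what Lightning hides behind fit()."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)
    loss = loss_fn(logits, labels)
    loss.backward()   # backprop through the whole graph
    optimizer.step()  # parameter update
    return loss.item()


# Fake batch: 4 image crops with binary right-of-way labels.
images = torch.randn(4, 3, 8, 8)
labels = torch.tensor([0, 1, 0, 1])
losses = [train_step(images, labels) for _ in range(50)]
```

Owning this loop is what makes it possible to inspect gradients, activations, and loss terms at any point, which paid off during debugging later.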
Systematic Debugging and Optimization
When the model exhibited mode collapse during training, I implemented a systematic debugging approach that showcases my problem-solving methodology:
- Data Analysis: Implemented weighted sampling to address class imbalance
- Architecture Validation: Identified and fixed a critical double softmax bug
- Training Dynamics: Monitored activations for gradient issues
- Hyperparameter Optimization: Implemented learning rate finding and scheduling
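Two of the fixes above can be shown concretely. The sketch below (toy data, illustrative shapes) demonstrates inverse-frequency weighted sampling with PyTorch's `WeightedRandomSampler`, and the double-softmax bug: `CrossEntropyLoss` applies log-softmax internally, so feeding it already-softmaxed outputs flattens the loss surface and starves the gradients.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Imbalanced toy labels: 90 negatives vs 10 positives.
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
features = torch.randn(100, 4)

# Weighted sampling: each sample's weight is the inverse of its class
# frequency, so minority-class examples are drawn as often as majority ones.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(labels),
                                replacement=True)
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=10, sampler=sampler)

# The double-softmax bug: CrossEntropyLoss already log-softmaxes its input.
logits = torch.randn(10, 2)
targets = torch.randint(0, 2, (10,))
loss_buggy = nn.CrossEntropyLoss()(torch.softmax(logits, dim=1), targets)  # wrong
loss_fixed = nn.CrossEntropyLoss()(logits, targets)  # correct: raw logits in
```

A tell-tale symptom of the double softmax is a loss that plateaus near its initial value while accuracy collapses to the majority class.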
Infrastructure and Tools Development
Alongside the core ML work, I developed several supporting systems that demonstrate my full-stack capabilities:
Necromancer: Cost-Optimized Training Infrastructure
- Developed a system for running training jobs on GCP spot instances
- Achieved 60% cost reduction in cloud computing expenses
- Implemented automatic training resumption using managed instance groups
- Integrated with Google Cloud Storage for seamless state management
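The resumption mechanism hinges on checkpoints that capture everything needed to continue after a preemption. The sketch below shows the core save/resume pattern; a local temp path stands in for the GCS bucket the real system used, and the names are illustrative.

```python
import os
import tempfile

import torch
from torch import nn

# Local stand-in for the GCS checkpoint location.
ckpt_path = os.path.join(tempfile.gettempdir(), "necromancer_ckpt.pt")

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def save_checkpoint(epoch: int) -> None:
    """Persist everything a replacement instance needs to continue."""
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optim": optimizer.state_dict()}, ckpt_path)


def resume() -> int:
    """Return the epoch to start from: 0 on a fresh run, else saved + 1."""
    if not os.path.exists(ckpt_path):
        return 0
    state = torch.load(ckpt_path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])
    return state["epoch"] + 1


save_checkpoint(epoch=4)   # spot instance preempted after epoch 4...
start_epoch = resume()     # ...the replacement instance picks up at epoch 5
```

With a managed instance group recreating preempted VMs, an unconditional `resume()` at startup makes the training job self-healing.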
Development Tools
- Implemented comprehensive TensorBoard monitoring
- Created a preprocessing pipeline for efficient data handling
- Developed a configuration system for parallel experiment management
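One lightweight way to manage parallel experiments is to express each run as a small variation on a frozen base config. This is a hypothetical sketch of that pattern, not the project's actual config schema.

```python
from dataclasses import asdict, dataclass, replace


# Hypothetical config shape -- the real project's fields differ.
@dataclass(frozen=True)
class ExperimentConfig:
    lr: float = 3e-4
    batch_size: int = 32
    backbone: str = "resnet18"
    run_name: str = "baseline"


# Parallel experiments as named variations on a shared base.
base = ExperimentConfig()
sweep = [
    replace(base, run_name="lr-1e-3", lr=1e-3),
    replace(base, run_name="big-batch", batch_size=128),
]
configs = {cfg.run_name: asdict(cfg) for cfg in sweep}
```

Frozen dataclasses make configs hashable and tamper-proof mid-run, and `asdict` gives a serializable record to log alongside each experiment's TensorBoard output.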
Results and Future Directions
The modified LightFormer architecture achieved ~70% accuracy on both straight and turn classifications. While this fell short of the state-of-the-art result I initially aimed for, the process deepened my understanding of modern ML architectures. The project also pointed me toward new frontiers and approaches that I plan to explore:
- Foundation Model Adaptation: Investigating pre-trained vision models specifically tuned for traffic scenarios
- Multimodal Integration: Exploring models that can incorporate multiple types of traffic-related data
- Novel Architecture Development: Planning a new architecture (ChimeFormer) based on insights gained
Key Learnings and Technical Achievements
This project helped me develop several critical skills:
- Deep understanding of modern ML architectures
- Systematic debugging and optimization
- Infrastructure development and cost optimization
- Practical implementation of academic research