Can AI Fix Road Congestion? IISc's UVH-26 Dataset Brings Indian Reality To Traffic AI

Bengaluru: Traffic congestion remains one of the most persistent urban challenges in India, draining productivity, increasing fuel consumption, and worsening air pollution. While AI is increasingly being deployed under Smart City and urban mobility initiatives, many of these systems struggle to deliver results on Indian roads. The reason lies in the data they are trained on.

Most traffic AI models used in India rely on Western datasets that assume disciplined lane behaviour and limited vehicle diversity. Indian roads, in contrast, are dominated by two-wheelers, auto rickshaws, buses, and a wide variety of informal and mixed-use vehicles. This mismatch causes AI systems to misclassify vehicles or miss them entirely, undermining applications such as signal optimisation, congestion detection, urban planning, and safety analysis.

UVH-26: A Dataset Designed for Indian Traffic

To address this gap, the AI for Integrated Mobility (AIM) initiative at the Indian Institute of Science (IISc) launched UVH-26, India’s largest open-source traffic image dataset, in November 2025. Released alongside a set of AI models, UVH-26 is built entirely from Indian road footage: it contains 26,646 high-resolution images collected from 2,800 Safe City CCTV cameras across Bengaluru, with 1.8 million annotated bounding boxes covering 14 vehicle categories defined by the Indian Roads Congress.

Example base images (left) and their pre-annotations with bounding box and label (right) (Special Arrangement)

The dataset was created through a national Urban Vision Hackathon that brought together 565 student volunteers. To ensure accuracy, annotations were generated using consensus-based techniques, producing high-quality labels suitable for real-world deployment. Unlike global datasets such as COCO, UVH-26 captures the density, diversity, and unpredictability of Indian urban traffic.
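As a rough illustration of what consensus-based labelling can mean, the sketch below applies simple majority voting to the class labels that several annotators assign to the same bounding box. Majority Voting is one of the two consensus methods listed in the dataset's summary, but the article does not describe how it was applied, so the function name, the agreement threshold, and the class names here are all hypothetical.

```python
from collections import Counter


def majority_label(votes, min_agreement=0.5):
    """Return the label chosen by more than `min_agreement` of annotators.

    Returns None when there are no votes or no label clears the threshold.
    Hypothetical helper: not taken from the UVH-26 tooling.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Three annotators label the same box (class names are illustrative only).
votes_for_one_box = ["auto-rickshaw", "auto-rickshaw", "three-wheeler"]
print(majority_label(votes_for_one_box))  # auto-rickshaw
```

A box with no clear majority returns None here, which a real pipeline might route to further review rather than discard.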

Why Detection Accuracy Is the Foundation of Traffic AI

Yogesh Simmhan, Associate Professor at IISc’s Department of Computational and Data Sciences, told ETV Bharat in an exclusive interview that accurate detection and classification are the foundational building blocks of all urban mobility applications. Vehicle counting, congestion analysis, adaptive signal control, and road safety monitoring all depend on correctly identifying what is actually moving on the road.

When these foundational tasks fail, even the most advanced AI systems become ineffective. While modern AI architectures are already powerful, their real-world success depends largely on the data used to train them. By providing representative Indian traffic data, UVH-26 enables AI systems that are far more accurate and reliable in Indian conditions.

India-Specific Models Show Significant Accuracy Gains

Along with the dataset, AIM@IISc has released six fine-tuned vehicle detection models based on the YOLOv11, DAMO-YOLO, and RT-DETRv2 architectures. When evaluated on Indian traffic, these models improve mAP@50:95 by up to 31.5 per cent over standard COCO-trained baselines.

Performance comparison of UVH-26 trained models against state-of-the-art baselines (Special Arrangement)

Professor Simmhan cautioned that these gains vary by vehicle type. For globally common vehicles such as sedans, improvements are modest. However, for India-specific vehicles—including auto rickshaws, buses, trucks, and varied SUV types—the accuracy gains are substantial. These improvements directly address the categories that dominate Indian roads and are most critical for traffic management.

The following table summarises the technical highlights of UVH-26:

Category | Details
Dataset Size | 26,646 anonymized 1080p traffic images from 2,800 Bengaluru CCTV cameras
Annotations | 1.8M bounding boxes across 14 India-specific vehicle classes
Consensus Methods | Majority Voting and STAPLE for high-quality ground truth
Models Released | YOLOv11 (S/X), DAMO-YOLO (T/L), RT-DETRv2 (S/X)
Performance Gains | Up to 31.5% improvement in mAP@50:95 vs COCO-trained baselines
Licensing | Dataset under CC BY 4.0; Models under Apache 2.0 / AGPL-3.0
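For readers unfamiliar with the mAP@50:95 metric cited above, it follows the COCO convention of averaging precision over ten intersection-over-union (IoU) thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU calculation and the threshold sweep for a single predicted box; it is not the evaluation code used for UVH-26, and the function names and the (x1, y1, x2, y2) box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds behind the "50:95" in mAP@50:95.
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]


def thresholds_passed(pred_box, truth_box):
    """Count how many of the ten thresholds a predicted box satisfies."""
    score = iou(pred_box, truth_box)
    return sum(score >= t for t in IOU_THRESHOLDS)


# A prediction covering 82% of the ground-truth box (IoU = 0.82).
print(thresholds_passed((0, 0, 100, 82), (0, 0, 100, 100)))  # 7
```

A loosely fitting box can count as correct at the 0.50 threshold yet fail the stricter ones, which is why mAP@50:95 rewards tight localisation as well as correct classification.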

How Better AI Translates Into Decongested Roads