GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs
N. Gupta, P. Koppisetti, K. Lakkaraju, B. Srivastava
IAAI/AAAI 2026 (In-Press)
The rapid proliferation of Generative AI (GenAI) into diverse, high-stakes domains necessitates robust and reproducible evaluation methods. However, practitioners often resort to ad-hoc, non-standardized scripts, as common metrics are frequently unsuitable for specialized, structured outputs (e.g., automated plans, time-series) or for holistic comparison across modalities (e.g., text, audio, and image). This fragmentation hinders comparability and slows AI system development. To address this challenge, we present GAICo (Generative AI Comparator): a deployed, open-source Python library that streamlines and standardizes GenAI output comparison. GAICo provides a unified, extensible framework supporting a comprehensive suite of reference-based metrics for unstructured text, specialized structured data formats, and multimedia (images, audio). Its architecture features a high-level API for rapid, end-to-end analysis, from multi-model comparison to visualization and reporting, alongside direct metric access for granular control. We demonstrate GAICo's utility through a detailed case study evaluating and debugging complex, multimodal AI Travel Assistant pipelines. GAICo empowers AI researchers and developers to efficiently assess system performance, make evaluation reproducible, improve development velocity, and ultimately build more trustworthy AI systems, aligning with the goal of moving faster and safer in AI deployment. Since its release on PyPI in June 2025, the tool had been downloaded over 13K times across versions by August 2025, demonstrating growing community interest.
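To make "reference-based comparison of multiple models" concrete, here is a minimal sketch that scores several models' text outputs against a single reference. It does not use GAICo's actual API. The function names (`jaccard`, `compare_models`), the model names, and the choice of token-set Jaccard similarity as the metric are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict


def jaccard(generated: str, reference: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (illustrative metric, not GAICo's)."""
    gen, ref = set(generated.lower().split()), set(reference.lower().split())
    if not gen and not ref:
        return 1.0  # two empty strings are treated as identical
    return len(gen & ref) / len(gen | ref)


def compare_models(
    outputs: Dict[str, str],
    reference: str,
    metrics: Dict[str, Callable[[str, str], float]],
) -> Dict[str, Dict[str, float]]:
    """Score every model's output against one reference with every metric."""
    return {
        model: {name: fn(text, reference) for name, fn in metrics.items()}
        for model, text in outputs.items()
    }


# Hypothetical outputs from two travel-assistant models.
outputs = {"model_a": "Book a flight to Paris", "model_b": "Reserve a hotel in Rome"}
scores = compare_models(outputs, "book a flight to paris", {"jaccard": jaccard})
print(scores)
```

Keeping metrics as plain `(generated, reference) -> float` callables is one simple way to make such a framework extensible: a new metric is just another entry in the `metrics` dictionary.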
GAICo: Demonstrating a Unified Framework for Multi-Modal GenAI Evaluation
P. Koppisetti, N. Gupta, K. Lakkaraju, B. Srivastava
AAAI Demonstration 2026 (In-Press)
Promoting Nutrition Adherence with Convenience Using Group Recommendations and Multimodal Food Reasoning - Initial Results
N. Gupta, B. Srivastava, V. Nagpal, L. Valluru, K. Lakkaraju, Z. Abdulrahman, A. Davison
WAIN workshop at ICDM 2025 (In-Press)
A common, recurring decision people make, whether healthy or living with a health condition, is what to eat at meals such as breakfast, lunch, and dinner, which typically combine foods for appetizers, main courses, side dishes, desserts, and beverages. However, this decision is often seen as a trade-off between nutritious choices (e.g., salt and sugar levels, nutritional content) and convenience (e.g., cost, speed of access, cuisine type, food source type). We present a data-driven solution for meal recommendations that considers customizable meal configurations and time horizons. This solution balances user preferences while taking into account a food’s constituents and cooking process. Beyond the problem formulation, our contributions include goodness measures, a method for converting recipes from text to the recently introduced multimodal rich recipe representation (R3) format, learning methods based on contextual bandits that show promising preliminary results, and BEACON, a usage-inspired prototype system.
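The abstract does not say which contextual-bandit algorithm is used. As a hedged illustration of the general idea, here is a generic tabular epsilon-greedy contextual bandit. It keeps a running mean reward for each (context, meal) pair, exploring at random with probability epsilon and otherwise recommending the best-known meal. The class name, the context label `"low_sodium"`, the meal names, and the simulated reward probabilities are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit: running mean reward per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> times recommended
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context):
        # Explore with probability epsilon; otherwise exploit the best-known arm
        # (ties go to the arm listed first).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental mean: v <- v + (r - v) / n
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical simulation: a user with a low-sodium goal tends to accept the salad bowl.
accept_prob = {"salad_bowl": 0.9, "burger_combo": 0.3}
env_rng = random.Random(1)
bandit = EpsilonGreedyContextualBandit(["salad_bowl", "burger_combo"], epsilon=0.1, seed=0)
for _ in range(500):
    meal = bandit.select("low_sodium")
    reward = 1.0 if env_rng.random() < accept_prob[meal] else 0.0
    bandit.update("low_sodium", meal, reward)
print({m: round(bandit.values[("low_sodium", m)], 2) for m in accept_prob})
```

A real recommender would probably represent context as a feature vector (for example, with a linear model such as LinUCB) rather than a single discrete label. The tabular version is used here only to keep the sketch short.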
2025
Building a Plan Ontology to Represent and Exploit Planning Knowledge and Its Applications
B. Muppasani, N. Gupta, V. Pallagani, B. Srivastava, R. Mutharaju, M. N. Huhns, and V. Narayanan
Discover Data Journal (Nov 2025)
Ontologies are known for their ability to organize rich metadata, support the identification of novel insights via semantic queries, and promote reuse. In this paper, we consider the problem of automated planning, where the objective is to find a sequence of actions that will move an agent from an initial state of the world to a desired goal state. We hypothesize that the large number of available planners and diverse planning domains carry essential information that can be leveraged to improve many ontology applications. We use open data on planning domains and planners to construct the most comprehensive planning ontology to date, based on supported competency questions, and demonstrate its applications in two practical use cases: planner selection and plan explanation. We have also made the ontology and associated resources available to the AI and data communities to promote further research.
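The ontology itself is not reproduced here. As a hedged sketch of how a planner-selection competency question ("which planners support every requirement of this domain?") might be answered over such a knowledge graph, the code below stores a few triples in a plain Python set and filters them. A real deployment would more likely use an RDF store queried with SPARQL. Every name in the triples (`ex:PlannerA`, `ex:hasRequirement`, `ex:supportsRequirement`, and so on) is invented for illustration and is not taken from the paper's ontology.

```python
# Hypothetical (subject, predicate, object) triples; all names are illustrative only.
TRIPLES = {
    ("ex:PlannerA", "rdf:type", "ex:Planner"),
    ("ex:PlannerB", "rdf:type", "ex:Planner"),
    ("ex:PlannerA", "ex:supportsRequirement", "ex:strips"),
    ("ex:PlannerA", "ex:supportsRequirement", "ex:typing"),
    ("ex:PlannerB", "ex:supportsRequirement", "ex:strips"),
    ("ex:Logistics", "ex:hasRequirement", "ex:strips"),
    ("ex:Logistics", "ex:hasRequirement", "ex:typing"),
}


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj, triples=TRIPLES):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def planners_for_domain(domain, triples=TRIPLES):
    """Competency question: which planners support every requirement of `domain`?"""
    required = objects(domain, "ex:hasRequirement", triples)
    planners = subjects("rdf:type", "ex:Planner", triples)
    return sorted(
        p for p in planners
        if required <= objects(p, "ex:supportsRequirement", triples)
    )


print(planners_for_domain("ex:Logistics"))  # ['ex:PlannerA']
```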
Revisiting LLMs in Planning from Literature Review: a Semi-Automated Analysis Approach and Evolving Categories Representing Shifting Perspectives
V. Pallagani, N. Gupta, B. Muppasani, B. Srivastava
ICAPS 2025 (Sept 2025)
Tracking the rapidly evolving literature at the intersection of large language models (LLMs) and planning has become increasingly complex due to significant growth in research output and shifting thematic focuses. Building on the survey by Pallagani et al. (2024), which organized 126 papers collected through November 2023 into eight categories, we present a platform that automates the extraction, categorization, and trend analysis of new papers. Our analysis reports on category drift, identifying evolving perspectives on the use of LLMs for planning: it reveals a decline in the percentage of papers for six categories, an increase in two, and the emergence of two new categories. Specifically, we contribute by (1) developing an automated system for categorizing new papers into existing or emergent categories, (2) reporting on category shifts with the addition of 47 new papers through September 2024, and (3) introducing a platform for continuous extraction, categorization, and trend tracking in LLM and planning research. This platform also features a leaderboard to encourage innovations in automated paper categorization.
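The abstract measures category drift as changes in the percentage of papers per category. The sketch below shows one plausible way to compute such a drift: compare each category's share of the original corpus with its share after the new papers are added. A category that did not exist before (an emergent one) starts from a share of 0. The platform's actual computation is not described here, and the category labels and counts below are hypothetical toy values, not the survey's data.

```python
from collections import Counter


def category_shares(labels):
    """Percentage of papers assigned to each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: 100.0 * n / total for cat, n in counts.items()}


def category_drift(old_labels, new_labels):
    """Change in percentage share per category after adding new papers."""
    before = category_shares(old_labels)
    after = category_shares(list(old_labels) + list(new_labels))
    return {
        cat: after.get(cat, 0.0) - before.get(cat, 0.0)
        for cat in sorted(set(before) | set(after))
    }


# Toy example: 10 original papers, 5 new ones; "C" is an emergent category.
old = ["A"] * 6 + ["B"] * 4
new = ["B"] * 3 + ["C"] * 2
print(category_drift(old, new))  # A falls by 20 points; B and C gain share
```

Here A drops from 60% to 40% because it received no new papers, even though its absolute count did not change. This is why a percentage-based drift can show a "decline" for a category that is still being published in.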
Towards Enhancing Road Safety in South Carolina Using Insights from Traffic and Driver-Education Data (Student Abstract)
N. Gupta, B. Muppasani, S. Srivastava, A. Goel, R. Hartfield, T. Buehrig, M. Reck, E. Kennedy, K. Poore, K. Tremblay, B. Srivastava, and L. Vasconcelos
AAAI 2025 Student Abstract (Apr 2025)
In this student paper, we report on our project to enhance road safety in South Carolina (SC) by analyzing traffic data provided by the Department of Transportation and evaluating the impact of a school-level student driver education program called Alive@25. We use these traffic and training data to understand collision patterns, identify areas for improvement, and assess gaps in training coverage. Our approach combines geospatial analysis, economic impact assessment, temporal trend analysis, and interactive visualizations while leveraging AI techniques to clean and analyze extensive datasets. Key findings reveal higher collision rates in urban counties and rising collision rates in mostly rural areas, where Alive@25 participation is declining. These insights led to recommendations for improving road infrastructure and expanding safety training programs. This research demonstrates the potential of AI-driven insights to inform timely, cost-effective interventions and promote multi-stakeholder engagement in addressing public safety challenges, while teaching students data science and AI skills and fostering civic engagement.
On the Books in South Carolina: Mining for Jim Crow Laws
K. Boyd, V. Srivastava, L. DuPre, C. Frear, N. Gupta
University of South Carolina (Feb 2025)
On the Books in South Carolina: Mining for Jim Crow Laws is a collections-as-data and machine learning project by the University of South Carolina Libraries (USC), subawarded by the University of North Carolina at Chapel Hill (UNC), and made possible by The Andrew W. Mellon Foundation, for the period of May 2022 to December 2024. Following the approach UNC took in the first year of the grant, the USC project created a text corpus of acts passed by the South Carolina state legislature from Reconstruction through the Civil Rights Movement (1868-1968). The USC team then used machine learning techniques to create a model that classifies each law as either Jim Crow or not.
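The summary does not name the specific machine learning model. As a hedged stand-in, the sketch below shows a minimal binary text classifier: a multinomial Naive Bayes with add-one (Laplace) smoothing, written with only the standard library. The class name, the labels `"jim_crow"` and `"other"`, and the four training snippets are hypothetical toy data invented for illustration. They are not drawn from the project's corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simple)."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(label) + sum of log P(token | label), smoothed with +1 counts
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokenize(doc):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (not from the actual corpus).
docs = [
    "separate schools for white and colored children",
    "separate coaches for white and colored passengers",
    "to build a bridge over the river in the county",
    "to incorporate the town and elect a mayor",
]
labels = ["jim_crow", "jim_crow", "other", "other"]
clf = NaiveBayesClassifier().fit(docs, labels)
print(clf.predict("separate waiting rooms for white and colored persons"))
```

In practice, a classifier for historical legislation would need OCR cleanup, expert-labeled training data, and careful evaluation. The toy model above only illustrates the classification step.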