CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal Relationships

Rebecca Roelofs, Liting Sun, Ben Caine, Khaled S. Refaat, Ben Sapp, Scott Ettinger, Wei Chai

As machine learning models become increasingly prevalent in motion forecasting systems for autonomous vehicles (AVs), it is critical that we ensure that model predictions are safe and reliable. However, exhaustively collecting and labeling the data necessary to fully test the long tail of rare and challenging scenarios is difficult and expensive. In this work, we construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data. Specifically, we conduct an extensive labeling effort to identify causal agents, or agents whose presence influences human driver behavior in any way, in the Waymo Open Motion Dataset (WOMD), and we use these labels to perturb the data by deleting non-causal agents from the scene. We then evaluate a diverse set of state-of-the-art deep-learning model architectures on our proposed benchmark and find that all models exhibit large shifts under perturbation. Under non-causal perturbations, we observe a 25-38% relative change in minADE as compared to the original. We then investigate techniques to improve model robustness, including increasing the training dataset size and using targeted data augmentations that drop agents throughout training. We plan to provide the causal agent labels as an additional attribute to WOMD and release the robustness benchmarks to aid the community in building more reliable and safe deep-learning models for motion forecasting.

IEEE International Conference on Robotics and Automation (ICRA 2024).
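
The CausalAgents abstract above reports a relative change in minADE after non-causal agents are deleted from a scene. The following is a minimal sketch of that bookkeeping, not the benchmark's evaluation code. The toy scene, the agent ids, the helper names, and the predicted trajectories are all hypothetical stand-ins, and minADE is taken in its common form (the average displacement error of the best of K predicted trajectories).

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def min_ade(preds, truth):
    """minADE: the ADE of the best of the K predicted trajectories."""
    return min(ade(p, truth) for p in preds)

def drop_non_causal(agents, causal_ids):
    """Perturb a scene by keeping only the agents labeled as causal."""
    return {aid: track for aid, track in agents.items() if aid in causal_ids}

def relative_change(original, perturbed):
    """Relative change of a metric with respect to its unperturbed value."""
    return abs(perturbed - original) / original

# Hypothetical scene: agent id -> observed track.
scene = {"car_1": [(5.0, 0.0)], "ped_7": [(2.0, 3.0)], "parked_9": [(40.0, 9.0)]}
perturbed_scene = drop_non_causal(scene, causal_ids={"car_1", "ped_7"})

# Made-up stand-ins for a model's K=2 predictions on each version of the scene.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
preds_original = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)],
    [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5)],
]
preds_perturbed = [
    [(0.0, 0.0), (1.0, 0.75), (2.0, 0.75)],
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
]

base = min_ade(preds_original, truth)
shifted = min_ade(preds_perturbed, truth)
change = relative_change(base, shifted)
print(f"minADE {base:.3f} -> {shifted:.3f}, relative change {change:.0%}")
```

A model that is robust to this perturbation would produce nearly the same predictions for both versions of the scene, so the relative change would be close to zero.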

MotionLM: Multi-Agent Motion Forecasting as Language Modeling

Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S. Refaat, Rami Al-Rfou, Benjamin Sapp

Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model’s sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.

International Conference on Computer Vision (ICCV 2023).
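
The MotionLM abstract above describes representing continuous trajectories as sequences of discrete motion tokens. One way this could look is sketched below, under stated assumptions: per-step displacements are uniformly quantized on each axis and the two bin indices are packed into one integer. The bin size, the number of bins, and the function names are illustrative choices, not values or APIs taken from the paper.

```python
def tokenize(traj, bin_size=0.5, num_bins=13):
    """Map a 2D trajectory to integer tokens, one per step after the first."""
    half = num_bins // 2
    x, y = traj[0]  # running reconstruction, so quantization error does not drift
    tokens = []
    for tx, ty in traj[1:]:
        # Quantize the displacement on each axis and clamp to the bin range.
        ix = max(-half, min(half, round((tx - x) / bin_size)))
        iy = max(-half, min(half, round((ty - y) / bin_size)))
        tokens.append((ix + half) * num_bins + (iy + half))
        x, y = x + ix * bin_size, y + iy * bin_size
    return tokens

def detokenize(start, tokens, bin_size=0.5, num_bins=13):
    """Invert tokenize: accumulate the quantized displacements from start."""
    half = num_bins // 2
    x, y = start
    traj = [(x, y)]
    for tok in tokens:
        ix, iy = tok // num_bins - half, tok % num_bins - half
        x, y = x + ix * bin_size, y + iy * bin_size
        traj.append((x, y))
    return traj

traj = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.5), (2.5, 1.0)]
tokens = tokenize(traj)
restored = detokenize(traj[0], tokens)
print(tokens, restored)
```

With a vocabulary of this kind, a sequence model can be trained with an ordinary next-token objective and sampled autoregressively, which is the framing the abstract describes.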

Wayformer: Motion Forecasting via Simple & Efficient Attention Networks

Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S. Refaat, Benjamin Sapp

Motion forecasting for autonomous driving is a challenging task because complex driving scenarios result in a heterogeneous mix of static and dynamic inputs. It is an open problem how best to represent and fuse information about road geometry, lane connectivity, time-varying traffic light state, and history of a dynamic set of agents and their interactions into an effective encoding. To model this diverse set of input features, many approaches proposed to design an equally complex system with a diverse set of modality specific modules. This results in systems that are difficult to scale, extend, or tune in rigorous ways to trade off quality and efficiency. In this paper, we present Wayformer, a family of attention based architectures for motion forecasting that are simple and homogeneous. Wayformer offers a compact model description consisting of an attention based scene encoder and a decoder. In the scene encoder we study the choice of early, late and hierarchical fusion of the input modalities. For each fusion type we explore strategies to trade off efficiency and quality via factorized attention or latent query attention. We show that early fusion, despite its simplicity of construction, is not only modality agnostic but also achieves state-of-the-art results on both Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards, demonstrating the effectiveness of our design philosophy.

IEEE International Conference on Robotics and Automation (ICRA 2023).

Ranked 1st on Argoverse and Waymo Open Datasets

Pedestrian Crossing Action Recognition and Trajectory Prediction with 3D Human Keypoints

Jiachen Li, Xinwei Shi, Feiyu Chen, Jonathan Stroud, Zhishuai Zhang, Tian Lan, Junhua Mao, Jeonhyung Kang, Khaled S. Refaat, Weilong Yang, Eugene Ie, Congcong Li

Accurate understanding and prediction of human behaviors are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas. In this work, we aim at identifying crossing pedestrians and predicting their future trajectories. To achieve these goals, we not only need the context information of road geometry and other traffic participants, but also need fine-grained information of the human pose, motion and activity, which can be inferred from human keypoints. In this paper, we propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction, which utilizes 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity. Moreover, we propose to apply two auxiliary tasks and contrastive learning to enable auxiliary supervisions to improve the learned keypoints representation, which further enhances the performance of major tasks. We validate our approach on a large-scale in-house dataset, as well as a public benchmark dataset, and show that our approach achieves state-of-the-art performance on a wide range of evaluation metrics. The effectiveness of each model component is validated in a detailed ablation study.

IEEE International Conference on Robotics and Automation (ICRA 2023).

MultiPath++: Efficient Information Fusion and Trajectory Aggregation for Behavior Prediction

Balakrishnan Varadarajan, Ahmed Hefny, Avikalp Srivastava, Khaled S. Refaat, Nigamaa Nayakanti, Andre Cornman, Kan Chen, Bertrand Douillard, Chi Pang Lam, Dragomir Anguelov, Benjamin Sapp

Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many design choices. The first key design difference is a departure from dense image-based encoding of the input world state in favor of a sparse encoding of heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines to describe road features, and raw agent state information directly (e.g., position, velocity, acceleration). We propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion component. Second, we reconsider the choice of pre-defined, static anchors, and develop a way to learn latent anchor embeddings end-to-end in the model. Lastly, we explore ensembling and output aggregation techniques -- common in other ML domains -- and find effective variants for our probabilistic multimodal output representation. We perform an extensive ablation on these design choices, and show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge.

IEEE International Conference on Robotics and Automation (ICRA 2022).

Accelerated Deep Reinforcement Learning of Terrain-Adaptive Locomotion Skills

Khaled S. Refaat, Kai Ding

Learning locomotion skills on dynamic terrains allows creating realistic animations without recording motion capture data. The simulated character is trained to navigate varying terrains avoiding obstacles with balance and agility. Model-free reinforcement learning has been used to develop such skills for simulated characters. In particular, a mixture of actor-critic experts (MACE) was recently shown to enable learning of such complex skills by promoting specialization and incorporating human knowledge. However, this approach still requires access to a very large number of training interactions and explorations with a computationally expensive simulator. We demonstrate how to accelerate model-free reinforcement learning to acquire terrain-adaptive locomotion skills, as well as decrease the need for large-scale exploration. We first generalize model-based value expansion (MVE) to a mixture of actor-critic experts, showing the conditions under which the method accelerates learning in this generalized setting. This motivates combining MACE with MVE, resulting in the MACE-MVE algorithm. We then propose learning to predict future terrains, character states, rewards, and the probability of falling down via convolutional networks to speed up learning using generalized MVE. We analyze our approach empirically, showing that it can substantially speed up learning of such challenging skills. Finally, we study the effect of various design choices to control for uncertainty and manage dynamics fidelity.

Thirty-fifth Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop (NeurIPS, Deep RL 2021).
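
The abstract above builds on model-based value expansion (MVE). For orientation, here is a minimal sketch of the standard single-critic MVE target: roll a learned model forward for a few steps, sum the discounted predicted rewards, and bootstrap with the critic at the final imagined state. This is not the MACE-MVE generalization from the paper. The callables `policy`, `model`, `reward_fn`, and `value_fn` are hypothetical placeholders for learned components, and the toy lambdas exist only to make the example runnable.

```python
def mve_target(state, policy, model, reward_fn, value_fn, horizon, gamma):
    """H-step value-expansion target computed from an imagined rollout."""
    total, discount = 0.0, 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)                        # action chosen by the actor
        total += discount * reward_fn(s, a)  # predicted reward at this step
        s = model(s, a)                      # predicted next state
        discount *= gamma
    # Bootstrap with the critic's estimate at the last imagined state.
    return total + discount * value_fn(s)

# Toy stand-ins (assumptions): scalar state, constant reward and value.
policy = lambda s: 1.0
model = lambda s, a: s + a
reward_fn = lambda s, a: 1.0
value_fn = lambda s: 10.0

target_h2 = mve_target(0.0, policy, model, reward_fn, value_fn, horizon=2, gamma=0.5)
target_h0 = mve_target(0.0, policy, model, reward_fn, value_fn, horizon=0, gamma=0.5)
print(target_h2, target_h0)
```

With `horizon=0` the target reduces to the critic's own estimate, so the horizon controls how much the target relies on the learned model rather than on the critic.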

Agent Prioritization for Autonomous Navigation

Khaled S. Refaat, Kai Ding, Natalia Ponomareva, Stéphane Ross

In autonomous navigation, a planning system reasons about other agents to plan a safe and plausible trajectory. Before planning starts, agents are typically processed with computationally intensive models for recognition, tracking, motion estimation and prediction. With limited computational resources and a large number of agents to process in real time, it becomes important to efficiently rank agents according to their impact on the decision making process. This allows spending more time processing the most important agents. We propose a system to rank agents around an autonomous vehicle (AV) in real time. We automatically generate a ranking data set by running the planner in simulation on real-world logged data, where we can afford to run more accurate and expensive models on all the agents. The causes of various planner actions are logged and used for assigning ground truth importance scores. The generated data set can be used to learn ranking models. In particular, we show the utility of combining learned features, via a convolutional neural network, with engineered features designed to capture domain knowledge. We show the benefits of various design choices experimentally. When tested on real AVs, our system demonstrates the capability of understanding complex driving situations.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019).

Decomposition Techniques for Learning Graphical Models

Khaled S. Refaat

Probabilistic graphical models are ubiquitous tools for reasoning under uncertainty that have been useful to many fields. Despite their importance, learning these models from incomplete data remains a challenge, due to the high non-convexity of the corresponding optimization problem. Iterative algorithms, such as Expectation Maximization (EM), are typically used for learning from incomplete data, yet these approaches tend to exhibit behaviors that are independent of the degree of incompleteness in the data. We argue in this thesis that the degree of incompleteness is a main indicator of the difficulty of a learning problem. As such, we investigate a number of learning approaches, which are driven and motivated by this degree. In particular, we show that by exploiting certain patterns in the dataset, the learning problem can be decomposed into smaller and independent learning problems, which can lead to orders-of-magnitude speed-up in learning time. Moreover, we propose a new class of algorithms for learning graphical models, whose learned parameters and running time improve as the data becomes less incomplete.

PhD Thesis, UCLA, 2015.

An Upper Bound on the Global Optimum in Parameter Estimation

Khaled S. Refaat, Adnan Darwiche

Learning graphical model parameters from incomplete data is a non-convex optimization problem. Iterative algorithms, such as Expectation Maximization (EM), can be used to get a local optimum solution. However, little is known about the quality of the learned local optimum, compared to the unknown global optimum. We exploit variables that are always observed in the dataset to get an upper bound on the global optimum which can give insight into the quality of the parameters learned by estimation algorithms.

Conference on Uncertainty in Artificial Intelligence (UAI 2015).

Plenary Presentation.

Data Compression for Learning MRF Parameters

Khaled S. Refaat, Adnan Darwiche

We propose a technique for decomposing and compressing the dataset in the parameter learning problem in Markov random fields. Our technique applies to incomplete datasets and exploits variables that are always observed in the given dataset. We show that our technique allows exact computation of the gradient and the likelihood, and can lead to orders-of-magnitude savings in learning time.

International Joint Conference on Artificial Intelligence (IJCAI 2015).

Decomposing Parameter Estimation Problems

Khaled S. Refaat, Arthur Choi, Adnan Darwiche

We propose a technique for decomposing the parameter learning problem in Bayesian networks into independent learning problems. Our technique applies to incomplete datasets and exploits variables that are either hidden or observed in the given dataset. We show empirically that the proposed technique can lead to orders-of-magnitude savings in learning time. We explain, analytically and empirically, the reasons behind our reported savings, and compare the proposed technique to related ones that are sometimes used by inference algorithms.

Advances in Neural Information Processing Systems (NIPS 2014).

EDML for Learning Parameters in Directed and Undirected Graphical Models

Khaled S. Refaat, Arthur Choi, Adnan Darwiche

EDML is a recently proposed algorithm for learning parameters in Bayesian networks. It was originally derived in terms of approximate inference on a metanetwork, which underlies the Bayesian approach to parameter estimation. While this initial derivation helped discover EDML in the first place and provided a concrete context for identifying some of its properties (e.g., in contrast to EM), the formal setting was somewhat tedious in the number of concepts it drew on. In this paper, we propose a greatly simplified perspective on EDML, which casts it as a general approach to continuous optimization. The new perspective has several advantages. First, it makes immediate some results that were non-trivial to prove initially. Second, it facilitates the design of EDML algorithms for new graphical models, leading to a new algorithm for learning parameters in Markov networks. We derive this algorithm in this paper, and show, empirically, that it can sometimes learn estimates more efficiently from complete data, compared to commonly used optimization methods, such as conjugate gradient and L-BFGS.

Advances in Neural Information Processing Systems (NIPS 2013). (A preliminary version appeared in the ICML 2013 Workshop on Interactions between Inference and Learning, Atlanta, GA, USA)

Large-Scale Query Understanding

Khaled S. Refaat, Sugato Basu, Deirdre O'Brien, Liadan O'Callaghan

In this paper, we propose a large-scale multi-dimensional co-clustering framework for understanding queries in a search engine. To achieve this goal, the system simultaneously clusters queries along with attributes of results that were shown (and clicked) on these queries. In our application, we co-cluster queries along with advertisements (commercial results), advertisement keywords, and query terms — this gives us the ability to look at concepts that correspond to groups of queries and ads, as well as do better noise filtering in query clustering [1]. After getting query clusters, we identify representative queries for each cluster, in an attempt to explain what concept underlies a cluster. Our system extends the co-clustering MapReduce framework [2] to perform multi-dimensional co-clustering at scale.

In NIPS 2012 Workshop on Big Learning: Algorithms, Systems, and Tools, Lake Tahoe, Nevada, USA, 2012.

New Advances and Theoretical Insights into EDML

Khaled S. Refaat, Arthur Choi, Adnan Darwiche

EDML is a recently proposed algorithm for learning MAP parameters in Bayesian networks. In this paper, we present a number of new advances and insights on the EDML algorithm. First, we provide the multivalued extension of EDML, originally proposed for Bayesian networks over binary variables. Next, we identify a simplified characterization of EDML that further implies a simple fixed-point algorithm for the convex optimization problem that underlies it. This characterization further reveals a connection between EDML and EM: a fixed point of EDML is a fixed point of EM, and vice versa. We thus identify also a new characterization of EM fixed points, but in the semantics of EDML. Finally, we propose a hybrid EDML/EM algorithm that takes advantage of the improved empirical convergence behavior of EDML, while maintaining the monotonic improvement property of EM.

Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI 2012), Catalina Island, USA, 2012.

Oral Presentation. Oral acceptance rate: 7.8%.

EDML: A Method for Learning Parameters in Bayesian Networks

Arthur Choi, Khaled S. Refaat, Adnan Darwiche

We propose a method called EDML for learning MAP parameters in binary Bayesian networks under incomplete data. The method assumes Beta priors and can be used to learn maximum likelihood parameters when the priors are uninformative. EDML exhibits interesting behaviors, especially when compared to EM. We introduce EDML, explain its origin, and study some of its properties both analytically and empirically.

Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), Barcelona, 2011.

Efficient Stochastic Analysis of Real-Time Systems via Random Sampling

Khaled S. Refaat, Pierre Emmanuel Hladik

This paper provides a stochastic approach to the analysis of real-time systems under preemptive priority-driven scheduling. The main idea is to simplify the execution time distributions via random sampling to decrease complexity. This beneficial effect is counterbalanced by an increase in pessimism. However, the proposed analysis is significantly less pessimistic than the classical worst-case deterministic analysis. In addition, it could be tuned according to the memory and time availability. Thus, the proposed method provides, for the first time, a relation between pessimism and computational resources. The testing results show the effectiveness of the sampling approach in terms of practicality and optimism.
Keywords: stochastic analysis; efficient; random sampling; real time systems; pessimism

Proceedings of the 22nd Euromicro Conference on Real-Time Systems (ECRTS 2010), Brussels, 2010, pp. 175-183.
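
The abstract above describes simplifying execution time distributions via random sampling at the cost of added pessimism. The sketch below shows one way such a reduction could work and should not be read as the paper's algorithm: it keeps at most `k` randomly chosen support points (always including the worst case) and moves the probability mass of every dropped value up to the next kept value, so the reduced distribution never understates execution time. The function names and the example distribution are hypothetical.

```python
import bisect
import random

def reduce_pessimistically(dist, k, rng):
    """Shrink a discrete distribution {value: prob} to at most k support points."""
    values = sorted(dist)
    worst = values[-1]
    n_extra = min(k - 1, len(values) - 1)
    # Randomly sample the points to keep; the worst case is always kept.
    kept = sorted(set(rng.sample(values[:-1], n_extra)) | {worst})
    reduced = dict.fromkeys(kept, 0.0)
    for value, prob in dist.items():
        # Move the mass to the smallest kept value that is >= value.
        target = kept[bisect.bisect_left(kept, value)]
        reduced[target] += prob
    return reduced

def exceedance(dist, t):
    """P(X > t) for a discrete distribution."""
    return sum(p for v, p in dist.items() if v > t)

# Hypothetical execution time distribution (time units -> probability).
execution_time = {1: 0.4, 2: 0.3, 3: 0.15, 5: 0.1, 8: 0.05}
reduced = reduce_pessimistically(execution_time, k=3, rng=random.Random(0))
print(reduced)
```

In this sketch a larger `k` keeps more support points and therefore adds less pessimism, at the price of more memory and computation in later analysis steps, which mirrors the trade-off the abstract mentions.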

The Support Vector Machined Kernel: Towards a New Classification Framework

Khaled S. Refaat

In this thesis, we propose the so-called “SVM’ed-kernel function” and its use in SVM classification problems. This kernel function is itself a support vector machine classifier that is learned statistically from data. We show that the new kernel manages to change the classical methodology of defining a feature vector for each pattern. One will only need to define features representing the similarity between two patterns allowing many details to be captured in a concise way. The new proposed kernel shows very promising results. It opens the door for new feature definitions that could be created in various machine learning problems where similarity between patterns can be formulated more suitably.

Master's Thesis, Cairo University, 2010.

Hand-Drawn Shape Recognition using the SVM'ed Kernel

Khaled S. Refaat, Amir F. Atiya

We describe an application of the novel Support Vector Machined Kernel (SVM’ed Kernel) to the recognition of hand-drawn shapes. The SVM’ed kernel function is itself a support vector machine classifier that is learned statistically from data using an automatically generated training set. We show that the new kernel manages to change the classical methodology of defining a feature vector for each pattern. One will only need to define features representing the similarity between two patterns allowing many details to be captured in a concise way. In addition, we illustrate that features describing a single pattern could also be used in this new framework. In this paper we show how the SVM’ed Kernel is defined and trained for the multiclass shape recognition problem. Simulation results show that the SVM’ed Kernel outperforms all other classical kernels and is more robust to hard test sets.
Keywords: Shape recognition, Support Vector Machine, Kernel, Similarity.

Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN 2009), Cyprus, 2009, LNCS Volume 5769/2009, pp. 275-284.

The Support Vector Machined Kernel

Khaled S. Refaat

In this paper, we propose the so-called “SVM’ed-kernel function” and its use in SVM classification problems. This kernel function is itself a support vector machine classifier that is learned statistically from data. We show that the new kernel manages to change the classical methodology of defining a feature vector for each pattern. One will only need to define features representing the similarity between two patterns allowing many details to be captured in a concise way. The new proposed kernel shows very promising results. It opens the door for new feature definitions that could be created in various machine learning problems where similarity between patterns can be formulated more suitably.
Index Terms: Support Vector Machine, Kernel, Similarity.

Proceedings of the IEEE Region 8 EUROCON, Saint Petersburg, Russia, 2009, pp. 1978-1984.

Top 5 IEEE Region 8 (Africa, Europe, Middle East) student paper contest finals.

An Optimized Method for Arabic Cross-Document Named Entity Normalization

Khaled S. Refaat, A. Madkour

This paper presents a technique to perform Arabic cross-document named entity normalization. The proposed method offers significant time improvement over conventional n×n comparisons performed between named entities. It relies on a novel efficient algorithm that avoids normalizing the new entities against all existing entities. Only a single candidate from the normalized entities is chosen to be checked against each new entity. This allows using extensive normalization checking only with the entity that is most likely to be normalized. Our results show that we obtain comparable results in nearly half the time required by conventional named entity normalization methods. We have also tuned an SVM model that decides whether two entities should be merged or not. This SVM model outperforms the related work in accuracy by 9%.

Proceedings of the 2nd International Conference in Arabic Language Resources and Tools, Cairo, Egypt, 2009, pp. 219-221.

Using Semantic Features to Detect Spamming in Social BookMarking Systems

Amgad Madkour, Tarek Hefni, Ahmed Hefny, Khaled S. Refaat

Collaborative software is gaining pace as a vital means of information sharing between users. This paper discusses one of the key challenges that affect such systems which is identifying spammers. We discuss potential features that describe the system’s users and illustrate how we can use those features in order to determine potential spamming users through various machine learning models.

Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge in Databases (ECML PKDD) Discovery Challenge, RSDC08, Antwerp, Belgium, 2008.

A new Approach for Context-Independent Hand-Written Offline Diagram Recognition using Support Vector Machines

Khaled S. Refaat, W. Helmy, A. Ali, M. Abdelghany, A. Atiya

Structured diagrams are very prevalent in many document types. Most people who need to create such diagrams use structured graphics editors such as Microsoft Visio [1]. Structured graphics editors are extremely powerful and expressive but they can be cumbersome to use [2]. We have shown through extensive timing experiments that structured diagrams drawn by hand will take only about 10% of the time it takes to draw one using a tool like Visio. This indicates the value of automated recognition of hand-written diagrams. Recently, applications have been developed that use online systems running on pen-input PCs that allow users to create structured diagrams by drawing the diagram on the PC tablet. The progress of offline diagram recognition is still minimal. The objective of this paper is to propose a context-independent off-line diagram recognition system. Our approach utilizes support vector machines [3] for recognition and Line Primitive Extraction by Interpretation of Line Continuation for segmentation [4].

Proceedings of the International Joint Conference on Neural Networks (IJCNN 2008), Hong Kong, 2008, pp. 177-182.

Support Vector Machine vs An Optimized Neural Network for Diagnosing Plant Diseases

M. Sammany, Khaled S. Refaat

Vegetable crops suffer from many leaf batches, which differ in color, shape, and size according to the cause. Leaf batches happen as a result of plant pathogens. In agricultural mass production, the onset of plant disease batches must be detected early so that control measures can be timed appropriately. In this regard, a Support Vector Machine (SVM) has been used to classify plant symptoms into four categories: Yellow Spotted (YS), White Spotted (WS), Red Spotted (RS), and Discolored (D). The results obtained using SVM have been compared to the results obtained by an optimized Multi-layered Perceptron (MLP).

Proceedings of the International Computer Engineering Conference (ICENCO 2006), Giza, Egypt, 2006, pp. 25-31.
