Ph.D. Candidate
Manning College of Information and Computer Sciences
Amherst, Massachusetts, USA
Email: pboddavarama@umass.edu
Note: I am currently on the job market for industry post-doc and full-time research scientist positions. Please contact me if you think I would be a good fit for your team.
| Year | Venue | Publication | Links |
|---|---|---|---|
| 2026 | CLeaR | **Improving Generative Methods for Causal Evaluation via Simulation-Based Inference.** Generating synthetic datasets that accurately reflect real-world observational data is critical for evaluating causal estimators, but it remains a challenging task. Existing generative methods offer a solution by producing synthetic datasets anchored in the observed data (source data) while allowing variation in key parameters such as the treatment effect and amount of confounding bias. However, it is often unclear which generative method to use and which parameter values to choose when generating synthetic datasets. Moreover, existing methods typically require users to provide fixed point estimates of such parameters. This denies users the ability to express uncertainty over both generative methods and parameter values and removes the potential for posterior inference, potentially leading to unreliable estimator comparisons. We introduce simulation-based inference for causal evaluation (SBICE), a framework that treats the generative method and its corresponding generative parameters as uncertain and infers their posterior distribution given a source dataset. Leveraging techniques in simulation-based inference, SBICE identifies suitable generative methods and infers distributions over their parameter configurations to produce synthetic datasets closely aligned with the source data distribution. Empirical results demonstrate that SBICE improves the reliability of estimator evaluations by generating realistic datasets whose causal estimates closely match those of the source data, making it a robust and uncertainty-aware approach to selecting causal estimators. | Link · PDF |
| (Ongoing) | WSC (submitted) | **Extending Causal Metamodeling to a Non-Markovian Queue.** Metamodels for discrete-event simulations approximate the behavior of simulation models without running expensive simulations. Prior work introduced modular dynamic Bayesian networks (MDBNs), a class of metamodels that can estimate a range of probabilistic and causal queries (PCQs) using a single, trained model, but the method was limited to Markovian systems. In this paper, we initiate an extension of MDBNs to non-Markovian queues by approximating non-exponential distributions using phase-type distributions. This approach raises novel challenges, including balancing metamodeling accuracy and tractability when choosing the number of phases, efficiently learning metamodel parameters, and choosing the sampling interval that is used to approximate a continuous-time simulation by a discrete-time MDBN. We provide preliminary solutions to these challenges, yielding the first causal metamodeling technique for non-Markovian systems. Experiments on a G/M/1 queue demonstrate that the MDBN can produce accurate answers to PCQs with orders-of-magnitude speedup of inference times relative to direct simulation. | - |
| (Ongoing) | Manuscript | **Scaling Modular Dynamic Bayesian Networks for Queuing Simulations.** In preparation. | - |
| 2023 | WSC | **Causal Dynamic Bayesian Networks for Simulation Metamodeling.** A traditional metamodel for a discrete-event simulation approximates a real-valued performance measure as a function of the input-parameter values. We introduce a novel class of metamodels based on modular dynamic Bayesian networks (MDBNs), a subclass of probabilistic graphical models which can be used to efficiently answer a rich class of probabilistic and causal queries (PCQs). Such queries represent the joint probability distribution of the system state at multiple time points, given observations of, and interventions on, other state variables and input parameters. This paper is a first demonstration of how the extensive theory and technology of causal graphical models can be used to enhance simulation metamodeling. We demonstrate this potential by showing how a single MDBN for an M/M/1 queue can be learned from simulation data and then be used to quickly and accurately answer a variety of PCQs, most of which are out-of-scope for existing metamodels. | Link · PDF |
| 2023 | Biometrics | **Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials.** Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: how to select the adjustment approach—which variables and in which form—to maximize precision, while maintaining Type-I error control. Balzer et al. previously proposed Adaptive Pre-specification within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials (N < 40). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models, adjusting for a single covariate. Now, we tailor Adaptive Pre-specification to trials with many randomized units. Using V-fold cross-validation and the estimated influence curve-squared as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data-generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision—equivalent to 20%-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups. | Link · PDF |
| 2022 | arXiv | **Measuring Interventional Robustness in Reinforcement Learning.** Recent work in reinforcement learning has focused on several characteristics of learned policies that go beyond maximizing reward. These properties include fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention, despite variation in these incidental aspects of the training procedure. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and type of algorithm and that high performance does not imply high IR, as one might expect. | Link · PDF |
| 2022 | Manuscript | **Characterizing and Applying Methods for Constructing Observational Data to Evaluate Treatment Effect Estimators.** In preparation. This work studies the construction of observational datasets for more faithful and systematic evaluation of treatment effect estimators. | - |
| 2020 | INFORMS | **Estimating the Prevalence of Multiple Chronic Diseases via Maximum Entropy.** The prevalence of multiple chronic disease conditions (MCC) among the U.S. population is steadily increasing and accounts for an outsized share of healthcare expenditures. A predictive model for the probabilities of co-occurring conditions is crucial for planning and implementing complex care interventions, estimating aggregate healthcare costs, and studying physiological associations among conditions. Although the number of MCC patients is large, the chronic-condition combinations among these patients exhibit significant heterogeneity, leading to data-sparsity issues that thwart the simple maximum-likelihood estimation approaches used today. We combine maximum-entropy, data-mining, and machine learning techniques to create an algorithm for estimating MCC prevalence in data-sparse settings; it estimates the prevalence of unseen combinations in a principled manner that is mathematically consistent and reflects the MCC associations that have enough support in the data to be trustworthy. The viability of our approach is demonstrated via experiments on synthetic data and a detailed case study using Medical Expenditure Panel Survey (MEPS) data. | Link |
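The SBICE abstract above centers on inferring a posterior distribution over generative parameters, rather than fixing point estimates. As a toy illustration of that general idea only (not the paper's algorithm), the sketch below uses rejection-style simulation-based inference to recover a posterior over a hypothetical treatment-effect parameter; the generative model, prior, and acceptance tolerance are all invented for the example:

```python
import random

random.seed(0)

def simulate(tau, n=200):
    # hypothetical generative model: binary treatment, outcome = tau * t + noise
    data = []
    for _ in range(n):
        t = 1 if random.random() < 0.5 else 0
        y = tau * t + random.gauss(0.0, 1.0)
        data.append((t, y))
    return data

def ate_hat(data):
    # difference-in-means estimate of the average treatment effect
    treated = [y for t, y in data if t == 1]
    control = [y for t, y in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

source = simulate(tau=2.0)      # "source" dataset with a known effect
target = ate_hat(source)

# rejection-style simulation-based inference: draw parameters from a prior,
# simulate a dataset, and keep draws whose causal estimate lands near the
# source estimate -- the accepted draws approximate the posterior
posterior = []
while len(posterior) < 200:
    tau = random.uniform(-5.0, 5.0)   # prior over the generative parameter
    if abs(ate_hat(simulate(tau)) - target) < 0.2:
        posterior.append(tau)

post_mean = sum(posterior) / len(posterior)
```

The accepted draws concentrate around the effect used to generate the source data, which is the sense in which synthetic datasets drawn from the posterior stay "anchored" to the source distribution.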
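The non-Markovian-queue abstract relies on approximating non-exponential distributions with phase-type distributions. A minimal sketch of one standard instance of that idea is moment-matching an Erlang distribution (the simplest phase-type family, a chain of identical exponential phases) to a non-exponential service-time distribution; the uniform target distribution and sample size here are illustrative assumptions, not details from the paper:

```python
import math
import random

random.seed(1)

def erlang_fit(mean, var):
    # moment-match Erlang(k, rate): mean = k / rate, variance = k / rate**2,
    # so k is roughly the inverse squared coefficient of variation
    k = max(1, round(mean * mean / var))
    rate = k / mean
    return k, rate

def erlang_sample(k, rate):
    # an Erlang draw is a sum of k independent exponential phases
    return sum(-math.log(1.0 - random.random()) / rate for _ in range(k))

# hypothetical non-exponential service times: Uniform(0.5, 1.5)
mean, var = 1.0, 1.0 / 12.0
k, rate = erlang_fit(mean, var)   # k = 12, rate = 12.0

samples = [erlang_sample(k, rate) for _ in range(50_000)]
m = sum(samples) / len(samples)
v = sum((x - m) ** 2 for x in samples) / len(samples)
```

The fitted Erlang reproduces the target's first two moments, and the accuracy/tractability trade-off the abstract mentions is visible here: a lower coefficient of variation forces more phases `k`, enlarging the discrete state space the metamodel must track.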
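The maximum-entropy prevalence abstract describes estimating probabilities of unseen condition combinations in a way that stays consistent with well-supported associations. One common way to compute a maximum-entropy joint under marginal constraints is iterative proportional fitting (IPF); the sketch below applies it to three hypothetical binary conditions with made-up single and pairwise prevalence targets, as an illustration of the general technique rather than the paper's algorithm:

```python
import itertools

# hypothetical targets: P(condition i) and P(conditions i and j co-occur)
single_targets = {0: 0.40, 1: 0.35, 2: 0.30}
pair_targets = {(0, 1): 0.20, (0, 2): 0.15, (1, 2): 0.10}

# start from the uniform (maximum-entropy) joint over all 2^3 combinations
cells = {c: 1.0 / 8 for c in itertools.product((0, 1), repeat=3)}

def scale(pred, target):
    # rescale cells satisfying `pred` to total `target`,
    # and the remaining cells to total 1 - target (mass is preserved)
    mass = sum(p for c, p in cells.items() if pred(c))
    for c in cells:
        cells[c] *= (target / mass) if pred(c) else ((1 - target) / (1 - mass))

# iterative proportional fitting: sweep all constraints until convergence
for _ in range(200):
    for i, t in single_targets.items():
        scale(lambda c, i=i: c[i] == 1, t)
    for (i, j), t in pair_targets.items():
        scale(lambda c, i=i, j=j: c[i] == 1 and c[j] == 1, t)

# prevalence of the (possibly never-observed) all-three combination,
# plus one fitted pairwise marginal to check the constraints hold
p_all_three = cells[(1, 1, 1)]
p01 = sum(p for c, p in cells.items() if c[0] == 1 and c[1] == 1)
```

The fitted joint matches every supplied constraint while assigning a principled, strictly positive probability to the all-three combination, which is the kind of data-sparse estimate the abstract targets.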