I am Yi Zhang, a sixth-year Ph.D. student at the University of Pennsylvania, advised by Prof. Zack Ives and Prof. Dan Roth. My research interests lie broadly in natural language processing and data management systems.
My Ph.D. research at Penn focuses on helping people understand natural language claims, data analytics-driven reports, and their interactions by exploring and reasoning about provenance-based contextual information.
For a natural language claim, we propose ''claim provenance'' to describe where a claim may come from and explain how it may have been derived. We infer it by developing novel information extraction, text generation, and reasoning techniques.
For a data analytics-driven report with tables or visualizations, we build a search platform over data in a ''data lake'' that finds relevant supplementary (joinable or unionable) data at interactive speed. Our tool provides additional data as well as provenance information (e.g., workflows) to help users assess whether the given data was ''cherry picked'' or representative, and thus whether it supports specific claims or conclusions. The tool also recommends related data analytics pipelines to data scientists, so they can probe the conclusions drawn from the given data.
To bridge the two contexts, we further explore data provenance for quantitative natural language claims. Typically, a quantitative claim is a compositional query with discrete computations over specific tables, but the claim may not explicitly specify which operators, columns, or rows are involved in the computation. It is therefore difficult to match claims to tables directly. Instead, I propose to generate latent programs for a given quantitative claim and to reason about potential alignments between the variables in the latent program and the columns and rows in candidate tables, to better support the search for the right source tables.
I dream of
- always finding inspiration.