Research Summary
Modern science and technology have witnessed in the past decade an explosive growth of networked systems giving rise to a vast ocean of data with completely transformed scale, structure, and complexity. These networked data are voluminous, fine-grained, heterogeneous, dynamic, and, without loss of generality, can naturally be modeled and interpreted as
big graphs.
While these graphs have revolutionized the way we create, model, and disseminate complex networked data, their sheer sizes and unique complexity have rendered them extremely hard to manage, costly to ingest, and complicated to analyze.
My research centers around data and network sciences, database systems, data mining, and Big Data systems. Specifically, I have been striving to
(1) understand the theoretical foundations and algorithmic principles underpinning the generation, organization, function, and evolution of large-scale networked systems, and
(2) design and develop efficient, cost-effective, and scalable algorithms and computational tools for modeling, managing, querying, and mining big graphs in a wide spectrum of conceptual, physical, technological, ecological, and societal applications in today's networked world.
Selected Research Projects
Graph Query Modeling, Processing, and Optimization. The dominance of graphs
in real-world applications asks for efficient access methods and graph-aware querying functionalities that
enable exploration and utilization of big graphs in a way far beyond the conventional, SQL-based solutions.
At the core of many graph-based computational tasks lies a fundamental and critical graph querying primitive,
the objective of which is to search as output all the relevant (sub-)graphs, either exactly or approximately,
from big graphs for a user-specified, graph-structured query.
We have proposed a series of efficient and cost-effective graph indexing solutions
to support high-performance subgraph query processing and optimization in big graphs.
Related publications: VLDB'22, ICDE'19, ICDE'17, KDD'12, SIGMOD'11, VLDB'10, VLDB'07
Mining, Learning, and Summarizing Big Graphs.
In a wide array of graph mining, learning, and computational tasks, a key observation has been recognized that oftentimes, only a small fraction of graphs, or certain facets/dimensions of graph data, are mission-relevant. As a result, an exhaustive
exploration of entire graph data turns out to be overkill, thus inevitably incurring a large amount of wasteful computation.
To realize this vision, we have proposed a series of graph summarization methods to optimize graph mining and learning in real-world big graphs.
Related publications: VLDB'24, ICDE'20, VLDB'17, CIKM'15, CIKM'09
Scalable Computation in Dynamic Graph Streams.
Real-world big graphs are not static but dynamically evolving all the time in extremely fast speed. As a result, they are often modeled as graph streams where individual edges of the underlying big graph are received and updated in a form of a data stream. Due in particular to the dynamic and transient nature, we cannot explicitly materialize graph streams in memory, or even on disks, for repeated exploration and computation. We have proposed a series of cost-effective graph sketches, and constant-time or sub-linear approximation algorithms for efficient graph analytics with theoretically guaranteed accuracy and performance in big graph streams.
Related publications: ICDE'16a, ICDE'16b, VLDB'12