
From Data to Knowledge: Extracting and Utilizing Scholarly Knowledge Graphs

Knowledge bases today are central to the successful utilization of the information available in the large and growing amounts of digital data on the Web. Such technologies have begun to transform Web search from keyword matching into a tool for discovery, learning, and creativity, all of which are crucial to knowledge discovery. Unfortunately, searching for information remains inherently difficult for significant portions of the Web such as the Scholarly Web, which contains many millions of scientific documents. For example, PubMed has over 20 million documents, whereas Google Scholar is estimated to have more than 100 million. Open-access digital libraries such as CiteSeerX, which acquire freely available research articles from the Web, are seeing their document collections grow as well. Despite notable advances by scholarly search portals, semantic search technologies that “understand” complex concepts and their relations, and that can systematically satisfy users’ intricate information needs, have yet to be investigated on the Scholarly Web. The goal of this project is to design solutions that make information more accessible and comprehensible to Scholarly Web users in particular, and to Web users in general, and that help them discover knowledge more effectively and efficiently. The approach is to develop an integrated framework focused on the extraction and utilization of scholarly knowledge graphs in online scholarly environments.

Educationally, this work will involve: training of graduate, undergraduate, and high-school students, with particular encouragement for the participation of women and underrepresented groups in the research efforts; curriculum development and integration of research into courses taught by the PI; exposure of students to industry and international experiences; and education for the general public.

The project will target the following research objectives: (1) explore the construction of scholarly knowledge graphs that combine evidence from multiple resources in an open information extraction framework; (2) design and develop novel algorithms for the detection and analysis of interesting and previously unknown connections between concepts, in order to enhance knowledge discovery on the Scholarly Web; and (3) investigate the utility of scholarly knowledge graphs in a question answering system. The results of this research will be integrated into the CiteSeerX digital library. The software, tools, and benchmark datasets developed during the course of this project will be made publicly available. All findings will be shared with the research community through publications in academic journals and presentations at Information Retrieval, Text Mining, and Natural Language Processing conferences. For further information, see the project web page: http://people.cs.ksu.edu/~ccaragea/skg.html.

Selected Publications related to Scientific Data Analysis

Other CiteSeerX Related Publications

Invited Talks and Presentations

International Summer Schools

Related Workshops

People

This research is supported by the National Science Foundation.