A Conceptual-Modeling Approach to Data Extraction

Abstract: For more than a decade we have been studying how conceptual modeling can improve our ability to extract semantic data from unstructured and semi-structured information sources such as web pages and historical documents. We start with a data-extraction ontology, which comprises a conceptual model together with data frames that describe the concepts of interest that may be found in an information source. Data-extraction ontologies provide significant advantages over traditional extraction techniques because they achieve relatively high precision and recall and are robust to changes in the structure of the information source. However, these benefits come at a price: extraction ontologies require significant expertise to construct, and each one applies to a relatively narrow domain of interest. In this lecture I will review our past contributions, with special emphasis on the research techniques we have applied, and I will describe current areas of research and directions for future experimentation.

CV: Dr. Stephen W. Liddle is academic director of the Kevin and Debra Rollins Center for Entrepreneurship and Technology at Brigham Young University and professor of Information Systems at the Marriott School of Management. He teaches mobile app development and information systems analysis. Liddle has been a member of BYU’s business school faculty since 1995, after receiving his PhD in Computer Science from BYU. He has been active in the conceptual modeling community for two decades and currently serves as treasurer of the steering committee for the ER Conference. His research interests include conceptual modeling, software engineering environments and tools, data extraction, and e-business. He is particularly interested in mobile application development and in applications of conceptual modeling, such as the use of ontologies in data extraction. His work has appeared in journals such as Data and Knowledge Engineering and the Annals of Operations Research, and in respected conferences such as the ER Conference and CIKM. Besides authoring or co-authoring more than 50 refereed academic papers, Liddle is editor of numerous conference and workshop proceedings and co-author of the book E-Business: Principles and Strategies for Accountants. He is a member of several advisory boards for tech startups in Utah and has considerable experience in software development.

