Wednesday, May 31st at 11am.
Speaker: Graham McDonald, doctoral candidate, University of Glasgow, U.K.
Abstract: Freedom of Information (FOI) laws legislate that government documents should be opened to the public. However, many government documents contain sensitive information, such as personal information or information that would be likely to damage the international relations of countries if it were released. Therefore, sensitive information is exempt from release through FOI, and all government documents must be sensitivity reviewed to ensure that any such sensitivities are not released to the public. With the emergence of born-digital government documents in recent decades, traditional (paper-based) sensitivity review processes are no longer viable, and there is a timely need for automatic sensitivity classification techniques to assist the sensitivity review of born-digital documents. In this talk, I will begin by providing an overview of some of the main characteristics of sensitive information and the associated challenges for automatic sensitivity classification. I will then present some of our recent work on developing sensitivity classifiers that make use of syntactic evidence from part-of-speech sequences, and semantic evidence derived from word embeddings, to effectively classify personal information and international relations sensitivities in government documents.