Abstract
Digital documents are vulnerable to being copied. Most existing copy detection prototypes exhaustively compare a potentially plagiarized document, sentence by sentence, against a repository of legal or original documents to identify plagiarism. This approach does not scale, because the repository may hold a large number of original documents and each document may contain a large number of sentences. Furthermore, the security level of existing mechanisms is quite weak: a plagiarized document can bypass detection simply by making a minor modification to each sentence. In this paper, we propose a copy detection mechanism that eliminates unnecessary comparisons. It is based on the observation that comparisons between two documents addressing different subjects are not necessary. We describe the design and implementation of our experimental prototype, called CHECK. We present the results of some exploratory experiments and discuss the security level of our mechanism.
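The abstract does not say how CHECK decides whether two documents address the same subject, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes keyword-set overlap (Jaccard similarity) as a stand-in subject test, and every name and threshold in it (`keywords`, `jaccard`, `detect_copies`, `subject_threshold`, `sentence_threshold`, the stopword list) is hypothetical. Only documents that pass the cheap subject filter reach the sentence-level comparison.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "on", "for"}


def keywords(text):
    """Return the set of lowercase alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sentences(text):
    """Split text into sentences on '.', '!' and '?' (a naive splitter)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def detect_copies(suspect, repository, subject_threshold=0.2, sentence_threshold=0.8):
    """Return (doc_id, suspect_sentence, original_sentence) for similar sentences.

    repository maps a document id to its text. Both thresholds are
    assumed values, not taken from the paper.
    """
    suspect_kw = keywords(suspect)
    suspect_sents = [(s, keywords(s)) for s in sentences(suspect)]
    matches = []
    for doc_id, original in repository.items():
        # Stage 1: cheap subject filter. Documents on a different subject
        # are skipped, so no sentence-level comparison is done for them.
        if jaccard(suspect_kw, keywords(original)) < subject_threshold:
            continue
        # Stage 2: sentence-level comparison. Overlap of keyword sets
        # tolerates minor edits better than exact string matching.
        for orig in sentences(original):
            orig_kw = keywords(orig)
            for sent, sent_kw in suspect_sents:
                if jaccard(sent_kw, orig_kw) >= sentence_threshold:
                    matches.append((doc_id, sent, orig))
    return matches
```

With a repository of N documents, the subject filter costs one set comparison per document, and the quadratic sentence loop runs only for the documents that survive it. A real system would need a more robust subject representation than raw keyword overlap.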
Original language | English |
---|---|
Title of host publication | Proceedings of the 1997 ACM Symposium on Applied Computing, SAC 1997 |
Publisher | Association for Computing Machinery |
Pages | 70-77 |
Number of pages | 8 |
ISBN (Print) | 0897918509, 9780897918503 |
Publication status | Published - 1 Jan 1997 |
Event | 1997 ACM Symposium on Applied Computing, SAC 1997, San Jose, CA, United States; Duration: 28 Feb 1997 → 1 Mar 1997 |
Conference
Conference | 1997 ACM Symposium on Applied Computing, SAC 1997 |
---|---|
Country/Territory | United States |
City | San Jose, CA |
Period | 28/02/97 → 1/03/97 |
Keywords
- Copy detection
- Digital libraries
- Document plagiarism
- Information retrieval
ASJC Scopus subject areas
- Software