State of the art: Plagiarism detection tools
MarkusPlagiat State of the art 11/02/2011

Contents

1. Introduction
2. Text plagiarism detection tools
   2.1 Paying
       2.1.1 Online
       2.1.2 Desktop
   2.2 Free
       2.2.1 Online
       2.2.2 Desktop
   2.3 Free, Open Source
3. Source code detection tools
   3.1 Paying
   3.2 Free
       3.2.1 Online
       3.2.2 Desktop
   3.3 Free, Open Source
4. Summary table of plagiarism detection tools
   4.1 Text plagiarism detection tools
   4.2 Source code plagiarism detection tools
5. Conclusion
6. Bibliography

1. Introduction

The purpose of this document is to present existing plagiarism detection tools, as part of our preliminary research for the MarkusPlagiat project (the design of a plagiarism detection module for MarkUs). The list of tools presented here is not exhaustive, but it is meant to be representative of the existing tools.

This document first presents plagiarism detection tools for text, and then tools for source code. In both cases, tools are sorted into three categories: paying, free (in this document, the word “free” applied to an application means “not paying”), and Open Source. They are also classified as online (a service accessed through the internet, with a web interface or by sending e-mails) or stand-alone desktop solutions (software that you install on your computer).

2. Text plagiarism detection tools

2.1 Paying

2.1.1 Online

Ephorus

Ephorus describes itself as the “market leader in Europe” in plagiarism detection software. It mainly targets higher-education institutions and is not really intended for individuals. The cost depends on the institution. Ephorus is composed of three services: Ephorus Internet compares submissions with documents on the Internet, Ephorus Group with documents from parallel student groups, and Ephorus Database with documents handed in previously or at other educational institutes that have an Ephorus account. Submissions are made and results are displayed through a web interface; the report contains links to the potentially plagiarized pages, sorted by the probability of having been plagiarized. Ephorus also offers to integrate its tools into a school's existing intranet or virtual learning environment.

Homepage: http://www.ephorus.com/home

PlagiarismScanner

PlagiarismScanner is a commercial online plagiarism detection application that checks against Internet resources, that is, websites, digital databases and online libraries such as Questia or ProQuest. Unlike Ephorus, PlagiarismScanner is better suited to individuals, since the cost is a fixed rate that depends on the number of words you want to scan. In practice, you copy and paste your text online, and after several minutes a report is shown, containing an overall originality rating, a color code indicating the percentage of plagiarized material in the submitted text, and a detailed list of plagiarism references. It also highlights the plagiarized parts and indicates the original source each passage was taken from. PlagiarismScanner claims to be unique in that it does not store the submitted documents and scans only against existing pages on the Internet.
Homepage: http://www.plagiarismscanner.com/

SafeAssign

Unlike the two previous tools, SafeAssign is not an independent plagiarism prevention service: it is offered at no additional cost as part of Blackboard products (Blackboard sells virtual learning environment solutions). SafeAssign checks submitted works against the Internet, an internal database (“with over 1,100 publication titles and about 2.6 million articles from '90s to present time updated weekly”), papers submitted to SafeAssign by users in their respective institutions, and papers volunteered by students from Blackboard client institutions to help prevent cross-institutional plagiarism. The report contains links to the plagiarized pages, highlights the plagiarized parts of the submitted document and shows the original passages.

It is interesting to see that companies offering virtual learning environments increasingly include plagiarism detection tools in their products: this surely reflects teachers' growing concern about plagiarism.

Homepage: http://safeassign.com/

Turnitin

Turnitin is clearly one of the market leaders in plagiarism detection tools, with clients in over 120 countries (according to the company), and its website is one of the most complete and carefully produced among all the tools presented in this document. Turnitin differs from some other plagiarism detection tools in that the analysis searches not only the internet, but also large commercial databases of newspapers and articles, and works previously submitted by students. To use the system, the user must first subscribe; he then sends his documents and the site sends back a report.
However, Turnitin is above all designed for institutions: it is compatible with most well-known learning management systems and virtual learning environments, such as Moodle or Blackboard, and it also offers customized integration into other portals. When integrated into a virtual learning environment, Turnitin provides a full interface that lets teachers see a very detailed report (one of Turnitin's main advantages) with similarity percentages; the potentially plagiarized parts of the submitted document are highlighted, and hovering the mouse cursor over them displays a bubble containing the plagiarized text. Like MarkUs, it also allows teachers to comment on, mark and grade submitted works. As for the cost, it is $100 for an individual instructor; for institutions, the site license is based on the institution's size, number of subscribers and structure (approximately $4,000/year for unlimited reports).

Homepage: http://turnitin.com/static/index.php

Urkund

Urkund is an automated online plagiarism detection system. It is easier to use than the previous ones, since the entire process is automated through e-mail (no need to visit a site or log in); it only requires knowing how to send and read e-mails. The teacher must subscribe and create an account on the site; students then send their works by e-mail directly to their teacher's account. The documents are compared to the internet, to Urkund's database of archived student works, and to published materials (PrioInfo, the owner of Urkund, is first and foremost a provider of a licensed e-book platform for corporations, publishers and libraries as well as universities). Once the analysis is complete, the results are automatically sent by e-mail to the teacher, along with the student's original work.
Like Turnitin, Urkund also exists as a plug-in for some learning management systems such as Blackboard or Moodle. However, Urkund is less well known and is established in fewer countries.

Homepage: http://www.urkund.com/int/en/

Noplagiat.com

Noplagiat.com is a French online detection tool with a very simple interface. It can be interesting if you are looking for something very simple to use, or for occasional use with no subscription. The user sends his documents to the site through a form and launches the analysis; the engine then checks for similarities with content found on the internet or in an internal database. One good point is that the tool takes into account the bibliography of the checked document, and its references if they are mentioned. You can also deliberately exclude some sources from the analysis. At the end of the analysis, the site shows a report listing the documents or websites that have potentially been plagiarized, with a window that directly displays an extract from each detected source. The user can also sort these documents into files and schedule analyses. Finally, students can submit their works directly on the site by giving their teacher's e-mail address; the documents are then available to the teacher for analysis and marking. The service costs between €0.60 and €1.20 per analysis, depending on how many analyses you order at once. Supported formats are Word, PDF and Text.

Homepage: http://www.noplagiat.com/

Compilatio.net

Compilatio.net is an interesting anti-plagiarism solution, since it takes a different point of view from other tools. It is a set of two tools:

- Compilatio Magister, an online plagiarism detection tool for instructors. The user loads a document into his “workspace”, then launches an analysis and displays results showing the percentage of similarity, with similar parts highlighted and similar sources sorted by probability of plagiarism (supported formats are Text, PDF, Rich Text Format, Microsoft Office files, HTML, PHP and ASP).
- Compilatio Studium, the “same tool for students’ side”. Students can check the level of originality of their works with this tool.

These two tools have basically the same functionality; only the interface differs, since the tools do not have the same purpose. The teacher can easily see which parts have been plagiarized, and display the original text and the plagiarized one side by side. The student, however, is encouraged to check for himself the pages listed as having been plagiarized, in order to review his work. The student's view also comes with messages that raise awareness about plagiarism.

The purpose of this system is to make students aware of the problem of plagiarism, but also to let students do the plagiarism detection themselves. One idea is to have students check their work with Compilatio Studium before handing it in, and then submit it along with the report generated by Compilatio Studium. The report could also act as evidence in cases of unintentional plagiarism.

Homepage: http://www.compilatio.net/en/

Pompotron.com

Pompotron.com is a very simple web plagiarism detection tool, with no registration required. The analysis is conducted in three steps:

- The user sends his document and the plagiarism detection is launched. Many formats are supported.
- Once the analysis is done, the user gives his e-mail address and is directed to the payment step. Payment is made through Allopass, a system of payment by phone or text message. The cost for one document is around €1.80 by phone, €3 by text message, and €2 by PayPal.
- Finally, the user receives an e-mail from the site containing the probability that the document has been plagiarized, along with the potential sources for the document.

The main downsides of Pompotron.com are that you can send only one document at a time, and that its payment system is not very practical. The site seems to be designed for students (that is what the message on the site says), and to be a partner or even a part of Compilatio.net.

Homepage: http://www.pompotron.com/

2.1.2 Desktop

PlagiarismDetect

PlagiarismDetect is first of all an online plagiarism detection tool, but users who create an account can use the same tool as a plug-in for MS Office 2007 at no additional fee. If a plagiarism detection tool is not integrated into a larger system that automates the submission of works (such as a learning management system, or MarkUs), copy-pasting or uploading works to check for plagiarism quickly becomes tiring for a grader. The fact that PlagiarismDetect emphasizes this plug-in suggests that, for some people, saving time by checking for plagiarism directly in an editor is as attractive as an online solution. However, it limits the document format to MS Word, which is not very practical.

Homepage: http://www.plagiarismdetect.com/

Plagiarism-Detector

Plagiarism-Detector is a standalone desktop application for plagiarism detection, which runs only on Windows.
The detection works by dividing the text into sequences, which are then checked with search engines (Google, Yahoo…); documents are therefore checked only against the internet. A license costs around $50 for one computer. Plagiarism-Detector mainly targets individuals. Depending on the version, it exists as a standalone application supporting Microsoft Office formats, Text, Rich Text Format and HTML, or as a plug-in for Microsoft Word or PowerPoint.

Some elements of the interface are interesting: a “document similarity matrix”, which gives an overall view of the text and its plagiarized parts; a split screen; and a quickly understandable summary of the results.

Homepage: http://plagiarism-detector.com/

EVE2

This paying plagiarism detection software is exclusively for Windows and in English. The analysis is made through searches on the internet. Supported formats are Text, Microsoft Word and Corel WordPerfect. The report lists, for the teacher, links to the pages that students may have plagiarized, and is rather complete. The software costs $30. According to the company that sells it, EVE2 is cheaper than other common tools, and also faster (a typical time being a few minutes per 2,500 words). However, a downside is that it does not trace documents that are not in HTML format, and it is not easily installed on a server. Moreover, the interface is not very attractive (very simple, not colorful, no diagrams).

Homepage: http://www.canexus.com/

CopyCatch

CopyCatch is a set of plagiarism detection software. The most interesting tools among them are:

- CopyCatch Gold: this is the paying version, exclusively for Windows. It compares works to web resources. Accepted formats are Text, Word and Rich Text Format. A single-user license for educational use costs £250 per year.
- CopyCatch Investigator: this is the free version. It is programmed in Java and is therefore multiplatform.

These tools seemed very interesting, though it seems that they are no longer developed. Several download links for CopyCatch Investigator can be found on the web, but they are not really reliable.

Homepage: http://www.cflsoftware.com/

2.2 Free

2.2.1 Online

Plagium

Plagium is a very simple online plagiarism detection tool. You just paste your original text, and Plagium searches the web for duplicates. There are many free online tools, but most of them look like Plagium: very simple, with just a copy-paste system. A good point of Plagium, however, is an additional feature that lets you track the usage of a chunk of text or the content of a URL over time. It can be useful for people who are concerned that somebody might later steal their text from a blog or website. The “Plagium Alert” feature can keep you apprised of who might be using your text once it is made public. Concretely, you enter your text or URL into Plagium, and each week you receive a notification about the status of the text or content usage.

Homepage: http://www.plagium.com/

Plagiarism Checker

This is another example of a free online tool. There are many similar tools offering the same functionality.

Homepage: http://www.plagiarismchecker.com/

SeeSources

SeeSources.com is also an online plagiarism detection tool. It resembles Plagium and many other free online tools, but here you can also upload documents in MS Word, HTML and Text format. Moreover, according to the site, the detection process is more elaborate than that of other tools, which simply divide the text into sequences and check these with search engines (Google for most of them). SeeSources “automatically extracts the unique signatures of a text and searches the Internet for them, efficiently detecting brazen plagiarism”. The process seems to be inspired by Zipf's law, the idea being that texts or even phrases have their own retrievable signature.
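One way to read the “unique signature” idea is that the rarest words of a passage identify it far better than its common words. The following Python sketch illustrates that reading only; it is not SeeSources' actual method, which the site does not detail. The function name `signature_terms` and the small reference frequency table are hypothetical, and a real system would take its frequencies from a large corpus.

```python
import re
from collections import Counter

def signature_terms(text, reference_counts, k=3):
    """Return the k words of `text` that are rarest in a reference corpus.

    Hypothetical helper: `reference_counts` maps a word to how often it
    occurs in some large background corpus (assumed to be available).
    Words missing from the table count as frequency 0, i.e. very rare.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Sort by background frequency (rarest first), then alphabetically
    # so that the result is deterministic.
    ranked = sorted(words, key=lambda w: (reference_counts.get(w, 0), w))
    return ranked[:k]

# Toy background frequencies (made-up numbers, for illustration only).
reference = Counter({"the": 1000, "of": 800, "is": 700, "a": 900,
                     "law": 40, "words": 60, "zipf": 2, "retrievable": 1})

sample = "The law of Zipf is a law of retrievable words"
print(signature_terms(sample, reference))
```

The selected terms could then be sent together as one search engine query, instead of querying arbitrary fixed-length sequences of the text.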
Broadly speaking, Zipf's law says that a few words occur very often while most occur rarely (an effect that is amplified for combinations of words). Unfortunately, SeeSources.com is no longer under development, since the team has created its successor, PlagScan, which is paying.

Homepage: http://www.plagscan.com/seesources/

Copyscape

Copyscape is another variant of free online tool; its particularity is that it is designed for checking plagiarism of web pages only (no direct pasting of text). The purpose is therefore a little different from that of the educational context: here, the concern is protecting one’s website against plagiarism. It also offers free banners that you can put on your website to deter plagiarism. One could say that they are not really useful, but the design of the banners is attractive. There is also a paying version, Copyscape Premium. This is something often seen with free online tools: the main tool is free, and a paying version offers more features.

Homepage: http://www.copyscape.com/

Plagiserve

Plagiserve is another free online tool, but it differs from other tools in several ways. First, it has some downsides:

- You must register, even though it is a free service. You then have to enter your user ID each time you paste text on the site to launch a detection process.
- You have to wait 12 hours (according to Plagiserve) for a complete report, which is sent to you by e-mail.

However, it has some interesting advantages over other tools:

- It does not only search HTML pages; it can also detect documents that exist in PDF and PostScript formats.
- Plagiserve has a database of over 150,000 student essays, term papers and cliff notes, and it also sends out web robots to check “high risk” sites like Britannica.com, Refdesk.com and Encyclopedia.com for copied material.

The service is based in Ukraine.

Homepage: http://www.plagiserve.com/

DupliChecker

DupliChecker is another online tool that resembles the others, but it has an additional feature that lets you choose the search engine used to check for plagiarism on the web. This gives the user more freedom, and the user knows exactly how the detection is carried out: DupliChecker just automates a process that the user could do himself.

Homepage: http://www.duplichecker.com/

2.2.2 Desktop

Viper

Viper is free plagiarism detection software, exclusively for Windows and in English. According to its authors, it scans over 10 billion online sources including websites, online journals and news sources.

Homepage: http://www.scanmyessay.com/

[Screenshots of the Viper interface]

Le Petit Renifleur, Le Renifleur Archiviste, Le Renifleur Web

This is a free compilation of three plagiarism detection programs, created by an amateur programmer. All three work with files in Microsoft Office formats (Word, Excel, PowerPoint...).
Le Petit Renifleur retrieves various information about local files: content, date of creation and/or modification, and file size. Le Renifleur Archiviste compares local files by analyzing their content. Le Renifleur Web compares local files with files found on the internet. Unfortunately, this project no longer seems to be under development, which often seems to happen with free software.

2.3 Free, Open Source

WCopyfind

WCopyfind is free plagiarism detection software created by Louis A. Bloomfield, a professor at the University of Virginia, and distributed under the GNU GPL license. It is written in C++, and the source code is available on his website (the latest version dates from October 2009). The program examines a collection of document files. It extracts the text portions of those documents and looks through them for matching words in phrases of a specified minimum length. When it finds two files that share enough words, WCopyfind generates HTML report files. These reports contain the document text with the matching phrases underlined. Supported formats are Text, HTML, and Microsoft Word (only the old .doc format, not .docx, though this may have been improved by now).

One downside is that WCopyfind cannot handle the PDF format, which is impractical since many students now submit their reports as PDF files. Moreover, WCopyfind cannot search the web for matching documents: the user must specify which documents it compares. These can be local documents, or web-resident documents pointed to by local internet shortcuts. If you suspect that a particular web page has been copied, you must create an internet shortcut to that page and include the shortcut in the collection of documents that you give to WCopyfind.

One advantage of WCopyfind is that it can compare several documents at the same time. It can then indicate whether one file is a copy of another, or whether both are copies of a third document.

Homepage: http://plagiarism.phys.virginia.edu/Wsoftware.html

CopyTracker

CopyTracker is plagiarism detection software developed by a team at the Ecole Centrale de Lille, and distributed under a General Public License.
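Both open-source tools in this section compare local documents with one another. As a rough illustration of the phrase matching described for WCopyfind above (matching words in phrases of a specified minimum length), the following Python sketch finds word runs shared by two texts. It is an assumption-laden sketch, not WCopyfind's actual C++ implementation: the function names `words` and `shared_phrases` and the default of four words are choices made for this example.

```python
import re

def words(text):
    """Lowercase the text and split it into words (punctuation ignored)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def shared_phrases(text_a, text_b, min_words=4):
    """Return word runs of at least `min_words` words shared by both texts,
    found greedily from left to right in `text_a`."""
    a, b = words(text_a), words(text_b)
    # Index every position of every min_words-gram of b.
    index = {}
    for j in range(len(b) - min_words + 1):
        index.setdefault(tuple(b[j:j + min_words]), []).append(j)

    phrases = []
    i = 0
    while i <= len(a) - min_words:
        best = 0
        for j in index.get(tuple(a[i:i + min_words]), []):
            # Extend the match as far as both texts keep agreeing.
            length = min_words
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            best = max(best, length)
        if best:
            phrases.append(" ".join(a[i:i + best]))
            i += best          # skip past the matched phrase
        else:
            i += 1
    return phrases

doc1 = "Plagiarism is the act of presenting the work of another author as your own."
doc2 = "Many students forget that presenting the work of another author without credit is wrong."
print(shared_phrases(doc1, doc2))
```

Raising `min_words` makes the comparison stricter: short common expressions stop counting as matches, at the cost of missing lightly edited copies.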
CopyTracker compares a file to other input files (text-to-text comparison), to a database (a personal database, for example), or to sources on the internet. Supported formats are PDF, Microsoft Word and HTML. It generates an analytical report showing a percentage of plagiarism, which makes the analysis quick and practical. It also shows the number of common sentences found between several files. The program also generates a PDF file of the tested document, with suspect passages highlighted. CopyTracker also lets you manage the files imported into the software. The software is currently available in English and French.

The major downside is that CopyTracker apparently is no longer under development. It seems that another team has taken over the project and has developed an online version of the tool.

Homepage: http://copytracker.ec-lille.fr/

3. Source code detection tools

3.1 Paying

CodeMatch

CodeMatch is a commercial source-code plagiarism detector. It compares every file in one directory with every file in another directory, in order to determine which files are the most correlated. It then produces a database that can be exported to an HTML report listing the most highly correlated pairs of files. It also generates a detailed report for each pair, showing the specific items in the files (statements, comments, identifiers, or instruction sequences) that caused the high correlation.

One advantage of CodeMatch over the free tools is its impressive number of supported languages: ABAP, ASM-M68k, BASIC, C, C++, C#, Delphi, Flash ActionScript, Fortran, FoxPro, Java, JavaScript, LISP, MASM, MATLAB, Pascal, Perl, PHP, PowerBuilder, Python, RealBasic, Ruby, SQL, Verilog, VHDL, Visual Basic.

CodeMatch also has additional features for finding open source code within proprietary code, determining common authorship of two different programs, and discovering common, standard algorithms within different programs.

[Screenshot of the CodeMatch interface]

CodeMatch is in fact part of CodeSuite, a suite of tools for detecting plagiarism, pinpointing copyright infringement, highlighting trade secret theft, measuring intellectual property, and tracking software development changes through numerous revisions. One of the tools in this suite, BitMatch, is interesting because it compares source code to suspect binary object code, which is useful if you do not have the source code of a program that you suspect.

Homepage: http://www.safe-corp.biz/

3.2 Free

3.2.1 Online

SID

SID is an online application that detects similarity between programs by computing the information shared between them. It is based on an algorithm originally developed to compare how similar or dissimilar genomes are; it was then realized that this algorithm could be extended to many other applications, including tracing the history of chain letters and detecting plagiarism. SID currently supports Java and C++ source code.

To use the service, you must create an account. You can then submit files in a formatted zip file. Once the comparison is finished, you can either receive an e-mail notification with a link to your results page as soon as the results are ready, or go to the results page directly. This page contains information about previous submissions, the status of submissions currently being processed, and self-explanatory results for each submission. The detection seems to use a simple algorithm based on Kolmogorov complexity.

Homepage: http://genome.math.uwaterloo.ca/SID/

Moss

Moss is an automatic system for determining the similarity of programs and detecting plagiarism in programming classes. Moss is a free service, but users must create an account. It is provided as an Internet service, and the current submission script is for Linux. In response to a query, the Moss server produces HTML pages listing pairs of programs with similar code. Moss also highlights individual passages in programs that appear the same, making it easy to compare the files quickly. Finally, Moss can automatically eliminate matches to code that one expects to be shared (libraries or instructor-supplied code), thereby eliminating false positives that arise from legitimate sharing of code.
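The exclusion of expected shared code can be illustrated with a small sketch. This document does not describe how Moss works internally, so the following Python code is a generic fingerprinting illustration under stated assumptions, not Moss's algorithm: programs are split into whitespace-separated tokens, every run of `k` tokens is a fingerprint, and fingerprints that also occur in the instructor-supplied code are discarded before two submissions are compared. The names `fingerprints` and `similarity` are hypothetical.

```python
def fingerprints(source, k=5):
    """Set of all runs of k consecutive whitespace-separated tokens."""
    tokens = source.split()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def similarity(sub_a, sub_b, base="", k=5):
    """Jaccard similarity of two submissions, ignoring base code.

    Fingerprints that also appear in `base` (e.g. instructor-supplied
    code) are removed first, so legitimately shared code cannot match.
    """
    ignored = fingerprints(base, k)
    fa = fingerprints(sub_a, k) - ignored
    fb = fingerprints(sub_b, k) - ignored
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)

# Toy data: both students start from the same instructor skeleton.
base = "int main ( ) { int n = read_input ( ) ; return solve ( n ) ; }"
alice = base + " int solve ( int n ) { return n * ( n + 1 ) / 2 ; }"
bob = base + " int solve ( int n ) { int s = 0 ; while ( n ) s += n -- ; return s ; }"

print(similarity(alice, bob))             # counts the shared skeleton
print(similarity(alice, bob, base=base))  # skeleton excluded
```

In this toy example the two solutions differ, so the score drops once the shared skeleton is excluded; what remains comes from the identical function signature.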
Moss supports a large number of languages: C, C++, Java, C#, Python, Visual Basic, Javascript, FORTRAN, ML, Haskell, Lisp, Scheme, Pascal, Modula2, Ada, Perl, TCL, Matlab, VHDL, Verilog, Spice, MIPS assembly, a8086 assembly, HCL2.

One concern with online tools is that submitted works could be retained and reused; Moss, however, guarantees that results are retained for 10 days, after which everything is deleted.

The current script is available here: http://moss.stanford.edu/general/scripts/mossnet

Homepage: http://theory.stanford.edu/~aiken/moss/

3.2.2 Desktop

SIM

SIM is plagiarism detection software developed by a teacher at the Vrije Universiteit in Amsterdam. It tests lexical similarity in source code written in C, Java, Pascal, Modula-2, Lisp and Miranda, and in natural language. It can be used to detect potentially duplicated code fragments in large software projects, in program text, in shell scripts and in documentation. The source of the program is available (written in C), but unfortunately the license is not open.

Homepage: http://www.cs.vu.nl/~dick/sim.html

JPlag

JPlag is a system that finds similarities among multiple sets of source code files. It currently supports Java, C#, C, C++, Scheme, and natural language text. JPlag uses a variation of the Karp-Rabin comparison algorithm. Instructors can use JPlag for free, but they must first set up an account; according to its developers, this prevents unauthorized use by students. JPlag was developed by the Department of Informatics of the University of Karlsruhe. It can compare two directories, or two files. It displays results in HTML format, showing histograms of the similarity values found for all pairs of programs, as well as the similar pairs and their similarity values. Similar lines are marked with the same color.
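JPlag's exact variation is not described here. As a hedged illustration of the underlying idea only, the Python sketch below implements the basic Karp-Rabin technique: a rolling hash is updated in constant time as a window slides over the text, and a candidate position is compared character by character only when the hashes agree. The function name `karp_rabin_find` and the constants `BASE` and `MOD` are choices made for this example.

```python
BASE = 256
MOD = 1_000_000_007  # a large prime keeps hash collisions rare

def karp_rabin_find(pattern, text):
    """Return every index at which `pattern` occurs in `text`."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, MOD)  # weight of the window's first character

    # Hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        w_hash = (w_hash * BASE + ord(text[i])) % MOD

    matches = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm with a real comparison.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = (w_hash - ord(text[start]) * high) % MOD
            w_hash = (w_hash * BASE + ord(text[start + m])) % MOD
    return matches

code = "int a = b + c; int d = b + c;"
print(karp_rabin_find("b + c", code))
```

In a similarity detector, the same rolling hash can be used to hash every fixed-length window of one program's token stream and look the values up in a table built from another program.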
One downside is that JPlag is apparently sometimes unable to parse some files; in that case, the file concerned is left out of the detection process.

Homepage: https://www.ipd.uni-karlsruhe.de/jplag/

3.3 Free, Open Source

AC

AC is an anti-plagiarism system for programming assignments. It currently supports programs written in C, C++ or Java. AC incorporates multiple similarity detection algorithms found in the scientific literature, and allows their results to be visualized graphically. It is distributed under the General Public License. AC has a fairly complete interface, with a kind of file manager. The user can compare a large number of files with one another. Results are displayed as individual histograms, a population histogram, and graphs. One good point is that AC provides a large number of test files in Java and C, which is really useful for testing a plagiarism detection tool.

[Examples of the representation of results in AC]

Homepage: http://tangow.ii.uam.es/ac/

Sherlock

Sherlock is a program that finds similarities between textual documents. It works on text files as well as source code files. It uses digital signatures to find similar pieces of text. A digital signature is a number formed by turning several words of the input into a series of bits and joining those bits into a number. Sherlock was developed at Warwick University’s Computer Science Department.

The user starts the plagiarism detection process by selecting the directory containing the files to be compared. Each suspicious section is clearly marked, and the user can view the original and pre-processed files simultaneously. Sherlock also lets the user view the tokenized file. A very useful feature is the option for the academic to mark the viewed files as suspicious or innocent.

Sherlock also has a very interesting representation of results, in the form of a graph. Each node represents one submission and each edge represents at least one match between two submissions.
In Sherlock's results view, the color of each edge connecting two submissions reflects the summed percentage over all the matches found for the pair of files being examined.

Homepage: http://sydney.edu.au/engineering/it/~scilect/sherlock/

Baldr

Baldr is a source code plagiarism detection program. It is written in Java and therefore runs on multiple platforms. It compares the source code of a large number of files. One of Baldr's main advantages is its very complete report: it displays histograms and a comparison matrix, and highlights the potentially plagiarized zones in the code.

[Figure: comparison matrix between all analyzed files]

[Figure: two correlated source files displayed side by side, with similar parts highlighted]

Homepage: http://labs.esiea.fr/2007/10/11/baldr/?lang=en

Plaggie

Plaggie is a stand-alone source code plagiarism detection engine intended for Java programming exercises. It can compare two directories containing several files, or two individual files. It generates a report in HTML format showing the percentage of similarity between projects.

Homepage: http://www.cs.hut.fi/Software/Plaggie/

PMD

The PMD open source tool provides a Copy/Paste Detector (CPD) for finding duplicate code. CPD uses the Karp-Rabin string matching algorithm. It works with Java, JSP, C, C++, Fortran and PHP code, and its documentation explains how to add other programming languages. Unlike other tools, PMD is not specifically aimed at detecting similarities in students' work, but it works well for that purpose. The developers of PMD provide good support and documentation for this tool. Because it is a duplicate code detector, the tool scans each file for duplicate code, and hence also reports similar code found within the same file.
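The Karp-Rabin approach mentioned for CPD can be sketched generically as a rolling hash over windows of tokens. The code below is not CPD's implementation: the tokenizer, the constants `BASE` and `MOD`, and the function names are assumptions made for illustration only.

```python
import re
import zlib

BASE = 1_000_003        # assumed multiplier for the polynomial hash
MOD = (1 << 61) - 1     # assumed large prime modulus


def tokenize(source):
    """Split source text into words/numbers and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", source)


def window_hashes(tokens, k):
    """Map rolling hash -> list of start indices for every k-token window."""
    table = {}
    if k <= 0 or len(tokens) < k:
        return table
    codes = [zlib.crc32(t.encode("utf-8")) + 1 for t in tokens]
    high = pow(BASE, k - 1, MOD)  # weight of the token leaving the window
    h = 0
    for c in codes[:k]:
        h = (h * BASE + c) % MOD
    table.setdefault(h, []).append(0)
    for i in range(1, len(tokens) - k + 1):
        # Remove the outgoing token, shift, then add the incoming token.
        h = ((h - codes[i - 1] * high) * BASE + codes[i + k - 1]) % MOD
        table.setdefault(h, []).append(i)
    return table


def duplicated_windows(tokens_a, tokens_b, k):
    """Return sorted (i, j) pairs where a[i:i+k] and b[j:j+k] are identical."""
    hashes_a = window_hashes(tokens_a, k)
    matches = []
    for h, starts_b in window_hashes(tokens_b, k).items():
        for i in hashes_a.get(h, []):
            for j in starts_b:
                # Compare the tokens themselves to rule out hash collisions.
                if tokens_a[i:i + k] == tokens_b[j:j + k]:
                    matches.append((i, j))
    return sorted(matches)
```

For example, `duplicated_windows(tokenize("int total = a + b ;"), tokenize("x = 1 ; int sum = a + b ;"), 4)` reports the two overlapping windows covering `= a + b ;`. Because each window hash is derived from the previous one in constant time, the scan stays roughly linear in the number of tokens, which is what makes this family of algorithms practical for large code bases.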
PMD is also successful in returning similar code across different files, so it can be used to detect similarity between source code files.

4. Summary table of plagiarism detection tools

4.1 Text plagiarism detection tools

[Table: one row per tool listed below, with an x under each applicable column among Paying, Free, Online, Desktop and Open Source; the per-row marks are not recoverable.]

Ephorus
PlagiarismScanner
SafeAssign
Turnitin
Urkund
Noplagiat.com
Compilatio.net
Pompotron.com
PlagiarismDetect
Plagiarism-Detector
EVE2
CopyCatch
Plagium
Plagiarism Checker
Copyscape
SeeSources
Plagiserve
DupliChecker
Viper
Le Petit Renifleur, Le renifleur Archiviste, Le renifleur Web
JPlag
WCopyfind
CopyTracker

4.2 Source code plagiarism detection tools

[Table: one row per tool, with an x under each applicable column among Paying, Free, Open Source, Online and Desktop; the per-row marks are not recoverable. The Supported languages column is reproduced below.]

Tool        Supported languages
CodeMatch   BASIC, C, C++, C#, Delphi, Flash ActionScript, Java, JavaScript, MASM, Pascal, Perl, PHP, PowerBuilder, Ruby, SQL, Verilog, VHDL…
SID         Java, C++
Moss        C, C++, Java, Pascal, Python, Ada, ML, Lisp…
SIM         C, Java, Pascal, Modula-2, Lisp, Miranda
JPlag       Java, C#, C, C++
AC          C, C++, Java
Plaggie     Java
Sherlock    Java
PMD         Java, JSP, C, C++, Fortran, PHP
Baldr       ?

5. Conclusion

After listing a large number of tools, we can see some general tendencies.

First, there are two types of tools: those designed for individuals (simple free online tools for text plagiarism, with just a copy-paste system, or stand-alone desktop solutions) and those designed for institutions (commercial online tools, which can be integrated into a virtual learning environment). There are also several categories of users of plagiarism detection tools: most users are instructors, but some tools are designed specifically for students, and others for publishers or website authors.

We can also see a recurring problem with online tools: many users are concerned that websites could keep a record of the works they submit or upload and use them against their will, or use them to build an internal database. That is why some people prefer software installed on their own computer. Moreover, running the anti-plagiarism check locally often saves individuals time.

Another tendency is that there is a real market for anti-plagiarism tools: for text plagiarism, there are many commercial tools, and many of them are available only on Windows. However, free tools have problems too: it often happens that a promising free tool turns out to be no longer under development.
Moreover, there is little free desktop software for text plagiarism detection (most free tools are online).

Finally, concerning source code plagiarism detection tools, we can notice that the free ones are often developed by university teachers, or by a small team within a university's Computer Science Department. This shows that there could be a gap in this domain, which makes the MarkUs Plagiat project all the more interesting.

6. Bibliography
- A very interesting site containing information and resources on plagiarism and plagiarism detection tools, along with feature/performance comparisons and demonstrations: http://www.ics.heacademy.ac.uk/resources/assessment/plagiarism/index.html
- A comparison of plagiarism detection tools, Jurriaan Hage, Peter Rademaker, Nikè van Vugt, Department of Information and Computing Sciences, Utrecht University, The Netherlands, June 2010
- A very interesting report about plagiarism detection tools, which also gives a lot of technical information about plagiarism detection techniques and software: La détection de plagiat, Mémoire final, Jeremy Mourguet, Guillaume Kraemer, Frederic Gallion, Mathieu Picaud, 26 January 2009, Université Bordeaux 1
- An interesting review of many plagiarism detection tools, with many links: http://my.fit.edu/~jbarlow/HBB/PlagiarismStuff/PlagiarismDetectionTools.htm
- An interesting website created by a professor at the University of Virginia, providing general information about plagiarism, a list of tools, and also source code: http://plagiarism.phys.virginia.edu/
- Wikipedia
- Google
