Skip to content

Added a new project (Plagiarism_checker) #210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
156 changes: 156 additions & 0 deletions P/019_Plagiarism_Checker.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<small><small><i>\n",
"All the IPython Notebooks in **Python Mini-Projects** series by Dr. Milaan Parmar are available @ **[GitHub](https://github.com/milaan9/91_Python_Mini_Projects)**\n",
"</i></small></small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Program to Check Plagiarism "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2021-10-05T13:06:11.749110Z",
"start_time": "2021-10-05T13:06:09.467318Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Similarity data:\n",
" ('Ben.txt', 'Clark.txt', 0.408904884400347)\n",
"Similarity data:\n",
" ('Arthur.txt', 'Clark.txt', 0.5430431121089816)\n",
"Similarity data:\n",
" ('Arthur.txt', 'Ben.txt', 0.4595329317649595)\n"
]
}
],
"source": [
"'''\n",
"Python Program to Check Plagiarism \n",
"'''\n",
"\n",
"# Import necessary modules!\n",
"import os\n",
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.metrics.pairwise import cosine_similarity\n",
"\n",
"student_files = [doc for doc in os.listdir() if doc.endswith('.txt')]\n",
"student_notes = [open(_file, encoding='utf-8').read()\n",
" for _file in student_files]\n",
"\n",
"\n",
"def vectorize(Text): return TfidfVectorizer().fit_transform(Text).toarray()\n",
"def similarity(doc1, doc2): return cosine_similarity([doc1, doc2])\n",
"\n",
"\n",
"vectors = vectorize(student_notes)\n",
"s_vectors = list(zip(student_files, vectors))\n",
"plagiarism_results = set()\n",
"\n",
"\n",
"def check_plagiarism():\n",
" global s_vectors\n",
" for student_a, text_vector_a in s_vectors:\n",
" new_vectors = s_vectors.copy()\n",
" current_index = new_vectors.index((student_a, text_vector_a))\n",
" del new_vectors[current_index]\n",
" for student_b, text_vector_b in new_vectors:\n",
" sim_score = similarity(text_vector_a, text_vector_b)[0][1]\n",
" student_pair = sorted((student_a, student_b))\n",
" score = (student_pair[0], student_pair[1], sim_score)\n",
" plagiarism_results.add(score)\n",
" return plagiarism_results\n",
"\n",
"\n",
"for data in check_plagiarism():\n",
" print(\"Similarity data:\\n\", data) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}
1 change: 1 addition & 0 deletions P/Arthur.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Success can mean a variety of different things. Success is, quite simply, the accomplishment of a predetermined goal. To some people it could mean making money, cultivate and develop certain basic qualities, to others it could mean keeping everyone happy, but to me, it means achieving the goals and objective I have set for myself for my life. Besides working on your goals that would lead a person towards success it is very important to push your limit every day, take charge of your life, and keep learning. This experience enables us to think smartly to solve a critical problem and achieve success. It is very important to take care of your mind which could be done by eliminating negative thoughts and negative people from your life. I think in order to call something successful, both the result and the process should be great. Without success, you, the group, your company, your goals, dreams and even entire civilizations cease to survive.
1 change: 1 addition & 0 deletions P/Ben.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
In order to be successful one needs cultivate and develop certain basic qualities. Success is, quite simply, the accomplishment of a predetermined goal. We consciously or subconsciously set goals for ourselves all the time. First of all, you must know aim and objective of your life. Unless you know your destination, you cannot set out on a journey. At first, you must by very clear in your objectives to be achieved. Finally, you should enjoy the overall process rather than the final outcome. There is no shortcut to success. Hard work is the only key to achieving it; it teaches us discipline, dedication and determination. There is no single right way to be successful. What works for you might not work for someone else.
1 change: 1 addition & 0 deletions P/Clark.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Success is, quite simply, the accomplishment of a predetermined goal. Success is considered to be a term that describes two things. It also the combination of variety of different things. The first one is achievement of a certain major or minor goal. This could be succeeding in making a delicious dinner, or a more global thing succeeding in a career or job. The second definition of success is more broad and subjective. Success provides confidence, security, a sense of well-being, the ability to contribute at a greater level, hope and leadership. This experience enables us to think smartly to solve a critical problem and achieve success. Without success, you, the group, your company, your goals, dreams and even entire civilizations cease to survive.
96 changes: 96 additions & 0 deletions P/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
<p align="center">
<a href="https://github.com/milaan9"><img src="https://img.shields.io/static/v1?logo=github&label=maintainer&message=milaan9&color=ff3300" alt="Last Commit"/></a>
<a href="https://hits.seeyoufarm.com"><img src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fmilaan9%2Ftree/main/019_Plagiarism_Checker&count_bg=%231DC92C&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=views&edge_flat=false"/></a>
</p>
<!--<img src="https://badges.pufler.dev/contributors/milaan9/01_Python_Introduction?size=50&padding=5&bots=true" alt="milaan9"/>-->


# Plagiarism Checker

In this class, you'll learn how to check similarity between text (.txt) documents using cosine similarity

In order to compute the simlilarity between on two text documents, the textual raw data is transformed into vectors ➡ **arrays of numbers** and then from that we are going to use a basic knowledge vector to compute the the similarity between them.


<p align="center">
<img src="output.png" width="500"/>
</p>

## Prerequisites:

1. <b> Python Basics </b>
2. <b> sklearn module </b>

---

## Install Necessary Modules:

Open your [![Anaconda](https://img.shields.io/badge/Anaconda-342B029.svg?&style=flate&logo=anaconda&logoColor=white)](https://www.anaconda.com/products/individual) Prompt <img alt="propmt" src="https://img.shields.io/badge/-__-000000?style=flat-square&logo=Plex&logoColor=white"> and type and run the following command (individually):

- pip install sklearn

**[`Sklearn`](https://scikit-learn.org/)** (Scikit-learn) is the most useful and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction via a consistence interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib.

Once Installed now we can import it inside our python code.

---

## Frequently asked questions ❔

### How can I thank you for writing and sharing this tutorial? 🌷

You can <img src="https://img.shields.io/static/v1?label=%E2%AD%90 Star &message=if%20useful&style=style=flat&color=blue" alt="Star Badge"/> and <img src="https://img.shields.io/static/v1?label=%E2%B5%96 Fork &message=if%20useful&style=style=flat&color=blue" alt="Fork Badge"/> Starring and Forking is free for you, but it tells me and other people that it was helpful and you like this tutorial.

Go [**`here`**](https://github.com/milaan9/91_Python_Mini_Projects) if you aren't here already and click ➞ **`✰ Star`** and **`ⵖ Fork`** button in the top right corner. You will be asked to create a GitHub account if you don't already have one.

---

### How can I read this tutorial without an Internet connection? <img alt="GIF" src="https://github.com/TheDudeThatCode/TheDudeThatCode/blob/master/Assets/hmm.gif" width="20vw" />

1. Go [**`here`**](https://github.com/milaan9/91_Python_Mini_Projects) and click the big green ➞ **`Code`** button in the top right of the page, then click ➞ [**`Download ZIP`**](https://github.com/milaan9/91_Python_Mini_Projects/archive/refs/heads/main.zip).

![Download ZIP](https://github.com/milaan9/91_Python_Mini_Projects/blob/main/img/dnld_rep.png)

2. Extract the ZIP and open it. Unfortunately I don't have any more specific instructions because how exactly this is done depends on which operating system you run.

3. Launch ipython notebook from the folder which contains the notebooks. Open each one of them

`Kernel > Restart & Clear Output`

This will clear all the outputs and now you can understand each statement and learn interactively.

If you have git and you know how to use it, you can also clone the repository instead of downloading a zip and extracting it. An advantage with doing it this way is that you don't need to download the whole tutorial again to get the latest version of it, all you need to do is to pull with git and run ipython notebook again.

---

## Authors ✍️

I'm Dr. Milaan Parmar and I have written this tutorial. If you think you can add/correct/edit and enhance this tutorial you are most welcome🙏

See [github's contributors page](https://github.com/milaan9/91_Python_Mini_Projects/graphs/contributors) for details.

If you have trouble with this tutorial please tell me about it by [Create an issue on GitHub](https://github.com/milaan9/91_Python_Mini_Projects/issues/new). and I'll make this tutorial better. This is probably the best choice if you had trouble following the tutorial, and something in it should be explained better. You will be asked to create a GitHub account if you don't already have one.

If you like this tutorial, please [give it a ⭐ star](https://github.com/milaan9/91_Python_Mini_Projects).

---

## Licence 📜

You may use this tutorial freely at your own risk. See [LICENSE](https://github.com/milaan9/91_Python_Mini_Projects/blob/main/LICENSE).

Copyright (c) 2020 Dr. Milaan Parmar

---

<div align="center">
<h3> Connect with me<a href="https://gifyu.com/image/Zy2f"><img src="https://github.com/milaan9/milaan9/blob/main/Handshake.gif" width="50px"></a>
</h3>
<p align="center">
<a href="https://www.linkedin.com/in/milaanparmar" target="_blank"><img alt="LinkedIn" width="25px" src="https://github.com/TheDudeThatCode/TheDudeThatCode/blob/master/Assets/Linkedin.svg"></a>
<a href="https://www.instagram.com/milaanparmar9" target="_blank"><img alt="Instagram" width="25px" src="https://github.com/TheDudeThatCode/TheDudeThatCode/blob/master/Assets/Instagram.svg"></a>
<a href="https://www.facebook.com/milaanparmar" target="_blank"><img alt="Facebook" width="25px" src="https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg"></a>
<a href="mailto:milaanparmar9@gmail.com" target="_blank"><img alt="Gmail" width="25px" src="https://github.com/TheDudeThatCode/TheDudeThatCode/blob/master/Assets/Gmail.svg"></a>
</p>


Binary file added P/output.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.