A GUI app to find social media information about any person in the world and compile it into a neat PDF report.

As you are reading this text, I can safely assume you want to find social media information about some person.

I have several questions for you. What if you could...

- spend only 20-40 seconds retrieving your information from Twitter, Instagram, LinkedIn and Google Search;
- get exactly the information you need, thanks to pretty advanced sorting algorithms;
- get the information in a neat PDF file with the corresponding links and pictures?

Sounds like a miracle, but that's exactly what we are here for: to gather, analyze and visualize the information you need.

We all like to see things in action, so check out the file here, if you want.
Clone this repository to your PC using git:

```
git clone https://github.com/pandrey2003/dossier-builder.git
```
All this beauty takes several steps...
1. Create the `.env` file in the `app/backend/scraping` directory with the following credentials:

   ```
   GOOGLE_DEVELOPER_KEY=
   GOOGLE_CSE_ID=
   IPSTACK_API_KEY=
   LINKEDIN_LOGIN=
   LINKEDIN_PASSWORD=
   INSTAGRAM_LOGIN=
   INSTAGRAM_PASSWORD=
   TWITTER_API_KEY=
   TWITTER_API_SECRET=
   TWITTER_ACCESS_TOKEN=
   TWITTER_ACCESS_SECRET=
   ```

   An explanation of these credentials can be found at the bottom of this README. This setup happens only once.
Note: caching has been enabled for Instagram interaction, so you only need to renew your cache settings once every 2 months. The LinkedIn throttling limit is 900 API calls per hour.
2. Install all the dependencies from `Pipfile` and `Pipfile.lock` using pipenv:

   ```
   pipenv install
   ```
3. Activate the `pipenv` environment:

   ```
   pipenv shell
   ```
4. Run the `run.py` file (opens the GUI).
5. Enter as many fields about the desired person as possible (see the explanations below if needed).
6. Choose the PDF output directory by clicking the corresponding button.
7. Click the submit button and observe the progress bar (scraping, filtering and visualizing the data about the desired person normally takes 20-40 seconds).
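Before launching the GUI, you may want to sanity-check the `.env` file you created in the first step. The sketch below is a hypothetical stdlib-only helper, not code from the project itself; it parses simple `KEY=VALUE` lines and reports any credential left empty:

```python
def load_env(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines and lines starting with '#' are skipped; a value keeps
    everything after the first '='. Hypothetical helper for illustration,
    not part of the project.
    """
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


# The eleven keys the README asks you to define.
REQUIRED = [
    "GOOGLE_DEVELOPER_KEY", "GOOGLE_CSE_ID", "IPSTACK_API_KEY",
    "LINKEDIN_LOGIN", "LINKEDIN_PASSWORD",
    "INSTAGRAM_LOGIN", "INSTAGRAM_PASSWORD",
    "TWITTER_API_KEY", "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET",
]


def missing_credentials(path):
    """Return the names of required credentials that are absent or empty."""
    env = load_env(path)
    return [key for key in REQUIRED if not env.get(key)]
```

Calling `missing_credentials("app/backend/scraping/.env")` returns an empty list when everything is filled in.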
Take a look at an example of finding information about Sergey Bubka (YouTube link):
The initial GUI window looks like this:
However, if you get confused about what to write in each field, see the explanation at the bottom of this README.
- `GOOGLE_DEVELOPER_KEY` is your API key from the Google Developers platform.
- `GOOGLE_CSE_ID` is your Google Custom Search Engine ID (you have to set it up to search for information all around the web).
- `IPSTACK_API_KEY` is your API key from ipstack. If you do not have one, getting it is a 2-minute procedure.
- `LINKEDIN_LOGIN` and `LINKEDIN_PASSWORD` are the login and password for your LinkedIn profile (no API-related credentials needed).
- `INSTAGRAM_LOGIN` and `INSTAGRAM_PASSWORD` are the login and password for your Instagram profile (no API-related credentials needed).
- For the Twitter credentials, you have to create an app in the Twitter Developer Portal. After this, you get `TWITTER_API_KEY` and `TWITTER_API_SECRET` from your app page. Your access token and access token secret can be obtained using the `tweepy` library. If you do not know how to get them, watch this tutorial up to the 12:45 mark. The access token and access token secret are permanent, so this setup happens only once.
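To check that `GOOGLE_DEVELOPER_KEY` and `GOOGLE_CSE_ID` work together, you can query the Custom Search JSON API directly. The sketch below only builds the request URL (no network call); the `build_search_url` helper is illustrative, while the endpoint and the `key`/`cx`/`q` parameter names are those of the Custom Search JSON API:

```python
from urllib.parse import urlencode

# Official Custom Search JSON API endpoint.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(developer_key, cse_id, query):
    """Build a Custom Search JSON API request URL.

    `key` carries GOOGLE_DEVELOPER_KEY and `cx` carries GOOGLE_CSE_ID,
    matching the API's query-parameter names.
    """
    params = {"key": developer_key, "cx": cse_id, "q": query}
    return CSE_ENDPOINT + "?" + urlencode(params)
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) returns a JSON body whose `items` array holds the search results.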
- Field 1 is an ordinary input field; look at the label on the left to see which information to enter. Filling in the "First name", "Last name" and "Location" fields is highly recommended.
- Field 2 is the additional information selector (used for searching on Google Search), and field 3 is the additional information input. To put it simply, suppose you want to find the profile pandrey2003 on GitHub. In that case, you write the selector, "GitHub", into field 2 and the profile name, "pandrey2003", into field 3. Note: fields 2 and 3 are entirely optional.
- Button 4 is used to choose the PDF output directory on your PC. This is mandatory: visualization is an essential logical part of the app.
- Button 5 sends all your input data and the output directory to the logical part of the project. Press it when you are sure you have entered all the necessary information.
- Progress bar 6 reflects the progress of the logical part of the project (no interaction needed; it just shows the progress). 2% means scraping has started, 60% means scraping is done and your data is being analyzed, 75% indicates analysis is done and the data is being visualized, and at 100% you can find the PDF file in the requested directory.
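The progress-bar milestones above can be summarized in a small helper. The function name and the exact boundaries between stages are an assumption for illustration, inferred only from the percentages the README lists:

```python
def progress_stage(percent):
    """Map a progress-bar percentage to the pipeline stage it indicates.

    Milestones per the README: 2% scraping started, 60% analyzing,
    75% visualizing, 100% PDF ready. The in-between cut-offs are
    assumed, not taken from the project code.
    """
    if percent >= 100:
        return "done"
    if percent >= 75:
        return "visualizing"
    if percent >= 60:
        return "analyzing"
    if percent >= 2:
        return "scraping"
    return "not started"
```

For example, `progress_stage(60)` reports that scraping has finished and analysis is underway.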
This project has been successfully defended as a scientific work and took 2nd place overall in Ukraine. Link to the work. All these wins would have been impossible without the people who helped me.