I had this idea while working on The Selenium Project, where I built small bots to automate daily tasks. Collecting fake profiles on LinkedIn is otherwise unrelated to that project; I simply had an idea and wanted to see whether it really works. There were several problems along the way, and in this post I’ll explain everything from the idea to the code.
My Selenium Project
When I started learning and experimenting with Python, I wondered whether we could automate the way we browse the internet: code that signs in automatically, then follows, likes, or clicks a button. That thought led to the Selenium Project.
Scraping LinkedIn? Hold up!
I chose LinkedIn because:
- It is a professional platform (it tends to have less spam than others)
- It was gradually getting spammed anyway, and I wanted to give it a shot
When I first started automating LinkedIn tasks such as logging in, following, and viewing other profiles, I realized it wasn’t an easy task. Here are some problems with automating LinkedIn:
- LinkedIn easily detects bot activity
- We cannot view everyone’s profile easily
- Search and scraping are strictly limited on this platform
However, I wasn’t scraping all the data from LinkedIn, and after some tweaks to how I used Selenium, it worked in the end.
Did I really collect Fake Profiles?
Well, here’s the thing.
To really identify a fake profile, we would need to scrape all of a profile’s activity, from posts to likes, and examine its profile photo. Even then, we might need some help from machine learning.
I had just started with Selenium and didn’t want to stop my project before it even started. Then I had an idea, a simple one, but it does the job.
Fake Profiles, of Who?
We can’t just go after every random profile we find and investigate its authenticity. It might be possible, but it wasn’t for me at that time. However, there were some easy targets: celebrities and CEOs with both unique names and designations. Yes!
Here’s what I did:
- Find/Pick a celebrity name
- Preferably a CEO (that is what I chose)
- [Automation starts]
- Search their name
- Collect all profile links
- Find profiles that contain their exact name
I managed to get 50-75 profiles per search.
* Not all of them are fake, but the majority, around 90%, were fake or spam.
I chose big CEOs because there were many fake profiles in their names, which made them an easy target compared to any other profile category on LinkedIn.
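The filtering step from the list above can be sketched offline, without a browser. The helper names `name_to_slug` and `exact_name_links` and the sample URLs are made up for illustration; the only assumption carried over from the approach is that a matching profile link contains the lowercased, hyphenated name.

```python
def name_to_slug(name):
    """Turn "Jeff Bezos" into "jeff-bezos", the form the name takes inside a profile URL."""
    return "-".join(name.lower().split())


def exact_name_links(links, name):
    """Keep only the links that contain the slug form of the name."""
    slug = name_to_slug(name)
    # Anchors without an href come back as None, so skip falsy entries
    return [link for link in links if link and slug in link]


# Hypothetical sample data, not real search results
sample_links = [
    "https://www.linkedin.com/in/jeff-bezos-12345/",
    "https://www.linkedin.com/feed/",
    None,
    "https://www.linkedin.com/in/someone-else/",
]
print(exact_name_links(sample_links, "Jeff Bezos"))
```

A substring match like this is deliberately loose: it keeps any link containing the name, so the results still need a manual look.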
So, let’s get started with the code now!
- Selenium [pip install selenium]
- Python 3+
- ChromeDriver (the Chrome WebDriver, downloaded separately)
Remember, your ChromeDriver and Chrome browser versions must match.
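As far as I know, it is mainly the major version (the first number) that has to agree between the browser and the driver. A small sketch of that comparison is below; the function name and the version strings are illustrative and not taken from any tool's output.

```python
def same_major_version(browser_version, driver_version):
    """Compare only the first dotted component of two version strings."""
    return browser_version.split(".")[0] == driver_version.split(".")[0]


# Illustrative version strings
print(same_major_version("114.0.5735.199", "114.0.5735.90"))  # True
print(same_major_version("115.0.5790.102", "114.0.5735.90"))  # False
```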
Code to Collect Fake Profiles on LinkedIn
As I said before, I used Python, Selenium, and ChromeDriver.
If you’re trying this, make sure you have everything set up.
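from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service("C:/WebDrivers/chromedriver.exe"))
driver.get('https://www.linkedin.com/uas/login')

# Fill in the login form (replace both placeholders with your own credentials)
text_area = driver.find_element(By.ID, 'username')
text_area.send_keys("your_email_here")
text_area = driver.find_element(By.ID, 'password')
text_area.send_keys("your_password_here")

# find_element (singular) returns a single element that can be clicked
submit_button = driver.find_element(By.XPATH, '//*[@id="app__container"]/main/div/form/div/button')
submit_button.click()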
If the XPath of the submit button doesn’t work, you can easily find a new one:
- We use the element id and XPath to enter data and click buttons
- Right-click on the required element (button or field)
- Select the Inspect option
- In the HTML code, look for the id attribute of the element
In the HTML, the id appears as an attribute, for example id="button-login". The ‘#’ prefix is only used when the id is written as a CSS selector, like #button-login.
Sometimes the id method doesn’t work. In that case, repeat the same steps, i.e. right-click the element in the inspector and copy its XPath instead.
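The "try the id first, then fall back to the XPath" routine described above could be wrapped in a small helper. This is only a sketch: `find_with_fallback`, `fake_find`, and `FakeNotFound` are hypothetical names, and the fake finder stands in for a real browser so the example runs on its own.

```python
def find_with_fallback(find, locators, not_found=Exception):
    """Try each (strategy, value) pair in order and return the first element found.

    `find` is any callable shaped like driver.find_element(by, value).
    """
    for by, value in locators:
        try:
            return find(by, value)
        except not_found:
            continue
    raise LookupError("no locator matched: %r" % (locators,))


class FakeNotFound(Exception):
    """Stand-in for the exception a real driver raises for a missing element."""


def fake_find(by, value):
    # Stand-in for driver.find_element: only the XPath locator "exists" here
    page = {("xpath", "//form//button"): "<button element>"}
    if (by, value) not in page:
        raise FakeNotFound(value)
    return page[(by, value)]


element = find_with_fallback(
    fake_find,
    [("id", "button-login"), ("xpath", "//form//button")],
    not_found=FakeNotFound,
)
print(element)
```

With Selenium, you would pass `driver.find_element` as `find` and `selenium.common.exceptions.NoSuchElementException` as `not_found`; the strings "id" and "xpath" are the values behind `By.ID` and `By.XPATH`.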
1. Profile Link Collector
# Names to search for
s = ["Amitab Bachan", "Jeff Bezos", "Mark Zuckerberg"]
l = len(s)
Search and crawl
Searches the profile name and prints out exact matches
from selenium.webdriver.common.by import By

def search(m):
    driver.get("https://www.linkedin.com/feed/")
    search = driver.find_element(By.XPATH, "//*[@id=\"ember41\"]/input")
    search.send_keys(m, "\n")
    time.sleep(5)
    print("Searching", m)
    # Collect every link on the results page
    res = []
    for a in driver.find_elements(By.XPATH, './/a'):
        href = a.get_attribute("href")
        if href:  # skip anchors that have no href
            res.append(href)
    # Profile URLs contain the lowercased, hyphenated name
    n = m.lower().replace(" ", "-")
    print(n)
    # Keep only the links that contain the exact name
    for link in res:
        if n in link:
            print(link)
Iterator & Result
# Run the search once for every name in the list
for name in s:
    search(name)
End of code for Fake Profiles on LinkedIn
2. Brute Force Collector
This method also collects fake profiles on LinkedIn, but it focuses on a single name and returns more results, around 50-70 links per search, because it goes through the search results pages and gathers more links for one query.
m = "Jeff Bezos"
final_list = []
Super Crawler and Intelligent Link collector
The only thing that makes this script intelligent is that it collects all links and then filters them to match the exact keywords or names we’re looking for. It also goes through additional search results pages, and the number of pages it visits can be easily adjusted, which brings more results per search query.
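The adjustable page count can be illustrated by generating the page URLs up front. This is a sketch under assumptions: `page_urls` is a hypothetical helper, the search URL is made up for illustration, and the `page` query parameter is the one the script appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(first_page_url, pages):
    """Build the URLs for result pages 2..pages from the first results URL."""
    parts = urlsplit(first_page_url)
    # Drop any existing page parameter so it is not duplicated
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    urls = []
    for page in range(2, pages + 1):
        new_query = urlencode(query + [("page", str(page))])
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


# Hypothetical search URL used only for illustration
for url in page_urls("https://www.linkedin.com/search/results/all/?keywords=Jeff%20Bezos", 4):
    print(url)
```

Building the query with `urllib.parse` instead of string concatenation also works when the starting URL has no query string yet.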
from selenium.webdriver.common.by import By

def search(m):
    num = 1
    driver.get("https://www.linkedin.com/feed/")
    search = driver.find_element(By.XPATH, "//*[@id=\"ember41\"]/input")
    search.send_keys(m, "\n")
    time.sleep(5)
    print("Searching", m)
    # Profile URLs contain the lowercased, hyphenated name
    n = m.lower().replace(" ", "-")
    print(n)
    current = driver.current_url
    pages = 6  # first results page plus five more; raise this for more results
    for page in range(1, pages + 1):
        if page > 1:
            # Assumes the search URL already carries a query string.
            # Starts at page 2 (the original loop skipped it and began at 3).
            driver.get(current + "&page=" + str(page))
            time.sleep(2)
        # Collect every link on the page and keep the exact-name matches
        for a in driver.find_elements(By.XPATH, './/a'):
            href = a.get_attribute("href")
            if href and n in href:
                print("Link ", num, " : ", href)
                num = num + 1
                final_list.append(href)

search(m)
The Result – Collects 70+ Profiles with exact match
# Print every collected link with its index
for x, link in enumerate(final_list):
    print(x, " : ", link)
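Because every anchor on a page is collected, it is likely (though not confirmed here) that the same profile link ends up in the list more than once. A minimal sketch for removing repeats while keeping the original order is below; `unique_links` is a hypothetical helper, and the sample list stands in for `final_list`.

```python
def unique_links(links):
    """Drop duplicates while keeping the order in which links were first seen."""
    return list(dict.fromkeys(links))


# Hypothetical sample data standing in for final_list
sample = [
    "https://www.linkedin.com/in/jeff-bezos-1/",
    "https://www.linkedin.com/in/jeff-bezos-2/",
    "https://www.linkedin.com/in/jeff-bezos-1/",
]
for x, link in enumerate(unique_links(sample)):
    print(x, " : ", link)
```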
Sample Result Data
Searching Jeff Bezos
Yes, even though we are using a unique name, I believe there are some real people with the same name. I checked the results, and in an average search about 90% are fake.
Note: This is a fun project. Data scraping and excessive use of this script might get your LinkedIn account banned, so use it for educational purposes only. Cheers, have fun.
This post is published as part of my project called “The Selenium Project”, where I automate the boring stuff, mostly using Python and Selenium. If you find it interesting, check it out and drop a star on the GitHub repository.