In this tutorial, I am going to show you how to scrape a B2B email list of target/prospective clients.
Scrapebox is an excellent tool with a lot of powerful features that can be combined to take email scraping to another level.
There are a number of ways you can go about proxies. One option is to install and configure HMA! Pro VPN and run it with a timed IP change. You will need to download the "previous version" from https://www.hidemyass.com/en-gb/installation-files because it has a timed IP change, which is something the new version does not have. For scraping, we need to change our IP address as frequently as possible because the search engines will start to ban an IP address after a few hundred or a couple of thousand scrapes.
A better option is to use your own private proxies because they are more stable. I like to use backconnect rotating proxies from Storm Proxies, but those can be very expensive. You will most likely be running Scrapebox at a hundred or more threads, and this will require a very large custom proxy package with many threads. That can be fairly expensive considering that you need the proxies just for scraping.
Step 2: Compile your List of Root Keywords for Your Prospects
The next step is to put together a list of root keywords for the prospects you are targeting. Root keywords will most often comprise category keywords, brands, popular search terms and so on. The idea is then to use these keywords with a keyword research tool to scrape more keywords for each root keyword.
So, for example, let's say that we are targeting all the CBD websites. I have gone onto Google, searched for a few CBD sites and copied their categories, brands and even popular search terms. For the sake of this example, I will keep the list short, as I do not want it to occupy the best part of this article. As a general rule of thumb, you should have at least 200-500 root keywords. Make sure you have enough of them to serve as a basis for scraping more niche-related keywords.
CBD Oil buy
CBD for sale
CBD back pain
CBD joint muscle
CBD green roads
CBD Full spectrum
CBD broad spectrum
CBD for disabilities
CBD with thc
CBD near me
CBD for honey
CBD for skin
CBD Face cream
CBD skin care
CBD Skin patch
CBD Dog Treats
CBD Pills Capsules and Tinctures
CBD for Pets
Marijuana Derived CBD
Hemp Derived CBD
4 Corners Cannabis
Alaska Cannabis Exchange
Alternate Vape CBD
AON Mother Nature
Bio Hemp CBD
Blue Moon Hemp
Blue Sky Biologicals
Browns Botanicals CBD
Buddha CBD Teas
Casa Luna Chocolate
CBD for Life
CBD for Life 1
CBD Fusion Water
CBD Global Extracts
CBD Infusions 1
CBD Living Water
CBD RxFunctional Remedies
Cloud 9 Hemp
Crystal Pure CBD
Dose of Nature
Elixinol CBD Australia
Golden Leaf CBD
Green Garden Gold
Hemp Health Technologies
Holland Hemp Company
Holy Grail CBD
Jeffs Best Hemp
List of CBD Oil Companies Weveed
Mana Artisan Botanics
Michigan Hemp Company
Natures Way Botanicals
Noontide Herbal Elixirs
Plus CBD Oil
Prime My Body
Progressive Pet Products Inc
PURE CBD VAPORS
Pure Science Lab
Real Scientific Hemp Oil
Tasty Hemp Oil
The Fay Farm
The Medics Inc
The Wee Hemp Company
Therapy Pure Essentials
Top Verified CBD Brands
Try the CBD
Zakah Life Essentials
Awesome! Now, I am going to use the Domination Robot (https://dominationrobot.com) keyword research tool to expand this list of root keywords and prepare an extensive list of niche-related keywords to use with Scrapebox. As you can see below, just enter these root or "seed" keywords into the "Seed keywords" pane and hit start. The keyword scraper will start to collect related keywords for each one of the seed keywords.
Select "custom footprint". We are not going to be scraping for particular website platforms, so it is fine to use no footprint at all.
I recommend that you use the default value of 50 threads for each function, as Scrapebox does have a tendency to crash when run with a large thread count.
I recommend that you remove all the duplicate URLs and domains because you do not want to scrape emails from the same pages twice, which would result in a lot of duplicate addresses.
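Scrapebox handles this deduplication for you, but the logic is roughly equivalent to the following Python sketch (the URLs are made up for illustration):

```python
from urllib.parse import urlparse

def dedupe_by_domain(urls):
    """Keep only the first URL seen for each unique domain."""
    seen = set()
    unique = []
    for url in urls:
        # Normalise the domain so www.example.com and example.com match.
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(url)
    return unique

urls = [
    "https://www.example-cbd-shop.com/products",
    "https://example-cbd-shop.com/about",   # same domain, different page
    "https://another-cbd-site.com/",
]
print(dedupe_by_domain(urls))
```

Deduplicating by domain rather than by exact URL is what keeps one email per prospect instead of one per scraped page.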
Select however many search engines you need and start harvesting. If you are using proxies, make sure they are enabled. If you are running HMA! Pro VPN in the background, make sure that the IP changes every minute, because for scraping you need as many fresh proxies/IPs as possible!
Now you need to export all the scraped website URLs to a text file. Create a folder named after the niche, and name your file using the niche name plus an indication that these URLs are targets/prospective clients.
This is how I named my text file:
"CBD - Scrapebox target client urls"
Now you need to open the Scrapebox Email Scraper Premium Plugin.
Now we are going to load the website URLs that we have just scraped and extract the email addresses from them. Select the file with all the scraped website URLs and load it.
This is one of the most important steps. In order to get emails that are clean and related to our niche, we are going to configure all the filters.
Filter 1: "Search only in pages with" - here, we are going to add all the possible keywords related to our niche. DO NOT use commas; just use a space to separate each keyword. The software will scan the HTML code, content, meta titles and meta descriptions of each website for any of these keywords. If a website from our list contains any one of these keywords, the software will extract an email from it. However, if a website does not contain any of them, Scrapebox will skip it. The idea behind this filter is to scrape topically relevant websites. So, in our case, a website that contains the word "CBD" is most likely to be relevant to our business niche.
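Scrapebox applies this filter internally; conceptually, it works like the sketch below (the keyword list and HTML snippet are illustrative, not the plugin's actual code):

```python
def page_matches(html, keywords):
    """Return True if any niche keyword appears anywhere in the page source.

    Scanning the raw HTML covers the content, meta title and meta
    description in one pass, since they are all part of the source.
    """
    text = html.lower()
    return any(kw.lower() in text for kw in keywords)

# In Scrapebox these would be entered space-separated, with no commas.
niche_keywords = ["cbd", "hemp", "cannabidiol"]

html = "<title>Full Spectrum CBD Oil</title><p>Buy hemp extracts online.</p>"
print(page_matches(html, niche_keywords))   # relevant page, emails get extracted
```

A page with none of the keywords anywhere in its source would return False and be skipped.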
You could also copy and paste the same keywords into "Search only in pages with" -> "any of the words in the url". Here, instead of checking the actual pages of the website for the keywords, the software will scan the URLs for them. I WOULD NOT use this filter because it will narrow the results down too much. And do not forget that, in a lot of niches, a website URL will not necessarily contain the keywords.
Filter 2: "Ignore Emails when the"
"email username contains" - here, we will enter the keywords that we do not want to appear in the email username.
"email domain contains" - likewise, the keywords that we do not want to appear in the email domain.
"site is" - here, we will add sites from which we do not want emails. These would usually include the Majestic Million websites.
The idea behind this filter is to get rid of poor quality and spammy emails.
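Put together, Filter 2 behaves roughly like this Python sketch (the block lists are hypothetical examples, not a recommended set):

```python
# Illustrative block lists - in practice these would be much longer.
BAD_USERNAMES = ["noreply", "no-reply", "abuse", "webmaster"]
BAD_DOMAINS = ["example.com", "sentry.io"]   # e.g. Majestic Million sites

def keep_email(email):
    """Drop emails whose username or domain hits either block list."""
    username, _, domain = email.lower().partition("@")
    if any(word in username for word in BAD_USERNAMES):
        return False
    if any(site in domain for site in BAD_DOMAINS):
        return False
    return True

emails = ["noreply@shop.com", "info@greenleafcbd.com"]
print([e for e in emails if keep_email(e)])   # only the real prospect survives
```

Note that both checks are substring matches, which is why a short keyword like "spam" would also catch "antispam" - choose block-list entries accordingly.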
Filter 3: "Grab Emails only when"
"email username contains"
"email domain contains"
Here, we would enter the keywords that we want to appear inside the email. Again, I WOULD NOT use this filter, simply because not every email will necessarily contain our keywords. Some niches are more keyword driven than others. For example, compare the fashion industry against the CBD niche. The CBD niche has more websites featuring the actual keyword "CBD", whereas fashion websites are more brand driven, so we are unlikely to find many emails with our keywords inside them.
"Settings" - "worker threads" - this is just the thread count for this plugin. I like to run it at 100, as I have a fairly powerful dedicated server.
I do not check "only scrape emails matching the domain" because a lot of emails will come from business directories and social media sites, and the scraped emails will rarely match those domains. Therefore, once again, I like to be over-inclusive.
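For reference, that checkbox is essentially the comparison below, which shows why directory listings fail it (the addresses and URLs here are made up):

```python
from urllib.parse import urlparse

def email_matches_site(email, page_url):
    """True only when the email's domain equals the domain of the page it was found on."""
    email_domain = email.lower().rsplit("@", 1)[-1]
    site_domain = urlparse(page_url).netloc.lower().removeprefix("www.")
    return email_domain == site_domain

# A perfectly good prospect email found on a directory page fails the check,
# because the page domain is the directory's, not the prospect's:
print(email_matches_site("owner@greenleafcbd.com",
                         "https://www.somedirectory.com/listing/123"))
```

Leaving the option unchecked keeps those directory-sourced emails in the list, at the cost of a little more cleanup later.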
Use proxies if you have them. If not, just run your HMA! Pro VPN in the background.
Now run the plugin.
Now send all the scraped emails to the email tester.
In the "Senders email address" (sic) field, just enter your email address.
Here, you will remove all the untested and invalid emails, as well as those with errors.
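The email tester does the real (SMTP-level) validation, but a cheap first syntax pass over the scraped list can be sketched like this (the regex is a deliberately simple approximation, not a full RFC 5322 validator):

```python
import re

# Rough pattern: username @ domain . TLD of at least two letters.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def looks_valid(email):
    """Cheap syntax check before handing the list to a real email tester."""
    return bool(EMAIL_RE.match(email))

scraped = ["info@greenleafcbd.com", "broken@@site", "sales@hemp.co.uk"]
print([e for e in scraped if looks_valid(e)])
```

Obvious scraping debris like `broken@@site` is dropped here, which keeps the slower deliverability test focused on plausible addresses.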
Here is a basic video showing how to carry out the entire process I have described above.
Click here to download the video.