In this tutorial, I will walk you through, step by step, how to use the Scrapebox Email Extractor Premium Plugin to scrape emails. Before we start, I want to point out what the plugin does: it searches engines such as Google and Bing for your keywords, harvests the relevant URLs and then extracts emails from those pages.
This is the main interface of Scrapebox.
Navigate to "Premium Plugins" and select "Email Scraper Premium".
Select the first tab: "Search by keywords". Here, you should select the search engines. I recommend that you select Google, Yahoo, Bing, Ask.com, Search.com, Ecosia, DuckDuckGo and any others. In the keyword section, you will need to add the relevant keywords. I will show you how to generate the keywords in my video. Usually, whenever I am scraping for local businesses, I take a root keyword such as "Jewellery Stores" and merge it with my footprint list of all major cities and countries in the world. This gives me one local keyword per city, so every city gets searched.
Now click the options button at the bottom. Let me give you a quick walkthrough of all the options.
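To make the merging step concrete, here is a minimal Python sketch of how a root keyword could be combined with a footprint list outside of Scrapebox. The function name `build_local_keywords` and the sample locations are my own assumptions for illustration; a real footprint list would be loaded from a file and be far longer.

```python
from itertools import product


def build_local_keywords(roots, locations):
    """Pair every root keyword with every location, skipping blanks and duplicates."""
    seen = set()
    keywords = []
    for root, place in product(roots, locations):
        root, place = root.strip(), place.strip()
        if not root or not place:
            continue
        keyword = f"{root} {place}"
        # Compare case-insensitively so "london" and "London" count once.
        if keyword.lower() not in seen:
            seen.add(keyword.lower())
            keywords.append(keyword)
    return keywords


# Hypothetical sample data, not the real footprint list.
roots = ["Jewellery Stores"]
locations = ["London", "Paris", "New York", "london"]

local_keywords = build_local_keywords(roots, locations)
print("\n".join(local_keywords))
```

The resulting lines can be pasted straight into the keyword section, one keyword per line.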
FILTER 1: "Search only in pages with" - "any of the words". This is a content filter. Here you can enter the list of keywords, at least one of which must be present in a site's body content, HTML, meta title or meta description. This is a very useful filter that helps make sure a website is topically related to your keywords. Inside the spreadsheet, I have a "content filters" field where I keep a list of niche-related keywords. Content filters (as I like to call them) are basically the most important keywords related to a niche. Let me give you an example. I am now scraping contact details for jewellery stores around the world. This is how my content filter looks:
Gold Platinum Silver Watches Jewellery Earrings Necklaces Rings Bracelets Jewelry Charms Wedding Gemstone Engagement Bridal
Note: you do not need to enter commas to separate words. Just enter one space between each word. Usually, to build your list of "content filters", run a basic Google search for your keyword, such as jewellery store, and you should immediately see some ideas on the SERP. Otherwise, to build your content filters, you can go to a few websites and pick out some category or brand keywords. At the end of the day, the content filter just checks a website page for relevancy, and it is not perfect. Whilst running the email extractor, I noticed that a page that contains the word "jewellery" may not necessarily be a jewellery shop. In fact, I came across an article about a jewellery store robbery.
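The idea behind an "any of the words" filter can be sketched in a few lines of Python. This is only my illustration of the concept, not how Scrapebox implements it internally; in particular, matching on whole words and the name `matches_any_word` are assumptions. The sample pages are made up.

```python
import re


def matches_any_word(page_text, filter_words):
    """Return True if at least one filter word appears as a whole word in the text."""
    words_in_page = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return any(word.lower() in words_in_page for word in filter_words)


# Space-separated, exactly as the filter is entered in the plugin.
content_filter = "Gold Platinum Silver Watches Jewellery Earrings Necklaces Rings".split()

# Hypothetical page snippets for illustration.
shop_page = "Browse our engagement rings and gold necklaces in store."
news_page = "Police are investigating a jewellery store robbery in the city centre."
unrelated_page = "Cheap flights and hotel deals for your next holiday."

print(matches_any_word(shop_page, content_filter))
print(matches_any_word(news_page, content_filter))
print(matches_any_word(unrelated_page, content_filter))
```

Note that the news page passes the check just like the shop page does, which is the robbery-article problem: a keyword match shows topical overlap, not that the site is a shop.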
FILTER 2: "Ignore Emails when The" - "email username contains" and "email domain contains".
This filter basically filters out all the junk emails. You can download my own personal filters that I use.
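As a rough illustration of what a "username contains" / "domain contains" ignore filter does, here is a short Python sketch. The filter entries below are hypothetical examples, not my downloadable filter list, and the function `is_junk_email` is an assumed name rather than anything from Scrapebox.

```python
def is_junk_email(email, bad_username_parts, bad_domain_parts):
    """Return True if the email should be ignored under the given filters."""
    username, _, domain = email.lower().rpartition("@")
    if not username or not domain:
        return True  # no usable "user@domain" structure at all
    if any(part in username for part in bad_username_parts):
        return True
    return any(part in domain for part in bad_domain_parts)


# Hypothetical filter entries for illustration only.
bad_username_parts = ["noreply", "abuse", "example"]
bad_domain_parts = ["sentry", "example.com", ".png"]

emails = [
    "sales@finejewels.co.uk",
    "noreply@finejewels.co.uk",
    "logo@2x.png",
    "john@example.com",
]

kept = [e for e in emails if not is_junk_email(e, bad_username_parts, bad_domain_parts)]
print(kept)
```

Entries such as ".png" catch image file names that look like emails to an extractor, which is a common source of junk in scraped lists.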
FILTER 3: "Grabemails [sic] only when" - 1) "email username contains" and 2) "email domain contains".
This is a very good filter if you would like to keep only the emails whose domain or username contains your keywords. I do not use this option as it can significantly limit the number of results. I prefer to cast the net wider. Continuing with the previous jewellery store example, most website domains will not contain jewellery-related keywords, as most jewellery stores are brands such as Harrods, Graff and Van Cleef & Arpels.
In the "Settings" box, I like to select 100 "worker" threads because I am running my instance of the Scrapebox email scraper on a powerful dedicated server. I DO NOT check the "only scrape emails matching the domain name" option simply because some sites like directories will have valid business emails but the domain name of the directory will not match the domain name of a jewellery store. Again, I choose not to use this filter as I like to keep my results broad. Equally, there are many artisan jewellers or smaller businesses that may not have a company email address. For the "Total Number of Results to Process", I choose the maximum: 1000000 because I run local searches for my keyword and every location in the world. I therefore need as many results as possible.
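To show why I leave the domain-matching option unchecked, here is a small Python sketch of what such a check might look like. It is my own approximation of the idea, not the plugin's actual logic; `email_matches_site` and all of the example addresses and URLs are made up.

```python
from urllib.parse import urlsplit


def email_matches_site(email, page_url):
    """Return True if the email's domain equals the page's host or a parent of it."""
    host = (urlsplit(page_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    domain = email.lower().rpartition("@")[2]
    return bool(host) and (domain == host or host.endswith("." + domain))


# Hypothetical (email, page URL) pairs for illustration.
checks = [
    ("sales@finejewels.example", "https://www.finejewels.example/contact"),
    ("sales@finejewels.example", "https://www.citydirectory.example/jewellers/london"),
    ("artisan.jeweller@gmail.com", "https://artisan-jeweller.example/"),
]

results = [email_matches_site(email, url) for email, url in checks]
print(results)
```

Only the first pair matches. The directory listing and the artisan jeweller using a free mailbox would both be thrown away by a strict domain match, even though they are valid business contacts.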
In the Proxies box, make sure to paste in your proxies, as you will need them when scraping the search engines. Search engines like Google or even Bing will ban your IP address after a few minutes, which will significantly impact your scraping endeavours. I like to use rotating proxies from Storm Proxies. They are relatively cheap and the proxy changes with every HTTP request. Do not forget that you will need to authenticate the local IP address of your laptop or VPS (whatever you are using) with Storm Proxies. Also, Storm Proxies has different plans with different thread numbers, so pick one that matches the number of threads you intend to run.
You should enter your keywords into the "keywords" pane. As I do a lot of local searches, I use my list of all major cities in the world as my footprint. I then combine my root keyword with every city and country variation to run a very detailed search for all local businesses. What I like about this approach is that it grabs local data from virtually all search engines, maps and directories.
For LOCAL searches, you will need to use this footprint list of all cities and countries. You can download it here.
If you are scraping for local business results, you will need to use the footprint with all countries and cities to search for that keyword in every city and country. See above (LOCAL SEARCHES).
However, if you are going to scrape emails for international businesses/websites, you will need to use a list of keywords instead. For example, if we take the cryptocurrency niche, there is no point in searching local cities and countries because cryptocurrency sites are not local businesses - they are global.
So what you should do is start off with your niche and then produce a starting list of as many category-related keywords, brands and similar terms as you can.
So, if we continue with our cryptocurrency example, some of our starting (root) keywords would be:
I am going to keep this list fairly short for the sake of this example and brevity, but in real life you would have a much longer list with the most important niche keywords. Do not forget, we are also going to be using the main list of root keywords as a content filter in the Scrapebox Email Scraper Premium Plugin.
Now we will need to expand upon these keywords to produce a much bigger list to use with our Scrapebox Email Scraper Premium Plugin.
I like to use the Domination Robot keyword planner and then run the scraped keywords through the Scrapebox keyword scraper. This produces a much longer list of keywords and, subsequently, more niche-related emails during the email scraping stage.
Go to https://dominationrobot.com, log in and go to "keywords". Inside the "Seed Keywords" box, enter the list of ROOT KEYWORDS we have created above and click start. Wait for the processing to finish and then copy all the "Money keywords".
If you do not have Domination Robot, you can use the Money Robot keyword planner tool but you will have to search for every keyword one at a time. You cannot just enter all of your keywords in bulk!
Next, open the "Keyword scraper".
Now select all the sources. Make sure to use proxies and paste all of your keywords into the left-hand side. Once the keyword scraping is done, copy the keywords and use them with your Scrapebox Email Scraper Premium Plugin!
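Keyword lists copied from several tools tend to contain blank lines and repeats. If you want to tidy the list before pasting it into the plugin, a short Python sketch like the one below would do it. This clean-up step is my own optional suggestion, the function name `clean_keywords` is an assumption, and the sample keywords are hypothetical rather than my actual root list.

```python
def clean_keywords(raw_text):
    """Collapse extra spaces, drop blank lines and remove case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        keyword = " ".join(line.split())  # trims and collapses inner whitespace
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            cleaned.append(keyword)
    return cleaned


# Hypothetical scraped keywords for illustration.
scraped = """bitcoin wallet
Bitcoin  Wallet

buy ethereum
bitcoin wallet
crypto exchange"""

keywords = clean_keywords(scraped)
print("\n".join(keywords))
```

The first spelling of each keyword is kept and the original order is preserved, so the most relevant keywords stay at the top of the list.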
Here is a quick video on how to do the keyword research.
Let us assume that you have finished the scraping process. Now click on "Send to Email tester" to clean the entire email list.
Click on options at the bottom of the interface and enter your existing email address. The tester uses it to check each address on the list by attempting to send an email from you to that contact.
Once the email tester has finished running, click on Filter and Remove all Invalid Emails!
Now you can export all valid emails to file. Make sure to create a folder and use a clear file naming convention. In my case, I would name my jewellery store list as:
Good organisation helps to save time and avoid mistakes, especially when you are scraping hundreds or thousands of lists!