The site grabber feature of Internet Download Manager
Introduction
The site grabber feature of Internet Download Manager lets you download not only
the files that match your filters, for example all pictures or all audio files
from a web site, but also parts of web sites or complete web sites for mirroring
or offline browsing.
The site grabber has a set of predefined project templates that make it easy
to set up the Grabber for a particular type of project.
The Grabber itself is an easy-to-use four-step wizard that specifies exactly
which files to download and where to download them from. The Grabber has a
flexible set of filters both for the web pages to explore and for the files to
download.
After you create a project, the Grabber starts exploring the site.
Then you can check the files you need and either download them in the Grabber by
pressing the download button on the toolbar, or add the checked files to the
main list of Internet Download Manager. At any time you can go back to any step
of the wizard to change settings, for example the paths where files are saved or
the filter settings, and then go to the last step to resume exploring the site or
downloading files.
You can also download all explored files automatically. When you close the
Grabber, Internet Download Manager asks whether to save the project. The saved
project is added to the list of saved projects shown under the
"Grabber projects" node in the categories tree of the main IDM
dialog.
When you right-click a project name, Internet Download Manager shows a
popup menu where you can open the project to continue downloading, schedule the
project, or delete it.
Grabber wizard. Creating a project
Step 1.
Set a start page
On the first step of the wizard you specify the start page. By
default, the http protocol is assumed; other protocols, such as https, must be
specified explicitly. The start page also sets the current site. For example, if
you specify http://www.tonec.com/support/index.html, the current site is
www.tonec.com, with all supported protocols (ftp, https, http) applied to this
site name.
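The "current site" rule can be illustrated with a minimal Python sketch. It assumes the site is identified by its host name alone, so the protocol is ignored; `site_host` and `is_current_site` are hypothetical helpers, not IDM code.

```python
from urllib.parse import urlsplit

def site_host(url: str) -> str:
    """Host name of a URL; http is assumed when no protocol is given."""
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).hostname or ""   # hostname is already lower-case

def is_current_site(url: str, start_page: str) -> bool:
    # Hypothetical helper: the protocol (http, https, ftp) is ignored, so
    # every supported protocol counts as the same site.
    return site_host(url) == site_host(start_page)
```

Under this sketch http://tonec.com (without "www") is a different site; the "Explore all sites within the main domain" option on step 3 is what covers such hosts.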
If a site requires authorization, you should also enter the login and password on
this step. Some sites allow browsing/downloading only after authentication on a
certain page. In this case, press the "Advanced>>" button, check the "Enter login
and password manually" box, and specify the page used to log in to the site. Also,
if the site has a logout button, specify here the logout pages that the Grabber
should not open. If you set the login page, the Grabber will open a browser
window after the fourth step and let you log in to the site manually before it
proceeds with exploring and downloading.
If you plan to save the Grabber project for later use, choose a unique
project name and enter it in the "Grabber Project Name" field at the top of the
dialog. The project name is shown in the list of saved projects in the
categories tree of the main IDM dialog.
If you need to download all pictures, video or audio files from a web site,
or download a complete web site, you may select the appropriate template in the
"Project template" list box. Project templates make it easy to start your
projects quickly, because all required settings are made automatically.
Selecting a project template is optional, though. A template only presets the
settings for the next steps of the Grabber wizard. If you regularly download
files from web sites with the same Grabber settings, you can choose the "custom"
template on this step, make the necessary settings on the next steps, and then
save them as a template by picking the "Project->Save current settings as a
template" menu item.
Step 2.
Select where to save files.
On the second step you need to select where to save all downloaded
files.
You can save each file to a folder according to the category of the file.
For example, if you have a "compressed files" category that lists the zip, arj
and rar file types and has an associated folder such as
c:\my documents\myname\downloads\compressed, then all downloaded zip, arj and
rar files will be saved to the c:\my documents\myname\downloads\compressed folder.
Alternatively, you can save all downloaded files to the folder associated with
one selected category. To do so, select the corresponding radio button and
choose a category; the Grabber looks up the folder of that category and shows it
below the category.
If you want to recreate the folder structure of the web site, select the
directory where all downloaded files should be saved and check the "use
original relative subfolders" box.
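The three ways of choosing a save folder can be sketched as one Python function. The category table, the mode names and `save_path` itself are hypothetical; they only mirror the choices described above and are not how IDM stores its settings.

```python
import ntpath
from urllib.parse import urlsplit

# Hypothetical category table: category name -> (extensions, folder).
CATEGORIES = {
    "compressed files": ({"zip", "arj", "rar"},
                         r"c:\my documents\myname\downloads\compressed"),
}

def save_path(url, mode, base_dir=None, category=None, categories=CATEGORIES):
    """Return the Windows path a downloaded file would be saved to.

    mode is one of three hypothetical names for the choices on step 2:
    "by_category", "one_category" or "relative_subfolders".
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    name = parts[-1]
    ext = ntpath.splitext(name)[1].lstrip(".").lower()
    if mode == "by_category":
        # Each file goes to the folder of the category that lists its type.
        for extensions, folder in categories.values():
            if ext in extensions:
                return ntpath.join(folder, name)
        raise LookupError(f"no category lists the .{ext} file type")
    if mode == "one_category":
        # Every file goes to the folder of the one selected category.
        return ntpath.join(categories[category][1], name)
    if mode == "relative_subfolders":
        # Recreate the site's folders below base_dir; leaving out the host
        # name is an assumption of this sketch.
        return ntpath.join(base_dir, *parts)
    raise ValueError(f"unknown mode: {mode}")
```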
If you are downloading a complete web site, or a part of a web site, you
can check the box that converts links to local ones for offline browsing. This
checkbox is disabled when the template selected on the first step doesn't require
saving any html pages, for example the "All images from a web site" template.
After all selected files have been downloaded, or after you stop the Grabber, it
converts the links to downloaded files into local relative links in every
downloaded web page. It also converts all links to files that were not
downloaded (remote files) into absolute internet links.
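The link conversion described above can be sketched as follows, assuming the Grabber knows, for every downloaded URL, the path where it was saved relative to the project folder. The `downloaded` mapping and `convert_link` are hypothetical. Links to downloaded files become relative paths; all other links are resolved to absolute URLs.

```python
import posixpath
from urllib.parse import urldefrag, urljoin

def convert_link(href, page_url, page_local, downloaded):
    """Rewrite one link found in a downloaded page.

    href:       the link as written in the page (may be relative)
    page_url:   the URL the page was downloaded from
    page_local: where the page was saved, relative to the project folder
    downloaded: hypothetical map of absolute URL -> saved path (same base)
    """
    absolute = urljoin(page_url, href)        # resolve against the page URL
    url, fragment = urldefrag(absolute)       # keep "#anchor" parts aside
    if url in downloaded:
        # Downloaded file: make a path relative to the page's own folder.
        start = posixpath.dirname(page_local) or "."
        local = posixpath.relpath(downloaded[url], start)
        return local + ("#" + fragment if fragment else "")
    return absolute                           # remote file: absolute link
```

Resolving each link against the page URL first is what lets links such as "../index.html" and "img/a.png" be compared with the downloaded URLs.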
If the "Overwrite existing files" box is not checked and a file with the same
name already exists, the Grabber adds an underscore and a number to the file
name, for example index_2.html.
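A minimal sketch of this naming rule, assuming numbering starts at 2 as in the index_2.html example. `unique_name` is hypothetical, not IDM code.

```python
import posixpath

def unique_name(name, existing):
    """Return name, or name with "_N" before the extension if it is taken.

    Starting the numbering at 2 is an assumption based on index_2.html.
    """
    if name not in existing:
        return name
    stem, ext = posixpath.splitext(name)
    n = 2
    while f"{stem}_{n}{ext}" in existing:
        n += 1
    return f"{stem}_{n}{ext}"
```

The sketch compares names case-sensitively; Windows file names are case-insensitive, so a real implementation would have to compare them accordingly.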
Checking the "Add checked files to IDM" box is optional: the toolbar of the
main Grabber window has a button that does the same thing, adding all selected
files to the main download list of Internet Download Manager. If this box is
checked, the Grabber adds the selected files to IDM automatically when you close
the Grabber.
Step 3.
Set site explorer filters.
On this step you specify which web pages to explore when searching for the
required files. Note that these criteria apply only to explored web pages. You
can set file types, locations, and other filters for downloaded files on the
next step.
The start page that you specified on the first step sets the current site
to explore. For example, if you specified
http://www.tonec.com/support/index.html, the current site is www.tonec.com
with all supported protocols applied to this site name, such as
https://www.tonec.com and ftp://www.tonec.com. On this step you can tell the
Grabber to find all files on the current site only, or you can specify the
number of levels of web pages to process on the current (this) site and the
number of levels of web pages to process on other sites. The number of link
levels is explained under "Set the number of levels" below. Be careful when
setting a large number of levels for other sites: it can slow IDM down by
showing useless files, and it may lead to processing millions of unneeded
pages.
If you check the "Ignore popup windows" box, the Grabber will not explore
the web pages that pop up in browsers during page loading. Note that the term
"popup window" applies to web browsers rather than to the Grabber: the Grabber
doesn't open any browser windows except when you use manual authentication.
If the start page is located below the root of the site (for example
http://www.tonec.com/support/index.html), the "Don't explore parent
directories" checkbox is enabled. If you check the "Don't explore parent
directories" box, the Grabber explores only the directory of the start page and
its subdirectories, not the parent directories. For example, for
http://www.tonec.com/support/index.html, the Grabber will NOT explore
http://www.tonec.com/index.html and http://www.tonec.com/other/index.html, but
will explore http://www.tonec.com/support/file.html and
http://www.tonec.com/support/other/index.html.
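The example above amounts to a path-prefix test against the directory of the start page. A minimal sketch, assuming the rule compares the host and the URL path only; `in_start_directory` is hypothetical.

```python
from urllib.parse import urlsplit

def in_start_directory(url: str, start_page: str) -> bool:
    """True if url is in the start page's directory or one of its subdirectories."""
    start, target = urlsplit(start_page), urlsplit(url)
    # Directory part of the start page path, e.g. "/support/" for
    # "/support/index.html"; "/" when the page is at the site root.
    start_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return target.hostname == start.hostname and target.path.startswith(start_dir)
```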
If you check the "Explore all sites within the main domain" box, the Grabber
will also explore all other domains that share the main domain of the start
page. For example, for http://www.tonec.com/support/index.html, the Grabber will
explore http://tonec.com, http://ftp.tonec.com and
http://some.other.domain.tonec.com. On these child domains the Grabber explores
the number of levels specified for the current site.
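A minimal sketch of the main-domain test. It assumes the main domain is the last two labels of the start page's host name (tonec.com for www.tonec.com); that assumption would be wrong for suffixes such as co.uk, where a list of public suffixes is needed. `main_domain` and `in_main_domain` are hypothetical.

```python
from urllib.parse import urlsplit

def main_domain(host: str) -> str:
    # Assumption: the main domain is the last two labels of the host name.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def in_main_domain(url: str, start_page: str) -> bool:
    host = urlsplit(url).hostname or ""
    root = main_domain(urlsplit(start_page).hostname or "")
    # The host must be the main domain itself or a subdomain of it.
    return host == root or host.endswith("." + root)
```

Checking for a "." before the main domain keeps a host such as nottonec.com from matching.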
The Grabber can also run JavaScript on a page and parse the results. This way
you can retrieve more links from a site, but you should use this feature with
caution.
If you click the "Advanced >>" button, the dialog expands and lets you specify
include and exclude filters for the domains/paths in which pages should be
explored. You can use the asterisk wildcard (*) to match any number of any
characters in a filter pattern.
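Include and exclude filters with an asterisk wildcard could work as in the sketch below. The "domain/path" form of `location`, case-insensitive matching, and exclude filters taking precedence over include filters are all assumptions, not documented IDM behavior.

```python
import re
from typing import List

def wildcard(pattern: str):
    # Only "*" is special: it matches any number of any characters.
    # Case-insensitive matching is an assumption of this sketch.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + r"\Z", re.IGNORECASE)

def should_explore(location: str, include: List[str], exclude: List[str]) -> bool:
    """location is a hypothetical "domain/path" string such as
    "www.tonec.com/support/index.html"."""
    def hit(patterns: List[str]) -> bool:
        return any(wildcard(p).match(location) for p in patterns)
    # Assumptions: an empty include list allows everything, and an exclude
    # match always wins over an include match.
    return (not include or hit(include)) and not hit(exclude)
```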
Set the number of levels.
The examples below show how many levels you should set in the Grabber,
depending on your needs.
According to the picture, Level 0 for the current site and Level 0 for
other sites means that only the start page will be explored. All the files that
the start page contains (pictures, zips, audio, video files, etc.) will be added
to the Grabber's list.
Level 1 for this site and Level 0 for other sites means that the Grabber
will explore the start page and web pages number 1 and 2.
Level 3 for this site and Level 1 for other sites means that the Grabber
will explore the start page as well as web pages number 1 to 8 of the current
(this) site and pages 9, 10, 11, etc. from other sites.
Note that the number of levels applies to web pages only. The Grabber doesn't
check the levels set on this step for files linked from a page. Other filters
for downloaded files can be set on the next step.
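The level rules can be sketched as a breadth-first walk over web pages. Since the picture is not reproduced here, the sketch uses hypothetical page names and assumes that a page at link distance d from the start page is explored when d is within the level limit for its site (this site or other sites); how IDM counts levels on other sites may differ. Files linked from pages are not part of the walk, matching the note above.

```python
from collections import deque
from typing import Callable, Dict, List, Set

def pages_to_explore(start: str,
                     links: Dict[str, List[str]],
                     is_this_site: Callable[[str], bool],
                     this_levels: int,
                     other_levels: int) -> Set[str]:
    """Breadth-first walk from the start page (level 0).

    links maps a page to the web pages it links to (hypothetical input).
    Assumption: a page at link distance d is explored if d <= this_levels
    for pages on the current site, or d <= other_levels for other sites.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        next_depth = depth[page] + 1
        for target in links.get(page, []):
            if target in depth:          # BFS already reached it by a shortest path
                continue
            limit = this_levels if is_this_site(target) else other_levels
            if next_depth <= limit:
                depth[target] = next_depth
                queue.append(target)     # only explored pages are expanded
    return set(depth)
```

With both levels at 0 only the start page is returned, as in the first example above.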
Processing JavaScript.
Some links on web sites are generated by JavaScript code that runs on events
such as page loading or a button press. The IDM Grabber can run such scripts to
retrieve these links. However, when running a script, the Grabber may also
execute unwanted commands, such as starting the installation of a component. Use
the "Process Javascript" option with caution, and only if you know that the site
has no malicious JavaScript or if you trust the site you are downloading from.
Even with this option turned off, the Grabber scans the text of the JavaScript
code and retrieves most of the links, except those that are built by complex
JavaScript expressions.
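Scanning script text without running it can be approximated by picking out string literals that look like links. The regular expressions below are an illustrative heuristic, not IDM's parser; like the Grabber, they miss links assembled by expressions such as "file" + n + ".zip". `links_in_script` is hypothetical.

```python
import re

# A quoted JavaScript string literal, "..." or '...', allowing \" escapes.
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")
# Keep literals that look like absolute URLs or like file names with a
# 2 to 4 character extension (a rough heuristic with false positives).
LOOKS_LIKE_LINK = re.compile(r"^(?:https?|ftp)://\S+$|^[\w./-]*\w\.[A-Za-z0-9]{2,4}$")

def links_in_script(script: str):
    return [m.group(2) for m in STRING_LITERAL.finditer(script)
            if LOOKS_LIKE_LINK.match(m.group(2))]
```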
Step 4.
Set file filters
On this step you set the file types, locations and other filters for
downloaded files. You can set include and exclude filters for all file types.
If the predefined filters don't suit you, you can add or change them with the
"Add Filter" button. Clicking "Add Filter" opens the "Edit Filters" dialog.
For a filter that contains several file types, separate the file type elements
with commas, without spaces. The asterisk wildcard (*) denotes any number of any
characters. Using a wildcard you can create a pattern that matches several file
names; for example, the "image*.jpg" pattern matches any jpg file name that
starts with "image", such as image01.jpg, image2.jpg, imageHot.jpg and
image735.jpg. You can use the "<start page>" expression in filters to refer to
the start page set on the first step.
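Matching a comma-separated filter such as "image*.jpg,*.png" against a file name can be sketched as below. Case-insensitive matching is an assumption (Windows file names are case-insensitive), `filter_matches` is hypothetical, and the "<start page>" expression is not handled.

```python
import re

def filter_matches(filter_text: str, file_name: str) -> bool:
    """True if file_name matches any pattern of a filter such as "*.zip,*.rar".

    Patterns are separated by commas without spaces; "*" matches any number
    of any characters. Case-insensitive matching is an assumption.
    """
    for pattern in filter_text.split(","):
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, file_name, re.IGNORECASE):
            return True
    return False
```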
If you check the "Search files on this site only" box, files located on other
sites won't be shown in the main Grabber window, and the Grabber won't check the
size and type of these files.
While exploring a project, the Grabber is likely to find many copies of the
same file in different locations. If you check "Hide duplicate files found in
different locations", the Grabber will show only the first copy of the file it
finds. The Grabber treats a file as a copy if it has the same name and the same
size. This option is disabled when the "use original relative subfolders" option
is enabled.
If you check "Start downloading all matched files at once", all found files
will be downloaded immediately. Otherwise, you can explore the site first, check
the files that you need, and download them in the main Grabber window or add them
to the main list of IDM.
The "Advanced >>" button expands the dialog and lets you set include and exclude
filters for the paths/domains from which the Grabber will download files. You can
use the asterisk wildcard (*) to denote any number of any characters. You can
also set the minimum and the maximum size of files to download.
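The duplicate rule and the size limits from this step can be combined in one sketch. The tuple layout, `visible_files`, and the treatment of the size limits as inclusive are assumptions.

```python
from typing import List, Optional, Tuple

# A found file: (url, file name, size in bytes). This tuple shape is a
# hypothetical stand-in for whatever the Grabber stores internally.
FoundFile = Tuple[str, str, int]

def visible_files(found: List[FoundFile],
                  min_size: Optional[int] = None,
                  max_size: Optional[int] = None,
                  hide_duplicates: bool = True) -> List[FoundFile]:
    """Apply size limits (inclusive, by assumption), then keep first copies."""
    seen = set()
    result = []
    for url, name, size in found:            # in the order files were found
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        key = (name, size)                   # same name + same size = a copy
        if hide_duplicates and key in seen:
            continue
        seen.add(key)
        result.append((url, name, size))
    return result
```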
Main Action Dialog
The main Grabber dialog has a toolbar, a window where all files are
displayed, and a tree that shows the structure of the site.
The tree shows the site structure by links and by folders. You can view the
entire list of found files, or only the files found on a particular page or in a
particular folder.
The toolbar has the following buttons: "Start Exploring",
"Stop Exploring", "Check All Files",
"Uncheck All Files", "Start Downloading the checked
files", "Stop Downloading the checked files",
"Add checked files to the main IDM list and to the download
queue", and "Show Grabber Statistics".
You can check all required files for downloading and start downloading them
immediately, or add them to the main IDM list. At any time you can go back to a
previous page of the Grabber wizard to change settings, filters, paths for saving
files and so on, and then go forward to resume exploring a site or downloading
files.
If a file has been downloaded, you can open the file or its folder from the
popup menu by right-clicking the file name.
You can also select several files in the list with the mouse and the Shift key,
and then check or uncheck them all together. The "properties" item in the popup
menu opens a dialog where you can choose the file name under which to save the
file, or copy the file URL or referrer to the clipboard.
The "Show Grabber Statistics" button opens the "IDM Grabber Statistics" window.
The window always stays on top so that you can see it while you are running a
Grabber project.
The Grabber statistics window shows general statistical information about the
project.
Settings Dialog
In the Grabber settings dialog you can specify how many files to explore
and how many files to download at the same time.
If the link to a found file has any text, that text is added to the file's
description field and shown when you add the file to the main IDM list.
By default, the "Look up files in IE cache before downloading" box is checked.
If the Grabber finds a file in the Internet Explorer (IE) cache, it checks
whether the file has changed. If it has, the Grabber downloads it from the site;
if not, the Grabber takes it from the IE cache. If you don't use Internet
Explorer, you may turn this option off.
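The cache decision reduces to a comparison of modification times. This sketch assumes "changed" means the site reports a newer last-modified time than the cached copy; IDM's actual check is not documented here (HTTP clients often make this decision with a conditional request carrying an If-Modified-Since header). `choose_source` is hypothetical.

```python
from datetime import datetime
from typing import Optional

def choose_source(cached_modified: Optional[datetime],
                  remote_modified: Optional[datetime]) -> str:
    """Decide whether to reuse the IE cache copy or download the file again.

    cached_modified: last-modified time of the cached copy, None if not cached
    remote_modified: last-modified time reported by the site, None if unknown
    Assumption: an unknown remote time is treated as "changed".
    """
    if cached_modified is None:          # not in the cache at all
        return "download"
    if remote_modified is None or remote_modified > cached_modified:
        return "download"                # changed (or cannot tell): fetch it
    return "cache"                       # unchanged: reuse the cached copy
```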
Scheduler
You can schedule the exploring/downloading time or the synchronization time
for every project.
If you choose the "One-time exploring/downloading" option, you can (a) only
explore a site to look for matching files, (b) explore a site and download all
matched files, or (c) download the checked files.
Note that if you need to download several files from a site that often has
problems with downloads, it's better to add the required files to the IDM main
list and download them using the "Start Queue" button in the IDM main window. In
this case IDM keeps retrying to start the downloads indefinitely. When you
download in the Grabber, however, the Grabber makes only 2 attempts to start
downloading, after which it stops, assuming it has done everything possible for
the project.
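The difference between the two retry policies can be sketched as follows. `start_download` stands for a hypothetical callable that returns True when a download starts; max_attempts=2 models the Grabber and max_attempts=None models the main IDM queue.

```python
import itertools
from typing import Callable, Optional

def run_with_attempts(start_download: Callable[[], bool],
                      max_attempts: Optional[int] = 2) -> Optional[int]:
    """Return the number of the attempt that succeeded, or None if all failed.

    max_attempts=2 models the Grabber; None retries forever, like the queue
    in the main IDM window.
    """
    attempts = itertools.count(1) if max_attempts is None else range(1, max_attempts + 1)
    for attempt in attempts:
        if start_download():
            return attempt
    return None
```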
You can schedule periodic synchronization for a site or for checked files
by turning on the "Periodic synchronization" option. In this case the Grabber
checks whether files have changed and, if they have, downloads the new versions
and replaces the old files with them. For periodic synchronization, the
scheduler turns on the "overwrite existing files" option in the project settings
on step 2 of the Grabber wizard.
The project will start on the specified day and at the specified time only if
the "Start download at" box is checked. The project itself must be closed or, if
it's open, it must be open in the Main Action Dialog and stopped. The scheduler
will not run a project that is running or that is open on any step of the wizard.
If you choose "Periodic synchronization", the project can be started many
times, once every predefined number of minutes/hours. The scheduler runs the
project periodically until the time specified in "Stop download at" or, if that
is not specified, until the end of the day.
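A sketch of the periodic schedule, assuming runs start at the first run time and repeat at a fixed interval while the start time is earlier than the stop time, or earlier than midnight when no stop time is set. `sync_run_times` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional

def sync_run_times(first_run: datetime, interval: timedelta,
                   stop_at: Optional[datetime] = None) -> List[datetime]:
    """List the start times of a periodically synchronized project."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if stop_at is None:
        # No "Stop download at" time: run until the end of the day (midnight).
        stop_at = datetime.combine(first_run.date() + timedelta(days=1),
                                   datetime.min.time())
    times = []
    t = first_run
    while t < stop_at:                   # a run exactly at stop_at is skipped
        times.append(t)
        t += interval
    return times
```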
You can stop any project at a specified time by checking the "Stop download
at" box and specifying a time. Note that the project stops regardless of the
state of the "Start download at" checkbox.
After a Grabber project is processed, you can have the Grabber hang up the
modem, exit Internet Download Manager, or turn off the computer, as in the main
IDM scheduler. In the Grabber scheduler, however, you can also open a file or run
a program after a Grabber project is processed. If you want to run several files
on completion of a Grabber task, you can create a batch (.bat) file and specify it
in "Open the following file when done". Note that after a scheduled project
finishes, IDM waits 10 seconds and then starts executing the actions planned on
completion.