Thursday, July 24, 2014

The site grabber feature of Internet Download Manager

Introduction

The site grabber feature of Internet Download Manager not only lets you download required files that are specified with filters, for example all pictures from a web site, or all audio files from a web site, but it also lets you download subsets of web sites, or complete web sites for mirroring or offline browsing.
 
The site grabber has a set of predefined project templates, which make it easy to set up the grabber for a required type of project.
 
The grabber itself is an easy-to-use four-step wizard that determines completely what files to download and where from. The grabber has a flexible set of filters both for the web pages to explore and for the files to download. After creating a project, the grabber starts to explore files.
 

 
Then you can check all necessary files and download them in the grabber by pressing the download button in the toolbar, or you can add the checked files to the main list of Internet Download Manager. At any time you can get back to any stage of the wizard to change settings, for example paths to save files, or filter settings, and then go to the last stage to resume exploring a site or downloading files.
 
It's also possible to download all explored files automatically. After closing the grabber, Internet Download Manager asks to save the project. The saved project is added to the list of saved projects which is shown under the "Grabber projects" node in the categories tree of the main IDM dialog.
 

 
When you right click on a project name, Internet Download Manager shows a popup menu where you can open a project to continue downloading, schedule the project, or delete the project. 

Grabber wizard. Creating a project

   Step 1. Set a start page
               On the first step of the wizard you should specify the start page. By default, the http protocol is assumed; other protocols such as https must be specified explicitly. The start page also sets the current site. For example, if you specified http://www.tonec.com/support/index.html, the current site would be www.tonec.com with all supported protocols (ftp, https, http) applied to this site name.
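
How a start page maps to the current site can be illustrated with a short Python sketch. This is not IDM's code; the function name current_site is hypothetical, and the only behavior taken from the text is that http is assumed when no protocol is given.

```python
from urllib.parse import urlsplit


def current_site(start_page):
    """Return the host name that a start page would set as the current site."""
    # Assumption from the text: a start page without a protocol is treated as http.
    if "://" not in start_page:
        start_page = "http://" + start_page
    return urlsplit(start_page).hostname or ""


print(current_site("http://www.tonec.com/support/index.html"))
print(current_site("www.tonec.com/support/index.html"))
```

Both calls give the same host name, www.tonec.com, because the protocol is not part of the current site.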
 

 
If a site requires authorization, you should also set the login and password on this step. Some sites allow browsing/downloading only after authentication on a certain page. In this case you should click the "Advanced>>" button, check the "Enter login and password manually" box, and specify the page used to log in to the site. Also, if the site has a logout button, you should specify here the logout pages that the Grabber should not open. If you set the login page, the Grabber will open a browser window after the fourth step and let you log in to the site manually before proceeding with exploring and downloading.
 
If you plan to save the grabber project for later use, you need to choose a unique project name and enter it in the "Grabber Project Name" field at the top of the dialog. The project name is shown in the list of saved projects in the categories tree of the main IDM dialog.
 
If you need to download all pictures, video or audio files from a web site, or download a complete web site, you may select the appropriate template in the "Project template" list box. Project templates make it easy to start your projects quickly, because all required settings are made automatically.
 
It's not necessary to select a project template, however. Project templates only make predefined settings in your project for the next steps of the grabber wizard. If you often download files from web sites with the same Grabber settings, you may choose the "custom" template on this step, make the necessary settings on the next steps, and then save the settings as a template by picking the "Project->Save current settings as a template" menu item.
  Step 2. Select where to save files to.
               On the second step you need to select where to save all downloaded files.
 

 
You can save each file to a folder according to the category of the file. For example, if you have a "compressed files" category defined which lists the zip, arj and rar file types, and it has an associated folder, for example c:\my documents\myname\downloads\compressed, then all downloaded zip, arj and rar files will be saved to the c:\my documents\myname\downloads\compressed folder.
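
The category rule can be pictured as a lookup from file extension to folder. The sketch below is only an illustration in Python: the table layout, the function name save_path and the fallback folder c:\downloads are assumptions, while the category and its folder come from the example above.

```python
import ntpath  # Windows-style path handling, usable on any platform

# Hypothetical category table; the entry mirrors the example in the text.
CATEGORIES = {
    "compressed files": {
        "types": ("zip", "arj", "rar"),
        "folder": r"c:\my documents\myname\downloads\compressed",
    },
}


def save_path(file_name, categories=CATEGORIES, default_folder=r"c:\downloads"):
    """Pick the folder of the first category that lists the file's extension."""
    extension = ntpath.splitext(file_name)[1].lstrip(".").lower()
    for category in categories.values():
        if extension in category["types"]:
            return ntpath.join(category["folder"], file_name)
    # Assumption: files outside every category go to a default folder.
    return ntpath.join(default_folder, file_name)


print(save_path("archive.rar"))
```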
 
Also you can save all downloaded files to a folder associated with a selected category. You will need to select the corresponding radio button, and choose a category. The grabber will find and show a directory below the category.
 
If you want to recreate the folder structure of the web site, you can select a directory where to save all downloaded files and check the "use original relative subfolders" box.
 
If you are downloading a complete web site, or a part of a web site, you can check the box to convert links to local for offline browsing. This checkbox is disabled when you select a template on the first step which doesn't require saving any html pages, for example the "All images from a web site" template. After downloading all selected files, or after stopping the grabber, the grabber will convert the links to downloaded files to local relative ones for every downloaded web page. The grabber will also convert all links to files that were not downloaded (remote files) to absolute internet links.
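
The link conversion rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the grabber's implementation: convert_link is a hypothetical name, links are assumed to be absolute URLs already, and downloaded is a hypothetical table from URL to the local path below the save folder.

```python
import posixpath


def convert_link(page_local_path, link_url, downloaded):
    """Return the link to write into a saved page.

    downloaded maps an absolute URL to its local path relative to the save folder.
    """
    local_path = downloaded.get(link_url)
    if local_path is None:
        # Remote file: keep the absolute internet link.
        return link_url
    # Downloaded file: make the link relative to the folder of the saved page.
    page_folder = posixpath.dirname(page_local_path) or "."
    return posixpath.relpath(local_path, page_folder)


downloaded = {
    "http://www.tonec.com/support/file.html": "support/file.html",
    "http://www.tonec.com/images/logo.png": "images/logo.png",
}
print(convert_link("support/index.html", "http://www.tonec.com/images/logo.png", downloaded))
```

A link to a file that is not in the table is returned unchanged, which matches the treatment of remote files described above.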
 
If the "Overwrite existing files" box is not checked and a file with the same name already exists, the grabber will add an underscore and a number to the file name, for example index_2.html.
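
A minimal Python sketch of this renaming rule follows. The function name unique_name is hypothetical, and the numbering is an assumption: the text only shows index_2.html, so the sketch starts at 2 and counts upward until a free name is found.

```python
import os


def unique_name(file_name, existing_names):
    """Return file_name, or file_name with _N before the extension if it is taken."""
    if file_name not in existing_names:
        return file_name
    stem, extension = os.path.splitext(file_name)
    number = 2  # assumption: the first added number is 2, as in index_2.html
    while f"{stem}_{number}{extension}" in existing_names:
        number += 1
    return f"{stem}_{number}{extension}"


print(unique_name("index.html", {"index.html"}))
```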
 
It's not necessary to check the "Add checked files to IDM" box, because the main Grabber window toolbar has a button with the same functionality that adds all selected files to the main download list of Internet Download Manager. If this checkbox is checked, the grabber will add the selected files to IDM automatically when the grabber is closed.
  Step 3. Set site explorer filters.
                    At this step you should specify what web pages to explore to search for required files. Please note that you set the criteria only for explored web pages. You can set file types, location, and other filters for downloaded files on the next step.
 

 
The start page that you specified on the first step sets the current site to explore. For example, if you specified http://www.tonec.com/support/index.html, the current site would be www.tonec.com with all supported protocols applied to this site name, like https://www.tonec.com and ftp://www.tonec.com. On this step you can tell the Grabber to find all files on the current site only, or you can specify the number of levels of web pages to process on the current (this) site and the number of levels of web pages to process on other sites. The number of link levels is explained under "Set the number of levels" below. Be careful when setting a large number of levels for other sites, because it may slow down IDM, fill the list with useless files, and lead to the processing of millions of needless pages.
 
If you check the "Ignore popup windows" box, the Grabber will not explore the web pages that pop up in browsers during page loading. Note that the term "popup window" applies to web browsers, not to the Grabber itself. The grabber doesn't open any browser windows except when you are using manual authentication.
 
If the start web page has a path relative to the site name (for example http://www.tonec.com/support/index.html), then the "Don't explore parent directories" checkbox will be active. If you check the "Don't explore parent directories" box, the Grabber will not explore parent directories relative to the start page. For example, for http://www.tonec.com/support/index.html, the grabber will NOT explore http://www.tonec.com/index.html and http://www.tonec.com/other/index.html, but will explore http://www.tonec.com/support/file.html and http://www.tonec.com/support/other/index.html.
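
The parent directory rule amounts to a prefix test on the URL path. The Python sketch below illustrates it under that assumption; within_start_directory is a hypothetical name and the check is not taken from IDM itself.

```python
import posixpath
from urllib.parse import urlsplit


def within_start_directory(start_page, candidate):
    """True if candidate lies in the start page's directory or below it."""
    start = urlsplit(start_page)
    target = urlsplit(candidate)
    if start.hostname != target.hostname:
        return False
    # Directory of the start page, always ending with a slash.
    base_directory = posixpath.dirname(start.path)
    if not base_directory.endswith("/"):
        base_directory += "/"
    return target.path.startswith(base_directory)


start = "http://www.tonec.com/support/index.html"
for url in (
    "http://www.tonec.com/index.html",
    "http://www.tonec.com/other/index.html",
    "http://www.tonec.com/support/file.html",
    "http://www.tonec.com/support/other/index.html",
):
    print(url, within_start_directory(start, url))
```

The first two URLs are rejected and the last two are accepted, matching the example in the text.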
 
If you check the "Explore all sites within the main domain" box, the Grabber will explore all other domains which have a common part with the start page domain. For example, for http://www.tonec.com/support/index.html, the grabber will explore http://tonec.com, http://ftp.tonec.com and http://some.other.domain.tonec.com. On child domains the Grabber will explore the number of levels specified for the current site.
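
One way to picture the main domain test is to compare the last two labels of each host name. The Python sketch below does exactly that; it is a simplifying assumption (it would misjudge suffixes such as co.uk), the function names are hypothetical, and the text does not say how IDM actually determines the common part.

```python
from urllib.parse import urlsplit


def main_domain(host_name):
    """Naive main domain: the last two dot-separated labels of the host name."""
    return ".".join(host_name.lower().split(".")[-2:])


def same_main_domain(start_page, candidate):
    """True if both URLs share the same (naively computed) main domain."""
    start_host = urlsplit(start_page).hostname or ""
    candidate_host = urlsplit(candidate).hostname or ""
    return main_domain(start_host) == main_domain(candidate_host)


start = "http://www.tonec.com/support/index.html"
for url in ("http://tonec.com", "http://ftp.tonec.com", "http://some.other.domain.tonec.com"):
    print(url, same_main_domain(start, url))
```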
 
The grabber can also run JavaScript on a page and parse its results. This way you can retrieve more links from a site, but you should use this feature with caution.
 
If you click the "Advanced >>" button, the dialog will expand and let you specify include and exclude filters for the domains/paths within which you need to explore pages. You can use the asterisk wildcard (*), which matches any number of any characters, to create a filter pattern.
              Set the number of levels.

                   The examples below show how many levels you should set in the Grabber, depending on your needs.
 
According to the picture, Level 0 for the current site and Level 0 for other sites means that only the start page will be explored. All the files which the start page contains (pictures, zips, audio, video files, etc.) will be added to the list of the grabber.
 
Level 1 for this site and Level 0 for other sites means that the Grabber will explore the start page and web pages number 1 and 2.
 
Level 3 for this site and Level 1 for other sites means that the Grabber will explore the start page as well as web pages 1, 2, 3, 4, 5, 6, 7 and 8 of the current (this) site and pages 9, 10 and 11 from other sites, etc.
 
Please note that the number of levels relates to web pages only. For downloaded files linked from a page, the Grabber doesn't check levels set on this step. There are other filters for downloaded files which can be set on the next step. 
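
The meaning of levels for the current site can be sketched as a breadth-first walk over links. The Python example below is an illustration only: the link table and its example.com URLs are hypothetical, the function name pages_to_explore is made up, and the sketch covers only the "this site" level, with other sites left at Level 0.

```python
from collections import deque
from urllib.parse import urlsplit


def pages_to_explore(links, start_page, this_site_levels):
    """List the pages reached within the given number of link levels.

    links is a hypothetical table mapping a page URL to the page URLs it links to.
    """
    site = urlsplit(start_page).hostname
    explored = [start_page]
    seen = {start_page}
    queue = deque([(start_page, 0)])
    while queue:
        page, level = queue.popleft()
        if level == this_site_levels:
            continue  # deepest level reached: links on this page are not followed
        for target in links.get(page, []):
            # Skip pages already seen and pages on other sites (Level 0 for other sites).
            if target in seen or urlsplit(target).hostname != site:
                continue
            seen.add(target)
            explored.append(target)
            queue.append((target, level + 1))
    return explored


links = {
    "http://a.example.com/": [
        "http://a.example.com/1",
        "http://a.example.com/2",
        "http://b.example.com/x",
    ],
    "http://a.example.com/1": ["http://a.example.com/3"],
}
print(pages_to_explore(links, "http://a.example.com/", 1))
```

With Level 0 only the start page is returned; with Level 1 the two pages linked directly from it are added, and the page on the other site is skipped.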
              Processing JavaScript.
                   Some links on web sites are formed by running JavaScript on events such as page loading or pressing a button. IDM Grabber can run all such JavaScript to retrieve these links. But in some cases, when running a script, the Grabber may run unwanted commands like starting an installation of a component. Use the "Process Javascript" option with caution and only if you know that the site has no malicious JavaScript, or if you trust the site you are downloading from.
 
Even with this option turned off, the Grabber will scan the text of the JavaScript and retrieve most of the links, except those which are derived from complex JavaScript expressions.
  Step 4. Set file filters
             On this step you should set file types, locations and other filters for downloaded files. You can set include and exclude filters for all file types.
 

 
If you are not satisfied with the predefined filters, you can add or change them by using the "Add Filter" button. After clicking on "Add Filter", the following "Edit Filters" dialog will appear.
 

 
For a filter that contains several file types, the file type elements should be separated from each other with commas without spaces. The asterisk wildcard (*) denotes any number of any characters. Using a wildcard you can create a pattern matching several file names. For example, the "image*.jpg" pattern matches any jpg image file name starting with "image", like image01.jpg, image2.jpg, imageHot.jpg, and image735.jpg. It's possible to use the "<start page>" expression in filters to specify the start page set on the first step.
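
The filter syntax can be illustrated with a small Python matcher. This is a sketch, not IDM's matching code: the function names are hypothetical, only the asterisk is treated as special, and case-insensitive matching is an assumption that the text does not confirm.

```python
import re


def compile_filter(filter_text):
    """Turn a comma-separated filter such as '*.zip,*.rar' into regex objects."""
    patterns = []
    for element in filter_text.split(","):
        # Escape everything literally, then let each asterisk match any characters.
        regex = ".*".join(re.escape(part) for part in element.split("*"))
        patterns.append(re.compile(regex, re.IGNORECASE))
    return patterns


def matches(filter_text, file_name):
    """True if the whole file name matches at least one element of the filter."""
    return any(pattern.fullmatch(file_name) for pattern in compile_filter(filter_text))


for name in ("image01.jpg", "imageHot.jpg", "photo.jpg"):
    print(name, matches("image*.jpg", name))
```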
 
If you check "Search files on this site only" box then the files located on other sites won't be shown in the main Grabber window, and the Grabber won't check the size and the type of these files.
 
It's very likely that during exploring a project, the Grabber will find many copies of the same file in different locations. If you check "Hide duplicate files found in different locations", the grabber will show only the first copy of the file it finds. The grabber treats a file as a copy if it has the same name and the same size. This option is disabled when "use original relative subfolders" option is enabled.
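
The duplicate rule (same name and same size, first copy wins) can be sketched in Python as follows. The function name hide_duplicates and the input shape, a list of (URL, size) pairs in the order the files were found, are assumptions made for the illustration.

```python
def hide_duplicates(found_files):
    """Keep only the first file for each (file name, size) pair.

    found_files is a list of (url, size_in_bytes) tuples in discovery order.
    """
    seen = set()
    visible = []
    for url, size in found_files:
        file_name = url.rsplit("/", 1)[-1]
        key = (file_name, size)
        if key in seen:
            continue  # same name and same size: treated as a copy
        seen.add(key)
        visible.append((url, size))
    return visible


found = [
    ("http://www.tonec.com/a/logo.png", 1200),
    ("http://www.tonec.com/b/logo.png", 1200),
    ("http://www.tonec.com/c/logo.png", 900),
]
print(hide_duplicates(found))
```

The second entry is hidden as a copy of the first, while the third is kept because its size differs.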
 
If you check "Start downloading all matched files at once", all found files will be downloaded immediately. Otherwise you may explore the site first, check the files that you need, and then download them in the main Grabber window or add them to the main list of IDM.
 
The "Advanced >>" button expands the dialog and lets you set include and exclude filters for the paths/domains where the Grabber will download files from. You can use the asterisk wildcard (*) to denote any number of any characters. You can also set the minimum and the maximum size of files to download.

Main Action Dialog

The main grabber dialog has a toolbar, a window where all files are displayed, and a tree which shows a structure of a site.
 

 
The tree shows the site structure according to links and to folders. You can also see the entire list of all found files as well as all found files on a page or in a folder.
 
The toolbar has the following buttons: "Start Exploring", "Stop Exploring", "Check All Files", "Uncheck All Files", "Start Downloading the checked files", "Stop Downloading the checked files", "Add checked files to the main IDM list and to the download queue", and "Show Grabber Statistics".
 
It's possible to check all required files for downloading and start downloading them immediately or add them to the main IDM list. At any time you can get back to a previous page of the Grabber wizard to change settings, filters, paths for saving files, etc., and then go forward to resume exploring a site or to resume downloading files.
 
If a file has been downloaded, the file or its folder can be opened from the popup menu by right clicking on the file name.
 

 
It's also possible to select several files in the list using the mouse and the shift key and then check or uncheck the files all together. The "properties" item in the popup menu opens a dialog where you can choose a file name to save the file, or where you can copy the file URL or referrer to the clipboard.
 
"Show Grabber Statistics" button opens "IDM Grabber Statistics" window. The window is always on top so that you can see it while you are running a grabber project.
 

 
The grabber statistics window shows general statistics about the project.

Settings Dialog


On the Grabber settings dialog you can specify how many files to explore and how many files to download at the same time.
 
If a link to a found file has any text, then the description containing the text will be added to the description field and be shown when you add the file to the main IDM list.
 
By default "Look up files in IE cache before downloading" box is checked. If the grabber finds a file in IE cache, it checks if the file was changed. If the file was changed, the grabber downloads it from the site. If not, the grabber takes it from the cache of IE. If you don't use Internet Explorer, you may turn this option off. 

Scheduler

It's possible to schedule exploring/downloading time or synchronization time for every project.
 
If you choose "One-time exploring/downloading" option then it will be possible (a) only to explore a site to look for matching files, or (b) to explore a site and download all matched files, or (c) to download checked files.
 
Please note that if you need to download several files from a site that has frequent problems with downloading, it's better to add the required files to the IDM main list and download these files using the "Start Queue" button in the IDM main window. In this case IDM will keep retrying to start the downloads indefinitely. But when you are downloading in the grabber, the grabber makes only 2 attempts to start downloading, after which it stops, assuming it has done everything possible for the project.
 
You can schedule periodic synchronization for a site or for checked files by turning on the "Periodic synchronization" option. In this case the grabber checks if files have been changed, and if they have, it will download the new versions and replace the old files with them. For periodic synchronization, the scheduler will turn on the "overwrite existing files" option in the project settings on step 2 of the grabber wizard.
 
The project will start at a specified day and time only if "Start download at" box is checked. The project itself should be closed, or if it's opened, it should be opened in the Main Action Dialog and it should be stopped. A running project, or a project opened on any stage of the wizard will not be run by the scheduler.
 
If you choose "Periodic synchronization", it will be possible to start the project repeatedly at a predefined interval of minutes or hours. The scheduler will run the project periodically until the time specified in "Stop download at", or, if it's not specified, until the end of the day.
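
The periodic schedule can be pictured as a list of start times. The Python sketch below is an assumption-laden illustration: run_times and its parameters are hypothetical names, "end of the day" is taken to mean the following midnight, and a run falling exactly on the stop time is assumed not to start.

```python
from datetime import datetime, timedelta


def run_times(start_at, interval, stop_at=None):
    """List the times at which a periodic project would be started."""
    if stop_at is None:
        # Assumption: without a stop time, runs continue until the end of the day.
        next_day = start_at + timedelta(days=1)
        stop_at = next_day.replace(hour=0, minute=0, second=0, microsecond=0)
    times = []
    current = start_at
    while current < stop_at:
        times.append(current)
        current += interval
    return times


for moment in run_times(datetime(2014, 7, 24, 22, 0), timedelta(hours=1)):
    print(moment)
```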
 
It's possible to stop any project at a specified time if you check the "Stop download at" box and specify a time. Note that the project will stop regardless of the state of the "Start download at" checkbox.
 
After processing a grabber project you can program the grabber to hang up the modem, exit Internet Download Manager, or turn off the computer, like in the main IDM scheduler. But in the grabber scheduler you can also open a file or run a program after processing a grabber project. If you want to run several files on completion of a grabber task, you can create a batch (.bat) file and specify it in "Open the following file when done". Please note that after finishing a scheduled project, IDM will wait for 10 seconds and then start to execute the actions planned on completion.