Tag Archives: wget

C++ || Serial & Parallel Multi Process File Downloader Using Fork & Execlp

The following is another homework assignment which was presented in an Operating Systems Concepts class. The following are two multi-process programs using commandline arguments, which demonstrates more practice using the fork() and execlp() system calls on Unix based systems.

==== 1. OVERVIEW ====

File downloaders are programs used for downloading files from the Internet. The following programs listed on this page implement two distinct type of multi-process downloaders:

1. a serial file downloader which downloads files one by one.
2. a parallel file downloader which dowloads multiple files in parallel.

In both programs, the parent process first reads a file via the commandline. This file which is read is the file that contains the list of URLs of the files to be downloaded. The incoming url file that is read has the following format:

[URL1]
[URL2]
.
.
.
[URLN]

Where [URL] is an http internet link with a valid absolute file path extension.
(i.e: http://newsimg.ngfiles.com/270000/270173_0204618900-cc-asmbash.jpg)

After the url file is parsed, next the parent process forks a child process. Each created child process uses the execlp() system call to replace its executable image with that of the “wget” program. The use of the wget program performs the actual file downloading.

==== 2. SERIAL DOWNLOADER ====

The serial downloader downloads files one at a time. After the parent process has read and parsed the incoming url file from the commandline, the serial downloader proceeds as follows:

1. The parent forks off a child process.
2. The child uses execlp("/usr/bin/wget", "wget", [URL STRING1], NULL) system call in order to replace its program with wget program that will download the first file in urls.txt (i.e. the file at URL ).
3. The parent executes a wait() system call until the child exits.
4. The parent forks off another child which downloads the next file specified in url.txt.
5. Repeat the same process until all files are downloaded.

The following is implemented below:

Since the serial downloader downloads files one at a time, that can become very slow. That is where the parallel downloader comes in handy!

==== 3. PARALLEL DOWNLOADER ====

The parallel downloader downloads files all at once and is implemented much like the serial downloader. The parallel downloader proceeds as follows:

1. The parent forks off n children, where n is the number of URLs in url.txt.
2. Each child executes execlp("/usr/bin/wget", "wget", [URL STRING], NULL) system call where each is a distinct URL in url.txt.
3. The parent calls wait() (n times in a row) and waits for all children to terminate.
4. The parent exits.

The following is implemented below:


QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Also note, while the parallel downloader executes, the outputs from different children may intermingle.