C++ || Serial & Parallel Multi Process File Downloader Using Fork & Execlp
The following is another homework assignment which was presented in an Operating Systems Concepts class. It consists of two multi-process programs that take commandline arguments and provide more practice with the fork() and execlp() system calls on Unix-based systems.
==== 1. OVERVIEW ====
File downloaders are programs used for downloading files from the Internet. The programs listed on this page implement two distinct types of multi-process downloaders:
1. a serial file downloader which downloads files one by one.
2. a parallel file downloader which downloads multiple files in parallel.
In both programs, the parent process first reads a file whose name is given on the commandline. This file contains the list of URLs of the files to be downloaded, one per line, in the following format:
[URL1]
[URL2]
.
.
.
[URLN]
Where each [URL] is an HTTP link that ends in a valid absolute file path.
(e.g. http://newsimg.ngfiles.com/270000/270173_0204618900-cc-asmbash.jpg)
After the url file is parsed, the parent process forks a child process for each URL. Each child uses the execlp() system call to replace its executable image with that of the “wget” program, which performs the actual file download.
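As a rough illustration of the file format above, the URL list can be read into memory before any process is forked. This is only a sketch: the helper name readUrls is hypothetical, and skipping blank lines and stripping a trailing carriage return are assumptions that the programs on this page do not make.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying the helper without a file
#include <string>
#include <vector>

// Hypothetical helper: read one URL per line from any input stream.
// Blank lines are skipped, and a trailing '\r' (left behind by Windows
// line endings) is removed so it does not end up inside the URL.
std::vector<std::string> readUrls(std::istream& in)
{
    std::vector<std::string> urls;
    std::string line;

    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();

        if (!line.empty())
            urls.push_back(line);
    }
    return urls;
}
```

The helper accepts any std::istream, so it works with an std::ifstream opened on urls.txt as well as with an std::istringstream when experimenting. Using std::string also avoids a fixed-size line buffer.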
==== 2. SERIAL DOWNLOADER ====
The serial downloader downloads files one at a time. After the parent process has read and parsed the incoming url file from the commandline, the serial downloader proceeds as follows:
1. The parent forks off a child process.
2. The child uses the execlp("/usr/bin/wget", "wget", [URL STRING1], NULL) system call to replace its program image with the wget program, which downloads the first file listed in urls.txt (i.e. the file at [URL STRING1]).
3. The parent calls wait(), which blocks until the child exits.
4. The parent forks off another child which downloads the next file specified in urls.txt.
5. Repeat the same process until all files are downloaded.
The implementation is shown below:

    // ============================================================================
    //    Author: Kenneth Perkins
    //    Date:   Aug 19, 2013
    //    Taken From: http://programmingnotes.org/
    //    File:  Serial.cpp
    //    Description: File downloaders are programs used for downloading files
    //      from the Internet. Using the fork() & execlp("wget") command, the
    //      following is a multi-process serial file downloader which reads an
    //      input file containing url file download links as a commandline
    //      argument and downloads the files located on the internet one by one.
    // ============================================================================
    #include <iostream>
    #include <cstring>
    #include <fstream>
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/wait.h>
    using namespace std;

    // compile & run
    // g++ Serial.cpp -o Serial
    // ./Serial urls.txt

    int main(int argc, char* argv[])
    {
        // declare variables
        pid_t pid = -1;
        int urlNumber = 0;
        char urlName[256];
        ifstream infile;

        // check if theres enough command line args
        if(argc < 2)
        {
            cout <<"\nERROR -- NOT ENOUGH ARGS!"
                <<"\n\nUSAGE: "<<argv[0]<<" <file containing url downloads>\n\n";
            exit(1);
        }

        // try to open the file containing the download url links
        // exit if the url file is not found
        infile.open(argv[1]);
        if(infile.fail())
        {
            cout <<"\nERROR -- "<<argv[1]<<" NOT FOUND!\n\n";
            exit(1);
        }

        // get download url links from the file
        while(infile.getline(urlName, sizeof(urlName)))
        {
            ++urlNumber;

            // fork another process
            pid = fork();

            if(pid < 0)
            { // ** error occurred
                perror("fork");
                exit(1);
            }
            else if(pid == 0)
            { // ** child process
                // flush before exec, otherwise buffered output is lost
                cout <<endl<<"** URL #"<<urlNumber
                    <<" is currently downloading... **\n\n"<<flush;
                execlp("/usr/bin/wget", "wget", urlName, (char*)NULL);

                // execlp() only returns on failure; stop the child here so
                // it does not fall through and keep running the parent loop
                perror("execlp");
                _exit(1);
            }
            else
            { // ** parent process
                // parent will wait for the child to complete
                wait(NULL);
                // endl flushes, so the next child does not inherit this text
                cout <<endl<<"-- URL #"<<urlNumber<<" is complete! --"<<endl;
            }
        }
        infile.close();

        cout <<endl<<"The parent process is now exiting...\n";

        return 0;
    }// http://programmingnotes.org/
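The serial program passes NULL to wait(), so it never learns whether a download actually succeeded. The sketch below shows one way to recover the child's exit code with waitpid(). The helper name runAndWait is hypothetical, and it uses execvp() rather than execlp() so that the argument list can be built at run time; that is a swapped-in variant, not what the listing above does.

```cpp
#include <cstdio>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run one command in a child process, wait for it, and
// return its exit code. Returns 127 if the program could not be executed and
// -1 if fork()/waitpid() failed or the child did not exit normally.
int runAndWait(const std::vector<std::string>& args)
{
    if (args.empty())
        return -1;

    // Build the NULL-terminated argv array that execvp() expects.
    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0)
    {
        perror("fork");
        return -1;
    }
    if (pid == 0)
    { // child: replace this process image with the requested program
        execvp(argv[0], argv.data());
        perror("execvp");   // only reached if execvp() failed
        _exit(127);
    }

    // parent: block until this particular child has finished
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
    {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as runAndWait({"wget", url}) would then return 0 when wget reports success and a non-zero code otherwise, which the parent could print next to the "is complete" message.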
Because the serial downloader downloads files one at a time, it can become very slow when the list of URLs is long. That is where the parallel downloader comes in handy!
==== 3. PARALLEL DOWNLOADER ====
The parallel downloader downloads files all at once and is implemented much like the serial downloader. The parallel downloader proceeds as follows:
1. The parent forks off n children, where n is the number of URLs in urls.txt.
2. Each child executes the execlp("/usr/bin/wget", "wget", [URL STRING], NULL) system call, where each [URL STRING] is a distinct URL in urls.txt.
3. The parent calls wait() (n times in a row) and waits for all children to terminate.
4. The parent exits.
The implementation is shown below:

    // ============================================================================
    //    Author: Kenneth Perkins
    //    Date:   Aug 19, 2013
    //    Taken From: http://programmingnotes.org/
    //    File:  Parallel.cpp
    //    Description: File downloaders are programs used for downloading files
    //      from the Internet. Using the fork() & execlp("wget") command, the
    //      following is a multi-process parallel file downloader which reads an
    //      input file containing url file download links as a commandline
    //      argument and downloads the files located on the internet all at once.
    // ============================================================================
    #include <iostream>
    #include <cstring>
    #include <fstream>
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/wait.h>
    using namespace std;

    // compile & run
    // g++ Parallel.cpp -o Parallel
    // ./Parallel urls.txt

    int main(int argc, char* argv[])
    {
        // declare variables
        pid_t pid = -1;
        int urlNumber = 0;
        char urlName[256];
        ifstream infile;

        // check if theres enough command line args
        if(argc < 2)
        {
            cout <<"\nERROR -- NOT ENOUGH ARGS!"
                <<"\n\nUSAGE: "<<argv[0]<<" <file containing url downloads>\n\n";
            exit(1);
        }

        // try to open the file containing the download url links
        // exit if the url file is not found
        infile.open(argv[1]);
        if(infile.fail())
        {
            cout <<"\nERROR -- "<<argv[1]<<" NOT FOUND!\n\n";
            exit(1);
        }

        // get download url links from the file
        while(infile.getline(urlName, sizeof(urlName)))
        {
            ++urlNumber;

            // fork another process
            pid = fork();

            if(pid < 0)
            { // ** error occurred
                perror("fork");
                exit(1);
            }
            else if(pid == 0)
            { // ** child process
                // flush before exec, otherwise buffered output is lost
                cout <<endl<<"** URL #"<<urlNumber
                    <<" is currently downloading... **\n\n"<<flush;
                execlp("/usr/bin/wget", "wget", urlName, (char*)NULL);

                // execlp() only returns on failure; stop the child here so
                // it does not fall through and fork children of its own
                perror("execlp");
                _exit(1);
            }
        }
        infile.close();

        while(urlNumber > 0)
        { // ** parent process
            // wait() returns when ANY child exits, so completions arrive
            // in finishing order, not in the order of the URLs in the file
            wait(NULL);
            --urlNumber;
            cout <<endl<<"-- A download is complete! ("<<urlNumber
                <<" remaining) --"<<endl;
        }

        cout <<endl<<"The parent process is now exiting...\n";

        return 0;
    }// http://programmingnotes.org/
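Since wait() returns the PID of whichever child finished, the parent can tell exactly which download completed if it remembers the PID of every child it forked. The following is a minimal sketch of that idea, not part of the original assignment: the helper name runAllParallel and the PID-to-index map are hypothetical, and execvp() is used in place of execlp() so the commands can be passed in as vectors.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: start every command at once, then reap the children
// in whatever order they finish. Returns one exit code per command, by index
// (-1 if the command was never started or did not exit normally).
std::vector<int> runAllParallel(const std::vector<std::vector<std::string>>& commands)
{
    std::vector<int> results(commands.size(), -1);
    std::map<pid_t, std::size_t> indexOfPid;   // child PID -> index in commands

    for (std::size_t i = 0; i < commands.size(); ++i)
    {
        if (commands[i].empty())
            continue;

        // Build the NULL-terminated argv array that execvp() expects.
        std::vector<char*> argv;
        for (const std::string& a : commands[i])
            argv.push_back(const_cast<char*>(a.c_str()));
        argv.push_back(nullptr);

        pid_t pid = fork();
        if (pid < 0)
        {
            perror("fork");
            continue;
        }
        if (pid == 0)
        { // child: never returns to the loop
            execvp(argv[0], argv.data());
            perror("execvp");
            _exit(127);
        }
        indexOfPid[pid] = i;   // parent: remember which command this PID runs
    }

    // Reap one child per successful fork and look up which command it was.
    for (std::size_t left = indexOfPid.size(); left > 0; --left)
    {
        int status = 0;
        pid_t done = wait(&status);
        if (done < 0)
        {
            perror("wait");
            break;
        }
        std::map<pid_t, std::size_t>::iterator it = indexOfPid.find(done);
        if (it != indexOfPid.end() && WIFEXITED(status))
            results[it->second] = WEXITSTATUS(status);
    }
    return results;
}
```

With one {"wget", url} entry per line of urls.txt, results[i] would hold the exit code for the i-th URL no matter which download finished first.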
QUICK NOTES:
The highlighted lines are sections of interest to look out for.
The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.
Also note, while the parallel downloader executes, the outputs from different children may intermingle.
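One possible way to keep the children's output apart is to have each wget write its messages to its own log file using wget's -o option. The sketch below only builds the argument list; the helper name wgetArgsWithLog and the "download_<n>.log" naming scheme are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build a wget argument list that sends wget's messages
// to a per-download log file (wget's -o option) instead of the terminal.
// The "download_<n>.log" file name is an assumed naming scheme.
std::vector<std::string> wgetArgsWithLog(const std::string& url, int urlNumber)
{
    std::vector<std::string> args;
    args.push_back("wget");
    args.push_back("-o");
    args.push_back("download_" + std::to_string(urlNumber) + ".log");
    args.push_back(url);
    return args;
}
```

In the child process this would correspond to a call along the lines of execlp("/usr/bin/wget", "wget", "-o", logName, urlName, (char*)NULL), where logName is a C string holding the log file name.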