C++ || Knapsack Problem Using Dynamic Programming
The following is another homework assignment which was presented in an Algorithm Engineering class. This program reduces the problem of selecting which Debian Linux packages to include on installation media to the classical knapsack problem, and uses a custom timer class to measure its running time.
The program demonstrated on this page extends the previously implemented Exhaustive Search algorithm, this time using Dynamic Programming to find a solution to this problem.
==== 1. THE PROBLEM ====
The Exhaustive Search algorithm enumerates all $2^n$ subsets of the packages, so it runs in exponential time and was very slow.
The goal of this program is to implement a dynamic programming algorithm which runs in pseudo-polynomial time, at most $O(n^2 \cdot W)$, and returns the optimal set of items/packages contained in the input file. It should be fast enough to produce solutions to inputs of realistic size.
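For reference, the table `T` filled in by `DynamicKnapsack` (in Project5.h) follows the standard 0/1 knapsack recurrence. Here $w_x$ and $v_x$ denote the size and votes of package $x$; in the code these are `packageWeight[x-1]` and `packageVotes[x-1]`, because the arrays index packages from 0:

$$
T[x][y] =
\begin{cases}
0 & \text{if } x = 0 \text{ or } y = 0,\\
T[x-1][y] & \text{if } w_x > y,\\
\max\bigl(T[x-1][y],\; T[x-1][y - w_x] + v_x\bigr) & \text{otherwise.}
\end{cases}
$$

The table has $(n+1)(W+1)$ cells and each one takes constant work, so filling it is $O(n \cdot W)$. The maximum total number of votes is $T[n][W]$.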
This program solves the following problem:
Input: A list of Debian packages, each with a size (in kilobytes) and number of votes, both of which are integers; and an integer W representing the capacity of the install media (in kilobytes).
Output: The set of packages with total size ≤ W, such that the total number of package votes is maximized.
In terms of the knapsack problem, the Debian package votes correspond to knapsack values, and our Debian package sizes correspond to knapsack weights.
==== 2. THE INPUT ====
The following file (knapsack_packages.txt) contains the name, size, and votes for approximately 36,000 binary Debian packages that are currently available on the amd64 platform, and have popcon information.
The file obeys the following format:
[ number of packages ]
[ name 0 ] [ space ] [ votes 0 ] [ space ] [ size 0 ] [ newline ]
[ name 1 ] [ space ] [ votes 1 ] [ space ] [ size 1 ] [ newline ]
[ name 2 ] [ space ] [ votes 2 ] [ space ] [ size 2 ] [ newline ]
...
The packages are arranged within the file in sorted order, from the largest number of votes to the smallest.
==== 3. FLOW OF CONTROL ====
A test harness program runs the dynamic programming algorithm and measures its elapsed time. The test program performs the following steps:
1. Load the "knapsack_packages.txt" file into memory.
2. Input size "n" and weight "W" as variables.
3. Execute the dynamic knapsack algorithm. The output of this algorithm should be the best combination of packages.
4. Print the first 20 packages from the solution, showing the name, size, and votes of each package, followed by the total size and total votes of the entire solution.
Knapsack problem instances of varying input sizes “n” are created by using the first “n” entries in the file knapsack_packages.txt. In other words, to create a problem instance with n = 100, only the first 100 packages listed in the file are used as input.
==== 4. TEST HARNESS ====
Note: This program uses two external header files (Timer.h and Project2.h).
• Code for the Timer class (Timer.h) can be found here.
• Code for “Project2.h” can be found here.
• “Project5.h” is listed below.
```cpp
// =============================================================================
// Author: K Perkins
// Date: Nov 3, 2013
// Taken From: http://programmingnotes.org/
// File: main.cpp
// Description: This is the test harness program which runs the code and
//   measures the elapsed time of the code corresponding to the algorithm
//   in question.
//
//   This test program will reduce the problem of selecting which Debian
//   Linux packages to include on installation media, using the knapsack
//   problem. The goal is to implement a dynamic programming algorithm
//   for the knapsack problem, returning an optimal set of items/packages
//   which is fast enough to produce solutions to inputs of realistic size.
// =============================================================================
#include <iostream>
#include <cstdlib>
#include <cassert>
#include <fstream>
#include <vector>
#include "Timer.h"
#include "Project2.h"
#include "Project5.h"
using namespace std;

// the maximum weight
const int MAX_WEIGHT = 65536;

// the number of packages to examine
const int NUM_PACKAGES = 36149;

int main()
{
    // declare variables
    Timer timer;
    Project5 proj5;
    Project2 proj2;
    PackageStats access;
    ifstream infile;
    vector<PackageStats> packages;
    vector<PackageStats> bestCombo;
    vector<int> knapResult;
    int* packageWeight = new int[NUM_PACKAGES];
    int* packageVotes = new int[NUM_PACKAGES];
    int totPackages = 0;

    // open file & make sure it exists
    infile.open("knapsack_packages.txt");
    if(infile.fail())
    {
        cout<<"\nCan't find file!\n";
        exit(1);
    }

    // get the total number of packages from the file
    infile >> totPackages;

    // make sure there are enough packages in the file
    assert(NUM_PACKAGES <= totPackages);

    // get the remaining info from the file
    //      std::string   int   int
    // (check the count first so no extra record is read and discarded)
    for(int x=0; x < NUM_PACKAGES && (infile >> access.name >> access.votes >> access.size); ++x)
    {
        packages.push_back(access);
        packageWeight[x] = access.size;
        packageVotes[x] = access.votes;
    }
    infile.close();

    // display stats
    cerr<<"n = "<<NUM_PACKAGES<<", W = "<<MAX_WEIGHT
        <<"\n\n-- Dynamic Search Solution --\n";

    // start the timer
    timer.Start();

    // return a vector containing the array indexes
    // of the best knapsack package solution
    // (pass only the number of packages actually read)
    knapResult = proj5.DynamicKnapsack(packageWeight, packageVotes,
                                       static_cast<int>(packages.size()), MAX_WEIGHT);

    // stop the timer
    timer.Stop();

    // using the data found from above, return
    // the packages that reside in those array indexes
    bestCombo = proj5.ReturnBest(packages, knapResult);

    // display info to the screen
    cout<<"\n- Number of packages generated in this set: "<<bestCombo.size()
        <<"\n\n- First 20 packages..\n\n";

    // display the best solution packages
    proj2.Display(bestCombo, 20);

    // display the size and total votes
    cout<<"\nTotal Size = "<<proj2.TotalSize(bestCombo)<<" -- Total Votes = "
        <<proj2.TotalVotes(bestCombo)<<endl;

    // display the elapsed time
    cout<<endl<<"It took "<<timer.Elapsed()*1000
        <<" clicks ("<<timer.Elapsed()<<" seconds)"<<endl;

    delete[] packageWeight;
    delete[] packageVotes;

    return 0;
}// http://programmingnotes.org/
```
==== 5. THE ALGORITHMS – “include Project5.h” ====
```cpp
// =============================================================================
// Author: K Perkins
// Date: Nov 3, 2013
// Taken From: http://programmingnotes.org/
// File: Project5.h
// Description: This is a simple class which holds the functions for
//   project 5
// =============================================================================
#ifndef PROJECT5_H
#define PROJECT5_H
#include <vector>
#include <algorithm>
#include "Project2.h"
using std::vector;

// for one package, every capacity y at which taking
// that package achieved the optimal value T[x][y]
struct PackageSet
{
    int index;
    int weight;
    vector<int> appearances;
};

class Project5
{
public:
    Project5(){}

    vector<int> DynamicKnapsack(int packageWeight[], int packageVotes[],
                                int numPackages, int maxWeight)
    {
        // declare variables
        vector<int> best;
        vector<PackageSet> pset;
        int** T = new int*[numPackages+1];
        int deleted = 0;
        int currWeight = maxWeight;

        for(int x=0; x <= numPackages; ++x)
        {
            T[x] = new int[maxWeight+1];
            PackageSet access;

            for(int y=0; y <= maxWeight; ++y)
            {
                if(x==0 || y==0)
                {
                    T[x][y] = 0;
                }
                else if(packageWeight[x-1] <= y)
                {
                    int take = T[x-1][y-packageWeight[x-1]] + packageVotes[x-1];
                    T[x][y] = std::max(take, T[x-1][y]);

                    // taking this packet is optimal at capacity y, so record
                    // the weight instance where this packet is "valid"
                    if(T[x][y] == take)
                    {
                        access.appearances.push_back(y);
                    }
                }
                else
                {
                    T[x][y] = T[x-1][y];
                }
            }// end for loop

            // gather info about the packet we just found
            // (row 0 records no appearances, so packageWeight[-1] is never read)
            if(!access.appearances.empty())
            {
                access.index = x-1;
                access.weight = packageWeight[x-1];
                pset.push_back(access);
            }

            // memory management: only rows x-1 and x are needed,
            // so delete the row that is no longer in use
            if(x > 1)
            {
                delete[] T[deleted++];
            }
        }// end for loop

        // free the rows that are still allocated, then the row pointers
        for(int x = deleted; x <= numPackages; ++x)
        {
            delete[] T[x];
        }
        delete[] T;

        // obtain the best possible knapsack solution, and save
        // their array indexes into a vector, starting from the end
        // NOTE: this places the knapsack solution in opposite (reverse) order
        for(int x = static_cast<int>(pset.size())-1; x >= 0; --x)
        {
            if(IsSolution(pset, x, currWeight))
            {
                best.push_back(pset.at(x).index);
                currWeight -= pset.at(x).weight;
            }
            pset.pop_back();
        }

        // reverse the vector back into ascending order
        std::reverse(best.begin(), best.end());
        return best;
    }// end of DynamicKnapsack

    bool IsSolution(vector<PackageSet>& pset, int index, int currWeight)
    {
        return std::find(pset.at(index).appearances.begin(),
                         pset.at(index).appearances.end(),
                         currWeight) != pset.at(index).appearances.end();
    }// end of IsSolution

    vector<PackageStats> ReturnBest(vector<PackageStats>& packages, vector<int>& knapResult)
    {
        vector<PackageStats> best;
        for(unsigned x=0; x < knapResult.size(); ++x)
        {
            best.push_back(packages.at(knapResult.at(x)));
        }
        return best;
    }// end of ReturnBest

    ~Project5(){}
};
#endif // http://programmingnotes.org/
```
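`DynamicKnapsack` keeps only two rows of `T` alive at a time. A quick calculation, assuming a 4-byte `int` (typical, though not guaranteed), shows why the full table would not fit in memory for the runs below:

$$
\begin{aligned}
\text{full table, } W = 65536:&\quad (36149+1)(65536+1) \times 4\,\text{B} = 2{,}369{,}162{,}550 \times 4\,\text{B} \approx 9.5\,\text{GB}\\
\text{full table, } W = 800000:&\quad 36150 \times 800001 \times 4\,\text{B} = 28{,}920{,}036{,}150 \times 4\,\text{B} \approx 116\,\text{GB}\\
\text{two rows, } W = 65536:&\quad 2 \times 65537 \times 4\,\text{B} = 524{,}296\,\text{B} \approx 0.5\,\text{MB}
\end{aligned}
$$

This figure does not cover the `appearances` vectors in `PackageSet`. Those vectors hold one `int` per capacity at which a package is worth taking, so in the worst case they can still grow with n·W.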
QUICK NOTES:
The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.
Note: Don't forget to include the input file!
The following is sample output:
=== RUN #1 ===
n = 36149, W = 65536
-- Dynamic Search Solution --
- Number of packages generated in this set: 764
- First 20 packages..
debianutils 90 128329
libgcc1 46 128327
debconf 168 121503
grep 595 121426
gzip 142 121346
sed 261 121220
findutils 805 121173
lsb-base 26 121121
sysv-rc 80 121092
sysvinit-utils 116 121078
base-files 65 121072
initscripts 93 121037
util-linux 846 121022
mount 272 120961
libselinux1 109 120918
cron 120 120872
base-passwd 49 120856
tzdata 488 120813
logrotate 64 120745
popularity-contest 67 120729
Total Size = 65536 -- Total Votes = 25250749
It took 33500 clicks (33.50 seconds)
=== RUN #2 ===
n = 36149, W = 800000
-- Dynamic Search Solution --
- Number of packages generated in this set: 4406
- First 20 packages..
debianutils 90 128329
libgcc1 46 128327
dpkg 2672 128294
perl-base 1969 127369
debconf 168 121503
grep 595 121426
gzip 142 121346
login 980 121332
coreutils 6505 121240
bash 1673 121229
sed 261 121220
findutils 805 121173
lsb-base 26 121121
sysv-rc 80 121092
sysvinit-utils 116 121078
base-files 65 121072
initscripts 93 121037
util-linux 846 121022
mount 272 120961
libselinux1 109 120918
Total Size = 800000 -- Total Votes = 41588693
It took 608760 clicks (608.76 seconds)
C++ || Knapsack Problem Using Exhaustive Search
The following is another homework assignment which was presented in an Algorithm Engineering class. This program reduces the problem of selecting which Debian Linux packages to include on installation media to the classical knapsack problem, and uses a custom timer class to measure its running time. It implements an exhaustive search algorithm for this problem and displays its running time on the screen.
NOTE: Looking for the Dynamic Programming version of this problem? It is covered in the post above.
==== 1. OVERVIEW ====
Debian is a well-established GNU/Linux distribution. Ubuntu Linux, arguably the most popular Linux distribution, is based on Debian. Debian software is organized into packages. Each package corresponds to a software component, such as the GCC compiler or the Firefox web browser. There are packages not only for the software that comprises the operating system, but also for end-user applications. Nearly every piece of noteworthy open source software has a corresponding package. There are currently over 29,000 Debian source code packages available.
This wide selection of packages is mostly a good thing. However, it poses a problem for creating installation media – the floppy disks, CDs, DVDs, or flash drive images that administrators use to install the operating system. If a user wants to install a package that is missing from their installation media, they need to download it over the internet, so it is beneficial to include as many packages as possible. It is impractical to include every package on those media, so the Debian project needs to select a small subset of important packages to include.
Another part of Debian is the Popularity Contest, or popcon:
The Popularity Contest tries to measure the popularity of each Debian package. Users may elect to participate, in which case the list of their installed packages is transmitted back to Debian. These lists are tallied as votes. So, thanks to popcon, we can get a vote tally for each package.
==== 2. THE PROBLEM ====
This program solves the following problem:
Input: A list of Debian packages, each with a size (in kilobytes) and number of votes, both of which are integers; and an integer W representing the capacity of the install media (in kilobytes).
Output: The set of packages with total size ≤ W, such that the total number of package votes is maximized.
In terms of the knapsack problem, the Debian package votes correspond to knapsack values, and our Debian package sizes correspond to knapsack weights.
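Written as an optimization problem, let $x_i = 1$ mean that package $i$ is placed on the media, and let $v_i$ and $w_i$ be its votes and size:

$$
\max_{x \in \{0,1\}^n} \; \sum_{i=1}^{n} v_i x_i
\quad \text{subject to} \quad
\sum_{i=1}^{n} w_i x_i \le W
$$

An exhaustive search tries every one of the $2^n$ vectors $x$ and keeps the feasible one with the largest objective value.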
==== 3. THE INPUT ====
The following file (knapsack_packages.txt) contains the name, size, and votes for approximately 36,000 binary Debian packages that are currently available on the amd64 platform, and have popcon information.
The file obeys the following format:
[ number of packages ]
[ name 0 ] [ space ] [ votes 0 ] [ space ] [ size 0 ] [ newline ]
[ name 1 ] [ space ] [ votes 1 ] [ space ] [ size 1 ] [ newline ]
[ name 2 ] [ space ] [ votes 2 ] [ space ] [ size 2 ] [ newline ]
...
The packages are arranged within the file in sorted order, from the largest number of votes to the smallest.
==== 4. FLOW OF CONTROL ====
A test harness program runs the exhaustive search algorithm and measures its elapsed time. The test program performs the following steps:
1. Load the "knapsack_packages.txt" file into memory.
2. Input size "n" and weight "W" as variables.
3. Execute the exhaustive search knapsack algorithm. The output of this algorithm should be the best combination of packages.
4. Print the solution, showing the name, size, and votes of each package, followed by the total size and total votes of the entire solution.
Knapsack problem instances of varying input sizes “n” are created by using the first “n” entries in the file knapsack_packages.txt. In other words, to create a problem instance with n = 100, only the first 100 packages listed in the file are used as input.
==== 5. TEST HARNESS ====
Note: This program uses a custom Timer class (Timer.h). To obtain code for that class, click here.
“Project2.h” is listed below.
```cpp
// =============================================================================
// Author: K Perkins
// Date: Nov 2, 2013
// Taken From: http://programmingnotes.org/
// File: main.cpp
// Description: This is the test harness program which runs the code and
//   measures the elapsed time of the code corresponding to the algorithm
//   in question. This test program will reduce the problem of selecting
//   which Debian Linux packages to include on installation media, using the
//   knapsack problem. An exhaustive search algorithm is used, and a measure
//   of its performance is recorded.
// =============================================================================
#include <iostream>
#include <cstdlib>
#include <cassert>
#include <fstream>
#include <string>
#include <vector>
#include "Timer.h"
#include "Project2.h"
using namespace std;

// the maximum weight
const int MAX_WEIGHT = 65536;

// the number of packages to examine
const int NUM_PACKAGES = 24;

int main()
{
    // declare variables
    Timer timer;
    Project2 proj2;
    ifstream infile;
    PackageStats access;
    vector<PackageStats> packages;
    vector<PackageStats> bestCombo;
    int totPackages = 0;

    // open file & make sure it exists
    infile.open("knapsack_packages.txt");
    if(infile.fail())
    {
        cerr<<"\nCan't find file!\n";
        exit(1);
    }

    // get the total number of packages from the file
    infile >> totPackages;

    // make sure there are enough packages in the file
    assert(NUM_PACKAGES <= totPackages);

    // display stats
    cerr<<"n = "<<NUM_PACKAGES<<", W = "<<MAX_WEIGHT
        <<"\n\n-- Exhaustive Search Solution --\n\n";

    // get the remaining info from the file
    //      std::string   int   int
    // (check the count first so no extra record is read and discarded)
    for(int x=0; x < NUM_PACKAGES && (infile >> access.name >> access.votes >> access.size); ++x)
    {
        packages.push_back(access);
    }
    infile.close();

    // start the timer
    timer.Start();

    // find the best knapsack subset solution
    bestCombo = proj2.ExhaustiveKnapsack(packages, MAX_WEIGHT);

    // stop the timer
    timer.Stop();

    // display the best solution packages
    proj2.Display(bestCombo, bestCombo.size());

    // display the size and total votes
    cout<<"\nTotal Size = "<<proj2.TotalSize(bestCombo)<<" -- Total Votes = "
        <<proj2.TotalVotes(bestCombo)<<endl;

    // display the elapsed time
    cout<<"\nIt took "<<timer.Elapsed()*1000
        <<" clicks ("<<timer.Elapsed()<<" seconds)"<<endl;

    return 0;
}// http://programmingnotes.org/
```
==== 6. THE ALGORITHMS – “include Project2.h” ====
```cpp
// =============================================================================
// Author: K Perkins
// Date: Nov 2, 2013
// Taken From: http://programmingnotes.org/
// File: Project2.h
// Description: This is a simple class which holds the functions for
//   project 2
// =============================================================================
#ifndef PROJECT2_H
#define PROJECT2_H
#include <iostream>
#include <vector>
#include <string>
using std::vector;
using std::string;

// structure to hold the items contained in the file
struct PackageStats
{
    string name;
    int votes;
    int size;
};

class Project2
{
public:
    Project2(){}

    vector<PackageStats> ExhaustiveKnapsack(vector<PackageStats>& packages, int weight)
    {
        vector<PackageStats> best;
        int* subsets = new int[packages.size()];
        int bestVote = 0;

        // generate subsets (2^n possibilities)
        // NOTE: the shift below is only valid for fewer than 64 packages
        const unsigned long long numSubsets = 1ULL << packages.size();
        for(unsigned long long x=0; x < numSubsets; ++x)
        {
            int index = 0;
            int totalSize = 0;
            int totalVotes = 0;

            // generate subsets using binary digits:
            // bit y of x decides whether package y is in the subset
            for(unsigned y=0; y < packages.size(); ++y)
            {
                if(((x >> y) & 1) == 1)
                {
                    subsets[index++] = y;
                    totalSize += packages.at(y).size;
                    totalVotes += packages.at(y).votes;
                }
            }

            // find the best combination of subsets
            if((totalSize <= weight) && (best.empty() || (totalVotes > bestVote)))
            {
                bestVote = totalVotes;
                best = ReturnBest(packages, subsets, index);
            }
        }
        delete[] subsets;
        return best;
    }// end of ExhaustiveKnapsack

    int TotalSize(vector<PackageStats>& packages, vector<int> s = vector<int>())
    {
        int total = 0;
        // if there are 2 arguments, sum only the packages at the indexes in s
        if(!s.empty())
        {
            for(unsigned x=0; x < s.size(); ++x)
            {
                total += packages.at(s.at(x)).size;
            }
        }
        else // if there's only 1, sum every package
        {
            for(unsigned x=0; x < packages.size(); ++x)
            {
                total += packages.at(x).size;
            }
        }
        return total;
    }// end of TotalSize

    int TotalVotes(vector<PackageStats>& packages, vector<int> s = vector<int>())
    {
        int total = 0;
        // if there are 2 arguments, sum only the packages at the indexes in s
        if(!s.empty())
        {
            for(unsigned x=0; x < s.size(); ++x)
            {
                total += packages.at(s.at(x)).votes;
            }
        }
        else // if there's only 1, sum every package
        {
            for(unsigned x=0; x < packages.size(); ++x)
            {
                total += packages.at(x).votes;
            }
        }
        return total;
    }// end of TotalVotes

    vector<PackageStats> ReturnBest(vector<PackageStats>& packages, int subsets[], int size)
    {
        vector<PackageStats> best;
        for(int x=0; x < size; ++x)
        {
            best.push_back(packages.at(subsets[x]));
        }
        return best;
    }// end of ReturnBest

    // prints "name size votes" for at most the first 'size' packages
    void Display(vector<PackageStats>& packages, unsigned size)
    {
        for(unsigned x=0; x < size && x < packages.size(); ++x)
        {
            std::cout<<packages.at(x).name<<" "
                     <<packages.at(x).size<<" "<<packages.at(x).votes<<std::endl;
        }
    }// end of Display

    ~Project2(){}
};
#endif // http://programmingnotes.org/
```
QUICK NOTES:
The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.
Note: Don't forget to include the input file!
The following is sample output:
n = 24, W = 65536
-- Exhaustive Search Solution --
debianutils 90 128329
libgcc1 46 128327
dpkg 2672 128294
perl-base 1969 127369
debconf 168 121503
grep 595 121426
gzip 142 121346
login 980 121332
coreutils 6505 121240
bash 1673 121229
sed 261 121220
findutils 805 121173
lsb-base 26 121121
sysv-rc 80 121092
sysvinit-utils 116 121078
base-files 65 121072
initscripts 93 121037
util-linux 846 121022
mount 272 120961
libselinux1 109 120918
cron 120 120872
base-passwd 49 120856
apt 1356 120826
tzdata 488 120813
Total Size = 19526 -- Total Votes = 2934456
It took 19140 clicks (19.14 seconds)
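`ExhaustiveKnapsack` in Project2.h builds each subset by testing all n bits of the mask, so its work grows like $n \cdot 2^n$. For the n = 24 run above, that gives:

$$
2^{24} = 16{,}777{,}216 \text{ subsets}, \qquad 24 \times 2^{24} = 402{,}653{,}184 \text{ bit tests}
$$

Each additional package doubles the number of subsets. This is why exhaustive search is limited to small n, while the dynamic programming version handles all 36,149 packages.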