C++ || Simple Spell Checker Using A Hash Table

January 31, 2013 admin 4 Comments

Click Here For Updated Version Of Program

The following is another programming assignment which was presented in a C++ Data Structures course. This assignment was used to gain more experience using hash tables.

REQUIRED KNOWLEDGE FOR THIS PROGRAM

Hash Table - What Is It? How To Create A Spell Checker How To Read Data From A File Strtok - Split Strings Into Tokens #include 'HashTable.h' The Dictionary File - Download Here

== OVERVIEW ==

This program first reads words from a dictionary file, and inserts them into a hash table.

The dictionary file consists of a list of 62,454 correctly spelled lowercase words, separated by whitespace. The words are inserted into the hash table, with each bucket growing dynamically as necessary to hold all of the incoming data.

After the reading of the dictionary file is complete, the program prompts the user for input. After input is obtained, each word that the user enteres into the program is looked up within the hash table to see if it exists. If the user entered word exists within the hash table, then that word is spelled correctly. If not, a list of possible suggested spelling corrections is displayed to the screen.

== HASH TABLE STATISTICS ==

To better understand how hash tables work, this program reports the following statistics to the screen:

• The total size of the hash table. • The size of the largest hash table bucket. • The size of the smallest hash table bucket. • The total number of buckets used. • The average hash table bucket size.

A timer is used in this program to time (in seconds) how long it takes to read in the dictionary file. The program also saves each hash table bucket into a separate output .txt file. This is used to further visualize how the hash table data is internally being stored within memory.

== SPELL CHECKING ==

The easiest way to generate corrections for a spell checker is via a trial and error method. If we assume that the misspelled word contains only a single error, we can try all possible corrections and look each up in the dictionary.

Example:

wird: bird gird ward word wild wind wire wiry

Traditionally, spell checkers look for four possible errors: a wrong letter (“wird”), also knows as alteration. An inserted letter (“woprd”), a deleted letter (“wrd”), or a pair of adjacent transposed letters (“wrod”).

The easiest of which is checking for a wrong letter. For example, if a word isnt found in the dictionary, all variants of that word can be looked up by changing one letter. Given the user input “wird,” a one letter variant can be “aird”, “bird”, “cird”, etc. through “zird.” Then “ward”, “wbrd”, “wcrd” through “wzrd”, can be checked, and so forth. Whenever a match is found within the dictionary, the spelling correction should be displayed to the screen.

For a detailed analysis how the other methods can be constructed, click here.

===== SIMPLE SPELL CHECKER =====

This program uses a custom template.h class. To obtain the code for that class, click here.





		
		
			Simple Spell Checker Using A Hash Table
			
C++
			
			// ============================================================================
//    Author: Kenneth Perkins
//    Date:   Jan 31, 2013
//    Taken From: http://programmingnotes.org/
//    File:  SpellCheck.cpp
//    Description: This is a simple spell checker which tests the HashTable.h
//          class. 
// ============================================================================
#include <iostream>
#include <fstream>
#include <cctype>
#include <cstring>
#include <string>
#include <iomanip>
#include <ctime>
#include <limits>
#include "HashTable.h"
using namespace std;

// iterator declaration for hash table
typedef HashTable<string>::Iterator iterDec;
 
// hash table size
const int TABLE_SIZE = 19000;

// strtok delimiters
const char* DELIMITERS = " ,.-\':;?()+*/\\%$#!\"@\^&";

// function prototypes
void PrintTableStats(HashTable<string>& hashTable);
int SpellCheck(HashTable<string>& hashTable, string word);
string ToLowerCase(string word);

int main()
{	
	// declare variables
	int result = 0;
	string userInput;
	string currWord;
	clock_t beg; // used to time the hashtable load
	clock_t end; // used to time the hashtable load
	char response;
	ifstream infile;
	HashTable<string> hashTable(TABLE_SIZE);

	// open the dictionary file
	infile.open("INPUT_Dictionary_programmingnotes_freeweq_com.txt");

	// check if the file exists, EXIT if it doesnt
	if(infile.fail())
	{
		cout<<"\n\n**ERROR - The dictionary file could not be found...\n";
		exit(1);
	}

	cerr<<"\nLoading dictionary....";		
	beg = clock(); // start the timer

	// get data from file and put into hashtable
	while(infile >> currWord)
	{
		// makes sure duplicate words arent inserted into table
		if(!hashTable.Count(currWord))
		{
			hashTable.Insert(currWord);
		}
	}	
	infile.close();
	PrintTableStats(hashTable);	

	end = clock()-beg; // end the timer
	cout<<"\n\nDictionary loaded in "<<
		(double)end / ((double)CLOCKS_PER_SEC)<<" secs!";
	
	// creates a line separator
	cout<<endl;
	cout.fill('-');
	cout<<left<<setw(50)<<""<<endl;	

	do{ // get user input
		cout<<"\n>> Please enter a sentence: ";	
		getline(cin,userInput);
		cout<<endl;

		// split each word from the string into individual words to check if 
		// they are spelled correctly 
		char* splitInput = strtok(const_cast<char*>(userInput.c_str()),DELIMITERS);
		while(splitInput!=NULL)
		{		
			currWord = splitInput;
			currWord = ToLowerCase(currWord);
			result += SpellCheck(hashTable,currWord);
			splitInput = strtok(NULL,DELIMITERS);
		}

		// display results		
		if(result > 0)
		{
			cout<<"Number of words spelled incorrectly: "<<result<<endl;
			result = 0;
		}

		// ask for more data
		cout<<"\nDo you want to enter another sentence? (y/n): ";
		cin >> response;
		cin.ignore(numeric_limits<streamsize>::max(),'\n'); // clear the cin buffer
	}while(toupper(response)=='Y');

	cout<<"\nBYE!!\n";

	return 0;
}// end of main

void PrintTableStats(HashTable<string>& hashTable)
{
	int largestBucket = -9999999;
	int largestIndex = 0;
	int smallestBucket = 9999999;
	int smallestIndex = 0;
	double numBuckestUsed = 0;
	ofstream outfile("OUTPUT_HashTable_Stats_programmingnotes_freeweq_com.txt");
	
	for(int x=0; x < hashTable.TableSize(); ++x)
	{
		// iterator is used to traverse each hashtable bucket
		iterDec it = hashTable.begin(x);
		if(!hashTable.IsEmpty(x))
		{
			if(smallestBucket > hashTable.BucketSize(x))
			{
				smallestBucket = hashTable.BucketSize(x);
				smallestIndex = x;
			}
			if(largestBucket < hashTable.BucketSize(x))
			{
				largestBucket = hashTable.BucketSize(x);
				largestIndex = x;
			}
			++numBuckestUsed;
			outfile<<"\nBucket #"<<x<<":\n";
			for(int y = 0; y < hashTable.BucketSize(x); ++y)
			{
				outfile <<"\t"<< it[y] << endl;
			}
		}
	}
	cout<<"Complete!\n";

	// creates a line separator
	cout<<endl;
	cout.fill('-');
	cout<<left<<setw(50)<<""<<endl;

	cout<<"Total dictionary words = "<<hashTable.TotalElems()<<endl
		<<"Hash table size = "<<hashTable.TableSize()<<endl
		<<"Largest bucket size = "<<largestBucket<< " items at index #"<<largestIndex<<endl
		<<"Smallest bucket size = "<<smallestBucket<< " items at index #"<<smallestIndex<<endl
		<<"Total buckets used = "<<numBuckestUsed<<endl
		<<"Total percent of hash table used = "<<(numBuckestUsed/hashTable.TableSize())*100<<"%"<<endl
		<<"Average bucket size = "<<(hashTable.TotalElems()/numBuckestUsed)<<" items";
}// end of PrintTableStats

int SpellCheck(HashTable<string>& hashTable, string word)
{	
	int result = 0;
	int suggestion = 0;
	string remove[256];
	int numRemove=0;

	if(!hashTable.Count(word))
	{
		++result;
		cout<<"** "<<word<<": ";			
		// alteration & insertion
		for(unsigned x = 0; x < word.length(); ++x)
		{		
			string alteration = word;
			for(char c = 'a'; c <= 'z'; ++c)
			{
				//alteration
				alteration[x] = c;
				if(hashTable.Count(alteration))
				{
					cout<<alteration<<", ";
					remove[numRemove++] = alteration;
					++suggestion;
					// remove the entry so it isnt displayed multiple times
					hashTable.Remove(alteration);
				}
				//insertion
				string insertion = word.substr(0, x) + c + word.substr(x);
				if(hashTable.Count(insertion))
				{
					cout<<insertion<<", ";	
					remove[numRemove++] = insertion;
					++suggestion;
					// remove the entry so it isnt displayed multiple times
					hashTable.Remove(insertion);
				}
			}
		}
		// transposition & deletion	
		for(unsigned x = 0; x < word.length()-1;++x) 
		{
			// transposition 
			string transposition = word.substr(0,x) + word[x+1] + word[x] + word.substr(x+2);
			if(hashTable.Count(transposition))
			{
				cout<<transposition<<", ";
				remove[numRemove++] = transposition;
				++suggestion;
				// remove the entry so it isnt displayed multiple times
				hashTable.Remove(transposition);

			}
			// deletion
			string deletion = word.substr(0, x)+ word.substr(x + 1);
			if(hashTable.Count(deletion))
			{
				cout<<deletion<<", ";
				remove[numRemove++] = deletion;
				++suggestion;
				// remove the entry so it isnt displayed multiple times
				hashTable.Remove(deletion);
			}
		}		
		// place the removed items back inside the hash table
		while(numRemove>=0)
		{
			hashTable.Insert(remove[numRemove--]);
		}
		if(suggestion < 1)
		{
			cout<<"No spelling suggestion found...";
		}
		cout<<endl<<endl;
	}
	return result;
}// end of SpellCheck

string ToLowerCase(string word)
{
	for(unsigned x = 0; x < word.length(); ++x)
	{
		word[x] = tolower(word[x]);
	}
	return word;
}// http://programmingnotes.org/
			
				
					
				
					1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
				
						// ============================================================================
//    Author: Kenneth Perkins
//    Date:   Jan 31, 2013
//    Taken From: http://programmingnotes.org/
//    File:  SpellCheck.cpp
//    Description: This is a simple spell checker which tests the HashTable.h
//          class. 
// ============================================================================
#include <iostream>
#include <fstream>
#include <cctype>
#include <cstring>
#include <string>
#include <iomanip>
#include <ctime>
#include <limits>
#include "HashTable.h"
using namespace std;
 
// iterator declaration for hash table
typedef HashTable<string>::Iterator iterDec;
 
// hash table size
const int TABLE_SIZE = 19000;
 
// strtok delimiters
const char* DELIMITERS = " ,.-\':;?()+*/\\%$#!\"@\^&";
 
// function prototypes
void PrintTableStats(HashTable<string>& hashTable);
int SpellCheck(HashTable<string>& hashTable, string word);
string ToLowerCase(string word);
 
int main()
{	
	// declare variables
	int result = 0;
	string userInput;
	string currWord;
	clock_t beg; // used to time the hashtable load
	clock_t end; // used to time the hashtable load
	char response;
	ifstream infile;
	HashTable<string> hashTable(TABLE_SIZE);
 
	// open the dictionary file
	infile.open("INPUT_Dictionary_programmingnotes_freeweq_com.txt");
 
	// check if the file exists, EXIT if it doesnt
	if(infile.fail())
	{
		cout<<"\n\n**ERROR - The dictionary file could not be found...\n";
		exit(1);
	}
 
	cerr<<"\nLoading dictionary....";		
	beg = clock(); // start the timer
 
	// get data from file and put into hashtable
	while(infile >> currWord)
	{
		// makes sure duplicate words arent inserted into table
		if(!hashTable.Count(currWord))
		{
			hashTable.Insert(currWord);
		}
	}	
	infile.close();
	PrintTableStats(hashTable);	
 
	end = clock()-beg; // end the timer
	cout<<"\n\nDictionary loaded in "<<
		(double)end / ((double)CLOCKS_PER_SEC)<<" secs!";
	
	// creates a line separator
	cout<<endl;
	cout.fill('-');
	cout<<left<<setw(50)<<""<<endl;	
 
	do{ // get user input
		cout<<"\n>> Please enter a sentence: ";	
		getline(cin,userInput);
		cout<<endl;
 
		// split each word from the string into individual words to check if 
		// they are spelled correctly 
		char* splitInput = strtok(const_cast<char*>(userInput.c_str()),DELIMITERS);
		while(splitInput!=NULL)
		{		
			currWord = splitInput;
			currWord = ToLowerCase(currWord);
			result += SpellCheck(hashTable,currWord);
			splitInput = strtok(NULL,DELIMITERS);
		}
 
		// display results		
		if(result > 0)
		{
			cout<<"Number of words spelled incorrectly: "<<result<<endl;
			result = 0;
		}
 
		// ask for more data
		cout<<"\nDo you want to enter another sentence? (y/n): ";
		cin >> response;
		cin.ignore(numeric_limits<streamsize>::max(),'\n'); // clear the cin buffer
	}while(toupper(response)=='Y');
 
	cout<<"\nBYE!!\n";
 
	return 0;
}// end of main
 
void PrintTableStats(HashTable<string>& hashTable)
{
	int largestBucket = -9999999;
	int largestIndex = 0;
	int smallestBucket = 9999999;
	int smallestIndex = 0;
	double numBuckestUsed = 0;
	ofstream outfile("OUTPUT_HashTable_Stats_programmingnotes_freeweq_com.txt");
	
	for(int x=0; x < hashTable.TableSize(); ++x)
	{
		// iterator is used to traverse each hashtable bucket
		iterDec it = hashTable.begin(x);
		if(!hashTable.IsEmpty(x))
		{
			if(smallestBucket > hashTable.BucketSize(x))
			{
				smallestBucket = hashTable.BucketSize(x);
				smallestIndex = x;
			}
			if(largestBucket < hashTable.BucketSize(x))
			{
				largestBucket = hashTable.BucketSize(x);
				largestIndex = x;
			}
			++numBuckestUsed;
			outfile<<"\nBucket #"<<x<<":\n";
			for(int y = 0; y < hashTable.BucketSize(x); ++y)
			{
				outfile <<"\t"<< it[y] << endl;
			}
		}
	}
	cout<<"Complete!\n";
 
	// creates a line separator
	cout<<endl;
	cout.fill('-');
	cout<<left<<setw(50)<<""<<endl;
 
	cout<<"Total dictionary words = "<<hashTable.TotalElems()<<endl
		<<"Hash table size = "<<hashTable.TableSize()<<endl
		<<"Largest bucket size = "<<largestBucket<< " items at index #"<<largestIndex<<endl
		<<"Smallest bucket size = "<<smallestBucket<< " items at index #"<<smallestIndex<<endl
		<<"Total buckets used = "<<numBuckestUsed<<endl
		<<"Total percent of hash table used = "<<(numBuckestUsed/hashTable.TableSize())*100<<"%"<<endl
		<<"Average bucket size = "<<(hashTable.TotalElems()/numBuckestUsed)<<" items";
}// end of PrintTableStats
 
int SpellCheck(HashTable<string>& hashTable, string word)
{	
	int result = 0;
	int suggestion = 0;
	string remove[256];
	int numRemove=0;
 
	if(!hashTable.Count(word))
	{
		++result;
		cout<<"** "<<word<<": ";			
		// alteration & insertion
		for(unsigned x = 0; x < word.length(); ++x)
		{		
			string alteration = word;
			for(char c = 'a'; c <= 'z'; ++c)
			{
				//alteration
				alteration[x] = c;
				if(hashTable.Count(alteration))
				{
					cout<<alteration<<", ";
					remove[numRemove++] = alteration;
					++suggestion;
					// remove the entry so it isnt displayed multiple times
					hashTable.Remove(alteration);
				}
				//insertion
				string insertion = word.substr(0, x) + c + word.substr(x);
				if(hashTable.Count(insertion))
				{
					cout<<insertion<<", ";	
					remove[numRemove++] = insertion;
					++suggestion;
					// remove the entry so it isnt displayed multiple times
					hashTable.Remove(insertion);
				}
			}
		}
		// transposition & deletion	
		for(unsigned x = 0; x < word.length()-1;++x) 
		{
			// transposition 
			string transposition = word.substr(0,x) + word[x+1] + word[x] + word.substr(x+2);
			if(hashTable.Count(transposition))
			{
				cout<<transposition<<", ";
				remove[numRemove++] = transposition;
				++suggestion;
				// remove the entry so it isnt displayed multiple times
				hashTable.Remove(transposition);
 
			}
			// deletion
			string deletion = word.substr(0, x)+ word.substr(x + 1);
			if(hashTable.Count(deletion))
			{
				cout<<deletion<<", ";
				remove[numRemove++] = deletion;
				++suggestion;
				// remove the entry so it isnt displayed multiple times
				hashTable.Remove(deletion);
			}
		}		
		// place the removed items back inside the hash table
		while(numRemove>=0)
		{
			hashTable.Insert(remove[numRemove--]);
		}
		if(suggestion < 1)
		{
			cout<<"No spelling suggestion found...";
		}
		cout<<endl<<endl;
	}
	return result;
}// end of SpellCheck
 
string ToLowerCase(string word)
{
	for(unsigned x = 0; x < word.length(); ++x)
	{
		word[x] = tolower(word[x]);
	}
	return word;
}// http://programmingnotes.org/

QUICK NOTES:
The highlighted lines are sections of interest to look out for.

The code is heavily commented, so no further insight is necessary. If you have any questions, feel free to leave a comment below.

Remember to include the data file.

Once compiled, you should get this as your output

Loading dictionary....Complete!
-------------------------------------------------- Total dictionary words = 61286 Hash table size = 19000 Largest bucket size = 13 items at index #1551 Smallest bucket size = 1 items at index #11 Total buckets used = 18217 Total percent of hash table used = 95.8789% Average bucket size = 3.36422 items Dictionary loaded in 1.861 secs! -------------------------------------------------- >> Please enter a sentence: wird ** wird: bird, gird, ward, weird, word, wild, wind, wire, wired, wiry, Number of words spelled incorrectly: 1 Do you want to enter another sentence? (y/n): y -------------------------------------------------- >> Please enter a sentence: woprd ** woprd: word, Number of words spelled incorrectly: 1 Do you want to enter another sentence? (y/n): y -------------------------------------------------- >> Please enter a sentence: wrd ** wrd: ard, ord, ward, wed, word, Number of words spelled incorrectly: 1 Do you want to enter another sentence? (y/n): y -------------------------------------------------- >> Please enter a sentence: wrod ** wrod: brod, trod, wood, rod, word, Number of words spelled incorrectly: 1 Do you want to enter another sentence? (y/n): y -------------------------------------------------- >> Please enter a sentence: New! Safe and efective ** efective: defective, effective, elective, Number of words spelled incorrectly: 1 Do you want to enter another sentence? (y/n): y -------------------------------------------------- >> Please enter a sentence: This is a sentance with no corections gygyuigigigiug ** sentance: sentence, ** corections: corrections, ** gygyuigigigiug: No spelling suggestion found... Number of words spelled incorrectly: 3 Do you want to enter another sentence? (y/n): n
BYE!!

Was this article helpful?

👍 YesNo

Posted in: Advanced, C++ Tags: hash table, if/else, linked list, spell checker

4 Responses to C++ || Simple Spell Checker Using A Hash Table

Pintos says:

September 19, 2013 at 10:36 am

What code does the header file hashtable.h consists?

Reply
- admin says:
  
  September 19, 2013 at 1:44 pm
  
  The source code link for the hashtable.h class is listed on this current page
  
  Reply
Matt Fomich says:

February 8, 2015 at 5:54 am

Very nicely done, thanks for posting this.

Reply
George Brown says:

August 27, 2018 at 2:11 am

is there i c version of this code?

Reply

C++ || Simple Spell Checker Using A Hash Table

Related

4 Responses to C++ || Simple Spell Checker Using A Hash Table

Leave a ReplyCancel reply

Search

Compile Your Code Now

Translate Page

Popular Pages

Recent

Archives

Categories