Sorting Data with Actian Zen

Flat files are a simple way of storing data. For data that can be easily represented as text, reading and writing such files costs little effort. Plain-text formats such as comma-separated value (CSV) files are a go-to solution for developers for this reason. But using a flat format can have repercussions down the road. To see why, this tutorial will first work with information in a CSV file. Then, it will work with the same information in a Zen data file.

This page covers the topics listed at right.

Reading a CSV File

Our sample CSV file consists of a list of people’s last names, first names, and birth dates. Each field is separated by a comma, and the record for each person is on a line by itself. Here are a few sample lines from the file:

Smith,John,1985-04-23
Quinton,James,1960-12-23
Doe,Jane,2000-01-19

We’ll read all of the lines from the file into a list in memory, then write that list back out to the console. Right now, we’re not trying to parse the file, only read it into memory and display it.

#include <iostream>
#include <fstream>
#include <vector>
using namespace std;
const int READ_BUFFER_SIZE = 1024;
int main()
{
     vector<string> fileLines;
     //Reading the file
     ifstream input("userList.csv");
     if (!input)
          return -1;
     char readBuffer[READ_BUFFER_SIZE];
     while (input) {
          input.getline(readBuffer, READ_BUFFER_SIZE);
          if(strlen(readBuffer))
               fileLines.push_back(readBuffer);
     }
     input.close();
     //Print the contents of the file  
     for (auto i = 0; i < fileLines.size(); ++i)
     {
          cout << fileLines[i] << endl;
     }
     return 0;
}

Sorting CSV Data as Strings

Now let’s say we want to get the first five elements based on the last name. The first step would be to sort the data by the last name. Since the data read by this file starts with a person’s last name, we could just perform sorting on the full line to get the desired effect. Note that to use this sorting algorithm, the entire list has to be in memory.

void sortList(vector<string> & sourceList) 
{ 
    bool moreSortingNeeded = true; 
    while (moreSortingNeeded) { 
             moreSortingNeeded = false; 
             for (auto i = 0; i < sourceList.size()-1; ++i) 
             { 
                  std::string tempA = sourceList[i]; 
                  std::string tempB = sourceList[i + 1]; 
                  if (tempB < tempA) 
                  { 
                       sourceList[i] = tempB; 
                       sourceList[i + 1] = tempA; 
                       moreSortingNeeded = true; 
                  } 
             } 
      } 
}

Sorting CSV Data on a Field

With the data sorted, we can find the first five elements based on last name by taking the first five elements in the sorted list. However, if we want to select the first five elements based on some other criteria, a bit more work is needed. We need to separate the fields of each line. If there’s numerical or date data in the CSV, it must be parsed and converted to be properly sorted. To demonstrate, let’s sort the data on the first name field instead. Because the first name field isn’t at the beginning of the record, we need to parse the fields. Here’s a data structure for holding the information on a person:

struct Date {
     unsigned char Day;
     unsigned char Month;
     short int Year;
};
struct Employee {
     int ID;
     char FirstName[32];
     char LastName[32];
     Date DOB;
};

We can copy the data out of the lines into separate fields. There’s a C function named strtok_s (string tokenizer) that’s used to separate a string into substrings divided by a delimiter. Using this function, the fields on a line in our CSV and the parts of the date are separated and parsed. Our Employee struct is populated with these values:

{
     Employee p = { 0 };
     char buffer[1024];
     char* pch;
     char* text = new char[line.size() + 1];
     strncpy_s(text,line.size()+1,  line.c_str(), line.size());
     char *lastName  = strtok_s(text, ",",&pch);
     char* firstName  = strtok_s(NULL, ",", &pch);
     char* dobString = strtok_s(NULL, ",", &pch);
     strncpy_s(p.FirstName, sizeof(p.FirstName), firstName, strlen(firstName));
     strncpy_s(p.LastName, sizeof(p.LastName), lastName, strlen(lastName));
     //////////
     char* year = strtok_s(dobString, "-", &pch);
     char* month = strtok_s(NULL, "-", &pch);
     char* day = strtok_s(NULL, "-", &pch);
     p.DOB.Month = stoi(month);
     p.DOB.Day = stoi(day);
     p.DOB.Year = stoi(year);
     return p;
}

Sorting CSV Data with Strongly Typed Fields

Now we need a different sort method. The following code is based on the algorithm used earlier, but it is written to work with the strongly typed fields of the structure:

void sortEmployeeList(vector<Employee>& sourceList)
{
     bool moreSortingNeeded = true;
     while (moreSortingNeeded)
     {
          moreSortingNeeded = false;
          for (auto i = 0; i < sourceList.size() - 1; ++i)
          {
               if (strncmp(sourceList[i].LastName, sourceList[i + 1].LastName, sizeof(Employee::LastName)) > 0)
               {
                    Employee temp = sourceList[i];
                    sourceList[i] = sourceList[i + 1];
                    sourceList[i + 1] = temp;
                    moreSortingNeeded = true;
               }
          }
     }
}

If we later want to search by some other attributes, we’d need to change the code again. We’d also need to update the parsing if we began working with additional data types.

This works, but the code has gotten more complex. And if we want to sort on some other criteria, we’d need to make further changes to the code. Also, to perform the sort, we have to load all the data into memory. As the amount of data grows, this could start causing performance problems.

Reading a Zen Data File

Let’s look now at a better way. Suppose our data is stored in an Actian Zen database file instead of a CSV file. How would the sorting work?

Zen data files can be created through API calls or through the Zen Control Center (ZenCC) application. The ZenCC interface is easy to use for quickly defining a table and its fields. I created a table named Employee that contains the same data as our CSV. There are two ways to access data in Zen data files: You can either read it using a database driver with SQL commands or use the Btrieve 2 NoSQL API. SDKs for both methods are available for a variety of platforms and programming languages, providing a wide range of portability options among different development environments. While both methods have their advantages, Btrieve 2 is simpler to use (you don’t need to know SQL) and executes faster, so that’s what we’ll use here.

The fields in our Zen data file are similar to those of our CSV. If data already exists in a CSV file, you can import it by right-clicking a database table and choosing the Import Data command.

Btrieve 2 classes will handle many of the details of working with the data. With just a few lines of code, you can open a database file and begin reading from it. And there are other advantages to using a Zen data file with Btrieve 2. Instead of reading a string that may require further parsing, Btrieve can read directly into a matching typed data structure.

struct Date {
         unsigned char Day;
         unsigned char Month;
         short int Year;
};
struct Employee { 
         int ID;
         char FirstName[33];
         char LastName[33];
         Date DOB;
};

Some C/C++ compilers have an option to ensure that all fields within a struct are aligned on 4-byte boundaries. The option should be turned off for these structures to compile as needed. If the option isn’t turned off, the data may populate the wrong fields as it’s read.

Let’s start by reading five records and printing their values:

const char* FILE_NAME = "C:\\zendb\\Employees\\EMPLOYEE.MKD";
BtrieveClient btrieveClient;
Btrieve::StatusCode status;
BtrieveFile btrieveFile;
Employee record;
int recordCount = 0;

status = btrieveClient.FileOpen(&btrieveFile, FILE_NAME, NULL, Btrieve::OPEN_MODE_NORMAL);
if (status != Btrieve::STATUS_CODE_NO_ERROR)
    return -1;
int bytesRead = btrieveFile.RecordRetrieveFirst(Btrieve::INDEX_NONE, (char*)&record, sizeof(record));
while (status == Btrieve::STATUS_CODE_NO_ERROR && recordCount < 5)
{
    cout << record.FirstName << " " << record.LastName << endl;
    btrieveFile.RecordRetrieveNext((char*)&record, sizeof(record));
    status = btrieveFile.GetLastStatusCode();
    ++recordCount;
}
btrieveClient.FileClose(&btrieveFile);
btrieveClient.Reset();

Reading the values is much simpler – we didn’t need additional code for parsing the values and populating individual fields. But we don’t want to read just any five records. We want to read the top five records according to the value of one of the fields.

Using an Index to Sort Data in Zen

With a Zen data file, you can select the order in which records are read by using an index. An index provides a quick and efficient way to retrieve records based on a set of key fields within the record. Using Btrieve 2, a new index can be created with a few function calls.

To create an index, we identify the fields that we want considered in the sorting of the records. This could be only one field or several fields. These fields are collected together and assigned an index ID.

When the records were read from the database in the previous example, the index used was Btrieve::INDEX_NONE. With this value, no specific index is used and the records come back in an undefined order. If the number of a newly defined index is used, then the records will be retrieved in the order described by the index.

Let’s make an index for sorting on employee name. For this index, the last name will be the primary sorting value and the first name will be a secondary sorting value. If two people have the same last name, then their first name will determine their sorting order.

//Declarations for the variables that will be used
BtrieveIndexAttributes nameIndex;
BtrieveKeySegment lastNameKey, firstNameKey;
BtrieveFileInformation bfi;

//Setting the index number for our index
Btrieve::Index targetIndex = Btrieve::INDEX_100;
nameIndex.SetIndex(targetIndex);

const int LastName_Offset = 37;
lastNameKey.SetField(LastName_Offset, 34, Btrieve::DataType::DATA_TYPE_CHAR);
const int FirstName_Offset = 4; // offsetof(Employee, FirstName);
firstNameKey.SetField(FirstName_Offset, 34, Btrieve::DataType::DATA_TYPE_CHAR);

//Add the last name field to our collection of keys for
//     the index and then the first name
lastNameIndex.SetIndex(targetIndex);

status = btrieveFile.IndexDrop(targetIndex);
status = btrieveFile.IndexCreate(&nameIndex);

When the index is created, it is populated with the relevant data from all the existing records in the file, and it is automatically maintained for all subsequent insert/update/delete operations. You do not have to recreate the index every time you want to sort the data. After the index is created, there’s one line in the previous example that must be changed to use the index. The line was previously written as:

int bytesRead = btrieveFile.RecordRetrieveFirst(Btrieve::INDEX_NONE, (char*)&record, sizeof(record));

The updated line of code is:

int bytesRead = btrieveFile.RecordRetrieveFirst(targetIndex, (char*)&record, sizeof(record));

Had we wanted to read the results in a reverse order we could start reading with RecordRetrieveLast/RecordRetrievePrevious instead of using RecordRetrieveFirst/RecordRetrieveNext.

Or the sort order on the keys used to make the index could be set to use descending order instead of their default ascending order.

lastNameKey.SetDescendingSortOrder(true);
firstNameKey.SetDescendingSortOrder(true);

Setting the sort order on the fields gives the option of using a mixture of ascending and descending order on different keys within the same index.

By defining multiple indices, the order in which records are retrieved is selectable by changing the index used. If indices were defined for sorting by the first name, last name, and date of birth, we could easily select an index based on some parameter. For example, if our program accepted a command line string with the possible values of first, last, and dob, then the index could be selected with code similar to the following:

Btrieve::Index sortIndex = Btrieve::INDEX_NONE;
if (argc > 1)
{
     if (!strncmp("first", argv[1], 5))
          sortIndex = Btrieve::INDEX_2;
     else if (!strncmp("last", argv[1], 4))
          sortIndex = Btrieve::INDEX_3;
     else if (!strncmp("dob", argv[1], 3))
          sortIndex = Btrieve::INDEX_4;
}

Now the program supports the sort order being changed at run time.

Using the Btrieve 2 API to select a sort order involves far less effort than implementing a sort algorithm on your own. With much less code, we’re now able to read the top elements of our data based on sorting criteria – without loading the complete set of data into memory.

If you’d like to learn more about using Btrieve 2 and Zen databases, browse Btrieve 2 content under Developer Reference > Data Access Methods in the online documentation. You’ll find the SDKs for Actian Zen on the Electronic Software Distribution page.