ACID Operations: Why Actian Zen is a Better Data Store than Flat Files

At first glance, writing application data to a flat file seems like the easiest solution, and that may be true if your only requirement is to write data to a file. But as soon as you realize that the data in the file needs to be read and updated, you’ll most likely find there’s far more work that must be done in your application to ensure the integrity of your data.

Even the most basic data file storage implementations require a lot of code. Through the ages there have been innumerable attempts to make flat file storage easier to deal with.

There were special file formats, including initialization (.ini) files from MS-DOS and early Windows days and comma-separated values (CSV) files, among others, that arose in attempts to make file parsing easier. Later, Microsoft implemented the app.config file (under .NET) in XML, which was intentionally created as a read-only application configuration format to discourage developers from saving values back to the config file. Most likely, the idea was to create a simpler API to manage those app.config files.

Even in the simplest case, if you save data in a flat file for later use, you’re often creating extra work for yourself.

Writing to a Flat File

Let’s take a look at a quick example in C++ that saves two values for later use. The program simply gets the date and time from the system and saves the name of the executable and the last DateTime that executable ran to a file:

#include <iostream>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <ctime>
using namespace std;
string getDateTime();
int main()
{
     // File name and program name are hard-coded to keep the example short.
     string outFileName = "lastRunTimes.txt";
     ofstream outfile (outFileName,ofstream::binary);
     string delimiter = ",";
     string programName = "saveValues";
     string lastRunTime = getDateTime();
     // Write one record: programName,lastRunTime
     outfile.write(programName.c_str(),programName.length());
     outfile.write(delimiter.c_str(),delimiter.length());
     outfile.write(lastRunTime.c_str(),lastRunTime.length());
     outfile.close();
     return 0;
}
// Returns the current local time formatted as MM-DD-YYYY HH:MM:SS.
string getDateTime(){
     auto t = std::time(nullptr);
     auto tm = *std::localtime(&t);
     std::ostringstream oss;
     oss << put_time(&tm, "%m-%d-%Y %H:%M:%S");
     return oss.str();
}

The data in the file looks like this:

saveValues,01-21-2020 14:10:07

Of course, both the file name and the executable name are hard-coded here to make the example shorter. Moreover, every time the program runs, the file is overwritten, since there are no checks to determine whether the file already has data. And of course there’s no error checking, so if the program can’t write to the current working directory, it will fail silently.
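
To give a sense of what even basic error checking adds, here is a minimal sketch. The helper name saveLastRun and its parameters are hypothetical and are not part of the original program; it still overwrites the file on every call.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not in the original program): writes
// "programName,lastRunTime" to path and reports failure instead of
// failing silently. Any existing content is still overwritten.
bool saveLastRun(const std::string& path,
                 const std::string& programName,
                 const std::string& lastRunTime)
{
    std::ofstream outfile(path, std::ofstream::binary);
    if (!outfile.is_open()) {
        return false;            // e.g. the directory is missing or not writable
    }
    outfile << programName << ',' << lastRunTime;
    outfile.close();             // flushes buffered data to the file
    return !outfile.fail();      // false if the write or the flush failed
}
```

A caller would check the return value, for example `if (!saveLastRun("lastRunTimes.txt", "saveValues", getDateTime())) { /* report the error */ }`, which is one more branch every caller now has to handle.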

This small program has numerous issues. Most significantly, it’s quite a lot of code to do very little work.

Now suppose you need to alter the program so that a number of services can write to the file to update their last run times. You could change the original so that it takes the name of the service executable as an input argument and then writes a new line with the .exe name and the last run time.

But what if the service has already written a line in the file for its last run time? In that case, the code needs to update the file, not just append to it.
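
One way to handle that update-or-append case is sketched below, assuming the file holds one "name,timestamp" line per service. The helper name upsertLastRun is hypothetical, and the approach (read everything, change one entry, rewrite the whole file) is just one simple option.

```cpp
#include <fstream>
#include <map>
#include <string>

// Hypothetical helper: load every "name,timestamp" line, update or add one
// entry in memory, then rewrite the entire file.
bool upsertLastRun(const std::string& path,
                   const std::string& serviceName,
                   const std::string& lastRunTime)
{
    std::map<std::string, std::string> entries;

    // A missing file simply yields an empty map.
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        const auto comma = line.find(',');
        if (comma == std::string::npos) continue;   // skip malformed lines
        entries[line.substr(0, comma)] = line.substr(comma + 1);
    }
    in.close();

    entries[serviceName] = lastRunTime;             // update or insert

    std::ofstream out(path, std::ofstream::trunc);
    if (!out.is_open()) return false;
    for (const auto& entry : entries) {
        out << entry.first << ',' << entry.second << '\n';
    }
    out.close();
    return !out.fail();
}
```

Note that std::map keeps the entries sorted by name, so the original line order is lost, and the whole file is rewritten for a one-line change. Nothing here coordinates with any other process touching the file.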

Furthermore, since several services will be using the program to write to the lastRunTimes.txt file, what will happen if the file is locked because one service is writing to it and another service attempts to access the file to update its last run time?

But wait, there’s more. What if you decide to solve the queuing problem by not locking the file while writing, so that numerous processes can access it at once? In that case, the data written to the file would most likely end up interleaved or overwritten, as many calling processes start the program concurrently and each one writes its own data without waiting for the others. This simple process of writing data to a file quickly unravels and becomes quite complex.

ACID Transactions

These types of challenges are the focus of the computer science concept referred to by the term ACID, an acronym that stands for:

Atomicity – Any change to the data either completes in full or not at all, with a success or error returned, so another process can never interrupt it halfway.
Consistency – Every committed change leaves the data in a valid state, and that state is the same for all calling processes.
Isolation – Ensuring that while the data is being updated by a single process, other processes can’t read the changes until they’re committed. There’s no ability to read partially updated data because the update process is isolated from other processes until complete. Contrast this with a situation where a file is being updated with 10 new bytes, but another process comes along and reads the file stream when only 5 of the bytes have been updated. This would constitute a dirty read.
Durability – Data that has been committed represents the new state of the data even if there’s power loss or other catastrophic failure. Once data is committed, it’s written to some form of permanent storage, not to volatile RAM alone.
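
To illustrate how much hand-written code even one of these properties takes with flat files, here is a rough sketch of an all-or-nothing file replacement using a temporary file and a rename. The helper name replaceFileAtomically and the ".tmp" suffix are assumptions made for this example. It also assumes POSIX-style behavior, where std::rename replaces an existing target in one step; on Windows, std::rename may fail when the target already exists, so a platform-specific call would be needed there.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: replace the contents of path in an all-or-nothing way
// by writing a temporary file first and then renaming it over the original.
bool replaceFileAtomically(const std::string& path, const std::string& contents)
{
    const std::string tmpPath = path + ".tmp";   // assumed scratch file name

    std::ofstream out(tmpPath, std::ofstream::binary | std::ofstream::trunc);
    if (!out.is_open()) return false;
    out << contents;
    out.close();
    if (out.fail()) {
        std::remove(tmpPath.c_str());            // discard the partial file
        return false;
    }

    // On POSIX systems, readers see either the old file or the new one,
    // never a half-written mix.
    if (std::rename(tmpPath.c_str(), path.c_str()) != 0) {
        std::remove(tmpPath.c_str());
        return false;
    }
    return true;
}
```

This only approximates atomicity. It does nothing for isolation between two writers (both would use the same temporary file), and it does not force the data to disk, so durability is still not guaranteed.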

With that in mind, let’s think about how we might solve our problems using Btrieve 2, the Zen transactional interface. We want to allow a user to provide the name of a process (the executable file name) and we will save the last DateTime the process ran.

We’ll name the program that uses Btrieve 2 ProcessLogger. Users call it to log the process name, the start DateTime, and the name of the user who started the process.

The scenario is a bit contrived, but it is real enough to show how much easier it is to save and update data when you use the Btrieve 2 API.

You can grab the complete code from the GitHub repository, build it in Visual Studio (2017 or later), and then try it out.

A ProcessLogger Program

We want to allow users to write a log entry to a central file each time they start a process on a server.

We’re creating a simple console application here, but you can imagine this as a service that takes the input (executable name, user name) and logs the last start time to a file (named Exe.Info in our example).

The challenge is that multiple processes may want to update the file concurrently. Also, multiple users may manually attempt to start a process and need to write log data to the file.

Ensuring that no two processes ever write to the log file at the same time is a lot of work. The Btrieve 2 API makes things much simpler because we can hand the output file over to the Zen engine and treat it as a database. The engine protects updates from multiple processes writing to the file at the same time (atomicity and isolation), so when you read from the file, the data you see is the latest committed write (consistency). And because the engine commits the data for you, you don’t have to manage flushing bytes from memory to disk yourself (durability).

You can write a record to the file using the following command line input:

C:\>ProcessLogger [process name] [user name]
C:\>ProcessLogger calcSales.exe jim.smith

If the Exe.Info file doesn’t exist in the location where you run ProcessLogger, it is created and the record is added along with a field that represents the current DateTime, indicating the last time the process was run.

The record in the file looks something like this:

record: (calcSales.exe, jim.smith, 02-06-2020 11:29:30)

All this work is done by simply calling a few Btrieve 2 API methods in the ProcessLogger code.

The compacted code looks like the following (see more at the GitHub repository):

// create the file (Exe.Info) where data will be stored
createInfoFile(&btrieveClient, exeInfoFileName.c_str());
btrieveFileAttributes.SetFixedRecordLength(TOTAL_EXEINFO_RECORD_LENGTH);

This second line of code creates the fixed record length (max record length) that holds the data. We’ve arbitrarily set this length to 122 bytes to hold 50 chars for the .exe name, 50 chars for the user name, and 19 chars for our special date-time format, plus terminating chars.
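
The 122-byte layout described above could be modeled roughly as follows. This is only a sketch: the struct name, field names, and helper functions are hypothetical, and the sample in the repository may lay out its record differently. Only the sizes come from the text.

```cpp
#include <cstring>
#include <string>

// Field widths taken from the sizes quoted in the text (assumed layout).
constexpr int EXE_NAME_LENGTH  = 51;  // 50 chars + terminator
constexpr int USER_NAME_LENGTH = 51;  // 50 chars + terminator
constexpr int DATETIME_LENGTH  = 20;  // "MM-DD-YYYY HH:MM:SS" + terminator
constexpr int TOTAL_EXEINFO_RECORD_LENGTH =
    EXE_NAME_LENGTH + USER_NAME_LENGTH + DATETIME_LENGTH;

// Hypothetical record layout; the real sample's struct may differ.
struct ExeInfoRecord {
    char exeName[EXE_NAME_LENGTH];
    char userName[USER_NAME_LENGTH];
    char lastRun[DATETIME_LENGTH];
};

// Fail the build if the struct and the declared record length ever disagree.
static_assert(TOTAL_EXEINFO_RECORD_LENGTH == 122, "record length should be 122");
static_assert(sizeof(ExeInfoRecord) == TOTAL_EXEINFO_RECORD_LENGTH,
              "struct size must match the fixed record length");

// Copy src into a fixed-width field, truncating if necessary and always
// leaving a terminating '\0'.
void copyField(char* dest, std::size_t destSize, const std::string& src)
{
    std::strncpy(dest, src.c_str(), destSize - 1);
    dest[destSize - 1] = '\0';
}

ExeInfoRecord makeRecord(const std::string& exeName,
                         const std::string& userName,
                         const std::string& lastRun)
{
    ExeInfoRecord record{};   // zero-fill so unused bytes are deterministic
    copyField(record.exeName, sizeof record.exeName, exeName);
    copyField(record.userName, sizeof record.userName, userName);
    copyField(record.lastRun, sizeof record.lastRun, lastRun);
    return record;
}
```

Because every field is a char array, the struct has no padding, and the static_assert catches any mismatch between the struct and the fixed record length at compile time.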

You’ll see a DateTime helper method (string getDateTime) that’s been added to the code to format the current date and time into an easily used string. After that, we simply open the file so we can work on it:

btrieveClient->FileOpen(btrieveFile, fileName, NULL,
     Btrieve::OPEN_MODE_NORMAL);

Initially we create an index on the first column of data (exe name) so we can later search for records by exe name:

btrieveKeySegment.SetField(0, 51, Btrieve::DATA_TYPE_ZSTRING);
btrieveIndexAttributes.AddKeySegment(&btrieveKeySegment);
btrieveFile->IndexCreate(&btrieveIndexAttributes);

After the index is created, we add new records as users provide us with data on the command line (calcSales.exe, jim.smith). Each record is keyed on the .exe name:

btrieveFile->RecordCreate((char*) &record, TOTAL_EXEINFO_RECORD_LENGTH);

We always ensure we don’t exceed our max record length (122), and the Btrieve API requires the record length to be passed in when the record is created.

Later, when you want to find a record, you can do so by calling the RecordRetrieve method from the Btrieve API as follows:

btrieveFile->RecordRetrieve(Btrieve::COMPARISON_EQUAL,
     Btrieve::INDEX_1, (char *)key.c_str(),
     key.size(),
     (char*)&record, TOTAL_EXEINFO_RECORD_LENGTH);

You can retrieve records from the sample program by providing the key (.exe name) to ProcessLogger. The command line should look like this:

C:\>ProcessLogger calcSales.exe
record: (calcSales.exe, jim.smith, 02-06-2020 11:29:30)

With the Btrieve 2 API helping you, your programs can write to shared data sources without each one having to guard against data corruption caused by concurrent overwrites and the like.

Circling Back to Our Talk About ACID

Because we are now accessing our data through the Btrieve 2 API, we benefit from a layer of data protection that our original saveValues program doesn’t have.

When we call RecordCreate(), we can rely on the underlying Zen engine to insert the record, update all indexes, and notify us of any failure (atomicity).

We can also depend upon the consistency of our data because we know that data updates are managed by the Zen engine even if multiple processes are attempting to alter the data. That’s because the API smoothly handles the challenges of isolation so that whether our inserts or updates succeed or fail, we know the state of our data, even as multiple processes access the data.

The main point is that all of these concerns related to ACID are handled by the Zen engine in such a way that we can focus on saving and reading our data without having to spend time designing and testing our code to ensure it supports the ACID philosophy. Instead we can depend upon the Zen engine to provide that foundation.

Wrapping Up

Grab the code from the GitHub repository and try it out. And be sure to take a look at the examples on the Actian site. The Btrieve Getting Started page is a great jumping-off point; it’s where I started with the ProcessLogger sample. Also, check out the documentation of all the Btrieve 2 methods. The list of Btrieve classes is very helpful. The classes you will work with most are BtrieveClient and BtrieveFile.

As you’ve seen, ensuring data integrity is a nontrivial problem, especially if you try to handle it yourself. But you don’t have to. Offload that responsibility to Btrieve and let it take care of those concerns for you.