Data Persistence – Why Flat Files Just Don’t Cut It

Programmers often use flat files for data storage because they’re quick and easy to set up. But using a database like Zen is easy too, and it greatly simplifies your code and database management tasks. Zen also makes it easier to persist data in a multilevel distributed data environment such as you might encounter with IoT edge computing deployments.

In this tutorial, we’ll talk about data persistence, the issues that arise with the use of flat files, and how Zen lets you persist data at different levels in ways that flat files can’t.


Data Persistence and Data Access

What is data persistence? Data persistence is the retention of data outside of the program that created it. Persisted data can be read by other processes or transmitted to other systems. As a system continues to operate and generate new data, that information may be analyzed for making decisions and driving other processes. It may need to be persisted for short-term storage before orchestration to a central data warehouse, for long-term storage, or for logging.

In IoT scenarios, some analysis and decision-making need to occur close to where the data is generated. Imagine a medical device that’s monitoring a patient. The device may need to make immediate decisions based on readings it’s taking from moment to moment on the patient. For example, if the patient’s temperature rises above a safe range, the device may need to sound an alarm to local staff.

The data gathered over time may be more than what can be held in memory, so it needs to be written out to a local storage location where it can be quickly retrieved as the monitoring continues. As the device continues to operate, the amount of data stored on it increases. Working with larger data collections brings new challenges that may not have been an issue before.

The readings from several devices may be gathered by a gateway where they can be further analyzed or accumulated and prepared for transport to the cloud for storage or other processing.

Data doesn’t move at infinite speed. It takes time for data packets to travel over a network. Transmitting the data to a remote location for decision-making consumes bandwidth and adds latency. When immediate action must be taken in response to events, locating the computing power close to the edge reduces both.

Distributing computing power where data is generated also improves reliability. In solutions where computing is centrally located, lost connectivity can disrupt operations at a remote location. When processing power is close to where it’s needed, operations can continue regardless of connectivity.

Edge devices may also run a variety of different operating systems, and they may share data in non-uniform formats that are operating system-specific or involve different protocols. Gathering and sharing the information from these different environments can become complicated. For example, you may need a shared file system that other devices can read from and write to. This adds complication and failure points to a data sharing implementation – a client could find a file entirely or partially inaccessible because another client is working in it. Moreover, the data can become corrupted if a client fails while manipulating a file, or if a glitch disrupts communications and a file is only partially updated.

The apparent simplicity of using flat files turns out to be remarkably complex once you take into account differences in operating systems and data integrity concerns.

Persisting Data in Flat Files

Flat files are commonly used to capture data locally and to move data from one system to another. Reading a flat file requires knowing the data types being transferred, since you may be unable to infer the intended data type from its encoding in a file. In the following example, a structure holds values from sensors on a device. The readings are being written to a local flat file.

#include <fstream>
#include <vector>

struct Vital {
    int ID;
    int bpm;
    float temperature;
    int respiratoryRate;
    int oxygenLevel;
    float systolicPressure;
    float diastolicPressure;
};

// Write each reading to the CSV file, one record per line
void saveReadings(const std::vector<Vital>& readings)
{
    std::ofstream exportFile("sensorReadings.csv");
    for (const Vital& current : readings)
    {
        exportFile << current.ID
            << ", " << current.bpm
            << ", " << current.temperature
            << ", " << current.respiratoryRate
            << ", " << current.oxygenLevel
            << ", " << current.systolicPressure
            << ", " << current.diastolicPressure << '\n';
    }
    exportFile.close();
}
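Reading the file back shows why the reader has to know the data types in advance. The following is a minimal sketch, not part of the original sample: `parseVital` is a hypothetical helper, and it assumes each line holds one comma-separated field per structure member, in declaration order.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Local copy of the readings structure so the sketch compiles on its own.
struct Vital {
    int ID;
    int bpm;
    float temperature;
    int respiratoryRate;
    int oxygenLevel;
    float systolicPressure;
    float diastolicPressure;
};

// Hypothetical helper: parses one comma-separated line into a Vital.
// Returns false if the field count is wrong or a field does not start
// with a number. Nothing in the file says which fields are ints and which
// are floats; that knowledge is hard-coded here.
bool parseVital(const std::string& line, Vital& out)
{
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(field);
    if (fields.size() != 7)
        return false;

    try {
        // std::stoi and std::stof skip the leading space after each comma
        out.ID = std::stoi(fields[0]);
        out.bpm = std::stoi(fields[1]);
        out.temperature = std::stof(fields[2]);
        out.respiratoryRate = std::stoi(fields[3]);
        out.oxygenLevel = std::stoi(fields[4]);
        out.systolicPressure = std::stof(fields[5]);
        out.diastolicPressure = std::stof(fields[6]);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}
```

If the writer ever adds, removes, or reorders a field, every reader built this way has to change with it.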

But what happens when the IoT devices have limited storage or need to be replaced and need to share their data with another device? The IoT device could push its data to a gateway or the gateway could poll its devices and read the data. In either case, one machine must read the data from another.

The way a program accesses a remote file depends on the operating systems involved. If both computers are running Windows, the file could be read by constructing a path with the computer name or IP address and the name of a file share. Some configuration of the machine might first be necessary to enable file sharing and set up the share:

std::wstring machineName = L"192.168.1.23";
std::wstring folderName = L"sharedFiles";
// Build a UNC path of the form \\machine\share\file
std::wstring fileName = L"\\\\" + machineName + L"\\" + folderName +
    L"\\sensorReadings.csv";
// Create an input file stream from this file name
std::wifstream sourceFile(fileName);

If both of the machines are running Linux, parts of the file system can be made available over a network by configuring additional software, such as Network File System (NFS) or Samba, and by mounting it to the file system of the device that needs to access the shared files. The path to the mounted file system also depends on configuration decisions. All these solutions depend on their platforms. Environments with mixed platforms could have a complex mix of solutions.
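To illustrate how the platform differences leak into application code, here is a rough sketch of a path builder. The helper name `sharedFilePath` and the `/mnt` mount point are assumptions made for illustration only. The real mount location depends on how the NFS or Samba share was configured.

```cpp
#include <string>

// Hypothetical helper: builds the path to a file on a network share.
// On Windows it produces a UNC path; elsewhere it assumes the share has
// already been mounted under /mnt/<share> (an assumed mount point).
std::string sharedFilePath(const std::string& host,
                           const std::string& share,
                           const std::string& fileName)
{
#ifdef _WIN32
    return "\\\\" + host + "\\" + share + "\\" + fileName;
#else
    (void)host; // the host was fixed when the share was mounted
    return "/mnt/" + share + "/" + fileName;
#endif
}
```

Every additional platform or mounting convention adds another branch like this one.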

When devices interact, more than one device or process may read and write to a file at the same time. With overlapped reading and writing, a process might receive inconsistent data – a mix of a previous version of a file and updates made while the file is read. Writing to a file doesn’t guarantee that the data has actually been persisted to storage. The data from file writes may exist within a file cache before being committed to storage.
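One common workaround for partially updated files is to write to a temporary file and then rename it over the target. The sketch below assumes a POSIX-style file system, where renaming over an existing file is atomic. The helper name `replaceFileContents` and the `.tmp` suffix are hypothetical. Note that `flush()` only hands the data to the operating system; forcing it to physical storage needs a platform call such as POSIX `fsync`, which is omitted here.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: replaces the contents of `path` so that readers see
// either the complete old file or the complete new one, never a mix.
bool replaceFileContents(const std::string& path, const std::string& contents)
{
    const std::string tempPath = path + ".tmp";
    {
        std::ofstream out(tempPath, std::ios::trunc);
        if (!out)
            return false;
        out << contents;
        out.flush(); // data may still sit in the OS file cache after this
        if (!out)
            return false;
    } // the stream is closed here, before the rename

    // Atomic on POSIX systems; on some platforms rename fails if the
    // target already exists, so this is not a portable guarantee.
    return std::rename(tempPath.c_str(), path.c_str()) == 0;
}
```

Even this small amount of care is something each flat-file application has to implement and test for itself.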

Other challenges can arise as the amount of stored data grows. Finding a set of records in the data may involve reading all of the records and checking them one at a time – a slow process when the number of records is large. It may be even slower when the file is read from another location on the network.

Operating systems also have file size limitations. For large amounts of data, an application may need to split its data across several files. Suddenly, the simplicity of working with a single flat file is gone. Finding a needed set of records may also require sorting of the data. Even with optimized sorting algorithms, sorting large data sets can be slow.
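The cost of searching and sorting without an index can be sketched as follows. This is an illustrative example only: the `Reading` structure and the `hottestReadings` function are hypothetical, and the records are assumed to have already been loaded from the file into memory.

```cpp
#include <algorithm>
#include <vector>

// Simplified record used only for this sketch.
struct Reading {
    int ID;
    float temperature;
};

// Returns the readings above `threshold`, hottest first.
// Every record is examined once, then the matches are sorted.
std::vector<Reading> hottestReadings(const std::vector<Reading>& all,
                                     float threshold)
{
    std::vector<Reading> matches;
    for (const Reading& reading : all)
    {
        if (reading.temperature > threshold)
            matches.push_back(reading);
    }
    std::sort(matches.begin(), matches.end(),
        [](const Reading& a, const Reading& b) {
            return a.temperature > b.temperature;
        });
    return matches;
}
```

The work grows with the total number of records, not with the number of matches, and it is repeated on every query.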

Luckily, there’s a better way.

Persisting Data with Zen

Zen products simplify data retrieval across multiple platforms. Since the software is compatible with a wide variety of platforms and programming languages, you have a solution that ensures the same set of data management benefits no matter which platform you target.

With the Zen SDK, your program can connect to Zen data instances running on the same device or to instances on other devices on the same network in a consistent way. Zen handles all multiuser access and data integrity issues so that you don’t have to worry about complicating your application to handle these.

As more devices are added to an environment, the chance that one of them experiences a failure or disruption grows. Data corruption and inconsistency from such failures are potential problems that must be considered with flat files.

Zen is designed to comply with the four ACID properties for data transactions: Atomicity, Consistency, Isolation, and Durability. Since Zen handles all of this automatically, your application can focus on data gathering and analysis tasks.

With a common way to represent data, Zen data files are movable from one system to another without the overhead of ETL operations to transform the data from one format to another. Whether the data is coming from a tiny IoT device or a powerful server, its format is the same — Zen uses the same data format on all of the platforms it supports. And, because Zen maintains backward compatibility with previous versions, you can upgrade Zen software without needing to migrate data.

With Zen’s capability to manage data sets as large as 64 terabytes, you are freed from having to organize how data is written on the file system. While native file paths are supported, there’s an additional way of specifying the location of a data file to be opened on remote systems. This file identifier looks similar to the following:

btrv://host/?file=c:/data/Vitals.mkd

Devices running various operating systems could use the same identifier for the file. Zen takes care of providing access to the file even across different operating systems. The following code opens a Zen data file on another computer:

BtrieveFile btrieveFile;
const char* FILE_NAME = "btrv://User@VitalServer/?file=c:/data/Vitals.mkd&pwd=P@ssw0rd";
Btrieve::StatusCode status = btrieveClient.FileOpen(&btrieveFile, FILE_NAME,
    "My0wner!", Btrieve::OPEN_MODE_NORMAL);

Writing a record to the file requires only a single call:

Btrieve::StatusCode status = btrieveFile.RecordCreate((char*)&vital, sizeof(vital));

If a successful status code is returned, then the record has been committed to storage. Records are read as follows:

void readAllRecords(BtrieveFile& btrieveFile, std::vector<Vital>& vitalList)
{
    // Error handling removed for clarity

    Vital vital;
    // Start at the first record, reading without an index
    btrieveFile.RecordRetrieveFirst(Btrieve::INDEX_NONE,
        (char*)&vital, sizeof(Vital));
    Btrieve::StatusCode status = btrieveFile.GetLastStatusCode();
    while (status == Btrieve::STATUS_CODE_NO_ERROR)
    {
        vitalList.push_back(vital);
        btrieveFile.RecordRetrieveNext((char*)&vital, sizeof(Vital));
        status = btrieveFile.GetLastStatusCode();
    }
}

One of the strengths of using Zen for data storage is that data can be indexed. With an index, sorting and finding records are much easier than they are with flat files. In addition to reading records one at a time in the order in which they are stored, the Zen data can also be read in a sort order based on one of the fields in the record.

With flat files, a developer would need to read all the records and perform sorting to do this. In Zen we create an index with the fields to be used for sorting.

Let’s say that you want to start by looking at the vitals readings that had the highest temperature. The following would create a temperature index for reading records from highest to lowest:

void createTemperatureIndex(BtrieveFile& btrieveFile, Btrieve::Index indexNumber)
{
    BtrieveIndexAttributes temperatureIndexAttributes;
    BtrieveKeySegment temperatureKeySegment;
    // temperature is a float, so the key segment uses the float data type
    temperatureKeySegment.SetField(offsetof(Vital, temperature),
        sizeof(float), Btrieve::DATA_TYPE_FLOAT);
    temperatureKeySegment.SetDescendingSortOrder(true);
    temperatureIndexAttributes.AddKeySegment(&temperatureKeySegment);
    // Assign the requested index number to the new index
    temperatureIndexAttributes.SetIndex(indexNumber);
    btrieveFile.IndexCreate(&temperatureIndexAttributes);
}
// Create the index and assign it to the identifier INDEX_9
createTemperatureIndex(myFile, Btrieve::INDEX_9);

In the earlier code sample, the Btrieve file was read without an index, with Btrieve::INDEX_NONE passed as the index parameter. Using the identifier from the creation of the temperature index instead enables reading of the records in order of temperature:

btrieveFile.RecordRetrieveFirst(Btrieve::INDEX_9, (char*)&vital, sizeof(Vital));

This is much faster and requires fewer resources than reading the entire file and sorting it.
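To see why an index helps, consider the in-memory analogy below. It is only an illustration of the idea and says nothing about how Zen stores its indexes; the names `TemperatureIndex`, `buildTemperatureIndex`, and `hottestPositions` are hypothetical. The index is kept in descending temperature order as records are added, so the hottest records can be read off the front without scanning or sorting anything at query time.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Maps a temperature to the position of its record, highest value first.
using TemperatureIndex =
    std::multimap<float, std::size_t, std::greater<float>>;

// Builds the index once; each insertion keeps the entries ordered.
TemperatureIndex buildTemperatureIndex(const std::vector<float>& temperatures)
{
    TemperatureIndex index;
    for (std::size_t i = 0; i < temperatures.size(); ++i)
        index.emplace(temperatures[i], i);
    return index;
}

// Returns the positions of the `count` hottest records by walking the
// front of the index; no other records are touched.
std::vector<std::size_t> hottestPositions(const TemperatureIndex& index,
                                          std::size_t count)
{
    std::vector<std::size_t> positions;
    for (auto it = index.begin();
         it != index.end() && positions.size() < count; ++it)
    {
        positions.push_back(it->second);
    }
    return positions;
}
```

The ordering cost is paid incrementally as data arrives rather than on every query, which is the trade an index makes.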

Wrapping Up

Ensuring reliable data persistence in a multiplatform environment is a nontrivial problem. While flat files appear to be an easy solution, they don’t address problems that can arise in an environment where the files must be shared by multiple processes or devices. Zen databases, in contrast, provide a reliable, fast, and portable solution for sharing and saving data.

To learn more about how Zen can help you manage data in your IoT environment, visit the Actian Zen Edge Data Management page.

You’ll find evaluation versions of Zen products available at the Actian Electronic Software Distribution site.