The ability to collect massive amounts of data from a variety of sources has become the latest tool for trend spotting, predictive modeling, forecasting, parsing information and more. Information is power, and Big Data promises substantial, significant data that can be used across all ecosystems: business, communications, social media, entertainment, medicine, government, and others.
Still, Big Data has yet to be fully vetted. It is still very much an evolving phenomenon, moving at a rather dawdling pace. That is not for lack of trying. Simply put, developing Big Data platforms is daunting. There is also a shortage of talent, namely data architects and analysts.
There are also a number of approaches to how Big Data can be analyzed. Hadoop, an open source distributed data processing framework, is perhaps the most visible, but other big players with a stake in it include HP, Microsoft, and IBM. Smaller, more niche players include Infobright, Kognitio, ParAccel, and Teradata.
However, the fact that this is an emerging platform with little consistency among analysis packages opens it up to any number of security vulnerabilities.
Today, data can be processed wherever resources are available: locally (the edge), in the cloud, in the fog, and, of course, at the core. Larger platforms, such as the cloud, create a diverse setting that supports and enables massively parallel computation. This means there is a burgeoning opportunity for exploits, with plenty of attack surfaces available. Such an environment makes it extremely difficult to implement and maintain security consistently across a highly distributed cluster of heterogeneous platforms.
One of the underlying issues is that there are fundamental architectural security obstacles inherent to Big Data. While they are not specific to any one Big Data platform, they are inherent to any Big Data project. Why? Because the only effective way to process large volumes of data is to use a distributed computing architecture, and most of the distributed computation models being applied rely on simple programming models and open framework architectures.
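The "simple programming model" most closely associated with distributed Big Data processing is MapReduce, the model Hadoop popularized. As a rough illustration of the idea, here is a minimal single-machine sketch in Python; a real framework would distribute the map and reduce phases across many cluster nodes, which is exactly why the attack surface grows with the data:

```python
from collections import defaultdict

def map_phase(records):
    # Map: each input record is turned into (key, value) pairs.
    # Here: count word occurrences in lines of text.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle/reduce: group values by key and aggregate them.
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

records = ["big data needs security", "big data is big"]
counts = reduce_phase(map_phase(records))
```

In a production cluster, each phase runs on whichever nodes hold the data, so sensitive records pass through many machines and network links, each of which must be secured.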
In trying to add security to the Big Data environment, one problem is that security capabilities need to scale with the data. That is a real challenge for “bolt-on” security platforms because most cannot scale sufficiently. Some can scale to address the control points, but the clusters used in Big Data analysis remain largely unprotected. This is a relatively complex discussion that will be covered in future articles.
Another challenge is that such data is fluid. There are likely multiple copies of data active simultaneously, with much of it being shared. That makes it difficult to know precisely where any given data may be at any given time within massive computing environments. Big Data is often replicated at multiple locations and stratified across multiple systems for analysis.
There are other challenges as well; more than there is room to discuss in this article. But the dynamic operations needed to fully realize the capabilities of Big Data are difficult to rein in. Big Data brings so many new paradigms that we have only begun to grasp all the challenges.
By default, the "big" in Big Data presents a variety of security issues, at a number of levels. Because there is so much data to sift through, sheer volume is the biggest problem, both from a management perspective and a security perspective. And the fact that much of it is unstructured makes it that much more challenging.
That means security for Big Data has to be approached from a different angle than for typical, structured data. With Big Data, there are more factors to consider when looking at its security. For example, how to scale and store it is one significant issue; another is analysis.
As the amount of data being analyzed or stored grows, there is pressure to analyze it faster, cheaper, and with less power. One place to do that is in the infrastructure. How that relates to security, from a fraud detection perspective, is how quickly the data can be analyzed: the less time the data is exposed and being manipulated, the smaller the window for compromise. Big Data requires special algorithms, and special algorithms require unique security approaches.
Big vs. Traditional Data
With Big Data, the basic premise is not that much different from trying to protect regular data. The complexity, however, lies in the algorithms, applications, and intellectual property. Since Big Data requires a new set of metrics in that vein, new methods of protecting keys have to be developed.
Big Data provides many more opportunities for a hacker to get in, because it has so many connection points. Some may be in the cloud, others in the fog, still others on local servers.
Somehow, all of these potential vulnerabilities have to be identified and secured. The security techniques applied are often relative to the circumstances. Nevertheless, the hardware that handles the data should always be secured, and keys are a tried and true method for doing that, at any level.
Big Data and the Cloud
Hardware security modules (HSMs) are a rather elegant solution for many of those virtual networks that have data scattered all over the cloud. They work well for Big Data networks too, whether localized or cloud-based, because the models are similar in many circumstances.
In the past, the view of the enterprise network has been one of a perimeter-bounded, physically isolated entity that is specific to a particular enterprise.
However, that model is changing. Of late, the data segment has come to realize that the vision of a hard perimeter with a "soft" underbelly no longer holds. That is especially true with the evolution of unified communications (UC) and subsets like bring your own device/technology (BYOD/T). Such elements make the securable perimeter a thing of the past.
The cloud, Big Data, UC, virtual networks and peripheral platforms have brought to light that security must now be moved to the network core and remote elements, wherever and whatever those are.
One security technology deployed lately to address the issues of fluid data, apps, and services is the HSM. For a while, it was thought of as the "golden child" of network security, and it worked extremely well for physical networks. Unfortunately, the blush came off the rose as things went virtual.
HSMs are essentially black boxes that contain and protect sensitive keys. They integrate a dedicated crypto processor whose job is to protect those keys, managing the processing and storage of cryptographic keys inside a hardened, tamper-resistant device. They can take the form of a plug-in card or an add-on appliance at the server.
An HSM is managed separately from the server, so a compromise of the server will not compromise the HSM. HSMs provide services such as secure management of private keys; hardware-based crypto operations (random number generation, digital signatures, key generation); hardware protection of private keys via secure asymmetric cryptographic operations; and offloading of processor-intensive cryptographic operations. They can also provide services such as hashing and message authentication.
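To make those services concrete, here is a software sketch in Python of the kinds of primitives an HSM performs. This is only an illustration: in a real deployment the key is generated by the device's hardware RNG and never leaves the tamper-resistant boundary, and applications talk to the HSM through an interface such as PKCS#11 rather than a local software library.

```python
import hashlib
import hmac
import secrets

# Key generation: an HSM generates keys with a hardware RNG;
# Python's `secrets` module stands in for that here.
key = secrets.token_bytes(32)

# Hashing service: integrity digest of a record.
digest = hashlib.sha256(b"big data record").hexdigest()

# Message authentication: in an HSM, the key stays inside the device;
# callers submit data and receive the MAC back.
mac = hmac.new(key, b"big data record", hashlib.sha256).hexdigest()

# Verification should always use a constant-time comparison
# to avoid timing side channels.
expected = hmac.new(key, b"big data record", hashlib.sha256).hexdigest()
valid = hmac.compare_digest(mac, expected)
```

The design point the HSM model enforces is separation: the server only ever sees inputs and outputs, never the key material itself.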
HSMs are usually protected by a multi-layered hardware platform and generally use software tokens for additional security.
As the cloud is becoming the prevailing element for data storage, HSMs are evolving as well. There is a move to offer cloud HSM services. Amazon is actually at the front of that movement, with Microsoft not far behind.
Essentially, cloud HSMs offer the same type of platform as their hardware brethren. The deployment is not all that new: you use a VPN, or another private portal, to gain access to the service, wherever it may be, and to upload and store your keys. The service provider cannot access the data, the keys, or the tunnel.
That seems like a really good approach, at least for the key angle. However, cloud HSM adoption has not been without challenges, even if solutions have expanded in the last couple of years. For one, they are expensive. For another, there have been breaches of HSMs. And, the hybrid cloud is also posing new challenges to HSMs. Still, their integration has seen movement and they continue to evolve.
Big Data is one big issue! Many organizations that are trying to work with Big Data are finding it complex, convoluted, and expensive. On top of that, few, if any, have any type of grasp on just what it takes to make security a “first-class design parameter.”
Security, not just for Big Data but for all data, everywhere, has to become a high-priority requirement that is designed in, architecturally, from day one. Big Data has some unique challenges, but no doubt, as time goes on, they will be addressed. Here is an opportunity to marry security to the technology on the ground floor. Let us see if that happens.