The Breakpoints of Big Data

What is Big Data? Why does it matter to businesses? Steven Totman, Data Integration Business Unit Executive at Syncsort, briefly explains the whats and the whys.

Small data: Punch cards (above) were once the future of data processing!

Over dinner with some colleagues recently, we got into an interesting discussion about when data actually became “big data.” We decided that we could track it back to the 1970s, when data was stored on punch cards holding 80 bytes each. Having a “big data” problem at that time meant you needed a bigger cupboard for your cards, and you had paper cuts from handling too many of them!

In the 1980s, when 3.5-inch disks storing a massive 1.44 MB each were the norm, a “big data” problem meant your stack of disks, with Monkey Island and Wing Commander spread across 20 of them, fell over.

In the commercial world, IBM came out with the 3380, storing an amazing 2.5 GB. Compare that with today, when, as one of the team pointed out, Google processes approximately 23 petabytes of new data a day! So when did “big data” actually start breaking our IT infrastructures?

Exponential: each blue box holds around 172 BILLION times more data than a punch card.
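Those leaps are easy to state but hard to picture, so here is a quick back-of-the-envelope check of the ratios in Python. It assumes an 80-byte card, and the final figure is inferred from the caption's 172 billion multiplier rather than stated anywhere in the text:

    # Back-of-the-envelope arithmetic for the storage leaps above.
    # Capacities are the figures cited in the text; everything else
    # is inferred, not stated.
    PUNCH_CARD = 80                  # bytes: one 80-column card
    FLOPPY = 1.44 * 1024**2          # bytes: 3.5-inch high-density disk
    IBM_3380 = 2.5 * 1024**3         # bytes: IBM 3380 disk unit
    GOOGLE_DAILY = 23 * 1024**5      # bytes: ~23 PB of new data per day

    print(f"floppy / punch card: {FLOPPY / PUNCH_CARD:,.0f}x")      # ~18,874x
    print(f"3380 / floppy:       {IBM_3380 / FLOPPY:,.0f}x")        # ~1,778x
    print(f"Google day / 3380:   {GOOGLE_DAILY / IBM_3380:,.0f}x")  # ~9,646,899x
    # A unit holding 172 billion punch cards' worth of data comes to roughly:
    print(f"172e9 cards = {172e9 * PUNCH_CARD / 1024**4:,.1f} TB")  # ~12.5 TB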

Well, the reality is that “big data” has been breaking stuff for a while. We regularly see customers who find a mere terabyte of data terrifying.

This conversation reminded me of a recent discussion with the CIO of a telecommunications company, who explained that thanks to issues with doing ELT (where transformations are pushed down into the database rather than handled in a separate engine), he was going to have to ask the CFO for 40 per cent more nodes (at $500K a pop) on his data warehouse database just to handle the annual 10 per cent data growth.

When the CFO asks what he is getting for that $2 million (four additional nodes at $500K each), the CIO will have to tell him that he will continue to get the same report he got yesterday, with no improvements. Not surprisingly, this CIO was not exactly excited about presenting this “business case” to his CFO.
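To make the pushdown point concrete, here is a minimal sketch of the two patterns. The schema and numbers are hypothetical, and Python's built-in sqlite3 stands in for the warehouse; this illustrates the general pattern, not any particular product:

    # A minimal sketch of ELT pushdown versus transforming outside
    # the warehouse. Table and column names are hypothetical; sqlite3
    # stands in for the data warehouse database.
    import sqlite3
    from collections import defaultdict

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_calls (msisdn TEXT, duration_s INTEGER);
        INSERT INTO raw_calls VALUES ('447700900001', 120), ('447700900001', 60);
    """)

    # ELT / pushdown: the transformation (here, an aggregation) runs
    # INSIDE the database, consuming warehouse CPU. This is the work
    # that forces the warehouse to grow as data volumes grow.
    conn.execute("""
        CREATE TABLE call_summary AS
        SELECT msisdn, SUM(duration_s) AS total_s
        FROM raw_calls GROUP BY msisdn
    """)

    # Offloaded ETL: extract the raw rows and transform them OUTSIDE
    # the database, so the warehouse only stores and serves results.
    totals = defaultdict(int)
    for msisdn, duration in conn.execute("SELECT msisdn, duration_s FROM raw_calls"):
        totals[msisdn] += duration
    print(dict(totals))  # {'447700900001': 180}

In the pushdown case, every new terabyte of raw data translates directly into warehouse CPU demand, which is why growth in data volume turns into a request for more nodes.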

“Big data” that breaks IT infrastructures (especially ETL tools) has been a dirty little secret for years and is only now generating mainstream awareness. The number of customers using DMExpress to “accelerate” their existing ETL tools is testimony to that, and in my view “big data” is as valid a description for a five-person team with 10 terabytes of data as it is for a 500-person team with a petabyte.

Acceleration: Speed (like this Hawker Hunter) is critical for crunching big data.

If your company is combining data from multiple sources and it takes the IT team more than three months to add a new data source or create a new report, then chances are you have “big data.”

The good news is that ever since the 1970s, when punch cards were first being phased out, Syncsort has enabled customers to drop our software seamlessly into their existing environments to accelerate and solve “big data” problems.

If “big data” is a new name for a long existing problem, then we’ve been solving “big data” breakpoints for years.

Steven Totman is the EMEA Integration Business Unit Executive for Syncsort Incorporated. He has worked in the data integration space for over 15 years; prior to joining Syncsort, he was a key driver in the vision and creation of the IBM Information Server, working with more than 500 customers worldwide.

His areas of specialty include Data Governance, Data Integration, Metadata, ETL, SOA and Connectivity.

Totman holds several patents in data integration and metadata-related designs. Most recently he was a senior architect on IBM’s Information Agenda team, focusing on Central and Eastern Europe, the Middle East and Africa. Totman is based in the United Kingdom but supports customers worldwide.