Big Data – Introduction


There is a lot of buzz and emerging hype around Big Data, and I would like to begin a series of posts on everything, well, almost everything, there is to know about Big Data that can be captured, stored, searched, shared, analysed and visualised. I will by no means cover everything there is to know about Big Data in a running series of posts on this blog; Big Data is so huge that I could start an entire blog devoted to just this topic.

As this blog is about healthcare data and information issues, the issue of Big Data in healthcare is that there is a tremendous amount of data and information about the patient. I would like to think it is organised, but the reality is that it is not organised as well as it should be, and all that data is a mixture of structured and unstructured data. Here I agree with Joe Petro, senior vice president of healthcare research and development at Nuance Communications, who sums up the current state of big data. Petro believes that there is a tremendous amount of information within the institution, and that is a big data problem: you are trying to figure out what is going on and how to report on it, you are dying of thirst in a sea of information, and the issue is how to tap into that information to make sense of what is going on.

Big Data is everywhere, not just in healthcare but in many other sectors of the global economy as well.


Data is separated among hospital systems: clinical components, laboratories and radiology are all separate repositories of information, used to provide clinical care, scheduling information or operational information. The main issue lies in leveraging all of this data, and often there is a problem when we want these systems to talk to each other. An organisation can also end up with redundant information due to a legacy system, a system we may continue to use, sometimes well past its vendor-supported lifetime, resulting in support and maintenance challenges. It may be that the system still meets users' needs, even though newer technology or more efficient ways of performing a task are now available.


Big Data thus refers to sets of data so large, awkward and complex that traditional database management tools struggle to capture, store, analyse and share the information. Difficulties include capture, storage, search, sharing, analytics and visualisation.

IBM (IBM 2012) claims that “every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few.” IBM adds that “This data is big data.”

Big data spans three dimensions, sometimes referred to as the three "Vs": Volume, Velocity and Variety. IBM, however, says Big Data spans four dimensions, adding a fourth V: Volume, Velocity, Variety and Veracity.

I shall leave this post for now with this infographic, and continue with the "Vs" and Big Data basics in the next post on Big Data.


Fernandes, L, O'Connor, M & Weaver, V 2012, Big Data, Bigger Outcomes, American Health Information Management Association, viewed 18 November 2012,
< >

McNickle, M 2012, 5 basics of big data, Healthcare IT News, viewed 18 November 2012,
< >

What is big data? 2012, International Business Machines Corporation (IBM), viewed 18 November 2012, < >
