Big Data ain’t a Big Deal: Part 1, for absolute beginners.

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, IoT devices, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

Big data typically refers to collections of datasets that, due to size and complexity, are difficult to store, query, and manage using existing data management tools or data processing applications.

If that is still ambiguous to you, that’s fine. Big Data may be best defined by what it is not:

  • It’s not regular data; it’s not business as usual.
  • Big data is data that doesn’t fit well into a familiar analytic paradigm.
  • It won’t fit into the rows and columns of an Excel spreadsheet.
  • It can’t be analyzed with conventional multiple regression, and it probably won’t fit on your normal computer’s hard drive anyhow.

On the other hand, one way of describing big data is by looking at the three V’s: volume, velocity, and variety.

We’ll talk about some other possible V’s to consider later on.

Before we continue, watch this video; it might help.

https://www.youtube.com/watch?v=TzxmjbL-i4Y

Now let’s discuss the three V’s. Technically there are more than just three, but we will talk about the others later on.

  • 1- Volume

In its simplest possible definition, big data is data that’s just too big to work with on your computer. Obviously this is a relative definition: what’s big for one system at one time is commonplace for another system at another time. That’s the general point of Moore’s Law, the well-known observation that the number of transistors on a chip, and with it roughly the capacity and performance of computers, doubles about every two years. For example, my Mac Classic II, which got me through graduate school, had two megabytes of RAM and an 80-megabyte hard drive. As far as it was concerned, big data was something that would fit onto a one-dollar flash drive right now.

Spreadsheets show the same shift. The maximum number of rows in a single Excel sheet has changed over time: previously it was about 65,000 (65,536), and now it’s over a million (1,048,576). That seems like a lot, but if you’re logging internet activity where something can occur hundreds or thousands of times per second, you’ll reach your million rows very quickly.

Photos and video raise an entirely different issue if you need to have all of the information in memory at once. Even my Windows phone takes photos at two or three megabytes each and video at about 18 megabytes per minute, or roughly one gigabyte per hour. With a Red Epic video camera you could record up to 18 gigabytes per minute, and instantly you have very big data.

Now, some people call this “lots of data”, meaning it’s the same kind of data that we’re generally used to; there’s just a lot more of it. That brings us to the issues of velocity and variety. We’ll talk about velocity next.
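To make the row-limit point concrete, here is a rough back-of-the-envelope sketch in Python. The event rates are assumptions picked only for illustration, and `seconds_to_fill` is a hypothetical helper name; the one fixed number is Excel’s current limit of 1,048,576 rows per sheet.

```python
# Back-of-the-envelope: how long until an activity log fills one Excel sheet?
EXCEL_MAX_ROWS = 1_048_576  # current per-sheet row limit (2**20)


def seconds_to_fill(events_per_second: float, max_rows: int = EXCEL_MAX_ROWS) -> float:
    """Return how many seconds it takes to log max_rows events, one row per event."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return max_rows / events_per_second


# Hypothetical event rates, chosen only for illustration.
for rate in (100, 1_000, 10_000):
    minutes = seconds_to_fill(rate) / 60
    print(f"{rate:>6} events/s fills the sheet in about {minutes:.1f} minutes")
```

At a thousand events per second, the “over a million” rows last well under twenty minutes, which is why a spreadsheet stops being a realistic container for this kind of log.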

  • 2- Velocity

 

Velocity is about data that comes in very fast. In conventional scientific research, it could take months to gather data from 100 cases, weeks to analyze the data, and years to get that research published. Not only is this kind of data time-consuming to gather, it’s also generally static once it’s entered; that is, it doesn’t change.

If you’re interested in using data from a social media platform like Twitter, you may have to deal with the so-called “firehose”. In fact, right now they’re processing about 6,000 tweets globally per second.

That works out to about 500,000,000 tweets per day and about 200,000,000,000 tweets per year. A neat way to see this is with a live counter on the web. At the time of writing, Internet Live Stats shows about 341,000,000 tweets sent so far today, and the counter updates extremely quickly.
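The daily and yearly figures follow directly from the per-second rate. Here is a minimal sketch of that arithmetic in Python; it assumes the rate stays constant all day and all year, which real traffic does not, and `totals_from_rate` is just an illustrative name.

```python
# Converting a constant per-second rate into daily and yearly totals.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def totals_from_rate(per_second: int) -> tuple[int, int]:
    """Return (per_day, per_year) counts for a constant per-second rate."""
    per_day = per_second * SECONDS_PER_DAY
    per_year = per_day * 365
    return per_day, per_year


per_day, per_year = totals_from_rate(6_000)  # the rate quoted in the text
print(f"{per_day:,} tweets per day")    # 518,400,000
print(f"{per_year:,} tweets per year")  # 189,216,000,000
```

Rounded, that is the “500 million per day” and “200 billion per year” quoted above.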

Even a simple temperature sensor hooked up to an Arduino microcontroller through a serial connection, sending just one bit of data at a time, can eventually overwhelm a computer if left running long enough. This kind of constant influx of data, better known as streaming data, presents special challenges for analysis, because the data set itself is a moving target.

If you’re accustomed to working with static data sets in a program like SPSS or R (don’t worry, we will talk about them later on), note that the demands and complexities of streaming data can be very daunting.
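One common way to cope with a stream is to keep a small running summary instead of storing every reading. The sketch below uses Welford’s online method to track the mean and variance one value at a time. It is only an illustration: `fake_sensor` is a hypothetical stand-in that simulates readings, not real Arduino or serial-port code.

```python
import random


class RunningStats:
    """Count, mean, and variance updated one reading at a time (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two readings have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def fake_sensor(n: int, seed: int = 0):
    """Hypothetical stand-in for a temperature sensor: yields n simulated readings."""
    rng = random.Random(seed)
    for _ in range(n):
        yield 20.0 + rng.gauss(0, 0.5)


stats = RunningStats()
for reading in fake_sensor(10_000):
    stats.update(reading)  # each reading is used once, then discarded

print(f"readings={stats.count} mean={stats.mean:.2f} variance={stats.variance:.3f}")
```

Memory use stays constant no matter how long the sensor runs, because only three numbers are kept. The trade-off is that you can only answer the questions you decided to track in advance.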

  • 3- Variety

 

And now we get to the third aspect of big data: variety. What we mean here is that the data isn’t just the rows and columns of a nicely formatted data set in a spreadsheet. Instead, you can have many data sets in many different formats. You can have unstructured text, like books, blog posts, comments on news articles, and tweets. One researcher has estimated that 80 percent of enterprise data may be unstructured, so unstructured data is the majority, the common case. This can also include photos, video, and audio, as well as data sets built around networked graph data, such as social connections.

Or you may be dealing with data in what are called NoSQL databases, where you may have graphs of social connections, hierarchical structures, and documents. Any number of data formats that don’t fit well into the rows and columns of a conventional relational database or a spreadsheet can present very serious analytical challenges. In fact, a recent study by Forrester Research shows that variety is the biggest factor leading companies to big data solutions: variety was mentioned over four times as often as data volume.
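To get a feel for what variety means in practice, here is a small sketch in Python where the same kind of fact arrives in three shapes: a CSV row, a nested JSON document, and a free-text sentence. All of the sample records, field names, and helper functions are invented for illustration, and the text pattern is deliberately naive.

```python
import csv
import io
import json
import re
from typing import Optional

# Three invented samples describing the same kind of event in different shapes.
CSV_DATA = "user,city\nalice,Paris\n"                          # structured: rows and columns
JSON_DATA = '{"user": "bob", "location": {"city": "Cairo"}}'   # semi-structured: nested document
TEXT_DATA = "carol just checked in from Lima!"                 # unstructured: free text


def from_csv(raw: str) -> list:
    """Read every CSV row into a {'user', 'city'} record."""
    reader = csv.DictReader(io.StringIO(raw))
    return [{"user": row["user"], "city": row["city"]} for row in reader]


def from_json(raw: str) -> dict:
    """Flatten the nested JSON document into the same record shape."""
    doc = json.loads(raw)
    return {"user": doc["user"], "city": doc["location"]["city"]}


def from_text(raw: str) -> Optional[dict]:
    """Pull a record out of free text; real text needs far more robust handling."""
    match = re.match(r"(\w+) just checked in from (\w+)", raw)
    return {"user": match.group(1), "city": match.group(2)} if match else None


records = from_csv(CSV_DATA) + [from_json(JSON_DATA)]
text_record = from_text(TEXT_DATA)
if text_record is not None:
    records.append(text_record)

print(records)
```

Each format needs its own parser before the records can be compared at all, and the free-text parser is by far the most fragile of the three. That extra work, multiplied across many sources, is the challenge of variety.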

So now you might be asking yourselves: do you have to have all three V’s (volume, velocity, and variety) at once, or just one, to have Big Data? It may be true that if you have all three V’s at once, then you have Big Data, but any one of them can be too much for your standard approach to data. Really, what Big Data means is that you can’t use your standard approach with it.

As a result, Big Data can present a number of special challenges. We’ll be discussing those later, but first, let’s take a look at how Big Data is used and at some of the amazing things that are already being accomplished by using Big Data for research, for business, and even for the casual consumer.

Big Data applications for consumers.

Most of the time when you hear people talk about big data, they’re talking about it in a commercial setting: how businesses can use big data in advertising or marketing strategies. But one really important place where big data is also used is for consumers. What’s funny is that while the data is there, the algorithms are there, and the processing is incredibly sophisticated, it’s nearly invisible. The results are so clean that they give you just a little piece of information, but exactly what you need.

One small example of a Big Data application for consumers is Siri. For instance, when you ask “What’s the weather like?”, Siri actually knows what you mean, where you are, and what time you’re talking about. It can also look for restaurants serving a particular kind of food and see whether they have reservations available. It can do an enormous number of things that require the recommendations of other people, awareness of your location, and awareness of how people’s preferences change over time. So Siri is like a big data eater: she feeds on big data.

Big data plays an enormous role in providing valuable services, but again, with the irony that it operates invisibly by taking a huge amount of information from several different sources and distilling it into just two or three things that give you what you need.

Big Data in the business world.

Big data is revolutionizing the way people do commerce in an unusual and interesting way. The first thing we’re going to do is look at the place where most people have encountered big data in commerce: the ads shown with Google search results.

Whenever you search for something on Google or any other search engine, you type in your term and you get the results that you want, but you also get ads. For instance, when I search for “big data”, I get three ads at the top and a series of ads down the right side. Those ads are not placed at random. They are based first on the thing that I am currently searching for, but also on what Google knows about me.

Big Data in the research field.

Big data has been revolutionizing aspects of scholarship and research. I want to show you a few interesting examples of where big data has influenced scientific progress. The first is Google Flu Trends, which found that search patterns for flu-related words could identify outbreaks of the flu in the United States much faster than the Centers for Disease Control and Prevention (CDC) could. Similarly, a more recent project found that Wikipedia searches could identify outbreaks with even greater accuracy. And the National Institutes of Health created the BRAIN Initiative as a way of taking enormous numbers of brain scans to create a full map of brain functioning.

So this is the end of Part 1. See you next time, when we will talk about some more advanced topics in big data. 😉

 References:

Principles of Big Data preparing

Analyzing the analyzer

Techniques and concepts of Big Data

http://www.forbes.com/sites/oreillymedia/2012/01/19/volume-velocity-variety-what-you-need-to-know-about-big-data/

https://msdn.microsoft.com/en-us/library/dn749868.aspx

https://msdn.microsoft.com/en-us/library/dn749785.aspx

https://msdn.microsoft.com/en-us/library/dn749874.aspx
