The Best Blogs Every Data Analyst Should Follow


Advanced Analytics is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI), to discover deeper insights, make predictions, or generate recommendations.

Advanced Analytics:

While the traditional analytical tools that comprise basic business intelligence (BI) examine historical data, tools for advanced analytics focus on forecasting future events and behaviors, enabling businesses to conduct what-if analyses to predict the effects of potential changes in business strategies.

What Are Advanced Analytics?

More specifically, there are a number of factors inherent in the concept of advanced analytics:

  • Data and text mining may be used to find specific trends or pieces of data.
  • Visualization is used to turn existing information into visual images that show trends, comparisons, and other statistical patterns.
  • Cluster analysis groups similar pieces of data together and separates them from dissimilar groups, which makes comparisons more effective (see the sketch after this list).
  • Predictive analytics uses techniques from data mining, machine learning, statistical analysis, and other fields to generate highly accurate predictions about future business trends.
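
To make the cluster analysis idea concrete, here is a minimal sketch using scikit-learn's KMeans; the customer features, values, and number of clusters are invented assumptions for illustration only, not something prescribed by the techniques above.

# Minimal cluster-analysis sketch (illustrative; feature values are invented)
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [annual_spend, visits_per_month]
customers = np.array([
    [1200, 2], [1500, 3], [300, 1],
    [250, 1], [5000, 10], [4800, 12],
])

# Group similar customers together and separate them from dissimilar groups
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)

print(model.labels_)           # which cluster each customer belongs to
print(model.cluster_centers_)  # the "average" customer of each segment

Comparing segments rather than individual customers is what makes the downstream comparisons effective.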

Advantages of Advanced Analytics in Business

It has been predicted that, by 2018, more than half of large organizations would compete using advanced analytics and proprietary algorithms. Despite this, data analytics is sometimes seen by businesses as being too complicated. Simply put, many businesses see analytics as too costly, time-consuming, and inaccurate to justify more than a token effort at implementation.

The Truth About Data Analytics:

Data analytics is designed for businesses that want to make good use of the data they take in. Businesses that can use data analytics properly are more likely than others to succeed and thrive. Of all the advantages of data analytics, the key benefits can be summed up as follows:

  • Data analytics reduces the costs associated with running a business.
  • It cuts down on the time needed to come to strategy-defining decisions.
  • Data analytics helps to define customer trends more accurately.

Determining the Effectiveness of Your Analytics Program

Given the growing familiarity and popularity of data analytics, there are a number of advanced analytics programs available on the market. As such, there are certain traits to look for in any analytics solution that will help you gauge just how effective it will be in improving your business.

 

However, if I may be a little biased, I would like you to read one of my blogs, which debunks some major myths about data analytics.

Even though data analytics has people in every industry talking, many still treat analytics and big data as something they can ignore, when in reality they are about to be run over by the steamroller that is data analytics.

The volume of data is exploding; more data has been created in the past two years than in the entire previous history of the human race.

Many of the early adopters of big data and analytics faced challenges in putting the right implementation plan in place and, unfortunately, experienced a poor return on investment. Since then, we have seen a lot of improvement in this area and have overcome many of the shortfalls. What still prevails, however, is a series of myths surrounding the discipline of data analytics that some people feel still hold some truth.

Read more: Error "Query Couldn't Return All Data" When Connecting to Google Analytics

Recessions, Big Data, Data Science and Liberating Power of “Own It”

There, I said it. I said the “R” word. And no, I’m not talking about the political “R” word. I’m talking about the potential of a… r-e-c-e-s-s-i-o-n. There are many indicators pointing to the potential of a worldwide recession, and unfortunately during a recession, many organizations hunker down, cut spending, and try to ride it out – all the wrong things to do if you actually want to avoid a recession.

Figure 1: “What Is a Recession? Examples, Impact, Benefits”

Leading organizations, however, see a recession as an opportunity: a chance to capture new customers, expand the value-creation ecosystem that grows market share, and sweep up the best talent, focusing it on deriving and driving new sources of customer, product and operational value.

Yes, a recession separates the “sheep” organizations – where management is just worried about surviving the “musical chairs” of desperation – from the “wolf” organizations where management aggressively seeks opportunities to embrace innovation to create new sources of customer and market differentiation. 

Let’s explore how your organization can become the wolf.

A Recession Forces a Focus of Urgency

A recession forces a focus on creating a sense of urgency. It creates a focus on delivering measurable and material business value in the next 9 to 12 months. It short-circuits those AI and Machine Learning “science experiments” and gives these initiatives a kick in the ass to start delivering measurable and meaningful business value today!

The Big Data Business Model Maturity Index provides a roadmap for those organizations who are serious about leveraging data and analytics to power their business models…wannabes need not apply (see Figure 2).

Figure 2: Big Data Business Model Maturity Index

There is no need for organizations to commit to big-bang technology investments and hope that something of value squirts out at the end.

A Recession Forces a Focus on Collaboration

A recession forces a focus on embracing an IT-business collaborative engagement methodology focused on identifying, validating, valuing and prioritizing the organization’s key business and operational use cases. A key to creating an effective and efficient data science community is to teach your Business Stakeholders to “think like a data scientist,” which enables them to understand how best to collaborate with a data scientist and a data engineer to uncover the customer, product, service and operational insights that will drive business success (see Figure 3).

Figure 3: “The Art of Thinking Like a Data Scientist”

Data science is a team sport composed of Data Engineers, Data Scientists and Business Stakeholders. And like a baseball team that can’t win with only shortstops and catchers, one’s data science initiative MUST clearly articulate the team’s roles, responsibilities and expectations. If the goal of your organization is to become more effective at leveraging data and analytics to power your business models, you can’t win that game with a team full of pitchers.

A Recession Forces a Focus on Value Creation

A recession forces a focus on a value engineering framework that delivers on the promise of the “4 M’s of Big Data”: “Make Me More Money!” To drive the focus on value, we use the Data Science Value Engineering Framework (see Figure 4).

Figure 4:  Data Science Value Engineering Framework

The Data Science Value Engineering Framework starts with the identification of a key business initiative that not only determines the sources of value, but also provides the framework for a laser focus on delivering business value and relevance in the immediate term.

The heart of the Data Science Value Engineering Framework is the collaboration with the different stakeholders to identify, validate, value and prioritize the key decisions (use cases) that they need to make in support of the targeted business initiative.

A Recession Forces a Focus on Sharing and Re-using

A recession forces a focus on sharing, re-using and refining your data and analytic assets – assets that never deplete, never wear out and can be used across an infinite number of use cases at near-zero marginal cost – in order to accelerate time-to-value and de-risk analytics-driven business initiatives (see Figure 5).

Figure 5: Economic Digital Asset Valuation Theorem

The Economic Digital Asset Valuation Theorem exploits the “Economics of Learning,” which rewards those organizations that take an incremental approach to building out their data and analytics capabilities, yielding business value by 1) accelerating time-to-value (by monetizing incremental learning) while 2) de-risking business investments.

In the digital era, the “economies of learning” are more important than the “economies of scale.”

Summary: The Liberating Power of “Owning It”!

Organizations, like people, can choose their own destiny but only if they are willing to “own” their current situation.  If you are a victim of the consequences of others (the sheep), then you have abdicated control to others. But if you “own” the situation (the wolf), then you put yourself in control.

Embrace the organizational and personal liberating power of “owning it” and put yourself and your organization in control of your own destiny.

Figure 6: The Liberating Power of Owning it

Blog points:

  • During a recession, the sheep huddle together trying to hang on for survival. But the wolf sees an opportunity to attack aggressively by creating new sources of customer, product and operational value; the wolf owns the situation. Which are you?
  • A recession forces a focus… on creating a sense of urgency: delivering material business value in the next 9 to 12 months. It short-circuits those AI and Machine Learning “science experiments” and gives these initiatives a kick in the ass to start delivering measurable and meaningful business value today!
  • A recession forces a focus… on embracing a business-IT collaborative engagement methodology that is focused on identifying, validating, valuing and prioritizing the organization’s key business and operational use cases.
  • A recession forces a focus… on value creation by aggressively adopting a value engineering framework that delivers on the promise of the “4 M’s of Big Data.”
  • A recession forces a focus… on sharing, re-using and refining data and analytic assets – assets that never deplete, never wear out and can be used across an infinite number of use cases at near-zero marginal cost – in order to accelerate time-to-value and de-risk analytics-driven business initiatives.
  • Organizations, like people, can choose their own destiny but only if they are willing to “own” their current situation. If you are a victim of the consequences of others (the sheep), then you have abdicated control to others. But if you “own” the situation (the wolf), then you put yourself in control.
  • Read more: Schmarzo’s Big, Hairy, Audacious Data Analytics Predictions for 2020

What is BIG DATA? Introduction, Types, Characteristics & Example

What is Data?

Data refers to the quantities, characters, or symbols on which operations are performed by a computer, and which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

What is Big Data?

Big Data is also data, but of a huge size. Big Data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.


  Examples Of Big Data

The following are some examples of Big Data:


Social Media

Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated from photo and video uploads, message exchanges, comments, etc.


A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
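
As a rough back-of-the-envelope check of that claim (the flight count and average flight duration below are assumed figures, not from the text):

tb_per_30min = 10                 # terabytes generated by one engine in 30 minutes
flights_per_day = 25_000          # assumed number of flights per day
flight_hours = 2                  # assumed average flight duration
tb_per_flight = tb_per_30min * 2 * flight_hours   # 40 TB per engine per flight
petabytes_per_day = flights_per_day * tb_per_flight / 1_000
print(petabytes_per_day)          # roughly 1,000 PB per day, i.e. "many petabytes"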


  Types Of Big Data

Big Data can be found in three forms:

  1. Structured
  2. Unstructured
  3. Semi-structured

Structured

Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Over time, computer science talent has achieved great success in developing techniques for working with such data (where the format is well known in advance) and deriving value from it. Nowadays, however, we are foreseeing issues as the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.

Do you know? 10^21 bytes (one billion terabytes) equal one zettabyte.

Looking at these figures one can easily understand why the name Big Data is given and imagine the challenges involved in its storage and processing.
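
A quick sanity check of that figure, as plain arithmetic:

zettabyte = 10 ** 21           # bytes in one zettabyte
terabyte = 10 ** 12            # bytes in one terabyte
print(zettabyte // terabyte)   # 1,000,000,000 -> one billion terabytes per zettabyte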

Do you know? Data stored in a relational database management system is one example of a 'structured' data.

Examples Of Structured Data

An 'Employee' table in a database is an example of Structured Data.

Employee_ID | Employee_Name   | Gender | Department | Salary_In_lacs
2365        | Rajesh Kulkarni | Male   | Finance    | 650000
3398        | Pratibha Joshi  | Female | Admin      | 650000
7465        | Shushil Roy     | Male   | Admin      | 500000
7500        | Shubhojit Das   | Male   | Finance    | 500000
7699        | Priya Sane      | Female | Finance    | 550000
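
Because the format is fixed and known in advance, structured data like the table above is straightforward to query. Below is a minimal sketch using Python's built-in sqlite3 module; only the first three rows of the example table are loaded, and the in-memory database is purely illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    "Employee_ID INT, Employee_Name TEXT, Gender TEXT, "
    "Department TEXT, Salary_In_lacs INT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
        (3398, "Pratibha Joshi", "Female", "Admin", 650000),
        (7465, "Shushil Roy", "Male", "Admin", 500000),
    ],
)
# A well-known, fixed schema makes querying straightforward
for row in conn.execute(
    "SELECT Employee_Name, Salary_In_lacs FROM Employee WHERE Department = 'Finance'"
):
    print(row)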

Unstructured

Any data whose form or structure is unknown is classified as unstructured data. In addition to being huge in size, unstructured data poses multiple challenges when it comes to processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Nowadays, organizations have a wealth of data available to them but, unfortunately, don't know how to derive value from it, since this data is in its raw, unstructured form.

Examples Of Un-structured Data

The output returned by 'Google Search'


Semi-structured

Semi-structured data can contain both forms of data. Semi-structured data appears structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. An example of semi-structured data is data represented in an XML file.

Examples Of Semi-structured Data

Personal data stored in an XML file-

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
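
A minimal sketch of reading those records with Python's standard library; the surrounding <people> root element is an assumption added here so the snippet is well-formed XML.

import xml.etree.ElementTree as ET

xml_data = """<people>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</people>"""

root = ET.fromstring(xml_data)
for rec in root.findall("rec"):
    # Tags give each record some structure, but there is no fixed schema
    print(rec.findtext("name"), rec.findtext("age"))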

Data Growth over the years


Please note that web application data, which is unstructured, consists of log files, transaction history files, etc. OLTP systems are built to work with structured data, wherein data is stored in relations (tables).

  Characteristics Of Big Data

(i) Volume – The name Big Data itself relates to a size that is enormous. The size of data plays a very crucial role in determining its value. Whether particular data can actually be considered Big Data or not also depends on its volume. Hence, 'volume' is one characteristic that needs to be considered when dealing with Big Data.

(ii) Variety – The next aspect of Big Data is its variety.

Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also being considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.

(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential of the data.

Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks and social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous (a toy sketch of consuming such a stream appears after the list of characteristics below).

(iv) Variability – This refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.
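
As a toy illustration of velocity (the event sources and counts below are invented), the sketch consumes a simulated stream of events one at a time rather than waiting for a batch:

import random
from collections import Counter

def event_stream(n=20):
    # Simulate events arriving continuously from logs, sensors, mobile devices, etc.
    sources = ["application_log", "sensor", "social_media", "mobile_device"]
    for _ in range(n):
        yield random.choice(sources)

counts = Counter()
for source in event_stream():
    counts[source] += 1   # process each event as it flows in, not in batches
print(counts)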

  Benefits of Big Data Processing

The ability to process Big Data brings multiple benefits, such as:

    • Businesses can utilize outside intelligence while making decisions

Access to social data from search engines and sites like Facebook and Twitter is enabling organizations to fine-tune their business strategies.

    • Improved customer service

Traditional customer feedback systems are being replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are used to read and evaluate consumer responses (a toy sketch of this idea follows this list).

    • Early identification of risk to the product/services, if any
    • Better operational efficiency

Big Data technologies can be used to create a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, such integration of Big Data technologies with a data warehouse helps an organization offload infrequently accessed data.
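
The snippet below is a toy sketch of the idea of automatically reading and scoring consumer responses at scale; it uses simple keyword counting rather than real natural language processing, and the word lists and feedback strings are invented.

POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "poor", "refund"}

def score(response):
    # Positive words add to the score, negative words subtract from it
    words = response.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

feedback = [
    "Great service, very helpful staff",
    "Delivery was slow and the box arrived broken",
]
for text in feedback:
    print(score(text), text)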

Summary

  • Big Data is data that is huge in size: a collection of data that is huge in volume and yet growing exponentially with time.
  • Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc.
  • Big Data can be 1) structured, 2) unstructured, or 3) semi-structured.
  • Volume, Variety, Velocity, and Variability are a few characteristics of Big Data.
  • Improved customer service, better operational efficiency, and better decision-making are a few advantages of Big Data.

Read more: Big data behind rapid response to Beijing market coronavirus cluster

Big data behind rapid response to Beijing market coronavirus cluster

  • Concerns raised about the use of surveillance technology in the fight against Covid-19 and how it will be used in future
  • Chinese social media users share their experiences of being contacted about the need to get tested because of the digital tools

    “Have you taken the nucleic acid test?” has become a popular greeting in Beijing, where a Covid-19 outbreak linked to a wholesale food market has been brought under rapid control through widespread testing.

    More than 3 million people – about 15 per cent of the Chinese capital’s population – have been tested for the disease since the first cases in the new outbreak emerged on June 11, according to the municipal government. Just eight days after the first infection was identified, Beijing declared the transmission had been controlled.

    “We will continue to see more infected people in the near future but the disease is under control,” Wu Zunyou, chief epidemiologist of the Chinese Centre for Disease Control and Prevention, said on June 18, while also paying tribute to the city’s response. “Beijing’s prompt handling and effective control made a remarkable contribution,” he said.

    As of Saturday, the number of patients in Beijing had climbed to 297, most of them related to the Xinfadi market in the city’s southwest district of Fengtai. But, while mass tracing and testing quickly brought the cluster under control, the role of “big data” has raised concerns, along with questions over whether the surveillance measures will be rescinded once the pandemic has passed.

    The city urged people who worked at the sprawling market, and those who had visited it since May 30, to get tested – as well as nearby residents and employees of restaurants, grocery stores, wholesale markets and food delivery companies.

    While many people were registered for testing by their employers or residential communities because of known links to the market, Fu Juan, 38, said she was spotted by big data. The process was quick but unnerving, she said, and began with a “suspicious” phone call from someone claiming to work for a disease control unit of the Beijing municipal government.

     
     

    “I was told that big data showed that I had been to Xinfadi recently, hence I should register with my neighbourhood to get a nucleic acid test as soon as possible. My first impression is that it must be a fraud. I’ve never shopped at Xinfadi,” she said.

     

    “Then my husband reminded me. I had picked him up somewhere 3km (1.8 miles) away from Xinfadi several days before. But I was in the car all the time.”
     
     

    Before she had an opportunity to check whether the call was genuine, community cadres had knocked on her door to obtain her identity information and persuaded her to get tested. The next day, it was arranged for her to attend a testing site at a stadium and, one day later, Fu received the result, which was negative.

     

    “The whole process was impressively fast,” Fu said. “When I was lining up for the test with thousands of people in the stadium, I was shocked by the capability, that China can identify so many people so quickly and get them tested.”

     

    In the neighbouring municipality of Tianjin, Wu Zhengyu, a 51-year-old teacher at a chess training centre, was also required last week to test for Covid-19 after returning from Beijing in early June. “I was in Beijing before the first case was reported. I probably passed by the Xinfadi area in the subway, but I’ve never been to the market,” Wu said.

    “However, I was told big data had spotted me and unless I was tested, my daughter could not go to school.” Wu said the test had cost him 200 yuan (US$28) and he was unable to continue teaching while under home quarantine. “I feel helpless. But who can I complain to? I was lectured by community cadres that all was for the sake of coronavirus control.”