Trends in Big Data

The last 10 years of “data management” have been very exciting indeed. We have seen giant leaps in processing power, storage, and data collection technologies. These leaps have allowed us to build projects, companies, and enterprises faster than ever before. There is, however, one nagging question that must be addressed, and it has been the elephant in the room for some time: who is paying for this?

The 3 Vs

Throughout the last 2 decades, we have lived on the tenets of Big Data – the 3 Vs – Volume, Variety, and Velocity. How much can we store? How do we collect and curate that data? How fast can we process and publish that data? These challenges will never go away. We can accept this and move forward.

The New 6 Vs

The face of business is changing as we head into 2020. It is becoming clear that a new world of Big Data is emerging. We now have 6 Vs: the original 3 Vs plus three more that focus on the business side of Big Data. Here are the new rules of the game: Veracity, Value, and Virtue.

Veracity

Data Veracity is a constant problem. This relates to all of the errors we create for ourselves. Simply ask yourself if you have ever entered data incorrectly on a web form. Do you enter addresses, phone numbers, postal codes in the correct format – are you sure? Surprisingly, a large majority of people don’t know how to enter data correctly or are too lazy to do so.

In 2016 IBM estimated the cost of poor data quality in the US at $3.1 trillion per year. That’s a staggering number that can be attributed to invalid user input, poor user interface design and form validation, poor data collection techniques, poor database architecture, and inconsistent data interchange formats. This is no small list of things to consider when building Big Data technologies and infrastructure. Even the best, brightest, or biggest companies can end up with snags and glitches.
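To make the form validation point concrete, here is a minimal sketch of cleaning two commonly mis-entered fields. It assumes Canadian-style postal codes and 10-digit North American phone numbers; the function names are hypothetical, and the postal code pattern is a simplification that does not check which letters are actually in use.

```python
import re

# Assumed format for illustration: letter-digit-letter digit-letter-digit.
POSTAL_RE = re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$")


def normalize_postal_code(raw):
    """Return the code as 'A1A 1A1', or None if it cannot be repaired."""
    # Remove spaces and hyphens, then upper-case what is left.
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not POSTAL_RE.match(compact):
        return None
    return compact[:3] + " " + compact[3:]


def normalize_phone(raw):
    """Return the number as '705-555-0123', or None if it is not 10 digits."""
    # Keep digits only, so "(705) 555-0123" and "705.555.0123" look alike.
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```

Returning None instead of guessing keeps bad values visible, so they can be sent back to the user or flagged for review rather than silently stored.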

About 15 years ago Google and Microsoft gave up on the idea of forcing users to enter contact information correctly. They opted for a simple text box for address and phone numbers. The user will be educated when they attempt to use the myriad of tools supplied by these companies. If the user did not enter data correctly, the tools will not work. Simple fix? Not so fast.

The process of extracting valid and veracious data is a daunting one. There is a market for this and companies are making moves here. It’s not perfect yet but we are moving.  

Value

The Value prospect is always critical to the success of a company. Companies that produce and mine data will be the biggest beneficiaries of Value. If you are mining data, you acquire it at a cost far less than the market value and sell it to consumers at a high mark-up.

Content is still king: own it, build it, sell it. Data mining is a key element to the success of a business. How do companies make money here? Getting started is quite simple. Start with a Sales contact list and find out where customers come from. Go deeper and ask whether your buyers come from areas of affluence or, conversely, less privileged areas. These factors will influence buying patterns. For example, an automotive finance company may be very interested to know why a region is underrepresented. A proper and informed analysis may show that there are more foreclosures and repossessions in that area and, as a result, that it would not be a good area to sell to.
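As a rough sketch of that first step, the snippet below counts customers per region from a contact list and flags regions that look underrepresented. The field name "region", the household counts, and the "half of the median" threshold are all assumptions made for illustration, not an established method.

```python
import statistics
from collections import Counter


def customers_per_1000_households(contacts, households):
    """Customers per 1,000 households for every region in `households`."""
    counts = Counter(contact["region"] for contact in contacts)
    return {
        region: 1000 * counts.get(region, 0) / total
        for region, total in households.items()
        if total > 0
    }


def underrepresented(rates, fraction=0.5):
    """Regions whose rate falls below `fraction` of the median rate."""
    median = statistics.median(rates.values())
    return sorted(region for region, rate in rates.items()
                  if rate < fraction * median)


# Hypothetical contact list and household counts.
contacts = (
    [{"region": "North"}] * 8
    + [{"region": "East"}] * 6
    + [{"region": "South"}] * 1
)
households = {"North": 2000, "East": 1500, "South": 2000}

rates = customers_per_1000_households(contacts, households)
flagged = underrepresented(rates)
```

A flagged region is only a prompt for the deeper analysis described above; the count alone does not explain why customers are missing.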

Operations (Ops) data is the next area for industries to focus on. This is the data that communicates the cost of production. Sales and Operations (S&Op) data is the key ingredient of a successful business pipeline. S&Op tells the story of the cost of sales, the production process, shortfalls, and overstocking. This data can indicate the health of your business. If S&Op is managed correctly, it can produce results that meet customer needs and maintain durable production levels with few shortfalls.

IoT devices are breaking new ground in the value sector. By 2025 the volume of IoT devices is expected to reach 75 billion. Connected devices allow greater communication and connection to the future needs of consumers. This opens an enormous potential for “Smart” industries.

The Canadian government launched a program through Next Generation Manufacturing Canada (NGEN). Their focus is to bring Smart technologies to the manufacturing sector with a $1 billion federally funded program for new technology innovations (https://www.ngen.ca/).

Virtue

Virtue in Big Data is the most important topic of the next decade. How will our data be used? Will it make us better or further separate the masses?

Europe's General Data Protection Regulation (GDPR), adopted in 2016 and enforceable since May 2018, introduced a set of rules for data governance. This dramatically changed the way data handlers use and manage their products. The state of electronic business affairs for much of the world changed almost overnight. Most companies ran in fear and locked down and secured every aspect of technology that touches personal data.

Sadly, the most visible consumer-facing product of GDPR was the introduction of the “Do you accept cookies?” button on every website – Yes, I will accept your cookies. I will admit, I have pressed this button too many times. I am confident that this new feature simply urged us to sign our lives away yet again.

The introduction of these laws has clearly opened a need to interpret them. We do need more people to establish boundaries in data governance; however, we also need educated people who can communicate data without prejudice and find solutions that allow many of these opportunities to flourish.

You may have heard the assertion, “if the product is free, you are the product”. Throughout the last 10 years, Social Media giants have been in a lot of hot water over the collection and sale of personal data. The algorithms used by these companies are so good that they predict our need for products before we even recognize it ourselves.

It has become very common for companies to gamble on their use of personal data. Some will weigh the cost of a lawsuit against the money they will make from the sale of the data.

Data collection products use techniques that draw out small parts of your personal data profile. That data is sold to other entities, who use it alongside a related dataset to fill in the missing areas of their own data profile. None of these entities will have the whole picture but, combined, they can build a product that suits their needs.
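The mechanics of combining partial datasets are simple, which is part of the concern. The sketch below merges records that share a common key; the key ("email"), the field names, and the sample records are all hypothetical.

```python
def merge_profiles(*datasets, key="email"):
    """Combine partial records that share `key` into one profile each."""
    merged = {}
    for dataset in datasets:
        for record in dataset:
            # Start a new profile for an unseen key, then add this
            # record's fields to whatever is already known.
            profile = merged.setdefault(record[key], {})
            profile.update(record)
    return merged


# Two hypothetical sources, each holding only part of the picture.
survey = [{"email": "a@example.com", "age_range": "35-44"}]
retailer = [
    {"email": "a@example.com", "postal_code": "L4M 3X9"},
    {"email": "b@example.com", "postal_code": "K1A 0B1"},
]

profiles = merge_profiles(survey, retailer)
```

Neither source knows both the age range and the postal code for the first person, but the merged profile does.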

Cambridge Analytica was a great example of sneaky data collection. They used online surveys to collect small pieces of your Social Media profile. Each result set would build a better picture of who you are. They were very successful with the use of personal data profiles.

Not too long ago, a mid-range TV cost over $1500. Today we can get the same quality or better for under $400. Why? Data collection is driving the innovation in Smart technology. These Smart technologies offer a staggering number of opportunities to collect data from viewers.

In 2017 Vizio, a Smart TV manufacturer, paid $2.2 million to settle US Federal Trade Commission and State of New Jersey charges over sharing consumers' viewing data along with their IP addresses. The IP addresses were converted into household addresses by an external data aggregation vendor. This is a common and reasonably ordinary thing most programmers can do using a few simple calls to an Application Programming Interface (API).

Samba TV is a company that tracks viewer information to make better recommendations. Their user agreement page presents a simple, easy-to-use “Enable” button for the user to select. There is very little explanation of how the data collected by Samba TV will be used. Heck, it’s easy to use and it gives me great recommendations, right?

Alexa and Google Home have had their days in court and in the media for their sound collection practices. Our friendly Smart TV also has the ability to collect sound. Most speech-to-text software connects to the internet in order to decode commands and transfer them back to the device. The Machine Learning (ML) systems that translate these commands are managed by people who build the definitions and continually make the algorithm smarter.

Conclusion

In conclusion, if you apply the 6 Vs of Big Data to your business philosophy for the next 10 years, you will certainly have great success. Find the right consultants to help you build opportunities in these areas.  

Students in Big Data Analytics at Georgian College learn these concepts and much more. Feel free to connect with me for more information on how you can work with us: chris.dyck@georgiancollege.ca.