Analysis, interpretation, storytelling: these are some of the words that come to mind whenever we talk about working with data.
Over the last decade, more and more buzz has been created around data. Several business publications have made the fearless forecast that in the years to come, data might just become more valuable than oil. Truth be told, they’re not wrong. Data allows for a better understanding of the business landscape across industries. It also makes it easier to solve day-to-day problems while forecasting plausible future outcomes.
Through different conversations and online events, our Data Science Lead, Rikki Mendiola, breaks down the essentials of dealing with data: the good, the bad, the ugly, and everything in between.
The Good
Measuring the Data You’re Working With
Having measurable indicators has always been essential when it comes to business. But because of the pandemic, it has become more important to pay attention to the metrics you have set. With data and data analytics, a complex problem can be made more manageable by breaking it down into more digestible components. Anything can be quantified using assumptions, and problems can be represented better with the use of models, be it regression, decision trees, or clustering.
Better Decision Making with Data for Organizations
The possibilities brought about by artificial intelligence and data are close to limitless. With analytics as a major part of digitalization, businesses can make more informed, fact-based choices that serve as catalysts for their growth.
Sometimes, it’s not just about applying the best machine learning solution out there. The great thing about data is that as long as you are able to properly identify your problems (the hows and the whys that you need to resolve), it becomes clear that simple, usable data is really the solution.
The Bad
Working with Data is Not Clean
Data is never clean, because not all systems were designed with analytics or artificial intelligence as the main goal. When it comes to making the most of data, it’s best to keep in mind that if it’s too clean, it’s always suspicious. In fact, 80% of a data engineer’s job entails cleaning up data.
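To make this concrete, here is a minimal sketch of the kind of quick audit a data team might run before trusting a dataset. The records, field names, and the audit function are all invented for illustration; they are not taken from any real system or from Amihan Analyze.

```python
from datetime import datetime

# Hypothetical customer records, invented for illustration only.
records = [
    {"id": "001", "email": "ana@example.com", "signup": "2021-03-04"},
    {"id": "002", "email": "", "signup": "2021-03-05"},
    {"id": "002", "email": "", "signup": "2021-03-05"},  # exact duplicate
    {"id": "003", "email": "ben@example.com", "signup": "04/03/2021"},  # different date format
]


def audit(rows):
    """Count a few common data-quality problems in a list of dict records."""
    seen = set()
    report = {"missing_email": 0, "duplicates": 0, "bad_dates": 0}
    for row in rows:
        # Treat a row as a duplicate if every field matches an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
            continue
        seen.add(key)

        # Blank or whitespace-only emails count as missing.
        if not row["email"].strip():
            report["missing_email"] += 1

        # Assume dates are expected in ISO format (YYYY-MM-DD).
        try:
            datetime.strptime(row["signup"], "%Y-%m-%d")
        except ValueError:
            report["bad_dates"] += 1
    return report


print(audit(records))
```

Even four rows surface three different problems. A report that comes back with zeros across the board on real operational data is usually a reason to look closer, not to celebrate.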
You Are Only As Good As The Data You Have
Diving even deeper, it’s to your benefit to understand right from the get-go that data teams are not magicians. Some problems cannot be solved by simply imagining the data; an empirical approach needs to be used. This is why it is essential to come up with strategies based on exploration and experimentation.
A Double-Edged Sword
Ever heard the saying that numbers don’t lie? When it comes to data, they in fact can. Descriptive statistics can hide underlying phenomena. While it’s good that numbers help you simplify your problems, this may also mean that you miss out on important factors, results, or information. Data can tell you what, but it cannot tell you why. As we work on assumptions, context cannot be dismissed.
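A small sketch shows how this can happen. The two stores and their daily order counts below are made up for illustration, and the summarize helper is a hypothetical name, not part of any particular tool. Both stores report the same average and the same median, yet they behave very differently.

```python
from statistics import mean, median, pstdev

# Hypothetical daily order counts for two stores (invented for illustration).
store_a = [48, 50, 52, 50, 49, 51]   # steady demand every day
store_b = [0, 0, 5, 95, 100, 100]    # nearly idle days, then a surge


def summarize(values):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "spread": round(pstdev(values), 1),  # population standard deviation
    }


summary_a = summarize(store_a)
summary_b = summarize(store_b)

print("Store A:", summary_a)
print("Store B:", summary_b)
```

A dashboard showing only the mean or the median would call these two stores identical. The spread hints that something is different about Store B, but it still takes context (a promotion, a stockout, a system outage) to explain why.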
As with all promising things, results take time to bear fruit. While it holds true that analytics is all about speeding things up, it is almost as valuable to take your time when examining data. “Marinating” the data you have can bring in new insights. Collaboration between business units and groups with different priorities also takes time.
The Ugly
Not All Systems are Data-driven
Most organizations, and their systems, are not ready for analytics. Sometimes, the bulk of the work lies in standardizing pools of data so they are ready for use.
Artificial Intelligence is Never Agnostic
How are you using the data that you have? How do you intend to use the data that you have?
Engineers and scientists create algorithms to help machines understand the world, and worldviews and biases can be baked right into the creation process. Racial and gender biases exist in these systems. These design choices open data-driven systems up to bias and interpretation.
These conversations must never be skipped when we are creating systems that claim to help us understand the world.
It’s Not Just Data For Data’s Sake
Everything today can be represented with the use of 0s and 1s, including inarguably complex problems and systems. Data and data science provide adept businesses with competitive advantages, so long as they invest the time and resources in them. It is also about speed, scale, and convenience, as well as realizing that it’s not just about data collection and ingestion. Starting with big data also entails cleaning and curating data to maximize the value it can bring to your business.
Working with data requires more than the right mindset and resources; time is also a factor to consider in delivering business value. We work with our clients to deliver results and insights through Amihan Analyze.
Built with the world’s most successful open-source data projects, it combines the massive-scale data processing speed of Apache Spark, Trino’s highly parallel and distributed query engine, and the elasticity and power of Apache Kafka and Apache NiFi.
Maddie Cruz is a content writer and marketing professional.