When we talk about data science or data science solutions, the things that come to mind are sexy, advanced implementations of technology: knowing our users on a deep level, predicting behavior, getting personalized recommendations right in our notifications, having an almost seamless conversation with a chatbot. We think about the endless possibilities of data that’s right at our fingertips, accessible through our smartphones and browsers. These endless possibilities spell convenience, accessibility, and choices.

But not everyone has a smartphone. Or an internet connection. In the Philippines, only 65% of the adult population has a smartphone, while the internet penetration rate is at 71% (Digital Report: Philippines). When we think about data solutions, how do we make them accessible to the marginalized? How do we put these techy, sexy data science solutions to work for social good?


Inclusion at the core of data science for social good

During this COVID-19 health crisis, what are the most intuitive data science solutions you have seen or used?

A lot of them are probably convenience-related: delivery of goods and meals to your doorstep, finding open banks and other services within your area. Maybe something altruistic: finding donation centers, or knowing where to bring the right donations like food or PPE for our frontliners. Or just knowing what’s happening out there by looking at dashboards that describe the state of the pandemic in the Philippines, or how our government is faring in terms of poverty assistance or budget disbursement.

All of these, at the touch of a button on our smartphones or personal computers.

It’s easy to assume pristine conditions when cooking up AI or ML solutions: data is readily available, people will use it, it will improve lives, and we will live happily ever after. But one of the challenges in developing AI/ML solutions is bringing them to the masses. How will these data science solutions improve the everyday lives of people at the margins?

Data science for social good is not new, but it has yet to gain popularity in the development sector. There are a lot of data opportunities that the development sector can start with. This crisis has produced a lot of great collaborations, as well as lessons that we can learn from.

So, where do we start?


Of open data and platforms: Sharing as a habit

One of the first barriers to collaboration is inaccessible data. There can be a lot of reasons: maybe no one wants to share the data, or the data formats released are not machine-readable (oh yes, hello tables in PDFs). If it’s possible to share data, let’s make it easier for everyone! What does it mean to go open data?

The Open Data Handbook says there are three important things to keep in mind when you want to share your data:

  1. Data should be accessible and available: Minimize reproduction cost and maximize convenience. Remember to release it in file formats that are easy to acquire and modify (like CSV or XLSX). For example, during this public health crisis, government institutions can easily get people working on analysis and applications when data is convenient to get, which means less work for collaborators on cleaning and scraping the data. But getting this started is not easy at all. It requires integrating data and knowledge management into the everyday operations of an institution or organization.
  2. Data should allow reuse and redistribution: Make sure policies on sharing data are in place. Practicing open data means you should allow for reuse and redistribution, like merging it with other datasets for analysis.
  3. Data should cater to universal participation: You can’t restrict particular fields, groups, or individuals from using that data. No discrimination in participation.

The keyword here is interoperability. To work together, data should adapt easily to diverse systems and be flexible enough to be wrangled and merged for analysis and applications. Interoperability also touches on the opportunity to create great (and well-documented, I may add!) application programming interfaces (APIs), giving other developers and data scientists the chance to work on solutions you haven’t thought of yet.
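
To make the "wrangled and merged" part concrete, here is a minimal sketch of what machine-readable data lets a collaborator do. Everything in it is an assumption for illustration: the two tiny CSV datasets, the column names (area_code, reported_cases, households_assisted), and the helper functions are made up, not taken from any real release. It only uses Python's standard csv module.

```python
import csv
import io

# Hypothetical, made-up sample data standing in for two open datasets
# that share a common key column (area_code).
CASES_CSV = """area_code,area_name,reported_cases
A01,Area One,120
A02,Area Two,45
A03,Area Three,300
"""

AID_CSV = """area_code,households_assisted
A01,800
A03,1500
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_key(left, right, key):
    """Left-join two lists of row dicts on a shared key column.

    Rows in `left` with no match in `right` keep their own columns and
    get None for the columns that would have come from `right`.
    """
    right_by_key = {row[key]: row for row in right}
    right_columns = [c for c in right[0] if c != key] if right else []
    merged = []
    for row in left:
        match = right_by_key.get(row[key], {})
        combined = dict(row)
        for column in right_columns:
            combined[column] = match.get(column)
        merged.append(combined)
    return merged


cases = read_rows(CASES_CSV)
aid = read_rows(AID_CSV)
merged = merge_on_key(cases, aid, "area_code")

for row in merged:
    print(row)
```

Because both files are plain CSV with a shared key, the merge is a few lines of code. The same tables locked inside a PDF would need scraping and cleaning before any of this could happen.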

While not all datasets are meant to be open, there are opportunities that can be explored for open datasets. Share, if you can!


No more reinventing the wheel: Collaboration is key

At the peak of the information age, data is a resource. From one dataset, we can derive different kinds of value depending on our analysis or purpose. Don’t reinvent the wheel! An innovation mindset requires faster turnaround and collaboration. From my experience in the academe and the industry, good solutions bring different disciplines together. Domain knowledge from outside computing brings fresh perspectives on how a team looks at solving problems.

During this COVID-19 crisis, as data scientists, we can see the value of collaborating with epidemiologists, medical doctors, biologists, mathematicians, statisticians, and social scientists in making assumptions and models that can help create a clear view of the situation. Development problems are complex; privileging one point of view is a simplistic way of solving problems like poverty.


Participation is not a token

In my work as a development practitioner, one of the things I always emphasize is that our community partners should have a sense of ownership of the solutions we bring to them. What does this mean? When we create applications and solutions with a particular sector, asking for their feedback should not be mere lip service. Development initiatives fail when the community feels they do not solve the problem at all, and bringing in AI/ML is doubly challenging.

Accessibility is the first consideration when working with communities in the development sector. Imagine designing a good solution that you can use with a smartphone and internet, but the area has no data connectivity available! 

In creating AI/ML solutions, we often forget to look at how well our end users are represented. One US federal study found that facial recognition technology is more likely to fail at recognizing people of color. Algorithms are representations of our worldview as developers and data scientists, and therefore we should be wary of how our code teaches machines to see the world.

This brings us to the last point I want to make.


Reflection is essential

Just as machines learn from mistakes and uncertainties, development initiatives concerned with data science solutions will benefit from asking questions: What can we do better? How do we reach the people who need the solution? I once heard from my mentor that we usually skip reflecting because it’s non-billable work: we don’t make money off it, we don’t create groundbreaking academic work, it’s not even considered productive. But reflection makes us stop and think about what happened and maybe learn lessons for future reference.


What now?

We can take small steps in starting data science for social good initiatives. Sharing and collaboration are achievable goals. Have an idea for communities in need? Talk about it with a friend who can help you. In the words of the hit HBO series Silicon Valley, let’s start making the world a better place.

Rikki Mendiola is a data analyst under the Product Engineering – Analytics pod at Amihan. Before joining the industry, she worked in the academe implementing various communication and information technology projects for the development sector.