
Microsoft Research Lab – India

Podcast: Enabling Rural Communities to Participate in Crowdsourcing, with Dr. Vivek Seshadri



Episode 002 | March 20, 2020

Crowdsourcing platforms and the gig economy have been around for a while. But are they equally accessible to all communities? Dr. Vivek Seshadri, a researcher at Microsoft Research India, doesn’t think so, and is trying to change this. On this podcast, Vivek talks about what motivated him to focus on research that can help underserved communities, and in particular, about Project Karya, a new platform to provide digital work to rural communities. The word “Karya” literally means “work” in a number of Indian languages.

Vivek primarily works with the Technology for Emerging Markets group at Microsoft Research India. He received his bachelor’s degree in Computer Science from IIT Madras, and a Ph.D. in Computer Science from Carnegie Mellon University where he worked on problems related to Computer Architecture and Systems. After his Ph.D., Vivek decided to work on problems that directly impact people, particularly in developing economies like India.


Transcript

Vivek Seshadri: If you look at crowdsourcing platforms today, there are a number of challenges that actually prevent them from being accessible to people from rural communities. The first one is, most of these platforms contain tasks only in English. And all their task descriptions, everything, is in English which is completely inaccessible to rural communities. Secondly, if you go to rural India today, the notion of digital work is completely alien to them. And finally, there is a logistical challenge here. Most crowdsourcing platforms will assume that the end-user has a computer and constant access to internet. This is actually a luxury in many rural communities in India even today.

(Music plays)

Host: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

Crowdsourcing platforms and the gig economy have been around for a while. But are they equally accessible to all communities? Dr. Vivek Seshadri, a researcher at Microsoft Research India, doesn’t think so, and is trying to change this. On this podcast, Vivek talks about what motivated him to focus on research that can help underserved communities, and in particular, about Project Karya, a new platform to provide digital work to rural communities. The word “Karya” literally means “work” in a number of Indian languages.

Vivek primarily works with the Technology for Emerging Markets group at Microsoft Research India. He received his bachelor’s degree in Computer Science from IIT Madras, and a Ph.D. in Computer Science from Carnegie Mellon University where he worked on problems related to Computer Architecture and Systems. After his Ph.D., Vivek decided to work on problems that directly impact people, particularly in developing economies like India.

(Music plays)

Host: Vivek, welcome to the podcast.

Vivek: Thanks, Sridhar. This is the first time I am doing anything like this, so I am really excited and a little bit nervous.

Host: Oh, I don’t think there’s anything to be nervous about really here. You guys are used to speaking in public all the time. So, I’m sure it’ll be fine.

Vivek, you are a computer scientist and you did your PhD in Computer Science in Systems, right? What made you gravitate towards research that helps underserved communities, typically the kind of research that one associates with the ICTD space?

Vivek: So, Sridhar, when I finished my PhD in 2016, I sort of had two decisions to make: should I stay in the US or should I move back to India? Should I stay in the same area that I am doing research in or should I move to a different field? Both these questions were sort of answered when I visited MSR and had interactions with people like Bill Thies. The kind of research that they were doing impressed me and also influenced me to make the decision to come back to India and work on similar problems that directly impact people.

Host: That’s interesting. So this is something that was brought upon by meeting people in the lab here rather than something that was there in your mind all along.

Vivek: Absolutely. Actually, when I started my PhD, I wanted to come back and become a professor in places like IIT or IISc. And when I moved back, I was actually introduced to MSR by one of my friends who actually visited MSR before me. And I just thought I’ll pay a visit. And the conversations that I had with people here sort of made my decision absolutely easy.

Host: And the rest is history, as they say.

Vivek: Absolutely. It’s been three years since I moved here and I couldn’t be happier.

Host: Great. So Vivek, walk us through this project called Karya, which I know you have been associated with for quite a while. What exactly is Project Karya and what are your goals with that project?

Vivek: So, there are two trends that enable or motivate the need for a project like Karya. The first trend is that there is a digital revolution in the world today, where improvements in technologies like Machine Learning are allowing people to interact with devices using natural language. The second trend is specific to India where we are trying to push towards a digital future which is creating a lot of tasks like audio transcription, document digitization, etc. Both these trends are going to result in a huge amount of what we call digital work. And the goal for Project Karya is to take this digital work and make it accessible to people from rural communities who typically have very low incomes today and are predominantly stuck with physical labor. We believe completing these digital tasks and getting paid for them will be a valuable source of supplemental income for people from rural communities.

Host: Crowdsourcing and crowdsourcing platforms have been around for quite a while now. And they are also well-established methods of gig work. So what’s the need for another approach or a different framework like Karya?

Vivek: That’s a great question. If you look at crowdsourcing platforms today, there are a number of challenges that actually prevent them from being accessible to people from rural communities. Specifically, let me describe to you three challenges. The first one is, most of these platforms contain tasks only in English. And all their task descriptions, everything, is in English which is completely inaccessible to rural communities. Secondly, if you go to rural India today, the notion of digital work is completely alien to them. In fact, when we went to rural communities in our first visit and told them we will actually pay some money for completing some set of digital tasks, they looked at us in disbelief. Like they actually didn’t believe that we are going to pay them until we actually did. So, there is this huge issue of awareness. And finally, there is a logistical challenge here. Most crowdsourcing platforms will assume that the end-user has a computer and constant access to internet. This is actually a luxury in many rural communities in India even today.

Host: So, does Karya enable people to use their existing skillsets and knowledge to earn supplemental or extra income?

Vivek: So, Sridhar, like I mentioned, there are two sources of digital work that we are looking at currently. One is creating labeled data sets for models like automatic speech recognition, and other language-based machine learning models. The second source of digital work that we are looking at is things like speech transcription or document digitization, which the government is extremely interested in. Now depending on what type of task we are going to do, people may have to be able to read in their regional language or type in their regional language. Now, when it comes to reading, we find that most people from rural communities are adept at reading in their regional language. When it comes to typing, as you can imagine there are not many good keyboards that will allow you to type in your local language. This is something that most people in rural communities have never done before. In fact, even though most people in rural communities are not familiar with English, they actually use a very crude form of transliteration to actually communicate in their regional languages. That’s what we observed: most people used WhatsApp and when communicating with each other they actually use transliteration in English and do not type in their native language.

Host: So, you are saying that there is a large number of people who are actually typing in the English script, but the language that they are representing is their own vernacular.

Vivek: Exactly. And the transliteration is very crude. They know what sound each letter of the English alphabet corresponds to and they just put together a bunch of characters next to each other and it’s almost like they have created a whole new script for their local language.

Host: Right.

Vivek: But something like that wouldn’t actually be useful for us. We would want them to type in their local language. For instance, let’s take an example of document digitization. The idea there is, the government has a whole lot of government records which contain handwritten words in their local language. It could be names of people, it could be addresses, etc. When I want to digitize these documents, I may actually want someone to type out the names that they see in the document in the local language. Now, there, I would actually want them to use the native script, and not some crude form of transliteration.

Host: Sure.

Vivek: So, in this particular case, we actually used a keyboard that was developed by IIT Bombay called Swarachakra. And our users actually learnt to use that keyboard within a very short span of time and they were able to perform extremely well in the task that we had assigned them.

Host: So, it sounds like there is a lot of work that is readily available. What is required is to actually deliver it and make it possible for people to leverage that work in order to earn extra income.

Vivek: Absolutely. Actually, the Government of India has its own crowdsourcing platform, where they outsource text digitization like I mentioned to anyone in India who wants to do it. Unfortunately, even that platform is not accessible to rural communities. If I go to rural India and ask anyone about that platform, they wouldn’t know anything about it. So, in some sense, there is work that is readily available, but there is this huge gap in access.

Host: And the gap in access is because these platforms work on their traditional paradigm of needing a desktop computer with an internet connection?

Vivek: Exactly. In fact, the platform that the Government of India has, it’s a website that you have to access and you need internet connection to receive tasks and complete tasks. And our goal is to sort of eliminate that requirement. In fact, the goal of Project Karya is to enable anyone with just a smartphone to be able to perform digital tasks on their phone.

(Music plays)

Host: I know you’ve already conducted some experiments with Karya. And you’ve also published a paper in CHI in 2019. Can you walk us through some of the results of the experiments that you’ve conducted?

Vivek: So, one of the biggest challenges in creating a platform like Karya is the perceived lack of trust in rural labor. When we actually spoke to many potential work providers on whether they would be willing to outsource their work to rural workers, one of the first questions that they ask is if they can trust the quality of labor that we get from rural workers. So, in the CHI paper, what we wanted to sort of evaluate was the accuracy and effectiveness with which workers from rural India can actually complete a specific type of digital task. So, in that particular paper, we actually looked at text digitization, where the task is as simple as the user is shown an image of a handwritten text and all they had to do was type out whatever word they see in the particular image. And of course, they will be given thousands of images that they have to digitize over a period of two weeks. And what we actually found in the paper was that workers from rural India actually did fantastically well. In fact, in a crowdsourced setting, they outperformed a professional transcription firm to which we gave the same data set. So, that was very interesting for us.

Host: That’s really interesting. Do you have any insights into why that might have happened or, how this community of people that you engaged with were able to outperform professional services?

Vivek: So, with respect to the performance of the transcription firm itself, we could only guess, because it was a black box for us. We just gave them the data set and asked them to provide the results and the results that we got were not that good. But we can definitely guess why workers from rural communities did so well. First of all, the additional income that workers from rural communities are getting out of completing these tasks is significant. So, for them there is actually a fear that they may not get paid if they don’t complete the tasks accurately. So, from that point of view, most users paid extreme attention to completing the tasks accurately. And these workers also found it a lot of fun. Like I mentioned before, most of their current work is typically physical labor, be it farming, many of them are actually unemployed. So, for them, this is actually a fun activity that they can do together with their friends where they also get some money. So, from their point of view, it was both fun and it gave them very very valuable supplemental income. I think both these were significant factors in the rural workers performing really well in the task that we gave them.

Host: Your CHI paper was based on text digitization by members of rural communities. But have you looked at other types of tasks that can be completed through Karya?

Vivek: Yes. Actually, as we were working on the platform, we realized that there is a real need for speech data sets in various languages in India. In fact, in our very lab, Kalika Bali, who is a researcher, is working on this project called Ellora, whose goal is to create voice technologies for all the languages in India. One of the fundamental bottlenecks in achieving this is labeled speech data sets. A labeled speech data set is essentially a data set that contains various audio recordings, and the transcripts that correspond to those recordings. We actually found a mechanism to use Karya to collect such a data set for various languages. In fact, we have an ongoing study where we are collecting hundreds of hours of speech data for languages for which there is almost no data today.

Host: So, when you give out these speech collection tasks, what is the actual process, how does it actually work?

Vivek: So, at the lowest level, the task is essentially for the user to read out, record themselves reading out a sentence. However, to make the task more fun, we actually made them read out stories. Some empowering stories, some stories about history of our country, some stories about popular figures like Buddha, and users really liked reading out stories as opposed to reading out random bits of sentences.

Host: So, we’ve been talking about Karya as a project in which we are helping or building a new paradigm in crowdsourcing. What are the actual components that go into Karya as a system?

Vivek: So, Sridhar, if you look at any crowdsourcing platform that is out there today there are two major components. One is the server that actually contains all the tasks that have to be completed, that is the component that work providers interact with to submit the task that they want to get completed. The second component is actually the client that the workers will use to actually complete the tasks. In a typical crowdsourcing platform where internet connection is assumed, the client will directly talk to the server, get the tasks and the responses are also directly submitted to the server.

Host: Right.

Vivek: Now, like I mentioned, most rural communities in India do not have internet connectivity. In fact, two of the three locations that we have worked with have absolutely no connectivity. Which means a platform that assumes internet connectivity is going to exclude those people from participating in the platform and getting paid for completing valuable tasks.

Host: So, how do you bridge that?

Vivek: So, the way we bridge this gap is by introducing this third component that we are calling a Karya Box. Now, the Karya Box is essentially a device that we will place in the village where we want to work with people. And you can think of the box as a local crowdsourcing server for that particular village.

Host: Okay.

Vivek: So, the Karya Box will essentially act as a local crowdsourcing server in the village where we have placed it. Users in the village can directly interact with the box through the Wi-Fi access point that the box will expose. So, anyone with a smartphone can just connect to the Karya box Wi-Fi and then interact with the box to get tasks and submit their responses as well. Now the question is, how does the box communicate with our server?

Host: Yeah.

Vivek: So, in most villages which do not have connectivity what we observe is there are definitely people who go to nearby cities for work or even to get digital content that they can get back to the village. What we need to do is to employ someone like that who can carry the box to a location where there is internet connectivity, periodically, maybe once a day or even once a week. And at that instant, when the box gets connectivity to the server it can exchange, both the responses that have been submitted already by the rural workers and also get any new tasks for the village, if any.

Host: That seems to be a smart and inexpensive way to get around the lack of connectivity issue.

Vivek: Absolutely. Actually, I can tell you a story around this.

Host: Oh, please, we love stories.

Vivek: When we did our recent study, we actually deployed the box in the village. That village actually has really good connectivity. So, we were actually expecting the box to be in regular contact with our server. But due to various reasons, there was an internet shutdown in the village for the first one week after we deployed the box. But there you go. Our system actually worked because it does not assume that the box regularly talks to the server.
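
The store-and-forward arrangement described in this part of the conversation (a central server, a Karya Box that workers reach over local Wi-Fi, and an occasional trip to a place with connectivity) can be sketched in code. What follows is only a minimal illustration under stated assumptions, not Karya's actual implementation: the class names `CentralServer` and `KaryaBox`, their methods, and the task identifiers are all hypothetical, and real networking, storage, and payment handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class CentralServer:
    """Hypothetical stand-in for the central crowdsourcing server."""
    pending_tasks: dict = field(default_factory=dict)  # village name -> list of tasks
    responses: list = field(default_factory=list)      # (task, answer) pairs received

    def exchange(self, village, uploaded):
        """Accept a box's stored responses and return any new tasks for its village."""
        self.responses.extend(uploaded)
        return self.pending_tasks.pop(village, [])


@dataclass
class KaryaBox:
    """Hypothetical local server placed in a village; works without internet."""
    village: str
    tasks: list = field(default_factory=list)   # tasks waiting for workers
    outbox: list = field(default_factory=list)  # responses not yet uploaded

    def next_task(self):
        """Hand the next task to a worker's phone, or None if none are left."""
        return self.tasks.pop(0) if self.tasks else None

    def submit(self, task, answer):
        """Store a worker's response locally; no connectivity is needed here."""
        self.outbox.append((task, answer))

    def sync(self, server):
        """Run only when the box has connectivity: upload responses, fetch tasks."""
        new_tasks = server.exchange(self.village, self.outbox)
        self.outbox = []
        self.tasks.extend(new_tasks)
        return len(new_tasks)


# Illustrative run with made-up task identifiers.
server = CentralServer(pending_tasks={"village-a": ["img-1", "img-2"]})
box = KaryaBox(village="village-a")

box.sync(server)                 # the box is carried somewhere with connectivity
task = box.next_task()           # back in the village, a worker picks up a task
box.submit(task, "typed word")   # the response waits on the box
box.sync(server)                 # delivered on the next trip, whenever that is
```

Because the box only exchanges data when `sync` is called, nothing in this sketch depends on how often that happens, which mirrors the point of the story above: a week without connectivity delays the exchange but does not stop workers from completing tasks.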

Host: I am assuming and correct me if I’m wrong, that a lot of the people who are interacting with the system, the Karya app especially on the phone, right, they’d be doing something of this nature for the first time. How did people typically find working with the Karya app and were there significant hurdles, were there issues in the communities that you went and did your experiments with?

Vivek: So, like I mentioned before, most people actually found doing this kind of an activity a lot of fun. So, from that point of view, there was not much boredom even though the tasks were extremely repetitive. Now imagine, looking at words screen after screen and typing them out or sentences screen after screen and reading them out. This is probably a very mundane task for people in urban communities. But for people in rural communities, where they don’t get to do this kind of thing very often or even interact with a smartphone very often, they actually found it a lot of fun. In fact, many people actually found some sense of pride in actually completing tasks in their local language.

(Music plays)

Host: It seems like this kind of digital work has the potential over time to provide people with livelihoods and enhance existing incomes. Do you think there is a potential downside to digital work or a potential downside to the online gig economy?

Vivek: Definitely, there is a limitation, similar to any other gig economy, like your, cab-hailing services where it’s a physical gig work that you are doing, or delivering food, it’s again a physical gig economy. As more people join the platform, the amount of work that is going to be available for every individual person is going to go down. So in that sense, one should not think of even the digital gig economy as a sustainable source of livelihood. So from that point of view, one of the limitations is the excitement that workers in rural communities have for such kinds of tasks. These tasks are much easier to complete than the task that they are involved in right now. And they also pay much higher than the task that they are doing right now. So there is definitely the possibility that some of them may think this is a much more lucrative job that will provide a full-time income for them. But we have to warn them in advance, saying, this is not the case.

Host: So expectation setting is going to be key.

Vivek: Expectation setting is, in fact, a huge part of what we need to do when we actually scale out the platform. In fact, even for the small studies that we conducted in these villages where studies were for a period of two weeks, during which time people may earn let’s say 3000 rupees, their question at the end of the study is, “When are you going to come back?” Right? So, that sort of enthusiasm is both encouraging and scary. Because, if you don’t have a sustainable source of work that you can provide to these villagers, it can end up in disappointment.

Host: Was there anything that surprised you when you were working with and when you were talking to various communities during the experiments with Karya?

Vivek: Yes. Actually, two things stood out for us. The first thing is, how inclusive the notion of digital work can be when it comes to employing people from diverse backgrounds. What we observed was, women who were typically not allowed to get out of their house in rural communities for various reasons were able to participate on our platform and actually earn income for the first time in their lives. People with physical disabilities were able to participate on our platform.

Host: That must have felt extremely empowering for them.

Vivek: Absolutely. And the second thing that we observed is like I mentioned before, this sense of pride that they had when they were completing tasks in their local language. Like I mentioned before, this is not something that they get to do often. In fact, in one of our studies where the task involved was recording themselves reading out stories, many people actually went over and did the tasks all over again, just so that they can read the stories to their kids or to the community. This is something that was completely surprising to us. Now imagine if someone in an urban community would actually be willing to do that.

Host: Yeah. That’s good food for thought. So it certainly seems like your experiments with Karya show that it’s got a huge amount of promise and potential. Over time, where do you see or where do you hope to see Karya?

Vivek: So Sridhar, like I mentioned, as language technologies keep improving, the need for creating these technologies for various Indian languages is only going to increase. There are going to be many startups which would want data sets for creating the models that they want in local languages. We believe, with our insights and solutions that we have built for creating a crowdsourcing platform for rural communities, Karya can be the platform that these organizations, both private startups or even the government, can come to, to get their valuable tasks completed.

Host: Vivek, this has been an extremely interesting conversation. Thank you for your time.

Vivek: Thanks a lot, Sridhar for giving me this opportunity to both talk about the project and also do my first podcast.

Host: My pleasure.

To learn more about Dr. Vivek Seshadri and the Technology for Emerging Markets Group, visit Microsoft Research India.