We're hoping you enjoyed Dr. Jillian Ney's post as much as we did here at Socialgist. This week we chat with Dr. Naushad UzZaman, the co-founder & CTO of Blackbird.AI, a technology platform focused on providing real-time analysis to identify and characterize disinformation.
First of all, what is your background?
I began working on Natural Language Processing for my native language, Bengali, when I was a junior in college. This was before NLP and Data Science became such a hot space. I spent another two years doing research on Bengali NLP, five years working towards a Ph.D. at the University of Rochester, and then another eight years in industry at Microsoft, Yahoo and Nuance. More than 30 published papers later, I co-founded Blackbird.AI, where I’m currently CTO. I have also worked on social media data; dialog systems for cars, TVs and media boxes; medical NLP; and temporal information extraction and processing.
What is the bleeding edge with regard to data?
There have been lots of exciting AI developments recently, driven by advancements in deep learning and the availability of computational resources, especially as they relate to social media data. What’s most challenging and exciting to me is the manipulation or amplification of voice.
Having worked on disinformation on social media for the last several years, I can state very strongly that in any trending topic you see, there will be synthetic amplification. It could come from state actors, from competitors, or from anarchists who want to watch the world burn, but the threats are very real.
Legacy social listening systems that count keywords to understand what is trending and important aren't equipped to handle this new reality. Disinformation doesn't just affect “trending” topics; it also impacts our politics, policies, businesses and stock market. And as we recently discovered during our research on COVID-19 disinformation, in the current pandemic world, disinformation through narrative manipulation is literally killing people.
What new dataset intrigues you the most?
Real-time data from social media intrigues me the most. Nothing beats the challenge of interpreting real-time data coming from different social media platforms to get a sense of the global pulse. The challenge here is that social media is overrun with fakes, forgeries, and bots, but if you can understand synthetic amplification, it is possible to cut through the noise and get a real feel for what is organic and what is manipulated.
What findings could you derive from it?
To go a level deeper, you can discover linkages between different social media platforms, for example when a conversation or narrative goes from 4chan to Telegram groups to Twitter to mainstream media.
Datasets like these don't just quantify disinformation but also help us find the source and lineage of the narratives.
Is there anything else you'd like to add?
Democracy works when the public is informed. It breaks down completely when the flow of information is corrupted. Information integrity is crucial for functioning democratic societies.
To build an informed democratic society, we need a transparent flow of information. We need to identify when conversations are being manipulated and how these narratives spread from one platform or influencer to another, deceiving large numbers of people through repeated exposure.
None of us want to live in a censored society, whether it comes from an authoritarian government or a few powerful social media companies. We all want to be part of a society with transparency, where people can see if the ideas and stories they consume are manipulating them to alter their behavior for malicious gain, and it’s critical that we all do our part to make sure that we don’t allow that to happen.