Language models as community moderators
Souvenir d’un Futur, Laurent Kronental, 2014
Community moderation works. This was the overwhelming lesson of the early internet. It works because it mirrors the social interaction of real life, where social groups exclude people who don’t fit in. And it works because it distributes the task of policing the internet to a vast number of volunteers, who provide the free labor of keeping forums fun, because to them maintaining a community is a labor of love. And it works because if you don’t like the forum you’re in — if the mods are being too harsh, or if they’re being too lenient and the community has been taken over by trolls — you just walk away and find another forum.
— Noah Smith, The internet wants to be fragmented
If you want to have public conversations of high quality, algorithmic content filtering is the wrong approach. As Noah Smith points out in the quote above, we’ve known for a long time that community moderation works better. But the internet has largely moved away from that, for several reasons. First, community moderation reduces conflict, which is bad for Twitter and other social media companies that want to increase engagement to sell more ads.
Second, community moderation doesn’t scale. To maintain good norms around conversations, human moderators need to read what is written and remove low-quality comments that otherwise, like a broken window on a street, would undermine the standards so that more people start making stupid comments. (Most communities of size do not moderate this strictly but limit moderator oversight to flagged comments, which leads to a corresponding drop in quality.) There is an inverse correlation between community size and quality of conversation.
My hunch is that conversations on platforms that rely on algorithmic content filtering are going to deteriorate even more when generative AI models start flooding the feed with convincing bots. The next US presidential election will give us a sense of how crazy that will be.
But the same AI models that threaten to undermine platforms like Twitter could also power more productive speech communities. Following Lars Doucet and Maggie Appleton, I predict that we’re going to see an acceleration of the trend of people moving off the big platforms in favor of chats and smaller forums that are community moderated and often gated. In these self-selected communities with clear norms, AI models can help out by automating some of the work that community moderators do. This could allow us to grow moderated communities larger than we’ve been able to before.
AI is already being used for community moderation in a small way, with OpenAI and others offering tools that can flag comments by reading them and indicating if they say anything offensive. This is still fairly close to normal algorithmic moderation. But as prices drop, we can go much further. We can do fluid taste-based moderation.
An important reason why community moderation works is that it can be highly opinionated. When policing the comment section of a blog or the threads on a Subreddit, you don’t need to make decisions that are acceptable for everyone in the way you have to if you are Twitter. It is perfectly fine to remove a comment simply for not being funny, for repeating stuff that’s already been said, for not being offensive enough, or for any other arbitrary reason. The reason doesn’t even have to be well-defined. For the comments on my Substack, my policy is basically “Would I let someone talk like this in my living room?” And if my gut tells me, “No, I wouldn’t suffer this,” I remove it. Isn’t that despotic? Yes! I’m the Emperor of this living room.
Moderation in situations like this comes down to taste. And you can’t define taste in a simple flagging algorithm; it is more a matter of fuzzy pattern matching. In other words, the kind of task that language models are getting good at. You could provide GPT-4 with a bunch of examples of comments I, as the moderator of Escaping Flatland, have upvoted, and comments that I have removed, and my hunch (after running a few rapid experiments) is that GPT-4 can, most of the time, predict whether a comment meets the vague criteria I use. Using language models this way, I could scale my taste, or at least a weaker version of my taste. That could allow us to maintain the lovely tone we have here even if the comment section grows beyond the size where I feel like reading it all.
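To make the idea concrete, here is a minimal sketch of how such a few-shot prompt could be assembled from past moderation decisions. It is not the setup I used in my experiments. The names `LabeledComment`, `build_moderation_prompt`, and `parse_verdict` are hypothetical, the prompt wording is just one guess at what might work, and the call to the language model itself is deliberately left out, since that depends on which provider you use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledComment:
    """A past comment plus what the moderator did with it (hypothetical type)."""
    text: str
    verdict: str  # "keep" or "remove"


def build_moderation_prompt(examples: List[LabeledComment], new_comment: str) -> str:
    """Turn past decisions into a few-shot prompt that ends at the open verdict."""
    lines = [
        "You moderate a blog comment section.",
        "Judge the last comment the way the moderator judged the examples.",
        "Answer with one word: keep or remove.",
        "",
    ]
    for ex in examples:
        lines.append(f"Comment: {ex.text}")
        lines.append(f"Verdict: {ex.verdict}")
        lines.append("")
    lines.append(f"Comment: {new_comment}")
    lines.append("Verdict:")
    return "\n".join(lines)


def parse_verdict(reply: str) -> str:
    """Map the model's free-text reply to a decision; anything unclear goes to a human."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!\"'") if words else ""
    return first if first in {"keep", "remove"} else "needs_human_review"


if __name__ == "__main__":
    examples = [
        LabeledComment("Great point, here is a related paper.", "keep"),
        LabeledComment("lol this is dumb", "remove"),
    ]
    print(build_moderation_prompt(examples, "First!"))
```

The fallback to human review matters here: since this is a weaker version of the moderator's taste, anything the model answers ambiguously should probably land back on the moderator's desk rather than be silently kept or removed.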
But let’s go further. When you have a language model that can tirelessly moderate, it doesn’t make sense to just have it do my job. We can have it do things that I would never have the time to do!
If we prompt a language model to recognize a particular taste, it doesn’t have to wait until the comments are written. It can scaffold you as you write. It might say, as you type, “That’s an interesting thought! But I notice that you are making assumptions about what Scott Alexander means by this statement, and in this community, the norm is that you ask questions to clarify the assumptions if they aren’t clear. Do you want me to rewrite it as a question?” More a salon hostess than a moderator.
A lot of people would hate being scaffolded like that, but they will have to find another community to hang out in. For others, it will be a relief to be held by the hand as they figure out the tone and jargon of a community. The risk that the community will shun you goes down.
You could also use semantic search to see if the point that a commenter is about to make has already been voiced and guide them to where the discussion is already happening. This way users are encouraged to participate in a cumulative conversation rather than repeat the same talking points. The language model might say, “The question you are raising reminds me of this one raised by Bob. Maybe you can see if this part of the conversation answers it and perhaps voice a follow-up question if you were looking for something slightly different? Also, here’s a point that the author of the blog post made in another thread that might be relevant.”
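A rough sketch of the lookup described above, under loud assumptions: a real system would use a learned embedding model to capture meaning, whereas the `embed` function here is a toy stand-in that only counts words, so it matches overlapping vocabulary rather than semantics. The function names and the `threshold` and `top_k` defaults are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_related(
    draft: str, existing: List[str], threshold: float = 0.3, top_k: int = 3
) -> List[Tuple[float, str]]:
    """Return up to top_k existing comments similar enough to the draft, best first."""
    draft_vec = embed(draft)
    scored = [(cosine(draft_vec, embed(comment)), comment) for comment in existing]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    thread = [
        "How do zero-knowledge proofs avoid leaking the secret?",
        "Great photo in this post.",
    ]
    for score, comment in find_related("Do zero-knowledge proofs leak the secret?", thread):
        print(f"{score:.2f}  {comment}")
```

If nothing clears the threshold, the draft is probably a new point, which is exactly the case where the interface could encourage the commenter to post it.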
This would reconceptualize what the comment section is. Instead of a way to comment, it becomes a search bar that lets you surface passages from the blog post, the comments, and other sources in the community, and in the act of searching you can contribute to relevant parts of the conversation.
Say you are reading a post about zero-knowledge proofs and feel confused. You start asking questions or voicing ideas, as a way to process what you read, and that surfaces information you need and sets you in contact with other people who are thinking about the same topics. In the act of thinking about what you read, you almost unintentionally help further the conversation. You might ask a question and then get a prompt saying, “That’s a really interesting question that has not been asked before! Do you feel like posing it to the community? You might want to tag Charlie who made this comment which is similar in spirit.” This could lower the barrier of participation for those who are too self-critical to voice their thoughts in public.
I notice with some sadness that many of the internet writers that inspired me a few years ago have moved away from working in public. They are publishing less, if at all, and are no longer having open-ended conversations on Twitter. Talking to a few of them, I get the impression that it is simply not worth dealing with the low-quality replies. They never cared about reach; they cared about thinking well. Having summoned a network through their writing, they shifted more of their conversations to private channels. This is a loss for those who do not have access to these channels, those who enjoyed peeking in as they worked in public, learning from them as they figured out how to navigate this world.
Being able to offload moderation to a language model might shift this somewhat, making it more attractive to work in public. If you had a language model that filtered your interactions with the public, you could move more of the notetaking and conversations out from private channels. You wouldn’t get distracted by people making canned replies or sneering at your half-formulated thoughts. The language model could filter that and notify you only when a comment, or a summary of a set of comments, seems like it could help move your thinking forward, given what the language model knows about your preferences and goals.
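One way to picture this filter is as a triage step that sorts incoming comments into three buckets. The sketch below is only an illustration: the `score` function stands in for whatever a language model would estimate about a comment's usefulness given your preferences and goals, and the bucket names and threshold values are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Comment:
    author: str
    text: str


def triage(
    comments: List[Comment],
    score: Callable[[Comment], float],  # hypothetical: model's usefulness estimate, 0 to 1
    notify_at: float = 0.8,
    hide_below: float = 0.2,
) -> Dict[str, List[Comment]]:
    """Sort comments into: notify the writer now, fold into a digest, or hide."""
    buckets: Dict[str, List[Comment]] = {"notify": [], "digest": [], "hidden": []}
    for comment in comments:
        s = score(comment)
        if s >= notify_at:
            buckets["notify"].append(comment)
        elif s < hide_below:
            buckets["hidden"].append(comment)
        else:
            # Middling comments are candidates for a periodic summary.
            buckets["digest"].append(comment)
    return buckets
```

The middle bucket corresponds to the "summary of a set of comments" case: individually these comments may not be worth an interruption, but together they might be.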
Another effect of having language models interface between readers and writers is that you, as a writer, could put out data that is too messy to be meaningful if read through from start to finish, like transcripts of every intellectual conversation you have. Readers could use language models to summarize it at various levels of detail, allowing them to zoom in and out depending on what they are looking for.
I have a strong suspicion that my thinking is skeuomorphic (that it borrows too much from already existing interfaces). The right interface for AI moderation would probably dissolve many of the assumptions we have about how communities look; it would reinvent the interfaces we use to make meaning together. Figuring out the right form factor, finetuning the prompts, researching how users experience these types of interactions—there are a lot of open questions.
If done right, AI moderation could enable what Nick Szabo calls social scalability:
Innovations in social scalability involve institutional and technological improvements that move function from mind to paper or mind to machine, lowering cognitive costs while increasing the value of information flowing between minds, reducing vulnerability, and/or searching for and discovering new and mutually beneficial participants.
Large language models, while disrupting open social platforms like Twitter, could end up scaling the size of more opinionated and self-selected communities. More people would get to experience well-moderated and coordinated communities that move the conversation and their collective aims forward.