Twitter publishes the code of its recommendation algorithm on GitHub

Twitter has published the source code of its recommendation algorithm on GitHub, following through on Elon Musk’s proposal, made almost a year ago, to make it public. In a conversation on Twitter Spaces, Musk said he hoped users would find potential “issues” in the code and help improve it.

An opportunity for transparency and collaboration

Twitter’s decision to make the code of its recommendation algorithm public is an opportunity for transparency and collaboration in improving the technology. The publication of the code is a sign that the company is open to feedback and willing to improve its algorithm.

Twitter’s recommendation algorithm is responsible for displaying Tweets in the “For You” section of the platform. The published code covers only this section, so the code behind search, or behind how content is displayed elsewhere on Twitter, has not been released.

The details found in the code

Twitter users have found interesting details in the code of the recommendation algorithm. For example, Jane Manchun Wong noted that “Twitter’s algorithm specifically labels whether the author of the tweet is Elon Musk.” This could explain why Musk’s tweets appear so frequently. Wong also noted that the algorithm has labels that indicate whether the author is a “power user” and whether they are a Republican or a Democrat.

The response of Elon Musk and Twitter

When asked about this aspect of the algorithm, Musk objected to it, stating that it “definitely shouldn’t be dividing people into Republicans and Democrats, that doesn’t make sense.” A Twitter engineer clarified that the categories are only used “for statistical tracking purposes and have nothing to do with the algorithm.” He said the labels are meant “to make sure we’re not biased towards a particular group.” However, it was not explained why Musk had his own category.

Twitter’s publication of its recommendation algorithm is good news for transparency in the technology industry. Giving users access to the source code signals that the company is open to feedback and to collaboration on improving the technology.

The components of the Twitter recommendation algorithm

Twitter’s recommendation algorithm is made up of several interconnected services and jobs. While there are many areas of the app where Tweets are recommended, in this article we’ll focus on the “For You” timeline feed.

The Twitter recommendation algorithm consists of three main stages:

  • Fetch the best Tweets from different recommendation sources in a process called “candidate sourcing.”
  • Rank each Tweet using a machine learning model.
  • Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, adult content, and Tweets you’ve already seen.
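
The three stages above can be sketched in a few lines of Python. This is only an illustration of the flow, not Twitter’s code: the `Tweet` class, the `build_timeline` function, and the toy scoring function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author: str
    text: str


def build_timeline(
    sources: Iterable[Callable[[], List[Tweet]]],
    score: Callable[[Tweet], float],
    blocked_authors: Set[str],
    seen_ids: Set[int],
) -> List[Tweet]:
    # Stage 1: candidate sourcing. Gather Tweets from every source.
    candidates = [tweet for source in sources for tweet in source()]

    # Stage 2: ranking. Order candidates by score, highest first.
    ranked = sorted(candidates, key=score, reverse=True)

    # Stage 3: heuristics and filters. Drop blocked authors and seen Tweets.
    return [
        t for t in ranked
        if t.author not in blocked_authors and t.tweet_id not in seen_ids
    ]


def followed_source() -> List[Tweet]:
    return [Tweet(1, "alice", "hello"), Tweet(2, "bob", "spam")]


def discovery_source() -> List[Tweet]:
    return [Tweet(3, "carol", "a long and interesting thread")]


if __name__ == "__main__":
    # Toy stand-in for the model (an assumption): longer text scores higher.
    timeline = build_timeline(
        [followed_source, discovery_source],
        score=lambda t: float(len(t.text)),
        blocked_authors={"bob"},
        seen_ids=set(),
    )
    print([t.tweet_id for t in timeline])  # [3, 1]
```

In the real system the scoring step is a trained model rather than a text-length function; the sketch only shows how the stages feed into one another.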

The service responsible for building and serving the “For You” feed is Home Mixer. Home Mixer is built on top of Product Mixer, Twitter’s custom Scala framework that makes it easy to create content feeds. This service acts as the software backbone connecting different candidate sources, scoring functions, heuristics, and filters.

How Tweets are chosen

The foundation of Twitter recommendations is a set of core models and features that extract latent information from Tweet, user, and engagement data. These models seek to answer important questions about the Twitter network, such as “What is the probability that you will interact with another user in the future?” or “What are the communities on Twitter and what are the most popular Tweets within them?”

The Twitter recommendation pipeline is made up of several components that consume these features. The process begins with obtaining the candidates from different sources of recommendation.

Candidate sources

Twitter has several candidate sources that it uses to retrieve recent and relevant Tweets for the user. For each request, an attempt is made to extract the top 1,500 Tweets from a pool of hundreds of millions of Tweets across these sources. Candidates come from the people you follow (In-Network) and from those you don’t (Out-of-Network). The “For You” feed consists, on average, of 50% In-Network and 50% Out-of-Network Tweets, although this can vary from user to user.
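
To make the 1,500-candidate budget and the roughly even split concrete, here is a minimal sketch. The function `select_candidates` and its parameters are hypothetical, and the rule for handing unused slots to the other pool is an assumption made for the example, not something the published code is known to do.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_candidates(
    followed: Sequence[T],
    not_followed: Sequence[T],
    limit: int = 1500,
    followed_share: float = 0.5,
) -> List[T]:
    """Pick up to `limit` candidates from two pools sorted best-first.

    `followed` holds Tweets from accounts the user follows, `not_followed`
    holds the rest. The target share is a tunable assumption.
    """
    followed_quota = min(len(followed), round(limit * followed_share))
    other_quota = min(len(not_followed), limit - followed_quota)
    # If the second pool ran short, give its unused slots to the first pool.
    followed_quota = min(len(followed), limit - other_quota)
    return list(followed[:followed_quota]) + list(not_followed[:other_quota])


if __name__ == "__main__":
    picked = select_candidates(range(2000), range(10_000, 12_000))
    print(len(picked))  # 1500
```

With two large pools the result is 750 Tweets from each; if one pool is small, the other fills the remaining slots so the total still reaches the limit when enough Tweets exist.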

In-Network sources

The In-Network source is the largest candidate source and aims to provide the most relevant and recent Tweets from the users you follow. Tweets from users you follow are efficiently ranked by relevance using a logistic regression model. The best Tweets are sent to the next stage.
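
As a rough illustration of ranking with logistic regression, the sketch below scores each Tweet with a weighted sum of features passed through the logistic function and keeps the top results. The feature names, weights, and bias are invented for the example; the actual features and learned weights are not described in this article.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights; a real model would learn these from engagement data.
WEIGHTS = {"recency": 1.5, "author_affinity": 2.0, "has_media": 0.5}
BIAS = -1.0


def relevance(features: Dict[str, float]) -> float:
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_in_network(
    tweets: Dict[str, Dict[str, float]], top_k: int
) -> List[Tuple[str, float]]:
    """Return the `top_k` (tweet_id, score) pairs, most relevant first."""
    scored = [(tweet_id, relevance(f)) for tweet_id, f in tweets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    tweets = {
        "a": {"recency": 0.9, "author_affinity": 0.8},
        "b": {"recency": 0.2, "author_affinity": 0.1, "has_media": 1.0},
    }
    for tweet_id, score in rank_in_network(tweets, top_k=2):
        print(tweet_id, round(score, 3))
```

Because the model is a single weighted sum, it is cheap enough to run over every Tweet from every followed account, which is presumably why a light model is used at this stage before the heavier ranking step.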

Once the candidates to appear on the timeline have been obtained, they are ranked to decide their order of appearance. The objective is to offer the most relevant tweets with the greatest potential for user interaction. To do this, a neural network with approximately 48 million parameters is used, which is continuously trained with user interactions with tweets.

The neural network takes thousands of features into account and outputs ten labels to give each Tweet a score, where each label represents the probability of a type of engagement. Tweets are then ranked by these scores. It is important to note that at this point in the process, all Tweets are treated equally, regardless of their origin.

Heuristics, filters and product features

After the ranking process, a series of heuristics, filters, and product features are applied to create a balanced and diverse timeline. Some examples include:

  • Visibility Filters: Tweets are removed based on their content and user preferences. For example, tweets from blocked or muted accounts are removed.
  • Author Diversity: Prevents too many consecutive tweets from the same author.
  • Content Balance: Makes sure that a balanced amount of In-Network and Out-of-Network tweets are delivered.
  • Feedback-Based Fatigue: The score of certain tweets is lowered if the user has provided negative feedback on them.
  • Social Proof: Out-of-Network tweets that do not have a second-degree connection to the tweet are excluded as a measure of quality. In other words, it makes sure that someone the user follows has either interacted with the tweet or follows the tweeter.
  • Conversations: Provide more context for replies by linking to the original tweet.
  • Edited tweets: Determine if the tweets on the device are out of date and send instructions to replace them with the edited versions.
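
Three of the items above (visibility filters, social proof, and author diversity) can be sketched as simple list transformations. All names here are hypothetical, and the author-diversity rule shown, which drops a tweet that would extend a run past a fixed length, is a simplification chosen for the example rather than the method Twitter uses.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Candidate:
    tweet_id: int
    author: str
    in_network: bool
    # Accounts the viewer follows that interacted with this tweet.
    engaged_followees: Set[str] = field(default_factory=set)


def visibility_filter(
    cands: List[Candidate], blocked_or_muted: Set[str]
) -> List[Candidate]:
    """Remove tweets from accounts the viewer has blocked or muted."""
    return [c for c in cands if c.author not in blocked_or_muted]


def social_proof_filter(
    cands: List[Candidate], second_degree_authors: Set[str]
) -> List[Candidate]:
    """Keep an Out-of-Network tweet only if someone the viewer follows
    interacted with it or follows its author."""
    return [
        c for c in cands
        if c.in_network
        or c.engaged_followees
        or c.author in second_degree_authors
    ]


def author_diversity(cands: List[Candidate], max_run: int = 2) -> List[Candidate]:
    """Drop a tweet if it would make more than `max_run` in a row
    from the same author. Input is assumed to be in ranked order."""
    result: List[Candidate] = []
    for c in cands:
        run = result[-max_run:]
        if len(run) == max_run and all(r.author == c.author for r in run):
            continue
        result.append(c)
    return result


if __name__ == "__main__":
    ranked = [
        Candidate(1, "alice", True),
        Candidate(2, "alice", True),
        Candidate(3, "alice", True),
        Candidate(4, "dave", False),
        Candidate(5, "erin", False, {"alice"}),
        Candidate(6, "mallory", True),
    ]
    step = visibility_filter(ranked, blocked_or_muted={"mallory"})
    step = social_proof_filter(step, second_degree_authors=set())
    step = author_diversity(step, max_run=2)
    print([c.tweet_id for c in step])  # [1, 2, 5]
```

Each step takes and returns a ranked list, so the steps can be chained in any order or extended with further rules such as feedback-based score penalties.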

Mixing and serving

Finally, the Home Mixer service has a selection of tweets ready to send to the user’s device. As a final step, the tweets are mixed with other non-tweet content, such as ads, follow recommendations, and suggestions for new users, and returned to the device for display.
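
A minimal sketch of this blending step is shown below. The function name `mix_timeline` and the fixed spacing of one non-tweet item after every third tweet are assumptions for illustration; the article does not say how the real placement is decided.

```python
from typing import List


def mix_timeline(tweets: List[str], extras: List[str], every: int = 3) -> List[str]:
    """Insert one non-tweet item (ad, follow suggestion, ...) after every
    `every` tweets until the extras run out. Tweet order is preserved."""
    mixed: List[str] = []
    remaining = iter(extras)
    for position, tweet in enumerate(tweets, start=1):
        mixed.append(tweet)
        if position % every == 0:
            extra = next(remaining, None)
            if extra is not None:
                mixed.append(extra)
    return mixed


if __name__ == "__main__":
    tweets = [f"tweet-{i}" for i in range(1, 8)]
    print(mix_timeline(tweets, ["ad", "who-to-follow"]))
```

For seven tweets and two extras, the ad lands after the third tweet and the follow suggestion after the sixth, with the seventh tweet closing the list.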

More details will no doubt emerge as people continue to dig through the code.

You can download the code at this link.

More information at blog.twitter.com.