Translation

Discord continues to grow faster than we expected, and so does user-generated content. More users means more chat messages. In July we announced 40 million messages a day, in December we announced 100 million, and by mid-January we had passed 120 million. We decided early on to store all chat history forever, so that users can come back at any time and have their data available on any device. That is a lot of data, ever increasing in velocity and size, and all of it must remain available. How do we do it? Cassandra!

What we were doing

The original version of Discord was built in just under two months in early 2015. Arguably, one of the best databases for iterating quickly is MongoDB. Everything in Discord was deliberately stored in a single MongoDB replica set, but we also planned everything for an easy migration to a new DBMS (we knew we were not going to use MongoDB sharding because of its complexity and unknown stability). This is actually part of our company culture: build quickly to prove out a product feature, but always with a path to a more robust solution.

The messages were stored in a MongoDB collection with a single compound index on channel_id and created_at. Around November 2015 we reached 100 million stored messages, and at that point we started to see the expected problems appear: the data and the index no longer fit in RAM, and latencies became unpredictable. It was time to migrate to a more suitable DBMS.

Choosing the right DBMS

Before choosing a new DBMS, we had to understand our read/write patterns and why we were having problems with the current solution.

It quickly became clear that our reads were extremely random and that our read/write ratio was about 50/50.
Voice-chat-heavy Discord servers send almost no messages: one or two every few days. In a year, a server of this kind is unlikely to reach 1,000 messages. The problem is that even with so few messages, the data is harder to serve to users. Just returning 50 messages to a user can cause many random seeks on disk, which evicts other data from the disk cache.
Private-text-chat-heavy Discord servers send a decent number of messages, easily reaching between 100 thousand and 1 million messages a year. They usually request only the most recent data. The problem is that these servers usually have fewer than 100 members, so data is requested infrequently and is unlikely to be in the disk cache.
Large public Discord servers send a lot of messages. They have thousands of members sending thousands of messages a day, and easily amass millions of messages a year. They almost always request messages sent in the last hour, and they do so often, so the data is usually in the disk cache.
We knew that in the coming year we would add even more ways for users to issue random reads: viewing your mentions from the last 30 days and then jumping to that point in history, viewing and jumping to pinned messages, and full-text search. All of this means even more random reads!

We then defined our requirements:

Linear scalability: We do not want to reconsider the solution later or manually re-shard the data.
Automatic failover: We like to sleep at night, so we build Discord to self-heal as much as possible.
Low maintenance: It should just work once we set it up. All we should have to do is add more nodes as the data grows.
Proven to work: We love trying new technology, but not too new.
Predictable performance: We have alerts that go off when the 95th percentile of our API response time exceeds 80 ms. We also do not want to have to cache messages in Redis or Memcached.
Not a blob store: Writing thousands of messages per second would not work well if we had to constantly deserialize blobs and append to them.
Open source: We believe in controlling our own destiny and do not want to depend on a third-party company.
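
The predictable-performance requirement in the list above can be illustrated with a few lines of Python. This is only a sketch: the 80 ms threshold comes from the text, while the nearest-rank percentile method, the function names and the sample data are assumptions made for illustration and are not Discord's monitoring code.

```python
import math

P95_THRESHOLD_MS = 80  # alert threshold taken from the requirement above


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def should_alert(latencies_ms):
    """Return True when the 95th percentile latency exceeds the threshold."""
    return percentile(latencies_ms, 95) > P95_THRESHOLD_MS
```

A mean would hide the slow tail, which is why the requirement is phrased as a percentile: a handful of very slow requests moves the 95th percentile long before it moves the average.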

Cassandra turned out to be the only DBMS that met all our requirements. We can simply add nodes to scale it, and it tolerates the loss of nodes without any impact on the application. Large companies such as Netflix and Apple run thousands of Cassandra nodes. Related data is stored contiguously on disk, which minimizes seeks and makes it easy to distribute across the cluster. It is backed by DataStax, but it is still open source and community driven.

Having made the choice, we needed to prove that it would actually work.

Data Modeling

The best way to describe Cassandra to a newcomer is as a KKV store. The two Ks make up the primary key. The first K is the partition key. It determines which node the data lives on and where to find it on disk. A partition contains many rows, and a particular row within a partition is identified by the second K, the clustering key. The clustering key acts as a primary key within the partition and defines how the rows are sorted. You can think of a partition as an ordered dictionary. Together, these properties allow for very powerful data modeling.
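
The "ordered dictionary" picture can be made concrete with a toy model. The sketch below is only an in-memory illustration of the KKV idea; the class and method names are made up, and it says nothing about how Cassandra actually stores data.

```python
import bisect
from collections import defaultdict


class KKVTable:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, value)
        self._partitions = defaultdict(list)

    def upsert(self, partition_key, clustering_key, value):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)        # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, value))  # keep the partition ordered

    def range(self, partition_key, low, high):
        """Rows with low <= clustering key <= high, in clustering order."""
        rows = self._partitions.get(partition_key, [])
        keys = [k for k, _ in rows]
        return rows[bisect.bisect_left(keys, low):bisect.bisect_right(keys, high)]


table = KKVTable()
table.upsert("channel-1", 30, "third")
table.upsert("channel-1", 10, "first")
table.upsert("channel-1", 20, "second")
first_two = table.range("channel-1", 10, 20)  # rows come back sorted by clustering key
```

Because rows inside a partition are already sorted, a range query only has to locate two positions and return the slice between them.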

Remember that messages in MongoDB were indexed by channel_id and created_at? channel_id became the partition key, since all queries operate on a channel, but created_at did not make a good clustering key because two messages can have the same creation time. Luckily, every ID on Discord is actually a Snowflake, meaning it is chronologically sortable, so we could use those instead. The primary key became (channel_id, message_id), where message_id is a Snowflake. This means that when loading a channel we can tell Cassandra the exact range in which to scan for messages.
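
Why Snowflakes sort chronologically can be sketched as follows. The epoch constant and the 22-bit shift are the ones used by the bucket code later in this article; the helper make_snowflake is hypothetical and exists only to build sample IDs for the illustration.

```python
DISCORD_EPOCH = 1420070400000  # milliseconds; same constant as in the article's bucket code
TIMESTAMP_SHIFT = 22           # the low 22 bits are not part of the timestamp


def snowflake_to_unix_ms(snowflake):
    """Recover the creation time (Unix milliseconds) stored in the high bits."""
    return (snowflake >> TIMESTAMP_SHIFT) + DISCORD_EPOCH


def make_snowflake(unix_ms, low_bits=0):
    """Hypothetical helper for illustration: pack a timestamp into an ID."""
    mask = (1 << TIMESTAMP_SHIFT) - 1
    return ((unix_ms - DISCORD_EPOCH) << TIMESTAMP_SHIFT) | (low_bits & mask)


earlier = make_snowflake(1500000000000, low_bits=4000)
later = make_snowflake(1500000000001, low_bits=1)
```

Since the timestamp occupies the most significant bits, comparing two IDs as integers compares their creation times first, and the low bits only break ties within the same millisecond.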

Here is a simplified schema for our message table (it skips about 10 columns).

CREATE TABLE messages (
    channel_id bigint,
    message_id bigint,
    author_id bigint,
    content text,
    PRIMARY KEY (channel_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

While Cassandra's schemas are similar to those of a relational database, they are cheap to alter and changing them causes no temporary performance impact. We get the best of a blob store and a relational store.

As soon as we started importing existing messages into Cassandra, we saw warnings in the logs that partitions larger than 100 MB had been found. What gives?! After all, Cassandra advertises support for 2 GB partitions! Apparently, just because it can be done does not mean it should be. Large partitions put a lot of pressure on the garbage collector in Cassandra during compaction, cluster expansion and so on. A large partition also means that the data in it cannot be distributed around the cluster. It became clear that we had to bound the size of partitions somehow, because a single Discord channel can exist for years and keep growing.

We decided to split our messages into buckets by time. We looked at the largest channels on Discord and determined that if we stored about 10 days of messages per bucket, we would stay comfortably under 100 MB. Buckets had to be derivable from the message_id or from a timestamp.

import time

DISCORD_EPOCH = 1420070400000
BUCKET_SIZE = 1000 * 60 * 60 * 24 * 10  # 10 days in milliseconds


def make_bucket(snowflake):
    if snowflake is None:
        timestamp = int(time.time() * 1000) - DISCORD_EPOCH
    else:
        # When a Snowflake is created it contains the number of
        # milliseconds since the DISCORD_EPOCH.
        timestamp = snowflake >> 22
    return timestamp // BUCKET_SIZE


def make_buckets(start_id, end_id=None):
    return range(make_bucket(start_id), make_bucket(end_id) + 1)

Cassandra partition keys can be composite, so our new primary key is ((channel_id, bucket), message_id) .

CREATE TABLE messages (
    channel_id bigint,
    bucket int,
    message_id bigint,
    author_id bigint,
    content text,
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

To query for recent messages in a channel, we generate a range of buckets from the current time back to channel_id (which is also a chronologically sortable Snowflake and must be older than the first message). We then query the partitions sequentially until we have collected enough messages. The downside of this method is that rarely active Discords have to query several buckets to collect enough messages. In practice this has proved to be fine, because for active Discords enough messages are usually found in the first partition, and they are the majority.
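
The bucket walk described above can be sketched as follows. This is a minimal illustration under stated assumptions: a plain dictionary stands in for Cassandra, a Snowflake called newest_id stands in for "the current time", and the names bucket_of and recent_messages are made up for the example.

```python
BUCKET_SIZE = 1000 * 60 * 60 * 24 * 10  # 10 days in milliseconds, as in the article


def bucket_of(snowflake):
    """Bucket number of a Snowflake (its timestamp lives above the low 22 bits)."""
    return (snowflake >> 22) // BUCKET_SIZE


def recent_messages(partitions, channel_id, newest_id, limit):
    """Walk buckets from newest to oldest until `limit` message IDs are collected.

    `partitions` is a stand-in for Cassandra:
    {(channel_id, bucket): [message_id, ...]}.
    """
    collected = []
    oldest_bucket = bucket_of(channel_id)  # the channel ID predates every message in it
    for bucket in range(bucket_of(newest_id), oldest_bucket - 1, -1):
        rows = sorted(partitions.get((channel_id, bucket), []), reverse=True)
        collected.extend(rows[: limit - len(collected)])
        if len(collected) >= limit:
            break
    return collected
```

For an active channel the loop ends after the first bucket; only a quiet channel pays for several partition reads, which matches the trade-off described in the text.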

The import of messages into Cassandra went smoothly and we were ready to try it out in production.

Dark launch

Introducing a new system into production is always scary, so it is a good idea to test it without affecting users. We set up our code to perform every read and write against both MongoDB and Cassandra.
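
A double read/write setup of this kind can be sketched roughly as follows. The article does not show the real implementation, so everything here is an assumption: two dictionaries stand in for MongoDB (primary) and Cassandra (shadow), and the DualStore class is hypothetical.

```python
class DualStore:
    """Writes go to both stores; reads are served from the primary and compared with the shadow."""

    def __init__(self, primary, shadow):
        self.primary = primary
        self.shadow = shadow
        self.problems = []  # anything that went wrong on the shadow side

    def write(self, key, value):
        self.primary[key] = value
        try:
            self.shadow[key] = value
        except Exception as exc:  # a shadow failure must never reach the user
            self.problems.append((key, "shadow write failed", repr(exc)))

    def read(self, key):
        expected = self.primary.get(key)
        try:
            actual = self.shadow.get(key)
            if actual != expected:
                self.problems.append((key, "mismatch", repr(actual)))
        except Exception as exc:
            self.problems.append((key, "shadow read failed", repr(exc)))
        return expected  # users only ever see the primary's answer


store = DualStore(primary={}, shadow={})
store.write("message-1", "hello")
answer = store.read("message-1")
```

The point of the pattern is that the new system receives production traffic while its results are only recorded, never returned, so bugs show up in logs rather than in front of users.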

Immediately after the launch, our bug tracker started showing errors saying that author_id was null. How can it be null? It is a required field!

Eventual consistency

Cassandra is an AP database, meaning that it trades strong consistency for availability, which is what we wanted. Reading before writing is an anti-pattern in Cassandra (reads are more expensive), so everything Cassandra does is essentially an upsert, even if only certain columns are provided. You can also write to any node, and it resolves conflicts automatically using "last write wins" semantics on a per-column basis. So how did this affect us?

Example of an edit/delete race condition

If one user edited a message at the same time as another user deleted it, we ended up with a row that was missing all data except the primary key and the text, because all Cassandra writes are upserts. There were two possible solutions to this problem:

Write the whole message back when editing it. This could resurrect deleted messages and adds more chances of conflict for concurrent writes to other columns.
Detect that the message is corrupt and delete it from the database.

We chose the second option: we picked a required column (in this case author_id) and deleted the message if that column was null.
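
The race and the chosen fix can be reproduced with a toy model. The sketch below is a deliberately simplified picture of per-column "last write wins" with a row-level delete marker; it is not Cassandra's storage engine, and the Row class and is_corrupt helper are invented for the illustration.

```python
class Row:
    """Toy row: every column keeps the timestamp of its latest write."""

    def __init__(self):
        self.cells = {}        # column -> (write timestamp, value)
        self.deleted_at = -1   # timestamp of the newest row-level delete

    def upsert(self, ts, **columns):
        for name, value in columns.items():
            current = self.cells.get(name)
            if current is None or ts >= current[0]:  # last write wins, per column
                self.cells[name] = (ts, value)

    def delete(self, ts):
        self.deleted_at = max(self.deleted_at, ts)

    def read(self):
        """Visible columns are those written after the newest delete."""
        live = {name: value for name, (ts, value) in self.cells.items()
                if ts > self.deleted_at}
        return live or None


def is_corrupt(message):
    """A message that exists but lacks the required author_id is corrupt."""
    return message is not None and message.get("author_id") is None


row = Row()
row.upsert(1, author_id=42, content="hi")  # original message
row.delete(2)                              # one user deletes it
row.upsert(3, content="hi (edited)")       # another user's edit lands later
survivor = row.read()
```

The edit only carries the content column, so after the delete it resurrects a row with text but no author, which is exactly the kind of row the required-column check is meant to catch.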

While solving this problem, we noticed that we were being quite inefficient with our writes. Because Cassandra is eventually consistent, it cannot simply delete data immediately. It has to replicate deletes to other nodes, and it must do so even if those nodes are temporarily unavailable. Cassandra handles this by treating a delete as a special form of write called a "tombstone". On read, it simply skips over the tombstones it encounters. Tombstones live for a configurable amount of time (10 days by default) and are permanently removed during compaction once that time has expired.

Deleting a column and writing null to a column are exactly the same thing: both create a tombstone. Since all writes in Cassandra are upserts, you create a tombstone even when writing null for the first time. In practice, our full message schema had 16 columns, but the average message had only 4 values set. We were writing 12 tombstones into Cassandra for most messages, for no reason. The solution was simple: write only non-null values to Cassandra.
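
Writing only non-null values can be sketched as a small statement builder. This is an illustration, not the article's code: the build_insert name is made up, and the %s placeholder style is an assumption that should be adapted to whatever driver is in use.

```python
def build_insert(table, row):
    """Return (cql, params) for an INSERT that skips columns whose value is None.

    Skipped columns are simply absent from the statement, so no tombstone
    is written for them.
    """
    present = {column: value for column, value in row.items() if value is not None}
    if not present:
        raise ValueError("nothing to write")
    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))
    cql = "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders)
    return cql, list(present.values())


statement, params = build_insert("messages", {
    "channel_id": 1,
    "message_id": 2,
    "author_id": 3,
    "content": "hi",
    "edited_at": None,  # hypothetical optional column, left out of the statement
})
```

The table and column names in the statement should come from trusted code, never from user input, because they are interpolated into the query text rather than bound as parameters.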

Performance

Cassandra is known to have faster writes than reads, and this is exactly what we observed. Writes took under 1 millisecond and reads took under 5 milliseconds. We observed this regardless of what data was being accessed, and performance stayed consistent during a week of testing. No surprise: we got exactly what we expected.

Read/write latency, according to our log data

In keeping with fast, reliable read performance, here's an example of jumping to a year-old message in a channel with millions of messages:

The big surprise

Everything went smoothly, so we rolled Cassandra out as our primary database and phased MongoDB out within a week. It continued to work flawlessly ... for about 6 months, until one day it became unresponsive.

We noticed that Cassandra was constantly running 10-second stop-the-world garbage collections, but we had no idea why. We started digging and found a Discord channel that took 20 seconds to load. The culprit was the public Discord server of the Puzzles & Dragons subreddit. Since it was public, we joined it to take a look. To our surprise, the channel contained only one message. At that moment it became obvious that they had deleted millions of messages through our API, leaving only one message in the channel.

If you have been reading carefully, you will remember how Cassandra handles deletes using tombstones (mentioned in the "Eventual consistency" section). When a user loaded this channel, even though there was only one message, Cassandra had to effectively scan millions of message tombstones, generating garbage faster than the JVM could collect it.

We solved this problem in the following way:

We lowered the lifespan of tombstones from 10 days to 2 days, because we run a Cassandra repair (an anti-entropy process) every night on our message cluster.
We changed our query code to track empty buckets in a channel and avoid them in the future. This means that if a user triggers this query again, in the worst case Cassandra only scans the most recent bucket.
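
The second fix, remembering which buckets are empty, can be sketched like this. The article does not show the real query code, so the BucketScanner class, its fetch callback and the in-memory set of empty buckets are all assumptions made for the illustration.

```python
class BucketScanner:
    """Remembers empty buckets so that later scans of the same channel skip them."""

    def __init__(self, fetch_bucket):
        self._fetch = fetch_bucket  # callable(channel_id, bucket) -> list of messages
        self._empty = set()         # (channel_id, bucket) pairs known to hold nothing

    def recent(self, channel_id, newest_bucket, oldest_bucket, limit):
        found = []
        for bucket in range(newest_bucket, oldest_bucket - 1, -1):
            # Always re-check the newest bucket: new messages can still arrive there.
            if bucket != newest_bucket and (channel_id, bucket) in self._empty:
                continue
            rows = self._fetch(channel_id, bucket)
            if not rows:
                self._empty.add((channel_id, bucket))
            found.extend(rows[: limit - len(found)])
            if len(found) >= limit:
                break
        return found
```

Older buckets cannot receive new messages, because a bucket is derived from the message's creation time, so caching the fact that an old bucket is empty is safe; only the newest bucket has to be queried again on every request.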

Future

At the moment we run a 12-node cluster with a replication factor of 3, and we will keep adding Cassandra nodes as needed. We believe this approach will work in the long run, but as Discord grows there is a distant future in which we have to store billions of messages a day. Netflix and Apple run clusters with hundreds of nodes, so for now we have nothing to worry about. Still, we like to keep a couple of ideas in reserve.

Near future

Upgrade our message cluster from Cassandra 2 to Cassandra 3. The new storage format in Cassandra 3 can reduce storage by more than 50%.
Newer versions of Cassandra are better at handling more data in each node. We currently store approximately 1 TB of compressed data in each of them. We think that it is safe to reduce the number of nodes in the cluster by increasing this limit to 2 TB.

Distant future

Explore Scylla, a Cassandra-compatible DBMS written in C++. In normal operation our Cassandra nodes do not actually use much CPU, but during the off-peak hours when we run Cassandra repair (an anti-entropy process) they become fairly CPU bound, and the repair time grows with the amount of data written since the last repair. Scylla promises significantly faster repairs.
Build a system that archives unused channels to Google Cloud Storage and loads them back on demand. We want to avoid this and do not think we will have to do it.

Conclusion

More than a year has passed since we switched to Cassandra, and despite "the big surprise" it has been smooth sailing. We went from over 100 million total messages to over 120 million messages a day, with performance and stability staying consistent.

Due to the success of this project, we have since migrated all of our other data in production to Cassandra, with success as well.

In the continuation of this article, we will explore how we perform full-text searches on billions of messages.

We still don't have dedicated DevOps engineers (only four backend engineers), so it's really cool to have a system that you don't have to worry about. We're hiring, so get in touch if these puzzles tickle your fancy.


Users of text chats often want to delete all their old messages because they are no longer relevant and take up space. However, the application's interface does not offer a way to do this in one step. What can you do in this situation, and is it possible to clean up a chat at all?

Chat concept

A chat in Discord is a channel. Channels are divided by the type of communication into voice and text.

The basic unit of work in Discord is a server created by a user. Once a server exists, its owner can create channels of both types in it.

Clearing a Chat in Discord

It must be said right away that the application gives users no tool for instantly deleting all the messages accumulated in a channel. Only individual messages can be deleted. There is, however, one thing that can speed up the cleanup: removing "reactions", that is, the responses of other participants to posted messages.

To remove all reactions from a message, follow these steps:

Log in to Discord;
Select the channel of interest;
To the right of the selected message, click the three-dot icon;
Select "Remove All Reactions" from the drop-down menu.

Of course, this does not clean up all messages with one click, but it reduces the time needed for a complete cleanup. The messages themselves have to be deleted manually, one by one.

To delete a message:

Log in to the application;
Select the channel of interest;
Next to the message, click the three-dot icon;
Select the "Delete" option from the drop-down menu.

You can also delete a message by right-clicking it and choosing "Delete" from the context menu.

If a channel is no longer needed, it can simply be deleted, which takes only a few clicks:

Log in to the application;
Select the channel of interest;
Open the "Menu" section or right-click the channel;
From the list of actions, select the "Delete channel" option.

In this way you can delete both a channel itself and the messages in it, although deleting messages one by one takes some time.

About the program itself

Discord is designed for voice communication between players of online games. The service also offers standard text communication, but its text channels are less capable than dedicated messengers. The Discord project is aimed primarily at a gaming audience.

After creating their own Discord server, many beginners have questions about server administration. One of them is "How do I delete all messages in Discord?". The reasons for asking vary, from removing a single unwanted message to clearing the entire history away from prying eyes.

Delete a conversation in Discord

Unfortunately, the developers did not provide a way to delete all messages in Discord at once. But do not be upset! There are several ways to partially work around this limitation.

Method 1: Delete a conversation by one message

This method is suitable only if you need to delete part of a conversation. As a large-scale cleanup option it is the worst, because the same actions have to be repeated many times.

Method 2: Delete all messages in Discord by deleting the chat

This method deletes a conversation fairly quickly. The only downside is that you have to re-create the channel and configure it again. The main advantage is that it deletes all the messages stored in the selected channel.

Recall that Discord officially has no such concepts as a group or a chat. These words usually refer to a place for group communication, which Discord calls a channel. Channels come in two types: text and voice. Many people prefer Discord because it lets participants talk in several text channels at the same time without leaving the voice channel.

Method 3: Delete a Discord Conversation Using Reactions

Reactions cannot be used to delete all messages in Discord; they only help to administer certain processes.

For example, there is a useful bot that adds polls with up to nine answer options to a Discord server. Reactions show which options users have selected, and they can also be removed. Originally, reactions were intended for emoji.

Method 4: Delete messages from the last week

There is also a somewhat radical way to clear messages: banning a user. When you ban a user, you can delete all of that user's messages from the last 7 days.

Note that this method is suitable only if you need to erase the messages of an individual user, not the entire chat.

As a result, the banned user is blocked from the server, that is, they cannot use an invitation link to rejoin until they are unbanned by the owner or by a member with administrator rights.

Method 5: Mee6 Bot

Although Discord does not let you delete all messages at once, there is a way to delete up to 100 messages at a time. To do this, install the Mee6 bot. This is the easiest and fastest way to delete messages.

Mee6 is essentially a music bot, but it has several other useful features.

The first step is to add the bot to your server. Follow this link.
Click "Add to Discord".
Authorize the Mee6 bot.
Next, select the server on which Mee6 will operate.
In the next window, set the permissions that will be granted to the bot.
After you click "Authorize", confirm that you are not a robot.
After the bot has been added to the server, a window opens where you configure the bot. Several plugins are enabled by default; one of them is Help.
For the changes to take effect, check the box and click "Update".
To check that commands work, send the message !help in any channel on the server.
The bot should then send you a private message describing the plugins and their commands.
Mee6's help message shows that the commands for deleting messages belong to the "Moderator" plugin. In the bot configuration window, click the orange "Disabled" button next to the "Moderator" section.
Confirm adding the plugin.
Enable the !clear command and save the changes with "Update". The !clear xx command deletes up to 100 messages at a time, where xx is the number of messages to delete.
Send a message with the !clear 50 command.
This deletes the last 50 messages from the chat. Messages are counted from the end of the conversation, that is, the newest messages are deleted first and older ones after them.
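
How a command such as !clear 50 might be interpreted can be sketched in a few lines. This is not Mee6's implementation, which is not shown in this article; the parse_clear and clear_latest functions are hypothetical and only mirror the behavior described above (at most 100 messages, newest first).

```python
MAX_CLEAR = 100  # per-command limit described above


def parse_clear(message):
    """Return how many messages a `!clear xx` command asks to delete, or None if invalid."""
    parts = message.strip().split()
    if len(parts) != 2 or parts[0].lower() != "!clear":
        return None
    if not parts[1].isdecimal():
        return None
    count = int(parts[1])
    return count if 1 <= count <= MAX_CLEAR else None


def clear_latest(messages, count):
    """Drop the `count` newest messages from a list ordered oldest to newest."""
    return messages[:-count] if count else messages


history = ["first", "second", "third", "fourth", "fifth"]
remaining = clear_latest(history, parse_clear("!clear 2"))
```

Rejecting out-of-range counts instead of silently clamping them makes it obvious to the moderator that a request for more than 100 messages was not carried out.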

The procedure for deleting messages through the mobile app is the same as the steps described above, so we do not illustrate it separately.

Discord has become a meeting place for gamers all over the internet. No matter what game you play, Discord has a community for it. Considering the number of gamers around the world who use this program, it's no surprise that Discord bots are becoming more and more popular.

Private servers can use bots to play with friends, while public servers can use them to moderate users. Fully functional music bots are available for both types of servers, so gamers can always listen to music while playing. With so many types of bots, it can be hard to find the one that is right for you. Today we would like to talk about the most popular bots, and we hope this information helps you decide whether to add a particular bot to your channel.

Music bots

Any server can use music bots. In fact, music bots are so popular that there's a whole section dedicated to them on the Discord bots page. There are a lot of music bots out there, but two of the best are Erisbot and Pancake.

Erisbot is intuitive and easy to use. It works like a music player and includes many features similar to an iPod. You can even use it standalone. It also offers funny pictures and a built-in urban slang dictionary search.

Pancake is a little different and only offers a few features. It works like a music bot similar to ErisBot, albeit with fewer settings and options. However, it also has some images, games, and moderation options. It does a bit of everything and is a great tool to use if you want one feature rich bot and don't want to add many different ones.

Moderator bots

In addition to Pancake, there are many moderation bots. Auto moderation can be very useful when managing a public Discord server, especially if you want to keep the language and content of the chats in order.

Like Pancake, Dyno is a feature rich bot. Like Pancake, it can play music, but its main function is moderation. Unlike most bots, Dyno has automatic moderation and welcome messages, so it can manage an entire channel without the need for moderators to be in the chat all the time.

While Dyno is a feature rich bot, Auttaja is designed almost exclusively for moderation. At the same time, the potential of Auttaja is huge. In addition to automatically moderating chats, the bot can also moderate systematic processes such as usernames and punishments.

Game / entertainment bots

Some of the most popular bots are interactive gaming or entertainment bots. The two most popular bots in this category are Pokecord and Dank Memer.

Pokecord is a cute interactive game that basically creates a Pokemon text adventure in your channel. It's a fun and easy way to get your Discord members involved in the game.

Dank Memer is a silly bot whose only purpose is fun. It lets you create and share memes and has a few other amusing features, but it is designed primarily for making memes from popular online images.

Can't find the right bot - create your own!

While there are hundreds of Discord bots, each with their own unique settings and wide range of commands, you may not find one that is tailored to your specific needs. Or maybe you just want to have full customization or more limited access to the bot. In any case, there are several ways to make your own bot, but we will focus on the simplest and most effective way:

The first thing you need to do is log into Discord, go to the applications page and click "Create an application".

Then add an app name (and an avatar if you want) and click "Bot" in the panel on the left side. There, click "Add bot". A pop-up window will ask whether you want to continue; click "Yes".

From here you can set the bot's permissions according to your preferences. Under the bot's username, a section called "Token" should now appear. Below it, click the "Click to reveal token" link. Copy the token; you will need it later.

Then click "OAuth2" on the left side. Here you must specify what kind of application you are creating. Click "Bot" and then copy the URL that is displayed. Opening it takes you to a page where you can add the bot to any of the servers you manage. Select the server you want to add the bot to.

At this point you need some programming knowledge to make full use of the bot. To run the bot you will need a text editor, such as Notepad, and a JavaScript runtime, such as Node.js. Take the token you copied earlier and save it in a file in a folder for your bot. The file should be named "auth.json" and contain the following:

{
    "token": "YOUR-BOT-TOKEN"
}

You'll need to create two more files to run your bot. One should be saved as package.json with the following code:

{
    "name": "greeter-bot",
    "version": "1.0.0",
    "description": "",
    "main": "bot.js",
    "author": "",
    "dependencies": {}
}

For the last file, create "bot.js". This is where the bot's main behavior is defined. It helps to have some coding knowledge so that you can build a bot with the feature set you need. However, a tutorial on Medium provides code for a simple Discord bot, so if you are not familiar with JavaScript you can use the following code to create a bot with a basic set of features.

var Discord = require('discord.io');
var logger = require('winston');
var auth = require('./auth.json');

// Configure logger settings
logger.remove(logger.transports.Console);
logger.add(logger.transports.Console, {
    colorize: true
});
logger.level = 'debug';

// Initialize Discord Bot
var bot = new Discord.Client({
    token: auth.token,
    autorun: true
});

bot.on('ready', function (evt) {
    logger.info('Connected');
    logger.info('Logged in as: ');
    logger.info(bot.username + ' - (' + bot.id + ')');
});

bot.on('message', function (user, userID, channelID, message, evt) {
    // Our bot needs to know if it will execute a command
    // It will listen for messages that will start with `!`
    if (message.substring(0, 1) == '!') {
        var args = message.substring(1).split(' ');
        var cmd = args[0];  // the first word after `!` is the command name

        args = args.splice(1);
        switch (cmd) {
            // !ping
            case 'ping':
                bot.sendMessage({
                    to: channelID,
                    message: 'Pong!'
                });
                break;
            // Just add any case commands if you want to..
        }
    }
});

Finally, open a terminal in the bot's folder and run "npm install discord.io winston --save", which installs the packages the bot needs. Then run "node bot.js" every time you want to start the bot.

Since this process requires some coding knowledge and hundreds of bots already exist, look at the existing ones before trying to create your own. You may be tempted to build your own right away, but there are plenty of existing bots that may already do exactly what you need.