Icseon

Building an LLM-based text translator

Icseon — Thu, 14 Dec 2023 13:58:34 GMT

Contrary to popular belief, AI does not exist. They're LLMs. I've corrected the inaccuracy.

These days, we hear all about AI/LLMs and how useful they can be. There's one thing that LLMs do exceptionally well: translating text!

I wanted to create my own translation service because I did not want to rely on any external service for providing translations. This can be really costly depending on the volume of requests I generate.

So that's when my experiment began. Initially, I was planning to make an automatic message translator for Matokai, but I decided to start simple and create a Google Translate-like service instead, for the time being.

Planning out what I have to build

For the server portion, I am aiming to use llama.cpp through its Python bindings (llama-cpp-python), combined with Bottle for the web server, so I can easily send requests from the client to the server.

As for the client, I will be using Vue alongside TypeScript to build a basic interface where the user can provide inputs for the translation endpoint.

Why Bottle?

Bottle is extremely lightweight and is perfect for this service, which will really only have a single endpoint. I like using Bottle for integrations such as rendering 3D models and connecting AI with the web, like I am doing today in this post.

Since it's so lightweight, it's very quick and easy to get started with. Honestly, Bottle can also be used for large-scale APIs, using other integrations and ORMs to manage data, but that is out of the scope of this post.

Bottom line: It's simple, small and nice.

Building the front end

Now that we have a general idea of what we want to build, let's start by creating a simple Vue component where the user is greeted with a simple form where they can:

Provide what language they wish to translate to - this can be any language, so I am opting to make this a text field instead, as the AI will understand what they mean in most cases.
Type text in any language they desire, as the AI will attempt to automatically detect what language their input is.
See a readonly text area where they can see the result of the translation.

So after quickly designing a logo in Figma, coming up with a very fitting name and writing some text, this is what I came up with:

It is an extremely simple form where the user can specify all the required information as outlined earlier. Perfect for our use case.

But let's make it a little nicer. Because we're using Vue, we can use its reactivity system to dynamically render elements based on a condition. So let's make the text areas and translate button not render at all until the user has given an input.

To do this, we must declare a variable within our component using Refs. But we must also do this to know the language in the first place, so let's go ahead and declare everything we need within an interface:

/**
 * Represents the properties expected by the AppComponent.
 */
interface IAppComponent {

    /**
     * A reactive reference to indicate whether the network is currently processing.
     * We use this to give the user feedback through the user interface.
     */
    isNetworkProcessing: Ref<boolean>,

    /**
     * A reactive reference to indicate whether the network request has failed due to an error.
     */
    hasErrorOccurred: Ref<boolean>,

    /**
     * A reactive reference to store the target language for translation.
     * This is entered by the user through the user interface.
     */
    targetLanguage: Ref<string>,

    /**
     * A reactive reference to store the text the user wishes to translate.
     * This is also entered by the user through the user interface.
     */
    contentText: Ref<string>,

    /**
     * A reactive reference to store the translated text returned from the server.
     */
    responseContent: Ref<string>,

    /**
     * Performs translation by requesting a translation from the server.
     * @returns A Promise that resolves when the translation is complete.
     */
    translate: () => Promise<void>;
}

Let's break this down:

isNetworkProcessing - This is a simple boolean I will be using to dictate whether the translation request is pending so I can update the user interface accordingly to give the user feedback and to show a loading spinner.
hasErrorOccurred - This is yet another boolean that I will be using when the network request has failed for any reason. This could range from the API being unavailable to the user not having an internet connection. It will be used to show a basic error.
targetLanguage - This is the language we will be translating into and this is the user input provided by our user. We will send this to the server and we will be using this to hide the rest of the form if the length of this string is 0.
contentText - This is the content provided by our user that we will be passing to the AI to recognize (it will detect what language it is) and translate.
responseContent - Once the server has translated the content provided by our user, it will have to be stored somewhere - this is it.
translate - This method is the function that will send the request to the API and fetch its response - the entire logic of the application relies on this one function.

Now that we have all the variables we need for the application to function, let's start by making sure the rest of the form is not rendered at all if the string length of targetLanguage is 0 (empty).

We can easily achieve this by using the v-if directive that Vue provides:

Great! Now the rest of the form will not render until we know the language we are going to translate into, so this is what we are left with:

Nice! Now the user has to type in a language before they can start giving the text they wish to translate. This one simple thing will guide the user through the steps without having to explain anything to them.

Sending a request to the server

Now that we have built a form and added some nice-to-have functionality to it, it is time to start building the payload that will be received by the back-end server. Before we do this, let's collect what we need to perform a translation:

contentText - This is the content provided by the user and what we are going to be translating.
targetLanguage - This is the language we will be translating into.

That's all we need - we don't need to know the language we are translating from because the AI will do its best to know that.

So with that in mind, let's establish that our payload is going to be formatted as such:

{
	"content": "<the text to translate>",
	"to": "<the target language>"
}
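Since the server will be written in Python, here is a minimal sketch of how that payload could be checked on the receiving end before it is handed to the model. The function name parse_translation_payload and the rule that both fields must be non-empty strings are my own assumptions for illustration, not something the rest of this post relies on.

```python
import json


def parse_translation_payload(raw_body: str) -> dict:
    """Parse a body shaped like {"content": ..., "to": ...} and validate it.

    Hypothetical helper: the name and the validation rules are assumptions.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as error:
        raise ValueError("body is not valid JSON") from error

    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")

    cleaned = {}
    for key in ("content", "to"):
        value = payload.get(key)
        # Both fields are required and must contain more than whitespace.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{key}' must be a non-empty string")
        cleaned[key] = value.strip()

    return cleaned


if __name__ == "__main__":
    print(parse_translation_payload('{"content": "Hello, I am bored.", "to": "nl"}'))
```

Rejecting a bad payload early means the model never gets prompted with an empty language or empty text.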

Remember the translate method I mentioned earlier? We'll be implementing this logic in there, using the native fetch API that all major browsers currently support.

    /**
     * Performs translation by requesting a translation from the server.
     */
    const translate = async(): Promise<void> => {

        /* Set network processing state and error state. */
        isNetworkProcessing.value = true;
        hasErrorOccurred.value = false;

        await fetch("http://localhost:8081", {
            method: "POST",
            headers: {
                "Content-Type": "application/json"
            },
            body: JSON.stringify({
                content: contentText.value,
                to: targetLanguage.value
            })
        }).then(async(response) => {

            /* Retrieve json from the response */
            const json = await response.json();

            /* Set content */
            responseContent.value = json.content;

        }).catch(() => {

            /* Set error state */
            hasErrorOccurred.value = true;

        }).finally(() => {

            /* Update network processing state again */
            isNetworkProcessing.value = false;

        });

    }

Let's break down what happens in here.

isNetworkProcessing is set to true to indicate that a request is currently pending for the purpose of updating the user interface to reflect this.
hasErrorOccurred is set to false in case it was set to true as a result of a previous request. We're trying again, so nothing has gone wrong just yet.
A fetch call is made to http://localhost:8081, where I intend to host my Bottle server that will handle the translation request for us, with the body in the format mentioned earlier.
Once we receive a response, we retrieve the JSON content of it and set responseContent to be the response's content value.
If an error occurs, we update hasErrorOccurred to be true for the purpose of telling the user something went wrong.
In any case, regardless of state, we update isNetworkProcessing to be false again as the request has finished.

We are now sending a request to a server that does not exist yet. So, let's go ahead and build the server now!

Building the server

We have completed our client for the purpose of translating text. Now we just have to process its requests using a server. As I mentioned before, I will be using Bottle because of its simplicity. So let's create a basic Bottle server.

from bottle import run

if __name__ == "__main__":
	run(host="localhost", port=8081, debug=True)

We now have a basic Bottle server that returns 404 for all routes - because we haven't created any yet! That's how simple Bottle is.

Allowing origins (CORS)

If you have ever developed an API like this before, you will know that all requests made from the client we developed earlier will fail due to CORS. That's okay, easy fix. We'll be using the bottle_cors_plugin Python package to deal with this for us.

Our code now becomes this:

from bottle import app, run
from bottle_cors_plugin import cors_plugin


# Configure Bottle server.
app = app()
app.install(cors_plugin("*"))  # CORS - allow all origins.

if __name__ == "__main__":
	run(host="localhost", port=8081, debug=True)

We are now allowing all origins. This is incredibly insecure - if you wish to deploy this to production, you must have an environment variable using something like dotenv to control this value, but for the purposes of making a fun project that will not be deployed and will only run on our local machine - this is fine.

To give an example of a secure implementation:

app.install(cors_plugin("translator.icseon.com"))  # CORS - allow requests to only come from translator.icseon.com
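As a rough sketch of the environment variable idea, the allowed origins could be read like this. The variable name CORS_ALLOWED_ORIGINS and the comma-separated format are assumptions of mine, and I am using plain os.environ here instead of dotenv to keep the example self-contained.

```python
import os
from typing import List


def allowed_origins(default: str = "*") -> List[str]:
    """Read allowed origins from the (hypothetical) CORS_ALLOWED_ORIGINS variable.

    The value is expected to be a comma-separated list of origins.
    Falls back to `default` when the variable is missing or empty.
    """
    raw = os.environ.get("CORS_ALLOWED_ORIGINS", default)
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins or [default]


if __name__ == "__main__":
    # The result would replace the hard-coded "*" passed to cors_plugin.
    print(allowed_origins())
```

Check the plugin's documentation for whether it expects a single string or a list before passing this value along.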

Processing JSON

As established earlier, our payload is in the JSON format. Bottle can already read it: request.json is populated whenever the client sends a Content-Type of application/json, which our fetch call does. What is still missing is the other direction, as our responses are not marked as JSON yet. Adding this is relatively simple, using a hook.

Let's create this hook. Our code now looks like this:

from bottle import app, run, hook, response
from bottle_cors_plugin import cors_plugin


@hook("before_request")
def set_default_content_type() -> None:
    """
    Sets the default content type of our responses to JSON.
    :return None:
    """
    response.content_type = "application/json"


# Configure Bottle server.
app = app()
app.install(cors_plugin("*"))  # CORS - allow all origins.

if __name__ == "__main__":
	run(host="localhost", port=8081, debug=True)

Congratulations, we can now work with JSON using our Bottle server.

Creating a route for translation and implementing the AI

Now that we have a basic server setup and working, we will proceed to create a new controller to handle the translation logic. Let's start by creating a new file called translate_controller.py and creating a method called translate, which will look something like this:

from bottle import request


def translate(body: request) -> str:
	return "This is a response from the translate controller"

Let's go back to the main file and create a route for this controller and import the controller.

from bottle import app, run, hook, request, response, route
from bottle_cors_plugin import cors_plugin
from translate_controller import translate


@hook("before_request")
def set_default_content_type() -> None:
    """
    Sets the default content type of our responses to JSON.
    :return None:
    """
    response.content_type = "application/json"


@route("/", method="POST")
def translate_route() -> str:
    """
    Performs the translation by executing the translation controller.
    :return str:
    """
    # The request object must be imported from bottle for this to work.
    return translate(body=request)
    
  
# Configure Bottle server.
app = app()
app.install(cors_plugin("*"))  # CORS - allow all origins.

if __name__ == "__main__":
	run(host="localhost", port=8081, debug=True)

We have now successfully defined a route for translation requests on path /, and it only accepts POST requests - other methods will be given a 405 response.

Let's go back to the translate_controller.py file and start implementing the AI. As mentioned earlier, I will be using the Llama Python bindings for this.

from bottle import request
from llama_cpp import Llama


# Initialize the LLM.
llm = Llama(
    model_path="./models/.gguf",
    n_ctx=4096,
    chat_format=""
)


def translate(body: request) -> str:
    """
    Performs the translation by prompting the LLM to translate for us.
    :param body:
    :return str:
    """
    print(llm)

    return "This is a response from the translate controller"

We have now initialized the large language model properly. However, you may notice I am unable to provide the model_path and chat_format to you - that's because you must have your own model and each model may have a different chat_format.

Now that that's out of the way, we must give a prompt to the AI. From some experience, I know that doing it manually will be ugly, so let's create a PromptBuilder class to deal with this for us. Start by creating a new file called prompt_builder.py (for example).

import json


class PromptBuilder:

    def __init__(self, to: str, content: str) -> None:

        self.to = to
        self.content = content

        # Build request dictionary. We copy the two fields explicitly, because
        # assigning self.__dict__ here would make the dictionary contain itself
        # and json.dumps would fail with a circular reference error.
        self.request = {"to": to, "content": content}

    def build(self) -> str:
        """
        Builds the prompt that we will be passing to the LLM.
        :return str:
        """
        # Give an example for the AI to work with.
        response_example = {
            "to": None,
            "from": "",
            "content": ""
        }

        # Build, format and return the built prompt.
        return "You are a translation assistant that translates messages and responds with JSON like: {0}" \
               "You are translating the JSON string to {1} - ensure the response 'to' key is also that." \
               "Request: {2}".format(json.dumps(response_example), self.to, json.dumps(self.request))

What this will do is format prompts that look like the following:

You are a translation assistant that translates messages and responds with JSON like: {"to": null, "from": "", "content": ""}You are translating the JSON string to nl - ensure the response 'to' key is also that.Request: {"to": "nl", "content": "Hello, I am bored."}

This informs the AI of its purpose and instructs it on what to do with the request given to it.
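The prompt shown above runs its three sentences together without any separator. Some models cope with that just fine, but as a hedged variation, here is a sketch that joins the same three parts with newlines. The function name build_prompt is made up for this example and the wording of the prompt is unchanged.

```python
import json


def build_prompt(to: str, content: str) -> str:
    """Build the same prompt as above, with one instruction per line.

    Hypothetical stand-alone variant of the prompt builder, for illustration.
    """
    response_example = {"to": None, "from": "", "content": ""}
    request = {"to": to, "content": content}

    lines = [
        "You are a translation assistant that translates messages and responds with JSON like: "
        + json.dumps(response_example),
        f"You are translating the JSON string to {to} - ensure the response 'to' key is also that.",
        "Request: " + json.dumps(request),
    ]

    # json.dumps escapes newlines inside the content, so the request stays on one line.
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("nl", "Hello, I am bored."))
```

Whether this improves the translations depends entirely on the model, so treat it as something to experiment with.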

Now that we have a prompt, let's complete the implementation by importing the newly created PromptBuilder class into the translation controller.

from bottle import request
from llama_cpp import Llama
from prompt_builder import PromptBuilder
import json


# Initialize the LLM.
llm = Llama(
	model_path="./models/.gguf",
    n_ctx=4096,
    chat_format=""
)


def translate(body: request) -> str:
    """
    Performs the translation by prompting the LLM to translate for us.
    :param body:
    :return str:
    """
    # Initialize the prompt builder.
    prompt_builder = PromptBuilder(
        to=body.json.get("to"),
        content=body.json.get("content")
    )

    # Ask the LLM for an answer.
    answer = llm.create_chat_completion(
        messages=[
            {
                "role": "user",
                "content": prompt_builder.build()
            }
        ]
    )

    # The AI may respond with "Result: ", so let's account for that.
    response = answer["choices"][0]["message"]["content"]
    result = "{" + response.split("{", 1)[1] if "{" in response else response
    return result

Awesome. We have now fully implemented the LLM and created an API to expose it to the web.
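One thing the controller does not guard against is the model adding text after the JSON, or returning something that is not JSON at all. As a minimal sketch of a stricter approach, the standard library's JSONDecoder.raw_decode can parse the first JSON object in the text and ignore whatever follows. The function name extract_translation and the check for a string "content" key are assumptions on my part.

```python
import json
from typing import Optional


def extract_translation(response_text: str) -> Optional[dict]:
    """Return the first JSON object found in the model output, or None.

    Hypothetical helper: it tolerates a prefix such as "Result: " and any
    trailing chatter after the closing brace.
    """
    start = response_text.find("{")
    if start == -1:
        return None

    try:
        # raw_decode parses one JSON value and reports where it ended,
        # so trailing text does not cause a failure.
        parsed, _end = json.JSONDecoder().raw_decode(response_text[start:])
    except json.JSONDecodeError:
        return None

    if not isinstance(parsed, dict) or not isinstance(parsed.get("content"), str):
        return None

    return parsed


if __name__ == "__main__":
    print(extract_translation('Result: {"to": "nl", "from": "en", "content": "Hallo"} Enjoy!'))
```

Returning None lets the route decide what to do on failure, for example responding with an error the client can show.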

The result

Now that we have built both the client and server, let's look at what we've built and see it in action...

Pretty cool!

Thanks for reading.

You have now read how I utilized AI to power a text translation service for me in as much detail as possible.

I hope you have learnt something or otherwise found entertainment in reading about this.

— Icseon

Voice Activity Detection with WebRTC

Icseon — Mon, 17 Apr 2023 12:35:00 GMT

Note: I wrote this back when one of my core principles was to do as much as possible on the serverside. I've since altered my way of thinking and believe it would make more sense to send VAD detections from the client to the server, which then broadcasts that.

In theory this would let clients send fake states, but they're also the ones sending us audio in the first place. This is a small thing I'm willing to let happen, since VAD detection is still pretty expensive to do at scale.

Demonstrating voice activity detection

While implementing voice calls for Matokai, I needed a way to know which user is speaking. WebRTC does not handle this by default so it was time to get creative again.

What I needed to achieve

To implement voice activity detection properly, I needed to have a way to:

send data other than video and audio over RTC
access an audio track’s PCM data that we receive from a peer
identify whether a PCM frame contains speech
broadcast a packet to all connected peers to let them know who is speaking

Sending packets using an unordered/unreliable data channel

In WebRTC, there are two types of data channels: ordered and unordered. While RTC works entirely on UDP, an ordered data channel ensures that data arrives in order, like TCP would, at the cost of latency and performance.

For our use case, we’ll need an unordered data channel because it’s okay if some packets never arrive, and we can use the performance, so we can rest assured that the activity packet is relatively lined up with audio data.

Creating an unordered data channel

Start by creating an unordered data channel on the client side like so:

/* Build an unordered data channel */
const unorderedDataChannel = this.peerConnection.createDataChannel('channel', {
    ordered: false
});

Receiving data channels from clients

On the other end, we will receive this data channel during SDP negotiation. We can retrieve it as such:

/**
 * @author Icseon
 * @description This callback is invoked once a peer sends a data channel to us
 * @param channel
 */
peerConnection.ondatachannel = async({ channel }) => {
  
  /* Tell ourselves about the fact we have received a data channel from a client */
  debug(`received a data channel labeled: ${channel.label}`);
    
  /* If this is the data channel we expect, register it for our peer somehow. That's up to you. */
  if (channel.label === 'channel')
  {
    
      /* Add the data channel to our peerConnection somehow for later access */
      peerConnection.someClass.dataChannel = channel;
        
  }
  
};

Listening for packets from the server

Now we have a one-sided communication channel between the client and the server (server to client only), but we are not yet handling any packets on the client. Listen to packets from the server like this:

/* Listen for data from the server. */
unorderedDataChannel.onmessage = (event) => {
    
    /* We are going to do something with this data later on. For now, let's do a simple console.log */
    console.log(event.data);
    
};

We can now send data to the client that is not audio or video which means we can send voice activity packets later!

Accessing the audio tracks

I am assuming that you have sent your media streams over RTC before creating your connection. If not, go back and implement that first.

Receiving media tracks from clients

Let’s start simple and create a way to echo back audio data to our client. Make sure to check if we are dealing with an audio track because clients can also send video tracks. We are going to expand on this very soon:

/**
 * @author Icseon
 * @description This callback is invoked once a peer sends a track to us
 * @param event
 */
peerConnection.addEventListener('track', async({ track, streams }) => {

    /* Let's know what we have received */
    debug(`got track of kind: ${track.kind}`);
    
    /* Check to see if we got an audio track */
    if (track.kind === 'audio')
    {
    
        /* Add a transceiver to our peer connection which will transmit audio data back to our client */
        peerConnection.addTransceiver(track, {
            direction: 'sendonly',
            streams
        });
    
    }

});

WebRTC is now sending back your own audio. However, you can’t hear yourself yet. We’ll need to handle tracks on the client side as well and play back the media stream after receiving it.

Receiving media tracks from the server

That is done by listening for an audio track, exactly the same way we have just done on the server side - except we also add a new Audio element.

/**
 * @author Icseon
 * @description Process incoming tracks
 * @param RTCTrackEvent
 */
this.peerConnection.ontrack = (RTCTrackEvent) => {

    /* Are we dealing with an audio track? */
    if (RTCTrackEvent.track.kind === 'audio')
    {
    
        /* Create a new audio element and begin playing the media stream */
        const audioElement = document.createElement('audio');
        audioElement.srcObject = RTCTrackEvent.streams[0]; /* A track may belong to many streams - we only care about the first one */
        audioElement.play();
    
    }

}

After handling the ontrack event on the client side, we should be able to hear ourselves! We are not listening to our own microphone directly, rather, we are listening to our microphone through WebRTC.

Reading PCM audio data using RTCAudioSink

We are now handling audio data from client peers on the server side. However, as it stands, we do not have a way to access PCM data yet.

To begin receiving PCM audio data from a remote audio track, we are going to be using the non-standard RTCAudioSink component WebRTC provides. This component will allow us to very easily access raw PCM data from any audio track.

/* Construct a new RTCAudioSink using the audio track we have received */
const audioSink = new RTCAudioSink(track);

/* Handle audio data */
audioSink.ondata = (data) => {

    /* Read PCM data from the samples */
    const pcm = data.samples;
    
    /* This will spam your console every 10ms with raw PCM data. We now have access to PCM audio data! */
    console.log(pcm);

}

At this point, we have successfully implemented a way to receive raw PCM audio data from an RTC peer and can start to use this data to see if there is speech in it.
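Before feeding frames into a detector, it helps to know how large they should be. The arithmetic is sketched below in Python. The lists of accepted sample rates and frame durations are what the WebRTC VAD algorithm is commonly documented to accept, and I am assuming the same applies to whichever wrapper you use. The function names are made up for this example.

```python
# Assumption: the VAD accepts 10, 20 or 30 ms frames at these sample rates.
VALID_SAMPLE_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples (per channel) in a frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def is_valid_frame(sample_count: int, sample_rate_hz: int) -> bool:
    """Check whether a frame length matches one of the accepted durations."""
    if sample_rate_hz not in VALID_SAMPLE_RATES:
        return False
    return any(
        samples_per_frame(sample_rate_hz, frame_ms) == sample_count
        for frame_ms in VALID_FRAME_MS
    )


if __name__ == "__main__":
    # A 10 ms frame at 48000 Hz holds 480 samples, or 960 bytes as 16-bit PCM.
    print(samples_per_frame(48000, 10), samples_per_frame(48000, 10) * 2)
```

If the frames you receive have a different length, they would need to be split or buffered before being passed on.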

Using VAD to detect speech

Installing & Initializing VAD

We now have the ability to access raw PCM audio frames and can use this alongside VAD to detect if the audio frame contains speech. For this, we can use the @ozymandiasthegreat/vad npm package. Let’s start by constructing VAD:

/* Retrieve VAD through the VADBuilder */
const VAD = await VADBuilder();
const vad = new VAD(VADMode.VERY_AGGRESSIVE, 48000); /* WebRTC has a sample rate of 48000 */

Using VAD to detect voice activity

Right now, we have access to VAD and can start using it to detect speech in audio frames. We can do this by using the processFrame method VAD provides. Let’s go back to our RTCAudioSink and add the logic required for identifying speech.

/* Construct a new RTCAudioSink using the audio track we have received */
const audioSink = new RTCAudioSink(track);

/* Handle audio data */
audioSink.ondata = (data) => {

    /* Read PCM data from the samples */
    const pcm = data.samples;
    
    /* Determine if the PCM data contains speech */
    const vadResult = vad.processFrame(pcm);
    
    /* If the vadResult indicates we have speech, log a message to the console indicating such */
    if (vadResult === VADEvent.VOICE)
    {
        console.log('speech detected!');
    }

}

Awesome. We now have a way to detect speech from audio. We’re almost there, we only need to notify all peers that somebody is speaking.

Sending packets to all peers to notify them of voice activity

Note: To not overcomplicate this too much, we are going to be using a simple array of RTCPeerConnection instances. I’ll assume this array is named peers.

Defining the voice activity packet

In my personal opinion, a clean approach to building a packet is abstracting its structure away in a class. Let’s start by building the VoiceActivityPacket class which we are going to be sending to all peers.

export default class VoiceActivityPacket {

    /**
    * @author Icseon
    * @description VoiceActivityPacket constructor
    * @param username
    */
    constructor(username)
    {
        
        /* For easy packet identification, I am choosing to add the packet type in the constructor */
        this.packetId = 'VoiceActivity';
        
        /* We really just need to know who is speaking. That's all. */
        this.username = username;
        
    }

}

Broadcasting the voice activity packet

Now that we have defined the voice activity packet, we can start sending it to all peers and handle it. Let’s start by sending it to everyone:

/* Construct a new RTCAudioSink using the audio track we have received */
const audioSink = new RTCAudioSink(track);

/* Handle audio data */
audioSink.ondata = (data) => {

    /* Read PCM data from the samples */
    const pcm = data.samples;
    
    /* Determine if the PCM data contains speech */
    const vadResult = vad.processFrame(pcm);
    
    /* If the vadResult indicates we have speech, notify every peer */
    if (vadResult === VADEvent.VOICE)
    {
    
        /* Build the voice activity packet */
        const packet = new VoiceActivityPacket(peerConnection.someClass.username); /* You need to deal with authentication somehow, I'll assume the username is accessible like this. */
        
        /* Loop through every peer in the peers array */
        peers.forEach((peer) => {
        
            /* We can only send arrayBuffers, blobs and strings. That's why JSON.stringify() is required */
            peer.someClass.dataChannel.send(JSON.stringify(packet));
        
        });
    }

}

We are now sending the voice activity packet to all peers. Obviously, this is a very primitive way of broadcasting packets but for demonstration purposes it should suffice. All that’s left to be done is handle the packet on the client side.
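One way to make this less primitive is to stop sending a packet for every single frame and only broadcast when the speaking state changes. Below is a minimal sketch of that idea in Python, independent of any WebRTC library. The class name SpeakingStateTracker and the hangover_frames parameter are hypothetical, and this approach would also need a "stopped speaking" packet, which the packet defined in this post does not have.

```python
from typing import Optional


class SpeakingStateTracker:
    """Turns per-frame VAD results into started/stopped transitions.

    Hypothetical helper: a speaker is only considered silent after
    `hangover_frames` consecutive frames without voice, which avoids
    flickering during short pauses between words.
    """

    def __init__(self, hangover_frames: int = 30) -> None:
        self.hangover_frames = hangover_frames
        self.is_speaking = False
        self._silent_frames = 0

    def update(self, frame_has_voice: bool) -> Optional[bool]:
        """Return True when speech starts, False when it stops, else None."""
        if frame_has_voice:
            self._silent_frames = 0
            if not self.is_speaking:
                self.is_speaking = True
                return True
            return None

        if self.is_speaking:
            self._silent_frames += 1
            if self._silent_frames >= self.hangover_frames:
                self.is_speaking = False
                self._silent_frames = 0
                return False

        return None


if __name__ == "__main__":
    tracker = SpeakingStateTracker(hangover_frames=3)
    for has_voice in (True, True, False, False, False):
        print(tracker.update(has_voice))
```

Because the data channel is unreliable, a transition packet can get lost, so resending the current state every now and then would be a sensible addition.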

Handling the voice activity packet

The voice activity packet is now being sent to and received by clients. It’s time to start handling it.

/* Listen for data from the server. */
unorderedDataChannel.onmessage = (event) => {

    /* Parse JSON - the payload of the message lives in event.data */
    const data = JSON.parse(event.data);
    
    /* Determine what packet we have received */
    switch(data.packetId)
    {
        
        /* Handle voice activity packets */
        case 'VoiceActivity':
            
            /* You can handle this in any way you'd like. In this post, we are just going to log who is speaking. */
            console.log(`${data.username} is speaking!`);
            break;
    }
    
};

In this snippet, we are checking the packetId of the data we receive and handling the VoiceActivityPacket by logging the username of the speaker. You may do anything with this information like invoking a UI transition to clarify that a participant is speaking.
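Once there are more packet types, the switch statement tends to grow. A common alternative is a lookup table that maps each packetId to a handler. The sketch below shows the idea in Python. The names dispatch, PACKET_HANDLERS and handle_voice_activity are made up for this example, and it also ignores malformed messages instead of throwing.

```python
import json
from typing import Callable, Dict, Optional


def handle_voice_activity(packet: dict) -> str:
    """Hypothetical handler: describe who is speaking."""
    return f"{packet.get('username')} is speaking!"


# Maps a packetId to the function that handles it.
PACKET_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "VoiceActivity": handle_voice_activity,
}


def dispatch(raw_message: str) -> Optional[str]:
    """Parse a raw message and route it to its handler, if there is one."""
    try:
        packet = json.loads(raw_message)
    except json.JSONDecodeError:
        return None

    if not isinstance(packet, dict):
        return None

    packet_id = packet.get("packetId")
    if not isinstance(packet_id, str):
        return None

    handler = PACKET_HANDLERS.get(packet_id)
    return handler(packet) if handler else None


if __name__ == "__main__":
    print(dispatch('{"packetId": "VoiceActivity", "username": "Icseon"}'))
```

Adding a new packet type then only requires a new handler and one extra entry in the table.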

That’s a wrap!

You have now read how I deal with voice activity detection with WebRTC. I left out a lot of implementation specific details because I do not know your use case - if you’re going to use this knowledge then you should apply it in the scope of your project.

Keep in mind that SDP negotiation is required after adding a new track/transceiver to peers and that it needs to be handled through your signaling server(s) accordingly.

Thank you for reading my post, and I hope that this helps someone out. I’ll be writing more technical posts like this one in the near future as I have more to write about.

— Icseon

Some thoughts

Icseon — Fri, 21 Oct 2022 12:48:00 GMT

Up until this point, I’ve been using the traditional means of creating an application on the web.

That being the utilization of the HTTP protocol and using either Ajax or standard means to transmit data to the server.

I have been considering an alternative which would enable me to:

Transmit data in realtime, with little to no latency between your peer and the server
Have the ability to transmit data from client to client, with a relay sitting in between for secret messaging without exposing IP addresses to the peers and without exposing any data to us, except encrypted and rather useless data.
Reduce loading times by at least a hundred times from what I currently have
Follow a more traditional means of application development

Naturally, my eyes went to WebSockets. It would follow a similar strategy as the one used for T-Bot Rewritten. The thing is, I’d have to implement these things too:

A room system, allowing me to only emit packets to those within scope
A session handler, purely just for connections
A ping-pong system to ensure that we are not talking to a dead link for too long - connections will die

Why would I need to bother with any of that, if all of this has already been solved? Enter: socket.io

socket.io

All the things that I mentioned above are already solved here. Its usage would be similar to using Express, and almost everything that I do in Express can be done with socket.io as well.

In all, this is my next choice. This is the step I am taking for all my future personal projects. It would allow me to develop things that were once either incredibly hard or impossible to implement without adding another networking layer on top of what I already had, which I am naturally against. It’s either one, but never extend one to two. You don’t want to repeat yourself.

Handling of sessions

Historically, I’ve always been writing my own session handlers for full control. Really, I don’t have to. I can just utilise express-session along with connect-redis to have the exact same functionality, but override the method that is used to generate session IDs so that all sessions can easily be retrieved for users, voiding the need to even have a session store at all. This sounds like the right approach.

You might think that I’d need to use Express to use the express-session component. I understand you might think so, but that’s totally and completely wrong. I’d just have to slightly modify it so that it no longer uses cookies and that’ll be that!

Goodbye cookies

Since sessions will no longer be using cookies, I will no longer have to have any cookies in the next thing I build. This will void the need for a lot of hassle on the legal end and best of all, we would not need a cookie banner telling users that we have cookies like 99% of other platforms on planet Earth which is in my opinion, a tad annoying.

Session tokens will then be stored in LocalStorage, ready to be transmitted over the link once a connection is created to the remote endpoint. A perfect solution, I’d say.

Handling denial of service attacks

If you create something, there will always be someone who tries to destroy it through any means possible. That is true for almost every single online project in existence, and I am certain that whatever I create will not be exempt from this rule. Traditionally, I’d only need to bother with limit_req to throttle the number of requests that can be made, but since all data will now be sent over the socket, this is no longer possible.

The solution, fortunately, is rather simple. Instead of relying on limit_req, implement a rate limit for actions such as creating an account, sending a message, etc - on the application level. Sure, you’d still need to open a connection, but we can easily limit that through nginx with little to no overhead.
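To give an idea of what an application-level rate limit could look like, here is a minimal token bucket sketch in Python. This is one common technique rather than what I have settled on, and the class name TokenBucketLimiter, the action names and the limits are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client, per-action rate limiter using token buckets.

    Each (client, action) pair has a bucket holding up to `capacity`
    tokens which refills at `refill_per_second`. An action is allowed
    only when a whole token is available.
    """

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        # Maps (client_id, action) to (tokens, time of last update).
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, client_id: str, action: str) -> bool:
        now = self._clock()
        key = (client_id, action)
        tokens, last = self._buckets.get(key, (float(self.capacity), now))

        # Refill based on elapsed time, never exceeding the capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_second)

        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0

        self._buckets[key] = (tokens, now)
        return allowed


if __name__ == "__main__":
    limiter = TokenBucketLimiter(capacity=2, refill_per_second=1.0)
    print([limiter.allow("socket-1", "send_message") for _ in range(3)])
```

The injectable clock only exists to make the limiter easy to test. A real deployment would also need to evict buckets of clients that have disconnected.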

Querying the data source

Up until now I’ve been manually writing SQL queries to query for data from the database. I wonder what’s been going through my head all this time. I’m going to be using an ORM that will make my life a lot easier going forward.

Closing words

These are only a few of my ideas that I’ll be using moving forward. I am posting this online to get feedback on these ideas and maybe create a better strategy although I do not expect much to change.

Thanks for reading!

Keeping players in sync

Icseon — Sun, 15 May 2022 12:57:00 GMT

Note: This post is quite old. While a large part of this post is still true, I would approach things differently now.

The goal of multiplayer games, more often than not, is to synchronize state between all peers where, preferably, the server is authoritative.

A small introduction

As some of you may know, I am the developer of Cubash. Cubash was an online game where users could interact with each other, customise their character to their liking and make new friends.

Cubash had a game client in development, but unfortunately it never saw the light of day.

However, I recently became interested in playing around with the client again, and I started rebuilding it from the ground up.

One of the subjects I find fascinating throughout all the projects I have ever worked on is networking and that’s exactly what we’re going to be talking about today.

This post will aim to cover the bare basics and will leave out a lot and focus only on the networking side of things.

Listening for client connections

For this project, I have settled on the Godot Engine, whose high-level multiplayer API uses the ENet networking library. ENet runs on top of UDP and offers both reliable and unreliable delivery, so we get TCP-like guarantees where we need them.

Before we can start sending RPCs to peers, we need a server for clients to connect to:

extends Node

var peer:NetworkedMultiplayerENet

# Simple method to start a server using ENet
func _host(max_clients:int = 16, port:int = 22000, in_bandwidth:int = 2457600, out_bandwidth:int = 2457600):

    # We need the SceneTree singleton so we can register signals and register the network peer
    var tree = get_tree()
    
    # Register signals
    tree.connect("network_peer_connected", self, "_network_peer_connected")
    tree.connect("network_peer_disconnected", self, "_network_peer_disconnected")
    
    # Create a new instance of NetworkedMultiplayerENet so we can start listening for connections
    self.peer = NetworkedMultiplayerENet.new()
    self.peer.compression_mode = NetworkedMultiplayerENet.COMPRESS_ZLIB
    self.peer.create_server(port, max_clients, in_bandwidth, out_bandwidth)
    
    # Finally we attach the network peer to the SceneTree singleton
    tree.set_network_peer(self.peer)
    tree.set_meta("network_peer", self.peer)

# Method that is called once the script is ready
func _ready():

    # Start server
    _host()

Once our script is ready, we invoke the _host() method that will initialize a NetworkedMultiplayerENet instance that we can use for hosting our game server.

Now we have a game server ready to happily accept connections from clients!

Connecting to the server

What would a server be without a client to serve? Let’s make our client connect to our server. For that, we’ll need to create a script that handles all the networking for the client.

It will look quite similar to the script we have just made for creating a server except now instead of creating a server, we connect to (hopefully) a listening server.

extends Node

var peer:NetworkedMultiplayerENet

func _connect(address:String = "127.0.0.1", port:int = 22000):

    # Just like before, we need the SceneTree singleton for the exact same purpose
    var tree = get_tree()
    
    # Register the relevant signals
    tree.connect("connected_to_server", self, "_connected_to_server")
    tree.connect("connection_failed", self, "_connection_failed")
    tree.connect("server_disconnected", self, "_server_disconnected")
    
    # Attempt to connect to the server. Similar to before, we create a new instance of NetworkedMultiplayerENet
    self.peer = NetworkedMultiplayerENet.new()
    self.peer.compression_mode = NetworkedMultiplayerENet.COMPRESS_ZLIB
    self.peer.create_client(address, port) # Note: create_client() instead of create_server()
    
    # Just like before, we do the attaching of the network peer to our SceneTree singleton
    tree.set_network_peer(self.peer)
    tree.set_meta("network_peer", self.peer)
    
    
func _ready():

    # Connect to the server once we're ready to do so
    _connect()

A word on signals

Remember the signals we registered with tree.connect()?

They can help us make our network logic, but before we start doing that, let’s explain what they mean:

Server side signals

network_peer_connected
Invoked when a new client peer connects to our server.
We can use this signal to register network ownership of the Player node (better known as their character, the only thing they should be in control of, in terms of security).

network_peer_disconnected
Invoked when a client loses connection to our server or otherwise closes the socket. We should handle removal of client information once this is called.

Client side signals

connected_to_server
Invoked when we have successfully connected to a game server.
We could use this signal to claim ownership of our local Player and to get the game going.

connection_failed
Invoked when we couldn’t connect to a server and the attempt timed out. We should let our user know that the connection has failed.

server_disconnected
Invoked when the server disconnects you for any reason, like cheating or when the server closes. We should let the user know that the connection has closed down.

Making network code

Now that we know what every signal means and what it should do, let’s start writing some basic network code.

Creating a new player on the server side

Once our player has connected to our server, we need to create a new Player node for them. This will be their character that they can move around. Let’s go back to the script we wrote to host a server and implement that logic:

func _network_peer_connected(peer_id:int):
    
    # Create a new instance of the player Node
    var player = load("res://scenes/player.tscn").instance()
    
    # Set player name
    player.set_name(String(peer_id))
    
    # Give the client ownership of the player Node
    player.set_network_master(peer_id)
    
    # Insert the new Player Node to our game Node
    $"/root/Game/Players".add_child(player)
    
    # Let the client(s) know about this. I'll explain this very soon!
    rpc("new_player", peer_id)

Now, there’s a new Player node reserved for the client that just connected to our game.

Creating a new player on the client side

We have successfully created a new player for our client, but the client is not aware of this at all yet!

We need a way to tell all clients (including the one who just connected) about the creation of a new player. One way that I think is good is to have an RPC that sends over the client peer ID to all the peers, so that the clients can repeat what we just did.

Let’s go back to the script we made to connect to a server and implement the new_player RPC:

remotesync func new_player(peer_id:int):

    # Like on the server, we create a new instance of the player Node
    var player = load("res://scenes/player.tscn").instance()
    
    # Set player name
    player.set_name(String(peer_id))
    
    # Give network ownership to the peer we have received
    player.set_network_master(peer_id)
    
    # Finally, we add the player to the game Node
    $"/root/Game/Players".add_child(player)

That’s it! Our clients are now automatically made aware of any client that joins the server and create a character for that player, including their own.

Synchronizing player movement

Inside the player Node movement script, we have to add additional logic so we can send our direction and transform over the network and thus let all other peers (including the server) know where we are.

Why the server you ask? We want our movement to be secure, so we simulate the movement on the server side to check if the client is doing anything that’s not supposed to happen.

Clientside

I assume you have already written your player movement logic script. If not, why are you adding networking to it now?

Moving to the player movement logic script, we add additional logic after we have calculated the velocity that we would be applying to our own player to make movement happen.

We also want to add a check to the Input handler that ensures that we only control our own player Node:

onready var game = $"/root".get_node("Game")

func _physics_process(delta:float = 0):

    # This variable contains the central force vector that will be applied to the Node
    var force = Vector3()
    
    # This variable contains the direction vector that is calculated by the Input handler
    var direction = Vector3()
    
    # Check if we are in control of this player.
    # Otherwise do no calculation here.
    if is_network_master():
    
        # Handle your input here using Input.is_action_pressed() - I leave that up to you.
        direction = Vector3()
        
    
        # This assumes you are using Vector3 and add_central_force to move your player around.
        # We populate this with the direction you calculated using the Input handler.
        # Again, this is up to you and heavily depends on the game you are making.
        force = Vector3()
        
    add_central_force(force)
    
    # Send our transform and direction over the network
    if is_network_master():
    
        # We use the script that's attached to the Game node to perform the RPC call
        game.rpc_unreliable_id(1, "move", direction, get_transform())

Now the direction and transform of our player Node are sent every physics process tick. By default, that’s 60 times a second.

Serverside

The client is now sending us movement data. All we have to do is process the movement ourselves and send the result of that to all clients.

Let’s look at the script we made to start a server again and add new logic:

remote func move(direction, transform):
    
    # Obtain the RPC sender ID so we can find the relevant player.
    # In Godot 3 this method lives on the SceneTree, not on the Node itself.
    var sender_id = get_tree().get_rpc_sender_id()
    
    # Find the player node
    var player = $"/root/Game/Players".get_node(String(sender_id))
    
    # Call movement function
    player.move(direction, transform)

Let’s also implement the move method that we are calling. In the player Node script:

onready var game = $"/root".get_node("Game")

# Latest state received from the client that owns this player
var direction = Vector3()
var client_transform = Transform()

func move(new_direction:Vector3, new_transform:Transform):
    
    # Update player state with the state received from the client
    direction = new_direction
    client_transform = new_transform # explicitly calling it client_transform here
    
func _physics_process(delta:float = 0):
    
    # Like the client, we have a force and a direction vector. We just run the same calculation here.
    # I still leave that up to you to do. It's your game and I do not know what you're making. :)
    var force = Vector3()
    
    add_central_force(force)
    
    # Send the transform that we, as server have calculated to all the clients.
    # The client's move RPC expects the peer ID first, which is the name we gave this Node.
    game.rpc_unreliable("move", int(get_name()), get_transform())

You may have noticed I am not using the client_transform variable here.

That’s because you may choose to implement a way to allow slight differences between the client and server transform (there will be differences, no getting around that!) - but that is out of the scope of this post.
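
One possible way to allow such slight differences is sketched below in plain Python rather than GDScript, so it can be run outside the engine. The idea: accept the client’s position while it stays within a threshold of the position the server simulated, and fall back to the server’s position otherwise. The function name `reconcile`, the `tolerance` value and the use of plain tuples for positions are assumptions made for this illustration only.

```python
import math

def reconcile(server_pos, client_pos, tolerance: float = 0.5):
    """Return the position the server should treat as authoritative."""
    # Small drift is expected (latency, float error), so we let it slide
    if math.dist(server_pos, client_pos) <= tolerance:
        return client_pos
    # Too far off: the client is lagging badly or cheating, use our own result
    return server_pos

# The client is 0.3 units off, which is within tolerance
accepted = reconcile((0.0, 0.0, 0.0), (0.3, 0.0, 0.0))

# The client claims to be 5 units away from where the server has it
rejected = reconcile((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

The right tolerance depends heavily on movement speed and tick rate, so treat the default here as a placeholder.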

The server is now sending the transform of the player!

Processing server calculated transforms

We are almost finished. Let’s go back to the script we created to connect to servers and listen for the move RPC call:

remote func move(peer_id:int, transform:Transform):
    
    # Get player node
    var player = $"/root/Game/Players".get_node(String(peer_id))
    player.move(transform)

Lastly, we handle the transform data through the move method inside our client player Node script:

func move(transform:Transform):

    # We can ignore the packet if we are the network owner of the player.
    # We already know our own state!
    if is_network_master():
        return
    
    # Set serversided calculated transform. You may want to use Tweening to make it look smoother.
    set_transform(transform)
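
The smoothing mentioned in the comment above can be as simple as moving a fraction of the way towards the server’s position on every tick instead of snapping to it. Here is a small sketch of that idea in plain Python rather than GDScript; `lerp`, `smooth_towards` and the factor `t` are illustrative names and values, not part of the Godot API used in this post.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b by factor t (0..1)."""
    return a + (b - a) * t

def smooth_towards(current, target, t: float = 0.2):
    """Move each component a fraction t of the way to the target."""
    return tuple(lerp(c, g, t) for c, g in zip(current, target))

# Simulate three ticks of closing half the remaining gap each time
position = (0.0, 0.0, 0.0)
server_position = (10.0, 0.0, -5.0)
for _ in range(3):
    position = smooth_towards(position, server_position, 0.5)
```

A smaller `t` looks smoother but lags further behind the server, so it is a trade-off you tune by feel.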

You made it!

In this post, we have successfully:

Created a game server and game client
Handled adding of new players
Handled the movement synchronization of player Nodes

Thanks for reading. I hope you have learnt something useful.

Why I moved on from PHP

Icseon — Tue, 01 Dec 2020 13:08:00 GMT

Note: At the time of writing this warning, this post is over 2 years old. I no longer use ExpressJS. I still stand by much of this post. Without expensive infrastructure, PHP is sub-optimal for any large scale web applications.

Do not get me wrong, I love PHP for the most part. I have been using it for years and I have to thank PHP for where I am right now. It has built opportunities for me that I otherwise possibly wouldn’t have gotten.

Unfortunately, it has become clear that I need to make some changes. Up until now, I have been using the Phalcon framework to develop my web projects. However, they have announced that, starting with PHP 8.0, they are switching to “native” PHP, which is just a fancy way of saying that they’re going to be a framework based solely on PHP… just like Laravel and Symfony. The only reason I was using Phalcon was that it’s really an extension you can just install and get amazing performance because of that. Plus, it ships with few built-in implementations of things like authentication, which allowed me to do that myself… which is exactly what I was looking for!

I’m afraid that will no longer be a thing. Also, the development of that framework in particular looks really unpromising, as their team is small. I do not want to take the risk of using Phalcon for all of my future projects, only to find I am using an unmaintained framework.

So I went looking for alternatives. There are no real alternatives, outside of just going ahead and writing my own PHP framework, which is what I did up until switching to Phalcon. The reason I’m not using something like Laravel? Its performance is terrible (unless you cache everything, but does that really solve the problem?) and it’s an extremely abstract framework. I want to implement things myself. I don’t want things to “just work”, without knowing the underlying technology. I don’t want to rely on someone else to provide security for my projects. That’s something I want to have in my own hands.

Having realized at this point that most PHP frameworks just aren’t for me, not because I can’t use them, but because they don’t allow much freedom, I considered writing my own PHP framework again. But do I really want to deal with that? It’s almost like I am contradicting myself… I want more control but when I have the option, I don’t want to deal with it… but hear me out.

Writing my own PHP framework would require me to implement routing and a controller & view + model system myself and I would’ve done that, had I not been introduced to the Phalcon framework that handled those things amazingly. I came to realize that I don’t want to deal with those things at all.

So to be clear, yes, I want more control over how my application functions, but things like routing should be trivial and easy to go about. I shouldn’t have to write my own framework that’s maintained by a large group of people: me, myself and I. Furthermore, it wouldn’t make much sense. My framework would be in “native” PHP, which is exactly the same route Phalcon is taking. I might as well just use Phalcon again at that rate.

So. I needed an alternative. An alternative to the thing we call PHP itself. I do not want to give up my control for abstraction and less performance. I do not want to rely on too much implementation by other people, as in, I want to implement some sensitive things myself to ensure its security and performance.

That’s where it hit me. I remembered using Express.js at one point. Even its slogan is very promising. It is an unopinionated framework, just like Phalcon is. You do a lot of things yourself, and that’s exactly what I am looking for! Plus, it has basic things like routing and view rendering already implemented right out of the box! Plus, it’s NodeJS. I’ve wanted to move on for a while, and seeing how performant the vast majority of the applications based on Express are, I’ve determined that it might be the right choice moving forward.

After tinkering around with it for a little bit, I have come to the conclusion that it is indeed what I am looking for. No more setbacks from PHP being PHP: it is not a compiled language in itself unless you use OPcache, which feels hacky, and it’s served through FastCGI, which cannot be good for performance. I am aware that PHP 8 ships with JIT and all those “magical” workarounds… but they’re not really solving the issue itself, just patching it like you would with a band-aid and toilet paper. NodeJS is not compiled the way things like C++ are; its code is compiled once when you run it and stays loaded, unlike PHP, which tends to go through all of that again on every single request, which hurts performance terribly. Not to mention all of those frameworks that sit on top of that like Luigi’s spaghetti…

So ExpressJS it is! None of my future projects, if I have the choice, will be using PHP. It’s time to move on.