Author Topic: TCP packet corruption in non-blocking mode  (Read 4919 times)

binary1248

TCP packet corruption in non-blocking mode
« on: June 06, 2013, 09:17:20 pm »
Hi,

from these 2 threads
http://en.sfml-dev.org/forums/index.php?topic=11072.0
http://en.sfml-dev.org/forums/index.php?topic=11235.0
which have been ongoing for about the last 2 months, I investigated what could be the cause of Jungletoe's issues. At first I thought it might be connection related, but I ruled that out after checking the packet captures. It also wasn't his code, which was too simple to hide any subtle errors. After a bit of testing, Jungletoe determined that the errors only occurred when transferring data in fairly large blocks (I believe > 2000 bytes) very rapidly, which led me to check the buffering and unbuffering of data over a TCP connection using SFML. Normal byte-based data transfers were unaffected; packet-based transfers, however, experienced data corruption if they could not be queued for sending in non-blocking mode. I checked the SFML TcpSocket source and found the cause of the error.

In high congestion scenarios the receive buffer of the remote host is full, and the send buffer of the local host might be full or only partially full. If a packet is too large to fit into the available buffer space, it won't be queued. However, SFML sends a packet over a TCP connection by first sending the size of the packet in a separate send() call. Because the size is generally only 4 bytes, chances are it will fit into the available space, so the first send() call will not return an error or NotReady in non-blocking mode. When SFML then tries to send the data, that call fails with NotReady, and SFML forwards the return value back to the user. The user thus has no way to determine whether the size was sent successfully or not. Most users will then retry sending the packet. At the receiving side, however, the packet size that was sent first is already regarded as the beginning of a new packet, so all the following data is shifted by 4 bytes, resulting in packet data corruption.
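
To make the 4-byte shift concrete, here is a minimal sketch that simulates the situation without SFML or a real network. FakeSocket, sendPacketTwoCalls, readSize and bytesOnWireAfterRetry are hypothetical names made up for this illustration, and the all-or-nothing send() is a simplifying assumption. This is not SFML's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a non-blocking socket; not part of SFML.
struct FakeSocket {
    std::vector<std::uint8_t> wire; // every byte that was accepted, in order
    std::size_t freeSpace = 0;      // room left in the simulated send buffer

    // Assumption: a chunk is either queued whole or refused ("NotReady").
    bool send(const std::uint8_t* data, std::size_t size) {
        if (size > freeSpace) return false;
        wire.insert(wire.end(), data, data + size);
        freeSpace -= size;
        return true;
    }
};

// Sends the 4-byte big-endian size and the payload in two separate calls.
bool sendPacketTwoCalls(FakeSocket& socket, const std::vector<std::uint8_t>& payload) {
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    const std::uint8_t header[4] = {
        static_cast<std::uint8_t>(size >> 24), static_cast<std::uint8_t>(size >> 16),
        static_cast<std::uint8_t>(size >> 8),  static_cast<std::uint8_t>(size)
    };
    if (!socket.send(header, sizeof(header))) return false;
    // This call can fail even though the header above is already queued.
    return socket.send(payload.data(), payload.size());
}

// Reads a 4-byte big-endian size at the given offset, as a receiver would.
std::uint32_t readSize(const std::vector<std::uint8_t>& wire, std::size_t offset) {
    return (static_cast<std::uint32_t>(wire[offset]) << 24) |
           (static_cast<std::uint32_t>(wire[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(wire[offset + 2]) << 8) |
            static_cast<std::uint32_t>(wire[offset + 3]);
}

// One failed attempt followed by a retry; returns the bytes that went out.
std::size_t bytesOnWireAfterRetry() {
    FakeSocket socket;
    socket.freeSpace = 8;                              // header fits, payload does not
    const std::vector<std::uint8_t> payload(16, 0xAB);
    const bool first = sendPacketTwoCalls(socket, payload);  // fails, header stays queued
    socket.freeSpace = 64;                             // congestion has cleared
    const bool second = sendPacketTwoCalls(socket, payload); // retry sends header again
    assert(!first);
    assert(second);
    (void)first; (void)second;
    return socket.wire.size();                         // 4 + 4 + 16 bytes
}
```

In this simulation the retry leaves 24 bytes on the wire instead of 20. A receiver that reads the first size (16) takes the second size header plus 12 payload bytes as the packet, and then interprets the last 4 payload bytes as the size of the next packet.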

Here are examples that reproduce the problem. Because this is a synthetic scenario, the packet size might have to be changed to make sure the buffer fills up fast enough.

Sender:
#include <iostream>
#include <string>
#include <SFML/Network.hpp>

int main() {
        sf::TcpSocket socket;
        socket.connect( "12.34.56.78", 4422 );
        socket.setBlocking( false );

        sf::Uint32 counter = 1;
        sf::Clock clock;
        sf::Socket::Status status = sf::Socket::Done;
        sf::Packet packet;

        while( true ) {
                if( status != sf::Socket::NotReady ) {
                        packet.clear();

                        for( int i = 0; i < 4096; i++ ) {
                                packet << counter;
                                counter += 2;
                        }
                }

                status = socket.send( packet );

                switch( status ) {
                        case sf::Socket::Error:
                                std::cout << "Error\n";
                                return 1;
                                break;
                        case sf::Socket::Disconnected:
                                std::cout << "Disconnected\n";
                                return 1;
                                break;
                        case sf::Socket::NotReady:
                                std::cout << "NotReady\n";
                                break;
                        default:
                                break;
                }

                if( clock.getElapsedTime() > sf::seconds( 5 ) ) {
                        return 0;
                }
        }

        return 0;
}
 

Receiver:
#include <cstring>
#include <iostream>
#include <SFML/Network.hpp>

int main() {
        sf::TcpListener listener;
        listener.listen( 4422 );
       
        sf::TcpSocket socket;
        listener.accept( socket );
       
        sf::sleep( sf::seconds( 10 ) );
       
        sf::Packet packet;
        sf::Uint32 counter = 1;
       
        while( socket.receive( packet ) == sf::Socket::Done ) {
                std::cout << "Dequeued packet of size: " << packet.getDataSize() << "\n";
               
                for( int i = 0; i < 4096; i++ ) {
                        sf::Uint32 value = 0;
                        packet >> value;
                        std::cout << value << "\n";
                       
                        if( value == counter ) {
                                counter += 2;
                        } else {
                                std::cout << "Packet data corruption!\n";
                        }
                }
        }
       
        return 0;
}
 

What you would get from the receiver output is something like:
...
8189
8191
Dequeued packet of size: 16384
16384
Packet data corruption!
8193
8195
...
16379
16381
Dequeued packet of size: 16383
16384
Packet data corruption!
16384
Packet data corruption!
16385
...

The fix I submitted as a pull request concatenates the packet size and packet data into a single block for sending. That way, if the whole block containing size and data cannot be sent atomically, any future retries will be handled properly at the receiver because no stray data is sent.

https://github.com/SFML/SFML/pull/402
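
The idea behind the fix can be sketched in the same kind of simulation. This is not the code from the pull request. FakeSocket, sendPacketSingleBlock and wireSizeAfterRetry are hypothetical names for illustration, and the all-or-nothing send() is again an assumption. A real non-blocking socket may accept only part of a block, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a non-blocking socket; not part of SFML.
struct FakeSocket {
    std::vector<std::uint8_t> wire; // every byte that was accepted, in order
    std::size_t freeSpace = 0;      // room left in the simulated send buffer

    // Assumption: a chunk is either queued whole or refused ("NotReady").
    bool send(const std::uint8_t* data, std::size_t size) {
        if (size > freeSpace) return false;
        wire.insert(wire.end(), data, data + size);
        freeSpace -= size;
        return true;
    }
};

// Builds [4-byte big-endian size][payload] and hands it over in one call,
// so a refused send leaves nothing behind on the wire.
bool sendPacketSingleBlock(FakeSocket& socket, const std::vector<std::uint8_t>& payload) {
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> block;
    block.reserve(sizeof(size) + payload.size());
    block.push_back(static_cast<std::uint8_t>(size >> 24));
    block.push_back(static_cast<std::uint8_t>(size >> 16));
    block.push_back(static_cast<std::uint8_t>(size >> 8));
    block.push_back(static_cast<std::uint8_t>(size));
    block.insert(block.end(), payload.begin(), payload.end());
    return socket.send(block.data(), block.size());
}

// One failed attempt followed by a retry; returns the bytes that went out.
std::size_t wireSizeAfterRetry() {
    FakeSocket socket;
    socket.freeSpace = 8;                                // too small for size + payload
    const std::vector<std::uint8_t> payload(16, 0xAB);
    const bool first = sendPacketSingleBlock(socket, payload);  // refused, nothing queued
    socket.freeSpace = 64;                               // congestion has cleared
    const bool second = sendPacketSingleBlock(socket, payload); // whole block goes out once
    assert(!first);
    assert(second);
    (void)first; (void)second;
    return socket.wire.size();                           // 4 + 16 bytes
}
```

Under these assumptions the failed attempt queues no bytes at all, so the retry puts exactly one size header in front of the payload and the receiver stays in sync.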
SFGUI # SFNUL # GLS # Wyrm <- Why do I waste my time on such a useless project? Because I am awesome (first meaning).

Laurent

Re: TCP packet corruption in non-blocking mode
« Reply #1 on: June 06, 2013, 10:17:56 pm »
When you mentioned a bug in SFML, I was sure it was related to the two calls that are done in send(sf::Packet) ;D but of course I didn't know why.

Thank you so much for these two months of investigation with Jungletoe, this is something that I wouldn't have been able to do.
Laurent Gomila - SFML developer

Jungletoe

Re: TCP packet corruption in non-blocking mode
« Reply #2 on: June 07, 2013, 12:54:22 am »
Thank you very much for investigating this. I'm happy we are finally able to move towards a fix after these months! Binary, I don't have money to repay you after what you've done, but if there's ever anything I can provide to you (doubtful), I am forever in your servitude :)