
Author Topic: Download text-file  (Read 24018 times)


dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« on: March 22, 2008, 12:45:11 pm »
Hi, I'm a beginner when it comes to network programming so I have a question.

Is it possible to use SFML's network part to download a text-file from a website, for example: www.mywebsite.com/mytextfile.txt and put it somewhere on the hard drive. If so, how would this be done?

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
    • View Profile
    • SFML's website
    • Email
Download text-file
« Reply #1 on: March 22, 2008, 12:55:23 pm »
You have to send an HTTP request, and then extract the content of the page from what you receive.

You can take a look at sf::IPAddress::GetPublicAddress() to see how to send and receive an HTTP request.
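
To illustrate the "extract the content" part, here is a minimal sketch that uses only the standard library. It assumes the whole response has already been collected into one std::string. HttpResponse, SplitResponse and SaveBody are hypothetical names made up for this example, not part of SFML.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical helper type (not part of SFML): the pieces of a raw HTTP response.
struct HttpResponse
{
    int         status; // e.g. 200 or 404; 0 if the status line could not be parsed
    std::string head;   // status line and header fields
    std::string body;   // everything after the empty line that ends the headers
};

HttpResponse SplitResponse(const std::string& raw)
{
    HttpResponse result;
    result.status = 0;

    // The headers end at the first empty line, i.e. at "\r\n\r\n".
    const std::string::size_type headEnd = raw.find("\r\n\r\n");
    if (headEnd == std::string::npos)
    {
        result.head = raw; // incomplete response: no body yet
    }
    else
    {
        result.head = raw.substr(0, headEnd);
        result.body = raw.substr(headEnd + 4);
    }

    // The status line looks like "HTTP/1.1 200 OK": the code follows the first space.
    const std::string::size_type space = result.head.find(' ');
    if (space != std::string::npos)
        result.status = std::atoi(result.head.c_str() + space + 1);

    return result;
}

// Writes the body to a file, but only if the server answered 200.
bool SaveBody(const HttpResponse& response, const std::string& filename)
{
    if (response.status != 200)
        return false;

    std::ofstream file(filename.c_str(), std::ios::binary);
    if (!file)
        return false;

    file.write(response.body.data(), static_cast<std::streamsize>(response.body.size()));
    return file.good();
}
```

Usage would be something like SaveBody(SplitResponse(everythingReceived), "mytextfile.txt"), where everythingReceived is whatever you read from the socket.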
Laurent Gomila - SFML developer

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #2 on: March 22, 2008, 02:51:28 pm »
Ok thanks, I'm gonna check it out.

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #3 on: March 22, 2008, 09:31:04 pm »
As expected I couldn't get it to work  :)

This is my code:
Code: [Select]
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <SFML/Network.hpp>

int main()
{
    // Create a raw TCP socket
    sf::SocketHelper::SocketType sock = socket(PF_INET, SOCK_STREAM, 0);
    if(sock == sf::SocketHelper::InvalidSocket())
        return EXIT_FAILURE;

    // Resolve the host name and fill in the server address (port 80 = HTTP)
    sf::IPAddress server("www.dabostudios.net");
    sockaddr_in sockAddr;
    memset(sockAddr.sin_zero, 0, sizeof(sockAddr.sin_zero));
    sockAddr.sin_addr.s_addr = inet_addr(server.ToString().c_str());
    sockAddr.sin_family = AF_INET;
    sockAddr.sin_port = htons(80);

    if(connect(sock, (sockaddr*) &sockAddr, sizeof(sockAddr)) == -1)
    {
        sf::SocketHelper::Close(sock);
        return EXIT_FAILURE;
    }

    const char request[] = "GET /text.txt HTTP/1.0\r\n"
                           "From: camembert@fromage.com\r\n"
                           "User-Agent: SFML/1.0\r\n"
                           "\r\n";

    // sizeof(request) - 1: do not send the terminating '\0'
    if(send(sock, request, sizeof(request) - 1, 0) <= 0)
    {
        sf::SocketHelper::Close(sock);
        return EXIT_FAILURE;
    }

    // Collect the response in a string so that it can grow past 1024 bytes
    std::string response;
    int received;

    do
    {
        char buffer[1024];
        received = recv(sock, buffer, sizeof(buffer), 0);

        if(received > 0)
        {
            response.append(buffer, received);
        }
        else if(received < 0)
        {
            sf::SocketHelper::Close(sock);
            return EXIT_FAILURE;
        }
    }
    while(received > 0); // recv returns 0 once the server has closed the connection

    sf::SocketHelper::Close(sock);

    std::cout << response << std::endl;

    system("pause>nul"); // Windows only: wait for a key press
    return EXIT_SUCCESS;
}


This is the output:
Code: [Select]
HTTP/1.1 404 Not Found
Date: Sat, 22 Mar 2008 20:24:15 GMT
Server: Apache
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /text.txt was not found on this server.<P>
</BODY></HTML>


If I use HTTP/1.1 I get this output:
Code: [Select]
HTTP/1.1 400 Bad Request
Date: Sat, 22 Mar 2008 20:29:21 GMT
Server: Apache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

12f
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>400 Bad Request</TITLE>
</HEAD><BODY>
<H1>Bad Request</H1>
Your browser sent a request that this server could not understand.<P>
client sent HTTP/1.1 request without hostname (see RFC2616 section 14.23): /text
.txt<P>
</BODY></HTML>

0

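A side note on the 12f and 0 lines in the HTTP/1.1 output: they are not part of the page. With "Transfer-Encoding: chunked" the body arrives in chunks, each preceded by its size in hexadecimal, and a chunk of size 0 ends the body. Below is a rough sketch of decoding such a body. DecodeChunked is a hypothetical helper made up for this example, it expects only the body (everything after the headers), and it ignores trailers and other details of the RFC.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes a "Transfer-Encoding: chunked" body.
// Returns false if the data is truncated or malformed.
bool DecodeChunked(const std::string& encoded, std::string& decoded)
{
    decoded.clear();
    std::string::size_type pos = 0;

    for (;;)
    {
        // Each chunk starts with a line holding its size in hexadecimal.
        const std::string::size_type lineEnd = encoded.find("\r\n", pos);
        if (lineEnd == std::string::npos)
            return false;

        const std::string sizeLine = encoded.substr(pos, lineEnd - pos);
        char* parsedEnd = 0;
        const unsigned long size = std::strtoul(sizeLine.c_str(), &parsedEnd, 16);
        if (parsedEnd == sizeLine.c_str())
            return false; // the line did not start with a hexadecimal number

        pos = lineEnd + 2;

        // A chunk of size 0 marks the end of the body.
        if (size == 0)
            return true;

        // The chunk data must be complete and followed by "\r\n".
        const std::string::size_type remaining = encoded.size() - pos;
        if (size > remaining || remaining - size < 2)
            return false;
        if (encoded.compare(pos + size, 2, "\r\n") != 0)
            return false;

        decoded.append(encoded, pos, size);
        pos += size + 2;
    }
}
```

In the output above, 12f is hexadecimal for 303, so the error page is 303 bytes long, and the final 0 is the terminating chunk.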

If I do an HTTP request from this website it works:

Btw, what should I set From & User-Agent to in the request?

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
    • View Profile
    • SFML's website
    • Email
Download text-file
« Reply #4 on: March 23, 2008, 06:07:28 am »
You're not supposed to use raw sockets like SFML does internally; just use sf::SocketTCP. I told you to have a look at the internal code only to see what to send and how to receive it :)

The From and User-Agent fields are useless here; they're usually used to identify the browser or website making the request.
Laurent Gomila - SFML developer

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #5 on: March 23, 2008, 12:40:15 pm »
Ok, I have a new non-working code:

Code: [Select]
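#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <SFML/Network.hpp>

int main()
{
    sf::SocketTCP sock;

    sf::IPAddress server("www.dabostudios.net");

    // Port 80 = HTTP
    if(!sock.Connect(80, server))
        return EXIT_FAILURE;

    const char request[] = "GET /text.txt HTTP/1.0\r\n"
                           "From: camembert@fromage.com\r\n"
                           "User-Agent: SFML/1.0\r\n"
                           "\r\n";

    // sizeof(request) - 1: do not send the terminating '\0'
    if(sock.Send(request, sizeof(request) - 1) != sf::Socket::Done)
        return EXIT_FAILURE;

    char buffer[1024];
    std::size_t received;

    // Leave one byte free for the terminator added below
    if(sock.Receive(buffer, sizeof(buffer) - 1, received) != sf::Socket::Done)
        return EXIT_FAILURE;

    // Receive does not null-terminate the data, so do it before printing
    buffer[received] = '\0';

    sock.Close();

    std::cout << buffer << std::endl;

    system("pause>nul"); // Windows only: wait for a key press
    return EXIT_SUCCESS;
}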


EDIT: I found an error in the code and fixed it. So now I'm back to the first problem again. 404 Not found  :(

When I use this request line: "GET / HTTP/1.0\r\n" it works, but if I want to request another file, like "GET /text.html HTTP/1.0\r\n", I get nothing.

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #6 on: March 23, 2008, 10:49:41 pm »
Using:
Code: [Select]
const char request[] = "GET / HTTP/1.0\r\n"
"\r\n";

 I get the following:

Code: [Select]
HTTP/1.1 200 OK

Date: Sun, 23 Mar 2008 21:40:01 GMT

Server: Apache

Connection: close

Content-Type: text/html



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
 <HEAD>
  <TITLE>Index of /</TITLE>
 <META NAME="generator", CONTENT="mod_autoindex"> </HEAD>
 <BODY bgcolor="#ffffff" text="#000000">

<TABLE><TR><TD bgcolor="#ffffff" class="title">
<FONT size="+3" face="Helvetica,Arial,sans-serif">
<B>Index of /</B></FONT>

</TD></TR></TABLE><PRE><IMG border="0" src="/icons/blank.gif" ALT="     "> <A HREF="?N=D">Name</A>                    <A HREF="?M=A">Last modified</A>       <A HREF="?S=A">Size</A>  <A HREF="?D=A">Description</A>
<HR noshade align="left" width="80%">
<IMG border="0" src="/icons/back.gif" ALT="[DIR]"> <A HREF="/">Parent Directory</A>        02-Nov-2004 10:22      -  
<IMG border="0" src="/icons/text.gif" ALT="[TXT]"> <A HREF="502-iloapp.html">502-iloapp.html</A>         17-Dec-2007 18:14     1k  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="apache2-default/">apache2-default/</A>        24-Jan-2008 11:46      -  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="default/">default/</A>                20-Sep-2006 16:34      -  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="external-sla/">external-sla/</A>           18-Dec-2007 10:48      -  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="iloapp/">iloapp/</A>                 12-Sep-2006 11:39      -  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="monitor/">monitor/</A>                18-Dec-2007 10:48      -  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="one.com-external/">one.com-external/</A>       15-Sep-2006 02:47      -  
<IMG border="0" src="/icons/folder.gif" ALT="[DIR]"> <A HREF="phpmyadmin/">phpmyadmin/</A>             22-Jan-2008 15:01      -  
</PRE><HR noshade align="left" width="80%">
</BODY></HTML>


...and I have no idea where it comes from; I know of no file that looks like that. Which file is supposed to be sent back to me for the request above? It sure isn't the index page.

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #7 on: March 23, 2008, 11:18:16 pm »
I have finally solved it :D. This is the solution:

Instead of using this request:
Code: [Select]
const char request[] = "GET /text.txt HTTP/1.1\r\n"
"\r\n";

I used this:
Code: [Select]
const char request[] = "GET /text.txt HTTP/1.1\r\n"
"Host: www.dabostudios.net\r\n"
"\r\n";

I don't know why that Host line made the difference, I'm just happy it works.

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #8 on: March 24, 2008, 12:21:46 am »
I have one more question: why is there such a big difference in how much you receive when you request a page? Sometimes I get more than 10000 bytes in the buffer and sometimes it's only a couple of thousand (measured with strlen()). This means I sometimes don't get the data I want.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
    • View Profile
    • SFML's website
    • Email
Download text-file
« Reply #9 on: March 24, 2008, 02:26:18 am »
You must use a loop to make sure you receive all the data ;)
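
A minimal sketch of such a loop, written against a generic callable so that it does not depend on a particular SFML version. ReceiveAll and the ReceiveFn parameter are hypothetical names made up for this example. It assumes the server closes the connection once the whole response has been sent, as in the "Connection: close" responses shown earlier in the thread.

```cpp
#include <cstddef>
#include <string>

// ReceiveFn is any callable of the form
//     bool receive(char* data, std::size_t capacity, std::size_t& received)
// that returns false once the connection has been closed or has failed.
template <typename ReceiveFn>
std::string ReceiveAll(ReceiveFn receive)
{
    std::string response;
    char buffer[1024];
    std::size_t received = 0;

    // Each call only returns what has arrived so far, so keep asking until
    // there is nothing left. Use the reported byte count, not strlen():
    // the buffer is not null-terminated.
    while (receive(buffer, sizeof(buffer), received) && received > 0)
        response.append(buffer, received);

    return response;
}
```

With the code from Reply #5, the callable would wrap sock.Receive(data, capacity, received) and return true while the result is sf::Socket::Done.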
Laurent Gomila - SFML developer

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #10 on: March 24, 2008, 04:12:51 am »
Quote from: "Laurent"
You must use a loop to make sure you receive all the data ;)
I thought that was done "under the hood". But now I know.

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #11 on: August 03, 2008, 07:21:02 pm »
Do you need to open a new connection for every HTTP request, or do I have to wait a while before making a new request? I tried making two requests one after the other, but only the first one succeeded. When I close the connection in between and reconnect for the next request, it works. The most likely reason, though, is that I'm doing something wrong  :D

elisee

  • Full Member
  • ***
  • Posts: 108
    • View Profile
Download text-file
« Reply #12 on: August 03, 2008, 07:36:57 pm »
You might want to take a look at HTTP 1.1 RFC over here : http://www.w3.org/Protocols/rfc2616/rfc2616.html.

Anyway, according to this document http://www.io.com/~maus/HttpKeepAlive.html : HTTP was originally a request-response protocol, with a socket by request.

With HTTP 1.0, you can ask the server to keep the connection alive (though it's a non-official addition).
With HTTP 1.1, this is the default, but you should not rely upon the server keeping it alive (your socket can be broken at any time).

I guess you'll have to take some time to read the documentation and find the best behaviour on your own.
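
As a starting point, here is a rough sketch of that rule applied to a response. ServerKeepsConnectionOpen and ToLower are hypothetical helpers made up for this example. The header matching is deliberately simplistic: it assumes the header is written as "Connection: value" with a single space after the colon, as in the Apache output shown earlier in the thread.

```cpp
#include <cctype>
#include <string>

// Returns a lowercase copy, because header names and values may use any case.
std::string ToLower(std::string text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i)
        text[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(text[i])));
    return text;
}

// Hypothetical helper: given the response head (status line plus header fields),
// tells whether the server intends to keep the connection open.
bool ServerKeepsConnectionOpen(const std::string& head)
{
    const std::string lower = ToLower(head);

    // An explicit Connection header decides.
    if (lower.find("\r\nconnection: close") != std::string::npos)
        return false;
    if (lower.find("\r\nconnection: keep-alive") != std::string::npos)
        return true;

    // Without one, HTTP/1.1 keeps the connection open and HTTP/1.0 closes it.
    return lower.compare(0, 8, "http/1.1") == 0;
}
```

Note that when the connection stays open you cannot read "until the server closes" to find the end of the response; the body length has to come from Content-Length or from the chunked encoding. Sending "Connection: close" with every request and reconnecting each time is the simple alternative.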

dabo

  • Sr. Member
  • ****
  • Posts: 260
    • View Profile
    • http://www.dabostudios.net
Download text-file
« Reply #13 on: August 03, 2008, 08:20:00 pm »
Ok Thanks.

Daazku

  • Hero Member
  • *****
  • Posts: 896
    • View Profile
Download text-file
« Reply #14 on: August 03, 2008, 08:22:39 pm »
Quote from: "dabo"
I have finally solved it :D. This is the solution:

Instead of using this request:
Code: [Select]
const char request[] = "GET /text.txt HTTP/1.1\r\n"
"\r\n";

I used this:
Code: [Select]
const char request[] = "GET /text.txt HTTP/1.1\r\n"
"Host: www.dabostudios.net\r\n"
"\r\n";

I don't know why that Host line made the difference, I'm just happy it works.


You always need to specify the host name, because the same IP can host, say, 4395 sites, so the server needs to know which one you want to see.

Example: SFML is on the IP 213.186.33.2, but if you paste it into your browser you will see the OVH webmail page. If you add the Host field to your HTTP request header with the name sfml-dev.org, you will see SFML. (Firefox has a lot of add-ons to do that, like Tamper Data, Modify Headers, or Live HTTP Headers, etc.)
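
A small sketch of building such a request so that the Host field is never forgotten. BuildGetRequest is a hypothetical helper made up for this example, and the "Connection: close" line is an addition that asks the server to close the connection after answering, which lets the client read until the end of the stream.

```cpp
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 GET request.
// host is the site name (e.g. "www.dabostudios.net"), path the file (e.g. "/text.txt").
std::string BuildGetRequest(const std::string& host, const std::string& path)
{
    std::string request;
    request += "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";    // mandatory in HTTP/1.1
    request += "Connection: close\r\n";     // ask the server to close when done
    request += "\r\n";                      // empty line ends the headers
    return request;
}
```

When sending it, pass request.c_str() with request.size() as the length, so that no terminating '\0' goes out on the wire.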
Remember to add the tag [Solved] once you have found the answer to your question.