|       _   _ ____  _ |
|   ___| | | |  _ \| | |
|  / __| | | | |_) | | |
| | (__| |_| |  _ <| |___ |
|  \___|\___/|_| \_\_____| |
| |
| INTERNALS |
| |
| The project is split in two: the library and the client. The client part uses |
| the library, but the library is designed to allow other applications to use |
| it. |
| |
| The largest amount of code and complexity is in the library part. |
| |
| GIT |
| === |
| All changes to the sources are committed to the git repository as soon as |
| they're somewhat verified to work. Changes shall be committed as independently |
| as possible so that individual changes can be spotted and tracked more easily |
| afterwards. |
| |
| Tagging shall be used extensively, and by the time we release new archives we |
| should tag the sources with a name similar to the released version number. |
| |
| Portability |
| =========== |
| |
| We write curl and libcurl to compile with C89 compilers on machines that are |
| 32-bit or larger. Most of libcurl assumes more or less POSIX compliance, but |
| that's not a requirement. |
| |
| We write libcurl to build and work with lots of third party tools, and we |
| want it to remain functional and buildable with these and later versions |
| (older versions may still work, but that is not what we work hard to maintain): |
| |
| OpenSSL 0.9.6 |
| GnuTLS 1.2 |
| zlib 1.1.4 |
| libssh2 0.16 |
| c-ares 1.6.0 |
| libidn 0.4.1 |
| cyassl 2.0.0 |
| openldap 2.0 |
| MIT krb5 lib 1.2.4 |
| qsossl V5R2M0 |
| NSS 3.11.x |
| axTLS 1.2.7 |
| Heimdal ? |
| |
| * = only partly functional, but that's due to bugs in the third party lib, not |
| because of libcurl code |
| |
| On systems where configure runs, we aim at working on them all - if they have |
| a suitable C compiler. On systems that don't run configure, we strive to keep |
| curl running fine on: |
| |
| Windows 98 |
| AS/400 V5R2M0 |
| Symbian 9.1 |
| Windows CE ? |
| TPF ? |
| |
| When writing code (mostly for generating stuff included in release tarballs) |
| we use a few "build tools" and we make sure that we remain functional with |
| these versions: |
| |
| GNU Libtool 1.4.2 |
| GNU Autoconf 2.57 |
| GNU Automake 1.7 (we currently avoid 1.10 due to Solaris-related bugs) |
| GNU M4 1.4 |
| perl 5.004 |
| roffit 0.5 |
| groff ? (any version that supports "groff -Tps -man [in] [out]") |
| ps2pdf (gs) ? |
| |
| Windows vs Unix |
| =============== |
| |
| There are a few differences in how to program curl the unix way compared to |
| the Windows way. The four perhaps most notable details are: |
| |
| 1. Different function names for socket operations. |
| |
| In curl, this is solved with defines and macros, so that the source looks |
| the same at all places except for the header file that defines them. The |
| macros in use are sclose(), sread() and swrite(). |
| |
| 2. Windows requires a couple of init calls for the socket stuff. |
| |
| That's taken care of by the curl_global_init() call, but if other libraries |
| in the application also do that initialization, there might be reasons for |
| the application to alter that behaviour. |
| |
| 3. The file descriptors for network communication and file operations are |
| not easily interchangeable as in unix. |
| |
| We avoid this by not trying any funny tricks on file descriptors. |
| |
| 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus |
| destroying binary data, although you do want that conversion if it is |
| text coming through... (sigh) |
| |
| We set stdout to binary mode under Windows. |
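| |
| As an illustration of point 1, here is a minimal sketch of how such macros |
| could be defined. The macro names come from the text above; the demo_socket_t |
| typedef and the demo_send_all() helper are made up for this example, and the |
| real definitions in the curl headers may differ. |
| |
```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET demo_socket_t;   /* hypothetical typedef for this sketch */
#define sread(s, buf, len)  recv((s), (char *)(buf), (int)(len), 0)
#define swrite(s, buf, len) send((s), (const char *)(buf), (int)(len), 0)
#define sclose(s)           closesocket(s)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int demo_socket_t;      /* hypothetical typedef for this sketch */
#define sread(s, buf, len)  recv((s), (buf), (len), 0)
#define swrite(s, buf, len) send((s), (buf), (len), 0)
#define sclose(s)           close(s)
#endif

/* Platform-neutral caller: only the macros above differ per platform.
   Writes the whole buffer, looping on short writes.
   Returns 0 on success and -1 on error. */
int demo_send_all(demo_socket_t s, const char *data, size_t len)
{
  size_t sent = 0;

  assert(data != NULL);
  while(sent < len) {
    long n = (long)swrite(s, data + sent, len - sent);
    if(n <= 0)
      return -1;
    sent += (size_t)n;
  }
  return 0;
}
```
| |
| The point of the design is that demo_send_all() contains no platform check |
| at all; everything platform-specific sits in one block of defines. |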
| |
| Inside the source code, we make an effort to avoid '#ifdef [Your OS]'. All |
| conditionals that deal with features *should* instead be in the format |
| '#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts, |
| we maintain two curl_config-win32.h files (one in lib/ and one in src/) that |
| are supposed to look exactly like the curl_config.h file that configure would |
| have generated on a Windows machine! |
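| |
| A small sketch of the feature-based style, assuming a HAVE_STRDUP symbol |
| that a configure-generated (or hand-maintained) config header would define. |
| The demo_strdup name is made up for this example. |
| |
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The config header would define HAVE_STRDUP when the platform provides
   strdup(). The code asks "is the function there?", never "which OS?". */
#ifdef HAVE_STRDUP
#define demo_strdup(s) strdup(s)
#else
/* Fallback used when the feature macro is absent. */
char *demo_strdup(const char *s)
{
  size_t n;
  char *copy;

  assert(s != NULL);
  n = strlen(s) + 1;        /* include the terminating NUL */
  copy = malloc(n);
  if(copy)
    memcpy(copy, s, n);
  return copy;
}
#endif
```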
| |
| Generally speaking: always remember that this will be compiled on dozens of |
| operating systems. Don't walk on the edge. |
| |
| Library |
| ======= |
| |
| There are plenty of entry points to the library, namely each publicly defined |
| function that libcurl offers to applications. All of those functions are |
| rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are |
| put in the lib/easy.c file. |
| |
| curl_global_init() and curl_global_cleanup() should be called by the |
| application to initialize and clean up global stuff in the library. As of |
| today, it can handle the global SSL initing if SSL is enabled and it can init |
| the socket layer on windows machines. libcurl itself has no "global" scope. |
| |
| All printf()-style functions use the supplied clones in lib/mprintf.c. This |
| makes sure we stay absolutely platform independent. |
| |
| curl_easy_init() allocates an internal struct and makes some initializations. |
| The returned handle does not reveal internals. This is the 'SessionHandle' |
| struct which works as an "anchor" struct for all curl_easy functions. All |
| connections performed will get connect-specific data allocated that should be |
| used for things related to particular connections/requests. |
| |
| curl_easy_setopt() takes three arguments: the handle, followed by the option |
| passed as a pair of parameter-ID and parameter-value. The list of options is |
| documented in the man page. This function mainly sets things in the |
| 'SessionHandle' struct. |
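| |
| The opaque-handle and ID/value ideas can be modelled in a few lines. This is |
| a simplified sketch and not libcurl's code: every demo_ and DEMO_ name below |
| is hypothetical, and the real 'SessionHandle' struct holds far more. |
| |
```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical option IDs for this sketch. */
#define DEMO_OPT_URL     1
#define DEMO_OPT_TIMEOUT 2

/* Internal "anchor" struct: applications never see this definition. */
struct demo_session {
  const char *url;
  long timeout;
};

/* The public handle type reveals nothing about the internals. */
typedef void DEMO;

DEMO *demo_easy_init(void)
{
  return calloc(1, sizeof(struct demo_session));
}

/* Three arguments: the handle, the option ID, and a value whose type
   depends on the ID. Returns 0 on success, -1 for an unknown option. */
int demo_easy_setopt(DEMO *handle, int option, ...)
{
  struct demo_session *data = handle;
  va_list ap;
  int rc = 0;

  assert(data != NULL);
  va_start(ap, option);
  switch(option) {
  case DEMO_OPT_URL:
    data->url = va_arg(ap, char *);
    break;
  case DEMO_OPT_TIMEOUT:
    data->timeout = va_arg(ap, long);
    break;
  default:
    rc = -1;
    break;
  }
  va_end(ap);
  return rc;
}

void demo_easy_cleanup(DEMO *handle)
{
  free(handle);
}
```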
| |
| curl_easy_perform() does a whole lot of things: |
| |
| It starts off in the lib/easy.c file by calling Curl_perform() and the main |
| work then continues in lib/url.c. The flow continues with a call to |
| Curl_connect() to connect to the remote site. |
| |
| o Curl_connect() |
| |
| ... analyzes the URL, it separates the different components and connects to |
| the remote host. This may involve using a proxy and/or using SSL. The |
| Curl_resolv() function in lib/hostip.c is used for looking up host names |
| (it does then use the proper underlying method, which may vary between |
| platforms and builds). |
| |
| When Curl_connect is done, we are connected to the remote site. Then it is |
| time to tell the server to get a document/file. Curl_do() arranges this. |
| |
| This function makes sure there's an allocated and initiated 'connectdata' |
| struct that is used for this particular connection only (although there may |
| be several requests performed on the same connect). A bunch of things are |
| inited/inherited from the SessionHandle struct. |
| |
| o Curl_do() |
| |
| Curl_do() makes sure the proper protocol-specific function is called. The |
| functions are named after the protocols they handle. Curl_ftp(), |
| Curl_http(), Curl_dict(), etc. They all reside in their respective files |
| (ftp.c, http.c and dict.c). HTTPS is handled by Curl_http() and FTPS by |
| Curl_ftp(). |
| |
| The protocol-specific functions of course deal with protocol-specific |
| negotiations and setup. They have access to the Curl_sendf() (from |
| lib/sendf.c) function to send printf-style formatted data to the remote |
| host and when they're ready to make the actual file transfer they call the |
| Curl_setup_transfer() function (in lib/transfer.c) to set up the transfer |
| and return. |
| |
| If this DO function fails and the connection is being re-used, libcurl will |
| then close this connection, set up a new connection and re-issue the DO |
| request on that. This is because there is no way to be perfectly sure that |
| we have discovered a dead connection before the DO function and thus we |
| might wrongly be re-using a connection that was closed by the remote peer. |
| |
| Some time during the DO function, the Curl_setup_transfer() function must |
| be called with some basic info about the upcoming transfer: what socket(s) |
| to read/write and the expected file transfer sizes (if known). |
| |
| o Transfer() |
| |
| Curl_perform() then calls Transfer() in lib/transfer.c that performs the |
| entire file transfer. |
| |
| During transfer, the progress functions in lib/progress.c are called at a |
| frequent interval (or at the user's choice, a specified callback might get |
| called). The speedcheck functions in lib/speedcheck.c are also used to |
| verify that the transfer is as fast as required. |
| |
| o Curl_done() |
| |
| Called after a transfer is done. This function takes care of everything |
| that has to be done after a transfer. This function attempts to leave |
| matters in a state so that Curl_do() should be possible to call again on |
| the same connection (in a persistent connection case). It might also soon |
| be closed with Curl_disconnect(). |
| |
| o Curl_disconnect() |
| |
| When doing normal connections and transfers, no one ever tries to close any |
| connections so this is not normally called when curl_easy_perform() is |
| used. This function is only used when we are certain that no more transfers |
| are going to be made on the connection. A connection can also be closed by |
| force, or this function can be called to make sure that libcurl doesn't keep |
| too many connections alive at the same time (there's a default amount of 5 |
| but that can be changed with the CURLOPT_MAXCONNECTS option). |
| |
| This function cleans up all resources that are associated with a single |
| connection. |
| |
| Curl_perform() is the function that does the main "connect - do - transfer - |
| done" loop. It loops if there's a Location: to follow. |
| |
| When completed, curl_easy_cleanup() should be called to free up used |
| resources. It runs Curl_disconnect() on all open connections. |
| |
| A quick roundup on internal function sequences (many of these call |
| protocol-specific function-pointers): |
| |
| Curl_connect - connects to a remote site and does initial connect fluff |
| This also checks for an existing connection to the requested site and uses |
| that one if it is possible. |
| |
| Curl_do - starts a transfer |
| Curl_handler::do_it() - transfers data |
| Curl_done - ends a transfer |
| |
| Curl_disconnect - disconnects from a remote site. This is called when the |
| disconnect is really requested, which doesn't necessarily have to be |
| exactly after Curl_done in case we want to keep the connection open for |
| a while. |
| |
| HTTP(S) |
| |
| HTTP offers a lot and is the protocol in curl that uses the most lines of |
| code. There is a special file (lib/formdata.c) that offers all the multipart |
| post functions. |
| |
| The base64 functions for user+password stuff (and more) are in lib/base64.c |
| and all functions for parsing and sending cookies are found in lib/cookie.c. |
| |
| HTTPS uses almost exactly the same procedure as HTTP, with only two |
| exceptions: the connect procedure is different and the function used to read |
| or write from the socket is different, although the latter fact is hidden in |
| the source by the use of Curl_read() for reading and Curl_write() for writing |
| data to the remote server. |
| |
| http_chunks.c contains functions that understand HTTP 1.1 chunked transfer |
| encoding. |
| |
| An interesting detail with the HTTP(S) request is the Curl_add_buffer() |
| series of functions we use. They append data to one single buffer, and when |
| the building is done the entire request is sent off in one single write. This |
| is done this way to overcome problems with flawed firewalls and lame servers. |
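| |
| A minimal sketch of the append-then-send-once idea, assuming a simple |
| growable buffer. The demo_ names and the struct layout are made up for this |
| example and do not mirror the real Curl_add_buffer() implementation. |
| |
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer that collects the whole request. */
struct demo_buffer {
  char *data;
  size_t used;    /* bytes filled in so far */
  size_t size;    /* bytes allocated */
};

/* Appends len bytes, growing the allocation when needed.
   Returns 0 on success and -1 if memory runs out. */
int demo_add_buffer(struct demo_buffer *b, const void *src, size_t len)
{
  assert(b != NULL && src != NULL);
  if(len == 0)
    return 0;
  if(b->used + len > b->size) {
    size_t newsize = (b->used + len) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, src, len);
  b->used += len;
  return 0;
}

/* Convenience wrapper for NUL-terminated strings. */
int demo_add_string(struct demo_buffer *b, const char *s)
{
  assert(s != NULL);
  return demo_add_buffer(b, s, strlen(s));
}
```
| |
| Once every header has been appended, the caller would hand b.data and b.used |
| to a single write call and then free the buffer. |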
| |
| FTP |
| |
| The Curl_if2ip() function can be used for getting the IP number of a |
| specified network interface, and it resides in lib/if2ip.c. |
| |
| Curl_ftpsendf() is used for sending FTP commands to the remote server. It was |
| made a separate function to prevent us programmers from forgetting that they |
| must be CRLF terminated. They must also be sent in one single write() to make |
| firewalls and similar happy. |
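| |
| The idea can be sketched as a formatter that always appends the CRLF itself, |
| so callers cannot forget it. demo_ftp_format() is a hypothetical name, and |
| this sketch uses C99 vsnprintf() for brevity where libcurl would use its own |
| printf clones from lib/mprintf.c. |
| |
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Formats one FTP command into 'out' and appends CRLF.
   Returns the command length including CRLF, or -1 if it does not fit.
   The caller can then send the result with one single write. */
int demo_ftp_format(char *out, size_t outsize, const char *fmt, ...)
{
  va_list ap;
  int n;

  assert(out != NULL && fmt != NULL);
  va_start(ap, fmt);
  n = vsnprintf(out, outsize, fmt, ap);
  va_end(ap);

  /* need room for the command, "\r\n" and the terminating NUL */
  if(n < 0 || (size_t)n + 3 > outsize)
    return -1;
  memcpy(out + n, "\r\n", 3);   /* the 3rd byte copied is the NUL */
  return n + 2;
}
```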
| |
| Kerberos |
| |
| The kerberos support is mainly in lib/krb4.c and lib/security.c. |
| |
| TELNET |
| |
| Telnet is implemented in lib/telnet.c. |
| |
| FILE |
| |
| The file:// protocol is dealt with in lib/file.c. |
| |
| LDAP |
| |
| Everything LDAP is in lib/ldap.c and lib/openldap.c. |
| |
| GENERAL |
| |
| URL encoding and decoding, called escaping and unescaping in the source code, |
| is found in lib/escape.c. |
| |
| While transferring data in Transfer() a few functions might get used. |
| curl_getdate() in lib/parsedate.c is for HTTP date comparisons (and more). |
| |
| lib/getenv.c offers curl_getenv() which is for reading environment variables |
| in a neat platform independent way. That's used in the client, but also in |
| lib/url.c when checking the proxy environment variables. Note that contrary |
| to the normal unix getenv(), this returns an allocated buffer that must be |
| free()ed after use. |
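| |
| The ownership rule is the part that is easy to get wrong, so here is a |
| sketch of a getenv() wrapper with the same contract. demo_getenv() is a |
| hypothetical stand-in and not the code in lib/getenv.c. |
| |
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the variable's value, or NULL if the
   variable is unset or memory runs out. The caller owns the result and
   must free() it, unlike with plain getenv(). */
char *demo_getenv(const char *name)
{
  const char *value;
  char *copy;
  size_t n;

  assert(name != NULL);
  value = getenv(name);
  if(!value)
    return NULL;
  n = strlen(value) + 1;    /* include the terminating NUL */
  copy = malloc(n);
  if(copy)
    memcpy(copy, value, n);
  return copy;
}
```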
| |
| lib/netrc.c holds the .netrc parser. |
| |
| lib/timeval.c features replacement functions for systems that don't have |
| gettimeofday() and a few support functions for timeval conversions. |
| |
| A function named curl_version() that returns the full curl version string is |
| found in lib/version.c. |
| |
| Persistent Connections |
| ====================== |
| |
| The persistent connection support in libcurl requires some considerations on |
| how to do things inside of the library. |
| |
| o The 'SessionHandle' struct returned in the curl_easy_init() call must never |
| hold connection-oriented data. It is meant to hold the root data as well as |
| all the options etc that the library-user may choose. |
| o The 'SessionHandle' struct holds the "connection cache" (an array of |
| pointers to 'connectdata' structs). There's one connectdata struct |
| allocated for each connection that libcurl knows about. Note that when you |
| use the multi interface, the multi handle will hold the connection cache |
| and not the particular easy handle. This is of course to allow all easy handles |
| in a multi stack to be able to share and re-use connections. |
| o This enables the 'curl handle' to be reused on subsequent transfers. |
| o When we are about to perform a transfer with curl_easy_perform(), we first |
| check for an already existing connection in the cache that we can use, |
| otherwise we create a new one and add it to the cache. If the cache is full |
| already when we add a new connection, we close one of the present ones. We |
| select which one to close depending on the close policy that may have been |
| previously set. |
| o When the transfer operation is complete, we try to leave the connection |
| open. Particular options may tell us not to, and protocols may signal |
| closure on connections and then we don't keep it open of course. |
| o When curl_easy_cleanup() is called, we close all still opened connections, |
| unless of course the multi interface "owns" the connections. |
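| |
| The cache behaviour described in the list above can be sketched as follows. |
| This is a toy model: the demo_ names are hypothetical, connections are |
| matched on host and port only, and "least recently used" is assumed as the |
| close policy purely for illustration. |
| |
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAXCONNECTS 5

/* Hypothetical per-connection data. */
struct demo_conn {
  char host[64];
  int port;
  unsigned long last_used;
};

/* Hypothetical cache: an array of pointers to connection structs. */
struct demo_cache {
  struct demo_conn *slot[DEMO_MAXCONNECTS];
  unsigned long clock;   /* increases on every lookup */
  int closed;            /* how many connections were closed to make room */
};

/* Returns a cached connection to host:port, or creates and caches one. */
struct demo_conn *demo_cache_get(struct demo_cache *c, const char *host,
                                 int port)
{
  int i;
  int target = -1;   /* first empty slot, if any */
  int oldest = 0;    /* least recently used slot */
  struct demo_conn *conn;

  assert(c != NULL && host != NULL);
  c->clock++;

  for(i = 0; i < DEMO_MAXCONNECTS; i++) {
    conn = c->slot[i];
    if(!conn) {
      if(target < 0)
        target = i;
      continue;
    }
    if(conn->port == port && strcmp(conn->host, host) == 0) {
      conn->last_used = c->clock;
      return conn;                /* re-use the existing connection */
    }
    if(!c->slot[oldest] || conn->last_used < c->slot[oldest]->last_used)
      oldest = i;
  }

  if(target < 0) {
    /* cache is full: close one of the present connections */
    free(c->slot[oldest]);
    c->slot[oldest] = NULL;
    c->closed++;
    target = oldest;
  }

  conn = calloc(1, sizeof(*conn));
  if(!conn)
    return NULL;
  strncpy(conn->host, host, sizeof(conn->host) - 1);
  conn->port = port;
  conn->last_used = c->clock;
  c->slot[target] = conn;
  return conn;
}
```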
| |
| You do realize that the curl handle must be re-used in order for the |
| persistent connections to work. |
| |
| multi interface/non-blocking |
| ============================ |
| |
| We make an effort to provide a non-blocking interface to the library, the |
| multi interface. To make that interface work as well as possible, no |
| low-level function within libcurl may be written to work in a blocking |
| manner. |
| |
| One of the primary reasons we introduced c-ares support was to allow the name |
| resolve phase to be perfectly non-blocking as well. |
| |
| The ultimate goal is to provide the easy interface simply by wrapping the |
| multi interface functions and thus treat everything internally as if the |
| multi interface were the single interface we have. |
| |
| The FTP and the SFTP/SCP protocols are thus perfect examples of how we adapt |
| and adjust the code to allow non-blocking operations even on multi-stage |
| protocols. They are built around state machines that return when they could |
| block waiting for data. The DICT, LDAP and TELNET protocols are crappy |
| examples and they are subject to rewrite in the future to better fit the |
| libcurl protocol family. |
| |
| SSL libraries |
| ============= |
| |
| Originally libcurl supported SSLeay for SSL/TLS transports. That support was |
| then extended to its successor OpenSSL and has since been extended to several |
| other SSL/TLS libraries. We expect and hope to further extend the support in |
| future libcurl versions. |
| |
| To deal with this internally in the best way possible, we have a generic SSL |
| function API as provided by the sslgen.[ch] system, and those are the only |
| SSL functions we may use from within libcurl. sslgen is then crafted to use |
| the appropriate lower-level function calls of whatever SSL library is in |
| use. |
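| |
| One way such a generic layer can be built is with compile-time selection, |
| sketched below. Whether sslgen works exactly like this is an assumption; |
| the two "libraries" and every demo_ name here are invented for the example. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Exactly one stand-in backend is compiled in, chosen by a build symbol. */
#ifdef DEMO_USE_LIBB
static const char *demo_libb_version(void) { return "libb/2.0"; }
#define backend_version demo_libb_version
#else
static const char *demo_liba_version(void) { return "liba/1.0"; }
#define backend_version demo_liba_version
#endif

/* The only SSL entry point the rest of the library would call. It never
   mentions a specific SSL library by name. */
const char *demo_ssl_version(void)
{
  const char *v = backend_version();
  assert(v != NULL);
  return v;
}
```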
| |
| Library Symbols |
| =============== |
| |
| All symbols used internally in libcurl must use a 'Curl_' prefix if they're |
| used in more than a single file. Single-file symbols must be made static. |
| Public ("exported") symbols must use a 'curl_' prefix. (There are exceptions, |
| but they are to be changed to follow this pattern in future versions.) Public |
| API functions are marked with CURL_EXTERN in the public header files so that |
| all others can be hidden on platforms where this is possible. |
| |
| Return Codes and Informationals |
| =============================== |
| |
| I've made things simple. Almost every function in libcurl returns a CURLcode, |
| which must be CURLE_OK if everything is OK or otherwise a suitable error code |
| as defined in the curl/curl.h include file. The very spot that detects an |
| error must use the Curl_failf() function to set the human-readable error |
| description. |
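| |
| The convention can be sketched like this: the function that detects the |
| problem sets the message, and callers further up only pass the code along. |
| The demo_ names, the error enum and the buffer size are hypothetical, and |
| vsnprintf() stands in for libcurl's own printf clones. |
| |
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes standing in for CURLcode values. */
typedef enum { DEMO_OK = 0, DEMO_COULDNT_RESOLVE } demo_code;

struct demo_handle {
  char errorbuf[256];   /* human-readable description of the last error */
};

/* printf-style setter for the error description. */
static void demo_failf(struct demo_handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errorbuf, sizeof(h->errorbuf), fmt, ap);
  va_end(ap);
}

/* The spot that detects the error is the one that describes it. */
static demo_code demo_resolve(struct demo_handle *h, const char *host)
{
  if(strlen(host) == 0) {
    demo_failf(h, "Could not resolve host '%s'", host);
    return DEMO_COULDNT_RESOLVE;
  }
  return DEMO_OK;
}

/* Callers just propagate the code; the message is already in place. */
demo_code demo_connect(struct demo_handle *h, const char *host)
{
  demo_code rc;

  assert(h != NULL && host != NULL);
  rc = demo_resolve(h, host);
  if(rc != DEMO_OK)
    return rc;
  return DEMO_OK;
}
```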
| |
| In aiding the user to understand what's happening and to debug curl usage, we |
| must supply a fair amount of informational messages by using the Curl_infof() |
| function. Those messages are only displayed when the user explicitly asks for |
| them. They are best used when revealing information that isn't otherwise |
| obvious. |
| |
| API/ABI |
| ======= |
| |
| We make an effort to not export or show internals or how internals work, as |
| that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI |
| for our promise to users. |
| |
| Client |
| ====== |
| |
| main() resides in src/main.c together with most of the client code. |
| |
| src/hugehelp.c is automatically generated by the mkhelp.pl perl script to |
| display the complete "manual" and the src/urlglob.c file holds the functions |
| used for the URL-"globbing" support. Globbing in the sense that the {} and [] |
| expansion stuff is there. |
| |
| The client mostly messes around to set up its 'config' struct properly, then |
| it calls the curl_easy_*() functions of the library and when it gets back |
| control after the curl_easy_perform() it cleans up the library, checks status |
| and exits. |
| |
| When the operation is done, the ourWriteOut() function in src/writeout.c may |
| be called to report about the operation. That function is using the |
| curl_easy_getinfo() function to extract useful information from the curl |
| session. |
| |
| Recent versions may loop and do all this several times if many URLs were |
| specified on the command line or config file. |
| |
| Memory Debugging |
| ================ |
| |
| The file lib/memdebug.c contains debug-versions of a few functions: functions |
| such as malloc, free, fopen, fclose, etc that somehow deal with resources |
| that might give us problems if we "leak" them. The functions in the memdebug |
| system do nothing fancy, they do their normal function and then log |
| information about what they just did. The logged data can then be analyzed |
| after a complete session. |
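| |
| A minimal sketch of what such a logging wrapper can look like. The demo_ |
| names and the log line layout are assumptions made for this example; see |
| lib/memdebug.c for the real functions and format. |
| |
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Where the log goes; NULL disables logging. Hypothetical global. */
static FILE *demo_memlog;

/* Does the normal malloc() and then logs what it just did. */
void *demo_dbg_malloc(size_t size, int line, const char *source)
{
  void *p;

  assert(source != NULL);
  p = malloc(size);
  if(demo_memlog)
    fprintf(demo_memlog, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

/* Logs the pointer and then does the normal free(). */
void demo_dbg_free(void *p, int line, const char *source)
{
  assert(source != NULL);
  if(demo_memlog)
    fprintf(demo_memlog, "MEM %s:%d free(%p)\n", source, line, p);
  free(p);
}

/* Code built with tracking enabled would use these in place of the plain
   calls, so every allocation records where it came from. */
#define demo_malloc(size) demo_dbg_malloc((size), __LINE__, __FILE__)
#define demo_free(ptr)    demo_dbg_free((ptr), __LINE__, __FILE__)
```
| |
| An analyzer can then pair up the malloc and free lines in the log and |
| report every pointer that was allocated but never freed. |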
| |
| memanalyze.pl is the perl script present in tests/ that analyzes a log file |
| generated by the memory tracking system. It detects if resources are |
| allocated but never freed and other kinds of errors related to resource |
| management. |
| |
| Internally, the preprocessor symbol DEBUGBUILD guards code that is only |
| compiled for debug enabled builds, and the symbol CURLDEBUG is used to |
| differentiate code which is _only_ used for memory tracking/debugging. |
| |
| Use -DCURLDEBUG when compiling to enable memory debugging; this is also |
| switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD |
| when compiling to enable a debug build or run configure with --enable-debug. |
| |
| curl --version will list 'Debug' feature for debug enabled builds, and |
| will list 'TrackMemory' feature for curl debug memory tracking capable |
| builds. These features are independent and can be controlled when running |
| the configure script. When --enable-debug is given both features will be |
| enabled, unless some restriction prevents memory tracking from being used. |
| |
| Test Suite |
| ========== |
| |
| The test suite is placed in its own subdirectory directly off the root in the |
| curl archive tree, and it contains a bunch of scripts and a lot of test case |
| data. |
| |
| The main test script is runtests.pl that will invoke test servers like |
| httpserver.pl and ftpserver.pl before all the test cases are performed. The |
| test suite currently only runs on unix-like platforms. |
| |
| You'll find a description of the test suite in the tests/README file, and the |
| test case data file format in the tests/FILEFORMAT file. |
| |
| The test suite automatically detects if curl was built with the memory |
| debugging enabled, and if it was it will detect memory leaks, too. |
| |
| Building Releases |
| ================= |
| |
| There's no magic to this. When you consider everything stable enough to be |
| released, do this: |
| |
| 1. Tag the source code accordingly. |
| |
| 2. Run the 'maketgz' script (using 'make distcheck' will give you a pretty |
| good view on the status of the current sources). maketgz requires a |
| version number and creates the release archive. maketgz uses 'make dist' |
| for the actual archive building, which is why you need to fill in the |
| Makefile.am files properly with the files that should be included in the |
| release archives. |
| |
| 3. When that's complete, sign the output files. |
| |
| 4. Upload. |
| |
| 5. Update web site and changelog on site. |
| |
| 6. Send announcement to the mailing lists. |
| |
| NOTE: you must have curl checked out from git to be able to do a proper |
| release build. The release tarballs do not have everything set up in order to |
| do releases properly. |