[pacman-dev] [RFC] curl-based internal downloader
Greetings,

I've been working on and off on a curl-based internal downloader to replace the libfetch-based implementation. Why?

- libcurl is more widely used in the arch repos. fetch is used by one and only one package: pacman.
- support for compression over http connections. I see proposals for adding this to fetch, but never any acceptance. Please correct me if I'm mistaken.
- personal bias. not really a reason, but I think the implementation is cleaner and more readable than the fetch equivalent.
- dan wanted me to. (the devil made me do it?)

In its current form, I'd say this patch set is 90% complete. I've been using it on top of pacman-git for about a week now and haven't come across any functional issues. I've done a fair bit of testing with ftp, http, and file-based sync'ing.

So what's remaining?

- setting pm_errno: libfetch provides a global variable fetchLastErrString, whereas curl returns error codes (CURLcode) from operations and you feed them to curl_easy_strerror() to get a char* back. I'm not sure what the best way is to get back an error string from curl when (what will be) PM_ERR_LIBCURL is set. Could the global handle be used to store the CURLcode (it's just a typedef'd long)?
- the progress bar does _not_ play well with curl. The bar itself is fine, but the time remaining jumps around a fair bit. You'll see a fair bit of hackery in curl_progress() (lib/libalpm/dload.c) where I'm fighting with the progress bar. Perhaps this belongs in the progress bar voodoo itself? (A throttling sketch follows after this message.)
- my autoconf-fu sucks. Big time. I'm not sure what I've done is best practice, but after applying patch 3/4, compiling without extra flags links in curl and leaves out fetch. There is, however, no checking for both --with-curl and --with-fetch being given, which will break the build.

Dan (and anyone else), I look forward to your tower of constructive criticism.

dave

 configure.ac        |  33 ++-
 lib/libalpm/alpm.c  |  13 +--
 lib/libalpm/dload.c | 263 ++++++++++++++++++++++++++++++---------------------
 3 files changed, 176 insertions(+), 133 deletions(-)
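As an illustration of one alternative to the curl_progress() hackery described above: rate-limit how often the callback forwards updates to the frontend. This is only a sketch, not part of the patch series; the function name and the 100ms interval are hypothetical, it assumes libalpm's global handle as the patches do, and it elides the interrupt/completion bookkeeping that curl_progress() also performs:

```c
#include <curl/curl.h>
#include <sys/time.h>

/* hypothetical: forward progress to the frontend at most every 100ms,
 * always reporting the final (complete) update so the bar finishes.
 * (the dload_interrupted abort check from curl_progress() is elided) */
static int throttled_progress(void *filename, double dltotal, double dlnow,
		double ultotal, double ulnow)
{
	static double last_update = 0.0;
	struct timeval tv;
	double now;

	/* unused upload parameters */
	(void)ultotal;
	(void)ulnow;

	gettimeofday(&tv, NULL);
	now = tv.tv_sec + tv.tv_usec / 1e6;

	/* skip intermediate updates arriving faster than 10 per second */
	if(dlnow < dltotal && now - last_update < 0.1) {
		return(0);
	}
	last_update = now;

	handle->dlcb((const char *)filename, (long)dlnow, (long)dltotal);
	return(0);
}
```

This keeps the smoothing on the libalpm side; the alternative alluded to above would be to do the averaging inside the frontend's progress-bar code instead.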
Signed-off-by: Dave Reisner <d@falconindy.com>
---
 lib/libalpm/alpm.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/lib/libalpm/alpm.c b/lib/libalpm/alpm.c
index 3f9cfff..44c513e 100644
--- a/lib/libalpm/alpm.c
+++ b/lib/libalpm/alpm.c
@@ -27,6 +27,9 @@
 #ifdef HAVE_LIBFETCH
 #include <fetch.h>
 #endif
+#ifdef HAVE_LIBCURL
+#include <curl/curl.h>
+#endif
 
 /* libalpm */
 #include "alpm.h"
@@ -63,6 +66,10 @@ int SYMEXPORT alpm_initialize(void)
 	fetchConnectionCacheInit(5, 1);
 #endif
 
+#ifdef HAVE_LIBCURL
+	curl_global_init(CURL_GLOBAL_NOTHING);
+#endif
+
 	return(0);
 }
 
@@ -86,6 +93,10 @@ int SYMEXPORT alpm_release(void)
 	fetchConnectionCacheClose();
 #endif
 
+#ifdef HAVE_LIBCURL
+	curl_global_cleanup();
+#endif
+
 	return(0);
 }
-- 
1.7.3.4
On Sun, Jan 2, 2011 at 7:13 PM, Dave Reisner <d@falconindy.com> wrote:

Signed-off-by: Dave Reisner <d@falconindy.com>
---
 lib/libalpm/alpm.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/lib/libalpm/alpm.c b/lib/libalpm/alpm.c
index 3f9cfff..44c513e 100644
--- a/lib/libalpm/alpm.c
+++ b/lib/libalpm/alpm.c
@@ -27,6 +27,9 @@
 #ifdef HAVE_LIBFETCH
 #include <fetch.h>
 #endif
+#ifdef HAVE_LIBCURL
+#include <curl/curl.h>
+#endif
 
 /* libalpm */
 #include "alpm.h"
@@ -63,6 +66,10 @@ int SYMEXPORT alpm_initialize(void)
 	fetchConnectionCacheInit(5, 1);
 #endif
 
+#ifdef HAVE_LIBCURL
+	curl_global_init(CURL_GLOBAL_NOTHING);

SSL functionality would be nice to at least have as an option- seems like most people init this with _ALL.

+#endif
+
 	return(0);
 }
 
@@ -86,6 +93,10 @@ int SYMEXPORT alpm_release(void)
 	fetchConnectionCacheClose();
 #endif
 
+#ifdef HAVE_LIBCURL
+	curl_global_cleanup();
+#endif
+
 	return(0);
 }
-- 
1.7.3.4
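For reference, the _ALL variant being suggested is a one-flag change; CURL_GLOBAL_ALL and CURL_GLOBAL_SSL are standard libcurl init flags (a sketch of the hunk above with the flag swapped):

```c
#ifdef HAVE_LIBCURL
	/* CURL_GLOBAL_ALL is CURL_GLOBAL_SSL | CURL_GLOBAL_WIN32; initializing
	 * SSL here is what would let https:// mirrors work later on */
	curl_global_init(CURL_GLOBAL_ALL);
#endif
```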
This does not implement any linkage from current code to the curl based download_file() function. libfetch continues to be used unless configured and compiled explicitly with LDFLAGS/CPPFLAGS set for -lcurl and -DHAVE_LIBCURL, respectively. Signed-off-by: Dave Reisner <d@falconindy.com> --- lib/libalpm/dload.c | 325 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 313 insertions(+), 12 deletions(-) diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 09f716f..9154d33 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -39,6 +39,10 @@ #include <fetch.h> #endif +#ifdef HAVE_LIBCURL +#include <curl/curl.h> +#endif + /* libalpm */ #include "dload.h" #include "alpm_list.h" @@ -55,7 +59,6 @@ static char *get_filename(const char *url) { return(filename); } -#ifdef HAVE_LIBFETCH static char *get_destfile(const char *path, const char *filename) { char *destfile; /* len = localpath len + filename len + null */ @@ -76,6 +79,32 @@ static char *get_tempfile(const char *path, const char *filename) { return(tempfile); } +#define check_stop() if(dload_interrupted) { ret = -1; goto cleanup; } +enum sighandlers { OLD = 0, NEW = 1 }; + +int dload_interrupted; +static void inthandler(int signum) +{ + dload_interrupted = 1; +} + +#ifdef HAVE_LIBCURL +static int gethost(const char *url, char *buffer) +{ + char *start, *end; + int ret; + + start = strstr(url, "//") + 2; + end = strchr(start, '/'); + + ret = snprintf(&buffer[0], end - start + 1, "%s", start); + buffer[end - start + 1] = '\0'; + + return(ret); +} +#endif + +#ifdef HAVE_LIBFETCH static const char *gethost(struct url *fileurl) { const char *host = _("disk"); @@ -85,15 +114,6 @@ static const char *gethost(struct url *fileurl) return(host); } -int dload_interrupted; -static void inthandler(int signum) -{ - dload_interrupted = 1; -} - -#define check_stop() if(dload_interrupted) { ret = -1; goto cleanup; } -enum sighandlers { OLD = 0, NEW = 1 }; - static int download_internal(const char *url, const char *localpath, int force) { FILE *localf = NULL; @@ -179,7 +199,7 @@ static int download_internal(const char *url, const char *localpath, if(fetchStat(fileurl, &ust, "") == -1) { pm_errno = PM_ERR_LIBFETCH; _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), - filename, gethost(fileurl), fetchLastErrString); + filename, gethost(url), fetchLastErrString); ret = -1; goto cleanup; } @@ -332,10 +352,291 @@ cleanup: } #endif +#ifdef HAVE_LIBCURL +int curl_progress(void *filename, double dltotal, double dlnow, double ultotal, + double ulnow) +{ + static int done = 0; + + /* unused upload parameters */ + (void)ultotal; + (void)ulnow; + + /* curl calls the progress callback before the first recv() call, but we've + * already "initialized" the progress meter in download_internal. reporting 0 + * twice causes pacman's callback to print multiple debug outputs. However, + * not initializing prior to this callback causes the progress meter to + * display bogus values when downloading a file:// schema url. + */ + if (dlnow == 0) { + return(0); + } + + if (dload_interrupted) { + return(1); /* abort transfer */ + } + + /* curl still reports progress as its closing up the connection. if we keep + * pounding on the callback, we get a lot of extra progress bars. This is a + * horrible hack for a multitude of reasons. 
+ */ + if (dltotal != dlnow) { + done = 0; + } else { + if (done) { + return(0); + } + done = 1; + } + + handle->dlcb((const char*)filename, (long)dlnow, (long)dltotal); + + return(0); +} + +static int download_internal(const char *url, const char *localpath, + int force) +{ + FILE *devnull, *localf = NULL; + struct stat st; + int ret = 0; + double dl_thisfile = 0; + char *tempfile, *destfile, *filename; + char hostname[PATH_MAX]; + struct sigaction sig_pipe[2], sig_int[2]; + + double remote_size; + long local_size = 0; + time_t remote_time, local_time = 0; + + CURLcode curlret; + + CURL *curl = curl_easy_init(); + if(curl == NULL) { + /* XXX: bad things happened here */ + return(-1); + } + + filename = get_filename(url); + if(!filename) { + _alpm_log(PM_LOG_ERROR, _("url '%s' is invalid\n"), url); + RET_ERR(PM_ERR_SERVER_BAD_URL, -1); + } + + gethost(url, &hostname[0]); + + curl_easy_setopt(curl, CURLOPT_URL, url); + + destfile = get_destfile(localpath, filename); + tempfile = get_tempfile(localpath, filename); + + if(stat(tempfile, &st) == 0 && st.st_size > 0) { + _alpm_log(PM_LOG_DEBUG, "tempfile found, attempting continuation\n"); + local_time = st.st_mtime; + local_size = (long)st.st_size; + dl_thisfile = st.st_size; + localf = fopen(tempfile, "ab"); + } else if(!force && stat(destfile, &st) == 0 && st.st_size > 0) { + _alpm_log(PM_LOG_DEBUG, "destfile found, using mtime only\n"); + local_time = st.st_mtime; + local_size = (long)st.st_size; + } else { + _alpm_log(PM_LOG_DEBUG, "no file found matching criteria, starting from scratch\n"); + } + + /* pass the raw filename for passing to the callback function */ + _alpm_log(PM_LOG_DEBUG, "using '%s' for download progress\n", filename); + + /* print proxy info for debug purposes */ + _alpm_log(PM_LOG_DEBUG, "HTTP_PROXY: %s\n", getenv("HTTP_PROXY")); + _alpm_log(PM_LOG_DEBUG, "http_proxy: %s\n", getenv("http_proxy")); + _alpm_log(PM_LOG_DEBUG, "FTP_PROXY: %s\n", getenv("FTP_PROXY")); + _alpm_log(PM_LOG_DEBUG, "ftp_proxy: %s\n", getenv("ftp_proxy")); + + /* set 10s connect timeout */ + curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 10L); + + /* ignore any SIGPIPE signals- these may occur if our FTP socket dies or + * something along those lines. Store the old signal handler first. */ + sig_pipe[NEW].sa_handler = SIG_IGN; + sigemptyset(&sig_pipe[NEW].sa_mask); + sig_pipe[NEW].sa_flags = 0; + sigaction(SIGPIPE, NULL, &sig_pipe[OLD]); + sigaction(SIGPIPE, &sig_pipe[NEW], NULL); + + dload_interrupted = 0; + sig_int[NEW].sa_handler = &inthandler; + sigemptyset(&sig_int[NEW].sa_mask); + sig_int[NEW].sa_flags = 0; + sigaction(SIGINT, NULL, &sig_int[OLD]); + sigaction(SIGINT, &sig_int[NEW], NULL); + + /* Use /dev/null as a place to dump output during the file stat + * operation. This prevents connection messags from showing during + * ftp transactions and prevents file:// repos from being dumped + * to stdout. CURLOPT_WRITEDATA will be overwritten appropriately + * for the actual sync operation. 
+ */ + devnull = fopen("/dev/null", "wb"); + + /* use a no-body request to get mtime and filesize */ + curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); + curl_easy_setopt(curl, CURLOPT_FILETIME, 1L); + curl_easy_setopt(curl, CURLOPT_WRITEDATA, devnull); + + curlret = curl_easy_perform(curl); + fclose(devnull); + + if(curlret != CURLE_OK) { + pm_errno = PM_ERR_LIBFETCH; + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), + filename, hostname, curl_easy_strerror(curlret)); + ret = -1; + goto cleanup; + } + + curl_easy_getinfo(curl, CURLINFO_FILETIME, &remote_time); + curl_easy_getinfo(curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &remote_size); + check_stop(); + + _alpm_log(PM_LOG_DEBUG, "remote_mtime: %ld local_time: %ld compare: %ld\n", + remote_time, local_time, local_time - remote_time); + _alpm_log(PM_LOG_DEBUG, "remote_size: %jd local_size: %jd compare: %jd\n", + (intmax_t)remote_size, (intmax_t)local_size, (intmax_t)(local_size - remote_size)); + if(!force && remote_time && remote_time == local_time + && remote_size != -1 && remote_size == local_size) { + /* the remote time and size values agreed with what we have, so move on + * because there is nothing more to do. */ + _alpm_log(PM_LOG_DEBUG, "files are identical, skipping %s\n", filename); + ret = 1; + goto cleanup; + } + + if(remote_time == -1 || remote_time != local_time) { + _alpm_log(PM_LOG_DEBUG, "mtimes were different or unavailable, downloading %s from beginning\n", filename); + if (localf) { + fclose(localf); + localf = NULL; + } + } else if((localf && local_size == 0)) { + _alpm_log(PM_LOG_WARNING, _("resuming download of %s not possible; starting over\n"), filename); + fclose(localf); + localf = NULL; + } else if(local_size) { + _alpm_log(PM_LOG_DEBUG, "resuming download at position %jd\n", (intmax_t)local_size); + curl_easy_setopt(curl, CURLOPT_RESUME_FROM, local_size); + } + + if(localf == NULL) { + _alpm_rmrf(tempfile); + dl_thisfile = 0; + localf = fopen(tempfile, "wb"); + if(localf == NULL) { /* still null? */ + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("error writing to file '%s': %s\n"), + tempfile, strerror(errno)); + ret = -1; + goto cleanup; + } + } + + curl_easy_setopt(curl, CURLOPT_NOBODY, 0L); + curl_easy_setopt(curl, CURLOPT_WRITEDATA, localf); + curl_easy_setopt(curl, CURLOPT_ENCODING, "deflate, gzip"); + curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); + curl_easy_setopt(curl, CURLOPT_PROGRESSDATA, filename); + curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, curl_progress); + + /* Progress 0 - initialize */ + if(handle->dlcb) { + handle->dlcb(filename, 0, remote_size); + } + + curlret = curl_easy_perform(curl); + + /* check for user cancellation */ + if(curlret == CURLE_ABORTED_BY_CALLBACK) { + goto cleanup; + } + + if(curlret != CURLE_OK) { + pm_errno = PM_ERR_LIBFETCH; /* this needs to change */ + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), + filename, hostname, curl_easy_strerror(curlret)); + ret = -1; + goto cleanup; + } + + curl_easy_getinfo(curl, CURLINFO_SIZE_DOWNLOAD, &dl_thisfile); + + /* did the transfer complete normally? 
*/ + if(dl_thisfile == 0) { + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s\n"), + filename, hostname); + ret = -1; + goto cleanup; + } + + if(remote_size != -1 && local_size + dl_thisfile < remote_size) { + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("%s appears to be truncated: %jd/%jd bytes\n"), + filename, (intmax_t)(local_size + dl_thisfile), (intmax_t)remote_size); + ret = -1; + goto cleanup; + } + + /* probably safer to close the file descriptors now before renaming the file, + * for example to make sure the buffers are flushed. + */ + fclose(localf); + localf = NULL; + + /* set the times on the file to the same as that of the remote file */ + if(remote_time) { + struct timeval tv[2]; + memset(&tv, 0, sizeof(tv)); + tv[0].tv_sec = tv[1].tv_sec = remote_time; + utimes(tempfile, tv); + } + rename(tempfile, destfile); + ret = 0; + +cleanup: + FREE(tempfile); + FREE(destfile); + if(localf != NULL) { + /* if we still had a local file open, we got interrupted. set the mtimes on + * the file accordingly. */ + fflush(localf); + if(remote_time) { + struct timeval tv[2]; + memset(&tv, 0, sizeof(tv)); + tv[0].tv_sec = tv[1].tv_sec = remote_time; + futimes(fileno(localf), tv); + } + fclose(localf); + } + if(curl != NULL) { + curl_easy_cleanup(curl); + } + + /* restore the old signal handlers */ + sigaction(SIGINT, &sig_int[OLD], NULL); + sigaction(SIGPIPE, &sig_pipe[OLD], NULL); + /* if we were interrupted, trip the old handler */ + if(dload_interrupted) { + raise(SIGINT); + } + + return(ret); +} +#endif + static int download(const char *url, const char *localpath, int force) { if(handle->fetchcb == NULL) { -#ifdef HAVE_LIBFETCH +#if defined(HAVE_LIBFETCH) || defined(HAVE_LIBCURL) return(download_internal(url, localpath, force)); #else RET_ERR(PM_ERR_EXTERNAL_DOWNLOAD, -1); -- 1.7.3.4
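One thing worth flagging in the gethost() added above: strchr() returns NULL for a URL with no path component (e.g. "http://example.com"), which makes the `end - start` arithmetic undefined. A more defensive variant might look like the following sketch; the extra buflen parameter is a hypothetical signature change, not part of the patch:

```c
#include <string.h>

/* sketch: extract the host portion of a URL, tolerating URLs that
 * have no trailing path component */
static int gethost(const char *url, char *buffer, size_t buflen)
{
	const char *start, *end;
	size_t len;

	start = strstr(url, "//");
	if(start == NULL) {
		return(-1);
	}
	start += 2;

	end = strchr(start, '/');
	/* no '/' after the host means the host runs to the end of the url */
	len = end ? (size_t)(end - start) : strlen(start);
	if(len >= buflen) {
		len = buflen - 1; /* truncate rather than overflow */
	}
	memcpy(buffer, start, len);
	buffer[len] = '\0';

	return((int)len);
}
```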
On Sun, Jan 2, 2011 at 7:13 PM, Dave Reisner <d@falconindy.com> wrote:
This does not implement any linkage from current code to the curl based download_file() function. libfetch continues to be used unless configured and compiled explicitly with LDFLAGS/CPPFLAGS set for -lcurl and -DHAVE_LIBCURL, respectively.
Signed-off-by: Dave Reisner <d@falconindy.com>

Big comments that might make this easier to review, since it is a bit of a monster:
1. If you are just moving functions around to fix #ifdef locations, do that in a separate and preceding no-functional-changes patch. It would help for the dload_interrupted/signal stuff here, for instance. The two (different!) definitions of gethost() that occur are also noise that doesn't help review.

2. The dupe code stinks to have to review- I don't know how easy it would be to refactor some things out (mtime/size/signal/proxy stuff? see the sketch after this list), but yeah.

3. Follow. The. Coding. Style. You used spaces everywhere for indentation.
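On point 2, the signal block is one obvious candidate: both downloaders run the identical sigaction() dance verbatim. A sketch of what factoring it out could look like; the helper name is hypothetical, but the body is lifted directly from the duplicated code in the patch:

```c
#include <signal.h>

/* hypothetical shared helper: install the download signal handlers,
 * saving the previous ones into the caller-provided slots */
static void dload_install_signals(struct sigaction *sig_pipe,
		struct sigaction *sig_int)
{
	/* ignore SIGPIPE in case our socket dies mid-transfer */
	sig_pipe[NEW].sa_handler = SIG_IGN;
	sigemptyset(&sig_pipe[NEW].sa_mask);
	sig_pipe[NEW].sa_flags = 0;
	sigaction(SIGPIPE, NULL, &sig_pipe[OLD]);
	sigaction(SIGPIPE, &sig_pipe[NEW], NULL);

	/* trap SIGINT so a partial download's mtime can be preserved */
	dload_interrupted = 0;
	sig_int[NEW].sa_handler = &inthandler;
	sigemptyset(&sig_int[NEW].sa_mask);
	sig_int[NEW].sa_flags = 0;
	sigaction(SIGINT, NULL, &sig_int[OLD]);
	sigaction(SIGINT, &sig_int[NEW], NULL);
}
```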
--- lib/libalpm/dload.c | 325 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 313 insertions(+), 12 deletions(-)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 09f716f..9154d33 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -39,6 +39,10 @@ #include <fetch.h> #endif
+#ifdef HAVE_LIBCURL +#include <curl/curl.h> +#endif + /* libalpm */ #include "dload.h" #include "alpm_list.h" @@ -55,7 +59,6 @@ static char *get_filename(const char *url) { return(filename); }
-#ifdef HAVE_LIBFETCH static char *get_destfile(const char *path, const char *filename) { char *destfile; /* len = localpath len + filename len + null */ @@ -76,6 +79,32 @@ static char *get_tempfile(const char *path, const char *filename) { return(tempfile); }
+#define check_stop() if(dload_interrupted) { ret = -1; goto cleanup; } +enum sighandlers { OLD = 0, NEW = 1 }; + +int dload_interrupted; +static void inthandler(int signum) +{ + dload_interrupted = 1; +} + +#ifdef HAVE_LIBCURL +static int gethost(const char *url, char *buffer) +{ + char *start, *end; + int ret; + + start = strstr(url, "//") + 2; + end = strchr(start, '/'); + + ret = snprintf(&buffer[0], end - start + 1, "%s", start); + buffer[end - start + 1] = '\0'; + + return(ret); +} +#endif + +#ifdef HAVE_LIBFETCH static const char *gethost(struct url *fileurl) { const char *host = _("disk"); @@ -85,15 +114,6 @@ static const char *gethost(struct url *fileurl) return(host); }
-int dload_interrupted; -static void inthandler(int signum) -{ - dload_interrupted = 1; -} - -#define check_stop() if(dload_interrupted) { ret = -1; goto cleanup; } -enum sighandlers { OLD = 0, NEW = 1 }; - static int download_internal(const char *url, const char *localpath, int force) { FILE *localf = NULL; @@ -179,7 +199,7 @@ static int download_internal(const char *url, const char *localpath, if(fetchStat(fileurl, &ust, "") == -1) { pm_errno = PM_ERR_LIBFETCH; _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), - filename, gethost(fileurl), fetchLastErrString); + filename, gethost(url), fetchLastErrString); ret = -1; goto cleanup; } @@ -332,10 +352,291 @@ cleanup: } #endif
+#ifdef HAVE_LIBCURL +int curl_progress(void *filename, double dltotal, double dlnow, double ultotal, + double ulnow) +{ + static int done = 0; + + /* unused upload parameters */ + (void)ultotal; + (void)ulnow; + + /* curl calls the progress callback before the first recv() call, but we've + * already "initialized" the progress meter in download_internal. reporting 0 + * twice causes pacman's callback to print multiple debug outputs. However, + * not initializing prior to this callback causes the progress meter to + * display bogus values when downloading a file:// schema url. + */ + if (dlnow == 0) { + return(0); + } + + if (dload_interrupted) { + return(1); /* abort transfer */ + } + + /* curl still reports progress as its closing up the connection. if we keep + * pounding on the callback, we get a lot of extra progress bars. This is a + * horrible hack for a multitude of reasons. + */ + if (dltotal != dlnow) { + done = 0; + } else { + if (done) { + return(0); + } + done = 1; + } + + handle->dlcb((const char*)filename, (long)dlnow, (long)dltotal); + + return(0); +} + +static int download_internal(const char *url, const char *localpath, + int force) +{ + FILE *devnull, *localf = NULL; + struct stat st; + int ret = 0; + double dl_thisfile = 0; + char *tempfile, *destfile, *filename; + char hostname[PATH_MAX]; + struct sigaction sig_pipe[2], sig_int[2]; + + double remote_size; + long local_size = 0; + time_t remote_time, local_time = 0; + + CURLcode curlret; + + CURL *curl = curl_easy_init(); + if(curl == NULL) { + /* XXX: bad things happened here */ + return(-1); + } + + filename = get_filename(url); + if(!filename) { + _alpm_log(PM_LOG_ERROR, _("url '%s' is invalid\n"), url); + RET_ERR(PM_ERR_SERVER_BAD_URL, -1); + } + + gethost(url, &hostname[0]); + + curl_easy_setopt(curl, CURLOPT_URL, url); + + destfile = get_destfile(localpath, filename); + tempfile = get_tempfile(localpath, filename); + + if(stat(tempfile, &st) == 0 && st.st_size > 0) { + _alpm_log(PM_LOG_DEBUG, "tempfile found, attempting continuation\n"); + local_time = st.st_mtime; + local_size = (long)st.st_size; + dl_thisfile = st.st_size; + localf = fopen(tempfile, "ab"); + } else if(!force && stat(destfile, &st) == 0 && st.st_size > 0) { + _alpm_log(PM_LOG_DEBUG, "destfile found, using mtime only\n"); + local_time = st.st_mtime; + local_size = (long)st.st_size; + } else { + _alpm_log(PM_LOG_DEBUG, "no file found matching criteria, starting from scratch\n"); + } + + /* pass the raw filename for passing to the callback function */ + _alpm_log(PM_LOG_DEBUG, "using '%s' for download progress\n", filename); + + /* print proxy info for debug purposes */ + _alpm_log(PM_LOG_DEBUG, "HTTP_PROXY: %s\n", getenv("HTTP_PROXY")); + _alpm_log(PM_LOG_DEBUG, "http_proxy: %s\n", getenv("http_proxy")); + _alpm_log(PM_LOG_DEBUG, "FTP_PROXY: %s\n", getenv("FTP_PROXY")); + _alpm_log(PM_LOG_DEBUG, "ftp_proxy: %s\n", getenv("ftp_proxy")); + + /* set 10s connect timeout */ + curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 10L); + + /* ignore any SIGPIPE signals- these may occur if our FTP socket dies or + * something along those lines. Store the old signal handler first. 
*/ + sig_pipe[NEW].sa_handler = SIG_IGN; + sigemptyset(&sig_pipe[NEW].sa_mask); + sig_pipe[NEW].sa_flags = 0; + sigaction(SIGPIPE, NULL, &sig_pipe[OLD]); + sigaction(SIGPIPE, &sig_pipe[NEW], NULL); + + dload_interrupted = 0; + sig_int[NEW].sa_handler = &inthandler; + sigemptyset(&sig_int[NEW].sa_mask); + sig_int[NEW].sa_flags = 0; + sigaction(SIGINT, NULL, &sig_int[OLD]); + sigaction(SIGINT, &sig_int[NEW], NULL); + + /* Use /dev/null as a place to dump output during the file stat + * operation. This prevents connection messags from showing during + * ftp transactions and prevents file:// repos from being dumped + * to stdout. CURLOPT_WRITEDATA will be overwritten appropriately + * for the actual sync operation. + */ + devnull = fopen("/dev/null", "wb"); + + /* use a no-body request to get mtime and filesize */ + curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); + curl_easy_setopt(curl, CURLOPT_FILETIME, 1L); + curl_easy_setopt(curl, CURLOPT_WRITEDATA, devnull);

This is probably a dumb hack we don't need to do with cURL, although my experience is limited to HTTP so I don't know the FTP case at all. If we set headers correctly we can probably get back to just doing one request.
I attempted something like this for a conky patch, see http://git.omp.am/?p=conky.git;a=commitdiff;h=93c6baebb6c6dcbc5e6e93fb6982a7...
+ + curlret = curl_easy_perform(curl); + fclose(devnull); + + if(curlret != CURLE_OK) { + pm_errno = PM_ERR_LIBFETCH; + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), + filename, hostname, curl_easy_strerror(curlret)); + ret = -1; + goto cleanup; + } + + curl_easy_getinfo(curl, CURLINFO_FILETIME, &remote_time); + curl_easy_getinfo(curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &remote_size); + check_stop(); + + _alpm_log(PM_LOG_DEBUG, "remote_mtime: %ld local_time: %ld compare: %ld\n", + remote_time, local_time, local_time - remote_time); + _alpm_log(PM_LOG_DEBUG, "remote_size: %jd local_size: %jd compare: %jd\n", + (intmax_t)remote_size, (intmax_t)local_size, (intmax_t)(local_size - remote_size)); + if(!force && remote_time && remote_time == local_time + && remote_size != -1 && remote_size == local_size) { + /* the remote time and size values agreed with what we have, so move on + * because there is nothing more to do. */ + _alpm_log(PM_LOG_DEBUG, "files are identical, skipping %s\n", filename); + ret = 1; + goto cleanup; + } + + if(remote_time == -1 || remote_time != local_time) { + _alpm_log(PM_LOG_DEBUG, "mtimes were different or unavailable, downloading %s from beginning\n", filename); + if (localf) { + fclose(localf); + localf = NULL; + } + } else if((localf && local_size == 0)) { + _alpm_log(PM_LOG_WARNING, _("resuming download of %s not possible; starting over\n"), filename); + fclose(localf); + localf = NULL; + } else if(local_size) { + _alpm_log(PM_LOG_DEBUG, "resuming download at position %jd\n", (intmax_t)local_size); + curl_easy_setopt(curl, CURLOPT_RESUME_FROM, local_size); + } + + if(localf == NULL) { + _alpm_rmrf(tempfile); + dl_thisfile = 0; + localf = fopen(tempfile, "wb"); + if(localf == NULL) { /* still null? */ + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("error writing to file '%s': %s\n"), + tempfile, strerror(errno)); + ret = -1; + goto cleanup; + } + } + + curl_easy_setopt(curl, CURLOPT_NOBODY, 0L); + curl_easy_setopt(curl, CURLOPT_WRITEDATA, localf); + curl_easy_setopt(curl, CURLOPT_ENCODING, "deflate, gzip"); + curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); + curl_easy_setopt(curl, CURLOPT_PROGRESSDATA, filename); + curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, curl_progress); + + /* Progress 0 - initialize */ + if(handle->dlcb) { + handle->dlcb(filename, 0, remote_size); + } + + curlret = curl_easy_perform(curl); + + /* check for user cancellation */ + if(curlret == CURLE_ABORTED_BY_CALLBACK) { + goto cleanup; + } + + if(curlret != CURLE_OK) { + pm_errno = PM_ERR_LIBFETCH; /* this needs to change */ + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), + filename, hostname, curl_easy_strerror(curlret)); + ret = -1; + goto cleanup; + } + + curl_easy_getinfo(curl, CURLINFO_SIZE_DOWNLOAD, &dl_thisfile); + + /* did the transfer complete normally? */ + if(dl_thisfile == 0) { + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s\n"), + filename, hostname); + ret = -1; + goto cleanup; + } + + if(remote_size != -1 && local_size + dl_thisfile < remote_size) { + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("%s appears to be truncated: %jd/%jd bytes\n"), + filename, (intmax_t)(local_size + dl_thisfile), (intmax_t)remote_size); + ret = -1; + goto cleanup; + } + + /* probably safer to close the file descriptors now before renaming the file, + * for example to make sure the buffers are flushed. 
+ */ + fclose(localf); + localf = NULL; + + /* set the times on the file to the same as that of the remote file */ + if(remote_time) { + struct timeval tv[2]; + memset(&tv, 0, sizeof(tv)); + tv[0].tv_sec = tv[1].tv_sec = remote_time; + utimes(tempfile, tv); + } + rename(tempfile, destfile); + ret = 0; + +cleanup: + FREE(tempfile); + FREE(destfile); + if(localf != NULL) { + /* if we still had a local file open, we got interrupted. set the mtimes on + * the file accordingly. */ + fflush(localf); + if(remote_time) { + struct timeval tv[2]; + memset(&tv, 0, sizeof(tv)); + tv[0].tv_sec = tv[1].tv_sec = remote_time; + futimes(fileno(localf), tv); + } + fclose(localf); + } + if(curl != NULL) { + curl_easy_cleanup(curl); + } + + /* restore the old signal handlers */ + sigaction(SIGINT, &sig_int[OLD], NULL); + sigaction(SIGPIPE, &sig_pipe[OLD], NULL); + /* if we were interrupted, trip the old handler */ + if(dload_interrupted) { + raise(SIGINT); + } + + return(ret); +} +#endif + static int download(const char *url, const char *localpath, int force) { if(handle->fetchcb == NULL) { -#ifdef HAVE_LIBFETCH +#if defined(HAVE_LIBFETCH) || defined(HAVE_LIBCURL) return(download_internal(url, localpath, force)); #else RET_ERR(PM_ERR_EXTERNAL_DOWNLOAD, -1); -- 1.7.3.4
On Sun, Jan 02, 2011 at 08:35:01PM -0600, Dan McGee wrote:
On Sun, Jan 2, 2011 at 7:13 PM, Dave Reisner <d@falconindy.com> wrote:
This does not implement any linkage from current code to the curl based download_file() function. libfetch continues to be used unless configured and compiled explicitly with LDFLAGS/CPPFLAGS set for -lcurl and -DHAVE_LIBCURL, respectively.
Signed-off-by: Dave Reisner <d@falconindy.com>

Big comments that might make this easier to review, since it is a bit of a monster:
1. If you are just moving functions around to fix #ifdef locations, do that in a separate and preceding no-functional-changes patch. It would help for the dload_interrupted/signal stuff here, for instance. The two (different!) definitions of gethost() that occur are also noise that doesn't help review.
Can't blame you here. Will fix.
2. The dupe code stinks to have to review- I don't know how easy it would be to refactor some things out (mtime/size/signal/proxy stuff?), but yeah.
I agree that it's not the most readable, but if the eventual goal is to remove the libfetch code, this seems like a lot of work for no gain. I'll see what I can do, though.
3. Follow. The. Coding. Style. You used spaces everywhere for indentation.
Totally stumped on this one. I'm staring at my source files and I'm seeing tabs. I'm not sure how they ended up as spaces, as I just peeled off HEAD~4 with git-send-email.

d
--- lib/libalpm/dload.c | 325 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 313 insertions(+), 12 deletions(-)
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 09f716f..9154d33 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -39,6 +39,10 @@ #include <fetch.h> #endif
+#ifdef HAVE_LIBCURL +#include <curl/curl.h> +#endif + /* libalpm */ #include "dload.h" #include "alpm_list.h" @@ -55,7 +59,6 @@ static char *get_filename(const char *url) { return(filename); }
-#ifdef HAVE_LIBFETCH static char *get_destfile(const char *path, const char *filename) { char *destfile; /* len = localpath len + filename len + null */ @@ -76,6 +79,32 @@ static char *get_tempfile(const char *path, const char *filename) { return(tempfile); }
+#define check_stop() if(dload_interrupted) { ret = -1; goto cleanup; } +enum sighandlers { OLD = 0, NEW = 1 }; + +int dload_interrupted; +static void inthandler(int signum) +{ + dload_interrupted = 1; +} + +#ifdef HAVE_LIBCURL +static int gethost(const char *url, char *buffer) +{ + char *start, *end; + int ret; + + start = strstr(url, "//") + 2; + end = strchr(start, '/'); + + ret = snprintf(&buffer[0], end - start + 1, "%s", start); + buffer[end - start + 1] = '\0'; + + return(ret); +} +#endif + +#ifdef HAVE_LIBFETCH static const char *gethost(struct url *fileurl) { const char *host = _("disk"); @@ -85,15 +114,6 @@ static const char *gethost(struct url *fileurl) return(host); }
-int dload_interrupted; -static void inthandler(int signum) -{ - dload_interrupted = 1; -} - -#define check_stop() if(dload_interrupted) { ret = -1; goto cleanup; } -enum sighandlers { OLD = 0, NEW = 1 }; - static int download_internal(const char *url, const char *localpath, int force) { FILE *localf = NULL; @@ -179,7 +199,7 @@ static int download_internal(const char *url, const char *localpath, if(fetchStat(fileurl, &ust, "") == -1) { pm_errno = PM_ERR_LIBFETCH; _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), - filename, gethost(fileurl), fetchLastErrString); + filename, gethost(url), fetchLastErrString); ret = -1; goto cleanup; } @@ -332,10 +352,291 @@ cleanup: } #endif
+#ifdef HAVE_LIBCURL +int curl_progress(void *filename, double dltotal, double dlnow, double ultotal, + double ulnow) +{ + static int done = 0; + + /* unused upload parameters */ + (void)ultotal; + (void)ulnow; + + /* curl calls the progress callback before the first recv() call, but we've + * already "initialized" the progress meter in download_internal. reporting 0 + * twice causes pacman's callback to print multiple debug outputs. However, + * not initializing prior to this callback causes the progress meter to + * display bogus values when downloading a file:// schema url. + */ + if (dlnow == 0) { + return(0); + } + + if (dload_interrupted) { + return(1); /* abort transfer */ + } + + /* curl still reports progress as its closing up the connection. if we keep + * pounding on the callback, we get a lot of extra progress bars. This is a + * horrible hack for a multitude of reasons. + */ + if (dltotal != dlnow) { + done = 0; + } else { + if (done) { + return(0); + } + done = 1; + } + + handle->dlcb((const char*)filename, (long)dlnow, (long)dltotal); + + return(0); +} + +static int download_internal(const char *url, const char *localpath, + int force) +{ + FILE *devnull, *localf = NULL; + struct stat st; + int ret = 0; + double dl_thisfile = 0; + char *tempfile, *destfile, *filename; + char hostname[PATH_MAX]; + struct sigaction sig_pipe[2], sig_int[2]; + + double remote_size; + long local_size = 0; + time_t remote_time, local_time = 0; + + CURLcode curlret; + + CURL *curl = curl_easy_init(); + if(curl == NULL) { + /* XXX: bad things happened here */ + return(-1); + } + + filename = get_filename(url); + if(!filename) { + _alpm_log(PM_LOG_ERROR, _("url '%s' is invalid\n"), url); + RET_ERR(PM_ERR_SERVER_BAD_URL, -1); + } + + gethost(url, &hostname[0]); + + curl_easy_setopt(curl, CURLOPT_URL, url); + + destfile = get_destfile(localpath, filename); + tempfile = get_tempfile(localpath, filename); + + if(stat(tempfile, &st) == 0 && st.st_size > 0) { + _alpm_log(PM_LOG_DEBUG, "tempfile found, attempting continuation\n"); + local_time = st.st_mtime; + local_size = (long)st.st_size; + dl_thisfile = st.st_size; + localf = fopen(tempfile, "ab"); + } else if(!force && stat(destfile, &st) == 0 && st.st_size > 0) { + _alpm_log(PM_LOG_DEBUG, "destfile found, using mtime only\n"); + local_time = st.st_mtime; + local_size = (long)st.st_size; + } else { + _alpm_log(PM_LOG_DEBUG, "no file found matching criteria, starting from scratch\n"); + } + + /* pass the raw filename for passing to the callback function */ + _alpm_log(PM_LOG_DEBUG, "using '%s' for download progress\n", filename); + + /* print proxy info for debug purposes */ + _alpm_log(PM_LOG_DEBUG, "HTTP_PROXY: %s\n", getenv("HTTP_PROXY")); + _alpm_log(PM_LOG_DEBUG, "http_proxy: %s\n", getenv("http_proxy")); + _alpm_log(PM_LOG_DEBUG, "FTP_PROXY: %s\n", getenv("FTP_PROXY")); + _alpm_log(PM_LOG_DEBUG, "ftp_proxy: %s\n", getenv("ftp_proxy")); + + /* set 10s connect timeout */ + curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, 10L); + + /* ignore any SIGPIPE signals- these may occur if our FTP socket dies or + * something along those lines. Store the old signal handler first. 
*/ + sig_pipe[NEW].sa_handler = SIG_IGN; + sigemptyset(&sig_pipe[NEW].sa_mask); + sig_pipe[NEW].sa_flags = 0; + sigaction(SIGPIPE, NULL, &sig_pipe[OLD]); + sigaction(SIGPIPE, &sig_pipe[NEW], NULL); + + dload_interrupted = 0; + sig_int[NEW].sa_handler = &inthandler; + sigemptyset(&sig_int[NEW].sa_mask); + sig_int[NEW].sa_flags = 0; + sigaction(SIGINT, NULL, &sig_int[OLD]); + sigaction(SIGINT, &sig_int[NEW], NULL); + + /* Use /dev/null as a place to dump output during the file stat + * operation. This prevents connection messags from showing during + * ftp transactions and prevents file:// repos from being dumped + * to stdout. CURLOPT_WRITEDATA will be overwritten appropriately + * for the actual sync operation. + */ + devnull = fopen("/dev/null", "wb"); + + /* use a no-body request to get mtime and filesize */ + curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); + curl_easy_setopt(curl, CURLOPT_FILETIME, 1L); + curl_easy_setopt(curl, CURLOPT_WRITEDATA, devnull);

This is probably a dumb hack we don't need to do with cURL, although my experience is limited to HTTP so I don't know the FTP case at all. If we set headers correctly we can probably get back to just doing one request.
I attempted something like this for a conky patch, see http://git.omp.am/?p=conky.git;a=commitdiff;h=93c6baebb6c6dcbc5e6e93fb6982a7...
This is actually pretty easy. curl has CURLOPT_TIMECONDITION and the associated CURLOPT_TIMEVALUE, which can be set for both http and ftp to skip a download if it's unnecessary. I'll definitely refactor this.
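A sketch of what that refactor could look like, reusing the variables already present in download_internal() (note the CURLINFO_CONDITION_UNMET query requires curl >= 7.19.4):

```c
/* ask curl to skip the body unless the remote file is newer than our
 * local copy, replacing the separate NOBODY/stat request */
curl_easy_setopt(curl, CURLOPT_TIMECONDITION, (long)CURL_TIMECOND_IFMODSINCE);
curl_easy_setopt(curl, CURLOPT_TIMEVALUE, (long)local_time);
curl_easy_setopt(curl, CURLOPT_FILETIME, 1L);

curlret = curl_easy_perform(curl);

if(curlret == CURLE_OK) {
	long unmet = 0;
	/* nonzero means the time condition failed and nothing was fetched */
	curl_easy_getinfo(curl, CURLINFO_CONDITION_UNMET, &unmet);
	if(unmet) {
		_alpm_log(PM_LOG_DEBUG, "file is up to date, skipping %s\n", filename);
	}
}
```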
+ + curlret = curl_easy_perform(curl); + fclose(devnull); + + if(curlret != CURLE_OK) { + pm_errno = PM_ERR_LIBFETCH; + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), + filename, hostname, curl_easy_strerror(curlret)); + ret = -1; + goto cleanup; + } + + curl_easy_getinfo(curl, CURLINFO_FILETIME, &remote_time); + curl_easy_getinfo(curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &remote_size); + check_stop(); + + _alpm_log(PM_LOG_DEBUG, "remote_mtime: %ld local_time: %ld compare: %ld\n", + remote_time, local_time, local_time - remote_time); + _alpm_log(PM_LOG_DEBUG, "remote_size: %jd local_size: %jd compare: %jd\n", + (intmax_t)remote_size, (intmax_t)local_size, (intmax_t)(local_size - remote_size)); + if(!force && remote_time && remote_time == local_time + && remote_size != -1 && remote_size == local_size) { + /* the remote time and size values agreed with what we have, so move on + * because there is nothing more to do. */ + _alpm_log(PM_LOG_DEBUG, "files are identical, skipping %s\n", filename); + ret = 1; + goto cleanup; + } + + if(remote_time == -1 || remote_time != local_time) { + _alpm_log(PM_LOG_DEBUG, "mtimes were different or unavailable, downloading %s from beginning\n", filename); + if (localf) { + fclose(localf); + localf = NULL; + } + } else if((localf && local_size == 0)) { + _alpm_log(PM_LOG_WARNING, _("resuming download of %s not possible; starting over\n"), filename); + fclose(localf); + localf = NULL; + } else if(local_size) { + _alpm_log(PM_LOG_DEBUG, "resuming download at position %jd\n", (intmax_t)local_size); + curl_easy_setopt(curl, CURLOPT_RESUME_FROM, local_size); + } + + if(localf == NULL) { + _alpm_rmrf(tempfile); + dl_thisfile = 0; + localf = fopen(tempfile, "wb"); + if(localf == NULL) { /* still null? */ + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("error writing to file '%s': %s\n"), + tempfile, strerror(errno)); + ret = -1; + goto cleanup; + } + } + + curl_easy_setopt(curl, CURLOPT_NOBODY, 0L); + curl_easy_setopt(curl, CURLOPT_WRITEDATA, localf); + curl_easy_setopt(curl, CURLOPT_ENCODING, "deflate, gzip"); + curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); + curl_easy_setopt(curl, CURLOPT_PROGRESSDATA, filename); + curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, curl_progress); + + /* Progress 0 - initialize */ + if(handle->dlcb) { + handle->dlcb(filename, 0, remote_size); + } + + curlret = curl_easy_perform(curl); + + /* check for user cancellation */ + if(curlret == CURLE_ABORTED_BY_CALLBACK) { + goto cleanup; + } + + if(curlret != CURLE_OK) { + pm_errno = PM_ERR_LIBFETCH; /* this needs to change */ + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), + filename, hostname, curl_easy_strerror(curlret)); + ret = -1; + goto cleanup; + } + + curl_easy_getinfo(curl, CURLINFO_SIZE_DOWNLOAD, &dl_thisfile); + + /* did the transfer complete normally? */ + if(dl_thisfile == 0) { + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s\n"), + filename, hostname); + ret = -1; + goto cleanup; + } + + if(remote_size != -1 && local_size + dl_thisfile < remote_size) { + pm_errno = PM_ERR_RETRIEVE; + _alpm_log(PM_LOG_ERROR, _("%s appears to be truncated: %jd/%jd bytes\n"), + filename, (intmax_t)(local_size + dl_thisfile), (intmax_t)remote_size); + ret = -1; + goto cleanup; + } + + /* probably safer to close the file descriptors now before renaming the file, + * for example to make sure the buffers are flushed. 
+ */ + fclose(localf); + localf = NULL; + + /* set the times on the file to the same as that of the remote file */ + if(remote_time) { + struct timeval tv[2]; + memset(&tv, 0, sizeof(tv)); + tv[0].tv_sec = tv[1].tv_sec = remote_time; + utimes(tempfile, tv); + } + rename(tempfile, destfile); + ret = 0; + +cleanup: + FREE(tempfile); + FREE(destfile); + if(localf != NULL) { + /* if we still had a local file open, we got interrupted. set the mtimes on + * the file accordingly. */ + fflush(localf); + if(remote_time) { + struct timeval tv[2]; + memset(&tv, 0, sizeof(tv)); + tv[0].tv_sec = tv[1].tv_sec = remote_time; + futimes(fileno(localf), tv); + } + fclose(localf); + } + if(curl != NULL) { + curl_easy_cleanup(curl); + } + + /* restore the old signal handlers */ + sigaction(SIGINT, &sig_int[OLD], NULL); + sigaction(SIGPIPE, &sig_pipe[OLD], NULL); + /* if we were interrupted, trip the old handler */ + if(dload_interrupted) { + raise(SIGINT); + } + + return(ret); +} +#endif + static int download(const char *url, const char *localpath, int force) { if(handle->fetchcb == NULL) { -#ifdef HAVE_LIBFETCH +#if defined(HAVE_LIBFETCH) || defined(HAVE_LIBCURL) return(download_internal(url, localpath, force)); #else RET_ERR(PM_ERR_EXTERNAL_DOWNLOAD, -1); -- 1.7.3.4
To avoid breaking compilation, fetch defaults to 'no', and curl defaults to 'check'.

Signed-off-by: Dave Reisner <d@falconindy.com>
---
 configure.ac |   19 ++++++++++++++++++-
 1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/configure.ac b/configure.ac
index 5627fb7..3d536cd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -96,7 +96,12 @@ AC_ARG_WITH(openssl,
 # Help line for libfetch
 AC_ARG_WITH(fetch,
 	AS_HELP_STRING([--with-fetch], [use libfetch as an internal downloader]),
-	[], [with_fetch=check])
+	[], [with_fetch=no])
+
+# Help line for libcurl
+AC_ARG_WITH(curl,
+	AS_HELP_STRING([--with-curl], [use libcurl as an internal downloader]),
+	[], [with_curl=check])
 
 # Help line for documentation
 AC_ARG_ENABLE(doc,
@@ -152,6 +157,18 @@ AS_IF([test "x$with_openssl" != "xno"],
 	AC_MSG_RESULT(no))
 AM_CONDITIONAL([HAVE_LIBSSL], [test "x$ac_cv_lib_ssl_MD5_Final" = "xyes"])
 
+# Enable or disable usage of libcurl
+AC_MSG_CHECKING(whether to link with libcurl)
+AS_IF([test "x$with_curl" != "xno"],
+	[AC_MSG_RESULT(yes)
+	AC_CHECK_LIB([curl], [curl_easy_perform], ,
+		[if test "x$with_curl" != "xcheck"; then
+			AC_MSG_FAILURE([--with-curl was given, but -lcurl was not found])
+		fi],
+		[-lcurl])],
+	AC_MSG_RESULT(no))
+AM_CONDITIONAL([HAVE_LIBCURL], [test "x$ac_cv_lib_curl_curl_easy_perform" = "xyes"])
+
 # Enable or disable usage of libfetch
 AC_MSG_CHECKING(whether to link with libfetch)
 AS_IF([test "x$with_fetch" != "xno"],
-- 
1.7.3.4
Signed-off-by: Dave Reisner <d@falconindy.com> --- configure.ac | 22 +---- lib/libalpm/alpm.c | 12 --- lib/libalpm/dload.c | 264 +-------------------------------------------------- 3 files changed, 6 insertions(+), 292 deletions(-) diff --git a/configure.ac b/configure.ac index 3d536cd..23120e3 100644 --- a/configure.ac +++ b/configure.ac @@ -93,11 +93,14 @@ AC_ARG_WITH(openssl, AS_HELP_STRING([--with-openssl], [use OpenSSL crypto implementations instead of internal routines]), [], [with_openssl=check]) +<<<<<<< HEAD # Help line for libfetch AC_ARG_WITH(fetch, AS_HELP_STRING([--with-fetch], [use libfetch as an internal downloader]), [], [with_fetch=no]) +======= +>>>>>>> 62cb945... Perform a fetch-ectomy, removing libfetch entirely # Help line for libcurl AC_ARG_WITH(curl, AS_HELP_STRING([--with-curl], [use libcurl as an internal downloader]), @@ -169,25 +172,6 @@ AS_IF([test "x$with_curl" != "xno"], AC_MSG_RESULT(no)) AM_CONDITIONAL([HAVE_LIBCURL], [test "x$ac_cv_lib_curl_curl_easy_perform" = "xyes"]) -# Enable or disable usage of libfetch -AC_MSG_CHECKING(whether to link with libfetch) -AS_IF([test "x$with_fetch" != "xno"], - [AC_MSG_RESULT(yes) - AC_CHECK_LIB([fetch], [fetchParseURL], , - [if test "x$with_fetch" != "xcheck"; then - AC_MSG_FAILURE([--with-fetch was given, but -lfetch was not found]) - fi], - [-lcrypto -ldl]) - # Check if libfetch supports connnection caching which we use - AS_IF([test "x$ac_cv_lib_fetch_fetchParseURL" = "xyes"], - [AC_CHECK_DECL(fetchConnectionCacheInit, , - AC_MSG_ERROR([libfetch must be version 2.28 or greater]), - [#include <fetch.h>]) - ]) - ], - AC_MSG_RESULT(no)) -AM_CONDITIONAL([HAVE_LIBFETCH], [test "x$ac_cv_lib_fetch_fetchParseURL" = "xyes"]) - # Checks for header files. AC_CHECK_HEADERS([fcntl.h glob.h libintl.h locale.h mntent.h string.h \ sys/ioctl.h sys/mount.h sys/param.h sys/statvfs.h \ diff --git a/lib/libalpm/alpm.c b/lib/libalpm/alpm.c index 44c513e..4d66ca1 100644 --- a/lib/libalpm/alpm.c +++ b/lib/libalpm/alpm.c @@ -23,10 +23,6 @@ #include "config.h" -/* connection caching setup */ -#ifdef HAVE_LIBFETCH -#include <fetch.h> -#endif #ifdef HAVE_LIBCURL #include <curl/curl.h> #endif @@ -62,10 +58,6 @@ int SYMEXPORT alpm_initialize(void) bindtextdomain("libalpm", LOCALEDIR); #endif -#ifdef HAVE_LIBFETCH - fetchConnectionCacheInit(5, 1); -#endif - #ifdef HAVE_LIBCURL curl_global_init(CURL_GLOBAL_NOTHING); #endif @@ -89,10 +81,6 @@ int SYMEXPORT alpm_release(void) _alpm_handle_free(handle); handle = NULL; -#ifdef HAVE_LIBFETCH - fetchConnectionCacheClose(); -#endif - #ifdef HAVE_LIBCURL curl_global_cleanup(); #endif diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c index 9154d33..aad4673 100644 --- a/lib/libalpm/dload.c +++ b/lib/libalpm/dload.c @@ -29,15 +29,7 @@ #include <sys/types.h> #include <sys/stat.h> #include <signal.h> -/* the following two are needed for FreeBSD's libfetch */ -#include <limits.h> /* PATH_MAX */ -#if defined(HAVE_SYS_PARAM_H) -#include <sys/param.h> /* MAXHOSTNAMELEN */ -#endif - -#ifdef HAVE_LIBFETCH -#include <fetch.h> -#endif +#include <limits.h> #ifdef HAVE_LIBCURL #include <curl/curl.h> @@ -59,6 +51,7 @@ static char *get_filename(const char *url) { return(filename); } +#ifdef HAVE_LIBCURL static char *get_destfile(const char *path, const char *filename) { char *destfile; /* len = localpath len + filename len + null */ @@ -88,7 +81,6 @@ static void inthandler(int signum) dload_interrupted = 1; } -#ifdef HAVE_LIBCURL static int gethost(const char *url, char *buffer) { char *start, *end; @@ -102,257 +94,7 @@ 
static int gethost(const char *url, char *buffer) return(ret); } -#endif - -#ifdef HAVE_LIBFETCH -static const char *gethost(struct url *fileurl) -{ - const char *host = _("disk"); - if(strcmp(SCHEME_FILE, fileurl->scheme) != 0) { - host = fileurl->host; - } - return(host); -} - -static int download_internal(const char *url, const char *localpath, - int force) { - FILE *localf = NULL; - struct stat st; - int ret = 0; - off_t dl_thisfile = 0; - ssize_t nread = 0; - char *tempfile, *destfile, *filename; - struct sigaction sig_pipe[2], sig_int[2]; - - off_t local_size = 0; - time_t local_time = 0; - - struct url *fileurl; - struct url_stat ust; - fetchIO *dlf = NULL; - - char buffer[PM_DLBUF_LEN]; - - filename = get_filename(url); - if(!filename) { - _alpm_log(PM_LOG_ERROR, _("url '%s' is invalid\n"), url); - RET_ERR(PM_ERR_SERVER_BAD_URL, -1); - } - - fileurl = fetchParseURL(url); - if(!fileurl) { - _alpm_log(PM_LOG_ERROR, _("url '%s' is invalid\n"), url); - RET_ERR(PM_ERR_LIBFETCH, -1); - } - destfile = get_destfile(localpath, filename); - tempfile = get_tempfile(localpath, filename); - - if(stat(tempfile, &st) == 0 && st.st_size > 0) { - _alpm_log(PM_LOG_DEBUG, "tempfile found, attempting continuation\n"); - local_time = fileurl->last_modified = st.st_mtime; - local_size = fileurl->offset = (off_t)st.st_size; - dl_thisfile = st.st_size; - localf = fopen(tempfile, "ab"); - } else if(!force && stat(destfile, &st) == 0 && st.st_size > 0) { - _alpm_log(PM_LOG_DEBUG, "destfile found, using mtime only\n"); - local_time = fileurl->last_modified = st.st_mtime; - local_size = /* no fu->off here */ (off_t)st.st_size; - } else { - _alpm_log(PM_LOG_DEBUG, "no file found matching criteria, starting from scratch\n"); - } - - /* pass the raw filename for passing to the callback function */ - _alpm_log(PM_LOG_DEBUG, "using '%s' for download progress\n", filename); - - /* print proxy info for debug purposes */ - _alpm_log(PM_LOG_DEBUG, "HTTP_PROXY: %s\n", getenv("HTTP_PROXY")); - _alpm_log(PM_LOG_DEBUG, "http_proxy: %s\n", getenv("http_proxy")); - _alpm_log(PM_LOG_DEBUG, "FTP_PROXY: %s\n", getenv("FTP_PROXY")); - _alpm_log(PM_LOG_DEBUG, "ftp_proxy: %s\n", getenv("ftp_proxy")); - - /* 10s timeout */ - fetchTimeout = 10; - - /* ignore any SIGPIPE signals- these may occur if our FTP socket dies or - * something along those lines. Store the old signal handler first. */ - sig_pipe[NEW].sa_handler = SIG_IGN; - sigemptyset(&sig_pipe[NEW].sa_mask); - sig_pipe[NEW].sa_flags = 0; - sigaction(SIGPIPE, NULL, &sig_pipe[OLD]); - sigaction(SIGPIPE, &sig_pipe[NEW], NULL); - - dload_interrupted = 0; - sig_int[NEW].sa_handler = &inthandler; - sigemptyset(&sig_int[NEW].sa_mask); - sig_int[NEW].sa_flags = 0; - sigaction(SIGINT, NULL, &sig_int[OLD]); - sigaction(SIGINT, &sig_int[NEW], NULL); - - /* NOTE: libfetch does not reset the error code, be sure to do it before - * calls into the library */ - - /* find out the remote size *and* mtime in one go. there is a lot of - * trouble in trying to do both size and "if-modified-since" logic in a - * non-stat request, so avoid it. 
*/ - fetchLastErrCode = 0; - if(fetchStat(fileurl, &ust, "") == -1) { - pm_errno = PM_ERR_LIBFETCH; - _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), - filename, gethost(url), fetchLastErrString); - ret = -1; - goto cleanup; - } - check_stop(); - - _alpm_log(PM_LOG_DEBUG, "ust.mtime: %ld local_time: %ld compare: %ld\n", - ust.mtime, local_time, local_time - ust.mtime); - _alpm_log(PM_LOG_DEBUG, "ust.size: %jd local_size: %jd compare: %jd\n", - (intmax_t)ust.size, (intmax_t)local_size, (intmax_t)(local_size - ust.size)); - if(!force && ust.mtime && ust.mtime == local_time - && ust.size && ust.size == local_size) { - /* the remote time and size values agreed with what we have, so move on - * because there is nothing more to do. */ - _alpm_log(PM_LOG_DEBUG, "files are identical, skipping %s\n", filename); - ret = 1; - goto cleanup; - } - if(!ust.mtime || ust.mtime != local_time) { - _alpm_log(PM_LOG_DEBUG, "mtimes were different or unavailable, downloading %s from beginning\n", filename); - fileurl->offset = 0; - } - - fetchLastErrCode = 0; - dlf = fetchGet(fileurl, ""); - check_stop(); - - if(fetchLastErrCode != 0 || dlf == NULL) { - pm_errno = PM_ERR_LIBFETCH; - _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s : %s\n"), - filename, gethost(fileurl), fetchLastErrString); - ret = -1; - goto cleanup; - } else { - _alpm_log(PM_LOG_DEBUG, "connected to %s successfully\n", fileurl->host); - } - - if(localf && fileurl->offset == 0) { - _alpm_log(PM_LOG_WARNING, _("resuming download of %s not possible; starting over\n"), filename); - fclose(localf); - localf = NULL; - } else if(fileurl->offset) { - _alpm_log(PM_LOG_DEBUG, "resuming download at position %jd\n", (intmax_t)fileurl->offset); - } - - - if(localf == NULL) { - _alpm_rmrf(tempfile); - fileurl->offset = (off_t)0; - dl_thisfile = 0; - localf = fopen(tempfile, "wb"); - if(localf == NULL) { /* still null? */ - pm_errno = PM_ERR_RETRIEVE; - _alpm_log(PM_LOG_ERROR, _("error writing to file '%s': %s\n"), - tempfile, strerror(errno)); - ret = -1; - goto cleanup; - } - } - - /* Progress 0 - initialize */ - if(handle->dlcb) { - handle->dlcb(filename, 0, ust.size); - } - - while((nread = fetchIO_read(dlf, buffer, PM_DLBUF_LEN)) > 0) { - check_stop(); - size_t nwritten = 0; - nwritten = fwrite(buffer, 1, nread, localf); - if((nwritten != (size_t)nread) || ferror(localf)) { - pm_errno = PM_ERR_RETRIEVE; - _alpm_log(PM_LOG_ERROR, _("error writing to file '%s': %s\n"), - tempfile, strerror(errno)); - ret = -1; - goto cleanup; - } - dl_thisfile += nread; - - if(handle->dlcb) { - handle->dlcb(filename, dl_thisfile, ust.size); - } - } - - /* did the transfer complete normally? */ - if (nread == -1) { - /* not PM_ERR_LIBFETCH here because libfetch error string might be empty */ - pm_errno = PM_ERR_RETRIEVE; - _alpm_log(PM_LOG_ERROR, _("failed retrieving file '%s' from %s\n"), - filename, gethost(fileurl)); - ret = -1; - goto cleanup; - } - - if (ust.size != -1 && dl_thisfile < ust.size) { - pm_errno = PM_ERR_RETRIEVE; - _alpm_log(PM_LOG_ERROR, _("%s appears to be truncated: %jd/%jd bytes\n"), - filename, (intmax_t)dl_thisfile, (intmax_t)ust.size); - ret = -1; - goto cleanup; - } - - /* probably safer to close the file descriptors now before renaming the file, - * for example to make sure the buffers are flushed. 
- */ - fclose(localf); - localf = NULL; - fetchIO_close(dlf); - dlf = NULL; - - /* set the times on the file to the same as that of the remote file */ - if(ust.mtime) { - struct timeval tv[2]; - memset(&tv, 0, sizeof(tv)); - tv[0].tv_sec = ust.atime; - tv[1].tv_sec = ust.mtime; - utimes(tempfile, tv); - } - rename(tempfile, destfile); - ret = 0; - -cleanup: - FREE(tempfile); - FREE(destfile); - if(localf != NULL) { - /* if we still had a local file open, we got interrupted. set the mtimes on - * the file accordingly. */ - fflush(localf); - if(ust.mtime) { - struct timeval tv[2]; - memset(&tv, 0, sizeof(tv)); - tv[0].tv_sec = ust.atime; - tv[1].tv_sec = ust.mtime; - futimes(fileno(localf), tv); - } - fclose(localf); - } - if(dlf != NULL) { - fetchIO_close(dlf); - } - fetchFreeURL(fileurl); - - /* restore the old signal handlers */ - sigaction(SIGINT, &sig_int[OLD], NULL); - sigaction(SIGPIPE, &sig_pipe[OLD], NULL); - /* if we were interrupted, trip the old handler */ - if(dload_interrupted) { - raise(SIGINT); - } - - return(ret); -} -#endif - -#ifdef HAVE_LIBCURL int curl_progress(void *filename, double dltotal, double dlnow, double ultotal, double ulnow) { @@ -636,7 +378,7 @@ cleanup: static int download(const char *url, const char *localpath, int force) { if(handle->fetchcb == NULL) { -#if defined(HAVE_LIBFETCH) || defined(HAVE_LIBCURL) +#ifdef HAVE_LIBCURL return(download_internal(url, localpath, force)); #else RET_ERR(PM_ERR_EXTERNAL_DOWNLOAD, -1); -- 1.7.3.4
On 03/01/11 11:13, Dave Reisner wrote:
Signed-off-by: Dave Reisner<d@falconindy.com> --- configure.ac | 22 +---- lib/libalpm/alpm.c | 12 --- lib/libalpm/dload.c | 264 +-------------------------------------------------- 3 files changed, 6 insertions(+), 292 deletions(-)
diff --git a/configure.ac b/configure.ac index 3d536cd..23120e3 100644 --- a/configure.ac +++ b/configure.ac @@ -93,11 +93,14 @@ AC_ARG_WITH(openssl, AS_HELP_STRING([--with-openssl], [use OpenSSL crypto implementations instead of internal routines]), [], [with_openssl=check])
+<<<<<<< HEAD # Help line for libfetch AC_ARG_WITH(fetch, AS_HELP_STRING([--with-fetch], [use libfetch as an internal downloader]), [], [with_fetch=no])
+======= +>>>>>>> 62cb945... Perform a fetch-ectomy, removing libfetch entirely
Looks like you did not finish a merge here.

Allan
On Sun, Jan 2, 2011 at 7:13 PM, Dave Reisner <d@falconindy.com> wrote:

Greetings,

I've been working on and off on a curl-based internal downloader to replace the libfetch-based implementation. Why?

- libcurl is more widely used in the arch repos. fetch is used by one and only one package: pacman.
- support for compression over http connections. I see proposals for adding this to fetch, but never any acceptance. Please correct me if I'm mistaken.
- personal bias. not really a reason, but I think the implementation is cleaner and more readable than the fetch equivalent.
- dan wanted me to. (the devil made me do it?)

In its current form, I'd say this patch set is 90% complete. I've been using it on top of pacman-git for about a week now and haven't come across any functional issues. I've done a fair bit of testing with ftp, http, and file-based sync'ing.
So what's remaining?
- setting pm_errno: libfetch provides a global variable fetchLastErrString, whereas curl returns error codes (CURLcode) from operations and you feed them to curl_easy_strerror() to get a char* back. I'm not sure what the best way is to get back an error string from curl when (what will be) PM_ERR_LIBCURL is set. Could the global handle be used to store the CURLcode (it's just a typedef'd long)?

Yeah, on the handle probably works, although that seems like a slight crosscutting concern. Of course I don't have a much more spectacular idea. The other option is to call easy_strerror() immediately after a cURL error occurs and store a pointer to the string there instead of the error code- but I'm not sure of the memory conventions surrounding that, and whether you need to free the string, who knows what. And it doesn't matter one iota what the typedef is, because you are going to use their type of course. :)
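For what it's worth, curl_easy_strerror() returns a pointer to a static string owned by libcurl, so nothing needs to be freed; that makes storing the raw CURLcode on the handle and resolving it lazily fairly painless. A sketch with a hypothetical handle field and accessor (the struct is abbreviated, not libalpm's real handle definition):

```c
#include <curl/curl.h>

/* hypothetical: a slot added to the (abbreviated) libalpm handle */
struct pm_handle_sketch {
	/* ...existing members... */
	CURLcode curlerr; /* valid only when pm_errno == PM_ERR_LIBCURL */
};

/* at the error site:
 *   handle->curlerr = curlret;
 *   pm_errno = PM_ERR_LIBCURL;
 */

static const char *curl_last_error(const struct pm_handle_sketch *h)
{
	/* the returned string is static; callers must not free it */
	return curl_easy_strerror(h->curlerr);
}
```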
- the progress bar does _not_ play well with curl. The bar itself is fine, but the time remaining jumps around a fair bit. You'll see a fair bit of hackery in curl_progress() (lib/libalpm/dload.c) where I'm fighting with the progress bar. Perhaps this belongs in the progress bar voodoo itself?
- my autoconf-fu sucks. Big time. I'm not sure what I've done is best practice, but after applying patch 3/4, compiling without extra flags links in curl and leaves out fetch. There is, however, no checking for both --with-curl and --with-fetch being given, which will break the build.

Fixing this would be kinda cool so we can at least test both side by side for a while, but I haven't looked nearly close enough to see what is going on. I'd probably start by confusing yourself less and renaming them to download_fetch and download_curl or something, rather than the ifdef madness and only having one of them at a time. And you could shoehorn another "temporary" patch in here that checked an env var or something to see which internal function to use.
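A sketch of that temporary dispatch; the PACMAN_DOWNLOADER variable name is hypothetical, and download_curl/download_fetch are the renames suggested above rather than functions in the patch:

```c
#include <stdlib.h>
#include <string.h>

/* temporary: compile both downloaders in and choose at runtime,
 * preferring curl when no preference is given */
static int download_internal(const char *url, const char *localpath,
		int force)
{
	const char *which = getenv("PACMAN_DOWNLOADER");

#ifdef HAVE_LIBCURL
	if(which == NULL || strcmp(which, "curl") == 0) {
		return(download_curl(url, localpath, force));
	}
#endif
#ifdef HAVE_LIBFETCH
	if(which == NULL || strcmp(which, "fetch") == 0) {
		return(download_fetch(url, localpath, force));
	}
#endif
	RET_ERR(PM_ERR_EXTERNAL_DOWNLOAD, -1);
}
```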
Dan (and anyone else), I look forward to your tower of constructive criticism.
dave
 configure.ac        |  33 ++-
 lib/libalpm/alpm.c  |  13 +--
 lib/libalpm/dload.c | 263 ++++++++++++++++++++++++++++++---------------------
 3 files changed, 176 insertions(+), 133 deletions(-)