[pacman-dev] [PATCH 0/9] extract ini parser from parseconfig
This creates a generic ini parser than can be reused for hooks or other configuration files. I took a very different approach from the previous patchset by utilizing a callback that processes the options as the file is parsed. This should have a much lower memory footprint and is closer to how we were already parsing config files. Parsing differences: * Include directives can be outside a section. They can include their own sections so it made little sense to require that they be in one already. * Errors in included files are now fatal. Andrew Gregory (9): conf.c: add parse_options to section_t conf.c: move repo parsing out of _parseconfig conf.c: move directive parsing out of _parseconfig conf.c: move section handling out of _parseconfig conf.c: pass _parse_directive as a callback conf.c: extract ini parsing code to separate files ini.c: move recursion limit to a macro ini.c: reuse line buffer ini.c: make errors in includes fatal src/pacman/Makefile.am | 1 + src/pacman/conf.c | 257 ++++++++++++++----------------------------------- src/pacman/ini.c | 217 +++++++++++++++++++++++++++++++++++++++++ src/pacman/ini.h | 30 ++++++ 4 files changed, 319 insertions(+), 186 deletions(-) create mode 100644 src/pacman/ini.c create mode 100644 src/pacman/ini.h -- 1.8.3.3
This consolidates all of our state information into a single variable.
Signed-off-by: Andrew Gregory
Signed-off-by: Andrew Gregory
Include directives no longer have to be within a section.
Signed-off-by: Andrew Gregory
_parseconfig now tracks the current section name directly so that the
name stored in the section struct is just a pointer to the one stored by
_parseconfig.
Signed-off-by: Andrew Gregory
This will allow passing arbitrary key/value handlers.
Signed-off-by: Andrew Gregory
On 22/07/13 16:46, Andrew Gregory wrote:
This will allow passing arbitrary key/value handlers.
I have now gone through this entire patchset and this is the only patch that confuses me. What do we gain by doing this? Are you envisioning having _parse_directive and _parse_hook or something? Will that be needed? Allan
Signed-off-by: Andrew Gregory
--- src/pacman/conf.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/src/pacman/conf.c b/src/pacman/conf.c index ab8706c..704ce13 100644 --- a/src/pacman/conf.c +++ b/src/pacman/conf.c @@ -841,8 +841,9 @@ cleanup: }
static int _parse_directive(const char *file, int linenum, const char *name, - char *key, char *value, struct section_t *section) + char *key, char *value, void *data) { + struct section_t *section = data; if(!key && !value) { int ret = finish_section(section); pm_printf(ALPM_LOG_DEBUG, "config: new section '%s'\n", name); @@ -872,6 +873,9 @@ static int _parse_directive(const char *file, int linenum, const char *name, return 0; }
+typedef int (ini_parser_fn)(const char *file, int line, const char *section, + char *key, char *value, void *data); + /** The "real" parseconfig. Each "Include" directive will recall this method so * recursion and stack depth are limited to 10 levels. The publicly visible * parseconfig calls this with a NULL section argument so we can recall from @@ -881,7 +885,7 @@ static int _parse_directive(const char *file, int linenum, const char *name, * @param depth the current recursion depth * @return 0 on success, 1 on failure */ -static int _parseconfig(const char *file, struct section_t *section, +static int _parseconfig(const char *file, ini_parser_fn cb, void *data, char **section_name, int depth) { FILE *fp = NULL; @@ -936,7 +940,7 @@ static int _parseconfig(const char *file, struct section_t *section, name = strdup(line + 1); name[line_len - 2] = '\0';
- ret = _parse_directive(file, linenum, name, NULL, NULL, section); + ret = cb(file, linenum, name, NULL, NULL, data); free(*section_name); *section_name = name;
@@ -994,7 +998,7 @@ static int _parseconfig(const char *file, struct section_t *section, for(gindex = 0; gindex < globbuf.gl_pathc; gindex++) { pm_printf(ALPM_LOG_DEBUG, "config file %s, line %d: including %s\n", file, linenum, globbuf.gl_pathv[gindex]); - _parseconfig(globbuf.gl_pathv[gindex], section, + _parseconfig(globbuf.gl_pathv[gindex], cb, data, section_name, depth + 1); } break; @@ -1002,13 +1006,13 @@ static int _parseconfig(const char *file, struct section_t *section, globfree(&globbuf); continue; } - if((ret = _parse_directive(file, linenum, *section_name, key, value, section)) != 0) { + if((ret = cb(file, linenum, *section_name, key, value, data)) != 0) { goto cleanup; } }
if(depth == 0) { - ret = _parse_directive(NULL, 0, NULL, NULL, NULL, section); + ret = cb(NULL, 0, NULL, NULL, NULL, data); }
cleanup: @@ -1041,7 +1045,7 @@ int parseconfig(const char *file) /* call the real parseconfig function with a null section & db argument */ pm_printf(ALPM_LOG_DEBUG, "parseconfig: options pass\n"); section.parse_options = 1; - if((ret = _parseconfig(file, §ion, §ion_name, 0))) { + if((ret = _parseconfig(file, _parse_directive, §ion, §ion_name, 0))) { return ret; } if((ret = setup_libalpm())) { @@ -1050,7 +1054,7 @@ int parseconfig(const char *file) /* second pass, repo section parsing */ pm_printf(ALPM_LOG_DEBUG, "parseconfig: repo pass\n"); section.parse_options = 0; - return _parseconfig(file, §ion, §ion_name, 0); + return _parseconfig(file, _parse_directive, §ion, §ion_name, 0); }
/* vim: set ts=2 sw=2 noet: */
On 07/29/13 at 02:18pm, Allan McRae wrote:
On 22/07/13 16:46, Andrew Gregory wrote:
This will allow passing arbitrary key/value handlers.
I have now gone through this entire patchset and this is the only patch that confuses me. What do we gain by doing this? Are you envisioning having _parse_directive and _parse_hook or something? Will that be needed?
Allan
Yes, _parse_hook is exactly what I was envisioning. This patch completely decouples the ini parser from the handling of the key/value pairs, making it easy to reuse for any file that uses our format simply by passing a different key/value handler. Aside from hooks, I also hope to eventually make the parser available to other pacman-related projects so that they can all easily use the same config format. apg
Signed-off-by: Andrew Gregory
--- src/pacman/conf.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/src/pacman/conf.c b/src/pacman/conf.c index ab8706c..704ce13 100644 --- a/src/pacman/conf.c +++ b/src/pacman/conf.c @@ -841,8 +841,9 @@ cleanup: }
static int _parse_directive(const char *file, int linenum, const char *name, - char *key, char *value, struct section_t *section) + char *key, char *value, void *data) { + struct section_t *section = data; if(!key && !value) { int ret = finish_section(section); pm_printf(ALPM_LOG_DEBUG, "config: new section '%s'\n", name); @@ -872,6 +873,9 @@ static int _parse_directive(const char *file, int linenum, const char *name, return 0; }
+typedef int (ini_parser_fn)(const char *file, int line, const char *section, + char *key, char *value, void *data); + /** The "real" parseconfig. Each "Include" directive will recall this method so * recursion and stack depth are limited to 10 levels. The publicly visible * parseconfig calls this with a NULL section argument so we can recall from @@ -881,7 +885,7 @@ static int _parse_directive(const char *file, int linenum, const char *name, * @param depth the current recursion depth * @return 0 on success, 1 on failure */ -static int _parseconfig(const char *file, struct section_t *section, +static int _parseconfig(const char *file, ini_parser_fn cb, void *data, char **section_name, int depth) { FILE *fp = NULL; @@ -936,7 +940,7 @@ static int _parseconfig(const char *file, struct section_t *section, name = strdup(line + 1); name[line_len - 2] = '\0';
- ret = _parse_directive(file, linenum, name, NULL, NULL, section); + ret = cb(file, linenum, name, NULL, NULL, data); free(*section_name); *section_name = name;
@@ -994,7 +998,7 @@ static int _parseconfig(const char *file, struct section_t *section, for(gindex = 0; gindex < globbuf.gl_pathc; gindex++) { pm_printf(ALPM_LOG_DEBUG, "config file %s, line %d: including %s\n", file, linenum, globbuf.gl_pathv[gindex]); - _parseconfig(globbuf.gl_pathv[gindex], section, + _parseconfig(globbuf.gl_pathv[gindex], cb, data, section_name, depth + 1); } break; @@ -1002,13 +1006,13 @@ static int _parseconfig(const char *file, struct section_t *section, globfree(&globbuf); continue; } - if((ret = _parse_directive(file, linenum, *section_name, key, value, section)) != 0) { + if((ret = cb(file, linenum, *section_name, key, value, data)) != 0) { goto cleanup; } }
if(depth == 0) { - ret = _parse_directive(NULL, 0, NULL, NULL, NULL, section); + ret = cb(NULL, 0, NULL, NULL, NULL, data); }
cleanup: @@ -1041,7 +1045,7 @@ int parseconfig(const char *file) /* call the real parseconfig function with a null section & db argument */ pm_printf(ALPM_LOG_DEBUG, "parseconfig: options pass\n"); section.parse_options = 1; - if((ret = _parseconfig(file, §ion, §ion_name, 0))) { + if((ret = _parseconfig(file, _parse_directive, §ion, §ion_name, 0))) { return ret; } if((ret = setup_libalpm())) { @@ -1050,7 +1054,7 @@ int parseconfig(const char *file) /* second pass, repo section parsing */ pm_printf(ALPM_LOG_DEBUG, "parseconfig: repo pass\n"); section.parse_options = 0; - return _parseconfig(file, §ion, §ion_name, 0); + return _parseconfig(file, _parse_directive, §ion, §ion_name, 0); }
/* vim: set ts=2 sw=2 noet: */
On 29/07/13 23:19, Andrew Gregory wrote:
On 07/29/13 at 02:18pm, Allan McRae wrote:
On 22/07/13 16:46, Andrew Gregory wrote:
This will allow passing arbitrary key/value handlers.
I have now gone through this entire patchset and this is the only patch that confuses me. What do we gain by doing this? Are you envisioning having _parse_directive and _parse_hook or something? Will that be needed?
Allan
Yes, _parse_hook is exactly what I was envisioning. This patch completely decouples the ini parser from the handling of the key/value pairs, making it easy to reuse for any file that uses our format simply by passing a different key/value handler. Aside from hooks, I also hope to eventually make the parser available to other pacman-related projects so that they can all easily use the same config format.
Ah - this all makes sense to me now, and I think it would have earlier if _parse_directive had not been such a generic name that I thought it to be used for every ini parsing. Of course, looking at the code, it is not. I was going to suggest a less generic name, but once _parseconfig is moved and turned into parse_ini at next, and noting it is in conf.c, it is really quite clear. So the patch is all good to me. Allan
Move _parseconfig to ini.c as _parse_ini and create a convenient wrapper
for the public API.
Signed-off-by: Andrew Gregory
Signed-off-by: Andrew Gregory
On Jul 22, 2013 2:48 AM, "Andrew Gregory"
Signed-off-by: Andrew Gregory
---
OK... I'll bite. Why? Is this really going to be used elsewhere? Seems like an error case that should be handled by the like any other.
src/pacman/ini.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/pacman/ini.c b/src/pacman/ini.c index 4d25f58..1c37f5c 100644 --- a/src/pacman/ini.c +++ b/src/pacman/ini.c @@ -27,6 +27,8 @@ #include "ini.h" #include "util.h"
+#define INI_MAX_RECURSION 10 + /** * @brief INI parser backend. * @@ -46,11 +48,11 @@ static int _parse_ini(const char *file, ini_parser_fn cb, void *data, char line[PATH_MAX]; int linenum = 0; int ret = 0; - const int max_depth = 10;
- if(depth >= max_depth) { + if(INI_MAX_RECURSION && depth >= INI_MAX_RECURSION) { pm_printf(ALPM_LOG_ERROR, - _("config parsing exceeded max recursion depth of %d.\n"), max_depth); + _("config parsing exceeded max recursion depth of %d.\n"), + INI_MAX_RECURSION); ret = 1; goto cleanup; } -- 1.8.3.3
On 07/22/13 at 08:41am, Dave Reisner wrote:
On Jul 22, 2013 2:48 AM, "Andrew Gregory"
wrote: Signed-off-by: Andrew Gregory
--- OK... I'll bite. Why? Is this really going to be used elsewhere? Seems like an error case that should be handled by the like any other.
My goal wasn't reusability so much as visibility. It's an artificial limitation we impose to prevent memory exhaustion in a recursive call, not a true error condition. I also did this with the thought of eventually trying to make the parser usable for pacman-related programs outside of pacman proper and making the limit configurable. I don't know if a macro is the best way to do that, though. apg
On Mon, Jul 22, 2013 at 10:58:55AM -0400, Andrew Gregory wrote:
On 07/22/13 at 08:41am, Dave Reisner wrote:
On Jul 22, 2013 2:48 AM, "Andrew Gregory"
wrote: Signed-off-by: Andrew Gregory
--- OK... I'll bite. Why? Is this really going to be used elsewhere? Seems like an error case that should be handled by the like any other.
My goal wasn't reusability so much as visibility. It's an artificial limitation we impose to prevent memory exhaustion in a recursive call, not a true error condition. I also did this with the thought of eventually trying to make the parser usable for pacman-related programs outside of pacman proper and making the limit configurable. I don't know if a macro is the best way to do that, though.
apg
Although minor, I'm against this. If you want this to be extensible and useful outside of pacman proper (and this is indeed a noble goal), then you need to create an opaque context that users would instantiate similar to alpm_handle_t. The recursion limit would have get/set methods exposed in the API.
On 23/07/13 01:02, Dave Reisner wrote:
On Mon, Jul 22, 2013 at 10:58:55AM -0400, Andrew Gregory wrote:
On 07/22/13 at 08:41am, Dave Reisner wrote:
On Jul 22, 2013 2:48 AM, "Andrew Gregory"
wrote: Signed-off-by: Andrew Gregory
--- OK... I'll bite. Why? Is this really going to be used elsewhere? Seems like an error case that should be handled by the like any other.
My goal wasn't reusability so much as visibility. It's an artificial limitation we impose to prevent memory exhaustion in a recursive call, not a true error condition. I also did this with the thought of eventually trying to make the parser usable for pacman-related programs outside of pacman proper and making the limit configurable. I don't know if a macro is the best way to do that, though.
apg
Although minor, I'm against this.
If you want this to be extensible and useful outside of pacman proper (and this is indeed a noble goal), then you need to create an opaque context that users would instantiate similar to alpm_handle_t. The recursion limit would have get/set methods exposed in the API.
I do like the idea of increasing the visibility of this limitation. How about instead of using a macro, just make it a "static const int" at the top of the file? A
The recursion limit is an artificial limitation imposed to prevent
memory exhaustion in a recursive function. Giving it file-level scope
increases its visibility.
Signed-off-by: Andrew Gregory
By the time we make the recursive call we have already finished with the
line buffer, making it safe to reuse.
Signed-off-by: Andrew Gregory
If an error in the main file would be fatal there is little reason to
ignore the error in an included file.
Signed-off-by: Andrew Gregory
On 22/07/13 16:46, Andrew Gregory wrote:
This creates a generic ini parser than can be reused for hooks or other configuration files. I took a very different approach from the previous patchset by utilizing a callback that processes the options as the file is parsed. This should have a much lower memory footprint and is closer to how we were already parsing config files.
Parsing differences: * Include directives can be outside a section. They can include their own sections so it made little sense to require that they be in one already. * Errors in included files are now fatal.
These patches look fine to me and it sets us up for parsing ini based hook configuration files. Any other comments before I commit these? Allan
participants (3)
-
Allan McRae
-
Andrew Gregory
-
Dave Reisner