
With Expected At Least One Variable Assignment In Python

Created on 2010-07-22 22:15 by andersk, last changed 2018-01-24 00:35 by paul.j3.

msg111221 - (view)Author: Anders Kaseorg (andersk) *Date: 2010-07-22 22:15
Porting the a2x program to argparse from the now-deprecated optparse subtly breaks it when certain options are passed:

    $ a2x --asciidoc-opts --safe gitcli.txt
    $ ./a2x.argparse --asciidoc-opts --safe gitcli.txt
    usage: a2x [-h] [--version] [-a ATTRIBUTE] [--asciidoc-opts ASCIIDOC_OPTS]
               [--copy] [--conf-file CONF_FILE] [-D PATH] [-d DOCTYPE]
               [--epubcheck] [-f FORMAT] [--icons] [--icons-dir PATH] [-k]
               [--lynx] [-L] [-n] [-r PATH] [-s] [--stylesheet STYLESHEET]
               [--safe] [--dblatex-opts DBLATEX_OPTS] [--fop]
               [--fop-opts FOP_OPTS] [--xsltproc-opts XSLTPROC_OPTS] [-v]
    a2x: error: argument --asciidoc-opts: expected one argument

Apparently argparse uses a heuristic to try to guess whether an argument looks like an argument or an option, going so far as to check whether it looks like a negative number (!). It should _never_ guess: the option was specified to take an argument, so the following argument should always be parsed as an argument.

Small test case:

    >>> import optparse
    >>> parser = optparse.OptionParser(prog='a2x')
    >>> parser.add_option('--asciidoc-opts',
    ...     action='store', dest='asciidoc_opts', default='',
    ...     metavar='ASCIIDOC_OPTS', help='asciidoc options')
    >>> parser.parse_args(['--asciidoc-opts', '--safe'])
    (<Values at 0x7f585142ef80: {'asciidoc_opts': '--safe'}>, [])

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='a2x')
    >>> parser.add_argument('--asciidoc-opts',
    ...     action='store', dest='asciidoc_opts', default='',
    ...     metavar='ASCIIDOC_OPTS', help='asciidoc options')
    >>> parser.parse_args(['--asciidoc-opts', '--safe'])
    usage: a2x [-h] [--asciidoc-opts ASCIIDOC_OPTS]
    a2x: error: argument --asciidoc-opts: expected one argument
msg111224 - (view)Author: R. David Murray (r.david.murray) *Date: 2010-07-22 23:06
It seems like a reasonable request to me to be able to allow such arguments, especially since optparse did and we want people to be able to use argparse as a replacement. Though in general I find argparse's default behavior more useful. Since argparse has been released, I'm thinking this still has to be a feature request, since argparse is *not* a drop-in replacement for optparse.
msg111227 - (view)Author: Nelson Elhage (nelhage)Date: 2010-07-22 23:40
For what it's worth, I have trouble seeing this as anything but a bug. I understand the motivation of trying to catch user errors, but in doing so, you're breaking with the behavior of every other option parsing library that I'm aware of, in favor of an arbitrary heuristic that sometimes guesses wrong. That's not the kind of behavior I expect from my Python libraries; I want them to do what I ask them to, not try to guess what I probably meant.
msg111228 - (view)Author: Anders Kaseorg (andersk) *Date: 2010-07-22 23:43
> Though in general I find argparse's default behavior more useful.

I’m not sure I understand. Why is it useful for an option parsing library to heuristically decide, by default, that I didn’t actually want to pass in the valid option that I passed in? Shouldn’t that be up to the caller (or up to the program, if it explicitly decides to reject such arguments)? Keep in mind that the caller might be another script instead of a user.
msg111230 - (view)Author: R. David Murray (r.david.murray) *Date: 2010-07-23 00:51
Well, even if you call it a bug, it would be an argparse design bug, and design bug fixes are feature requests from a procedural point of view.
msg111279 - (view)Author: Steven Bethard (bethard) *Date: 2010-07-23 11:21
Note that the negative number heuristic you're complaining about doesn't actually affect your code below. The negative number heuristic is only used when you have some options that look like negative numbers. See the docs for more information: http://docs.python.org/library/argparse.html#arguments-containing Your problem is that you want "--safe" to be treated as a positional argument even though you've declared it as an option. Basically there are two reasonable interpretations of this situation. Consider something like "--conf-file --safe". Either the user wants a conf file named "--safe", or the user accidentally forgot to type the name of the conf file. Argparse assumes the latter, though either one is conceivable. Argparse assumes the latter because, while it occasionally throws an unnecessary exception, the other behavior would allow an error to pass silently. I'm definitely opposed to changing the default behavior to swallow some errors silently. If you'd like to propose an API for enabling such behavior explicitly and supply a patch and tests implementing it, I'll be happy to review it though.
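The negative-number rule described above can be seen in a small sketch. The parsers and the `try_parse` helper are illustrative only; the behavior matches what the argparse documentation describes for arguments containing `-`.

```python
import argparse

def try_parse(parser, argv):
    """Return the parsed Namespace, or None if argparse rejects argv."""
    try:
        return parser.parse_args(argv)
    except SystemExit:  # argparse reports usage errors by exiting
        return None

# Parser A: no option string looks like a negative number.
plain = argparse.ArgumentParser(prog='demo')
plain.add_argument('-x')

# Parser B: the option '-1' looks like a negative number.
numeric = argparse.ArgumentParser(prog='demo')
numeric.add_argument('-1', dest='one')

a = try_parse(plain, ['-x', '-5'])    # '-5' is accepted as the value of -x
b = try_parse(numeric, ['-1', '-5'])  # '-5' is now treated as an option
print(a, b)
```

With parser A, `-5` is read as a value; with parser B the same string is classified as an option, so `-1` is reported as missing its argument.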
msg111367 - (view)Author: Anders Kaseorg (andersk) *Date: 2010-07-23 17:39
> Note that the negative number heuristic you're complaining about
> doesn't actually affect your code below.

Yes it does:

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='a2x')
    >>> parser.add_argument('--asciidoc-opts',
    ...     action='store', dest='asciidoc_opts', default='',
    ...     metavar='ASCIIDOC_OPTS', help='asciidoc options')
    >>> parser.parse_args(['--asciidoc-opts', '-1'])
    Namespace(asciidoc_opts='-1')
    >>> parser.parse_args(['--asciidoc-opts', '-one'])
    usage: a2x [-h] [--asciidoc-opts ASCIIDOC_OPTS]
    a2x: error: argument --asciidoc-opts: expected one argument

> Your problem is that you want "--safe" to be treated as a positional
> argument even though you've declared it as an option.

No, it doesn’t matter whether --safe was declared as an option: argparse rejected it on the basis of beginning with a dash (as I demonstrated in my small test case, which did not declare --safe as an option, and again in the example above with -one).

> Either the user wants a conf file named "--safe", or the user
> accidentally forgot to type the name of the conf file.

But it’s not argparse’s job to decide that the valid option I passed was actually a typo for something invalid. This would be like Python rejecting the valid call

    shell = "bash"
    p = subprocess.Popen(shell)

just because shell happens to also be a valid keyword argument for the Popen constructor and I might have forgotten to specify its value.

Including these special heuristics by default, that (1) are different from the standard behavior of all other option parsing libraries and (2) interfere with the ability to pass certain valid options, only leads to strange inconsistencies between command line programs written in different languages, and ultimately makes the command line harder to use for everyone. The default behavior should be the standard one.
msg111669 - (view)Author: Steven Bethard (bethard) *Date: 2010-07-26 21:53
I still disagree. You're giving the parser ambiguous input. If a parser sees "--foo --bar", and "--foo" is a valid option, but "--bar" is not, this is a legitimately ambiguous situation. Either the user really wanted "--bar", and the parser doesn't support it, or the "--bar" was meant to be the argument to the "--foo" flag. At this point, the parser must make an arbitrary decision, and argparse chooses the interpretation that the user wanted the "--bar" flag. I understand that you have a good use case for the other interpretation. That's why I suggest you come up with a patch that allows this other interpretation to be enabled when necessary. Changing the default behavior is really a non-starter unless you can propose a sensible transition strategy (as is always necessary for changing APIs in backwards incompatible ways).
msg111670 - (view)Author: Anders Kaseorg (andersk) *Date: 2010-07-26 22:43
> I still disagree. You're giving the parser ambiguous input. If a
> parser sees "--foo --bar", and "--foo" is a valid option, but "--bar"
> is not, this is a legitimately ambiguous situation.

There is no ambiguity. According to the way that every standard option parsing library has worked for decades, the parser knows that --foo takes an argument, so the string after --foo is in a different grammatical context than options are, and is automatically interpreted as an argument to --foo. (It doesn’t matter whether that string begins with a dash, is a valid argument, might become a valid argument in some future version, looks like a negative number, or any other such condition.)

    arguments = *(positional-argument / option) [-- *(positional-argument)]
    positional-argument = string
    option = foo-option / bar-option
    foo-option = "--foo" string
    bar-option = "--bar"

This is just like how variable names in Python are in a different grammatical position than keyword argument names, so that Popen(shell) is not confused with Popen(shell=True). This is not ambiguity; it simply follows from the standard definition of the grammar.

argparse’s alternative interpretation of that string as another option does not make sense because it violates the requirement that --foo has been defined to take an argument.

The only justification for considering that input ambiguous is if you start assuming that argparse knows better than the user (“the user accidentally forgot to type the name of the conf file”) and try to guess what they meant. This violates the user’s expectations of how the command line should work. It also creates subtle bugs in scripts that call argparse-based programs (think about call(["program", "--foo", foo_argument]) where foo_argument comes from some complex computation or even untrusted network input).

> Changing the default behavior is really a non-starter unless you can
> propose a sensible transition strategy (as is always necessary for
> changing APIs in backwards incompatible ways).

This would not be a backwards incompatible change, since every option that previously parsed successfully would also parse in the same way after the fix.
msg111673 - (view)Author: Anders Kaseorg (andersk) *Date: 2010-07-26 23:17
> arguments = *(positional-argument / option) [-- *(positional-argument)]
> positional-argument = string
> option = foo-option / bar-option
> foo-option = "--foo" string
> bar-option = "--bar"

Er, obviously positional arguments before the first ‘--’ can’t begin with a dash (I don’t think there’s any confusion over how those should work).

    arguments = *(non-dash-positional-argument / option) ["--" *(positional-argument)]
    non-dash-positional-argument = <string not beginning with "-">
    positional-argument = string

The point was just that the grammar unambiguously allows the argument of --foo to be any string.
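For comparison, the standard library's `getopt` module follows the grammar-style reading: once an option is declared to take an argument, the next string is consumed as that argument whatever it looks like. A minimal sketch, where the option names and `file.txt` are placeholders:

```python
import getopt

# '--foo' takes an argument ('foo='), '--bar' does not ('bar').
opts, rest = getopt.getopt(['--foo', '--bar', 'file.txt'], '', ['foo=', 'bar'])
print(opts, rest)

# Short options behave the same way: 'e:' means -e takes an argument.
short_opts, short_rest = getopt.getopt(['-e', '-pattern', 'file.txt'], 'e:')
print(short_opts, short_rest)
```

In both calls the dash-prefixed string ends up as the option's value rather than being parsed as a separate option.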
msg111691 - (view)Author: Steven Bethard (bethard) *Date: 2010-07-27 09:37
It *would* be a backwards incompatible change. Currently, if I have a parser with both a "--foo" and a "--bar" option, and my user types "--foo --bar", they get an error saying that they were missing the argument to "--foo". Under your proposal, the "--foo" option will now silently consume the "--bar" option without an error. I know this is good from your perspective, but it would definitely break some of my scripts, and I imagine it would break other people's scripts as well. As I keep saying, I'm happy to add your alternative parsing as an option (assuming you provide a patch), but I really don't think it's the right thing to do by default. Most command line programs don't have options that take other option-like things as arguments (which is the source of your problem), so in most command line programs, people want an error when they get an option they don't recognize or an option that's missing its argument. Under your proposal, more such errors will pass silently and will have to be caught by additional code in the script.
msg128014 - (view)Author: Gerard van Helden (drm)Date: 2011-02-05 19:13
The reporter imho is 100% right. Simply because of the fact that in the current situation, there is no way to supply an argument starting with a dash (not even for instance a filename). That is, of course, total nonsense to be dictated by the parser library.
msg128025 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-05 20:54
While I also dislike the existing behavior, note that you can get what you want by using an equal sign.

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='a2x')
    >>> parser.add_argument('--asciidoc-opts',
    ...     action='store', dest='asciidoc_opts', default='',
    ...     metavar='ASCIIDOC_OPTS', help='asciidoc options')
    >>> parser.parse_args(['--asciidoc-opts', '-1'])
    Namespace(asciidoc_opts='-1')
    >>> parser.parse_args(['--asciidoc-opts=-one'])
    Namespace(asciidoc_opts='-one')

I always use the equal sign, so I've never noticed this behavior before. I wish that help would display the equal sign, but that's another issue.
msg128047 - (view)Author: Steven Bethard (bethard) *Date: 2011-02-06 09:47
Yeah, I agree it's not ideal, though note that basic unix commands have trouble with arguments starting with dashes:

    $ cd -links-/
    -bash: cd: -l: invalid option
    cd: usage: cd [-L|-P] [dir]

If you're working with a file on a filesystem, the time-honored workaround is to prefix with ./

    $ cd ./-links-/
    $

Anyway, since it doesn't seem like anyone is offering to write up a patch to enable such an alternative parsing strategy, perhaps Eric's "=" workaround should be documented prominently somewhere?
msg128055 - (view)Author: Éric Araujo (eric.araujo) *Date: 2011-02-06 12:18
Documenting “--extra-args=--foo” or “--extra-args -- --foo” (untested, but should work) seems good.
msg128062 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-06 15:56
"--" won't work. Traditionally, this has been used to separate optional arguments from positional arguments. Continuing the "cd" example, that's what would let you cd into a directory whose name starts with a hyphen:

    $ cd -links-/
    -bash: cd: -l: invalid option
    cd: usage: cd [-L|-P] [dir]
    $ cd -- -links-
    $

This would also work with argparse:

    import argparse
    parser = argparse.ArgumentParser(prog='cd')
    parser.add_argument('-L', help='follow symbolic links')
    parser.add_argument('-P', help='do not follow symbolic links')
    parser.add_argument('dir', help='directory name')
    print(parser.parse_args(['--', '-Links-']))

prints:

    Namespace(L=None, P=None, dir='-Links-')

Continuing the example from my earlier post shows it won't work for values for optional arguments:

    >>> parser.parse_args(['--asciidoc-opts -- -one'])
    usage: a2x [-h] [--asciidoc-opts ASCIIDOC_OPTS]
    a2x: error: unrecognized arguments: --asciidoc-opts -- -one

I believe it's only the '=' that will solve this problem. In fact, because of this issue, I suggest we document '=' as the preferred way to call argparse when optional arguments have values, and change all of the examples to use it. I also think it would cause less confusion (because of this issue) if the help output showed the equal sign. But I realize that's probably more controversial.
msg128071 - (view)Author: Anders Kaseorg (andersk) *Date: 2011-02-06 19:12
There are some problems that ‘=’ can’t solve, such as options with nargs ≥ 2. optparse has no trouble with this:

    >>> parser = optparse.OptionParser()
    >>> parser.add_option('-a', nargs=2)
    >>> parser.parse_args(['-a', '-first', '-second'])
    (<Values at 0x7fc97a93a7e8: {'a': ('-first', '-second')}>, [])

But inputting those arguments is _not possible_ with argparse.

    >>> parser = argparse.ArgumentParser()
    >>> parser.add_argument('-a', nargs=2)
    >>> parser.parse_args(['-a', '-first', '-second'])
    usage: [-h] [-a A A]
    : error: argument -a: expected 2 argument(s)
msg128072 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-06 19:33
Good point, I hadn't thought of that. Maybe ArgumentParser needs a "don't try to be so helpful, parse like optparse" option. Which is what Steven suggested earlier, I believe. I'd take a crack at this if there's general consensus on that solution. We can change the documentation to point out the issue now, but the feature request can only go in 3.3.
msg128076 - (view)Author: Anders Kaseorg (andersk) *Date: 2011-02-06 19:53
That would be a good first step. I continue to advocate making that mode the default, because it’s consistent with how every other command line program works[1], and backwards compatible with the current argparse behavior. As far as documentation for older versions, would it be reasonable to un-deprecate optparse until argparse becomes a suitable replacement? There are still lots of programmers working in Python 2.7. [1] bethard’s msg128047 is confusing positional arguments with option arguments. All UNIX commands that accept option arguments have no trouble accepting option arguments that begin with -. For example, ‘grep -e -pattern file’ is commonly used to search for patterns beginning with -.
msg128078 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-06 20:02
I'd also like to see this as the default. After all, presumably we'd like Python scripts to work like all other command line programs, and I too am unaware of any other option parsing library that works the way argparse does. But changing released behavior in the stdlib is problematic, as everyone knows. I'll look into producing a patch to add this as optional behavior, then we can separately think about changing the default.
msg128090 - (view)Author: Steven Bethard (bethard) *Date: 2011-02-06 22:01
I don't think there's any sense in "un-deprecating" optparse because:

(1) It's only deprecated in the documentation - there is absolutely nothing in the code to keep you from continuing to use it, and there are no plans to remove it from Python.

(2) One (mis?)feature doesn't make the rest of the module useless.

And yes Eric, it would be awesome if you could develop a patch that allows the alternate parsing to be enabled when someone wants it. We should think about deprecation strategy though. Maybe something like:

    == Python 3.3 ==
    # Python 3.2 behavior
    parser = ArgumentParser(error_on_unknown_options=True)
    # proposed behavior
    parser = ArgumentParser(error_on_unknown_options=False)
    # deprecation warning when not specified
    parser = ArgumentParser()

    == Python 2.4 ==
    # error warning when not specified
    parser = ArgumentParser()

    == Python 2.5 ==
    # defaults to error_on_unknown_options=False
    parser = ArgumentParser()

I'm not sure that's the right way to do it, but if the plan is to change the default at some point, we should make sure that we have a deprecation plan before we add the feature.
msg128091 - (view)Author: Éric Araujo (eric.araujo) *Date: 2011-02-06 22:34
s/2.4/3.4/ s/2.5/3.5/ obviously :)
msg128094 - (view)Author: Anders Kaseorg (andersk) *Date: 2011-02-07 02:08
> (1) It's only deprecated in the documentation

Which is why I suggested un-deprecating it in the documentation. (I want to avoid encouraging programmers to switch away from optparse until this bug is fixed.)

> # proposed behavior
> parser = ArgumentParser(error_on_unknown_options=False)

Perhaps you weren’t literally proposing “error_on_unknown_options=False” as the name of the new flag, but note that neither the current nor the proposed behavior has anything to do with whether arguments look like known or unknown options. Under the proposed behavior, anything in argument position (--asciidoc-opts ___) is parsed as an argument, no matter what it looks like. So a more accurate name might be “refuse_dashed_args=False”, or more generally (in case prefix_chars != '-'), “refuse_prefixed_args=False”?
msg128104 - (view)Author: Steven Bethard (bethard) *Date: 2011-02-07 07:58
@Éric: yes, thanks!

@Anders: The reason the current implementation gives you the behavior you don't want is that the first thing it does is scan the args list for things that look like flags (based on prefix_chars). It assumes that everything that looks like a flag is intended to be one, before it ever looks at how many arguments the flag before it takes or anything like that. This is the source of your problem - argparse assumes "--safe" is a flag, and as a result, there is no argument for "--asciidoc-opts". So perhaps a better name would be something like dont_assume_everything_that_looks_like_a_flag_is_intended_to_be_one. ;-)
msg128134 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-07 16:34
Steven: Yes, the current structure of the first pass scan makes any patch problematic. It really would be an implementation of a different algorithm. I'm still interested in looking at it, though.
msg128179 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-08 14:01
Without guessing which args are options, I don't see how it's possible to implement parse_known_args(). I'd propose raising an exception if it's called and dont_assume_everything_that_looks_like_a_flag_is_intended_to_be_one (or whatever it ends up being called) is True.
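The dependence of `parse_known_args()` on that guess can be seen in a short sketch (the `--foo` and `--bar` names are placeholders). The method can only hand back `--bar` as a leftover because the parser has already decided that it looks like an option it does not know:

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
parser.add_argument('--foo')

# '--bar' is not declared; parse_known_args() returns it instead of failing.
known, extras = parser.parse_known_args(['--foo', 'x', '--bar', 'y'])
print(known, extras)
```

If every dash-prefixed string in argument position were consumed as a value, there would be no comparable point at which to set unknown options aside.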
msg128266 - (view)Author: Steven Bethard (bethard) *Date: 2011-02-10 07:11
Maybe dont_assume_everything_that_looks_like_a_flag_is_intended_to_be_one should actually be a new class, e.g. parser = AllowFlagsAsPositionalArgumentsArgumentParser() Then you just wouldn't provide parse_known_args on that parser.
msg128728 - (view)Author: Eric V. Smith (eric.smith) *Date: 2011-02-17 15:52
(I doubt my terminology is exactly correct in this post, but I've tried my best to make it so.)

The more I think about this, the more I realize we can't implement a parser that doesn't make guesses about '-' prefixed args and that works with argparse's existing behavior with respect to optional arguments. For example:

    parser = argparse.ArgumentParser()
    parser.add_argument('--foo', nargs='?')
    parser.add_argument('--bar', nargs='?')
    print(parser.parse_args(['--foo', '--bar', 'a']))
    print(parser.parse_args(['--foo', 'x', '--bar', 'a']))

Unless the parser tries to guess that --bar is an optional argument by itself, it can't know whether --foo has an argument or not.

I guess it could look and say that if you called this with '--foo --baz', then '--baz' must be an argument for '--foo', but then you could never have an argument to '--foo' named '--bar', plus it all seems fragile.

Maybe this new parser (as Steven described it) wouldn't allow a variable number of arguments to optional arguments? That is, nargs couldn't be '?', '*', or '+', only a number.
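Running that example against the stock parser shows the guess in action. This sketch only adds a `prog` name and keeps the two results in variables so they can be inspected:

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
parser.add_argument('--foo', nargs='?')
parser.add_argument('--bar', nargs='?')

# '--bar' is recognized as an option, so --foo is treated as having no value.
first = parser.parse_args(['--foo', '--bar', 'a'])
# Here 'x' does not look like an option, so it becomes the value of --foo.
second = parser.parse_args(['--foo', 'x', '--bar', 'a'])
print(first)
print(second)
```

A parser that never guessed would have to read `--bar` in the first call as the value of `--foo`, which is the conflict described above.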
msg132220 - (view)Author: Steven Bethard (bethard) *Date: 2011-03-26 09:53
Thanks for the analysis Eric. Yeah, it does seem like it's not possible to implement this feature request while still supporting optionals with variable number arguments. @andersk: Would the restriction to only having flags with a fixed number of arguments be acceptable for your use case?
msg132260 - (view)Author: Anders Kaseorg (andersk) *Date: 2011-03-26 18:11
> @andersk: Would the restriction to only having flags with a fixed
> number of arguments be acceptable for your use case?

I think that’s fine. Anyone coming from optparse won’t need options with optional arguments.

However, FWIW, GNU getopt_long() supports options with an optional argument under the restrictions that:

  • the option must be a long option,
  • the optional argument must be the only argument for the option, and
  • the argument, if present, must be supplied using the ‘--option=argument’ form, not the ‘--option argument’ form.

This avoids all parsing ambiguity. It would be useful to have feature parity with getopt_long(), to facilitate writing Python wrapper scripts for C programs.
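The closest existing argparse equivalent is probably `nargs='?'` combined with `const`. The sketch below uses a hypothetical `--color` option; it is not a getopt_long() emulation, only an illustration of the attached `--option=argument` form.

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
# nargs='?' makes the value optional; const is used when it is omitted.
parser.add_argument('--color', nargs='?', const='auto', default='never')

absent = parser.parse_args([])                      # option not given
bare = parser.parse_args(['--color'])               # option given, no value
attached = parser.parse_args(['--color=-always'])   # attached, dash-prefixed value
print(absent, bare, attached)
```

Unlike getopt_long(), argparse also accepts the separated form `--color always` here, which is where the guessing discussed in this issue comes back in.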
msg150310 - (view)Author: James B (skilletaudio)Date: 2011-12-28 18:15
I have encountered this issue (Python 2.7) with respect to positional arguments that begin with a dash (Linux/bash). In the following example, the parser requires three positional arguments. I attempted to encase the arguments in single quotes, as that is generally expected to result in strings being handled correctly (these args are API keys, so they could contain shell-unfriendly chars like - and &).

    ./tool.py arg1 'arg2' '-arg3&otherstuff'

You'll note there are no optional arguments in this example; it just boils down to a positional argument being broken up on parse. Needless to say, it was quite confusing to see the script complain after passing in what would typically be perfectly valid strings in most other apps / scripts.

Is it possible to get argparse to correctly notice and handle shell-appropriate single-quoting methods (don't break down a string that has been implied as a complete token via ' ')?

As it stands, it appears I have two workaround options: 1) adopt the ./tool.py -- <positional args> convention mentioned in this thread, or 2) escape leading dashes in positional argument strings to avoid this issue.
msg150320 - (view)Author: Anders Kaseorg (andersk) *Date: 2011-12-28 22:20
James: That’s not related to this issue. This issue is about options taking arguments beginning with dash (such as a2x --asciidoc-opts --safe, where --safe is the argument to --asciidoc-opts), not positional arguments beginning with dash. Your observation isn’t a bug. In all getopt-like parsers, -- is the only way to pass positional arguments beginning with -. (Whether you shell-quoted the argument is irrelevant; the - is interpreted by the program, not the shell, after the shell has already stripped off the shell quoting.) If your program doesn’t take any options and you’d like to parse positional arguments without requiring --, don’t use a getopt-like parser; use sys.argv directly. If you still think your example is a bug, please file a separate report.
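The `--` convention for positional arguments can be sketched as follows. The three argument names are hypothetical stand-ins for the API keys in James's example:

```python
import argparse

parser = argparse.ArgumentParser(prog='tool.py')
for name in ('key1', 'key2', 'key3'):  # hypothetical argument names
    parser.add_argument(name)

# Everything after '--' is treated as positional, even with a leading dash.
args = parser.parse_args(['--', 'arg1', 'arg2', '-arg3&otherstuff'])
print(args)
```

The shell quoting never reaches the program; only the `--` marker tells the parser that `-arg3&otherstuff` is not an option.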
msg169712 - (view)Author: Christophe Guillon (Christophe.Guillon)Date: 2012-09-02 17:58
As a workaround for this missing feature, the negative number matching regexp can be used for allowing arguments starting with '-' in arguments of option flags. We basically do:

    parser = argparse.ArgumentParser(...)
    parser._negative_number_matcher = re.compile(r'^-.+$')

This allows cases such as @andersk's:

    $ a2x --asciidoc-opts --safe gitcli.txt

where '--safe' is an argument to '--asciidoc-opts'.

As this behavioral change is quite simple, couldn't the requested feature be implemented like this, with an optional setting to the ArgumentParser constructor?
msg169978 - (view)Author: Steven Bethard (bethard) *Date: 2012-09-07 07:24
Interesting idea! The regex would need a little extra care to interoperate properly with prefix_chars, but the approach doesn't seem crazy. I'd probably call the constructor option something like "args_default_to_positional" (the current behavior is essentially that args default to optional arguments if they look like optionals). I'd be happy to review a patch along these lines. It would probably be good if Anders Kaseorg could also review it to make sure it fully solves his problem.
msg184174 - (view)Author: paul j3 (paul.j3) *Date: 2013-03-14 17:09
If nargs=2, type=float, an argv like '1e4 -.002' works, but '1e4 -2e-3' produces the same error as discussed here. The problem is that _negative_number_matcher does not handle scientific notation. The proposed generalized matcher, r'^-.+$', would solve this, but may be overkill.

I'm not as familiar with optparse and other argument processors, but I suspect argparse is different in that it processes the argument strings twice. On one loop it parses them, producing an arg_strings_pattern that looks like 'OAA' (or 'OAO' in these problem cases). On the second loop it consumes the strings (optionals and positionals). This gives it more power, but produces problems like this if the parsing does not match expectations.
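The first of those two passes can be approximated in a few lines. This is a deliberately simplified sketch, not argparse's actual code: `classify` is a hypothetical helper, and it ignores abbreviations, `--opt=value`, `--`, and the rule that the negative-number exception is switched off when the parser itself has number-like options.

```python
import re

# Same shape as the matcher discussed in this thread: integers and simple floats.
NEGATIVE_NUMBER = re.compile(r'^-\d+$|^-\d*\.\d+$')

def classify(arg_strings, known_options, prefix_chars='-'):
    """Return a pattern string with 'O' for option-like and 'A' for argument-like."""
    pattern = []
    for arg in arg_strings:
        if not arg or arg[0] not in prefix_chars:
            pattern.append('A')   # no prefix character: an argument
        elif arg in known_options:
            pattern.append('O')   # a declared option
        elif NEGATIVE_NUMBER.match(arg):
            pattern.append('A')   # looks like a plain negative number
        else:
            pattern.append('O')   # anything else with a prefix is assumed to be an option
    return ''.join(pattern)

print(classify(['--asciidoc-opts', '--safe', 'gitcli.txt'], {'--asciidoc-opts'}))
print(classify(['-a', '1e4', '-.002'], {'-a'}))
print(classify(['-a', '1e4', '-2e-3'], {'-a'}))
```

The second pass then matches each option's nargs against this pattern, so an option followed directly by 'O' has nothing to consume.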
msg184177 - (view)Author: Evgeny Kapun (abacabadabacaba)Date: 2013-03-14 17:46
The way argparse currently parses option arguments is broken. If a long option requires an argument and its value isn't specified together with the option (using --option=value syntax), then the following argument should be interpreted as that value, no matter what it looks like. There should be no guesses or heuristics here. That the behavior depends on whether some argument "looks like" a negative number is the most horrible part. Argument parsing should follow simple, deterministic rules, preferably the same as those used by standard getopt(3).
msg184178 - (view)Author: Eric V. Smith (eric.smith) *Date: 2013-03-14 17:59
Evgeny: I completely agree. It's unfortunate that argparse doesn't work that way. However, I think it's too late to change this behavior without adding a new parser. I don't think existing argparse can be changed to not operate the way it does, due to backward compatibility concerns. The discussion in this issue describes those compatibility concerns.
msg184180 - (view)Author: paul j3 (paul.j3) *Date: 2013-03-14 19:38
We need to be careful about when or where _negative_number_matcher is changed.

"
We basically do:

    parser = argparse.ArgumentParser(...)
    parser._negative_number_matcher = re.compile(r'^-.+$')
"

This changes the value for the parser itself, but not for the groups (_optionals, _positionals) or any subparsers. The code takes special care to make sure that the related property, _has_negative_number_optionals, is properly shared among all these ActionContainers.
msg184211 - (view)Author: paul j3 (paul.j3) *Date: 2013-03-15 04:43
parser._negative_number_matcher is used during parser.parse_args() to check whether an argument string is a 'negative number' (and hence whether to classify it as A or O), while parser._optionals._negative_number_matcher is used during parser.add_argument() to determine whether an option_string is a 'negative number', and hence whether to modify the _has_negative_number_optionals flag. If this matcher is the general r'^-.+$', adding the default '-h' will set this flag. We don't want that. Using a different matcher for these two containers might work, but is awfully kludgy.
msg184425 - (view)Author: paul j3 (paul.j3) *Date: 2013-03-18 05:45
I think the `re.compile(r'^-.+$')` behavior could be better achieved by inserting a simple test in `_parse_optional` before the `_negative_number_matcher` test.

    # behave more like optparse even if the argument looks like an option
    if self.args_default_to_positional:
        return None

In effect, if the string does not match an action string, say it is a positional. Making this patch to argparse.py is simple. How much to test it, and how to document it, requires more thought.
msg184987 - (view)Author: paul j3 (paul.j3) *Date: 2013-03-22 17:16
This patch makes two changes to argparse.py ArgumentParser._parse_optional():

- accept negative scientific and complex numbers
- add the args_default_to_positional parser option

_negative_number_matcher only matches integers and simple floats. This is fine for detecting number-like options like '-1'. But as used in _parse_optional() it prevents strings like '-1e4' and '-1-4j' from being classed as positionals (msg184174). In this patch it is replaced with

    try:
        complex(arg_string)
        return None
    except ValueError:
        pass

Immediately before this number test I added

    if self.args_default_to_positional:
        return None

to implement the idea suggested in msg169978.

I added the args_default_to_positional parser option to the documentation, along with some notes on its implications in the `Arguments containing -` section. A few of the examples that I added use scientific or complex numbers.

I tested test_argparse.py with args_default_to_positional=True as the default. A number of the 'failures' no longer failed. class TestDefaultToPositionalWithOptionLike illustrates this in the Option-Like situation.

The only 'successes' to fail were in the TestAddSubparsers case. There an argument string '0.5 -p 1 b -w 7' produced a 'wrong choice' error, since the '-p' was assumed to be a commands choice, rather than an unknown optional.

I translated the TestStandard cases from the optparse test file. argparse ran most of these without problem. The value of args_default_to_positional makes no difference. There are a few optparse tests that use '--' or a valid optional as a positional that argparse does not handle.
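The number test from the patch can be tried on its own. `looks_like_number` is a hypothetical standalone helper written for this illustration; the patch itself places the `complex()` call inline in `_parse_optional()`.

```python
def looks_like_number(arg_string):
    """Return True if Python can read arg_string as a (possibly complex) number."""
    try:
        complex(arg_string)
        return True
    except ValueError:
        return False

for candidate in ('-1', '-.002', '-1e4', '-2e-3', '-1-4j', '-one', '--safe'):
    print(candidate, looks_like_number(candidate))
```

One side effect worth keeping in mind is that `complex()` also accepts strings such as '-inf', so an option-like string of that form would be classed as a number by this test.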
msg239435 - (view)Author: paul j3 (paul.j3) *Date: 2015-03-27 20:51
http://bugs.python.org/issue22672 ("float arguments in scientific notation not supported by argparse") is a newer complaint about the same issue. I've closed it with a link to here.
msg251815 - (view)Author: Memeplex (memeplex)Date: 2015-09-29 03:28
What's missing for this patch to be applied? Can I help somehow?
msg251862 - (view)Author: Memeplex (memeplex)Date: 2015-09-29 14:41
Here is another manifestation of this problem: http://bugs.python.org/issue17050
msg263211 - (view)Author: Cherniavsky Beni (cben) *Date: 2016-04-11 22:03
+1, is there anything missing to apply Paul's patch?

Can I additionally suggest a change to the error message, e.g.:

    $ prog --foo -bar
    prog: error: argument --foo: expected one argument
    (tip: use --foo=-bar to force interpretation as argument of --foo)

This can be safely added in the current mode with no opt-in required, and will relieve the immediate "but what can I do?" confusions of users. The workaround is hard to discover otherwise, as `--foo=x` is typically equivalent to `--foo x`.

--- more discussion, though I suspect it's not productive ---

I've tried to find what the GNU Standards or POSIX say about this and was surprised to see neither explains how exactly `--opt_with_mandatory_argument -quux` behaves.

man getopt says:

    If such a character is followed by a colon, the option requires an argument, so getopt() places a pointer to the following text in the same argv-element, or the text of the following argv-element, in optarg. Two colons mean an option takes an optional arg; if there is text in the current argv-element (i.e., in the same word as the option name itself, for example, "-oarg"), then it is returned in optarg, otherwise optarg is set to zero. This is a GNU extension.

POSIX similarly does explain that an optional arg after an option must follow within the same argument:

    (2)(b) If the SYNOPSIS shows an optional option-argument (as with [ -f[ option_argument]] in the example), a conforming application shall place any option-argument for that option directly adjacent to the option in the same argument string, without intervening <blank> characters. If the utility receives an argument containing only the option, it shall behave as specified in its description for an omitted option-argument; it shall not treat the next argument (if any) as the option-argument for that option.
    -- http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

Anyway, every argument parsing library I've ever seen parses options in a left-to-right pass, consuming non-optional arguments after an option whatever they look like. I've never seen a difference between `--foo bar` and `--foo=bar` when bar is *non-optional*.

Both behaviors (--opt_with_mandatory_argument bar, --opt_with_optional_argument[=bar]) were clearly designed to avoid ambiguity. Whereas argparse innovated some constructs, e.g. '--opt', nargs='*', that are inherently ambiguous. But for the simple constructs, most notably nargs=1, there should be a way to get the traditional unix meaning.
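The difference between the joined and separated forms discussed in this message can be reproduced with a two-line parser. This is a small sketch; the program name `prog` and option `--foo` are taken from the example above, and the variable names are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="prog")
parser.add_argument("--foo")

# Joined form: the text after "=" is taken as the value, whatever it looks like.
joined = parser.parse_args(["--foo=-bar"])
print(joined.foo)

# Separated form: "-bar" is classified as option-like, so --foo is left
# without a value and argparse reports the error by raising SystemExit.
try:
    parser.parse_args(["--foo", "-bar"])
    separated_ok = True
except SystemExit:
    separated_ok = False
print(separated_ok)
```

The usage message that argparse writes to standard error in the second case is the "expected one argument" error quoted throughout this thread.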
msg263216 - (view)Author: Martin Panter (martin.panter) *Date: 2016-04-12 01:24
My main concern with the patch is that it only half fixes the problem. It sounds like it will allow parsing “--opt -x” (if “-x” is not registered as an option), but will still refuse “--opt -h”, assuming “-h” is registered by default. What is the barrier to parsing an argument to the option syntax independently of what option names are registered? Also, the name “args_default_to_positional=True” is both unwieldy and vague to me. The purpose seems to be to disable option-lookalike-strings from being reserved. Maybe call it something like “reserve_all_options=False” or “reserve_unregistered_options=False”? I left some thoughts in the code review for the documentation too.
msg276486 - (view)Author: Clint Olsen (Clint Olsen)Date: 2016-09-14 20:49
I'm not sure if this is applicable to this bug, but one feature missing from argparse is the ability to snarf arbitrary options up to a terminating '--'. The purpose of this is to collect arguments for potential children you may spawn. An example:

    --subscript_args --foo --bar --baz -- <other args>

So, if you ran

    args = parser.parse_args()

you would get

    args.subscript_args = [ '--foo', '--bar', '--baz' ]

Right now I have NO way of enabling this w/o writing my own argument parser, and I think it's bizarre that argparse can't do something like this. And no, I don't want to pass a singly-quoted string to this, so that I don't have to manually split() the arguments, which may or may not match what /bin/sh does.

Does this deserve its own enhancement request?
msg276503 - (view)Author: paul j3 (paul.j3) *Date: 2016-09-15 02:30
Clint, 'nargs=argparse.REMAINDER' ('...') may do what you want

    import argparse
    p=argparse.ArgumentParser()
    p.add_argument('--subscipt_args', nargs='...')
    p.add_argument('pos',nargs='*')
    p.parse_args('--subscipt_args --foo --bar --baz -- other args'.split())

produces

    Namespace(pos=['other', 'args'], subscipt_args=['--foo', '--bar', '--baz'])

'REMAINDER' is like '*' except it takes everything. But the '--' means 'everything that follows is a positional argument', so it effectively ends the 'REMAINDER'. 'REMAINDER' is documented (briefly), but I don't recall reading about its interaction with '--'. I'm a little surprised that it wasn't mentioned earlier in this bug/issue.

'A...', argparse.PARSER, is similar except it requires at least one argument. It is used by the 'subparsers' argument, to collect the cmd string and use all that follow as subparser arguments.

There is a bug issue (or two) about what should happen when there is more than one '--' argument.
msg276524 - (view)Author: Clint Olsen (Clint Olsen)Date: 2016-09-15 07:48
Thanks for the suggestion! It seems to be extremely limited, unfortunately. I don't want option processing to cease once I hit this switch.

    import argparse
    p=argparse.ArgumentParser()
    p.add_argument('--subscipt_args', nargs='...')
    #p.add_argument('pos',nargs='*')
    p.add_argument('--verbose', action='store_true')
    args = p.parse_args('--subscipt_args --foo --bar --baz -- --verbose '.split())
    print(args)

    usage: test.py [-h] [--subscipt_args ...] [--verbose]
    test.py: error: unrecognized arguments: -- --verbose
msg276634 - (view)Author: paul j3 (paul.j3) *Date: 2016-09-15 22:48
Clint, the problem is that argparse uses a different argument allocation method than optparse.

optparse gives the '--subscipt_args' Action all of the remaining strings, and says 'consume what you want, and return the rest'. So you consume up to (and including) the '--'. Then optparse continues with the rest.

argparse performs the double pass described earlier, and allocates strings to each Action based on the larger context. It gives as many as the Action's nargs requires, but tries in various ways to reserve strings for other Actions. Individual Actions never see the big picture.

So the 'REMAINDER' action has to be last (with this '--' positional exception). Your users will have to use the other flag Actions first.

One alternative comes to mind:

- omit the final positional
- use parse_known_args instead of parse_args

Then the 'extras' list will be something like ['--', '--verbose'], which you can handle in another parser.
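The parse_known_args alternative can be illustrated with a simplified variant of the situation: the wrapper defines only its own flags, and everything it does not recognise is handed on. This is a sketch under that assumption, not a reproduction of the exact command line above; the names `wrapper` and `child_args` are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="wrapper")
parser.add_argument("--verbose", action="store_true")

# parse_known_args returns the namespace plus the strings it did not consume.
known, extras = parser.parse_known_args(["--verbose", "--", "--foo", "--bar"])

# The "--" separator may itself appear in the leftovers, so drop it before
# passing the remaining strings to a child process or a second parser.
child_args = [arg for arg in extras if arg != "--"]

print(known.verbose)
print(child_args)
```

The leftover list can then be parsed by a second ArgumentParser or passed through unchanged.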
msg307193 - (view)Author: Evan Driscoll (evaned)Date: 2017-11-29 00:06
I ran into this issue today. (Or rather a couple weeks ago, and I just diagnosed it today.)

Reading through the thread and from the bug's age it looks like a fix is probably not too promising, but I read Cherniavsky Beni's 2016-04-11 22:03 comment

> Can I additionally suggest a change to the error message, e.g.:
>
>     $ prog --foo -bar
>     prog: error: argument --foo: expected one argument
>     (tip: use --foo=-bar to force interpretation as argument of --foo)
>
> This can be safely added in the current mode with no opt-in required,
> and will relieve the immediate "but what can I do?" confusions of
> users. The workaround is hard to discover otherwise, as `--foo=x` is
> typically equivalent to `--foo x`.

and found it intriguing.

Messing around with the code, I was able to produce the attached patch, which, when run on the test case in the original comment, produces this output:

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='a2x')
    >>> parser.add_argument('--asciidoc-opts',
    ...     action='store', dest='asciidoc_opts', default='',
    ...     metavar='ASCIIDOC_OPTS', help='asciidoc options')
    >>> parser.parse_args(['--asciidoc-opts', '--safe'])
    usage: a2x [-h] [--asciidoc-opts ASCIIDOC_OPTS]
    a2x: error: argument --asciidoc-opts: expected one argument (if you intended --safe to be the argument of --asciidoc-opts, pass --asciidoc-opts=--safe instead)

Would a cleaned-up version of this patch be of interest? (There are a couple obvious problems, like the out-of-bounds access to the list, PEP8, etc.) Is there some other way you could suggest to achieve this aim if you don't like that approach?

(I also think that nargs=## could maybe be special-cased to just ignore the A/O designation completely and only check there are enough, but I haven't tried this out. Does this seem like a viable approach? Would a patch that does that, subject to some flag, be of interest?)

The patch is relative to, I believe, the distribution version of 2.7.8. (Sorry, it's what I had handy as a custom build. :-) Updating it to .14 and to 3.whatever would be part of the cleanup.)
msg307209 - (view)Author: Evan Driscoll (evaned)Date: 2017-11-29 05:54
> I also think that nargs=## could maybe be special-cased to just ignore
> the A/O designation completely and only check there are enough, but I
> haven't tried this out. Does this seem like a viable approach? Would a
> patch that does that, subject to some flag, be of interest?

I can't leave well enough alone, so, with the following additional patch:

    -    def _match_argument(self, action, arg_strings_pattern):
    +    def _match_argument(self, action, arg_strings_pattern, arg_strings, start_index):
    +        import numbers
    +        nargs = action.nargs if action.nargs is not None else 1
    +        if isinstance(nargs, numbers.Number) and len(arg_strings_pattern) >= nargs:
    +            return nargs
    +
             # match the pattern for this action to the arg strings
             nargs_pattern = self._get_nargs_pattern(action)
             match = _re.match(nargs_pattern, arg_strings_pattern)
             ...

Then I get this:

    >>> import argparse
    >>> parser = argparse.ArgumentParser(prog='a2x')
    >>> parser.add_argument('--asciidoc-opts',
    ...     action='store', dest='asciidoc_opts', default='',
    ...     metavar='ASCIIDOC_OPTS', help='asciidoc options')
    >>> parser.parse_args(['--asciidoc-opts', '--safe'])
    Namespace(asciidoc_opts='--safe')

Comments on this approach?

(Again, I haven't run tests, it'd need to be controlled by a flag per your desire to not change existing behavior, etc.)
msg307210 - (view)Author: Evan Driscoll (evaned)Date: 2017-11-29 06:01
One last comment for the time being. I actually think *both* changes are valuable. Fixing the bug, well, fixes the bug if you can set the appropriate flag. The improved error message still helps for existing code and new code that *doesn't* set the flag.
msg307213 - (view)Author: Raymond Hettinger (rhettinger) *Date: 2017-11-29 08:24
Steven, do you care to put this to rest?
msg307786 - (view)Author: paul j3 (paul.j3) *Date: 2017-12-07 01:04
In the recently pushed https://bugs.python.org/issue14191, "argparse doesn't allow optionals within positionals", we added new parsing functionality by defining a new parser method: parse_intermixed_args. It added functionality without requiring a new parameter for the parser (no point in confusing users with parameters they don't need or understand). It was also a good fit because it worked on top of the default parser, fiddling with the nargs to parse positionals and options in different runs.

I would like to see something similar for this problem. Define a parser.parse_opt_args() method that tries, as much as possible, to follow the optparse strategy.

As I commented previously, the difference in behavior starts at the top. argparse distinguishes between flag (optional) and argument strings based on the dash(es), and then allocates strings to the Actions based on that pattern and nargs. It also alternates between handling positionals and optionals.

optparse passes all the remaining strings to an Action, lets it consume what it wants, and resumes parsing with the remainder. It does not handle positionals; those are just accumulated in an 'extras' list (sort of like parse_known_args without any defined positionals).

An unknown in this approach is whether the argparse.Action class(es) can be adapted to this 'consume what you want' strategy. It would be nice if such an alternative parser could be written that doesn't require any changes to the Action. We don't have to go so far as to allow custom Action classes that imitate optparse Options.

But I haven't thought about this problem since 2013. Nor do I sense, from other bugs/issues or Stack Overflow questions, that this is a pressing need.
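For readers unfamiliar with parse_intermixed_args (available from Python 3.7), the following sketch shows what it adds on top of the default parser. The argument names follow the example in the argparse documentation; only the intermixed result is shown here.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--foo")
parser.add_argument("cmd")
parser.add_argument("rest", nargs="*", type=int)

# Positionals may appear on both sides of the optional; they are gathered
# into the positional arguments in order.
args = parser.parse_intermixed_args("doit 1 --foo bar 2 3".split())
print(args)
```

Note that this method changes how positionals and optionals are interleaved; it does not change how an option-like string following an option is classified, which is the subject of this issue.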
msg309685 - (view)Author: Tom Karzes (karzes)Date: 2018-01-09 06:02
I'm dismayed to see that this bug was reported in 2010, yet as of January 2018 has not yet been fixed. This option parsing behavior is contrary to Unix option passing conventions. I certainly don't mind enhancements, but the last thing that should *ever* be introduced into an option parser is ambiguity. It's dangerous. I'm an old-school Unix guy. I know exactly what options I want to pass to my program, and I know exactly how they should be passed. Options first, with option arguments immediately following the option name, followed by positional arguments (with no more options being recognized after that point). If the first positional argument begins with a hyphen, "--" can be used to end option parsing. That's all I want. Simple. Precise. Unambiguous. And it doesn't place any constraints on what an option value can look like. Yes, I know I can get around the problem by using "=" to join option values to the preceding option names, but that shouldn't be necessary. In my opinion, the entire approach of attempting to distinguish option names from arguments out of context is fatally flawed. It cannot be done. Period. Furthermore, it's utterly unnecessary. All I want is a parameter I can pass to argparse to tell it to accept the entire command line at face value, without any second-guessing. If I specify an option that takes an integer argument, then the very next command line argument should be an integer. If it isn't, I want an immediate error. More to the point, if I specify an option that takes a string argument, then the very next command line argument is that string, period. It doesn't matter if it begins with a hyphen or any other character for that matter. It's a string. It can contain *any* character. And yet, argparse doesn't support this basic, fundamental functionality without the user inserting an "=" to join the arguments. Why?? Here's an analogy: Your favorite car company has released a new car. 
It adds lots of great features, like a sun roof and a fancy navigation system. But oops, the brakes no longer work if you're turning right when you apply them. But not to worry! You can always just throw it into park to stop the car. It's worth it for the new features, right? Wrong. It seems to me like there are three possible solutions to this: (1) Add a "traditional" mode to argparse that completely bypasses any attempt to classify command line arguments as "options" vs. "arguments", (2) Un-deprecate optparse, and resume development on it, adding support for some of the argparse features but without breaking standard Unix option parsing, or (3) Create yet another new option parsing package for Python, one which supports traditional Unix option parsing and doesn't introduce gratuitous ambiguities and restrictions on what strings can contain. My specific case: I have an option whose argument is a comma-separated list of signed integers, as a single string. This is the format another, non-Python application requires, and it needs to be compatible. But the following fails: --myopt -1,2 Apparently it thinks "-1,2" is an option name. No, it's a string option value, and should not be interpreted in any other way. I want it passed as the option value. I would want the same no matter *what* characters it contained. And I don't want to have to glue the two arguments together with "=".
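The specific case in this message can be reproduced side by side. This is a small sketch: the option name `--myopt` and the value `-1,2` come from the message, while the converter `signed_int_list` is a hypothetical helper added for illustration.

```python
import argparse
import optparse


def signed_int_list(text):
    """Convert a comma-separated string such as '-1,2' into a list of ints."""
    return [int(item) for item in text.split(",")]


# optparse: the string after the option is taken as its value as-is.
old_parser = optparse.OptionParser(prog="prog")
old_parser.add_option("--myopt")
old_values, old_rest = old_parser.parse_args(["--myopt", "-1,2"])
print(signed_int_list(old_values.myopt))

# argparse: the same value is accepted when joined to the option with "=".
new_parser = argparse.ArgumentParser(prog="prog")
new_parser.add_argument("--myopt", type=signed_int_list)
new_values = new_parser.parse_args(["--myopt=-1,2"])
print(new_values.myopt)
```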
msg309691 - (view)Author: Eric V. Smith (eric.smith) *Date: 2018-01-09 09:08
I tend to agree with you about pre-scanning the arguments to find options. But at this point, our options to change the code are limited. The last time I looked at this (and it's been years), I came to the conclusion that the argument pre-scanning was sufficiently baked in to argparse that a separate "traditional" mode was better done as a separate library. But I lack the time and energy to research if there's an existing third party library that's acceptable, what it would take to enhance optparse, or to write a new library. It sounds like what you want is optparse, but with help in processing positional arguments. Is that a fair statement? Or is there some other feature of argparse that's preventing you from using optparse? I know for me it's help with positional arguments. I think at some point we need to close this bug, because I don't see a way of modifying argparse to do what you (and I) want. paul.j3 explains several times in his messages on this thread that it's just how argparse fundamentally works.
msg309693 - (view)Author: Tom Karzes (karzes)Date: 2018-01-09 09:57
Here's my situation: I originally used optparse, although some of the guys I worked with at the time were starting to use argparse. At first I thought, I'm sticking with optparse, it's more standard than argparse and probably better supported. But at some point optparse became documented as deprecated, with argparse being hailed as its replacement. At that point my preference switched, again mostly because I wanted the most reliable, best supported option parsing package. And yes, I have come to appreciate some of the features of argparse, but to me that takes a back seat to correct functionality. The documentation for argparse promotes the idea that it completely subsumes optparse, and that applications that use optparse can easily be converted to use argparse. And for the most part that's true, except that optparse identifies options correctly and argparse does not. That really, really needs to be documented. What I want is a supported, standard option parsing library that knows how to extract option values correctly. I used to have that with optparse, but now I feel like the rug's been pulled out from under me. optparse is supposedly deprecated, and argparse doesn't work. So I either use a package that could be removed from Python distributions at any time, or I use a package that has dangerous bugs. I don't find either alternative very attractive. As I said, un-deprecating optparse would be sufficient, especially if people started adding some of the argparse functionality to it. Creating a new package would work too. But from what I've seen, it sounds like argparse is beyond hope of repair and will never work properly. I.e., it's a dead-end development path. So why isn't *argparse* deprecated? I've always maintained that core functionality is of primary importance. Shiny bells and whistles, no matter how useful, are of secondary importance. In my opinion, the wrong package was deprecated.
msg310540 - (view)Author: paul j3 (paul.j3) *Date: 2018-01-24 00:35
I attached a script that implements Evan's _match_argument idea, using an ArgumentParser subclass. I think this is the safest way to add different functionality to the parser. It (subclassing) is used, for example, in pypi extensions like plac. My version places the special nargs case after the default match test. So it acts only if the regular action fails. But I don't know of a test case where that difference matters. I've tested it with all the examples posted in this issue, but have not tested it against test_argparse.py. I'd also like to know if it goes far enough in adapting to optparse/POSIX usage. It probably doesn't.
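The attached script is not reproduced here. As a rough illustration of the approach described in this message, the sketch below subclasses ArgumentParser and falls back to a fixed-count match only when the regular match fails. It assumes the private method `_match_argument(action, arg_strings_pattern)` found in CPython's argparse.py, which is an implementation detail and may change; the class name `OptparseLikeParser` is made up.

```python
import argparse


class OptparseLikeParser(argparse.ArgumentParser):
    """Hypothetical subclass; relies on a private argparse method."""

    def _match_argument(self, action, arg_strings_pattern):
        try:
            # Regular argparse matching first.
            return super()._match_argument(action, arg_strings_pattern)
        except argparse.ArgumentError:
            # Fallback: an optional with a fixed argument count takes the
            # next strings whatever they look like, if enough remain.
            nargs = 1 if action.nargs is None else action.nargs
            enough = len(arg_strings_pattern) >= nargs if isinstance(nargs, int) else False
            if action.option_strings and enough:
                return nargs
            raise


parser = OptparseLikeParser(prog="a2x")
parser.add_argument("--asciidoc-opts", dest="asciidoc_opts", default="")
args = parser.parse_args(["--asciidoc-opts", "--safe"])
print(args.asciidoc_opts)
```

Because the fallback only runs after the default match raises, command lines that argparse already accepts should be parsed as before; that claim has not been checked against test_argparse.py.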

6. Simple statements

Simple statements are comprised within a single logical line. Several simple statements may occur on a single line separated by semicolons. The syntax for simple statements is:

simple_stmt ::= expression_stmt
                | assert_stmt
                | assignment_stmt
                | augmented_assignment_stmt
                | pass_stmt
                | del_stmt
                | print_stmt
                | return_stmt
                | yield_stmt
                | raise_stmt
                | break_stmt
                | continue_stmt
                | import_stmt
                | global_stmt
                | exec_stmt

6.1. Expression statements

Expression statements are used (mostly interactively) to compute and write a value, or (usually) to call a procedure (a function that returns no meaningful result; in Python, procedures return the value None). Other uses of expression statements are allowed and occasionally useful. The syntax for an expression statement is:

expression_stmt ::= expression_list

An expression statement evaluates the expression list (which may be a single expression).

In interactive mode, if the value is not None, it is converted to a string using the built-in repr() function and the resulting string is written to standard output (see section The print statement) on a line by itself. (Expression statements yielding None are not written, so that procedure calls do not cause any output.)

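6.2. Assignment statements

Assignment statements are used to (re)bind names to values and to modify attributes or items of mutable objects:

assignment_stmt ::= (target_list "=")+ (expression_list | yield_expression)
target_list     ::= target ("," target)* [","]
target          ::= identifier
                    | "(" target_list ")"
                    | "[" [target_list] "]"
                    | attributeref
                    | subscription
                    | slicing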

(See section Primaries for the syntax definitions for the last three symbols.)

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

Assignment is defined recursively depending on the form of the target (list). When a target is part of a mutable object (an attribute reference, subscription or slicing), the mutable object must ultimately perform the assignment and decide about its validity, and may raise an exception if the assignment is unacceptable. The rules observed by various types and the exceptions raised are given with the definition of the object types (see section The standard type hierarchy).

Assignment of an object to a target list is recursively defined as follows.

  • If the target list is a single target: The object is assigned to that target.
  • If the target list is a comma-separated list of targets: The object must be an iterable with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets.
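The two rules for target lists can be seen in a short example. This is an illustrative sketch; the variable names are arbitrary.

```python
# A comma-separated target list: the right-hand side must be an iterable
# with exactly as many items as there are targets.
a, b = 1, 2

# Targets may themselves be parenthesized target lists.
(first, second), third = (10, 20), 30

# A length mismatch is rejected with ValueError.
try:
    x, y = [1, 2, 3]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(a, b, first, second, third)
print(type(mismatch_error).__name__)
```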

Assignment of an object to a single target is recursively defined as follows.

  • If the target is an identifier (name):

    • If the name does not occur in a global statement in the current code block: the name is bound to the object in the current local namespace.
    • Otherwise: the name is bound to the object in the current global namespace.

    The name is rebound if it was already bound. This may cause the reference count for the object previously bound to the name to reach zero, causing the object to be deallocated and its destructor (if it has one) to be called.

  • If the target is a target list enclosed in parentheses or in square brackets: The object must be an iterable with the same number of items as there are targets in the target list, and its items are assigned, from left to right, to the corresponding targets.

  • If the target is an attribute reference: The primary expression in the reference is evaluated. It should yield an object with assignable attributes; if this is not the case, TypeError is raised. That object is then asked to assign the assigned object to the given attribute; if it cannot perform the assignment, it raises an exception (usually but not necessarily AttributeError).

    Note: If the object is a class instance and the attribute reference occurs on both sides of the assignment operator, the RHS expression, a.x can access either an instance attribute or (if no instance attribute exists) a class attribute. The LHS target a.x is always set as an instance attribute, creating it if necessary. Thus, the two occurrences of a.x do not necessarily refer to the same attribute: if the RHS expression refers to a class attribute, the LHS creates a new instance attribute as the target of the assignment:

        class Cls:
            x = 3             # class variable
        inst = Cls()
        inst.x = inst.x + 1   # writes inst.x as 4 leaving Cls.x as 3

    This description does not necessarily apply to descriptor attributes, such as properties created with property().

  • If the target is a subscription: The primary expression in the reference is evaluated. It should yield either a mutable sequence object (such as a list) or a mapping object (such as a dictionary). Next, the subscript expression is evaluated.

    If the primary is a mutable sequence object (such as a list), the subscript must yield a plain integer. If it is negative, the sequence’s length is added to it. The resulting value must be a nonnegative integer less than the sequence’s length, and the sequence is asked to assign the assigned object to its item with that index. If the index is out of range, IndexError is raised (assignment to a subscripted sequence cannot add new items to a list).

    If the primary is a mapping object (such as a dictionary), the subscript must have a type compatible with the mapping’s key type, and the mapping is then asked to create a key/datum pair which maps the subscript to the assigned object. This can either replace an existing key/value pair with the same key value, or insert a new key/value pair (if no key with the same value existed).

  • If the target is a slicing: The primary expression in the reference is evaluated. It should yield a mutable sequence object (such as a list). The assigned object should be a sequence object of the same type. Next, the lower and upper bound expressions are evaluated, insofar they are present; defaults are zero and the sequence’s length. The bounds should evaluate to (small) integers. If either bound is negative, the sequence’s length is added to it. The resulting bounds are clipped to lie between zero and the sequence’s length, inclusive. Finally, the sequence object is asked to replace the slice with the items of the assigned sequence. The length of the slice may be different from the length of the assigned sequence, thus changing the length of the target sequence, if the object allows it.
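The subscription and slicing rules above can be checked with a list and a dictionary. This is an illustrative sketch; the variable names are arbitrary.

```python
items = [10, 20, 30]
items[-1] = 99             # negative index: len(items) is added, so this is items[2]

# Assigning to an index past the end does not grow the list.
try:
    items[3] = 0
    grew = True
except IndexError:
    grew = False

table = {"a": 1}
table["a"] = 2             # replaces the existing key/value pair
table["b"] = 3             # inserts a new key/value pair

letters = list("abcde")
letters[1:3] = ["X"]       # a two-item slice replaced by a one-item sequence

print(items, grew)
print(sorted(table.items()))
print(letters)
```

The slice assignment changes the length of `letters`, which is the case the last bullet describes.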

CPython implementation detail: In the current implementation, the syntax for targets is taken to be the same as for expressions, and invalid syntax is rejected during the code generation phase, causing less detailed error messages.

WARNING: Although the definition of assignment implies that overlaps between the left-hand side and the right-hand side are ‘safe’ (for example a, b = b, a swaps two variables), overlaps within the collection of assigned-to variables are not safe! For instance, the following program prints [0, 2]:

x = [0, 1]
i = 0
i, x[i] = 1, 2
print x

6.2.1. Augmented assignment statements

Augmented assignment is the combination, in a single statement, of a binary operation and an assignment statement:

augmented_assignment_stmt ::= augtarget augop (expression_list | yield_expression)
augtarget ::= identifier | attributeref | subscription | slicing
augop ::= "+=" | "-=" | "*=" | "/=" | "//=" | "%=" | "**=" | ">>=" | "<<=" | "&=" | "^=" | "|="

(See section Primaries for the syntax definitions for the last three symbols.)

An augmented assignment evaluates the target (which, unlike normal assignment statements, cannot be an unpacking) and the expression list, performs the binary operation specific to the type of assignment on the two operands, and assigns the result to the original target. The target is only evaluated once.

An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.

With the exception of assigning to tuples and multiple targets in a single statement, the assignment done by augmented assignment statements is handled the same way as normal assignments. Similarly, with the exception of the possible in-place behavior, the binary operation performed by augmented assignment is the same as the normal binary operations.

For targets which are attribute references, the same caveat about class and instance attributes applies as for regular assignments.
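The difference between an augmented assignment and its spelled-out form can be observed with a list, which supports in-place addition. The sketch below is illustrative only; the helper index() is hypothetical and exists just to count how often the target's subscript is evaluated.

```python
# In-place behavior: the existing list object is modified.
a = [1, 2]
alias_a = a
a += [3]
in_place = alias_a is a          # same object, alias sees the change

# Spelled-out form: a new list is created and rebound to the name.
b = [1, 2]
alias_b = b
b = b + [3]
rebound = alias_b is not b       # alias still refers to the old list

# The target is evaluated only once.
calls = []

def index():
    calls.append("called")
    return 0

data = [10]
data[index()] += 1               # index() runs a single time
```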

6.3. The assert statement

Assert statements are a convenient way to insert debugging assertions into a program:

assert_stmt ::= "assert" expression ["," expression]

The simple form, assert expression, is equivalent to

if __debug__:
    if not expression: raise AssertionError

The extended form, assert expression1, expression2, is equivalent to

if __debug__:
    if not expression1: raise AssertionError(expression2)

These equivalences assume that __debug__ and AssertionError refer to the built-in variables with those names. In the current implementation, the built-in variable __debug__ is True under normal circumstances, False when optimization is requested (command line option -O). The current code generator emits no code for an assert statement when optimization is requested at compile time. Note that it is unnecessary to include the source code for the expression that failed in the error message; it will be displayed as part of the stack trace.

Assignments to __debug__ are illegal. The value for the built-in variable is determined when the interpreter starts.
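As a small illustration of the extended form, the sketch below uses a hypothetical function mean() whose assertion carries a message. It assumes the interpreter is running without -O, so that the assertion is actually checked.

```python
def mean(values):
    # The second expression becomes the argument of AssertionError.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / float(len(values))

message = None
try:
    mean([])
except AssertionError as exc:
    message = str(exc)

average = mean([1, 2, 3])
```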

6.4. The pass statement

pass_stmt ::= "pass"

pass is a null operation: when it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically, but no code needs to be executed, for example:

def f(arg): pass    # a function that does nothing (yet)

class C: pass       # a class with no methods (yet)

6.5. The del statement

del_stmt ::= "del" target_list

Deletion is recursively defined in a way very similar to assignment. Rather than spelling it out in full detail, here are some hints.

Deletion of a target list recursively deletes each target, from left to right.

Deletion of a name removes the binding of that name from the local or global namespace, depending on whether the name occurs in a global statement in the same code block. If the name is unbound, a NameError exception will be raised.

It is illegal to delete a name from the local namespace if it occurs as a free variable in a nested block.

Deletion of attribute references, subscriptions and slicings is passed to the primary object involved; deletion of a slicing is in general equivalent to assignment of an empty slice of the right type (but even this is determined by the sliced object).
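The hints above can be checked with a short sketch. The names are made up for the example; the comparison at the end illustrates that, for a list, deleting a slice behaves like assigning an empty list to it.

```python
x = 1
del x                      # removes the binding of the name x

try:
    x
    still_bound = True
except NameError:          # the name is unbound after deletion
    still_bound = False

deleted = [0, 1, 2, 3, 4]
del deleted[1:3]           # deletion is passed to the list object

assigned = [0, 1, 2, 3, 4]
assigned[1:3] = []         # equivalent for lists: assign an empty slice
```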

6.6. The print statement

print_stmt ::= "print" ([expression ("," expression)* [","]]
               | ">>" expression [("," expression)+ [","]])

print evaluates each expression in turn and writes the resulting object to standard output (see below). If an object is not a string, it is first converted to a string using the rules for string conversions. The (resulting or original) string is then written. A space is written before each object is (converted and) written, unless the output system believes it is positioned at the beginning of a line. This is the case (1) when no characters have yet been written to standard output, (2) when the last character written to standard output is a whitespace character except ' ', or (3) when the last write operation on standard output was not a print statement. (In some cases it may be functional to write an empty string to standard output for this reason.)

Note

Objects which act like file objects but which are not the built-in file objects often do not properly emulate this aspect of the file object’s behavior, so it is best not to rely on this.

A '\n' character is written at the end, unless the print statement ends with a comma. This is the only action if the statement contains just the keyword print.

Standard output is defined as the file object named stdout in the built-in module sys. If no such object exists, or if it does not have a write() method, a RuntimeError exception is raised.

print also has an extended form, defined by the second portion of the syntax described above. This form is sometimes referred to as “print chevron.” In this form, the first expression after the >> must evaluate to a “file-like” object, specifically an object that has a write() method as described above. With this extended form, the subsequent expressions are printed to this file object. If the first expression evaluates to None, then sys.stdout is used as the file for output.
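The requirement that the target of the extended form only needs a write() method can be sketched as follows. Note the assumption: this sketch uses the print() function of Python 3 with its file argument, which plays the same role as the chevron; in Python 2 the corresponding statement would be print >>sink, "a", "b". The class Collector is hypothetical.

```python
class Collector(object):
    """Hypothetical file-like object: all it provides is write()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

sink = Collector()

# Function form (Python 3); the separator and the trailing newline
# are passed to write() just like the converted objects.
print("a", "b", file=sink)

captured = "".join(sink.chunks)
```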

6.7. The return statement

return_stmt ::= "return" [expression_list]

return may only occur syntactically nested in a function definition, not within a nested class definition.

If an expression list is present, it is evaluated, else None is substituted.

return leaves the current function call with the expression list (or None) as return value.

When return passes control out of a try statement with a finally clause, that finally clause is executed before really leaving the function.

In a generator function, the return statement is not allowed to include an expression_list. In that context, a bare return indicates that the generator is done and will cause StopIteration to be raised.
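A brief sketch of the two less obvious rules: a finally clause runs before the function is really left, and a bare return ends a generator. The functions load() and countdown() are hypothetical.

```python
def load(log):
    try:
        return "value"
    finally:
        # Executed before the function is really left.
        log.append("finally ran")

def countdown(n):
    while True:
        if n == 0:
            return            # bare return: the generator is done
        yield n
        n -= 1

log = []
result = load(log)
remaining = list(countdown(3))    # list() stops on StopIteration
```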

6.8. The yield statement

yield_stmt ::= yield_expression

The yield statement is only used when defining a generator function, and is only used in the body of the generator function. Using a yield statement in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.

When a generator function is called, it returns an iterator known as a generator iterator, or more commonly, a generator. The body of the generator function is executed by calling the generator’s next() method repeatedly until it raises an exception.

When a yield statement is executed, the state of the generator is frozen and the value of expression_list is returned to next()’s caller. By “frozen” we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time next() is invoked, the function can proceed exactly as if the yield statement were just another external call.

As of Python version 2.5, the yield statement is now allowed in the try clause of a try ... finally construct. If the generator is not resumed before it is finalized (by reaching a zero reference count or by being garbage collected), the generator-iterator’s close() method will be called, allowing any pending finally clauses to execute.

For full details of yield semantics, refer to the Yield expressions section.
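The frozen state and the effect of close() can be sketched as below. The built-in next() function is used instead of calling the method directly, because it works on Python 2.6 and later as well as on Python 3 (where the method is named __next__). The generator ticker() is hypothetical.

```python
def ticker(events):
    count = 0
    try:
        while True:
            count += 1
            yield count           # local state is frozen here
    finally:
        events.append("cleanup")  # runs when the generator is closed

events = []
gen = ticker(events)
first = next(gen)                 # runs the body up to the first yield
second = next(gen)                # resumes with count still bound

gen.close()                       # lets the pending finally clause run
```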

Note

In Python 2.2, the yield statement was only allowed when the generators feature has been enabled. This __future__ import statement was used to enable the feature:

from __future__ import generators

See also

PEP 255 - Simple Generators
The proposal for adding generators and the yield statement to Python.
PEP 342 - Coroutines via Enhanced Generators
The proposal that, among other generator enhancements, proposed allowing yield to appear inside a try ... finally block.

6.9. The raise statement

raise_stmt ::= "raise" [expression ["," expression ["," expression]]]

If no expressions are present, raise re-raises the last exception that was active in the current scope. If no exception is active in the current scope, a TypeError exception is raised indicating that this is an error (if running under IDLE, a Queue.Empty exception is raised instead).

Otherwise, raise evaluates the expressions to get three objects, using None as the value of omitted expressions. The first two objects are used to determine the type and value of the exception.

If the first object is an instance, the type of the exception is the class of the instance, the instance itself is the value, and the second object must be None.

If the first object is a class, it becomes the type of the exception. The second object is used to determine the exception value: If it is an instance of the class, the instance becomes the exception value. If the second object is a tuple, it is used as the argument list for the class constructor; if it is None, an empty argument list is used, and any other object is treated as a single argument to the constructor. The instance so created by calling the constructor is used as the exception value.

If a third object is present and not None, it must be a traceback object (see section The standard type hierarchy), and it is substituted instead of the current location as the place where the exception occurred. If the third object is present and not a traceback object or None, a TypeError exception is raised. The three-expression form of raise is useful to re-raise an exception transparently in an except clause, but raise with no expressions should be preferred if the exception to be re-raised was the most recently active exception in the current scope.

Additional information on exceptions can be found in section Exceptions, and information about handling exceptions is in section The try statement.
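The preferred way to re-raise the active exception, raise with no expressions, can be sketched as follows. Only forms that are also valid in Python 3 are used here; the multi-expression forms described above are specific to Python 2. The function parse() is hypothetical.

```python
def parse(text, log):
    try:
        return int(text)
    except ValueError:
        log.append("bad input: %r" % (text,))
        raise                     # re-raise the active exception

log = []
caught = None
try:
    parse("abc", log)
except ValueError as exc:
    caught = exc                  # the same ValueError propagated
```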

6.10. The break statement

break_stmt ::= "break"

break may only occur syntactically nested in a for or while loop, but not nested in a function or class definition within that loop.

It terminates the nearest enclosing loop, skipping the optional else clause if the loop has one.

If a for loop is terminated by break, the loop control target keeps its current value.

When break passes control out of a try statement with a finally clause, that finally clause is executed before really leaving the loop.
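The interaction of break with the else clause, the loop control target, and finally can be sketched with a hypothetical search function:

```python
def find(items, target):
    log = []
    for item in items:
        try:
            if item == target:
                break             # skips the else clause below
        finally:
            log.append(item)      # runs even when break leaves the try
    else:
        log.append("not found")   # only when the loop was not broken
    return item, log              # item keeps its current value

hit = find([1, 2, 3], 2)
miss = find([1, 2, 3], 9)
```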

6.11. The continue statement

continue_stmt ::= "continue"

continue may only occur syntactically nested in a for or while loop, but not nested in a function or class definition or finally clause within that loop. It continues with the next cycle of the nearest enclosing loop.

When continue passes control out of a try statement with a finally clause, that finally clause is executed before really starting the next loop cycle.
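A minimal sketch of continue inside a try statement: the finally clause still runs on every cycle, including the ones that are cut short.

```python
evens = []
cleanups = []

for n in range(4):
    try:
        if n % 2:
            continue              # skip odd numbers
        evens.append(n)
    finally:
        cleanups.append(n)        # executed before the next cycle starts
```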

6.12. The import statement

import_stmt ::= "import" module ["as" name] ( "," module ["as" name] )*
                | "from" relative_module "import" identifier ["as" name] ( "," identifier ["as" name] )*
                | "from" relative_module "import" "(" identifier ["as" name] ( "," identifier ["as" name] )* [","] ")"
                | "from" module "import" "*"
module ::= (identifier ".")* identifier
relative_module ::= "."* module | "."+
name ::= identifier

Import statements are executed in two steps: (1) find a module, and initialize it if necessary; (2) define a name or names in the local namespace (of the scope where the import statement occurs). The statement comes in two forms differing on whether it uses the from keyword. The first form (without from) repeats these steps for each identifier in the list. The form with from performs step (1) once, and then performs step (2) repeatedly.

To understand how step (1) occurs, one must first understand how Python handles hierarchical naming of modules. To help organize modules and provide a hierarchy in naming, Python has a concept of packages. A package can contain other packages and modules while modules cannot contain other modules or packages. From a file system perspective, packages are directories and modules are files.

Once the name of the module is known (unless otherwise specified, the term “module” will refer to both packages and modules), searching for the module or package can begin. The first place checked is sys.modules, the cache of all modules that have been imported previously. If the module is found there then it is used in step (2) of import.
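The role of sys.modules as a cache can be observed directly. The sketch uses the standard json module purely as an example of something to import.

```python
import sys
import json                       # step (1) runs, module is cached
import json as json_again         # found in sys.modules, not reloaded

cached = sys.modules["json"]
```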

If the module is not found in the cache, then sys.meta_path is searched (the specification for sys.meta_path can be found in PEP 302). The object is a list of finder objects which are queried in order as to whether they know how to load the module by calling their find_module() method with the name of the module. If the module happens to be contained within a package (as denoted by the existence of a dot in the name), then a second argument to find_module() is given as the value of the __path__ attribute from the parent package (everything up to the last dot in the name of the module being imported). If a finder can find the module it returns a loader (discussed later); otherwise it returns None.

If none of the finders on sys.meta_path are able to find the module then some implicitly defined finders are queried. Implementations of Python vary in what implicit meta path finders are defined. The one they all do define, though, is one that handles sys.path_hooks, sys.path_importer_cache, and sys.path.

The implicit finder searches for the requested module in the “paths” specified in one of two places (“paths” do not have to be file system paths). If the module being imported is supposed to be contained within a package then the second argument passed to find_module(), __path__ on the parent package, is used as the source of paths. If the module is not contained in a package then sys.path is used as the source of paths.

Once the source of paths is chosen it is iterated over to find a finder that can handle that path. The dict at sys.path_importer_cache caches finders for paths and is checked for a finder. If the path does not have a finder cached then sys.path_hooks is searched by calling each object in the list with a single argument of the path; each call either returns a finder or raises ImportError. If a finder is returned then it is cached in sys.path_importer_cache and then used for that path entry. If no finder can be found but the path exists then a value of None is stored in sys.path_importer_cache to signify that an implicit, file-based finder that handles modules stored as individual files should be used for that path. If the path does not exist then a finder which always returns None is placed in the cache for the path.

If no finder can find the module then ImportError is raised. Otherwise some finder returned a loader whose load_module() method is called with the name of the module to load (see PEP 302 for the original definition of loaders). A loader has several responsibilities to perform on a module it loads. First, if the module already exists in sys.modules (a possibility if the loader is called outside of the import machinery) then it is to use that module for initialization and not a new module. But if the module does not exist in sys.modules then it is to be added to that dict before initialization begins. If an error occurs during loading of the module and it was added to sys.modules it is to be removed from the dict. If an error occurs but the module was already in sys.modules it is left in the dict.

The loader must set several attributes on the module. __name__ is to be set to the name of the module. __file__ is to be the “path” to the file unless the module is built-in (and thus listed in sys.builtin_module_names) in which case the attribute is not set. If what is being imported is a package then __path__ is to be set to a list of paths to be searched when looking for modules and packages contained within the package being imported. __package__ is optional but should be set to the name of the package that contains the module or package (the empty string is used for a module not contained in a package). __loader__ is also optional but should be set to the loader object that is loading the module.
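Some of the attributes a loader sets can be inspected on modules from the standard library. This is only a spot check under the assumption of a CPython interpreter, where json is a package and sys is a built-in module.

```python
import sys
import json

name = json.__name__                      # set to the module's name
is_package = hasattr(json, "__path__")    # packages carry __path__

# Built-in modules are listed in sys.builtin_module_names and
# do not get a __file__ attribute.
sys_is_builtin = "sys" in sys.builtin_module_names
sys_has_file = hasattr(sys, "__file__")
```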

If an error occurs during loading then the loader raises ImportError if some other exception is not already being propagated. Otherwise the loader returns the module that was loaded and initialized.

When step (1) finishes without raising an exception, step (2) can begin.

The first form of import statement binds the module name in the local namespace to the module object, and then goes on to import the next identifier, if any. If the module name is followed by as, the name following as is used as the local name for the module.

The from form does not bind the module name: it goes through the list of identifiers, looks each one of them up in the module found in step (1), and binds the name in the local namespace to the object thus found. As with the first form of import, an alternate local name can be supplied by specifying “as localname”. If a name is not found, ImportError is raised. If the list of identifiers is replaced by a star ('*'), all public names defined in the module are bound in the local namespace of the import statement.

The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in __all__ are all considered public and are required to exist. If __all__ is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ('_'). __all__ should contain the entire public API. It is intended to avoid accidentally exporting items that are not part of the API (such as library modules which were imported and used within the module).
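The effect of __all__ on the star form can be sketched without creating files by registering a module object by hand. Everything here is hypothetical scaffolding: the module name hypothetical_mod and its contents are invented, and a real module would normally live in its own file.

```python
import sys
import types

# Build a throwaway module and register it so import can find it.
mod = types.ModuleType("hypothetical_mod")
source = (
    "__all__ = ['public']\n"
    "public = 1\n"
    "unlisted = 2\n"
    "_private = 3\n"
)
exec(source, mod.__dict__)
sys.modules["hypothetical_mod"] = mod

# Run the star import in a separate namespace so the result is visible.
namespace = {}
exec("from hypothetical_mod import *", namespace)

imported = sorted(k for k in namespace if k != "__builtins__")

# Clean up the hand-made cache entry.
del sys.modules["hypothetical_mod"]
```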

The from form with * may only occur in a module scope. If the wild card form of import (import *) is used in a function and the function contains or is a nested block with free variables, the compiler will raise a SyntaxError.

When specifying what module to import you do not have to specify the absolute name of the module. When a module or package is contained within another package it is possible to make a relative import within the same top package without having to mention the package name. By using leading dots in the specified module or package after from you can specify how high to traverse up the current package hierarchy without specifying exact names. One leading dot means the current package where the module making the import exists. Two dots means up one package level. Three dots is up two levels, etc. So if you execute from . import mod from a module in the pkg package then you will end up importing pkg.mod. If you execute from ..subpkg2 import mod from within pkg.subpkg1 you will import pkg.subpkg2.mod. The specification for relative imports is contained within PEP 328.

importlib.import_module() is provided to support applications that determine which modules need to be loaded dynamically.
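A minimal sketch of dynamic loading with importlib.import_module(), where the module name is only known as a string at run time. The name "json" stands in for whatever an application would compute.

```python
import importlib

module_name = "json"              # e.g. read from configuration
module = importlib.import_module(module_name)

encoded = module.dumps([1, 2])
```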

6.12.1. Future statements

A future statement is a directive to the compiler that a particular module should be compiled using syntax or semantics that will be available in a specified future release of Python. The future statement is intended to ease migration to future versions of Python that introduce incompatible changes to the language. It allows use of the new features on a per-module basis before the release in which the feature becomes standard.

future_statement ::= "from" "__future__" "import" feature ["as" name] ("," feature ["as" name])*
                     | "from" "__future__" "import" "(" feature ["as" name] ("," feature ["as" name])* [","] ")"
feature ::= identifier
name ::= identifier

A future statement must appear near the top of the module. The only lines that can appear before a future statement are:

  • the module docstring (if any),
  • comments,
  • blank lines, and
  • other future statements.

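The features recognized by Python 2.6 are unicode_literals, print_function, absolute_import, division, generators, nested_scopes and with_statement. generators, with_statement and nested_scopes are redundant in Python version 2.6 and above because they are always enabled.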

A future statement is recognized and treated specially at compile time: Changes to the semantics of core constructs are often implemented by generating different code. It may even be the case that a new feature introduces new incompatible syntax (such as a new reserved word), in which case the compiler may need to parse the module differently. Such decisions cannot be pushed off until runtime.

For any given release, the compiler knows which feature names have been defined, and raises a compile-time error if a future statement contains a feature not known to it.

The direct runtime semantics are the same as for any import statement: there is a standard module __future__, described later, and it will be imported in the usual way at the time the future statement is executed.

The interesting runtime semantics depend on the specific feature enabled by the future statement.

Note that there is nothing special about the statement:

import __future__ [as name]

That is not a future statement; it’s an ordinary import statement with no special semantics or syntax restrictions.

Code compiled by an exec statement or calls to the built-in functions compile() and execfile() that occur in a module containing a future statement will, by default, use the new syntax or semantics associated with the future statement. This can, starting with Python 2.2, be controlled by optional arguments to compile(); see the documentation of that function for details.

A future statement typed at an interactive interpreter prompt will take effect for the rest of the interpreter session. If an interpreter is started with the -i option, is passed a script name to execute, and the script includes a future statement, it will be in effect in the interactive session started after the script is executed.

See also

PEP 236 - Back to the __future__
The original proposal for the __future__ mechanism.
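Because import __future__ is an ordinary import, the module can be inspected like any other. The sketch below looks at the division feature; it relies on the all_feature_names list and the getOptionalRelease() and getMandatoryRelease() methods that the standard __future__ module provides.

```python
import __future__                 # ordinary import, no special semantics

feature = __future__.division
known = __future__.all_feature_names

# Release in which the feature became available via a future statement,
# and release in which it became (or will become) the default.
optional = feature.getOptionalRelease()
mandatory = feature.getMandatoryRelease()
```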

6.13. The global statement

global_stmt ::= "global" identifier ("," identifier)*

The global statement is a declaration which holds for the entire current code block. It means that the listed identifiers are to be interpreted as globals. It would be impossible to assign to a global variable without global, although free variables may refer to globals without being declared global.

Names listed in a global statement must not be used in the same code block textually preceding that global statement.

Names listed in a global statement must not be defined as formal parameters or in a for loop control target, class definition, function definition, or import statement.

CPython implementation detail: The current implementation does not enforce the latter two restrictions, but programs should not abuse this freedom, as future implementations may enforce them or silently change the meaning of the program.

Programmer’s note: global is a directive to the parser. It applies only to code parsed at the same time as the global statement. In particular, a global statement contained in an exec statement does not affect the code block containing the exec statement, and code contained in an exec statement is unaffected by global statements in the code containing the exec statement. The same applies to the eval(), execfile() and compile() functions.
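A short sketch of the difference between assigning to and merely reading a global, assuming the code runs at module level. The names counter, bump() and peek() are made up for the example.

```python
counter = 0

def bump():
    global counter                # required in order to assign
    counter += 1

def peek():
    return counter                # reading a global needs no declaration

bump()
bump()
seen = peek()
```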

6.14. The exec statement

exec_stmt ::= "exec" or_expr ["in" expression ["," expression]]

This statement supports dynamic execution of Python code. The first expression should evaluate to either a Unicode string, a Latin-1 encoded string, an open file object, a code object, or a tuple. If it is a string, the string is parsed as a suite of Python statements which is then executed (unless a syntax error occurs). [1] If it is an open file, the file is parsed until EOF and executed. If it is a code object, it is simply executed. For the interpretation of a tuple, see below. In all cases, the code that’s executed is expected to be valid as file input (see section File input). Be aware that the return and yield statements may not be used outside of function definitions even within the context of code passed to the exec statement.

In all cases, if the optional parts are omitted, the code is executed in the current scope. If only the first expression after in is specified, it should be a dictionary, which will be used for both the global and the local variables. If two expressions are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object. Remember that at module level, globals and locals are the same dictionary. If two separate objects are given as globals and locals, the code will be executed as if it were embedded in a class definition.

The first expression may also be a tuple of length 2 or 3. In this case, the optional parts must be omitted. The form exec(expr, globals) is equivalent to exec expr in globals, while the form exec(expr, globals, locals) is equivalent to exec expr in globals, locals. The tuple form of exec provides compatibility with Python 3, where exec is a function rather than a statement.

Changed in version 2.4: Formerly, locals was required to be a dictionary.

As a side effect, an implementation may insert additional keys into the dictionaries given besides those corresponding to variable names set by the executed code. For example, the current implementation may add a reference to the dictionary of the built-in module __builtin__ under the key __builtins__ (!).

Programmer’s hints: dynamic evaluation of expressions is supported by the built-in function eval(). The built-in functions globals() and locals() return the current global and local dictionary, respectively, which may be useful to pass around for use by exec.
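The tuple form mentioned above is the one spelling that works both as a Python 2 statement and as a Python 3 function call, so the sketch below uses it. It executes a small suite in a fresh dictionary, shows the extra __builtins__ key as a side effect, and then evaluates an expression against the same dictionary.

```python
namespace = {}

# Equivalent to "exec source in namespace" in Python 2.
exec("total = sum(range(4))", namespace)

total = namespace["total"]
has_builtins_key = "__builtins__" in namespace   # inserted as a side effect

# Expressions are evaluated with eval() rather than exec.
doubled = eval("total * 2", namespace)
```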

Footnotes
