Commit f9900f3b authored by Nong Hoang Tu

New upstream version 3.7.4

version: 2
jobs:
  build:
    working_directory: ~/rocky/python-uncompyle6
    parallelism: 1
    shell: /bin/bash --login
    # CircleCI 2.0 does not support environment variables that refer to each other the same way as 1.0 did.
    # If any of these refer to each other, rewrite them so that they don't or see https://circleci.com/docs/2.0/env-vars/#interpolating-environment-variables-to-set-other-environment-variables .
    environment:
      CIRCLE_ARTIFACTS: /tmp/circleci-artifacts
      CIRCLE_TEST_REPORTS: /tmp/circleci-test-results
      COMPILE: --compile
    # To see the list of pre-built images that CircleCI provides for most common languages see
    # https://circleci.com/docs/2.0/circleci-images/
    docker:
    - image: circleci/python:3.6.9
    steps:
    # Machine Setup
    # If you break your build into multiple jobs with workflows, you will probably want to do the parts of this that are relevant in each
    # The following `checkout` command checks out your code to your working directory. In 1.0 we did this implicitly. In 2.0 you can choose where in the course of a job your code should be checked out.
    - checkout
    # Prepare for artifact and test results collection equivalent to how it was done on 1.0.
    # In many cases you can simplify this from what is generated here.
    # 'See docs on artifact collection here https://circleci.com/docs/2.0/artifacts/'
    - run: mkdir -p $CIRCLE_ARTIFACTS $CIRCLE_TEST_REPORTS
    # This is based on your 1.0 configuration file or project settings
    - run:
        working_directory: ~/rocky/python-uncompyle6
        command: pip install --user virtualenv && pip install --user nose && pip install --user pep8
    # Dependencies
    # This would typically go in either a build or a build-and-test job when using workflows
    # Restore the dependency cache
    - restore_cache:
        keys:
        - v2-dependencies-{{ .Branch }}-
        # fallback to using the latest cache if no exact match is found
        - v2-dependencies-
    - run:
        command: |  # Use pip to install dependencies
          pip install --user --upgrade setuptools
          pip install --user -e .
          pip install --user -r requirements-dev.txt
    # Save dependency cache
    - save_cache:
        key: v2-dependencies-{{ .Branch }}-{{ epoch }}
        paths:
        # This is a broad list of cache paths to include many possible development environments
        # You can probably delete some of these entries
        - vendor/bundle
        - ~/virtualenvs
        - ~/.m2
        - ~/.ivy2
        - ~/.bundle
        - ~/.cache/bower
    # Test
    # This would typically be a build job when using workflows, possibly combined with build
    # This is based on your 1.0 configuration file or project settings
    - run: sudo python ./setup.py develop && make check-3.6
    - run: cd ./test/stdlib && bash ./runtests.sh 'test_[p-z]*.py'
    # Teardown
    # If you break your build into multiple jobs with workflows, you will probably want to do the parts of this that are relevant in each
    # Save test results
    - store_test_results:
        path: /tmp/circleci-test-results
    # Save artifacts
    - store_artifacts:
        path: /tmp/circleci-artifacts
    - store_artifacts:
        path: /tmp/circleci-test-results
# These are supported funding model platforms
github: [rocky]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
---
name: Bug report
about: Tell us about uncompyle6 bugs
---
<!-- __Note:__ Bugs are not for asking questions about a problem you
are trying to solve that involves the use of uncompyle6 along the way,
although I may be more tolerant of this if you sponsor the project.
Also, unless you are a sponsor of the project, it may take a
while, maybe a week or so, before the bug report is noticed, let alone
acted upon.
To set expectations, some legitimate bugs can take years
to fix, but they eventually do get fixed. Funding the project was
added to address the problem that there are lots of people seeking
help and reporting bugs, but few people who are willing or capable of
providing help or fixing bugs.
Finally, have you read https://github.com/rocky/python-uncompyle6/blob/master/HOW-TO-REPORT-A-BUG.md
?
Please remove any of the optional sections if they are not applicable.
Prerequisites/Caveats
* Make sure the bytecode you have can be disassembled with a
disassembler and produces valid results.
* Don't put bytecode and corresponding source code on any service that
requires registration to download.
* When you open a bug report there is no privacy. If you need privacy, then
contact me by email and explain who you are and the need for privacy.
But be mindful that you may be asked to sponsor the project for the
personal and private help that you are requesting.
* If the legitimacy of the activity is deemed suspicious, I may flag it as such,
making the issue even easier to detect.
Bug reports that violate the above may be discarded.
-->
## Description
<!-- Add a clear and concise description of the bug. -->
## How to Reproduce
<!-- In describing how to reproduce the bug, please show both the *input*
you gave and the output you got, or give a complete console log with
input and output:
```console
$ uncompyle6 <command-line-options>
...
$
```
Provide links to the Python bytecode. For example you can create a
gist with the information. If you have the correct source code, you
can add that too.
-->
## Expected behavior
<!-- Add a clear and concise description of what you expected to happen. -->
## Environment
<!-- _This section sometimes is optional but helpful to us._
Please modify for your setup
- Uncompyle6 version: output from `uncompyle6 --version` or `pip show uncompyle6`
- Python version of the interpreter that byte-compiled the file: `python -c "import sys; print(sys.version)"`, where `python` is the correct CPython or PyPy binary.
- OS and Version: [e.g. Ubuntu bionic]
-->
## Additional Environment or Context
<!-- _This section is optional._
Add any other context about the problem here or special environment setup.
-->
---
name: Feature Request
about: Tell us about a new feature that you would like to see in uncompyle6
---
## Description
<!-- Add a short description of the feature. This might
include sample input and output. -->
## Background
<!-- Add any additional background for the
feature, for example: user scenarios, or the value of the feature. -->
## Tests
<!-- _This section is optional._
Add text with suggestions on how to test the feature,
if it is not obvious.
-->
*.pyc
*.pyo
*_dis
*~
/.cache
/.eggs
/.hypothesis
/.idea
/.mypy_cache
/.pytest_cache
/.python-version
/.tox
.mypy_cache
/.venv*
/README
/__pkginfo__.pyc
/dist
/how-to-make-a-release.txt
/nose-*.egg
/tmp
/uncompyle6.egg-info
/unpyc
ChangeLog
__pycache__
build
nohup.out
language: python
python:
  # - '3.5'
  # - '2.7'
  # - '3.4'
  - '3.6'
  - '3.8'
matrix:
  include:
    - python: '3.7'
      dist: xenial # required for Python >= 3.7 (travis-ci/travis-ci#9069)
install:
  - pip install -e .
  - pip install -r requirements-dev.txt
script:
  - python ./setup.py develop && COMPILE='--compile' make check
# blacklist
branches:
  except:
    - data-driven-pytest
This is the changelog from *decompyle*'s release 2.4 and before
passed on by Dan Pascu
release 2.4 (Dan Pascu)
- Replaced the way code structures are identified by the parser.
Previously, the scanner introduced some COME_FROM entries in the
disassembly output to mark all the destinations of jump instructions.
Using these COME_FROM labels the parser was then able to identify the
code structures (if tests, while loops, etc). Up to python-2.3 this was
possible because the code structures were clearly defined and jump
targets were always to the same points in a given structure, making it
easy to identify the structure. Python 2.3 however introduced optimized
jumps to increase code performance. In the previous version of decompyle
(2.3) we used a technique to identify the code structures and then used
these structures to determine where the jump targets would have been if
not optimized. Using this information we then added COME_FROM labels at
the points where they would have been if not optimized, thus emulating
the way decompyle worked with versions before python 2.3. However with
the introduction of even more optimizations in python 2.4 this technique
no longer works. Not only are the jump targets no longer an effective
means for the parser to identify the code structures, but also trying to
emulate the old way things were solved when it clearly no longer works
is not the right solution. To solve this issue, the code to identify the
structures that we had developed in version 2.3 was used to add real
start/end points for structure identification, instead of the COME_FROM
labels. Now these new start/end labels are used by the parser to more
precisely identify the structures and the COME_FROM labels were removed
completely. The scanner is responsible for identifying these code structures
and using any knowledge of optimizations that python applies to determine
the start/end points of any structure and then mark them with certain
keywords that are understood by the parser.
- Correctly identify certain `while 1' structures that were not
recognized in the previous version.
- Added support for new byte code constructs used by python 2.4
release 2.3.2
- tidied up copyright and changelog information for releases 2.3 and later
release 2.3.1 (Dan Pascu)
- implemented a structure detection technique that fixes problems with
optimised jumps in Python >= 2.3. In the previous release (decompyle 2.3),
these problems meant that some files were incorrectly decompiled and
others could not be decompiled at all. With this new structure detection
technique, thorough testing over the standard python libraries suggests
that decompyle 2.3.1 can handle everything that decompyle 2.2beta1 could,
plus new Python 2.3 bytecodes and constructs.
release 2.3 (Dan Pascu)
- support for Python 2.3 added
- use the marshal and disassembly code from their respective python
versions, so that decompyle can manipulate bytecode independently
of the interpreter that runs decompyle itself (for example it can
decompile python2.3 bytecode even when running under python2.2)
——————————————————
release 2.2beta1 (Hartmut Goebel)
- support for Python 1.5 up to Python 2.2
- no longer needs to be run with the Python interpreter version
that generated the byte-code.
- requires Python 2.2
- pretty-prints docstrings, hashes, lists and tuples
- decompyle is now a script and a package
- added emacs mode-hint and tab-width for each file output
- enhanced test suite: more test patterns, .pyc/.pyo included
- avoids unnecessary 'global' statements
- still untested: EXTENDED_ARG
internal changes:
- major code overhaul: split into several modules, clean-ups
- use a list of valid magics instead of the single one from imp.py
- uses copies of 'dis.py' for every supported version. This ensures
correct disassembling of the byte-code.
- use a single Walker and a single Parser, thus saving time and memory
- use augmented assign and 'print >>' internally
- optimized 'Walker.engine', the main part of code generation
release 0.6.0: (hartmut Goebel)
- extended print (Python 2.0)
- extended import (Python 2.0) (may not cover all cases)
- augmented assign (Python 2.0) (may not cover all cases)
- list comprehensions (Python 2.0)
- equivalent for 'apply' (Python 1.6)
- if .. elif .. else are now nested as expected
- assert test, data
- unpack list corrected (was the same as unpack tuple)
- fixed unpack tuple (trailing semicolon was missing)
- major speed up :-)
- reduced memory usage (pre-alpha-0.5 has increased it a lot)
- still missing: EXTENDED_ARG
pre-alpha-0.5: (hartmut Goebel)
- *args, **kwargs
- global
- formal tuple parameters (eg. def a(self, (x,y,z)) )
- actual lambda parameters (eg. X(lambda z: z**2) )
- remove last 'return None' in procedures
- remove last 'return locals()' in class definitions
- docstrings
pre-alpha-0.4: (hartmut Goebel)
- assert
- try/except/finally
- parentheses in expressions
- nested expressions
- extracted disassemble() from module dis and
removed ugly redirect of stdout, thus saved a lot of
ugly code and a lot of memory
pre-alpha-0.3: (hartmut Goebel)
- keyword arguments
- some boolean expressions
- and/or
- complex conditions in if/while
- read byte-code from .pyc without importing
- access to the body of classes and modules
- class and function definitions
- a = b = c = xxx
pre-alpha-0.1 -> pre-alpha-0.2:
- SET_LINENO filtered out in lexer now
- added support for subscripts (just for Christian Tismer :-)
- fixed bug with handling of BUILD_{LIST,TUPLE} & CALL_FUNCTION
- dict-building support
- comparison support
- exec support
- del support
- pass support
- slice support
- no more extraneous (albeit legal) commas
- finally, it excepts try [sic] but not all 42 variations of it
This project has a history of over 18 years, spanning back to Python 1.5.
There have been a number of people who have worked on this. I am awed
by the amount of work, number of people who have contributed to this,
and the cleverness in the code.
The below is an annotated history from talking to participants
involved and my reading of the code and sources cited.
In 1998, John Aycock first wrote a grammar parser in Python,
eventually called SPARK, that was usable inside a Python program. This
code was described in the
[7th International Python Conference](http://legacy.python.org/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html). That
paper doesn't talk about decompilation, nor did John have that in mind
at that time. It does mention that a full parser for Python (rather
than the simple languages in the paper) was being considered.
[This](http://pages.cpsc.ucalgary.ca/~aycock/spark/content.html#contributors)
contains a list of people acknowledged in developing SPARK. What's amazing
about this code is that it is reasonably fast and has survived up to
Python 3 with relatively little change. This work was done in
conjunction with his Ph.D. thesis. This was finished around 2001. In
working on his thesis, John realized SPARK could be used to deparse
Python bytecode. In the fall of 1999, he started writing the Python
program, "decompyle", to do this.
To help with control-structure deparsing, the instruction sequence was
augmented with the pseudo-instruction COME_FROM. This code introduced
another clever idea: table-driven semantic routines that use format
specifiers.
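To make the table-driven idea concrete, here is a minimal sketch in Python. The node kinds, the `TABLE` contents and the `emit` helper are hypothetical and much simpler than the real semantic tables; `%c` is an assumed marker meaning "render a child node here". The point is that the table, not special-purpose code, decides the output order: an assignment's value comes first in the tree (mirroring stack order) but its target is printed first.

```python
# Hypothetical semantic table: node kind -> (format template, child indices).
# Each "%c" in the template is replaced, left to right, by the rendered
# child at the next listed index.
TABLE = {
    "assign":     ("%c = %c", (1, 0)),   # children: (value, target)
    "binary_add": ("%c + %c", (0, 1)),   # children: (left, right)
    "return":     ("return %c", (0,)),   # children: (value,)
}


def emit(node):
    """Render a tree of tuples (kind, child, ...) using TABLE; strings are leaves."""
    if isinstance(node, str):
        return node
    kind, children = node[0], node[1:]
    template, indices = TABLE[kind]
    rendered = tuple(emit(children[i]) for i in indices)
    # Swap the table's %c marker for %s so Python's % operator can fill it in.
    return template.replace("%c", "%s") % rendered


tree = ("assign", ("binary_add", "a", "b"), "x")
print(emit(tree))  # x = a + b
```

This sketch ignores precedence, so `a + (b + c)` would come out without parentheses; adding precedence to the table entries is what decides where parentheses are needed.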
The last mention of a release of SPARK from John is around 2002. As
released, although the Earley Algorithm parser was in good shape, this
code was woefully lacking as a serious Python deparser.
In the fall of 2000, Hartmut Goebel
[took over maintaining the code](https://groups.google.com/forum/#!searchin/comp.lang.python/hartmut$20goebel/comp.lang.python/35s3mp4-nuY/UZALti6ujnQJ). The
first subsequent public release announcement that I can find is
["decompyle - A byte-code-decompiler version 2.2 beta 1"](https://mail.python.org/pipermail/python-announce-list/2002-February/001272.html).
From the CHANGES file found in
[the tarball for that release](http://old-releases.ubuntu.com/ubuntu/pool/universe/d/decompyle2.2/decompyle2.2_2.2beta1.orig.tar.gz),
it appears that Hartmut did most of the work to get this code to
accept the full Python language. He added precedence to the table
specifiers, support for multiple versions of Python, the
pretty-printing of docstrings, lists, and hashes. He also wrote test and verification routines of
deparsed bytecode, and used this in an extensive set of tests that he also wrote. He says he could verify against the
entire Python library. However I have subsequently found small and relatively obscure bugs in the decompilation code.
decompyle2.2 was packaged for Debian (sarge) by
[Ben Burton around 2002](https://packages.qa.debian.org/d/decompyle.html). Because
it worked only on Python 2.2, long after Python 2.3 and 2.4 were in
widespread use, it was removed.
[Crazy Compilers](http://www.crazy-compilers.com/decompyle/) offers a
byte-code decompiler service for versions of Python up to 2.6. As
someone who has worked in compilers, I know it is tough to make a living by
working on compilers. (For example, based on
[John Aycock's recent papers](http://pages.cpsc.ucalgary.ca/~aycock/)
it doesn't look like he's done anything compiler-wise since SPARK.) So
I hope people will use the crazy-compilers service. I wish them the
success that their good work deserves.
Dan Pascu did a bit of work from late 2004 to early 2006 to get this
code to handle first Python 2.3 and then 2.4 bytecodes. Because of
jump optimization introduced in the CPython bytecode compiler at that
time, various JUMP instructions were classified to assist parsing. For
example, due to the way that code generation and line number tables
work, jump instructions to an earlier offset must be looping jumps,
such as those found in a "continue" statement; "COME FROM"
instructions were reintroduced. See
[RELEASE-2.4-CHANGELOG.txt](https://github.com/rocky/python-uncompyle6/blob/master/DECOMPYLE-2.4-CHANGELOG.txt)
for more details here. There wasn't a public release of RELEASE-2.4
and bytecodes other than Python 2.4 weren't supported. Dan says the
Python 2.3 version could verify the entire Python library. But given
subsequent bugs found like simply recognizing complex-number constants
in bytecode, decompilation wasn't perfect.
Next we get to ["uncompyle" and
PyPI](https://pypi.python.org/pypi/uncompyle/1.1) and the era of
public version control. (Dan's code, although not public, used
[darcs](http://darcs.net/) for version control.)
In contrast to _decompyle_, _uncompyle_, at least in its final versions,
runs only on Python 2.7. However it accepts bytecode back to Python
2.5. Thomas Grainger is the package owner of this, although Hartmut is
still listed as the author.
The project exists not only on
[github](https://github.com/gstarnberger/uncompyle) but also on
[bitbucket](https://bitbucket.org/gstarnberger/uncompyle) and later
the defunct [google
code](https://code.google.com/archive/p/unpyc/). The git/svn history
goes back to 2009. Somewhere in there the name was changed from
"decompyle" to "unpyc" by Keknehv, and then to "uncompyle" by Guenther Starnberger.
The name Thomas Grainger isn't found in (m)any of the commits in the
several years of active development. First Keknehv worked on this up
to Python 2.5 or so while accepting Python bytecode back to 2.0 or
so. Then hamled made a few commits earlier on, while Eike Siewertsen
made a few commits later on. But mostly wibiti and Guenther
Starnberger got the code to where uncompyle2 was around 2012.
While John Aycock and Hartmut Goebel were well versed in compiler
technology, those that have come afterwards don't seem to have been as
facile in it. Furthermore, documentation or guidance on how the
decompiler code worked, comparison to a conventional compiler
pipeline, how to add new constructs, or debug grammars was weak. Some
of the grammar tracing and error reporting was a bit weak as well.
Given this, perhaps it is not surprising that subsequent changes
tended to shy away from using the built-in compiler technology
mechanisms and addressed problems and extensions by some other means.
Specifically, in `uncompyle`, decompilation of python bytecode 2.5 &
2.6 is done by transforming the byte code into a pseudo-2.7 Python
bytecode and is based on code from Eloi Vanderbeken. A bit of this
could have been easily added by modifying grammar rules.
This project, `uncompyle6`, abandons that approach for various
reasons. Having a grammar per Python version is much cleaner and it
scales indefinitely. That said, we don't have entire copies of the
grammar, but work off of differences from some neighboring version.
Should there be a desire to rebase or start a new base version to work
off of, say for some future Python version, that can be done by
dumping a grammar for a specific version after it has been loaded
incrementally. You can get a full dump of the grammar by profiling the
grammar on a large body of Python source code.
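The "work off of differences" idea can be sketched with plain sets of rule strings. Everything here is hypothetical: `derive_grammar`, `dump` and the tiny rule sets are illustrations, not uncompyle6's grammar classes (which build on spark_parser). Each version is derived from a neighbor by removing and adding rules, and `dump` produces the fully expanded grammar, the kind of listing one would save to start a new base version.

```python
# Hypothetical per-version grammars expressed as differences from a neighbor.
def derive_grammar(base, add=(), remove=()):
    """Return a new grammar: the rules of `base` minus `remove`, plus `add`."""
    return frozenset((set(base) - set(remove)) | set(add))


def dump(grammar):
    """Fully expanded, sorted rule list, e.g. to seed a new base version."""
    return "\n".join(sorted(grammar))


PY27 = frozenset({
    "stmt ::= print_stmt",
    "print_stmt ::= expr PRINT_ITEM PRINT_NEWLINE",
    "stmt ::= expr POP_TOP",
    "expr ::= expr expr BINARY_ADD",
})

# Python 3.0 dropped the print statement (and its PRINT_ITEM opcode).
PY30 = derive_grammar(
    PY27,
    remove={"stmt ::= print_stmt",
            "print_stmt ::= expr PRINT_ITEM PRINT_NEWLINE"},
)

# Python 3.5 added the @ operator (BINARY_MATRIX_MULTIPLY).
PY35 = derive_grammar(PY30, add={"expr ::= expr expr BINARY_MATRIX_MULTIPLY"})

print(dump(PY35))
```

Keeping only the differences means a fix to a shared rule is made once, while `dump` still gives a complete grammar whenever a standalone copy is wanted.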
Another problem with pseudo-2.7 bytecode is that we need offsets
in fragment deparsing to be exactly the same as the bytecode; the
transformation process can remove instructions. _Adding_ instructions
with pseudo offsets is, however, okay.
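A minimal sketch of that constraint, using the standard library's `dis` module (Python 3.4+, for `dis.get_instructions`): real instructions keep their exact offsets, and COME_FROM pseudo-instructions are inserted before each jump target with string offsets of the form `"<target>_<n>"`. The `Token` class, the `tokenize_with_come_from` helper and that offset format are assumptions made for illustration; uncompyle6's own scanner and token types are considerably more involved.

```python
import dis
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Union

# Opcodes whose argument is a jump target. Which of these tables exist
# (and are populated) varies across Python versions, so take their union.
JUMP_OPS = (set(getattr(dis, "hasjump", ()))
            | set(getattr(dis, "hasjrel", ()))
            | set(getattr(dis, "hasjabs", ())))


@dataclass
class Token:
    kind: str                # opcode name, or the pseudo-op "COME_FROM"
    offset: Union[int, str]  # real int offset, or "<target>_<n>" for pseudo-ops
    attr: Any = None         # argval; for COME_FROM, the offset jumped from


def tokenize_with_come_from(code):
    """Disassemble `code`, adding COME_FROM pseudo-ops at each jump target."""
    instructions = list(dis.get_instructions(code))
    sources = defaultdict(list)  # jump target offset -> jumping offsets
    for inst in instructions:
        if inst.opcode in JUMP_OPS:
            sources[inst.argval].append(inst.offset)

    tokens = []
    for inst in instructions:
        # Pseudo-ops go just before the target and get derived string
        # offsets, so no real instruction's offset ever changes.
        for n, src in enumerate(sorted(sources.get(inst.offset, ()))):
            tokens.append(Token("COME_FROM", f"{inst.offset}_{n}", src))
        tokens.append(Token(inst.opname, inst.offset, inst.argval))
    return tokens


def example(x):
    if x:
        x = 1
    return x


for tok in tokenize_with_come_from(example.__code__):
    print(tok.offset, tok.kind, tok.attr)
```

Because pseudo-instructions never displace a real offset, any fragment of the parse can still be mapped back to its exact position in the original bytecode; the exact opcode names printed depend on the Python version running the sketch.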
`Uncompyle6` however owes its existence to the fork of `uncompyle2` by
Myst herie (Mysterie), whose first commit dates from
2012. I chose this since it seemed to have been at that time the most
actively, if briefly, worked on. Also starting around 2012 is Dark
Fenx's uncompyle3 which I used for inspiration for Python3 support.
I started working on this in late 2015, mostly to add fragment support.
Along the way, I decided to make this runnable on Python 3.2+ and Python 2.6+
while handling Python bytecodes from Python versions 2.5+ and
3.2+. In doing so, it has been expedient to separate this into three
projects:
* marshaling/unmarshaling, bytecode loading and disassembly ([xdis](https://pypi.python.org/pypi/xdis)),
* parsing and tree building ([spark_parser](https://pypi.python.org/pypi/spark_parser)),
* this project - grammar and semantic actions for decompiling
([uncompyle6](https://pypi.python.org/pypi/uncompyle6)).
Over the many years, code styles and Python features have
changed. However brilliant the code was and still is, it hasn't really
had a single public active maintainer. And there have been many forks
of the code. I have spent a great deal of time trying to organize and
modularize the code so that it can handle more Python versions more
gracefully (with still only moderate success).
That it needs an overhaul was recognized by
Hartmut a decade and a half ago:
[`uncompyle/__init__.py`](https://github.com/gstarnberger/uncompyle/blob/master/uncompyle/__init__.py#L25-L26)
> NB. This is not a masterpiece of software, but became more like a hack.
> Probably a complete rewrite would be sensefull. hG/2000-12-27
This project deparses using an Earley-algorithm parse with lots of
massaging of tokens and the grammar in the scanner
phase. Earley-algorithm parsers are context free and tend to be linear
if the grammar is LR or left recursive. There is a technique for
improving LL right recursion, but our parser doesn't have that yet.
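For readers unfamiliar with Earley parsing, here is a minimal Earley *recognizer* in Python over a hypothetical toy grammar whose terminals are bytecode opcode names. It is a textbook sketch, not spark_parser's implementation or API, and it assumes the grammar has no empty (nullable) rules, which keeps the completion step simple. It shows why a left-recursive rule such as `expr ::= expr expr BINARY_ADD` is harmless here: prediction adds each rule at most once per chart position instead of recursing forever.

```python
# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
# Terminals are opcode names; there are no empty (nullable) rules.
GRAMMAR = {
    "stmt": [("expr", "STORE_NAME")],
    "expr": [("LOAD_NAME",), ("LOAD_CONST",), ("expr", "expr", "BINARY_ADD")],
}


def earley_recognize(tokens, start="stmt", grammar=GRAMMAR):
    """Return True if `tokens` derives from `start`. Items are
    (lhs, rhs, dot, origin) tuples; chart[i] holds items ending at i."""
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new_items = []
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predict: start every rule for the nonterminal after the dot.
                new_items = [(rhs[dot], prod, 0, i) for prod in grammar[rhs[dot]]]
            elif dot < len(rhs):
                # Scan: advance past a terminal that matches the next token.
                if i < len(tokens) and tokens[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item in chart[origin] waiting on lhs.
                new_items = [(plhs, prhs, pdot + 1, porigin)
                             for (plhs, prhs, pdot, porigin) in chart[origin]
                             if pdot < len(prhs) and prhs[pdot] == lhs]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[-1])


print(earley_recognize(["LOAD_NAME", "LOAD_CONST", "BINARY_ADD", "STORE_NAME"]))
```

A real deparser also needs the parse tree, not just a yes/no answer, which means keeping back-pointers from each completed item to the items it was built from.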
Another approach to decompiling, one that doesn't use grammars, is
to simulate execution symbolically and build
expression trees from stack results. Control flow in that approach
still needs to be handled somewhat ad hoc. The two important projects
that work this way are [unpyc3](https://code.google.com/p/unpyc3/) and
most especially [pycdc](https://github.com/zrax/pycdc). The latter
project is largely by Michael Hansen and Darryl Pogue. If they
supported getting source-code fragments, did a better job of
supporting Python more fully, and had a way I could call it from
Python, I'd probably have ditched this and used that. The code
runs blindingly fast and spans all versions of Python, although more
recently Python 3 support has been lagging. The code is impressive for
its smallness given that it covers many versions of Python. However, I
think it has reached a scalability issue, same as all the other
efforts. To handle Python versions more accurately, I think that code
base will need a lot more code that specializes for particular
Python versions. And then it will run into a modularity problem.
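A minimal sketch of the stack-simulation approach, in Python. To stay independent of any one CPython release, it takes a hand-written, hypothetical instruction list of `(opname, argument)` pairs modeled on pre-3.11 opcode names such as `BINARY_ADD`; a real tool would get these from a disassembler. Values pushed on the symbolic stack are expression-tree nodes rather than runtime values, and each store or return turns the tree on top of the stack into a statement. The helper names are made up for illustration. Jumps are deliberately unsupported, which is exactly where this approach needs its ad hoc control-flow handling.

```python
# Pre-3.11 opcode names for a few binary operators (assumed subset).
BINOPS = {"BINARY_ADD": "+", "BINARY_SUBTRACT": "-", "BINARY_MULTIPLY": "*"}


def render(node):
    """Turn an expression-tree node back into source text."""
    kind = node[0]
    if kind == "name":
        return node[1]
    if kind == "const":
        return repr(node[1])
    _, op, left, right = node  # ("binop", op, left, right)
    return f"({render(left)} {op} {render(right)})"


def decompile_straight_line(instructions):
    """Symbolically execute straight-line code; return source lines."""
    stack, lines = [], []
    for opname, arg in instructions:
        if opname in ("LOAD_NAME", "LOAD_FAST"):
            stack.append(("name", arg))
        elif opname == "LOAD_CONST":
            stack.append(("const", arg))
        elif opname in BINOPS:
            right, left = stack.pop(), stack.pop()  # top of stack is the right operand
            stack.append(("binop", BINOPS[opname], left, right))
        elif opname in ("STORE_NAME", "STORE_FAST"):
            lines.append(f"{arg} = {render(stack.pop())}")
        elif opname == "RETURN_VALUE":
            lines.append(f"return {render(stack.pop())}")
        else:
            raise NotImplementedError(f"no stack effect modeled for {opname}")
    return lines


# Hypothetical bytecode for:  x = a + b * 2
code = [("LOAD_NAME", "a"), ("LOAD_NAME", "b"), ("LOAD_CONST", 2),
        ("BINARY_MULTIPLY", None), ("BINARY_ADD", None), ("STORE_NAME", "x")]
print(decompile_straight_line(code))  # ['x = (a + (b * 2))']
```

The output is more heavily parenthesized than the original source; deciding when parentheses can be dropped is again a matter of operator precedence.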
Tests for the project have been, or are being, culled from all of the
projects mentioned. Quite a few have been added to improve grammar
coverage and to address the numerous bugs that have been encountered.
If you think, as I am sure will happen in the future, "hey, I can just
write a decompiler from scratch and not have to deal with all of
the complexity here", think again. What is likely to happen is that
you'll get at best a 90% solution working for a single Python release
that will be obsolete in about a year, and more obsolete each
subsequent year. Writing a decompiler for Python gets harder as
Python progresses, so writing one for Python 3.7 isn't as easy as it
was for Python 2.2. That said, if you still feel you want to write a
single version decompiler, look at the test cases in this project and
talk to me. I may have some ideas.
For a little bit of the history of changes to the Earley-algorithm parser,
see the file [NEW-FEATURES.rst](https://github.com/rocky/python-spark/blob/master/NEW-FEATURES.rst) in the [python-spark github repository](https://github.com/rocky/python-spark).
NB. If you find mistakes, want corrections, or want your name added
(or removed), please contact me.
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
**Table of Contents**
- [The difficulty of the problem](#the-difficulty-of-the-problem)
- [Is it really a bug?](#is-it-really-a-bug)
- [Do you have valid bytecode?](#do-you-have-valid-bytecode)
- [Semantic equivalence vs. exact source code](#semantic-equivalence-vs-exact-source-code)
- [What to send (minimum requirements)](#what-to-send-minimum-requirements)
- [What to send (additional helpful information)](#what-to-send-additional-helpful-information)
- [But I don't *have* the source code!](#but-i-dont-have-the-source-code)
- [But I don't *have* the source code and am incapable of figuring how how to do a hand disassembly!](#but-i-dont-have-the-source-code-and-am-incapable-of-figuring-how-how-to-do-a-hand-disassembly)
- [Narrowing the problem](#narrowing-the-problem)
- [Karma](#karma)
- [Confidentiality of Bug Reports](#confidentiality-of-bug-reports)
- [Ethics](#ethics)
<!-- markdown-toc end -->
# The difficulty of the problem
This decompiler is a constant work in progress: Python keeps
changing, and so does its code generation.
There is no Python decompiler yet that I know about that will
decompile everything. Overall, I think this one probably does the best
job of *any* Python decompiler that handles such a wide range of
versions.
But at any given time, there are a number of valid Python bytecode
files that I know of that will cause problems. See, for example, the
list in