<h1 id="clitest-command-line-tester"><a href="https://petr-muller.github.io/tools/2018/05/02/clitest">clitest: command line tester</a> (2018-05-02)</h1>

<p>It’s like Python’s <a href="https://docs.python.org/3/library/doctest.html">doctest</a>
but for CLIs. Given a text file containing snippets of shell sessions (prompts
with commands and their expected outputs), clitest executes the snippets and
verifies the actual output matches the described output. It is itself written in
shell. The project lives on
<a href="https://github.com/aureliojargas/clitest">GitHub</a> and is available under MIT
license. It does not see much development anymore, but that is fairly
understandable given its simplicity. Nevertheless, the author continues to
maintain deliverables and documentation. The project has comprehensive
documentation in its
<a href="https://github.com/aureliojargas/clitest/blob/master/README.md">README</a>.</p>
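<p>For illustration, a minimal test file might look like the following sketch (the file name
is made up; the <code class="language-plaintext highlighter-rouge">$ </code> prompt is the marker clitest looks for by default):</p>

<pre><code># greet.txt: a hypothetical clitest test file
$ echo "Hello World"
Hello World
$ printf "%s\n" one two
one
two
</code></pre>

<p>Running <code class="language-plaintext highlighter-rouge">./clitest greet.txt</code> then executes each command and checks
its real output against the lines that follow it.</p>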
<h2 id="installation-and-usage">Installation and usage</h2>
<p>Installation and usage are trivial: the whole clitest is just a single shell
script with no external dependencies. Hence, it is easy to include copies of
clitest in repositories. I was curious how the little tool is implemented in
shell, and I discovered the code is nice and readable. I could not resist
trying to run <a href="https://www.shellcheck.net/">ShellCheck</a> on it, and it only spat
out a few style issues: good job!</p>
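<p>As a sketch of how lightweight the drop-in is (the raw URL is my assumption of where the
script lives in the repository), getting a copy into a project can be as simple as:</p>

<pre><code>$ curl -O https://raw.githubusercontent.com/aureliojargas/clitest/master/clitest
$ chmod +x clitest
$ ./clitest --help
</code></pre>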
<p>I tried to use clitest as a testing driver for
the <a href="https://github.com/petr-muller/pyff">pyff</a> project’s examples so that I only
need to provide the expected output for each example. It worked nicely, although the
way clitest executes the commands under test makes it a little sensitive
to <code class="language-plaintext highlighter-rouge">$CWD</code>, <code class="language-plaintext highlighter-rouge">$PATH</code> and the like. Fortunately, these are quite
straightforward to sort out. The test files for pyff are standard Python
source code files, so I needed to embed the clitest instructions into Python
comments (lines starting with the <code class="language-plaintext highlighter-rouge">#</code> character). I used the <code class="language-plaintext highlighter-rouge">--prefix</code> option
for this; it tunes how clitest searches for testable snippets in
files.</p>
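<p>Roughly, the idea looks like this; the file below is an illustrative sketch, not an actual
pyff example, and the exact <code class="language-plaintext highlighter-rouge">--prefix</code> semantics are documented in the clitest README:</p>

<pre><code># A Python example file with an embedded clitest session in its comments:
#
# $ python example.py
# 4
print(2 + 2)
</code></pre>

<p>With something along the lines of <code class="language-plaintext highlighter-rouge">./clitest --prefix '# ' examples/*.py</code>, clitest strips the
comment prefix, runs the embedded command and compares its output against the commented
expectation.</p>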
<p>I have encountered a minor pitfall where pyff does not have entirely
deterministic output. The output is correct, but the ordering of some elements
may differ between runs. clitest matches the whole output against the expected
one, so it is harder to use in these cases (I decided to make
pyff deterministic, even if the order does not matter). There is a clitest
feature which can alter the matching when using the <a href="https://github.com/aureliojargas/clitest/blob/master/README.md#alternative-syntax-inline-output">inline output
syntax</a>,
but my use case did not fit the inline output feature well.</p>
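<p>For completeness, the inline syntax puts the expected output on the same line as the command,
after a special marker (<code class="language-plaintext highlighter-rouge">#→</code> by default); the README also documents options after the
marker that switch to looser matching modes such as regular expressions. A contrived sketch:</p>

<pre><code>$ echo "Hello World"               #→ Hello World
$ basename /tmp/some/file.txt      #→ file.txt
</code></pre>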
<h2 id="when-would-i-use-it">When would I use it?</h2>
<p>I like clitest because it is a single file drop-in without dependencies. This
makes it easy to drop it into a repository where it can serve as a lightweight
integration test driver: all you need is to provide some input files, define how to
execute the program under test, and specify the expected output.</p>
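<p>As a sketch of that setup (the directory layout is hypothetical): keep one shell-session
file per scenario and point clitest at all of them, for example from a Makefile target:</p>

<pre><code># tests/ holds one session file per scenario, e.g. tests/help.txt, tests/convert.txt
$ ./clitest tests/*.txt
</code></pre>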
<p>A second nifty use case is mentioned in the clitest documentation: it can be
nicely used to test the code snippets in your documentation for correctness.
Often, you will write the documentation in Markdown or something similar, and
it will contain snippets showing how to execute the program and what the expected
output is. This way, you can feed your documentation straight into clitest, which
can, without further work, validate that your examples are still correct. You
may even do this automatically in CI, so your documentation is always
up-to-date.</p>

<h1 id="python-diff-started"><a href="https://petr-muller.github.io/projects/2018/04/06/python-diff-started">Python Diff</a> (2018-04-06)</h1>

<p><strong>GitHub repository:</strong> <a href="https://github.com/petr-muller/pyff">petr-muller/pyff</a></p>
<p>The idea of a syntactic/semantic-aware diff tool had been in my head since we needed
something similar for a project we were working on in the <a href="https://research.redhat.com/">Red Hat
Lab</a> together with the
<a href="http://www.fit.vutbr.cz/research/groups/verifit/">VeriFIT</a> research group. We
wanted to connect code differences (git commits or PRs) with test results and
build a “riskiness classifier”. The rough idea was something like <em>”whenever
people change I/O code in method M of class C, test T tends to break”</em>. We were
missing an analyzer that would easily give us, in a machine-readable format, what
actually changed in the code, beyond the changed lines that a simple <code class="language-plaintext highlighter-rouge">diff</code> tool
can give you. We somehow managed to build something ad-hoc for C code
differences and continued, but ever since I have thought a smart diff could be an
interesting project.</p>
<h2 id="comparing-abstract-syntax-trees">Comparing abstract syntax trees</h2>
<p>I decided to start in a simple way: take two versions of a Python file as
input, and work over their AST to detect differences. There is an <a href="https://docs.python.org/3/library/ast.html">AST
module</a> in Python standard library
that can parse Python code easily, but I remembered a talk on Pylint which
described <a href="https://github.com/PyCQA/astroid">Astroid</a> as an improved module
with more functionality (built for use in Pylint). I wanted to use it but
failed to find a current documentation link; for some reason I kept discovering
<code class="language-plaintext highlighter-rouge">www.astroid.org</code>, which has been dead for some time (I found the <a href="http://astroid.readthedocs.io">current
documentation</a> later).</p>
<p>So I decided to go with the “vanilla” <code class="language-plaintext highlighter-rouge">ast</code> module for a while. I discovered the
very helpful <a href="https://greentreesnakes.readthedocs.io/en/latest/">Green Tree Snakes - the missing Python AST
docs</a> documentation for it
and from there, the first steps were quite simple. I chose the approach of
driving the development by examples: I selected a git commit from a different
project, looked at the diff and asked myself <em>“what changed in that code?”</em>,
then went on to implement the necessary code.</p>
<p>I started with detecting added and removed imports, classes and top-level
methods in the module, followed by detecting simple changes of these entities
such as added/removed methods and changed implementations. The entities are
currently identified by name, which means renaming is not properly detected (it
will be reported as one class/method removed and another added). At the moment,
the only supported output is the natural language summary of the changes.</p>
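<p>A minimal sketch of the underlying idea (not actual pyff code): compare the sets of
top-level class and function names that the standard <code class="language-plaintext highlighter-rouge">ast</code> module reports for the two
versions of a file.</p>

<pre><code>import ast

def toplevel_names(source: str) -> set:
    """Names of top-level classes and functions in a Python source string."""
    tree = ast.parse(source)
    return {node.name
            for node in tree.body
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))}

def compare(old_source: str, new_source: str) -> dict:
    """Report top-level classes/functions added or removed in the new version."""
    old, new = toplevel_names(old_source), toplevel_names(new_source)
    return {"added": sorted(new - old), "removed": sorted(old - new)}

OLD = "class Greeter:\n    pass\n\ndef helper():\n    pass\n"
NEW = OLD + "\ndef main():\n    pass\n"
print(compare(OLD, NEW))  # {'added': ['main'], 'removed': []}
</code></pre>

<p>Anything identified only by name like this will, of course, report a rename as one removal
plus one addition, which matches the limitation described above.</p>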
<p>After I had this MVP version of <code class="language-plaintext highlighter-rouge">pyff</code> ready, I went on to set up some necessary
project infrastructure: README, tests and some helper code.</p>
<h2 id="further-steps">Further steps</h2>
<p>I would like to implement a programmatic API and a machine-readable output format
(probably JSON), then follow up by implementing detection of further change types. I
will probably continue with the example-driven approach, but I would like to
implement some “smart” detection soon: something like recognizing that the
program was not semantically changed (for example, a simple variable rename) and
not reporting an implementation change in that case.</p>

<h1 id="sabbatical-month-2"><a href="https://petr-muller.github.io/personal/2018/04/03/sabbatical-month-2">Sabbatical, Month 2</a> (2018-04-03)</h1>

<p>I started working during my second university year (my first job was as a tester at Grisoft,
which later became AVG, which later merged with Avast) and I had been employed non-stop ever since. I
spent ten years with Red Hat, from where I moved to SAP Labs and had not taken any free time between
the jobs. So when I decided to leave SAP Labs, the idea of taking a few months off was irresistible. I
did not have any clear idea of what I would like to achieve during the sabbatical. People often take
a career break to pursue a specific project, to work on a dream project. I did not have any such
goal. I knew I would like to reduce my book backlog, to live more healthily (exercise and sleep more),
to get my driving license and, finally, to do more open-source coding for fun.</p>
<h1 id="python-projects">Python projects</h1>
<p>I am already quite proficient in Python, so I went on to build some projects I had in my head for
some time. The projects are not that technologically advanced, so I decided to intentionally try
some new tools while working on them, and learn in the process. I am trying to use
<a href="https://docs.pytest.org/en/latest/">pytest</a> for testing (along with some plugins) and
<a href="http://mypy-lang.org/">mypy</a> and <a href="https://www.pylint.org/">Pylint</a> for static analysis. I also
connected a few modern, cloudy, GitHub-connected tools to my repos to evaluate how they work - I tried
<a href="https://www.codacy.com/">Codacy</a>, <a href="https://codeclimate.com/">Code Climate</a>,
<a href="https://codecov.io/">Codecov</a>, <a href="https://coveralls.io/">Coveralls</a>, <a href="https://tidelift.com/">Dependency
CI/Tidelift</a> and <a href="https://travis-ci.org/">Travis CI</a>. I like the direction
in which these tools are going, and I might write a post about them in the future.</p>
<h2 id="python-diff"><a href="https://github.com/petr-muller/pyff">Python Diff</a></h2>
<p>This is a Python-based toy project that is supposed to compare two versions of a Python module (say,
a part of a Git commit) and determine the syntactic/semantic difference between them. I do not
have a clear idea about what exactly the detected differences could be, but I am starting with
simple stuff like added/removed classes or methods and will continue with detecting methods with changed
API or just implementation, etc. The initial goal is to provide some machine-readable artifact
describing the difference and then to try to build further tools on top of it. Possible directions for
this project might be a GitHub PR commenter bot posting human-readable summaries of changes (I could
learn more about cloud services, GitHub API and natural language generation in the process), or
combining the difference data with other sources like code coverage and perhaps some machine
learning, and building a tool to predict “riskiness” of a Python project PR.</p>
<p>Of all my current projects, I probably like Python Diff the best. After it moves forward a bit, I will write a
separate post about it, and I will surely consider giving a talk about it at some Python conference.</p>
<h2 id="vtes-game-log"><a href="https://github.com/petr-muller/vtes">VtES Game Log</a></h2>
<p>Last autumn, I started playing <a href="http://www.vekn.net/what-is-v-tes">Vampire: the Eternal Struggle</a>
(a nearly dead, old-school, multiplayer CCG) again after a period of hiatus. I play using various
decks against various players in various settings (friendlies in a pub, online, tournaments) so I
decided to build a simple tool for tracking my games and results, going from a local, CLI-oriented
utility to first an online REST API and then possibly a web application. The underlying data
structures are trivial, so I would like to learn more about writing REST APIs (possibly
experimenting with some API tools like <a href="https://apiary.io/">Apiary</a>,
<a href="http://dredd.org/en/latest/">Dredd</a> or <a href="https://www.3scale.net/">3scale</a>) and running them on some
cloud platform like AWS or Openshift.</p>
<h1 id="collaborations">Collaborations</h1>
<p>After I finished at SAP, I let some of my friends know, and we met to discuss possible
collaborations.</p>
<h2 id="engeto-testing-course">Engeto Testing Course</h2>
<p>My former Red Hat colleague Filip founded an <a href="https://engeto.cz">IT education startup</a> where one of
the things they offer is courses. They do not have a course focused on software testing yet, so we
agreed we would collaborate on creating one together. Until now, I have mostly been researching what
content to include in the course. With the research finally done, we are following up with an
outline, and we will start producing the actual content soon.</p>
<h2 id="perun">Perun</h2>
<p><a href="https://github.com/tfiedor/perun">Perun</a> is a very interesting pet project of my former colleague
from <a href="http://www.fit.vutbr.cz/research/groups/verifit/">VeriFIT</a>, Tomáš Fiedor. It is a long-term
performance tracking/control system which attaches your project’s performance metrics (performance
test results, profiling information and the like) to your Git revision tree, so you can track,
analyze and visualize them over time, possibly spotting performance regressions as they happen
during development. We had a nice talk with Tomáš about possible directions of this project, one of
which was taking it into the cloud and making a Git/GitHub-connected web application (like Travis CI
or Code Climate) out of it. Unfortunately, this was the one project for which I failed to allocate
sufficient time :(</p>
<h1 id="books">Books</h1>
<h2 id="agile-testing"><a href="http://a.co/bRIw1v0">Agile Testing</a></h2>
<p>I started reading this one back in January but had a hard time finishing it because it
became quite tedious to read (it tends to be repetitive and vague). I finally managed to finish it in
March. It is starting to show its age because it focuses so much on managing situations where a
tester’s company is not that friendly to Agile, is transitioning from Waterfall, or is in other non-ideal
situations that are not that common today. I liked the Agile Testing Quadrants and the last part about how
testers can contribute in different stages of an agile project.</p>
<h2 id="coders-at-work"><a href="http://www.codersatwork.com/">Coders at Work</a></h2>
<p>This is also a fat one (around 600 pages), but it reads well thanks to its interview format; at least
for me, it does. I am currently somewhere in the middle. It does not give the reader any directly
applicable material, but it is interesting to read because the interviewees have different
achievements and backgrounds. I especially like to compare different people’s answers to identical
questions.</p>
<h1 id="misc">Misc</h1>
<p>I was not doing just IT stuff. I finally obtained my driving license, which took quite a lot of
time, although I managed to pass both exams on the first try. I also spent a lot of time on
physiotherapy exercises to treat my <a href="https://en.wikipedia.org/wiki/Patellar_tendinitis">jumper’s
knee</a>. It healed nicely, but it seems to be back
after a few football matches :(</p>

<h1 id="radamsa-a-general-purpose-fuzzer"><a href="https://petr-muller.github.io/tools/2018/01/05/radamsa">Radamsa: A general-purpose fuzzer</a> (2018-01-05)</h1>

<p><strong>Radamsa</strong> is a general-purpose, black-box oriented mutating fuzzer. It is
written in Scheme and is available on its <a href="https://github.com/aoh/radamsa">GitHub
page</a> under the MIT license. While the project
is not entirely abandoned (there are occasional commits on the <code class="language-plaintext highlighter-rouge">develop</code> branch,
but the last commit on the <code class="language-plaintext highlighter-rouge">master</code> branch is a PR merge six months ago), there
does not seem to be much development happening anymore. The project is a side result
of the research done by <a href="https://www.ee.oulu.fi/roles/ouspg/FrontPage">Oulu University Secure Programming
Group</a>. The project has simple but
straightforward and informative documentation in the repository README file.</p>
<h2 id="basics">Basics</h2>
<p>Its documentation describes Radamsa as an “extremely black-box fuzzer”: it does
not need any information about either the input format or the internals of
the fuzzed program. The tool starts with a given sample input for an
application, on which it applies a mutation while trying to keep the general
format valid(-ish). Radamsa has its roots in research on the automatic
analysis of communication protocols.</p>
<p>Radamsa claims to be applicable, without any configuration, to programs
processing any format of input - binary or text. Quick experiments (see below)
show that Radamsa is quite successful. Although Radamsa’s output probably
cannot compare with that of format-specialized fuzzers (such as
<a href="https://embed.cs.utah.edu/csmith/">CSmith</a> for C programs), the applied
mutations go well beyond random garbage injection, leading to valuable testing
inputs for a program.</p>
<p>Intuitively, I would not expect many successful bug discoveries in proven,
battle-tested software, but the Radamsa README file contains an impressive list of
discovered CVEs, including ones in curl, libxslt and bzip2.</p>
<h2 id="installation-and-usage">Installation and usage</h2>
<p>The instructions say building Radamsa is a simple clone-and-run-make process
and I was able to build a single, dependency-free binary without any problem.
Mimicking a few examples from the documentation worked as expected: feeding my
name to Radamsa’s standard input yielded multiple mangled variants which I
imagine can wreak havoc on naive string processing routines.</p>
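<p>Roughly what that experiment looks like; the binary location and the exact flags are my
recollection of the README (<code class="language-plaintext highlighter-rouge">-n</code> asks for several mutations, <code class="language-plaintext highlighter-rouge">-o</code> writes them to
numbered files), so treat them as approximate:</p>

<pre><code>$ echo "Petr Muller" | bin/radamsa -n 5
$ bin/radamsa -n 100 -o fuzzed-%n.txt sample.txt
</code></pre>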
<p>The next experiment I tried was running Radamsa with a simple Python program as
an input. Again, the results were interesting: the applied modifications
varied a lot but kept the general structure of Python code. I saw lines removed
or duplicated and tokens changed (for example, integer literals changed to a
different value). I also encountered quite interesting, non-trivial mutations
like replacing a whole parenthesized expression with a recursive-ish
expression (think something like <code class="language-plaintext highlighter-rouge">a(a(a(a(a(b)))))</code>). Again, several tries convinced
me these inputs would be valuable when trying to fuzz something that
processes Python grammar.</p>
<p>As a last experiment, I tried an XML file; specifically, an xUnit result XML
file. Again, Radamsa mostly changed the file in a way that kept the overall
format, and the scale of applied mutations was similar to that of the Python input.</p>
<h2 id="when-would-i-use-it">When would I use it?</h2>
<p>Radamsa is extremely simple to start with - you only need the target system,
a few sample inputs and you are good to go. Set up Radamsa in a loop, feed its
output to the system under test and detect bugs. Of course, the black box
approach limits the rate with which Radamsa can penetrate deep into the
tested system, especially compared to smart fuzzers guided by the instrumented
target system, such as <a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a>.
You also need to have a reasonable way to detect error conditions in the
tested system, given an unknown input (but this holds for most fuzzers). Of
course, you can usually start with some simple criteria like “the target should
terminate and not crash”.</p>
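<p>A minimal sketch of such a loop, with the target command and the crash criterion as
placeholders (an exit status above 127 usually means the process died on a signal):</p>

<pre><code>#!/bin/sh
# Keep mutating a sample and feeding it to the target;
# archive every input that makes the target die on a signal.
while true; do
  bin/radamsa sample.xml > fuzzed.xml
  ./target-under-test fuzzed.xml
  if [ $? -gt 127 ]; then
    cp fuzzed.xml "crash-$(date +%s).xml"
  fi
done
</code></pre>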
<p>I will certainly include Radamsa in my toolbelt. The fact that I can
immediately, without any setup, run it for a few hours against pretty much
anything makes it useful in different situations, especially when
instrumentation or specialized format fuzzers are not available or worth the
effort to set up.</p>

<h1 id="randoop-automatic-unit-test-generation-for-java"><a href="https://petr-muller.github.io/tools/2017/12/25/randoop">Randoop: Automatic unit test generation for Java</a> (2017-12-25)</h1>

<p><strong>Randoop</strong> is an automatic unit test generator for Java (and .NET). Randoop is
written in Java and is available either from its <a href="https://randoop.github.io/randoop/">project
page</a> or <a href="https://github.com/randoop/randoop">GitHub
page</a>. It is available under the MIT license. As of
2017-12-24, the project seems to be quite alive, although most of the commits are
authored by a single developer (but the project accepts occasional PRs). Randoop
appears to be driven by a research group at the University of Washington, but the
overall quality of the project structure, supporting documentation, build
system and other project artifacts is excellent.</p>
<h2 id="basics">Basics</h2>
<p>According to its documentation, Randoop generates tests using feedback-directed random
test generation. It randomly (but smartly) generates sequences of constructor and method
invocations for the input classes. These sequences are executed, and the results are used to
create assertions. This means the tests can mostly only capture the actual behavior of
the tested class (possibly for future regression testing), not reveal many new bugs.
There is an exception to this, though – Randoop can detect when the class under test
does not conform to basic Java contracts (<code class="language-plaintext highlighter-rouge">Object.equals()</code> and the like) and several
other likely-buggy behaviors, such as <code class="language-plaintext highlighter-rouge">NullPointerException</code> being thrown when no <code class="language-plaintext highlighter-rouge">null</code>
values are passed as parameters to a method. The documentation states that it is possible to
add more contracts for checking.</p>
<h2 id="installation-and-usage">Installation and usage</h2>
<p>I cloned the Git repository and followed the manual to build Randoop from source
using Gradle. The build ran for about five minutes and produced a JAR file. I then tried
to execute Randoop on a little library I developed when working on the static analysis of C
programs, <a href="https://github.com/petr-muller/smg.git">smg</a>.</p>
<p>I started with generating tests for the simple <code class="language-plaintext highlighter-rouge">SMGRegion</code>
<a href="https://github.com/petr-muller/smg/blob/master/src/main/java/cz/afri/smg/objects/SMGRegion.java">class</a>.
After a little fiddling with parameters, Randoop ran for a while, generating 9
files of about 2MB each, with 4286 tests (so about 18MB total, which looks a bit
excessive for a ~60-line class). No “error-revealing” tests were generated,
just the regression tests. I tried to execute the tests, and they all
passed. Their total runtime was 0.105 seconds, which is good. I tried to
introduce a change in the tested class and rerun the tests, and now 2506 tests
failed as a result.</p>
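<p>For reference, the invocation was along these lines; the classpath placeholders are mine and
the flag spellings are from my reading of the Randoop manual, so double-check them against the
version you build:</p>

<pre><code>$ java -classpath my-classes:$RANDOOP_JAR randoop.main.Main gentests \
    --testclass=cz.afri.smg.objects.SMGRegion \
    --time-limit=60 \
    --junit-output-dir=generated-tests
</code></pre>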
<p>Afterwards, I tried to include all public classes, and the results were about the
same – around 4200 tests, none of them error-revealing (but then, Randoop can only find basic
Java contract violations).</p>
<p>The generated tests are straightforward (just constructor and method invocation
sequences) but quite long, with the usual appearance of generated code (numbered variable
names, etc.). I was able to investigate the failures quickly, but of course, the generated
code has no real semantic meaning that would hint to the programmer why the bug is
there, other than “this worked before”.</p>
<h2 id="when-would-i-use-it">When would I use it?</h2>
<p>Randoop seems quite useful to me. It is mature enough, well-documented and quite
easy to use. I also did not encounter any problems with the tool. Its error-revealing
mode could be run as part of CI, being basically a simple fuzzer for Java contracts (but
I think existing static analyzers could do the same job).</p>
<p>The usefulness of the generated tests is slightly more questionable. They could serve as
regression tests, as they can only alert you later when you, perhaps mistakenly, change
observable behavior. The good thing is Randoop can indeed create tests that you possibly
need but did not write. You could generate a testsuite at a particular point in time and keep
executing it: this way you would have a nice regression suite, but you would not test any
code added after the suite was generated. Regenerating the suite after each change seems
too expensive, but has some merit (of course, only if the original suite was run and
passed first). Perhaps some discard-few-old, generate-few-new strategy might be employed
there (I guess these strategies are probably discussed somewhere in the related
scientific papers, such as the authors’ <a href="https://homes.cs.washington.edu/~mernst/pubs/maintainable-tests-ase2011.pdf">Scaling Up Automated Test
Generation</a>
ASE’2011 paper).</p>
<p>I can also imagine situations where Randoop generates tests that capture “undefined”
behavior, like ordering or specific values that may change between executions. The user
manual briefly discusses this, and the tool provides a few techniques that can be applied to
prevent such behavior.</p>

<h1 id="set-up-this-thing"><a href="https://petr-muller.github.io/meta/2017/04/15/Hello-World">Set up this thing</a> (2017-04-15)</h1>

<p>Finally took some time and set up this thing. Hopefully more content will start
to appear here.</p>