Development
The repository comes with a development environment file. We suggest using conda and mamba.
To get started, run:
# Clone the repository
git clone git@github.com:Quantco/pytsql.git
cd pytsql
# Set up a conda environment with name "pytsql" and activate it.
mamba env create
conda activate pytsql
# Set up our pre-commit hooks for black, mypy, isort and flake8.
pre-commit install
# Install this package in editable mode.
pip install --no-build-isolation -e .
Unit tests
pytsql comes with some unit tests. Provided that the environment has been set
up as described in the previous step, they can be run as follows:
conda activate pytsql
cd pytsql
pytest tests/unit/
Integration tests
In addition to the modular unit tests, pytsql also comes with integration
tests against a database. To start a dockerized mssql database locally, execute
the start_mssql.sql script provided.
Once the docker container is up and running, you can run the tests:
conda activate pytsql
cd pytsql
pytest tests/integration/
Add the option --backend=mssql-freetds to the test command to run the tests using
the freetds driver.
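Before running the integration tests, it can help to confirm that the containerized database is actually accepting connections. The following is a minimal sketch, not part of pytsql: `wait_for_port` is a hypothetical helper, and it assumes the container publishes SQL Server's default port 1433 on localhost. An open port only shows that something is listening; the server may still need a moment before it accepts logins.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it only checks that the port is open.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Under these assumptions, calling `wait_for_port("localhost", 1433)` after starting the container and before invoking pytest would block until the port is reachable or one minute has passed.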
Grammar
pytsql relies on parsing the SQL script at hand. To do so, it uses
antlr, a parser generator. antlr expects grammar files
and produces Python parsing code.
Additionally, code generated using the speedy-antlr-tool package is used to parse SQL scripts
in C++ for better performance. The resulting parse tree is then converted into its Python equivalent.
pytsql's Transact-SQL (TSQL) grammar is based on antlr/grammars-v4.
To keep the package structure lean, pytsql aims to be in sync with the reference repository and generally does not maintain its own grammar.
Therefore, if you want to extend or modify the grammar, please consider contributing to the external repository instead.
Update targets
All files in pytsql/src/pytsql/grammar/cpp_src/antlr4-cpp-runtime are taken directly from
the ANTLR repository release 4.11.1, and the rest of the files in pytsql/src/pytsql/grammar
are generated by antlr or speedy-antlr-tool.
You can generate these files by running the following commands in the
pytsql/src/pytsql/grammar directory, after adapting the path to your
respective antlr jar file:
java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Cpp -o cpp_src TSqlLexer.g4
java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Cpp -visitor -no-listener -o cpp_src TSqlParser.g4
java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Python3 -o . TSqlLexer.g4
java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Python3 -no-visitor -no-listener -o . TSqlParser.g4
Then run speedy-antlr-tool in the same directory using Python:
from speedy_antlr_tool import generate

generate(
    # ANTLR names the generated parser after the grammar file TSqlParser.g4.
    py_parser_path="TSqlParser.py",
    cpp_output_dir="cpp_src",
    entry_rule_names=["tsql_file"],
)
Instead of running the steps manually, you can also simply run bash helper_generate_grammar_targets.sh <ANTLR4_JAR_FILEPATH>.
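For readers who prefer to drive the generation from Python, the four java invocations listed above can be parametrized on the jar path. This is a rough sketch under stated assumptions: `build_antlr_commands` and `generate_targets` are hypothetical names, and the code does not reproduce the contents of helper_generate_grammar_targets.sh. It only mirrors the manual antlr commands; the speedy-antlr-tool step would still need to run afterwards.

```python
import subprocess
from typing import List


def build_antlr_commands(jar_path: str) -> List[List[str]]:
    """Return the four antlr invocations from the manual steps as argument lists."""
    base = ["java", "-jar", jar_path]
    return [
        # C++ lexer and parser (visitor only) go into cpp_src.
        base + ["-Dlanguage=Cpp", "-o", "cpp_src", "TSqlLexer.g4"],
        base + ["-Dlanguage=Cpp", "-visitor", "-no-listener", "-o", "cpp_src", "TSqlParser.g4"],
        # Python lexer and parser go into the current directory.
        base + ["-Dlanguage=Python3", "-o", ".", "TSqlLexer.g4"],
        base + ["-Dlanguage=Python3", "-no-visitor", "-no-listener", "-o", ".", "TSqlParser.g4"],
    ]


def generate_targets(jar_path: str, grammar_dir: str) -> None:
    """Run each command inside grammar_dir and stop at the first failure."""
    for command in build_antlr_commands(jar_path):
        subprocess.run(command, cwd=grammar_dir, check=True)
```

Keeping command construction separate from execution makes it easy to print or inspect the commands before running them, for example when checking that the jar path was adapted correctly.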