Development

The repository comes with a development environment file. We suggest using conda and mamba.

To get started, run:

# Clone the repository
git clone git@github.com:Quantco/pytsql.git
cd pytsql

# Set up a conda environment with name "pytsql" and activate it.
mamba env create
conda activate pytsql

# Set up our pre-commit hooks for black, mypy, isort and flake8.
pre-commit install

# Install this package in editable mode.
pip install --no-build-isolation -e .
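
As a quick sanity check after the editable install, one could confirm from Python that the package can be located. This is only a minimal sketch: the helper name `is_importable` is made up for illustration and is not part of pytsql.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# After `pip install -e .` inside the activated environment,
# this is expected to report True for "pytsql".
print("pytsql importable:", is_importable("pytsql"))
```

If this reports False, the most likely cause is that the conda environment is not activated in the current shell.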

Unit tests

pytsql comes with some unit tests. Provided that the environment has been set up as described in the previous step, they can be run as follows:

conda activate pytsql
cd pytsql
pytest tests/unit/

Integration tests

In addition to very modular unit tests, pytsql also comes with some integration tests against a database. To start a dockerized mssql database locally, execute the provided start_mssql.sh script.

Once the docker container is up and running, you can run the tests:

conda activate pytsql
cd pytsql
pytest tests/integration/

Add the option --backend=mssql-freetds to the test command to run the tests using the freetds driver.
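
Before launching the integration tests, it can help to confirm that the database container is already accepting connections. The sketch below polls a TCP port until it opens. It assumes the container publishes SQL Server's default port 1433 on localhost, which depends on how the start script maps ports; the function name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (assumed host and port):
# wait_for_port("localhost", 1433)
```

An open port only shows that the server process is listening; SQL Server may still need a few more seconds before it accepts logins.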

Grammar

pytsql relies on parsing the SQL script at hand. To do so, it uses antlr, a parser generator. antlr takes grammar files as input and generates Python parsing code.

Additionally, code generated with the speedy-antlr-tool package is used to parse SQL scripts in C++ for better performance. The resulting parse tree is then converted into its Python equivalent.

pytsql’s Transact-SQL (TSQL) grammar is based on antlr/grammars-v4. To keep the package structure lean, pytsql aims to be in sync with the reference repository and generally does not maintain its own grammar. Therefore, if you want to extend or modify the grammar, please consider contributing to the external repository instead.
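
To illustrate what the generated Python parsing code is used for, here is a minimal sketch of the pure-Python ANTLR path. It is not pytsql's actual implementation. It assumes the antlr4 Python runtime is installed and that the generated classes can be imported as `pytsql.grammar.TSqlLexer` and `pytsql.grammar.TSqlParser`; these module paths are guessed from the `src/pytsql/grammar` directory. The entry rule `tsql_file` is the one named in this document's speedy-antlr-tool call, and `parse_to_tree_string` is a hypothetical helper.

```python
def parse_to_tree_string(sql: str) -> str:
    """Parse `sql` with the ANTLR-generated classes and return the parse tree as text."""
    # Imports live inside the function so that defining it does not require the packages.
    from antlr4 import CommonTokenStream, InputStream

    # Assumed module paths, guessed from the repository layout.
    from pytsql.grammar.TSqlLexer import TSqlLexer
    from pytsql.grammar.TSqlParser import TSqlParser

    lexer = TSqlLexer(InputStream(sql))            # characters -> tokens
    parser = TSqlParser(CommonTokenStream(lexer))  # tokens -> parse tree
    tree = parser.tsql_file()                      # start at the grammar's entry rule
    return tree.toStringTree(recog=parser)
```

Calling `parse_to_tree_string("SELECT 1")` in a set-up environment should return a Lisp-style rendering of the parse tree. The C++ path generated by speedy-antlr-tool performs the same lexing and parsing steps natively.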

Update targets

All files in pytsql/src/pytsql/grammar/cpp_src/antlr4-cpp-runtime are taken directly from the ANTLR repository release 4.11.1 and the rest of the files in pytsql/src/pytsql/grammar are generated by antlr or speedy-antlr-tool.

You can generate these files by running the following commands in the pytsql/src/pytsql/grammar directory, after adapting the path to your antlr jar file:

java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Cpp -o cpp_src TSqlLexer.g4
java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Cpp -visitor -no-listener -o cpp_src TSqlParser.g4

java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Python3 -o . TSqlLexer.g4
java -jar /usr/local/lib/antlr-4.9.2-complete.jar -Dlanguage=Python3 -no-visitor -no-listener -o . TSqlParser.g4

Then run speedy-antlr-tool in the same directory using Python:

from speedy_antlr_tool import generate

generate(
    py_parser_path="TSqlParser.py",
    cpp_output_dir="cpp_src",
    entry_rule_names=["tsql_file"],
)

Instead of running the steps manually, you can also simply run bash helper_generate_grammar_targets.sh <ANTLR4_JAR_FILEPATH>.
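
For readers who prefer to drive the antlr step from Python, the four java invocations listed above could be wrapped as follows. This is a sketch under stated assumptions, not the content of helper_generate_grammar_targets.sh; the function names `antlr_commands` and `generate_targets` are hypothetical, and the flags are copied from the commands in this section.

```python
import subprocess
from typing import List


def antlr_commands(jar_path: str) -> List[List[str]]:
    """Build the four antlr invocations from this section for a given jar file."""
    base = ["java", "-jar", jar_path]
    return [
        # C++ targets, written to cpp_src
        base + ["-Dlanguage=Cpp", "-o", "cpp_src", "TSqlLexer.g4"],
        base + ["-Dlanguage=Cpp", "-visitor", "-no-listener", "-o", "cpp_src", "TSqlParser.g4"],
        # Python targets, written to the grammar directory itself
        base + ["-Dlanguage=Python3", "-o", ".", "TSqlLexer.g4"],
        base + ["-Dlanguage=Python3", "-no-visitor", "-no-listener", "-o", ".", "TSqlParser.g4"],
    ]


def generate_targets(jar_path: str, grammar_dir: str) -> None:
    """Run every command inside `grammar_dir`, stopping at the first failure."""
    for command in antlr_commands(jar_path):
        subprocess.run(command, cwd=grammar_dir, check=True)
```

A call such as `generate_targets("/usr/local/lib/antlr-4.9.2-complete.jar", "src/pytsql/grammar")` would then regenerate the antlr targets; the speedy-antlr-tool step still has to be run afterwards.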