As a consequence, data versioning becomes code versioning. As raw data files are the same for everyone, tracking changes in the SQL queries via git ensures reproducibility.* *in a research environment where 'raw' data is in fact curated and immutable
At Nationalbanken, our best practice for FS analyses in DST was: raw (SAS) files transformed into parquet as-is; data flows that load data into RAM ready for analysis, packed in a (series of) SQL queries referring to the parquet files. Thanks to DuckDB, that's seconds
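A minimal sketch of that flow, assuming a toy loans.parquet file with illustrative column names (none of this is Nationalbanken's actual schema; writing parquet via pandas needs pyarrow installed):

import duckdb
import pandas as pd

# Stand-in for a raw SAS extract already converted to parquet as-is
pd.DataFrame(
    {"sector": ["bank", "bank", "insurance"], "exposure": [100.0, 250.0, 75.0]}
).to_parquet("loans.parquet")

# The query text is what gets versioned in git; the raw file never changes
query = """
    select sector, sum(exposure) as total_exposure
    from 'loans.parquet'
    group by sector
"""
analysis_df = duckdb.sql(query).df()  # ready-for-analysis DataFrame in RAM
print(analysis_df)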
Lack of training imho. It's the sign of someone thinking of a linear storyline. Narrating a story. Makes sense in theory, but doesn't work in practice.
Super link! Another killer feature of DuckDB not mentioned there, which still feels like black magic, is that you can join an existing pd.DataFrame in memory with e.g. a parquet file, like df = get_data(); duckdb.sql("select a.x, b.y from 'data.parquet' as a inner join df as b on a.k=b.k")
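Spelled out as a runnable sketch — the file name data.parquet and the columns k, x, y are made up for illustration:

import duckdb
import pandas as pd

# A small parquet file standing in for the on-disk data
pd.DataFrame({"k": [1, 2, 3], "x": [10.0, 20.0, 30.0]}).to_parquet("data.parquet")

# An ordinary in-memory DataFrame; no registration step needed
df = pd.DataFrame({"k": [2, 3], "y": ["b", "c"]})

# DuckDB's replacement scans resolve `df` from the local Python scope,
# so the parquet file and the DataFrame join like two normal tables
result = duckdb.sql(
    "select a.x, b.y from 'data.parquet' as a inner join df as b on a.k = b.k"
).df()
print(result)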
Embrace SQL
Our workshops were attended by a total of 3693 people:
- 2410 people registered directly
- 282 through the special registration form for Ukrainians
- 518 through the waiting list, as their registration fee was sponsored by someone else
- 483 people purchased recordings
8/n
More details in an updated report. As of 05/10 we raised a total of 81 643 euros. In particular, we raised the following amounts for the following orgs:
- Leleka Foundation: 48 984 euros
- Special account for the Ukrainian armed forces by the National Bank of Ukraine: 22 707 euros
- Other organizations: 9 952 euros
7/n
Well, Rabois is a very prolific poster
He must have a very big n to be so confident
Ehrm. Italy and Austria would like to have a word.