Changelog
Version 3.2
20 March 2024
enhancement Model training time improvements
Model training time has been improved, with a speedup of roughly 2x across a variety of datasets.
enhancement Synthetic data quality improvements
The quality of Synthesized’s synthetic data has been improved when using the pandas interface: better support for complex data distributions results in more accurate synthetic data generation.
enhancement Use configuration objects in place of a large number of arguments
Previously, a large number of arguments had to be provided when using the TableSynthesizer class methods from_data_interface and
from_meta_collection. This has been considerably simplified through the use of a pydantic data transfer object: users can now
provide a single TrainConfig object when creating a TableSynthesizer instance with either of these methods.
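The config-object pattern can be sketched as follows. This is a minimal, dependency-free Python illustration, not the SDK's API: TrainConfig and TableSynthesizer are defined locally as hypothetical stand-ins, a frozen dataclass replaces the SDK's pydantic model, and the fields (epochs, batch_size, learning_rate) and the train_config parameter name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the SDK's TrainConfig (a pydantic model in the SDK);
# the field names and defaults here are illustrative only.
@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 10
    batch_size: int = 256
    learning_rate: float = 1e-3


class TableSynthesizer:
    """Hypothetical stand-in showing only the config-object constructor pattern."""

    def __init__(self, meta: dict, config: TrainConfig) -> None:
        self.meta = meta
        self.config = config

    @classmethod
    def from_meta_collection(
        cls, meta: dict, train_config: Optional[TrainConfig] = None
    ) -> "TableSynthesizer":
        # A single config object replaces a long list of keyword arguments;
        # defaults apply when no config is supplied.
        if train_config is None:
            train_config = TrainConfig()
        return cls(meta, train_config)


config = TrainConfig(epochs=50)
synth = TableSynthesizer.from_meta_collection({"columns": ["age", "income"]}, train_config=config)
print(synth.config)  # TrainConfig(epochs=50, batch_size=256, learning_rate=0.001)
```

Because the settings live in one object, the same config can be built once and passed to either constructor. With pydantic, as the SDK uses, field values are also validated when the object is created.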
Version 3.1
26 January 2024
feature YAML config auto-generation
It is now possible to automatically generate YAML config files for datasets.
feature YAML schema and hinting
A YAML schema for the SDK's config files can now be registered in IDEs, enabling type hinting that makes YAML config files
easier to write. For example, users can now press the Tab key while writing a YAML config file to see the available
configuration options for the SDK.
feature Spark DateType native support
Native support for training on and synthesizing Spark DateType columns has been added, in addition to the already supported
TimestampType and TimestampNTZType data types.
enhancement Faster Spark Meta Extraction
Extraction of meta information from Spark datasets is now 2x faster, thanks to various performance optimisations.
enhancement Automatic Sampling
Very high cardinality columns are now detected automatically and modelled with SamplingModel. This matches the behaviour
of SDK 2.9, minimising the code changes needed when migrating.
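One way to picture the detection step is a distinct-value ratio test. The sketch below is illustrative only: the 0.9 threshold, the columns_to_sample helper and the table-as-dict representation are hypothetical assumptions, not the SDK's actual detection rule.

```python
from typing import Dict, Hashable, List, Sequence

# Hypothetical threshold; the SDK's real cut-off is not documented in this changelog.
HIGH_CARDINALITY_RATIO = 0.9


def is_high_cardinality(values: Sequence[Hashable], ratio: float = HIGH_CARDINALITY_RATIO) -> bool:
    """Return True when distinct non-null values make up at least `ratio` of non-null values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return False
    return len(set(non_null)) / len(non_null) >= ratio


def columns_to_sample(table: Dict[str, Sequence[Hashable]]) -> List[str]:
    """Hypothetical helper: names of columns that would be routed to SamplingModel."""
    return [name for name, values in table.items() if is_high_cardinality(values)]


table = {
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "country": ["UK", "UK", "FR", "UK", "FR"],
}
print(columns_to_sample(table))  # ['name']
```

A ratio alone flags almost any column in a very small table, so a practical rule would likely also require a minimum number of rows or distinct values.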
enhancement Automatic Enumeration
Enumerated columns (i.e. columns whose values increase predictably, such as ID columns) are now detected automatically
and modelled with EnumerationModel. This also matches the behaviour of SDK 2.9, minimising the code changes needed when
migrating.
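A simple way to detect such columns is to check whether the values form an arithmetic progression. The Python sketch below is a hypothetical illustration, not the SDK's implementation: enumeration_step is an invented helper, and it assumes an enumerated column holds integers that increase by the same positive step from row to row (gaps, reordering or nulls would need extra handling).

```python
from typing import Optional, Sequence


def enumeration_step(values: Sequence[int]) -> Optional[int]:
    """Return the constant positive step if `values` increase evenly, else None."""
    if len(values) < 2:
        return None
    step = values[1] - values[0]
    if step <= 0:
        # Assumption: only strictly increasing sequences count as enumerations.
        return None
    for prev, cur in zip(values, values[1:]):
        if cur - prev != step:
            return None
    return step


print(enumeration_step([1001, 1002, 1003, 1004]))  # 1
print(enumeration_step([10, 20, 35]))  # None
```

Returning the step rather than a plain boolean keeps the detected pattern available, for example to record that IDs advance by 1.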