Synthetic data with SDV and CopulaGAN

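```python
# `data` appears to be the Titanic passenger table (891 rows) with Sex
# already encoded as 0/1; the loading and cleaning steps are not shown here.
data.head()
```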

```
##    Pclass  Sex   Age  SibSp  Parch     Fare
## 0       3    1  22.0      1      0   7.2500
## 1       1    0  38.0      1      0  71.2833
## 2       3    0  26.0      0      0   7.9250
## 3       1    0  35.0      1      0  53.1000
## 4       3    1  35.0      0      0   8.0500
```

```python
data.describe(include='all')
```

```
##            Pclass         Sex         Age       SibSp       Parch        Fare
## count  891.000000  891.000000  891.000000  891.000000  891.000000  891.000000
## mean     2.308642    0.647587   29.758889    0.523008    0.381594   32.204208
## std      0.836071    0.477990   13.002570    1.102743    0.806057   49.693429
## min      1.000000    0.000000    0.420000    0.000000    0.000000    0.000000
## 25%      2.000000    0.000000   22.000000    0.000000    0.000000    7.910400
## 50%      3.000000    1.000000   30.000000    0.000000    0.000000   14.454200
## 75%      3.000000    1.000000   35.000000    1.000000    0.000000   31.000000
## max      3.000000    1.000000   80.000000    8.000000    6.000000  512.329200
```

```python
# Legacy SDV 0.x API; SDV 1.0 replaced it with
# sdv.single_table.CopulaGANSynthesizer.
from sdv.tabular import CopulaGAN
```

```python
model = CopulaGAN()
```

```python
model.fit(data)
```
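According to the SDV documentation, CopulaGAN is a variation of CTGAN that first applies the CDF-based transformation used by Gaussian copulas to numerical columns, which makes the GAN's learning task easier. The sketch below illustrates that transformation under a simplifying assumption: it uses each column's empirical CDF (rank-based quantiles) and the standard-normal inverse CDF, whereas SDV fits a univariate distribution per column. `to_normal_scores` is a hypothetical helper, not part of SDV.

```python
from statistics import NormalDist


def to_normal_scores(values):
    """Map values to standard-normal scores via their empirical CDF.

    Hypothetical helper: a simplified stand-in for CopulaGAN's CDF-based
    transform, using ranks instead of a fitted univariate distribution.
    """
    n = len(values)
    # Sort indices by value so each value receives its rank (ties keep input order).
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, sigma 1
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the quantile strictly inside (0, 1),
        # so inv_cdf never receives 0 or 1.
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


# Fare values from the first five rows of the real data shown above.
fares = [7.25, 71.2833, 7.925, 53.1, 8.05]
print([round(s, 3) for s in to_normal_scores(fares)])
```

The heavily skewed Fare values come out as roughly symmetric scores around zero, which is the kind of well-behaved input the GAN then learns from.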

```python
new_data = model.sample(200)
```

```python
new_data.head()
```

```
##    Pclass  Sex        Age  SibSp  Parch        Fare
## 0       1    0  11.797730      3      2   15.963036
## 1       2    0  28.135436      2      2   12.701402
## 2       3    0  26.367627      0      1   10.742400
## 3       3    1  24.116738      0      1  144.611161
## 4       2    0  31.368429      3      1   13.659402
```

```python
new_data.describe(include='all')
```

```
##            Pclass         Sex         Age       SibSp       Parch        Fare
## count  200.000000  200.000000  200.000000  200.000000  200.000000  200.000000
## mean     1.990000    0.440000   31.717758    0.945000    1.240000   76.998168
## std      0.795654    0.497633   14.745161    0.925341    1.487773   82.340578
## min      1.000000    0.000000   -0.299758    0.000000    0.000000    4.534767
## 25%      1.000000    0.000000   23.051398    0.000000    0.000000   14.265699
## 50%      2.000000    0.000000   30.367405    1.000000    1.000000   36.297745
## 75%      3.000000    1.000000   40.796266    1.000000    2.000000  115.727641
## max      3.000000    1.000000   77.002768    5.000000    6.000000  350.698375
```
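The synthetic summary reports a minimum Age of -0.299758, which is impossible (the real minimum is 0.42). One simple post-processing option, assuming it is acceptable to drop such rows, is to keep only rows whose values lie inside the ranges observed in the real data. The sketch below shows the idea on plain dictionaries; `REAL_BOUNDS` and `within_real_ranges` are hypothetical names, not part of SDV, and the third example row pairs the reported minimum Age with a made-up Fare.

```python
# Observed ranges in the real data, copied from data.describe() above.
REAL_BOUNDS = {
    "Age": (0.42, 80.0),
    "Fare": (0.0, 512.3292),
}


def within_real_ranges(row, bounds=REAL_BOUNDS):
    """Return True if every bounded column in `row` lies inside its real range.

    Hypothetical helper for post-hoc filtering; it is not part of SDV.
    """
    return all(lo <= row[col] <= hi for col, (lo, hi) in bounds.items())


synthetic_rows = [
    {"Age": 11.797730, "Fare": 15.963036},   # row 0 of new_data.head()
    {"Age": 24.116738, "Fare": 144.611161},  # row 3 of new_data.head()
    {"Age": -0.299758, "Fare": 10.0},        # reported min Age; Fare is made up
]
kept = [r for r in synthetic_rows if within_real_ranges(r)]
print(len(kept))  # 2
```

With a pandas DataFrame the same check could be written as `new_data[new_data["Age"].between(0.42, 80.0)]`. Filtering shrinks the sample, so you would request more rows from `model.sample` than you ultimately need.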

```python
from sdv.evaluation import evaluate

evaluate(new_data, data)
```

```
## 0.4487854281948458
```
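In SDV 0.x, `evaluate` aggregates several metrics into a single number, so a score of about 0.45 does not show which columns are poorly reproduced. As a hedged illustration of one per-column ingredient, the stdlib sketch below computes a two-sample Kolmogorov-Smirnov statistic D (the largest gap between two empirical CDFs) and reports 1 - D as a similarity. This is one common way to turn a KS statistic into a score; it is not necessarily how SDV computes or weights its own KS-based metric. `ks_statistic` and `ks_similarity` are hypothetical helpers.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between ECDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The ECDFs only jump at observed values, so checking those points suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_similarity(real, synthetic):
    """Score in [0, 1]; 1.0 means identical empirical distributions."""
    return 1.0 - ks_statistic(real, synthetic)


# Fare values from the first five rows of `data` and `new_data` shown above.
real_fares = [7.25, 71.2833, 7.925, 53.1, 8.05]
synth_fares = [15.963036, 12.701402, 10.7424, 144.611161, 13.659402]
print(round(ks_similarity(real_fares, synth_fares), 2))  # 0.4
```

On the full frames, something like `ks_similarity(data["Fare"].tolist(), new_data["Fare"].tolist())` would give one such score per numerical column.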

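```python
model = CopulaGAN(
    # Per-column preprocessing; by default the types are inferred from dtypes.
    field_transformers={
        'Pclass': 'categorical',
        'Sex': 'categorical',
        'Age': 'float',
        # SibSp ranges from 0 to 8 in the real data, so 'boolean' may discard
        # information; 'integer' (as used for Parch) would keep the counts.
        'SibSp': 'boolean',
        'Parch': 'integer',
        'Fare': 'float',
    },
    # Distribution used by the copula transform for the Fare column.
    field_distributions={
        'Fare': 'truncated_gaussian',
    },
)
```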
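Setting `field_distributions={'Fare': 'truncated_gaussian'}` asks the copula transform to model Fare with a Gaussian restricted to a bounded interval, so values mapped back from the model cannot fall outside it. The bounds are typically derived from the observed minimum and maximum. The sketch below shows only the distribution itself, sampled by plain rejection; the parameters are the Fare mean, standard deviation, minimum and maximum from `data.describe()`, used for illustration rather than the values SDV would actually fit. `sample_truncated_gaussian` is a hypothetical helper.

```python
import random


def sample_truncated_gaussian(mu, sigma, low, high, n, rng=None):
    """Draw n samples from N(mu, sigma^2) restricted to [low, high].

    Rejection sampling: draw from the untruncated normal and keep only
    draws inside the bounds. Adequate when [low, high] holds a reasonable
    share of the probability mass.
    """
    rng = rng or random.Random()
    out = []
    while len(out) < n:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            out.append(x)
    return out


# Illustrative parameters taken from the Fare column of data.describe() above.
draws = sample_truncated_gaussian(32.204208, 49.693429, 0.0, 512.3292, 1000,
                                  rng=random.Random(0))
print(min(draws) >= 0.0, max(draws) <= 512.3292)  # True True
```

An untruncated Gaussian with these moments would put roughly a quarter of its mass below zero, which is why a bounded distribution is a natural choice for a non-negative column like Fare.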

```python
model.fit(data)
```

```python
new_data = model.sample(200)
```

```python
new_data.head()
```

```python
new_data.describe(include='all')
```

```python
evaluate(new_data, data)
```
