I used **Power BI** to explore why changing **the mix and weighting of assessment types** in presentation 2014J, compared with other presentations of module BBB, appeared to be **associated** with a change in final results. The task shows my **Exploratory Data Analysis (EDA), Data Engineering, Data Visualisation, and Storytelling skills.**

**The published report:** <https://www.novypro.com/project/education-focused-analysis-how-assessment-types-shape-the-final-result>

**Data and Relevant Files** are here: <https://github.com/xiangivyli/data-science-portfolio/tree/main/part_c_education_focused_analysis_report>

### Context:

The Open University has launched a series of modules (courses) and collected user interaction data over two years. The academic module leader would like to discuss with other staff teaching on the module:

*   *Why changing the mix and weighting of assessment types in presentation 2014J compared with other presentations of module BBB appeared to be associated with a change in final results.*

### Data Source

[Open University Learning Analytics dataset](https://analyse.kmi.open.ac.uk/open_dataset#data)

7 Tables:

1.  courses.csv

2.  assessments.csv

3.  vle.csv

4.  studentInfo.csv

5.  studentRegistration.csv

6.  studentAssessment.csv

7.  studentVle.csv

Database schema

![](/images/Database_schema_Open_Learning_Analytics.png)

### Data Validation

I built **pivot tables** to summarise the categories and the range of numeric values in each table

![](/images/Data%20Validation.png)

### Data Cleansing

I used Python to filter the data so that only module BBB data was kept.

The code can be found on my GitHub: <https://github.com/xiangivyli/data-science-portfolio/tree/main/part_c_education_focused_analysis_report/OULAD_Python_Transformation>

![](/images/Data%20Cleansing.png)
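The filtering idea can be sketched with the standard library alone. This is only an illustration, not the code from the repository: the sample rows are made up, the helper name `filter_module` is hypothetical, and writing the result to Parquet is left out. It assumes the tables identify the module in a `code_module` column.

```python
import csv
import io

# Made-up sample rows in the shape of courses.csv (illustrative only).
SAMPLE_COURSES = """code_module,code_presentation,module_presentation_length
AAA,2013J,268
BBB,2013B,240
BBB,2014J,262
CCC,2014B,241
"""


def filter_module(csv_text, module="BBB"):
    """Return only the rows whose code_module equals `module`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["code_module"] == module]


bbb_rows = filter_module(SAMPLE_COURSES)
for row in bbb_rows:
    print(row["code_module"], row["code_presentation"])
```

The same filter would be applied to every table that carries the module column, so that all 7 tables describe module BBB only.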

I recorded the changes made to the cleansed dataset for future review

![](/images/Changes%20of%20cleansed%20dataset.png)

### Power BI Step

The generated Parquet files were imported into Power BI, and relationships between the 7 tables were created using **Data Modelling**

![](/images/Power%20BI%20Data%20Modelling.png)

### Page 1 Overview of Module BBB

The data was collected during 2013 and 2014. There are **2 presentation types, B and J**: B starts in February and J starts in October. This gives 4 presentations: 2013B, 2014B, 2013J, and 2014J.

For assessment types, **2014J** was composed of an Exam and **Tutor Marked Assessments**, while the other presentations also included **Computer Marked Assessments.**

![](/images/report%201%20Overview.png)

*   For **2014J,** the assessments have **different weights: id 15023 and id 15024 each have a weight of 35, and id 15020 has a weight of 0.**

*   For the other presentations, **Computer Marked Assessments** counted for 5% in total, with each assessment counting for 1%

![](/images/report%201%20Weightings-113ff154.png)

### Page 2 Student Persona

The most significant feature is that **more female students** than male students registered for module BBB.

Students preferred **presentation J to B.**

![](/images/report%202%20Persona.png)

### Page 3 Final Results

*   The **Pass percentage** of 2014J is higher than that of the other presentations

*   The **distributions of scores** for Computer Marked Assessments and Tutor Marked Assessments differ

![](/images/report%203%20Final%20Results.png)

### Page 4 Relationship

I used the following **measure** to calculate the **fail percentage for each assessment** (the share of scores below 40):

`Percentage_Below40 (%) = DIVIDE(COUNTROWS(FILTER('studentAssessment_BBB', [score] < 40)), COUNTROWS('studentAssessment_BBB'), 0) * 100`
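For readers less familiar with DAX, the same calculation can be sketched in Python. This is an illustration only: the function name `percentage_below` and the sample scores are made up, and returning 0 for an empty list mirrors the alternate result passed to `DIVIDE` in the measure.

```python
def percentage_below(scores, threshold=40):
    """Percentage of scores strictly below `threshold`; 0 when there are no scores."""
    if not scores:
        return 0.0
    below = sum(1 for s in scores if s < threshold)
    return below / len(scores) * 100


# Made-up scores for one assessment (illustrative only).
sample_scores = [35, 60, 82, 39, 40, 71, 90, 55]
print(percentage_below(sample_scores))
```

A score of exactly 40 is not counted, because the condition is strictly "less than 40".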

***

![](/images/report%204%20Relation.png)

### The final words

![](/images/Wrap-up.png)


Education-Focused Analysis: How Assessment Types Shape the Final Result

#### **Question: when the company gives you a dataset to build a dashboard before the interview, what should you do?**

# Step 1: Check the datasets and understand the meaning of each column

I received two tables

###### Table 1. Copy of Life Satisfaction VS GDP Extract.csv

Columns: **Country Code, Country, GDP per capita, Life satisfaction, Region, Year**

![](/images/Overview%20of%20life%20satisfaction%20and%20gdp%20extract.png)

Use a pivot table to check the range:

![](/images/Pivot%20Summary%20for%20GDP%20and%20Satisfaction-6ff36840.png)

###### Table 2. Copy of Adult Education Extract.csv

Columns: **Country code, Country, Education level, Proportion of population, Year**

![](/images/Overview%20of%20adult%20education%20extract.png)

Use a pivot table to check the range:

![](/images/Pivot%20Summary%20for%20Education%20Level-061f5c5e.png)

So I have concluded that the datasets have

**4 dimensions that I can use: Year, Region, Country, Education Level**

**3 measures that I can compare: GDP per capita, Life satisfaction, and Proportion of education level**

# Step 2: Identify my objectives

*   Show trends for each country

*   Correlations among GDP per capita, Life satisfaction, and Proportion of education level

# Step 3: Design the dashboards

3 Dashboards for 3 purposes

1.  From GDP to see Life satisfaction and education level

2.  From education level to see GDP and Life Satisfaction

3.  Relation Between Education Level and GDP per capita

# Step 4: Data Validation and Data Cleansing

Fortunately, the two tables have no missing values, so they can be imported directly into Tableau.

# Step 5: Build the dashboard

The finished dashboard has been published: <https://public.tableau.com/app/profile/xiang.li5182/viz/PersonalProjectsIvyLi/KeyEducation?publish=yes>

# **User Stories**

#### Dashboard 1 From GDP to see Life satisfaction and education level

*   Action 1: Select different **regions** to check the minimum, median and maximum GDP values, Life Satisfaction and the average Education Level

*   Action 2: Choose **which year** users are interested in

*   Action 3: Choose **a country** on the map and see GDP, Life Satisfaction and Education Level (**tooltips** also show the same information)

![](/images/Dashboard1.png)

#### Dashboard 2: From Education Level to see GDP and Life Satisfaction

*   Action 1: Choose one **country** to check the **trend** of GDP per capita, Life Satisfaction and Education Level

*   Action 2: Choose the **year** and **Education Level** to check which country has a higher value in Education Level

![](/images/Dashboard2.png)

#### Dashboard 3: Relation between Education Level and GDP per capita

*   Action: Choose **which year** users want to **compare GDP and Education Level** among countries to understand if Education Level **correlates** with GDP

![](/images/Dashboard3.png)

# Final words

#### **The dashboards offer strong interactive features, such as year slicers and filters for all dimensions.**



A self-service platform for GDP, Life Satisfaction and Education Level

Sometimes problems come up naturally. I wanted to use SQL Server to analyse a dataset from Kaggle, but the dataset is an SQLite file containing 6 tables, so my task became importing an SQLite file into SQL Server. This time I want to introduce an efficient way to learn coding with ChatGPT. The key points are: **structured questions result in structured answers, comparison and analogy help comprehension, find the easy material** and so on.

<div style="text-align: center">![](/images/Dataset.png)</div>

Data source: [Kaggle 18,393 Pitchfork Reviews](https://www.kaggle.com/datasets/nolanbconaway/pitchfork-data).

## 1. How to ask ChatGPT with prompt engineering:

*   **Give a context**

*   **Clear purposes for the final results**

*   **Decide how the answer is structured**

Here is my example:

I have an SQLite file locally, it contains 6 tables. I would like to extract these tables and import them into SQL Server automatically. Could you please teach me how to achieve it with Python step by step?

Let me divide it into my key points:

*   **Context**: I have an SQLite file locally, it contains 6 tables.

*   **Purposeful results**:  I would like to extract these tables and import them into SQL Server automatically.

*   **Structure desired answers**: how to achieve it with Python step by step

Here are the steps that ChatGPT answered:

<div style="text-align: center">![](/images/Answer1.1.png)</div>

<div style="text-align: center">![](/images/Answer1.2.png)</div>

<div style="text-align: center">![](/images/Answer1.3.png)</div>

<div style="text-align: center">![](/images/Answer1.4.png)</div>

<div style="text-align: center">![](/images/Answer1.5.png)</div>

## 2. Google or watch YouTube videos to understand each item; a demo is better

#### 2.1 The first thing is to understand the two libraries, sqlite3 and pyodbc.

sqlite3 lets Python work with SQLite files, and pyodbc lets Python connect to SQL Server through ODBC.

I did not read the official documentation ([sqlite3](https://docs.python.org/3/library/sqlite3.html) and [pyodbc](https://learn.microsoft.com/en-us/sql/connect/python/pyodbc/step-1-configure-development-environment-for-pyodbc-python-development?view=sql-server-ver16)) from start to finish; knowing what each library can do was enough.

I also found videos on YouTube (freeCodeCamp.org's channel: [SQLite Databases With Python](https://www.youtube.com/watch?v=byHcYRpMgI4&t=3954s) and Jie Jenn's channel: [How to Connect To SQL Server in Python](https://www.youtube.com/watch?v=g69lFxZdcVQ&t=16s)). In this process, sqlite3 extracts the data (table names, column names and datatypes), and pyodbc creates the tables and inserts the data.

#### 2.2 The second thing is to understand the cursor function.

cursor() appears frequently in the process, so it deserves a closer look.

I found an amazing image from [Codecademy](https://www.codecademy.com/learn/learn-advanced-python/modules/database-operations/cheatsheet), which also has a comprehensive tutorial about Database Operations.

**There is no need to spend time on obscure material; find vivid material to read.**

*   execute() runs an SQL statement from Python

*   fetchall() returns all the rows of a query result

*   commit() saves the changes to the database

*   close() closes the database connection

<div style="text-align: center">![](/images/cursor-7d81ebdb.png)</div>
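The four calls above can be tried together in a few lines. This is a minimal sketch that uses an in-memory database and a hypothetical table named `demo`, so it does not touch the Pitchfork file.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# execute() runs one SQL statement.
cur.execute("CREATE TABLE demo (id INTEGER, title TEXT)")
cur.execute("INSERT INTO demo VALUES (?, ?)", (1, "first row"))
cur.execute("INSERT INTO demo VALUES (?, ?)", (2, "second row"))

# commit() is called on the connection and saves the pending changes.
conn.commit()

# fetchall() returns every remaining row of the last query as a list of tuples.
cur.execute("SELECT id, title FROM demo ORDER BY id")
rows = cur.fetchall()
print(rows)

# close() releases the connection when the work is done.
conn.close()
```

Note that in sqlite3, `execute()` and `fetchall()` belong to the cursor, while `commit()` belongs to the connection.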

#### 2.3 Have some conversations with ChatGPT if I have confusion about one point

For example, I did not understand how the loop could iterate over each table using

```
for table_name in table_names:
    # each item is a one-element tuple; index 0 is the table name itself
    table_name = table_name[0]
```

The conversation is

<div style="text-align: center">![](/images/question%201.png)</div>

I had some idea of how it worked, but I did not understand why it could iterate over these tables one by one. Then I double-checked:

<div style="text-align: center">![](/images/question%202.png)</div>

Suddenly, I recalled the principle of the for loop. I **still remember an animation where a line of people comes into a room one by one**, and I understood that during the for loop, each element is assigned to table_name in turn.

I could not find the animation online, so I drew it from memory:

<div style="text-align: center">![](/images/for%20loop.png)</div>
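The loop can also be run on a toy database to see what each element looks like. This sketch assumes the table names come from a query against `sqlite_master`, and the two tables `table_a` and `table_b` are hypothetical stand-ins for the six tables in the real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Two hypothetical tables stand in for the six tables in the real file.
cur.execute("CREATE TABLE table_a (id INTEGER)")
cur.execute("CREATE TABLE table_b (id INTEGER)")

# sqlite_master lists the objects in the database; fetchall() returns tuples.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
table_names = cur.fetchall()
print(table_names)

names = []
for table_name in table_names:
    # table_name is a one-element tuple such as ('table_a',)
    table_name = table_name[0]
    names.append(table_name)

print(names)
conn.close()
```

Printing `table_names` before the loop shows why `[0]` is needed: every element is a tuple, not a plain string.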

Once I understood that, I moved on to the next point of confusion:

What is `PRAGMA`?

<div style="text-align: center">![](/images/PRAGMA.png)</div>

<div style="text-align: center">![](/images/PRAGMA.%20EXAMPLE.png)</div>

To understand it, I used an analogy: in SQL Server, INFORMATION_SCHEMA.COLUMNS is used to extract column information, and PRAGMA does a similar job in SQLite. Comparing the two made it easy to see how it works; comparison is an efficient way to learn.

<div style="text-align: center">![](/images/comparison%20between%20PRAGMA.png)</div>
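A small sketch of reading column information with `PRAGMA table_info` follows. The table `demo` is hypothetical, and the SQL Server query in the comment is only the rough counterpart for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table, used only to have something to inspect.
cur.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, title TEXT, score REAL)")

# Each returned row is (cid, name, type, notnull, dflt_value, pk).
cur.execute("PRAGMA table_info(demo)")
columns = [(row[1], row[2]) for row in cur.fetchall()]
print(columns)
conn.close()

# Rough SQL Server counterpart, for comparison:
#   SELECT COLUMN_NAME, DATA_TYPE
#   FROM INFORMATION_SCHEMA.COLUMNS
#   WHERE TABLE_NAME = 'demo';
```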

I cleared up all my confusion and made sure I understood all of the code.

### 3. Go back to the code that ChatGPT provided and test it

I ran the code in VS Code after replacing the path of the SQLite database file and the DRIVER, SERVER, and DATABASE values for the connection.
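As a sketch of where those three values go, the snippet below assembles an ODBC connection string. The helper name and all of the values are hypothetical placeholders, and `Trusted_Connection=yes` (Windows authentication) is an assumption that may not match your setup.

```python
def build_connection_string(driver, server, database):
    """Assemble an ODBC connection string that uses Windows authentication."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )


# Placeholder values; replace them with your own driver, server and database.
conn_str = build_connection_string(
    "ODBC Driver 17 for SQL Server", "localhost", "pitchfork"
)
print(conn_str)
# With pyodbc installed, this string would be passed to pyodbc.connect(conn_str).
```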

I found an issue in step 5: the datatypes were not right, because not every column should be VARCHAR().

I told ChatGPT:
During step 5, the datatypes should correspond to each column, could you please extract the datatype from SQLite and use it automatically when creating tables in SQL Server?

*   **Give a context**: in step 5

*   **Clear purpose:** extract datatype and use it when creating tables

ChatGPT gave me an updated code:

<div style="text-align: center">![](/images/Update.png)</div>
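The idea behind the update can be sketched as follows. This is not the code ChatGPT returned (that is shown in the image above): the type mapping, the helper name `build_create_table` and the table `demo` are assumptions for illustration, and the sketch only builds the statement text instead of running it against SQL Server.

```python
import sqlite3

# Assumed mapping from SQLite declared types to SQL Server types;
# anything unrecognised falls back to NVARCHAR(MAX).
TYPE_MAP = {
    "INTEGER": "INT",
    "REAL": "FLOAT",
    "TEXT": "NVARCHAR(MAX)",
    "BLOB": "VARBINARY(MAX)",
}


def build_create_table(cur, table_name):
    """Build a SQL Server CREATE TABLE statement from SQLite column metadata."""
    # table_name is interpolated directly, so it must come from a trusted source.
    cur.execute(f"PRAGMA table_info({table_name})")
    column_defs = []
    for _cid, name, declared_type, *_rest in cur.fetchall():
        sql_server_type = TYPE_MAP.get(declared_type.upper(), "NVARCHAR(MAX)")
        column_defs.append(f"[{name}] {sql_server_type}")
    return f"CREATE TABLE [{table_name}] ({', '.join(column_defs)})"


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table for the demonstration.
cur.execute("CREATE TABLE demo (id INTEGER, title TEXT, score REAL)")
statement = build_create_table(cur, "demo")
print(statement)
conn.close()
```

In a full script, the resulting statement would be executed through the pyodbc cursor before the rows are inserted.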

Now I had enough code to achieve my goal. I put everything together and got this in my SQL Server:

<div style="text-align: center">![](/images/Result.png)</div>

<div style="text-align: left">**You can find my final code on my** [**GitHub.**](https://github.com/xiangivyli/python-learning/blob/main/ETL/Extract_SQLITE_File_Load_SQL_Server.ipynb)</div>

### 4. Final Words

What I would like to show is how to learn to code quickly by getting your hands dirty and focusing on the final result that you want to achieve, rather than trying to understand everything at once (I feel overwhelmed when facing all kinds of IT terms). What I can do is take a small step every day, and I know that I will arrive where I want to be.

P.S. Make sure you understand every function when using this method; otherwise, you will not be able to debug when something goes wrong. Good Luck :)


Import SQLite File into SQL Server with Python and ChatGPT