Home
Welcome to Ivy's website
Empowering the World with My Data Analysis and Data Engineering Skills
“If you don't produce, you won't thrive - no matter how skilled or talented you are.”
- Cal Newport, Deep Work: Rules for Focused Success in a Distracted World
I am Ivy. Combining a healthcare and data background, I am looking for a Data Engineer / Data Analyst position in the UK. Any help would be highly appreciated! I am adept at SQL, Python, and data visualisation; I am a certified Azure Data Engineer and GCP Data Engineer; I have experience handling large datasets (one published dataset link); and I love building data pipelines with Docker and Airflow.
- 3 years studying in a lab, using A/B testing to validate my hypotheses
- 2 years working in a biotech startup, managing databases and preparing datasets
- 1 year studying Business Analytics, converting data into business insights
- 2 years of self-training, continually learning data engineering tools
Data Analysis Projects
Education-Focused Analysis: How Assessment Types Shape the Final Result
I used Power BI to explore how the mix and weightings of assessment types shaped students' final results
A self-service platform for GDP, Life Satisfaction and Education Level
I used Tableau to build a self-service platform covering GDP, Life Satisfaction, and Education Level
BT Customer Churn Influencer
I used Power BI to visualise the characteristics of churned BT customers, and used Python with logistic regression to identify the key churn influencers.
ESG Analysis for Pfizer
I used Python to analyse Pfizer's position in the pharmaceutical industry, and linear regression to quantify the relationship between ESG scores and total assets
Revenue increase strategy analysis for Google merchandise store
I used Google Analytics and Looker Studio to segment customers and analyse customer behaviour
Lloyds Bank Customer Profiling
I used Power BI to profile customers for Lloyds Bank
Database Projects
Data Platform Design for Healthcare Research
I used MySQL to create 11 tables normalising clinical and genetic data. The work is designed for healthcare research.
Normalisation for professors in organisations with SQL Server
I used SQL Server to normalise an informative table. The project focused on details such as primary keys, surrogate keys, relationships, and ON DELETE NO ACTION.
Small Projects
5 Tips to Store an Online Zip File Locally (Python)
In my first blog post, I discussed how to build a more structured directory programmatically, covering f-strings, the os module, requests, and the ZipFile library.
3 Steps to Clean Data in SQL Server
I have now switched my on-premises SQL tool to SQL Server. To prepare clean data for data visualisation, there are three main steps.
Import SQLite File into SQL Server with Python and ChatGPT
I learned how to use ChatGPT to help connect a SQLite file to SQL Server with Python.
Mini Blogs for Data Analysis Tools
Around 200 to 300 words for an overview
SQL
Introduction
😉 Using a simile, SQL is the design of warehouses and the first step of the supply chain between the organised warehouse and the shops where products are displayed.
💁♀️ So, how does SQL work? Getting organised (Data Definition, Data Manipulation, Transaction Control, and Data Control) and then choosing the exact products among millions of stored ones (Data Query) is what SQL does.
🤓 I need to understand more details about the 5 categories.
Data Definition: design a storage unit with detailed requirements. Each database has tables, and each table has columns with a certain maximum capacity holding one data type (e.g., VARCHAR(size), INTEGER(size)). CREATE, INDEX, ALTER, DROP, TRUNCATE, RENAME
Data Manipulation: populate the tables with data. Load the data delivered from production, add more over time, and replace unqualified data. The process is dynamic. INSERT, UPDATE, DELETE
Transaction Control: repetitive tasks can be grouped together for efficiency and rolled back in case of errors. COMMIT, ROLLBACK, SAVEPOINT
Data Control: permission control for safety. Only specified users can access the database. GRANT, REVOKE
Data Query: choose data with filters, adjustments, and simple mathematics. a. Select information for a specific date, person, or region. SELECT, DISTINCT, WHERE, IN, LIKE; b. Some tables have relationships and can be merged. JOIN, UNION, INTERSECT, EXCEPT; c. Aggregate data in groups. GROUP BY, SUM, MAX; d. Adjust raw columns into a new format. CONCAT, DATEADD, LENGTH, SUBSTRING; e. Conditional output. CASE WHEN, IS NULL, and so on.
😚 In short, database administrators are responsible for database design and management, while data analysts find the right data using the last category, the query language. Now that I know what SQL does and how it works, I can practise getting data with these functions.
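To make these categories concrete, here is a minimal sketch using Python's built-in sqlite3 module (the products table and its columns are invented for illustration); it touches Data Definition, Data Manipulation, Transaction Control, and Data Query:

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Data Definition: CREATE a table
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name VARCHAR(50), price REAL)")

# Data Manipulation: INSERT rows
cur.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("notebook", 2.5), ("pen", 1.2), ("notebook", 3.0)],
)

# Transaction Control: sqlite3 opens a transaction implicitly; COMMIT makes the changes permanent
conn.commit()

# Data Query: SELECT with a filter and aggregation
cur.execute("SELECT name, COUNT(*), AVG(price) FROM products WHERE price > 1 GROUP BY name")
print(cur.fetchall())  # e.g., [('notebook', 2, 2.75), ('pen', 1, 1.2)]

conn.close()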
Query Language
Functional Language
😁 I want to continue with a more detailed framework for the query language branch.
Imagine I have several tables to extract data from; what data do I want?
💪 The most frequent steps are:
Step 1. Choose tables with FROM
Step 2. Add conditions with WHERE (=, non-equals, BETWEEN, IN, LIKE)
Step 3. Choose columns with SELECT and simple aggregation {COUNT, MAX, MIN, AVG, SUM, DISTINCT}
Step 4. ORDER BY to sort the rows, and LIMIT to take a specific number from the top
👉 The possible scenarios are:
Option 1. Merge columns into one table with JOIN {LEFT, SELF, FULL, CROSS} and ON (=), USING; Merge rows into one table with UNION {INTERSECT, EXCEPT}
Option 2. Classify all rows into groups for calculation and comparison {GROUP BY, HAVING}
Option 3. Add new aggregated or transformed columns over temporary WINDOWs, attaching a new value to each row; the WINDOW can be the whole table if PARTITION BY is omitted
Aggregation functions MIN / MAX/ AVG/ SUM (…) OVER(PARTITION BY…)
Ranking in groups RANK()/ DENSE_RANK() OVER(PARTITION BY… ORDER BY…)
Move the value up or down LEAD/LAG (…, offset) OVER(ORDER BY…)
Create a WINDOW related to the current row OVER(ORDER BY… ROWS/RANGE/GROUPS BETWEEN {n /UNBOUNDED PRECEDING, CURRENT ROW} AND { n /UNBOUNDED FOLLOWING, CURRENT ROW})
💁♀️ The special techniques are:
Technique 1: Use ( ) to nest one query inside another, called a subquery; the outer query can use the output of the inner one
Technique 2: Create a Common Table Expression (CTE), a named temporary table, for simplification: WITH table_name AS (……)
Technique 3: Define a WINDOW, then refer to it with OVER in the SELECT clause {WINDOW window_name AS (PARTITION BY…)}
✌ With this, I can solve problems by combining these building blocks. The more questions I solve, the more functions I will learn, and the richer the framework will become.
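As a small illustration of the steps and techniques above, here is a hedged sketch with Python's built-in sqlite3 (window functions need SQLite 3.25+; the sales table is invented):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('UK', 100), ('UK', 150), ('DE', 80), ('DE', 200);
""")

query = """
WITH big_sales AS (                       -- Technique 2: CTE
    SELECT region, amount
    FROM sales                            -- Step 1: FROM
    WHERE amount > 50                     -- Step 2: WHERE
)
SELECT region,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region  -- Option 3: window
FROM big_sales
ORDER BY region, rank_in_region           -- Step 4: ORDER BY
"""
for row in conn.execute(query):
    print(row)
# ('DE', 200, 1), ('DE', 80, 2), ('UK', 150, 1), ('UK', 100, 2)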
Power BI Desktop
Introduction
😜 Wishing for the sunshine to come back. Today, I want to explain the Power BI Desktop fundamentals concisely.
🤔 Why do we need Power BI? Because we need to tell stories to end users. Power BI can hold datasets and visualise them, showing a large amount of information simultaneously across several pages.
🤗 So how does it work? Power BI Desktop has two main parts: the main interface (with three views: Report, Data, and Model) for data modelling and visualisation, and the Power Query Editor for data transformation.
💁♀️ Then the common steps are:
Step 1: Get Data from connectors, they can be Files (e.g., Excel, CSV), databases (e.g., SQL Server), and URLs (e.g., SharePoint).
Step 2 (Optional): Sometimes we do not have permission to change the raw data, or our requirements are personalised, so we adjust the data before loading: view column quality and distribution, pivot or unpivot, split, and so on.
Step 3: In the Model view, build relationships among tables to join them together, and optionally add new measures or columns (this does not change the source data).
Step 4: Choose visuals in the Visualizations pane, then drag fields into Values and, optionally, Legend.
🙆♀️ That's it. I hope it is helpful for my friends who want to learn a data visualisation tool.
Power BI Service
Introduction
😘 At the beginning of another rainy week, I want to continue by introducing the Power BI Service.
🤔 Why do we need the Power BI Service? Publishing reports from the Desktop to the Service moves them from a local machine to a shared network, so others can also utilise our work.
💁♀️ So how does it work?
a. It is a web-based resource management platform.
b. It holds My Workspace (the most private) and can also create app workspaces to group and share resources under our account; each workspace can contain datasets, reports, dashboards, dataflows, datamarts (premium tier: support for Power Query Editor + SQL/visual queries + report building, providing a Desktop-like experience), and scorecards.
c. The premium tier also has a deployment pipeline feature, which helps development by staging workspaces.
d. These resources can be shared more widely by creating apps.
👉 The follow-up question is: how do we maintain and secure data in the Service once it becomes semi-public?
Firstly, schedule refreshes for updated information and apply sensitivity labels to datasets if needed.
Secondly, for internal workers during the development process, grant one of four workspace roles with the least privilege needed: Viewer < Contributor < Member < Admin.
Thirdly, for consumers or external users, grant the Viewer role on the published apps.
Additionally, in one report, row-level security can make sure viewers only read the information they need.
😁 I hope this post gives you a general overview of the Power BI Service and makes it easier to explore. Have a nice week ahead.
Power BI DAX
Functional Language
🤗 At the beginning of August, I want to introduce the core power of Power BI: DAX (Data Analysis Expressions).
🤔 Fixed Question: Why do we need DAX? DAX, as its name suggests, is mainly for data analysis; in other words, it provides more flexibility for handling analysis.
💁♀️ So how does it work?
It uses functions to calculate new data from existing data.
It has three final formats: measures, calculated columns, and calculated tables.
Measure is like a stored calculation procedure; it is recalculated each time it is used in the report, under the current filter context.
Calculated Column is added to the data table for each row; it does not influence the source data, but it is kept in the model.
Calculated Table is often used to create a date table for hierarchies (so results can be analysed by year, quarter, month, and day).
👉 What aspects does it contribute to?
The most common and complex one is analysis, using aggregation, logical, date, and filter functions.
Activating a relationship when there are multiple foreign keys in one table, with USERELATIONSHIP( ).
Simplifying row-level security by combining the user identity table with USERPRINCIPALNAME( ).
🤓 I will not list every function, but I will share how to learn them efficiently.
Tip 1: Have a look at the official documentation (link: https://lnkd.in/e5nFmMH8) or the Learn path (link: https://lnkd.in/ewSGXZyj). The documentation classifies DAX functions and provides examples, which gives a general idea of what functions exist.
Tip 2: Each function is explained clearly in the Tooltip that appears when entering '(' after the function name; read what it does and what parameters it needs.
Tip 3: Variables can be brought in when the measure becomes complex.
😊 Happy Power BI.
Apache Parquet
Efficient File Type
Features:
Columnar storage
Compressed with Dictionary or Run-Length Encoding (RLE) when a column has duplicate values
Dictionary encoding suits repetitive categorical data, e.g., music genres
RLE suits data with long sequences of the same value, e.g., consecutive 'Open' or 'Closed' statuses for stores
Small file size
Metadata stores the schema
Strong data integrity
Supports nested data structures
Storage:
Row Groups: each row group holds a horizontal slice of the rows, stored column by column
Column Chunks: each column within a row group is stored in a "chunk"
Encoding and Offsets: Dictionary or Run-Length Encoding (RLE)
Metadata: describes the structure of the data, including the schema and the encoding used for each column
Reconstruction During Reading: the process above is reversed
A simple metadata example:
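Here is a minimal sketch, assuming pyarrow is installed, that writes a small table to Parquet and prints the file metadata and schema (the file name and columns are invented):

import pyarrow as pa
import pyarrow.parquet as pq

# a tiny table with a repetitive categorical column (a good fit for dictionary encoding)
table = pa.table({
    "store_status": ["Open", "Open", "Open", "Closed"],
    "revenue": [120, 95, 130, 0],
})
pq.write_table(table, "example.parquet")

meta = pq.ParquetFile("example.parquet").metadata
print(meta.num_rows, meta.num_row_groups)  # 4 1
print(meta.schema)                         # column names, physical types, encodings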
Python
A List of Key Tools in Python
Data Structures - Set
my_set = {1, 2, 3}
Unordered, no index
Unique Elements
Heterogeneous, can contain integers, strings, tuples and so on at the same time
Immutable Elements: elements themselves cannot be changed in place, but they can be added or removed
Does not require contiguous memory; uses a hash table structure (an array of 'buckets')
Use cases:
membership testing
remove duplicates
set operations, like union |, intersection &, difference -, symmetric difference ^, issubset( ), isdisjoint( )
my_set = {1, "hello", (2, 3)} #integer, string and tuple
another_set = set([4, 5, 6]) #set function converts a list into a set
# add elements
my_set.add(4)
# remove elements
my_set.remove(4)
# membership testing, O(1)
2 in my_set
# set operations
union_set = my_set | another_set
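# the remaining operators from the list above (added here for completeness)
intersection_set = my_set & another_set    # elements in both sets -> set()
difference_set = my_set - another_set      # elements only in my_set
symmetric_diff = my_set ^ another_set      # elements in exactly one of the sets
print(my_set.isdisjoint(another_set))      # True: the two sets share no elements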
Data Structures - Tuple
my_tuple = (1, "hello", 3.14)
0-based Indexing, access elements by their position with `[ ]`
Heterogeneous Data contain different data types, including other tuples, lists, and dictionaries
Immutable Tuple: once a tuple is created, its elements cannot be added, changed, or removed, but the elements themselves can be modified if they are mutable objects
Contiguous Memory
Use cases
Return multiple values from a function
Immutable Data Collection
Represent a row with columns as elements for databases
Dictionary keys
my_tuple = ([1, 2], "hello")
my_tuple[0].append(3)
print(my_tuple) # Output: ([1, 2, 3], "hello")
# tuples without parentheses
another_tuple = 1, 2, 3
# single-element tuples
single_element_tuple = (1,)
# complicated tuples
book = (
{"title": "1984", "author": "George Orwell", "publication_year": 1949},
["Dystopian", "Political Fiction", "Social Science"],
(True, "John Doe")
)
#Accessing book information
print(f"Title: {book[0]['title']}")
print(f"Author: {book[0]['author']}")
print(f"Genres: {', '.join(book[1])}")
print(f"Is Borrowed: {'Yes' if book[2][0] else 'No'}")
Data Structures - Arrays (Built-in)
int_array = array('i', [1, 2, 3]) # Array of integers (less used)
Created using the array class from the array module
0-based Indexing
Homogeneous, the element type is determined by a type code, e.g., 'i' for signed integers
Mutable, can modify the elements, append, or remove
Contiguous Memory
Stores the data values themselves in the contiguous block (memory efficient)
Use case
large numeric datasets
binary I/O operations
from array import array
int_array = array('i', [1, 2, 3])
print(int_array)
# Output is array('i', [1, 2, 3])
# access elements with index
print(int_array[0])
# Output is 1
# add a new element
int_array.append(4)
print(int_array)
# Output is array('i', [1, 2, 3, 4])
Data Structures - 2D Arrays (Matrix)
my_numpy_array = np.array([[1, 2, 3], [4, 5, 6]])
Support Multidimensional Indexing, access elements using a comma-separated tuple of indices
Homogeneous, the same data type (improve efficiency)
Mutable, can modify elements or reshape the array
Contiguous Memory, with a fixed size determined at the time of creation
Use case
Matrix Computation
Scientific Computing
Linear Algebra
Large datasets
import numpy as np
# create a 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# access elements with index
row2_column3 = matrix[1, 2]
# change the elements
matrix[1, 2] = 20
# matrix computation
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
matrix_sum = np.add(matrix_a, matrix_b)
# Matrix multiplication
matrix_product = np.dot(matrix_a, matrix_b)
Data Structure - Lists (Built-in)
my_list = [1, "Hello", 3.14, [2, 4, 6]]
- 0-based Indexing and slicing
- Multidimensional List
- Heterogeneous Data Types
- Mutable
- Dynamic resizing and memory overhead (e.g., append, remove, insert, pop)
- Concatenation (+) and Repetition (*)
- Sorting and Reversing
#Creating a list and adding elements
my_list = [1, 2, 3]
my_list.append(4) # Add at the end
my_list.insert(1, 'a') # Insert at index 1
#Accessing elements
print("First element:", my_list[0]) # Access first element
#Modifying elements
my_list[2] = 'b' # Change the element at index 2
#Removing elements by value and by index
my_list.remove('a') # Remove the first occurrence of 'a'
popped_element = my_list.pop(2) # Remove and return the element at index 2
#Slicing
sublist = my_list[1:2] # Elements from index 1 up to, but not including, index 2
#Concatenation and repetition
another_list = ['a', 'b']
con_list = my_list + another_list
rep_list= my_list * 3
#Sorting and reversing
my_list.reverse()
con_list.sort(key=str) # Compare elements as strings, since the types are mixed
# Multidimensional list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8]]
Positional and Named Parameters
Positional arguments are matched to parameters by the order in which I provide them.
def create_pizza(cheese, tomatoes, olives):
return f"Created a pizza with {cheese} cheese, {tomatoes} tomatoes, and {olives} olives."
# The sequence of parameters matters
my_pizza = create_pizza("mozzarella", "cherry", "green")
print(my_pizza)
By comparison, named (keyword) arguments can be passed in any order:
def create_pizza(cheese, tomatoes, olives):
return f"Created a pizza with {cheese} cheese, {tomatoes} tomatoes, and {olives} olives."
my_pizza = create_pizza(tomatoes="cherry", cheese="mozzarella", olives="green")
print(my_pizza)
Class
class Solution:
    def My_Method(self, word1: str, word2: str) -> str:
        # illustrative body (assumed): simply concatenate the two words
        return word1 + word2
A blueprint for creating objects
Group data (Class attributes and Instance attributes) and functions (methods)
Each method in a class must have self as its first parameter, which refers to the instance of the class
Create an instance from the defined class: my_solution = Solution( )
#how to use class and methods included
#Create an instance of Solution, which is my_solution
my_solution = Solution()
result = my_solution.My_Method("abc", "pqr")
print(result) # Output with the illustrative body above: abcpqr
Debug Methods
Debugger in VS Code (what is going on inside the code?)
The most common tool is print(); especially in a loop, it makes it easy to check the changes on each iteration (see the sketch at the end of this section)
The VS Code Debugger is a powerful tool to check the program flow:
Step 1: Install the Python extension to enhance the RUN AND DEBUG feature
Step 2: Open the folder and create a 'launch.json' file for future customisation
Breakpoint: where to pause the execution
Debug Cell (in the More dropdown next to the Execute Cell icon)
The following toolbar sequence is Continue, Step Over, Step Into, Step Out, Restart, Disconnect
- Continue: resume execution until the next breakpoint or the end of the code cell
- Step Over: execute this line of code, then pause at the next line
- Step Into: dive into this line of code to see details, such as the functions it calls elsewhere in the same cell
- Step Out: jump out of the current function and back to the caller
- Restart: restart the debugging
- Disconnect: end the debugging
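As a small sketch of the print() approach, with Python's built-in breakpoint() (available since Python 3.7) as a lighter entry into the debugger:

def running_total(numbers):
    total = 0
    for i, n in enumerate(numbers):
        total += n
        # print-debugging: watch the state change on each iteration
        print(f"iteration {i}: n={n}, total={total}")
        # breakpoint()  # uncomment to pause here in the pdb debugger instead
    return total

running_total([3, 1, 4])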
Key Points in Computer Science
A short summary of key concepts in CS50
BITS, BYTE
Pre-study
Data is saved as 1 and 0 (i.e., the binary system) in a computer
BIT (BINARY DIGIT): The smallest unit, e.g., 1 or 0
BYTE: = 8 BITS, e.g., 01011011
KB (KILOBYTE): = 1,024 BYTES, e.g., space for around 1/3 of a page of text
MB (MEGABYTE): = 1,024 KB, e.g., space for 1 book, 1 photo, or 1 minute of music
GB (GIGABYTE): = 1,024 MB
TB (TERABYTE): = 1,024 GB
...
The number of values that n BITS can represent is 2^n; the largest unsigned value is 2^n - 1
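A quick check of that formula in Python:

n_bits = 8
print(2 ** n_bits)         # 256 distinct values
print(2 ** n_bits - 1)     # 255, the largest unsigned value
print(int("11111111", 2))  # 255, eight 1-bits read as binary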
Data Types (C Programming Language)
CS50 Week1 C
int: integer, 4 bytes of memory (32 bits). One bit stores the sign, so the biggest value is 01111111,11111111,11111111,11111111 in binary, which is 2,147,483,647. The range is -2^31 to 2^31 - 1
unsigned int: doubles the positive range but disallows negative values. The range is 0 to 2^32 - 1
long: 8 bytes of memory (64 bits). A wider version of int, with a larger range.
char: single character, 1 byte of memory (8 bits). ASCII maps numbers to characters, e.g., A is 0100 0001. The range is -2^7 to 2^7 - 1
float: floating point, 4 bytes of memory (32 bits); the precision is restricted
double: floating point, 8 bytes of memory (64 bits); more precise
bool: true or false
string: an array of characters
Command Line
CS50 Week 1 C
clear (Ctrl + l): clear the screen
ls: list all files (executable file, text, folder and so on) in the current directory
pwd: present working directory
cd <directory>: change directory
- cd (no argument): go to the home directory
- cd . : the current directory
- cd .. : the parent directory
mkdir <directory>: make directory
cp <source> <destination>: duplicate source file and place it in destination (can create a new destination at the same time)
- cp -r <source directory> <destination directory> duplicate the entire directory and put it in the destination directory
rm <file>: remove a file
rm -f <file>: remove without the confirmation step
rm -r <directory>: remove an entire directory
rm -rf <directory>: remove an entire directory without confirmation
- mv <source> <destination>: move files (can also rename them)
Sort Algorithms
CS50 Week 3 Algorithms
O( ): the longest possible runtime
Ω( ): the shortest possible runtime
Selection Sort: repeatedly scan the unsorted numbers, find the smallest, then swap it with the leftmost unsorted position. O(n^2), Ω(n^2)
Bubble Sort: compare adjacent pairs of numbers, swapping them so that the smaller one is on the left and the larger one is on the right, and repeat until no swaps are needed. O(n^2), Ω(n)
Merge Sort: divide the whole into two halves, sort each half, then merge the halves by repeatedly comparing their smallest remaining numbers; the key is to divide recursively until each half has only one number (see the sketch below). O(n log n), Ω(n log n)
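A compact Python sketch of merge sort, following the description above:

def merge_sort(nums):
    # base case: a list with one (or zero) numbers is already sorted
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2
    left = merge_sort(nums[:mid])   # sort the two halves recursively
    right = merge_sort(nums[mid:])
    # merge: repeatedly take the smaller of the two smallest remaining numbers
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9]))  # [1, 2, 5, 8, 9]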
Memory Address - Hexadecimal
CS50 Week 4 Memory
Hexadecimal is a base-16 number system
0 1 2 3 4 5 6 7 8 9 A B C D E F
- Hexadecimal is like a shorthand for binary, 1111 means F
- Add 0x before when it represents a memory address, e.g., for an integer, 0x7ffc3a7cffbc
- Color Code #FFFFFF means Red 255, Green 255, Blue 255 (white)
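A few quick conversions in Python to illustrate:

print(hex(255))       # '0xff': decimal to hexadecimal
print(int("FF", 16))  # 255: hexadecimal to decimal
print(bin(0xF))       # '0b1111': one hex digit equals four binary digits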
Memory Address - Pointer C
CS50 Week 4 Memory
The pointer is the address of data in memory
#include <stdio.h>

int main() {
// A regular integer variable
int var = 10;
// A pointer variable that can hold the address of an int
int *ptr;
// Store the address of var in ptr
ptr = &var;
// Prints the value of var
printf("Value of var: %d\n", var);
// Prints the memory address of var
printf("Address of var: %p\n", &var);
// Prints the memory address stored in ptr
printf("Value of ptr: %p\n", ptr);
// Dereferences ptr and prints the value of var
printf("Value pointed to by ptr: %d\n", *ptr);
return 0;
}
Data Structure - Arrays C
int arr[3] = {10, 20, 30}; // example
- Linear Data Structure
- Same Data Type
- Value means the actual data; Index means the relative location, starting from 0
- Fixed Size: occupies consecutive memory; specify the number of elements when declaring, e.g., int arr[3]
- A single pointer to the first element can traverse the whole array:
//C code
#include <stdio.h>
int main() {
int arr[3] = {10, 20, 30};
// Pointer to the first element of arr
int *ptr = arr;
for(int i = 0; i < 3; i++) {
// Accessing array elements using pointer arithmetic
printf("%d ", *(ptr + i));
}
return 0;
}
Acronyms for Web Development
DHCP (Dynamic Host Configuration Protocol): Assign an IP address to the host automatically
IP (Internet Protocol): Address of each device to communicate on the Internet
IPv4 (Internet Protocol Version 4): 32-bit number written as four octets, each in the range [0, 255], e.g., 192.168.0.1
IPv6 (Internet Protocol Version 6): 128-bit number, about 3.4 × 10^38 addresses, e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334
DNS (Domain Name System): Translate memorable, human-comprehensible domain names into IP addresses
TCP (Transmission Control Protocol): Direct the transmitted packet to the correct program on the receiving machine
HTTP (Hypertext Transfer Protocol): Dictate the format of requests (GET, client to server) and returns (server to client)
HTML (Hypertext Markup Language): Use angle-bracket enclosed tags to define the structure of a web page semantically
CSS (Cascading Style Sheets): Customise a website's look and feel; a styling language that can live in a style tag or in a linked .css file
JavaScript: Make the web page interactive, <script src="script.js">
DOM (Document Object Model): Represent the web page as a tree of objects (with properties and methods)
API (Application Programming Interface): Exchange data and perform actions across different platforms using requests and responses
JSON (JavaScript Object Notation): A text format of dictionaries (objects) and arrays, possibly nested (see the example after this list)
jQuery: A JavaScript library that simplifies syntax; it is loaded via <script> and a CDN, and $ serves as the entry point to all its features
CDN (Content Delivery Network): <script src="https://code.jquery.com/jquery-3.6.0.min.js"> for jQuery
AJAX (Asynchronous JavaScript and XML): Allow web pages to update dynamically (part of the web page without full-load) by asynchronously exchanging data with a server (in the background)
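As promised in the JSON entry, a tiny sketch using Python's built-in json module (the payload is invented, imitating what an API might return):

import json

# a typical API response body: a dictionary containing an array of dictionaries
payload = '{"status": "ok", "users": [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]}'

data = json.loads(payload)         # parse JSON text into Python objects
print(data["users"][0]["name"])    # Ada
print(json.dumps(data, indent=2))  # serialise back into formatted JSON text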
Certification
Building Expertise through Accumulated Certifications
20/06/2024
Google Cloud Platform Professional Data Engineer
BigQuery
Cloud Storage
Dataflow
CloudSQL
Cloud Composer
11/07/2023
Azure Data Engineer Associate
Data Pipeline
Streaming Analysis
Data Store, Movement and Transformation
Data Encryption
Databricks, Data Factory, Gen2
10/05/2024
Apache Airflow Fundamentals
Orchestration
Scheduling
Dataset
DAGs
Docker
Web Server
19/04/2024
Data Engineering Bootcamp
Data Pipeline
Data Warehousing
Analytics Engineer
Terraform
Bash
Docker
Python
08/08/2024
dbt (Data Build Tool) Bootcamp
Modelling
Materialisation
Documentation
Macro
Tests
Snowflake
09/04/2023
Power BI Data Analyst Associate
Power BI Desktop
Power BI Service
Prepare Data
Model Data
Visualise and Analyse
Deploy and Maintain
11/06/2023
Azure Enterprise Data Analyst Associate
Azure Synapse Analytics
Power BI
Microsoft Purview
Performance Optimisation
26/03/2023
Azure Fundamentals
Azure
Cloud Data
Cloud Networking
Cloud Security
Cloud Services
Cloud Storage
13/04/2022
Google Data Analytics
Spreadsheets
Tableau
R
SQL
13/01/2023
Data Analyst Professional
SQL
R
Case Study
Presentation
17/03/2023
Data Science Bootcamp
Query databases
Design databases
Data Views
Transactions
Analytical Functions
Big Data
21/01/2023
Lakehouse Fundamentals
Databricks