added readme file

2024-08-17 08:53:39 +08:00
parent 1625de948f
commit c089999d20
11 changed files with 157 additions and 1357024 deletions
--- a/README.md
+++ b/README.md
@@ -1 +1,116 @@
-# Database
+# Project Repository
+
+This repository contains the components needed to set up and run a backend service: database schema scripts, serverless Lambda functions, and API Gateway configuration files. The sections below describe each component and explain how to set up the PostgreSQL database.
+
+## Repository Structure
+
+- **/apigw/**: Contains the backup configuration file for the API Gateway.
+- **/dataset/**: Includes datasets and scripts related to data processing.
+  - `malicious_phish.csv`: A CSV file containing data related to malicious phishing links.
+  - `load_data.py`: A Python script for loading data into the database.
+- **/gmail-json/**: JSON files containing configurations or sample data for Gmail-related processing.
+- **/serverless/**: Contains the serverless Lambda function scripts.
+  - `safeqr-signup-post-confirmation`: A script for the Lambda function that handles post-signup confirmation in the SafeQR service.
+- **/sql/**: SQL scripts for setting up and managing the PostgreSQL database.
+  - `Create_all_tables.sql`: Script to create all necessary tables.
+  - `Drop_all_tables.sql`: Script to drop all tables.
+  - `Dummy_data.sql`: Script to insert dummy data into the tables.
+  - `qr_code_types_202408032138.sql`: Script to set up QR code types.
+- **.gitignore**: Specifies files and directories to be ignored by Git.
+- **README.md**: This file, providing an overview and setup instructions for the repository.
+
+## Prerequisites
+
+Before setting up the components in this repository, ensure you have the following installed:
+
+- **PostgreSQL**: The database system required for managing and storing the data.
+- **Python 3.7+**: Needed for running any Python scripts in the repository.
+- **AWS CLI**: For deploying serverless functions and managing AWS resources.
+- **Java 17**: Required if any part of the system uses Java-based components.
+
+## Setting Up the PostgreSQL Database
+
+To install PostgreSQL and set up the database using the provided SQL scripts, follow the steps below:
+
+### 1. Install PostgreSQL
+
+For macOS users:
+
+```bash
+brew install postgresql
+brew services start postgresql
+```
+
+For Ubuntu users:
+
+```bash
+sudo apt update
+sudo apt install postgresql postgresql-contrib
+sudo systemctl start postgresql
+```
+
+### 2. Access PostgreSQL and Create a Database
+
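+Access the PostgreSQL command line (on a default Ubuntu install, where only the `postgres` role exists, use `sudo -u postgres psql` instead):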
+
+```bash
+psql postgres
+```
+
+Create a new database:
+
+```sql
+CREATE DATABASE safeqr;
+```
+
+### 3. Run the SQL Scripts
+
+Navigate to the `/sql/` directory and run the provided scripts to set up your database schema and populate it with dummy data:
+
+```bash
+psql -d safeqr -f Create_all_tables.sql
+psql -d safeqr -f Dummy_data.sql
+```
+
+### 4. Verifying the Database Setup
+
+Ensure that all tables are correctly created and populated with data by accessing the database:
+
+```bash
+psql -d safeqr
+```
+
+Then, you can list tables or query data to verify:
+
+```sql
+\dt
+SELECT * FROM your_table_name LIMIT 10;
+```
+
+## Serverless Lambda Function
+
+The `/serverless/` directory contains scripts for AWS Lambda functions. Ensure that you have the AWS CLI set up with the necessary permissions to deploy and manage Lambda functions.
+
+### Deploying the Lambda Function
+
+Navigate to the `/serverless/` directory and deploy the function:
+
+```bash
+aws lambda update-function-code --function-name your_function_name --zip-file fileb://path_to_your_zip_file.zip
+```
+
+## API Gateway Configuration
+
+The `/apigw/` directory contains a backup of the API Gateway configuration. This can be used to restore or update the API Gateway setup in AWS.
+
+### Restoring API Gateway
+
+Use the AWS CLI to restore the API Gateway configuration:
+
+```bash
+aws apigateway import-rest-api --body 'fileb://path_to_your_swagger_file.json'
+```
+
+## Conclusion
+
+This repository provides all the necessary components to set up the backend infrastructure, including database setup, serverless Lambda functions, and API Gateway configurations. Follow the instructions carefully to get everything up and running. For any issues, please refer to the documentation or raise an issue in the repository.
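The database steps in the README above (create the schema, then load the dummy data) can also be scripted. The following is a minimal Python sketch and is not part of this commit. It assumes `psql` is on the `PATH`, that the target is the `safeqr` database created in step 2, and that it is run from the `/sql/` directory. The helper names `build_psql_commands` and `run_setup` are hypothetical.

```python
import subprocess

# Scripts named in the README, in the order they must run (schema first, then data).
SQL_SCRIPTS = ["Create_all_tables.sql", "Dummy_data.sql"]


def build_psql_commands(database, scripts):
    """Return one psql argument list per script, all targeting the same database."""
    return [
        # ON_ERROR_STOP makes psql exit non-zero at the first failing statement.
        ["psql", "-d", database, "-v", "ON_ERROR_STOP=1", "-f", script]
        for script in scripts
    ]


def run_setup(database="safeqr"):
    """Run every script in order; check=True raises if psql exits non-zero."""
    for command in build_psql_commands(database, SQL_SCRIPTS):
        subprocess.run(command, check=True)
```

Calling `run_setup()` executes both scripts against `safeqr`. Building the argument lists in a separate function keeps the database name in one place, so the scripts cannot end up targeting different databases.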
--- a/apigw/qrcode-apigw-api-swagger-apigateway.json
+++ b/apigw/qrcode-apigw-api-swagger-apigateway.json
@@ -31,6 +31,43 @@
              "statusCode" : "200"
            }
          },
+          "requestParameters" : {
+            "integration.request.header.accessToken" : "context.authorizer.claims.custom:access_token",
+            "integration.request.header.X-USER-ID" : "context.authorizer.claims.sub",
+            "integration.request.header.refreshToken" : "context.authorizer.claims.custom:refresh_token"
+          },
+          "passthroughBehavior" : "when_no_match",
+          "connectionType" : "VPC_LINK",
+          "tlsConfig" : {
+            "insecureSkipVerification" : true
+          },
+          "type" : "http_proxy"
+        }
+      }
+    },
+    "/v1/gmail/getScannedEmails" : {
+      "get" : {
+        "produces" : [ "application/json" ],
+        "responses" : {
+          "200" : {
+            "description" : "200 response",
+            "schema" : {
+              "$ref" : "#/definitions/Empty"
+            }
+          }
+        },
+        "security" : [ {
+          "Cognito" : [ ]
+        } ],
+        "x-amazon-apigateway-integration" : {
+          "connectionId" : "h1icfc",
+          "httpMethod" : "GET",
+          "uri" : "https://safeqr-nlb-6bd79c7ba50f3cb5.elb.ap-southeast-1.amazonaws.com:8443/v1/gmail/getScannedEmails",
+          "responses" : {
+            "default" : {
+              "statusCode" : "200"
+            }
+          },
          "requestParameters" : {
            "integration.request.header.X-USER-ID" : "context.authorizer.claims.sub"
          },
@@ -345,7 +382,6 @@
            "default" : {
              "statusCode" : "200"
            }
-            
          },
          "requestParameters" : {
            "integration.request.header.X-USER-ID" : "context.authorizer.claims.sub"
@@ -586,15 +622,15 @@
    "DEFAULT_4XX" : {
      "responseParameters" : {
        "gatewayresponse.header.Access-Control-Allow-Methods" : "'OPTIONS'",
-        "gatewayresponse.header.Access-Control-Allow-Origin" : "'*'",
-        "gatewayresponse.header.Access-Control-Allow-Headers" : "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
+        "gatewayresponse.header.Access-Control-Allow-Headers" : "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'",
+        "gatewayresponse.header.Access-Control-Allow-Origin" : "'*'"
      }
    },
    "DEFAULT_5XX" : {
      "responseParameters" : {
        "gatewayresponse.header.Access-Control-Allow-Methods" : "'OPTIONS'",
-        "gatewayresponse.header.Access-Control-Allow-Origin" : "'*'",
-        "gatewayresponse.header.Access-Control-Allow-Headers" : "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'"
+        "gatewayresponse.header.Access-Control-Allow-Headers" : "'Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token'",
+        "gatewayresponse.header.Access-Control-Allow-Origin" : "'*'"
      }
    }
  },
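The integration configuration in this file sets `tlsConfig.insecureSkipVerification` to `true`, which asks API Gateway to skip verifying that the backend certificate was issued by a supported certificate authority. A minimal Python sketch for listing every integration in an exported Swagger document that has this flag set is shown below. It assumes operations sit under a top-level `paths` key, as in Swagger 2.0, and the function names are hypothetical.

```python
import json

INTEGRATION_KEY = "x-amazon-apigateway-integration"


def find_insecure_integrations(swagger):
    """Return sorted (path, method) pairs whose integration skips TLS verification."""
    found = []
    for path, operations in swagger.get("paths", {}).items():
        for method, operation in operations.items():
            if not isinstance(operation, dict):
                continue  # skip non-operation entries such as path-level parameter lists
            tls = operation.get(INTEGRATION_KEY, {}).get("tlsConfig", {})
            if tls.get("insecureSkipVerification") is True:
                found.append((path, method))
    return sorted(found)


def load_swagger(file_path):
    """Read an exported Swagger/OpenAPI JSON file into a dictionary."""
    with open(file_path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `find_insecure_integrations(load_swagger("apigw/qrcode-apigw-api-swagger-apigateway.json"))` would list the affected routes for review before the backup is re-imported.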
--- a/dataset/concatenated_split_files1.csv
+++ b/dataset/concatenated_split_files1.csv
--- a/dataset/failed_requests_2.csv
+++ b/dataset/failed_requests_2.csv
--- a/dataset/hasExecutable.csv
+++ b/dataset/hasExecutable.csv
--- a/dataset/ipadd.csv
+++ b/dataset/ipadd.csv
--- a/dataset/load_data.py
+++ b/dataset/load_data.py
@@ -1,104 +0,0 @@
-import csv
-import os
-import requests
-import concurrent.futures
-
-# Define the endpoint URL
-endpoint_url = "http://localhost:8080/v1/qrcodetypes/scan"
-
-# Path to the CSV file
-csv_file_path = "hasExecutable.csv"
-
-# Directory to store the split CSV files
-split_files_dir = "split_csv_files"
-os.makedirs(split_files_dir, exist_ok=True)
-
-# File to store failed requests
-failed_requests_file = "failed_requests.csv"
-
-# Final concatenated CSV file
-final_concatenated_file = "concatenated_split_files.csv"
-
-# Function to ensure URL starts with http:// or https://
-def ensure_url_prefix(url):
-    if not (url.startswith("http://") or url.startswith("https://")):
-        return "https://" + url
-    return url
-
-# Read the CSV file and split into 199 files
-def split_csv_file(csv_file_path, split_files_dir, num_splits=199):
-    with open(csv_file_path, newline='') as csvfile:
-        reader = list(csv.DictReader(csvfile))
-        total_rows = len(reader)
-        rows_per_file = total_rows // num_splits
-        
-        for i in range(num_splits):
-            split_file_path = os.path.join(split_files_dir, f"split_file_{i+1}.csv")
-            with open(split_file_path, 'w', newline='') as split_file:
-                writer = csv.DictWriter(split_file, fieldnames=['url', 'type'])
-                writer.writeheader()
-                start_index = i * rows_per_file
-                end_index = (i + 1) * rows_per_file if i != num_splits - 1 else total_rows
-                for row in reader[start_index:end_index]:
-                    row['url'] = ensure_url_prefix(row['url'])
-                    writer.writerow(row)
-
-# Function to process a CSV file and send POST requests
-def process_csv_file(csv_file_path):
-    failed_requests = []
-    with open(csv_file_path, newline='') as csvfile:
-        reader = csv.DictReader(csvfile)
-        for row in reader:
-            url = row['url']  # Column header for URL is 'url'
-            response = requests.post(endpoint_url, json={"data": url})
-            if response.status_code == 200:
-                print(f"Successfully sent data: {url}")
-            else:
-                print(f"Failed to send data: {url}, Status code: {response.status_code}")
-                failed_requests.append({"url": url, "status_code": response.status_code})
-    return failed_requests
-
-# Function to write failed requests to a CSV file
-def write_failed_requests(failed_requests):
-    if not failed_requests:
-        return
-    with open(failed_requests_file, 'w', newline='') as csvfile:
-        fieldnames = ['url', 'status_code']
-        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
-        writer.writeheader()
-        for request in failed_requests:
-            writer.writerow(request)
-
-# Function to concatenate all split CSV files into one
-def concatenate_csv_files(split_files_dir, output_file):
-    fieldnames = ['url', 'type']
-    with open(output_file, 'w', newline='') as outfile:
-        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
-        writer.writeheader()
-        for file in os.listdir(split_files_dir):
-            if file.endswith('.csv'):
-                with open(os.path.join(split_files_dir, file), newline='') as infile:
-                    reader = csv.DictReader(infile)
-                    for row in reader:
-                        writer.writerow(row)
-
-# Split the original CSV file into 199 parts
-split_csv_file(csv_file_path, split_files_dir)
-
-# Get the list of split CSV files
-split_files = [os.path.join(split_files_dir, file) for file in os.listdir(split_files_dir) if file.endswith('.csv')]
-
-# Execute the requests concurrently with 199 threads
-all_failed_requests = []
-with concurrent.futures.ThreadPoolExecutor(max_workers=199) as executor:
-    futures = [executor.submit(process_csv_file, split_file) for split_file in split_files]
-    for future in concurrent.futures.as_completed(futures):
-        all_failed_requests.extend(future.result())
-
-# Write all failed requests to a file
-write_failed_requests(all_failed_requests)
-
-# Concatenate all split CSV files into one final file
-concatenate_csv_files(split_files_dir, final_concatenated_file)
-
-print("Processing completed.")
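The removed `split_csv_file` divides the rows with integer division and gives whatever is left over to the last file. A minimal Python sketch of just that index arithmetic follows. `chunk_bounds` is a hypothetical helper written for illustration, not a function from the script.

```python
def chunk_bounds(total_rows, num_splits):
    """Return (start, end) slice bounds per split, mirroring split_csv_file."""
    rows_per_file = total_rows // num_splits
    bounds = []
    for i in range(num_splits):
        start = i * rows_per_file
        # The last split absorbs the rows that integer division left over.
        end = (i + 1) * rows_per_file if i != num_splits - 1 else total_rows
        bounds.append((start, end))
    return bounds
```

`chunk_bounds(10, 3)` returns `[(0, 3), (3, 6), (6, 10)]`. When there are fewer rows than splits, as in `chunk_bounds(2, 3)`, the result is `[(0, 0), (0, 0), (0, 2)]`: every file except the last is empty, so a single worker ends up processing all the rows.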
--- a/dataset/malicious_phish.csv
+++ b/dataset/malicious_phish.csv
--- a/dataset/map_type.py
+++ b/dataset/map_type.py
@@ -1,40 +0,0 @@
-import pandas as pd
-
-# Load the CSV files
-file1 = pd.read_csv('concatenated_split_files1.csv')  
-file2 = pd.read_csv('_select_from_safeqr_url_url_left_join_safeqr_qr_code_qr_on_qr_id_202408101634.csv') 
-
-# Function to strip 'http://' or 'https://' from a URL
-def strip_protocol(url):
-    if isinstance(url, str):
-        return url.replace('https://', '').replace('http://', '')
-    return url
-
-# Apply the strip function to both file1 and file2 URLs
-file1['url_stripped'] = file1['url'].apply(strip_protocol)
-file2['contents_stripped'] = file2['contents'].apply(strip_protocol)
-
-# Create a dictionary from the second file for quick lookup of type and qr_code_id
-url_type_qr_dict = dict(zip(file2['contents_stripped'], zip(file2['result_category'], file2['qr_code_id'])))
-
-# Prepare a copy of file2 to modify without affecting the original
-file2_copy = file2.copy()
-
-# Fill in the result_category in file2_copy
-file2_copy['result_category'] = file2_copy['contents_stripped'].map(lambda x: url_type_qr_dict[x][0] if x in url_type_qr_dict else None)
-
-# Drop the id and stripped columns in file2_copy
-file2_copy = file2_copy.drop(columns=['id', 'contents_stripped'])
-
-# Prepare a copy of file1 to modify without affecting the original
-file1_copy = file1.copy()
-
-# Fill in the qr_code_id in file1_copy based on the match from file2
-file1_copy['qr_code_id'] = file1_copy['url_stripped'].map(lambda x: url_type_qr_dict[x][1] if x in url_type_qr_dict else None)
-
-# Drop the stripped column in file1_copy
-file1_copy = file1_copy.drop(columns=['url_stripped'])
-
-# Save the updated copies to new CSV files
-file1_copy.to_csv('file1_updated.csv', index=False)
-file2_copy.to_csv('db_updated.csv', index=False)
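The removed `strip_protocol` relies on `str.replace`, which removes every occurrence of `http://` or `https://`, including ones embedded in a query string. If only the leading scheme should be dropped, a prefix-only variant is one option. The sketch below is a minimal illustration. The name `strip_scheme_prefix` is hypothetical, and it uses `startswith` rather than `str.removeprefix` so that it also runs on Python 3.7.

```python
def strip_scheme_prefix(url):
    """Drop a single leading http:// or https://; leave the rest untouched."""
    if not isinstance(url, str):
        return url  # e.g. a missing value read by pandas
    for scheme in ("https://", "http://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url
```

With this variant, a URL such as `https://a.com/?next=http://b.com` keeps its embedded `http://b.com` and loses only the leading scheme.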
--- a/dataset/ssl_error.csv
+++ b/dataset/ssl_error.csv
--- a/dataset/url_db_cleaned.csv.zip
+++ b/dataset/url_db_cleaned.csv.zip