Preprocessing Methods to Prevent CSV File Import Errors

Currently, I am running Drupal on an instance of Oracle Linux 9.4 in Oracle Cloud. To check the server’s OS information, run the following command:

cat /etc/redhat-release

When you run this command, detailed information about the server’s OS will be displayed. Below is an example of the output:

NAME="Oracle Linux Server"
VERSION="9.4"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Oracle Linux Server 9.4"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:9:4:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://github.com/oracle/oracle-linux"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 9"
ORACLE_BUGZILLA_PRODUCT_VERSION=9.4
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=9.4

The information above shows that the system is running Oracle Linux Server version 9.4, and it also includes links for Oracle’s support and bug reporting. The line ID_LIKE="fedora" indicates that this Linux distribution is similar to Fedora. Such OS details can be useful for troubleshooting or setting up the environment.”

“Next, since Drupal requires a database, I am using MySQL for this purpose. If MySQL is not installed, follow the steps below to install it:

Next, Drupal requires a database, and for that, I am using MySQL. If MySQL is not installed, follow these steps to install it:

Installing MySQL

First, run the following command to install MySQL Community Server:

sudo dnf install mysql-server

Then, start the MySQL server:

sudo systemctl start mysqld

Additional Explanation:

“This command starts the MySQL service manually. To enable the service to start automatically when the server reboots, use the following command:”

sudo systemctl enable mysqld

Managing Databases with MySQL Workbench

Normally, you would use the terminal to connect to MySQL, but this time, we will manage the database using a GUI tool called MySQL Workbench. MySQL Workbench allows for visual management, which is especially useful for those unfamiliar with the command line.


Connecting to MySQL and Checking Databases

To connect to MySQL, use the following command:

mysql -u root -p

After connecting, use the following command to display all the databases on the MySQL server:

show databases;

Additional Explanation:

“After connecting to MySQL, use the following command to display a list of all existing databases. This will show all the databases on the server.”

+--------------------+
| Database           |
+--------------------+
| drupaldb           |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.11 sec)

“Since we are connecting from an external source, some preparation is necessary. You need to allow traffic on port 3306 in the OS firewall. Run the following command to check the current firewall settings:”

sudo firewall-cmd --list-all

“To open port 3306, use the following command:”

sudo firewall-cmd --add-port=3306/tcp --permanent

“To apply the changes, reload the firewall with the following command:”

sudo firewall-cmd --reload

“When you check the status again, you should see a message like the one below. Make sure that ports: 3306/tcp is displayed to confirm that the port has been successfully opened.”

public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3
sources:
services: dhcpv6-client ssh
ports: 3306/tcp
protocols:
forward: yes
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:

“Additionally, you need to configure the ingress rules in the Oracle Cloud security list to allow external connections. This is to ensure that the server running in the cloud does not block connections from outside. Specifically, you need to allow traffic on port 3306, which MySQL uses.”

How to Set Up Ingress Rules

  1. Log in to the Oracle Cloud dashboard.
  2. From the menu on the left, select Networking and click Security Lists.
  3. Select the security list applied to your instance and click Add Ingress Rule.
  4. Configure the rule as follows:
    • Source CIDR: 0.0.0.0/0 (to allow all IP addresses. However, for security reasons, it is recommended to restrict this. If you want to allow specific IP addresses, enter them here.)
    • Protocol: TCP
    • Port Range: 3306
  5. Save the settings.

Now, external connections to MySQL’s default port 3306 should be allowed.


Connecting with MySQL Workbench

Next, use MySQL Workbench to connect to MySQL. MySQL Workbench is a tool that allows you to visually manage databases, and it’s particularly easier to use than the command line, especially for beginners.

  1. Launch MySQL Workbench.
  2. Create a new connection and enter the following information:
    • Host: The public IP address of your Oracle Cloud instance (e.g., 123.45.67.89)
    • Port: 3306
    • Username: The username you created in MySQL (e.g., root)
    • Password: The password for your MySQL user.

Then, click the Connect button to try connecting.


If an Error Occurs

If you see an error message saying, “Cannot connect,” follow these steps to troubleshoot. First, check the list of users who are allowed to connect to MySQL.

Open the terminal and connect to MySQL:

mysql -u root -p

After connecting, run the following command to check the list of users:

select user, host from mysql.user;

When you run this command, you will see something like the following:

+------------------+-----------+
| user             | host      |
+------------------+-----------+
| mamushi          | localhost |
| mysql.infoschema | localhost |
| mysql.session    | localhost |
| mysql.sys        | localhost |
| root             | localhost |
+------------------+-----------+

Cause of the Problem

As shown in the result, all users are set to connect only from localhost. This means that MySQL is configured to allow local connections only, and it does not permit external connections. localhost refers to the server itself, so users can only connect from within the server.

By default, MySQL does not create users who can connect from outside. Therefore, you need to create a new user who can connect from an external source. This will allow you to connect to the database remotely using tools like MySQL Workbench.

How to Add a User Who Can Connect Externally

Connecting to MySQL

First, open the terminal and connect to MySQL as the root user by running the following command:

mysql -u root -p

You will be prompted to enter the root user’s password. Enter the password and connect to MySQL.

Creating a New User

Next, create a new user who can connect from outside. You can assign any password to this user. The following command creates a user named workbench and allows connections from any IP address:

CREATE USER 'workbench'@'%' IDENTIFIED BY 'Password123#';

Explanation:

  • 'workbench'@'%': The % symbol allows connections from any IP address, meaning this user can connect from anywhere.
  • 'Password123#': This is the password for the user. It is recommended to use a strong password.

Granting Privileges to the User

After creating the user, grant the necessary permissions to operate on the database. The following command grants full access to all databases and tables:

GRANT ALL ON *.* TO 'workbench'@'%';

Explanation:

  • GRANT ALL: This grants all privileges to the user. The user will be able to operate existing databases and create new ones.
  • *.*: This means that the privileges apply to all databases and tables.

Security Consideration

If you’re concerned about security, you can replace the % with a specific IP address to restrict access to only certain locations. For example, if the source IP address is 192.168.1.100, you can modify the command as follows:

CREATE USER 'workbench'@'192.168.1.100' IDENTIFIED BY 'Password123#';
GRANT ALL ON *.* TO 'workbench'@'192.168.1.100';

Explanation:

  • This setting allows the workbench user to connect only from the IP address 192.168.1.100. It is recommended to specify an IP address if you want to strengthen security.

Applying the Privileges

After granting the privileges, run the following command to apply the changes:

FLUSH PRIVILEGES;

Explanation:

  • The FLUSH PRIVILEGES command ensures that the changes are applied to MySQL.

Retrying the Connection with MySQL Workbench

Now, you should be able to connect to MySQL from an external source using the workbench user. Try reconnecting with MySQL Workbench.

  1. Launch MySQL Workbench and create a new connection.
  2. Enter the required information:
    • Host: The public IP address of your Oracle Cloud instance (e.g., 123.45.67.89)
    • Port: 3306
    • Username: workbench
    • Password: Password123#

Click the Connect button to attempt the connection.

If the connection is successful, you will be able to access the MySQL database from an external source.

In this process, we created a new user and configured MySQL to allow external access. One important security consideration is that if you allow connections from any IP address (using %), you should always use a strong password and, if necessary, restrict IP addresses in Oracle Cloud’s security settings or the OS firewall.


External MySQL Access with Docker

Normally, when MySQL is installed on a server, external connections are not allowed by default, and manual configuration is required. However, in a Docker environment, external connections are enabled by default if you configure the correct port mapping when creating the MySQL container. This means that using Docker allows for external access more easily.

How to Enable External Access for MySQL in Docker

When creating a MySQL container in Docker, use the following port mapping to allow external connections:

docker run --name mysql-container -e MYSQL_ROOT_PASSWORD=Password123# -p 3306:3306 -d mysql:latest

Explanation:

  • The -p 3306:3306 option maps port 3306 on the host machine to port 3306 inside the MySQL container. This allows external access to MySQL through port 3306 on the host machine.
  • The MYSQL_ROOT_PASSWORD=Password123# sets the password for the root user. This password will be used when connecting to MySQL from an external source.

Connecting with MySQL Workbench

You can use GUI tools like MySQL Workbench to connect to MySQL within the container from an external source. The connection information is as follows:

  • Host: The IP address of the host machine (e.g., 123.45.67.89)
  • Port: 3306
  • Username: root
  • Password: The password you set when creating the container (e.g., Password123#)

With this configuration, you should be able to connect to the MySQL container from an external source.


Differences Between Docker and Traditional MySQL Installation

Traditional Installation:

When MySQL is installed on a server, external connections are not allowed by default. You need to manually configure the firewall and MySQL user settings.

With Docker:

In Docker, simply setting up the port mapping when creating the container allows external connections. This is convenient, especially for beginners, as it reduces the complexity of configuration.


Security Considerations

Even in a Docker environment, where external access is easier, it is important to set a strong password and, if necessary, restrict access through firewalls or Oracle Cloud’s security lists. When allowing external connections, always be mindful of security settings.


In a Docker environment, MySQL allows external connections by default if the port mapping is set correctly, making it easier to access from outside compared to a traditional installation. While Docker simplifies MySQL configuration for beginners, it is important to ensure that proper security measures are in place.


Next: Creating a Database and Importing a CSV File

Right-click on the newly created database and select Table Data Import Wizard. After the import is complete, expand the table and click Select Rows to view the data inside.

At this point, the SQL statements displayed in MySQL Workbench can also be copied and pasted into the terminal for execution.

SELECT * FROM test.`9984`;

The Importance of Defining the Table Structure Before Importing a CSV File

When importing a CSV file into MySQL, it is important to define the structure of the target table beforehand. If the table structure is not defined, errors may occur during the import process. One key point is ensuring that the columns in the CSV file match the columns in the database correctly.

1. Creating the Table

First, use MySQL Workbench to create a table. Execute the following SQL statement in MySQL Workbench to create a new table. This table is defined to correspond with the columns in the CSV file.

CREATE TABLE s20240913 (
ID INT AUTO_INCREMENT PRIMARY KEY,
code VARCHAR(255),
current_date DATE,
current_time TIME,
stock_name VARCHAR(255),
current_value DECIMAL(10,2),
volume BIGINT,
trading_value BIGINT,
market_cap BIGINT,
reverse_rate DECIMAL(10,2),
credit_ratio DECIMAL(10,2),
margin_ratio DECIMAL(10,2),
ex_dividend_date VARCHAR(255),
earnings_announcement_date VARCHAR(255),
PER DECIMAL(10,2),
PBR DECIMAL(10,2),
market_section_name VARCHAR(255)
);

Explanation:

  • Each column corresponds to a column in the CSV file, and the data types are appropriately set.
  • The ID column is set to AUTO_INCREMENT so that numbers are automatically assigned.
  • The date and numeric data types (such as DATE, DECIMAL, and BIGINT) are properly configured.

2. Steps for Importing a CSV File

Next, we will import the CSV file using MySQL Workbench.

Selecting the Table for Import

Right-click on the table you created and select Table Data Import Wizard.

Selecting the CSV File

Choose the CSV file that you want to import.

Checking Column Mapping

Make sure that each column in the CSV file matches the corresponding column in the table. Pay special attention to the data types, especially for dates and numbers. If they are not correctly mapped, errors may occur during the import process.


3. Verifying Data After the Import

Once the import is complete, expand the table and click Select Rows to check if the data has been imported correctly.

Tip:
The SQL statements displayed in MySQL Workbench can also be executed in the terminal. If an error occurs during the import process, check the error message to identify the problem, and modify the table definition or CSV file format as needed.


Reasons for CSV Import Failure and Solutions

When importing a CSV file into MySQL or other databases, the following issues can cause failures:

1. Blank Cells or Specific Characters

If there are blank cells, standalone - symbols, or long sequences of # symbols in the CSV file, MySQL may not process them correctly, causing errors. These values need to be converted into an appropriate format for the database.

2. Character Encoding Issues

CSV files containing Japanese data are often encoded in UTF-8 or Shift-JIS (also known as cp932). However, if the database does not support these encodings, you may encounter garbled text or import errors. MySQL typically uses utf8mb4 by default, so you may need to convert the CSV file to the appropriate encoding.


Solution via Script

This script is an effective way to preprocess the contents of a CSV file to prevent import errors. Here’s a brief explanation of what the script does.

Script Explanation

import csv
import re

# Specify the path to the CSV file as a raw string
input_file = r'c:\temp\20240913.csv'
output_file = r'c:\temp\import.csv'

# Open the input file and write to the output file
with open(input_file, mode='r', newline='', encoding='cp932') as infile, \
open(output_file, mode='w', newline='', encoding='cp932') as outfile:
reader = csv.reader(infile)
writer = csv.writer(outfile)

for row in reader:
# Replace blank cells, single hyphens ("-"), or long consecutive hash marks ("#") with 'NULL'
new_row = ['NULL' if x.strip() == '' or x.strip() == '-' or re.match(r'^#+$', x.strip()) else x for x in row]
writer.writerow(new_row)

This script replaces specific values that may cause errors during CSV file import (such as blank cells, single hyphens, and consecutive hash marks) with 'NULL'. This helps prevent errors when importing the data into the database.

Handling Blank Cells and Hyphens

If there are blank cells or single hyphens in the CSV file, importing them directly into the database could result in errors. This script replaces those values with 'NULL', allowing the database to handle the data correctly.

Handling Consecutive Hash Marks

If there are consecutive hash marks (#) in the CSV file, they will also be replaced with 'NULL'. These symbols often represent meaningless data and are best replaced to avoid import errors.

Character Encoding

This script uses the cp932 character encoding, which is based on Shift-JIS. If the database requires UTF-8, you can modify the script as follows:

with open(input_file, mode='r', newline='', encoding='utf-8') as infile, \
open(output_file, mode='w', newline='', encoding='utf-8') as outfile:

Tips for Importing a CSV File

Pay Attention to Encoding

If the character encoding of the CSV file does not match that of the database, you may encounter garbled text or import errors. Since MySQL typically recommends utf8mb4, it is advisable to save the CSV file in UTF-8 format.

Handling Blank Cells and Special Characters

By replacing blank cells or specific symbols with 'NULL', you can prevent errors during the import process. This preprocessing step is especially important when importing large amounts of data.

The main reasons for CSV file import failures are blank cells, specific symbols, and character encoding mismatches. This script provides an effective way to solve these issues, and it can be easily executed even by beginners. To prevent import errors, it is crucial to preprocess the CSV file and set the correct character encoding as needed.

Please share if you like it!
TOC