Data processing and analysis have become a fundamental pillar of decision making across domains, from industry to science and medicine. Taken together, data is a rich source of information that can drive progress and efficiency in many fields. However, the nature of data has evolved significantly in recent decades, presenting new challenges and opportunities: modern data shows increasing variability and diversity in structure and format.

This diversity manifests in two fundamental categories: structured and unstructured data. Structured data is organized into tables and relationships, which makes it easy to manipulate, query, and analyze; this format is common in relational databases and spreadsheets, and its predefined structure allows for efficient operations. Unstructured data, in contrast, lacks tabular organization and takes more diverse forms, such as images, audio, video, and free text. Its lack of predefined structure complicates automated manipulation and analysis, making the process more complex and challenging.

To date, the tools available for unstructured data rely on exporting the data and importing it into artificial intelligence models that extract relevant features for later querying and analysis. This approach introduces considerable processing time and can hurt query speed, especially on high-volume data sets. To address this challenge, we propose a Database Management System (DBMS) that integrates neural models at its core. Our research focuses on the architecture of this DBMS and its ability to efficiently query and analyze unstructured data. We measure the performance of this architecture and compare it with existing techniques, with the aim of evaluating its effectiveness and its potential impact on the processing and analysis of unstructured data. This paper documents the engine we developed, in which Artificial Neural Networks (ANNs) are tightly coupled to the DBMS kernel.
"ToySQL" is a small Python-based database management system (DBMS) project. Its purpose is to integrate an Artificial Neural Network (ANN) model at the kernel level to handle queries involving images. ToySQL makes use of the SQL language, specifically the ISO/IEC 9075-1:2023 standard, although it does not use all the features of the standard, but rather focuses on what is essential to carry out basic operations.
First, clone the GitHub repository https://github.com/Santixddd2/ToySQL with the command:
git clone https://github.com/Santixddd2/ToySQL.git
The repository contains a setup.py file, which installs the library. Go to the folder where the repository was cloned, open a terminal, and run the following command:
python setup.py install
This command installs the package, after which the ToySQL engine is ready to use. The installed folder has the following structure:
To import and use the library, add the following imports to the Test.py file:
Imports all ToySQL classes and functions needed to interact with a database.
Imports class_comparation from the Interpretatcion_functions file, used to interpret model predictions for object classification.
Imports load_model from keras.models, which loads the trained model from disk.
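Put together, the three imports described above could look like this at the top of Test.py. The exact module names are assumptions based on the descriptions; the block is guarded with try/except so it can be read and run even without ToySQL or Keras installed.

```python
# Hypothetical Test.py imports; exact module names are assumptions.
try:
    from ToySQL import *                                     # engine classes (DatabaseController, etc.)
    from Interpretatcion_functions import class_comparation  # prediction interpreter
    from keras.models import load_model                      # Keras model loader
    TOYSQL_AVAILABLE = True
except ImportError:
    TOYSQL_AVAILABLE = False  # ToySQL/Keras not installed in this environment
```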
To create a database in the Databases folder, create a file named after the database with the .hdf5 extension,
or go to the Databases folder directory and run the following command: type nul > NAME_BD.hdf5
Note: Change NAME_BD to the desired name.
Example:
C:\Users\User1\Desktop\ToySQL\Databases>type nul > NAME_BD.hdf5
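A cross-platform alternative is to create the empty file from Python. This is a sketch, not part of ToySQL's API; the Databases folder name follows the description above.

```python
import os

# Create the Databases folder if needed and an empty .hdf5 database file,
# equivalent to `type nul > NAME_BD.hdf5` on Windows.
os.makedirs("Databases", exist_ok=True)
with open(os.path.join("Databases", "NAME_BD.hdf5"), "wb"):
    pass  # an empty file is enough; ToySQL fills in schemas later

print(os.path.exists(os.path.join("Databases", "NAME_BD.hdf5")))  # True
```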
To use the previously created database in the Test.py file, we use the following instructions:
Defines a name variable that stores the path to the database file (pb.hdf5).
Defines a path variable that stores the path to the previously trained model file (class_model.h5).
Loads the previously defined model using the load_model function.
Defines the dimensions of the images to be used: 299 x 299 pixels with 3 color channels (RGB).
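Put together, the setup lines described above might look like this. The file locations are illustrative placeholders, not paths from the repository.

```python
# Paths are illustrative placeholders; adjust to your own files.
name = "Databases/pb.hdf5"     # path to the database file
path = "class_model.h5"        # path to the previously trained model file
dimension = (299, 299, 3)      # 299 x 299 pixels, 3 color channels (RGB)

# model = load_model(path)  # uncomment once Keras and the model file are available
```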
The connection is opened and closed as follows.
credentials = []
Defines the credentials list, empty for now; credential handling will be implemented in the next version (Version 2).
obj = DatabaseController()
Creates an object of type DatabaseController, a class imported from ToySQL.
obj.open_connection(name,credentials)
Calls the open_connection method on obj, which takes the database name and credentials as arguments.
To issue a statement to the database, the following steps are used:
query = 'SQL STATEMENT'
val = obj.query(query, model=model, dimension=dimension, function=class_comparation)
print(val)
We then execute the SQL query (query) using the query method of the DatabaseController object:
query: the SQL statement to be executed.
model: the pre-trained, pre-loaded deep learning model.
dimension: the input image size expected by the model.
function: the class_comparation function, used to process the model output for the specific task.
Example:
query = 'CREATE TABLE Users(Name VARCHAR(20),Age INT)'
val = obj.query(query, model=model, dimension=dimension, function=class_comparation)
print(val)
This code defines a CREATE TABLE query that creates a Users table with Name and Age columns.
Output:
connection open Schema Users created successfully, please enter again to the database
The UNSTRUCTURED type is a data type designed in ToySQL to work with unstructured data (images).
To create a table, use the following instruction:
query = 'CREATE TABLE Coms (Name VARCHAR(20),Picture UNSTRUCTURED)'
To insert images, use the following instruction, which defines a name (Sebastian in this example) and a path to the image, located in the Images folder:
query = 'INSERT INTO Coms VALUES("Sebastian",Images/Sebas_0056.png)'
To perform a WHERE query on unstructured data, the following instruction is used, which compares Picture against the path of an image selected from the Images folder:
query = "SELECT * FROM Coms WHERE Picture = Route(Images/Sebastian_0089.png)"
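As an illustration of the Route(...) syntax, here is a minimal sketch of how the image path could be extracted from such a WHERE clause. This is not ToySQL's actual parser, which may work differently.

```python
import re

def extract_route(query):
    """Pull the image path out of a Route(...) clause; returns None if absent."""
    match = re.search(r"Route\(([^)]+)\)", query)
    return match.group(1) if match else None

query = "SELECT * FROM Coms WHERE Picture = Route(Images/Sebastian_0089.png)"
print(extract_route(query))  # Images/Sebastian_0089.png
```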
ToySQL has three levels in its architecture, each containing the corresponding functions:
This is the outermost layer of the engine. Queries are received as strings, then parsed, prepared, and checked.
At this level, the .hdf5 database file is accessed, the data is loaded into RAM, and the insertion, deletion, and update operations are carried out within the file. This phase also serves as a bridge between the high level and the RAM.
At this level, all verification operations on the objects returned by the high level are performed. Then the insertion, selection, deletion, and update operations are carried out in RAM and cache, before returning to the files level to persist the changes.
The artificial intelligence model is integrated into the engine. We identified two approaches to coupling the engine with the model. The first loads into RAM only the images or unstructured data needed for the query, which reduces query time. The second filters at save time, storing only images of classes not already present in the database. Both approaches improve query time, but sacrifice performance in another module: data loading or insertion, respectively.
Coupling at load time sacrifices RAM preload time but optimizes queries and provides better abstraction to the user. Coupling at save time sacrifices insertion time but achieves the same query optimization and user abstraction. In ToySQL, we chose to sacrifice preload time, which results in a high coupling of the model in the module responsible for that task. Note that with this high coupling, the accuracy of the data loaded into RAM is directly proportional to the model's performance: the worse the model performs, the more likely it is to load inaccurate data, which can lengthen query times (for example, loading more images of a class than actually exist). Conversely, if the model is well trained, the loaded data will be accurate and query performance will remain good.
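The load-time coupling described above can be sketched as a filter over the stored images: only those the model assigns to the queried class are brought into RAM. This is an illustrative sketch; model_predict is a hypothetical stand-in for the coupled ANN classifier.

```python
def preload_images(image_records, target_class, model_predict):
    """Load into RAM only the images whose predicted class matches the
    class referenced by the query. Illustrative sketch; model_predict is
    a hypothetical stand-in for the coupled ANN classifier."""
    cache = []
    for path, pixels in image_records:
        if model_predict(pixels) == target_class:
            cache.append((path, pixels))
    return cache

# Toy demonstration with a fake classifier that "predicts" the stored label.
records = [("a.png", "cat"), ("b.png", "dog"), ("c.png", "cat")]
loaded = preload_images(records, "cat", lambda px: px)
print([p for p, _ in loaded])  # ['a.png', 'c.png']
```

A misclassifying model_predict would let images of the wrong class into the cache, which is exactly the query-time degradation described above.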
At this level, the string query is parsed into objects that the machine understands and interprets. The data is prepared and the respective checks are made.
The config class is used to store configuration related to artificial intelligence models and the size of images for processing in the database.
__init__(self, model, shape): Constructor of the class. Initializes a new configuration object with an AI model and the size of the images.
The DatabaseController class provides methods for interacting with a database. It allows you to open and close connections, create databases, run queries, and potentially manage configurations.
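From the signatures documented in this section, the class's interface can be sketched as follows. Method bodies here are illustrative stubs, not ToySQL's real implementation.

```python
# Interface skeleton of DatabaseController; bodies are illustrative stubs.
class DatabaseController:
    def __init__(self):
        self.connected = False

    def open_connection(self, route, credentials, model=None,
                        dimension=None, function=None):
        self.connected = True  # the real method opens the .hdf5 file

    def close_connection(self):
        self.connected = False

    def create_database(self, name, route):
        pass  # the real method creates a database file at `route`

    def query(self, query, model=None, dimension=None, function=None):
        pass  # the real method parses and executes the SQL statement

obj = DatabaseController()
obj.open_connection("Databases/pb.hdf5", [])
print(obj.connected)  # True
```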
__init__(self): Constructor of the class. Initializes a new DatabaseController object.
open_connection(self, route, credentials, model=None, dimension=None, function=None): Establishes a connection with the database. Parameters: route (str): database path. credentials (str): connection credentials. model (optional): AI model to use. dimension (tuple, optional): dimensions of the images. function (optional): additional processing function.
close_connection(self): Closes the connection with the database.
create_database(self, name, route): Creates a new database. Parameters: name (str): name of the database. route (str): path where the database will be created.
query(self, query, model=None, dimension=None, function=None): Executes a SQL query against the database. Parameters: query (str): SQL query to execute. model (optional): AI model to use. dimension (tuple, optional): dimensions of the images. function (optional): additional processing function.

5.3 DataBase Class

5.3.1 Description
The Database class is used to manage the creation of database files.
__init__(self): Constructor of the class. Initializes an empty list of databases.
append(self, database): Adds a new database name to the list. Parameters: database (str): name of the database to add.
create(self, name, route): Creates a new database file with the extension .dat. Parameters: name (str): name of the database. route (str): path where the database file will be created.
Description: It is responsible for checking the integrity of the data according to the constraints of the schema and the provided query.
__init__(self): Constructor of the class. It does not initialize parameters because it is a functional class.
check_type(self, data, columns, config, obj): Checks the type of the class. It receives an "attribute" type object, the entered data, the corresponding configuration, and a counter.
is_primary(self): Checks if the attribute is a primary key.
It is responsible for checking the integrity of the data according to the constraints of the schema and the query provided.
This class does not define any methods yet.
The parser class is responsible for parsing and executing SQL queries within the database system. It takes a query string and a reference to the database object as input, parses the query type (CREATE, INSERT, SELECT, DELETE, UPDATE), and delegates execution to the corresponding methods.
__init__(self, query, db): Constructor of the class. Initializes a new parser object with a query and a reference to the database. Parameters: query (str): SQL query to parse and execute. db (object): reference to the database object.
QUERY(self, config): Executes the SQL query. Parameters: config: optional configuration. Returns: result of the executed query.
CREATE(self, statement, db, config): Processes CREATE queries. Parameters: statement: SQL statement. db (object): reference to the database object. config: optional configuration.
INSERT(self, statement, db, config): Processes INSERT queries. Parameters: statement: SQL statement. db (object): reference to the database object. config: optional configuration.
SELECT(self, statement, db, config): Processes SELECT queries. Parameters: statement: SQL statement. db (object): reference to the database object. config: optional configuration.
DELETE(self, statement, db): Processes DELETE queries. Parameters: statement: SQL statement. db (object): reference to the database object.
UPDATE(self, statement, db): Processes UPDATE queries. Parameters: statement: SQL statement. db (object): reference to the database object.
It is a bridge between the file and RAM. In addition, it also serves as a bridge between the high level and RAM.

6.1 Kernel_schemas class
This class is responsible for managing the schemas (database structures) within the system. Provides methods for creating, reading, inserting, selecting, deleting, and updating data within these schemas. Additionally, it manages references between schemas and maintains the overall database file.
__init__(self, db_name): Constructor of the class. Initializes the kernel_schemas object with the name of the database.
append_schema(self, attributes, name): Adds a new schema to the system.
read_schema_test(self, db, config): Reads existing schemas from the database file.
insert_data(self, schema, data, db, config): Inserts data into a specific schema.
select_data(self, name, dat, columns): Selects data from a given schema.
delete_data(self, name, dat, db): Deletes data from a specific schema.
update_data(self, name, set, dat): Updates data in a given schema.
print_data(self, columns, r): Prints the selected data.
save_reference(self, schema, attribute, name, atr): Saves references between schemas.
get_schema(self, name): Checks if a schema with the given name exists.
save_file_test(self, schema=None, attributes=None, data=None): Saves the schemas and data to the database file.
Represents an individual schema within the database. Manages the attributes (columns) of the schema and provides methods for creating, manipulating, and querying the associated data.
__init__(self, name, attributes): Constructor of the class.
create_table(self): Creates the table structure for the schema.
insert_table(self, dat, db, config): Inserts data into the schema table.
select_table(self, dat, columns): Selects data from the schema table.
delete_table(self, dat, db): Deletes data from the schema table.
update_table(self, set, dat): Updates data in the schema table.
reference_table(self, attribute, name, atr): Sets a reference between attributes.
selection(self, dat): Performs a data selection in the schema table.
select_type(self, obj, dat): Selects data based on the attribute type.
reference_integrity(self, db, dat, type): Checks the referential integrity of the data.
reference_verification(self, dat, db, i, type): Checks referential integrity.
return_data(self, columns, r): Returns the selected data.
The attribute class represents an individual attribute (column) within a database schema (table). It is responsible for storing and managing the data associated with the attribute, as well as performing validations and reference checks.
__init__(self, name, type, length, reference, config): Constructor of the class. Initializes a new attribute object with the attribute name, type, length, reference, and configuration. Parameters: name (str): name of the attribute. type (str): type of the attribute. length (int): length of the attribute. reference (list): reference of the attribute. config: optional configuration.
insert(self, dat, id, db, config): Inserts data into the attribute. Parameters: dat: data to insert. id: data identifier. db (object): reference to the database object. config: optional configuration.
insert_unstructured(self, dat, id, config): Inserts unstructured data (images) into the attribute.
insert_matrix(self, dat, id, config): Inserts image matrices into the attribute.
insert_normal(self, dat, id): Inserts normal data into the attribute.
select_name(self, dat): Selects data by attribute name.
select_uuid(self, dat): Selects data by attribute identifier (UUID).
select_all(self): Selects all unique identifiers of the attribute.
select_unstructured(self, dat): Selects unstructured data (images) similar to the provided image.
delete_uuid(self, dat): Deletes data by attribute identifier (UUID).
delete_name(self, dat): Deletes data by attribute name.
update_name(self, dat, set): Updates data by attribute name.
update_uuid(self, dat, set): Updates data by attribute identifier (UUID).
check_type(self, data, columns, config): Checks whether the data type is compatible with the attribute type.
reference_integrity(self, db, dat, type): Checks referential integrity before inserts, updates, or deletes.
reference_checking(self, dat, db, i, type): Performs additional reference checks and manages counters.
The data class is responsible for storing and retrieving the data value and its identifier (ID).
__init__(self, data, id): Constructor of the class. Initializes a new data object with the given data and identifier. Parameters: data: data value. id: data identifier.
The unstructured class represents unstructured data, specifically images, and performs comparison operations with other images using deep learning models.
__init__(self, ref, config): Constructor of the class. Initializes a new unstructured object with the provided image reference and settings. Parameters: ref: reference to the image. config: model configuration.
summary(self): Prints a summary of the architecture of the convolutional and classification models.
comparison(self, matrix): Compares the current image with a matrix of feature vectors.
to_key(self): Converts the current image to a summary key.
to_vector(self): Converts the current image into a feature vector.
to_matrix(self, vector): Converts a vector to a feature matrix.
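The comparison method matches feature vectors against each other. One common similarity measure for such vectors is cosine similarity, sketched below; whether ToySQL uses this exact metric is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two feature vectors, in the spirit of the
    unstructured.comparison method (the actual metric is an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0, 1], [1, 0, 1]))  # 1.0 (identical vectors)
print(cosine_similarity([1, 0], [0, 1]))        # 0.0 (orthogonal vectors)
```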
Development of new features
Steps to follow:
1. Parse the query (create functions in parser). (Sebas)
2. String checks and preparation. (Sebas)
3. RAM checks and preparation. (Santi)
4. Logic of operations in RAM. (Santi)
5. Logic of operations on the hard drive. (Mayer)
String Transformation Methods
TransformsC(self, attributes): Transforms the attributes of the CREATE query for processing.
TransformsA(self, attributes): Transforms the attributes of the INSERT query for processing.
TransformsCO(self, name, where, columns, config): Transforms the SELECT, DELETE, and UPDATE query attributes for processing.
IsUnstructQuery(self, dat): Checks if the query is for unstructured data.
TransformsRO(self, route): Gets the path of the unstructured data.
haslenght(self, type): Gets the length of a data type.
hasreference(self, name, atr, type, db): Checks if there are references in the query.
TransformsAtr(self, type): Transforms the query attributes for processing.
clear_attributes(self, attributes): Clears the attributes of the query.
is_type(self, type, columns): Checks if the data type is valid.
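For instance, the length extraction performed by haslenght could work along these lines. This is an illustrative sketch, not ToySQL's actual implementation.

```python
import re

def has_length(type_str):
    """Pull the declared length out of a type like VARCHAR(20);
    returns None for types without a length (illustrative sketch)."""
    match = re.search(r"\((\d+)\)", type_str)
    return int(match.group(1)) if match else None

print(has_length("VARCHAR(20)"))  # 20
print(has_length("INT"))          # None
```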
This documentation provides a detailed description of the parser class and its methods, as well as an explanation of the string transformation methods used to process SQL queries.