ToySQL

Introduction

In the current era, data processing and analysis have become a fundamental pillar of decision making across domains, from industry to science and medicine. Taken together, data is a rich source of information that can drive advancement and efficiency in many fields. However, the nature of data has evolved significantly in recent decades, presenting new challenges and opportunities: modern data exhibits increasing volatility and diversity in its structure and format.

This diversity manifests itself in two fundamental categories: structured and unstructured data. Structured data is characterized by its orderly organization into tables and relationships, which makes it easy to manipulate, query, and analyze. This format is common in relational databases and spreadsheets, and its predefined structure allows for efficient operations. Unstructured data, on the other hand, lacks this tabular organization and takes more diverse forms, such as images, audio, video, and free text. Its lack of predefined structure complicates automated manipulation and analysis, making both far more challenging.

To date, the tools available for unstructured data rely on exporting the data and importing it into artificial intelligence models that extract features relevant for later querying and analysis. This approach introduces considerable processing time and can hurt query speed, especially on high-volume data sets. To address this challenge, we propose a Database Management System (DBMS) that integrates neural models directly into the engine. Our research focuses on the architecture of this DBMS and its ability to efficiently query and analyze unstructured data. We measure the performance of this architecture and compare it with existing techniques for these operations, with the aim of evaluating its effectiveness and its potential impact on the processing and analysis of unstructured data. This document describes the engine developed with highly coupled ANNs.

Project description

"ToySQL" is a small Python-based database management system (DBMS) project. Its purpose is to integrate an Artificial Neural Network (ANN) model at the kernel level to handle queries involving images. ToySQL makes use of the SQL language, specifically the ISO/IEC 9075-1:2023 standard, although it does not use all the features of the standard, but rather focuses on what is essential to carry out basic operations.

How to use?

Install library

First, clone the GitHub repository https://github.com/Santixddd2/ToySQL using the command:

git clone https://github.com/Santixddd2/ToySQL.git


The repository contains a file called setup.py, which is responsible for installing the library. Go to the folder where the repository was cloned, open a terminal, and run the following command:

python setup.py install

This command installs the ToySQL package so the engine can now be used.


Import library

To import and use the library, add the following imports to the Test.py file:

Import all ToySQL classes and functions used to interact with a database.

Import class_comparation from the Interpretatcion_functions file; it is used to interpret model predictions related to object classification.

Import load_model from keras.models, which is responsible for loading the trained model.

Create hdf5 database

To create a database, create a file in the Databases folder with the desired database name and the extension .hdf5.

Alternatively, go to the Databases folder directory and execute the following command: type null > NAME_BD.hdf5

Note: Change NAME_BD to the desired name.

Example:

C:\Users\User1\Desktop\ToySQL\Databases>type null > NAME_BD.hdf5
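The same empty database file can be created from Python. This is a minimal, cross-platform sketch equivalent to the Windows command above; NAME_BD remains a placeholder for your database name.

```python
from pathlib import Path

# Create the empty .hdf5 database file inside the Databases folder,
# equivalent to the Windows "type null >" command shown above.
db_path = Path("Databases") / "NAME_BD.hdf5"
db_path.parent.mkdir(exist_ok=True)  # make sure the Databases folder exists
db_path.touch()                      # create the empty database file
print(db_path.exists())  # → True
```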

To use the database created earlier, the Test.py file defines the following:

A name variable that stores the path to the database file (pb.hdf5).

A path variable that stores the path to the previously trained model file (class_model.h5).

A call to the load_model function, which loads the previously defined model.

The dimensions of the images to be used: 299 x 299 pixels with 3 color channels (RGB).
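The definitions above can be sketched as follows. The file names (pb.hdf5, class_model.h5) are the ones used in this README and should be adjusted to your own database and model; loading the model additionally requires Keras to be installed.

```python
# Sketch of the Test.py setup described above (file names are assumptions
# taken from this README; adjust them to your own files).
name = "Databases/pb.hdf5"   # path to the database file
path = "class_model.h5"      # path to the previously trained model
dimension = (299, 299, 3)    # 299 x 299 pixel RGB input expected by the model

# Requires Keras and the model file to be in place; uncomment once available:
# from keras.models import load_model
# model = load_model(path)
```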

Open or close connection

Opening and closing the connection is carried out as follows.

credentials = []

Defines the list of credentials; it is empty for now, as credential handling will be implemented in the next version (Version 2).

obj = DatabaseController()

Creates an object of type DatabaseController, the class imported from ToySQL.

obj.open_connection(name,credentials)

Calls the open_connection method on obj, which takes the database name and the credentials as arguments.

obj.close_connection()

Closes the connection with the database when work is finished.

How do I configure?

Make a query

To execute a statement against the database, the following pattern is used:

query = 'SQL STATEMENT'

val = obj.query(query, model=model, dimension=dimension, function=class_comparation)

print(val)

This executes the SQL query using the query method of the DatabaseController object, where:

query: the SQL statement to be executed.

model: the pre-trained, pre-loaded deep learning model.

dimension: the expected input image size for the model.

function: the class_comparation function, used to process the model output for the specific task.

Example:

query = 'CREATE TABLE Users(Name VARCHAR(20), Age INT)'

val = obj.query(query, model=model, dimension=dimension, function=class_comparation)

print(val)

The code above defines a CREATE TABLE query that creates a Users table with Name and Age columns. It produces the following output:

Output:

connection open
Schema Users created successfully, please enter again to the database

Unstructured type

The UNSTRUCTURED type is a data type designed in ToySQL to work with unstructured data (images).

To create a table with an unstructured column, use the following instruction:

query = 'CREATE TABLE Coms (Name VARCHAR(20),Picture UNSTRUCTURED)'

To insert an image, the following instruction defines a name (Sebastian in this example) and a path to the image, located in the Images folder:

query = 'INSERT INTO Coms VALUES("Sebastian",Images/Sebas_0056.png)'

To perform a WHERE query on unstructured data, compare Picture with the path of the selected image from the Images folder:

query = "SELECT * FROM Coms WHERE Picture = Route(Images/Sebastian_0089.png)"

Project architecture

ToySQL's architecture has three levels, each with its corresponding functions:

High level

This is the outermost part of the engine. Queries are received as strings, then parsed, prepared, and checked.

Files and loading in RAM

At this level of the architecture, the hdf5 type database file is accessed, the data is loaded into RAM and the appropriate insertion, deletion and update operations are carried out within the file. Additionally, this phase serves as a bridge between the high level and the RAM.

RAM and cache management

At this level, all the corresponding verification operations are performed on the objects returned by the high level. The proper insertion, selection, deletion, and update operations are then carried out in RAM and cache, before returning to the files level to persist the changes.

ANN routines

The artificial intelligence model is integrated with the engine. We identified two approaches to coupling the engine with the model. The first couples at load time: only the images or unstructured data needed for the query are loaded into RAM, which reduces query time. The second couples at save time: only images of classes not already present in the database are stored. Both approaches improve query time, but each sacrifices performance elsewhere, in data loading or data insertion respectively.

Coupling at load time sacrifices RAM preload time but optimizes queries and gives the user better abstraction. Coupling at save time sacrifices insertion time for the same query optimization and user abstraction. In ToySQL, we chose to sacrifice preload time, which results in a high coupling of the model in the module responsible for that task. This high coupling makes the accuracy of the data loaded into RAM directly proportional to the model's performance: the worse the model performs, the more likely inaccurate data is loaded, which can lengthen query times (for example, loading more images of a class than actually exist). Conversely, if the model is well trained, the loaded data is accurate and query performance stays good.
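The load-time coupling described above can be illustrated with a toy sketch. This is not ToySQL's actual code; it only shows how the model's predicted class decides which unstructured rows are brought into RAM, and why a weaker model inflates that set.

```python
# Toy illustration (not ToySQL's implementation) of load-time coupling:
# the model's predicted class filters which rows are loaded into RAM.
def preload(rows, predict, wanted_class):
    """Keep only the rows the model assigns to the queried class."""
    return [row for row in rows if predict(row) == wanted_class]

rows = ["cat_01.png", "dog_07.png", "cat_33.png"]

# A perfect model loads exactly the needed images; a weaker model would load
# extras, which is the query-time penalty discussed above.
perfect_model = lambda filename: filename.split("_")[0]
print(preload(rows, perfect_model, "cat"))  # → ['cat_01.png', 'cat_33.png']
```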

High level

At this level, the string query is parsed into objects that the engine understands and interprets. The data is prepared and the respective checks are made.

config class

Description

The config class is used to store configuration related to artificial intelligence models and the size of images for processing in the database.

Methods

init(self, model, shape): Constructor of the class. Initializes a new configuration object with an AI model and the size of the images.

DatabaseController class

Description

The DatabaseController class provides methods for interacting with a database. It allows you to open and close connections, create databases, run queries, and potentially manage configurations.

Methods

init(self): Constructor of the class. Initializes a new DatabaseController object.

open_connection(self, route, credentials, model=None, dimension=None, function=None): Establishes a connection with the database. Parameters: route (str): database path. credentials (str): connection credentials. model (str, optional): AI model to use. dimension (tuple, optional): dimensions of the images. function (optional): additional processing function.

close_connection(self): Closes the connection with the database.

create_database(self, name, route): Creates a new database. Parameters: name (str): name of the database. route (str): route where the database will be created.

query(self, query, model=None, dimension=None, function=None): Executes a SQL query against the database. Parameters: query (str): SQL query to execute. model (str, optional): AI model to use. dimension (tuple, optional): dimensions of the images. function (optional): additional processing function.

DataBase class

Description

The Database class is used to manage the creation of database files.

Methods

init(self): Constructor of the class. Initializes an empty list of databases.

append(self, database): Adds a new database name to the list. Parameters: database (str): name of the database to add.

create(self, name, route): Creates a new database file with the extension .dat. Parameters: name (str): name of the database. route (str): path where the database file will be created.

Check class

Description

It is responsible for checking the integrity of the data according to the constraints of the schema and the provided query.


Methods

init(self): Constructor of the class. It does not initialize parameters because it is a functional class.

check_type(self, data, columns, config, obj): Checks the type of the data. It receives an attribute object, the entered data, the corresponding configuration, and a counter.

is_primary(self): Checks if the attribute is a primary key.

Prepare class

Description

It is responsible for checking the integrity of the data according to the constraints of the schema and the query provided.

Methods

No methods are implemented yet.

Parser class

Description

The parser class is responsible for parsing and executing SQL queries within the database system. It takes a query string and a reference to the database object as input, parses the query type (CREATE, INSERT, SELECT, DELETE, UPDATE), and delegates execution to the corresponding methods.

Methods

init(self, query, db): Constructor of the class. Initializes a new parser object with a query and a reference to the database. Parameters: query (str): SQL query to parse and execute. db (object): reference to the database object.

QUERY(self, config): Executes the SQL query. Parameters: config: optional configuration. Returns: result of the executed query.

CREATE(self, statement, db, config): Processes CREATE type queries. Parameters: statement: SQL query statement. db (object): reference to the database object. config: optional configuration.

INSERT(self, statement, db, config): Processes INSERT type queries. Parameters: statement: SQL query statement. db (object): reference to the database object. config: optional configuration.

SELECT(self, statement, db, config): Processes SELECT type queries. Parameters: statement: SQL query statement. db (object): reference to the database object. config: optional configuration.

DELETE(self, statement, db): Processes DELETE type queries. Parameters: statement: SQL query statement. db (object): reference to the database object.

UPDATE(self, statement, db): Processes UPDATE type queries. Parameters: statement: SQL query statement. db (object): reference to the database object.
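The dispatch pattern described above (inspect the leading keyword, delegate to the matching handler) can be sketched in a few lines. This is a minimal illustration, not the actual ToySQL parser; the handler bodies are stubs standing in for the real CREATE/INSERT/SELECT/DELETE/UPDATE logic.

```python
# Minimal sketch (not ToySQL's code) of keyword-based query dispatch.
class MiniParser:
    def __init__(self, query):
        self.query = query.strip()

    def run(self):
        # The first whitespace-separated token decides the statement type.
        keyword = self.query.split(None, 1)[0].upper()
        handlers = {
            "CREATE": self.create, "INSERT": self.insert,
            "SELECT": self.select, "DELETE": self.delete,
            "UPDATE": self.update,
        }
        handler = handlers.get(keyword)
        if handler is None:
            raise ValueError(f"Unsupported statement: {keyword}")
        return handler()

    # Stub handlers standing in for the real processing methods.
    def create(self): return "create"
    def insert(self): return "insert"
    def select(self): return "select"
    def delete(self): return "delete"
    def update(self): return "update"

print(MiniParser("SELECT * FROM Coms").run())  # → select
```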

File and load into RAM

This level is a bridge between the database file and RAM, and also between the high level and RAM.

Kernel_schemas class

Description

This class is responsible for managing the schemas (database structures) within the system. It provides methods for creating, reading, inserting, selecting, deleting, and updating data within these schemas. Additionally, it manages references between schemas and maintains the overall database file.

Methods

init(self, db_name): Constructor of the class. Initializes the kernel_schemas object with the name of the database.

append_schema(self, attributes, name): Adds a new schema to the system.

read_schema_test(self, db, config): Reads existing schemas from the database file.

insert_data(self, schema, data, db, config): Inserts data into a specific schema.

select_data(self, name, dat, columns): Selects data from a given schema.

delete_data(self, name, dat, db): Deletes data from a specific schema.

update_data(self, name, set, dat): Updates data in a given schema.

print_data(self, columns, r): Prints the selected data.

save_reference(self, schema, attribute, name, atr): Saves references between schemas.

get_schema(self, name): Checks if a schema with the given name exists.

save_file_test(self, schema=None, attributes=None, data=None): Saves the schemas and data to the database file.

RAM and cache management

Kernel_attributes class

Description

Represents an individual schema within the database. Manages the attributes (columns) of the schema and provides methods for creating, manipulating, and querying the associated data.

Methods

init(self, name, attributes): Constructor of the class.

create_table(self): Creates the table structure for the schema.

insert_table(self, dat, db, config): Inserts data into the schema table.

select_table(self, dat, columns): Selects data from the schema table.

delete_table(self, dat, db): Deletes data from the schema table.

update_table(self, set, dat): Updates data in the schema table.

reference_table(self, attribute, name, atr): Sets a reference between attributes.

selection(self, dat): Performs a data selection in the schema table.

select_type(self, obj, dat): Selects data based on the attribute type.

reference_integrity(self, db, dat, type): Checks the referential integrity of the data.

reference_verification(self, dat, db, i, type): Performs additional referential integrity checks.

return_data(self, columns, r): Returns the selected data.

attribute class

Description

The attribute class represents an individual attribute (column) within a database schema (table). It is responsible for storing and managing the data associated with the attribute, as well as performing validations and reference checks.

Methods

init(self, name, type, length, reference, config): Constructor of the class. Initializes a new attribute object with the attribute name, type, length, reference, and configuration. Parameters: name (str): name of the attribute. type (str): type of the attribute. length (int): length of the attribute. reference (list): reference of the attribute. config: optional configuration.

insert(self, dat, id, db, config): Inserts data into the attribute. Parameters: dat: data to insert. id: data identifier. db (object): reference to the database object. config: optional configuration.

insert_unstructured(self, dat, id, config): Inserts unstructured data (images) into the attribute.

insert_matrix(self, dat, id, config): Inserts image matrices into the attribute.

insert_normal(self, dat, id): Inserts normal data into the attribute.

select_name(self, dat): Selects data by attribute name.

select_uuid(self, dat): Selects data by identifier (UUID) of the attribute.

select_all(self): Selects all unique identifiers of the attribute.

select_unstructured(self, dat): Selects unstructured data (images) similar to the provided image.

delete_uuid(self, dat): Deletes data by identifier (UUID) of the attribute.

delete_name(self, dat): Deletes data by attribute name.

update_name(self, dat, set): Updates data by attribute name.

update_uuid(self, dat, set): Updates data by identifier (UUID) of the attribute.

check_type(self, data, columns, config): Checks whether the data type is compatible with the attribute type.

reference_integrity(self, db, dat, type): Checks referential integrity before inserts, updates, or deletes.

reference_checking(self, dat, db, i, type): Performs additional reference checks and manages counters.

Data class

Description

The data class is responsible for storing and retrieving the data value and its identifier (ID).

Methods

init(self, data, id): Constructor of the class. Initializes a new data object with the given data and identifier. Parameters: data: Data value. id: Data identifier.
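Since the data class only pairs a value with its identifier, it can be sketched in a few lines. This is a minimal illustration consistent with the description above; using a UUID as the identifier is an assumption based on the select_uuid/delete_uuid methods mentioned earlier.

```python
import uuid

# Minimal sketch of the data class described above: a value plus its ID.
class Data:
    def __init__(self, data, id):
        self.data = data  # the stored value
        self.id = id      # its identifier (assumed here to be a UUID string)

row = Data("Sebastian", str(uuid.uuid4()))
print(row.data)  # → Sebastian
```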

unstructured class

Description

The unstructured class represents unstructured data, specifically images, and performs comparison operations with other images using deep learning models.

Methods

init(self, ref, config): Constructor of the class. Initializes a new unstructured object with the provided image reference and settings. Parameters: ref: reference to the image. config: model configuration.

summary(self): Prints a summary of the architecture of the convolutional and classification models.

comparison(self, matrix): Compares the current image with a matrix of feature vectors.

to_key(self): Converts the current image to a summary key.

to_vector(self): Converts the current image into a feature vector.

to_matrix(self, vector): Converts a vector to a feature matrix.
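One common way to compare a feature vector against a matrix of stored vectors, as comparison() is described above, is cosine similarity. This is an illustrative sketch, not ToySQL's actual implementation; the vectors, the similarity measure, and the 0.9 threshold are all assumptions for demonstration.

```python
import math

# Illustrative sketch (not ToySQL's code): score the current image's feature
# vector against each stored vector and keep the close matches.
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

current = [0.9, 0.1, 0.0]                     # stand-in for to_vector() output
matrix = [[0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]   # stand-in for stored vectors
scores = [cosine(current, row) for row in matrix]
print([s > 0.9 for s in scores])  # → [True, False]
```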

Framework

How to add new functionality

Development of new features

Steps to follow:

Parse the query (create functions in the parser): Sebas

String checks and preparation: Sebas

RAM checks and preparation: Santi

Logic of operations in RAM: Santi

Logic of operations on the hard drive: Mayer

String Transformation Methods

TransformsC(self, attributes): Transforms the attributes of the CREATE query for processing.

TransformsA(self, attributes): Transforms the attributes of the INSERT query for processing.

TransformsCO(self, name, where, columns, config): Transforms the SELECT, DELETE, and UPDATE query attributes for processing.

IsUnstructQuery(self, dat): Checks if the query is for unstructured data.

TransformsRO(self, route): Gets the path of the unstructured data.

haslenght(self, type): Gets the length of a data type.

hasreference(self, name, atr, type, db): Checks if there are references in the query.

TransformsAtr(self, type): Transforms the query attributes for processing.

clear_attributes(self, attributes): Clears the attributes of the query.

is_type(self, type, columns): Checks if the data type is valid.

This documentation provides a detailed description of the parser class and its methods, as well as an explanation of the string transformation methods used to process SQL queries.
