Effective Interpretation and Replication of Sign Language Using Robotic Arms
The use of computer vision and artificial intelligence has become an important factor in many real-world applications. The automatic recognition of hand gestures using these two technologies is especially useful in sign language recognition. In this paper we present the interpretation of American Sign Language, in which a robot recognizes a sign, conveys the information through audio or text to hearing people, and replicates the reply to vocally challenged and hearing-impaired people in sign language form using 3D-printed robotic arms.
Index Terms— Artificial Intelligence, Computer Vision, Sign Language Interpretation, Convolutional Neural Networks, Robotic Arms.
In today’s world, voice communication plays an important role in everybody’s life. In fact, it is one of the fundamental reasons for the rapid development of mankind in fields such as sports, music, architecture, science, and technology. A day can neither start nor end without voice communication. Even so, there are people who cannot speak or hear because of a disability. The problems faced by these people are considerable, as they cannot communicate or express themselves like everyone else. Children born with such a disability find it especially difficult to learn. The best way for them to express their feelings and thoughts is sign language, which requires many complex hand movements. It is complex because even a small change in posture can have a variety of possible meanings. The way these people communicate has developed rapidly, from pointing at objects present in some direction to using a specific set of gestures that express meaning in the form of a sign language. Vocally challenged and hearing-impaired people use different forms of sign language, since the language varies from one country to another. The main problem, however, is that very few people know any of these languages (interpreters), so it is not feasible for such a small number of interpreters to support all disabled people. They therefore require effective techniques to help them learn and communicate, so that they feel no less capable than anyone else.
There have been many technological developments to help these people, and so far the most effective approach uses a robot. Robots use methods such as machine learning and artificial intelligence to help humans in their day-to-day activities. There are robots that help these people by converting text into sign language, an action achieved with 3D-printed arms, and robots that can teach rhymes and other basics to children with disabilities. However, these types of robots cannot provide proper face-to-face communication. We therefore propose a robot that can look at a person, understand their sign language, and communicate back in the same form.
The main objective of the project is to develop a program for an interpreting robotic system, for example a humanoid, that can be used to address the challenges disabled people face in communicating effectively, especially in public places.
The recognition of sign language is a fundamental goal for any sign language interpreting system. This section surveys the related research efforts that have addressed this objective.
According to Suarez and Murphy (2012), the important parameters or the factors to be considered for any type of sign recognition system include image acquisition, hand localization, pose estimation and gesture classification. Once the image is obtained from the depth camera, it is processed using various hand localization methods.
Temporal and spatial information about the hands makes hand tracking possible, which in turn enables dynamic gesture recognition. The research community most often uses the NITE body tracking middleware in combination with the Kinect SDK.
As the next step, various classification algorithms are used to categorize a particular sign or gesture. These algorithms segment the hand images, and the tracked hand trajectories are used as input to make a prediction. The most commonly used classification method in sign language recognition is the Hidden Markov Model (HMM). HMMs handle manual gestures with temporal information and achieve high classification rates compared with other algorithms. Other classification algorithms include k-Nearest Neighbours, which yields high classification rates for static poses when combined with some preprocessing; Support Vector Machines (SVMs) are also commonly used. More generally, the rapid development of Computer Vision and Machine Learning has helped advance the HCI and HRI fields.
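To make the k-Nearest Neighbours idea concrete, the following minimal sketch classifies a static pose feature vector by majority vote among its nearest training vectors. The two-dimensional "pose" features and labels here are toy values for illustration only; in a real system the features would come from the hand localization stage.

```python
import numpy as np

def knn_classify(train_feats, train_labels, query, k=3):
    """Label a query feature vector by majority vote among its
    k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = train_labels[nearest]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

# Toy example: two well-separated clusters standing in for two poses.
train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = np.array(["A", "A", "B", "B"])
print(knn_classify(train, labels, np.array([0.05, 0.0])))  # -> A
```

The same voting scheme extends unchanged to higher-dimensional feature vectors, which is why k-NN pairs well with a preprocessing step that produces compact pose features.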
Another classification method widely used in sign language recognition is the neural network. A 2D Convolutional Neural Network (CNN) maps each layer to a feature map using a kernel window called the local receptive field. The 2D CNN reduces the number of free variables and improves the generalization capability of the network. The two most fundamental mechanisms in a 2D CNN are padding and strides. Padding surrounds the edges with a few additional fake pixels so that, as the kernel slides, the original pixels can sit at its center while it extends into the fake pixels beyond the edge, producing an output the same size as the input. The stride skips some of the kernel's slide positions: a stride of 1 places successive slides one pixel apart, which is simply the standard convolution.
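The effect of padding and stride on output size can be illustrated with a naive convolution written directly in NumPy. This is a didactic sketch, not the implementation used by any CNN library; it shows that a 3x3 kernel with padding 1 preserves the input size, while a stride of 2 halves each spatial dimension.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Naive 2D convolution (cross-correlation) with zero padding
    and stride, returning the transformed feature map."""
    if padding:
        image = np.pad(image, padding)  # fake zero pixels around the edges
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3)) / 9.0                      # 3x3 averaging filter
same = conv2d(img, k, stride=1, padding=1)     # "same" output: 6x6
strided = conv2d(img, k, stride=2, padding=1)  # stride 2: 3x3
print(same.shape, strided.shape)  # (6, 6) (3, 3)
```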
Additional research has been performed using a simple ANN as the recognition method. Some of this work combines an ANN with the Elliptical Fourier Descriptor (EFD). The EFD acts as a feature extractor that describes the image as a 2D curve; every feature obtained with the EFD is unique, and these features are later recognized by the ANN.
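The descriptor idea can be sketched with a simplified relative of the EFD: treating a closed contour as a sequence of complex numbers and taking FFT coefficient magnitudes yields a shape feature that is invariant to translation and scale, which is the property that makes such descriptors useful ANN inputs. The square contour below is a stand-in for a segmented hand outline.

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Translation- and scale-invariant shape descriptor from a
    closed 2D contour (a simplified relative of the EFD)."""
    z = contour[:, 0] + 1j * contour[:, 1]  # contour points as complex numbers
    coeffs = np.fft.fft(z)
    mags = np.abs(coeffs[1:n_coeffs + 1])   # drop DC term -> translation invariant
    return mags / mags[0]                   # normalize -> scale invariant

# A square contour sampled at 16 points, standing in for a hand outline.
t = np.linspace(0, 2 * np.pi, 16, endpoint=False)
square = np.column_stack([np.sign(np.cos(t)), np.sign(np.sin(t))])
print(fourier_descriptor(square).round(3))
```

Because the descriptor is unchanged when the contour is shifted or uniformly rescaled, the downstream ANN sees the same feature vector regardless of where the hand appears in the frame or how close it is to the camera.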
This paper does not focus on advancing the state of the art in hand segmentation, localization, and tracking, but contributes an approach that applies a previously developed gesture classification method to our dataset.
Initially we train the model on an existing dataset so that the robot can understand sign language. The proposed robot uses a depth camera to recognize the sign language of the disabled person; once an image is acquired, it is processed using hand localization methods. The robot then compares the processed image with the trained model and predicts the output.
The predicted output can be converted into voice or text and shared with a person who does not understand sign language. Based on that person's reply, the robot converts the text or voice into sign language and interprets it back to the disabled person in the same form. The reply is performed by a 3D-printed robotic arm that translates the output into sign language.
Communication is at the core of human nature. It takes many forms, and the different languages across the world have helped people develop efficiently. Vocally challenged and hearing-impaired people have their own type of language for communication, known as sign language. Just as spoken languages differ between countries, sign language also varies from country to country. Some widely used sign languages are American Sign Language (ASL) and Indian Sign Language (ISL). These sign languages consist of both isolated signs and continuous signs. An isolated sign refers to a single hand pose, whereas a continuous sign, as the name suggests, is a series of images in continuous motion. Sign language helps disabled people improve their knowledge and gives them, especially children, a real chance at a formal education. In general, however, the lack of knowledge of sign language among the wider population makes life difficult for vocally challenged and hearing-impaired people, because they cannot communicate properly in public places such as banks, schools, and hospitals. Some level of automation is therefore required to help these people gain an education and improve their lives.
An important part of the project is computer vision, which allows the robot to recognize sign language through the lens of a camera and then make logical interpretations. In this project we use a depth camera to recognize the sign language. Computer vision technology has grown swiftly and, with the help of machine learning and artificial intelligence, has helped solve a huge number of real-world problems such as autonomous driving, image and speech recognition, and facial recognition. Another field where computer vision has gained importance is healthcare, where it helps physicians diagnose various diseases. In this project, computer vision is used to obtain the visual input and process it further. A number of libraries support this task, such as OpenCV, the Point Cloud Library (PCL), ROS for robotic vision, and MATLAB. I have used OpenCV because it integrates directly into Python programs. The training and classification in OpenCV are performed with the help of neural networks.
Convolutional Neural Networks
The Convolutional Neural Network (CNN) is a type of Artificial Neural Network (ANN). Its convolution layers filter the inputs and extract information. The main function of a CNN is to combine the input data (known as a feature map) with a convolution kernel (known as a filter) to obtain a transformed feature map. The filters in the convolution layers are adapted through learned parameters, which are tuned to extract the most useful information for the task at hand. The advantage of a CNN is that it automatically adjusts to find the most relevant features: the network may filter for the shape of an object when performing object recognition but extract color information when performing animal recognition. This is because a CNN can learn that different classes of objects tend to differ in shape, whereas different types of animals are more likely to differ in color than in shape. CNNs are used in applications including natural language processing, sentiment analysis, image classification, video analysis, and image recognition; combined with artificial intelligence, they power virtual assistants, robots, autonomous cars, drones, manufacturing machines, and so on. The CNN is an important part of the project discussed here because it is particularly well suited to training on our dataset of multiple gestures.
The layers of a convolutional neural network are the input layer, the output layer, and the hidden layers, which consist of convolution layers, pooling layers, fully connected layers, and normalization layers.
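How these layers fit together can be traced by computing the spatial size flowing through a small stack. The 64x64 input size, layer choices, and filter count below are illustrative assumptions, not the architecture used in this project; the formula is the standard one for convolution and pooling output sizes.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Illustrative stack for a hypothetical 64x64 grayscale gesture image:
s = 64
s = conv_out(s, kernel=3, pad=1)      # conv 3x3, "same" padding -> 64
s = conv_out(s, kernel=2, stride=2)   # 2x2 max pool             -> 32
s = conv_out(s, kernel=3, pad=1)      # conv 3x3, "same" padding -> 32
s = conv_out(s, kernel=2, stride=2)   # 2x2 max pool             -> 16
flattened = s * s * 32                # assuming 32 filters in the last conv
print(s, flattened)  # 16 8192
```

The flattened vector is what the fully connected layers would consume before the output layer produces one score per gesture class.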
The other important part of this project is to replicate the sign language using the 3D-printed robotic arms. After the robot recognizes the sign language of the vocally challenged or hearing-impaired person through the depth camera, it processes the pattern and predicts the corresponding text. This result is conveyed to the hearing person as text or speech. That person then replies in text or audio form. The robot captures the reply and conveys it to the disabled person in the form of sign language, an action performed by the 3D-printed robotic arms. For this purpose, we can make use of Project Aslan, which was developed by engineering students at the University of Antwerp.
To train the robot, we need to create a dataset so that it can perform the desired task. In this project we need a dataset for each letter from A to Z and each number from 0 to 9. Initially I tried to capture my own dataset using a camera, but the results were not good and time was short, so I made use of existing datasets available on the Internet. The images are converted into numerical form using the NumPy library, whose mathematical functions convert each frame into a matrix containing a value for each pixel in the image.
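The frame-to-matrix step can be sketched as follows. The random array here is a hypothetical stand-in for a captured frame (in practice each frame would come from the depth camera or `cv2.imread`), and the 64x64 size is an assumption; the normalization and flattening are the generic preprocessing steps before classification.

```python
import numpy as np

# Hypothetical stand-in for one captured 64x64 grayscale frame.
frame = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

# Scale pixel values to [0, 1] and flatten the matrix into a single
# feature vector, the form a classifier typically consumes.
features = (frame.astype(np.float32) / 255.0).ravel()
print(features.shape)  # (4096,)
```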