Cyberbully Detection with Hierarchical Temporal Attention Network

Introduction and Problem Statement

Cyberbullying can occur through SMS, text messages, and apps, or online in social media, forums, or gaming platforms where people can view, participate in, or share content. It includes sending, posting, or sharing negative, harmful, false, or mean content about someone else. Cyberbullying can harm the online reputations of everyone involved: not just the person being bullied, but also those doing the bullying or participating in it. The National Crime Prevention Council reported that more than 40% of teenagers in the US have been cyberbullied. Multiple studies highlight its negative effects, which include deep emotional trauma and psychological and psychosomatic disorders. Hence, to keep the online space safe, it is vital to detect and mitigate cyberbullying.

Cyberbullying is a continuous temporal phenomenon rather than a one-off incident. Hence, it makes sense to characterize a cyberbullying event not only by its textual content but also by its temporal characteristics. This project aims to combine the temporal and textual context of such events to identify cyberbullying. We use the temporal characteristics given by Soni et al., as they show promising results when combined with textual features. Further, social features are taken from a highly successful rumor-detection system. These features will be incorporated through an attention mechanism into a hierarchical bidirectional long short-term memory model (H-BLSTM) that is trained on the actual textual content. We use a recurrent neural network (RNN) because RNNs readily learn latent textual representations from a time-series stream. Thus, we use a neural network that draws on several types of information about an event to identify a cyberbullying incident. We use an Instagram-based dataset that contains 678 bullying and 1540 non-bullying events.

Formally, we define our problem as follows:

Given – A dataset of sessions, each of which is an Instagram post. Each session includes the submitted image (as a URL), social information (the number of followers and follows of the original poster, and the number of shares of the image), and the associated textual comments. Each session is also hand-labelled as representing cyberbullying or not.
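One way to hold a session in memory is sketched below. The class and field names (`Session`, `Comment`, `is_bullying`, and so on) are hypothetical, and the per-comment timestamp is an assumption made here so that temporal features could be derived later; the dataset's actual schema is not given in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    # Assumed: seconds elapsed since the original post was created.
    timestamp: float
    text: str


@dataclass
class Session:
    # One Instagram post with its social information, comments, and label.
    image_url: str
    followers: int
    follows: int
    shares: int
    comments: List[Comment] = field(default_factory=list)
    is_bullying: bool = False  # hand-assigned label

    def social_features(self) -> List[float]:
        # Raw social features in a fixed order: followers, follows, shares.
        return [float(self.followers), float(self.follows), float(self.shares)]


# Illustrative values only; not taken from the dataset.
example = Session(
    image_url="https://example.com/image.jpg",
    followers=120,
    follows=80,
    shares=3,
    comments=[Comment(10.0, "nice pic"), Comment(95.0, "nobody likes you")],
    is_bullying=True,
)
```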

Problem – Train an H-BLSTM on textual data from comments, with a combination of social and temporal features driving the attention mechanisms. Classify new events as showing cyberbullying or not. Further, compare the model's accuracy, precision, recall, and F1-score with the scores reported for a related technique on the same dataset. Additionally, determine the minimum number of comments required to detect cyberbullying, thus facilitating early detection.
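The evaluation side of this problem statement can be sketched in a few lines. The code below computes the four metrics for binary labels (1 = bullying, 0 = non-bullying) and finds the smallest comment prefix on which a classifier fires. The function names and the keyword-based `toy_classifier` are hypothetical stand-ins for illustration, not the trained model.

```python
from typing import Callable, Dict, List, Optional, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    # Count the four confusion-matrix cells, treating 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def min_comments_to_detect(
    comments: List[str], classify: Callable[[List[str]], int]
) -> Optional[int]:
    # Feed growing prefixes of the comment stream to the classifier and
    # return the first prefix length that is flagged, or None if none is.
    for k in range(1, len(comments) + 1):
        if classify(comments[:k]) == 1:
            return k
    return None


def toy_classifier(comments: List[str]) -> int:
    # Hypothetical stand-in: flags a prefix if any comment contains "loser".
    return int(any("loser" in c.lower() for c in comments))


metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
first_hit = min_comments_to_detect(["nice pic", "you loser", "lol"], toy_classifier)
```

Scanning prefixes in order is the simplest reading of "minimum number of comments"; a real evaluation would likely aggregate this number over all bullying sessions.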

Proposed Method

We are implementing a hierarchical network with social attention for cyberbully detection (HSA-BLSTM). We first model the bullying event as a hierarchical time series containing different semantic levels of information. Each event can contain several posts, which are re-posts and comments. Each post is further segmented into words. The structured event is then fed into the hierarchical Bi-LSTM network. Social and temporal features are used as an additional cue to identify the most prominent parts of the bullying event. We incorporate these features via the attention mechanism in the Bi-LSTM to obtain an accurate representation for bully detection.


An event contains a source post together with a number of related posts (reposts and comments). For computational efficiency, we divide the posts into time intervals and treat each interval as a sub-event. Each sub-event consists of an ordered set of posts, and each post contains a set of words. Besides the hierarchical textual information, we also extract the social and temporal features listed in the table below. Formally, event-level cyberbully detection aims to learn a projection F(e, s) → {0, 1}, where 0 and 1 indicate the labels for non-bully and bully, respectively.
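The event, sub-event, post, word hierarchy can be illustrated with a small bucketing routine. This is a minimal sketch under stated assumptions: posts arrive as (timestamp, text) pairs, intervals have a fixed width, empty intervals are skipped, and words are obtained by lowercasing and splitting on whitespace. The function name and the fixed-width choice are hypothetical; the text does not specify how intervals are chosen.

```python
from typing import Dict, List, Tuple


def split_into_sub_events(
    posts: List[Tuple[float, str]], interval: float
) -> List[List[List[str]]]:
    # Returns sub-events -> posts -> words, ordered by time.
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not posts:
        return []
    ordered = sorted(posts, key=lambda p: p[0])
    start = ordered[0][0]
    buckets: Dict[int, List[List[str]]] = {}
    for timestamp, text in ordered:
        # Index of the fixed-width interval this post falls into.
        index = int((timestamp - start) // interval)
        buckets.setdefault(index, []).append(text.lower().split())
    # Assumption: intervals that contain no posts are dropped.
    return [buckets[i] for i in sorted(buckets)]


# Illustrative input: timestamps in seconds, 60-second intervals.
event = split_into_sub_events(
    [(0.0, "You are mean"), (30.0, "stop it"), (130.0, "So rude")], 60.0
)
```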

Hierarchical Network with Social Attention

To obtain the representation of an event, we model it with a hierarchical structure. The event representation consists of three parts: a word-level part containing a Bi-LSTM layer and an attention layer; a post-level part containing a Bi-LSTM layer and a social feature attention layer; and a sub-event-level part containing a Bi-LSTM layer and a social feature attention layer.

Word level: Embed every word into a low-dimensional semantic space. Feed the word vectors to a Bi-LSTM to generate forward and backward hidden states, and concatenate the two so that each word's representation carries context from both directions. A score function is applied to give more weight to important words. We then normalize the weight of each word with a softmax function. The final post representation is the weighted sum of the hidden state vectors. The parameters of the score function are randomly initialized and jointly learned during training.
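The word-level steps can be written compactly as follows. This is a sketch assuming a common additive-attention form for the score function; the hidden-state symbol h_(i,j)^k and the context vector v_w are names introduced here for illustration, and the exact form of F_w used in the project may differ.

```latex
\begin{align}
h_{i,j}^{k} &= \left[\overrightarrow{h}_{i,j}^{k} \,;\, \overleftarrow{h}_{i,j}^{k}\right] \\
F_w\!\left(h_{i,j}^{k}\right) &= v_w^{\top} \tanh\!\left(W_{hw}\, h_{i,j}^{k} + b_w\right) \\
\alpha_{i,j}^{k} &= \frac{\exp\!\left(F_w\!\left(h_{i,j}^{k}\right)\right)}{\sum_{k'} \exp\!\left(F_w\!\left(h_{i,j}^{k'}\right)\right)} \\
p_{i,j} &= \sum_{k} \alpha_{i,j}^{k}\, h_{i,j}^{k}
\end{align}
```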

Here, W_hw denotes the word-level weight matrix, b_w is the bias term, and F_w is the word-level score function, which measures the significance of each word. We then obtain the normalized weight α_(i,j)^k of the kth word via a softmax function and compute the representation p_(i,j) of the jth post in the ith sub-event as a weighted sum of hidden states. The matrix W_hw is randomly initialized and jointly learned during the training process.
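To make the weighting concrete, the sketch below scores a list of hidden-state vectors, normalizes the scores with a softmax, and returns the weighted sum. It assumes an additive score of the form v · tanh(W h + b), where the context vector `v` is a hypothetical parameter not named in the text, and it uses fixed toy parameters in place of learned ones.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: List[Vector], W: Matrix, b: Vector, v: Vector
) -> Tuple[Vector, Vector]:
    # Score each hidden state with v . tanh(W h + b).
    scores = []
    for h in hidden:
        projected = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        scores.append(sum(vi * pi for vi, pi in zip(v, projected)))
    weights = softmax(scores)
    # The post representation is the attention-weighted sum of hidden states.
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled


# Toy parameters for illustration; in training these would be learned.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, -1.0]
weights, post_vector = attention_pool(hidden_states, W, b, v)
```

With these toy values the first word receives the larger weight, so the pooled vector leans toward the first hidden state.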


Inspired by earlier work, we implemented a hierarchical LSTM network with temporal attention for detecting cyberbullying. We used a hierarchical LSTM to model a given event at three semantic levels. Since cyberbullying is a continuous temporal phenomenon, we characterize a cyberbullying event not only by its textual content but also by its temporal characteristics. Extensive experiments conducted on the Instagram dataset show that the proposed model (H-BLSTM) can significantly outperform other state-of-the-art models.


  1. Dinakar, Karthik, et al. ‘Common sense reasoning for detection, prevention, and mitigation of cyberbullying.’ ACM Transactions on Interactive Intelligent Systems (TiiS) 2.3 (2012): 18.
  2. Soni, Devin, and Vivek Singh. ‘Time Reveals All Wounds: Modeling Temporal Characteristics of Cyberbullying.’
  3. http://www.grantjenks.com/docs/wordsegment/