Weekly Update - Fadul Sikder

No update was provided for the week ending 2025-06-25.

Fadul:

Hello everyone, most of last week I was away visiting New York, and due to flight delays I only returned home late last night. As for the next action item, developing the graph traversal algorithm for our exploration task, I’ve been outlining a few potential approaches. Unfortunately, I haven’t had much time to work on them, so I don’t have any concrete results yet. However, I will be working on the algorithm over the next few hours and preparing for our upcoming meeting.

No update was provided for the week ending 2025-06-11.

No update was provided for the week ending 2025-06-04.

Fadul:
Action Item:

This week, I worked with the Mega-Vul real-world vulnerability dataset and generated explanation scores across several examples.

Results and Insights:

When working with real-world programs, manual analysis of explanation scores becomes significantly more challenging and less manageable due to increased program complexity and branching logic.

I manually analyzed 3 examples and observed the following key patterns:

Explanation scores consistently highlight vulnerable lines of code with high importance values.

Non-vulnerable lines, especially in complex branches, often receive moderately high scores as well, which may introduce ambiguity and lead the traversal to explore a complex path.

In traversing the graph representation of these programs, the process turns into a decision-making task only at branch points:

In linear control flows, all lines are generally included.

At branching points (e.g., conditionals), we must choose which path to explore.

My current heuristic compares the cumulative explanation score for each branch (optionally scaled by path length) and selects the one with the higher score. In most examples, this strategy leads to the vulnerability.
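A minimal sketch of the branch-selection heuristic described above, assuming each candidate branch is available as a list of per-line explanation scores. The names `branch_score`, `choose_branch`, and the `normalize` flag are hypothetical and are not taken from the actual implementation; the scores are made up for illustration.

```python
from typing import Dict, List


def branch_score(scores: List[float], normalize: bool = False) -> float:
    """Cumulative explanation score of one branch, optionally scaled by path length."""
    if not scores:
        return 0.0
    total = sum(scores)
    return total / len(scores) if normalize else total


def choose_branch(branches: Dict[str, List[float]], normalize: bool = False) -> str:
    """Return the name of the branch with the highest (optionally length-scaled) score."""
    return max(branches, key=lambda name: branch_score(branches[name], normalize))


# Hypothetical scores for the two arms of a conditional.
branches = {
    "then": [0.9, 0.8],            # short path with high scores
    "else": [0.5, 0.5, 0.5, 0.5],  # longer path with moderate scores
}
print(choose_branch(branches))                  # decision on raw cumulative score
print(choose_branch(branches, normalize=True))  # decision on length-scaled score
```

With these made-up numbers, the raw sum favors the longer `else` branch (2.0 vs. 1.7), while the length-scaled score favors `then` (0.85 vs. 0.5). This is one way moderately high scores on long non-vulnerable branches could mislead the unscaled variant.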

I believe this line of investigation should continue to help answer a key question: Aside from explanation score magnitude and path length, is there another meaningful heuristic derived from this explanation score to guide decision-making at branching points?

Current Conclusion:

Thus far, I have not identified any counterexamples that undermine the explanation-to-path methodology.

My manual analysis so far supports the hypothesis that high-score nodes from GNN explanations correspond to vulnerability-relevant code.

Plan and Next Steps:

I have a few implementation-specific questions to discuss with Prof. Ji during today’s meeting.

I am currently compiling a comprehensive write-up of findings, including annotated examples.

Due to the size and complexity of real-world C programs, the manual analysis has been time-consuming. However, I aim to complete the full report by the next meeting, and present a clear summary of insights for group discussion.

Next, I will start building a baseline graph traversal algorithm.

Fadul:

Action Item Completed:

Completed four examples from the Juliet Test Suite using the explanation-to-path workflow.

Successfully set up and began training with the Mega-Vul real-world vulnerability dataset.

Ongoing Action Items:

Working on a baseline algorithm for path generation given a graph annotated with explanation scores.

Continuing exploration with more diverse, real-world code examples to test the generalizability of the approach.

Results / Insights:

After analyzing the four completed examples, I’ve observed that the explanation-to-path transitions largely follow the pattern discussed during Tuesday’s meeting. I have also identified some additional patterns, but I need more time to interpret and validate their significance in the broader context of explanation alignment and vulnerability localization.

Next Steps: I am slightly behind on the write-up but plan to complete and deliver the full report by Sunday night. The report will include observations from the Juliet examples, preliminary findings from Mega-Vul, and updates on the baseline path generation algorithm.

Fadul:

I have updated this accordingly.

No update was provided for the week ending 2025-05-07.

No update was provided for the week ending 2025-04-30.

Fadul:

Hi Group,

Action Items:
Generating and Interpreting Explanation Scores

Objective: Generate explanation scores using a trained GNN model, examine statements exceeding a defined importance threshold, and evaluate their connection to known vulnerabilities.

I have trained a GNN model and explored several libraries to generate GNN explanations. Using one of the libraries, I produced an initial set of explanation scores. However, the interpretation of these results is still unclear, and I need to conduct further analysis to draw concrete conclusions.
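As a rough illustration of the objective above, the sketch below filters statements by an importance threshold and measures how many known vulnerable lines they cover. It assumes explanation scores are available as a mapping from line number to score; the function names, the threshold value, and the example data are all hypothetical.

```python
from typing import Dict, Set


def important_lines(scores: Dict[int, float], threshold: float) -> Set[int]:
    """Line numbers whose explanation score meets or exceeds the threshold."""
    return {line for line, score in scores.items() if score >= threshold}


def overlap_with_vulnerable(selected: Set[int], vulnerable: Set[int]) -> float:
    """Fraction of known vulnerable lines covered by the selected lines."""
    if not vulnerable:
        return 0.0
    return len(selected & vulnerable) / len(vulnerable)


# Hypothetical per-line scores and ground-truth vulnerable lines.
scores = {10: 0.91, 11: 0.15, 12: 0.62, 13: 0.08}
vulnerable = {10, 12}

selected = important_lines(scores, threshold=0.6)
coverage = overlap_with_vulnerable(selected, vulnerable)
print(sorted(selected), coverage)
```

The coverage value only says how many vulnerable lines were highlighted; it does not penalize non-vulnerable lines that also pass the threshold, so a precision-style measure would be needed alongside it.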

I was slightly delayed in finalizing the PROMISE 2025 camera-ready version, but I completed it last night. I plan to send the final version to Professors Lei and Ji within the next two hours.
Replies:
Dr. Lei:

if you run into any technical issues with gnn explainers, you need to ask prof. ji, who is an expert on this topic.

Dr. Lei:

also, it seems that only a few times have you met your own deadlines. please reflect on this and discuss it in today’s or thu’s meeting

Fadul:

So far this week, I have been working on my GNN explanation-to-test-case generation project. I prepared the baseline proposal before Thursday’s meeting. Following that, I have been working on the action items, specifically generating explanations and mapping them back to known vulnerable contracts to see what the generated explanations look like and whether they capture specific known patterns of a vulnerable contract. Additionally, some questions have been raised regarding GNN explanation ground truth and how it factors into my projects. I am researching these comments and plan to have some answers before Thursday’s meeting. Furthermore, I am working on the PROMISE 2025 paper. My initial target date was Monday, but I am slightly behind and will complete it by Friday night.

Replies:
Dr. Lei:

sounds good. try to work out some concrete examples and discuss any technical issues on teams as they arise.

Fadul:

This week, I continued working on the GNN explanation-to-test-case project, where I did a couple of manual examples and presented a high-level overview of the approach based on them during Friday’s meeting. Based on the feedback received, I am now working to concretize the approach into a baseline and begin writing it up. My goal is to share a draft of this write-up with the group by Wednesday night.

On the implementation side, I’ve been working on training the GNN model and generating explanations.

Additionally, I started addressing some of the reviewer comments on my PROMISE 2025 paper.

Replies:
Dr. Lei:

sounds good. please do send the draft by Wed night so that the meeting with Prof. Ji can be more productive. also, i plan to attend PROMISE and i have another paper in FSE which is co-located with PROMISE.

Fadul:

This week, I have continued working on the GNN explanation-to-test-case project. Based on the feedback from Thursday’s meeting, I have been manually testing some example cases to explore how explanations can inform test case generation.

On the implementation side, I’ve made further progress but still have pending action items, such as setting up a symbolic execution tool to support the project pipeline.

Additionally, I reviewed the reviewer comments on my accepted PROMISE 2025 paper and prepared a response document accordingly.

Replies:
Dr. Lei:

sounds good. try to present one or more examples you are manually doing and also try to identify a few obstacles in the process.

Fadul:

This week, my main focus has been on training a graph neural network (GNN) for vulnerability classification. I have been reviewing relevant literature and exploring existing repositories that implement GNN-based models. From my investigation, it appears that reproducing full research pipelines from these repositories is more complex than building a GNN from scratch, so I am currently working on developing a baseline implementation.

In parallel, I have been studying Professor Ji’s Illuminati paper and analyzing its associated codebase to understand how its methods could be adapted into my own work.

I have defined the following three short-term goals:

Action 1: Train a GNN that can perform graph-level classification as a baseline model.

Action 2: Explore how node-level explanations can be generated using the trained GNN and demonstrate a few examples for discussion with the group.

Action 3: Create a baseline approach for using these explanations to generate test cases that target specific vulnerabilities.

My aim is to make progress on all three items before Thursday’s meeting.

Fadul:

Weekly Research Update

Graph Neural Network (GNN) Implementation

This week, I have been focusing on implementing graph neural networks (GNNs) and exploring different methods of transforming a program into a graph representation. One approach I identified involves first converting the program into a relational database and then constructing the graph from that data. I am currently evaluating the validity and effectiveness of this approach.

My main objective is to build a GNN model capable of making line-level vulnerability predictions and train it to improve performance. Additionally, I have been reviewing repositories that utilize GNNExplainer to better understand their methodologies and applications.

Action Items for the Next Phase

I have been defining specific action items for the next steps of my project. My plan is to:

Finalize and refine these action items.

Share them before Thursday’s meeting for discussion and feedback.

New Research Idea Presentation

I have also prepared for my new idea presentation this week, focusing on the DeepSeekMath paper, which introduced Group Relative Policy Optimization (GRPO). GRPO is a reinforcement learning-based post-training method designed to improve reasoning capabilities in large language models (LLMs).

My preliminary idea explores whether GRPO can be applied to symbolic execution to enhance program analysis. While I haven’t found prior work explicitly applying GRPO in this domain, I am conducting a deeper analysis to assess its feasibility. This will be the key topic of discussion in my new idea session next week.

Next Steps

Continue refining my GNN implementation and validate the relational database approach.

Post the finalized action items before Thursday’s meeting.

Replies:
Dr. Lei:

before thursday’s meeting, try to think more about the topic we discussed earlier, i.e., how to take explanations produced by GNNExplainer or other types of explanations to produce actual test cases to verify if the explanations are valid.

in the future, please prepare slides for technical discussions.

Fadul:

This week, I submitted a paper for PROMISE 2025 on Saturday morning, which kept me occupied with the submission process. After that, I began exploring research papers related to a new research idea I am considering. My focus is on developing a project that post-trains LLMs with reinforcement learning for security analysis using symbolic execution. I plan to present this idea and discuss relevant papers in my new idea session next week.

Research Directions

Extending My Work from PROMISE 2025

Define action items based on my current solution from the PROMISE 2025 submission.

Refine and extend the idea into an approach-level framework for further development.

Exploring Graph Neural Networks (GNNs) for Vulnerability Classification

Use graph neural networks (GNNs) to classify vulnerabilities.

Apply GNNExplainer to interpret the classification results and generate test cases based on those explanations.

Next Steps

For both of these research directions, my immediate task is to outline concrete action items. I aim to have a draft and post them by tonight. Once posted, I will review and discuss them further to determine which direction to prioritize as my primary focus.

Replies:
Dr. Lei:

first, congrats on your first submission! the new research ideas look good. try to do some reflection on the past project and share with the group.

Fadul:

Hello everyone,

This week, I have been focused on finalizing my research paper. Additionally, I trained more models, as we discussed in our meetings, to provide a more comprehensive analysis of how freezing different layers affects LLM performance.

Initially, I estimated that I would complete the paper by Friday night. However, as I have been piecing together the writing, I realized that I had underestimated the timeline. Specifically, I spent a significant amount of time on the Introduction and Related Work sections and mappings of their citations—for better or worse. I am now aiming to complete a version by tonight.

—Fadul

Replies:
Dr. Lei:

it is not uncommon to underestimate the timeline. this is why it is important to budget some buffer time; otherwise, you can easily miss some deadlines. also, it is important for you to respond to emails timely. i think dr. Ji is waiting for your response as of now.

Fadul:

This week, I primarily focused on running experiments. I have obtained some partial results, including training the LLaMA 3.2 model with specific layer-freezing configurations that we have been discussing over the past few days. Additionally, I am running experiments on other models and expect to receive their results later this week.

Aside from experiments, I have also been working on submissions. The abstract submission is due tonight, and I have prepared an initial draft, which I will be sending to Dr. Lei and Dr. Ji within the next few hours. In today’s meeting, I plan to discuss these results in detail.

Paper Submission Plan

With the paper submission deadline set for next Tuesday, my plan is as follows:

By Wednesday night or Thursday afternoon – Complete the initial draft of my paper.

Receive and incorporate feedback while obtaining the final set of experimental results. I anticipate having the complete set of results between Thursday and Friday. These results will include additional model variations currently being trained.

Weekend – Revise and refine the paper based on the feedback.

Replies:
Dr. Lei:

sounds good. try your best to make your first submission

Fadul:

This week, I focused on training and fine-tuning models by selectively unfreezing different layers. Some initial results were presented on Thursday. Following further discussions on Thursday and Friday, I reevaluated how to approach the research questions.
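To make the idea of selectively unfreezing layers concrete, here is a small sketch that decides, by parameter name, which weights would be trained. It assumes parameter names follow a `...layers.<index>...` pattern (as in common Hugging Face LLaMA checkpoints); the helper `trainable_mask` and the example names are hypothetical and do not reflect the actual experiment code. In a PyTorch training setup, such a mask would typically be applied by setting each parameter's `requires_grad` flag.

```python
import re
from typing import Dict, Iterable, Set

# Matches the layer index in names such as "model.layers.15.mlp.down_proj.weight".
LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def trainable_mask(param_names: Iterable[str], unfrozen_layers: Set[int]) -> Dict[str, bool]:
    """Map each parameter name to True (train) or False (keep frozen)."""
    mask = {}
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        # Parameters outside any numbered layer (embeddings, heads) stay frozen here.
        mask[name] = bool(match) and int(match.group(1)) in unfrozen_layers
    return mask


# Hypothetical parameter names; only layer 15 is unfrozen.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.15.mlp.down_proj.weight",
    "lm_head.weight",
]
mask = trainable_mask(names, unfrozen_layers={15})
for name, train in mask.items():
    print(f"{name}: {'train' if train else 'frozen'}")
```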

As part of this process, I conducted a literature review to better understand existing methods for layer-specific fine-tuning. I compiled my findings into a PDF, which I will share with Professor Ji and Professor Lei via email.

One key takeaway from my research is that layer-specific fine-tuning is already well-studied in existing work. As a result, I have to reconsider how to position my research contribution. Rather than framing it as a new research question, I need to refine the paper’s scope to present it in a more general yet impactful way.

My next steps are to complete the experiments and go over the previous draft of the paper by Wednesday night.

Replies:
Dr. Lei:

challenge yourself to make your first submission on time. if possible try to include more LLMs in your experiments. but focus on one LLM at a time

Fadul:

Hello Everyone,

This week, I have been running several experiments that were planned based on our discussions in Thursday’s meeting. Additionally, I am working on the experimental design and formulating the rationale behind key decisions.

Additionally, I am conducting small-scale experiments on fine-tuning different layers of the LLaMA 3.2 model. Currently, I am in the process of gathering and analyzing the results.

My primary goal is to complete the experiments related to the first two research questions and report the results. In parallel, I aim to finalize the experimental design and decision-making section by Wednesday night.

Replies:
Dr. Lei:

in your writeup, try to clearly formulate the research question(s), a high-level plan to investigate the research questions, and then specific design decisions you make with justifications (w.r.t. the research questions/high-level plan).

also, it would be best to do the experiments for more than one open-source LLM if possible.

Fadul:
Milestone: PROMISE for February 24
Action Items and Updates:
Implementation Review and Debugging:

This week, I re-ran the last set of experiments from the previous framework using a new implementation. However, the results obtained so far are inconsistent and do not make sense. This indicates that further investigation is required to ensure the implementation is correct. My immediate focus is to debug and refine the implementation to achieve meaningful results.

Exploration of Research Questions:

Based on Professor Lei’s suggestions, I have identified and categorized three key research questions for further exploration:

Experiment 1: Layer-Specific Weight Updates in Neural Networks

I started exploring experiments involving weight updates in specific layers of LLIM (Layer-Level Incremental Modification). To support this, I reviewed LoRA (Low-Rank Adaptation) techniques and documented my findings on Slack. The current library allows selective training of specific attention layers. The goal is to determine if updating specific layers produces different results, which will form the basis of the first research question.

Experiment 2: Decoder-Only vs. Encoder-Only Models

This research question investigates how decoder-only models compare to encoder-only models, such as StarCoder vs. ModernBERT, in detecting vulnerabilities. I plan to implement and analyze their performance on vulnerability classification tasks by incorporating a custom classification head into the model architectures.

Experiment 3: Comparing Pre-Trained Models for Token Generation Tasks

This experiment evaluates the performance of state-of-the-art models like OpenAI’s GPT and open-source LLaMA in zero-shot prompting tasks. The experiment involves providing these models with source code and prompts, specifying a particular class to be generated. The goal is to compare their output quality and report the results. My assumption is that the results will not be very good: the inefficiency of zero-shot prompting is what led to chain-of-thought and other diverse prompting strategies, since detecting vulnerabilities in zero-shot scenarios is inherently challenging.

Additionally, I am exploring two other possible experiments:

Using a reasoning model like OpenAI’s o1-mini for zero-shot prompting. Specifically, the need for CoT has led to the idea of reasoning models, which generate their own reasoning steps at inference time for any zero-shot prompting task.

Fine-tuning a large language model (LLM) with a next-token generation objective to predict the desired class as part of the token sequence, without relying on a custom classification head.

Next Steps:

My goal is to complete as many of the experiments as possible and present preliminary results by Wednesday night.

Replies:
Dr. Lei:

considering the time constraint, it is important to prioritize, in terms of what would be most important for you to make the submission. for example experiment 3 is probably not important for this paper, unless you can do it very quickly.

Fadul:

Action Items, Updates, and Paper Submission Plan:

Exploring the Use of GNNExplainer for GNN-Based Vulnerability Detection

Conducted more literature review on smart contract vulnerability detection and GNN-based approaches.

Based on the review, I have determined that the MANDO project is the most suitable baseline if I extend existing GNN-based approaches for explanation purposes.

I have done an initial exploration of the GNNExplainer approaches and summarized the findings in an attached document (in the email to Prof. Ji). The review revealed several techniques for explaining GNN models.

Key Decision Point: Should I conduct an in-depth exploration of all approaches and build reasoning around which works best, or select a state-of-the-art method without focusing on its specificity to my problem? This decision will be revisited after further research.

Next Steps: Perform additional research and attempt to finalize this decision by Thursday or bring the question to the meeting for discussion.

Contrastive Learning Paper Insights

Key insight: the contrastive learning paper highlights that existing smart contract vulnerability detection (SCVD) methods treat each contract as an isolated entity, neglecting the correlations and relationships between different contracts. The proposed contrastive learning approach aims to capture these fine-grained inter-contract correlations.

My objective is to investigate how this insight can be integrated into my baseline.

PROMISE 2025 Submission Plan for Fine-Tuning LLM Paper
Current Context:

The framework used for the initial version of my implementation faces dependency issues and is no longer operational.

Two alternative frameworks have been identified for reimplementation.

The key issue in the previous paper was suboptimal results. To address this, I will re-run experiments with the following action items:

Increase Model Capacity: Explore whether increasing the model’s parameter count improves training accuracy.

Compare Models: Study how decoder-based models (e.g., StarCoder) differ from encoder-based models (e.g., ModernBERT) in terms of embeddings. I truly want a deeper understanding of this question, so I am going to use ModernBERT alongside StarCoder to run the experiments.

Experimentation: Conduct experiments using both types of models.

Paper Submission Plan:

Phase 1: Complete experiments and review results – Deadline: January 31st.

Phase 2: Conduct further experiments, review results, and prepare a revised version of the paper – Deadline: February 6th.

Internal Completion: Deliver a completed internal version of the paper – Deadline: February 13th.

Abstract Submission: Submit the abstract – Deadline: February 18th.

Paper Submission: Submit the final paper to PROMISE 2025 – Deadline: February 25th.

Replies:
Dr. Lei:

extremely well-written report. this sets an example for how a weekly report to be written. great job!

one suggestion: between now and the submission deadline, your top priority should be on the PROMISE submission. make a commitment: do whatever it takes to make it. this will be our last effort on this project. so don’t leave any regret, i.e., try to do whatever you can reasonably think of to save this project.

Fadul:
Milestone: USENIX Security ‘25
Key Dates:
Milestone Date: January 22
Completed Tasks:

Summarized major papers on contrastive learning and Graph Neural Networks and documented the findings for Dr. Lei and Dr. Ji.

Conducted experiments to explore how the vector representations in the embedding space can be projected to analyze the embedding distribution of the source code dataset.

Performed a literature search on papers related to tabular data generation and identified a few state-of-the-art works. I decided to read and summarize 8 important papers and have summarized two of them so far. The target is to give a new idea presentation in this domain.

Tasks to Do:

Finalize and submit an improved proposal on contrastive learning-based vulnerability detection to Dr. Lei and Dr. Ji by Wednesday.

Generate figures to better understand the embedding distribution of the dataset.

Continue summarizing additional identified papers related to tabular data generation.

Replies:
Dr. Lei:

very good. i think the writeups you are providing are really good and help make the discussions more productive. keep it up.

Fadul:

Hello everyone,

I’ve been a bit under the weather this week with flu-like symptoms, but I’m almost recovered. Despite this, I’ve been consolidating my research findings and plan to send them to Dr. Lei and Dr. Ji by the end of today.

Experimentation on Fine-tuning Smart Contracts

The fine-tuning process for the smart contract model we worked on for our last paper yielded suboptimal results. To address this, I’m investigating how transformers propagate information, specifically focusing on the embeddings of the last token. Typically, the last token embedding aggregates all prior information from the sequence. A key observation is that subtle changes in a smart contract, such as variations in line order (reentrancy) or small syntactical differences (integer overflow), can introduce vulnerabilities. To assess whether the model is learning these distinctions, I plan to compare the last-token embeddings of different contracts using cosine similarity. A small cosine distance (i.e., high similarity) between the embeddings of a vulnerable and a safe contract suggests ineffective learning, indicating that the model fails to distinguish between them; a larger distance implies that the model is capturing discriminative patterns, which is critical for identifying the syntactical differences that lead to vulnerabilities.
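The comparison described above can be sketched as follows, assuming the last-token embeddings have already been extracted from the model as plain lists of floats. The function names and the toy two-dimensional vectors are hypothetical; real embeddings would have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical last-token embeddings of a safe and a vulnerable contract.
safe = [1.0, 0.0]
vulnerable_collapsed = [2.0, 0.0]  # same direction: model does not separate them
vulnerable_separated = [0.0, 1.0]  # orthogonal: model separates them

print(cosine_distance(safe, vulnerable_collapsed))
print(cosine_distance(safe, vulnerable_separated))
```

Because cosine similarity ignores vector length, the first pair has distance 0 even though the magnitudes differ; only the direction of the embeddings matters for this check.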

While reviewing the literature on graph-based neural network approaches for vulnerability detection in smart contracts, I’ve noticed that no work has applied Graph Isomorphism Networks (GIN) in this domain. In theory, GIN has stronger discriminative power than Graph Convolutional Networks (GCN) or Graph Attention Networks (GAT). One potential direction is to explore the application of GIN for vulnerability detection. Since this direction is relatively unexplored, it may present an opportunity.

Fadul:

Over the past few days, I have been reviewing literature to understand how graph-based neural networks, particularly those utilizing attention mechanisms, are being applied to smart contract vulnerability detection. Below is a summary of the methodologies I found:

Multi-Relational Nested Graph Convolutional Network (2023):

Proposes a GNN-based method enhanced with self-attention mechanisms.

Examines both inter-function (call graphs) and intra-function (control/data flow) relationships.

Introduces Multi-Relational Nested Graphs to effectively locate vulnerabilities in Solidity smart contract functions.

VulDet: Graph Attention Networks for Vulnerability Detection (2023):

Constructs contract graphs with node features tied to vulnerabilities.

Utilizes Graph Attention Networks to detect vulnerabilities such as reentrancy and timestamp dependency.

MANDO-GURU: Heterogeneous Graph Embeddings (2022):

Employs heterogeneous graph attention neural networks for Solidity vulnerability detection.

Combines control-flow and call graphs to encode both structural and semantic relationships at contract and line levels.

Combine Sliced Joint Graph with Graph Neural Networks (2023):

Adopts a Bidirectional Gated GNN model incorporating hybrid attention mechanisms.

Leverages program slicing, along with AST, CFG, and PDG graphs, to improve vulnerability detection performance.

SCVHUNTER: Heterogeneous Graph Attention Network (2024):

Builds semantic graphs representing smart contract features.

Implements attention mechanisms to prioritize critical nodes, enabling detection of multiple types of vulnerabilities.

Current Focus and Next Steps:

While I am still in the exploratory phase, I aim to narrow down the approaches and finalize key methodologies soon. The field seems quite crowded, and I am a little unsure which direction would be best to take.

I am working on a proposal document detailing two potential approaches, which I plan to complete by tomorrow evening. This document will serve as a basis for a more structured discussion during our Thursday meeting.

The ultimate goal is to identify actionable steps and align on a strategy moving forward.

Replies:
Dr. Lei:

please try to write down your findings and send to the group before our meeting on Thu

Fadul:
Milestone: USENIX Security ‘25
Key Dates:

Milestone Date: January 22

Internal Deadline: December 15-21

Weekly Updates:

Application of Slicing Criteria for Feature Space Reduction

I have been exploring how slicing criteria can effectively reduce the feature space. This analysis focuses on assessing its adaptability to various types of vulnerabilities.

Literature Review on Contrastive Learning and Graph-Based Neural Networks

I reviewed a paper on contrastive learning (the only paper I found that applies this method to smart contract vulnerability detection), which proposes a mechanism to enrich the feature space, allowing for improved learning outcomes. I am currently investigating how this approach can be adapted and integrated into my ongoing work on program slicing for vulnerability detection.

My review of 4 papers from the past three years on GNN-based smart contract vulnerability detection:

One paper presents a GNN design that prioritizes general-purpose feature engineering and remains largely agnostic to specific vulnerabilities.

The other three papers incorporate domain-specific expert knowledge into GNN architectures to enhance performance.

Replies:
Dr. Lei:

one of the two new directions we discussed is not about GNN. instead it is about GNN-Attention networks, i.e., combining GNN with Attention mechanism or pretrained LLM.

Fadul:
Milestone: USENIX Security ‘25
Key Dates:

Milestone Date: January 22

Internal Deadline: December 15-21

Tasks Completed This Week:

I have been analyzing the program slicing process with a focus on Solidity smart contracts. Building on insights from a 2024 paper that employs program slicing to detect reentrancy vulnerabilities, I aim to refine and expand its approach. Specifically, my work involves:

Program Dependency Graph Creation: The 2024 paper begins by constructing a program dependency graph, which serves as the foundation for identifying slicing criteria.

Defining Slicing Criteria for Reentrancy Vulnerabilities: The paper identifies specific programming statements related to external calls as the starting point. It then backtracks through the dependency graph to trace all relevant lines and their dependencies.

Extending the Criteria to Additional Vulnerabilities: I am exploring how these slicing criteria and backtracking processes can be adapted to slice programs for a second type of vulnerability. My objective is to evaluate how much input reduction can be achieved by applying this methodology across two vulnerability definitions.
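The backtracking step above can be illustrated with a generic backward slice over a dependency graph. This is only a sketch under simplifying assumptions: statements are identified by line number, the graph is given as a mapping from each statement to the statements it depends on, and the function name `backward_slice` and the toy graph are hypothetical. It is not the 2024 paper's algorithm.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def backward_slice(depends_on: Dict[int, List[int]], criteria: Iterable[int]) -> Set[int]:
    """Collect every statement that the slicing criteria transitively depend on."""
    visited: Set[int] = set()
    queue = deque(criteria)
    while queue:
        stmt = queue.popleft()
        if stmt in visited:
            continue
        visited.add(stmt)
        # Follow dependency edges backwards from this statement.
        queue.extend(depends_on.get(stmt, []))
    return visited


# Hypothetical dependency graph: statement -> statements it depends on.
depends_on = {
    5: [3, 4],  # external call (the slicing criterion)
    4: [2],
    3: [1],
    6: [5],     # depends on the call, so backtracking from 5 does not reach it
    7: [],      # unrelated statement
}
total_statements = 7

sliced = backward_slice(depends_on, criteria=[5])
reduction = 1 - len(sliced) / total_statements
print(sorted(sliced), f"input reduced by {reduction:.0%}")
```

In this toy graph, statement 6 depends on the external call but is not included by pure backtracking, which suggests that some vulnerability definitions may need forward dependencies in the criteria as well.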

I am documenting this slicing criteria approach in detail and aim to share the document by midday. Additionally, I am implementing these criteria to assess their effectiveness in input size reduction.

Projects Page: https://fadulsikder.github.io/portfolio/

Fadul:
Milestone: USENIX Security ‘25
Key Dates:

Milestone Date: January 22

Internal Deadline: December 15-21

Tasks Completed This Week:

Conducted a literature review on feature engineering research focused on Solidity smart contracts and program slicing.

Manually created example contracts, marking lines of significance. Currently, I am exploring vulnerability-specific detection methods to do this task.

Pattern Identification in Vulnerabilities:

I can think of two ways to move forward. One approach to vulnerability identification involves recognizing common patterns in abstract domains, such as Abstract Syntax Trees (AST), Control Flow Graphs (CFG), or taint analysis chains. Existing studies use state variables, opcode-level similarities, or other forms of similarity to detect vulnerabilities. Another potential strategy is to apply a state-of-the-art approach to my data and assess its effectiveness in identifying key code segments of interest, without delving deep first.

Fadul:

Hello everyone,

Action Items Completed:

Finished a complete draft of my fine-tuning paper on LLMs.

Next Steps: I have three main action items identified going forward:

Conduct experiments with a new model and configuration on a custom head.

Explore feature engineering approaches to refine the input space representation of smart contracts, making it more targeted.

Rewrite parts of my experiments with a more detailed approach to improve interpretation and debugging, especially where I encountered issues previously. I already started this rewriting phase after Friday.

Upcoming Goals: My immediate goal is to conduct a literature review and compile research on feature engineering. I plan to approach this in two phases:

Phase 1: Meta-level paper collection and initial analysis by this Thursday.

Phase 2: In-depth exploration, with findings ready to present by next Thursday.

While I haven’t yet decided on a specific conference date for my next Milestone, I aim to finalize this choice by my next update.

Replies:
Dr. Lei:

sounds good. one suggestion is perhaps looking beyond vulnerability detection. there is a lot of work on vulnerability detection, which is kind of crowded. are there any other problems to address with smart contracts, e.g., fault localization, contract synthesis (i.e., automatically construct a smart contract based on some kind of user description), and others?

Fadul:
Milestone: Submission to ISSTA on October 31st

This week, I have focused on conducting baseline experiments and setting up additional ones to observe the results. I have completed about 70% of the experiments section but am still awaiting baseline results from experiments currently running on TACC. Additionally, I have drafted the related work section, which took more time than anticipated. I haven’t updated the Overleaf document yet but plan to do so by this evening or tomorrow afternoon. Finally, I have addressed all technical comments from the last draft and adjusted the writing accordingly.

Fadul:

Next Milestone: ISSTA Paper Submission on Oct 31.

I have been working on the experiments and completing the remaining tasks. I obtained one set of results: for binary classification, an accuracy of 67%, and for multi-label classification, 48%, across a total of 17 classes with five vulnerabilities. Here is the current dataset distribution:

safe: 3588

ARTHM: 2487

ARTHM+LE: 882

LE: 580

TimeO: 488

ARTHM+TimeO: 346

RENT: 313

ARTHM+RENT: 92

RENT+TimeO: 71

LE+TimeO: 66

TimeM: 62

LE+RENT: 48

LE+RENT+TimeO: 29

ARTHM+RENT+TimeO: 26

ARTHM+LE+TimeO: 26

ARTHM+LE+RENT: 25

I believe there are too many multi-label classes for the limited number of contracts available. I already reduced the original 37 labels to 17, but this may still be too many. Therefore, I aim to reduce the number of classes further and run another set of experiments with 7 labels.

The revised dataset distribution is as follows:

safe: 3588

ARTHM: 2487

ARTHM+LE: 882

LE: 580

TimeO: 488

ARTHM+TimeO: 346

RENT: 313

I am currently running this new set of experiments and aim to complete the updated version by tomorrow night.
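As a quick sanity check on the label reduction, the sketch below keeps the most frequent classes from the counts listed in this update and reports how much of the dataset they cover. Selecting classes purely by frequency is an assumption made for illustration (it happens to reproduce the revised list), and the helper name `keep_top_k` is hypothetical.

```python
from collections import Counter

# Class counts exactly as listed in this update.
counts = Counter({
    "safe": 3588, "ARTHM": 2487, "ARTHM+LE": 882, "LE": 580,
    "TimeO": 488, "ARTHM+TimeO": 346, "RENT": 313, "ARTHM+RENT": 92,
    "RENT+TimeO": 71, "LE+TimeO": 66, "TimeM": 62, "LE+RENT": 48,
    "LE+RENT+TimeO": 29, "ARTHM+RENT+TimeO": 26, "ARTHM+LE+TimeO": 26,
    "ARTHM+LE+RENT": 25,
})


def keep_top_k(class_counts, k):
    """Return the k most frequent classes with their counts."""
    return dict(class_counts.most_common(k))


reduced = keep_top_k(counts, 7)
coverage = sum(reduced.values()) / sum(counts.values())
smallest = min(reduced.values())

print(list(reduced))
print(f"contracts kept: {coverage:.1%}, smallest class: {smallest}")
```

Whether contracts from the dropped classes are discarded or merged into a neighbouring class is a separate decision that this sketch does not make.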

Replies:
Dr. Lei:

even though the results are not as good as expected, still try to make the submission with the results you have.

Fadul:
Next Major Milestone: Submit the paper to ISSTA 2025 - October 31st
Action Item Completed:

Initially, I planned to complete the environment setup for the training pipeline by Friday. However, resolving dependency conflicts proved more complex than anticipated, requiring manual inspection and management of numerous packages. I eventually completed this by Monday afternoon.

Current Experiments in Progress:

Conducting binary classification and multi-label classification training on the ScrawlID dataset (9K contracts).

Ablation Study: Evaluating the impact of excluding preprocessing on contracts using my approach.

Gathering data on token lengths and GPU requirements.

Dataset-specific Experiments: Using my composed dataset exclusively as a test set for a top-performing pre-trained model.

Dr. Lei, when you have a moment, could you please review my current draft?

Replies:
Dr. Lei:

is your latest version on overleaf? time is running out, and this is a deadline that you cannot miss. you need to have an internal deadline, say have a reasonable, complete version latest by one week before the submission.

Fadul:

Next Major Milestone: Submit the paper to ISSTA 2025 - October 31st

Action Items Completed:

Reviewed the previous draft and addressed initial comments.

Rewrote and reconnected missing scripts. This task is nearly complete, but I am still encountering some runtime errors that prevent the pipeline from running as before. Further debugging is needed.

Actions to be Completed:

By Friday, I plan to complete the experiment setup and run the same tests on the ScrawlID dataset.

Once I obtain the results and initial analysis, I will fully define the scope of experiments to be conducted and how they can be reported in the paper.

After receiving feedback on the current draft, I aim to complete another revision pass by next Tuesday.

Replies:
Dr. Lei:

sounds good. just want to say, time is very tight. take it as a challenge to make the submission.

Fadul:

Next Milestone: October 1st – Complete the current version by October 4th.

Actions performed: Set up the experiments, but experienced a setback due to an unfortunate data deletion. I hope to resolve this issue by today or tomorrow and get everything up and running again. Wrote the latest version addressing the comments. However, some supporting data and diagrams still need to be added, as suggested. I aim to finish this version by October 3rd with all the results.

Actions for the next meeting: Complete binary and multi-label classification experiments, generate other data statistics, and incorporate them in the paper.

Fadul:

Next Milestone: October 1st – Complete the current version.

Actions performed:

Conducted binary classification on the old dataset.

Wrote a script to scrape Solidity source code from Etherscan using the ScrawlID dataset contract addresses.

With the source code, prepared a file with appropriate labels and created a CSV file to carry out experiments
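For reference, the core of such a scraping step might look like the sketch below. It is not my actual script: the base URL, the `getsourcecode` query parameters, and the `SourceCode` response field are assumptions based on the Etherscan contract API as I understand it and should be checked against the current Etherscan documentation (newer API versions may need a different base URL and a chain id). The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Etherscan-style endpoint; verify against the current API docs.
API_BASE = "https://api.etherscan.io/api"


def build_source_url(address, api_key, base=API_BASE):
    """Build the request URL for one contract address."""
    query = urlencode({
        "module": "contract",
        "action": "getsourcecode",
        "address": address,
        "apikey": api_key,
    })
    return f"{base}?{query}"


def extract_source(payload):
    """Return the Solidity source from a decoded response, or None.

    Assumes the response shape {"status": "1", "result": [{"SourceCode": ...}]}.
    Unverified contracts are assumed to come back with an empty SourceCode.
    """
    result = payload.get("result")
    if payload.get("status") != "1" or not isinstance(result, list) or not result:
        return None
    return result[0].get("SourceCode") or None


def fetch_source(address, api_key):
    """Download and decode the source for one address (makes a network call)."""
    with urlopen(build_source_url(address, api_key), timeout=30) as resp:
        return extract_source(json.load(resp))
```

Keeping URL construction and response parsing separate from the network call makes the parsing testable offline, which helps when the labels are later joined to the sources in a CSV file.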

Actions for the next meeting:

Complete binary and multi-label classification experiments and present results on Friday.

Review the draft version with necessary edits.

Replies:
Dr. Lei:

i made comments on your draft. if you want to make the milestone, you really need to work hard. there is still a lot to do for a complete version.

Fadul:
Milestone: October 1

Complete the current paper, including all experiments and writing.

Action Items Completed:

Ran experiments on binary classification. Will present the results in the meeting.

Explored potential directions for training the model with a larger dataset. These options will be discussed in the meeting.

Prepared a presentation for a new idea.

Completed the lab website implementation for PhD student updates.

Action Items for Next Meeting:

Train the model with a larger dataset.

Focus on achieving higher training accuracy while addressing the challenge of improving test accuracy. I will experiment with this.

Write the “Related Work” section of the paper.

Incorporate longer sequences into the fine-tuning process using distributed training.


Fadul:
Milestone: October 1

Complete the current paper, carrying out all experiments and writing.

Action Items Before Next Meeting:

Obtain results for binary classification.

Train the model with a larger dataset.

Action Items Performed:

Completed the first draft.

Corrected data balancing implementation for training the model.

Experimented with binary classification (results pending)

Replies:
Dr. Lei:

make a commitment to meet this milestone, whatever it takes

Weekly Updates for Fadul Sikder


Update: 2025-06-25

Update: 2025-06-12

Update: 2025-06-11

Update: 2025-06-04

Update: 2025-05-22

Update: 2025-05-16

Update: 2025-05-09

Update: 2025-05-07

Update: 2025-04-30

Update: 2025-04-22

Update: 2025-04-15

Update: 2025-04-08

Update: 2025-04-01

Update: 2025-03-25

Update: 2025-03-18

Update: 2025-03-04

Update: 2025-02-25

Update: 2025-02-18

Update: 2025-02-11

Update: 2025-02-04

Update: 2025-01-28

Update: 2025-01-21

Update: 2024-12-17

Update: 2024-12-10

Update: 2024-12-03

Update: 2024-11-26

Update: 2024-11-20

Update: 2024-11-12

Update: 2024-11-05

Update: 2024-10-29

Update: 2024-10-22

Update: 2024-10-15

Update: 2024-10-08

Update: 2024-10-01

Update: 2024-09-24

Update: 2024-09-17

Update: 2024-09-13

Update: 2024-09-10
