Saif:
Hi group,
Milestone: I have yet to decide on a concrete study and milestone date
Action Items completed:
Action Item 1 – Saturday, June 28 (done)
Action Item 2 – Thursday, July 3 (done for 10 problems, with 3 generated solutions for each)
Results/Findings:
Questions/issues:
Saif:
Hi group,
Action Item in progress:
We will construct our own dataset using problem instructions and corresponding code samples
This week I continued working on dataset creation using DeepMind’s code_contests dataset. I have successfully extracted 342 problems from the first chunk. From those, I identified 276 problems that include at least one correct C++ solution.
I configured and ran CodeLlama-7B.Q4_K_M.gguf locally via Ollama, and used it to generate C++ solutions for all 276 problems. The model was able to produce complete C++ code snippets based solely on the problem descriptions.
Currently, I am beginning to compare CWE (Common Weakness Enumeration) differences between the human-written C++ solutions and the ones generated by CodeLlama. This will help us assess whether LLM-generated code introduces or avoids common coding vulnerabilities when compared to human-authored examples.
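As a concrete illustration of the comparison step, here is a minimal Python sketch. It assumes each solution has already been scanned by some CWE-reporting analyzer; the scan-result format and the CWE IDs in the example are invented for illustration only.

```python
from collections import Counter

def cwe_frequencies(scan_results):
    """Count how many solutions exhibit each CWE.

    scan_results: list of per-solution CWE ID lists, e.g. [["CWE-787"], [], ...]
    """
    counts = Counter()
    for cwes in scan_results:
        counts.update(set(cwes))  # count each CWE at most once per solution
    return counts

def compare(human_results, llm_results):
    """Return per-CWE difference in per-solution frequency (LLM minus human)."""
    hc, lc = cwe_frequencies(human_results), cwe_frequencies(llm_results)
    n_h, n_l = len(human_results), len(llm_results)
    diffs = {}
    for cwe in set(hc) | set(lc):
        diffs[cwe] = lc[cwe] / n_l - hc[cwe] / n_h  # positive: more common in LLM code
    return diffs

# Hypothetical example data (CWE IDs are illustrative only)
human = [["CWE-787"], [], ["CWE-190"], []]
llm = [["CWE-787"], ["CWE-787"], [], ["CWE-190"]]
print(compare(human, llm))
```

With the toy data above, CWE-787 comes out 0.25 more frequent per solution in the LLM set, and CWE-190 equally frequent in both.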
Results/Findings:
Question/Issue:
No blockers at the moment. Continuing with CWE analysis.
Replies:
Dr. Lei:
Sounds good. Try to think deep and see if you can make any interesting observations about the results.
No update was provided for the week ending 2025-04-30.
Saif:
Hi group,
Milestone: Identify a research topic - May 04
Last week, I generated a report on existing datasets of vulnerable code generated by various LLMs, as reported in the existing literature.
This week I am focusing on the following action items:
Replies:
Dr. Lei:
try to send a written report about your findings before thu meeting. the report does not have to be well-written or very detailed. but it should contain the major points.
Saif:
Hi Group,
Milestone: Diagnostic Evaluation (awaiting response from Dr. Khalili)
Milestone: Complete a Literature review on LLM Explanation - May 4
This week I focused on the following action items:
1. Empirical and theoretical distinguishing characteristics:
- Stylistic/structural patterns: LLM code tends to be more formulaic and concise, and uses narrower token/naming distributions; comments are often more consistent but sometimes mismatched with the code
- Coding style markers: differences in variable/class naming, control-flow depth, and code and comment length/distribution; human code is more diverse
2. Publicly available LLM-generated code datasets with vulnerabilities:
- Scope of Security Bugs and Limitations: Most datasets focus on well-known bug/vulnerability types (CWE Top-25, input validation, memory safety, etc.), with variable coverage of edge cases, multi-step chain issues, or “in-the-wild” project code
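To make style markers like those above measurable, here is a rough sketch of the kind of features one could compute. The marker set and the C++ snippet are illustrative only, not taken from any of the surveyed datasets.

```python
import re

def style_markers(source: str):
    """Compute a few crude stylistic markers of a code snippet:
    identifier diversity, average line length, and comment ratio."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)  # crude: includes comment words
    comment_lines = sum(1 for ln in lines if ln.strip().startswith("//"))
    return {
        "identifier_diversity": len(set(idents)) / max(len(idents), 1),
        "avg_line_length": sum(len(ln) for ln in lines) / max(len(lines), 1),
        "comment_ratio": comment_lines / max(len(lines), 1),
    }

snippet = """\
// add two numbers
int add(int a, int b) {
    return a + b;
}"""
print(style_markers(snippet))
```

Comparing distributions of such markers between human and LLM corpora would quantify claims like "narrower naming distributions" instead of leaving them anecdotal.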
Replies:
Dr. Lei:
good job. a few comments:
Saif:
hi group,
Milestone: Diagnostic Evaluation (awaiting response from Dr. Khalili)
Milestone: Complete a Literature review on LLM Explanation - May 4
This week I focused on the following paper:
Additional note:
Replies:
Dr. Lei:
sounds good. please prepare a presentation on your findings. try to focus on the most important points. also try to think about three ideas that you could think of on the topic of security testing of LLM-generated code, and discuss them when we meet on thursday
Saif:
Ok Professor.
Saif:
Hello group,
PhD Milestone: Diagnostic Evaluation (April 7)
Milestone: Complete a Literature review on LLM Explanation - May 4
New Study: Vulnerabilities in LLM generated code
Last week I went through the following paper:
I am conducting a literature search for Vulnerabilities in LLM generated code. I have shared the incomplete list in my channel. I will provide the comprehensive list tomorrow (Wednesday, 4/2) and plan to discuss on Thursday.
Replies:
Dr. Lei:
sounds good. please prepare some slides for thu meeting to discuss
Saif:
Good morning,
Milestone: Diagnostic Evaluation (April 7)
Milestone: Complete a Literature review on LLM Explanation - May 4
This week I focused on the challenges in defining locality for generating accurate explanations. I am trying to understand different versions of LIME. I have gone through the following papers:
This paper proposes an alternative sampling method to improve the local fidelity of surrogate models and evaluates it against LIME.
The paper investigates the challenges of generating sample points in an instance’s neighborhood, balancing interpretability with explanation accuracy, and determining the appropriate sample size. The findings emphasize issues with LIME’s kernel-based weighting and boundary approximation.
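The kernel-weighted surrogate fitting at the core of LIME can be sketched in a few lines of numpy. This is a toy version for intuition only, not the implementation from either paper: sample perturbations around an instance, weight them by an exponential kernel on distance, and fit a weighted linear model.

```python
import numpy as np

def local_surrogate(f, x0, n_samples=500, scale=0.5, kernel_width=0.75, seed=0):
    """Fit a kernel-weighted linear surrogate of black-box f around x0 (LIME-style)."""
    rng = np.random.default_rng(seed)
    X = x0 + rng.normal(scale=scale, size=(n_samples, x0.size))  # local perturbations
    y = np.array([f(x) for x in X])
    d = np.linalg.norm(X - x0, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)        # exponential kernel weights
    Xb = np.hstack([X, np.ones((n_samples, 1))])     # add intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(Xb * sw, y * np.sqrt(w), rcond=None)
    return coef[:-1], coef[-1]  # feature weights, intercept

# Sanity check: on a linear black box the surrogate recovers the true coefficients.
f = lambda x: 3 * x[0] + 5 * x[1]
weights, intercept = local_surrogate(f, np.array([1.0, 2.0]))
print(weights)  # close to [3, 5]
```

The kernel width and sampling scale are exactly the knobs the papers above criticize: widen the kernel and the surrogate drifts toward a global fit; shrink the sample scale and the fit becomes noisy.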
Saif:
Good morning,
Milestone: Diagnostic Evaluation
Current work: LLM explanation
Last week I tried to run part of the experiment from the paper I presented last time, to get a good grasp of LIME.
For the diagnostic, I have sent out emails to the tentative committee. Two of the professors have already replied. I am awaiting Dr. Ji’s response.
Replies:
Dr. Lei:
please try to make your status report more informative
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - 3rd Week of April)
Milestone: Complete a Literature review on LLM Explanation - May 4
This week I am planning to run the experiments from the empirical study I have presented on Friday.
Replies:
Dr. Lei:
we need to discuss who you will invite to be on your committee.
Saif:
Yes Dr. Lei. I have sent you a direct message on Slack describing my current academic progress and tentative professor list. Thank you.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: LLM Explanation (Milestones yet to be decided)
Action items completed:
Replies:
Dr. Lei:
please note that the purpose of the New Ideas session is to introduce interesting ideas/perspectives to the group. try to select a topic/paper that truly excites you; otherwise, it would not serve the purpose.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: LLM Explanation
Action items completed:
After the discussion in Friday’s meeting, I compiled a comprehensive list of papers that covers perturbation for LLMs. I have added the list to my channel.
Action items for this week:
I have not yet organized the workflow and milestone dates, but this week I will focus on setting up and running experiments.
Replies:
Dr. Lei:
good job on the collection of papers. at this stage, i suggest you give priority to the big picture before you go into the details of each paper. try to find a good survey paper on this topic first, if one exists.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: Investigating Sparse AutoEncoders for LLM Explanation
I have been studying approaches to LLM explanation. I came across this paper, which discusses the existing techniques and their challenges: “Explainability for Large Language Models: A Survey” - by H Zhao · 2024
From the existing approaches, I would like to explore the following two:
Feature Interpretability via Concept Vectors (CAVs):
What it is: Use post-hoc linear classifiers or other mechanisms to define high-level concept vectors in the latent space that align with human-understandable constructs (e.g., sentiment, gender).
Applications:
Explain model decisions in human terms by projecting latent representations onto concept vectors.
Diagnose the presence of biases or specific abstract features (e.g., fairness in language models).
Sparse Feature Extraction:
What it is: Neuroscience-inspired methods like sparse autoencoders or dictionary learning extract sparse or disentangled features from LLM activation spaces.
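Both ideas can be made concrete on toy data. The sketch below uses synthetic "activations" and is a minimal numpy illustration of the two techniques, not the implementation from the survey or any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Concept vector sketch: a mean-difference direction in activation space ---
# Synthetic activations: examples WITH the concept are shifted along one axis.
concept_axis = np.array([1.0, 0.0, 0.0, 0.0])
pos = rng.normal(size=(100, 4)) + 2 * concept_axis  # activations with the concept
neg = rng.normal(size=(100, 4))                     # activations without it
cav = pos.mean(axis=0) - neg.mean(axis=0)
cav /= np.linalg.norm(cav)

def concept_score(activation):
    """Project an activation onto the concept vector (higher = more concept-like)."""
    return float(activation @ cav)

# --- Sparse autoencoder sketch: reconstruct activations via a sparse ReLU code ---
d_in, d_hidden, lr, l1 = 4, 16, 0.05, 1e-3
W_enc = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_in))
X = np.vstack([pos, neg])

def loss():
    h = np.maximum(X @ W_enc, 0.0)
    return ((h @ W_dec - X) ** 2).mean() + l1 * np.abs(h).mean()

start = loss()
for _ in range(200):  # plain gradient descent on reconstruction + L1 sparsity
    h = np.maximum(X @ W_enc, 0.0)
    err = h @ W_dec - X
    g_h = ((2 / err.size) * err @ W_dec.T + (l1 / h.size) * np.sign(h)) * (h > 0)
    W_dec -= lr * (2 / err.size) * h.T @ err
    W_enc -= lr * X.T @ g_h

print(concept_score(pos.mean(axis=0)), concept_score(neg.mean(axis=0)))
print(start, loss())  # loss should drop as the sparse code learns to reconstruct
```

The mean-difference direction is a simplification of CAV training (which typically fits a linear classifier), and the L1 penalty is what pushes the autoencoder’s hidden code toward sparse, hopefully disentangled features.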
Replies:
Dr. Lei:
as i suggested, focus on the big picture first before you dive into the details of a particular approach
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: Investigating Sparse AutoEncoders for LLM Explanation
Action items done in the previous week:
– Regarding the fundamentals of SAEs, I found the following:
Action items for next week:
– My primary plan for this week is to look for small open-source SAE models that can be trained at a smaller scale
– And continue grouping the papers
Replies:
Dr. Lei:
i suggest you think more about the problem, i.e., explaining LLM models. in addition to SAEs, what are other approaches to LLM explanation? what are the general technical problems, and what are the existing approaches to solving them?
Saif:
Ok Professor. I will look into this according to your suggestion.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: Investigating Sparse AutoEncoders for LLM Explanation
Action items completed:
2. Challenges in Implementation:
Team meeting update:
I was working on compiling the papers. There was not much to discuss this week so we skipped the meeting.
Action items for this week:
I will continue summarizing the papers from the compiled list. I would like to sign up for a technical discussion on Friday on my findings on this topic.
Replies:
Dr. Lei:
looks good. keep up with the good work
Saif:
Good morning group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Milestone: Literature Review Draft - January 23, 2025
Collaboration with Fadul:
Replies:
Dr. Lei:
sounds good. please work with Fadul and Sunny on possible new topics. the current focus for you is to find a topic to dive into in the next two weeks.
No update was provided for the week ending 2024-12-18.
Saif:
Hi Group,
Updated Milestone: Literature Review Draft - January 23, 2025
Based on the feedback on Friday’s meeting, I have updated the timeline for the literature review.
Action Items from last week:
After Friday, I refined the study design section, added research questions, and identified that performance metric summaries needed to be included.
I have an exam to proctor today and after that the Professor would like to discuss the grading with me. I might be late to join the meeting.
Replies:
Dr. Lei:
this is probably your most meaningful status update, which i am really happy to see. keep it up
Saif:
Thank you Dr. Lei.
Saif:
Action items Completed:
We had a general discussion with Fadul regarding collaboration.
Saif:
Hi group,
I had my Compilers final exam on Friday and two project submissions due by yesterday, so I couldn’t spend much time on research.
I will be presenting the new idea today.
Saif:
Milestone: Complete Literature Survey - Dec 5
Saif:
Hi group,
Milestone: Complete Literature Survey - Dec 5
Saif:
Hi group,
Milestone: Complete Literature Survey - Dec 5
Action items–
Last week, I summarized these four papers for the literature survey
This Friday, I will brief my current findings from the literature search.
Replies:
Dr. Lei:
if i remember correctly, i suggested you make a schedule towards the paper submission. can you share the schedule?
Saif:
Hi group,
Milestone: Complete Literature Survey - Dec 5
Action items
Last week, I summarized these three papers for the literature survey
I will present my current findings to the group on the following Friday (11/8).
Saif:
Milestone: Complete a Survey paper - Dec 5 (venue yet to decide)
Action items -
Last week I read these four papers from my literature search and jotted down the problems they address, their key insights and inspirations for a solution, the steps they take toward a solution, and the inputs and outputs of their approaches
Replies:
Dr. Lei:
please break down this milestone and have some intermediate deliverables, so that you can gauge the intermediate progress.
Saif:
Milestone: Complete a Survey paper - Dec 5 (venue yet to decide)
– For research, I have updated the list for my literature search.
– I have documented papers that use LLM approaches for source code smell detection / code summarization tasks
– I was mostly working on my course projects last week
– I have been grading Midterm papers and an assignment for TA work
Saif:
Hello everyone,
Broad Milestone: Complete Code smell detection project by Thanksgiving (Nov 26)
Complete grouping the papers - Sep 29 (partially completed)
Present the findings of the existing literature - Oct 11. For this one, I am preparing a presentation to show the findings of LLM approaches in code smell detection.
Action Items from last week:
Replies:
Dr. Lei:
For a milestone to be meaningful, it must have clear deliverables and an objective way to check whether it is completed. Speaking of a research project being completed, i would expect a research paper to be submitted.
Dr. Lei:
also for your project to make real progress, i want you to make a proposal to the group. in the proposal, please clearly specify what problem you are trying to address, in terms of input/output, what are the technical challenges, and what are your ideas to address the challenges, and how your idea would compare to existing work. also you must break down a project into smaller tasks and put a target date on each small task.
Saif:
Thank you for the feedback Professor.
I’ll revise the milestone to clearly define goals and to break it into smaller tasks.
Our plan was to aim for a survey paper. Should I target a conference with a December/January deadline?
Dr. Lei:
typically a survey paper is published in a journal. not many conferences accept survey papers. one target journal is ACM Computing Surveys. journal submissions can be made anytime, i.e., there are no specific deadlines. i suggest you target the end of this semester.
Saif:
Sounds good. I will adjust the milestone deadline accordingly.
No update was provided for the week ending 2024-10-02.
Saif:
Hello everyone,
Broad Milestone: Complete Code smell detection project by Thanksgiving (Nov 26)
Action Items from last week:
I am still working on categorizing the papers for the literature review
No update was provided for the week ending 2024-09-17.
No update was provided for the week ending 2024-09-13.
© 2023 Jeff Lei's Lab.
We are part of the CSE Department at University of Texas at Arlington.